Methodology
The research question is whether analytical models trained and calibrated on 2013–2014 data predict 2015 gas leaks more accurately than a naive guess that uses only historical trends to predict the frequency of future incidents. The working null hypothesis is that the model offers no significant improvement over the naive method. For the naive method, the authors assumed that the average annual number of actual gas leaks (per zip code or tract) in the 2013–2014 data would remain constant; this annual average is then the predicted number of gas leaks in 2015. The target prediction value is the number of leaks per building unit (per zip code or tract) in 2015. For final model evaluation, the Root Mean Squared Error (RMSE) of the predicted leaks per building unit relative to the actual values was used: