While statistical analysis of large data sets has been used successfully to assess building risk in general, to the authors' knowledge no such analysis has yet been applied to gas leak data. In 2013, the New York City Mayor's Office of Data Analytics (MODA) worked with the Fire Department of the City of New York (FDNY) to improve its Risk-Based Inspection System. The system prioritizes inspections of buildings under FDNY jurisdiction to help inspectors detect severe violations. Before MODA's involvement, the FDNY model relied on limited data about the city's buildings, with its factors weighted largely by firefighters' anecdotal experience. The improved model applies statistical regression to additional building data sets, making inspections more effective: its performance went from little better than random selection to identifying nearly three-quarters of severe violations within the first quarter of inspections~\citep{nyc_moda_nyc_2013}.
In a similar fashion, the authors seek to determine whether regression, or another machine learning method, can predict the locations of natural gas leaks reported to FDNY. Specifically, the authors investigate whether an effective predictive model for NYC natural gas leaks can be built from publicly available data sets alone, developing multiple models with various data selection strategies and machine learning methods. Comparing how well these models predict 2015 gas leaks from 2013--2014 data reveals several challenges and opportunities for further research.
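The FDNY result quoted above is a capture-rate comparison: the share of all severe violations found within the top quarter of inspections, ranked by model score, set against what random selection would find. A similar framing could be used when ranking locations by predicted gas leak risk. The minimal Python sketch below illustrates only that arithmetic; the functions \texttt{capture\_rate} and \texttt{random\_baseline} and the toy data are hypothetical, and this is not claimed to be the metric used by MODA or in this work.

```python
import random


def capture_rate(scores, outcomes, fraction):
    """Share of all positive outcomes found in the top `fraction` of items ranked by score.

    Hypothetical helper for illustration: `scores` are model risk scores and
    `outcomes` are 1 (event occurred, e.g. a violation or leak) or 0.
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    total = sum(outcomes)
    if total == 0:
        return 0.0
    # Rank items from highest to lowest score, then inspect the top k.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    k = int(round(fraction * len(ranked)))
    found = sum(outcome for _, outcome in ranked[:k])
    return found / total


def random_baseline(outcomes, fraction, trials=1000, seed=0):
    """Average capture rate when items are ranked by random scores instead of a model."""
    rng = random.Random(seed)
    n = len(outcomes)
    rates = [
        capture_rate([rng.random() for _ in range(n)], outcomes, fraction)
        for _ in range(trials)
    ]
    return sum(rates) / trials


if __name__ == "__main__":
    # Toy data: eight hypothetical buildings, three of which had an event.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    outcomes = [1, 1, 0, 0, 1, 0, 0, 0]
    print("model, top 25%:", capture_rate(scores, outcomes, 0.25))
    print("random, top 25%:", random_baseline(outcomes, 0.25))
```

On the toy data, the model ranking captures two of the three events within the top quarter, whereas a random ranking captures about one quarter of them on average. A model is useful for prioritizing inspections to the extent that its capture rate exceeds this random baseline.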