Next, a bar plot overlaying the two variables across all zip codes was examined. The plot gave no further insight into the data, nor into a possible relationship between the variables, its magnitude, or its direction.
(plot of bar plot)
A linear model was built to assess how well permit issuance predicts building violation complaints. As shown in the model summary, the R-squared is extremely low, meaning the model explains very little of the variation in complaints, although the p-value is below the significance threshold, which might suggest that permit issuance has some effect on building violation complaints. Figure 3 is a scatter plot of the data, building violation (BV) complaints against permit issuance, together with the fitted line.
(plot of scatter)
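The combination reported in the model summary, a very low R-squared alongside a p-value under the threshold, can be illustrated with a minimal sketch. The data below are synthetic and the names (`simple_ols`, `permits`, `complaints`) are hypothetical; this is not the model or the data used in this study. The p-value relies on a normal approximation to the t distribution, an assumption that is only reasonable for large samples.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares and the share of variance explained.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - sse / syy

    # Standard error of the slope and its test statistic.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se_slope

    # Two-sided p-value via a normal approximation (assumption: large n).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "p_value": p_value,
    }


# Synthetic, made-up data: a weak linear signal buried in noise.
random.seed(0)
permits = [random.uniform(0, 500) for _ in range(2000)]
complaints = [300 + 0.05 * p + random.gauss(0, 80) for p in permits]

fit = simple_ols(permits, complaints)
print(fit)
```

With enough observations, even a weak slope can pass a significance test while the model still explains little of the variance, so the two statistics answer different questions.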
At that point I looked for possible outliers in the data. This part was interesting and surprising: the zip codes with the highest number of permits were in Staten Island and South Queens. I then applied the methods above to a reduced dataset of only Manhattan and Brooklyn. The results were even less significant than those for the whole city.
(table? scatter with/without outliers?)
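The text does not state which outlier method was used. As one possible approach, the sketch below flags zip codes whose permit count falls outside the Tukey fences (1.5 times the interquartile range beyond the quartiles). The function name `iqr_outliers`, the placeholder zip labels, and the counts are all made up for illustration.

```python
import statistics


def iqr_outliers(counts_by_zip, k=1.5):
    """Return the zip codes whose count lies outside the Tukey fences."""
    q1, _, q3 = statistics.quantiles(counts_by_zip.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(z for z, v in counts_by_zip.items() if v < low or v > high)


# Made-up permit counts keyed by placeholder zip labels (not real data).
permit_counts = {
    "zip_a": 10, "zip_b": 12, "zip_c": 11, "zip_d": 13,
    "zip_e": 12, "zip_f": 11, "zip_g": 95,
}

flagged = iqr_outliers(permit_counts)

# Data with the flagged zip codes removed, ready for refitting the model.
trimmed = {z: v for z, v in permit_counts.items() if z not in flagged}
```

Fitting the model on both `permit_counts` and `trimmed` would give the with-and-without-outliers comparison.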
Conclusions and Limitations:
The results of the analysis were not significant enough to consider permit issuance a reliable predictor of building violation complaints. The initial assumption, that a higher number of permits issued in a given place probably means that place will see fewer building violation complaints, was not supported by this research. Even when the data were divided into two subsamples, Manhattan and Brooklyn, no consistency was detected in the behaviour of the variables or in their relationship.
A possible limitation of this study is the use of 311 complaints about building violations as a proxy for actual building or use violations, which by definition are cases of non-compliance. Furthermore, as noted in the data section, it is hard to assess how the 311 system is used across the city. Further work could analyze DOB-approved violations rather than 311 data, although the latter is normalized.
Another approach that could have made this study more accurate is to assess the housing units per permit and use this to weight the overall effect of each permit on the city as a whole. This could prevent areas of single-family houses, such as Staten Island, or of row houses from being mistakenly identified as areas of urban renewal.
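The weighting idea can be sketched as follows. The record layout (a zip label paired with the number of housing units covered by a permit), the function name `units_weighted_activity`, and the numbers are hypothetical, chosen only to show how a raw permit count and a unit-weighted measure can disagree.

```python
from collections import defaultdict


def units_weighted_activity(permits):
    """Aggregate permits per zip code as a raw count and weighted by housing units."""
    summary = defaultdict(lambda: {"permits": 0, "units": 0})
    for zip_label, units in permits:
        summary[zip_label]["permits"] += 1
        summary[zip_label]["units"] += units
    for row in summary.values():
        row["units_per_permit"] = row["units"] / row["permits"]
    return dict(summary)


# Made-up records: (placeholder zip label, housing units covered by the permit).
records = [
    ("zip_a", 1), ("zip_a", 1), ("zip_a", 1), ("zip_a", 1),  # single-family houses
    ("zip_b", 40), ("zip_b", 60),                            # multi-unit buildings
]

activity = units_weighted_activity(records)
```

In this toy input, `zip_a` has twice as many permits as `zip_b` but only a small fraction of its housing units, so ranking by permit count alone would overstate its building activity.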
To conclude, the assumption that higher permit issuance will result in fewer building violation complaints is a long shot. It might hold in very specific areas of rapid urban renewal, and even then other factors, such as the ratio of renter to owner occupancy and income level, could be significant in predicting housing violations and should be taken into consideration.