Overview
New York City is a busy place, so road-safety measures matter to millions of people. My hypothesis was that if every car had some kind of safety feature (for example, blind-spot warning), there would be fewer crashes, and fewer injuries and deaths from the crashes that do happen. Below are some visualizations I created to help understand the problem. Everything was implemented in Python and its libraries, using data from the sources described in the Data section. By understanding the data thoroughly and applying what I learned in my Data Science course to build the visualizations, I hope this is the start of a greater change!
Data
My data sources are NYC Open Data, Consumer Reports, and Kanner and Pintaluga (kpattorney). The most important data came from NYC Open Data: their CSV file of motor vehicle collisions, which includes the crash date, location, contributing causes, whether the crash resulted in an injury or death, and more. This CSV gave me most of the data needed for this project. The Consumer Reports dataset lists some popular vehicles and whether blind-spot warning and pedestrian detection are standard, optional, or not available on each. The last piece was the percent decrease in collisions, and in collisions that result in an injury, reported by Kanner and Pintaluga. Links to each data source are listed at the bottom.
Techniques
Thanks to my Data Science class, I learned many techniques that helped with this project. I created a CSV file from the Consumer Reports data so it could be read easily as a dataframe, which makes the data much easier to inspect and plot. Collision data was easy to find because NYC Open Data had everything available. Finding data on how much safety features reduce collisions and injuries took more time, as did carefully cleaning the data and choosing which parts I needed. I used Python 3 for all of the data cleaning and visualizations. I had some previous experience with SQL, so I enjoyed using it to clean my data. The libraries that made the process easier were pandas, matplotlib.pyplot, seaborn, folium, and psql.
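Here is a minimal sketch of the loading and cleaning step in pandas. It is not the exact code used for this project: the column names are assumed to match the NYC Open Data export, the sample rows are made up, and `load_collisions` is a hypothetical helper. The same filter could equally be written in SQL.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for the NYC Open Data CSV export.
# Column names are assumed to match the real file.
SAMPLE_CSV = """CRASH DATE,ZIP CODE,ON STREET NAME,CROSS STREET NAME,NUMBER OF PERSONS INJURED,NUMBER OF PERSONS KILLED,CONTRIBUTING FACTOR VEHICLE 1
01/05/2021,11234,FLATBUSH AVENUE,AVENUE U,1,0,Driver Inattention/Distraction
01/06/2021,11234,,,0,0,Unspecified
01/06/2021,10001,8 AVENUE,WEST 34 STREET,2,0,Following Too Closely
"""


def load_collisions(source, zip_code="11234"):
    """Read the collisions CSV and keep only rows for one ZIP code."""
    # Read ZIP codes as text so they compare cleanly and keep any leading zeros.
    df = pd.read_csv(source, dtype={"ZIP CODE": str})
    df = df[df["ZIP CODE"] == zip_code].copy()

    # Parse dates so crashes can later be grouped by month or year.
    df["CRASH DATE"] = pd.to_datetime(df["CRASH DATE"], format="%m/%d/%Y")

    # Treat missing injury/death counts as zero and store them as integers.
    count_cols = ["NUMBER OF PERSONS INJURED", "NUMBER OF PERSONS KILLED"]
    df[count_cols] = df[count_cols].fillna(0).astype(int)
    return df


crashes = load_collisions(io.StringIO(SAMPLE_CSV))
```

With the real file, `load_collisions` would be given the path to the downloaded CSV instead of the in-memory sample.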
*All data is from ZIP code 11234.
This categorical bar chart shows whether popular cars come with blind-spot warning and pedestrian detection as standard, optional, or not available.
This horizontal bar chart shows the most common causes of crashes. Many of them look like they could be mitigated by safety features.
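A chart like this can be sketched by counting the contributing factor recorded for each crash. The version below is illustrative only: the factor values are a made-up sample, the column name is assumed to match the NYC Open Data export, `top_causes` is a hypothetical helper, and dropping "Unspecified" entries is an assumption rather than something the project is known to do.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample of contributing factors, for illustration only.
factors = pd.Series(
    [
        "Driver Inattention/Distraction",
        "Driver Inattention/Distraction",
        "Driver Inattention/Distraction",
        "Failure to Yield Right-of-Way",
        "Failure to Yield Right-of-Way",
        "Following Too Closely",
        "Unspecified",
        "Unspecified",
        None,
    ],
    name="CONTRIBUTING FACTOR VEHICLE 1",
)


def top_causes(factors, n=10):
    """Return the n most frequent known causes, most frequent first."""
    known = factors.dropna()
    known = known[known != "Unspecified"]  # assumption: skip uninformative entries
    return known.value_counts().head(n)


counts = top_causes(factors)

fig, ax = plt.subplots(figsize=(8, 4))
# Reverse the order so the most frequent cause is drawn at the top.
ax.barh(list(counts.index[::-1]), list(counts.values[::-1]))
ax.set_xlabel("Number of crashes")
ax.set_title("Most common contributing factors (sample data)")
fig.tight_layout()
# fig.savefig("top_causes.png")  # uncomment to write the chart to disk
```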
This interactive folium map shows the 30 intersections with the most crashes. It highlights where equipping every car with some kind of safety feature could have the greatest impact.
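One way to rank intersections and place them on a folium map is sketched below. It assumes the collisions dataframe has the street and coordinate columns named as in the NYC Open Data export. The rows and coordinates are made-up approximations, and `top_intersections` and `build_map` are hypothetical helpers, not the project's actual code.

```python
import pandas as pd

# Made-up rows with approximate coordinates, for illustration only.
crashes = pd.DataFrame(
    {
        "ON STREET NAME": ["FLATBUSH AVENUE"] * 3 + ["RALPH AVENUE"] * 2 + ["AVENUE U"],
        "CROSS STREET NAME": ["AVENUE U"] * 3 + ["AVENUE N"] * 2 + [None],
        "LATITUDE": [40.6095] * 3 + [40.6190] * 2 + [40.6100],
        "LONGITUDE": [-73.9215] * 3 + [-73.9180] * 2 + [-73.9200],
    }
)


def top_intersections(df, n=30):
    """Count crashes per (on street, cross street) pair and keep the top n."""
    cols = ["ON STREET NAME", "CROSS STREET NAME", "LATITUDE", "LONGITUDE"]
    located = df.dropna(subset=cols)  # rows without both streets cannot be ranked
    grouped = (
        located.groupby(["ON STREET NAME", "CROSS STREET NAME"])
        .agg(
            crashes=("LATITUDE", "count"),
            lat=("LATITUDE", "mean"),
            lon=("LONGITUDE", "mean"),
        )
        .reset_index()
        .rename(
            columns={
                "ON STREET NAME": "on_street",
                "CROSS STREET NAME": "cross_street",
            }
        )
    )
    return grouped.sort_values("crashes", ascending=False).head(n)


def build_map(top):
    """Place one marker per intersection on an interactive folium map."""
    import folium  # imported here so the ranking step runs without folium installed

    center = [top["lat"].mean(), top["lon"].mean()]
    crash_map = folium.Map(location=center, zoom_start=13)
    for row in top.itertuples():
        label = f"{row.on_street} & {row.cross_street}: {row.crashes} crashes"
        folium.Marker(location=[row.lat, row.lon], popup=label).add_to(crash_map)
    return crash_map


top = top_intersections(crashes)
# build_map(top).save("top_intersections.html")  # writes the interactive map
```

Note that this simple grouping treats "A at B" and "B at A" as different intersections. Normalizing the order of the two street names before grouping would merge them.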
These two line graphs show the relationship between the number of crashes and the number of people injured and killed. The graph on the left shows the actual data from NYC Open Data. The graph on the right shows the same data with an 11% decrease in crashes and a 21% decrease in injuries, the reductions associated with collision warning and automatic braking.
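The adjustment behind the right-hand graph can be sketched as simple scaling. The monthly totals below are made up, and `apply_reduction` is a hypothetical helper. Only the 11% and 21% figures come from the description above. No reduction is stated for deaths, so this sketch leaves them unchanged.

```python
def apply_reduction(counts, pct_decrease):
    """Scale each count down by pct_decrease (0.11 means an 11% decrease)."""
    return [round(c * (1 - pct_decrease)) for c in counts]


# Made-up monthly totals, for illustration only.
crashes = [120, 95, 143]
injured = [40, 31, 52]
killed = [1, 0, 2]

CRASH_DECREASE = 0.11   # 11% fewer crashes
INJURY_DECREASE = 0.21  # 21% fewer injuries

projected_crashes = apply_reduction(crashes, CRASH_DECREASE)
projected_injured = apply_reduction(injured, INJURY_DECREASE)
projected_killed = list(killed)  # no reduction figure given, so left as-is
```

Plotting the original lists and the projected lists on two side-by-side axes would give the left and right graphs respectively.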
Citations
- Motor Vehicle Collisions, NYC Open Data: https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95
- Consumer Reports: https://www.consumerreports.org/car-safety/chevy-silverado-toyota-highlander-and-other-popular-cars-lack-standard-safety-features/
- Kanner and Pintaluga (kpattorney): https://kpattorney.com/vehicle-safety-features-make-driving-safer/