Project Overview
GitHub page: UCB-stat-159-s23/project-group19
In this project, we explore the Motor Vehicle Collisions dataset published by the City of New York. The dataset contains information on motor vehicle crashes that occurred from 2014 to 2022.
Motor vehicle crashes are a major public health and safety concern, particularly in urban areas with high traffic volumes such as New York City. In recent years, the city has seen a significant increase in the number of crashes, leading to growing concern about their impact on public safety, transportation infrastructure, and economic productivity. Data analysis can help address these concerns by identifying the factors that contribute to crashes and by informing strategies for prevention and mitigation.
This project analyzes the data on motor vehicle crashes in New York City from 2014 to 2022, with a focus on identifying the key factors associated with these incidents and proposing actionable ways to reduce their frequency and severity. In particular, we look for relationships between the recorded attributes of a crash and how severe its outcome is for the people involved. We hope the analysis will help key decision makers improve road safety and traffic rules.
Dataset
The dataset can be found at this link:
https://catalog.data.gov/dataset/motor-vehicle-collisions-crashes.
Environment
Useful Commands
Creating an environment from an environment.yml file:
conda env create -f environment.yml
conda activate final_proj
conda install ipykernel
python -m ipykernel install --user --name final_proj --display-name "IPython - final_proj"
To view the Jupyter Book, run:
cd _build/html
python -m http.server
Then open this URL: https://stat159.datahub.berkeley.edu/user-redirect/proxy/8000/index.html.
To build the Jupyter Book:
jupyter-book build .
jupyter-book config sphinx .
sphinx-build . _build/html -D html_baseurl=${JUPYTERHUB_SERVICE_PREFIX}/proxy/absolute/8000
pip install ghp-import
ghp-import -n -p -f _build/html
Testing
To test the analysis functions, navigate to the root directory and run pytest.
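For readers new to pytest, the sketch below shows the general shape of a test it would collect. `severity_score` is a hypothetical stand-in defined here for illustration only; the actual functions in tool/utils.py and their tests may look different.

```python
# Hypothetical stand-in for a helper in tool/utils.py; real function names may differ.
def severity_score(injured, killed):
    """Toy severity measure: each injury counts once, each fatality five times."""
    if injured < 0 or killed < 0:
        raise ValueError("counts must be non-negative")
    return injured + 5 * killed


# pytest collects functions named test_* and reports a failed assert as a test failure.
def test_severity_score_weights_fatalities():
    assert severity_score(injured=2, killed=0) == 2
    assert severity_score(injured=0, killed=1) == 5


def test_severity_score_rejects_negative_counts():
    try:
        severity_score(injured=-1, killed=0)
    except ValueError:
        pass  # expected: negative counts are invalid
    else:
        raise AssertionError("expected ValueError for negative counts")
```

Saved in a file whose name starts with `test_`, these functions would be discovered and run automatically by the `pytest` command.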
Running the JupyterBook
You can launch the project on Binder and run the notebooks with this link:
https://mybinder.org/v2/gh/UCB-stat-159-s23/project-group19.git/HEAD
Opening the JupyterBook
You can read the published Jupyter Book at this link:
https://ucb-stat-159-s23.github.io/project-group19/Main.html
Repository Structure
The repository is structured as follows:
data: Contains the raw and processed datasets we used.
tool: Contains utils.py, which houses the functions used in the Analysis notebook. This folder also contains a tests folder with tests for these functions.
Analysis.ipynb: Contains the code we wrote to analyze the data, covering exploratory data analysis (EDA) and modelling.
Main.ipynb: Contains a summary of the code, EDA, and modelling results from Analysis.ipynb.
environment.yml: Contains all the packages and dependencies needed for the project.
Makefile: Contains the commands needed to build the Jupyter Book for this project.