SF Crime Reports

Uploaded by

sifa

SF Crime Reports
SF Police Department Incident Reports

CitySafe Analytics
Data Engineering
Group 2
Data Pipeline Status Report

Overview
This project aims to build an analytics pipeline to unlock insights from San Francisco crime
data. The pipeline will ingest incident data, process it, and load it into a cloud data warehouse.
Tableau will connect to the warehouse for interactive analysis and dashboards.

Challenges
● Data volume required optimizing BigQuery schema design and query performance.
● Initial API data transfers were unreliable, which forced a fallback to CSV loads.
● The high number of columns coupled with significant missing values necessitated
considerable effort in data cleaning and imputation to ensure data quality and integrity.
● Loading the data into Google BigQuery required matching the table schema to the format
of the dataset.
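
As a rough illustration of the cleaning effort described above, the sketch below measures the share of missing values per column and drops columns that are mostly empty. It uses only the Python standard library. The 50% threshold, the function names, and the three-row sample are assumptions made for illustration, not values taken from the project.

```python
import csv
import io

# Hypothetical threshold: columns with a larger share of empty cells are dropped.
MAX_MISSING_SHARE = 0.5


def missing_share(rows, columns):
    """Return {column: fraction of rows whose value is empty or absent}."""
    if not rows:
        return {col: 0.0 for col in columns}
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if not (row.get(col) or "").strip():
                counts[col] += 1
    return {col: counts[col] / len(rows) for col in columns}


def drop_sparse_columns(rows, columns, max_share=MAX_MISSING_SHARE):
    """Keep only columns whose missing share is at or below max_share."""
    shares = missing_share(rows, columns)
    kept = [col for col in columns if shares[col] <= max_share]
    return [{col: row.get(col, "") for col in kept} for row in rows], kept


# Made-up sample standing in for the real incident CSV.
SAMPLE = """Incident Category,Police District,CAD Number
Larceny Theft,Central,
Assault,Mission,
,Northern,123
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
columns = list(rows[0].keys())
cleaned, kept = drop_sparse_columns(rows, columns)
print(kept)
```

Columns that survive the threshold but still contain gaps would need a separate imputation step, which this sketch does not attempt.
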
Successes
● Extracted the data, stored it in Cloud Storage, and loaded it into Google BigQuery
● Connected the data in the cloud warehouse to Tableau
● Loaded and cleaned the data using Google Colaboratory (Colab)
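
The fallback from API transfers to CSV loads mentioned under Challenges could be structured roughly as follows. This is a minimal sketch, not the project's code: `load_with_fallback`, `flaky_api`, and `csv_export` are hypothetical names, and the two source functions are stand-ins that simulate an outage and a local CSV export.

```python
import csv
import io
import time


def load_with_fallback(fetch_api, read_csv, attempts=3, delay_seconds=0.0):
    """Try the API up to `attempts` times, then fall back to the CSV reader.

    Returns (rows, source), where source is "api" or "csv".
    """
    for attempt in range(attempts):
        try:
            return fetch_api(), "api"
        except OSError:
            # Network-style failure: wait a little longer after each attempt.
            time.sleep(delay_seconds * (attempt + 1))
    return read_csv(), "csv"


def flaky_api():
    """Stand-in for the real API call; always fails here."""
    raise ConnectionError("simulated API outage")


def csv_export():
    """Stand-in for reading the downloaded CSV file."""
    text = "Incident Category,Police District\nLarceny Theft,Central\n"
    return list(csv.DictReader(io.StringIO(text)))


rows, source = load_with_fallback(flaky_api, csv_export)
print(source, len(rows))
```

Returning the source alongside the rows lets downstream steps record which path produced a given load.
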

A high-level diagram of your data system or pipeline. Include inputs and outputs,
and where your data is arriving from:
A high-level diagram that discusses the functional components of your system:

Input
● Raw incident data in a CSV file downloaded from the SF Police Department dataset on the
city's open data site (DataSF)

Output
● Processed and cleaned data
● Results and insights from the data analysis, with visualizations
● Reports for stakeholders

Functional Components:

❖ Data Sources:
➢ CSV files (Python, PySpark)
❖ Data Ingestion:
➢ ETL (Extract, Transform, Load) processes (Google BigQuery, Google Colaboratory)
❖ Data Storage:
➢ Data lakes (Google Cloud)
❖ Data Preprocessing:
➢ Cleaning, data transformation, feature engineering, data normalization (Google
BigQuery, Google Colaboratory)
❖ Data Analysis:
➢ Data exploration, statistical analysis (Google Colaboratory)
❖ Visualization and Reporting:
➢ Charts, graphs, dashboards, summary reports, interactive visualizations
(Tableau)

Exploration of Dataset

A high-level characterization of the datasets and data schema you will be ingesting,
processing, and finally delivering:
Data Sources:

● Primary Source: "Incident Reports" from San Francisco Police Department.
● Key Information: Date and time, incident categories, location, and incident-specific
details.

Data Transformation (ETL):

● Phase Purpose: Clean, enrich, and structure raw incident data for analysis.
● Transformed Data Schema:
○ Incident Datetime: Date and time of the incident.
○ Incident Category: Categorization of the incident (e.g., theft, assault).
○ Incident Subcategory: Further classification within categories.
○ Intersection: Location of the incident.
○ Police District: Jurisdictional district where the incident occurred.
○ Analysis Neighborhood: Neighborhood in which the incident is situated.
○ Supervisor District: Political district information.
○ Latitude and Longitude: Geographic coordinates of the incident location.
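
One way to make the transformed schema above concrete is a typed record plus a parser for a single CSV row. This is a sketch under assumptions: the `Incident` class and helper names are hypothetical, the column headers are taken to match the field names listed above, the timestamp layout in `DATETIME_FORMAT` is a guess that should be checked against the actual file, and the sample row is made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed timestamp layout for the CSV export; adjust to match the real file.
DATETIME_FORMAT = "%Y/%m/%d %I:%M:%S %p"


@dataclass
class Incident:
    incident_datetime: datetime
    incident_category: Optional[str]
    incident_subcategory: Optional[str]
    intersection: Optional[str]
    police_district: Optional[str]
    analysis_neighborhood: Optional[str]
    supervisor_district: Optional[int]
    latitude: Optional[float]
    longitude: Optional[float]


def _text(value):
    """Strip whitespace and map empty strings to None."""
    value = (value or "").strip()
    return value or None


def _number(value, cast):
    """Convert a non-empty cell with `cast`; empty cells become None."""
    value = _text(value)
    return cast(value) if value is not None else None


def parse_incident(row):
    """Build an Incident from one CSV row given as a dict of strings."""
    return Incident(
        incident_datetime=datetime.strptime(
            row["Incident Datetime"].strip(), DATETIME_FORMAT
        ),
        incident_category=_text(row.get("Incident Category")),
        incident_subcategory=_text(row.get("Incident Subcategory")),
        intersection=_text(row.get("Intersection")),
        police_district=_text(row.get("Police District")),
        analysis_neighborhood=_text(row.get("Analysis Neighborhood")),
        supervisor_district=_number(
            row.get("Supervisor District"), lambda v: int(float(v))
        ),
        latitude=_number(row.get("Latitude"), float),
        longitude=_number(row.get("Longitude"), float),
    )


# Made-up row for illustration; Longitude is left empty on purpose.
sample_row = {
    "Incident Datetime": "2023/03/13 11:41:00 PM",
    "Incident Category": "Larceny Theft",
    "Incident Subcategory": "",
    "Intersection": "MARKET ST \\ 5TH ST",
    "Police District": "Central",
    "Analysis Neighborhood": "Chinatown",
    "Supervisor District": "3",
    "Latitude": "37.7989",
    "Longitude": "",
}
incident = parse_incident(sample_row)
print(incident.incident_datetime, incident.longitude)
```

Mapping empty cells to `None` keeps missing coordinates distinguishable from a genuine value of zero when the rows are later loaded into the warehouse.
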

Data Analysis:

● Phase Purpose: Apply data analysis techniques.
● Anticipated Analyzed Data Schema:
○ Predicted Crime Locations: Predictions about future incident locations.
○ Resolution Status: Information on case closure status.
○ Incident Descriptions: Detailed information about incident nature.

Data Delivery (Reporting & Dashboards):

● Phase Purpose: Deliver analyzed data to law enforcement agencies and stakeholders.
● Data Schema for Reporting and Dashboards:
○ Structured reports and dashboards that summarize key findings and insights.
○ Resource allocation recommendations based on incident patterns.
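
As a simple illustration of how incident patterns might feed a resource allocation summary, the sketch below counts incidents per police district and converts the counts to shares. The proportional-share view is an illustrative heuristic chosen here, not a method stated in the report; the function names and sample rows are hypothetical.

```python
from collections import Counter


def incidents_per_district(rows):
    """Count incidents per police district, skipping rows with no district."""
    return Counter(
        row["Police District"] for row in rows if row.get("Police District")
    )


def share_by_district(rows):
    """Return {district: share of all counted incidents}, largest first."""
    counts = incidents_per_district(rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {district: n / total for district, n in counts.most_common()}


# Made-up rows for illustration only.
sample_rows = [
    {"Police District": "Central"},
    {"Police District": "Mission"},
    {"Police District": "Central"},
    {"Police District": ""},
]
shares = share_by_district(sample_rows)
print(shares)
```

The same aggregation could be expressed as a grouped query in the warehouse; the pure-Python form is used here only to keep the example self-contained.
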

Reference

Police Department Incident Reports: 2018 to Present | DataSF | City and County of San
Francisco (sfgov.org)
https://openaccess.thecvf.com/content/ICCV2021/papers/He_Inferring_High-Resolution_Traf
[…]t_Risk_Maps_Based_on_Satellite_Imagery_ICCV_2021_paper.pdf
https://www.csail.mit.edu/news/deep-learning-helps-predict-traffic-crashes-they-happen

https://plotly.com/python/hexbin-mapbox/

dataengineering_project: Lucidchart
