VA-CaseStudy - Report Final
Visual Analytics
Submitted by:-
Los Angeles, a major metropolitan hub renowned for its diversity and cultural richness, also
faces challenges related to crime and public safety. Understanding the nature and evolution of
crime in the city is crucial for effective policymaking, resource allocation, and community-based
interventions. This report delves into crime data spanning from 2020 to the present, extracted
from the Los Angeles Police Department's Open Data Portal
(https://catalog.data.gov/dataset/crime-data-from-2020-to-present).
Spatial patterns:
Are there specific neighbourhoods or areas that experience higher crime rates? How do
these patterns vary across different types of crime?
Temporal variations:
Are there seasonal or daily trends in crime occurrence? Do certain days or times of the year
exhibit higher crime risks?
Demographic considerations:
Are there correlations between demographics like age, gender, or socioeconomic status and
crime involvement (victims or perpetrators)?
By exploring these key aspects, the report aims to provide valuable insights into the present
state of crime in Los Angeles, contribute to a deeper understanding of its underlying factors, and
potentially inform strategies for prevention, mitigation, and community engagement.
Date Occurrence:
The date when the crime occurred.
Time:
The specific time of day when the crime took place.
Area:
Numeric code representing a specific area or district in Los Angeles where the crime occurred.
Area Name:
The name of the area or district corresponding to the numeric code.
Part 1-2:
Indicates whether it is a Part 1 or Part 2 offense. Part 1 offenses are more serious crimes like
murder, rape, and robbery, while Part 2 offenses are less serious crimes like fraud and
embezzlement.
Mocodes:
Modus Operandi codes that provide additional information about how a particular crime was
committed.
Victim Age:
The age of the victim at the time when the crime occurred.
Victim Sex:
The gender of the victim(s).
Victim Descent:
The racial or ethnic background of the victim(s).
Premis Cd:
Code representing where exactly within an area/district the incident occurred (e.g., street, hotel).
Weapon Desc:
Description providing more detail about the weapon used.
Status:
The current status of the crime report (e.g., open, closed, under investigation, etc.).
Status Desc:
Description providing more detail about the status of the crime report.
Crime Committed:
Numeric code indicating the type of crime committed.
Crime Committed 2:
Numeric code indicating the type of a second crime committed in the same incident, if any.
Cross Street:
Nearby cross street where the incident occurred.
Importing libraries -
- To access pre-written code
- Sharing and collaboration
- Staying up to date
Required Libraries:-
- Pandas
A powerful library for data analysis and manipulation. It works primarily with tabular data and
provides functionality for loading, cleaning, analyzing, and visualizing data.
- NumPy
The fundamental library for numerical computing in Python. It provides powerful array and
matrix operations.
- sklearn.preprocessing
A submodule of the scikit-learn library focused on data preprocessing for machine learning. It
provides tools for scaling, encoding, and normalizing data before it is fed to machine
learning models.
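As a minimal sketch of the import step, the block below loads the three libraries and reads a small table with pandas. The inline CSV text and its values are made up for illustration; the real report would read the downloaded crime data file instead, whose path is not given here.

```python
import io

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical two-row sample standing in for the real crime data file.
csv_text = (
    "Area Name,Vict Age,Status Desc\n"
    "Central,34,Invest Cont\n"
    "Foothill,,Adult Arrest\n"
)

# read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)   # number of rows and columns
print(data.dtypes)  # column types inferred by pandas
```

The empty "Vict Age" field in the second row is read as NaN, which is the kind of missing value the following steps deal with.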
- .sum():
It is applied to the result of the .isnull() method, which is a DataFrame of True/False values
indicating missingness. Because True counts as 1, the sum gives the number of missing values in
each column.
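The missing-value count described above can be sketched as follows. The small DataFrame and its values are invented for illustration; only the column names "Mocodes" and "Vict Age" come from the report.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a few missing entries.
data = pd.DataFrame({
    "Mocodes": ["0344", None, "1822 0416", None],
    "Vict Age": [34.0, np.nan, 27.0, 51.0],
    "Area Name": ["Central", "Foothill", "77th Street", "Central"],
})

# isnull() marks each cell as True (missing) or False (present).
missing_mask = data.isnull()

# sum() adds up the True values column by column.
missing_per_column = missing_mask.sum()
print(missing_per_column)
```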
Handling Missing Data
.fillna(-1) :
is a method applied to the "Mocodes" column to replace any missing values (NaN) with the value -1.
This technique is often used in data cleaning to address missing values, which can cause issues in
analysis and modeling.
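A minimal sketch of this sentinel fill, on made-up values, might look like the block below. The column is created with dtype=object here as an assumption, so that it can hold both the original text codes and the numeric sentinel -1.

```python
import pandas as pd

# Hypothetical sample; object dtype lets strings and the -1 sentinel coexist.
data = pd.DataFrame(
    {"Mocodes": ["0344", None, "1822 0416", None]},
    dtype=object,
)

# Replace every missing entry with -1 and assign the result back.
data["Mocodes"] = data["Mocodes"].fillna(-1)

print(data["Mocodes"].tolist())
```

After the fill the column mixes text codes with the number -1, so later steps that expect only strings need to handle the sentinel explicitly.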
Imputing Missing Values:
.fillna(data["Vict Age"].mean(), inplace=True) replaces missing values (NaNs) in "Vict Age" with the
column's mean
fillna():
A pandas method for filling missing values.
data["Vict Age"].mean():
Calculates the mean (average) of all non-missing values in "Vict Age".
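inplace=True: Modifies the existing object instead of returning a new one. It does not reliably
save memory, and when called on a selected column such as data["Vict Age"], recent pandas
versions may warn or leave the original DataFrame unchanged. Assigning the result back to the
column is the safer form.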
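The mean imputation above can be sketched on a few invented ages as follows. The block assigns the filled column back rather than using inplace=True, which is a swapped-in form chosen for this sketch because it behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing age.
data = pd.DataFrame({"Vict Age": [20.0, np.nan, 40.0, 60.0]})

# mean() skips NaN by default, so this averages 20, 40 and 60.
mean_age = data["Vict Age"].mean()

# Fill the gap with the mean and assign the result back to the column.
data["Vict Age"] = data["Vict Age"].fillna(mean_age)

print(mean_age)
print(data["Vict Age"].tolist())
```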
Here's a breakdown of the Python code and its functionality:
1. Importing LabelEncoder:
from sklearn.preprocessing import LabelEncoder: This line imports the LabelEncoder class from the
scikit-learn library, which is designed to encode categorical features into numerical values.
le.fit_transform(data["Status Desc"]): This fits the LabelEncoder to the "Status Desc"
column, learning the unique categories present, and then transforms the column by assigning
numerical values to each category.
pd.Series(...): This converts the transformed values back into a pandas Series for consistency.
mapping_basis: This stores the mapping between original categories and their assigned numerical
values in a dictionary.
print("Mapping of Categories to Numerical Values:"): This prints a header for the mapping
information.
for num, category in mapping_basis.items(): print(f"{category} -> {num}"): This loop iterates through
the mapping_basis dictionary and prints each category and its corresponding numerical value,
providing clarity on the encoding.
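Put together, the steps in this breakdown could look like the sketch below. The status values are invented for illustration, and building mapping_basis from le.classes_ is an assumption about how the dictionary was created, since the report does not show that line.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of status descriptions.
data = pd.DataFrame(
    {"Status Desc": ["Invest Cont", "Adult Arrest", "Adult Other", "Invest Cont"]}
)

le = LabelEncoder()

# Learn the unique categories and replace each one with an integer.
encoded = pd.Series(le.fit_transform(data["Status Desc"]), index=data.index)

# classes_ holds the categories in sorted order; position i is encoded as i.
mapping_basis = {num: category for num, category in enumerate(le.classes_)}

print("Mapping of Categories to Numerical Values:")
for num, category in mapping_basis.items():
    print(f"{category} -> {num}")
```

LabelEncoder assigns integers in sorted order of the categories, so the numbers carry no meaning beyond identifying each category.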
Normalization with Min-Max Scaling:
Normalization refers to transforming data values to a common scale, often between 0 and 1. This
can be useful when you want to compare features with different units or scales in a dataset.
Min-max scaling: This specific technique subtracts the minimum value of the column from each
value and then divides by the difference between the maximum and minimum values. This rescales
the data so that the minimum value becomes 0 and the maximum value becomes 1.
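As a rough sketch of min-max scaling, the block below applies the formula by hand and then repeats it with scikit-learn's MinMaxScaler. The ages and the name of the new "Vict Age Scaled" column are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample of victim ages.
data = pd.DataFrame({"Vict Age": [10.0, 20.0, 40.0, 60.0]})

# Manual form: (x - min) / (max - min).
col = data["Vict Age"]
data["Vict Age Scaled"] = (col - col.min()) / (col.max() - col.min())

# Same scaling with scikit-learn; it expects a 2-D input, hence the double brackets.
scaler = MinMaxScaler()
scaled_sklearn = scaler.fit_transform(data[["Vict Age"]]).ravel()

print(data["Vict Age Scaled"].tolist())
print(scaled_sklearn.tolist())
```

The manual formula divides by zero when every value in the column is identical, so a constant column needs separate handling.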
Output Data
Visualization:-
The graph is titled “Crime Committed wrt Sex”. It displays data on various types of crimes
(robbery, assault, etc.) that occurred over a span of years (2010-2023) broken down by
gender (Male, Female, Null). The red bars represent the number of crimes committed each
year, with taller bars indicating higher crime counts. The line graphs overlaid on these bars
represent victim ages for each gender category across these years.
The graph shows the trends and patterns of different types of first crimes across different
years and genders in Los Angeles.
The graph can be used to compare the number and type of crimes committed in different
time periods and gender categories.
The graph can also be used to analyze the relationship between victim age and crime
type for different genders.
The graph can help to identify the factors that may influence the crime rate and the victim
profile.
Visualization:-
The graph is titled “Crime Committed wrt area”. It displays data on various types of crimes
(robbery, assault, etc.) that occurred in different areas, specifically at cross streets within
those areas. The red bars represent counts of crimes (Crm Cd 1), while the red line graph
overlays victim ages (Vict Age). It appears that certain areas, such as “77th Street /
BROADWAY” and “77th Street / WESTERN AVE,” have higher crime counts.
The graph shows the distribution and variation of different types of first crimes
across different areas and cross streets in Los Angeles.
The graph can be used to compare the number and type of crimes committed in
different locations and time periods.
The graph can also be used to analyze the relationship between victim age and
crime type for different areas.
The graph can help to identify the hotspots or safe zones for crime prevention
and victim protection.
Visualization:-
The graph is titled “Crime Committed wrt Time”. It displays data on various types of crimes
(robbery, assault, etc.) that occurred at different times (1:00 AM, 1:05 AM, etc.). The red
bars represent counts of crimes (Crm Cd 1), while the red line graph overlays victim ages
(Vict Age). It appears that certain types of crimes, such as burglary and theft, occur more
frequently at specific times.
The graph shows the distribution and variation of different types of first crimes
across different times in Los Angeles.
The graph can be used to compare the number and type of crimes committed at
different time periods.
The graph can also be used to analyze the relationship between victim age and
crime type for different times.
The graph can help to identify the peak or off-peak hours for crime prevention and
victim protection.
The measures Crm cd 2 and Part 1-2 are related to the categories of Area name and Cross
Street in the following ways:
Crm cd 2 is a numeric code that indicates the type of second crime committed in a
criminal incident. For example, if a person commits robbery and then assaults the victim,
the Crm cd 2 for that incident would be the code for assault.
Part 1-2 is a classification that indicates whether the crime is a Part 1 or Part 2 offense.
Part 1 offenses are more serious crimes like murder, rape, and robbery, while Part 2
offenses are less serious crimes like fraud and embezzlement.
Area name is the name of the area or district where the crime occurred. For example,
77th Street and Foothill are two of the areas in Los Angeles.
Cross Street is the name of the street or intersection where the crime occurred within an
area or district. For example, BROADWAY is a cross street in the 77th Street area. The
“18 nulls” label shown for the Foothill area most likely marks records with no cross street
recorded rather than an actual street name.
The graph shows the counts of different types of second crimes committed in each area name
and cross street. Each bar represents a combination of Crm cd 2, Part 1-2, Area name, and
Cross Street
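The counts behind such a chart could be reproduced in pandas roughly as below. The rows are invented, and the column names follow the spelling used in this report, which may differ from the headers in the raw file.

```python
import pandas as pd

# Hypothetical sample; None marks a missing cross street or second crime code.
data = pd.DataFrame({
    "Area name": ["77th Street", "77th Street", "77th Street", "Foothill", "Foothill"],
    "Cross Street": ["BROADWAY", "BROADWAY", "WESTERN AVE", None, "VAN NUYS BL"],
    "Part 1-2": [1, 2, 1, 2, 1],
    "Crm cd 2": [998.0, 860.0, None, 998.0, 998.0],
})

# Keep only incidents that actually have a second crime code.
with_second = data.dropna(subset=["Crm cd 2"])

# Count incidents per area, cross street and offense class.
# dropna=False keeps rows whose cross street is missing as their own group.
counts = (
    with_second
    .groupby(["Area name", "Cross Street", "Part 1-2"], dropna=False)
    .size()
    .reset_index(name="count")
)

print(counts)
```

Keeping the missing cross streets as a separate group makes it visible how many records lack a street name, instead of silently dropping them.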
Visualization:-
The graph is titled “Second Crime Committed wrt Area”. It displays data on various types of
crimes (assault, battery, burglary, etc.) that occurred in two specific areas: “77th Street” and
“Foothill”. In the “77th Street” area, there is a bar chart showing the number of crimes (on y-axis)
committed at various cross streets (x-axis). The highest crime count is 1000 at “BROADWAY”,
followed by counts ranging from 940 to 986 at other cross streets. In the “Foothill” area, a line
graph depicts an increase in crime counts from “NIL NIL” to “18 nulls”, with fluctuations in
between.
The graph shows the distribution and variation of different types of second crimes
across different areas and cross streets in Los Angeles.
The graph can be used to compare the number and type of crimes committed in
different locations.
The graph can also be used to filter the data by weapon type and area name to see
more specific patterns or trends.
The graph can help to identify the hotspots or safe zones for crime prevention and
victim protection.
The measures Crm cd 2 and Part 1-2 are related to the categories of Time and Crm cd
Desc in the following ways:
Crm cd 2 is a numeric code that indicates the type of second crime committed in a criminal
incident. For example, if a person commits robbery and then assaults the victim, the Crm cd
2 for that incident would be the code for assault.
Part 1-2 is a classification that indicates whether the crime is a Part 1 or Part 2 offense. Part
1 offenses are more serious crimes like murder, rape, and robbery, while Part 2 offenses are
less serious crimes like fraud and embezzlement.
Time is the specific time of day when the crime took place. It is shown in the format of
HH:MM AM/PM. For example, 1:00 AM means one o’clock in the morning.
Crm cd Desc is a brief text description of the type of crime identified by the numeric crime
code. For example, code 210 is robbery, 220 is attempted robbery, and 230 is assault
with a deadly weapon.
The graph shows the counts of different types of second crimes committed at specific times.
Each bar represents a combination of Crm cd 2, Part 1-2, Time, and Crm cd Desc.
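A possible way to tabulate crimes by time of day is sketched below. It assumes the time is available as text in the HH:MM AM/PM form described above, and it groups by hour rather than by individual minute, which is a choice made for this sketch and not something the charts do. The rows are invented.

```python
import pandas as pd

# Hypothetical sample in the HH:MM AM/PM format.
data = pd.DataFrame({
    "Time": ["1:00 AM", "1:05 AM", "1:09 PM", "11:30 PM"],
    "Crm cd Desc": ["ROBBERY", "ROBBERY", "ATTEMPTED ROBBERY", "ROBBERY"],
})

# Parse the 12-hour clock text and extract the hour on a 0-23 scale.
parsed = pd.to_datetime(data["Time"], format="%I:%M %p")
data["Hour"] = parsed.dt.hour

# Count incidents per hour and crime description.
counts = (
    data.groupby(["Hour", "Crm cd Desc"])
    .size()
    .reset_index(name="count")
)

print(counts)
```

Grouping by hour merges nearby times such as 1:00 AM and 1:05 AM into one bucket, which tends to make peak periods easier to see than per-minute bars.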
Visualization:-
The graph is titled “Second Crime Committed wrt type of crime.” It displays data on various
types of crimes (assault, brandishing a weapon, robbery, etc.) that occurred at specific times
(1:00 AM, 1:05 AM). Each bar represents the count of a particular type of crime committed at
a given time.
The graph shows the distribution of different types of second crimes across
different times and areas in Los Angeles.
The graph can be used to compare the frequency and severity of second crimes
in different locations and time periods.
The graph can also be used to filter the data by weapon type, area name, and time
range to see more specific patterns or trends.
The graph can help to identify the common or rare types of second crimes and the
factors that may influence them.
Visualization:-
The graph is titled “Reporting dist no. wrt Victim Sex”. It displays data on various areas and
reporting districts where crimes occurred over a span of years (2010-2022). The red bars
represent counts of crimes (Area and Rpt Dist No), while the lines connecting points marked
as either ‘M’, ‘F’, ‘X’, ‘na’, or ‘Null’ indicate the gender of the victims (Vict Sex). It appears
that certain areas and reporting districts, such as “Central / 0123” and “Foothill / 1675”, have
higher crime counts than others.
The graph shows the distribution and variation of different areas and reporting
districts where crimes occurred across different years and genders in Los Angeles.
The graph can be used to compare the number and location of crimes reported in
different time periods and gender categories.
The graph can also be used to filter the data by weapon type and status
description to see more specific patterns or trends.
The graph can help to identify the hotspots or safe zones for crime prevention
and victim protection.
Visualization:-
The graph is titled “Reporting dist no. wrt area”. It displays data on various areas and
reporting districts where crimes occurred in two specific areas: “77th Street” and “Foothill”. In
the “77th Street” area, there is a bar chart showing the number of crimes (on y-axis)
committed at various cross streets (x-axis). The highest crime count is 1000 at
“BROADWAY”, followed by counts ranging from 940 to 986 at other cross streets. In the
“Foothill” area, a line graph depicts an increase in crime counts from “NIL NIL” to “18 nulls”,
with fluctuations in between. The graph also shows filters for weapon type and status
description on both sides, and a caption below the graph reads: “The above graph shows the
reporting dist no. wrt area of occurrence.”
The graph shows the distribution and variation of different areas and reporting
districts where crimes occurred across different areas and cross streets in Los
Angeles.
The graph can be used to compare the number and location of crimes reported in
different areas and cross streets.
The graph can also be used to filter the data by weapon type and status
description to see more specific patterns or trends.
The graph can help to identify the hotspots or safe zones for crime prevention
and victim protection.
c) Reporting District No. wrt Time of Crime
The measures Area and Rpt Dist No are related to the categories of Time and Crm cd
desc in the following ways:
Area is a numeric code that represents a specific area or district in Los Angeles where the
crime occurred. For example, 01 means Central, 02 means Rampart, and so on.
Rpt Dist No is a four-digit code identifying the reporting district, a sub-area within an Area,
in which the crime was recorded. For example, 0123 is a reporting district in the Central area
(Area 01).
Time is the specific time of day when the crime took place. It is shown in the format of
HH:MM AM/PM. For example, 1:00 AM means one o’clock in the morning.
Crm cd desc is a brief text description of the type of crime identified by the numeric crime
code. For example, code 210 is robbery, 220 is attempted robbery, and 230 is assault
with a deadly weapon.
The graph shows the counts of different areas and reporting districts where crimes occurred at
different times and the types of crimes committed. Each bar represents a combination of Area,
Rpt Dist No, Time, and Crm cd desc.
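Codes such as 01 and 0123 lose their leading zeros when they are read as numbers, so a small formatting step may be needed before grouping. The sketch below uses invented rows and hypothetical helper columns ("Area Code", "Rpt Dist Code", "Area From Rpt Dist"). It also assumes that the first two digits of the reporting district match the Area code, as they do in the 01 and 0123 example above; that pattern is not confirmed for every record.

```python
import pandas as pd

# Hypothetical sample with codes stored as plain integers.
data = pd.DataFrame({
    "Area": [1, 16, 12],
    "Rpt Dist No": [123, 1675, 1201],
})

# Restore the leading zeros: two digits for Area, four for Rpt Dist No.
data["Area Code"] = data["Area"].astype(str).str.zfill(2)
data["Rpt Dist Code"] = data["Rpt Dist No"].astype(str).str.zfill(4)

# Assumption: the first two digits of the district code identify the area.
data["Area From Rpt Dist"] = data["Rpt Dist Code"].str[:2]

# Count incidents per area and reporting district.
counts = (
    data.groupby(["Area Code", "Rpt Dist Code"])
    .size()
    .reset_index(name="count")
)

print(data)
print(counts)
```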
Visualization:-
The graph is titled “Reporting dist no. wrt time of crime”. It displays data on various areas
and reporting districts where crimes occurred at different times (1:00 AM, 1:05 AM, etc.). The
red bars represent counts of crimes (Area and Rpt Dist No), while the lines connecting points
marked as either ‘M’, ‘F’, ‘X’, ‘na’, or ‘Null’ indicate the gender of the victims (Vict Sex). It
appears that certain types of crimes, such as burglary and theft, occur more frequently at
specific times. The graph also shows filters for weapon type and status description on the
left side, and a caption below the graph reads: “The above graph shows the reporting dist
no. wrt time of crime.”
2. Dashboard:-
The top left graph titled “Second Crime wrt Year of Occurrence” shows crime counts by year
from 2010 to 2023. It uses red bars to represent counts for two categories labeled “Part I”
and “Part II”. These categories may indicate the severity or classification of the crimes. It
also indicates that there are 17 nulls in the dataset, which may mean missing or unknown
values.
The bottom left graph titled “Second Crime Committed wrt Type of Crime” shows crime
counts by time and crime code description from 1:00 AM to 1:09 PM. It uses red bars to
represent counts for categories “Part I” and “Part II”. The crime code description may
indicate the type of crime committed, such as assault, battery, robbery, etc. It also shows the
specific times when these crimes occurred, such as 1:00 AM, 1:05 AM, etc.
The top right graph titled “Second Crime Committed wrt Area” shows crime counts by area
name and cross street from 77th Street to Foothill. It uses red bars to represent counts for
categories labeled as “Part I” and “Part II”. The area name and cross street may indicate the
location where the crimes occurred, such as 77th Street / BROADWAY, Foothill / 18 nulls,
etc. It also includes a filter section for area name, weapon description with options like “(All)”,
“(Null)”, “AIR PISTOL”, etc. These options may allow the user to view the data based on
different criteria or conditions.
The bottom right graph appears to be a line chart related to areas but is not clearly titled or
labeled. It shows street names on the x-axis like “BROADWAY”, “FIGUEROA ST”, etc., with
corresponding data points connected by lines for two different categories. These categories
may be related to the reporting district number or the victim sex, but it is not clear from the
image. It also indicates that there are 18 nulls in this dataset, which may mean missing or
unknown values.
3. Dashboard:-
The dashboard consists of three graphs:
The top left graph titled “Reporting dist no. wrt Victim Sex” shows crime reports by district
number, broken down by victim sex from 2010 to 2023. It uses a combination of bar and line
graphs to represent data. The red bars indicate different reporting district numbers, while the
lines connecting points marked as either ‘M’, ‘F’, ‘X’, ‘na’, or ‘Null’ indicate the gender of the
victims. It appears that certain reporting districts, such as 0123 and 1675, have higher crime
counts than others.
The bottom left graph titled “Reporting dist no. wrt time of crime” displays crimes reported by
district number over various times of day. It uses a combination of bar and line graphs to
represent data. The red bars indicate different crime code descriptions (types of crimes),
each associated with a specific time, as indicated on the x-axis. The line graph overlaid on
the bar chart represents the reporting district numbers, showing a fluctuation in these
numbers over time or types of crimes committed.
The right graph titled “Reporting dist no. wrt area” displays crimes reported by district
number in different areas around 77th Street. It uses a combination of bar and line graphs to
represent data. The red bars indicate different areas, labeled with both names and reporting
district numbers. A line graph overlays the bars, showing fluctuations in the reporting district
numbers across various areas.
5. Story:-
The above case study reflects on the series of crime occurrences in the city of LA from 2010 to
present. The focus has been narrowed to the occurrence of first and second crimes and
the reporting district no. in different cases.
If we go by the year of occurrence then the case begins in 2010, when the city witnessed a surge in
crime counts across different areas and cross streets. The graphs show that the 77th Street area
was particularly affected, especially at the BROADWAY and WESTERN AVE cross streets. These
areas also had high reporting district numbers, as shown by the top right graph. The victims of these
crimes were mostly male and female.
The story continues in 2011, when the crime counts dropped slightly in some areas and cross
streets, but increased in others. The graphs show that the 77th Street area still had high crime
counts, but the Foothill area also saw a rise in crime counts, especially at the 18 nulls cross street.
These areas also had high reporting district numbers. The victims of these crimes were still mostly
male and female. The crimes committed were still mostly assaults, burglaries, and thefts. These
crimes occurred mostly at 1:00 AM, 1:01 AM, and 1:02 AM, as indicated by the x-axis labels.
The story progresses in 2012, when the crime counts increased significantly in most areas and
cross streets. The graphs show that the 77th Street area and the Foothill area had very high crime
counts, as well as the Central area and the Devonshire area. These areas also had high reporting
district numbers, as shown by the top right graph. The victims of these crimes were still mostly male
and female. The crimes committed were still mostly assaults, burglaries, and thefts, as shown by the
bottom right graph. These crimes occurred mostly at 1:00 AM, 1:01 AM, and 1:02 AM, as indicated
by the x-axis labels.
The story takes a turn in 2020, when the crime counts decreased drastically in all areas and cross
streets. The graphs show that the 77th Street area and the Foothill area had very low crime counts,
as well as the Central area and the Devonshire area. These areas also had low reporting district
numbers. The victims of these crimes were mostly male and female. The crimes committed were
mostly assaults, burglaries, and thefts, as shown by the bottom right graph. These crimes occurred
mostly at 1:00 AM, 1:01 AM, and 1:02 AM, as indicated by the x-axis labels.
The story ends in 2023, when the crime counts increased slightly in some areas and cross streets,
but remained low in others. The graphs show that the 77th Street area and the Foothill area had
moderate crime counts, as well as the Central area and the Devonshire area. These areas also had
moderate reporting district numbers. The victims of these crimes were mostly male and female. The
crimes committed were mostly assaults, burglaries, and thefts.
6. Conclusion
The case study provides a comprehensive overview of the crimes committed in Los Angeles from
2010 to 2023, using various parameters such as the year of occurrence, type of crime, area, and
victim sex. The graphs reveal the trends and patterns of the second crimes and reporting
districts, as well as the factors that may influence them. The graphs also allow the user to filter
the data by different criteria or conditions to see more specific or relevant information. The graphs
can be used to support crime analysis and decision-making, as well as to identify the challenges
and opportunities for crime prevention and victim protection.