Python - Report - Shayan & Shivani
Python - Report - Shayan & Shivani
Prepared By: -
Shayan Ahmed
(Marketing)
Shivani Yadav
(Marketing)
Submitted To
Dr. Nagaraj S
Associate Professor
Alliance University Bangalore
ACKNOWLEDGEMENT
We would like to express our sincere gratitude to all those who have contributed to
the completion of this project report titled "Python Programming and Data Analysis
on rape case victim in india”.
First, we extend our heartfelt thanks to our mentor, Mr. Nagaraj S, for his invaluable
guidance, support, and encouragement throughout this project. His expertise,
insights, and unwavering commitment have been instrumental in shaping our
understanding of Python programming, data analysis techniques, and exploratory
data analysis (EDA).
This project has been a rewarding learning experience for us, and we are grateful to
everyone who has been a part of our journey.
Thank you.
Sincerely,
Shayan Ahmed
Shivani Yadav
“Marketing”
CERTIFICATE OF DECLARATION
This is to certify that Shayan Ahmed and Shivani Yadav have successfully completed
a project on "Python Programming and Data Analysis of Rape victim cases in India"
under the guidance of Mr. Nagaraj S.
During this project, the students showed proficiency in various areas. Firstly, they
exhibited a thorough understanding of Python programming language,
encompassing its syntax, data structures, and control flow mechanisms. Their grasp
of Python's versatility and readability was evident in the implementation of various
programming tasks.
In terms of data analysis techniques, Shayan Ahmed and Shivani Yadav showcased
their ability to effectively apply various methods using Python libraries such as
Pandas, NumPy, Matplotlib, Seaborn, and Plotly. They demonstrated proficiency in
importing, cleaning, manipulating, and visualizing data to derive meaningful insights.
One of the highlights of their project was the comprehensive exploratory data
analysis (EDA) conducted on the "rape case victim in India" dataset. Through EDA,
the students displayed their capability to understand data characteristics, identify
patterns, detect anomalies, and communicate insights through visualization.
Nagaraj
Associate Professor
PYTHON
Python is a general-purpose language, used to create a range of applications, including data
science, software and web development, automation, and improving the ease of everyday
tasks. Python is an experiment in how much freedom program-meres need. “Too much
freedom and nobody can read another's code; too little and expressive-ness is endangered.”
It is a computer programming language often used to build websites and software, automate
tasks, and analyse data. Python is a general-purpose language, not specialized for any
specific problems, and used to create various programmes.
Python is an easy to learn, powerful programming language. It has efficient high- level data
structures and a simple but effective approach to object-oriented programming. Python's
elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal
language for scripting and rapid application development in many areas on most platforms.
Why to use Python
• Python is object-oriented Structure supports such concepts as polymorphism, operation
overloading and multiple inheritance.
• Indentation is one of the greatest features in python.
• It is free (open source) Downloading python and installing python is free and easy.
Functions of Pandas
a) Data visualization: - Pandas, despite lacking visualization capabilities, seamlessly
integrates with popular data visualization libraries like Matplotlib and Seaborn to generate
informative plots and charts directly from Data Frame objects.
B) Time series analysis: - Pandas, despite lacking visualization capabilities, seamlessly
integrates with popular data visualization libraries like Matplotlib and Seaborn to generate
informative plots and charts directly from Data Frame objects.
c) Data Structures: Pandas provides two primary data structures: Series and Data Frame.
Series is a one-dimensional labelled array, while Data Frame is a two-dimensional labelled
data structure resembling a table or spreadsheet.
d) Data Manipulation: Pandas allows for flexible data manipulation operations such as
indexing, slicing, filtering, sorting, joining, merging, reshaping, and pivoting. These
operations enable users to clean, transform, and preprocess data efficiently.
Technical Advantages of Python:
Python offers several technical advantages that contribute to its widespread adoption and
popularity among developers:
Simplicity: Python's clean and straightforward syntax makes it easy to learn and read,
reducing the time and effort required to write and maintain code.
Extensive Standard Library: Python comes with a comprehensive standard library that
provides a wide range of modules and functions for tasks such as file I/O, networking, data
manipulation, and more, allowing developers to accomplish complex tasks with minimal
code.
Community Support: Python has a vibrant and active community of developers, enthusiasts,
and contributors who continually create and share libraries, frameworks, and resources to
enhance Python's capabilities and solve real-world problems.
Data Cleaning and Preparation: Pandas offers a wide range of functions for handling missing
data, reshaping data, and performing data transformations, making it a valuable tool for data
cleaning and preparation tasks.
Data Analysis: Pandas provides powerful methods for statistical analysis, grouping,
aggregation, and computation of summary statistics, enabling users to gain insights from
their data quickly.
Integration with Other Libraries: Pandas seamlessly integrates with other libraries in the
Python ecosystem, such as NumPy, scikit-learn, and matplotlib, making it a versatile tool for
data analysis and machine learning workflows.
MATPLOTLIB
Matplotlib is a widely used Python library for creating static, interactive, and animated
visualizations in Python. It provides a MATLAB-like interface and is highly customizable,
making it a popular choice among scientists, engineers, data analysts, and researchers for
data visualization tasks.
It was introduced by John Hunter in the year 2002.
It can be used in python scripts, shell, web application, and another graphical user interface
toolkit.
One of the greatest benefits of Matplotlib for visualization is that it allows us visual access to
huge amounts of data in easily digestible visuals.
Matplotlib consists of several plots like bar, Pie, Scatter, histogram, Area, Line etc.
Features of Matplotlib
a) Simple and Flexible: Matplotlib provides a simple interface to quickly create various types
of plots like line plots, scatter plots, bar plots, histograms, 3D plots, etc.
b) Customization: Users have fine-grained control over every aspect of the plot, including
line styles, markers, colours, fonts, axis properties, annotations, etc.
c)Support for Latex: Matplotlib supports LaTeX formatting for text rendering, allowing users
to incorporate mathematical expressions and symbols seamlessly into their plots.
d)Integration with pandas: Matplotlib works well with pandas, a popular data analysis library
in Python, allowing users to create plots directly from pandas Data Frame objects.
e) Matplotlib pyplot Interface: Matplotlib provides a MATLAB-like interface through its pyplot
module, which simplifies the process of creating plots by automatically creating and
managing figures and axes.
EDA
Exploratory Information Examination (EDA) is a crucial step in the information investigation
process, aiming to understand the characteristics of a dataset before applying formal factual
strategies or building models. The primary objectives of EDA are to identify designs,
connections, peculiarities, and knowledge within the information, guiding further examination
and navigation.
Python offers a rich ecosystem of libraries and tools for conducting EDA efficiently and
effectively. Key libraries for EDA in Python include:
· Pandas: Pandas provide powerful data structures and functions for data
manipulation and analysis, making it well-suited for exploratory data analysis tasks
such as data cleaning, summarization, and transformation.
· NumPy: NumPy is a fundamental library for numerical computing in Python, offering
support for array operations, mathematical functions, and linear algebra. It
complements Pandas by providing efficient handling of numerical data and arrays.
· Matplotlib: Matplotlib is a versatile library for creating static, animated, and
interactive visualizations in Python. It offers a wide range of plotting functions and
customization options for exploring data visually.
· Seaborn: Seaborn is a statistical data visualization library built on top of Matplotlib,
offering high-level functions for creating informative and attractive statistical graphics.
It simplifies the process of creating complex visualizations and supports advanced
statistical plotting.
· Plotly: Plotly is a powerful library for creating interactive and web-based
visualizations in Python. It enables the creation of interactive dashboards, plots, and
charts that can be shared and explored online.
Data visualization plays a crucial role in EDA by providing a visual representation of the data
that facilitates exploration and interpretation. Effective data visualization techniques include:
1. Data Manipulation
2. Renaming the column: -
3. Checking the Null Values: -
Result Analysis: -
Data Integrity: Renaming columns helps in making the data more
understandable and manageable. Identifying null values is crucial for data
integrity, ensuring that missing data is handled appropriately.
Data Visualization (Scatter Plot): The scatter plot depicts the distribution of
reported rape cases over time. It provides a visual representation of the trend and
any potential outliers or patterns in the data.
Insights from the Scatter Plot:
Trend Identification: The scatter plot helps in identifying any trends or patterns
in the reported rape cases over the years. For instance, if there's a consistent
increase or decrease in the number of cases over time.
Outlier Detection: Outliers, if present, can be identified from the scatter plot.
These outliers might indicate specific years with exceptionally high or low
reported cases, which could warrant further investigation.
Further Analysis: Additional analysis could involve exploring the distribution of
reported cases across different locations (Area_Name), examining the
demographics of the victims, and identifying any correlations or patterns.
Overall, the provided analysis gives a foundational understanding of the data and
sets the stage for further exploration and in-depth analysis to uncover insights
and trends related to reported rape cases.
CONCLUSION
In conclusion, the analysis of reported rape cases offers valuable insights into the
prevalence and distribution of such incidents over time. Renaming columns and
identifying null values ensure data integrity and facilitate clearer interpretation of
the dataset. The scatter plot visualization illustrates the distribution of reported
cases across different years, aiding in the identification of trends and patterns.
Further exploration could delve into geographical variations and victim
demographics, providing deeper insights for targeted interventions. The insights
gleaned from this analysis can inform policy decisions and resource allocation to
address and prevent sexual violence effectively. However, it's essential to
acknowledge the dataset's limitations, such as underreporting and reporting
inconsistencies, which may impact the comprehensiveness of the analysis.
Despite these limitations, continued monitoring and analysis are vital for
addressing the complex challenges associated with combating sexual violence
and promoting a safer society.
THANK YOU