
Python - Report - Shayan & Shivani

The document is a mini report on using Python programming and data analysis to study rape victim cases in India. It includes an acknowledgement, a certificate of declaration, a history of Python, an overview of Python and its libraries such as Pandas and Matplotlib, and the advantages of Python.

Uploaded by

ydshivani07
Copyright: © All Rights Reserved

Course: Digital Business – Python Programming

Mini Report on "Python Programming and Data Analysis of Rape Victim Cases in India"

Prepared By:

Shayan Ahmed
(Marketing)
Shivani Yadav
(Marketing)

Submitted To:
Dr. Nagaraj S
Associate Professor
Alliance University, Bangalore
ACKNOWLEDGEMENT

We would like to express our sincere gratitude to all those who have contributed to
the completion of this project report titled "Python Programming and Data Analysis
of Rape Victim Cases in India".

First, we extend our heartfelt thanks to our mentor, Mr. Nagaraj S, for his invaluable
guidance, support, and encouragement throughout this project. His expertise,
insights, and unwavering commitment have been instrumental in shaping our
understanding of Python programming, data analysis techniques, and exploratory
data analysis (EDA).

Furthermore, we would like to acknowledge the contributions of our peers, friends,
and family members who have provided us with moral support, encouragement, and
understanding during the course of this project.

Additionally, we express our gratitude to the authors of the various resources,
tutorials, and documentation materials that we have referenced and consulted
throughout our learning journey in Python programming and data analysis.

This project has been a rewarding learning experience for us, and we are grateful to
everyone who has been a part of our journey.

Thank you.

Sincerely,

Shayan Ahmed

Shivani Yadav

(Marketing)
CERTIFICATE OF DECLARATION
This is to certify that Shayan Ahmed and Shivani Yadav have successfully completed
a project on "Python Programming and Data Analysis of Rape victim cases in India"
under the guidance of Mr. Nagaraj S.

During this project, the students showed proficiency in various areas. Firstly, they
exhibited a thorough understanding of the Python programming language,
encompassing its syntax, data structures, and control flow mechanisms. Their grasp
of Python's versatility and readability was evident in the implementation of various
programming tasks.

In terms of data analysis techniques, Shayan Ahmed and Shivani Yadav showcased
their ability to effectively apply various methods using Python libraries such as
Pandas, NumPy, Matplotlib, Seaborn, and Plotly. They demonstrated proficiency in
importing, cleaning, manipulating, and visualizing data to derive meaningful insights.

One of the highlights of their project was the comprehensive exploratory data
analysis (EDA) conducted on the "rape case victim in India" dataset. Through EDA,
the students displayed their capability to understand data characteristics, identify
patterns, detect anomalies, and communicate insights through visualization.

In terms of visualization techniques, Shayan Ahmed and Shivani Yadav utilized a
diverse range of methods, including bar charts, histograms, scatter plots, and pie
charts. They effectively represented and analyzed data, showcasing their ability to
select appropriate visualization methods based on the nature of the data and
analysis goals.

Furthermore, the students meticulously documented their project work, covering
aspects such as the introduction to Python, history of Python, technical advantages,
Python libraries for data analysis, exploratory data analysis process goals, analysis
results, and conclusion. Their documentation was clear, organized, and insightful,
reflecting their dedication and attention to detail.

Shayan Ahmed and Shivani Yadav have demonstrated exceptional dedication,
analytical skills, and proficiency in Python programming and data analysis. They
have successfully completed the project requirements and have showcased their
ability to apply theoretical concepts to practical scenarios effectively.

Nagaraj

Associate Professor

Alliance University, Bangalore.


HISTORY OF PYTHON
Python, a high-level programming language, was created by Guido van Rossum in the late
1980s with a focus on simplicity and readability. Its first version, Python 0.9.0, was released
in February 1991, and subsequent versions introduced new features and improvements.
Python 2.0, released in October 2000, marked a significant milestone, followed by Python
3.0 in December 2008, which introduced backward-incompatible changes. Despite initial
resistance to migration, Python 3.x became the standard, leading to widespread adoption
across industries. Today, Python is renowned for its versatility and is widely used in web
development, data science, artificial intelligence, and more. The Python Software
Foundation, established in 2001, supports Python's development and community, further
solidifying its position as one of the most popular and powerful programming languages
globally.

PYTHON
Python is a general-purpose language, not specialized for any specific problem. It is used to
create a range of applications, including data science, software and web development,
automation, and simplifying everyday tasks; it is often used to build websites and software,
automate tasks, and analyse data. Python is an experiment in how much freedom programmers
need: "Too much freedom and nobody can read another's code; too little and expressiveness
is endangered."
Python is an easy-to-learn, powerful programming language. It has efficient high-level data
structures and a simple but effective approach to object-oriented programming. Python's
elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal
language for scripting and rapid application development in many areas on most platforms.
Why use Python
• Python is object-oriented: its structure supports concepts such as polymorphism, operator
overloading, and multiple inheritance.
• Indentation is one of Python's greatest features: it defines code blocks, so the visual
layout of a program always matches its logical structure.
• It is free (open source): downloading and installing Python costs nothing and is easy.
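The three object-oriented features named above can be illustrated with a small sketch. All class names here (Shape, Square, Circle, Vector, Named, LabelledSquare) are hypothetical examples invented for this illustration, not part of any library.

```python
import math


class Shape:
    """Base class; each subclass supplies its own area() (polymorphism)."""

    def area(self) -> float:
        raise NotImplementedError


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


class Vector:
    """Operator overloading: defining __add__ makes '+' work on Vector objects."""

    def __init__(self, x: float, y: float) -> None:
        self.x, self.y = x, y

    def __add__(self, other: "Vector") -> "Vector":
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Vector) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self) -> str:
        return f"Vector({self.x}, {self.y})"


class Named:
    """Mixin that relies on the other parent class to provide area()."""

    def describe(self) -> str:
        return f"{type(self).__name__} with area {self.area():.2f}"


class LabelledSquare(Named, Square):
    """Multiple inheritance: combines behaviour from Named and Square."""


# The same call, area(), runs class-specific code for each object.
shapes = [Square(2), Circle(1), LabelledSquare(3)]
total_area = sum(s.area() for s in shapes)
```

For example, Vector(1, 2) + Vector(3, 4) evaluates to Vector(4, 6), and LabelledSquare(3).describe() returns "LabelledSquare with area 9.00".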

PYTHON AND ITS LIBRARIES
PANDAS
• Pandas is a Python library used for working with datasets. It is an open-source library
that provides high-performance data manipulation in Python.
• The name "Pandas" refers to both "Panel Data" and "Python Data Analysis".
• It was created in 2008 by Wes McKinney and is used for data analysis in Python.
• NumPy, SciPy, Cython, and Pandas are just a few of the fast data processing tools
available.
• Pandas supports five crucial steps in processing and analyzing data: load, manipulate,
prepare, model, and analyze.
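The five steps listed above can be sketched with pandas and NumPy. This is a minimal illustration on a small hypothetical CSV (the columns region, year and count are invented for the example); the "model" step is only a straight-line trend fitted with numpy.polyfit, standing in for whatever model a real analysis would use.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data; a real project would call pd.read_csv("path/to/file.csv").
RAW_CSV = """region,year,count
North,2019,10
North,2020,12
South,2019,7
South,2020,
East,2019,5
East,2020,9
"""

# 1. Load: parse the CSV text into a DataFrame (the empty count becomes NaN).
df = pd.read_csv(io.StringIO(RAW_CSV))

# 2. Prepare: drop rows with a missing count and restore an integer dtype.
df = df.dropna(subset=["count"]).astype({"count": "int64"})

# 3. Manipulate: keep one year and sort by count, largest first.
recent = df[df["year"] == 2020].sort_values("count", ascending=False)

# 4. Model: fit a degree-1 polynomial (straight line) of yearly total against year.
totals = df.groupby("year")["count"].sum()
slope, intercept = np.polyfit(totals.index, totals.values, deg=1)

# 5. Analyze: summarise counts per region.
summary = df.groupby("region")["count"].agg(["sum", "mean"])
```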

Functions of Pandas
a) Data visualization: Pandas has only basic built-in plotting (DataFrame.plot, a wrapper around
Matplotlib), but it integrates seamlessly with popular data visualization libraries like
Matplotlib and Seaborn to generate informative plots and charts directly from Data Frame objects.
b) Time series analysis: Pandas provides date-range generation, date-time indexing, frequency
conversion and resampling, moving-window statistics, and date shifting, which make it well suited
to analyzing data recorded over time.
c) Data Structures: Pandas provides two primary data structures: Series and Data Frame.
Series is a one-dimensional labelled array, while Data Frame is a two-dimensional labelled
data structure resembling a table or spreadsheet.
d) Data Manipulation: Pandas allows for flexible data manipulation operations such as
indexing, slicing, filtering, sorting, joining, merging, reshaping, and pivoting. These
operations enable users to clean, transform, and preprocess data efficiently.
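The structures and operations in items b) to d) can be illustrated with a short sketch. All data here (temperatures, store sales, city names) are invented for the example; the calls used (Series, DataFrame, loc, merge, pivot, date_range, resample, rolling) are standard pandas methods.

```python
import pandas as pd

# c) Data structures: a Series is a 1-D labelled array ...
temps = pd.Series([21.5, 23.0, 19.8], index=["Mon", "Tue", "Wed"], name="temp_c")

# ... and a DataFrame is a 2-D labelled table (hypothetical sales figures).
sales = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B"],
        "month": ["Jan", "Feb", "Jan", "Feb"],
        "units": [30, 45, 20, 25],
    }
)
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Pune", "Delhi"]})

# d) Data manipulation: label indexing, filtering, sorting, merging, pivoting.
tuesday = temps.loc["Tue"]
big_orders = sales[sales["units"] > 24].sort_values("units")
with_city = sales.merge(stores, on="store", how="left")
wide = sales.pivot(index="store", columns="month", values="units")

# b) Time series: a daily DatetimeIndex, resampled to weekly totals and smoothed.
days = pd.date_range("2024-01-01", periods=14, freq="D")
daily = pd.Series(range(14), index=days)
weekly = daily.resample("W").sum()  # "W" bins weeks ending on Sunday
rolling_mean = daily.rolling(window=3).mean()  # first two values are NaN
```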
Technical Advantages of Python:

Python offers several technical advantages that contribute to its widespread adoption and
popularity among developers:

Simplicity: Python's clean and straightforward syntax makes it easy to learn and read,
reducing the time and effort required to write and maintain code.

Versatility: Python is a general-purpose programming language that supports multiple
programming paradigms, including procedural, object-oriented, and functional programming.
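As a small illustration of the three paradigms, the same task (summing the squares of the even numbers in a list) can be written procedurally, with a class, and in a functional style. The function and class names are hypothetical, chosen only for this sketch.

```python
from functools import reduce

numbers = [3, 8, 1, 6]


# Procedural: an explicit loop with a mutable accumulator.
def sum_of_even_squares_procedural(values):
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v * v
    return total


# Object-oriented: state and behaviour bundled together in a class.
class EvenSquareSummer:
    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(v * v for v in self.values if v % 2 == 0)


# Functional: filter, map and reduce composed without mutating any variable.
def sum_of_even_squares_functional(values):
    evens = filter(lambda v: v % 2 == 0, values)
    return reduce(lambda acc, sq: acc + sq, map(lambda v: v * v, evens), 0)
```

All three return 100 for [3, 8, 1, 6] (8 squared plus 6 squared).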

Extensive Standard Library: Python comes with a comprehensive standard library that
provides a wide range of modules and functions for tasks such as file I/O, networking, data
manipulation, and more, allowing developers to accomplish complex tasks with minimal
code.
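A minimal sketch of the standard library at work, using only built-in modules (csv, io, statistics, collections) and no third-party packages. The CSV text and its columns are hypothetical; with a real file one would pass open("file.csv", newline="") to csv.DictReader instead of a StringIO.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical CSV content used in place of a real file.
RAW = "city,visits\nPune,4\nDelhi,7\nPune,6\n"

rows = list(csv.DictReader(io.StringIO(RAW)))    # file-style parsing (csv, io)
visits = [int(r["visits"]) for r in rows]
mean_visits = statistics.mean(visits)            # descriptive statistics
city_counts = Counter(r["city"] for r in rows)   # frequency counting (collections)
```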

Community Support: Python has a vibrant and active community of developers, enthusiasts,
and contributors who continually create and share libraries, frameworks, and resources to
enhance Python's capabilities and solve real-world problems.

Data Cleaning and Preparation: Pandas offers a wide range of functions for handling missing
data, reshaping data, and performing data transformations, making it a valuable tool for data
cleaning and preparation tasks.
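The cleaning and preparation functions mentioned above can be sketched as follows. The readings are hypothetical; the sketch shows counting missing values, two common ways of handling them (dropping or filling), and a reshape from wide to long format with melt.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps (NaN marks a missing value).
df = pd.DataFrame(
    {
        "station": ["S1", "S1", "S2", "S2"],
        "temp": [21.0, np.nan, 18.5, 19.5],
        "humidity": [np.nan, 40.0, 55.0, np.nan],
    }
)

missing_per_column = df.isna().sum()   # count gaps in each column
complete_rows = df.dropna()            # option 1: drop incomplete rows
filled = df.fillna(
    {"temp": df["temp"].mean(),        # option 2: impute with the column mean ...
     "humidity": 0.0}                  # ... or with a constant
)
# Reshape: one row per (station, measure) pair instead of one column per measure.
long_format = filled.melt(id_vars="station", var_name="measure", value_name="value")
```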

Data Analysis: Pandas provides powerful methods for statistical analysis, grouping,
aggregation, and computation of summary statistics, enabling users to gain insights from
their data quickly.
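Summary statistics and grouped aggregation can be illustrated on a small hypothetical table of exam scores; describe and groupby(...).agg are standard pandas methods.

```python
import pandas as pd

# Hypothetical exam scores for two sections.
scores = pd.DataFrame(
    {"section": ["A", "A", "B", "B", "B"], "score": [70, 80, 60, 90, 75]}
)

# count, mean, std, min, quartiles and max in one call
overview = scores["score"].describe()

# one row per section, one column per aggregate
by_section = scores.groupby("section")["score"].agg(["count", "mean", "max"])
```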

Integration with Other Libraries: Pandas seamlessly integrates with other libraries in the
Python ecosystem, such as NumPy, scikit-learn, and matplotlib, making it a versatile tool for
data analysis and machine learning workflows.
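A brief sketch of the pandas and NumPy hand-off, on hypothetical values: DataFrame.to_numpy produces a plain array (the input form that libraries such as scikit-learn accept), and NumPy functions can be applied directly to pandas columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.5]})  # hypothetical values

matrix = df.to_numpy()                       # plain 2-D NumPy array
log_x = np.log(df["x"])                      # NumPy ufuncs on a Series return a Series
corr = np.corrcoef(df["x"], df["y"])[0, 1]   # NumPy statistics on pandas columns
```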
MATPLOTLIB
Matplotlib is a widely used Python library for creating static, interactive, and animated
visualizations in Python. It provides a MATLAB-like interface and is highly customizable,
making it a popular choice among scientists, engineers, data analysts, and researchers for
data visualization tasks.
It was introduced by John Hunter in the year 2002.
It can be used in Python scripts, the Python and IPython shells, web application servers, and
various graphical user interface toolkits.
One of the greatest benefits of Matplotlib is that it gives visual access to huge amounts of
data through easily digestible visuals.
Matplotlib supports many plot types, such as bar, pie, scatter, histogram, area, and line plots.
Features of Matplotlib
a) Simple and Flexible: Matplotlib provides a simple interface to quickly create various types
of plots like line plots, scatter plots, bar plots, histograms, 3D plots, etc.
b) Customization: Users have fine-grained control over every aspect of the plot, including
line styles, markers, colours, fonts, axis properties, annotations, etc.
c) Support for LaTeX: Matplotlib renders TeX-style mathematical expressions with its built-in
mathtext engine and can use a full LaTeX installation for text when the text.usetex setting is
enabled, allowing users to incorporate mathematical expressions and symbols into their plots.
d) Integration with pandas: Matplotlib works well with pandas, a popular data analysis library
in Python, allowing users to create plots directly from pandas Data Frame objects.

e) Matplotlib pyplot Interface: Matplotlib provides a MATLAB-like interface through its pyplot
module, which simplifies the process of creating plots by automatically creating and
managing figures and axes.
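Several of the features listed above can be seen together in one short sketch: the pyplot interface creating a figure and axes, a bar plot drawn by pandas onto a Matplotlib axis, per-line styling, and a TeX-style title rendered by mathtext. The monthly figures are hypothetical, and the figure is saved to an in-memory buffer rather than a file only to keep the example self-contained.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly figures used only to have something to draw.
df = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "sales": [12, 18, 9]})

# e) pyplot creates and manages the figure and its two axes.
fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(8, 3))

# d) pandas integration: DataFrame.plot draws onto an existing Matplotlib Axes.
df.plot(kind="bar", x="month", y="sales", ax=ax_bar, color="tab:blue", legend=False)
ax_bar.set_title("Bar plot via pandas")

# b) customization of line style, markers and colour; c) mathtext in the title.
ax_line.plot(df["month"], df["sales"], linestyle="--", marker="o", color="tab:red")
ax_line.set_title(r"Line plot, $\bar{x} = 13$")
ax_line.set_ylabel("sales")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```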
EDA
Exploratory Data Analysis (EDA) is a crucial step in the data analysis process, aiming to
understand the characteristics of a dataset before applying formal statistical methods or
building models. The primary objectives of EDA are to identify patterns, relationships,
anomalies, and insights within the data, guiding further analysis and decision-making.
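A typical first pass of EDA can be wrapped in a small helper. The function name first_look and the demo table are hypothetical; the helper simply collects the shape, column types, missing-value counts, numeric summary, and number of duplicate rows of a DataFrame.

```python
import pandas as pd


def first_look(df: pd.DataFrame) -> dict:
    """Collect a few standard EDA summaries for a DataFrame (hypothetical helper)."""
    return {
        "shape": df.shape,
        "dtypes": {col: str(t) for col, t in df.dtypes.items()},
        "missing": df.isna().sum().to_dict(),
        "numeric_summary": df.describe(),   # numeric columns only
        "duplicates": int(df.duplicated().sum()),
    }


# Hypothetical demo data: one missing label, one duplicated row, one extreme value.
demo = pd.DataFrame({"group": ["a", "b", "b", None], "value": [1.0, 2.0, 2.0, 40.0]})
report = first_look(demo)
```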

EDA with Python

Python offers a rich ecosystem of libraries and tools for conducting EDA efficiently and
effectively. Key libraries for EDA in Python include:

· Pandas: Pandas provides powerful data structures and functions for data
manipulation and analysis, making it well-suited for exploratory data analysis tasks
such as data cleaning, summarization, and transformation.
· NumPy: NumPy is a fundamental library for numerical computing in Python, offering
support for array operations, mathematical functions, and linear algebra. It
complements Pandas by providing efficient handling of numerical data and arrays.
· Matplotlib: Matplotlib is a versatile library for creating static, animated, and
interactive visualizations in Python. It offers a wide range of plotting functions and
customization options for exploring data visually.
· Seaborn: Seaborn is a statistical data visualization library built on top of Matplotlib,
offering high-level functions for creating informative and attractive statistical graphics.
It simplifies the process of creating complex visualizations and supports advanced
statistical plotting.
· Plotly: Plotly is a powerful library for creating interactive and web-based
visualizations in Python. It enables the creation of interactive dashboards, plots, and
charts that can be shared and explored online.

EDA and Data Visualization

Data visualization plays a crucial role in EDA by providing a visual representation of the data
that facilitates exploration and interpretation. Effective data visualization techniques include:

· Univariate Analysis: Visualizing individual variables using histograms, bar charts,
and box plots helps in understanding their distribution, central tendency, and
variability.
· Bivariate Analysis: Visualizing relationships between pairs of variables using scatter
plots, line plots, and heatmaps helps in exploring correlations, dependencies, and
patterns.
· Multivariate Analysis: Visualizing relationships between multiple variables using
multi-dimensional plots, such as parallel coordinates plots and pair plots, helps in
identifying complex interactions and patterns.
· Time Series Analysis: Visualizing time-series data using line plots, time-series
plots, and seasonal decomposition plots helps in understanding temporal patterns,
trends, and seasonality.
· Geospatial Analysis: Visualizing spatial data using maps, choropleth maps, and
heatmaps helps in exploring geographical patterns, distributions, and clusters.
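Univariate, bivariate, and time-series views from the list above can be combined in a single Matplotlib figure. The data are randomly generated with a fixed seed purely for illustration; geospatial maps are left out because they need additional libraries.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
# Synthetic data generated only for illustration.
df = pd.DataFrame({"x": rng.normal(50, 10, 200)})
df["y"] = 2 * df["x"] + rng.normal(0, 5, 200)
series = pd.Series(
    rng.normal(0, 1, 36).cumsum(),
    index=pd.date_range("2021-01-01", periods=36, freq="MS"),  # month starts
)

fig, axes = plt.subplots(2, 2, figsize=(9, 6))
axes[0, 0].hist(df["x"], bins=20)               # univariate: distribution shape
axes[0, 0].set_title("Histogram of x")
axes[0, 1].boxplot(df["x"])                     # univariate: spread and outliers
axes[0, 1].set_title("Box plot of x")
axes[1, 0].scatter(df["x"], df["y"], s=8)       # bivariate: relationship
axes[1, 0].set_title(f"x vs y (r = {df['x'].corr(df['y']):.2f})")
axes[1, 1].plot(series.index, series.values)    # time series: trend over time
axes[1, 1].set_title("Monthly series")
fig.tight_layout()
```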
PYTHON PROMPTS AND FUNCTIONS PERFORMED ON THE DATASET, AND VISUALIZATION OF
RAPE CASES IN INDIA

1. Data manipulation
2. Renaming the columns
3. Checking the null values
4. Checking the list of columns
5. Importing matplotlib.pyplot and creating a bar plot
6. Filtering the rows with null values and extracting the data for a scatter plot
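The six steps above can be sketched with pandas and Matplotlib. This is a hedged reconstruction, not the authors' code: the column names Year, Rape Cases Reported and Area_Name come from the report text, but the raw column name in RENAME_MAP ("Rape_Cases_Reported") is an assumption, and so is the content of the bar plot (total cases per Area_Name), since the report does not say what was plotted. The helper names prepare and plot_cases are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# ASSUMPTION: the raw column name on the left is hypothetical; replace it with
# whatever the dataset file actually uses.
RENAME_MAP = {"Rape_Cases_Reported": "Rape Cases Reported"}


def prepare(df: pd.DataFrame, rename_map: dict) -> pd.DataFrame:
    """Steps 1-4: rename columns, report null values, list the columns."""
    df = df.rename(columns=rename_map)   # 2. renaming the columns
    print(df.isnull().sum())             # 3. null values per column
    print(list(df.columns))              # 4. list of columns
    return df


def plot_cases(df, year_col="Year", cases_col="Rape Cases Reported", area_col="Area_Name"):
    """Steps 5-6: bar plot of totals per area, scatter plot of cases by year."""
    fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(12, 4))

    # 5. Bar plot (ASSUMED content): total reported cases per area, largest first.
    totals = df.groupby(area_col)[cases_col].sum().sort_values(ascending=False)
    ax_bar.bar(totals.index, totals.values)
    ax_bar.tick_params(axis="x", labelrotation=90)

    # 6. Drop rows with nulls in the plotted columns, then scatter cases by year.
    complete = df.dropna(subset=[year_col, cases_col])
    ax_scatter.scatter(complete[year_col], complete[cases_col])
    ax_scatter.set_xlabel(year_col)
    ax_scatter.set_ylabel(cases_col)
    return fig, totals, complete
```

In use, the DataFrame returned by pd.read_csv on the dataset file (whose path the report does not give) would be passed to prepare and then to plot_cases.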
REPORT ANALYSIS

• Data Preparation and Exploration (Prompts Run):
The prompts that were run include loading the data into a Data Frame, renaming
columns, identifying null values, and creating a scatter plot to visualize the
relationship between 'Year' and 'Rape Cases Reported'.
The data appears to contain information about reported rape cases, including the
year, location (Area_Name), and various victim demographics.

• Result Analysis:
Data Integrity: Renaming columns helps in making the data more
understandable and manageable. Identifying null values is crucial for data
integrity, ensuring that missing data is handled appropriately.
Data Visualization (Scatter Plot): The scatter plot depicts the distribution of
reported rape cases over time. It provides a visual representation of the trend and
any potential outliers or patterns in the data.
Insights from the Scatter Plot:
Trend Identification: The scatter plot helps in identifying any trends or patterns
in the reported rape cases over the years, for instance, whether there is a consistent
increase or decrease in the number of cases over time.
Outlier Detection: Outliers, if present, can be identified from the scatter plot.
These outliers might indicate specific years with exceptionally high or low
reported cases, which could warrant further investigation.
Further Analysis: Additional analysis could involve exploring the distribution of
reported cases across different locations (Area_Name), examining the
demographics of the victims, and identifying any correlations or patterns.
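The trend and outlier checks described above are read visually from the scatter plot in the report; they could also be quantified. A minimal sketch, assuming the column names Year and Rape Cases Reported used in the report: the trend is the slope of a least-squares line through yearly totals, and outliers are flagged with Tukey's 1.5 × IQR rule. Both helper names are hypothetical, and these are illustrative techniques, not the method used in the report.

```python
import numpy as np
import pandas as pd


def yearly_trend(df, year_col="Year", cases_col="Rape Cases Reported"):
    """Return yearly totals and the slope of a least-squares line (cases per year)."""
    totals = df.groupby(year_col)[cases_col].sum()
    slope, _intercept = np.polyfit(
        totals.index.astype(float), totals.to_numpy(dtype=float), deg=1
    )
    return totals, slope


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values[(values < q1 - k * iqr) | (values > q3 + k * iqr)]
```

A positive slope suggests reported cases rising over the period and a negative slope suggests a decline; flagged years would be candidates for the further investigation mentioned above.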
Overall, the provided analysis gives a foundational understanding of the data and
sets the stage for further exploration and in-depth analysis to uncover insights
and trends related to reported rape cases.

CONCLUSION

In conclusion, the analysis of reported rape cases offers valuable insights into the
prevalence and distribution of such incidents over time. Renaming columns and
identifying null values ensure data integrity and facilitate clearer interpretation of
the dataset. The scatter plot visualization illustrates the distribution of reported
cases across different years, aiding in the identification of trends and patterns.
Further exploration could delve into geographical variations and victim
demographics, providing deeper insights for targeted interventions. The insights
gleaned from this analysis can inform policy decisions and resource allocation to
address and prevent sexual violence effectively. However, it's essential to
acknowledge the dataset's limitations, such as underreporting and reporting
inconsistencies, which may impact the comprehensiveness of the analysis.
Despite these limitations, continued monitoring and analysis are vital for
addressing the complex challenges associated with combating sexual violence
and promoting a safer society.
THANK YOU
