Python Project Data Analysis-1
3 Methodology
4 Analysis
5 Conclusion
6 Learning Outcomes
7 Bibliography
8 Appendices
INTRODUCTION TO PYTHON
Python is a high-level, versatile, dynamically typed programming language. Guido van Rossum
began developing it in the late 1980s, and since its first public release in 1991 it has become
one of the most widely used languages in software development. Known for its simplicity,
readability, and extensive standard library, Python is a top choice for both beginners and
experienced programmers. This introduction covers Python's history, its key features, and why
it is preferred for a wide range of applications.
History:
Python's journey began in the late 1980s when Guido van Rossum, a Dutch programmer,
started working on the language. He aimed to create a language that emphasized code
readability and allowed developers to express concepts in fewer lines of code. Python's name
is derived from the British comedy group Monty Python, showcasing its creator's sense of
humor.
Key Features:
1. Readability: Python's elegant and clean syntax makes it easy to read and write code. Its
use of indentation for block structures enforces a consistent and visually appealing code
style.
2. Large Standard Library: Python comes with a rich standard library that simplifies
many tasks, reducing the need for developers to write code from scratch. This library
includes modules for file handling, networking, regular expressions, and more.
3. Extensibility: Python can be easily extended through modules and packages written in
other languages like C or C++, allowing developers to integrate existing code seamlessly.
4. Interpreted: Python is an interpreted language, so there is no separate compilation step
before running a program; the standard interpreter (CPython) compiles source code to
bytecode automatically and then executes it. This rapid edit-and-run cycle is excellent for
prototyping and testing.
5. Dynamically Typed: Python uses dynamic typing, which means you don't need to
declare variable types explicitly. Types belong to objects and are checked at runtime, which
provides flexibility but means type errors surface only when the offending code runs.
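Two of the features above, dynamic typing and the large standard library, can be seen in a few lines. This is a minimal illustrative sketch; the variable names and the weather-condition labels are made up for the example and are not taken from the project's dataset.

```python
from collections import Counter

# Dynamic typing: the same name can be rebound to objects of different types.
# The type belongs to the object and is checked only at runtime.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__

# A type error is raised only when the offending line actually runs.
try:
    result = value + 1  # str + int is not allowed
except TypeError:
    result = None

# Standard library: Counter tallies items without any third-party package.
# These condition labels are hypothetical sample values.
conditions = ["Mist", "Partly cloudy", "Mist", "Light rain", "Mist"]
counts = Counter(conditions)

print(first_type, second_type, result)
print(counts.most_common(1))
```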
Use Cases:
Python's versatility has led to its adoption in various domains:
1. Web Development: Frameworks like Django and Flask simplify web application
development, making Python a top choice for web developers.
2. Data Science: Python is widely used for data analysis and visualization. Libraries like
Pandas, NumPy, and Matplotlib facilitate data manipulation and exploration.
3. Machine Learning and AI: Python's extensive ecosystem includes libraries like
TensorFlow and PyTorch, enabling the development of cutting-edge machine learning
models and artificial intelligence applications.
4. Scientific Computing: Scientists and researchers use Python for tasks such as simulation,
modeling, and data analysis due to its powerful libraries like SciPy.
5. Automation: Python's simplicity and ease of use make it an ideal choice for automating
tasks and writing scripts.
The study of weather trends in India is of paramount importance due to its far-reaching
implications for agriculture, economy, public health, and environmental sustainability. The
Indian subcontinent is renowned for its climatic diversity, with varying weather patterns
across different regions and seasons. Understanding the historical trends, recent changes, and
potential future scenarios in Indian weather is crucial in addressing the challenges posed by
climate change.
Indian Monsoon
The Indian monsoon is the lifeblood of agriculture in the subcontinent. A critical component
of Indian weather, it is characterized by complex interactions between land and oceanic
systems. Researchers at institutions like the Indian Institute of Tropical Meteorology (IITM)
have been diligently studying monsoon dynamics. Recent findings reveal changing monsoon
behavior, including variations in onset, withdrawal, and intensity.
Conclusion
The study of Indian weather trends offers insights into a complex interplay of meteorological
factors, climate change, and socioeconomic consequences. It serves as a foundation for
evidence-based decision-making in various sectors. While much has been documented,
continued research is vital in comprehending the evolving dynamics of Indian weather and
charting a sustainable course for the future.
METHODOLOGY
Performing predictive and descriptive analysis in Python typically involves several steps
and libraries. The general methodology followed in this project was:
Importing Libraries: The following libraries were imported.
NumPy: NumPy, short for Numerical Python, is a fundamental library that provides support for
large, multi-dimensional arrays and matrices, along with a collection of mathematical functions
that operate on these arrays. Because these functions are vectorized (applied to whole arrays
at once rather than element by element in Python loops), NumPy handles large numerical
datasets efficiently, which makes it indispensable for numerical work in many research domains.
Pandas: Pandas is a versatile library that provides data structures (chiefly the DataFrame
and Series) and functions for handling and analyzing structured, tabular data. It simplifies
common tasks such as loading, filtering, grouping, and summarizing data, making it an
essential tool for researchers across many domains.
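A minimal pandas sketch of the filtering and grouping operations described above. The column names mirror the dataset, but the four rows and their temperatures are invented for illustration; on the real DataFrame the same calls would apply unchanged, assuming those columns exist.

```python
import pandas as pd

# A tiny hypothetical DataFrame reusing some of the dataset's column names;
# the temperature values are made up.
df = pd.DataFrame({
    "location_name": ["Ashoknagar", "Raisen", "Ranipet", "Tenkasi"],
    "region": ["Madhya Pradesh", "Madhya Pradesh", "Tamil Nadu", "Tamil Nadu"],
    "temperature_celsius": [26.0, 28.0, 32.0, 30.0],
})

# Filter rows with a boolean mask.
hot = df[df["temperature_celsius"] > 29]

# Group by region and compute the mean temperature of each group.
mean_by_region = df.groupby("region")["temperature_celsius"].mean()

print(hot)
print(mean_by_region)
```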
Matplotlib: Matplotlib is a powerful library for creating static, animated, and interactive
visualizations in Python. It can produce publication-quality figures, enabling researchers to
convey complex findings clearly and precisely.
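A minimal Matplotlib sketch, assuming a list of temperature readings (the values are hypothetical): it draws a labelled histogram with the object-oriented fig, ax interface and saves it as a PNG. The non-interactive Agg backend is selected only so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt

# Hypothetical temperature readings in °C (illustrative values only).
temps_c = [24.5, 26.1, 27.8, 29.3, 30.0, 31.2, 28.4, 25.9]

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(temps_c, bins=5, edgecolor="black")
ax.set_xlabel("Temperature (°C)")
ax.set_ylabel("Number of locations")
ax.set_title("Distribution of temperatures")
fig.tight_layout()

# Save to an in-memory buffer; passing a filename such as "temps.png"
# instead would write the figure to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=150)
plt.close(fig)
png_bytes = buf.getvalue()
```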
Seaborn: Seaborn is a high-level data visualization library built on top of Matplotlib. It
simplifies the creation of complex statistical plots directly from DataFrames and provides
attractive default styles, which aids data exploration and makes research findings easier to
interpret.
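The sketch below shows how Seaborn builds a statistical plot from a DataFrame in a single call, here a box plot of temperature per region. The rows are hypothetical; only the column names follow the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample rows reusing the dataset's column names.
df = pd.DataFrame({
    "region": ["Madhya Pradesh"] * 3 + ["Tamil Nadu"] * 3,
    "temperature_celsius": [25.0, 27.5, 26.2, 31.0, 30.4, 32.1],
})

# One call draws a box per region; Seaborn handles the grouping and
# summary statistics (median, quartiles, whiskers) itself.
ax = sns.boxplot(data=df, x="region", y="temperature_celsius")
ax.set_title("Temperature by region")
plt.close(ax.figure)
```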
os: The os module is part of Python's standard library and provides a platform-independent
way to interact with the operating system, file systems, and directories. It helps automate
file access and directory handling (for example, listing the files in the working directory to
locate a dataset), which supports reproducible and efficient research workflows.
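A small sketch of the kind of file handling the os module supports, such as locating a dataset in a working directory (the notebook itself calls print(os.listdir())). The helper find_csv_files and the file names are hypothetical and are created in a temporary directory for the demonstration.

```python
import os
import tempfile


def find_csv_files(directory="."):
    """Hypothetical helper: return sorted names of .csv files in `directory` (non-recursive)."""
    return sorted(
        name
        for name in os.listdir(directory)
        if name.lower().endswith(".csv")
        and os.path.isfile(os.path.join(directory, name))
    )


# Demonstration in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["weather.csv", "notes.txt", "extra.CSV"]:
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("placeholder\n")
    csv_files = find_csv_files(tmp)

print(csv_files)
```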
1. Loaded the dataset using a pandas reader function: pandas provides a family of reader
functions, such as pd.read_csv, pd.read_excel, and pd.read_sql, that load data from CSV
files, Excel workbooks, SQL databases, and other sources into DataFrames, which serve as
the foundational data structures for subsequent analysis.
2. Visualized the relationship between two variables using relplot: Seaborn's relplot draws
relational plots, either scatter plots (kind="scatter", the default) or line plots
(kind="line"). It helps uncover patterns, trends, and correlations within the data, allowing
researchers to make informed decisions and generate insights.
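The two steps above can be sketched end to end as follows. The CSV content is a hypothetical four-row stand-in held in memory via io.StringIO; with the real dataset, pd.read_csv would be given the file's path instead. The columns chosen for the plot are an illustrative assumption.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the project's CSV file.
csv_text = """location_name,region,temperature_celsius,uv_index
Ashoknagar,Madhya Pradesh,26.0,6
Raisen,Madhya Pradesh,27.5,7
Ranipet,Tamil Nadu,31.0,8
Tenkasi,Tamil Nadu,29.5,9
"""

# Step 1: load the data into a DataFrame (a file path works the same way).
data = pd.read_csv(io.StringIO(csv_text))

# Step 2: visualize the relationship between two variables.
# relplot returns a FacetGrid; kind="scatter" is the default.
grid = sns.relplot(data=data, x="temperature_celsius", y="uv_index", kind="scatter")
plt.close("all")
```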
Data Analysis
In [1]: # Importing all the required python libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
In [2]: print(os.listdir())
localhost:8888/nbconvert/html/Desktop/python_assignment.ipynb?download=false 1/16
9/19/23, 12:02 AM python_assignment
Out[3]:
      location_name          region  latitude  longitude      timezone  last_updated_epoch  ...
0        Ashoknagar  Madhya Pradesh     24.57      77.72  Asia/Kolkata          1693286100  ...
1            Raisen  Madhya Pradesh     23.33      77.80  Asia/Kolkata          1693286100  ...
2        Chhindwara  Madhya Pradesh     22.07      78.93  Asia/Kolkata          1693286100  ...
3             Betul  Madhya Pradesh     21.86      77.93  Asia/Kolkata          1693286100  ...
4       Hoshangabad  Madhya Pradesh     22.75      77.72  Asia/Kolkata          1693286100  ...
...             ...             ...       ...        ...           ...                 ...  ...
9835         Niwari   Uttar Pradesh     28.88      77.53  Asia/Kolkata          1694731500  ...
9836        Saitual         Mizoram     23.97      92.58  Asia/Kolkata          1694731500  ...
9837        Ranipet      Tamil Nadu     12.93      79.33  Asia/Kolkata          1694731500  ...
9838        Tenkasi      Tamil Nadu      8.97      77.30  Asia/Kolkata          1694731500  ...
9839         Pendra     Maharashtra     21.93      74.15  Asia/Kolkata          1694731500  ...
In [4]: data.head(5)
Out[4]:
      location_name          region  latitude  longitude      timezone  last_updated_epoch      last_updated  ...
0        Ashoknagar  Madhya Pradesh     24.57      77.72  Asia/Kolkata          1693286100  29/08/2023 10:45  ...
1            Raisen  Madhya Pradesh     23.33      77.80  Asia/Kolkata          1693286100  29/08/2023 10:45  ...
2        Chhindwara  Madhya Pradesh     22.07      78.93  Asia/Kolkata          1693286100  29/08/2023 10:45  ...
3             Betul  Madhya Pradesh     21.86      77.93  Asia/Kolkata          1693286100  29/08/2023 10:45  ...
4       Hoshangabad  Madhya Pradesh     22.75      77.72  Asia/Kolkata          1693286100  29/08/2023 10:45  ...
5 rows × 41 columns
In [5]: data.shape
Out[5]: (9840, 41)
In [6]: data.tail(5)
Out[6]:
      location_name          region  latitude  longitude      timezone  last_updated_epoch  ...
9835         Niwari   Uttar Pradesh     28.88      77.53  Asia/Kolkata          1694731500  ...
9836        Saitual         Mizoram     23.97      92.58  Asia/Kolkata          1694731500  ...
9837        Ranipet      Tamil Nadu     12.93      79.33  Asia/Kolkata          1694731500  ...
9838        Tenkasi      Tamil Nadu      8.97      77.30  Asia/Kolkata          1694731500  ...
9839         Pendra     Maharashtra     21.93      74.15  Asia/Kolkata          1694731500  ...
5 rows × 41 columns
In [7]: data.isnull().sum()
Out[7]:
location_name 0
region 0
latitude 0
longitude 0
timezone 0
last_updated_epoch 0
last_updated 0
temperature_celsius 0
temperature_fahrenheit 0
condition_text 0
wind_mph 0
wind_kph 0
wind_degree 0
wind_direction 0
pressure_mb 0
pressure_in 0
precip_mm 0
precip_in 0
humidity 0
cloud 0
feels_like_celsius 0
feels_like_fahrenheit 0
visibility_km 0
visibility_miles 0
uv_index 0
gust_mph 0
gust_kph 0
air_quality_Carbon_Monoxide 0
air_quality_Ozone 0
air_quality_Nitrogen_dioxide 0
air_quality_Sulphur_dioxide 0
air_quality_PM2.5 0
air_quality_PM10 0
air_quality_us-epa-index 0
air_quality_gb-defra-index 0
sunrise 0
sunset 0
moonrise 0
moonset 0
moon_phase 0
moon_illumination 0
dtype: int64
In [8]: data.describe()
Out[8]:
[Summary statistics table not reproduced: 8 rows × 30 columns, beginning with latitude, longitude, last_updated_epoch, temperature_celsius, temperature_fahrenheit]
In [9]: data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9840 entries, 0 to 9839
Data columns (total 41 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 location_name 9840 non-null object
1 region 9840 non-null object
2 latitude 9840 non-null float64
3 longitude 9840 non-null float64
4 timezone 9840 non-null object
5 last_updated_epoch 9840 non-null int64
6 last_updated 9840 non-null object
7 temperature_celsius 9840 non-null float64
8 temperature_fahrenheit 9840 non-null float64
9 condition_text 9840 non-null object
10 wind_mph 9840 non-null float64
11 wind_kph 9840 non-null float64
12 wind_degree 9840 non-null int64
13 wind_direction 9840 non-null object
14 pressure_mb 9840 non-null int64
15 pressure_in 9840 non-null float64
16 precip_mm 9840 non-null float64
17 precip_in 9840 non-null float64
18 humidity 9840 non-null int64
19 cloud 9840 non-null int64
20 feels_like_celsius 9840 non-null float64
21 feels_like_fahrenheit 9840 non-null float64
22 visibility_km 9840 non-null float64
23 visibility_miles 9840 non-null int64
24 uv_index 9840 non-null int64
25 gust_mph 9840 non-null float64
26 gust_kph 9840 non-null float64
27 air_quality_Carbon_Monoxide 9840 non-null float64
28 air_quality_Ozone 9840 non-null float64
29 air_quality_Nitrogen_dioxide 9840 non-null float64
30 air_quality_Sulphur_dioxide 9840 non-null float64
31 air_quality_PM2.5 9840 non-null float64
32 air_quality_PM10 9840 non-null float64
33 air_quality_us-epa-index 9840 non-null int64
34 air_quality_gb-defra-index 9840 non-null int64
35 sunrise 9840 non-null object
36 sunset 9840 non-null object
37 moonrise 9840 non-null object
38 moonset 9840 non-null object
39 moon_phase 9840 non-null object
40 moon_illumination 9840 non-null int64
dtypes: float64(20), int64(10), object(11)
memory usage: 3.1+ MB
In [15]: sns.relplot(x="uv_index", y="region", data=data)
Out[15]: <seaborn.axisgrid.FacetGrid at 0x2114b347d90>
In [16]: sns.relplot(x="temperature_celsius", y="region", data=data)
Out[16]: <seaborn.axisgrid.FacetGrid at 0x2114a59f460>
In [20]: X_train,X_test,y_train,y_test=train_test_split(train,test,test_size=0.3,random_stat
In [22]: regression.fit(X_train,y_train)
Out[22]: LinearRegression()
In [24]: predict
In [25]: regression.score(X_test, y_test)
Out[25]: 0.6970810366341045
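The notebook cells that build train, test, and regression are not shown, so the sketch below reconstructs the same workflow on synthetic data: split with test_size=0.3, fit a LinearRegression, and score it. The features, the target, and random_state=42 are assumptions for illustration. It also shows that, for a regressor, score() returns the coefficient of determination R² = 1 - SS_res / SS_tot, which is how the 0.697 value above should be read.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: two made-up features and a noisy linear target
# stand in for the notebook's (unshown) feature and target columns.
rng = np.random.default_rng(0)
X = rng.uniform(15, 40, size=(200, 2))
y = 0.8 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 2.0, size=200)

# Hold out 30% of the rows for testing, matching test_size=0.3 in the notebook.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

regression = LinearRegression()
regression.fit(X_train, y_train)
predictions = regression.predict(X_test)

# For regressors, .score() is R^2 on the given data; compute it two other ways.
score = regression.score(X_test, y_test)
sk_r2 = r2_score(y_test, predictions)
ss_res = np.sum((y_test - predictions) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(round(score, 3), round(sk_r2, 3), round(manual_r2, 3))
```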
In [26]: sns.pairplot(data)
Out[26]: <seaborn.axisgrid.PairGrid at 0x2114c528dc0>
CONCLUSION
Data Understanding: Through exploratory data analysis (EDA), we gained a thorough
understanding of the dataset's features and characteristics. We visualized the data to identify
patterns and relationships between variables.
Data Preprocessing:
We checked for missing values with data.isnull().sum(), which found none in any of the
41 columns, and encoded categorical variables, ensuring that the data was ready for modeling.
Predictive Analysis:
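The linear regression model's performance was evaluated with regression.score, which for
a LinearRegression model returns the coefficient of determination (R²). The model achieved
R² ≈ 0.697 on the held-out 30% test split, meaning it explains about 70% of the variance in
the test-set target values.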
Descriptive Analysis:
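Summary statistics (data.describe() and data.info()) and relational plots of uv_index and
temperature_celsius against region described how weather conditions vary across Indian
regions. These insights are crucial for understanding the factors that shape weather across
India.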
LEARNINGS
References:
1. Python notes in Google Classroom by Ms. Shilpi Yadav, Assistant Professor, Jagannath
International Management School, Kalkaji.
These references cover various aspects of data analysis, machine learning, and Python
programming, which are relevant to the learnings and insights from the analysis of the
dataset.