Python Project Data Analysis-1

This document provides an introduction and overview of a Python project on analyzing Indian weather trends. It discusses the importance of studying India's climate patterns and changes over time. The methodology section outlines typical steps for predictive and descriptive analysis using Python, including importing key libraries like NumPy and Pandas for working with data. The document contains sections on the project introduction, literature review on Indian weather topics, proposed methodology, planned analysis, and expected learning outcomes.

Uploaded by

ripunjay080808


PYTHON PROJECT

ANALYSIS OF INDIAN WEATHER USING PYTHON

Post Graduate Diploma in Management

(PGDM)

Submitted to: Ms. Shilpi Yadav, Assistant Professor

Submitted by: Ronit Saini, Roll No.: 92, Batch: 2023-25

Jagannath International Management School

MOR, Pocket-105, Kalkaji, New Delhi-110019

(Approved by All India Council for Technical Education (AICTE) and

Accredited by NBA SAQS and NAAC)
CONTENTS
Sr. No Topic

1 Introduction – Python

2 Literature Review

3 Methodology

4 Analysis

5 Conclusion

6 Learning Outcomes

7 Bibliography

8 Appendices-1
INTRODUCTION TO PYTHON
Python is a high-level, versatile, and dynamically typed programming language that has
gained immense popularity in the world of software development. Guido van Rossum began
work on it in the late 1980s, and the first public release followed in 1991. Known for its
simplicity, readability, and extensive standard library, Python is a top choice for both
beginners and experienced programmers. This introduction covers the key aspects of Python:
its history, its features, and why it is a preferred language for various applications.

History:
Python's journey began in the late 1980s when Guido van Rossum, a Dutch programmer,
started working on the language. He aimed to create a language that emphasized code
readability and allowed developers to express concepts in fewer lines of code. Python's name
is derived from the British comedy group Monty Python, showcasing its creator's sense of
humor.

Key Features:
1. Readability: Python's elegant and clean syntax makes it easy to read and write code. Its
use of indentation for block structures enforces a consistent and visually appealing code
style.

2. Versatility: Python is a versatile language, suitable for various applications, including
web development, data analysis, machine learning, scientific computing, and more.

3. Large Standard Library: Python comes with a rich standard library that simplifies
many tasks, reducing the need for developers to write code from scratch. This library
includes modules for file handling, networking, regular expressions, and more.

4. Cross-Platform: Python is available for multiple platforms, including Windows, macOS,
and various Unix-based systems, making it a platform-independent language.

5. Open Source: Python's open-source nature encourages collaboration and community-driven
development. It also means that Python is free to use and distribute.

6. Extensibility: Python can be easily extended through modules and packages written in
other languages like C or C++, allowing developers to integrate existing code seamlessly.

7. Interpreted: Python is an interpreted language, which means there is no separate
compile step before running your code (the standard interpreter compiles source to
bytecode automatically). This short edit-and-run cycle is excellent for prototyping and
testing.

8. Dynamically Typed: Python uses dynamic typing, which means you don't need to
declare variable types explicitly. Types belong to values rather than to variable names
and are checked at runtime, providing flexibility but requiring careful attention to
type-related issues.
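
To make the dynamic-typing point concrete, here is a minimal sketch. The variable name reading is purely illustrative and not taken from the project. The same name is bound first to an integer and then to a string, and the type mismatch only surfaces when the offending line actually runs.

```python
# Illustrative only: `reading` is a made-up variable name.
reading = 25                      # the name is bound to an int
print(type(reading).__name__)

reading = "25.6"                  # the same name now refers to a str
print(type(reading).__name__)

# The mismatch is detected at runtime, when this line executes.
try:
    total = reading + 1           # adding a str and an int is not defined
except TypeError as exc:
    total = None
    print("TypeError:", exc)

# An explicit conversion resolves the mismatch.
total = float(reading) + 1
print(total)
```

This is the kind of type-related issue that matters later when a column read from a CSV file turns out to hold text rather than numbers.
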
Use Cases:
Python's versatility has led to its adoption in various domains:

1. Web Development: Frameworks like Django and Flask simplify web application
development, making Python a top choice for web developers.

2. Data Science: Python is widely used for data analysis and visualization. Libraries like
Pandas, NumPy, and Matplotlib facilitate data manipulation and exploration.

3. Machine Learning and AI: Python's extensive ecosystem includes libraries like
TensorFlow and PyTorch, enabling the development of cutting-edge machine learning
models and artificial intelligence applications.

4. Scientific Computing: Scientists and researchers use Python for tasks such as simulation,
modeling, and data analysis due to its powerful libraries like SciPy.

5. Automation: Python's simplicity and ease of use make it an ideal choice for automating
tasks and writing scripts.

Community and Ecosystem:

Python has a vibrant and supportive community. The Python Package Index (PyPI) hosts
hundreds of thousands of third-party packages and libraries, expanding Python's capabilities
and making it suitable for nearly any task.

In conclusion, Python's readability, versatility, and extensive ecosystem have made it one of
the most popular programming languages globally. Whether you're a beginner or a seasoned
developer, Python's simplicity and power make it a valuable tool for a wide range of
applications, from web development to data science and beyond. Its open-source nature and
strong community ensure that Python will continue to evolve and remain relevant in the ever-
changing world of technology.
LITERATURE REVIEW

The study of weather trends in India is of paramount importance due to its far-reaching
implications for agriculture, economy, public health, and environmental sustainability. The
Indian subcontinent is renowned for its climatic diversity, with varying weather patterns
across different regions and seasons. Understanding the historical trends, recent changes, and
potential future scenarios in Indian weather is crucial in addressing the challenges posed by
climate change.

Historical Weather Trends in India

India's meteorological history is rich and well-documented, thanks to organizations like the
India Meteorological Department (IMD). Historical data reveals notable patterns, such as the
annual monsoon season and the oscillations of El Niño and La Niña. Decades of records
show temperature fluctuations, shifts in precipitation, and instances of extreme weather
events that have shaped India's climate landscape.

Climate Change and Variability

India, like the rest of the world, faces the undeniable impact of climate change. Rising
temperatures, erratic rainfall, and altered monsoon patterns have become evident in recent
decades. These shifts have consequences for agriculture, water resources, and ecosystems.
The Intergovernmental Panel on Climate Change (IPCC) reports have highlighted the global
nature of climate change and underscored the need for mitigation and adaptation strategies.

Indian Monsoon
The Indian monsoon is the lifeblood of agriculture in the subcontinent. A critical component
of Indian weather, it is characterized by complex interactions between land and oceanic
systems. Researchers at institutions like the Indian Institute of Tropical Meteorology (IITM)
have been diligently studying monsoon dynamics. Recent findings reveal changing monsoon
behavior, including variations in onset, withdrawal, and intensity.

Extreme Weather Events

India is no stranger to extreme weather events. Cyclones, heatwaves, and floods have
impacted various regions, causing loss of life and property. Instances like Cyclone Fani in
2019 and the devastating Chennai floods in 2015 underscore the significance of studying
extreme weather events and improving preparedness and resilience.
Impact on Agriculture and Economy
Agriculture, a major contributor to India's economy, is highly sensitive to weather conditions.
Changing weather patterns influence crop yields, food security, and livelihoods. Moreover,
the Indian economy as a whole is intertwined with weather, affecting sectors such as energy,
water resources, and tourism. The economic ramifications of weather trends necessitate
strategic planning and adaptation measures.

Public Health and Vulnerability

Weather trends have direct implications for public health. Rising temperatures, air quality
deterioration, and the spread of vector-borne diseases are health concerns that cannot be
ignored. Vulnerable populations, including urban communities and marginalized groups, face
disproportionate risks. Public health studies and government initiatives are pivotal in
addressing these challenges.

Adaptation and Mitigation Strategies

In response to changing weather patterns, India has initiated numerous strategies. These
include climate-resilient agriculture practices, investments in renewable energy, and disaster
preparedness programs. Government policies and local community efforts are critical
components of India's approach to mitigating the adverse effects of changing weather.

Conclusion
The study of Indian weather trends offers insights into a complex interplay of meteorological
factors, climate change, and socioeconomic consequences. It serves as a foundation for
evidence-based decision-making in various sectors. While much has been documented,
continued research is vital in comprehending the evolving dynamics of Indian weather and
charting a sustainable course for the future.
METHODOLOGY

Performing predictive and descriptive analysis using Python typically involves several steps
and libraries. The general methodology followed for these types of analysis is outlined below.

Importing Libraries: The analysis begins by importing the necessary libraries.

NumPy: NumPy, short for Numerical Python, is a fundamental library that provides support
for large, multi-dimensional arrays and matrices, along with a collection of mathematical
functions to operate on these arrays. Its significance lies in its capacity to efficiently
handle and analyze numerical data, making it indispensable for researchers in various domains.

Pandas: Pandas is a versatile library that provides data structures and functions to efficiently
handle and analyze structured data. Its significance lies in its capacity to simplify complex
data tasks, making it an essential asset for researchers across various domains.

Matplotlib: Matplotlib is a powerful library for creating static, animated, and interactive
visualizations in Python. Its significance lies in its capacity to produce publication-quality
graphics, enabling researchers to convey complex data findings with clarity and precision.

Seaborn: Seaborn is a high-level data visualization library built on top of Matplotlib. Its
significance lies in its ability to simplify the creation of complex statistical plots and provide
a visually appealing framework for representing data, facilitating data exploration, and
enhancing the interpretability of research findings.

os: The os library is a standard Python module that provides a platform-independent way to
interact with the operating system, file systems, and directories. Its significance lies in its
capacity to simplify and automate tasks related to data management, file access, and system-
level operations, ensuring reproducibility and efficiency in research workflows.
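
As a rough illustration of how os and Pandas work together when locating and loading a data file, the sketch below writes a tiny sample to a temporary folder and reads it back. The file name sample_weather.csv and the two "Station" rows are made up for the example; the project itself reads IndianWeatherRepository.csv from the working directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row sample; not rows from the project dataset.
sample = pd.DataFrame(
    {"location_name": ["Station A", "Station B"],
     "temperature_celsius": [25.0, 27.5]}
)

with tempfile.TemporaryDirectory() as folder:
    # os.path.join builds the path with the right separator for the current OS.
    csv_path = os.path.join(folder, "sample_weather.csv")
    sample.to_csv(csv_path, index=False)

    # Checking for the file first gives a clearer error than a failed read.
    if not os.path.exists(csv_path):
        raise FileNotFoundError(csv_path)
    loaded = pd.read_csv(csv_path)

print(loaded.shape)
```

Building paths with os.path.join rather than hard-coded separators is one simple way to keep a notebook runnable on both Windows and Unix-like systems.
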

Loading and analyzing the dataset:

1. Loaded the dataset using the pd.read_csv function: pd.read_csv belongs to a family of
reader functions in the Pandas library (others include pd.read_excel and pd.read_sql).
These functions load data from file formats such as CSV and Excel, or from SQL
databases, into Pandas DataFrames, which serve as the foundational data structures for
subsequent research analysis.

2. Used the isnull function: The isnull method in Pandas flags missing values within a
dataset. Chaining it with sum(), as in data.isnull().sum(), counts the missing values in
each column. This supports data quality assessment and enables researchers to make
informed decisions about handling missing data.

3. Used plt.plot: plt.plot is a fundamental function in Matplotlib, a powerful Python
library for data visualization. It draws line and marker plots and can be customized
extensively. Researchers use it to convey data findings effectively and enhance the
interpretability of research results.

4. Performed basic inspection using data.shape, data.columns, data.head(), data.tail(),
and value_counts().

5. Computed basic statistics using data.describe(): The describe() method in Pandas
summarizes each numeric column with its count, mean, standard deviation, minimum,
quartiles, and maximum. These measures are essential for understanding and
interpreting the characteristics of a dataset.

6. Used correlation: Correlation analysis assesses the degree and direction of
association between two or more variables. The Pearson correlation coefficient, which is
the default in the Pandas corr() method, quantifies the strength and direction of a
linear relationship between two continuous variables.

7. Visualized the relation between 2 variables using relplot: relplot offers a powerful
means to visualize relationships between variables through scatterplots, line plots, or
other relational visualizations. It is instrumental in uncovering patterns, trends, and
correlations within the data, allowing researchers to make informed decisions and
generate insights.

8. Used the scikit-learn library for regression analysis and prediction: Scikit-learn
(imported as sklearn) is a comprehensive library that offers a wide array of tools for data
analysis and machine learning. Its robust and well-documented functionality empowers
researchers to employ state-of-the-art machine learning algorithms, assess model
performance, and extract meaningful insights from complex datasets.

9. Used regplot to show the relation between 2 variables: regplot is a Seaborn function
that draws a scatterplot with a fitted regression line overlaid. It provides
researchers with a visually intuitive means of exploring the relationship between two
continuous variables, elucidating potential patterns, trends, and associations within the
data.

10. Used various other visualization functions like:

a. CountPlot: countplot is a Seaborn function that draws bar plots of the counts or
frequencies of categorical data. Its significance lies in its ability to provide
researchers with an effective means to explore and communicate patterns and
relationships within categorical variables.

b. BoxPlot: Box plots, also known as box-and-whisker plots, are essential tools for
summarizing the distribution of numerical data. They offer a concise
representation of key statistics, including the median, quartiles, and potential
outliers.

c. PairPlot: pairplot is a versatile function that enables researchers to create a
matrix of scatterplots, making it possible to visualize pairwise relationships
between multiple variables in a dataset. Its significance lies in its capacity to
reveal complex interdependencies and patterns, facilitating data exploration
and hypothesis generation.
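
The inspection steps listed above can be sketched on a tiny in-memory sample. The four rows below are made-up readings for illustration, not rows from the project dataset, and only the column names are borrowed from it. The sketch shows how isnull().sum(), describe(), and value_counts() behave when one value is missing.

```python
import io

import pandas as pd

# Made-up readings for illustration; the second row has no wind_mph value.
csv_text = """condition_text,temperature_celsius,humidity,wind_mph
Mist,20.0,90,2.0
Sunny,25.0,80,
Mist,30.0,70,6.0
Mist,35.0,60,8.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Count of missing values per column.
missing = df.isnull().sum()

# Summary statistics for the numeric columns only (the default behaviour).
summary = df.describe()

# Frequency of each category in a text column.
counts = df["condition_text"].value_counts()

print(missing)
print(summary)
print(counts)
```

In this sample, the missing wind_mph entry shows up both in the isnull() count and as a lower count row for that column in describe().
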
9/19/23, 12:02 AM python_assignment

Data Analysis
In [1]: # Importing all the required python libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

In [2]: print(os.listdir())

['.ipynb_checkpoints', 'afm-cw.xlsx', 'afm-q3.xlsx', 'Agenda and Minutes.pdf',
'agenda&minutes.docx', 'agenda&minutes.pdf', 'agendas.xlsx', 'coursera.lnk', 'DC.csv',
'desktop.ini', 'FINAL ROUND.xlsx', 'forage.lnk', 'IndianWeather2.csv',
'IndianWeatherRepository.csv', 'ISM-Epayment.docx', 'ISM-Epayment.pptx',
'MarketingSWOTbox.xlsx', 'MCom-MoneyMindset.pptx', 'MCOMM_Format_and_Practical_2.pdf',
'ME-graph.xlsx', 'meco-assignment1.xlsx', 'ME_Subsidy.docx', 'ME_Subsidy.pdf',
'MM_swotBox.pptx', 'Nitin Negi_AFM_73PGDMB2023.xlsx', 'Nitin Negi_ME_073.pdf',
'Nitin Negi_ME_073_tomato.docx', 'Nitin Negi_ME_073_tomato.pdf', 'Nitin Negi_Python.pdf',
'Nitin Negi_QTM_PGDM-B.pdf', 'notes_Mcom.docx', 'NSE SMART.exe - Shortcut.lnk',
'oahb-quiz.docx', 'OAHB_PRESENTATION[1].pptx', 'OB-Group5-CaseStudy.docx',
'OB-Group5-CaseStudy.pdf', 'OB_PPT_Organizational_Structure.pptx',
'python_assignment.ipynb', 'python_assignment2.pdf', 'Python_Project.docx', 'qtm-cw.xlsx',
'supermarkt_sales.csv', 'swot.pdf', 'TEMPLATE.xlsx', 'tips.csv', 'udemy.lnk',
'Visual Studio Code.lnk', '~$thon_Project.docx']

In [3]: data = pd.read_csv("IndianWeatherRepository.csv")

data

Out[3]:
      location_name  region          latitude  longitude  timezone      last_updated_epoch  last_updat
0     Ashoknagar     Madhya Pradesh  24.57     77.72      Asia/Kolkata  1693286100          29/08/20 10
1     Raisen         Madhya Pradesh  23.33     77.80      Asia/Kolkata  1693286100          29/08/20 10
2     Chhindwara     Madhya Pradesh  22.07     78.93      Asia/Kolkata  1693286100          29/08/20 10
3     Betul          Madhya Pradesh  21.86     77.93      Asia/Kolkata  1693286100          29/08/20 10
4     Hoshangabad    Madhya Pradesh  22.75     77.72      Asia/Kolkata  1693286100          29/08/20 10
...   ...            ...             ...       ...        ...           ...
9835  Niwari         Uttar Pradesh   28.88     77.53      Asia/Kolkata  1694731500          15/09/20 04
9836  Saitual        Mizoram         23.97     92.58      Asia/Kolkata  1694731500          15/09/20 04
9837  Ranipet        Tamil Nadu      12.93     79.33      Asia/Kolkata  1694731500          15/09/20 04
9838  Tenkasi        Tamil Nadu      8.97      77.30      Asia/Kolkata  1694731500          15/09/20 04
9839  Pendra         Maharashtra     21.93     74.15      Asia/Kolkata  1694731500          15/09/20 04

9840 rows × 41 columns

In [4]: data.head(5)

Out[4]:
   location_name  region          latitude  longitude  timezone      last_updated_epoch  last_updated      tem
0  Ashoknagar     Madhya Pradesh  24.57     77.72      Asia/Kolkata  1693286100          29/08/2023 10:45
1  Raisen         Madhya Pradesh  23.33     77.80      Asia/Kolkata  1693286100          29/08/2023 10:45
2  Chhindwara     Madhya Pradesh  22.07     78.93      Asia/Kolkata  1693286100          29/08/2023 10:45
3  Betul          Madhya Pradesh  21.86     77.93      Asia/Kolkata  1693286100          29/08/2023 10:45
4  Hoshangabad    Madhya Pradesh  22.75     77.72      Asia/Kolkata  1693286100          29/08/2023 10:45

5 rows × 41 columns

In [5]: data.shape

Out[5]: (9840, 41)

In [6]: data.tail(5)

Out[6]:
      location_name  region         latitude  longitude  timezone      last_updated_epoch  last_updat
9835  Niwari         Uttar Pradesh  28.88     77.53      Asia/Kolkata  1694731500          15/09/20 04
9836  Saitual        Mizoram        23.97     92.58      Asia/Kolkata  1694731500          15/09/20 04
9837  Ranipet        Tamil Nadu     12.93     79.33      Asia/Kolkata  1694731500          15/09/20 04
9838  Tenkasi        Tamil Nadu     8.97      77.30      Asia/Kolkata  1694731500          15/09/20 04
9839  Pendra         Maharashtra    21.93     74.15      Asia/Kolkata  1694731500          15/09/20 04

5 rows × 41 columns

In [7]: data.isnull().sum()

Out[7]:
location_name 0
region 0
latitude 0
longitude 0
timezone 0
last_updated_epoch 0
last_updated 0
temperature_celsius 0
temperature_fahrenheit 0
condition_text 0
wind_mph 0
wind_kph 0
wind_degree 0
wind_direction 0
pressure_mb 0
pressure_in 0
precip_mm 0
precip_in 0
humidity 0
cloud 0
feels_like_celsius 0
feels_like_fahrenheit 0
visibility_km 0
visibility_miles 0
uv_index 0
gust_mph 0
gust_kph 0
air_quality_Carbon_Monoxide 0
air_quality_Ozone 0
air_quality_Nitrogen_dioxide 0
air_quality_Sulphur_dioxide 0
air_quality_PM2.5 0
air_quality_PM10 0
air_quality_us-epa-index 0
air_quality_gb-defra-index 0
sunrise 0
sunset 0
moonrise 0
moonset 0
moon_phase 0
moon_illumination 0
dtype: int64

In [8]: data.describe()

Out[8]:
       latitude     longitude    last_updated_epoch  temperature_celsius  temperature_fahrenheit
count  9840.000000  9840.000000  9.840000e+03        9840.000000          9840.000000
mean   23.106256    80.229436    1.694004e+09        25.225061            77.405234
std    5.797599     5.761152     4.487798e+05        3.838239             6.909241
min    8.080000     68.970000    1.693286e+09        -2.600000            27.300000
25%    20.270000    76.070000    1.693612e+09        23.600000            74.500000
50%    23.970000    78.670000    1.694041e+09        25.600000            78.100000
75%    26.772500    83.900000    1.694387e+09        27.300000            81.100000
max    34.570000    95.800000    1.694732e+09        38.300000            100.900000

8 rows × 30 columns

In [9]: data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9840 entries, 0 to 9839
Data columns (total 41 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 location_name 9840 non-null object
1 region 9840 non-null object
2 latitude 9840 non-null float64
3 longitude 9840 non-null float64
4 timezone 9840 non-null object
5 last_updated_epoch 9840 non-null int64
6 last_updated 9840 non-null object
7 temperature_celsius 9840 non-null float64
8 temperature_fahrenheit 9840 non-null float64
9 condition_text 9840 non-null object
10 wind_mph 9840 non-null float64
11 wind_kph 9840 non-null float64
12 wind_degree 9840 non-null int64
13 wind_direction 9840 non-null object
14 pressure_mb 9840 non-null int64
15 pressure_in 9840 non-null float64
16 precip_mm 9840 non-null float64
17 precip_in 9840 non-null float64
18 humidity 9840 non-null int64
19 cloud 9840 non-null int64
20 feels_like_celsius 9840 non-null float64
21 feels_like_fahrenheit 9840 non-null float64
22 visibility_km 9840 non-null float64
23 visibility_miles 9840 non-null int64
24 uv_index 9840 non-null int64
25 gust_mph 9840 non-null float64
26 gust_kph 9840 non-null float64
27 air_quality_Carbon_Monoxide 9840 non-null float64
28 air_quality_Ozone 9840 non-null float64
29 air_quality_Nitrogen_dioxide 9840 non-null float64
30 air_quality_Sulphur_dioxide 9840 non-null float64
31 air_quality_PM2.5 9840 non-null float64
32 air_quality_PM10 9840 non-null float64
33 air_quality_us-epa-index 9840 non-null int64
34 air_quality_gb-defra-index 9840 non-null int64
35 sunrise 9840 non-null object
36 sunset 9840 non-null object
37 moonrise 9840 non-null object
38 moonset 9840 non-null object
39 moon_phase 9840 non-null object
40 moon_illumination 9840 non-null int64
dtypes: float64(20), int64(10), object(11)
memory usage: 3.1+ MB
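
The info() summary lists last_updated as an object (text) column and last_updated_epoch as an integer. The notebook does not convert either one, but if time-based analysis were needed, one possible approach is sketched below using the first row's values from the head() output. The new column names last_updated_dt and epoch_local are illustrative choices, not names from the project.

```python
import pandas as pd

# One epoch/text pair copied from the first row of the head() output.
frame = pd.DataFrame(
    {"last_updated_epoch": [1693286100],
     "last_updated": ["29/08/2023 10:45"]}
)

# Parse the day-first text timestamp with an explicit format string.
frame["last_updated_dt"] = pd.to_datetime(
    frame["last_updated"], format="%d/%m/%Y %H:%M"
)

# Interpret the epoch as seconds since 1970 in UTC, then shift to the
# timezone named in the dataset's timezone column.
frame["epoch_local"] = (
    pd.to_datetime(frame["last_updated_epoch"], unit="s", utc=True)
    .dt.tz_convert("Asia/Kolkata")
)

print(frame[["last_updated_dt", "epoch_local"]])
```

Both conversions land on the same local wall-clock time, which suggests the text column is simply the epoch rendered in Asia/Kolkata time.
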

In [10]: # Bar chart of category frequencies for each categorical column
columns = ['region', 'condition_text', 'wind_direction', 'moon_phase']

for column in columns:
    plt.figure(figsize=(10, 6))
    data[column].value_counts().plot(kind='bar')
    plt.xlabel(column)
    plt.ylabel('Frequency')
    plt.title(f'Distribution of {column}')
    plt.xticks(rotation=90)  # rotate labels so long category names stay readable
    plt.tight_layout()
    plt.show()

In [11]: # Box plot of each numeric variable to show its median, quartiles, and outliers
columns = ['temperature_celsius', 'humidity', 'wind_mph']

for column in columns:
    plt.figure(figsize=(10, 6))
    sns.boxplot(data=data, y=column, color='g')
    plt.ylabel(column)
    plt.title(f'Box Plot of {column}')
    plt.tight_layout()
    plt.show()
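
A box plot flags points as outliers using the quartiles of the data. The sketch below works through that rule by hand on made-up temperature readings (not rows from the dataset), assuming the default whisker setting of 1.5 times the interquartile range that Matplotlib and Seaborn use.

```python
import numpy as np

# Made-up temperature readings (°C) with one deliberately extreme value.
values = np.array([23.0, 24.5, 25.0, 25.6, 26.0, 27.3, 28.0, -2.6])

# Quartiles: the box spans q1 to q3, with a line at the median.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1, median, Q3:", q1, median, q3)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```

Only the extreme low reading falls outside the fences here, so it is the only point that would be drawn separately from the whiskers.
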

In [12]: # Count plot (Seaborn) of each categorical column
columns = ['region', 'condition_text', 'wind_direction', 'moon_phase']

for column in columns:
    plt.figure(figsize=(10, 6))
    sns.countplot(data=data, x=column)
    plt.xlabel(column)
    plt.ylabel('Count')
    plt.title(f'Count Plot of {column}')
    plt.xticks(rotation=90)
    plt.tight_layout()
    plt.show()

In [13]: # Scatter plot for each (x, y) pair of numeric variables
column_pairs = [('temperature_celsius', 'humidity'), ('wind_mph', 'pressure_mb')]

for x_column, y_column in column_pairs:
    plt.figure(figsize=(10, 6))
    plt.scatter(data[x_column], data[y_column])
    plt.xlabel(x_column)
    plt.ylabel(y_column)
    plt.title(f'Scatter Plot: {x_column} vs {y_column}')
    plt.tight_layout()
    plt.show()

In [14]: columns = ['temperature_celsius', 'humidity', 'wind_mph', 'pressure_mb']

correlation_matrix = data[columns].corr()
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Heatmap')
plt.tight_layout()
plt.show()
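
Each cell of the heatmap is a Pearson correlation coefficient, since that is the default method of the Pandas corr() call. As a sketch of what that number means, the block below computes it term by term on five made-up temperature and humidity pairs (not taken from the dataset) and compares the result with Pandas.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame(
    {"temperature_celsius": [22.0, 24.0, 26.0, 28.0, 30.0],
     "humidity": [88, 85, 74, 70, 61]}
)
x = df["temperature_celsius"].to_numpy(dtype=float)
y = df["humidity"].to_numpy(dtype=float)

# Pearson r: sum of products of deviations, divided by the square root of
# the product of the two sums of squared deviations.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same quantity from Pandas (method='pearson' is the default).
r_pandas = df["temperature_celsius"].corr(df["humidity"])

print(r_manual, r_pandas)
```

A value near -1 or +1 indicates a strong linear relationship, while a value near 0 indicates little linear association; the coefficient says nothing about non-linear patterns.
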


In [15]: sns.relplot(x="uv_index",y="region",data=data)

Out[15]: <seaborn.axisgrid.FacetGrid at 0x2114b347d90>

In [16]: sns.relplot(x="temperature_celsius",y="region",data=data)

Out[16]: <seaborn.axisgrid.FacetGrid at 0x2114a59f460>

In [17]: data_reg = pd.read_csv('IndianWeather2.csv')

In [18]: from sklearn.linear_model import LinearRegression

from sklearn.model_selection import train_test_split

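In [19]: # Feature matrix: all columns except the target (humidity), wind_kph, and cloud
train = data_reg.drop(['humidity','wind_kph','cloud'],axis=1)

# Target variable; despite its name, `test` holds the humidity values, not a test split
test = data_reg['humidity']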

In [20]: X_train,X_test,y_train,y_test=train_test_split(train,test,test_size=0.3,random_stat

In [21]: regression = LinearRegression()

In [22]: regression.fit(X_train,y_train)

Out[22]: LinearRegression()

In [23]: predict = regression.predict(X_test)

In [24]: predict

Out[24]: array([66.1037563 , 76.2523053 , 56.36533543, ..., 72.85677652,
       73.55689017, 83.70177489])

In [25]: regression.score(X_test,y_test)

Out[25]: 0.6970810366341045
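
For a LinearRegression model, score() returns the coefficient of determination (R²) on the data it is given, so the value above means the model explains roughly 70% of the variance in humidity on the test split. The sketch below repeats the split, fit, and score steps on synthetic data and checks the score against the R² formula written out by hand. The data, the single feature, and random_state=0 are all assumptions made for the example; the notebook's own random_state value is cut off in the source.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: humidity falls with temperature, plus noise.
rng = np.random.default_rng(0)
temperature = rng.uniform(15, 40, size=200)
humidity = 120 - 2.0 * temperature + rng.normal(0, 5, size=200)

X = temperature.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

# Hold out 30% of the rows for testing; random_state=0 is an arbitrary choice.
X_train, X_test, y_train, y_test = train_test_split(
    X, humidity, test_size=0.3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

r2_sklearn = model.score(X_test, y_test)
print(r2_manual, r2_sklearn)
```

Because R² is computed on held-out rows, it measures how well the fitted line generalizes rather than how closely it matches the training data.
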

In [26]: sns.pairplot(data)

Out[26]: <seaborn.axisgrid.PairGrid at 0x2114c528dc0>
CONCLUSION
Data Understanding: Through exploratory data analysis (EDA), we gained a thorough
understanding of the dataset's features and characteristics. We visualized the data to identify
patterns and relationships between variables.

Data Preprocessing:

• We checked for missing data (none was found) and encoded categorical variables,
ensuring that the data was ready for modeling.

• Standardization or normalization of features was performed to prevent any undue
influence of variables with different scales.

Predictive Analysis:

• We built a predictive model, using linear regression, to predict humidity from the
other weather variables.

• The model's performance was evaluated using the coefficient of determination (R²),
which was approximately 0.70 on the held-out test set (30% of the data).

Descriptive Analysis:

• Model interpretation techniques, such as feature importance analysis, helped identify
the most influential features in making predictions.

• This insight is crucial for understanding the factors that contribute to humidity
levels.
LEARNINGS

• Data pre-processing is a crucial step. Handling missing data, encoding categorical
variables, and splitting the data into training and testing sets are essential tasks to
ensure the data is suitable for analysis.
• EDA helps you understand the dataset's characteristics, such as the distribution of
features and their relationships.
• Visualizations like pair plots and correlation heatmaps can reveal patterns and
potential insights.
• Effective data visualization is essential for communicating findings. In this analysis,
visualizations were used to display relationships between features and the distribution
of weather variables and categories such as region and condition.
• Building predictive models, such as linear regression, is a common approach to
predicting a continuous weather variable like humidity.
• Model selection and hyperparameter tuning play a role in improving model
performance.
• Model interpretability techniques like feature importance analysis can help identify
which features are most influential in making predictions.
• Understanding why a model makes certain predictions is important, especially when
the predictions inform decisions that depend on the weather.
• Creating a well-documented and well-organized Jupyter Notebook is essential for
sharing findings and insights with others.
• Clear explanations of the analysis process, results, and visualizations make the
analysis more accessible to non-technical stakeholders.
• Having domain knowledge about the diversity of Indian weather, its characteristics,
and forecasting methods can aid in feature selection, model interpretation, and making
relevant predictions.

In conclusion, conducting predictive and descriptive analysis on a dataset of Indian weather
patterns across the country in a Jupyter Notebook provides a hands-on opportunity to apply
data science techniques to a real-world prediction problem. It offers valuable insights not
only into the dataset itself but also into the broader aspects of data preprocessing, modeling,
evaluation, and ethical considerations when dealing with environmental data. This type of
analysis can contribute to improving prediction accuracy and forecasting in the field of
environmental science.
BIBLIOGRAPHY

References:

1. Dataset from Kaggle by Nidula Elgiriyewithana:
https://www.kaggle.com/datasets/nelgiriyewithana/indian-weather-repository-daily-snapshot

2. Python notes in Google classroom by Ms. Shilpi Yadav, Assistant Professor, Jagannath
International Management School, Kalkaji.

These references cover various aspects of data analysis, machine learning, and Python
programming, which are relevant to the learnings and insights from the analysis of the
dataset.



