
Python Project Data Analysis-1

This document provides an introduction and overview of a Python project on analyzing Indian weather trends. It discusses the importance of studying India's climate patterns and changes over time. The methodology section outlines typical steps for predictive and descriptive analysis using Python, including importing key libraries like NumPy and Pandas for working with data. The document contains sections on the project introduction, literature review on Indian weather topics, proposed methodology, planned analysis, and expected learning outcomes.

Uploaded by ripunjay080808
Copyright © All Rights Reserved

PYTHON PROJECT

on

ANALYSIS OF INDIAN WEATHER USING PYTHON

Post Graduate Diploma in Management (PGDM)

Submitted to: Ms. Shilpi Yadav, Assistant Professor
Submitted by: Ronit Saini, Roll No.: 92, Batch 2023-25

Jagannath International Management School
MOR, Pocket-105, Kalkaji, New Delhi-110019
(Approved by All India Council for Technical Education (AICTE) and Accredited by NBA, SAQS and NAAC)
CONTENTS

1. Introduction – Python
2. Literature Review
3. Methodology
4. Analysis
5. Conclusion
6. Learning Outcomes
7. Bibliography
8. Appendices-1
INTRODUCTION TO PYTHON
Python is a high-level, versatile, and dynamically typed programming language that has
gained immense popularity in the world of software development since its creation in the late
1980s by Guido van Rossum. Known for its simplicity, readability, and an extensive standard
library, Python is a top choice for both beginners and experienced programmers. In this
introduction, we'll explore the key aspects of Python, its history, features, and why it's a
preferred language for various applications.

History:
Python's journey began in the late 1980s when Guido van Rossum, a Dutch programmer,
started working on the language. He aimed to create a language that emphasized code
readability and allowed developers to express concepts in fewer lines of code. Python's name
is derived from the British comedy group Monty Python, showcasing its creator's sense of
humor.

Key Features:
1. Readability: Python's elegant and clean syntax makes it easy to read and write code. Its
use of indentation for block structures enforces a consistent and visually appealing code
style.

2. Versatility: Python suits a wide range of applications, including web development, data analysis, machine learning, scientific computing, and more.

3. Large Standard Library: Python comes with a rich standard library that simplifies
many tasks, reducing the need for developers to write code from scratch. This library
includes modules for file handling, networking, regular expressions, and more.

4. Cross-Platform: Python runs on Windows, macOS, Linux and other Unix-like systems, and most pure-Python code runs unchanged across them.

5. Open Source: Python's open-source nature encourages collaboration and community-driven development. It also means that Python is free to use and distribute.

6. Extensibility: Python can be easily extended through modules and packages written in
other languages like C or C++, allowing developers to integrate existing code seamlessly.

7. Interpreted: Python has no separate compilation step. The interpreter compiles source code to bytecode automatically and executes it, so code can be run immediately. This rapid development cycle is excellent for prototyping and testing.

8. Dynamically Typed: Python uses dynamic typing, which means you don't need to declare variable types explicitly. Types belong to values rather than to variable names and are checked at runtime, which provides flexibility but means type errors only surface when the offending code actually runs.
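The dynamic typing point above can be seen in a few lines of standard Python. This is a minimal illustration, not code from the project; the helper describe_value is hypothetical.

```python
def describe_value(value):
    """Return the name of the runtime type of `value` (hypothetical helper)."""
    return type(value).__name__

reading = 25.6                     # the name `reading` refers to a float
print(describe_value(reading))     # prints: float

reading = "25.6 C"                 # rebinding the same name to a str is allowed
print(describe_value(reading))     # prints: str

# Type errors are detected only when the offending line actually runs.
try:
    reading + 1                    # str + int is not a valid operation
except TypeError as error:
    print("TypeError raised at runtime:", error)
```

No declaration was needed for either assignment; the mismatch between str and int was caught only at the moment the addition executed.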
Use Cases:
Python's versatility has led to its adoption in various domains:

1. Web Development: Frameworks like Django and Flask simplify web application
development, making Python a top choice for web developers.

2. Data Science: Python is widely used for data analysis and visualization. Libraries like
Pandas, NumPy, and Matplotlib facilitate data manipulation and exploration.

3. Machine Learning and AI: Python's extensive ecosystem includes libraries like
TensorFlow and PyTorch, enabling the development of cutting-edge machine learning
models and artificial intelligence applications.

4. Scientific Computing: Scientists and researchers use Python for tasks such as simulation,
modeling, and data analysis due to its powerful libraries like SciPy.

5. Automation: Python's simplicity and ease of use make it an ideal choice for automating
tasks and writing scripts.

Community and Ecosystem:
Python has a vibrant and supportive community. The Python Package Index (PyPI) hosts hundreds of thousands of third-party packages and libraries, expanding Python's capabilities and making it suitable for nearly any task.
In conclusion, Python's readability, versatility, and extensive ecosystem have made it one of
the most popular programming languages globally. Whether you're a beginner or a seasoned
developer, Python's simplicity and power make it a valuable tool for a wide range of
applications, from web development to data science and beyond. Its open-source nature and
strong community ensure that Python will continue to evolve and remain relevant in the ever-
changing world of technology.
LITERATURE REVIEW

The study of weather trends in India is of paramount importance due to its far-reaching
implications for agriculture, economy, public health, and environmental sustainability. The
Indian subcontinent is renowned for its climatic diversity, with varying weather patterns
across different regions and seasons. Understanding the historical trends, recent changes, and
potential future scenarios in Indian weather is crucial in addressing the challenges posed by
climate change.

Historical Weather Trends in India


India's meteorological history is rich and well-documented, thanks to organizations like the
India Meteorological Department (IMD). Historical data reveals notable patterns, such as the
annual monsoon season and the oscillations of El Niño and La Niña. Decades of records
show temperature fluctuations, shifts in precipitation, and instances of extreme weather
events that have shaped India's climate landscape.

Climate Change and Variability


India, like the rest of the world, faces the undeniable impact of climate change. Rising
temperatures, erratic rainfall, and altered monsoon patterns have become evident in recent
decades. These shifts have consequences for agriculture, water resources, and ecosystems.
The Intergovernmental Panel on Climate Change (IPCC) reports have highlighted the global
nature of climate change and underscored the need for mitigation and adaptation strategies.

Indian Monsoon
The Indian monsoon is the lifeblood of agriculture in the subcontinent. A critical component
of Indian weather, it is characterized by complex interactions between land and oceanic
systems. Researchers at institutions like the Indian Institute of Tropical Meteorology (IITM)
have been diligently studying monsoon dynamics. Recent findings reveal changing monsoon
behavior, including variations in onset, withdrawal, and intensity.

Extreme Weather Events


India is no stranger to extreme weather events. Cyclones, heatwaves, and floods have
impacted various regions, causing loss of life and property. Instances like Cyclone Fani in
2019 and the devastating Chennai floods in 2015 underscore the significance of studying
extreme weather events and improving preparedness and resilience.
Impact on Agriculture and Economy
Agriculture, a major contributor to India's economy, is highly sensitive to weather conditions.
Changing weather patterns influence crop yields, food security, and livelihoods. Moreover,
the Indian economy as a whole is intertwined with weather, affecting sectors such as energy,
water resources, and tourism. The economic ramifications of weather trends necessitate
strategic planning and adaptation measures.

Public Health and Vulnerability


Weather trends have direct implications for public health. Rising temperatures, air quality
deterioration, and the spread of vector-borne diseases are health concerns that cannot be
ignored. Vulnerable populations, including urban communities and marginalized groups, face
disproportionate risks. Public health studies and government initiatives are pivotal in
addressing these challenges.

Adaptation and Mitigation Strategies


In response to changing weather patterns, India has initiated numerous strategies. These
include climate-resilient agriculture practices, investments in renewable energy, and disaster
preparedness programs. Government policies and local community efforts are critical
components of India's approach to mitigating the adverse effects of changing weather.

Conclusion
The study of Indian weather trends offers insights into a complex interplay of meteorological
factors, climate change, and socioeconomic consequences. It serves as a foundation for
evidence-based decision-making in various sectors. While much has been documented,
continued research is vital in comprehending the evolving dynamics of Indian weather and
charting a sustainable course for the future.
METHODOLOGY

Performing predictive and descriptive analysis using Python typically involves several steps
and libraries. Here's a general methodology for conducting these types of analysis:

Importing Libraries: The analysis begins by importing the following libraries.

NumPy: NumPy, short for Numerical Python, is a fundamental library that provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions that operate on these arrays. Its vectorized operations handle numerical data efficiently, and it underpins much of the Python data stack, including Pandas.

Pandas: Pandas is a versatile library for structured (tabular) data. Its core data structures, the one-dimensional Series and the two-dimensional DataFrame, make it straightforward to load, clean, filter, group and summarize datasets such as the weather records analyzed here.
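A minimal sketch of a typical Pandas operation, a group-by aggregation (split the rows by region, average each group, combine the results). The rows are made up; only the column names are borrowed from the project dataset.

```python
import pandas as pd

# Hypothetical readings; values are invented for illustration.
readings = pd.DataFrame({
    "location_name": ["Raisen", "Betul", "Ranipet", "Tenkasi"],
    "region": ["Madhya Pradesh", "Madhya Pradesh", "Tamil Nadu", "Tamil Nadu"],
    "temperature_celsius": [24.0, 26.0, 30.0, 32.0],
})

# Split-apply-combine: group rows by region, then average each group.
mean_by_region = readings.groupby("region")["temperature_celsius"].mean()
print(mean_by_region)
```

The same one-line pattern would work on the full 9,840-row DataFrame loaded in the notebook.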

Matplotlib: Matplotlib is a powerful library for creating static, animated, and interactive
visualizations in Python. Its significance lies in its capacity to produce publication-quality
graphics, enabling researchers to convey complex data findings with clarity and precision.

Seaborn: Seaborn is a high-level data visualization library built on top of Matplotlib. Its
significance lies in its ability to simplify the creation of complex statistical plots and provide
a visually appealing framework for representing data, facilitating data exploration, and
enhancing the interpretability of research findings.

os: The os module, part of Python's standard library, provides a portable way to interact with the operating system, file system and directories. In this project it is used to list the files in the working directory (os.listdir()) and confirm the dataset's file name before loading it; more generally, it helps automate file-management tasks in a reproducible way.
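os.listdir() with no argument, as used in the notebook, lists the current working directory. The sketch below narrows such a listing to CSV files. The helper list_csv_files is hypothetical, and the demonstration runs inside a temporary directory so it touches no real files.

```python
import os
import tempfile

def list_csv_files(directory):
    """Return the sorted names of regular .csv files in `directory` (case-insensitive)."""
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(".csv")
        and os.path.isfile(os.path.join(directory, name))
    )

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as folder:
    for name in ["IndianWeatherRepository.csv", "notes.docx", "tips.CSV"]:
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("")
    found = list_csv_files(folder)

print(found)
```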

Loading the dataset:

1. Loaded the dataset using pd.read_csv: Pandas provides a family of reader functions, such as pd.read_csv, pd.read_excel and pd.read_sql, that load data from CSV files, Excel workbooks, SQL databases and other sources into Pandas DataFrames, the foundational data structure for the subsequent analysis.

2. Used the isnull function: isnull() (an alias of isna()) marks each missing value in a DataFrame as True, and chaining .sum() counts the missing values in each column. This supports data-quality assessment and lets researchers make informed decisions about how to handle missing data.

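3. Used Matplotlib plotting functions: plt.plot is Matplotlib's basic function for line charts, and related functions such as plt.scatter, as well as the Pandas .plot() method (used here for bar charts), build on the same library. Together they range from simple charts to complex, customized visualizations that help convey findings and make results easier to interpret.

4. Performed basic inspection with data.shape, data.columns, data.head(), data.tail() and value_counts(): shape gives the number of rows and columns, columns lists the column names, head() and tail() show the first and last rows (five by default), and value_counts() counts how often each distinct value occurs in a column.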

5. Computed summary statistics with data.describe(): for each numeric column, describe() reports the count, mean, standard deviation, minimum, 25th, 50th (median) and 75th percentiles, and maximum, giving a concise summary of each variable's distribution.

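6. Used correlation analysis: correlation measures the degree and direction of association between two or more variables. The Pearson correlation coefficient, the default method of DataFrame.corr(), quantifies the strength and direction of a linear relationship between two continuous variables on a scale from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive).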

7. Visualized the relationship between two variables using relplot: Seaborn's relplot draws relational plots, either scatter plots (the default) or line plots, which help uncover patterns, trends and correlations in the data and support informed decisions and insights.

8. Used the scikit-learn library for regression analysis and prediction: scikit-learn (sklearn) is a comprehensive, well-documented library of tools for data analysis and machine learning. In this project, its train_test_split helper divides the data into training and test sets, and its LinearRegression model is fitted on the training set and evaluated on the test set.

9. Used regplot to show the relationship between two variables: regplot draws a scatter plot with an overlaid regression line. It offers an intuitive way to explore the relationship between two continuous variables and to spot patterns, trends and associations in the data.

10. Used various other visualization functions:

a. countplot: countplot draws a bar plot of the counts (frequencies) of each category in a categorical variable, making it easy to explore and communicate how observations are distributed across categories.

b. boxplot: Box plots, also known as box-and-whisker plots, summarize the distribution of numerical data. They show the median, the quartiles and, as individual points, potential outliers.

c. pairplot: pairplot creates a matrix of scatter plots, one for each pair of numeric variables in a dataset, with the distribution of each single variable on the diagonal. This reveals interdependencies and patterns across many variables at once, which helps with data exploration and hypothesis generation.
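The inspection steps in items 1, 2, 4, 5 and 6 can be sketched end to end on a tiny, made-up CSV. This assumes only Pandas: io.StringIO stands in for a real file such as IndianWeatherRepository.csv, the values are invented, and the final median fill is just one possible way to handle a missing value (the project's own dataset turned out to have none).

```python
import io

import pandas as pd

# A made-up CSV that borrows a few of the dataset's column names.
csv_text = """location_name,region,temperature_celsius,humidity,condition_text
Raisen,Madhya Pradesh,24.0,88,Mist
Betul,Madhya Pradesh,26.0,,Partly cloudy
Ranipet,Tamil Nadu,30.0,62,Sunny
Tenkasi,Tamil Nadu,32.0,58,Sunny
"""
data = pd.read_csv(io.StringIO(csv_text))        # item 1: load into a DataFrame

print(data.shape)                                 # item 4: (rows, columns)
print(data.isnull().sum())                        # item 2: missing values per column
print(data["condition_text"].value_counts())      # item 4: frequency of each category
print(data.describe())                            # item 5: summary statistics

# Item 6: Pearson correlation. Pandas excludes missing values pairwise,
# so Betul's blank humidity is simply skipped here.
corr = data[["temperature_celsius", "humidity"]].corr()
print(corr)

# One simple way to handle a missing value: fill it with the column median.
data["humidity"] = data["humidity"].fillna(data["humidity"].median())
```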
9/19/23, 12:02 AM python_assignment

Data Analysis
In [1]: # Importing all the required python libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

In [2]: print(os.listdir())

['.ipynb_checkpoints', 'afm-cw.xlsx', 'afm-q3.xlsx', 'Agenda and Minutes.pdf',
 'agenda&minutes.docx', 'agenda&minutes.pdf', 'agendas.xlsx', 'coursera.lnk', 'DC.csv',
 'desktop.ini', 'FINAL ROUND.xlsx', 'forage.lnk', 'IndianWeather2.csv',
 'IndianWeatherRepository.csv', 'ISM-Epayment.docx', 'ISM-Epayment.pptx',
 'MarketingSWOTbox.xlsx', 'MCom-MoneyMindset.pptx', 'MCOMM_Format_and_Practical_2.pdf',
 'ME-graph.xlsx', 'meco-assignment1.xlsx', 'ME_Subsidy.docx', 'ME_Subsidy.pdf',
 'MM_swotBox.pptx', 'Nitin Negi_AFM_73PGDMB2023.xlsx', 'Nitin Negi_ME_073.pdf',
 'Nitin Negi_ME_073_tomato.docx', 'Nitin Negi_ME_073_tomato.pdf', 'Nitin Negi_Python.pdf',
 'Nitin Negi_QTM_PGDM-B.pdf', 'notes_Mcom.docx', 'NSE SMART.exe - Shortcut.lnk',
 'oahb-quiz.docx', 'OAHB_PRESENTATION[1].pptx', 'OB-Group5-CaseStudy.docx',
 'OB-Group5-CaseStudy.pdf', 'OB_PPT_Organizational_Structure.pptx', 'python_assignment.ipynb',
 'python_assignment2.pdf', 'Python_Project.docx', 'qtm-cw.xlsx', 'supermarkt_sales.csv',
 'swot.pdf', 'TEMPLATE.xlsx', 'tips.csv', 'udemy.lnk', 'Visual Studio Code.lnk',
 '~$thon_Project.docx']

In [3]: data = pd.read_csv("IndianWeatherRepository.csv")
data

localhost:8888/nbconvert/html/Desktop/python_assignment.ipynb?download=false 1/16
Out[3]:
      location_name  region          latitude  longitude  timezone      last_updated_epoch  ...
0     Ashoknagar     Madhya Pradesh  24.57     77.72      Asia/Kolkata  1693286100          ...
1     Raisen         Madhya Pradesh  23.33     77.80      Asia/Kolkata  1693286100          ...
2     Chhindwara     Madhya Pradesh  22.07     78.93      Asia/Kolkata  1693286100          ...
3     Betul          Madhya Pradesh  21.86     77.93      Asia/Kolkata  1693286100          ...
4     Hoshangabad    Madhya Pradesh  22.75     77.72      Asia/Kolkata  1693286100          ...
...   ...            ...             ...       ...        ...           ...                 ...
9835  Niwari         Uttar Pradesh   28.88     77.53      Asia/Kolkata  1694731500          ...
9836  Saitual        Mizoram         23.97     92.58      Asia/Kolkata  1694731500          ...
9837  Ranipet        Tamil Nadu      12.93     79.33      Asia/Kolkata  1694731500          ...
9838  Tenkasi        Tamil Nadu      8.97      77.30      Asia/Kolkata  1694731500          ...
9839  Pendra         Maharashtra     21.93     74.15      Asia/Kolkata  1694731500          ...

9840 rows × 41 columns

In [4]: data.head(5)

Out[4]:
   location_name  region          latitude  longitude  timezone      last_updated_epoch  last_updated      ...
0  Ashoknagar     Madhya Pradesh  24.57     77.72      Asia/Kolkata  1693286100          29/08/2023 10:45  ...
1  Raisen         Madhya Pradesh  23.33     77.80      Asia/Kolkata  1693286100          29/08/2023 10:45  ...
2  Chhindwara     Madhya Pradesh  22.07     78.93      Asia/Kolkata  1693286100          29/08/2023 10:45  ...
3  Betul          Madhya Pradesh  21.86     77.93      Asia/Kolkata  1693286100          29/08/2023 10:45  ...
4  Hoshangabad    Madhya Pradesh  22.75     77.72      Asia/Kolkata  1693286100          29/08/2023 10:45  ...

5 rows × 41 columns

In [5]: data.shape

Out[5]: (9840, 41)

In [6]: data.tail(5)

Out[6]:
      location_name  region         latitude  longitude  timezone      last_updated_epoch  ...
9835  Niwari         Uttar Pradesh  28.88     77.53      Asia/Kolkata  1694731500          ...
9836  Saitual        Mizoram        23.97     92.58      Asia/Kolkata  1694731500          ...
9837  Ranipet        Tamil Nadu     12.93     79.33      Asia/Kolkata  1694731500          ...
9838  Tenkasi        Tamil Nadu     8.97      77.30      Asia/Kolkata  1694731500          ...
9839  Pendra         Maharashtra    21.93     74.15      Asia/Kolkata  1694731500          ...

5 rows × 41 columns

In [7]: data.isnull().sum()

Out[7]:
location_name 0
region 0
latitude 0
longitude 0
timezone 0
last_updated_epoch 0
last_updated 0
temperature_celsius 0
temperature_fahrenheit 0
condition_text 0
wind_mph 0
wind_kph 0
wind_degree 0
wind_direction 0
pressure_mb 0
pressure_in 0
precip_mm 0
precip_in 0
humidity 0
cloud 0
feels_like_celsius 0
feels_like_fahrenheit 0
visibility_km 0
visibility_miles 0
uv_index 0
gust_mph 0
gust_kph 0
air_quality_Carbon_Monoxide 0
air_quality_Ozone 0
air_quality_Nitrogen_dioxide 0
air_quality_Sulphur_dioxide 0
air_quality_PM2.5 0
air_quality_PM10 0
air_quality_us-epa-index 0
air_quality_gb-defra-index 0
sunrise 0
sunset 0
moonrise 0
moonset 0
moon_phase 0
moon_illumination 0
dtype: int64

In [8]: data.describe()

Out[8]:
       latitude     longitude    last_updated_epoch  temperature_celsius  temperature_fahrenheit  ...
count  9840.000000  9840.000000  9.840000e+03        9840.000000          9840.000000             ...
mean   23.106256    80.229436    1.694004e+09        25.225061            77.405234               ...
std    5.797599     5.761152     4.487798e+05        3.838239             6.909241                ...
min    8.080000     68.970000    1.693286e+09        -2.600000            27.300000               ...
25%    20.270000    76.070000    1.693612e+09        23.600000            74.500000               ...
50%    23.970000    78.670000    1.694041e+09        25.600000            78.100000               ...
75%    26.772500    83.900000    1.694387e+09        27.300000            81.100000               ...
max    34.570000    95.800000    1.694732e+09        38.300000            100.900000              ...

8 rows × 30 columns

In [9]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9840 entries, 0 to 9839
Data columns (total 41 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 location_name 9840 non-null object
1 region 9840 non-null object
2 latitude 9840 non-null float64
3 longitude 9840 non-null float64
4 timezone 9840 non-null object
5 last_updated_epoch 9840 non-null int64
6 last_updated 9840 non-null object
7 temperature_celsius 9840 non-null float64
8 temperature_fahrenheit 9840 non-null float64
9 condition_text 9840 non-null object
10 wind_mph 9840 non-null float64
11 wind_kph 9840 non-null float64
12 wind_degree 9840 non-null int64
13 wind_direction 9840 non-null object
14 pressure_mb 9840 non-null int64
15 pressure_in 9840 non-null float64
16 precip_mm 9840 non-null float64
17 precip_in 9840 non-null float64
18 humidity 9840 non-null int64
19 cloud 9840 non-null int64
20 feels_like_celsius 9840 non-null float64
21 feels_like_fahrenheit 9840 non-null float64
22 visibility_km 9840 non-null float64
23 visibility_miles 9840 non-null int64
24 uv_index 9840 non-null int64
25 gust_mph 9840 non-null float64
26 gust_kph 9840 non-null float64
27 air_quality_Carbon_Monoxide 9840 non-null float64
28 air_quality_Ozone 9840 non-null float64
29 air_quality_Nitrogen_dioxide 9840 non-null float64
30 air_quality_Sulphur_dioxide 9840 non-null float64
31 air_quality_PM2.5 9840 non-null float64
32 air_quality_PM10 9840 non-null float64
33 air_quality_us-epa-index 9840 non-null int64
34 air_quality_gb-defra-index 9840 non-null int64
35 sunrise 9840 non-null object
36 sunset 9840 non-null object
37 moonrise 9840 non-null object
38 moonset 9840 non-null object
39 moon_phase 9840 non-null object
40 moon_illumination 9840 non-null int64
dtypes: float64(20), int64(10), object(11)
memory usage: 3.1+ MB

In [10]: columns = ['region', 'condition_text', 'wind_direction', 'moon_phase']

# One bar chart per categorical column, showing how often each value occurs
for column in columns:
    plt.figure(figsize=(10, 6))
    data[column].value_counts().plot(kind='bar')
    plt.xlabel(column)
    plt.ylabel('Frequency')
    plt.title(f'Distribution of {column}')
    plt.xticks(rotation=90)
    plt.tight_layout()
    plt.show()

In [11]: columns = ['temperature_celsius', 'humidity', 'wind_mph']

# One box plot per numeric column: median, quartiles, whiskers and outliers
for column in columns:
    plt.figure(figsize=(10, 6))
    sns.boxplot(data=data, y=column, color='g')
    plt.ylabel(column)
    plt.title(f'Box Plot of {column}')
    plt.tight_layout()
    plt.show()
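By default, Seaborn's boxplot (like Matplotlib's) extends each whisker to the furthest data point within 1.5 × IQR (interquartile range, Q3 − Q1) of the quartiles and draws anything beyond as an individual outlier point. A Pandas-only sketch of that rule on made-up temperatures follows; iqr_outliers is a hypothetical helper, and the plotting libraries' internal quartile calculation may differ slightly for some inputs.

```python
import pandas as pd

def iqr_outliers(values, whis=1.5):
    """Return (lower_fence, upper_fence, outliers) using the box-plot IQR rule."""
    series = pd.Series(values, dtype="float64")
    q1, q3 = series.quantile(0.25), series.quantile(0.75)   # linear interpolation
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    outliers = series[(series < lower) | (series > upper)]
    return lower, upper, outliers.tolist()

# Invented temperatures (°C): one unusually cold reading among typical ones.
temps = [-2.6, 23.0, 24.0, 25.0, 25.5, 26.0, 27.0, 28.0]
lower, upper, outliers = iqr_outliers(temps)
print(lower, upper, outliers)
```

Here Q1 = 23.75 and Q3 = 26.25, so the fences sit at 20.0 and 30.0 and only the -2.6 reading is flagged.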

In [12]: columns = ['region', 'condition_text', 'wind_direction', 'moon_phase']

for column in columns:
    plt.figure(figsize=(10, 6))
    sns.countplot(data=data, x=column)
    plt.xlabel(column)
    plt.ylabel('Count')
    plt.title(f'Count Plot of {column}')
    plt.xticks(rotation=90)
    plt.tight_layout()
    plt.show()

In [13]: column_pairs = [('temperature_celsius', 'humidity'), ('wind_mph', 'pressure_mb')]

for x_column, y_column in column_pairs:
    plt.figure(figsize=(10, 6))
    plt.scatter(data[x_column], data[y_column])
    plt.xlabel(x_column)
    plt.ylabel(y_column)
    plt.title(f'Scatter Plot: {x_column} vs {y_column}')
    plt.tight_layout()
    plt.show()

In [14]: columns = ['temperature_celsius', 'humidity', 'wind_mph', 'pressure_mb']

# Pairwise Pearson correlations, drawn as an annotated heatmap centered on 0
correlation_matrix = data[columns].corr()
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Heatmap')
plt.tight_layout()
plt.show()
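Each number in the heatmap is a Pearson correlation coefficient, the default method of DataFrame.corr(): r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch below computes r by hand with NumPy for a pair of made-up temperature and humidity series and checks it against Pandas. The helper pearson_r is hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Pearson correlation: co-deviation sum over the root of the squared-deviation sums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Invented paired readings (not from the project's dataset).
temperature = [22.0, 24.0, 26.0, 28.0, 30.0]
humidity = [90.0, 85.0, 78.0, 70.0, 65.0]

manual = pearson_r(temperature, humidity)
frame = pd.DataFrame({"temperature_celsius": temperature, "humidity": humidity})
from_pandas = frame.corr().loc["temperature_celsius", "humidity"]
print(round(manual, 4), round(from_pandas, 4))
```

Because humidity falls steadily as temperature rises in this toy data, r comes out close to -1, which the heatmap's coolwarm scale would show in deep blue.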

In [15]: sns.relplot(x="uv_index", y="region", data=data)

Out[15]: <seaborn.axisgrid.FacetGrid at 0x2114b347d90>

In [16]: sns.relplot(x="temperature_celsius", y="region", data=data)

Out[16]: <seaborn.axisgrid.FacetGrid at 0x2114a59f460>

In [17]: data_reg = pd.read_csv('IndianWeather2.csv')

In [18]: from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

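In [19]: # Features (X): every column except humidity, wind_kph and cloud
train = data_reg.drop(['humidity', 'wind_kph', 'cloud'], axis=1)
# Target (y): humidity. Despite their names, `train` holds the features and
# `test` holds the target; the actual train/test split happens in In [20].
test = data_reg['humidity']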

In [20]: X_train,X_test,y_train,y_test=train_test_split(train,test,test_size=0.3,random_stat

In [21]: regression = LinearRegression()

In [22]: regression.fit(X_train,y_train)

Out[22]: LinearRegression()

In [23]: predict = regression.predict(X_test)

In [24]: predict

Out[24]:
array([66.1037563 , 76.2523053 , 56.36533543, ..., 72.85677652,
       73.55689017, 83.70177489])

In [25]: # R² (coefficient of determination) of the fitted model on the 30% test split
regression.score(X_test, y_test)

Out[25]: 0.6970810366341045
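regression.score returns the coefficient of determination, R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. A score of 0.697 therefore means the model's squared errors on the test rows are about 30% of those from simply predicting the test-set mean humidity. The NumPy-only sketch below reproduces the workflow on synthetic data: a 70/30 split, an ordinary least-squares fit with an intercept on the training rows, and a hand-computed R² on the test rows. The data, seed and helper names are assumptions for illustration; IndianWeather2.csv is not used.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum()          # squared prediction errors
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()   # squared deviations from the mean
    return 1.0 - ss_res / ss_tot

def add_intercept(features):
    """Prepend a column of ones so the least-squares fit includes an intercept."""
    return np.column_stack([np.ones(len(features)), features])

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Synthetic stand-in data: 200 rows, 3 numeric features, a noisy linear target.
X = rng.normal(size=(200, 3))
y = 60 + X @ np.array([5.0, -3.0, 2.0]) + rng.normal(scale=3.0, size=200)

# Shuffle the row indices, then keep 70% for training and 30% for testing.
order = rng.permutation(len(y))
cut = (7 * len(y)) // 10
train_idx, test_idx = order[:cut], order[cut:]

# Ordinary least squares fitted on the training rows only.
coef, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

# Evaluate on the held-out rows, analogous to regression.score(X_test, y_test).
predictions = add_intercept(X[test_idx]) @ coef
r2 = r2_score_manual(y[test_idx], predictions)
print(round(r2, 3))
```

A model that always predicts the mean scores 0, a perfect model scores 1, and a model worse than the mean can score below 0.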

In [26]: sns.pairplot(data)

Out[26]: <seaborn.axisgrid.PairGrid at 0x2114c528dc0>

CONCLUSION
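Data Understanding: Through exploratory data analysis (EDA), we gained a thorough understanding of the dataset's features and characteristics: 9,840 records across 41 columns covering weather, air-quality, and sun and moon readings for locations across India. We visualized the data to identify patterns and relationships between variables.

Data Preprocessing:

• We checked for missing data with isnull().sum(); none of the 41 columns contained missing values, so no imputation was needed.

• The regression model was trained on a separate file, IndianWeather2.csv, whose feature columns were numeric, as scikit-learn's LinearRegression requires.

Predictive Analysis:

• We built a linear regression model with scikit-learn to predict humidity from the remaining variables in IndianWeather2.csv (wind_kph and cloud were excluded), using 70% of the rows for training and 30% for testing.

• The model's performance was evaluated with the coefficient of determination (R²) on the test set, which came to about 0.70 (0.697). In other words, the model explains roughly 70% of the variation in humidity on data it had not seen during training.

Descriptive Analysis:

• Summary statistics from describe() show temperatures ranging from -2.6 °C to 38.3 °C, with a median of 25.6 °C and a standard deviation of about 3.8 °C.

• Bar charts, count plots, box plots, scatter plots, a correlation heatmap, relational plots and a pair plot describe how observations are distributed across regions, weather conditions, wind directions and moon phases, and how temperature, humidity, wind speed and pressure relate to one another.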
LEARNINGS

• Data pre-processing is a crucial step. Handling missing data, encoding categorical variables, and splitting the data into training and testing sets are essential tasks to ensure the data is suitable for analysis.
• EDA helps you understand the dataset's characteristics, such as the distribution of features and their relationships.
• Visualizations like pair plots and correlation heatmaps can reveal patterns and potential insights.
• Effective data visualization is essential for communicating findings. In this analysis, visualizations were used to display relationships between features and the distribution of categorical and numerical weather variables.
• Building predictive models, such as linear regression, is a common approach to estimating a continuous weather variable such as humidity.
• Model selection and hyperparameter tuning can help improve model performance.
• Model interpretability techniques like feature importance analysis can help identify which features are most influential in making predictions.
• Understanding why a model makes certain predictions is important, especially when forecasts inform decisions in areas such as agriculture and public health.
• Creating a well-documented and well-organized Jupyter Notebook is essential for sharing findings and insights with others.
• Clear explanations of the analysis process, results, and visualizations make the analysis more accessible to non-technical stakeholders.
• Domain knowledge about the diversity of Indian weather, its characteristics, and forecasting methods can aid in feature selection, model interpretation, and making relevant predictions.

In conclusion, conducting predictive and descriptive analysis on a dataset of weather patterns across India in a Jupyter Notebook provides a hands-on opportunity to apply data science techniques to a real-world prediction problem. It offers valuable insights not only into the dataset itself but also into the broader aspects of data preprocessing, modeling, evaluation, and ethical considerations when dealing with environmental data. This type of analysis can contribute to improving prediction accuracy and forecasting in the field of environmental science.
BIBLIOGRAPHY

References:

1. Dataset from Kaggle by Nidula Elgiriyewithana: https://www.kaggle.com/datasets/nelgiriyewithana/indian-weather-repository-daily-snapshot

2. Python notes in Google classroom by Ms. Shilpi Yadav, Assistant Professor, Jagannath
International Management School, Kalkaji.

These references cover various aspects of data analysis, machine learning, and Python
programming, which are relevant to the learnings and insights from the analysis of the
dataset.
