Prince - Saxena1902@gmail - Com WeatherPatternsAnalysis


WEATHER PATTERNS ANALYSIS AND PREDICTION

- PRINCE SAXENA
DATE OF SUBMISSION: 10-12-2024
TABLE OF CONTENTS
01 INTRODUCTION
02 EDA
03 METHODOLOGY
04 RESULTS
05 INSIGHTS & LEARNINGS
06 CHALLENGES
07 CONCLUSION
08 REFERENCES


01
INTRODUCTION
OBJECTIVE:
• To analyse the weather statistics data for patterns and similarities
• To predict values of unknown features in a data point using data science and analytics

DATASET OVERVIEW:
• The dataset contains 730 entries of weather statistics for a single location, on dates ranging from 2015 to 2017
• The dataset contains the following features:
o Weather Conditions
o Temperature (°C)
o Dew Point (°C)
o Visibility (km)
o Humidity (%)
o Wind Direction
o Pressure (hPa)
o Rain Presence
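As a sketch of how one data point could be represented in code, the record type below mirrors the eight features listed above. The class name `WeatherRecord`, its field names, and the choice of which five features count as numeric are assumptions for illustration (the five match the columns averaged in the K-Means results), not the report's own structure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WeatherRecord:
    """One row of the dataset (hypothetical field names)."""
    day: date
    weather_condition: str   # e.g. "Haze", "Light Rain"
    dew_point_c: float       # Dew Point (°C)
    humidity_pct: float      # Humidity (%)
    pressure_hpa: float      # Pressure (hPa)
    temperature_c: float     # Temperature (°C)
    visibility_km: float     # Visibility (km)
    wind_direction: str      # compass point, e.g. "WNW"
    rain_presence: int       # 1 = rain recorded, 0 = no rain

    def numeric_features(self) -> tuple:
        """Assumed numeric features used for distance calculations."""
        return (self.dew_point_c, self.humidity_pct, self.pressure_hpa,
                self.temperature_c, self.visibility_km)


# The two rows the report later uses as initial K-Means centers
jan1 = WeatherRecord(date(2015, 1, 1), "Smoke", 11, 43, 1016, 21, 1, "NNW", 0)
jan2 = WeatherRecord(date(2015, 1, 2), "Light Rain", 15, 100, 1017, 15, 1, "NE", 1)
print(jan1.numeric_features())  # (11, 43, 1016, 21, 1)
```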
TOOLS & TECHNIQUES:

The tools used for this analysis are:
• Microsoft Excel
• Orange3

The techniques and algorithms used include:
• K-NN (K-Nearest Neighbours)
• K-Means Clustering
• EDA (Exploratory Data Analytics)
02
EXPLORATORY
DATA ANALYTICS
OVERVIEW OF STATISTICAL FINDINGS FROM THE DATA

NAME                 MEAN      MODE   MEDIAN   MINIMUM   MAXIMUM
Humidity (%)         36.34     31     34       6         100
Pressure (hPa)       1007.74   1014   1008     994       1026
Temperature (°C)     30.78     35     32       12        45
Visibility (km)      2.417     2      2        0.2       55
Weather Conditions   -         Haze   -        -         -
Wind Direction       -         WNW    -        -         -
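The statistics in this table can be reproduced for any column with Python's standard `statistics` module. The sketch below is a hypothetical helper, `summarize`, applied to made-up humidity readings rather than the actual dataset.

```python
import statistics


def summarize(values):
    """Return the same statistics as the EDA table for one numeric column."""
    return {
        "mean": statistics.mean(values),
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical humidity readings (%), not taken from the dataset
humidity = [31, 43, 100, 31, 28, 34, 6]
print(summarize(humidity))
```

`statistics.mode` also accepts text values, so the same call gives the most frequent label for columns such as Weather Conditions or Wind Direction; if several values tie, it returns the first one encountered (Python 3.8+).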


OBSERVED TRENDS
All the weather features follow a recurring trend, which for most features repeats at an interval of one year.

[Charts: Temperature (°C) and Pressure (hPa) over the two years of data]

Temperature: As the graph shows, temperature rises towards May and falls to its lowest point around January, repeating this pattern continuously over the two years of data.

Pressure: The pressure data follows a similar annual cycle, rising to its peak in the early months of the year and falling to its minimum around mid-year, which creates a wave-like graph.
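The annual cycle described above can be checked by averaging a feature over each calendar month, pooling both years. The sketch below assumes readings arrive as `(date, value)` pairs; the helper name `monthly_means` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_means(readings):
    """Average a feature by calendar month, pooling all years.

    readings: iterable of (date, value) pairs.
    Returns {month_number: mean_value}, ordered by month.
    """
    by_month = defaultdict(list)
    for day, value in readings:
        by_month[day.month].append(value)
    return {month: mean(vals) for month, vals in sorted(by_month.items())}


# Hypothetical temperature readings (°C), for illustration only
temps = [
    (date(2015, 1, 5), 21), (date(2016, 1, 14), 19),
    (date(2015, 5, 20), 41), (date(2016, 5, 22), 43),
    (date(2015, 7, 10), 33),
]
print(monthly_means(temps))  # {1: 20, 5: 42, 7: 33}
```

Plotting these monthly means for temperature and pressure would show whether the two features peak in different parts of the year.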
WIND & CLIMATE CONDITIONS
[Chart: WIND DIRECTION (categories: NNW, NE, NW, WNW, West, SE, ENE, East, SSE, SW, ESE, NNE, South, WSW, North, SSW)]
[Chart: CLIMATE CONDITIONS (categories: Smoke, Light Rain, Haze, Patches of Fog, Mist, Blowing Sand, Drizzle, Light Drizzle, Widespread Dust, Clear, Light Thunderstorm, Rain, Scattered Clouds, Thunderstorms and Rain, Partly Cloudy)]
Wind: Over the two years, the wind is dominated by three directions: NW, West and WNW. This shows that westerly wind directions are more frequent than the others.

Climate conditions: Unlike wind direction, climate conditions are dominated by a single condition, Haze, which makes up more than 67% of all recorded weather-condition entries.
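Statements such as "more than 67% of the entries are hazy" come from simple frequency counts. A minimal sketch with `collections.Counter`, where the helper name `category_shares` and the labels are hypothetical rather than the real data:

```python
from collections import Counter


def category_shares(labels):
    """Return (label, share of all entries) pairs, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, count / total) for label, count in counts.most_common()]


# Hypothetical weather-condition labels, for illustration only
conditions = ["Haze"] * 7 + ["Smoke", "Mist", "Light Rain"]
shares = category_shares(conditions)
print(shares[0])  # ('Haze', 0.7)
```

The same function applied to the Wind Direction column would show how much of the total the three westerly directions account for.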
03
METHODOLOGY
KNN Classification
Aim: To predict the possibility of rain for a new entry with the following statistics:

DATE          WEATHER   DEW POINT (°C)   HUMIDITY (%)   PRESSURE (hPa)   TEMPERATURE (°C)   VISIBILITY (km)   WIND DIRECTION
06-Jan-2019   Cloudy    13               60             1018             20                 1                 SE

The basic steps involved in the process are as follows:

• Calculate the Euclidean distance between the new entry and the existing data
• Select the nearest neighbours to the new entry based on Euclidean distance
• Predict the chances of rain from the majority among the K nearest neighbours

This KNN model uses K = 3, which means the 3 nearest neighbours are considered.
EUCLIDEAN DISTANCE
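Euclidean distance is the square root of the sum of the squared differences between the corresponding values of two entries:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

where p and q are two entries with n numeric features.

For this purpose, Excel was used to calculate the distance between the new entry and each existing entry, stored in a new feature named EUCLIDEAN DISTANCE.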
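The spreadsheet step above can be sketched in Python as follows. It assumes each entry is compared on five numeric features (dew point, humidity, pressure, temperature, visibility); the report does not state exactly which columns entered the distance. The helper names and the dictionary row layout are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared differences of corresponding values."""
    if len(a) != len(b):
        raise ValueError("entries must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def add_distance_column(rows, new_entry):
    """Mimic the spreadsheet step: attach an EUCLIDEAN DISTANCE value to every row."""
    return [{**row, "EUCLIDEAN DISTANCE": euclidean_distance(row["features"], new_entry)}
            for row in rows]


# New entry from the KNN aim (assumed feature order:
# dew point, humidity, pressure, temperature, visibility)
new_entry = (13, 60, 1018, 20, 1)
rows = [{"date": "01-Jan-15", "features": (11, 43, 1016, 21, 1)}]
print(add_distance_column(rows, new_entry)[0]["EUCLIDEAN DISTANCE"])
```

Because the features are on very different scales (pressure near 1000 hPa, visibility near 2 km), an unscaled distance like this one is dominated by the wide-ranging columns; normalising each feature first is a common alternative.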
Finding Nearest Neighbours & Prediction

The data entries are sorted by the EUCLIDEAN DISTANCE feature in ascending order.

The top three sorted entries (those with the smallest distance from the new entry) are the nearest neighbours.

The rain prediction for the new entry is the mode of the rain-presence values of these 3 nearest neighbours.
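Putting the sorting and majority-vote steps together, a minimal K = 3 classifier might look like the sketch below. The function name `knn_predict`, the training rows, and the five-feature layout (dew point, humidity, pressure, temperature, visibility) are assumptions for illustration; with two classes and an odd K, the vote cannot tie.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(training, new_entry, k=3):
    """Predict rain presence (0 or 1) by majority vote among the k nearest rows.

    training: list of (features, rain_presence) pairs.
    """
    # Sort all rows by distance to the new entry, ascending
    ranked = sorted(training, key=lambda row: euclidean_distance(row[0], new_entry))
    # Take the mode of the rain labels of the k closest rows
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training rows: (dew point, humidity, pressure, temperature, visibility)
training = [
    ((12, 58, 1017, 21, 1.5), 0),
    ((14, 62, 1019, 19, 1.0), 0),
    ((13, 65, 1016, 18, 0.8), 1),
    ((15, 100, 1017, 15, 1.0), 1),
    ((11, 43, 1016, 21, 1.0), 0),
]
print(knn_predict(training, (13, 60, 1018, 20, 1)))  # 0
```

Here the three nearest rows carry labels 0, 0 and 1, so the majority vote predicts 0 (no rain).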
K-Means Clustering
Aim: To form clusters of similar data points and use them to extract patterns and make predictions
For this algorithm, the entries for 1-Jan-2015 and 2-Jan-2015 are used as the initial cluster centers

DATE        Weather_Condition   Dew_Point (°C)   Humidity (%)   Pressure (hPa)   Temperature (°C)   Visibility (km)   Wind_Direction (Compass)   Rain_Presence
01-Jan-15   Smoke               11               43             1016             21                 1                 NNW                        0
02-Jan-15   Light Rain          15               100            1017             15                 1                 NE                         1

The basic steps involved in the process are as follows:

• Calculate the Euclidean distance between the initial cluster centers and all existing data points
• Compare each point's distances from the two centers C1 and C2
• Assign each data point to the cluster whose center is closer
Method and Cluster Recomputation
After creating two new data features holding each point's Euclidean distance from cluster centers C1 and C2, a simple IF statement can allocate a cluster to each data point.
E.g. in Excel, the statement used to segregate clusters can be

=IF({distance1} < {distance2}, 1, 2)

where {distance1} and {distance2} stand for the cells holding the two distances.
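The same assignment rule can be written as a small Python helper. The name `assign_cluster` and the distance pairs are hypothetical; like the spreadsheet formula, the strict `<` sends a tie to cluster 2.

```python
def assign_cluster(distance1, distance2):
    """Python equivalent of IF(distance1 < distance2, 1, 2).

    Like the formula, a tie goes to cluster 2.
    """
    return 1 if distance1 < distance2 else 2


# Hypothetical (distance to C1, distance to C2) pairs for three data points
pairs = [(5.0, 60.2), (48.1, 3.3), (7.0, 7.0)]
print([assign_cluster(d1, d2) for d1, d2 in pairs])  # [1, 2, 2]
```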

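• Cluster recomputation means recalculating each cluster center as the average value of every data feature over the points in that cluster:

$$C_1 = \text{avg. values of cluster 1}, \qquad C_2 = \text{avg. values of cluster 2}$$

that is, $C_k = \frac{1}{|S_k|} \sum_{x \in S_k} x$, where $S_k$ is the set of data points assigned to cluster $k$.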
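Cluster recomputation can be sketched as a per-feature average over each cluster's members. The function name `recompute_centers` and the two-feature sample points (dew point, humidity) are hypothetical.

```python
def recompute_centers(points, labels, k=2):
    """Return each cluster's new center: the per-feature average of its members.

    points: list of equal-length numeric tuples.
    labels: cluster number (1..k) for each point.
    """
    centers = []
    for cluster in range(1, k + 1):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        if not members:
            raise ValueError(f"cluster {cluster} is empty")
        # zip(*members) groups the values feature by feature
        centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return centers


points = [(11, 43), (13, 47), (15, 100), (17, 96)]
labels = [1, 1, 2, 2]
print(recompute_centers(points, labels))  # [(12.0, 45.0), (16.0, 98.0)]
```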
04
RESULTS
K-NN Results :-
DATE          WEATHER   DEW POINT (°C)   HUMIDITY (%)   PRESSURE (hPa)   TEMPERATURE (°C)   VISIBILITY (km)   WIND DIRECTION
06-Jan-2019   Cloudy    13               60             1018             20                 1                 SE

On calculating the Euclidean distance between the new entry and the existing data points, the three lowest values belong to the entries below. These data points have the following values of rain presence:

DATE         DISTANCE   RAIN
29-01-2017   2.44949    0
14-01-2016   3.774917   0
28-01-2017   4.358899   0

Since the mode of the rain-presence values among the 3 nearest neighbours is 0, the prediction for 06-Jan-2019 is 0 (no rain).
K-Means Clustering Results :-
Taking the entries for 1-Jan-2015 and 2-Jan-2015 as the initial centers, we get 2 clusters with the following statistics:

CLUSTER   INITIAL CENTER   NO. OF ELEMENTS
1         1-Jan-2015       706
2         2-Jan-2015       24

[Figure: Temp. vs. Humidity, a plot of TEMPERATURE against HUMIDITY showcasing Cluster 1 and Cluster 2]
Cluster Center Reassignment
To obtain more accurate centroids, we calculate the average of all the values in each cluster, which gives the new position of its center.

           Dew_Point (°C)   Humidity (%)   Pressure (hPa)   Temperature (°C)   Visibility (km)
CLUSTER1   16.40084986      34.59631728    1007.898017      30.96600567        2.44674221
CLUSTER2   23.70833333      87.70833333    1003.166667      25.54166667        1.5625
Table: average values of all data points in each cluster

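Thus, the recalculated cluster centers (C1 and C2), rounded to two decimal places in the feature order Dew_Point, Humidity, Pressure, Temperature, Visibility, become:

C1 = (16.40, 34.60, 1007.90, 30.97, 2.45)
C2 = (23.71, 87.71, 1003.17, 25.54, 1.56)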
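The report performs one recomputation of the centers. In the standard K-Means procedure, assignment and recomputation are repeated until the centers stop changing; the sketch below shows that loop with Euclidean distance. The function name `kmeans`, the `max_iter` cap, the 0-based cluster labels, and the sample points are assumptions for illustration, and no result on the real dataset is implied.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, initial_centers, max_iter=100):
    """Alternate assignment and recomputation until the centers stop changing.

    Returns (centers, labels); labels are 0-based indices into centers.
    """
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        # Assignment step: nearest center for every point (ties go to the lower index)
        labels = [min(range(len(centers)), key=lambda j: euclidean_distance(p, centers[j]))
                  for p in points]
        # Recomputation step: per-feature mean of each cluster's members
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members))
                               if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical (dew point, humidity) points; the first two serve as initial centers
points = [(11, 43), (15, 100), (13, 47), (16, 95), (12, 40)]
centers, labels = kmeans(points, initial_centers=[(11, 43), (15, 100)])
print(labels)  # [0, 1, 0, 1, 0]
```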
05
INSIGHTS &
LEARNINGS
Major Insights and Learnings

• The dataset contained weather entries for days between 1-Jan-2015 and 31-Mar-2017

• The dataset also showed that humidity was low at this place most of the time, which gave cluster 1 (containing mostly low-humidity data points) a large majority: 706 entries against 24 in the other cluster

• We also learned that the weather at this place is hazy much of the time, which suggests that it may be close to a large industrial or agricultural area, as these are among the major human-made sources of haze

• We also found that the temperature at the place remains moderate, with an average value of 30.787 °C

The KNN and K-Means algorithms also helped significantly in reaching these conclusions, as they classified and grouped the data, making it easier to plot and analyse.
06
CHALLENGES &
RECOMMENDATIONS
Challenges and Recommendations

Challenges Faced :

• Poor visualizations: some of the charts made for the clusters could not clearly separate them, because they are lower-dimensional depictions of the actual clusters

• Uneven clusters: our choice of initial centers made one cluster far more dominant than the other, so the data may not have been clustered in the most effective way, which might have introduced some errors into our predictions
07
CONCLUSION
CONCLUSION
• To conclude, this project taught us a lot about data science algorithms such as K-NN and K-Means clustering, and about a real-life application where data analytics is used to make predictions.

• The dataset also provided many insights into the weather of a particular place and revealed a clear recurring trend in its weather over the two years.
The main findings of this project include:
• Rain predictions
• Similar weather trends over the two years
• Weather conditions of the place (hazy, with low humidity)

Analyses of this kind also have major real-world implications, as they can be used to predict future weather from recordable features. These predictions can prove helpful for multiple sectors, such as:
• Defense
• Transportation
• Tourism, etc.
08
REFERENCES
RESOURCES & REFERENCES

TOOLS
• Orange3
• Microsoft Excel

WEBSITES REFERRED
• support.microsoft.com
• stackoverflow.com
• orangedatamining.com/docs
• youtube.com
THANK YOU
