Prince - Saxena1902@gmail - Com WeatherPatternsAnalysis
WEATHER PATTERNS ANALYSIS
AND PREDICTION
- PRINCE SAXENA
DATE OF SUBMISSION : 10-12-2024
TABLE OF CONTENTS
01 INTRODUCTION
02 EDA
03 METHODOLOGY
04 RESULTS
05 INSIGHTS & LEARNINGS
06 CHALLENGES
07 CONCLUSION
08 REFERENCES
DATASET OVERVIEW –
• The dataset contains 730 entries of weather statistics for a single location, with dates ranging from 2015 to 2017
• The dataset contains the following features:
o Weather Conditions
o Dew Point (°C)
o Humidity (%)
o Pressure (hPa)
o Temperature (°C)
o Visibility (km)
o Wind Direction
o Rain Presence
TOOLS & TECHNIQUES –
The following tools were used for this analysis:
• Microsoft Excel
• Orange3
Temperature (°C) 30.78 35 32 12 45
The temperature follows a clear seasonal trend, as the graph shows: it rises towards May and reaches its lowest point around January, and this pattern repeats over the two years of data.
Pressure follows a similar cyclical trend, but inverted relative to temperature: it peaks during the early months of the year and falls to its minimum around mid-year, producing a wave-like graph.
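A minimal sketch of how the seasonal trend can be checked numerically: average each feature by calendar month across both years. The record layout (a date plus numeric feature values) and the function name `monthly_means` are hypothetical; the original analysis was done in Excel and Orange3, and the toy values below are illustrative, not taken from the dataset.

```python
from collections import defaultdict
from datetime import date


def monthly_means(records, field):
    """Average `field` over all records falling in each calendar month (1-12).

    `records` is a list of dicts with a "date" key (datetime.date) and numeric
    feature keys such as "temperature" or "pressure" (hypothetical layout).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        month = rec["date"].month
        totals[month] += rec[field]
        counts[month] += 1
    # Only months that actually appear in the data are returned, in month order.
    return {m: totals[m] / counts[m] for m in sorted(totals)}


# Toy records (illustrative values only).
records = [
    {"date": date(2015, 1, 1), "temperature": 14.0, "pressure": 1018.0},
    {"date": date(2016, 1, 1), "temperature": 16.0, "pressure": 1016.0},
    {"date": date(2015, 5, 1), "temperature": 40.0, "pressure": 1000.0},
    {"date": date(2016, 5, 1), "temperature": 38.0, "pressure": 1002.0},
]

temp = monthly_means(records, "temperature")
pres = monthly_means(records, "pressure")
print(temp)  # {1: 15.0, 5: 39.0}
print(pres)  # {1: 1017.0, 5: 1001.0}
```

With the real data, one would expect the monthly temperature means to peak around May and the monthly pressure means to peak in the early months of the year, matching the opposite cycles described above.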
WIND & CLIMATE CONDITIONS
Chart: Wind Direction distribution (16 compass directions: North, NNE, NE, ENE, East, ESE, SE, SSE, South, SSW, SW, WSW, West, WNW, NW, NNW)
Chart: Climate Conditions distribution (Haze, Smoke, Mist, Patches of Fog, Light Rain, Rain, Drizzle, Light Drizzle, Blowing Sand, Widespread Dust, Clear, Scattered Clouds, Partly Cloudy, Light Thunderstorm, Thunderstorms and Rain)
Over the two years, the wind is dominated by three directions: NW, West and WNW. This shows that westerly wind directions are more frequent than the others.
Unlike wind direction, which is spread across three main directions, the climate conditions are dominated by a single category, Haze, which makes up more than 67% of all recorded weather conditions.
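The percentages quoted above come from counting how often each category occurs. A minimal sketch of that count, assuming the categories are available as a plain list of strings; the helper name `category_shares` and the toy samples are hypothetical and do not reproduce the real 730 entries.

```python
from collections import Counter


def category_shares(values):
    """Return each category's share of all observations as a percentage,
    ordered from most to least frequent (ties keep first-seen order)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: 100 * n / total for cat, n in counts.most_common()}


# Toy samples (illustrative only).
conditions = ["Haze"] * 7 + ["Smoke"] * 2 + ["Mist"]
wind = ["NW", "West", "WNW", "NW", "West", "NE"]

shares = category_shares(conditions)
print(shares)  # {'Haze': 70.0, 'Smoke': 20.0, 'Mist': 10.0}
print(category_shares(wind))
```

Applied to the Weather Conditions column, the first entry of the result would be the dominant category and its share.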
03
METHODOLOGY
KNN Classification
Aim: to predict whether it rains for a new entry (06-Jan-2019) with the following statistics:
This KNN model uses K = 3, i.e. the 3 nearest neighbours are considered
EUCLIDEAN DISTANCE
Euclidean Distance is the root of the sum of squares of the difference between corresponding
values for different entries
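Written as a formula, for a new entry $x$ and an existing entry $y$ described by the same $n$ numeric features (for example dew point, humidity, pressure, temperature and visibility; exactly which features the Excel sheet used is not stated), the definition above is:

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n} \left( x_i - y_i \right)^2}
% Worked example with n = 2: x = (3, 4), y = (0, 0)
% d(x, y) = \sqrt{(3 - 0)^2 + (4 - 0)^2} = \sqrt{25} = 5
```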
For this purpose, Excel was used to compute the distance between the new entry and every existing entry, stored as a new feature named EUCLIDEAN DISTANCE
Finding Nearest Neighbours & Prediction
The data entries are sorted by the EUCLIDEAN DISTANCE feature in ascending order
The top three sorted entries (those with the smallest distance from the new entry) are the nearest neighbours
The rain prediction for the new entry is the mode of the rain-presence values of these 3 nearest neighbours
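The three steps above (distance column, ascending sort, mode of the top three) can be sketched in Python as follows. This is a minimal sketch, not the Excel workbook actually used: the function names `euclidean` and `knn_predict` and the toy rows are hypothetical, and features are used unscaled.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Square root of the sum of squared differences between matching features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, new_point, k=3):
    """Predict a label for `new_point` as the mode of its k nearest neighbours' labels.

    `train` is a list of (features, label) pairs; `features` is a tuple of numbers.
    """
    # Distance from the new entry to every existing entry (the EUCLIDEAN DISTANCE column).
    scored = [(euclidean(feats, new_point), label) for feats, label in train]
    # Sort ascending by distance and keep the k closest entries.
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Mode of the neighbours' labels; with k = 3 and a 0/1 label a tie cannot occur.
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy rows: (dew point, humidity, pressure, temperature, visibility) -> rain presence.
train = [
    ((10.0, 30.0, 1012.0, 28.0, 3.0), 0),
    ((12.0, 35.0, 1010.0, 30.0, 2.5), 0),
    ((11.0, 33.0, 1011.0, 29.0, 2.0), 0),
    ((24.0, 90.0, 1002.0, 25.0, 1.0), 1),
    ((23.0, 85.0, 1003.0, 26.0, 1.5), 1),
]

print(knn_predict(train, (11.0, 32.0, 1011.0, 29.0, 2.5)))  # 0
```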
K-Means Clustering
Aim- To form clusters of similar data points and use them to extract patterns and make predictions
For this algorithm, the entries for 1-Jan-2015 and 2-Jan-2015 are used as the initial cluster centers
DATE Weather_Condition Dew_Point (°C) Humidity (%) Pressure (hPa) Temperature (°C) Visibility (km) Wind_Direction (Compass) Rain_Presence
• Cluster recomputation: each cluster center is recalculated as the average value of every data feature over the data points in that cluster
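A minimal K-Means sketch for two clusters, seeded with the first two points in the same way the slides seed with 1-Jan-2015 and 2-Jan-2015. The `kmeans` function name, the iteration cap and the toy (humidity, temperature) pairs are assumptions, and features are left unscaled. Each pass assigns every point to its nearest center by Euclidean distance, then performs the cluster recomputation step described above, stopping once the assignments no longer change.

```python
import math


def kmeans(points, initial_centers, max_iter=100):
    """Plain K-Means. Returns (centers, labels); labels[i] is the cluster index of points[i]."""
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster whose center is nearest.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Recomputation step: each center becomes the feature-wise average of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster ends up empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


# Toy (humidity, temperature) pairs; illustrative only.
points = [(30.0, 31.0), (90.0, 25.0), (35.0, 30.0), (40.0, 32.0), (85.0, 26.0)]
centers, labels = kmeans(points, initial_centers=points[:2])
print(labels)   # [0, 1, 0, 0, 1]
print(centers)  # [(35.0, 31.0), (87.5, 25.5)]
```

Because the result depends on the starting centers, seeding with two fixed dates can produce very uneven clusters.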
On calculating the Euclidean distance between the new entry and the existing data points, the three lowest distances belong to the entries for:
These data points have the following rain-presence values:
Since the mode of the rain-presence values of the 3 nearest neighbours is 0, the prediction for 06-Jan-2019 is 0 (no rain).
K-Means Clustering Results:
Using the entries for 1-Jan-2015 and 2-Jan-2015 as the initial centers produces 2 clusters with the following statistics:
Cluster | Initial Center | Number of Data Points
1 | 1-Jan-2015 | 706
2 | 2-Jan-2015 | 24
Chart: scatter plot of the two clusters against HUMIDITY (x-axis); legend: Cluster 1, Cluster 2
Cluster | Dew_Point (°C) | Humidity (%) | Pressure (hPa) | Temperature (°C) | Visibility (km)
CLUSTER1 | 16.40 | 34.60 | 1007.90 | 30.97 | 2.45
CLUSTER2 | 23.71 | 87.71 | 1003.17 | 25.54 | 1.56
Table: average value of each feature over the data points in each cluster
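$C_1 = (16.40,\ 34.60,\ 1007.90,\ 30.97,\ 2.45)$
$C_2 = (23.71,\ 87.71,\ 1003.17,\ 25.54,\ 1.56)$
where each center lists (dew point, humidity, pressure, temperature, visibility)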
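The cluster centers can then be used for the predictions mentioned in the K-Means aim: a new entry is assigned to whichever center is closer. A minimal sketch using the centroid values from the table above, rounded to two decimals; the name `nearest_cluster` and the example entries are hypothetical. With raw, unscaled features (the slides do not say whether normalisation was applied), the roughly 53-point humidity gap between the two centers dominates the distance.

```python
import math

# Cluster centers from the K-Means results, in the order
# (dew point °C, humidity %, pressure hPa, temperature °C, visibility km).
CENTERS = {
    "Cluster 1": (16.40, 34.60, 1007.90, 30.97, 2.45),
    "Cluster 2": (23.71, 87.71, 1003.17, 25.54, 1.56),
}


def nearest_cluster(entry):
    """Return the name of the cluster whose center is closest (Euclidean) to `entry`."""
    return min(CENTERS, key=lambda name: math.dist(entry, CENTERS[name]))


print(nearest_cluster((15.0, 30.0, 1009.0, 32.0, 2.5)))  # Cluster 1
print(nearest_cluster((24.0, 90.0, 1002.0, 25.0, 1.5)))  # Cluster 2
```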
05
INSIGHTS &
LEARNINGS
Major Insights and Learnings
• The dataset contained weather entries for multiple days from 1-Jan-2015 to 31-Mar-2017
• Humidity was low at this location most of the time: cluster 1, which contains mostly low-humidity data points (average humidity about 34.6%), holds 706 of the 730 entries, against only 24 in cluster 2
• The weather at this location was hazy most of the time, which suggests that the place may be close to a large industrial or agricultural area, since these are among the main human-made sources of haze
• The temperature at this location remains moderate, with an average value of 30.787 °C
The KNN and K-Means algorithms also helped in reaching these conclusions: by classifying and grouping the data, they made it easier to plot and analyse.
06
CHALLENGES &
RECOMMENDATIONS
Challenges and Recommendations
Challenges Faced :
• Poor visualizations: some of the cluster charts did not clearly separate the clusters, because each chart is a lower-dimensional (2-D) depiction of clusters that actually span five features
• Uneven clusters: the choice of initial centers made one cluster much larger than the other (706 vs. 24 data points), so the data may not have been clustered in the most effective way, which may have introduced errors into our predictions
07
CONCLUSION
• To conclude, this project taught us how data science algorithms such as KNN and K-Means clustering are applied, and showed a real-life case where data analytics is used to make predictions.
• The dataset also gave us insight into the weather conditions at this location and revealed a clear, recurring trend in its weather over the two years.
The main findings of this project include:
• Rain Predictions
• Similar weather trends over the two years
• Weather conditions of the place (hazy, with low humidity)
This kind of analysis also has major real-world implications: it can be used to predict future weather from recordable features, and such predictions can help multiple sectors, such as:
• Defense
• Transportation
• Tourism, etc.
08
REFERENCES
RESOURCES & REFERENCES
Tools: Orange3, Microsoft Excel
Online resources: stackoverflow.com, orangedatamining.com/docs, youtube.com
THANK YOU