
Proceedings of the International Conference on Smart Electronics and Communication (ICOSEC 2020)

IEEE Xplore Part Number: CFP20V90-ART; ISBN: 978-1-7281-5461-9

Forecasting Meteorological Analysis using Machine Learning Algorithms

Bhagya Lakshmi Pavuluri, Ramya Sree Vejendla, Pavuluri Jithendra, Tinnavalli Deepika, Shahana Bano
Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, India

Abstract— Weather prediction is rapidly gaining popularity in the current era of machine learning and technology. It is essential to be able to foresee the temperature of the climate well in advance. The decision tree, K-NN and random forest algorithms are powerful tools that have been used in several prediction tasks, for instance flood prediction and storm detection. In this paper, a simple approach for predicting the weather of future years from past data is proposed using the decision tree, K-NN and random forest algorithms, and the best accuracy result among the three is reported. Weather prediction plays a significant role in everyday applications, and in this paper the prediction is based on the temperature changes of a certain area. All three algorithms compute mean, median and confidence values and probabilities, and the differences between the plots of the three algorithms are shown. Finally, using these algorithms we can predict whether the temperature increases or decreases and whether or not it is a rainy day. The dataset is based entirely on the weather of a certain area and includes attributes such as year, month, temperature, predicted values and so on.

Keywords— Decision Tree Algorithm, Random Forest Algorithm, K-NN Algorithm, Classification, Weather Prediction, Hot, Cold, Rain.

I. INTRODUCTION

Using the right algorithm to predict future values is very popular these days, which is why we also worked on weather forecasting. Machine learning algorithms provide accurate results for predicting climate variables such as outlook, humidity, rainfall, temperature, floods and storms. This task relies heavily on past data and artificial intelligence. Foreseeing the future climate also helps us make decisions about crops, international sports and many other parts of human life. The main reason for taking three algorithms is to test which algorithm performs best for weather forecasting; after reviewing many reference papers we observed that they mostly used one or two algorithms with only a few weather attributes, which motivated us to analyse three machine learning algorithms for one purpose while adding more variables to the weather dataset. So, we decided to anticipate the normal temperature of the day, whether hot, cold or rainy, by using the following three algorithms:

A. Decision Trees Algorithm:
Decision tree models are regularly used to examine a dataset and induce a tree, and this process is then used for predictions. There are several algorithms for building a decision tree, such as Classification and Regression Trees (CART), C5.0 and ID3, but in this work we used classification with ID3 for weather prediction. Generally, the tree contains branch nodes, where each node represents a decision between several options, and the final leaf node signifies the decision. The algorithm produces two or more branches at each node; when it produces two branches it is known as a binary tree, and with more branches a multiway tree. In this paper the classification tree predicts the weather values for the meteorological information of all months in 2016 and 2017.

B. Random Forest Algorithm:
The random forest method is one of the most popular algorithms in many fields, such as medicine, the power industry, identification of climate change and weather prediction. It is well suited to handling high-dimensional data for both classification and regression. A random forest is a combination of many individual decision trees built on training data and evaluated on test samples, and it gives a single averaged weather prediction from the outputs of all the individual trees. This bagging process helps improve stability, solves the overfitting problem of a large dataset, decreases the variance and helps give the best accuracy results.

C. K-Nearest Neighbours Using Neural Networks Algorithm:
This algorithm identifies the nearest values for weather prediction from the training dataset, and it relies on feature similarity to predict the values of new data. First the K value is initialized, and then the distance between the input and the training points is calculated using the Euclidean distance formula, dist((x, y), (a, b)) = \sqrt{(x - a)^2 + (y - b)^2}. By repeating this process several times, we obtain the collection of nearest neighbours of the given weather dataset, which will be used to predict the weather.
Finally, for the best representation of the K-NN prediction we add a neural network to the algorithm, so that the initial point can easily be identified while the remaining points act as hidden points in the hidden layers, leading to the output points as prediction points.

II. LITERATURE SURVEY

The review of previous studies on weather prediction is presented here, together with the motivation for the proposed work.

Paper [1]: Analysis of weather prediction using Machine Learning & Big Data.
Techniques: Linear Regression and Support Vector Machine.
Performance Analysis: This paper illustrated how to predict the weather of the next 5 days using the linear regression and SVM machine learning algorithms. In the end the results are measured and a confusion matrix for accurate prediction is given using Big Data.

Paper [2]: Survey on Weather Forecasting Using Data Mining.
Techniques: ANN, SVM, Naïve Bayes and Decision Tree classification algorithms.
Performance Analysis: The purpose of this paper is to survey the various methods and algorithms used for weather prediction in the data mining field.

Paper [3]: Weather Forecasting Using Machine Learning Algorithm.
Techniques: Random Forest classification algorithm, Raspberry Pi 3 B model and the Python language.
Performance Analysis: A system to forecast the weather is prepared using a Raspberry Pi and Python. The project develops a low-cost and efficient weather prediction application using machine learning.

Paper [4]: Weather Forecasting Using Artificial Neural Network.
Techniques: ANN, LSTM, Recurrent Neural Network.
Performance Analysis: In this paper a neural network is trained with weather parameters, and the LSTM algorithm is used to gather weather information. After testing the data, the weather is predicted through the developed model.

Paper [5]: Rainfall Prediction based on Deep Neural Network: A Review.
Techniques: Deep Neural Network model with optimization.
Performance Analysis: After data pre-processing and feature extraction, model parameter optimization was carried out to compare the performance of the machine learning algorithms. The Adam optimizer is used for optimization, and the results show that a deep neural network works better than classical machine learning algorithms for weather prediction.

III. PROPOSED WORK

We used a weather dataset taken from the Kaggle website, because this website provides many different datasets and is a popular source for them. The weather dataset contains 19 variables and 3655 objects for prediction. Some features of the dataset are:

1. Minimum Frequency
2. 1st Quadratic
3. Median
4. Mean
5. 3rd Quadratic
6. Maximum Frequency
7. Centroid
8. Events

The target value of this dataset is Events, which contains the values hot, cold and rain, with no missing values in between the data. The dataset consists of 1084 cold records, 1388 hot records and 1183 rain records for predicting the weather.

A. Decision tree algorithm process:

A decision tree represents variables as nodes, decision rules as branches and the outcomes of the dataset as leaf nodes. Different algorithms can be used to produce a decision tree from data, such as Classification and Regression Trees (CART), ID3, CHAID and C4.5. As we used the CART algorithm for this dataset, the Gini index is the splitting metric. For the weather dataset the target function is to predict the events (hot, cold or rain) based on the weather values; maxtemp, mintemp, maxcold, etc. are the variables of the data. To build the tree, the Gini index is first calculated for all the features of the dataset:

Gini = 1 - \sum_{i=1}^{c} p_i^2        Eq. (1)

where p_i is the proportion of samples belonging to class i at a particular node, with c classes in total. The Gini value for the dataset is 0.681025.

We can also use the entropy formula to determine the nodes of the decision tree:

Entropy = \sum_{i=1}^{c} -p_i \log_2(p_i)        Eq. (2)

This entropy is defined over all non-empty classes, where p ≠ 0, and if the samples at a node all belong to the same class the entropy value is 0. The entropy value for the dataset is 1.72352; it uses the probability of a particular outcome to decide how the nodes should be branched. Furthermore, entropy is a bit different from the Gini index in that it is more mathematically intensive, as a logarithmic function is used in its calculation. After calculating all the features of the data we get the final outcome for the weather data as a decision tree, and we add the prediction function to predict the test values.
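As a minimal R sketch of Eq. (1) and Eq. (2), the class-level Gini index and entropy can be computed from the event counts given above (1084 cold, 1388 hot, 1183 rain). The paper's reported node values were presumably computed on its own training split, so the numbers obtained here are only illustrative.

# Sketch: Gini index (Eq. 1) and entropy (Eq. 2) from the class
# proportions of the Events target (counts taken from Section III).
counts <- c(cold = 1084, hot = 1388, rain = 1183)
p <- counts / sum(counts)        # class proportions p_i

gini    <- 1 - sum(p^2)          # Eq. (1)
entropy <- sum(-p * log2(p))     # Eq. (2)

gini; entropy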

B. Random Forest Algorithm:

The random forest algorithm is a group of different decision trees; it combines the decisions of all the decision trees to work out the prediction value, which is the single average over all the decision trees. This algorithm solves the overfitting problem of the data and is fast to train and test. Bootstrap samples are produced for each individual decision tree. So we apply the decision tree algorithm process and add the random forest function to the dataset (as "output.forest") for the results. We took 500 decision trees, which are processed individually to classify the prediction (a single averaged prediction). The random forest confusion values of the weather dataset are as follows:

Table 1. Confusion matrix values

  objects   cold   hot   rain   class.error
  cold       434   354     73     0.4976852
  hot        198   862    252     0.3439878
  rain        57   204   1073     0.2139194

C. K-Nearest Neighbour Algorithm:

This algorithm comes under supervised learning: it uses a dataset with labelled training measurements (x, y), where x represents the features of the dataset and y represents the target of the dataset (target = events). For a classification problem, K-NN states that for the given values the algorithm identifies the K nearest neighbours of an unseen data point and assigns a class to that point according to those K neighbours. In order to find the distance between the points we use the Euclidean metric:

d(x, x') = \sqrt{(x_1 - x_1')^2 + \dots + (x_n - x_n')^2}        Eq. (3)

where n is the number of dimensions and x and x' are sample points; this is used to identify the k closest points. Finally, the input value x is assigned to the class with the largest probability:

P(y = j | X = x) = \frac{1}{k} \sum_{i \in A} I(y^{(i)} = j)        Eq. (4)

This is the conditional expectation over the X and y points, where k represents the number of samples.

Different variables have different scaling units, so we normalize each variable of the data using (x - min(x)) / (max(x) - min(x)) to convert all values into the range between 0 and 1, which also makes them easy to plot. This algorithm performs well with numeric variables because it deals with distances. The dataset is then split into training and testing sets and the k-NN function is applied to the target category to predict the weather. Finally, the prediction accuracy is obtained by dividing the number of correct predictions by the total number of predictions:

accuracy(tab)
[1] 61.20219
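The following is a minimal R sketch of the normalization, k-NN call and accuracy computation described above, assuming a data frame named weather with numeric predictor columns and a factor column events; the column names, the 70/30 split and k = 5 are illustrative assumptions, not the paper's exact settings.

library(class)                                              # provides knn()

normalize <- function(x) (x - min(x)) / (max(x) - min(x))   # min-max scaling to [0, 1]

num_cols <- sapply(weather, is.numeric)
scaled   <- as.data.frame(lapply(weather[num_cols], normalize))

set.seed(1)
idx     <- sample(nrow(scaled), floor(0.7 * nrow(scaled)))  # assumed 70/30 split
train_x <- scaled[idx, ]
test_x  <- scaled[-idx, ]
train_y <- weather$events[idx]

pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)  # Euclidean distance, Eq. (3)
tab  <- table(pred, weather$events[-idx])
accuracy <- 100 * sum(diag(tab)) / sum(tab)                 # correct predictions / all predictions
accuracy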


IV. FLOW CHART

Fig.1. Overview of the process

This work starts by collecting the weather data of an area and then pre-processing that data. This involves a few steps: the required libraries are imported, the weather dataset is imported, the missing and noisy data are cleaned using a few techniques, and the dataset is split into training and testing data for evaluating the result. The three algorithms are then applied to the same weather dataset individually to find the best accuracy for the weather prediction.
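A minimal R sketch of this pre-processing stage follows, assuming the Kaggle file has been saved locally as "weather.csv" (a hypothetical file name) with an events target column; the cleaning step and the 70/30 split are illustrative.

# Import the dataset, drop records with missing values, split into train/test.
weather <- read.csv("weather.csv", stringsAsFactors = TRUE)   # hypothetical local copy of the Kaggle dataset

weather <- na.omit(weather)                                   # remove rows containing missing values

set.seed(42)
idx        <- sample(nrow(weather), floor(0.7 * nrow(weather)))
train_data <- weather[idx, ]                                  # used to fit each algorithm
test_data  <- weather[-idx, ]                                 # held out to score the predictions
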
When the decision tree algorithm is applied to the data, it divides the data into various classes based on the given temperature values and produces the final weather prediction as hot, cold or rainy. The random forest algorithm is then applied to the same weather dataset; it can handle a large dataset very easily because it is a combination of many decision trees. This algorithm uses bagging as well as feature randomness while constructing the individual trees, producing an uncorrelated forest whose prediction is more accurate than that of the individual trees; after evaluating all the produced bootstrap samples it calculates the averaged single-tree prediction to provide the weather prediction. In addition, to observe the accuracy of a further algorithm, we used k-nearest neighbours with neural networks, which calculates the distance of all the data points from the initial k-point to find any coincidence between the points; after evaluating all the points from the input layer through the hidden layers, it provides the weather prediction result at the output layer.

V. RESULTS

A. DECISION TREE ALGORITHM:

Initially, the weather dataset is read and pre-processed. A sample of the data is selected and used for training, while the data outside the sample is used to test the algorithm; the dimensions of both the test data and the training data are taken into account. We then apply rpart, which terminates the growth of the decision tree and limits its depth. The decision tree is plotted as a tree structure with various nodes and leaf nodes, classifying the different conditions through a set of queries that differentiate and classify the data, and evaluating attributes such as temperature, humidity or precipitation values. To obtain and analyse the mechanism, the temperature values are predicted and the decision tree is obtained for all the subsets of temperature values such as hot, cold and rain; the dataset discriminates the data based on the predicted temperature variables to construct the tree.

Fig.2. Decision Tree Result

B. RANDOM FOREST ALGORITHM:

The random forest result in Fig. 3 shows the prediction results individually, depending on the given weather dataset. The y-axis represents the error values and the x-axis represents the n-tree values, where the n-tree value is given as 500 trees by default. Prediction values are provided for the three categories, where hot is shown in green in the graph, cold in black and rain in red, depending on the attributes used in the dataset. The results are given as error values, which are complementary to the accuracy: if the error value decreases, the accuracy increases, and vice versa. According to the graph, the error value is high for the hot result and almost the same for the rain and cold values, but as the number of bootstrap samples of trees increases towards 500 we can observe that the error value is reduced closer to zero. Regarding the process used to produce this result, the required libraries are first installed, such as randomForest, party, mice, VIM, lattice, caret, etc. After data reading and pre-processing, the data modelling is done by applying the "output.forest" function as "rf", which indicates the random forest function, to the attributes Maxtemp, Mintemp, Maxcold, etc. in the dataset.

Fig.3. Random Forest Result.
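A minimal R sketch of the two fits described in Sections V-A and V-B, assuming the train_data/test_data split from Section IV and an events target column; the formula, the depth limit, the plotting calls and the attribute selection are illustrative rather than the paper's exact code.

library(rpart)          # decision tree (the "rpart" step in Section V-A)
library(randomForest)   # random forest with 500 trees (Section V-B)

# Decision tree: fit, plot and predict the events class (hot / cold / rain).
tree_fit  <- rpart(events ~ ., data = train_data, method = "class",
                   control = rpart.control(maxdepth = 5))     # limit tree depth
plot(tree_fit); text(tree_fit)                                 # tree structure, as in Fig. 2
tree_pred <- predict(tree_fit, test_data, type = "class")

# Random forest: 500 trees, averaged ("bagged") prediction.
output.forest <- randomForest(events ~ ., data = train_data, ntree = 500)
print(output.forest)                   # confusion matrix with class.error, as in Table 1
plot(output.forest)                    # error vs. number of trees, as in Fig. 3
rf_pred <- predict(output.forest, test_data)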

C. K-NN USING NEURAL NETWORKS ALGORITHM:

The neural network plot shows the prediction result produced by the k-NN algorithm for 150 objects and a few attributes such as Maxtemp, Mintemp and so on, with one hidden layer and the output layer. The k-NN calculates the distance between all the points through the Euclidean distance formula and shows it as a neural network, so when installing the libraries, neuralnet is the important one among them, and to inspect the data in more detail a few functions are used, such as head, tail, length, str and summary. Thereafter, the sample size is set over the n rows of the data and the data is divided into training and testing sets before applying the "nn" function, which indicates the neuralnet function. The hidden layers can be specified in the neuralnet function; to keep the result clear we used only a single hidden layer and produced the predicted result, and we can see that the error is 0.03 when using this algorithm. Fig. 4 represents the overall prediction value of the output, while Fig. 5 shows all the variables of the dataset and the calculated distance values together with the hidden layer that feeds the output layer.

Fig.4. K-NN Plot.

Fig.5. K-NN using Neural Networks Plot.
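A minimal R sketch of this step, assuming the same train_data split and a small set of numeric attributes; the 150-row sample, the indicator encoding of the events target, the chosen predictors and the single hidden unit are illustrative assumptions based on the description above.

library(neuralnet)

nn_sample <- head(train_data, 150)                 # small sample, as in the plotted example

# Encode the events class as three 0/1 response columns for neuralnet.
nn_sample$hot  <- as.integer(nn_sample$events == "hot")
nn_sample$cold <- as.integer(nn_sample$events == "cold")
nn_sample$rain <- as.integer(nn_sample$events == "rain")

# One hidden layer with a single unit; predictors are assumed numeric.
nn_fit <- neuralnet(hot + cold + rain ~ Maxtemp + Mintemp,
                    data = nn_sample, hidden = 1, linear.output = FALSE)

nn_fit$result.matrix["error", ]                    # training error (reported as about 0.03 in the paper)
plot(nn_fit)                                       # network diagram, as in Fig. 5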


Table 2. Comparison of results.

  SI.NO   Results          Decision Tree                 Random Forest               K-NN
  1       Error values     0.484371                      0.025                       0.03551
  2       Accuracy         55%                           80%                         61.20219%
  3       Accuracy score   0.55                          0.80                        0.61
  4       Data handling    Handles minimum data          Handles large data          Handles minimum data
  5       Benefits         Classifies decisions easily   Solves the overfitting      Reads all data and finds the
                                                         problem                     best probability results

Table 3. Decision tree values of data-test for split [ , 5].

  Predict-p   hot   cold   rain
  hot           0      0      3
  cold          7      7      1
  rain          0      0     17

VI. CONCLUSION

To conclude, this paper has performed weather prediction using machine learning algorithms to classify whether it is a hot, cold or rainy day. We used several machine learning algorithms, namely Decision Tree, Random Forest and K-NN with Neural Networks, for the prediction and compared the prediction accuracy of these algorithms individually on the same weather dataset with a few attributes such as maximum temperature, minimum temperature, maximum cold and so on. The main reason for taking three algorithms was to test which algorithm performs best for weather forecasting. From the above results we can say that the Random Forest algorithm provides the best accuracy among them.

All these algorithms come under supervised learning because of the classification process, and they work better with the training data than with the testing data, but they may not predict other categories such as temperature percentage, outlook and rainfall amount. Therefore, we cannot recommend this approach for upcoming years or for the weather changes of other areas, and that is why we plan to forecast other weather conditions by adding more attributes and other algorithms such as SVM, Naïve Bayes and ANN.

VII. REFERENCES

[1] Analysis of Weather Prediction Using Machine Learning & Big Data, 2018 International Conference on Advances in Computing and Communication Engineering (ICACCE 2018), Paris, France, 22-23 June 2018, 978-1-5386-4485-0/18.
[2] Survey on Weather Forecasting Using Data Mining, Proc. IEEE Conference on Emerging Devices and Smart Systems (ICEDSS 2018), 2-3 March 2018, 978-1-5386-3479-0/18.
[3] Weather Forecasting Using Machine Learning Algorithm, 7-9 March 2019, 978-1-5386-9436-7/19.
[4] Weather Forecasting Using Artificial Neural Network, Proceedings of the 2nd International Conference on Inventive Communication and Computational Technologies (ICICCT 2018), IEEE, 978-1-5386-1974-2/18.
[5] Rainfall Prediction based on Deep Neural Network: A Review, Proceedings of the Second International Conference on Innovative Mechanisms for Industry Applications (ICIMIA 2020), IEEE.
[6] Rainfall Forecasting in Bandung Regency using C4.5 Algorithm, 2018 6th International Conference on Information and Communication Technology.
[7] Haze weather recognition based on multiple features and Random forest, 2018 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC).
[8] Dynamic Line Rating Using Numerical Weather Prediction and machine learning, 2017, IEEE.
[9] Comparative Analysis of Temperature Prediction using Regression Methods and Back Propagation Neural Networks, Proc. IEEE Conference on Emerging Devices and Smart Systems (ICEDSS 2018), 2-3 March 2018, 978-1-5386-3479-0/18.
[10] Weather Analysis to predict rice cultivation time using multiple linear regression to escalate farmers exchange rate, 2017 International Conference on Advanced Informatics, Concepts, Theory and Applications.
[11] Weather prediction based on fuzzy logic algorithm for supporting general farming automation system, 2017 5th International Conference on Instrumentation, Control, and Automation (ICA).
[12] A Hadoop based weather prediction model for classification of weather data, 2017 Second International Conference on Electrical, Computer and Communication Technologies (ICECCT).
[13] A Quick Review of Machine Learning Algorithms, 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing.
[14] Long-time Prediction of Climate-weather Change Influence on Critical Infrastructure Safety and Resilience, 2018 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM).
[15] Study of prediction algorithms for selecting appropriate classifier in machine learning, Journal of Advanced Research in Dynamical and Control Systems, 9(Special Issue 18), 257-268, 2017.
