Abstract— Weather prediction is rapidly gaining ubiquity in the current era of machine learning and technology, and it is essential to forecast the temperature of the climate well in advance. Decision tree, K-NN and Random Forest algorithms are powerful tools that have been used in several prediction tasks, for instance flood prediction and storm detection. In this paper, a simple approach for predicting the weather of future years from past data is proposed using the decision tree, K-NN and random forest algorithms, and the best accuracy result among the three is reported. Weather prediction plays a significant role in everyday applications, and in this paper the prediction is based on the temperature changes of a certain area. All three algorithms compute mean values, medians, confidence values and probabilities, and the differences between the plots of the three algorithms are shown. Finally, using these algorithms we can predict whether the temperature increases or decreases and whether it is a rainy day or not. The dataset is based entirely on the weather of a certain area and includes objects such as year, month, temperature and predicted values.

Keywords— Decision Tree Algorithm, Random Forest Algorithm, K-NN Algorithm, Classification, Weather Prediction, Hot, Cold, Rain.
I. INTRODUCTION

Utilizing the right algorithm for future predictions is widespread these days, which is why we also worked on weather forecasting. Machine learning algorithms provide accurate results for predicting climate attributes such as outlook, humidity, rainfall, temperature, floods and storms. This task relies heavily on past data and artificial intelligence. Predicting the future weather also helps us make decisions in agriculture, international sports and many other parts of human life. The main reason for taking three algorithms is to test which one performs best for weather forecasting: after reviewing many reference papers we observed that most of them used only one or two algorithms with few weather attributes, which motivated us to analyse three machine learning algorithms for a single purpose while adding more variables to the weather dataset. So, we decided to anticipate the normal temperature of the day, whether it is hot, cold or rainy, by using the three following algorithms:

A. Decision Trees Algorithm:

Decision tree models are regularly used to examine the dataset and induce a tree whose structure is then used for predictions. There are several algorithms for building a decision tree, such as Classification and Regression Trees (CART), C5.0 and ID3; in this work we used classification with ID3 for weather prediction. Generally, the tree contains branch nodes, where each node represents a decision between several options, and the final leaf node represents the outcome. The tree produces two or more branches at each node; with two branches it is known as a binary tree, and with more branches it is a multiway tree. In this paper the classification tree predicts the weather values for the meteorological information of all months in 2016 and 2017.
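As a concrete illustration of the tree-building step just described, the sketch below fits a small classification tree in R. The paper does not name the package it used; rpart (an implementation of CART) and the toy columns maxtemp, mintemp, humidity and events are assumptions made only for this example.

```r
# Minimal sketch: fit a classification tree on a toy weather frame (illustrative only).
library(rpart)

toy <- data.frame(
  maxtemp  = c(34, 31, 12, 10, 25, 27, 8, 30),
  mintemp  = c(24, 22, 4, 2, 18, 19, 1, 21),
  humidity = c(40, 45, 80, 85, 70, 65, 90, 50),
  events   = factor(c("hot", "hot", "cold", "cold", "rain", "rain", "cold", "hot"))
)

# Grow a small classification tree that predicts the weather event.
fit <- rpart(events ~ maxtemp + mintemp + humidity,
             data = toy, method = "class",
             control = rpart.control(minsplit = 2, cp = 0))

print(fit)                                                   # branch and leaf structure
predict(fit, data.frame(maxtemp = 11, mintemp = 3, humidity = 82), type = "class")
```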
B. Random Forest Algorithm:

The Random Forest method is one of the most popularly used algorithms in many fields such as medicine, the power industry, identifying climate change and weather prediction. It is well suited to handling high-dimensional data for both classification and regression. A random forest is a combination of many individual decision trees built on the training data and evaluated on test samples, and it gives a single averaged weather prediction from the outputs of all the individual trees. This bagging process helps improve stability, reduces the overfitting problem on large datasets, decreases variance and helps to give the best accuracy results.
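The bagging idea described above can be sketched by hand: draw bootstrap samples, grow one tree per sample and combine the individual votes. This is an illustration only, on a hypothetical toy data frame; the paper's own implementation uses a random forest package, shown later in Section III.

```r
# Hand-rolled bagging sketch: several trees on bootstrap samples, then a majority vote.
library(rpart)
set.seed(1)

toy <- data.frame(maxtemp  = c(34, 31, 12, 10, 25, 27, 8, 30),
                  mintemp  = c(24, 22, 4, 2, 18, 19, 1, 21),
                  humidity = c(40, 45, 80, 85, 70, 65, 90, 50),
                  events   = factor(c("hot", "hot", "cold", "cold",
                                      "rain", "rain", "cold", "hot")))

n_trees <- 25
trees <- lapply(seq_len(n_trees), function(i) {
  boot <- toy[sample(nrow(toy), replace = TRUE), ]           # bootstrap sample
  rpart(events ~ ., data = boot, method = "class",
        control = rpart.control(minsplit = 2, cp = 0))
})

# Each tree votes on a new day; the "forest" returns the majority class.
new_day <- data.frame(maxtemp = 26, mintemp = 17, humidity = 68)
votes   <- sapply(trees, function(t) as.character(predict(t, new_day, type = "class")))
names(which.max(table(votes)))
```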
C. K-Nearest Neighbours Using Neural Networks Algorithm:

This algorithm identifies the nearest values for weather prediction by utilizing the training dataset, and it relies on feature similarity to predict values for new data. First the K value is initialized, and then the distance between the input and the training samples is calculated for every point using the Euclidean distance formula $\mathrm{dist}((x, y), (a, b)) = \sqrt{(x - a)^2 + (y - b)^2}$. By repeating this process several times we obtain the collection of nearest neighbours of the given weather dataset, which is then used to predict the weather. Finally, for the best representation of the K-NN prediction we add a neural network to the algorithm, so that the initial point can be easily identified while the remaining points act as hidden points in the hidden layers, leading to the output points as prediction points.
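The distance step above can be written out directly. The short R sketch below computes the Euclidean distance from a new observation to every training row and returns the k nearest labels; the feature names and values are illustrative assumptions, and the neural-network refinement mentioned in the text is not reproduced here.

```r
# Euclidean distance from a query day to every training row, then the k nearest labels.
euclidean <- function(a, b) sqrt(sum((a - b)^2))

train_x <- data.frame(maxtemp = c(34, 12, 25, 8, 30, 10),
                      mintemp = c(24, 4, 18, 1, 21, 2))
train_y <- factor(c("hot", "cold", "rain", "cold", "hot", "cold"))

query <- c(maxtemp = 11, mintemp = 3)
k     <- 3

d       <- apply(train_x, 1, euclidean, b = query)   # distance to every training point
nearest <- order(d)[1:k]                              # indices of the k closest rows
train_y[nearest]                                      # their labels: the neighbourhood vote
```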
II. LITERATURE SURVEY

The review of previous studies on weather prediction is presented here, together with the motivation for the proposed work.

Paper [1]: Analysis of weather prediction using Machine Learning & Big Data.
Techniques: Linear Regression and Support Vector Machine.
Performance Analysis: This paper illustrated how to predict the weather of the next 5 days using the linear regression and SVM machine learning algorithms. In the end, the results are measured and a confusion matrix for accurate prediction is given using Big Data.

Paper [2]: Survey on Weather Forecasting Using Data Mining.
Techniques: ANN, SVM, Naïve Bayes and Decision Tree classification algorithms.
Performance Analysis: The purpose of this paper is to survey the various methods and algorithms used for weather prediction in the data mining field.

Paper [3]: Weather Forecasting Using Machine Learning Algorithm.
Techniques: Random Forest classification algorithm, Raspberry Pi 3 Model B and the Python language.
Performance Analysis: A system to forecast the weather is prepared using a Raspberry Pi and Python. The project develops a low-cost and efficient weather prediction application using machine learning.

Paper [4]: Weather Forecasting Using Artificial Neural Network.
Techniques: ANN, LSTM, Recurrent Neural Network.
Performance Analysis: This paper trained a neural network with weather parameters and utilized the LSTM algorithm to gather weather information. After testing the data, the weather is predicted through the developed model.

Paper [5]: Rainfall Prediction based on Deep Neural Network: A Review.
Techniques: Deep Neural Network model with optimization.
Performance Analysis: After data pre-processing and feature extraction, model parameter optimization was performed to compare the performance of the machine learning algorithms. The Adam optimizer is used for optimization, and the results show that the deep neural network works better than the other machine learning algorithms for weather prediction.
III. PROPOSED WORK

We used a weather dataset taken from the Kaggle website, because this website provides many different datasets and is a popular source for them. The weather dataset contains 19 variables and 3655 objects for prediction. Some features of the dataset are:

1. Minimum Frequency
2. 1st Quartile
3. Median
4. Mean
5. 3rd Quartile
6. Maximum Frequency
7. Centroid
8. Events

The target value of this dataset is Events, which contains the values hot, cold and rain with no missing values in between. The dataset consists of 1084 cold records, 1388 hot records and 1183 rain records for predicting the weather.
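A short sketch of how such summary features can be obtained in R is given below. The file name weather.csv and the column name Events are assumptions (the paper only states that the data comes from Kaggle); summary() reports exactly the minimum, 1st quartile, median, mean, 3rd quartile and maximum per variable, and table() gives the class counts of the target.

```r
# Hypothetical loading of the Kaggle weather file (the real file name is not given
# in the paper).
weather <- read.csv("weather.csv", stringsAsFactors = TRUE)

str(weather)            # the paper's dataset has 3655 observations of 19 variables
summary(weather)        # min, 1st Qu., median, mean, 3rd Qu. and max per column
table(weather$Events)   # the paper reports: cold 1084, hot 1388, rain 1183
```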
A. Decision tree algorithm process:

A decision tree represents variables as nodes, decision rules as branches and the outcomes of the dataset as leaf nodes. Different algorithms can be used to produce a decision tree from data, such as Classification and Regression Trees (CART), ID3, CHAID and C4.5. We used the CART algorithm for the dataset, which uses the Gini index as its splitting metric. For the weather dataset the target function is to predict the events (hot, cold or rain) based on the weather values; maxtemp, mintemp, maxcold, etc. are the variables of the data. To build the tree, the Gini index is first calculated for all the features of the dataset:

$\mathrm{Gini} = 1 - \sum_{i=1}^{c} p_i^{2}$    Eq. (1)

Here $p_i$ is the proportion of samples at a particular node that belong to class $i$ of the $c$ classes. The Gini process value of the dataset is 0.681025.

We can also use the entropy formula for determining the nodes of the decision tree:

$\mathrm{Entropy} = - \sum_{i=1}^{c} p_i \log_2 (p_i)$    Eq. (2)

The entropy is taken over all non-empty classes (p ≠ 0), and if all the samples at a node belong to the same class the entropy is 0. The entropy process value of the dataset is 1.72352, and it uses the probability of a particular outcome to decide how the nodes should be branched. Furthermore, it is a bit different from the Gini index because it is more mathematically intensive, as a logarithmic function is used in its calculation. After calculating all the features of the data we obtain the final outcome of the weather data as a decision tree, and we add the prediction function to predict the test values.
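Eq. (1) and Eq. (2) are easy to evaluate from the class proportions at a node. The helper functions below are a sketch written directly from those equations, not taken from the paper's code; applying them to the root-node class counts is only an illustration and will not necessarily match the process values quoted in the text, which the paper reports after calculating all the features of the dataset.

```r
# Impurity measures from Eq. (1) and Eq. (2), computed from class counts (sketch).
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]                 # only non-empty classes, as noted in the text
  -sum(p * log2(p))
}

# Illustration on the root-node class counts reported for the dataset.
node_counts <- c(cold = 1084, hot = 1388, rain = 1183)
gini(node_counts)
entropy(node_counts)
```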
B. Random Forest Algorithm:

The Random Forest algorithm is a group of different decision trees; it combines the decisions of all the individual trees to work out the prediction value, which is the single average of all the decision trees. This algorithm solves the overfitting problem of the data and is fast to train and test. Bootstrap samples are produced for each individual decision tree. So, we apply the decision tree algorithm process and add the random forest function to the dataset (as "output.forest") to obtain the results. We took 500 decision trees, which are processed individually to classify the prediction (a single averaged prediction), and the confusion values of the random forest on the weather dataset are then examined.
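A sketch of this step in R using the randomForest package with 500 trees, as described. The object name output.forest and the tree count come from the text, while the file name, formula, column names and the 70/30 split are assumptions; for a classification forest, printing the fitted object reports the out-of-bag error and the confusion matrix.

```r
# Random forest of 500 individually grown trees (sketch; names and split assumed).
library(randomForest)
set.seed(42)

weather <- read.csv("weather.csv", stringsAsFactors = TRUE)   # hypothetical file name
idx     <- sample(nrow(weather), size = 0.7 * nrow(weather))  # assumed 70/30 split
train   <- weather[idx, ]
test    <- weather[-idx, ]

output.forest <- randomForest(Events ~ ., data = train, ntree = 500)

print(output.forest)                  # OOB error estimate and confusion matrix
pred <- predict(output.forest, test)  # one aggregated prediction per test row
table(pred, test$Events)              # confusion values on the held-out set
```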
C. K-Nearest Neighbours algorithm process:

$\mathrm{dist}(x, x') = \sqrt{\sum_{i=1}^{n} (x_i - x'_i)^2}$    Eq. (3)

Here n is the number of dimensions and x and x' are sample points; this distance is used to identify the k closest points. Finally, the input value x is assigned to the class with the largest probability:
$P(y = j \mid X = x) = \frac{1}{k} \sum_{i \in A} I\big(y^{(i)} = j\big)$    Eq. (4)

This is the conditional expectation over the X and y points, where A denotes the set of the k nearest neighbours and k represents the number of samples considered.
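Taken together, Eq. (3) and Eq. (4) define the whole decision rule: measure the distances, keep the k nearest training points and pick the class with the largest estimated probability. The sketch below implements that rule directly; the data and names are illustrative only and are not the paper's code.

```r
# Direct implementation of the K-NN rule in Eq. (3) and Eq. (4) (illustrative).
knn_predict <- function(train_x, train_y, x, k = 5) {
  # Eq. (3): Euclidean distance from x to every training sample.
  d <- sqrt(rowSums(sweep(as.matrix(train_x), 2, x)^2))
  # Set A: indices of the k nearest neighbours.
  A <- order(d)[1:k]
  # Eq. (4): P(y = j | X = x) = (1/k) * sum of indicators over the neighbourhood.
  probs <- table(train_y[A]) / k
  names(which.max(probs))               # class with the largest probability
}

train_x <- data.frame(maxtemp = c(34, 12, 25, 8, 30, 10, 28, 9),
                      mintemp = c(24, 4, 18, 1, 21, 2, 20, 3))
train_y <- factor(c("hot", "cold", "rain", "cold", "hot", "cold", "rain", "cold"))

knn_predict(train_x, train_y, x = c(27, 19), k = 3)
```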
Different variables have different scaling units, so we normalize each variable of the data using (x − min(x)) / (max(x) − min(x)) to convert all values into the range between 0 and 1, which also makes them easy to plot. The algorithm performs well with numeric variables because it works with distances.
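A minimal sketch of that min-max normalisation applied to every numeric column; the data frame and file name are the same hypothetical ones used above.

```r
# Min-max normalisation of all numeric columns to the [0, 1] range (sketch).
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

weather  <- read.csv("weather.csv", stringsAsFactors = TRUE)  # hypothetical file name
num_cols <- sapply(weather, is.numeric)

weather_norm <- weather
weather_norm[num_cols] <- lapply(weather[num_cols], normalize)

summary(weather_norm[num_cols])   # every numeric variable now lies between 0 and 1
```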
Now, the dataset is split into training and testing sets and the k-NN function is applied to the target category to predict the weather. Finally, the prediction accuracy is obtained by dividing the number of correct predictions by the total number of predictions:

> accuracy(tab)
[1] 61.20219
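The same step can be reproduced with the class package, as sketched below. The accuracy(tab) call shown above appears to be a user-defined helper, so one plausible definition is included; the k value, split ratio and column names are assumptions, and the snippet will not necessarily reproduce the 61.20219 reported in the text.

```r
# K-NN with the class package, followed by an accuracy(tab)-style score (sketch).
library(class)
set.seed(7)

weather   <- read.csv("weather.csv", stringsAsFactors = TRUE)  # hypothetical file name
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
num_cols  <- sapply(weather, is.numeric)
feats     <- as.data.frame(lapply(weather[num_cols], normalize))

idx  <- sample(nrow(weather), size = 0.7 * nrow(weather))      # assumed 70/30 split
pred <- knn(train = feats[idx, ], test = feats[-idx, ],
            cl = weather$Events[idx], k = 5)                    # k = 5 is an assumption

tab <- table(pred, weather$Events[-idx])                        # confusion table

# One plausible definition of the accuracy helper used in the text:
accuracy <- function(tab) sum(diag(tab)) / sum(tab) * 100
accuracy(tab)                           # percentage correct (the paper reports 61.20219)
```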
IV. FLOW CHART

Fig. 1. Overview of the process.