2018 International Conference on Advances in Computing and Communication Engineering (ICACCE-2018)

Paris, France 22-23 June 2018

Analysis of Weather Prediction using Machine Learning & Big Data
Shubham Madan1, Praveen Kumar2, Seema Rawat3, Tanupriya Choudhury4
Amity University Noida, Uttar Pradesh, India1,2,3, UPES Dehradun4

Abstract: The whole world is affected by the changing climate and its side effects. To reduce these side effects to some extent, several techniques and algorithms can predict the weather from observations of past years, for example temperature, dew point, humidity, air pressure, and wind direction. From an analysis of existing data for the past few years, we conclude that the machine learning paradigm lets us study a given dataset and extract useful information from it, and that a predictive model can be built to capture the unsteady patterns of climatic conditions. In this paper we explore statistical linear regression and support vector machine (SVM) techniques of machine learning, which operate on continuous-valued datasets, to forecast the weather. The proposed scheme provides approximate results for the weather of the next 5 days. The results are evaluated with a mathematical and statistical decision tree and a confusion matrix, to obtain more appropriate and accurate forecasting using Big Data.

Keywords: Linear regression, support vector machine, decision tree, confusion matrix, machine learning, big data.

1. Introduction
Big Data comprises very large volumes of structured, semi-structured, and unstructured data, which is why it is extremely hard to process, manage, and store. Various mechanisms, techniques, and procedures have recently emerged to deal with Big Data. Data mining [3] using machine learning is one of them, and we use it in this paper to manage weather-related data and to predict future weather conditions. Under this scheme we show how data mining and data retrieval using machine learning can be applied to weather prediction and forecasting. At present, the people of India are experiencing changing bad weather, pollution, and their side effects. In agriculture in particular, farmers face many problems because of unexpected weather conditions. Weather forecasting depends directly on the natural particles present in the air, such as ozone (O3), nitrogen dioxide (NO2), carbon dioxide (CO2), and sulfur dioxide (SO2). In this paper we concentrate on one particular area, Delhi. To reduce these side effects to some degree, many techniques and algorithms can predict the weather on the basis of given data. Data mining using machine learning is used in the weather prediction process. Weather is a basic natural [2] factor in every stage of human life, so weather forecasting is widely used in many fields, such as food security, disasters, agriculture, and science. In earlier years there was no accurate knowledge of weather conditions, which caused many problems in food management, industry, and agriculture [8]. Today there are many ways to determine weather conditions. This is the reason for applying data mining techniques to determine weather conditions using Big Data and its ecosystem [6], along with the machine learning techniques of linear regression and support vector machine.

Data mining using machine learning is the process of extracting useful data from a large dataset. The process of extracting useful data is described [6] as knowledge discovery, and it can be applied to any large dataset. The main data mining techniques using machine learning are classification, clustering, association, and regression. Different data mining methods are used to address weather change and forecasting problems. Weather forecasting problems include prediction [7] of temperature, rain, fog, winds, storms, and so on. Weather sensors collect data continuously at many locations and accumulate huge volumes of data. Weather forecasting is always a major challenge, because it is difficult to predict the state of the atmosphere for the near future, since the atmospheric dataset is

978-1-5386-4485-0/18/$31.00 ©2018 IEEE


unpredictable and changes from day to day in line with global climate change, relative to past patterns. The data used here come from the India Meteorological Department (IMD). The dataset supports a rich set of meteorological elements, which are good candidates for Big Data analysis because the data are semi-structured and record-oriented. The term Big Data emerged around 2005 and refers to datasets that are very large and also high in variety and velocity, which makes them difficult to process using conventional tools and systems. Big Data has created enormous [4] business and social opportunities in every field, enabling the discovery of previously hidden patterns and the development of new insights for decision making, in areas ranging from web search to content recommendation and computational scenarios. The term Big Data is now used everywhere in daily life. It has risen because people and organizations make increasing use of data-intensive technologies. Big Data sizes currently range from a terabyte to a zettabyte in a single dataset. Like the physical universe, the digital universe is huge. According to research conducted by IDC, from 2005 to 2020 the digital universe will grow from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes, roughly doubling every two years until 2020. As stated by IBM, machine-to-machine (M2M) communications, online and mobile social [10] networks, and pervasive handheld devices create 2.5 quintillion bytes of data every day.

Attributes of Big Data: Big Data has many attributes, described by a number of V characteristics. The V characteristics gathered from the publications of many researchers give nine (the 9 V's). These 9 V's are:
• Veracity: the biases, noise, and abnormality in data.
• Variety: structured, semi-structured, and unstructured data. Beyond text, more data types have emerged, such as record, log, audio, and hybrid data.
• Velocity: data are generated at a faster pace than before, as the different sources of Big Data increase their output.
• Volume: the amount of data, which keeps growing explosively.
• Validity: the correctness and accuracy of the data for the intended use. Typically, valid data are used to make the right decisions.
• Variability: data flows can be highly inconsistent, with periodic peaks. Daily and event-triggered peak data loads can be challenging to manage, especially when unstructured data are involved.
• Volatility: once the retention period expires, the data can easily be destroyed.
• Visualization: complex graphs that can include many variables of data while still remaining understandable and readable.
• Value: Big Data has a low value density, because value must be extracted from massive amounts of data. Useful data need to be extracted from any data type and from a huge amount of data.

2. Related Work
Related work includes many different and interesting techniques for weather forecasting. While much of current forecasting technology involves simulations based on physics and differential equations, many new approaches from artificial intelligence use mainly machine learning techniques, mostly neural networks, while others use probabilistic models such as Bayesian networks. Of the three papers on machine-learning-based weather prediction that we examined, two used neural networks and one used support vector machines. Neural networks are the most prominent machine learning model for weather forecasting because they can capture the non-linear dependencies between past weather trends and future weather conditions, unlike the linear regression and functional regression models that we used. This gives them the advantage over our models of not assuming simple linear dependencies among all features. Of the neural network approaches, one [3] used a hybrid model in which neural networks model the physics behind weather forecasting, while the other [4] applied learning more directly to predicting weather conditions. Similarly, the approach using support vector machines [6] applied the classifier directly to weather prediction but was more limited in scope than the neural network approaches. Other approaches to weather forecasting used Bayesian networks. One interesting model [2] used Bayesian networks to model and make weather predictions, with a machine learning algorithm to find the most suitable Bayesian networks and parameters. This was computationally expensive because of the large number of different dependencies, but it performed very well. Another approach [1] focused on the more specific case of predicting severe weather for a specific geographical area, which

Table 1: Sample data showing the 5 features.

limited the need for fine-tuning Bayesian network dependencies but was also limited in scope.

2.1 Hadoop
Hadoop is widely used in Big Data applications, e.g., spam filtering, network search, clickstream analysis, and social recommendation. A few representative cases follow. Yahoo reportedly runs Hadoop on many servers at 4 data centers to support its products and services, e.g., search and spam filtering. At present, the largest Hadoop cluster has around four thousand nodes, and the number of nodes is expected to increase to around ten thousand with the release of Hadoop 2.0. Around the same time, Facebook reported that its Hadoop cluster can process 100 PB of data, which was growing by 0.5 PB per day as of November 2012. Some well-known organizations that use Hadoop for distributed computation are listed in [13]. In addition, many companies provide commercial Hadoop distributions and/or support, including Cloudera, IBM, MapR, EMC, and Oracle. According to Gartner Research, Big Data analytics was a trending topic in 2014 [14]. Hadoop is an open-source framework widely used for Big Data analytics, and MapReduce is the programming paradigm associated with Hadoop.

3. Literature Survey
A. Adamu Galadima gives a brief look at the Arduino microcontroller, some of its applications, and how it can be used in learning. Arduino is an open-source microcontroller used in electronic prototyping. The Arduino hardware and its components are examined, as are the software and the environment that Arduino runs on. A few applications are taken as examples that can help make learning Arduino more interesting. This can be used as a significant way to encourage students and others to learn more about electronics and programming.

B. Jeffrey Cohen presents data-parallel algorithms for sophisticated statistical techniques, with a focus on density methods. Finally, he reflects on database system features that enable agile design and flexible algorithm development using both SQL and MapReduce interfaces over a variety of storage mechanisms.

C. Brian Dolan presents the design philosophy, techniques, and experience of providing MAD analytics for one of the world's largest advertising networks at Fox Audience Network, using the Greenplum parallel database system. He describes database design methodologies that support the agile working style of analysts in these settings.

D. R. P. Singh explains why a cloud-based solution is required, describes a prototype implementation, and explores some example applications that demonstrate personal data [11] ownership, control, and analysis. He addresses these issues by designing and implementing a cloud-based architecture that provides consumers with fast access to and fine-grained control over their usage data, as well as the ability to analyze this data with algorithms of their choosing, including third-party applications that analyze the data in a privacy-preserving fashion.

E. Jeffrey Dean describes the basic MapReduce programming model and gives several examples. Map and Reduce run on many commodity machines: MapReduce computations process many terabytes of data on a large number [7] of machines.

F. Panagiotis D. Diamantoulakis examines the use of data analytics in fields such as the smart grid for dynamic energy management. The smart grid has a two-way flow of power and data between suppliers and consumers, which is used to optimize power in terms of economic efficiency, security, and sustainability. This infrastructure promotes dynamic energy management (DEM) for consumers and micro-energy producers. Reducing the cost of power through user participation is an important part.

G. L. Aniello investigates the feasibility of a framework that uses multiple data sources to improve the protection capabilities of critical infrastructures (CIs). Challenges and opportunities are discussed along three main research directions: i) use of distinct and heterogeneous data sources, ii) monitoring with adaptive

granularity, and iii) attack modeling and runtime combination of multiple data analysis techniques.
4. Proposed Work
The maximum temperature, minimum temperature, mean humidity, mean atmospheric pressure, and weather classification for each day from 1996 to 2017 for Delhi, India were obtained from the Weather Department website [10][11]. Originally, there are 9 weather classifications: clear, scattered clouds, partly cloudy, mostly cloudy, fog, overcast, rain, thunderstorm, and snow. Since many of these classifications are similar and some are sparsely populated, they were reduced to four weather classifications by combining scattered clouds and partly cloudy into moderately cloudy; mostly cloudy, foggy, and overcast into very cloudy; and rain, thunderstorm, and snow into precipitation. Data from earlier years were used to train the algorithms, and data from the most recent years served as the test set, with the first month as reference data, using the parameters listed in Table 1.

Table 1: Parameters for Regression and Classification

The first algorithm used was linear regression, which tries to predict the high and low temperatures as a linear combination of the features. Since linear regression cannot be used with classification data, this algorithm did not use the weather classification of each day [13]. Accordingly, only 8 features are used: the maximum temperature, minimum temperature, mean humidity, and mean atmospheric pressure for each of the past two days. Thus, for the i-th pair of consecutive days, x(i) ∈ R^9 is a 9-dimensional feature vector, where x0 = 1 is defined as the intercept term. Let y(i) ∈ R^14 denote the 14-dimensional vector that contains the quantities to be predicted for the i-th pair of consecutive days using linear regression. We further use support vector machine classification, which minimizes the error function:

subject to the constraints:

For this type of SVM the error function is:

which we minimize subject to:

The workflow model of the proposed scheme is shown in Figure 1:

4. Design and Implementation
To achieve the desired goals and results in the proposed scheme, the probabilistic methods, i.e., linear regression and SVM, have been used via Big Data MapReduce. The steps below depict the workflow and implementation of the proposed scheme.

Step 1. MapReduce using Big Data (Hadoop)
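Step 1 of the workflow runs MapReduce on Hadoop. The paper does not show its job, so the following is only a minimal in-memory sketch of the map, shuffle, and reduce phases, assuming a hypothetical job that averages the daily maximum temperature per month. The record layout, the function names, and the sample values are all illustrative assumptions, and no Hadoop API is used.

```python
from collections import defaultdict

# Hypothetical daily records: (date "YYYY-MM-DD", maximum temperature in deg C).
# The values are made up for illustration; they are not from the paper's dataset.
RECORDS = [
    ("2017-01-01", 20.0),
    ("2017-01-02", 22.0),
    ("2017-02-01", 25.0),
    ("2017-02-02", 27.0),
]


def map_record(record):
    """Map phase: emit one (month, temperature) pair for a daily record."""
    date, max_temp = record
    yield date[:7], max_temp


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_month(month, temps):
    """Reduce phase: average the temperatures collected for one month."""
    return month, sum(temps) / len(temps)


def monthly_mean(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_record(record))
    groups = shuffle(pairs)
    return dict(reduce_month(month, temps) for month, temps in sorted(groups.items()))


if __name__ == "__main__":
    print(monthly_mean(RECORDS))
```

On a real cluster the map and reduce functions would run on different machines and the framework would perform the shuffle; the sketch only shows how the work is split between the phases.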

Step 2. Linear Regression:
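The linear regression step can be illustrated with a small sketch. It assumes the feature layout described under Proposed Work (an intercept term plus four measurements for each of two consecutive days, giving a 9-dimensional vector) and fits the coefficients by ordinary least squares through the normal equations. The function names are hypothetical, and the fitting method is an assumption, since the paper does not state how its coefficients were computed.

```python
def feature_vector(day_before, day):
    """Build x in R^9: the intercept term x0 = 1 plus four measurements per day.

    Each day is a tuple (max_temp, min_temp, mean_humidity, mean_pressure).
    """
    return [1.0, *day_before, *day]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular; a singular matrix raises ZeroDivisionError.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(xs, ys):
    """Return theta minimizing sum((theta . x - y)^2) via X^T X theta = X^T y."""
    d = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve(xtx, xty)


def predict(theta, x):
    """Predict one target as a linear combination of the features."""
    return sum(t * v for t, v in zip(theta, x))


if __name__ == "__main__":
    # Tiny two-parameter demonstration with made-up points on the line y = 1 + 2x.
    theta = fit_least_squares([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 3.0, 5.0])
    print(theta, predict(theta, [1.0, 3.0]))
```

With the 14-dimensional target described in the paper, one coefficient vector would be fitted per target component, each over the same 9-dimensional feature vectors.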


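For the support vector machine step, the Proposed Work section names two error functions and their constraints. For orientation, the standard textbook soft-margin forms are sketched below. That the paper used exactly these forms is an assumption, and the symbols (weight vector $w$, offset $b$, penalty $C$, slack variables $\xi_i, \xi_i^{*}$, tolerance $\varepsilon$, feature map $\phi$, and $N$ training pairs $(x_i, y_i)$) are introduced here for illustration only.

```latex
% Classification SVM (soft margin), with labels y_i in {-1, +1}:
\begin{align}
  \min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\, w^{\top} w + C \sum_{i=1}^{N} \xi_i \\
  \text{subject to}\quad  & y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i,
                            \qquad \xi_i \ge 0, \qquad i = 1, \dots, N .
\end{align}

% Regression SVM (epsilon-insensitive), with real-valued targets y_i:
\begin{align}
  \min_{w,\,b,\,\xi,\,\xi^{*}}\quad & \tfrac{1}{2}\, w^{\top} w
                            + C \sum_{i=1}^{N} \xi_i + C \sum_{i=1}^{N} \xi_i^{*} \\
  \text{subject to}\quad  & w^{\top} \phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
                          & y_i - w^{\top} \phi(x_i) - b \le \varepsilon + \xi_i, \\
                          & \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, N .
\end{align}
```

In both problems a larger $C$ penalizes training errors more heavily, at the cost of a narrower margin.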
Step 3. Support Vector Machine:

Results
In total, between 6100 and 7800 records were obtained from the meteorological department for regression. At least seven attributes were regressed, and the resulting combinations were calibrated under this scheme. In other words, the 7800 rows forming the dataset were selected for use in this study. The linear regression model is depicted below.
The goodness-of-fit [7] statistics for the model calibrations are given in the equation below, and the calibrated coefficients are shown in Table 4. The standard error (Se) is calculated as:

5. Conclusion and Future Work

Both machine learning algorithms, run on Hadoop, achieved reasonable accuracy but were outperformed by professional weather forecasting services. However, the error in their performance decreased significantly for later days (approximately the next 5 days), which suggests that over longer time frames our models may outperform professional ones. Linear regression turned out to be a low-bias, high-variance model, while functional regression turned out to be a high-bias, low-variance model.

The results are, as demonstrated, high and accurate, and the model is stable with respect to outliers in forecasting, so one way to improve the linear regression model is to collect more data for linear regression and SVM. This shows that the choice of model was efficient and effective, and that its predictions can be improved by further collection of data under the proposed scheme. In future work, the scheme can be implemented on Apache Spark for concurrent weather prediction, and its output can be compared with results obtained from sensors.

References
[1] Abramson, Bruce, et al. "Hailfinder: A Bayesian system for forecasting severe weather." International Journal of Forecasting 12.1 (1996): 57-71.
[2] Cofiño, Antonio S., et al. "Bayesian networks for probabilistic weather prediction." 15th European Conference on Artificial Intelligence (ECAI). 2002.
[3] Krasnopolsky, Vladimir M., and Michael S. Fox-Rabinovitz. "Complex hybrid models combining deterministic and machine learning components for numerical climate modeling and weather prediction." Neural Networks 19.2 (2006): 122-134.
[4] Lai, Loi Lei, et al. "Intelligent weather forecast." Machine Learning and Cybernetics, 2004. Proceedings of 2004 International Conference on. Vol. 7. IEEE, 2004.
[5] Ng, Andrew. "CS229 Lecture Notes Supervised Learning" 2016.
[6] Radhika, Y., and M. Shashi. "Atmospheric temperature prediction using support vector machines." International Journal of Computer Theory and Engineering 1.1 (2009): 55.
[7] "Stanford, CA" in Weather Underground, The Weather Company, 2016. [Online]. Available: https://www.wunderground.com/us/ca/paloalto/zmw:94305.1.99999. Accessed: Nov 20, 2016.
[8] Gupta, Subham Kumar, Seema Rawat, and Praveen Kumar. "A novel based security architecture of cloud computing." In Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), 2014 3rd International Conference on, pp. 1-6. IEEE, 2014.
[9] Saini, Parag, Tanupriya Choudhury, Praveen Kumar, and Seema Rawat. "Proposal and implementation of a novel scheme for image and emotion recognition using Hadoop." In Smart Technologies For Smart Nation (SmartTechCon), 2017 International Conference On, pp. 1358-1363. IEEE, 2017.
[10] Weather.com, http://www.weather.com
[11] Kadambari, Sanchita, Seema Rawat, and Praveen Kumar. "A Comprehensive Study on Big Data and Its Future Opportunities." In Proceedings of the 2014 Fourth International Conference on Advanced Computing & Communication Technologies, pp. 277-281. IEEE Computer Society, 2014.
[12] Gupta, Subham Kumar, Seema Rawat, and Praveen Kumar. "A novel based security architecture of cloud computing." In Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), 2014 3rd International Conference on, pp. 1-6. IEEE, 2014.
[13] Wiki (2013). Applications and organizations using Hadoop. http://wiki.apache.org/hadoop/PoweredBy
[14] Gartner Research Cycle 2014, http://www.gartner.com
[15] K. Morton, M. Balazinska and D. Grossman, "ParaTimer: a progress indicator for MapReduce DAGs", in Proceedings of the 2010 International Conference on Management of Data, 2010, pp. 507-518.
[16] Lu, Wei, et al. "Efficient processing of k nearest neighbor joins using MapReduce", Proceedings of the VLDB Endowment, Vol. 5, No. 10, 2012, pp. 1016-1027.
[17] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters", in OSDI 2004: Proceedings of the 6th Symposium on Operating Systems Design and Implementation.
