Data Science and Interdisciplinary Research: Recent Trends and Applications
About this ebook
Pradeep Kumar Singh
Pradeep Kumar Singh is Assistant Professor at the Department of Computer Science & Engineering, Jaypee University of Information Technology, India. His research focuses on smart nanosensors for communication technologies.
Data Science and Interdisciplinary Research - Pradeep Kumar Singh
A Comprehensive Study and Analysis on Prediction of Rainfall Across Multiple Countries using Machine Learning
C. Kishor Kumar Reddy¹, *, P.R. Anisha¹, Nguyen Gia Nhu²
¹ Stanley College of Engineering and Technology for Women, Hyderabad, India
² Dean, Graduate School, Duy Tan University, Da Nang, Vietnam
Abstract
Rainfall is one of the most significant natural phenomena and is vital to humans and all other living beings. As the climate changes, rainfall cycles are shifting and the earth's temperature is rising steadily. Changes in weather conditions such as humidity, pressure, wind speed, dew point and temperature affect agriculture, industry, production and construction, and can lead to floods and landslides. Keeping track of these phenomena is therefore essential for human survival. To address these issues, a system is required that can forecast rainfall using statistical techniques, which are among the most widely used tools today. This paper provides a detailed survey and comparative analysis of the methodologies used to predict rainfall across multiple countries. The comparison is made in terms of several performance measures: accuracy, precision, recall, RMSE, specificity, sensitivity, MAE, F-measure, ROC and RAE. Further, the drawbacks of the approaches applied so far are discussed.
Keywords: Artificial Neural Networks, Classification Techniques, Decision Trees, Naïve Bayes, Rainfall, Random Forest, SVM.
* Corresponding author C. Kishor Kumar Reddy: Stanley College of Engineering and Technology for Women, Hyderabad, India; E-mail: [email protected].
INTRODUCTION
Weather forecasting has become one of the most pressing problems in the world, and many researchers, governments, industries, risk management communities and scientific communities are working on it. Weather affects daily human activities such as farming, production, construction, electricity generation and forestry. Forecasting matters because changes in the weather may lead to natural calamities such as landslides and hurricanes, which cause great loss to society. A reliable approach to rainfall prediction is therefore needed so that preventive measures can be taken against these calamities. Such forecasts help in planning human activities such as agriculture, production, construction and tourism. They also give disaster prevention agencies the information they need to take corrective measures and make decisions that protect society from natural calamities.
Weather forecasting has a major impact on human life and activities. Because the weather is changing ever more rapidly, it is crucial to conduct research on weather prediction, to provide timely precipitation data, and to give early warning so that natural disasters can be avoided. Rainfall prediction is one of the most crucial areas of weather forecasting. Rainfall is an essential process within the climate system, and its chaotic nature strongly affects water resources and agricultural and biological systems. Rainfall prediction is therefore important for farming, tourism, navigation and sailing. Advances in science and technology have made weather-related and meteorological data much easier to collect and organize. Weather forecasting requires a large amount of meteorological data, and even then good accuracy is difficult to attain. Progress in information technology and computing now allows researchers to analyse large volumes of data with big data analytics and to uncover hidden relationships using machine learning techniques.
Numerous natural disasters across the world are linked to meteorological phenomena. In recent years, many machine learning techniques have been proposed to tackle a wide range of problems. In this paper, multiple machine learning models for precipitation forecasting in different parts of the world are discussed. It has been observed that these methods can make predictions with a lower error rate. A further advantage of machine learning methods is computational performance: they run faster and with far fewer computing resources than the methods based on differential equations that are presently used in operational centres. It is therefore worthwhile to consider machine learning models for precipitation forecasts in operational centres as a way to improve prediction quality and reduce computational cost. The combination of machine learning with meteorological science has encouraged scientists, academicians and researchers to apply these methods to classify and predict events under different climatic conditions.
Predictions can be made using several methods, such as machine learning techniques for classification and prediction, soft computing, and more complex methodologies such as artificial neural networks (ANNs). These are the most common techniques used for weather prediction. Because ANNs learn from examples and are self-adaptive, they can capture functional relationships in data even when those relationships are unknown or difficult to describe. Complex problems can be addressed using deep learning techniques. Deep learning refers to networks of several stacked layers that can be trained using unsupervised methodologies. Learning dense, non-linear representations of the data in this way builds knowledge that can be used to make predictions on new data. The approach is used in natural language processing, bioinformatics, object recognition and computer vision. Deep learning has shown promise for modelling time series data through methods such as the restricted Boltzmann machine (RBM), conditional RBM, autoencoders, recurrent neural networks, convolution and pooling, and hidden Markov models. The main aim of this paper is to present a wide survey of traditional statistical methodologies along with modern machine learning methodologies for accurate rainfall prediction. Further, rainfall predictions obtained with different approaches are compared, and some reasonable solutions for predicting weather efficiently are recommended.
RELEVANT WORK
Rainfall prediction is a demanding task, particularly when accurate and exact values are expected. It is an active research topic in science and technology, since rainfall has a large impact on the socio-economic life of human beings and on all living beings.
Du et al. [1] proposed a deep belief network methodology for forecasting precipitation, using one year of meteorological data collected from the Nanjing station. The authors discussed the application of big data processing methodologies to meteorological datasets. The proposed methodology uses deep belief networks to build a statistical model relating precipitation characteristics to other meteorological information. Abhisheka et al. [2] examined the capability of ANNs by developing an efficient and consistent non-linear predictive model for analysing weather. The authors compared and assessed the performance of the model using different transfer functions, numbers of hidden layers and neurons, and 365 days of temperature data.
Sun et al. [3] investigated the feasibility of developing decadal prediction models for autumn rainfall (RA) in central Vietnam, using a published tree-ring reconstruction of October to November rainfall derived from earlywood width measurements. Harvey et al. [4] examined how rainfall patterns relate to normal climatic conditions and to the rate of occurrence of rainfall cycles. To assess the behaviour of rainfall, the authors collected data from regions of Brazil that frequently suffer from droughts. They used a modelling approach that allows cyclical components to be modelled explicitly, and found that the cyclical components are random rather than deterministic.
Kuo and Sun [5] used an intervention model for 10-day streamflow forecasting. The authors examined and synthesized the characteristics that influence the abnormal conditions caused by storms and severe climatic variability near the Tanshui river valley in Taiwan. Chiew et al. [6] evaluated 6 rainfall-runoff modelling methodologies for reproducing daily, monthly and yearly streamflow in 8 irregular watershed areas. The authors found that the time series methodology performs well and efficiently estimates the monthly and yearly yield of the water resources in the watershed area.
Langu et al. [7] used a time series methodology to detect changes in rainfall and runoff patterns. The authors applied statistical methodologies to examine these patterns, and the patterns identified revealed significant changes in rainfall timing and rainfall statistics. The approach to statistical modelling of univariate series known as Univariate Box-Jenkins (UBJ) ARIMA modelling was developed in the early 1970s. Building on this approach, researchers have developed many other methodologies, e.g., time series decomposition, exponential smoothing, vector ARIMA and ARMAX.
Carter and Elsner [8] built on the results of a factor analysis regionalization of non-tropical-storm convective rainfall over the island of Puerto Rico. The authors used statistical methodologies to assess how well rainfall can be forecast over limited areas. The island regionalization was performed on a 15-year dataset, while 3 years of surface and rainfall data were used for prediction. Surface data from two first-order stations were used as input to a partially adaptive classification tree to predict the occurrence of heavy rain on new data.
Al-Ansari and Baban [9] presented a statistical analysis of rainfall at 3 meteorological stations in Jordan: Amman airport (central Jordan), Irbid (northern Jordan) and Mafraq (eastern Jordan). Conventional statistics, power spectrum analysis and ARIMA models were applied to the annual rainfall series from the 3 stations. The results showed periodicities of 2.3-3.45, 2.5-3.4 and 2.44-4.1 years for Amman, Irbid and Mafraq respectively. A statistical model was fitted and checked for each station, and finally an ARIMA model was developed for each station with a 95% confidence interval. The models were able to predict annual rainfall for 5 years at the Amman, Irbid and Mafraq stations.
Al-Ansari et al. [10] performed a statistical analysis of rainfall records at 3 key meteorological stations in Jordan. The authors applied ordinary statistics, harmonic and power spectrum analysis, and time series analysis. At every station, an ARIMA model was developed with a 95% confidence interval. The models showed decreasing trends in predicted rainfall at all the stations. Ingsrisawang et al. [11] applied three statistical methods, namely the first-order Markov chain, the logistic model and the generalized estimating equation (GEE), to model rainfall in the eastern region of Thailand. Two datasets, Meteor and GPCM, were collected daily from 2004 to 2008 and were combined to obtain the GPCM+Meteor dataset. The first-order Markov chain methodology was applied to the Meteor dataset.
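To make the first-order Markov chain idea concrete, the sketch below estimates the probability of a wet or dry day given the state of the previous day. It is only an illustration of the general technique, not the procedure of [11]: the function name estimate_transitions and the ten-day record are hypothetical, and days are assumed to be coded as 0 (dry) or 1 (wet).

```python
from collections import Counter


def estimate_transitions(wet_days):
    """Estimate first-order transition probabilities from a 0/1 sequence.

    wet_days: list of daily states, 0 = dry day, 1 = wet day (assumed coding).
    Returns a dict mapping (previous_state, next_state) to a probability.
    """
    # Count every consecutive (previous day, next day) pair in the record
    counts = Counter(zip(wet_days[:-1], wet_days[1:]))
    probs = {}
    for prev in (0, 1):
        total = counts[(prev, 0)] + counts[(prev, 1)]
        for nxt in (0, 1):
            # The probability is left undefined (None) if the state never occurs
            probs[(prev, nxt)] = counts[(prev, nxt)] / total if total else None
    return probs


# Hypothetical ten-day record, for illustration only
record = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
p = estimate_transitions(record)
print(p[(0, 1)])  # estimated probability that a wet day follows a dry day
```

With a real daily series, the same counts give the wet-after-wet and wet-after-dry probabilities on which a first-order occurrence model is built.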
Seyed et al. [12] modelled weather parameters using stochastic methods such as ARIMA. The authors applied a time series approach to climate parameters in Abadeh, Iran, and suggested that ARIMA (0,0,1),(1,1,1) is appropriate for monthly rainfall and ARIMA (2,1,0),(2,1,0) for average monthly temperature at Abadeh. Mahsin et al. [13] applied the Box-Jenkins method to build a seasonal ARIMA model for monthly rainfall data from the Dhaka station in Bangladesh. The data cover 30 years, from 1981 to 2010. In that study, the ARIMA (0,0,1),(0,1,1) model was found to be satisfactory and can be applied for predicting monthly rainfall.
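For readers unfamiliar with the notation, the seasonal ARIMA (0,0,1),(0,1,1) model can be written out as follows. This is a sketch under two assumptions that the text does not state: a seasonal period of s = 12, which is the usual choice for monthly data, and the Box-Jenkins sign convention for the moving-average terms. Here y_t is the monthly rainfall, B is the backshift operator, \theta_1 and \Theta_1 are the non-seasonal and seasonal moving-average coefficients, and \varepsilon_t is white noise.

```latex
% Seasonal ARIMA(0,0,1)(0,1,1)_s with assumed period s = 12
(1 - B^{12})\, y_t = (1 - \theta_1 B)\,(1 - \Theta_1 B^{12})\, \varepsilon_t
% Expanded form
y_t = y_{t-12} + \varepsilon_t - \theta_1 \varepsilon_{t-1}
      - \Theta_1 \varepsilon_{t-12} + \theta_1 \Theta_1 \varepsilon_{t-13}
```

In words, each month is forecast from the same month of the previous year, corrected by the most recent and the year-old forecast errors.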
Neural networks are widely used for modelling a wide range of non-linear hydrological processes, such as climate forecasting. The ASCE task committee has reported on the uses of ANNs in the geophysical sciences [14]. Hu et al. [15] applied ANNs to weather prediction. This was the first effort to apply soft computing in this area, and it laid the foundation for later climate-related research. French et al. [16] proposed a two-dimensional model that forecasts rainfall 1 hour ahead. The authors used an ANN, which is essentially a mathematical methodology, for predicting rainfall. The output of the model is then used as input for subsequent forecasts. The first drawback of this model is an imbalance between the interactions it represents and its training time. The second drawback is that the numbers of hidden layers and hidden nodes were insufficient, relative to the numbers of input and output nodes, to preserve the higher-order relationships needed to describe the process adequately. Although the authors faced many issues with this model, it was the first attempt to use the ANN methodology on geophysical processes. Michaelides et al. [17] compared the performance of an ANN model against multiple regression, working on missing rainfall data for the Cyprus region.
Kalogirou et al. [18] applied an ANN model to time series data in order to reconstruct rainfall data for the Cyprus region. Adya and Collopy [19] presented 11 strategies for evaluating ANN models. The authors considered neural networks for commercial forecasting and prediction of rainfall, and examined 48 studies from 1988 to 1994. For each study, they assessed how effectively the model was validated against alternatives and how effectively it was implemented. They found that 11 studies were effectively both validated and implemented, and another 11 were effectively validated and produced positive results. Of these 22 studies, neural networks gave better results in 18.
Pucheta et al. [20] developed a feed-forward neural network based non-linear autoregressive (NAR) methodology for time series forecasting. The Levenberg-Marquardt algorithm was used as the learning rule to tune the neural network weights. The methodology was evaluated on 5 time series obtained from the Mackey-Glass delay differential equation and from monthly cumulative rainfall. Three sets of parameters were used for the Mackey-Glass equation. The monthly cumulative rainfall data come from two locations in Cordoba, Argentina, over different periods: La Perla from 1962 to 1971 and Santa Francisca from 2000 to 2010. The methodology predicts 18 values for each time series, simulated by 500 Monte Carlo trials with fractional Gaussian noise to indicate the variance of the forecasts.
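The Mackey-Glass delay differential equation, dx/dt = beta * x(t - tau) / (1 + x(t - tau)^n) - gamma * x(t), is a common source of benchmark series for forecasting methods. The sketch below generates such a series with a simple Euler step. The parameter values (tau = 17, beta = 0.2, gamma = 0.1, n = 10), the constant initial history and the step size are commonly used illustrative choices and are assumptions here; the three parameter sets used in [20] are not given in the text.

```python
def mackey_glass(n_steps, tau=17, beta=0.2, gamma=0.1, n=10, x0=1.2, dt=1.0):
    """Euler discretisation of the Mackey-Glass delay differential equation.

    All default parameter values are illustrative assumptions, not the
    settings of any particular study.
    """
    delay = int(round(tau / dt))  # delay expressed in number of steps
    # Assumed initial condition: constant history x(t) = x0 for t <= 0
    x = [x0] * (delay + 1)
    for _ in range(n_steps):
        x_tau = x[-1 - delay]  # delayed value x(t - tau)
        x_now = x[-1]          # current value x(t)
        dx = beta * x_tau / (1.0 + x_tau ** n) - gamma * x_now
        x.append(x_now + dt * dx)
    # Drop the artificial history and return only the generated values
    return x[delay + 1:]


series = mackey_glass(500)
print(len(series), min(series), max(series))
```

A forecasting model can then be trained on the first part of the series and scored on the remainder, which is how such benchmark series are typically used.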
Adhikari and Agarwal [21] systematically investigated the ability of ANNs to recognize and forecast strong seasonal patterns without removing them from the raw data. Six real-world time series with dominant seasonal variations were used in this work. The empirical results showed that a properly designed ANN forecasts strong seasonal fluctuations very efficiently, and the three statistical models considered for the six time series were also found to perform well.
Nanda et al. [22] experimented with several ANN models, including the functional link ANN (FLANN), Legendre polynomial equation (LPE) and multilayer perceptron (MLP). They found that all three models performed very well on time series data. The authors also considered the ARIMA method alongside the ANN methodology. A performance evaluation was carried out in MATLAB and validated using data gathered by the Indian meteorological department from June to September 2012. The authors stated that FLANN forecasts better than ARIMA, with a very small absolute average percentage error (AAPE). Sethi et al. [23] proposed a multiple linear regression (MLR) model for predicting rainfall. The authors applied an empirical statistical methodology to 30 years of weather data, including temperature, cloud cover, vapour pressure and precipitation, from Udaipur city, Rajasthan, India. They evaluated the prediction accuracy of the model by comparing its predictions with the actual data, and presented the results graphically to show that the predicted values are close to the actual values. Prasad and Neeraj [24] investigated weather prediction using 9 years of data collected from Basara city. The authors used data mining techniques such as association rule mining, classification, aggregation and outlier analysis.
Helen et al. [25] developed models for accurate rainfall prediction in south-western Nigeria. The authors built neural network and fuzzy logic models and assessed them using mean squared error, root mean squared error, mean absolute error and accuracy. The neural network model, with an accuracy of 77.17%, performed better than the fuzzy logic model, with 68.92%. Manandhar et al. [26] proposed a systematic methodology for analysing the parameters that influence precipitation in the atmosphere. Various ground-based weather features, such as temperature, humidity, dew point, solar radiation and precipitable water vapour (PWV), together with seasonal and diurnal variables, were identified, and a comprehensive feature correlation study is presented. For rainfall classification all the weather features play a key role, but for rainfall prediction only a few, such as PWV, solar radiation, and the seasonal and diurnal features, are significant. On this basis, an optimal set of features was used in a machine learning algorithm for rainfall prediction. The authors experimented on 4 years of data, from 2012 to 2015, and reported that the model correctly evaluated 80.4% of the data, with 20.3% incorrect predictions and an accuracy of 79.6%. Compared with the existing work, this methodology reduces the number of incorrect predictions.
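The error measures used to compare the neural network and fuzzy logic models follow standard definitions, sketched below. The function names and the sample values are hypothetical and serve only to show how mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE) and accuracy are computed; they do not reproduce any figures from the studies above.

```python
import math


def regression_errors(observed, predicted):
    """Return MSE, RMSE and MAE for two equal-length lists of rainfall amounts."""
    errors = [p - o for o, p in zip(observed, predicted)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


def accuracy(observed_labels, predicted_labels):
    """Fraction of rain / no-rain labels that were predicted correctly."""
    hits = sum(o == p for o, p in zip(observed_labels, predicted_labels))
    return hits / len(observed_labels)


# Hypothetical values, for illustration only
obs_mm = [0.0, 2.0, 5.0, 1.0]   # observed rainfall amounts
pred_mm = [1.0, 2.0, 3.0, 1.0]  # predicted rainfall amounts
scores = regression_errors(obs_mm, pred_mm)
acc = accuracy([0, 1, 1, 1], [1, 1, 1, 1])
print(scores, acc)
```

RMSE penalizes large errors more heavily than MAE, so reporting both, as the studies above do, shows whether a model's error is dominated by a few large misses.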
Manandhar et al. [27] presented a comprehensive study of PWV values and a simple, efficient algorithm for predicting the onset of rainfall in the tropical region. The algorithm uses a season-dependent PWV threshold, refined by taking into account the time of day at which most rainfall occurs in each season, together with an SD threshold, to predict the onset of rainfall in the next 5 minutes from the previous 30 minutes of data. The algorithm was developed using data from the NTUS GPS station in Singapore. On the NTUS data, the authors reported an evaluation accuracy of 87.7%, with 38.6% incorrect predictions. They also applied the algorithm separately to data from two more tropical stations, SNUS and SALU, and found that it performed well and gave good accuracy. The results indicate that PWV and SD are good features for forecasting rainfall.
Agboola et al. [28] evaluated the performance of fuzzy rules/logic for modelling rainfall in south-western Nigeria. The fuzzy logic model consists of two functional components: the knowledge base and the fuzzy reasoning, or decision-making, unit. The operations carried out in the model are fuzzification and defuzzification. The predicted results were compared with the actual rainfall data, and the simulation results were in good agreement with the measured data. Performance metrics such as accuracy, root mean squared error, mean absolute error and prediction error were calculated, and the fuzzy logic methodology was found to be effective and efficient at handling scattered data. The model was also found to be flexible and capable of modelling irregular relationships between input and output variables.
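The fuzzification and defuzzification steps can be illustrated with a very small example. The sketch below is not the model of [28]: the single input (relative humidity), the triangular fuzzy sets, the two rules and their output values are all hypothetical, and defuzzification is done by a weighted average of the rule outputs, which is one simple choice among several.

```python
def triangular(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def predict_rainfall(humidity_pct):
    """Toy fuzzy estimate of rainfall (mm) from relative humidity (%)."""
    # Fuzzification: hypothetical fuzzy sets over relative humidity
    low = triangular(humidity_pct, 20.0, 40.0, 70.0)
    high = triangular(humidity_pct, 50.0, 80.0, 110.0)
    # Hypothetical rule base: low humidity -> 2 mm, high humidity -> 20 mm
    rules = [(low, 2.0), (high, 20.0)]
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return None  # no rule fires for this input
    # Defuzzification: weighted average of the rule outputs
    return sum(strength * out for strength, out in rules) / total


print(predict_rainfall(60.0))  # both rules fire to the same degree
```

In a full model, the knowledge base would hold many such rules over several inputs, and the reasoning unit would combine them in the same way.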
David Saur [29] aimed to gather extensive data on the numerical models that have the highest accuracy for forecasting convective precipitation, based on an examination of 30 situations in 2014. These predictions can be helpful in crisis situations in the Zlin region during unusual natural calamities. Urmay Shah et al. [30] predicted rainfall using a combination of machine learning and forecasting algorithms. Rainfall depends on many parameters, but good classification accuracy can be obtained using only some of them. The authors classified the rainfall into eight categories and found that the accuracy was outstanding. Validation was made on the forecasted