A Survey of Big Data and Machine Learning
Corresponding Author:
Surender Reddy Salkuti,
Department of Railroad and Electrical Engineering,
Woosong University,
17-2, Jayang-Dong, Dong-Gu, Daejeon 34606, Republic of Korea.
Email: [email protected]
1. INTRODUCTION
Artificial intelligence (AI) technologies can improve communication and cooperation between humans
and machines. We now live in the era of big data, an age characterized by the rapid accumulation of
pervasive data. In many enterprises, big data is growing and providing a way to improve and streamline
business. Many fields and sectors, ranging from economic and business activities to public administration,
and from national security to scientific research in many areas, are involved with big data problems [1].
Big data has changed the world in terms of predicting customer behavior. Any introduction to big data
must mention another currently popular term, social networks; the connection between the two is evident,
yet complicated. During the last few years, wind, solar, hydro and nuclear power companies have greatly
benefited from the power of AI, big data, machine learning and predictive models. They have used these
technologies to make better predictions, to increase their portfolio's rate of return and to lower
their costs [2].
AI emerged in the 1950s and 1960s. It was mostly about programming machines to do things on their
own, which later grew into fields such as robotics. From the early 1990s until 2010, machine learning (ML)
came into the picture, with many different algorithms, approaches and theories being discovered and
invented to enable machines to learn on their own. From 2010 onwards, deep learning emerged as a new
field. ML is a subset of AI, and deep learning is a subset of ML. ML is an interdisciplinary field which
allows us to achieve some form of AI by using statistical techniques [3].
Big data and social networks are interdependent, because the majority of present data is generated
from social networking sites, yet big data is not always useful. The real challenge of big data is not in
collecting it, but in managing it and making sense of it. Several tools are being designed to better
understand the role of massive amounts of data in improving business. Researchers and practitioners are
endeavouring to explore the future of big data to extract more benefits. Conventional numerical
approaches are computationally expensive, and hence are difficult to use for on-line security assessment.
ML approaches, with their learning capabilities, pattern recognition and high speed in identifying
potential security boundaries, can offer an alternative method [4].
Reference [5] presents the definition of big data application scenarios through examples in different
segments of transportation and energy sectors. Reference [6] proposes a platform which can provide
a technical solution to multidisciplinary cooperation of smart grid (SG) monitoring and big data technology.
An assessment of distinguished aspects in big data analytics developments in the domain of power systems
has been presented in [7]. A new core-broker-client system architecture for big data analytics is proposed
in [8]. An overview and potential of big data for smart energy management system is presented in [9].
The ongoing work of application of ML on dynamic security assessment of power systems is
addressed in [10]. A comprehensive review of applications of deep learning approaches on machine health
monitoring tasks is presented in [11]. Machine learning for hourly solar forecasting application is proposed
in [12]. Reference [13] reviews the ML models that are used for condition monitoring in wind turbines.
Reference [14] presents an attack detection model for power systems based on ML that can be trained by
using information and logs collected by phasor measurement units. A method to address the problem of
suggesting the most suitable components for each user by creating a recommender system using intelligent
data analysis is proposed in [15]. Reference [16] trains an ML model to predict the duration of big data
workloads.
First, this paper presents the concepts of big data analytics and ML techniques. One needs to
determine the power system problem to which ML or big data is to be applied. If the problem can be
solved by classical methods with the desired accuracy within the available time frame, then there is no
need for these AI techniques. Generally, ML is used for forecasting, as a large amount of data is available.
Given a rich data set, ML can narrow down the data using either a model-based approach or observed
data. ML will not give satisfactory results for highly complex and dynamic power system problems,
such as load flow, contingency analysis, transient stability analysis, etc.
Int J Elec & Comp Eng, Vol. 10, No. 1, February 2020 : 575 - 580
Int J Elec & Comp Eng ISSN: 2088-8708 577
recognition, image recognition and many more [20]. Every day we use it several times without knowing it.
The process of ML is depicted in Figure 1.
[Figure 1. Process of ML. Block labels: Training Data, Machine Learning Algorithm, Machine Learning,
Hypothesis, Performance, Representation]
It is like predicting the future. Companies can anticipate what kind of decision you are going to
take in a given situation. Given certain parameters, they can predict your reaction, because they hold
data showing what you did in similar situations in the past: every time you were in such an environment,
you took certain actions. This is what machine learning does: data is provided, feature extraction takes
place, and a predictive model is then built and rolled out to the users from whom the data was
taken [21].
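The data, feature extraction and predictive model steps described above can be sketched as follows.
This is only a minimal illustration, not the method of [21]: the usage history, the action names and
the helper functions extract_features, build_model and predict are all hypothetical, and the "model"
is simply a table of how often each action was observed in each context.

```python
from collections import Counter, defaultdict

# Hypothetical usage history: ((hour of day, day type), action taken).
history = [
    ((7, "weekday"), "heating_on"),
    ((8, "weekday"), "heating_on"),
    ((9, "weekday"), "kettle_on"),
    ((19, "weekday"), "cooking"),
    ((20, "weekday"), "cooking"),
    ((21, "weekday"), "tv"),
    ((10, "weekend"), "cooking"),
]


def extract_features(hour, day_type):
    """Feature extraction: reduce the raw hour to a coarse period of the day."""
    if hour < 12:
        period = "morning"
    elif hour < 18:
        period = "afternoon"
    else:
        period = "evening"
    return (period, day_type)


def build_model(records):
    """Build the predictive model: count the actions seen in each context."""
    counts = defaultdict(Counter)
    for (hour, day_type), action in records:
        counts[extract_features(hour, day_type)][action] += 1
    return counts


def predict(model, hour, day_type):
    """Return the most frequent past action for this context, or None if unseen."""
    context = extract_features(hour, day_type)
    if context not in model:
        return None
    return model[context].most_common(1)[0][0]


model = build_model(history)
print(predict(model, 6, "weekday"))   # most frequent weekday-morning action
print(predict(model, 22, "weekday"))  # most frequent weekday-evening action
print(predict(model, 15, "weekend"))  # context never observed in the history
```

A practical system would replace the frequency table with a trained ML model, but the flow from raw
data to extracted features to a model that is queried for new situations stays the same.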
ML consists of two phases, i.e., the learning and prediction phases. Figure 2 depicts the learning
phase of the ML process. Supervised, unsupervised, semi-supervised and reinforcement learning are some of
the popular types of machine learning. Supervised learning requires humans to train the model by providing
inputs and the desired outputs [22]. Unsupervised learning, in contrast, learns on its own without any
labeled responses. Supervised and unsupervised learning use either labeled or unlabeled data, whereas
semi-supervised learning uses both labeled and unlabeled data for training. Reinforcement learning
algorithms learn by trial and error, favouring the actions that yield the greatest reward. Reinforcement
learning is a type of ML and hence also a branch of AI. In the prediction phase, new data is fed to the
model developed in the learning phase, and the model produces the predicted output [23].
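The two phases can be illustrated with a very small supervised learner. The sketch below uses a
nearest-centroid classifier, chosen here only for brevity and not taken from the cited works. The
feature values (bus voltage in p.u., line loading as a fraction of rating) and the labels "secure"
and "insecure" are made-up assumptions.

```python
from math import dist


def learn(samples, labels):
    """Learning phase: compute one centroid (mean feature vector) per label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model, x):
    """Prediction phase: return the label whose centroid is closest to x."""
    return min(model, key=lambda y: dist(model[y], x))


# Hypothetical labeled training data: (voltage in p.u., line loading fraction).
samples = [(1.00, 0.50), (0.98, 0.60), (1.02, 0.55),
           (0.90, 0.95), (0.88, 1.00), (0.92, 0.90)]
labels = ["secure", "secure", "secure",
          "insecure", "insecure", "insecure"]

model = learn(samples, labels)          # learning phase on labeled data
print(predict(model, (0.99, 0.58)))     # prediction phase on new data
print(predict(model, (0.91, 0.97)))
```

The labels supplied by a human are what make this supervised learning; dropping them and grouping
the samples by similarity alone would turn the same data into an unsupervised problem.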
Some of the commonly used ML algorithms include regression analysis, clustering, association
rule mining and collaborative filtering. Regression analysis helps to predict the relation between two
variables (i.e., dependent and independent variables). In clustering, the algorithm tries to find some
structure in a given data set without known class labels. Association rule mining, also known as affinity
analysis, tries to find associations between objects, i.e., what goes with what [24-25]. It is basically
a set of analytical techniques used to uncover connections and associations between objects.
Collaborative filtering is a technique for predicting a user's preferences based on the preferences of
a group of users. This technique is used when we need more complex information than keywords.
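Two of these algorithms are simple enough to sketch in a few lines. The example below is a hedged
illustration only: fit_line performs an ordinary least-squares fit of one dependent variable on one
independent variable, and rule_metrics computes the standard support and confidence measures of an
association rule. The function names, the temperature and load figures and the appliance transactions
are all hypothetical.

```python
def fit_line(xs, ys):
    """Regression: least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def rule_metrics(transactions, antecedent, consequent):
    """Association rule antecedent -> consequent: returns (support, confidence)."""
    n = len(transactions)
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    support = len(with_both) / n
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Hypothetical data: ambient temperature (independent) vs. load (dependent).
temperature = [20, 25, 30, 35]
load = [50, 60, 70, 80]
a, b = fit_line(temperature, load)
print(a, b)  # intercept and slope of the fitted line

# Hypothetical sets of appliances that were switched on together.
transactions = [{"ac", "fan"}, {"ac", "fan", "heater"}, {"ac"}, {"fan"}]
support, confidence = rule_metrics(transactions, {"ac"}, {"fan"})
print(support, confidence)
```

Here support is the fraction of all transactions that contain both item sets, and confidence is the
fraction of transactions containing the antecedent that also contain the consequent.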
5. CONCLUSIONS
A detailed analysis of big data and machine learning (ML) in the electrical power and energy sector,
including the smart grid, has been presented in this paper. Big data analytics involves the processes of
searching a database, mining, and analysing data in order to improve the performance of a company.
ML focuses on the development of computer programs that can teach themselves to grow and change when
exposed to new data. Applications of big data and ML in various industries, such as electrical power and
energy (including the smart grid), transportation, health care, education, e-commerce, financial services,
and marketing and sales, have been discussed. Various challenges and opportunities related to big data
and machine learning are also reviewed in this paper.
ACKNOWLEDGEMENTS
This research work has been carried out based on the support of “Woosong University's Academic
Research Funding - 2019”.
REFERENCES
[1] R. J. Bessa, “Chapter 10 - Future Trends for Big Data Application in Power Systems,” Big Data Application in
Power Systems, pp. 223-242, 2018.
[2] Y. Zhang, et al., “A big data driven analytical framework for energy-intensive manufacturing industries,” Journal
of Cleaner Production, vol. 197, pp. 57-72, 2018.
[3] “How Machine Learning, Big Data, & AI Are Changing Energy,” [Online], Available:
https://rapidminer.com/blog/machine-learning-big-data-ai-energy/
[4] N. V. Tomin, et al., “Machine Learning Techniques for Power System Security Assessment,” IFAC-PapersOnLine,
vol. 49, pp. 445-450, 2016.
[5] S. Rusitschka and E. Curry, Big Data in the Energy and Transport Sectors, in Cavanillas J., Curry E., Wahlster W.
(eds), “New Horizons for a Data-Driven Economy,” Springer, Cham, 2016.
[6] Y. Guo, et al., “Complex Power System Status Monitoring and Evaluation Using Big Data Platform and Machine
Learning Algorithms: A Review and a Case Study,” Complexity, vol. 2018, pp. 1-21, 2018.
[7] H. A. Hejazi and H. M. Rad, “Power systems big data analytics: An assessment of paradigm shift barriers and
prospects,” Energy Reports, vol. 4, pp. 91-100, 2018.
[8] T. Wilcox, et al., “A Big Data platform for smart meter data analytics,” Computers in Industry, vol. 105,
pp. 250-259, 2019.
[9] K. Zhou, et al., “Big data driven smart energy management: From big data to big insights,” Renewable and
Sustainable Energy Reviews, vol. 56, pp. 215-225, 2016.
[10] E. M. Voumvoulakis, et al., “Application of Machine Learning on Power System Dynamic Security Assessment,”
International Conference on Intelligent Systems Applications to Power Systems, Toki Messe, Niigata,
pp. 1-6, 2007.
[11] R. Zhao, et al., “Deep learning and its applications to machine health monitoring,” Mechanical Systems and Signal
Processing, vol. 115, pp. 213-237, 2019.
[12] G. M. Yagli, et al., “Automatic hourly solar forecasting using machine learning models,” Renewable and
Sustainable Energy Reviews, vol. 105, pp. 487-498, 2019.
[13] Stetco, et al., “Machine learning methods for wind turbine condition monitoring: A review,” Renewable Energy,
vol. 133, pp. 620-635, 2019.
[14] D. Wang, et al., “Detection of power grid disturbances and cyber-attacks based on machine learning,” Journal of
Information Security and Applications, vol. 46, pp. 42-52, 2019.
[15] A. J. F. García, et al., “A recommender system for component-based applications using machine learning
techniques,” Knowledge-Based Systems, vol. 164, pp. 68-84, 2019.
[16] Á. B. Hernández, et al., “Using machine learning to optimize parallelism in big data applications,” Future
Generation Computer Systems, vol. 86, pp. 1076-1092, 2018.
[17] R. Arghandeh and Y. Zhou, Big Data Application in Power Systems, Elsevier Science, 2018.
[18] J. Lee, et al., “Data Analysis for Solar Energy Generation in a University Microgrid,” International Journal of
Electrical and Computer Engineering (IJECE), vol. 8, pp. 1324-1330, 2018.
[19] https://www2.microstrategy.com/producthelp/10.10/WebUser/WebHelp/Lang_1033/Content/mstr_big_data.htm
[20] M. Farhadi and N. Mollayi, “Application of the least square support vector machine for point-to-point
forecasting of the PV power,” International Journal of Electrical and Computer Engineering (IJECE), vol. 9,
pp. 2205-2211, 2019.
[21] V. Malbasa, et al., “Voltage Stability Prediction Using Active Machine Learning,” IEEE Transactions on Smart
Grid, vol. 8, pp. 3117-3124, 2017.
[22] Rahman, et al., “Power disaggregation of combined HVAC loads using supervised machine learning algorithms,”
Energy and Buildings, vol. 172, pp. 57-66, 2018.
[23] Md. A. Rahman, et al., “A Survey of Machine Learning Techniques for Self-tuning Hadoop Performance,”
International Journal of Electrical and Computer Engineering (IJECE), vol. 8, pp. 1854-1862, 2018.
[24] S. Preda, et al., “PV Forecasting Using Support Vector Machine Learning in a Big Data Analytics Context,”
Symmetry, vol. 10, 2018.
[25] C. S. Sindhu and N. P. Hegde, “A Novel Integrated Framework to Ensure Better Data Quality in Big Data
Analytics over Cloud Environment,” International Journal of Electrical and Computer Engineering (IJECE),
vol. 7, pp. 2798-2805, 2017.
[26] R. Eskandarpour and A. Khodaei, “Machine Learning Based Power Grid Outage Prediction in Response to
Extreme Events,” IEEE Transactions on Power Systems, vol. 32, pp. 3315-3316, 2017.
[27] G. Bathla, et al., “A Novel Approach for Clustering Big Data based on MapReduce,” International Journal of
Electrical and Computer Engineering (IJECE), vol. 8, pp. 1711-1719, 2018.
[28] W. Xiang, et al., “Machine learning based optimization for vehicle-to-infrastructure communications,” Future
Generation Computer Systems, vol. 94, pp. 488-495, 2019.
[29] C. Tu, et al., “Big data issues in smart grid – A review,” Renewable and Sustainable Energy Reviews, vol. 79,
pp. 1099-1107, 2017.
[30] B. A. S. Leech, et al., “Big Data issues and opportunities for electric utilities,” Renewable and Sustainable Energy
Reviews, vol. 52, pp. 937-947, 2015.
[31] R. Y. Zhong, et al., “Big Data for supply chain management in the service and manufacturing sectors: Challenges,
opportunities, and future perspectives,” Computers & Industrial Engineering, vol. 101, pp. 572-591, 2016.
[32] D. Radhika and D. A. Kumari, “Misusability Measure Based Sanitization of Big Data for Privacy Preserving
MapReduce Programming,” International Journal of Electrical and Computer Engineering (IJECE), vol. 8,
pp. 4524-4532, 2018.
[33] R. A. Archana, et al., “A Study on Big Data Privacy Protection Models using Data Masking Methods,”
International Journal of Electrical and Computer Engineering (IJECE), vol. 8, pp. 3976-3983, 2018.
[34] E. Hossain, et al., “Application of Big Data and Machine Learning in Smart Grid, and Associated Security
Concerns: A Review,” IEEE Access, vol. 7, pp. 13960-13988, 2019.
[35] T. Yuan, et al., “HyperOXN: A Novel Data Center Topology Driven by Machine Learning,” 13th APCA
International Conference on Automatic Control and Soft Computing, pp. 573-578, 2018.
[36] M. K. Saggi and S. Jain, “A survey towards an integration of big data analytics to big insights for value-creation,”
Information Processing & Management, vol. 54, pp. 758-790, 2018.