Big Data Analytics and Machine Learning
Industry 4.0 Interoperability, Analytics, Security, and Case Studies
Big Data for Industry 4.0: Challenges and Applications
Industry 4.0, or the fourth industrial revolution, refers to interconnectivity, automation, and
real-time data exchange between machines and processes. The tremendous growth in big data from the
Internet of Things (IoT) and information services is driving industry to develop new models and
distributed tools to handle big data. Cutting-edge digital technologies are being harnessed to
optimize and automate production, including upstream supply-chain processes, warehouse management
systems, automated guided vehicles, and drones. The ultimate goal of Industry 4.0 is to make
manufacturing and services faster, more effective, and more efficient, which can only be achieved
by embedding modern technology in machines, components, and parts that transmit real-time data to
networked IT systems. These systems, in turn, apply advanced soft computing paradigms such as
machine learning algorithms to run processes automatically, without manual operations. This book
series will provide readers with an overview of the state of the art in the field of Industry 4.0
and related research advancements. The respective books will identify and discuss new dimensions of
both risk factors and success factors, along with performance metrics that can be employed in
future research work. The series will also discuss a number of real-time issues, problems, and
applications with corresponding solutions and suggestions. By sharing new theoretical findings,
tools, and techniques for Industry 4.0, and by covering both theoretical and application-oriented
approaches, the book series will offer a valuable asset for newcomers to the field and practicing
professionals alike. The focus is to collate recent advances in the field so that undergraduate and
postgraduate students, researchers, academicians, and industry practitioners can easily understand
the implications and applications of the field.
Edited by
G. Rajesh, X. Mercilin Raajini,
and Hien Dang
First edition published 2021
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
Reasonable efforts have been made to publish reliable data and information, but the author and
publisher cannot assume responsibility for the validity of all materials or the consequences of
their use. The authors and publishers have attempted to trace the copyright holders of all material
reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write and
let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access
www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive,
Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact
[email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are
used only for identification and explanation without intent to infringe.
Typeset in Times
by Deanta Global Publishing Services, Chennai, India
Contents
Editors
Contributors
Chapter 1 Big Data Analytics and Machine Learning for Industry 4.0: An Overview
Nguyen Tuan Thanh Le and Manh Linh Pham
Contributors

S. Aruna
Department of Software Engineering
SRM Institute of Science & Technology
India

A. Devi
Department of ECE
IFET College of Engineering
Villupuram, India

T. Ananth Kumar
Department of ECE
IFET College of Engineering
Villupuram, India

L. Lakshmanan
School of Computing
Sathyabama Institute of Science and Technology
Chennai, India

S. Lakshmi
Department of Electronics and Communication Engineering
Sathyabama Institute of Science and Technology
Chennai, India

T. Samraj Lawrence
Department of IT
Dambi Dolo University
Ethiopia

Nguyen Tuan Thanh Le
Faculty of Computer Science and Engineering,
Thuyloi University
Hanoi, Vietnam

D.R. Anita Sofia Liz
Department of Computer Science and Engineering
New Prince Shri Bhavani College of Engineering and Technology
Chennai, India

Mohak Narang
Department of Software Engineering
SRM Institute of Science & Technology
Chennai, India

Hung Nguyen
Model Validation Analyst
Regions Financial Corp.
Birmingham, Alabama

G. Matthew Palmer
Department of CSE
Karunya Institute of Technology and Sciences
Coimbatore, India

J. Dinesh Peter
Department of CSE
Karunya Institute of Technology and Sciences
Coimbatore, India

Manh Linh Pham
Faculty of Information Technology
VNU University of Engineering and Technology
Hanoi, Vietnam

L. Arun Raj
Department of Computer Applications
B. S. Abdur Rahman Crescent Institute of Science & Technology
Chennai, India

R. Vignesh
School of Computing
Sathyabama Institute of Science and Technology
Chennai, India

M. Stefi Vinciya
Department of Computer Science and Engineering
New Prince Shri Bhavani College of Engineering and Technology
Chennai, India

M. Yuvaraju
Department of Electrical and Electronics Engineering
Anna University Regional Campus
Coimbatore, India
1 Big Data Analytics and Machine Learning for Industry 4.0: An Overview
Nguyen Tuan Thanh Le and Manh Linh Pham
CONTENTS
1.1 Big Data Analytics for Industry 4.0
1.1.1 Characteristics of Big Data
1.1.2 Characteristics of Big Data Analytics
1.2 Machine Learning for Industry 4.0
1.2.1 Supervised Learning
1.2.2 Unsupervised Learning
1.2.3 Semi-Supervised Learning
1.2.4 Reinforcement Learning
1.2.5 Machine Learning for Big Data
1.3 Deep Learning for Industry 4.0: State of the Art
1.4 Conclusion
Acknowledgments
References
Volume refers to the size and/or scale of datasets. There is still no universal threshold at which
data volume counts as big data, because the threshold changes over time and with the diversity of
datasets. Generally, big data volumes are described at the exabyte (EB) or zettabyte (ZB)
scale [4].
Variety refers to the diversity of data forms: structured, semi-structured, or unstructured.
Real-world datasets, coming from heterogeneous sources, are mostly unstructured or semi-structured,
which makes analysis challenging because of inconsistency, incompleteness, and noise. Therefore,
data preprocessing is needed to remove noise; it includes steps such as data cleaning, data
integration, and data transformation [5].
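The three preprocessing steps named above can be sketched on toy data. This is a minimal illustration only: the record layout, the field names (`temp_c`, `rpm`), and the specific choices of mean imputation and min-max scaling are assumptions made for the example, not methods prescribed by the chapter or by [5].

```python
from statistics import mean

# Hypothetical readings from two sources; None marks a missing value.
source_a = [{"id": 1, "temp_c": 21.5}, {"id": 2, "temp_c": None}, {"id": 3, "temp_c": 23.5}]
source_b = [{"id": 1, "rpm": 1200}, {"id": 2, "rpm": 1500}, {"id": 3, "rpm": 1800}]


def clean(records, field):
    """Data cleaning: fill missing values with the mean of the observed ones."""
    observed = [r[field] for r in records if r[field] is not None]
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def integrate(left, right, key="id"):
    """Data integration: join two sources on a shared key."""
    by_key = {r[key]: r for r in right}
    return [{**l, **by_key[l[key]]} for l in left if l[key] in by_key]


def transform(records, field):
    """Data transformation: min-max scale one field into [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [{**r, field: (r[field] - lo) / span} for r in records]


prepared = transform(integrate(clean(source_a, "temp_c"), source_b), "rpm")
```

Each step returns new records rather than mutating its input, so the steps can be reordered or swapped for other techniques without side effects.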
Velocity indicates the speed of processing data. Processing can fall into three categories: stream
processing, real-time processing, or batch processing. This characteristic emphasizes that the
speed of processing data should keep up with the speed at which data is produced [4].
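The difference between batch and stream processing can be illustrated with a running average. The sketch below is a simplified illustration under assumed inputs (the `readings` list is hypothetical sensor data); real streaming systems add windowing, fault tolerance, and distribution, none of which is shown here.

```python
from statistics import mean


def batch_mean(values):
    """Batch processing: the whole dataset is available before computing."""
    return mean(values)


class StreamingMean:
    """Stream processing: update the result one record at a time in O(1) memory."""

    def __init__(self):
        self.count = 0
        self.value = 0.0

    def update(self, x):
        self.count += 1
        self.value += (x - self.value) / self.count  # incremental mean update
        return self.value


readings = [4.0, 8.0, 6.0, 2.0]  # hypothetical sensor values
stream = StreamingMean()
running = [stream.update(x) for x in readings]
```

After the last record, the streaming result agrees with the batch result, but the streaming version had an answer available after every record and never stored the full dataset.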
Value refers to the usefulness of data for decision making. Large companies (e.g., Amazon, Google,
and Facebook) analyze large-scale datasets of users and their behavior daily to give
recommendations, improve location services, or provide targeted advertising [3].
Veracity denotes the quality and trustworthiness of datasets. Because data are so varied, accuracy
and trust become harder to achieve, yet they play an essential role in applications of big data
analytics (BDA). Examples include analyzing millions of health care entries in order to respond to
an outbreak that affects a huge number of people (e.g., the COVID-19 pandemic) or veterinary
records to predict disease in swine herds (e.g., African swine fever or porcine reproductive and
• (1) data – the more diverse and bigger the data, the better the result; (2) features – also
known as parameters or variables (e.g., age, gender, stock price, etc.), they are the factors
that the machine looks at; and (3) algorithms – the steps we follow to solve the given problem,
the choice of which affects the precision, performance, and size of our model [18].
Generally, ML algorithms can be classified into four main types: (1) supervised learning,
(2) unsupervised learning, (3) semi-supervised learning, and (4) reinforcement learning [17, 19],
as shown in Fig. 1.2.
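To make the first two types concrete, the sketch below contrasts a supervised method (labels are given) with an unsupervised one (no labels) on the same one-dimensional feature. The data values and the choice of algorithms (1-nearest-neighbor and k-means with k = 2) are assumptions made for illustration; semi-supervised and reinforcement learning are not shown.

```python
def nearest_neighbor_predict(train, query):
    """Supervised: labels are given; predict the label of the closest sample."""
    _, label = min(train, key=lambda pair: abs(pair[0] - query))
    return label


def two_means(values, iterations=10):
    """Unsupervised: no labels; split 1-D values into two clusters (k-means, k=2)."""
    c0, c1 = min(values), max(values)  # simple deterministic initialization
    for _ in range(iterations):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        if not g0 or not g1:  # degenerate split, keep the current centroids
            break
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return c0, c1


# Hypothetical vibration amplitudes (the feature) with machine states (the labels).
labeled = [(0.1, "normal"), (0.2, "normal"), (0.9, "fault"), (1.1, "fault")]

predicted = nearest_neighbor_predict(labeled, 0.8)
centroids = two_means([amplitude for amplitude, _ in labeled])
```

The supervised function needs the labels to answer at all, while the clustering function only discovers that the amplitudes fall into two groups and leaves their meaning to the analyst.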
FIGURE 1.3 (a) Typical Architecture of Deep Learning Neural Network with One Output,
One Input, and K Hidden Layers; (b) Artificial Neuron: Basic Computational Building Block
for Neural Networks
Deep learning (DL) can be employed to extract complex, high-level abstractions of data
representations. It does so by using a hierarchical, layered learning architecture in which more
abstract (i.e., higher-level) features are defined and implemented on top of less abstract
(i.e., lower-level) ones [24]; see Fig. 1.3(a). DL techniques can analyze and learn from an
enormous amount of unsupervised data, which suits BDA, where raw data is mostly unlabeled and
uncategorized [24]. We will focus on DL for Industry 4.0 in the next section.
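The layered architecture described above can be sketched as a forward pass through stacked layers of artificial neurons. This is a minimal illustration, not the architecture of any cited work: the sigmoid activation, the layer sizes, and all weight values are assumptions chosen for the example.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(x, layers):
    """Pass the input through each layer in turn (hidden layers, then output)."""
    for weight_rows, biases in layers:
        x = layer(x, weight_rows, biases)
    return x


# Hypothetical weights: 2 inputs -> 2 hidden units -> 2 hidden units -> 1 output (K = 2).
network = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0]),
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
output = forward([1.0, 1.0], network)
```

Each layer consumes the previous layer's outputs, which is how higher-level features come to be built on top of lower-level ones. Training the weights is outside the scope of this sketch.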
machines joined with SdA. Their results showed that the features detected by SdA led to better
classification than hand-crafted features.
Shao et al. [31] applied Deep Neural Networks to extract features from vibration data in a fault
diagnosis system for rotating machinery. The authors combined Denoising Auto-Encoders with
Contractive Auto-Encoders. To diagnose faults, they refined the learned features using Locality
Preserving Projection and then fed them into a softmax classifier. Seven conditions were
considered in their system: rubbing fault, compound faults (rub and unbalance), four levels of
unbalance faults, and normal operation. From the vibration data, the diagnosis system identifies
the device status, that is, whether the device is in a fault or normal condition. Experiments on
diagnosing faults of locomotive bearings and rotors showed that their approach could outperform a
Convolutional Neural Network as well as shallow learning methods.
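The final classification step can be illustrated with a plain softmax over seven condition scores. This sketch covers only the softmax stage and is not the implementation from [31]: the class order, the "level 1" to "level 4" names, and the score values are hypothetical, and the feature learning and Locality Preserving Projection stages are omitted.

```python
import math

# The seven conditions named in the text; order and level names are assumptions.
CONDITIONS = [
    "normal", "rubbing", "compound (rub + unbalance)",
    "unbalance level 1", "unbalance level 2", "unbalance level 3", "unbalance level 4",
]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most probable condition and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CONDITIONS[best], probs[best]


# Hypothetical scores, standing in for the output of the final linear layer.
label, confidence = classify([0.2, 3.1, 0.5, -1.0, 0.0, 0.3, -0.4])
```

Because softmax yields a probability per condition, the same output can drive both a hard decision (the most probable condition) and a confidence threshold for flagging uncertain diagnoses.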
Lee [32] addressed the detection of faults belonging to several defect types that often appear on
car headlight modules in a vehicle manufacturing setting, proposing a Deep Belief Network (DBN)
model together with a cloud platform and an IoT deployment. The results showed that the DBN model
outperformed two baseline methods (i.e., Radial Basis Function and Support Vector Machine) with
regard to error rate on test datasets.
1.4 CONCLUSION
In this chapter, we have reviewed two promising technologies for Industry 4.0, namely BDA and ML.
We focused on the data aspect of smart manufacturing, where data arrives fast and in massive
volumes and cannot be handled efficiently by conventional approaches. By employing BDA and ML,
especially DL, a wide range of industrial applications has been shown to benefit. Although only a
few successful works have been reported in the literature, we believe that optimized and fully
automated production on a large scale could be achieved in the very near future thanks to these
advanced technologies.
ACKNOWLEDGMENTS
This work has been partly supported by Vietnam National University, Hanoi (VNU),
under Project No. QG.20.55.
REFERENCES
1. R. Magoulas and B. Lorica. Introduction to big data. O’Reilly Media, Sebastopol, CA,
February 2009.
2. J. Gantz and D. Reinsel. Extracting value from chaos. IDC iview, 1142(2011):1–12,
2011.
3. R. H. Hariri, E. M. Fredericks, and K. M. Bowers. Uncertainty in big data analytics:
survey, opportunities, and challenges. Journal of Big Data, 6(1):44, 2019.
4. M. Chen, S. Mao, and Y. Liu. Big data: A survey. Mobile Networks and Applications,
19(2):171–209, 2014.
5. J. Han, J. Pei, and M. Kamber. Data mining: concepts and techniques. Elsevier, 2011.
6. M. Schuldenfrei. Big data challenges of industry 4.0. Datanami, April 25 2019.
7. N. Golchha. Big data-the information revolution. International Journal of Advanced
Research, 1(12):791–794, 2015.
8. C.-W. Tsai, C.-F. Lai, H.-C. Chao, and A. V. Vasilakos. Big data analytics: A survey.
Journal of Big Data, 2(1):21, 2015.
9. M. Hilbert. Big data for development: A review of promises and challenges.
Development Policy Review, 34(1):135–174, 2016.
10. X. Wang and Y. He. Learning from uncertainty for big data: Future analytical challenges
and strategies. IEEE Systems, Man, and Cybernetics Magazine, 2(2):26–31, 2016.
11. A. Bargiela and W. Pedrycz. Granular computing. In Handbook on Computational
Intelligence: Volume 1: Fuzzy Logic, Systems, Artificial Neural Networks, and
Learning Systems, pages 43–66. World Scientific, 2016.
12. J. Kacprzyk, D. Filev, and G. Beliakov. Granular, Soft and Fuzzy Approaches for
Intelligent Systems. Springer International Publishing, New York City, NY, 2017.
13. R. R. Yager. Decision making under measure-based granular uncertainty. Granular
Computing, 3(4):345–353, 2018.
14. H. Liu and H. Motoda. Computational Methods of Feature Selection. CRC Press, Boca
Raton, FL, 2007.
15. J. A. Olvera-López, J. A. Carrasco-Ochoa, J. F. Martínez-Trinidad, and J. Kittler. A
review of instance selection methods. Artificial Intelligence Review, 34(2):133–143,
2010.
16. Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
17. A. Burkov. The Hundred-Page Machine Learning Book. Andriy Burkov, Quebec City,
2019.
18. V. Zubarev. Machine learning for everyone: In simple words. With real-world examples.
Yes, again, 2019.
19. F. Chollet. Deep Learning with Python. Manning Publications Co., 2017.
20. K. Weiss, T. M. Khoshgoftaar, and D. Wang. A survey of transfer learning. Journal of
Big Data, 3(1):9, 2016.
21. J. Qiu, Q. Wu, G. Ding, Y. Xu, and S. Feng. A survey of machine learning for big data
processing. EURASIP Journal on Advances in Signal Processing, 2016(1):67, 2016.
22. L. M. Pham and T.-M. Pham. Autonomic fine-grained migration and replication of
component-based applications across multi-clouds. In 2015 2nd National Foundation
for Science and Technology Development Conference on Information and Computer
Science (NICS), pages 5–10. IEEE, 2015.
23. S. Athmaja, M. Hanumanthappa, and V. Kavitha. A survey of machine learning algorithms
for big data analytics. In 2017 International Conference on Innovations in
Information, Embedded and Communication Systems (ICIIECS), pages 1–4. IEEE,
2017.
24. M. M. Najafabadi, F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald, and E.
Muharemagic. Deep learning applications and challenges in big data analytics. Journal
of Big Data, 2(1):1, 2015.
25. M. Mohammadi, A. Al-Fuqaha, S. Sorour, and M. Guizani. Deep learning for IoT big
data and streaming analytics: A survey. IEEE Communications Surveys & Tutorials,
20(4):2923–2960, 2018.
26. J. Schmidhuber. Deep learning in neural networks: An overview. Neural Networks,
61:85–117, 2015.
27. I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
28. A. Luckow, M. Cook, N. Ashcraft, E. Weill, E. Djerekarov, and B. Vorster. Deep learning
in the automotive industry: Applications and tools. In 2016 IEEE International
Conference on Big Data (Big Data), pages 3759–3768. IEEE, 2016.
29. H. Lee, Y. Kim, and C. O. Kim. A deep learning model for robust wafer fault monitoring
with sensor measurement noise. IEEE Transactions on Semiconductor Manufacturing,
30(1):23–31, 2016.
30. W. Yan and L. Yu. On accurate and reliable anomaly detection for gas turbine combustors:
A deep learning approach. arXiv preprint arXiv:1908.09238, 2019.
31. H. Shao, H. Jiang, F. Wang, and H. Zhao. An enhancement deep feature fusion method
for rotating machinery fault diagnosis. Knowledge-Based Systems, 119:200–220, 2017.
32. H. Lee. Framework and development of fault detection classification using IoT device
and cloud environment. Journal of Manufacturing Systems, 43:257–270, 2017.