
Machine Learning for Computer Scientists
and Data Analysts
Setareh Rafatirad • Houman Homayoun •
Zhiqian Chen • Sai Manoj Pudukotai Dinakarrao

Machine Learning for


Computer Scientists
and Data Analysts
From an Applied Perspective
Setareh Rafatirad
George Mason University
Fairfax, VA, USA

Houman Homayoun
University of California, Davis
Davis, CA, USA

Zhiqian Chen
Mississippi State University
Mississippi State, MS, USA

Sai Manoj Pudukotai Dinakarrao
George Mason University
Fairfax, VA, USA

ISBN 978-3-030-96755-0 ISBN 978-3-030-96756-7 (eBook)


https://doi.org/10.1007/978-3-030-96756-7

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The recent popularity gained by the field of machine learning (ML) has led to its
adoption in almost all known applications, ranging from smart homes, smart grids,
and forex markets to military applications and autonomous drones. A plethora of
machine learning techniques has been introduced in the past few years, and each of
these techniques fits a specific set of applications well rather than serving as a
one-size-fits-all approach.
In order to better determine how ML applies to a given problem, it is essential to
understand the current state of the art of the existing ML techniques, their pros
and cons, their behavior, and the applications that have already adopted them.
This book thus aims at researchers and practitioners who are familiar with their
application requirements and are interested in applying ML techniques to their
applications, not only for better performance but also to ensure that the adopted
ML technique is not overkill for the considered application. We hope that
this book will provide a structured introduction and relevant background to aspiring
engineers who are new to the field, while also helping researchers already familiar
with it refresh their background. This introduction will be further used to
build and introduce current and emerging ML paradigms and their applications in
multiple case studies.
Organization This book is organized into three parts, each consisting of multiple
chapters. The first part introduces the relevant background information pertaining
to ML and the traditional learning approaches that are widely used.
• Chapter 1 introduces the concept of applied machine learning. The metrics
used for evaluating the machine learning performance, data pre-processing, and
techniques to visualize and analyze the outputs (classification or regression or
other applications) are discussed.
• Chapter 2 presents a brief review of the probability theory and linear algebra that
are essential for a better understanding of the ML techniques discussed in the
later parts of the book.


• Chapter 3 introduces the machine learning techniques. Supervised learning is


primarily discussed in this chapter. Multiple supervised learning techniques, their
learning mechanisms, and applications, along with the pros and cons of each
technique, are discussed. A qualitative comparison of different supervised
learning techniques is presented along with their suitability to different kinds of
applications.
• Unsupervised learning is introduced in Chap. 4. Its differences from supervised
learning and its application scenarios are discussed first. Different unsupervised
learning techniques for applications including clustering and feature selection are
then discussed, along with examples, in this chapter.
• Reinforcement learning is a human learning-inspired technique that lies between
supervised and unsupervised learning techniques on the spectrum. Chapter 5
discusses the basics of reinforcement learning along with its variants, together
with a comparison among different techniques.
Building on top of the basic concepts of machine learning, advanced machine
learning techniques used in real-world applications are discussed in the second part
of this book.
• The majority of the supervised learning techniques and their learning mecha-
nisms discussed in the first part of this book focus on offline or batch learning.
However, learning in many real-world applications needs to happen in an online
manner. As such, Chap. 6 introduces online learning and the different variants of
online learning techniques.
• With a diverse spectrum of Web applications demonstrating the importance of
learning from user behavior, recommender systems are widely used by most
social media companies. Chapter 7 of this book discusses approaches for
recommender learning.
• Chapter 8 offers approaches for graph learning. Graphs are used to depict entities
and their connections in a variety of real-world applications, including social
networking, transportation, and disease spreading. Methods for learning on
graphs and the relationships between nodes are discussed.
• In addition to advancements in machine learning algorithms, researchers have
also focused on exploiting the vulnerabilities of machine learning techniques.
Chapter 9 introduces adversarial machine learning, covering techniques that
inject adversarial perturbations into input samples to mislead machine learning
algorithms. In addition, techniques to harden machine learning models against
these adversarial perturbations are discussed.
In addition to the advanced learning techniques, the third part of this book is
dedicated to the application of machine learning algorithms to real-world
problems.

• The application of machine learning techniques for health monitoring is one of


the critical real-world applications, especially with the introduction of wearable
devices including fitness trackers. Chapters 10 and 11 focus on the application of
machine learning techniques for health applications, particularly in the context
of wearable devices.
• Another pivotal application of machine learning is anomaly detection in the
context of security. Here, security refers to the security of the computing systems
including mobile devices. Chapter 12 focuses on the application of machine
learning to detect malware applications in resource-constrained devices, where
lightweight machine learning techniques are preferable compared to heavy deep
learning techniques.
• In contrast to the other applications discussed, the final chapter of this book
covers the application of machine learning to cloud resource management.
In particular, memory management and workload-aware resource distribution
in a cognitive manner through machine learning techniques are discussed in
this chapter.
What’s New?
Numerous publications exist that give readers theoretical insights, and similarly,
there are books that focus on practical implementation through programming
exercises. This book, however, incorporates theoretical and practical perspectives,
as well as real-world case studies, and covers advanced machine learning concepts.
Additionally, this book contains various case studies, examples, and solutions
covering topics ranging from simple forecasting to large-scale network
optimization and housing price prediction employing a massive database.
Finally, this book includes real implementation examples and exercises that allow
readers to practice and enhance their programming skills for machine learning
applications.
Scope of Book
This book introduces the theoretical aspects of machine learning (ML) algorithms
starting from simple neuron basics all the way to complex neural networks,
including generative adversarial networks and graph convolutional networks.
Most importantly, this book helps the readers in understanding the concepts of ML
algorithms and provides the necessary skills for the reader to choose an apt ML
algorithm for a problem that the reader wishes to solve.
Acknowledgements
The authors of this book would like to thank the colleagues at George Mason Uni-
versity, University of California Davis, and Mississippi State University, especially
the members of the Hardware Architecture and Artificial Intelligence (HArt) lab. We
would also like to express our deepest appreciation to the following faculty members
and students for their support: Abhijit Dhavlle (GMU), Sanket Shukla (GMU),
Sreenitha Kasarapu (GMU), Sathwika Bavikadi (GMU), and Ali Mirzaein (GMU),
as well as the contributors to the following chapters:
• “What Is Applied Machine Learning?”: Mahdi Orooji (University of California
Davis), Mitra Rezaei, Roya Paridar
• “Reinforcement Learning”: Qi Zhang (University of South Carolina)

• “Online Learning”: Shuo Lei (Virginia Tech), Yifeng Gao (University of Texas
Rio Grande Valley), Xuchao Zhang (NEC Labs America)
• “Recommender Learning”: Shanshan Feng (Harbin Institute of Technology,
Shenzhen), Kaiqi Zhao (University of Auckland)
• “Graph Learning”: Liang Zhao (Emory University)
• “SensorNet: An Educational Neural Network Framework for Low-Power
Multimodal Data Classification”: Tinoosh Mohsenin (University of Maryland
Baltimore County), Arnab Mazumder (University of Maryland Baltimore
County), Hasib-Al-Rashid (University of Maryland Baltimore County)
• “Transfer Learning in Mobile Health”: Hassan Ghasemzadeh (Arizona State
University)
• “Applied Machine Learning for Computer Architecture Security”: Hossein
Sayadi (California State University, Long Beach)
• “Applied Machine Learning for Cloud Resource Management”: Hossein
Mohammadi Makrani (University of California Davis), Najme Nazari
(University of California Davis)
We also thank Kaiqi Zhao (University of Auckland), Shanshan Feng (Harbin
Institute of Technology, Shenzhen), Xuchao Zhang (NEC Labs America), Yifeng
Gao (University of Texas Rio Grande Valley), Shuo Lei (Virginia Tech), Zonghan
Zhang (Mississippi State University), and Qi Zhang (University of South Carolina)
for their contributions.

Fairfax, VA, USA Sai Manoj Pudukotai Dinakarrao


Fairfax, VA, USA Setareh Rafatirad
Davis, CA, USA Houman Homayoun
Mississippi State, MS, USA Zhiqian Chen
November 2021
Contents

Part I Basics of Machine Learning


1 What Is Applied Machine Learning?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 The Machine Learning Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Knowing the Application and Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Getting Started Using Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Metadata Extraction and Data Pre-processing . . . . . . . . . . . . . . . . . . . . . . 15
1.6 Data Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.7 A Practice for Performing Exploratory Data Analysis . . . . . . . . . . . . . 18
1.7.1 Importing the Required Libraries for EDA . . . . . . . . . . . . . . . . 19
1.7.2 Loading the Data Into Dataframe . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.7.3 Data Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.7.4 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.7.5 Performance Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.8 Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.9 Exercise Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2 A Brief Review of Probability Theory and Linear Algebra. . . . . . . . . . . . 35
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2 Fundamentals of Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3 Discrete Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.3.1 Probability Mass Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.3.2 Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.3.3 Expectation and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4 Continuous Random Variable. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.1 Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.2 Expectation and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.5 Common Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.5.1 Discrete Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.5.2 Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59


2.6 Joint Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64


2.6.1 Joint Distribution: Discrete Random Variables . . . . . . . . . . . 64
2.6.2 Joint Distribution: Continuous Random Variables . . . . . . . . 66
2.6.3 Covariance and Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.6.4 Multivariate Gaussian Distribution . . . . . . . . . . . . . . . . . . . . . . . . 72
2.7 Matrix Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.7.1 Eigenvalue Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.7.2 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.8 Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.9 Exercise Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.2 Preparing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.2.1 Data Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.2.2 Dealing with Missing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.2.3 Dealing with Imbalanced Datasets. . . . . . . . . . . . . . . . . . . . . . . . . 90
3.3 Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.3.1 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.3.2 Multi-Variable Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . 94
3.3.3 Multi-Variable Adaptive Regression Splines (MARS) . . . 96
3.3.4 AutoRegressive Moving Average . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.3.5 Bayesian Linear Regression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.3.6 Logistic Regression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
3.4 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.4.1 Modeling of Neuron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.4.2 Implementing Logical Gates with ANN . . . . . . . . . . . . . . . . . . . 107
3.4.3 Multi-Layer Perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.4.4 Training of MLPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
3.4.5 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.4.6 Issues with Multi-Layer Perceptron . . . . . . . . . . . . . . . . . . . . . . . 115
3.4.7 Instances of Deep Neural Networks . . . . . . . . . . . . . . . . . . . . . . . 120
3.5 Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
3.5.1 SVM Kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
3.5.2 Multiclass Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
3.6 Ensemble Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.6.1 Bagging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.6.2 AdaBoost. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
3.6.3 Bootstrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
3.6.4 Gradient Boosting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
3.6.5 Stacking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
3.7 Other Machine Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
3.7.1 Bayesian Model Combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
3.7.2 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

3.7.3 Tree-Based Methods. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143


3.7.4 AutoEncoder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
3.8 Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
3.9 Exercise Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
4 Unsupervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
4.2 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
4.2.1 K-Means Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
4.2.2 Hierarchical Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
4.2.3 Mixture Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
4.3 Unsupervised Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
4.3.1 Self-Organizing Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.3.2 Generative Adversarial Networks. . . . . . . . . . . . . . . . . . . . . . . . . . 182
4.3.3 Deep Belief Nets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
4.3.4 Method of Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
4.4 Feature Selection Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
4.4.1 Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
4.4.2 T-Distributed Stochastic Neighbor Embedding . . . . . . . . . . . 195
4.4.3 Pearson Correlation Coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
4.4.4 Independent Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 203
4.4.5 Non-negative Matrix Factorization (NMF) . . . . . . . . . . . . . . . . 206
4.5 Multi-Dimensional Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
4.6 Google Page Ranking Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
4.7 Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
4.8 Exercise Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
5 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
5.2 Q-Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
5.2.1 Accelerated Q-learning by Environment Exploration . . . . 221
5.3 TD(λ)-Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
5.4 SARSA Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
5.5 Deep Q-Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
5.6 Policy Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
5.6.1 Stochastic Policy Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
5.6.2 REINFORCE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
5.7 Gradient-Based Policy Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
5.8 Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
5.9 Exercise Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

Part II Advanced Machine Learning


6 Online Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235

6.2 Online Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236


6.2.1 First-/Second-Order Online Learning . . . . . . . . . . . . . . . . . . . . . 236
6.2.2 Online Learning with Regularization . . . . . . . . . . . . . . . . . . . . . . 239
6.3 Online Unsupervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
6.3.1 Online Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
6.3.2 Other Unsupervised Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
6.4 Application and Resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
6.4.1 Time Series Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
6.4.2 Information Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
6.4.3 Online Portfolio Selection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
6.4.4 Other Applications: Combined with Deep Learning . . . . . . 251
6.4.5 Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
6.5 Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
6.6 Exercise Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
7 Recommender Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
7.2 The Recommendation Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
7.3 Content-Based Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
7.4 Collaborative Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
7.4.1 Memory-Based Collaborative Filtering. . . . . . . . . . . . . . . . . . . . 261
7.4.2 Latent Factor Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
7.5 Factorization Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
7.6 Deep Learning Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
7.7 Application and Resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
7.7.1 Applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
7.7.2 Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
7.8 Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
7.9 Exercise Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
8 Graph Learning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
8.2 Basics of Math. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
8.2.1 Matrix Manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
8.2.2 Eigendecomposition on Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
8.2.3 Approximation Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
8.2.4 Graph Representations and Graph Signal . . . . . . . . . . . . . . . . . 281
8.2.5 Spectral Graph Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
8.3 Graph Neural Network Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
8.3.1 Spatial-Based Graph Convolution Networks . . . . . . . . . . . . . . 288
8.3.2 Spectral-Based Graph Convolution Networks . . . . . . . . . . . . 296
8.3.3 Other Graph Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
8.4 Application and Resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
8.5 Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
8.6 Exercise Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302

9 Adversarial Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305


9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
9.2 Adversarial Attacks and Defenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
9.2.1 Adversarial Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
9.2.2 Adversarial Defenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
9.3 Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
9.3.1 Network Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
9.3.2 Performance with Adversarial Attacks . . . . . . . . . . . . . . . . . . . . 322
9.3.3 Effective Adversarial Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
9.4 Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
9.5 Exercise Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327

Part III Machine Learning in the Field


10 SensorNet: An Educational Neural Network Framework for
Low-Power Multimodal Data Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
10.2 SensorNet Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
10.2.1 Deep Neural Networks Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 332
10.2.2 Signal Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
10.2.3 Neural Network Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
10.3 SensorNet Evaluation using Three Case Studies . . . . . . . . . . . . . . . . . . . 338
10.3.1 Case Study 1: Physical Activity Monitoring . . . . . . . . . . . . . . 339
10.3.2 Case Study 2: Stand-Alone Dual-Mode Tongue
Drive System (sdTDS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
10.3.3 Case Study 3: Stress Detection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
10.4 SensorNet Optimization and Complexity Reduction . . . . . . . . . . . . . . . 343
10.4.1 The Number of Convolutional Layers . . . . . . . . . . . . . . . . . . . . . 344
10.4.2 The Number of Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
10.4.3 Filter Shapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
10.4.4 Zero-Padding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
10.4.5 Activation Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
10.5 SensorNet Hardware Architecture Design . . . . . . . . . . . . . . . . . . . . . . . . . . 350
10.5.1 Exploiting Efficient Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
10.5.2 Hardware Performance Parameters . . . . . . . . . . . . . . . . . . . . . . . . 352
10.6 Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
10.7 Exercise Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
11 Transfer Learning in Mobile Health . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
11.2 Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
11.3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
11.3.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
11.3.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
11.4 TransFall Framework Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
11.4.1 Vertical Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365

11.4.2 Horizontal Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368


11.4.3 Label Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
11.5 Validation Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
11.5.1 Overview of the Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
11.5.2 Cross-Domain Transfer Learning Scenarios . . . . . . . . . . . . . . 374
11.5.3 Comparison Approach and Performance Metrics . . . . . . . . . 375
11.5.4 Choice of Classification Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
11.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
11.6.1 Cross-Platform Transfer Learning Results . . . . . . . . . . . . . . . . 376
11.6.2 Cross-Subject Transfer Learning Results. . . . . . . . . . . . . . . . . . 377
11.6.3 Hybrid Transfer Learning Results . . . . . . . . . . . . . . . . . . . . . . . . . 378
11.6.4 Transformation Module Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 378
11.6.5 Parameter Examination. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
11.7 Exercise Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
12 Applied Machine Learning for Computer Architecture Security . . . . . 383
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
12.1.1 Malware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
12.1.2 Microarchitectural Side-Channel Attacks . . . . . . . . . . . . . . . . . 386
12.2 Challenges Associated with Traditional Security Mechanisms . . . . 388
12.3 Deployment of Hardware Performance Counters for
Computer Architecture Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
12.4 Application of Machine Learning for Computer
Architecture Security Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
12.4.1 Feature Selection: Key Microarchitectural Features . . . . . . 391
12.5 ML for Hardware-Assisted Malware Detection:
Comparative Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
12.5.1 Experimental Setup and Data Collection . . . . . . . . . . . . . . . . . . 394
12.5.2 Feature Selection and ML Classifiers Implementation . . . 395
12.5.3 Evaluation Results of ML-Based Malware Detectors. . . . . 396
12.6 ML for Microarchitectural SCAs Detection: Comparative
Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
12.6.1 Detection Based on Victim Applications’ HPCs Data . . . . 399
12.6.2 ML Classifiers Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
12.6.3 Evaluation Results of ML-Based SCAs Detectors . . . . . . . . 401
12.7 Exercise Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
13 Applied Machine Learning for Cloud Resource Management . . . . . . . . 405
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
13.1.1 Challenge of Diversity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
13.2 Modern Resource Provisioning Systems: ML Comes to
the Rescue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
13.3 Applications of Machine Learning in Resource
Provisioning Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
13.3.1 Monitoring and Prediction of Applications’ Behavior . . . . 412
13.3.2 Using ML for Performance/Cost/Energy Estimation . . . . . 414

13.3.3 Explore and Optimize the Selection . . . . . . . . . . . . . . . . . . . . . . . 417


13.3.4 Decision Making. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
13.4 Security Threats in Cloud Rooted from ML-Based RPS . . . . . . . . . . . 420
13.4.1 Adversarial Machine Learning Attack to RPS . . . . . . . . . . . . 422
13.4.2 Isolation as a Remedy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
13.5 Exercise Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
Part I
Basics of Machine Learning
Chapter 1
What Is Applied Machine Learning?

We begin this chapter by discussing the importance of understanding data in order
to address questions about the distribution of the data, which features are
significant, how features should be transformed, and how to construct models to
perform a specific machine learning task in various problem domains. Let us begin
our conversation with Artificial Intelligence (AI), a collection of concepts that
enables computers to mimic human behavior. The primary objective of the field of
artificial intelligence is to develop algorithms that can be used to make intelligent
decisions and perform tasks that typically require human intelligence.
Machine learning (ML) is an area of artificial intelligence that is concerned with
instructing/training an algorithm to execute such tasks. It is a scientific technique
for uncovering hidden patterns and conclusions in structured and unstructured data
by building mathematical models using a sample dataset referred to as the training
set. Computing systems use machine learning models to transform data into
actionable results and carry out specific tasks, such as detecting malicious activity
in an IoT system, classifying an object in an autonomous driving application, or
discovering interesting correlations between variables in a patient dataset in a health
application domain. Machine learning algorithms include regression, instance-
based learning, regularization, decision tree, Bayesian, clustering, association-rule
learning, reinforcement learning, support vector machines, ensemble learning,
artificial neural network, deep learning, adversarial learning, federated learning,
zero-shot learning, and explainable machine learning.
Requirements of such techniques and applications will be discussed in the first
part of this book.

1.1 Introduction

Machine learning can be approached in two distinct ways: theoretical machine


learning and applied machine learning (Applied ML). Both paths empower an
individual to solve problems in different ways. Theoretical machine learning
is concerned with an understanding of the fundamental concepts behind machine
learning algorithms, mathematics, statistics, and probability theory. However,
applied machine learning is about realizing the potential and impact of theoretical
machine learning developments. Thus, the purpose of applied machine learning is
to gain a sufficient understanding of fundamental machine learning principles and
to address real-world problems utilizing tools and frameworks that incorporate
machine learning algorithms. It is concerned with developing a workable learning
system for a particular application. Indeed, skill in applied machine learning comes
from solving numerous problems across multiple areas, which requires a grasp of
the data and the challenges encountered. This is not an easy undertaking, as no
dataset or
algorithm exists that is optimal for all applications or situations.
Applied machine learning can be thought of as a search problem, where the
objective is to find the optimal mapping of inputs to outputs given a set of
data and a machine learning method. In other words, Applied Machine Learning
illustrates how an algorithm learns and the justification for combining approaches
and algorithms. The application of machine learning techniques has developed
dramatically from a specialty to a mainstream practice. They have been applied
to a variety of sectors to address specific issues, including autonomous driving,
Internet of Things (IoT) security, computer system cybersecurity [1–3], multimedia
computing, health [4, 5], and many more. Machine learning encompasses a broad
variety of tasks, from data collection to pre-processing and imputation, from data
exploration to feature selection, and finally, model construction and evaluation.
At each stage of this pipeline, decisions are made based on two key factors: an
awareness of the application’s characteristics and the availability of the required
data. The primary objective is to overcome machine learning issues posed by these
factors. For instance, in autonomous driving, some of the machine learning problems
include the size, completeness, and validity of the training set, as well as the
robustness of the deep neural networks utilized against adversarial perturbations that could
force the system to misclassify an image [6]. Adversarial perturbations comprise
minor image manipulations such as scaling, cropping, and changing the lighting
conditions.
Another rapidly expanding application of machine learning is security for the
Internet of Things (IoT). Due to advancements in the IoT technology stack,
massive amounts of data are being generated in a variety of sectors with distinct
characteristics. As a result, computing devices have become more connected than
ever before, spanning the spectrum from standard computing devices (such as
laptops) to resource-constrained embedded devices, servers, edge nodes, sensors,
and actuators. The Internet of Things (IoT) network is a collection of internet-
connected non-traditional computer devices that are often low power and have
limited processing and storage capabilities. Parallel to the exponential growth of the
Internet of Things, the number of IoT attacks has risen tremendously. Due to a lack
of security protection and monitoring systems for IoT devices and networks, there
is a pressing need for secure machine learning approaches for IoT device
protection. Such solutions must be robust yet operate under tight resource
constraints, making their development a difficult challenge. Thus, tasks such as developing safe

and durable models and performing hardware analysis on trained models (in terms
of hardware latency and area) are significant applied machine learning problems to
address in this sector [1, 7, 8].
The majority of this book discusses the difficulties and best practices associated
with constructing machine learning models, including understanding an applica-
tion’s properties and the underlying sample dataset.

1.2 The Machine Learning Pipeline

What is a machine learning pipeline? How do you describe the goal of machine
learning? What are the main steps in the machine learning pipeline? We will answer
these questions through both formal definitions and practical examples. A machine
learning pipeline helps automate the machine learning workflow in order to obtain
actionable insights from big datasets. The goal of machine learning is to train an
accurate model to solve an underlying problem. However, the term pipeline is
somewhat misleading, as many of the steps involved in the machine learning
workflow may be repeated iteratively to enhance the accuracy of the model. The
cyclical architecture of machine learning pipelines is demonstrated in Fig. 1.1.
Initially, the input (or collected) data is prepared before performing any analysis.
This step includes tasks such as data cleaning, data imputation, feature engineering,
data scaling/standardization, and data sampling, which deal with issues including
noise, outliers, categorical variables that need transforming, features that need
normalizing/standardizing, and imbalanced (or biased) datasets. A short sketch of
this step follows.
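As a minimal sketch of this preparation step (assuming scikit-learn and pandas; the dataset and column names below are hypothetical, not from the book), missing values can be imputed and features standardized in a single pipeline:

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset: two numeric features with missing values.
df = pd.DataFrame({
    "age":  [22, 35, None, 58, 41],
    "fare": [72.5, None, 13.0, 95.0, 26.6],
})

# Chain imputation (fill gaps with the median) and standardization
# (zero mean, unit variance) so both run as one preparation step.
prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
X = prep.fit_transform(df)
print(X)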
In the Exploratory Data Analysis (EDA) step, the data is analyzed to understand
its characteristics, such as whether it follows a normal or a skewed distribution
(see Fig. 1.2). Skewness in the data affects a statistical model's performance,
especially in the case of regression-based models. To prevent skewness from
harming the results, it is common practice to apply a transformation over the whole
set of values (such as a log transformation) and use the transformed data for the
statistical model; a short sketch of this practice follows Fig. 1.2.

Fig. 1.1 The cyclical architecture of machine learning pipelines



Fig. 1.2 Comparison of different data distributions. In a right-skewed or positive distribution,
most data falls to the positive, or right, side of the peak. In a left-skewed or negative distribution,
most data falls to the negative, or left, side of the peak. (a) Right-skewed. (b) Normal distribution.
(c) Left-skewed
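As a minimal sketch of this practice (the values below are hypothetical, and log1p requires non-negative data):

import numpy as np
import pandas as pd

# Hypothetical right-skewed feature, e.g., ticket fares.
fares = pd.Series([8.0, 9.5, 11.0, 12.5, 14.0, 30.0, 85.0, 210.0])
print("skewness before:", fares.skew())

# The log transformation compresses the long right tail.
log_fares = np.log1p(fares)
print("skewness after:", log_fares.skew())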

Another prominent task performed during EDA is discovering the correlations
between attributes of the dataset in order to identify the independent variables that
are eventually used in the training process. For instance, if feature a1 is highly
correlated with feature a2, then only one of those features should be considered for
training a model. Furthermore, in datasets where there is a linear relationship
between input and output variables, it is important to recognize the nature of that
relationship: a positive correlation (an input variable increases/decreases as the
target (i.e., output) variable increases/decreases), a negative correlation (an input
variable increases/decreases as the target variable decreases/increases), or no
correlation. Visualization techniques such as plotting the collinearity in the data
using a correlation map, or a scatter plot matrix (also called a pair-plot), can show
bivariate or pairwise relationships between different combinations of variables in a
dataset. An example of a correlation matrix is illustrated in Fig. 1.3, and the sketch
below shows how such plots can be produced.
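A minimal sketch of both visualizations, assuming seaborn and a small hypothetical numeric dataset:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical numeric dataset.
df = pd.DataFrame({
    "age":    [22, 35, 47, 58, 41, 29],
    "fare":   [72.5, 18.0, 13.0, 95.0, 26.6, 33.1],
    "guests": [0, 1, 1, 2, 0, 1],
})

# Correlation matrix rendered as a heatmap (a correlation map).
sns.heatmap(df.corr(), annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.show()

# Scatter plot matrix (pair-plot) of pairwise relationships.
sns.pairplot(df)
plt.show()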
Next, in the Feature Selection step, important features for training a machine
learning model using a dataset are identified. Important benefits of Feature Selection
include reducing over-fitting, improving the accuracy of the model, and reducing the
training time. Attribute selection can be conducted in different ways. Leveraging
known relationships between the variables can guide the selection of features.
However, when the number of features grows, data-driven exploratory techniques
come in handy. Some of the most common dimensionality reduction techniques
include Principal Component Analysis (PCA), t-distributed Stochastic Neighbor
Embedding (t-SNE), Independent Component Analysis (ICA), and clustering
algorithms (e.g., the Gaussian mixture model); a PCA sketch is shown below.
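As a minimal PCA sketch (the feature matrix is hypothetical), four features are reduced to two principal components with scikit-learn:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 6 samples, 4 features.
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
    [6.0, 2.9, 4.5, 1.5],
    [5.5, 2.4, 3.8, 1.1],
])

# Standardize first, since PCA is sensitive to feature scales.
X_std = StandardScaler().fit_transform(X)

# Project onto the two directions of highest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)
print("explained variance ratio:", pca.explained_variance_ratio_)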
Real-world datasets contain many attributes, among which only a subset helps with
the analysis. For instance, for lane detection in autonomous driving applications,
important features include edge, gradient, and intensity [9, 10], as these methods
rely on the difference in intensity between the road surface and the lane markings.
Once the important features for performing a particular machine learning task in
an application are identified, the prepared dataset is partitioned into a training and
a testing set; the training data is used to train a machine learning algorithm to
construct a model, followed by the evaluation process, which relies on the test data,
as the sketch below illustrates.
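A minimal sketch of this split-train-evaluate step, using scikit-learn's bundled Iris dataset and a logistic regression model purely as stand-ins:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples as the testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Train on the training set, then evaluate on the held-out test set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))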

Fig. 1.3 A correlation matrix for an airline dataset

As illustrated in Fig. 1.1, training a model is an iterative process; depending on
the performance factors and accuracy of the generated model, its parameter weights
are usually tuned iteratively until no further improvements are possible or a
satisfactory outcome is obtained.

1.3 Knowing the Application and Data

We live in the age of big data where data lives in various sources and repositories
stored in different formats: structured and unstructured. Raw input data can contain
structured data (e.g., numeric information, date), unstructured data (e.g., image,
text), or a combination of both, which is called semi-structured data. Structured data
is quantitative data that can fit nicely into a relational database such as the dataset in
Table 1.2 where the information is stored in tabular form. Structured attributes can
be transformed into quantitative values that can be processed by a machine. Unlike
structured data, unstructured data needs to be further processed to extract structured
information from it; such information is referred to as data about data, or what we
call metadata in this book.
Table 1.1 demonstrates a dataset for a sample text corpus related to research
publications in the public domain. This is an example of a semi-structured dataset,
which includes both structured and unstructured attributes: the attributes of year and
8

Table 1.1 A semi-structured dataset collected from GoogleScholar data source

Year | Title | Citations | Authors | Conference | Abstract
1881 | Surveylance: automatically detecting online survey scams | 0 | A Kharraz, W Robertson, E Kirda | 39th S&P 2018: San Francisco, CA, USA | [. . . we present SURVEYLANCE, the first system that automatically identifies survey scams using machine learning techniques. Our evaluation demonstrates . . . ]
1885 | EyeTell: video-assisted touchscreen keystroke inference from eye movements | 2 | Y Chen, T Li, R Zhang, Y Zhang | 39th S&P 2018: San Francisco, CA, USA | [. . . Keystroke inference attacks pose an increasing threat to ubiquitous mobile devices. This paper . . . ]
1886 | Understanding linux malware | 4 | E Cozzi, M Graziano, Y Fratantonio | 39th S&P 2018: San Francisco, CA, USA | [. . . For the past two decades, the security community has been fighting malicious programs for Windows-based operating . . . ]
1989 | SoK: keylogging side channels | 1 | J Monaco | 39th S&P 2018: San Francisco, CA, USA | [. . . The first keylogging side channel attack was discovered over 50 years ago when Bell Laboratory researchers noticed an electro . . . ]
1869 | FuturesMEX: secure, distributed futures market exchange | 2 | F Massacci, CN Ngo, J Nie, D Venturi | 39th S&P 2018: San Francisco, CA, USA | [. . . in a futures-exchange, such as the Chicago mercantile exchange, traders buy and sell contractual promises (futures) to acquire or deliver, at some future . . . ]
citation are structured variables (categorical and numerical, respectively), whereas
the title, authors, conference, and abstract contain unstructured data (i.e., raw text).
In applied machine learning problems, we begin with understanding the data
behind an application, especially when little background knowledge about the
application is available. Knowing the application helps make accurate decisions
about the important metadata to extract from the unstructured variables; the
techniques to use for metadata extraction (e.g., in case of raw text, extracting
information such as the frequency of concepts using the bag-of-words model); the
metadata standard to use; how to encode features (e.g., one-hot encoding); as well
as the selection of top features to help with generating the output model, which will
later be deployed on unseen data (Table 1.2).
Understanding the data behind an application transpires through Exploratory
Data Analysis (EDA), an approach in which data analysts use visual exploration
to understand what is in a dataset and its characteristics, such as the relationships
between the attributes and the distribution of data. Many visualization techniques
can be used for understanding the data within an application, such as the correlation
matrix, histogram, box plot, and scatter plot.
Let us take a look at a sample customer airline dataset in Table 1.4, which
contains the attributes INDEX, CUSTOMERID, FARE, SEATCLASS, GUESTS,
GENDER, AGE, and the class variable SUCCESS for a fictitious airline A. The
ultimate goal is to identify the factors that help understand why some customers are
flying with the airline and why others are canceling. Here is a brief description of
the features:
• CUSTOMERID: A unique ID associated with a customer.
• GUESTS: Number of guests accompanying the customer.
• SEATCLASS: Categorical variable that displays the seat class of the customer.
• AGE: Numerical variable corresponding to the age of the customer.
• GENDER: Categorical variable describing the gender of the customer.
• FARE: Numeric variable for the total fare paid by the customer.
• SUCCESS: Categorical class variable indicating whether the customer flew with
the airline.
The correlation matrix illustrated in Fig. 1.3 is an example of a technique used to
understand the relationships between the attributes of a dataset.
A correlation matrix shows the degree of association between each pair of
variables in a dataset. It visually describes the direction and strength of the
linear relationship between two variables. This correlation matrix visualizes the
correlations between the variables of the airline dataset. According to this plot,
GENDER and SEATCLASS have the highest correlations with the class variable;
GENDER is positively correlated with SUCCESS (with a degree of +0.54), while
SEATCLASS is negatively correlated (with a degree of −0.36).
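As a minimal sketch of how such a matrix can be produced, the snippet below computes and plots pairwise Pearson correlations with pandas and seaborn; the file name airline.csv, the Male/Female encoding, and the column selection are illustrative assumptions based on the description above, not an actual distributed file.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical file and column names, following the description above.
df = pd.read_csv('airline.csv')

# Categorical attributes must be numerically encoded before computing
# correlations; here GENDER is mapped to {0, 1} for illustration.
df['GENDER'] = df['GENDER'].map({'Male': 0, 'Female': 1})

# Pairwise Pearson correlation coefficients over the selected attributes.
corr = df[['SUCCESS', 'GUESTS', 'SEATCLASS', 'AGE', 'GENDER', 'FARE']].corr()

# Color-coded correlation matrix with the coefficients annotated.
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.show()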
A histogram is a graphical technique used to understand the distribution of data.
Table 1.2 Structured malware dataset, obtained from virusshare and virustotal, covering 5 different classes of malware
Bus-cycles Branch-instructions Cache-references Node-loads Node-stores Cache-misses Branch-loads LLC-loads L1-dcache-stores Class
11463 37940 8057 1104 111 2419 37190 2360 38598 Backdoor
1551 5055 1096 165 17 333 4916 330 5003 Backdoor
29560 126030 20008 1769 146 4098 108108 5987 99237 Backdoor
26211 117761 14783 1666 48 4182 117250 4788 91070 Backdoor
30139 123550 20744 1800 158 4238 124724 6969 115862 Backdoor
12989 30012 9076 1252 136 5412 27909 2000 27170 Benign
6546 12767 4953 548 87 3683 13157 864 12361 Benign
8532 31803 7087 699 124 3240 34722 1970 34974 Benign
14350 27451 9157 1843 178 6611 28507 2411 24908 Benign
13837 25436 12235 1296 192 7148 24747 2533 23757 Benign
1068674 8211420 168839 42612 28574 73696 6298568 64166 6202146 Rootkit
1054761 8187337 162526 41245 28389 71576 6688738 67408 6655480 Rootkit
1046053 8196952 158955 40525 28113 70250 6981991 69597 6950106 Rootkit
1038524 8124926 157896 40207 28214 69910 7134795 71132 7148734 Rootkit
1030773 8069156 158085 39603 28265 69356 7230800 72226 7294250 Rootkit
999182 29000000 455 64 5 94 29000000 289 14000000 Trojan
999189 29000000 457 65 5 95 29000000 288 14000000 Trojan
999260 29000000 457 65 6 96 29000000 287 14000000 Trojan
999265 29000000 459 67 6 98 29000000 287 14000000 Trojan
999277 29000000 459 67 6 98 29000000 288 14000000 Trojan
989865 9128084 2549 169 37 268 9404871 923 9614242 Virus
989984 9130539 2529 168 37 266 9402351 920 9611680 Virus
990117 9132992 2510 167 36 264 9400377 914 9609689 Virus
990233 9135227 2491 165 36 262 9397484 909 9606756 Virus
990366 9137694 2473 164 36 260 9395002 903 9604237 Virus
760836 7851079 165236 8891 4047 13803 10530146 38930 4454651 Worm
765750 7957382 161998 8717 3967 13533 10573140 38205 4453953 Worm
770445 8059123 158884 8549 3891 13273 10606358 37508 4450452 Worm
774993 8157690 155888 8388 3818 13022 10660033 36824 4454598 Worm
779347 8251754 153008 8237 3747 12785 10693711 36171 4452344 Worm

Table 1.4 A sample dataset of an airline's customers

Index | Description | Success | Guests | Seat class | Customer ID | Fare | Age | Title | Gender
0 | Braund, Mr. Owen Harris; 22 | 0 | 1 | 3 | 1 | 7.25 | 22 | Mr | Male
1 | Cumings, Mrs. John Bradley . . . | 1 | 1 | 1 | 2 | 71.3 | 38 | Mrs | Female
2 | Heikkinen, Miss. Laina; 26 | 1 | 0 | 3 | 3 | 7.92 | 26 | Miss | Female
3 | Futrelle, Mrs. Jacques Heath . . . | 1 | 1 | 1 | 4 | 53.1 | 35 | Mrs | Female
4 | Allen, Mr. William Henry . . . | 0 | 0 | 3 | 5 | 8.05 | 35 | Mr | Male
5 | Moran, Mr. James; | 0 | 0 | 3 | 6 | 8.46 | 0 | Mr | Male
6 | McCarthy, Mr. Timothy J; 54 | 0 | 0 | 1 | 7 | 51.9 | 54 | Mr | Male

Fig. 1.4 Data distribution over the variables of an airline dataset

Figure 1.4 illustrates the distribution of the airline dataset over its structured
variables, including SEATCLASS, GUESTS, FARE, and customer TITLE.
Histograms display the general distribution of a set of numeric values
corresponding to a dataset variable over a range.
Plots are great means to help with understanding the data behind an application.
Some example applications of such plots are described in Table 1.5. It is important
to note that every plot is deployed for a different purpose and applied to a particular
type of data. Therefore, it is crucial to understand the need for each technique
used during the EDA step. Such graphical tools can help maximize insight, reveal
underlying structure, check for outliers, test assumptions, and discover optimal
factors.
As indicated in Table 1.5, several Python libraries offer very useful tools to plot
your data. Python is a general-purpose programming language with a very large
user community, and it is well suited to handling large datasets and machine
learning analysis. In this book, we focus on using the Python language for various
machine learning tasks and hands-on examples and exercises.

1.4 Getting Started Using Python

Before getting started using Python for applying machine learning techniques
to a problem, you may want to find out which IDEs (Integrated Development
Environments) and text editors are tailored for Python programming, or look
at code samples that you may find helpful. An IDE is a program dedicated to
software development. A Python IDE usually includes an editor to write and handle
Python code, build, execution, and debugging tools, and some form of source control.
Several Python programming environments exist, suited to how advanced a
Python programmer is in performing a machine learning task. For example, Jupyter
Notebook is a very helpful environment for beginners who have just started with
traditional machine learning or deep learning. Jupyter Notebook can be installed in
a virtual environment using Anaconda-Navigator, which helps with creating virtual
environments and installing packages needed for data science and deep learning.
While Jupyter Notebook is more suitable for beginners, there are other machine
learning frameworks, such as TensorFlow, that are mostly used for deep learning
tasks. As such, depending on how advanced you are in Python programming, you
may end up using a particular Python programming environment. In this book, we
will begin with Jupyter Notebook for programming examples and hands-on
exercises. As we move toward more advanced machine learning tasks, we switch to
TensorFlow. You can download and install Anaconda-Navigator on your machine
using the following link by selecting the Python 3.7 version: https://www.anaconda.
com/distribution/.
Once it is installed, navigate to Jupyter Notebook and hit “Launch.” You will then
have to choose or create a workspace folder that you will use to store all your Python
programs. Navigate to your workspace directory and hit the “New” button to create a
new Python program and select Python 3. Use the following link to get familiar with
the environment: https://docs.anaconda.com/anaconda/user-guide/getting-started/.
In the remaining part of this chapter, you will learn how to conduct preliminary
machine learning tasks through multiple Python programming examples.

Table 1.5 Popular Python tools for understanding the data behind an application. https://github.com/dgrtwo/gleam

Plot type | Python library | Usage description
Line plot | Plotly | Trends in data
Scatter plot | Gleam | Multivariate data
Layered area chart | ggplot | Compare trend over time
Nullity matrix | Missingno | Data sparsity
Bar plot | Bokeh | Streaming & real-time data
Scatter plot matrix | Seaborn | Bivariate data correlations
Box plot | Pygal, Seaborn | Outliers and data distribution
Histogram | Matplotlib | Outliers & data distribution
Heatmap, dot-density | Seaborn | Uses a system of color coding to represent different values
1.5 Metadata Extraction and Data Pre-processing

It is a no-brainer that data is a crucial aspect of machine learning. Data is used to
train machine learning models and tune their parameters to improve their accuracy
and performance. Data is available in various types and different forms: structured
and unstructured. Metadata contains information about a dataset. Such information
describes the characteristics of data such as format, location, author, content,
relationship, and data quality. It can also include information about features, models,
and other artifacts from the machine learning pipeline. Metadata is highly structured
and actionable information about a dataset.
Metadata has grown in popularity due to the proliferation of devices that generate
data, and due to data integration, which must deal with the heterogeneity and
diversity of data. Metadata extraction is the process of extracting salient features
from a dataset. Depending on the type of data (e.g., text, image, and so forth), its
metadata is extracted and represented in different ways. For instance, weather
information (metadata) can be generated using the timestamp and location
information of an image provided in the image's EXIF tag, which is largely used
to encode contextual information related to image generation by digital cameras.
Another example of metadata is Bag-of-Words (BOW) and its flavors, such as
Frequency Vectors, One Hot Encoding (OHE), and Term Frequency/Inverse
Document Frequency (TF/IDF), which are used to generate metadata corresponding
to a text document. Such representations encompass words that stand for entities
in a dataset, leading to the notion of entity extraction. Feature representation in a
dataset is an important step. In some datasets, features are numeric, such as the
attributes displayed in a tabular view in Table 1.2. However, many other datasets
contain categorical information, which needs feature engineering before performing
any machine learning task; an example is recording weather information in a dataset
using categories such as cloudy, windy, and rainy. Furthermore, applications such
as text classification, concept modeling, language modeling, image captioning,
question answering, and speech recognition are examples where feature engineering
is required to represent features numerically.
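As a minimal sketch of such feature engineering, the snippet below one-hot encodes a categorical weather attribute with pandas; the column and category names are illustrative assumptions, not taken from a particular dataset.

import pandas as pd

# Illustrative categorical attribute, as in the weather example above.
df = pd.DataFrame({'weather': ['cloudy', 'windy', 'rainy', 'cloudy']})

# One-hot encoding: one binary indicator column per category.
encoded = pd.get_dummies(df, columns=['weather'])
print(encoded)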
Let us consider a topic modeling application where the goal is to perform
text classification. Table 1.6 illustrates a sample dataset that has ID, Content, and
Topic as the original attributes. However, the machine cannot use these attributes
as is to perform mathematical computations, as these features (except ID) are not
numeric. Therefore, metadata extraction followed by feature engineering is required
to transform these attributes into numeric values. Let us make this more concrete by
focusing on the Content attribute, which contains unstructured raw text. The data in
this column cannot be directly used as features in its original form unless we extract
metadata from it and provide a numeric encoding. For instance, one way is to extract
N-grams; an N-gram is a contiguous sequence of n items (such as phonemes, syllables,
letters, words, or base pairs) from a given sample of text or speech.

Example 1.1 (N-Gram Extraction)

Problem: Extract the N-grams from a given string of text and display all the
extracted N-grams.
Solution: To extract N-grams as metadata, off-the-shelf Natural Language
Processing (NLP) tools such as the Natural Language Toolkit (NLTK) can be
used; NLTK is a leading platform for building Python programs that work with
human language data. A widely used feature engineering technique in this
situation is One-Hot-Encoding, which provides the mapping of categorical
values into integer values. But first, we need to extract the categories. To
perform this, one can use the re and nltk.util packages to apply
regular-expression matching and find n-grams, retaining only useful
content terms. The Python code below can be used to extract 2-grams and
4-grams. The result is displayed.

import re
from nltk.util import ngrams

# input text
text = """tighter time analysis for real-time traffic in on-chip \
networks with shared priorities"""
print('input text: ' + text)

tokens = [item for item in text.split(" ") if item != ""]

output2 = list(ngrams(tokens, 2))  # 2-grams
output4 = list(ngrams(tokens, 4))  # 4-grams

allOutput = []
for bigram in output2:
    if bigram[0] != "for" and bigram[1] != "for" and bigram[0] != "in" and \
       bigram[1] != "in" and bigram[0] != "with" and bigram[1] != "with":
        allOutput.append(bigram)

print('\nall extracted bigrams are:')
print(allOutput)

allOutput = []
for quadgram in output4:
    if quadgram[0] != "for" and quadgram[1] != "for" and quadgram[2] != "for" \
       and quadgram[3] != "for" and quadgram[0] != "in" and quadgram[1] != "in" \
       and quadgram[0] != "with" and quadgram[1] != "with":
        allOutput.append(quadgram)

print('\nall extracted quadgrams are:')
print(allOutput)

>input text: tighter time analysis for real-time traffic in on-chip \
>networks with shared priorities

>all extracted bigrams are:
>[('tighter', 'time'), ('time', 'analysis'),
>('real-time', 'traffic'), ('on-chip', 'networks'), ('shared', 'priorities')]
41
Table 1.6 A sample dataset for text classification

ID | Content | Topic
1 | Using benes networks at fault-tolerant and deflection routing based network-on-chips | Fault tolerant systems
2 | Tighter time analysis for real-time traffic in on-chip networks with shared priorities | Network-on-chip analysis
3 | Loosely coupled in situ visualization: a perspective on why it's here to stay | Scientific visualization
4 | Lessons learned from building in situ coupling frameworks | In situ visualization
5 | An approach to lowering the in situ visualization barrier | In situ visualization
6 | PROSA: protocol-driven NoC architecture | Computer architecture
7 | Hybrid large-area systems and their interconnection backbone | Sensor phenomena and characterization
8 | Bubble budgeting: throughput optimization for dynamic workloads by exploiting dark cores in many core systems | Resource management

>all extracted quadgrams are:
>[('real-time', 'traffic', 'in', 'on-chip'),
>('on-chip', 'networks', 'with', 'shared')]

Metadata extraction is an important phase in machine learning. Once the features
are extracted, the dataset should be pre-processed to get it ready for training.
Pre-processing includes data cleaning and data imputation, outlier detection, and
data exploration. Outliers in a dataset are those samples that show an abnormal
distance from the other samples in the dataset. There are various methods to detect
outliers. One simple technique is to visually identify irregular samples using a
scatter plot, or a histogram, when the problem is not very complex. For more
complex problems, techniques such as one-class SVM, Local Outlier Factor, and
Isolation Forest can be applied. In outlier detection, it is important to include the
output variable, as the outliers form around the clusters related to the output
variable.
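As a hedged sketch of the model-based techniques just mentioned, the snippet below flags outliers with scikit-learn's Isolation Forest on synthetic two-dimensional data; the contamination rate and the data itself are illustrative assumptions.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
# 100 regular samples around the origin plus a few distant outliers.
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.uniform(6, 8, size=(5, 2))])

# contamination is the assumed fraction of outliers in the data.
clf = IsolationForest(contamination=0.05, random_state=42)
labels = clf.fit_predict(X)  # -1 marks an outlier, 1 an inlier
print("number of detected outliers:", np.sum(labels == -1))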

1.6 Data Exploration

Data Exploration or Exploratory Data Analysis (EDA) is an important part of data
analysis. It is a systematic process to understand the data, maximize insight, discover
latent correlations between variables, identify important variables, outliers, and
anomalies, and perform dimensionality reduction using various data visualization
and statistical techniques. Data exploration, or data understanding, is where an
analyst takes a general view of the data to make some sense of it.
EDA means understanding the datasets by summarizing their main characteristics,
often by plotting them visually. This step is very important, especially when we
arrive at modeling the data in order to apply machine learning. Plotting in EDA
consists of histograms, box plots, scatter plots, and many more. It often takes
much time to explore the data. Through the process of EDA, we can define the
problem statement for our dataset, which is very important.
Some of the common questions one can ask during EDA are:
• What kind of variations exist in data?
• What type of knowledge is discovered from the covariance matrix of data in
terms of the correlations between the variables?
• How are the variables distributed?
• What kind of strategy to follow with regard to the outliers detected in a dataset?
Some typical graphical techniques widely used during EDA include the histogram,
correlation matrix, box plot, scatter plot, principal component analysis (PCA), and
so forth. Popular Python libraries used for EDA include seaborn, pandas,
matplotlib, and NumPy. In this section, we will illustrate multiple examples
showing how EDA is conducted on a sample dataset.

1.7 A Practice for Performing Exploratory Data Analysis

The selection of techniques for performing Exploratory Data Analysis (EDA)
depends on the dataset; there is no single prescribed method. In this section, you
can practice some common methods and plots used in the EDA process.
We will perform EDA on Fisher's Iris dataset to illustrate different EDA
techniques. The Iris dataset contains 3 classes of 50 instances each, where each
class refers to a type of iris flower. The features in the dataset are sepal length, sepal
width, petal length, and petal width (Fig. 1.5). One class is linearly separable from
the other two; the latter are NOT linearly separable from each other. The predicted
attribute is the class of the iris flower. The objective is to classify flowers into one
of the categories. In this section, we will perform EDA on the Iris dataset and
observe the trends.

Fig. 1.5 Flower attributes "Sepal and Petal"

1.7.1 Importing the Required Libraries for EDA

Let us begin the EDA by importing some libraries required to perform EDA.
import pandas as pd
import seaborn as sns  # visualization
import matplotlib.pyplot as plt  # visualization
import numpy as np

1.7.2 Loading the Data Into Dataframe

The first step in performing EDA is to represent the data in dataframe form, which
provides extensive support for data analysis and manipulation. Loading the data
into a pandas dataframe is certainly one of the most preliminary steps in EDA.
Since the values in the dataset are comma separated, all we have to do is read
the CSV file into a dataframe, and pandas does the job for us. First, download
iris.csv from https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv.
Loading the data and determining its statistics can be done using the following
commands:
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('iris.csv')
print('size of the dataset and the number of features are:')
print(data.shape)
print('\ncolumn names in the dataset:')
print(data.columns)
print('\nnumber of samples for each flower species:')
print(data["species"].value_counts())

data.plot(kind='scatter', x='petal_length', y='petal_width')
plt.show()

># size of the dataset and the number of features are:
>(150, 5)

># column names in the dataset:
>Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], dtype='object')

># number of samples for each flower species:
>virginica 50
>setosa 50
>versicolor 50
>Name: species, dtype: int64

The value_counts() method helps us understand whether the dataset is balanced
or imbalanced. Based on the output of this method, the Iris dataset is a balanced
dataset with 50 samples/data points per species. Now let us use some visualizations
to better understand the data, including the distribution of observations, classes,
correlation of attributes, and identifying potential outliers.

Fig. 1.6 2D scatter plot for iris dataset, based on two attributes “petal-length” and “petal-width”

1.7.3 Data Visualization


2D Scatter Plot

A scatter plot can display the distribution of data. Figure 1.6 shows a 2D scatter
plot for visualizing the iris data (the command is included in the previous code
snippet). The plot observed is a 2D scatter plot with petal_length on the x-axis and
petal_width on the y-axis. However, with this plot, it is difficult to understand the
per-class distribution of data. A color-coded plot that assigns a distinct color to
each flower species (class) can help. This can be done using the seaborn (sns)
library by executing the following commands:
import seaborn as sns
sns.set_style("whitegrid")
sns.FacetGrid(data, hue="species", height=4) \
   .map(plt.scatter, "petal_length", "petal_width") \
   .add_legend()
plt.show()

Looking at the scatter plot in Fig. 1.6, it is a bit difficult to make sense of the
data, since all data points are displayed with the same color regardless of their label
(i.e., category). However, if we apply color coding to the plot, using a different
color for each label, we can say a lot about the data. Figure 1.7 shows the color-coded
scatter plot, coloring setosa in blue, versicolor in orange, and virginica in green.
One can understand how data is distributed across the two axes of petal-width
and petal-length based on the flower species. The plot clearly shows the distribution
across three clusters (blue, orange, and green), two of which are non-overlapping
(blue and orange), and two overlapping (orange and green).
One important observation from this plot is that the petal-width and petal-length
attributes can distinguish between setosa and versicolor and between setosa and
virginica. However, the same attributes cannot distinguish

Fig. 1.7 2D color-coded scatter plot for iris dataset to visualize the distribution of the iris dataset

versicolor from virginica due to their overlapping clusters. This implies that the
analyst should explore other attributes to train an accurate classifier and perform a
reliable classification. So here is the summary of our observations:
• Using the petal-length and petal-width features, we can distinguish setosa flowers
from the others. How about using all the attributes?
• Separating versicolor from virginica is much harder, as they have considerable
overlap using the petal-width and petal-length attributes. Would one obtain the same
observation if the sepal-width and sepal-length attributes were used instead?
We have also included the 3D scatter plot in the Jupyter notebook for
this tutorial. A sample tutorial for 3D scatter plots with Plotly Express, which needs
a lot of mouse interaction to interpret the data, can be found at
https://plot.ly/pandas/3d-scatter-plots/. (What about 4D, 5D, or n-D scatter plots?)

Pair-Plot

When the number of features in a dataset is high, a pair-plot can be used to clearly
visualize the correlations between the dataset variables. The pair-plot visualization
helps to view 2D patterns (Fig. 1.8) but fails to visualize higher-dimensional patterns
in 3D and 4D. Datasets in real studies contain many features, and the relations
between all possible pairs of variables should be analyzed. The pair-plot gives a
scatter plot between all combinations of the variables that you want to analyze and
explains the relationships between the variables (Fig. 1.8).
To plot multiple pairwise bivariate distributions in a dataset, you can use the
pairplot() function in seaborn. This shows the relationships for the (n, 2)
combinations of variables in a dataframe as a matrix of plots, and the diagonal plots
are the univariate plots. Figure 1.8 illustrates the pair-plot for the iris dataset, which
leads to the following observations:

Fig. 1.8 Pair-plot over the variables of iris dataset

• Petal-length and petal-width are the most useful features to identify various
flower types.
• While setosa can be easily identified (linearly separable), virginica and versicolor
have some overlap (almost linearly separable).
With the help of the pair-plot, we can find "lines" and "if-else" conditions to build a
simple model to classify the flower types.
plt.close()
sns.set_style("whitegrid")
sns.pairplot(data, hue="species", height=3)  # 'data' is the dataframe loaded earlier
plt.show()

Fig. 1.9 Histogram plot showing frequency distribution for variable “petal_length”

Histogram Plot

A histogram is a diagram that shows the underlying frequency distribution of
different variables in a dataset. The plot allows us to inspect the data for its
underlying distribution (e.g., normal distribution), outliers, skewness, and more
(Fig. 1.9). We can view a histogram plot using the seaborn library with the help
of the following commands:

sns.FacetGrid(data, hue="species", height=5) \
   .map(sns.distplot, "petal_length") \
   .add_legend()
plt.show()

Probability Distribution Function

A probability distribution function (PDF) is a statistical function that describes all
the possible values and likelihoods that a random variable can take within a given
range. The range is bounded by the minimum and maximum possible values, but
where a possible value is likely to fall on the probability distribution depends on
several factors, including the distribution's mean (average), standard deviation,
skewness, and kurtosis.

Cumulative Distribution Function

The cumulative distribution function (CDF) of a random variable is another method
to describe the distribution of random variables. The advantage of the CDF is that
it can be defined for any kind of random variable: discrete, continuous, or mixed.
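One way to make these two functions concrete is to estimate them empirically from the data. The following is a minimal sketch, assuming the dataframe data loaded earlier in this section:

petal = np.sort(data["petal_length"].values)

# Empirical PDF: density-normalized histogram of the observations.
counts, bin_edges = np.histogram(petal, bins=20, density=True)
plt.plot(bin_edges[1:], counts, label="empirical PDF")

# Empirical CDF: fraction of observations at or below each value.
cdf = np.arange(1, len(petal) + 1) / len(petal)
plt.plot(petal, cdf, label="empirical CDF")
plt.legend()
plt.show()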

Box Plot

A box and whisker plot, also called a box plot, displays the five-number summary
of a set of data: the minimum, first quartile, median, third quartile, and maximum.
In a box plot, we draw a box from the first quartile to the third quartile, and a
vertical line goes through the box at the median. The whiskers go from each
quartile to the minimum or maximum. A box and whisker plot is a way of
summarizing a set of data measured on an interval scale, and it is often used in
exploratory data analysis. This type of graph shows the shape of the distribution,
its central value, and its variability. In a box and whisker plot:
• The ends of the box are the upper and lower quartiles, so the box spans the
interquartile range.
• The median is marked by a vertical line inside the box.
• The whiskers are the two lines outside the box that extend to the highest and
lowest observations.
The following code snippet shows how a box plot is used to visualize the
distribution of the iris dataset. Figure 1.10 shows the box plot visualization across
the iris dataset “species” output variable.
sns.boxplot(x='species', y='petal_length', data=data)
plt.show()

Violin Plots

Violin plots are a method of plotting numeric data and can be considered a
combination of the box plot and a kernel density plot. In the violin plot (Fig. 1.11),
we can find the same information as in box plots:
• Median.
• Interquartile range.
• The lower/upper adjacent values, defined as first quartile − 1.5 × IQR and third
quartile + 1.5 × IQR, respectively. These values can be used in a simple outlier
detection technique (Tukey's fences), where observations lying outside of these
"fences" can be considered outliers.

Fig. 1.10 Box plot for Iris dataset over "species" variable

Fig. 1.11 Violin plot over the variable "petal_length" of the iris dataset

Violin plots can be easily visualized using the seaborn library as follows:

sns.violinplot(x="species", y="petal_length", data=data)
plt.show()

Univariate, Bivariate, and Multivariate Analysis

Univariate is a term commonly used in statistics to describe a type of data that
consists of observations on only a single characteristic or attribute. A simple
example of univariate data would be the salaries of workers in an industry. Like
all other data, univariate data can be visualized using graphs, images, or other
analysis tools after the data is measured, collected, reported, and analyzed.

Fig. 1.12 Bivariate relationship of two attributes in Iris dataset. The univariate profiles are plotted in the margin

Data in statistics are sometimes classified according to how many variables are
in a study. For example, "height" might be one variable and "weight" might be
another. Depending on the number of variables being looked at, the data is
univariate or bivariate.
Multivariate data analysis is a set of statistical models that examine patterns in
multi-dimensional data by considering several data variables at once. It is
an expansion of bivariate data analysis, which considers only two variables in its
models. As multivariate models consider more variables, they can examine more
complex phenomena and find data patterns that more accurately represent the real
world. These three analyses can be done using the seaborn library in the following
manner, depicted in Fig. 1.12, showing the bivariate distribution of "petal-length"
and "petal-width," as well as the univariate profile of each attribute in the margin.

sns.jointplot(x="petal_length", y="petal_width", data=data, kind="kde")
plt.show()

Visualization techniques are very effective, helping the analyst understand the
trends in data.

1.7.4 Data Analysis

In addition to data visualization, extracting the information related to data is
non-trivial. Here, we discuss the different kinds of information that can be extracted
related to the data.

Standard Deviation

The standard deviation is a statistic that measures the dispersion of a dataset relative
to its mean and is calculated as the square root of the variance, by finding each data
point's deviation relative to the mean. If the data points are far from the mean, there
is a higher deviation within the dataset. The more dispersed the data, the larger the
standard deviation; conversely, the more concentrated the data, the smaller the
standard deviation.
# Per-species subsets of the dataframe loaded earlier.
iris_setosa = data[data["species"] == "setosa"]
iris_virginica = data[data["species"] == "virginica"]
iris_versicolor = data[data["species"] == "versicolor"]

print("\n Std-dev:")
print(np.std(iris_setosa["petal_length"]))
print(np.std(iris_virginica["petal_length"]))
print(np.std(iris_versicolor["petal_length"]))

>Std-dev:
>0.17191858538273286
>0.5463478745268441
>0.4651881339845204

Mean/Average

The mean/average is the most popular and well-known measure of central tendency.
It can be used with both discrete and continuous data, although it is most often used
with continuous data. The mean is the sum of all values in the dataset divided by
the number of values in the dataset. So, if we have n data points with values
x_1, x_2, · · · , x_n, the sample mean, usually denoted by x̄, is
x̄ = (x_1 + x_2 + · · · + x_n)/n.
print("Means:")
print(np.mean(iris_setosa["petal_length"]))
# Mean with an outlier.
print(np.mean(np.append(iris_setosa["petal_length"], 50)))
print(np.mean(iris_versicolor["petal_length"]))

>Means:
>1.4620000000000002
>2.4137254901960787
>4.26

Run the above commands to see the output.



Variance

Variance, in a statistical context, is a measurement of the spread between numbers
in a dataset. That is, it measures how far each number in the set is from the mean
and therefore from every other number in the set. Variance is calculated by taking
the difference between each number in the dataset and the mean, squaring the
differences to make them positive, and finally dividing the sum of the squares by
the number of values in the dataset.
print("Variance:")
print(np.var(iris_setosa["petal_length"]))
# Variance with an outlier.
print(np.var(np.append(iris_setosa["petal_length"], 50)))
print(np.var(iris_versicolor["petal_length"]))

>Variance:
>0.02955600000000001
>45.31804690503652
>0.21640000000000012

Median

The median is the central/middle number in a list of numbers sorted in ascending or
descending order, and it can be more descriptive of the dataset than the average. If
there is an odd number of values, the median is the number in the middle, with the
same amount of numbers below and above it. If there is an even number of values,
the middle pair must be determined, added together, and divided by two to find the
median value.
print("\n Medians:")
print(np.median(iris_setosa["petal_length"]))
# Median with an outlier
print(np.median(np.append(iris_setosa["petal_length"], 50)))
print(np.median(iris_virginica["petal_length"]))
print(np.median(iris_versicolor["petal_length"]))

>Medians:
>1.5
>1.5
>5.55
>4.35

Percentile

Percentiles are used to understand and interpret data. The nth percentile of a set of
data is the value below which n percent of the data falls; percentiles indicate the
values below which a certain percentage of the data in a dataset is found. Percentiles
can be calculated using the formula n = (P/100) × N, where P = percentile,
N = number of values in the dataset (sorted from smallest to largest), and
n = ordinal rank of a given value.
print("\n 90th Percentiles:")
print(np.percentile(iris_setosa["petal_length"], 90))
print(np.percentile(iris_virginica["petal_length"], 90))
print(np.percentile(iris_versicolor["petal_length"], 90))

>90th Percentiles:
>1.7
>6.3100000000000005
>4.8

Quantile

A quantile is a statistical term describing a division of observations into four defined
intervals based on the values of the data and how they compare to the entire set
of observations. The median is an estimator but says nothing about how the data
on either side of its value is spread or dispersed. The quantiles measure the spread
of values above and below the mean by dividing the distribution into four groups:
the first group contains the smallest values up to Q1; the second group spans Q1 to
the median; the third spans the median to Q3; and the fourth comprises Q3 to the
highest data point of the entire set. Each quantile contains 25% of the total
observations. With the data arranged from smallest to largest:
1. First quantile: the lowest 25% of numbers.
2. Second quantile: between 25.1% and 50% (up to the median).
3. Third quantile: 51–75% (above the median).
4. Fourth quantile: the highest 25% of numbers.
print("\n Quantiles:")
print(np.percentile(iris_setosa["petal_length"], np.arange(0, 100, 25)))
print(np.percentile(iris_virginica["petal_length"], np.arange(0, 100, 25)))
print(np.percentile(iris_versicolor["petal_length"], np.arange(0, 100, 25)))

>Quantiles:
>[1. 1.4 1.5 1.575]
>[4.5 5.1 5.55 5.875]
>[3. 4. 4.35 4.6 ]

Interquartile Range

The IQR describes the middle 50% of values when ordered from lowest to highest.
To find the interquartile range (IQR), first find the medians (middle values) of the
lower and upper halves of the data; these values are quartile 1 (Q1) and quartile 3
(Q3). The IQR is the difference between Q3 and Q1.
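Following the same pattern as the other statistics in this section, here is a minimal sketch computing the IQR of petal_length per species with NumPy (assuming the per-species subsets defined earlier):

print("\n IQR:")
for subset in (iris_setosa, iris_virginica, iris_versicolor):
    q1, q3 = np.percentile(subset["petal_length"], [25, 75])
    print(q3 - q1)  # IQR = Q3 - Q1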

Mean Absolute Deviation

The mean absolute deviation of a dataset is the average distance between each data
point and the mean, and it gives us an idea about the variability in the dataset. The
idea is to calculate the mean, calculate how far away each data point is from the
mean using positive distances (also called absolute deviations), add those deviations
together, and divide the sum by the number of data points.
from statsmodels import robust

# Note: robust.mad computes the (scaled) median absolute deviation.
print("\n Median Absolute Deviation")
print(robust.mad(iris_setosa["petal_length"]))
print(robust.mad(iris_virginica["petal_length"]))
print(robust.mad(iris_versicolor["petal_length"]))

>Median Absolute Deviation
>0.14826022185056031
>0.6671709983275211
>0.5189107764769602

1.7.5 Performance Evaluation Metrics

Evaluating the performance of machine learning classifiers is an important step in
implementing effective ML-based countermeasure techniques. In machine learning
and statistics, a variety of measures can be deployed to evaluate the performance
of a detection method and show its detection accuracy. Table 1.7 lists the standard
evaluation metrics used for performance analysis of malware and side-channel
attack detection and classification. For analyzing the detection rate of ML-based
security countermeasures, samples of malicious applications are often considered
positive instances. As a result, the True Positive Rate (TPR) metric, or hit rate,
represents sensitivity, which stands for the proportion of correctly identified
positives; it is the rate of malware samples (i.e., positive instances) correctly
classified by the classification model. The True Negative Rate (TNR) represents
specificity, which measures the proportion of correctly identified negatives. In
addition, the False Positive Rate (FPR) is the rate of benign files (i.e., negative
instances) wrongly classified (i.e., misclassified as malware samples).
The F-measure (F-score) in ML is interpreted as a weighted average of the
precision (P) and recall (R). Precision is the proportion of predicted positives that
are true positives, and recall is the proportion of actual positive instances that are
predicted positive. F-measure is a more comprehensive evaluation metric than
accuracy (the percentage of correctly classified samples), since it takes both the
precision and the recall into consideration. More importantly, F-measure is also
resilient to class imbalance in the dataset, which is the case in our experiments.
The Detection Accuracy (ACC) measures the rate of correctly classified positive
and negative samples, which evaluates the correct classification rate across all
tested samples.

Table 1.7 Evaluation metrics for performance of ML security countermeasures

Evaluation metric | Description
True positive (TP) | Correct positive prediction
False positive (FP) | Incorrect positive prediction
True negative (TN) | Correct negative prediction
False negative (FN) | Incorrect negative prediction
Specificity: true negative rate | TNR = TN/(TN + FP)
False positive rate | FPR = FP/(FP + TN)
Precision | P = TP/(FP + TP)
Recall: true positive rate | TPR = TP/(TP + FN)
F-measure (F-score) | F-measure = 2 × (P × R)/(P + R)
Detection accuracy | ACC = (TP + TN)/(TP + FP + TN + FN)
Error rate | ERR = (FP + FN)/(P + N)
Area under the curve | AUC = ∫₀¹ TPR(x) dx = ∫₀¹ P(A > τ(x)) dx

Precision and recall alone are not adequate for showing detection performance, and
can even be contradictory to each other, because neither includes all the results
and samples in its formula. The F-score (i.e., F-measure) is therefore calculated
from precision and recall to compensate for this disadvantage. The Receiver
Operating Characteristic (ROC) is a statistical plot that depicts binary detection
performance as its discrimination threshold setting is varied. The ROC space is
spanned by FPR and TPR as the x and y axes, respectively. It helps the detector
determine trade-offs between TP and FP, in other words, between benefits and
costs. Since TPR and FPR are equivalent to sensitivity and (1 − specificity),
respectively, each prediction result represents one point in the ROC space, in which
the point in the upper left corner, coordinate (0, 1) of the ROC curve, stands for the
best detection result, representing 100% sensitivity and 100% specificity (the
perfect detection point).
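As a hedged sketch of how an ROC curve is produced in practice, the snippet below uses scikit-learn on synthetic data; the data and the logistic-regression classifier are illustrative assumptions, not part of the experiments described here.

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split

# Illustrative binary classification data and model.
X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scores = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

fpr, tpr, _ = roc_curve(y_te, scores)  # points of the ROC curve
plt.plot(fpr, tpr, label="AUC = %.2f" % auc(fpr, tpr))
plt.plot([0, 1], [0, 1], "--")  # chance line
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend()
plt.show()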

Example 1.2 (Performance Evaluation of ML-Based Malware Detectors)

Problem: A neural network ML classifier is applied on various HPC samples
for hardware-assisted malware detection. Assuming that FN = 2, FP = 1,
TP = 8, and TN = 6, evaluate the performance of the neural network ML in
classifying malware from benign samples by calculating the Accuracy,
Precision, Recall, and F-measure metrics.
Solution: As mentioned before, the detection accuracy calculates the rate of
the correctly classified positive and negative samples:

ACC = (TP + TN)/(TP + FP + TN + FN) = (8 + 6)/(8 + 1 + 6 + 2) ≈ 0.82.  (1.1)

Precision measures the percentage of malware (positive) samples that are
correctly classified as malware:

P = TP/(FP + TP) = 8/(1 + 8) ≈ 0.89.  (1.2)

Recall measures the percentage of actual malware samples that were
correctly classified by the ML-based detector:

R = TP/(TP + FN) = 8/(8 + 2) = 0.8.  (1.3)

Now, we can calculate the F-measure, which is interpreted as a weighted average
of the precision and recall:

F-measure = 2 × (P × R)/(P + R) = 2 × (0.89 × 0.8)/(0.89 + 0.8) ≈ 0.84.  (1.4)
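The same numbers can be checked programmatically; here is a minimal sketch using scikit-learn, where the label vectors are constructed to reproduce TP = 8, FP = 1, TN = 6, FN = 2 (1 = malware):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1]*8 + [0]*1 + [0]*6 + [1]*2  # actual labels
y_pred = [1]*8 + [1]*1 + [0]*6 + [0]*2  # predicted labels

print(accuracy_score(y_true, y_pred))   # ~0.82
print(precision_score(y_true, y_pred))  # ~0.89
print(recall_score(y_true, y_pred))     # 0.8
print(f1_score(y_true, y_pred))         # ~0.84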

1.8 Putting It All Together

Applied machine learning is a rapidly growing field due to its interdisciplinary
nature. It can be considered a search problem: finding the optimal mapping between
inputs and outputs given data and a machine learning method. We provided an
introduction to the machine learning pipeline and described metadata extraction,
feature engineering and preprocessing, data exploration and visualization, and data
standardization and analysis. Each of these important tasks was described using a
hands-on example. We deferred the training of machine learning models to the next
chapters. We also covered several performance evaluation metrics, such as TPR,
TNR, precision and recall, F-measure, detection accuracy (ACC), and ROC, as well
as the rationale for the application of these metrics.

1.9 Exercise Problems

Problem 1.1 Describe the machine learning pipeline.
Problem 1.2 Download the Haberman Cancer Survival dataset from Kaggle. You
may have to create a Kaggle account to download the data (https://www.kaggle.com/
gilsousa/habermans-survival-data-set). Then provide a comprehensive description
of the dataset, including the dataset size, the number of features (dimensions), the
type of features (numeric, nominal, discrete, continuous, binary, and so forth), and
the class attribute (dependent variable).
Problem 1.3 Plot the distribution of data to show the number of data points per
class and describe whether the dataset is balanced. If the dataset is imbalanced or
skewed, what solution do you propose as a remedy?
Problem 1.4 Identify outliers (if any) in the dataset, propose a solution to deal with
them, and explain why it is a suitable approach for this dataset. You can use a
visualization technique such as a box plot or a scatter plot to identify outliers.
Problem 1.5 Perform a high-level statistical analysis of the dataset in terms of
reporting the mean, median, mean absolute deviation, and quantiles before dealing
with potential outliers.
Problem 1.6 Perform bivariate analysis (correlation matrix, pair-plots) to find a
combination of useful features (i.e., independent variables) for classification.
Problem 1.7 Download the Airline .json file from
https://github.com/sathwikabavikadi/Machine-Learning-for-Computer-Scientists-
and-Data-Analysts, convert it to a .csv file, and import it into a dataframe.
Problem 1.8 Write Python code to extract the gender, age, and title (such as "Mr")
attributes from the "Description" field. Use the pandas library.
Problem 1.9 Using the output of Problem 1.8, write Python code to perform data
imputation on the age and gender attributes and explain your approach. You can
use the numpy library.
Problem 1.10 Write Python code to plot the distribution of the Gender attribute
after imputation using a histogram plot.
Problem 1.11 Write Python code to plot the distribution of the Age attribute and
plot the box plots.
Problem 1.12 Write Python code to plot the correlations between the dataset
attributes. You can use the seaborn and matplotlib libraries. If you find correlations
between independent variables, report them.
Problem 1.13 Outline the EDA techniques discussed in this chapter and their
significance.
Problem 1.14 Discuss the importance of data pre-processing.
Chapter 2
A Brief Review of Probability Theory and Linear Algebra

2.1 Introduction

In daily life, we encounter various series of events and experiments that are based
on probability and have no certainty about the outcome. Probability theory is
an advantageous tool for quantitatively describing and forecasting the outcomes
of such probability-based investigations. By applying probability theory to a
problem, one can simplify its understanding, evaluate it using the relevant
mathematical model, and forecast probable outcomes. Two examples are provided
here to help you gain a better understanding of probability theory's applicability.
Consider rolling a fair dice as a simple example. When we roll a fair dice, there
is no certainty about the output to be achieved; the output of this experiment is
based on probability. In more detail, in rolling a fair dice, the outcome would be
"1" with a probability of 1/6. Also, the outcome would be "2" with a probability of
1/6. Similarly, each of the numbers of the dice occurs with a probability of 1/6. In
other words, if we repeat this experiment many times, the outcome "1" would be
achieved about 16.66% of the time. A similar interpretation applies to the other
possible outcomes. It can be seen that each possible outcome is based on probability,
and this analysis and interpretation are possible using the concepts of probability
theory.
Another example in this field is the entering and exiting rates of the customers
in a restaurant. Using probability theory, the entry rate of the customers, the
time duration each customer spends in the restaurant, and their exit rate can be
easily modeled and analyzed mathematically. In particular, the average income of
the restaurant can be estimated. According to these predictions and analyses,
one can take action to improve the performance of the restaurant.
Probability theory, in general, covers a broad range of applications. In any
subject where complete information is unavailable, and hence there is no certainty
about the outcome, the problem can be handled through the use of probability
theory. Other


applications of probability theory include weather forecasting, victory or defeat in a
contest, wireless communication, machine learning, and even the drug distribution
problem in the medical area. This chapter discusses the fundamental notions of
probability theory. Additionally, the topic of matrix algebra is presented and studied
in relation to machine learning, which is one of the applications of probability
theory.

2.2 Fundamentals of Probability

Consider a probability-based experiment where there is no certainty in the outcome.
Each of the possible outcomes is known as an event. As an example, consider
tossing a coin. In this case, two possible outcomes can be achieved, "Head" and
"Tail," each of which is called an event. Each event is specified with a probability.
In this example, the probability of achieving "Head" is 1/2, or equivalently, we have:

P(Head) = 1/2.

Similarly, the probability of achieving "Tail" is 1/2. Generally, the probability of
an event is written as P(X = xi). In this example, xi could be "Head" or "Tail."
Note that the probability of an event is always non-negative and less than or equal
to one:

0 ≤ P(X = xi) ≤ 1.  (2.1)

The coin-tossing experiment is a simple example in which two possible outcomes
can be achieved. In order to express the concept of probability in a more complex
form, consider two random variables X and Y. Each random variable takes values
from its corresponding dictionary. More precisely, if the random variable X takes
the value xi, and the random variable Y takes the value yj, then we have:

i ∈ DX,
j ∈ DY,

where DX and DY are the dictionaries corresponding to the random variables X and
Y, respectively. Consider the case in which N possible outcomes can be achieved
from the combination of these two random variables. The probability of the event
xi corresponding to the random variable X is denoted by P(X = xi). Similarly,
the probability of the event yj corresponding to the random variable Y is denoted
by P(Y = yj). Now, consider that we are interested in finding the probability of

X = xi and Y = yj jointly. This probability is known as the joint probability and
is denoted by P(X = xi, Y = yj). The joint probability of X and Y is written as
below:

P(X = xi, Y = yj) = nij / N,  (2.2)
where nij denotes the number of events in which X = xi and Y = yj occurred
jointly. The schematic of the joint probability is depicted in Fig. 2.1. In this figure,
P(X = xi) and P(Y = yj) are denoted with the red and blue colors, respectively.
Note that the region where the events xi and yj are met simultaneously is denoted
by nij, as mentioned above.
As shown in Fig. 2.1, the number of events where X = xi is denoted by ci. Also,
the number of events where Y = yj is denoted by rj. Therefore, P(X = xi) and
P(Y = yj) can be written as follows:

P(X = xi) = ci / N,
P(Y = yj) = rj / N.  (2.3)

In the above equation, ci and rj are obtained as below:

ci = Σ_{j∈DY} nij,
rj = Σ_{i∈DX} nij.  (2.4)

According to (2.3) and (2.4), the probabilities of X = xi and Y = yj can be
rewritten as below:

Fig. 2.1 Illustration of the probability theory

P(X = xi) = Σ_{j∈DY} nij / N,
P(Y = yj) = Σ_{i∈DX} nij / N.  (2.5)

In particular, consider P(X = xi). It can be seen from the above formula that the
probability of X = xi becomes independent of the random variable Y by performing
a summation over j ∈ DY. This is called the "marginal probability" of X, which can
be rewritten as below:

P(X = xi) = Σ_{j∈DY} P(X = xi, Y = yj).  (2.6)

Similarly, the marginal probability of Y (or equivalently, P(Y = yj)) can be found
by performing a summation over i ∈ DX. Note that (2.6) is obtained referring to
(2.3) and (2.5). Also, note that (2.6) is known as the "sum rule" of probability.
As an example, consider that we have two random variables X and Y, where X
corresponds to a coin-tossing experiment and Y corresponds to rolling a fair dice.
The dictionaries of X and Y are specified as below:

DX = {Head, Tail},
DY = {1, 2, · · · , 6}.

Now, consider that we are interested in finding the probability of X = "Head" and
Y = 1. Here, nij = 1, where i corresponds to the event "Head" of the random
variable X, and j corresponds to the event "1" of the random variable Y. Moreover,
the total number of events obtained from the combination of X and Y is N = 12.
According to (2.2), the joint probability of X and Y is obtained as below:

P(X = Head, Y = 1) = nij / N = 1/12.  (2.7)
Also, inspired by (2.6), the marginal probabilities of X and Y are obtained as
below:

P(X = Head) = Σ_{j∈DY} P(X = Head, Y = yj) = 6/12,
P(Y = 1) = Σ_{i∈DX} P(X = xi, Y = 1) = 2/12.  (2.8)
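These joint and marginal probabilities can be checked numerically; here is a quick sketch that simulates many independent coin tosses and dice rolls and estimates the relative frequencies:

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
coin = rng.choice(["Head", "Tail"], size=n)  # random variable X
dice = rng.integers(1, 7, size=n)            # random variable Y

# Joint probability P(X = Head, Y = 1); expected 1/12.
print(np.mean((coin == "Head") & (dice == 1)))

# Marginal probabilities; expected 6/12 and 2/12, respectively.
print(np.mean(coin == "Head"))
print(np.mean(dice == 1))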

Consider the case in which the event X = xi occurred given the knowledge that
the event Y = yj has already occurred. The probability of X = xi given Y = yj
modern times by titles like Louis XVIII. and Napoleon III. For both
schools of Legitimists the absence of de facto sovereignty did not
prevent Louis XVII. and Napoleon II. from having been lawful rulers
of France. In Israel, moreover, the Divine right of the one chosen
dynasty had religious as well as political importance. We have
already seen that Israel claimed a hereditary title to [pg 135] its
special privileges; it was therefore natural that a hereditary
qualification should be thought necessary for the kings. They
represented the nation; they were the Divinely appointed guardians
of its religion; they became in time the types of the Messiah, its
promised Saviour. In all this Saul and Ishbosheth had neither part
nor lot; the promise to Israel had always descended in a direct line,
and the special promise that was given to its kings and through
them to their people began with David. There was no need to carry
the history further back.

We have already noticed that, in spite of this general attitude


towards Saul, the genealogy of some of his descendants is given
twice over in the earlier chapters. No doubt the chronicler made this
concession to gratify friends or to conciliate an influential family. It is
interesting to note how personal feeling may interfere with the
symmetrical development of a theological theory. At the same time
we are enabled to discern a practical reason for rigidly ignoring the
kingship of Saul and Ishbosheth. To have recognised Saul as the
Lord's anointed, like David, would have complicated contemporary
dogmatics, and might possibly have given rise to jealousies between
the descendants of Saul and those of David. Within the narrow limits
of the Jewish community such quarrels might have been
inconvenient and even dangerous.

The reasons for denying the legitimacy of the northern kings were
obvious and conclusive. Successful rebels who had destroyed the
political and religious unity of Israel could not inherit “the sure
mercies of David” or be included in the covenant which secured the
permanence of his dynasty.

The exclusive association of Messianic ideas with a [pg 136] single


family emphasises their antiquity, continuity, and development. The
hope of Israel had its roots deep in the history of the people; it had
grown with their growth and maintained itself through their
changing fortunes. As the hope centred in a single family, men were
led to expect an individual personal Messiah; they were being
prepared to see in Christ the fulfilment of all righteousness.

But the choice of the house of David involved the choice of the tribe
of Judah and the rejection of the kingdom of Samaria. The ten
tribes, as well as the kings of Israel, had cut themselves off both
from the Temple and the sacred dynasty, and therefore from the
covenant into which Jehovah had entered with “the man after his
own heart.” Such a limitation of the chosen people was suggested by
many precedents. Chronicles, following the Pentateuch, tells how the
call came to Abraham, but only some of the descendants of one of
his sons inherited the promise. Why should not a selection be made
from among the sons of Jacob? But the twelve tribes had been
explicitly and solemnly included in the unity of Israel, largely through
David himself. The glory of David and Solomon consisted in their
sovereignty over a united people. The national recollection of this
golden age loved to dwell on the union of the twelve tribes. The
Pentateuch added legal sanction to ancient sentiment. The twelve
tribes were associated together in national lyrics, like the “Blessing
of Jacob” and the “Blessing of Moses.” The song of Deborah told
how the northern tribes “came to the help of the Lord against the
mighty.” It was simply impossible for the chronicler to absolutely
repudiate the ten tribes; and so they are formally included in the
genealogies of Israel, and are recognised in the history of David and
[pg 137] Solomon. Then the recognition stops. From the time of the
disruption the northern kingdom is quietly but persistently ignored.
Its prophets and sanctuaries were as illegitimate as its kings. The
great struggle of Elijah and Elisha for the honour of Jehovah is
omitted, with all the rest of their history. Elijah is only mentioned as
sending a letter to Jehoram, king of Judah; Elisha is never even
named.

On the other hand, it is more than once implied that Judah, with the
Levites, and the remnants of Simeon and Benjamin, are the true
Israel. When Rehoboam “was strong he forsook the law of the Lord,
and all Israel with him.” After Shishak's invasion, “the princes of
Israel and the king humbled themselves.”135 The annals of Manasseh,
king of Judah, are said to be “written among the acts of the kings of
Israel.”136 The register of the exiles, who returned with Zerubbabel is
headed “The number of the men of the people of Israel.”137 The
chronicler tacitly anticipates the position of St. Paul: “They are not
all Israel which are of Israel”; and the Apostle might have appealed
to Chronicles to show that the majority of Israel might fail to
recognise and accept the Divine purpose for Israel, and that the true
Israel would then be found in an elect remnant. The Jews of the
second Temple naturally and inevitably came to ignore the ten tribes
and to regard themselves as constituting this true Israel. As a matter
of history, there had been a period during which the prophets of
Samaria were of far more importance to the religion of Jehovah than
the temple at Jerusalem; but in the chronicler's time the very
existence of the ten tribes was ancient history. Then, at any rate,
[pg 138] it was true that God's Israel was to be found in the Jewish
community, at and around Jerusalem. They inherited the religious
spirit of their fathers, and received from them the sacred writings
and traditions, and carried on the sacred ritual. They preserved the
truth and transmitted it from generation to generation, till at last it
was merged in the mightier stream of Christian revelation.

The attitude of the chronicler towards the prophets of the northern


kingdom does not in any way represent the actual importance of
these prophets to the religion of Israel; but it is a very striking
expression of the fact that after the Captivity the ten tribes had long
ceased to exercise any influence upon the spiritual life of their
nation.

The chronicler's attitude is also open to criticism on another side. He


is dominated by his own surroundings, and in his references to the
Judaism of his own time there is no formal recognition of the Jewish
community in Babylon; and yet even his own casual allusions
confirm what we know from other sources, namely that the wealth
and learning of the Jews in Babylon were an important factor in
Judaism until a very late date. This point perhaps rather concerns
Ezra and Nehemiah than Chronicles, but it is closely connected with
our present subject, and is most naturally treated along with it. The
chronicler might have justified himself by saying that the true home
of Israel must be in Palestine, and that a community in Babylon
could only be considered as subsidiary to the nation in its own home
and worshipping at the Temple. Such a sentiment, at any rate, would
have met with universal approval amongst Palestinian Jews. The
chronicler might also have replied that the Jews in [pg 139] Babylon
belonged to Judah and Benjamin and were sufficiently recognised in
the general prominence give to these tribes. In all probability some
Palestinian Jews would have been willing to class their Babylonian
kinsmen with the ten tribes. Voluntary exiles from the Temple, the
Holy City, and the Land of Promise had in great measure cut
themselves off from the full privileges of the people of Jehovah. If,
however, we had a Babylonian book of Chronicles, we should see
both Jerusalem and Babylon in another light.
The chronicler was possessed and inspired by the actual living
present round about him; he was content to let the dead past bury
its dead. He was probably inclined to believe that the absent are
mostly wrong, and that the men who worked with him for the Lord
and His temple were the true Israel and the Church of God. He was
enthusiastic in his own vocation and loyal to his brethren. If his
interests were somewhat narrowed by the urgency of present
circumstances, most men suffer from the same limitations. Few
Englishmen realise that the battle of Agincourt is part of the history
of the United States, and that Canterbury Cathedral is a monument
of certain stages in the growth of the religion of New England. We
are not altogether willing to admit that these voluntary exiles from
our Holy Land belong to the true Anglo-Saxon Israel.

Churches are still apt to ignore their obligations to teachers who, like
the prophets of Samaria, seem to have been associated with alien or
hostile branches of the family of God. A religious movement which
fails to secure for itself a permanent monument is usually labelled
heresy. If it has neither obtained recognition within the Church nor
yet organised a sect [pg 140] for itself, its services are forgotten or
denied. Even the orthodoxy of one generation is sometimes
contemptuous of the older orthodoxy which made it possible; and
yet Gnostics, Arians and Athanasians, Arminians and Calvinists, have
all done something to build up the temple of faith.

The nineteenth century prides itself on a more liberal spirit. But


Romanist historians are not eager to acknowledge the debt of their
Church to the Reformers; and there are Protestant partisans who
deny that we are the heirs of the Christian life and thought of the
mediæval Church and are anxious to trace the genealogy of pure
religion exclusively through a supposed succession of obscure and
half-mythical sects. Limitations like those of the chronicler still
narrow the sympathies of earnest and devout Christians.

But it is time to return to the more positive aspects of the teaching


of Chronicles, and to see how far we have already traced its
exposition of the Messianic idea. The plan of the book implies a
spiritual claim on behalf of the Jewish community of the Restoration.
Because they believed in Jehovah, whose providence had in former
times controlled the destinies of Israel, they returned to their
ancestral home that they might serve and worship the God of their
fathers. Their faith survived the ruin of Judah and their own
captivity; they recognised the power, and wisdom, and love of God
alike in the prosperity and in the misfortunes of their race. “They
believed God, and it was counted unto them for righteousness.” The
great prophet of the Restoration had regarded this new Israel as
itself a Messianic people, perhaps even “a light to the Gentiles” and
“salvation unto the ends of the earth.”138 The [pg 141] chronicler's
hopes were more modest; the new Jerusalem had been seen by the
prophet as an ideal vision; the historian knew it by experience as an
imperfect human society: but he believed none the less in its high
spiritual vocation and prerogatives. He claimed the future for those
who were able to trace the hand of God in their past.

Under the monarchy the fortunes of Jerusalem had been bound up


with those of the house of David. The chronicler brings out all that
was best in the history of the ancient kings of Judah, that this ideal
picture of the state and its rulers might encourage and inspire to
future hope and effort. The character and achievements of David
and his successors were of permanent significance. The grace and
favour accorded to them symbolised the Divine promise for the
future, and this promise was to be realised through a Son of David.

[pg 142]
Chapter III. David—II. His Personal History.

In order to understand why the chronicler entirely recasts the


graphic and candid history of David given in the book of Samuel, we
have to consider the place that David had come to fill in Jewish
religion. It seems probable that among the sources used by the
author of the book of Samuel was a history of David, written not
long after his death, by some one familiar with the inner life of the
court. “No one,” says the proverb, “is an hero to his valet”; very
much what a valet is to a private gentleman courtiers are to a king:
their knowledge of their master approaches to the familiarity which
breeds contempt. Not that David was ever a subject for contempt or
less than an hero even to his own courtiers; but they knew him as a
very human hero, great in his vices as well as in his virtues, daring
in battle and wise in counsel, sometimes also reckless in sin, yet
capable of unbounded repentance, loving not wisely, but too well.
And as they knew him, so they described him; and their picture is an
immortal possession for all students of sacred life and literature. But
it is not the portrait of a Messiah; when we think of the “Son of
David,” we do not want to be reminded of Bath-sheba.

During the six or seven centuries that elapsed between [pg 143] the
death of David and the chronicler, the name of David had come to
have a symbolic meaning, which was largely independent of the
personal character and career of the actual king. His reign had
become idealised by the magic of antiquity; it was a glory of “the
good old times.” His own sins and failures were obscured by the
crimes and disasters of later kings. And yet, in spite of all its
shortcomings, the “house of David” still remained the symbol alike of
ancient glory and of future hopes. We have seen from the
genealogies how intimate the connection was between the family
and its founder. Ephraim and Benjamin may mean either patriarchs
or tribes. A Jew was not always anxious to distinguish between the
family and the founder. “David” and “the house of David” became
almost interchangeable terms.

Even the prophets of the eighth century connect the future destiny
of Israel with David and his house. The child, of whom Isaiah
prophesied, was to sit “upon the throne of David” and be “over his
kingdom, to establish it and to uphold it with judgment and with
righteousness from henceforth even for ever.”139 And, again, the king
who is to “sit ... in truth, ... judging, and seeking judgment, and
swift to do righteousness,” is to have “his throne ... established in
mercy in the tent of David.”140 When Sennacherib attacked
Jerusalem, the city was defended141 for Jehovah's own sake and for
His servant David's sake. In the word of the Lord that came to Isaiah
for Hezekiah, David supersedes, as it were, the sacred fathers of the
Hebrew race; Jehovah is not spoken of as “the God of Abraham,
Isaac, and Jacob,” but “the God of David.”142 [pg 144] As founder of
the dynasty, he takes rank with the founders of the race and religion
of Israel: he is “the patriarch David.”143 The northern prophet Hosea
looks forward to the time when “the children of Israel shall return,
and seek the Lord their God and David their king”144; when Amos
wishes to set forth the future prosperity of Israel, he says that the
Lord “will raise up the tabernacle of David”145; in Micah “the ruler in
Israel” is to come forth from Bethlehem Ephrathah, the birthplace of
David146; in Jeremiah such references to David are frequent, the
most characteristic being those relating to the “righteous branch,
whom the Lord will raise up unto David,” who “shall reign as king
and deal wisely, and shall execute judgment and justice in the land,
in whose days Judah shall be saved, and Israel shall dwell safely”147;
in Ezekiel “My servant David” is to be the shepherd and prince of
Jehovah's restored and reunited people148; Zechariah, writing at
what we may consider the beginning of the chronicler's own period,
follows the language of his predecessors: he applies Jeremiah's
prophecy of “the righteous branch” to Zerubbabel, the prince of the
house of David149: similarly in Haggai Zerubbabel is the chosen of
Jehovah150; in the appendix to Zechariah it is said that when “the
Lord defends the inhabitants of Jerusalem” “the house of David shall
be as God, as the angel of the Lord before them.”151 In the later [pg
145] literature, Biblical and apocryphal, the Davidic origin of the
Messiah is not conspicuous till it reappears in the Psalms of
Solomon152 and the New Testament, but the idea had not necessarily
been dormant meanwhile. The chronicler and his school studied and
meditated on the sacred writings, and must have been familiar with
this doctrine of the prophets. The interest in such a subject would
not be confined to scholars. Doubtless the downtrodden people
cherished with ever-growing ardour the glorious picture of the
Davidic king. In the synagogues it was not only Moses, but the
Prophets, that were read; and they could never allow the picture of
the Messianic king to grow faint and pale.153

David's name was also familiar as the author of many psalms. The
inhabitants of Jerusalem would often hear them sung at the Temple,
and they were probably used for private devotion. In this way
especially the name of David had become associated with the
deepest and purest spiritual experiences.

This brief survey shows how utterly impossible it was for the
chronicler to transfer the older narrative bodily from the book of
Samuel to his own pages. Large omissions were absolutely
necessary. He could not sit down in cold blood to tell his readers that
the man whose name they associated with the most sacred
memories and the noblest hopes of Israel had been guilty of
treacherous murder, and had offered himself to the Philistines as an
ally against the people of Jehovah.

From this point of view let us consider the chronicler's omissions


somewhat more in detail. In the first place, [pg 146] with one or two
slight exceptions, he omits the whole of David's life before his
accession to the throne, for two reasons: partly because he is
anxious that his readers should think of David as king, the anointed
of Jehovah, the Messiah; partly that they may not be reminded of
his career as an outlaw and a freebooter and of his alliance with the
Philistines.154 It is probably only an unintentional result of this
omission that it enables the chronicler to ignore the important
services rendered to David by Abiathar, whose family were rivals of
the house of Zadok in the priesthood.

We have already seen that the events of David's reign at Hebron and
his struggle with Ishbosheth are omitted because the chronicler does
not recognise Ishbosheth as a legitimate king. The omission would
also commend itself because this section contains the account of
Joab's murder of Abner and David's inability to do more than protest
against the crime. “I am this day weak, though anointed king; and
these men the sons of Zeruiah are too hard for me,”155 are scarcely
words that become an ideal king.

The next point to notice is one of those significant alterations that


mark the chronicler's industry as a redactor. In 2 Sam. v. 21 we read
that after the Philistines had been defeated at Baal-perazim they left
their images there, and David and his men took them away. Why did
they take them away? What did David and his men want with
images? Missionaries bring home images as trophies, and exhibit
them triumphantly, like soldiers who have captured the enemy's
standards. No one, not even an unconverted native, supposes that
they have been brought away to be used [pg 147] in worship. But
the worship of images was no improbable apostacy on the part of an
Israelite king. The chronicler felt that these ambiguous words were
open to misconstruction; so he tells us what he assumes to have
been their ultimate fate: “And they left their gods there; and David
gave commandment, and they were burnt with fire.”156

The next omission was obviously a necessary one; it is the incident


of Uriah and Bath-sheba. The name Bath-sheba never occurs in
Chronicles. When it is necessary to mention the mother of Solomon,
she is called Bath-shua, possibly in order that the disgraceful
incident might not be suggested even by the use of the name. The
New Testament genealogies differ in this matter in somewhat the
same way as Samuel and Chronicles. St. Matthew expressly
mentions Uriah's wife as an ancestress of our Lord, but St. Luke
does not mention her or any other ancestress.

The next omission is equally extensive and important. It includes the


whole series of events connected with the revolt of Absalom, from
the incident of Tamar to the suppression of the rebellion of Sheba
the son of Bichri. Various motives may have contributed to this
omission. The narrative contains unedifying incidents, which are
passed over as lightly as possible by modern writers like Stanley. It
was probably a relief to the chronicler to be able to omit them
altogether. There is no heinous sin like the murder of Uriah, but the
story leaves a general impression of great weakness on David's part.
Joab murders Amasa as he had murdered Abner, and this time there
is no record of any protest even on the part of David. But probably
the main [pg 148] reason for the omission of this narrative is that it
mars the ideal picture of David's power and dignity and the success
and prosperity of his reign.

The touching story of Rizpah is omitted; the hanging of her sons


does not exhibit David in a very amiable light. The Gibeonites
propose that “they shall hang them up unto the Lord in Gibeah of
Saul, the chosen of the Lord,” and David accepts the proposal. This
punishment of the children for the sin of their father was expressly
against the Law157; and the whole incident was perilously akin to
human sacrifice. How could they be hung up before Jehovah in
Gibeah unless there was a sanctuary of Jehovah in Gibeah? And why
should Saul at such a time and in such a connection be called
emphatically “the chosen of Jehovah”? On many grounds, it was a
passage which the chronicler would be glad to omit.

In 2 Sam. xxi. 15-17 we are told that David waxed faint and had to
be rescued by Abishai. This is omitted by Chronicles probably
because it detracts from the character of David as the ideal hero.
The next paragraph in Samuel also tended to depreciate David's
prowess. It stated that Goliath was slain by Elhanan. The chronicler
introduces a correction. It was not Goliath whom Elhanan slew, but
Lahmi, the brother of Goliath. However, the text in Samuel is
evidently corrupt; and possibly this is one of the cases in which
Chronicles has preserved the correct text.158

Then follow two omissions that are not easily accounted for. 2 Sam.
xxii., xxiii., contain two psalms, Psalm xviii. and “the Last Words of
David,” the latter not included in the Psalter. These psalms are
generally [pg 149] considered a late addition to the book of Samuel,
and it is barely possible that they were not in the copy used by the
chronicler; but the late date of Chronicles makes against this
supposition. The psalms may be omitted for the sake of brevity, and
yet elsewhere a long cento of passages from post-Exilic psalms is
added to the material derived from the book of Samuel. Possibly
something in the omitted section jarred upon the theological
sensibilities of the chronicler, but it is not clear what. He does not as
a rule look below the surface for obscure suggestions of undesirable
views. The grounds of his alterations and omissions are usually
sufficiently obvious; but these particular omissions are not at present
susceptible of any obvious explanation. Further research into the
theology of Judaism may perhaps provide us with one hereafter.

Finally, the chronicler omits the attempt of Adonijah to seize the


throne, and David's dying commands to Solomon. The opening
chapters of the book of Kings present a graphic and pathetic picture
of the closing scenes of David's life. The king is exhausted with old
age. His authoritative sanction to the coronation of Solomon is only
obtained when he has been roused and directed by the promptings
and suggestions of the women of his harem. The scene is partly a
parallel and partly a contrast to the last days of Queen Elizabeth; for
when her bodily strength failed, the obstinate Tudor spirit refused to
be guided by the suggestions of her courtiers. The chronicler was
depicting a person of almost Divine dignity, in whom incidents of
human weakness would have been out of keeping; and therefore
they are omitted.
David's charge to Solomon is equally human. Solomon is to make up
for David's weakness and [pg 150] undue generosity by putting Joab
and Shimei to death; on the other hand, he is to pay David's debt of
gratitude to the son of Barzillai. But the chronicler felt that David's
mind in those last days must surely have been occupied with the
temple which Solomon was to build, and the less edifying charge is
omitted.

Constantine is reported to have said that, for the honour of the


Church, he would conceal the sin of a bishop with his own imperial
purple. David was more to the chronicler than the whole Christian
episcopate to Constantine. His life of David is compiled in the spirit
and upon the principles of lives of saints generally, and his omissions
are made in perfect good faith.

Let us now consider the positive picture of David as it is drawn for us


in Chronicles. Chronicles would be published separately, each copy
written out on a roll of its own. There may have been Jews who had
Chronicles, but not Samuel and Kings, and who knew nothing about
David except what they learned from Chronicles. Possibly the
chronicler and his friends would recommend the work as suitable for
the education of children and the instruction of the common people.
It would save its readers from being perplexed by the religious
difficulties suggested by Samuel and Kings. There were many
obstacles, however, to the success of such a scheme; the
persecutions of Antiochus and the wars of the Maccabees took the
leadership out of the hands of scholars and gave it to soldiers and
statesmen. The latter perhaps felt more drawn to the real David than
to the ideal, and the new priestly dynasty would not be anxious to
emphasise the Messianic hopes of the house of David. But let us put
ourselves for a moment in the position of a student of Hebrew
history who [pg 151] reads of David for the first time in Chronicles
and has no other source of information.

Our first impression as we read the book is that David comes into
the history as abruptly as Elijah or Melchizedek. Jehovah slew Saul
“and turned the kingdom unto David the son of Jesse.”159 Apparently
the Divine appointment is promptly and enthusiastically accepted by
the nation; all the twelve tribes come at once in their tens and
hundreds of thousands to Hebron to make David king. They then
march straight to Jerusalem and take it by storm, and forthwith
attempt to bring up the Ark to Zion. An unfortunate accident
necessitates a delay of three months, but at the end of that time the
Ark is solemnly installed in a tent at Jerusalem.160

We are not told who David the son of Jesse was, or why the Divine
choice fell upon him, or how he had been prepared for his
responsible position, or how he had so commended himself to Israel
as to be accepted with universal acclaim. He must, however, have
been of noble family and high character; and it is hinted that he had
had a distinguished career as a soldier.161 We should expect to find
his name in the introductory genealogies; and if we have read these
lists of names with conscientious attention, we shall remember that
there are sundry incidental references to David, and that he was the
seventh son of Jesse,162 who was descended from the Patriarch
Judah, through Boaz, the husband of Ruth.

As we read further we come to other references which throw some


light on David's early career, and at the same time somewhat mar
the symmetry of the [pg 152] opening narrative. The wide
discrepancy between the chronicler's idea of David and the account
given by his authorities prevents him from composing his work on an
entirely consecutive and consistent plan. We gather that there was a
time when David was in rebellion against his predecessor, and
maintained himself at Ziklag and elsewhere, keeping “himself close,
because of Saul the son of Kish,” and even that he came with the
Philistines against Saul to battle, but was prevented by the jealousy
of the Philistine chiefs from actually fighting against Saul. There is
nothing to indicate the occasion or circumstances of these events.163
But it appears that even at this period, when David was in arms
against the king of Israel and an ally of the Philistines, he was the
chosen leader of Israel. Men flocked to him from Judah and
Benjamin, Manasseh and Gad, and doubtless from the other tribes
as well: “From day to day there came to David to help him, until it
was a great host like the host of God.”164

This chapter partly explains David's popularity after Saul's death; but
it only carries the mystery a stage further back. How did this outlaw
and apparently unpatriotic rebel get so strong a hold on the
affections of Israel?

Chap. xii. also provides material for plausible explanations of another


difficulty. In chap. x. the army of Israel is routed, the inhabitants of
the land take to flight, and the Philistines occupy their cities; in [pg
153] xi. and xii. 23-40 all Israel come straightway to Hebron in the
most peaceful and unconcerned fashion to make David king. Are we
to understand that his Philistine allies, mindful of that “great host,
like the host of God,” all at once changed their minds and entirely
relinquished the fruits of their victory?

Elsewhere, however, we find a statement that renders other


explanations possible. David reigned seven years in Hebron,165 so
that our first impression as to the rapid sequence of events at the
beginning of his reign is apparently not correct, and there was time
in these seven years for a more gradual expulsion of the Philistines.
It is doubtful, however, whether the chronicler intended his original
narrative to be thus modified and interpreted.

The main thread of the history is interrupted here and later on166 to
insert incidents which illustrate the personal courage and prowess of
David and his warriors. We are also told how busily occupied David
was during the three months' sojourn of the Ark in the house of
Obed-edom the Gittite. He accepted an alliance with Hiram, king of
Tyre; he added to his harem; he successfully repelled two inroads of
the Philistines, and made him houses in the city of David.167

The narrative returns to its main subject: the history of the


sanctuary at Jerusalem. As soon as the Ark was duly installed in its
tent, and David was established in his new palace, he was struck by
the contrast between the tent and the palace: “Lo, I dwell in a house
of cedar, but the ark of the covenant of the Lord dwelleth under
curtains.” He proposed to substitute a temple for the tent, but was
forbidden by his prophet Nathan, [pg 154] through whom God
promised him that his son should build the Temple, and that his
house should be established for ever.168

Then we read of the wars, victories, and conquests of David. He is


no longer absorbed in the defence of Israel against the Philistines.
He takes the aggressive and conquers Gath; he conquers Edom,
Moab, Ammon, and Amalek; he and his armies defeat the Syrians in
several battles, the Syrians become tributary, and David occupies
Damascus with a garrison. “And the Lord gave victory to David
whithersoever he went.” The conquered were treated after the
manner of those barbarous times. David and his generals carried off
much spoil, especially brass, and silver, and gold; and when he
conquered Rabbah, the capital of Ammon, “he brought forth the
people that were therein, and cut them with saws, and with harrows
of iron, and with axes. And thus did David unto all the cities of the
children of Ammon.” Meanwhile his home administration was as
honourable as his foreign wars were glorious: “He executed
judgment and justice unto all his people”; and the government was
duly organised with commanders of the host and the bodyguard,
with priests and scribes.169

Then follows a mysterious and painful dispensation of Providence,


which the historian would gladly have omitted, if his respect for the
memory of his hero had not been overruled by his sense of the
supreme importance of the Temple. David, like Job, was given over
for a season to Satan, and while possessed by this evil spirit
displeased God by numbering Israel. His punishment took the form
of a great pestilence, which decimated [pg 155] his people, until, by
Divine command, David erected an altar in the threshing-floor of
Ornan the Jebusite and offered sacrifices upon it, whereupon the
plague was stayed. David at once perceived the significance of this
incident: Jehovah had indicated the site of the future Temple. “This
is the house of Jehovah Elohim,170 and this is the altar of burnt
offering for Israel.”171

This revelation of the Divine will as to the position of the Temple led
David to proceed at once with preparations for its erection by
Solomon, which occupied all his energies for the remainder of his
life.172 He gathered funds and materials, and gave his son full
instructions about the building; he organised the priests and Levites,
the Temple orchestra and choir, the doorkeepers, treasurers, officers,
and judges; he also organised the army, the tribes, and the royal
exchequer on the model of the corresponding arrangements for the
Temple.

Then follows the closing scene of David's life. The sun of Israel sets
amid the flaming glories of the western sky. No clouds or mists rob
him of accustomed splendour. David calls a great assembly of
princes and warriors; he addresses a solemn exhortation to them
and to Solomon; he delivers to his son instructions for “all the
works” which “I have been made to understand in writing from the
hand of Jehovah.” It is almost as though the plans of the Temple had
shared with the first tables of stone the honour of being written with
the very finger of God Himself, and David were even greater than
Moses. He reminds Solomon of all the preparations he had made,
and [pg 156] appeals to the princes and the people for further gifts;
and they render willingly—thousands of talents of gold, and silver,
and brass, and iron. David offers prayer and thanksgiving to the
Lord: “And David said to all the congregation, Now bless Jehovah
our God. And all the congregation blessed Jehovah, the God of their
fathers, and bowed down their heads, and worshipped Jehovah and
the king. And they sacrificed sacrifices unto Jehovah, and offered
burnt offerings unto Jehovah, on the morrow after that day, even a
thousand bullocks, a thousand rams, and a thousand lambs, with
their drink offerings and sacrifices in abundance for all Israel, and
did eat and drink before Jehovah on that day with great gladness.
And they made Solomon king; ... and David died in a good old age,
full of days, riches, and honour, and Solomon his son reigned in his
stead.”173

The Roman expressed his idea of a becoming death more simply:


“An emperor should die standing.” The chronicler has given us the
same view at greater length; this is how the chronicler would have
wished to die if he had been David, and how, therefore, he
conceives that God honoured the last hours of the man after His
own heart.

It is a strange contrast to the companion picture in the book of


Kings. There the king is bedridden, dying slowly of old age; the life-
blood creeps coldly through his veins. The quiet of the sick-room is
invaded by the shrill outcry of an aggrieved woman, and the dying
king is roused to hear that once more eager hands are clutching at
his crown. If the chronicler has done nothing else, he has helped us
[pg 157] to appreciate better the gloom and bitterness of the
tragedy that was enacted in the last days of David.

What idea does Chronicles give us of the man and his character? He
is first and foremost a man of earnest piety and deep spiritual
feeling. Like the great religious leaders of the chronicler's own time,
his piety found its chief expression in ritual. The main business of his
life was to provide for the sanctuary and its services; that is, for the
highest fellowship of God and man, according to the ideas then
current. But David is no mere formalist; the psalm of thanksgiving
for the return of the Ark to Jerusalem is a worthy tribute to the
power and faithfulness of Jehovah.174 His prayer after God had
promised to establish his dynasty is instinct with devout confidence
and gratitude.175 But the most gracious and appropriate of these
Davidic utterances is his last prayer and thanksgiving for the liberal
gifts of the people for the Temple.176

Next to David's enthusiasm for the Temple, his most conspicuous


qualities are those of a general and soldier: he has great personal
strength and courage, and is uniformly successful in wars against
numerous and powerful enemies; his government is both able and
upright; his great powers as an organiser and administrator are
exercised both in secular and ecclesiastical matters; in a word, he is
in more senses than one an ideal king.

Moreover, like Alexander, Marlborough, Napoleon, and other epoch-


making conquerors, he had a great charm of personal attractiveness;
he inspired his officers and soldiers with enthusiasm and devotion to
[pg 158] himself. The pictures of all Israel flocking to him in the first
days of his reign and even earlier, when he was an outlaw, are
forcible illustrations of this wonderful gift; and the same feature of
his character is at once illustrated and partly explained by the
romantic episode at Adullam. What greater proof of affection could
outlaws give to their captain than to risk their lives to get him a
draught of water from the well of Bethlehem? How better could
David have accepted and ratified their devotion than by pouring out
this water as a most precious libation to God?177 But the chronicler
gives most striking expression to the idea of David's popularity when
he finally tells us in the same breath that the people worshipped
Jehovah and the king.178

In drawing an ideal picture, our author has naturally omitted


incidents that might have revealed the defects of his hero. Such
omissions deceive no one, and are not meant to deceive any one.
Yet David's failings are not altogether absent from this history. He
has those vices which were characteristic alike of his own age and of
the chronicler's, and which indeed are not yet wholly extinct. He
could treat his prisoners with barbarous cruelty. His pride led him to
number Israel, but his repentance was prompt and thorough; and
the incident brings out alike both his faith in God and his care for his
people. When the whole episode is before us, it does not lessen our
love and respect for David. The reference to his alliance with the
Philistines is vague and incidental. If this were our only account of
the matter, we should interpret it by the rest of his life, and conclude
that if all the facts were known, they would justify his conduct.
[pg 159]
In forming a general estimate of David according to Chronicles, we
may fairly neglect these less satisfactory episodes. Briefly David is
perfect saint and perfect king, beloved of God and man.

A portrait reveals the artist as well as the model and the chronicler in
depicting David gives indications of the morality of his own times.
We may deduce from his omissions a certain progress in moral
sensitiveness. The book of Samuel emphatically condemns David's
treachery towards Uriah, and is conscious of the discreditable nature
of many incidents connected with the revolts of Absalom and
Adonijah; but the silence of Chronicles implies an even severer
condemnation. In other matters, however, the chronicler “judges
himself in that which he approveth.”179 Of course the first business of
an ancient king was to protect his people from their enemies and to
enrich them at the expense of their neighbours. The urgency of
these duties may excuse, but not justify, the neglect of the more
peaceful departments of the administration. The modern reader is
struck by the little stress laid by the narrative upon good
government at home; it is just mentioned, and that is about all. As
the sentiment of international morality is even now only in its
infancy, we cannot wonder at its absence from Chronicles; but we
are a little surprised to find that cruelty towards prisoners is included
without comment in the character of the ideal king.180 It is curious
that the account in the book of Samuel is slightly ambiguous and
might possibly admit of a comparatively mild interpretation; but
Chronicles, according to the ordinary translation, says definitely, “He
cut them with saws.” The mere [pg 160] reproduction of this
passage need not imply full and deliberate approval of its contents;
but it would not have been allowed to remain in the picture of the
ideal king, if the chronicler had felt any strong conviction as to the
duty of humanity towards one's enemies. Unfortunately we know
from the book of Esther and elsewhere that later Judaism had not
attained to any wide enthusiasm of humanity.
[pg 161]
Chapter IV. David—III. His Official Dignity.

In estimating the personal character of David, we have seen that


one element of it was his ideal kingship. Apart from his personality,
his name is significant for Old Testament theology, as that of the
typical king. From the time when the royal title “Messiah” began to
be a synonym for the hope of Israel, down to the period when the
Anglican Church taught the Divine right of kings, and Calvinists
insisted on the Divine sovereignty or royal authority of God, the
dignity and power of the King of kings have always been illustrated
by, and sometimes associated with, the state of an earthly monarch
—whereof David is the most striking example.

The times of the chronicler were favourable to the development of


the idea of the perfect king of Israel, the prince of the house of
David. There was no king in Israel; and, as far as we can gather, the
living representatives of the house of David held no very prominent
position in the community. It is much easier to draw a satisfactory
picture of the ideal monarch when the imagination is not checked
and hampered by the faults and failings of an actual Ahaz or
Hezekiah. In earlier times the prophetic hopes for the house of David
had often been rudely disappointed, but there had been [pg 162]
ample space to forget the past and to revive the old hopes in fresh
splendour and magnificence. Lack of experience helped to commend
the idea of the Davidic king to the chronicler. Enthusiasm for a
benevolent despot is mostly confined to those who have not enjoyed
the privilege of living under such autocratic government.

On the other hand, there was no temptation to flatter any living


Davidic king, so that the semi-Divine character of the kingship of
David is not set forth after the gross and almost blasphemous style
of Roman emperors or Turkish sultans. It is indeed said that the
people worshipped Jehovah and the king; but the essential character
of Jewish thought made it impossible that the ideal king should sit
“in the temple of God, setting himself forth as God.” David and
Solomon could not share with the pagan emperors the honours of
Divine worship in their life-time and apotheosis after their death.
Nothing addressed to any Hebrew king parallels the panegyric to the
Christian emperor Theodosius, in which allusion is made to his
“sacred mind,” and he is told that “as the Fates are said to assist
with their tablets that God who is the partner in your majesty, so
does some Divine power serve your bidding, which writes down and
in due time suggests to your memory the promises which you have
made.”181 Nor does Chronicles adorn the kings of Judah with
extravagant Oriental titles, such as “King of kings of kings of kings.”
Devotion to the house of David never oversteps the bounds of a due
reverence, but the Hebrew idea of monarchy loses nothing by this
salutary reserve.

Indeed, the title of the royal house of Judah rested upon Divine
appointment. “Jehovah ... turned the [pg 163] kingdom unto David;
... and they anointed David king over Israel, according to the word
of Jehovah by the hand of Samuel.”182 But the Divine choice was
confirmed by the cordial consent of the nation; the sovereigns of
Judah, like those of England, ruled by the grace of God and the will
of the people. Even before David's accession the Israelites had
flocked to his standard; and after the death of Saul a great array of
the twelve tribes came to Hebron to make David king, “and all the
rest also of Israel were of one heart to make David king.”183 Similarly
Solomon is the king “whom God hath chosen,” and all the
congregation make him king and anoint him to be prince.184 The
double election of David by Jehovah and by the nation is clearly set
forth in the book of Samuel, and in Chronicles the omission of
David's early career emphasises this election. In the book of Samuel
we are shown the natural process that brought about the change of
dynasty; we see how the Divine choice took effect through the wars
between Saul and the Philistines and through David's own ability and
energy. Chronicles is mostly silent as to secondary causes, and fixes
Welcome to our website – the perfect destination for book lovers and
knowledge seekers. We believe that every book holds a new world,
offering opportunities for learning, discovery, and personal growth.
That’s why we are dedicated to bringing you a diverse collection of
books, ranging from classic literature and specialized publications to
self-development guides and children's books.

More than just a book-buying platform, we strive to be a bridge


connecting you with timeless cultural and intellectual values. With an
elegant, user-friendly interface and a smart search system, you can
quickly find the books that best suit your interests. Additionally,
our special promotions and home delivery services help you save time
and fully enjoy the joy of reading.

Join us on a journey of knowledge exploration, passion nurturing, and


personal growth every day!

ebookbell.com

You might also like