Machine Learning Algorithms, Real-World Applications and Research Directions
of view.
Introduction
data, social media data, health data, COVID-19 data, and many
used [103], and so on. Thus, the data management tools and
Fig. 1
The worldwide popularity score of various types of ML algorithms
range of 0 (min) to 100 (max) over time where x-axis represents the
score
0 (minimum) to 100 (maximum)
indication values for these learning types are low in 2015 and are
and services.
broader sense and defines the scope of our study. We briefly discuss
paper.
algorithms.
another type that typically represents data about the data. In the
structured data.
document, etc.
various widely used datasets for different purposes. These are, for
example, cybersecurity datasets such as NSL-KDD [119], UNSW-
smartphone datasets such as phone call logs [84, 101], SMS Log
notification logs [73] etc., IoT data [16, 57, 62], agriculture and e-
commerce data [120, 138], health data such as heart disease [92],
diabetes mellitus [83, 134], COVID-19 [43, 74], etc., and many more
and to extract the insights or useful knowledge from the data for
separates the data, and “regression” that fits the data. For
detection, etc.
straightforward problems.
Fig. 3
A general structure of a machine learning based predictive model
Classification Analysis
problems.
● Binary classification: It refers to the classification tasks
having two class labels such as “true and false” or “yes and
classification.
summarize the most common and popular methods that are used
costs. The standard LDA model usually fits each class with
$$g(z) = \frac{1}{1 + \exp(-z)}. \tag{1}$$
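Eq. (1) is the logistic (sigmoid) function, which maps any real-valued score to a value between 0 and 1. A minimal sketch of how it could be evaluated is shown here; the helper `predict_label` and its 0.5 threshold are illustrative assumptions and are not taken from the source.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + exp(-z)), as in Eq. (1)."""
    # Branch on the sign of z so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_label(z: float, threshold: float = 0.5) -> int:
    """Turn the sigmoid output into a binary label (threshold is an assumption)."""
    return 1 if sigmoid(z) >= threshold else 0
```

For example, `sigmoid(0.0)` returns 0.5, the midpoint between the two classes.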
space. KNN uses data and classifies new data points based
perform well.
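The KNN idea of classifying a new data point from stored data can be sketched as below. This is only one common formulation: the use of Euclidean distance, a simple majority vote, and the function name `knn_predict` are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours.
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    # Return the most frequent label among those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

No model is fitted in advance here; all the work happens at query time, which is why the choice of `k` and of the distance measure matters.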
tasks [82]. ID3 [87], C4.5 [88], and CART [20] are well
By sorting down the tree from the root to some leaf nodes,
starting at the root node of the tree, and then moving down
$$\text{Entropy: } H(x) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) \tag{2}$$
$$\mathrm{Gini}(E) = 1 - \sum_{i=1}^{c} p_i^2. \tag{3}$$
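The two impurity measures in Eqs. (2) and (3) can be computed directly from a list of class labels. The sketch below assumes the probabilities are estimated as relative class frequencies; the function names are hypothetical.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Relative frequency of each distinct class label."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def entropy(labels):
    """Entropy in bits, following Eq. (2)."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


def gini(labels):
    """Gini impurity, following Eq. (3)."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))
```

A perfectly mixed two-class node such as `["yes", "no"]` gives an entropy of 1 bit and a Gini impurity of 0.5, while a pure node gives 0 for both.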
Fig. 4
An example of a decision tree structure
Fig. 5
An example of a random forest structure considering multiple
decision trees
set of its input parameters. Let $\alpha$ be the learning rate and $J_i$ the cost of the $i$th training example; then Eq. (4) represents the stochastic gradient descent weight update method at the $j$th iteration. In large-scale and sparse machine learning, SGD
$$w_j := w_j - \alpha \frac{\partial J_i}{\partial w_j}. \tag{4}$$
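To make the update in Eq. (4) concrete, the following sketch applies it to a linear model with a squared-error cost, J_i = 0.5 (w · x_i − y_i)^2. The choice of cost function, the default hyperparameters, and the names `sgd_step` and `sgd_fit` are assumptions for illustration only.

```python
import random


def sgd_step(weights, x, y, alpha):
    """Apply w_j := w_j - alpha * dJ_i/dw_j for one training example (x, y)."""
    # For J_i = 0.5 * (w.x - y)^2 the partial derivative is (w.x - y) * x_j.
    error = sum(w * xj for w, xj in zip(weights, x)) - y
    return [w - alpha * error * xj for w, xj in zip(weights, x)]


def sgd_fit(samples, alpha=0.05, epochs=200, seed=0):
    """Run SGD over (x, y) pairs, visiting examples in shuffled order each epoch."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0][0])
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            weights = sgd_step(weights, x, y, alpha)
    return weights
```

Each call to `sgd_step` uses a single example rather than the full dataset, which is what distinguishes the stochastic variant from batch gradient descent.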
model for unseen test cases [106]. Since the rules are easily
Fig. 6
Classification vs. regression. In classification the dotted line
Regression Analysis
Some overlaps are often found between the two types of machine
line) using the best fit straight line [41]. It is defined by the
following equations:
$$y = a + bx + e \tag{5}$$
$$y = a + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n + e, \tag{6}$$
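For the single-variable case in Eq. (5), the intercept a and slope b of the best-fit line can be estimated in closed form. The sketch below assumes ordinary least squares as the fitting criterion, which the text does not state explicitly; the function names are hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Estimate (a, b) in y = a + b*x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x):
    """Predicted output for input x, ignoring the error term e."""
    return a + b * x
```

With `xs = [1, 2, 3, 4]` and `ys = [3, 5, 7, 9]`, the fit recovers an intercept of 1 and a slope of 2.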
linear, but is a polynomial of degree $n$
$$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \cdots + b_n x^n + e. \tag{7}$$
Here, $y$ is the predicted/target output and $b_0, b_1, \ldots, b_n$ are the regression coefficients
distributed linearly, instead it is $n$th
Cluster Analysis
the same category, called a cluster, are in some sense more similar
clustering algorithm.
clustering.
Fig. 7
A graphical interpretation of the widely-used hierarchical clustering
work well.
one by one until all clusters have been merged into a single
application areas.
new ones from the existing ones and then discarding the
the machine learning and data science literature [41, 125]. In the
algorithm looks only at the (X) features, not the (y) outputs
learning.
$$r(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}. \tag{8}$$
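Eq. (8) is the Pearson correlation coefficient. A direct, unoptimized translation into code might look as follows; the function name `pearson_r` is an assumption, and the sketch assumes neither variable is constant (otherwise the denominator is zero).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (math.sqrt(ss_x) * math.sqrt(ss_y))
```

Values near +1 or −1 indicate a strong linear relationship between a feature and the target, and values near 0 indicate little linear relationship.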
● Chi square: The chi-square $\chi^2$
freedom, and the sample size depends on $\chi^2$. The chi-square $\chi^2$
categorical variables. If $O_i$ represents observed value and $E_i$ represents expected value, then
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}. \tag{9}$$
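The statistic in Eq. (9) can be computed from paired lists of observed and expected counts. This is a minimal sketch; the function name `chi_square` is hypothetical, and every expected count is assumed to be non-zero.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For observed counts `[10, 20, 30]` against expected counts `[20, 20, 20]`, the statistic is 10.0; larger values indicate a larger gap between what was observed and what was expected.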
selection. RFE [82] fits the model and removes the weakest
features.
matrix and then uses those to project the data into a new
Fig. 8
buys a computer or laptop (an item), s/he is likely to also buy anti-
learning does not usually take into account the order of things within
frequent pattern based [8, 49, 68], and tree-based [42]. The most
This algorithm calls for too many passes over the entire
involved.
Reinforcement Learning
Policy.
commonly used.
[52] is that the initial state is fed into the neural network,
output layers, to learn from data [41]. The main advantage of deep
Fig. 9
Machine learning and deep learning performance in general with the
amount of data
various purposes.
Fig. 10
A structure of an artificial neural network modeling with multiple
processing layers
etc.
Fig. 11
above, several other deep learning approaches [96] exist in the area
is a form of the network for deep learning that can generate data
artificial neural networks (ANN) and deep learning (DL) models are
people.
very high frequency. Thus, collecting useful data for the target
application domain.
To analyze the data and extract insights, there exist many machine
quantity for training, then the machine learning models may become
related to the target application before the system can assist with