
Machine Learning Algorithms and Their Real-World Applications

Abstract: The digital world is rife with data in this Fourth Industrial Revolution (4IR) or Industry 4.0
era. Examples of this data include cybersecurity, mobile, social media, business, Internet of Things
(IoT), and health data. The key to developing intelligent analyses of these data and correspondingly
clever and automated applications is understanding artificial intelligence (AI), and specifically
machine learning (ML). There are many different kinds of machine learning algorithms in the field,
including supervised, unsupervised, semi-supervised, and reinforcement learning. Furthermore, deep
learning, a subset of a larger class of machine learning techniques, is capable of large-scale, intelligent
data analysis. We provide a thorough analysis of various machine learning techniques in this paper,
which may be used to improve an application's intelligence and functionality. Therefore, the
fundamental contribution of this study is to provide an explanation of the principles underlying
various machine learning approaches and how they may be applied in a variety of real-world
application areas, including e-commerce, cybersecurity systems, smart cities, healthcare, and
agriculture, among many others. Based on our investigation, we also emphasise the difficulties and
future directions for research. The overall goal of this article is to provide a technical point of
reference for experts in the industry and academia, as well as for decision-makers in a variety of real-
world scenarios and application domains.
Keywords: Machine learning, Data science, Artificial intelligence, Deep learning, Intelligent
applications, Predictive analytics.

Introduction: We are living in the age of data, where everything around us is digitally recorded [1, 3] and connected to a data source. For example, there is an abundance of different types of data in the modern electronic environment, including cybersecurity data, Internet of Things (IoT) data, smart data, company data, social media data, smartphone data, COVID-19 data, health data, and much more. Structured, semi-structured, and unstructured data are all covered in brief in Section "Types of
Real-World Data and Machine Learning Techniques," and the number of these types of data is
growing daily. By drawing conclusions from these data, numerous intelligent applications in the
pertinent fields can be developed. For example, the pertinent cybersecurity data can be utilized to
create a data-driven, automated, and intelligent cybersecurity system [4], the relevant mobile data can
be used to create context-aware, tailored smart mobile applications [3], and so on. Therefore, there is
an urgent need for data management tools and techniques that can quickly and intelligently extract
insights or meaningful knowledge from the data, which will serve as the foundation for real-world
applications.
In the context of data analysis and computing, artificial intelligence (AI), and machine learning (ML)
in particular, have expanded quickly in recent years, usually enabling the applications to perform in an
intelligent manner [5]. ML is widely regarded as one of the newest and most popular technologies of the fourth industrial revolution (4IR or Industry 4.0), and it often gives systems the capacity to learn and improve from experience automatically without being explicitly programmed [3, 4]. "Industry 4.0" [6] generally
refers to the continuous automation of traditional industrial processes and manufacturing, including
exploratory data processing, through the use of new smart technologies like machine learning
automation. Machine learning algorithms are therefore essential for the intelligent analysis of these
data and the creation of the related real-world applications. Four main types of learning algorithms
may be distinguished in this domain: supervised, unsupervised, semi-supervised, and reinforcement
learning [7].
Generally speaking, the type and characteristics of the data as well as the functionality of the learning
algorithms determine how successful and efficient a machine learning solution is. To efficiently create
data-driven systems, machine learning methods can be used in conjunction with classification
analysis, regression, data clustering, feature engineering and dimensionality reduction, association rule
learning, or reinforcement learning [8]. Furthermore, deep learning, which originates from artificial neural networks and belongs to a larger family of machine learning techniques, is known to be capable of intelligent data analysis [9]. It is therefore difficult to choose an appropriate learning algorithm that
fits the intended application in a given domain. The explanation for this is that various learning
algorithms have distinct goals, and even within the same category, the results of various learning
algorithms can differ based on the properties of the data [10]. In order to apply machine learning
algorithms in a variety of real-world application areas, including cybersecurity services, IoT systems,
business and recommendation systems, smart cities, healthcare and COVID-19, context-aware
systems, sustainable agriculture, and many more, it is crucial to understand the underlying principles
of these algorithms. These topics are briefly discussed in Section "Applications of Machine Learning."
In this paper, we present a thorough view of several forms of machine learning algorithms that may be implemented to improve the intelligence and capabilities of an application, based on the significance and potential of machine learning to analyze the data indicated above. The explanation of the
concepts and potential of various machine learning approaches, as well as their relevance in the
previously listed real-world application domains, constitutes the study's primary contribution. This
paper's goal is to give academics and professionals in the field a fundamental understanding of
machine learning so they may study, research, and construct intelligent and data-driven systems in the
relevant domains.
The key contributions of this paper are listed as follows:
 To establish the parameters of our research by considering the features and attributes of
diverse real-world data sets as well as the capacities of different learning approaches.
 To offer a thorough understanding of machine learning methods that can be used to improve a
data-driven application's intelligence and capabilities.
 To talk about how machine learning-based solutions can be used in a range of real-world
application areas.
 To emphasize and enumerate the possible directions for future research on intelligent data
analysis and services that fall under the purview of our study.
The remainder of the paper is structured as follows. The next section outlines the scope of our
investigation and provides a more comprehensive presentation of the different kinds of data and
machine learning algorithms. In the part that follows, we go over and describe several machine
learning algorithms in brief. After that, we go over and summarize a number of real-world application
areas that use machine learning algorithms. We highlight a number of research questions and possible
future paths in the penultimate section, and this study is concluded in the last section.
Real-World Data Types and Machine Learning Methodologies: Usually, machine learning
algorithms take in and analyze data to discover relevant patterns regarding people, transactions,
events, business procedures, and so forth. The different kinds of real-world data and machine learning
algorithm categories are covered in the sections that follow.
Types of Real-World Data: Generally speaking, data accessibility is seen as essential to building
real-world data-driven systems or machine learning models [3, 4]. There are many different types of
data, including unstructured, semi-structured, and structured data [11]. Furthermore, an additional type
that generally represents information about the data is called "metadata." We go over these kinds of
data briefly in the sections that follow.
 Structured: Structured data is highly ordered, easily accessible, and follows a standard format in a data model, so it can be readily used by a person or a computer program. Structured data is commonly stored in tabular formats within well-defined systems, such as relational databases. Examples include names, dates, addresses, credit card numbers, stock information, geolocation, and so on.
 Semi-structured: Unlike the structured data previously discussed, semi-structured data is not stored in a relational database, yet it contains some organizational characteristics that
facilitate analysis. Semi-structured data includes things like HTML, XML, JSON documents,
NoSQL databases, and more.
 Unstructured: Unstructured data, which primarily consists of text and multimedia, is
considerably harder to collect, handle, and analyze because it lacks a predetermined format or
arrangement. Unstructured data includes, but is not limited to, sensor data, emails, blog posts,
wikis, word processing documents, PDF files, audio files, videos, photos, presentations, web
pages, and many more kinds of business documents.
 Metadata: Also known as "data about data," this is not a typical type of data. The main distinction between "data" and "metadata" is that data are simply the material that can classify, quantify, or document something in relation to an organization's data attributes, whereas metadata describes the pertinent data, making it more meaningful to data users. The author, file size, date the document was created, and keywords used to characterize the document are a few fundamental examples of metadata for a document.

Researchers in the fields of data science and machine learning employ a variety of popular datasets for
diverse applications. For instance, these include cybersecurity datasets, like NSL-KDD [12], Bot-IoT [14], and so on; smartphone datasets, such as call logs [13], SMS logs [2], logs of mobile application usage [15], and logs of mobile phone notifications [18]; IoT data [16]; data related to agriculture and e-commerce [19]; health data, like heart disease [17] and COVID-19 [20]; and many more in a variety of application domains. The data can fall into any of the several categories mentioned above, and they can differ depending on the real-world application. Various machine learning techniques can be employed, based on their learning capabilities, to analyze data in a specific domain and extract valuable insights for developing intelligent applications in the real world. These techniques are covered in the following.
Types of Machine Learning Techniques:

Fig. 1 Different types of machine learning algorithms

As seen in Fig. 1, machine learning algorithms can be broadly classified into four
categories: semi-supervised learning, reinforcement learning, unsupervised
learning, and supervised learning [7]. The following gives a quick overview of
each kind of learning strategy and how it might be used to address issues in the
real world.

 Supervised: Supervised learning is generally the task of learning a function that maps an input to an output based on sample input-output pairs [21]. It uses labeled training data and a set of training examples to infer a function. Supervised learning is a task-driven approach, used when certain objectives are to be achieved from a given set of inputs [4]. The two most popular supervised tasks are "regression," which fits the data, and "classification," which divides the data. Supervised learning can be used, for example, to predict the class label or sentiment of a text segment, such as a tweet or a product review.
 Unsupervised: Unsupervised learning is a data-driven method that analyzes unlabeled datasets without the need for human intervention [21]. It is frequently used for exploratory purposes, grouping results, generative feature extraction, and identifying significant patterns and structures. Clustering, density estimation, feature learning, dimensionality reduction, association rule discovery, and anomaly detection are among the most popular unsupervised learning tasks.
 Semi-supervised: Working with both labeled and unlabeled data, semi-supervised learning is characterized as a combination of the supervised and unsupervised approaches described above [21, 4]. It thus lies between learning "without supervision" and learning "with supervision." Semi-supervised learning can be beneficial in real-world scenarios where unlabeled data are abundant and labeled data are scarce [7]. The ultimate objective of a semi-supervised learning model is to produce a better prediction result than could be obtained using the labeled data alone. Semi-supervised learning has applications in machine translation, fraud detection, data labeling, and text categorization, among other areas.
 Reinforcement: Reinforcement learning is an environment-driven kind of machine learning algorithm that allows software agents and machines to automatically assess the optimal behavior in a given context or environment in order to improve their efficiency [22]. The ultimate objective of this kind of learning, which is based on rewards and penalties, is to use the knowledge gained from interacting with the environment to take actions that maximize rewards and minimize risks [7]. Although it is not well suited to solving simple or basic problems, it is a potent tool for training AI models that can help increase automation or optimize the operational efficiency of sophisticated systems like robotics, autonomous driving, manufacturing, and supply chain logistics. A minimal reinforcement learning sketch follows this list.
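
To make the reward-and-penalty idea concrete, below is a minimal Q-learning sketch in Python on a toy one-dimensional corridor. The environment, reward values, and hyperparameters (alpha, gamma, epsilon) are illustrative assumptions, not taken from the text.

    import random

    # Minimal Q-learning sketch on a toy 1-D corridor (illustrative; states,
    # rewards, and hyperparameters are assumptions, not from this survey).
    N_STATES, GOAL = 5, 4          # states 0..4, reward only at state 4
    ACTIONS = [-1, +1]             # move left / move right
    alpha, gamma, epsilon = 0.1, 0.9, 0.2

    Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action index]

    for episode in range(500):
        s = 0
        while s != GOAL:
            # epsilon-greedy action selection: explore vs. exploit
            a = random.randrange(2) if random.random() < epsilon else max((0, 1), key=lambda i: Q[s][i])
            s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            r = 1.0 if s_next == GOAL else 0.0   # reward received only at the goal
            # Q-learning update: move Q toward reward + discounted best future value
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next

    print("Learned greedy policy:", ["left" if Q[s][0] > Q[s][1] else "right" for s in range(N_STATES)])

After enough episodes, the greedy policy points "right" toward the rewarding goal state, illustrating how the agent learns purely from environment feedback.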

Different Machine Learning Algorithms: Classification analysis, regression analysis, data clustering, association rule learning, feature engineering for dimensionality reduction, and deep learning techniques are some of the machine learning algorithms covered in this section. Fig. 2 depicts the general structure of a machine learning-based predictive model. In phase 1, the model is trained using historical data, and in phase 2, the output is produced for new test data.

Fig. 2 A general structure of a machine learning based predictive model considering both the training and testing
phase

Classification Analysis: In machine learning, classification is considered a supervised learning technique. It refers to a predictive modeling problem in which a class label is predicted for a given example [21]. Mathematically, it maps a function (f) from input variables (X) to output variables (Y), where the output is a target, label, or category. Both structured and unstructured data can be used to predict the class of given data points. For instance, spam detection by email service providers, which labels messages as "spam" or "not spam," is a classification problem. The typical classification problems are listed below.

 Binary classification: Classification assignments with two class labels, such as "true and false"
or "yes and no," are referred to as binary classification [21]. In these kinds of binary
classification problems, the normal condition may belong to one class and the pathological
state to another. For example, in a task involving a medical test, "cancer not detected" could be
the normal condition, whereas "cancer detected" could be the abnormal state. Comparably, in
the email service provider example above, "spam" and "not spam" are regarded as binary
classifications.
 Multiclass classification: This term has historically been used to describe classification
problems with more than two class labels [21]. Unlike binary classification tasks, multiclass
classification does not follow the concept of normal and abnormal outcomes. Rather, examples
are classed as belonging to one of a range of specified classes. Classifying different types of
network attacks, for instance, can be a multiclass classification task using the NSL-KDD [12]
dataset. This dataset has four class labels for the attack categories, which include DoS (Denial
of Service Attack), U2R (User to Root Attack), R2L (Root to Local Attack), and Probing
Attack.

 Multi-label classification: Multi-label classification is an important consideration in machine learning when an example is associated with multiple classes or labels. It is thus a generalization of multiclass classification in which each example may simultaneously belong to more than one class at each hierarchical level, and the classes involved in the problem are hierarchically constructed (e.g., multi-level text classification). For example, Google News can be viewed by category, such as "technology," "city name," or "latest news." In contrast to classic classification tasks where class labels are mutually exclusive, multi-label classification uses machine learning techniques that permit predicting several mutually non-exclusive classes or labels [23]. A short sketch contrasting the three problem types follows this list.
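
As a minimal illustration of the three problem types (a sketch, not from the paper; scikit-learn, the iris dataset, and the one-vs-rest strategy are our assumed tooling):

    from sklearn.datasets import load_iris, make_multilabel_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier

    # Binary classification: keep only two iris species (classes 0 and 1)
    X, y = load_iris(return_X_y=True)
    binary = LogisticRegression(max_iter=1000).fit(X[y < 2], y[y < 2])

    # Multiclass classification: all three species; one mutually exclusive label per example
    multiclass = LogisticRegression(max_iter=1000).fit(X, y)

    # Multi-label classification: each example may carry several non-exclusive labels,
    # handled here by fitting one binary classifier per label (one-vs-rest)
    Xm, Ym = make_multilabel_classification(n_samples=200, n_classes=4, random_state=0)
    multilabel = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(Xm, Ym)

    print(multiclass.predict(X[:3]))    # one class label per row, e.g. [0 0 0]
    print(multilabel.predict(Xm[:2]))   # binary indicator rows, one column per label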

Many classification methods have been proposed in the machine learning and data science literature [21, 8]. Below, we summarize the most widely used techniques in a range of application domains.
 Naïve Bayes (NB): The naïve Bayes algorithm is based on Bayes' theorem and relies on the assumption of independence between every pair of attributes. It performs effectively and may be applied to a variety of real-world scenarios, including spam filtering, document or text categorization, and both binary and multiclass classification. The NB classifier can be used to efficiently classify noisy examples in the data and to build a reliable prediction model [25]. Its main advantage is that, compared to more sophisticated methods, it requires less training data and estimates the necessary parameters more quickly [23]. However, its performance may suffer because of its strict feature independence assumptions. NB classifiers are commonly available in Gaussian, Multinomial, Complement, Bernoulli, and Categorical forms [23].
 Linear Discriminant Analysis (LDA): Applying Bayes' rule and fitting class conditional
densities to data yields a linear decision boundary classifier known as linear discriminant
analysis (LDA) [23]. The technique is also known as a generalization of Fisher's linear discriminant; it projects a given dataset into a lower-dimensional space, i.e., performs a dimensionality reduction that lowers computational cost or reduces the complexity of the resulting model. Assuming that every class shares the same covariance matrix, the basic LDA
model typically fits each class with a Gaussian density [23]. Regression analysis and ANOVA
(analysis of variance) are closely linked to LDA since both aim to express a single dependent
variable as a linear mixture of additional features or measures.
 Logistic regression (LR): Logistic Regression (LR) is another popular statistical model with a
probabilistic foundation that is utilized in machine learning to address classification problems
[24]. Typically, logistic regression estimates class probabilities using the logistic function, also known as the sigmoid function, mathematically defined in Eq. 1. It is most effective when the dataset is linearly separable, but it tends to overfit high-dimensional datasets. In such situations, over-fitting can be prevented by using L1 and L2 regularization [23]. A key flaw in logistic regression is the assumption
of linearity between the independent and dependent variables. Although it can be applied to
regression problems as well as classification problems, classification problems are its more
frequent use.

g(z) = 1 / (1 + exp(−z))    (1)
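
As an illustrative sketch (our assumptions: scikit-learn and a synthetic dataset), the sigmoid of Eq. 1 and the L1/L2 regularization mentioned above can be demonstrated as follows:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    def sigmoid(z):
        """Logistic function of Eq. 1: g(z) = 1 / (1 + exp(-z))."""
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(0.0))   # 0.5 -> the decision boundary of logistic regression

    # L2 (default) and L1 regularization to curb over-fitting on high-dimensional data
    X, y = make_classification(n_samples=200, n_features=50, random_state=0)
    l2_model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
    l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)
    print((l1_model.coef_ != 0).sum(), "non-zero weights after L1")  # sparser than L2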

 K-nearest neighbors (KNN): K-Nearest Neighbors (KNN) [26] is an "instance-based learning" or non-generalizing algorithm, commonly referred to as a "lazy learning" algorithm. Rather than building a general internal model, it stores all instances of the training data in n-dimensional space. KNN classifies new data points according to similarity measures, such as the Euclidean distance function [23]. The classification of each point is determined by a simple majority vote of its k nearest neighbors. KNN is fairly resilient to noisy training data, although accuracy depends on the quality of the data. The most significant challenge with KNN is choosing the optimal number of neighbors to consider. KNN can be used for both classification and regression.

 Support vector machine (SVM): The support vector machine (SVM) is another popular machine learning technique that may be applied to classification, regression, and other tasks [27]. A support vector machine constructs a hyper-plane or a set of hyper-planes in a high- or infinite-dimensional space. Intuitively, a good separation is achieved by the hyper-plane that has the largest distance from the nearest training data points of any class, since in general the larger the margin, the lower the classifier's generalization error. SVM works well in high-dimensional spaces, and its behavior varies depending on the kernel, a family of mathematical functions. Common kernel functions used in SVM classifiers include linear, polynomial, radial basis function (RBF), and sigmoid kernels [23]. Nevertheless, SVM does not perform as well when the dataset contains additional noise, such as overlapping target classes.

 Decision tree (DT): The decision tree (DT) is a well-known non-parametric supervised learning technique [28]. DT learning techniques are used for both classification and regression tasks [23]. ID3 [29], C4.5 [28], and CART [30] are well-recognized DT algorithms. Furthermore, the recently proposed BehavDT [31] and IntrudTree [32] by Sarker et al. work well in the pertinent application fields of user behavior analytics and cybersecurity, respectively. As illustrated in Fig. 3, DT classifies instances by sorting them down the tree from the root to some leaf node. An instance is classified by starting at the root node of the tree, testing the attribute specified by that node, and moving down the branch corresponding to the attribute's value. The most widely used splitting criteria are "gini" for the Gini impurity and "entropy" for the information gain, expressed mathematically in Eqs. 2 and 3 [23].

Entropy: H(X) = − ∑_{i=1}^{n} p(x_i) log2 p(x_i)    (2)

Gini(E) = 1 − ∑_{i=1}^{c} p_i^2    (3)

Fig. 3 An example of a decision tree structure
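
As a small illustrative sketch (scikit-learn and the iris data are our assumptions, not the paper's), the impurity measures of Eqs. 2 and 3 and the corresponding splitting criteria can be exercised as follows:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    def entropy(p):
        """Eq. 2: H = -sum(p_i * log2(p_i)) over class probabilities p."""
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    def gini(p):
        """Eq. 3: Gini = 1 - sum(p_i^2)."""
        return 1.0 - (p ** 2).sum()

    p = np.array([0.5, 0.5])         # a maximally impure binary split
    print(entropy(p), gini(p))       # 1.0, 0.5

    X, y = load_iris(return_X_y=True)
    # criterion="gini" or "entropy" selects the splitting measure described above
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0).fit(X, y)
    print(tree.predict(X[:3]))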

 Random forest (RF): The random forest classifier [35] is a well-known ensemble classification method in machine learning and data science with many applications. This technique employs "parallel ensembling," fitting multiple decision tree classifiers in parallel on different sub-samples of the dataset, as illustrated in Fig. 4. The final outcome is determined by majority voting or averaging. As a result, it reduces the problem of over-fitting and improves prediction accuracy and control [23]. RF learning models that incorporate many decision trees therefore tend to be more accurate than models based on a single decision tree [10]. The method combines random feature selection [33] with bootstrap aggregation (bagging) [34] to build a series of decision trees with controlled variation, and it is flexible enough to be applied to both classification and regression problems.
Fig. 4 An example of a random forest structure considering multiple decision trees
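
The accuracy gain from bagging many trees can be checked with a short sketch (illustrative; scikit-learn, the breast cancer dataset, and 5-fold cross-validation are our assumed setup):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    tree = DecisionTreeClassifier(random_state=0)
    # 100 trees, each fit on a bootstrap sample with random feature subsets (bagging)
    forest = RandomForestClassifier(n_estimators=100, random_state=0)

    print("single tree  :", cross_val_score(tree, X, y, cv=5).mean())
    print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
    # The averaged ensemble typically scores higher than the single tree [10]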

Regression Analysis: Regression analysis comprises a number of machine learning techniques for predicting a continuous (y) outcome variable based on the values of one or more (x) predictor variables [41]. The primary difference between classification and regression is that regression forecasts a continuous quantity, while classification predicts discrete class labels. Figure 5 illustrates the difference between classification and regression models. There are frequently some similarities between the two categories of machine learning algorithms. These days, regression models are employed in many domains, such as time series estimation, trend analysis, cost estimation, financial forecasting or prediction, and many more.

 Simple and multiple linear regression: Simple and multiple linear regression is a well-known regression approach and one of the most widely used ML modeling strategies. In this technique the dependent variable is continuous, the independent variable(s) can be discrete or continuous, and the regression line is linear. Linear regression establishes a relationship between the dependent variable (Y) and one or more independent variables (X) using the best-fit straight line, also referred to as the regression line [41]. It is defined by the following equations:

y = a + bx + e    (4)
y = a + b1x1 + b2x2 + ⋯ + bnxn + e    (5)

where ‘a’ is the intercept, ‘b’ is the slope of the line, and ‘e’ is the error term. Given the predictor variable(s), this equation can be used to predict the value of the target variable. While simple linear regression includes only one independent variable, as defined in Eq. 4, multiple linear regression is an extension that allows two or more predictor variables to model a response variable, y, as a linear function [21], as specified in Eq. 5.

Fig. 5 Classification vs. regression. In classification the dotted line represents a linear boundary that separates the two classes; in regression, the dotted line models the linear relationship between the two variables
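
A brief sketch of Eqs. 4 and 5 in practice (illustrative; the synthetic data, true coefficients, and scikit-learn usage are our assumptions):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)

    # Simple linear regression (Eq. 4): one predictor x
    x = rng.uniform(0, 10, size=(100, 1))
    y = 3.0 + 2.0 * x[:, 0] + rng.normal(0, 1, 100)    # a = 3, b = 2, e ~ N(0, 1)
    simple = LinearRegression().fit(x, y)
    print(simple.intercept_, simple.coef_)              # close to 3 and [2]

    # Multiple linear regression (Eq. 5): several predictors x1..xn
    X = rng.uniform(0, 10, size=(100, 3))
    y2 = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 1, 100)
    multiple = LinearRegression().fit(X, y2)
    print(multiple.intercept_, multiple.coef_)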

 Polynomial regression: Polynomial regression is a type of regression analysis in which the relationship between the dependent variable (y) and the independent variable (x) is modeled as an nth-degree polynomial in x, rather than as a linear relationship [23]. The polynomial regression equation is derived from the linear regression equation (polynomial regression of degree 1) and is defined as follows:

y = b0 + b1x + b2x^2 + b3x^3 + ⋯ + bnx^n + e    (6)

Here, y is the predicted/target output, b0, b1, ..., bn are the regression coefficients, and x is the independent/input variable. In simple words, if the data are not distributed linearly but instead follow an nth-degree polynomial, then polynomial regression is used to obtain the desired output.
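
A minimal sketch of Eq. 6 in practice (assumed setup: scikit-learn's PolynomialFeatures expanding x into its powers, then an ordinary linear fit; the cubic data are synthetic):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 100).reshape(-1, 1)
    y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 0] ** 3 + rng.normal(0, 1, 100)  # cubic data

    # Expand x into [x, x^2, x^3] (Eq. 6) and fit an ordinary linear model on top
    model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
    model.fit(x, y)
    print(model.predict([[2.0]]))   # prediction from the fitted cubic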

Cluster Analysis: Unsupervised machine learning techniques such as cluster analysis, often called
clustering, can be used to find and group related data points in big datasets without regard to the final
result. It accomplishes this by organizing a set of items into clusters, which are groupings of related
objects that are, in some way, more similar to one another than objects in other groups [21]. It is
frequently used as a data analysis approach to find intriguing patterns or trends in data, such as
customer groupings based on behavior. Clustering has a wide range of applications, including user
modeling, health analytics, e-commerce, mobile data processing, cybersecurity, and behavioral
analytics. The following provides a quick overview and summary of several kinds of clustering methods.
 Partitioning methods: This clustering methodology divides the data into several groups or clusters according to the characteristics and similarities in the data. Data scientists or analysts usually determine the number of clusters to generate, either statically or dynamically depending on the type of target application. The most popular partitioning-based clustering algorithms include K-means [36], K-medoids [38], CLARA [37], etc.

 Density-based methods: These methods distinguish different groups or clusters using the idea that a cluster in the data space is a contiguous region of high point density, isolated from other such clusters by contiguous regions of low point density. Points that do not belong to any cluster are treated as noise. DBSCAN [39] and OPTICS are common density-based clustering techniques. Density-based approaches generally falter when dealing with clusters of similar density and with high-dimensional data.
 Hierarchical-based methods: Typically, the goal of hierarchical clustering is to create a tree-like structure, or hierarchy, among the clusters. As illustrated in Fig. 7, there are two main categories of hierarchical clustering strategies: (i) Agglomerative, a "bottom-up" approach in which each observation starts in its own cluster and pairs of clusters are merged as one moves up the hierarchy, and (ii) Divisive, a "top-down" approach in which all observations start in one cluster and splits are carried out recursively, moving down the hierarchy. In particular, the bottom-up clustering algorithm from Sarker et al. [102], which we had previously proposed, is an example of a hierarchical technique.
 Grid-based methods: Grid-based clustering is very useful for handling large datasets. The idea
is to use a grid representation to summarize the dataset and then combine grid cells to create
clusters. The common grid-based clustering methods are STING [42], CLIQUE [43], etc.
 Model-based methods: Model-based clustering algorithms primarily fall into two categories:
those that rely on neural network learning and those that employ statistical learning [47]. As an
illustration, GMM [46] is a statistical learning approach, while SOM [45] [9] represents a
neural network learning method.
 Constraint-based methods: Constraint-based clustering is a semi-supervised method of data clustering that uses constraints to incorporate domain knowledge. The clustering is formed by incorporating application- or user-oriented requirements. Typical algorithms of this type include CMWK-Means [49], COP K-means [48], etc.

Numerous clustering algorithms capable of grouping data have been proposed in the machine learning and data science literature [21, 8]. Below, we summarize the most widely used techniques in a number of application areas; a combined code sketch follows the list.
 K-means clustering: Fast, dependable, and easy to use, K-means clustering [36] yields accurate results when datasets are well separated from one another. In this approach, data points are assigned to clusters so that the squared distance between each data point and its cluster centroid is as small as possible. Stated differently, the K-means method finds k centroids and assigns each data point to the nearest one, aiming to minimize the within-cluster sum of squared distances. The K-means clustering process is susceptible to outliers because extreme values can quickly shift a mean. K-medoids clustering [50] is a K-means variant that is more resilient to noise and outliers.
 Mean-shift clustering: A nonparametric clustering method that does not require previous
knowledge of the number of clusters or constraints on cluster shape is mean-shift clustering
[51]. Finding "blobs" in a smooth distribution or sample density is the goal of mean-shift
clustering [23]. This algorithm, which is based on centroid selection, updates centroid
candidates to represent the average of the points inside a specified area. These candidates are
filtered in a post-processing step to eliminate near-duplicates, forming the final collection of
centroids. Application domains include computer vision and image processing, where cluster
analysis is used. One drawback of mean shift is its high computational cost. Moreover, the mean-shift approach performs poorly in high-dimensional settings, where the number of clusters can shift abruptly.
 DBSCAN: A foundational approach for density-based clustering that is extensively employed
in data mining and machine learning is called Density-based Spatial Clustering of Applications
with Noise (DBSCAN) [39]. This method of separating high-density clusters from low-density
clusters for model construction is referred to as a non-parametric density-based clustering
strategy. The fundamental tenet of DBSCAN is that a point is associated with a cluster if it is
near several other points in that cluster. In a large amount of noisy, outlier-filled data, it can
identify clusters of different sizes and shapes. Unlike k-means, DBSCAN may identify clusters
of any shape and does not require an a priori determination of the number of clusters in the
data.
 GMM clustering: When using a distribution-based clustering algorithm for data clustering,
Gaussian mixture models (GMMs) are frequently employed. A Gaussian mixture model is a probabilistic model in which all of the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters [23]. Expectation-maximization (EM)
[23] is an optimization approach that can be used to determine the Gaussian parameters for
each cluster. EM is an iterative technique that estimates the parameters using a statistical
model. Gaussian mixture models, as opposed to k-means, take uncertainty into account and
give the probability that a given data point falls into one of the k clusters. Compared to k-
means, GMM clustering is more reliable and effective with non-linear data distributions.
 Agglomerative hierarchical clustering: Agglomerative clustering is the most widely used hierarchical clustering technique; it groups objects into clusters according to their similarity. Using a bottom-up methodology, this strategy first treats every object as a singleton cluster. Cluster pairs are then merged one at a time until all clusters have been merged into a single, large cluster that contains every object. The finished product is a dendrogram, a tree-based representation of the elements. Examples of these techniques are single linkage [52], complete linkage, BOTS [41], and so on. The primary benefit of agglomerative hierarchical clustering over k-means is that the tree-structured hierarchy it produces is more informative than the unstructured collection of flat clusters returned by k-means, which can aid in better decision-making in the applicable application areas.
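
As the combined sketch promised above (illustrative; scikit-learn and synthetic, well-separated blobs are our assumptions, and the eps/min_samples values are arbitrary):

    from sklearn.cluster import KMeans, MeanShift, DBSCAN, AgglomerativeClustering
    from sklearn.datasets import make_blobs
    from sklearn.mixture import GaussianMixture

    # Synthetic, well-separated blobs (assumed data; real datasets vary)
    X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    meanshift = MeanShift().fit_predict(X)                    # infers the cluster count
    dbscan = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)    # label -1 marks noise points
    gmm = GaussianMixture(n_components=3, random_state=0).fit(X).predict(X)
    agglo = AgglomerativeClustering(n_clusters=3).fit_predict(X)

    for name, labels in [("k-means", kmeans), ("mean-shift", meanshift),
                         ("DBSCAN", dbscan), ("GMM", gmm), ("agglomerative", agglo)]:
        print(name, "->", len(set(labels) - {-1}), "clusters")

On well-separated blobs all five methods recover the same three groups; their behavior diverges on noisy, uneven-density, or high-dimensional data, as discussed above.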

Dimensionality Reduction and Feature Learning: High-dimensional data processing is a difficult task for researchers and application developers in the fields of machine learning and data science. Dimensionality reduction, an unsupervised learning technique, is therefore important because it reduces computational costs, improves human interpretability, and, by making models simpler, prevents overfitting and redundancy. Dimensionality reduction can be achieved through both feature selection and feature extraction. The main distinction between the two is that feature extraction creates entirely new features [53], while feature selection retains a subset of the original features [32, 53]. These methods are briefly discussed in the following.
 Feature selection: The process of selecting a subset of distinct features (variables, predictors)
to utilize in the construction of machine learning and data science models is known as feature
selection, sometimes known as variable selection or attribute selection in the data. By
removing elements that are unnecessary or unimportant, it reduces the complexity of a model
and makes machine learning algorithms train more quickly. By making the model simpler,
more universal, and more accurate, an appropriate and optimal subset of the characteristics
chosen in a problem domain can reduce the issue of overfitting [32]. Consequently, "feature
selection" [54] is regarded as one of the fundamental ideas in machine learning that has a
significant impact on the efficacy and efficiency of the intended machine learning model. Popular methods for feature selection include the chi-squared test, the analysis of variance (ANOVA) test, Pearson's correlation coefficient, and recursive feature elimination.
 Feature extraction: Feature extraction approaches typically offer a better comprehension of the
data, a way to increase prediction accuracy, and a way to reduce computing cost or training
time in a machine learning-based model or system. "Feature extraction" [54] aims to generate
new features from the current ones in a dataset and then remove the original features in order
to reduce the number of features in the dataset. This new, reduced collection of features can
then be used to summarize most of the information contained in the original set of features. For example, principal component analysis (PCA) is frequently employed as a dimensionality-reduction technique that creates brand-new components from the existing features of a dataset to obtain a lower-dimensional space [53].

Numerous strategies for reducing data dimensionality have been put forth in the machine learning and data science literature [21, 8]. Below, we summarize the most widely used techniques in a number of application areas; a short code sketch follows the list.
 Variance threshold: The variance threshold is a simple, fundamental method for feature selection [23]. It eliminates all features whose variance falls below a given threshold. By default, it removes all zero-variance features, i.e., features that have the same value in every sample. This feature selection approach can be used for unsupervised learning because it considers only the (X) features and not the desired (y) outputs.

 ANOVA: Analysis of variance (ANOVA) is a statistical technique used to test whether the mean values of two or more groups differ significantly from one another. ANOVA assumes that the variables are normally distributed and have a linear relationship with the target. The ANOVA method uses F-tests to statistically test the equality of means. The test's "ANOVA F-value" [23] can be used to select features, leaving out features that are not dependent on the target variable.
 Recursive feature elimination (RFE): Recursive Feature Elimination (RFE) is a brute-force method for feature selection. RFE [23] repeatedly fits the model and eliminates the weakest feature until the required number of features is reached. Features are ranked by the model's coefficients or feature importances. By eliminating a small number of features per iteration, RFE attempts to remove dependencies and collinearity from the model.
 Model-based selection: Linear models penalized with L1 regularization can be used to reduce the dimensionality of the data. Least absolute shrinkage and selection operator (Lasso) regression is a form of linear regression that can shrink some of the coefficients exactly to zero [23], so the corresponding features can be eliminated from the model. Penalized lasso regression is therefore used to choose the subset of variables. The Extra Trees Classifier [23] is an example of a tree-based estimator that can be used to compute impurity-based feature importances and then discard irrelevant features.
 Principal component analysis (PCA): One well-known unsupervised learning strategy in the
fields of data science and machine learning is principal component analysis (PCA). Principal
components analysis (PCA) is a mathematical method that converts a set of correlated
variables into a set of uncorrelated variables [56,55]. PCA can be used to create an efficient
machine learning model and as a feature extraction method to lower the dimensionality of the
datasets [53]. In technical terms, PCA projects the data into a new subspace of equal or fewer dimensions by identifying the eigenvectors of the data's covariance matrix that have the largest eigenvalues [23].
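
The sketch promised above contrasts feature selection and feature extraction (illustrative; scikit-learn, the breast cancer dataset, and the specific threshold/k values are our assumptions):

    from sklearn.datasets import load_breast_cancer
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import RFE, SelectKBest, VarianceThreshold, f_classif
    from sklearn.linear_model import LogisticRegression

    X, y = load_breast_cancer(return_X_y=True)   # 30 original features

    # Feature selection: keep a subset of the original columns
    X_vt = VarianceThreshold(threshold=0.01).fit_transform(X)    # drop near-constant features
    X_anova = SelectKBest(f_classif, k=10).fit_transform(X, y)   # ANOVA F-value ranking
    rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X, y)

    # Feature extraction: build new features (principal components), dropping the originals
    X_pca = PCA(n_components=5).fit_transform(X)

    print(X_vt.shape, X_anova.shape, rfe.support_.sum(), X_pca.shape)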
Association Rule Learning: A rule-based machine learning technique called association rule learning
is used to find intriguing connections between variables in big datasets by creating "IF-THEN"
statements [57]. For instance, "a customer is likely to buy anti-virus software (another item) at the
same time s/he buys a computer or laptop (an item)." Today, association rules are used in a wide range
of application fields, such as bioinformatics, smartphone applications, cybersecurity, web usage mining, medical diagnosis, IoT services, and usage behavior analytics. Association rule learning typically ignores the order of items within or between transactions, in contrast to sequence mining. The usefulness of association rules is typically gauged by their "support" and "confidence" parameters, described in [57]; a small worked example follows.
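
A hand computation of these two measures on a toy basket list (the transactions below are invented for illustration):

    # "Support" and "confidence" [57] for one candidate rule: laptop -> anti-virus
    transactions = [
        {"laptop", "anti-virus"},
        {"laptop", "anti-virus", "mouse"},
        {"laptop", "mouse"},
        {"mouse"},
    ]

    antecedent, consequent = {"laptop"}, {"anti-virus"}
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent | consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)

    support = both / n        # fraction of all transactions containing the whole rule
    confidence = both / ante  # of transactions with the antecedent, how many also have the consequent
    print(f"support={support:.2f}, confidence={confidence:.2f}")  # 0.50, 0.67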
Numerous approaches for learning association rules have been presented in the data mining literature. These include logic-based [58], frequent-pattern-based [60], and tree-based [59] approaches. Below is a summary of the most widely used association rule learning algorithms.
 AIS and SETM: AIS, proposed by Agrawal et al. [57], was the first association rule mining algorithm. The primary drawback of the AIS algorithm is the excessive number of candidate itemsets it generates, which wastes time and space, and it requires too many passes over the complete dataset to generate the rules. SETM [60], a different strategy, shows good performance and stable behavior with respect to execution time, but it suffers from the same problem as the AIS algorithm.
 Apriori: Agrawal et al. [61] presented the Apriori, Apriori-TID, and Apriori-Hybrid algorithms for producing association rules from a given dataset. These later algorithms perform better than AIS and SETM, discussed above, because of the Apriori property of frequent itemsets [61]. The word "Apriori" typically denotes prior knowledge of frequent itemset properties. Apriori creates the candidate itemsets using a "bottom-up" methodology and prunes the search space using the property that "all subsets of a frequent itemset must be frequent; and if an itemset is infrequent, then all its supersets must also be infrequent." Predictive Apriori [62] is another method that can also produce rules, but it yields unexpected outcomes because it blends confidence and support. Apriori [61] is the most widely used method for mining association rules.
 FP-Growth: Frequent Pattern Growth, or FP-Growth, is another popular association rule
learning method that is based on the frequent-pattern tree (FP-tree) that was presented by Han
et al. [59]. The main distinction between Apriori and the FP-growth algorithm [59] is that the
former produces frequent candidate itemsets throughout the rule-generating process, while the
latter employs the effective "divide and conquer" tactic to prevent candidate generation and
instead creates trees. However, the FP-tree is difficult to apply in an interactive mining environment because of its complexity [63]. Processing large amounts of data is also difficult because the FP-tree may not fit into memory. RARM (Rapid Association Rule Mining) by Das et al. [63] is an additional approach, yet it suffers from the same FP-tree-related problem.
 ABC-RuleMiner: A rule-based machine learning technique that was recently presented by
Sarker et al. [64] in our previous study is designed to discover interesting non-redundant rules that can be used to deliver intelligent services in the real world. By considering the influence or
precedence of the associated contextual features, this algorithm efficiently detects redundancy
in connections and finds a set of non-redundant association rules. Using a top-down method,
this algorithm first creates an association generation tree (AGT), from which it then extracts
the association rules by going through the tree. Therefore, ABC-RuleMiner is more effective
than conventional rule-based techniques for both intelligent decision-making and non-
redundant rule development, especially in context-aware smart computing environments where
user or human preferences are involved.
Among the association rule learning strategies covered above, Apriori [61] is the most popular method for identifying association rules from a given dataset [63]. The primary advantage of association rule learning is its completeness: it produces all rules that satisfy the user-specified constraints, such as minimum support and confidence values. The previously mentioned ABC-RuleMiner technique [64] has the potential to produce noteworthy outcomes in terms of intelligent decision-making and non-redundant rule generation for the pertinent real-world application areas. A short end-to-end sketch of Apriori-style rule mining follows.
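
To close, a compact end-to-end sketch of Apriori-style mining (illustrative; it assumes the third-party mlxtend package, installable with "pip install mlxtend", and reuses the toy baskets from the worked example above, which are not from the paper):

    import pandas as pd
    from mlxtend.frequent_patterns import apriori, association_rules
    from mlxtend.preprocessing import TransactionEncoder

    baskets = [["laptop", "anti-virus"], ["laptop", "anti-virus", "mouse"],
               ["laptop", "mouse"], ["mouse"]]

    # One-hot encode the transactions into a boolean item matrix
    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit(baskets).transform(baskets), columns=te.columns_)

    # Frequent itemsets above a minimum support, then rules above a minimum confidence
    itemsets = apriori(onehot, min_support=0.5, use_colnames=True)
    rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
    print(rules[["antecedents", "consequents", "support", "confidence"]])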


References:
