
International Journal of Advanced Engineering, Management and Science (IJAEMS)
Peer-Reviewed Journal
ISSN: 2454-1311 | Vol-10, Issue-3; Mar-Apr, 2024
Journal Home Page: https://ijaems.com/
DOI: https://dx.doi.org/10.22161/ijaems.103.4
©2024 The Author(s). Published by Infogain Publication. This work is licensed under a Creative Commons Attribution 4.0 License: http://creativecommons.org/licenses/by/4.0/

An Overview of Supervised Machine Learning Paradigms and their Classifiers

Mbeledogu Njideka Nkemdilim, Paul Roseline Uzoamaka, Ugoh Daniel and Mbeledogu Kaodilichukwu Chidi

Received: 30 Jan 2024; Received in revised form: 14 Feb 2024; Accepted: 22 Mar 2024; Available online: 1 Apr 2024

Abstract— Artificial Intelligence (AI) is the theory and development of computer systems capable of performing complex tasks that historically required human intelligence, such as recognizing speech, making decisions and identifying patterns. These tasks cannot be accomplished without the ability of the systems to learn. Machine learning is the ability of machines to learn from their past experiences. Just as with humans, when machines learn under supervision, it is termed supervised learning. In this work, an in-depth treatment of machine learning is presented. Relevant literature was reviewed with the aim of presenting the different types of supervised machine learning paradigms, their categories and their classifiers.
Keywords— Artificial intelligence, Machine learning, Supervised learning paradigms

I. INTRODUCTION

For an intelligent system to perform complex tasks that historically required human intelligence, such as recognizing speech, making decisions and identifying patterns (Staff, 2023), it requires the ability to learn from past experiences. Learning is a process that leads to change, and it is an attribute possessed by humans. It occurs as a result of experience and increases the potential for improved performance and future learning (Ambrose et al., 2010). As the intelligence demonstrated by machines is said to be artificial, their learning ability is referred to as machine learning (ML). ML is a type of Artificial Intelligence (AI) focused on building computer systems that learn from data. It has applications in all types of sectors, including manufacturing, retail, cyber-security, real-time chatbot agents, the humanities, agriculture, social media, healthcare and life sciences, email, image processing, travel and hospitality, financial services, and energy, feedstock and utilities (Bansal et al., 2019). In the light of its applications, it is undoubtedly more valuable than other branches of AI, because for a system to be intelligent it must possess the ability to learn, in order to improve the performance of its AI software applications over time as well as to adapt to changes. This in turn fuels the advancements in AI and progressively blurs the boundaries between machine intelligence and human intellect (Tucci, 2023).

II. MACHINE LEARNING

ML comprises computational techniques (scientific algorithms and statistical models) that enable computers to learn from data without being explicitly programmed. If programming is automation, then ML is automating the process of automation. It provides machines with the ability to learn independently (Ghahremani-Nahr et al., 2021) and makes programming scalable.

According to NetApp (2023), ML is made up of three parts:
a) Computational algorithm: a formal procedure describing an ordered sequence of operations to be performed a finite number of times (Falade, 2021). This is at the core of making determinations.
b) Variables and features that make up the decisions.
c) Knowledge base: the known facts from which the system trains to learn.

In a typical simple model of machine learning (Fig. 1), the environment supplies information to the learning element, which uses the information to make improvements in the knowledge base so that the performance element can perform its task accurately. The kind of information


supplied to the machine by the environment is usually imperfect, with the result that the learning element does not know in advance how to fill in missing details or how to ignore details that are unimportant. The machine therefore operates by guessing, and then receives feedback from the performance element. The feedback mechanism enables the machine to evaluate its hypotheses and revise them if necessary.

Two different kinds of information processing are involved in machine learning: inductive and deductive. In inductive information processing, general patterns and rules are determined from raw data and experience; it is used in similarity-based learning. In deductive information processing, general rules are used to determine specific facts; it is used in the proof of a theorem, where deductions are made from known axioms to other existing axioms (Haykin, 1994).

In comparison with traditional programming, ML uses data and output, run on the computer, to generate a program, which can then be used as in traditional programming, whereas traditional programming uses data and a program, run on the computer, to produce output (Brownlee, 2020).

[Figure: (a) Traditional Programming: data and program go into the computer, output comes out. (b) Machine Learning: data and output go into the computer, a program comes out.]
Fig. 1: Typical simple model of machine learning
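To make Brownlee's contrast concrete, the following illustrative Python sketch (ours, not from the paper) treats temperature conversion as a toy task: the traditional program encodes the rule by hand, while the machine-learning version recovers the rule from example data.

    # Illustrative sketch (not from the paper): traditional programming
    # versus machine learning, using Celsius-to-Fahrenheit as a toy task.
    import numpy as np

    # Traditional programming: data + program -> output.
    def fahrenheit(celsius):
        return 9.0 / 5.0 * celsius + 32.0   # the rule is written by hand

    # Machine learning: data + output -> program (a learned linear rule).
    celsius = np.array([-10.0, 0.0, 8.0, 15.0, 22.0, 38.0])
    fahr = np.array([14.0, 32.0, 46.4, 59.0, 71.6, 100.4])

    # Fit slope and intercept by least squares; the "program" is (m, c).
    m, c = np.polyfit(celsius, fahr, deg=1)
    print(m, c)                  # approximately 1.8 and 32.0
    print(m * 100.0 + c)         # the learned rule applied to new input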

Machine Learning Classifiers
The technique for determining which class a dependent variable belongs to, based on one or more independent variables, is termed classification. A machine learning algorithm that assigns a label to a data input is known as a classifier.

Supervised Machine Learning Paradigms and their Classifiers
As the name implies, this is when a machine learns under supervision. It is the learning paradigm for acquiring the input-output relationship information of a system based on a given set of paired input-output training samples. The model is provided with a correct answer (output) for every input pattern (Samarasinghe, 2006), and as such it is referred to as "learning with a teacher" (Jain, 1996); that is, the available data comprises feature vectors together with the target values. The learner (computer program) is provided with two sets of data: a training set and a test set. The training set contains labelled examples (the solution to each problem in the dataset), which the learner can use to identify unlabeled examples in the test set with the highest possible accuracy, as depicted in Fig. 2 (a minimal code sketch of this flow follows the figure). The data is analyzed in order to tune the parameters of the model so that it can predict the target value for a new set of data (the test data).

The major tasks of supervised learning paradigms are:
i. Classification: labeled data and classifiers are used to produce predictions about the classification of data inputs. The function is discrete and of a categorical type.
ii. Regression: the function is continuous. The target variable is numeric.
iii. Forecasting (probability estimation): the function is a probability.

The supervised learning paradigm classifiers are Decision Trees, Naïve Bayes, Regression, Logistic Regression, Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Discriminant Analysis, Ensemble Methods and Neural Networks.

Decision Trees
This is a statistical classifier used for both classification and regression problems. It accommodates nominal and numerical values and is expressed as a recursive partition of the instance space. A decision tree is a graphical representation of a well-defined decision problem (Fig. 3). It consists of nodes, which are concerned with decision making, and arcs, which connect the nodes (decision rules). The decision tree forms a rooted (directed) tree with basically three types of nodes: the root node, the internal nodes and the terminal nodes. The root node originates the tree and is in turn called the parent node. It has no incoming edges and zero or more outgoing edges. Every other node has one incoming edge and is called a child node. A node with outgoing edges is termed an internal node, also referred to as a test node; it represents a feature of the dataset. Each internal node has exactly one incoming edge and two or more outgoing edges, and splits the instance space into two or more sub-spaces based on a discrete function of the input attribute values (the attribute test condition), so as to separate records that have different characteristics. This latter process is called splitting.

Fig. 2: Data Flow Diagram of Supervised Learning Paradigm
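As an illustration of this data flow, the following Python sketch (assuming the scikit-learn library; the dataset and classifier choices are ours) fits a model on a labelled training set and scores its predictions on a held-out test set.

    # Minimal sketch of the supervised learning data flow in Fig. 2,
    # assuming scikit-learn is available (not code from the paper).
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)       # feature vectors, target values

    # Labelled training set for fitting; held-out test set for evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)

    model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    y_pred = model.predict(X_test)          # label the unseen test examples
    print(accuracy_score(y_test, y_pred))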

Splitting is the process of dividing a node into two or more nodes, at which point the decision branches off on the variables. For numeric attributes, a range is taken as the partition criterion, so the decision tree can be geometrically interpreted as a collection of hyperplanes, each orthogonal to one of the axes. For classification problems, entropy, the Gini index and information gain (IG) are the splitting metrics used, while for regression the residual sum of squares is applied. All nodes apart from the root and internal nodes are termed leaf (terminal or decision) nodes. Each leaf has exactly one incoming edge and no outgoing edges, because it represents an outcome. The leaf node is assigned the class label describing the most appropriate target value. Instances are classified by navigating from the root down through the arcs to a leaf (Fig. 3). Pruning in a decision tree classifier is the opposite of splitting: it is the process of going through the tree and reducing it to only the most important nodes or outcomes.

Decision Tree Pseudocode (a code sketch follows the steps):
1. Start the decision tree with a root node, P, that contains the complete dataset.
2. Using an Attribute Selection Measure (ASM), determine the best attribute in the dataset P on which to split it.
3. Divide P into subsets containing the possible values of the best attribute.
4. Generate a tree node that contains the best attribute.
5. Build new decision trees recursively from the subsets of the dataset P created in Step 3. Continue the process until a point is reached where the nodes cannot be classified further.
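As a compact rendering of this pseudocode, the sketch below uses scikit-learn's DecisionTreeClassifier (an assumed tool, not part of the paper) with entropy, and hence information gain, as the attribute selection measure.

    # Sketch of the pseudocode with scikit-learn (assumed library).
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)

    # criterion="entropy" selects information gain as the attribute
    # selection measure; "gini" would use the Gini index instead.
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X, y)

    # The printed rules mirror the recursive splitting of steps 1-5:
    # each internal node tests one attribute, each leaf carries a label.
    print(export_text(tree))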


Fig. 3: Decision tree showing the root, internal and leaf nodes

Naive Bayes
This is a probabilistic classifier and a generative learning algorithm based on Bayes' theorem. It is commonly used for text classification tasks. Given the data and some prior knowledge, the theorem gives the probability of a hypothesis. The classifier assumes that all features in the input data are conditionally independent of each other given the class label (note: this assumption does not hold in all real-world cases), thereby permitting the algorithm to make predictions quickly. The dataset is divided into two parts: the feature matrix and the response vector. The feature matrix contains all the vectors of the dataset, where each vector consists of the values of the dependent features. The response vector contains the value of the class variable (prediction) for each row of the feature matrix.

Assumptions of Naive Bayes
i. Feature independence: the features of the data are conditionally independent of each other, given the class label.
ii. Continuous features are normally distributed: if a feature is continuous, it is assumed to be normally distributed within each class.
iii. Discrete features have multinomial distributions: if a feature is discrete, it is assumed to have a multinomial distribution within each class.
iv. Features are equally important: all features are assumed to contribute equally to the prediction of the class label.
v. No missing data: the data should not contain any missing values.

For the mathematical analysis, from Bayes' theorem, if A and B are events and P(B) ≠ 0, the probability of event A is

P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}    …(1.1)

where event B is the evidence, P(A) is the prior probability of A, P(B) is the marginal probability, P(A|B) is the posterior probability of A given B, and P(B|A) is the likelihood, the probability that the hypothesis will come true given the evidence.

Applying Bayes' theorem,

P(y|X) = \frac{P(X|y)\,P(y)}{P(X)}    …(1.2)

where y is the class variable and X is the dependent feature vector (of size n), with

X = (x_1, x_2, x_3, \ldots, x_n)    …(1.3)

Putting the naive assumption (independence among the features) into Bayes' theorem, we split the evidence into independent parts. If A and B are independent, then

P(A,B) = P(A)\,P(B)    …(1.4)

Hence,

P(y|x_1, x_2, \ldots, x_n) = \frac{P(x_1|y)\,P(x_2|y)\cdots P(x_n|y)\,P(y)}{P(x_1)\,P(x_2)\cdots P(x_n)}    …(1.5)

which can be expressed as

P(y|x_1, x_2, \ldots, x_n) = \frac{P(y)\prod_{i=1}^{n} P(x_i|y)}{P(x_1)\,P(x_2)\cdots P(x_n)}    …(1.6)


As the denominator remains constant for any given input, we can remove it:

P(y|x_1, x_2, \ldots, x_n) \propto P(y)\prod_{i=1}^{n} P(x_i|y)

To create the classifier model, we find the probability of the given set of inputs for all possible values of the class variable y and pick the output with maximum probability:

y = \operatorname{argmax}_y\, P(y)\prod_{i=1}^{n} P(x_i|y)    …(1.7)
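A minimal from-scratch sketch of equation (1.7) for categorical features follows; the toy weather data and function names are ours, for illustration only, and the sketch omits Laplace smoothing for unseen feature values.

    # From-scratch sketch of equation (1.7) for categorical features
    # (illustrative; no smoothing for unseen values).
    from collections import Counter, defaultdict

    def train_nb(X, y):
        """Estimate P(y) and P(x_i|y) from labelled rows."""
        prior = Counter(y)
        likelihood = defaultdict(Counter)   # (class, feature) -> counts
        for row, label in zip(X, y):
            for i, value in enumerate(row):
                likelihood[(label, i)][value] += 1
        return prior, likelihood

    def predict_nb(row, prior, likelihood, n):
        """Return argmax_y P(y) * prod_i P(x_i|y); n = training size."""
        best, best_score = None, -1.0
        for label, count in prior.items():
            score = count / n
            for i, value in enumerate(row):
                score *= likelihood[(label, i)][value] / count
            if score > best_score:
                best, best_score = label, score
        return best

    X = [["sunny", "hot"], ["sunny", "mild"],
         ["rainy", "mild"], ["rainy", "hot"]]
    y = ["no", "no", "yes", "no"]
    prior, likelihood = train_nb(X, y)
    print(predict_nb(["rainy", "mild"], prior, likelihood, len(y)))  # "yes"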
Regression
The goal of this statistical classifier is to fit the best line or curve between the data points (Kurama, 2023). A continuous outcome (y) is predicted based on the value of the predictor variables (x). Linear regression is the most common regression model owing to its simplicity (Fig. 4). It finds the linear relationship between the dependent variable (continuous) and one or more independent variables (continuous or discrete).

Steps in determining the best-fit line (a gradient-descent sketch follows Fig. 4):
1. Consider the linear problem y = mx + c, where y is the dependent data, x is the independent data within the dataset, m is the coefficient (the contribution of the input value in determining the best-fit line) and c is the bias or intercept (the deviation added to the line equation for the predictions made).
2. Initially assign random values to m and c and plot the line.
3. Adjust the line by varying m and c.
4. If the line does not fit best, adjust m and c using the gradient descent algorithm or the least squares method.

y = mx + c    …(1.8)

where y is the dependent variable, plotted along the y-axis; x is the independent variable, plotted along the x-axis; m is the slope of the line; and c is the intercept (the value of y when x = 0). The line of regression is the best-fit line for a model.

Fig. 4: Linear Regression Model showing the Best Fit Line
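The following sketch (ours) implements step 4 with gradient descent on the residual sum of squares; the toy data, learning rate and iteration count are arbitrary illustrative choices.

    # Sketch of step 4: fitting m and c by gradient descent on the
    # residual sum of squares (hyperparameters are illustrative).
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])      # roughly y = 2x

    m, c = 0.0, 0.0                               # step 2: initial values
    lr = 0.01
    for _ in range(5000):
        y_hat = m * x + c                         # current line
        error = y_hat - y
        m -= lr * 2.0 * np.mean(error * x)        # gradient of MSE w.r.t. m
        c -= lr * 2.0 * np.mean(error)            # gradient of MSE w.r.t. c
    print(m, c)                                   # near the least-squares fit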

Logistic Regression
This performs binary classification tasks by predicting the probability of an outcome, event or observation. Based on the independent variables, it predicts the probability of an event occurring by fitting the data to a logistic function (Fig. 5). The coefficients of the independent variables in the logistic function are optimized by maximizing the likelihood function. A decision boundary is determined such that the cost function is minimal, using gradient descent. The model delivers a binary or dichotomous outcome limited to two possible values: yes/no, 0/1, or true/false. Mathematically it is defined as

y = \frac{e^{(b_0 + b_1 X)}}{1 + e^{(b_0 + b_1 X)}}    …(1.9)

where X is the input value, y is the predicted output, b_0 is the bias or intercept term and b_1 is the coefficient for the input X.

Logistic regression is similar to linear regression in that the input values are combined linearly, using weights or coefficient values, to predict an output value, but it differs in the output value modeled: logistic regression returns a binary value (0 or 1) rather than a numeric value as with linear regression. A small numeric sketch of equation (1.9) follows Fig. 5.


Fig. 5: Logistic Regression with predicted y between 0 and 1
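Equation (1.9) can be rendered directly in code. In the sketch below (ours), the coefficients b0 and b1 are made-up stand-ins for values obtained by maximizing the likelihood, and 0.5 is an assumed decision threshold.

    # Equation (1.9) in code, with an assumed decision threshold of 0.5
    # (the coefficients below are illustrative, not fitted).
    import math

    def predict_proba(x, b0, b1):
        """Logistic function: P(y=1|x) = e^(b0+b1*x) / (1 + e^(b0+b1*x))."""
        z = b0 + b1 * x
        return math.exp(z) / (1.0 + math.exp(z))

    def predict_class(x, b0, b1, threshold=0.5):
        """Map the probability to the dichotomous outcome 0 or 1."""
        return 1 if predict_proba(x, b0, b1) >= threshold else 0

    b0, b1 = -4.0, 1.5      # hypothetical likelihood-maximizing values
    for x in (1.0, 3.0, 5.0):
        print(x, round(predict_proba(x, b0, b1), 3), predict_class(x, b0, b1))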

Support Vector Machine (SVM)
This is used for both classification (pattern recognition) and regression (function approximation) problems. It is based on statistical learning theory and transforms the input data into an N-dimensional feature space (where N, the number of features, is high) by means of a kernel function, in order to create a linear model in the feature space. The kernel functions used in SVM include the linear, polynomial, radial basis function and sigmoid functions.

SVM constructs an optimal hyperplane (decision boundary) in a multidimensional space that separates cases of different class labels, using the objects (samples) on the edges of the margin (the support vectors) to separate objects rather than using the differences in class means. It is because of this separation mechanism, in which the hyperplane is supported (defined) by the vectors (data points) nearest the margin, that the algorithm is called the Support Vector Machine.

Sahu and Sharma (2023) noted that SVM uses the hinge loss function to maximize the margin distance between the observations of the classes during training, as in equation (1.10):

l(y) = \max\left(0,\; 1 + \max_{y \neq t}\left(w_y x - w_t x\right)\right)    …(1.10)

where w is the model parameter, x is the input variable and t is the target variable.

SVM can be used efficiently in high-dimensional spaces where the number of dimensions is higher than the number of samples, though this can result in poor outcomes. The fame of SVM rests on two key properties: it finds solutions to classification tasks that generalize, and it solves non-linear problems using the kernel trick, hence it is also referred to as a kernel machine. It assumes a Gaussian distribution, thereby making the induction paradigm for parameter estimation the maximum likelihood method, which then reduces to the minimization of a sum-of-squared-errors cost function.
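A brief sketch using scikit-learn's SVC (an assumed library, not the authors' code) shows the four kernel choices listed above and the support vectors that define the hyperplane.

    # SVM sketch with scikit-learn (assumed library), cycling through
    # the kernel functions named in the text.
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    for kernel in ("linear", "poly", "rbf", "sigmoid"):
        clf = SVC(kernel=kernel).fit(X, y)
        # clf.support_vectors_ holds the points nearest the margin that
        # support (define) the separating hyperplane.
        print(kernel, len(clf.support_vectors_), clf.score(X, y))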
K-Nearest Neighbour (K-NN)
This is a non-parametric, instance-based learning classifier that uses proximity (distance) to make predictions about the grouping of individual data points. Because it is unlikely for an object to match another exactly, the classifier finds a group of k objects in the training set that are closest to the test object by measuring the distance between the data points (the similarity measure), and assigns a label based on the predominance of a particular class in this neighborhood (Steinbach and Tan, 2009). K-NN is a lazy learning technique because it delays generalizing beyond the training data until a query occurs.

K-NN Pseudocode (a Python rendering follows the steps):
1. Determine the parameter k, the number of nearest neighbors.
2. Calculate the distance between the query instance and all the training examples.
3. Sort the distances and determine the nearest neighbors based on the k minimum distances.
4. Gather the categories Y of the nearest neighbors.
5. Use the simple majority of the categories of the nearest neighbors as the prediction value for the query instance.
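The pseudocode translates almost line-for-line into Python. In this sketch (ours), Euclidean distance is the assumed similarity measure.

    # The K-NN pseudocode rendered directly in Python (illustrative;
    # Euclidean distance is our choice of similarity measure).
    import math
    from collections import Counter

    def knn_predict(query, train_X, train_y, k=3):
        # Step 2: distance from the query to every training example.
        dists = [(math.dist(query, x), label)
                 for x, label in zip(train_X, train_y)]
        # Step 3: sort and keep the k minimum distances.
        neighbors = sorted(dists)[:k]
        # Steps 4-5: majority category among the nearest neighbors.
        votes = Counter(label for _, label in neighbors)
        return votes.most_common(1)[0][0]

    train_X = [(1.0, 1.0), (1.2, 0.8), (6.0, 6.0), (6.5, 5.5)]
    train_y = ["A", "A", "B", "B"]
    print(knn_predict((1.1, 0.9), train_X, train_y, k=3))   # -> "A"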
Linear Discriminant Analysis (LDA)
This is also known as normal discriminant analysis (NDA) or discriminant function analysis (DFA). The technique aids in optimizing machine learning models in data science. It has a generative model framework, because the data distribution for each class is modeled and Bayes' theorem is used to classify new data points by calculating the probability that an input data point belongs to a particular output class. It is also used to solve multi-class classification problems, separating multiple classes with multiple features through dimensionality reduction of the data.

Assumptions of LDA


1. Every feature (variable, dimension or attribute) in the dataset has a Gaussian distribution.
2. Each feature holds the same variance, with values varying around the mean by the same amount on average.
3. Each feature is assumed to be sampled randomly.
4. There is a lack of multicollinearity in the independent features: as correlations between independent features increase, the power of prediction decreases.

In reducing the features from a higher-dimensional space to a lower-dimensional space, the following steps should be considered (a brief code sketch follows this list):
1. Compute the separability among the various classes, that is, the between-class variance of the different classes (the distance between the means of the different classes).
2. Compute the distance between the mean and the samples of each class (the within-class variance).
3. Determine the lower-dimensional space that maximizes the between-class variance and minimizes the within-class variance.
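A short sketch of both LDA roles, classification and dimensionality reduction, using scikit-learn (an assumed tool, not the authors' code):

    # LDA as classifier and dimensionality reducer (assumed library).
    from sklearn.datasets import load_iris
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)        # 4 features, 3 classes

    lda = LinearDiscriminantAnalysis(n_components=2)
    # Project onto the 2 directions that maximize between-class variance
    # relative to within-class variance (steps 1-3 above).
    X_reduced = lda.fit_transform(X, y)
    print(X_reduced.shape)                   # (150, 2)
    print(lda.score(X, y))                   # accuracy via the Bayes rule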
Ensemble Methods
This classifier encapsulates multiple learning algorithms to obtain better predictive results. It aims to mitigate errors or biases that may exist in individual models by leveraging the collective intelligence of the ensemble (Singh, 2023). The outputs of many models are combined, thereby utilizing the strengths of these models to improve accuracy and handle uncertainties in the data within its learning system. The various ensemble techniques are max voting, averaging, weighted averaging, stacking, blending, bagging and boosting.
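As one concrete instance of these techniques, the sketch below implements max voting with scikit-learn's VotingClassifier (an assumed tool); the base models are our illustrative choices.

    # Max voting sketched with scikit-learn's VotingClassifier.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, random_state=1)

    ensemble = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("dt", DecisionTreeClassifier(max_depth=4)),
                    ("nb", GaussianNB())],
        voting="hard")                # each model votes; majority label wins
    print(ensemble.fit(X, y).score(X, y))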
Artificial Neural Network (ANN)
An ANN is designed to mimic the function and structure of the human brain. It is an intricate network of interconnected nodes, or neurons, that collaborate to tackle complicated tasks. A main characteristic of an ANN is its ability to learn in classification tasks: it learns by example and through experience. With high-dimensional data, learning is needed to model non-linear relationships or to recognize not-well-established relationships amongst the input variables. The learning process is achieved by adjusting the weights of the interconnections according to the applied learning algorithm. The basic attributes of ANNs can be classified into architectural attributes and neuro-dynamic attributes (Kartalopoulos, 1996). The architectural attributes define the number and topology of the neurons and their interconnectivity, while the neuro-dynamic attributes define the functionality of the ANN. On this basis, an ANN is also referred to as Deep Learning (DL) when it has more than three layers (considering the depth of the layers) for handling complex non-linear tasks. Feed forward neural networks comprise the single-layer network (Hopfield net architecture), the multilayer perceptron (MLP), which uses back-propagation learning (Levenberg-Marquardt), and the radial basis neural network; all are supervised learning networks.

Feed Forward Neural Networks (FFNN): This is a layered neural network in which an input layer of source nodes projects onto an output layer of neurons, but not vice versa.

a. Single-layer Feed Forward Network: This is the simplest kind of neural network; it is flat and consists of a single layer of output nodes (Fig. 6). It is also called the single perceptron. The inputs are fed directly to the outputs through a series of weights. The sum of the products of the weights and the inputs is calculated in each node; if the value is above some threshold (typically 0), the neuron fires and takes the activated value (typically 1), otherwise it takes the deactivated value (typically -1). A single perceptron is only capable of learning linearly separable patterns.

Fig. 6: A Single-layer Feed Forward Network

The mapping of a single-unit perceptron is expressed as

y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)    …(1.11)

where the w_i are the individual weights, the x_i are the inputs and b is the bias.
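Equation (1.11) with a threshold activation can be written directly in code. In this sketch (ours), the weights are fixed by hand, rather than learned, to realize the linearly separable AND function.

    # Equation (1.11) as code: a single-unit perceptron with a
    # threshold activation (weights are illustrative, not learned).
    def perceptron(inputs, weights, bias, threshold=0.0):
        """Fire (+1) if the weighted sum exceeds the threshold, else -1."""
        total = sum(w * x for w, x in zip(weights, inputs)) + bias
        return 1 if total > threshold else -1

    # A perceptron computing logical AND, which is linearly separable.
    weights, bias = [1.0, 1.0], -1.5
    for inputs in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(inputs, perceptron(inputs, weights, bias))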


b. Multilayer Feed Forward Network (MLP): This network distinguishes itself by the presence of one or more hidden layers of so-called hidden neurons between the input units and the output units (Fig. 7). These aid the network in dealing with more complex non-linear problems. The MLP is structured in a feed-forward topology whereby each unit gets its input from the previous one, and it is trained with back propagation.

Fig. 7: Multiple Layer Perceptron

The mapping of the inputs to the outputs using an MLP neural network with one hidden layer can be expressed as

y_k = f\left(\sum_{j=1}^{m} w_{kj}^{(2)} \left(\sum_{i=1}^{n} w_{ji}^{(1)} x_i + w_{j0}^{(1)}\right) + w_{k0}^{(2)}\right)    …(1.12)

where w_{ji}^{(1)} are the weights in the first layer, going from input i to hidden unit j (hidden layer 1), w_{kj}^{(2)} are the weights in the second layer, going from hidden unit j to output unit k, m is the number of hidden units, y_k is the k-th output unit, and w_{j0}^{(1)} and w_{k0}^{(2)} are the biases for hidden unit j and output unit k respectively. For simplicity, the biases have been omitted from the diagram.
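Equation (1.12) corresponds to the following forward pass (a NumPy sketch of ours; random weights stand in for trained values, and tanh is an assumed choice for the activation f).

    # Equation (1.12) as a forward pass with one hidden layer
    # (random weights stand in for trained values).
    import numpy as np

    rng = np.random.default_rng(0)
    n, m, outputs = 3, 4, 2                  # inputs, hidden units, outputs

    W1 = rng.normal(size=(m, n))             # first-layer weights w_ji^(1)
    b1 = rng.normal(size=m)                  # hidden biases w_j0^(1)
    W2 = rng.normal(size=(outputs, m))       # second-layer weights w_kj^(2)
    b2 = rng.normal(size=outputs)            # output biases w_k0^(2)

    f = np.tanh                              # an assumed activation function

    def mlp_forward(x):
        hidden = f(W1 @ x + b1)              # inner sum over inputs i
        return f(W2 @ hidden + b2)           # outer sum over hidden units j

    print(mlp_forward(np.array([0.5, -1.0, 2.0])))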
c. Radial Basis Neural Network (RBNN): This is also called a Radial Basis Function (RBF) network. It is a two-layer feed-forward type of network in which the input is transformed by the basis functions at the hidden layer (Fig. 8). At the output layer, linear combinations of the hidden-layer node responses are added to form the output. The name RBF comes from the fact that the basis functions in the hidden-layer nodes are radially symmetric; that is, the neurons in the hidden layer contain Gaussian transfer functions whose outputs are inversely proportional to the distance from the center of the neuron.

Fig. 8: Radial Basis Neural Network

Mathematically, it can be expressed as

y(x) = \sum_{i=1}^{N} w_i\, \phi(\lVert x - c_i \rVert)    …(1.13)

where x is the input vector, N is the number of neurons in the hidden layer, the w_i are the weights of the connections from the hidden layer to the output layer, the c_i are the centers of the radial basis functions, ‖x − c_i‖ is the Euclidean distance between the input vector and the center of the radial basis function, and φ is the radial basis function, usually chosen to be a Gaussian function.
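Equation (1.13) with a Gaussian basis function can be sketched as follows (the centers, width and weights are illustrative values of ours, not trained ones).

    # Equation (1.13) as code: a Gaussian RBF layer followed by a
    # linear output layer (parameters are illustrative).
    import numpy as np

    centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])   # c_i
    weights = np.array([0.5, -1.0, 2.0])                        # w_i
    gamma = 1.0                                                 # Gaussian width

    def rbf_output(x):
        # phi(||x - c_i||) = exp(-gamma * ||x - c_i||^2): the response
        # shrinks with distance from each neuron's center.
        dists = np.linalg.norm(centers - x, axis=1)
        phi = np.exp(-gamma * dists**2)
        return weights @ phi                                    # linear combination

    print(rbf_output(np.array([1.0, 0.5])))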
III. CONCLUSION

As the present world revolves around AI for its benefits, machine learning has been of immense importance in building such intelligent systems and improving their performance. Learning under supervision to predict the output of a system when given new inputs has proved accurate and straightforward when the decision boundary is not overstrained. This overview of supervised machine learning paradigms gives a detailed insight into the various statistical and scientific classifiers used in building functions that map new data onto expected output values, in tasks that require classification, regression or both.

REFERENCES
[1] Ambrose, S.A., Bridges, M.N., Dipietro, M., Lovett, M.C. and Norman, M.K. (2010). How Learning Works: Seven Research-Based Principles for Smart Teaching, Jossey-Bass, A Wiley Imprint, San Francisco, pp. 1-301.
[2] Bansal, R., Singh, J. and Kaur, R. (2019). Machine Learning and its Applications: A Review, Journal of Applied Science and Computations, Vol. VI, Issue VI, pp. 1392-1398.
[3] Brownlee, J. (2020). Basic Concepts in Machine Learning. Retrieved from https://machinelearningmastery.com/basic-concepts-in-machine-learning/
[4] Falade, K.I. (2021). Introduction to Computational Algorithm, Numerical and Computational Research Laboratory, pp. 1-50.
[5] Ghahremani-Nahr, J., Hamed, N. and Sadeghi, M.E. (2021). Artificial Intelligence and Machine Learning for Real-World Problems (A Survey), International Journal of Innovation in Engineering, 1(3), pp. 38-47.
[6] Haykin, S. (1994). Neural Networks: A Comprehensive Foundation, Macmillan College Publishing Company, Inc., USA, pp. 1-696.
[7] Jain, A.K. (1996). Artificial Neural Networks: A Tutorial, pp. 1-14. Retrieved from www.cogsci.ucsd.edu/ajyu/Teaching/cogs202_sp12/Readings/jain_ann96.pdf
[8] Kartalopoulos, S.V. (1996). Understanding Neural Networks and Fuzzy Logic: Basic Concepts and Applications, IEEE Press, NY, pp. 1-232.
[9] Kurama, V. (2023). Regression in Machine Learning: What It Is and Examples of Different Models. Retrieved from https://builtin.com/data-science/regression-machine-learning
[10] NetApp (2023). What is Machine Learning? Retrieved from https://www.netapp.com/artificial-intelligence/what-is-machine-learning/


[11] Sahu, C.K. and Sharma, M. (2023). Hinge Loss in Support Vector Machine. Retrieved from https://www.niser.ac.in/~smishra/teach/cs460/23cs460/lectures/lecII.pdf
[12] Samarasinghe, S. (2006). Neural Networks for Applied Sciences and Engineering: From Fundamentals to Complex Pattern Recognition, Auerbach Publications, Taylor and Francis, New York, pp. 1-570.
[13] Singh, A. (2023). A Comprehensive Guide to Ensemble Learning (with Python Codes). Retrieved from https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/
[14] Staff, C. (2023). What is Artificial Intelligence? Definition, Uses and Types. Retrieved from https://www.coursera.org/articles/what-is-artificial-intelligence
[15] Steinbach, M. and Tan, P. (2009). kNN: k-Nearest Neighbors, Chapter 8, Taylor and Francis, pp. 151-159.
[16] Tucci, L. (2023). What is Machine Learning and How Does It Work? In-Depth Guide. Retrieved from https://www.techtarget.com/searchenterpriseai/definition/machine-learning-ML
