
UNIT 1

Learning and Soft Computing


1.a) Define soft computing? Distinguish between soft computing and hard computing.
The need for soft computing arises from the limitations of traditional, classical computing
methods in solving real-world problems. Soft computing is a branch of artificial
intelligence that provides approximate solutions to complex problems that are difficult or
impossible to solve using classical methods.

The following are some of the reasons why soft computing is needed:

1. Complexity of real-world problems: Many real-world problems are complex and involve
uncertainty, vagueness, and imprecision. Traditional computing methods are not well-
suited to handle these complexities.
2. Incomplete information: In many cases, there is a lack of complete and accurate
information available to solve a problem. Soft computing techniques can provide
approximate solutions even in the absence of complete information.
3. Noise and uncertainty: Real-world data is often noisy and uncertain, and classical
methods can produce incorrect results when dealing with such data. Soft computing
techniques are designed to handle uncertainty and imprecision.
4. Non-linear problems: Many real-world problems are non-linear, and classical methods
are not well-suited to solve them. Soft computing techniques such as fuzzy logic and
neural networks can handle non-linear problems effectively.
5. Human-like reasoning: Soft computing techniques are designed to mimic human-like
reasoning, which is often more effective in solving complex problems.
Overall, soft computing provides an effective and efficient way to solve complex real-
world problems that are difficult or impossible to solve using classical computing methods.
To understand the need for soft computing, let us first understand the concept of computing.
Concept of computing:
In computing, the input is called the antecedent and the output is called the consequent. Examples include adding a record to a database or computing the sum of two numbers using a C program.
There are two types of computing:
1. Hard computing

2. Soft computing
Characteristics of hard computing :

 The precise result is guaranteed.

 The control action is unambiguous.


 The control action is formally defined (i.e. with a mathematical model)
Now the question arises: if we have hard computing, why do we need soft computing?
Characteristics of soft computing :

 It may not yield a precise solution.

 Algorithms are adaptive.

 Soft computing draws inspiration from biological processes such as the evolution of a species, the human nervous system, and the behaviour of ant colonies.

 Learning from experimental data.


Need For Soft Computing :

 Many analytical models are valid for ideal cases. Real-world problems exist in a non-
ideal environment.

 Soft computing provides insights into real-world problems and is just not limited to
theory.

 Hard computing is best suited for solving mathematical problems which give some
precise answers.

 Important fields such as biology, medicine, and the humanities remain intractable to conventional mathematical and analytical models.

 Soft computing can model aspects of the human mind, which is not possible with conventional mathematical and analytical models.
Examples –
Consider a problem where string w1 is “abc” and string w2 is “abd”.

 Problem-1 :
Is w1 the same as w2?
Solution –
The answer is simply No; an exact algorithm can decide this.

 Problem-2 :
How similar are the two strings?
Solution –
Conventional computing can answer only YES or NO. But the strings may be, say, 80% similar, and such a graded answer can be given only by soft computing (a short sketch follows below).
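
As an illustration of the graded answer (not part of the original question), Python's standard difflib module returns a similarity ratio between 0 and 1 rather than a crisp yes/no:

from difflib import SequenceMatcher

w1, w2 = "abc", "abd"

# Hard-computing style answer: exact, crisp comparison
print(w1 == w2)                            # False

# Soft-computing style answer: a graded similarity in [0, 1]
ratio = SequenceMatcher(None, w1, w2).ratio()
print(f"Similarity: {ratio:.0%}")          # about 67% for these strings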

Recent developments in Soft Computing :

1. In the field of Big Data, soft computing is used for data analysis models, data behaviour models, data-driven decisions, etc.

2. In recommender systems, soft computing plays an important role in analysing the problem algorithmically and refining the results.

3. In behavioural and decision science, soft computing is used to analyse behaviour, and the soft computing model adapts accordingly.

4. In mechanical engineering, soft computing is used to model how a machine behaves and how it makes decisions for a specific problem or input.

5. In computer engineering, soft computing forms a core part of advanced areas such as machine learning and artificial intelligence.

Advantages of Soft Computing:

1. Robustness: Soft computing techniques are robust and can handle uncertainty,
imprecision, and noise in data, making them ideal for solving real-world problems.
2. Approximate solutions: Soft computing techniques can provide approximate solutions
to complex problems that are difficult or impossible to solve exactly.
3. Non-linear problems: Soft computing techniques such as fuzzy logic and neural
networks can handle non-linear problems effectively.
4. Human-like reasoning: Soft computing techniques are designed to mimic human-like
reasoning, which is often more effective in solving complex problems.
5. Real-time applications: Soft computing techniques can provide real-time solutions to
complex problems, making them ideal for use in real-time applications.

Disadvantages of Soft Computing:

1. Approximate solutions: Soft computing techniques provide approximate solutions,


which may not always be accurate.
2. Computationally intensive: Soft computing techniques can be computationally
intensive, making them unsuitable for use in some real-time applications.
3. Lack of transparency: Soft computing techniques can sometimes lack transparency,
making it difficult to understand how the solution was arrived at.
4. Difficulty in validation: The approximation techniques used in soft computing can
sometimes make it difficult to validate the results, leading to a lack of confidence in the
solution.
5. Complexity: Soft computing techniques can be complex and difficult to implement and understand.

Difference between Soft Computing and Hard Computing





The main difference between Soft Computing and Hard Computing is their
approach to solving complex problems:
1. Hard Computing: Hard computing uses traditional mathematical methods
to solve problems, such as algorithms and mathematical models. It is based
on deterministic and precise calculations and is ideal for solving problems
that have well-defined mathematical solutions.
2. Soft Computing: Soft computing, on the other hand, uses techniques such
as fuzzy logic, neural networks, genetic algorithms, and other heuristic
methods to solve problems. It is based on the idea of approximation and is
ideal for solving problems that are difficult or impossible to solve exactly.
In summary, Hard Computing is more precise and relies on mathematical
models, while Soft Computing is more flexible and relies on approximate
solutions.
Soft computing is a computing model evolved to solve non-linear problems that involve uncertain, imprecise, and approximate solutions. Such problems are considered real-life problems, where human-like intelligence is required to solve them. Hard computing is the traditional approach used in computing, which requires an accurately stated analytical model. The outcome of the hard computing approach is a guaranteed, deterministic, accurate result, and it defines definite control actions using a mathematical model or algorithm. It deals with binary and crisp logic that requires precise input data, processed sequentially. Hard computing is not capable of solving many real-world problems. The differences between soft computing and hard computing are summarised below:

1. Soft computing is tolerant of imprecision, uncertainty, partial truth and approximation; hard computing needs an exactly stated analytical model.

2. Soft computing relies on fuzzy logic and probabilistic reasoning; hard computing relies on binary logic and crisp systems.

3. Soft computing has the features of approximation and dispositionality; hard computing has the features of exactitude (precision) and categoricity.

4. Soft computing is stochastic in nature; hard computing is deterministic in nature.

5. Soft computing works on ambiguous and noisy data; hard computing works on exact data.

6. Soft computing can perform parallel computations; hard computing performs sequential computations.

7. Soft computing produces approximate results; hard computing produces precise results.

8. Soft computing can evolve its own programs; hard computing requires programs to be written.

9. Soft computing incorporates randomness; hard computing is settled (deterministic).

10. Soft computing uses multivalued logic; hard computing uses two-valued logic.

2.a) Briefly explain the concepts of regression and classification with statistical approaches.
Regression in machine learning


Regression, a statistical approach, dissects the relationship between dependent


and independent variables, enabling predictions through various regression
models.
This section covers regression in machine learning: its models, terminologies, types, and practical applications.
What is Regression?
Regression is a statistical approach used to analyze the relationship between a
dependent variable (target variable) and one or more independent variables
(predictor variables). The objective is to determine the most suitable function
that characterizes the connection between these variables.
It seeks to find the best-fitting model, which can be utilized to make predictions
or draw conclusions.
Regression in Machine Learning
It is a supervised machine learning technique, used to predict the value of the
dependent variable for new, unseen data. It models the relationship between
the input features and the target variable, allowing for the estimation or
prediction of numerical values.
A regression problem is one where the output variable is a real or continuous value, such as “salary” or “weight”. Many different models can be used; the simplest is linear regression, which tries to fit the data with the best hyperplane passing through the points.
Terminologies Related to Regression Analysis in Machine Learning:
 Response Variable: The primary factor to predict or understand in
regression, also known as the dependent variable or target variable.
 Predictor Variable: Factors influencing the response variable, used to
predict its values; also called independent variables.
 Outliers: Observations with significantly low or high values compared to
others, potentially impacting results and best avoided.
 Multicollinearity: High correlation among independent variables, which can
complicate the ranking of influential variables.
 Underfitting and Overfitting: Overfitting occurs when an algorithm
performs well on training but poorly on testing, while underfitting indicates
poor performance on both datasets.

Regression Types
The main types of regression are:
 Simple Regression
o Used to predict a continuous dependent variable based on a single
independent variable.
o Simple linear regression should be used when there is only a single
independent variable.
 Multiple Regression
o Used to predict a continuous dependent variable based on multiple
independent variables.
o Multiple linear regression should be used when there are multiple
independent variables.

 NonLinear Regression
o Relationship between the dependent variable and independent
variable(s) follows a nonlinear pattern.
o Provides flexibility in modeling a wide range of functional forms.

Regression Algorithms
There are many different types of regression algorithms, but some of the most
common include:
 Linear Regression
o Linear regression is one of the simplest and most widely used
statistical models. This assumes that there is a linear relationship
between the independent and dependent variables. This means that
the change in the dependent variable is proportional to the change in
the independent variables.
 Polynomial Regression
o Polynomial regression is used to model nonlinear relationships
between the dependent variable and the independent variables. It
adds polynomial terms to the linear regression model to capture
more complex relationships.
 Support Vector Regression (SVR)
o Support vector regression (SVR) is a type of regression algorithm
that is based on the support vector machine (SVM) algorithm. SVM is
a type of algorithm that is used for classification tasks, but it can also
be used for regression tasks. SVR works by finding a hyperplane that
minimizes the sum of the squared residuals between the predicted
and actual values.
 Decision Tree Regression
o Decision tree regression is a type of regression algorithm that
builds a decision tree to predict the target value. A decision tree is a
tree-like structure that consists of nodes and branches. Each node
represents a decision, and each branch represents the outcome of
that decision. The goal of decision tree regression is to build a tree
that can accurately predict the target value for new data points.
 Random Forest Regression
o Random forest regression is an ensemble method that combines
multiple decision trees to predict the target value. Ensemble
methods are a type of machine learning algorithm that combines
multiple models to improve the performance of the overall model.
Random forest regression works by building a large number of
decision trees, each of which is trained on a different subset of the
training data. The final prediction is made by averaging the
predictions of all of the trees.

Regularized Linear Regression Techniques


 Ridge Regression
o Ridge regression is a type of linear regression that is used to prevent
overfitting. Overfitting occurs when the model learns the training
data too well and is unable to generalize to new data. Ridge regression adds a
penalty term proportional to the square of the weights (an L2 penalty) to the
loss function, which shrinks the weights.
 Lasso regression
o Lasso regression is another type of linear regression that is used to
prevent overfitting. It does this by adding a penalty term proportional to the
absolute values of the weights (an L1 penalty) to the loss function, which
forces the model to shrink some weights and set others to zero. A small
sketch of both follows below.
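
A minimal sketch (assuming scikit-learn and a small synthetic dataset; the variable names are illustrative) showing how the two penalties behave:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)   # only two informative features

ridge = Ridge(alpha=1.0).fit(X, y)    # L2 penalty shrinks all weights toward zero
lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty drives some weights to exactly zero

print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))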

Characteristics of Regression
Here are the characteristics of the regression:
 Continuous Target Variable: Regression deals with predicting continuous
target variables that represent numerical values. Examples include
predicting house prices, forecasting sales figures, or estimating patient
recovery times.
 Error Measurement: Regression models are evaluated based on their
ability to minimize the error between the predicted and actual values of the
target variable. Common error metrics include mean absolute error (MAE),
mean squared error (MSE), and root mean squared error (RMSE).
 Model Complexity: Regression models range from simple linear models to
more complex nonlinear models. The choice of model complexity depends on
the complexity of the relationship between the input features and the target
variable.
 Overfitting and Underfitting: Regression models are susceptible to
overfitting and underfitting.
 Interpretability: The interpretability of regression models varies depending
on the algorithm used. Simple linear models are highly interpretable, while
more complex models may be more difficult to interpret.

Examples
Which of the following is a regression task?
 Predicting age of a person
 Predicting nationality of a person
 Predicting whether stock price of a company will increase tomorrow
 Predicting whether a document is related to sighting of UFOs?
Solution : Predicting age of a person (because it is a real value, predicting
nationality is categorical, whether stock price will increase is discrete-yes/no
answer, predicting whether a document is related to UFO is again discrete- a
yes/no answer).
Regression Model Machine Learning
Let’s take an example of linear regression. We have a Housing data set and
we want to predict the price of the house. Following is the python code for it.
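
The original code listing is not reproduced here; the following is a minimal sketch of such an example, assuming a hypothetical Housing.csv file with 'area' and 'price' columns:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("Housing.csv")                     # hypothetical dataset
X = df[["area"]]                                    # single predictor: house area
y = df["price"]                                     # target: house price

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

regr = LinearRegression().fit(X_train, y_train)     # fit the best line through the training points

plt.scatter(X_test, y_test, label="test data")
plt.plot(X_test, regr.predict(X_test), color="red", label="best-fit line")
plt.xlabel("area"); plt.ylabel("price"); plt.legend(); plt.show()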

Output: a plot of the test data, in which the red line indicates the best-fit line for predicting the price.

To make an individual prediction using the linear regression model (the model expects a 2-D input array):
print(round(float(regr.predict([[5000]])[0])))

Regression Evaluation Metrics


Here are some most popular evaluation metrics for regression:
 Mean Absolute Error (MAE): The average absolute difference between the predicted
and actual values of the target variable.
 Mean Squared Error (MSE): The average squared difference between the predicted
and actual values of the target variable.
 Root Mean Squared Error (RMSE): The square root of the mean squared error.
 Huber Loss: A hybrid loss function that transitions from MAE to MSE for larger errors,
providing balance between robustness and MSE’s sensitivity to outliers.
 Root Mean Square Logarithmic Error
 R2 Score: Higher values indicate a better fit; it typically ranges from 0 to 1 (and can be negative for very poor fits). A short sketch computing these metrics follows below.
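
A minimal sketch (assuming scikit-learn and two illustrative arrays of actual and predicted values):

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])     # actual target values
y_pred = np.array([2.8, 5.4, 2.9, 6.5])     # model predictions

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")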

Applications of Regression
 Predicting prices: For example, a regression model could be used to predict the price
of a house based on its size, location, and other features.
 Forecasting trends: For example, a regression model could be used to forecast the
sales of a product based on historical sales data and economic indicators.
 Identifying risk factors: For example, a regression model could be used to identify
risk factors for heart disease based on patient data.
 Making decisions: For example, a regression model could be used to recommend
which investment to buy based on market data.

Advantages of Regression
 Easy to understand and interpret
 Robust to outliers
 Can handle both linear and nonlinear relationships.

Disadvantages of Regression
 Assumes linearity
 Sensitive to multicollinearity
 May not be suitable for highly complex relationships

Classification vs Regression in Machine Learning





What is the Classification Algorithm?


The Classification algorithm is a Supervised Learning technique that is used to identify the
category of new observations on the basis of training data. In Classification, a program
learns from the given dataset or observations and then classifies new observation into a
number of classes or groups. Such as, Yes or No, 0 or 1, Spam or Not Spam, cat or
dog, etc. Classes can be called as targets/labels or categories.

Unlike regression, the output variable of Classification is a category, not a value, such as
"Green or Blue", "fruit or animal", etc. Since the Classification algorithm is a Supervised
learning technique, hence it takes labeled input data, which means it contains input with
the corresponding output.

In a classification algorithm, a discrete output function (y) is mapped to the input variable (x):

y = f(x), where y is the categorical output

The best example of an ML classification algorithm is Email Spam Detector.

The main goal of the Classification algorithm is to identify the category of a given dataset,
and these algorithms are mainly used to predict the output for the categorical data.

Classification can be understood by considering two classes, Class A and Class B: the points of each class have features that are similar to each other and dissimilar to those of the other class.
The algorithm which implements the classification on a dataset is known as a classifier.
There are two types of Classifications:

o Binary Classifier: If the classification problem has only two possible outcomes, then it is
called a binary classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
o Multi-class Classifier: If a classification problem has more than two outcomes, then it is
called a multi-class classifier.
Examples: classification of types of crops, classification of types of music.

Learners in Classification Problems:


In the classification problems, there are two types of learners:

1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives
the test dataset. In the lazy learner's case, classification is done on the basis of the most closely
related data stored in the training dataset. It takes less time in training but more time for
predictions.
Example: K-NN algorithm, case-based reasoning
2. Eager Learners: Eager learners develop a classification model based on a training dataset
before receiving a test dataset. Opposite to lazy learners, an eager learner takes more time
in learning, and less time in prediction. Example: Decision Trees, Naïve Bayes, ANN.
Types of ML Classification Algorithms:
Classification algorithms can be divided into the following two main categories:

o Linear Models
o Logistic Regression
o Support Vector Machines
o Non-linear Models
o K-Nearest Neighbours
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification

Evaluating a Classification model:


Once our model is complete, it is necessary to evaluate its performance, whether it is a
classification or a regression model. A classification model can be evaluated in the
following ways:

1. Log Loss or Cross-Entropy Loss:

o It is used for evaluating the performance of a classifier whose output is a probability value
between 0 and 1.
o For a good binary Classification model, the value of log loss should be near to 0.
o The value of log loss increases if the predicted value deviates from the actual value.
o The lower log loss represents the higher accuracy of the model.
o For Binary classification, cross-entropy can be calculated as:

1. Log Loss = −(y·log(p) + (1 − y)·log(1 − p))

Where y = actual output and p = predicted probability. (A short sketch follows below.)
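
A minimal sketch of this formula (assuming NumPy arrays of actual labels and predicted probabilities):

import numpy as np

y = np.array([1, 0, 1, 1])            # actual labels
p = np.array([0.9, 0.2, 0.7, 0.6])    # predicted probabilities of class 1

eps = 1e-15                           # clip to avoid log(0)
p = np.clip(p, eps, 1 - eps)
log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(f"Log loss: {log_loss:.4f}")    # values closer to 0 indicate a better classifier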

2. Confusion Matrix:

o The confusion matrix provides us a matrix/table as output and describes the performance
of the model.
o It is also known as the error matrix.
o The matrix consists of predictions result in a summarized form, which has a total number of
correct predictions and incorrect predictions. The matrix looks like as below table:

                        Actual Positive     Actual Negative

Predicted Positive      True Positive       False Positive

Predicted Negative      False Negative      True Negative
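
A minimal sketch (assuming scikit-learn and small illustrative label lists):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]     # actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]     # predicted labels

# scikit-learn's convention: rows are actual classes, columns are predicted classes,
# so for binary labels [0, 1] the layout is [[TN, FP], [FN, TP]]
# (note the orientation differs from the table above, where rows are predictions).
print(confusion_matrix(y_true, y_pred))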

3. AUC-ROC curve:

o ROC curve stands for Receiver Operating Characteristics Curve and AUC stands
for Area Under the Curve.
o It is a graph that shows the performance of the classification model at different thresholds.
o To visualize the performance of the multi-class classification model, we use the AUC-ROC
Curve.
o The ROC curve is plotted with TPR and FPR, where TPR (True Positive Rate) on Y-axis and
FPR(False Positive Rate) on X-axis.

Use cases of Classification Algorithms


Classification algorithms can be used in different places. Below are some popular use
cases of Classification Algorithms:

o Email Spam Detection


o Speech Recognition
o Identifications of Cancer tumor cells.
o Drugs Classification
o Biometric Identification, etc.

Binary Classification and Multiclass Classification


Comparison between Classification and Regression
1. Target variable: In classification, the target variables are discrete; in regression, the target variables are continuous.

2. Typical problems: Problems like spam email classification and disease prediction are solved using classification algorithms, while problems like house price prediction and rainfall prediction are solved using regression algorithms.

3. Goal: In classification, we try to find the best possible decision boundary that can separate the two classes with the maximum possible separation; in regression, we try to find the best-fit line that can represent the overall trend in the data.

4. Evaluation metrics: Metrics like Precision, Recall, and F1-Score are used to evaluate classification algorithms, while metrics like Mean Squared Error, R2-Score, and MAPE are used to evaluate regression algorithms.

5. Problem types: In classification we face binary classification or multi-class classification problems; in regression we face linear regression models as well as non-linear models.

6. Input data: Classification uses independent variables and a categorical dependent variable; regression uses independent variables and a continuous dependent variable.

7. Task: The classification algorithm's task is mapping the input value x to a discrete output variable y; the regression algorithm's task is mapping the input value x to a continuous output variable y.

8. Output: Classification outputs categorical labels; regression outputs continuous numerical values.

9. Objective: Classification aims to predict categorical/class labels; regression aims to predict continuous numerical values.

10. Example use cases: Classification is used for spam detection, image recognition, and sentiment analysis; regression is used for stock price prediction, house price prediction, and demand forecasting.

11. Example algorithms: Classification algorithms include Logistic Regression, Decision Trees, Random Forest, Support Vector Machines (SVM), K-Nearest Neighbors (K-NN), Naive Bayes, Neural Networks, Multi-layer Perceptron (MLP), etc.; regression algorithms include Linear Regression, Polynomial Regression, Ridge Regression, Lasso Regression, Support Vector Regression (SVR), Decision Trees for Regression, Random Forest Regression, K-Nearest Neighbors (K-NN) Regression, Neural Networks for Regression, etc.

UNIT 2
Single – Layer Networks
Single Layer Perceptron in TensorFlow
The perceptron is a single processing unit of a neural network. First proposed by Frank Rosenblatt in 1958, it is a simple neuron used to classify its input into one of two categories. The perceptron is a linear classifier and is used in supervised learning; it helps to organize the given input data.

A perceptron is a neural network unit that does a precise computation to detect features in the input data.
Perceptron is mainly used to classify the data into two parts. Therefore, it is also known as Linear Binary
Classifier.

The perceptron uses a step function that returns +1 if the weighted sum of its inputs is greater than or equal to 0, and -1 otherwise.

The activation function is used to map the input between the required value like (0, 1) or (-1, 1).
o Input value or One input layer: The input layer of the perceptron is made of
artificial input neurons and takes the initial data into the system for further
processing.
o Weights and Bias:
Weight: It represents the strength of the connection between units. If
the weight from node 1 to node 2 is larger, then neuron 1 has a greater
influence on neuron 2.
Bias: It is the same as the intercept added in a linear equation. It is an additional
parameter whose task is to modify the output along with the weighted sum of the
inputs to the next neuron.
o Net sum: It calculates the total sum.

Activation Function: Whether a neuron is activated or not is determined by an
activation function. The activation function takes the weighted sum, adds the bias to it, and produces the result.

How does it work?


The perceptron works on these simple steps which are given below:

a. In the first step, all the inputs x are multiplied by their weights w.

b. In the second step, all the multiplied values are added together; this total is called the weighted sum.
c. In the last step, the weighted sum is applied to the appropriate activation function.


Single Layer Perceptron


The single-layer perceptron was the first neural network model, proposed in 1958 by
Frank Rosenblatt. It is one of the earliest models for learning. Our goal is to find a linear
decision function measured by the weight vector w and the bias parameter b.

To understand the perceptron layer, it is necessary to comprehend artificial neural


networks (ANNs).

The artificial neural network (ANN) is an information processing system, whose


mechanism is inspired by the functionality of biological neural circuits. An artificial neural
network consists of several processing units that are interconnected.

This was the first neural model proposed. The neuron's local
memory consists of a vector of weights.

The output of the single-layer perceptron is computed as the sum of the elements of the
input vector, each multiplied by its corresponding weight. The value obtained is then
passed to an activation function, which produces the output.
Let us focus on the implementation of a single-layer perceptron for an image classification
problem using TensorFlow. The best example of drawing a single-layer perceptron is
through the representation of "logistic regression."

Now, we have to follow these necessary steps for training the logistic regression model:

o The weights are initialized with random values at the start of training.
o For each element of the training set, the error is calculated as the difference
between the desired output and the actual output. The calculated error is used to
adjust the weights.
o The process is repeated until the error made on the entire training set falls below
a specified limit, or until the maximum number of iterations has been reached. A minimal
sketch of such a single-layer perceptron follows below.
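
A minimal sketch of a single-layer perceptron trained as logistic regression, assuming TensorFlow 2.x and a small synthetic, linearly separable dataset (all names and values are illustrative):

import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")        # linearly separable labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, activation="sigmoid")   # one unit = one "perceptron"
])
model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=20, verbose=0)                # weights are adjusted from the error each epoch

print(model.evaluate(X, y, verbose=0))               # [loss, accuracy]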

2b) Differentiate between local minima and global minima?


Local Minima:

1. Definition: A local minimum is a point within a given range such that the function value at that point is
less than or equal to the function values at nearby points.
2. Characteristics:
o A function f(x) has a local minimum at x = c if f(c) ≤ f(x) for all x in some neighborhood around c.
o There can be multiple local minima in a function.
o The derivative of the function at a local minimum (if it exists) is zero, i.e. f'(c) = 0,
and the second derivative is positive, i.e. f''(c) > 0.
3. Example: For the function f(x) = x^4 - x^2, the points x = -1/√2 and x = 1/√2 are local minima.

Global Minima:
1. Definition: A global minimum is a point at which the function value is the lowest over the entire domain
of the function.
2. Characteristics:
o A function f(x) has a global minimum at x = c if f(c) ≤ f(x) for all x in the domain of f.
o There is only one global minimum value, but there can be multiple points where this minimum
value occurs.
o A global minimum is also a local minimum, but the converse is not necessarily true.
3. Example: For the function f(x) = x^4 - x^2, the lowest value attained is f(±1/√2) = -1/4, so both
local minima are also global minima; the point x = 0, where f(0) = 0, is a local maximum, not a minimum.

Key Differences:

 Scope: Local minima are determined within a neighborhood, while global minima are determined over
the entire domain.
 Uniqueness: There can be multiple local minima, but there is only one global minimum value.
 Comparison: All global minima are local minima, but not all local minima are global minima.

Visualization:

Consider the function f(x) = x^4 - x^2:

 Local Minima: f(x) has local minima at x = ±1/√2.


 Global Minima: the same two points, where f attains its lowest value of -1/4.

The graph of this function shows two valleys of equal depth (the global minima) separated by a local maximum at x = 0. A quick numerical check follows below.
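
A quick numerical check of these claims (illustration only, using NumPy):

import numpy as np

f = lambda x: x**4 - x**2
xs = np.linspace(-2.0, 2.0, 100001)
ys = f(xs)

print("global minimum value:", ys.min())          # about -0.25
print("attained near x =", xs[np.argmin(ys)])     # about -1/sqrt(2) ≈ -0.707 (and, by symmetry, +0.707)
print("f(0) =", f(0.0))                           # 0.0, a local maximum rather than a minimum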

What is the significance of momentum term in back propagation


learning?
Backpropagation, or backward propagation of errors, is an algorithm that is designed to test
for errors working back from output nodes to input nodes. It's an important mathematical
tool for improving the accuracy of predictions in data mining and machine learning.
Essentially, backpropagation is an algorithm used to quickly calculate derivatives in
a neural network, which are the changes in output because of tuning and adjustments.

There are two leading types of backpropagation networks:

 Static backpropagation. Static backpropagation is a network developed to map static


inputs for static outputs. Static networks can solve static classification problems, such
as optical character recognition (OCR).

 Recurrent backpropagation. The recurrent backpropagation network is used for fixed-


point learning. This means that during neural network training, the weights are
numerical values that determine how much nodes -- also referred to as neurons --
influence output values. They're adjusted so that the network can achieve stability by
reaching a fixed value.

 Artificial neural networks (ANNs) and deep neural networks use backpropagation as
a learning algorithm to compute the gradients needed for gradient descent, an
optimization algorithm that guides the search to the minimum (or maximum) of a function.
 In a machine learning context, the gradient descent helps the system minimize the
gap between desired outputs and achieved system outputs. The algorithm tunes the
system by adjusting the weight values for various inputs to narrow the difference
between outputs. This is also known as the error between the two.
 More specifically, a gradient descent algorithm uses a gradual process to provide
information on how a network's parameters need to be adjusted to reduce the
disparity between the desired and achieved outputs. An evaluation metric called a
cost function guides this process. The cost function is a mathematical function that
measures this error. The algorithm's goal is to determine how the parameters must
be adjusted to reduce the cost function and improve overall accuracy.
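
The momentum term itself extends this plain gradient-descent update: a fraction of the previous weight change is added to the current one, which damps oscillations across steep directions and speeds up progress along shallow ones. A minimal sketch of the momentum-augmented update (the cost function and variable names are illustrative only):

import numpy as np

def grad(w):
    # gradient of the simple quadratic cost 0.5 * ||w||^2, used only for illustration
    return w

w = np.array([2.0, -3.0])       # initial weights
velocity = np.zeros_like(w)     # remembered previous update
eta, mu = 0.1, 0.9              # learning rate and momentum coefficient

for step in range(100):
    velocity = mu * velocity - eta * grad(w)   # momentum: keep a fraction of the last update
    w = w + velocity                           # apply the smoothed update

print(w)   # approaches the minimum at [0, 0]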

Advantages of backpropagation algorithms


 They don't have any parameters to tune except for the number of inputs.

 They're highly adaptable and efficient, and don't require prior knowledge about the network.

 They use a standard process that usually works well.

 They're user-friendly, fast and easy to program.

 Users don't need to learn any special functions.


Disadvantages of backpropagation algorithms
 They prefer a matrix-based approach over a mini-batch approach.

 Data mining is sensitive to noisy data and other irregularities. Unclean data can affect the
backpropagation algorithm when training a neural network used for data mining.

 Performance is highly dependent on input data.

 Training is time- and resource-intensive.


What is a backpropagation algorithm in machine learning?
Backpropagation is a type of supervised learning since it requires a known, desired output for each
input value to calculate the loss function gradient, which is how desired output values differ from
actual output. Supervised learning, the most common training approach in machine learning, uses a
training data set that has clearly labeled data and specified desired outputs.

Along with classifier algorithms such as naive Bayesian filters, K-nearest neighbors and support
vector machines, the backpropagation training algorithm has emerged as an important part of
machine learning applications that involve predictive analytics. While backpropagation techniques
are mainly applied to neural networks, they can also be applied to both classification and regression
problems in machine learning. In real-world applications, developers and machine learning experts
implement backpropagation algorithms for neural networks using programming languages such as
Python.

2.B)Describe Adaline and Madaline Network


Architecture of Adaline

The architecture of Adaline, short for Adaptive Linear Neuron, consists of a single−layer
neural network. It typically comprises an input layer, a weight adjustment unit, and an output
layer. The input layer receives the input data, which is then multiplied by adjustable weights.
The weighted inputs are summed, and the result is passed through an activation function,
often a linear activation function. The output of the activation function is compared to the
desired output, and the network adjusts its weights using a supervised learning algorithm,
such as the Widrow−Hoff learning rule or delta rule. This iterative process continues until the
network reaches a satisfactory level of accuracy in making predictions or performing
regression tasks. The simplicity and linearity of the architecture allow Adaline to solve linearly
separable problems effectively.

Learning Algorithm

The Adaline network aims to minimize output disparities by fine−tuning weights using the
renowned Widrow−Hoff rule (Delta rule or LMS algorithm). Gradient descent is employed to
adjust weights, approaching optimal values iteratively. This continuous refinement enables
the network to align predictions with expected outcomes, showcasing its great learning and
adaptive abilities. Adaline is a powerful tool in pattern recognition and machine learning,
dynamically adapting weights based on feedback received.

Applications of Adaline

Adaline networks have showcased their adaptability in various domains, encompassing


pattern recognition, signal processing, and adaptive filtering. Particularly noteworthy is their
effectiveness in noise cancellation, as Adaline's weight adjustment capability facilitates the
removal of undesired noise from signals, reducing the error between the original and noisy
signals. Additionally, Adaline networks have proven to be valuable assets in prediction tasks
and control systems, further broadening their utility across diverse application areas.

Architecture of Madaline

The Madaline architecture comprises multiple layers of Adaline units. Input data is initially
received by the input layer, which then transmits it through intermediate layers before
reaching the output layer. Within the intermediate layers, each Adaline unit calculates a
linear combination of inputs, followed by passing the unit's output through an activation
function. Ultimately, the output layer combines outputs from the intermediate layers to
generate the final output.

Learning Algorithm

The learning algorithm in Madaline networks follows a similar principle as Adaline but with
some modifications. The weights of each Adaline unit are updated using the Delta rule, and
the error is propagated backward through the layers using the backpropagation algorithm.
Backpropagation allows the network to adjust the weights in each layer based on the error
contribution of that layer, enabling the network to learn complex patterns.

Applications of Madaline

Madaline networks have showcased exceptional performance in tackling diverse classification


problems such as speech recognition, image recognition, and medical diagnosis. Their
proficiency in handling complex patterns and learning from extensive datasets makes them
an excellent choice for tasks that involve establishing intricate decision boundaries. By
excelling in these areas, Madaline networks play a significant role in driving advancements
across various fields, providing robust solutions for challenging classification scenarios.

3.b) What is learning in a Neural Network? Explain supervised and unsupervised learning in neural networks with neat diagrams.
Neural networks extract identifying features from data, lacking pre-programmed
understanding. Network components include neurons, connections, weights, biases,
propagation functions, and a learning rule. Neurons receive inputs, governed by
thresholds and activation functions. Connections involve weights and biases regulating
information transfer. Learning, adjusting weights and biases, occurs in three stages: input
computation, output generation, and iterative refinement enhancing the network’s
proficiency in diverse tasks.
These include:
1. The neural network is simulated by a new environment.
2. Then the free parameters of the neural network are changed as a result of this
simulation.
3. The neural network then responds in a new way to the environment because of the
changes in its free parameters.

Importance of Neural Networks


The ability of neural networks to identify patterns, solve intricate puzzles, and adjust to
changing surroundings is essential. Their capacity to learn from data has far-reaching
effects, ranging from revolutionizing technology like natural language processing and self-
driving automobiles to automating decision-making processes and increasing efficiency in
numerous industries. The development of artificial intelligence is largely dependent on
neural networks, which also drive innovation and influence the direction of technology.
How does Neural Networks work?
Let’s understand with an example of how a neural network works:
Consider a neural network for email classification. The input layer takes features like
email content, sender information, and subject. These inputs, multiplied by adjusted
weights, pass through hidden layers. The network, through training, learns to recognize
patterns indicating whether an email is spam or not. The output layer, with a binary
activation function, predicts whether the email is spam (1) or not (0). As the network
iteratively refines its weights through backpropagation, it becomes adept at distinguishing
between spam and legitimate emails, showcasing the practicality of neural networks in
real-world applications like email filtering.
Working of a Neural Network
Neural networks are complex systems that mimic some features of the functioning of the
human brain. It is composed of an input layer, one or more hidden layers, and an output
layer made up of layers of artificial neurons that are coupled. The two stages of the basic
process are called backpropagation and forward propagation.

Forward Propagation
 Input Layer: Each feature in the input layer is represented by a node on the network,
which receives input data.
 Weights and Connections: The weight of each neuronal connection indicates how
strong the connection is. Throughout training, these weights are changed.
 Hidden Layers: Each hidden layer neuron processes inputs by multiplying them by
weights, adding them up, and then passing them through an activation function. By
doing this, non-linearity is introduced, enabling the network to recognize intricate
patterns.
 Output: The final result is produced by repeating the process until the output layer is
reached.
Backpropagation
 Loss Calculation: The network’s output is evaluated against the real goal values, and
a loss function is used to compute the difference. For a regression problem, the Mean
Squared Error (MSE) is commonly used as the cost function.
Supervised Machine Learning



A machine is said to be learning from past Experiences(data feed-in) with


respect to some class of tasks if its Performance in a given Task improves
with the Experience. For example, assume that a machine has to predict
whether a customer will buy a specific product let’s say “Antivirus” this year or
not. The machine will do it by looking at the previous knowledge/past
experiences i.e. the data of products that the customer had bought every year
and if he buys an Antivirus every year, then there is a high probability that the
customer is going to buy an antivirus this year as well. This is how machine
learning works at the basic conceptual level.

Supervised Machine Learning


Supervised learning is a machine learning technique that is widely used in
various fields such as finance, healthcare, marketing, and more. It is a form of
machine learning in which the algorithm is trained on labeled data to make
predictions or decisions based on the data inputs.In supervised learning, the
algorithm learns a mapping between the input and output data. This mapping is
learned from a labeled dataset, which consists of pairs of input and output data.
The algorithm tries to learn the relationship between the input and output data
so that it can make accurate predictions on new, unseen data.
Let us discuss below what learning means for a machine:
Supervised learning is where the model is trained on a labelled dataset.
A labelled dataset is one that has both input and output parameters. In this type of
learning both training and validation, datasets are labelled as shown in the figures below.
The labeled dataset used in supervised learning consists of input features and
corresponding output labels. The input features are the attributes or characteristics of the
data that are used to make predictions, while the output labels are the desired outcomes
or targets that the algorithm tries to predict.

Both the above figures have labelled data set as follows:


 Figure A: It is a dataset of a shopping store that is useful in predicting whether a
customer will purchase a particular product under consideration or not based on his/
her gender, age, and salary.
Input: Gender, Age, Salary
Output: Purchased i.e. 0 or 1; 1 means yes the customer will purchase and 0 means
that the customer won’t purchase it.
 Figure B: It is a Meteorological dataset that serves the purpose of predicting wind
speed based on different parameters.
Input: Dew Point, Temperature, Pressure, Relative Humidity, Wind Direction
Output: Wind Speed
Training the system: While training the model, data is usually split in the ratio of 80:20,
i.e. 80% as training data and the rest as testing data. For the training data, we feed both the input
and the output for that 80% of the data. The model learns from the training data only. Different
machine learning algorithms are used to build the model. Learning means that the model builds
some logic of its own.
Once the model is ready, it can be tested. At the time of testing, the input is fed
from the remaining 20% of the data that the model has never seen before; the model
predicts some value, and we compare it with the actual output to calculate the
accuracy. A minimal sketch of this split follows below.
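
A minimal sketch of the 80:20 split and evaluation described above (assuming scikit-learn and a toy synthetic dataset):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.random.default_rng(0).normal(size=(100, 3))   # 100 samples, 3 input features
y = (X[:, 0] > 0).astype(int)                        # toy labels derived from the first feature

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression().fit(X_train, y_train)   # learn from the 80% training portion
y_pred = model.predict(X_test)                       # predict on the unseen 20%
print("accuracy:", accuracy_score(y_test, y_pred))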
Types of Supervised Learning Algorithm
Supervised learning is typically divided into two main categories: regression and
classification. In regression, the algorithm learns to predict a continuous output value,
such as the price of a house or the temperature of a city. In classification, the algorithm
learns to predict a categorical output variable or class label, such as whether a customer
is likely to purchase a product or not.
One of the primary advantages of supervised learning is that it allows for the creation of
complex models that can make accurate predictions on new data. However, supervised
learning requires large amounts of labeled training data to be effective. Additionally, the
quality and representativeness of the training data can have a significant impact on the
accuracy of the model.
Supervised learning can be further classified into two categories:

Regression
Regression is a supervised learning technique used to predict continuous numerical
values based on input features. It aims to establish a functional relationship between
independent variables and a dependent variable, such as predicting house prices based
on features like size, bedrooms, and location.
The goal is to minimize the difference between predicted and actual values using
algorithms like Linear Regression, Decision Trees, or Neural Networks, ensuring the
model captures underlying patterns in the data.
Classification
Classification is a type of supervised learning that categorizes input data into predefined
labels. It involves training a model on labeled examples to learn patterns between input
features and output classes. In classification, the target variable is a categorical value.
For example, classifying emails as spam or not.
The model’s goal is to generalize this learning to make accurate predictions on new,
unseen data. Algorithms like Decision Trees, Support Vector Machines, and Neural
Networks are commonly used for classification tasks.
NOTE: There are common Supervised Machine Learning Algorithm that can be used for
both regression and classification task.
Supervised Machine Learning Algorithm
Supervised learning can be further divided into several different types, each with its own
unique characteristics and applications. Here are some of the most common types of
supervised learning algorithms:
 Linear Regression: Linear regression is a type of regression algorithm that is used to
predict a continuous output value. It is one of the simplest and most widely used
algorithms in supervised learning. In linear regression, the algorithm tries to find a
linear relationship between the input features and the output value. The output value is
predicted based on the weighted sum of the input features.
 Logistic Regression : Logistic regression is a type of classification algorithm that is
used to predict a binary output variable. It is commonly used in machine learning
applications where the output variable is either true or false, such as in fraud detection
or spam filtering. In logistic regression, the algorithm tries to find a linear relationship
between the input features and the output variable. The output variable is then
transformed using a logistic function to produce a probability value between 0 and 1.
 Decision Trees: Decision tree is a tree-like structure that is used to model decisions
and their possible consequences. Each internal node in the tree represents a decision,
while each leaf node represents a possible outcome. Decision trees can be used to
model complex relationships between input features and output variables.
A decision tree is a type of algorithm that is used for both classification and regression
tasks.
o Decision Trees Regression: Decision Trees can be utilized for regression
tasks by predicting the value linked with a leaf node.
o Decision Trees Classification: Random Forest is a machine learning
algorithm that uses multiple decision trees to improve classification and
prevent overfitting.
 Random Forests: Random forests are made up of multiple decision trees that work
together to make predictions. Each tree in the forest is trained on a different subset of
the input features and data. The final prediction is made by aggregating the predictions
of all the trees in the forest.
Random forests are an ensemble learning technique that is used for both classification
and regression tasks.
o Random Forest Regression: It combines multiple decision trees to reduce
overfitting and improve prediction accuracy.
o Random Forest Classifier: Combines several decision trees to improve the
accuracy of classification while minimizing overfitting.
 Support Vector Machine(SVM) : The SVM algorithm creates a hyperplane to
segregate n-dimensional space into classes and identify the correct category of new
data points. The extreme cases that help create the hyperplane are called support
vectors, hence the name Support Vector Machine.
A Support Vector Machine is a type of algorithm that is used for both classification and
regression tasks
o Support Vector Regression: It is a extension of Support Vector Machines
(SVM) used for predicting continuous values.
o Support Vector Classifier: It aims to find the best hyperplane that
maximizes the margin between data points of different classes.
 K-Nearest Neighbors (KNN): KNN works by finding k training examples closest to a
given input and then predicts the class or value based on the majority class or average
value of these neighbors. The performance of KNN can be influenced by the choice of
k and the distance metric used to measure proximity. However, it is intuitive but can be
sensitive to noisy data and requires careful selection of k for optimal results.
A K-Nearest Neighbors (KNN) is a type of algorithm that is used for both classification
and regression tasks.
o K-Nearest Neighbors Regression: It predicts continuous values by
averaging the outputs of the k closest neighbors.
o K-Nearest Neighbors Classification: Data points are classified based on
the majority class of their k closest neighbors.
Supervised and Unsupervised learning



Machine learning is a field of computer science that gives computers the ability
to learn without being explicitly programmed. Supervised learning and
unsupervised learning are two main types of machine learning.
In supervised learning, the machine is trained on a set of labeled data, which
means that the input data is paired with the desired output. The machine then
learns to predict the output for new input data. Supervised learning is often
used for tasks such as classification, regression, and object detection.
In unsupervised learning, the machine is trained on a set of unlabeled data,
which means that the input data is not paired with the desired output. The
machine then learns to find patterns and relationships in the data.
Unsupervised learning is often used for tasks such as clustering, dimensionality
reduction, and anomaly detection.
What is Supervised learning?
Supervised learning is a type of machine learning algorithm that learns from
labeled data. Labeled data is data that has been tagged with a correct answer
or classification.
Supervised learning, as the name indicates, has the presence of a supervisor as
a teacher. Supervised learning is when we teach or train the machine using
data that is well-labelled. Which means some data is already tagged with the
correct answer. After that, the machine is provided with a new set of
examples(data) so that the supervised learning algorithm analyses the training
data(set of training examples) and produces a correct outcome from labeled
data.
For example, a labeled dataset of images of Elephant, Camel and Cow would
have each image tagged with either “Elephant”, “Camel”, or “Cow”.

Key Points:
 Supervised learning involves training a machine from labeled data.
 Labeled data consists of examples with the correct answer or classification.
 The machine learns the relationship between the inputs (e.g. images) and the outputs (their
labels).
 The trained machine can then make predictions on new, unlabeled data.
3. Learning with Reinforcement Learning
Through interaction with the environment and feedback in the form of rewards or
penalties, the network gains knowledge. Finding a policy or strategy that optimizes
cumulative rewards over time is the goal for the network. This kind is frequently utilized in
gaming and decision-making applications.

4.a)Explain the adaptive linear neuron architecture with neat sketch.


Adaline, which stands for Adaptive Linear Neuron, is a network having a single linear unit. It was
developed by Widrow and Hoff in 1960. Some important points about Adaline are as follows −

 It uses a bipolar activation function.


 The Adaline neuron can be trained using the delta rule, also known as the Least Mean Square (LMS)
rule or Widrow-Hoff rule.
 The net input is compared with the target value to compute the error signal.
 The weights are adjusted on the basis of the adaptive training algorithm.
The basic structure of Adaline is similar to the perceptron, with an extra feedback loop with the help of
which the actual output is compared with the desired/target output. After comparison, on the basis of the
training algorithm, the weights and bias will be updated.

Adaptive Linear Neuron Learning algorithm


Step 0 − Initialize the weights and the bias to some small random values (not zero); also initialize
the learning rate α.
Step 1 − Perform steps 2-7 while the stopping condition is false.
Step 2 − Perform steps 3-5 for each bipolar training pair s:t.
Step 3 − Activate each input unit as follows −
x_i = s_i (i = 1 to n)
Step 4 − Obtain the net input with the following relation −
y_in = Σ (i = 1 to n) x_i·w_i + b
Here 'b' is the bias and 'n' is the total number of input neurons.
Step 5 − Adjust the weights and bias as follows until the least mean square error (t - y_in) is obtained −

w_i(new) = w_i(old) + α(t - y_in)x_i

b(new) = b(old) + α(t - y_in)

Step 6 − Calculate the error using E = (t - y_in)²

Step 7 − Test for the stopping condition: if the error generated is less than or equal to the specified
tolerance, then stop. A minimal sketch of this training loop follows below.
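
A minimal NumPy sketch of this training loop (assuming bipolar inputs and targets for a small AND-like problem; the values and names are illustrative):

import numpy as np

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)   # bipolar inputs
t = np.array([1, -1, -1, -1], dtype=float)                        # bipolar targets (AND function)

rng = np.random.default_rng(0)
w = rng.uniform(-0.1, 0.1, size=2)    # Step 0: small random weights (not zero)
b = rng.uniform(-0.1, 0.1)
alpha = 0.1                           # learning rate

# In practice training stops when the error falls below a tolerance or stops decreasing;
# a fixed number of epochs is used here for simplicity.
for epoch in range(20):
    for xi, ti in zip(X, t):
        y_in = xi @ w + b                 # Step 4: net input
        w += alpha * (ti - y_in) * xi     # Step 5: delta-rule (Widrow-Hoff) weight update
        b += alpha * (ti - y_in)          # Step 5: bias update

pred = np.where(X @ w + b >= 0, 1, -1)    # classify by the sign of the net input
print("weights:", np.round(w, 2), "bias:", round(b, 2))
print("targets:", t.astype(int), "predictions:", pred)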

UNIT 3
Multilayer Perceptron

Multi-Layer Perceptron Learning in Tensorflow


Multi-layer Perceptron
A multi-layer perceptron is also known as an MLP. It consists of fully connected dense layers, which
transform any input dimension to the desired dimension. A multi-layer perceptron is a
neural network that has multiple layers. To create a neural network, we combine neurons
together so that the outputs of some neurons are inputs of other neurons.
A multi-layer perceptron has one input layer and for each input, there is one neuron(or
node), it has one output layer with a single node for each output and it can have any
number of hidden layers and each hidden layer can have any number of nodes. A
schematic diagram of a Multi-Layer Perceptron (MLP) is depicted below.

Stepwise Implementation
Step 1: Import the necessary libraries.
Step 2: Download the dataset.

Step 3: Now we will convert the pixels into floating-point values.


Step 4: Understand the structure of the dataset
Step 5: Visualize the data.
Step 6: Form the Input, hidden, and output layers.
Step 7: Compile the model.
Step 8: Fit the model.
Step 9: Find Accuracy of the model.
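Putting these steps together, a condensed sketch is shown below; it assumes TensorFlow/Keras and its built-in MNIST dataset, and the layer sizes and epoch count are illustrative choices rather than prescribed values.

import tensorflow as tf

# Steps 1-2: import libraries and download the dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Step 3: convert the pixels into floating-point values in [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Step 6: form the input, hidden, and output layers
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # input layer
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # output layer
])

# Step 7: compile the model
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Step 8: fit the model
model.fit(x_train, y_train, epochs=5, batch_size=32)

# Step 9: find the accuracy of the model on unseen data
loss, accuracy = model.evaluate(x_test, y_test)
print("Test accuracy:", accuracy)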
5.a) What are similarities and differences between RBF and MLPs network?

 MLP is the classical type of neural network. It consists of one or several hidden layers (depending
on the abstraction required in deep learning). It performs a dot product between the input and the
weights and applies a monotonic activation function such as sigmoid or ReLU. In an MLP, the fine-
tuning of the weights (the training) is usually done through backpropagation for all layers.
 RBF is a neural network consisting of just one hidden layer. For each neuron in the hidden
layer, the network first computes the distance between the input and that neuron's weights, which can be
viewed as centers, and then an activation function, usually a Gaussian function, is
applied to the calculated radial distance. This is why it is called a "Radial Basis Function
Network". Since the Gaussian function has its center at zero, RBF neurons have maximum activation
when the input is equal to the weights (centers), i.e. when the distance is zero. The training
of an RBF neural network can be done either through backpropagation or through RBF hybrid
learning. Also, the RBF network typically has a faster learning speed than an MLP and is less
sensitive to the order of presentation of the training data.

5.b) Explain the following:


1. Radial basis function regularization theory
2. Ill-posed problems
1. Radial basis function regularization theory:

Radial Basis Function networks are a special class of feed-forward neural networks consisting of three layers: an input
layer, a hidden layer, and an output layer. This is fundamentally different from most neural network
architectures, which are composed of many layers and bring about nonlinearity by repeatedly applying non-
linear activation functions. The input layer receives the input data and passes it into the hidden layer, where the
computation occurs. The hidden layer of an RBF network is the most powerful part and is very
different from that of most neural networks. The output layer is designated for prediction tasks such as classification or
regression.

Input Layer

The input layer consists of one neuron for every predictor variable. The input neurons pass the value
to each neuron in the hidden layer. N-1 neurons are used for categorical values, where N denotes the
number of categories. The range of values is standardized by subtracting the median and dividing by
the interquartile range.
Hidden Layer

The hidden layer contains a variable number of neurons (the ideal number determined by the training
process). Each neuron comprises a radial basis function centered on a point. The number of
dimensions coincides with the number of predictor variables. The radius or spread of the RBF function
may vary for each dimension.

When an x vector of input values is fed from the input layer, a hidden neuron calculates the Euclidean
distance between the test case and the neuron's center point. It then applies the kernel function using
the spread values. The resulting value gets fed into the summation layer.

Output Layer or Summation Layer

The value obtained from the hidden layer is multiplied by a weight related to the neuron and passed to
the summation. Here the weighted values are added up, and the sum is presented as the network's
output. Classification problems have one output per target category, the value being the probability
that the case evaluated has that category.
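A minimal numpy sketch of this forward pass is shown below; the centres, spread and output weights are illustrative values, whereas in practice they are chosen or learned during training.

import numpy as np

centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])  # hidden-neuron centres
spread = 0.75                                              # radius/spread of each RBF
weights = np.array([0.4, -0.2, 0.9])                       # hidden-to-output weights

def rbf_forward(x):
    # Euclidean distance from the input vector to each centre
    distances = np.linalg.norm(centers - x, axis=1)
    # Gaussian kernel applied to the radial distance
    activations = np.exp(-(distances ** 2) / (2 * spread ** 2))
    # Summation layer: weighted sum of the hidden activations
    return activations @ weights

print(rbf_forward(np.array([0.5, 0.5])))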

Advantages of RBFN

 Easy Design

 Good Generalization

 Faster Training

 Only one hidden layer

 A straightforward interpretation of the meaning or function of each node in the hidden layer

Unit 4
Fuzzy Logic Systems

Fuzzy Logic | Introduction




The term fuzzy refers to things that are not clear or are vague. In the real world
we often encounter situations in which we cannot determine whether a
state is true or false; there, fuzzy logic provides very valuable flexibility for
reasoning. In this way, we can account for the inaccuracies and uncertainties of
any situation.
Fuzzy Logic is a form of many-valued logic in which the truth values of variables
may be any real number between 0 and 1, instead of just the traditional values
of true or false. It is used to deal with imprecise or uncertain information and is
a mathematical method for representing vagueness and uncertainty in
decision-making.
Fuzzy Logic is based on the idea that in many cases, the concept of true or false
is too restrictive, and that there are many shades of gray in between. It allows
for partial truths, where a statement can be partially true or false, rather than
fully true or false.
Fuzzy Logic is used in a wide range of applications, such as control systems,
image processing, natural language processing, medical diagnosis, and artificial
intelligence.
The fundamental concept of Fuzzy Logic is the membership function, which
defines the degree of membership of an input value to a certain set or
category. The membership function is a mapping from an input value to a
membership degree between 0 and 1, where 0 represents non-membership and
1 represents full membership.
Fuzzy Logic is implemented using Fuzzy Rules, which are if-then statements
that express the relationship between input variables and output variables in a
fuzzy way. The output of a Fuzzy Logic system is a fuzzy set, which is a set of
membership degrees for each possible output value.
In summary, Fuzzy Logic is a mathematical method for representing vagueness
and uncertainty in decision-making, it allows for partial truths, and it is used in
a wide range of applications. It is based on the concept of membership function
and the implementation is done using Fuzzy rules.
In the Boolean system, the truth value 1.0 represents absolute truth and 0.0
represents absolute falsehood. In the fuzzy system, truth is not restricted to
these absolute values: fuzzy logic also admits intermediate values, which are
partially true and partially false.
ARCHITECTURE
Its Architecture contains four parts :
 RULE BASE: It contains the set of rules and the IF-THEN conditions provided
by the experts to govern the decision-making system, on the basis of
linguistic information. Recent developments in fuzzy theory offer several
effective methods for the design and tuning of fuzzy controllers. Most of
these developments reduce the number of fuzzy rules.
 FUZZIFICATION: It is used to convert inputs, i.e. crisp numbers, into fuzzy sets.
Crisp inputs are basically the exact inputs measured by sensors and passed
into the control system for processing, such as temperature, pressure, rpm,
etc.
 INFERENCE ENGINE: It determines the matching degree of the current fuzzy
input with respect to each rule and decides which rules are to be fired
according to the input field. Next, the fired rules are combined to form the
control actions.
 DEFUZZIFICATION: It is used to convert the fuzzy sets obtained by the
inference engine into a crisp value. There are several defuzzification methods
available and the best-suited one is used with a specific expert system to
reduce the error.
Membership function
Definition: A graph that defines how each point in the input space is mapped
to a membership value between 0 and 1. The input space is often referred to as the
universe of discourse or universal set (U), which contains all the possible
elements of concern in each particular application.
There are largely three types of fuzzifiers:
 Singleton fuzzifier
 Gaussian fuzzifier
 Trapezoidal or triangular fuzzifier
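The three fuzzifier shapes listed above can be sketched with plain numpy; the breakpoints and the Gaussian centre and width used here are arbitrary illustrative values.

import numpy as np

def singleton(x, a):
    # Membership 1 only at the single point a, 0 elsewhere
    return np.where(np.isclose(x, a), 1.0, 0.0)

def gaussian(x, c, sigma):
    # Bell-shaped membership centred at c with spread sigma
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def triangular(x, a, b, c):
    # Rises from a to a peak at b, then falls back to zero at c
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

u = np.linspace(0, 10, 11)             # universe of discourse
print(triangular(u, 2, 5, 8))          # degrees of membership in [0, 1]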
What is Fuzzy Control?
 It is a technique to embody human-like thinking into a control system.
 It may not be designed to give accurate reasoning but it is designed to give
acceptable reasoning.
 It can emulate human deductive thinking, that is, the process people use to
infer conclusions from what they know.
 Any uncertainties can be easily dealt with using fuzzy logic.
Advantages of Fuzzy Logic System
 This system can work with any type of inputs whether it is imprecise,
distorted or noisy input information.
 The construction of Fuzzy Logic Systems is easy and understandable.
 Fuzzy logic comes with mathematical concepts of set theory and the
reasoning of that is quite simple.
 It provides a very efficient solution to complex problems in all fields of life as
it resembles human reasoning and decision-making.
 The algorithms can be described with little data, so little memory is required.
Disadvantages of Fuzzy Logic Systems
 Many researchers proposed different ways to solve a given problem through
fuzzy logic which leads to ambiguity. There is no systematic approach to
solve a given problem through fuzzy logic.
 Proof of its characteristics is difficult or impossible in most cases because
every time we do not get a mathematical description of our approach.
 Because fuzzy logic works on imprecise as well as precise data, accuracy is often
compromised.
Application
 It is used in the aerospace field for altitude control of spacecraft and
satellites.
 It has been used in the automotive system for speed control, traffic control.
 It is used for decision-making support systems and personal evaluation in the
large company business.
 It has application in the chemical industry for controlling the pH, drying,
chemical distillation process.
 Fuzzy logic is used in Natural language processing and various intensive
applications in Artificial Intelligence.
 Fuzzy logic is extensively used in modern control systems such as expert
systems.
 Fuzzy Logic is used with Neural Networks as it mimics how a person would
make decisions, only much faster. It is done by Aggregation of data and
changing it into more meaningful data by forming partial truths as Fuzzy
sets.

Difference between Neural Network And Fuzzy Logic




Neural Network:
A neural network is an information processing system that is inspired by the way
biological nervous systems, such as the brain, process information. A neural network
is composed of a large number of interconnected processing elements known
as neurons which are used to solve problems. A neural network is an attempt to
make a computer model of the human brain and neural networks are parallel
computing devices. The simple diagram of the neural network is as shown
below:
Fuzzy Logic:
The term fuzzy represents things which are not clear. In the real world
we often find situations where we cannot determine whether a state is
true or false; there, fuzzy logic provides very valuable flexibility for reasoning. In
this way, we can consider the inaccuracies and uncertainties of any situation.
The simple diagram of fuzzy logic is as shown below:

Difference between Neural Network And Fuzzy Logic


Neural Network | Fuzzy Logic

The system cannot be easily modified. | The system can be easily modified.
It trains itself by learning from data sets. | Everything must be defined explicitly.
It is more complex than fuzzy logic. | It is simpler than a neural network.
It helps to perform predictions. | It helps to perform pattern recognition.
It is difficult to extract knowledge from it. | Knowledge can be easily extracted from it.
It is based on learning. | It is not based on learning.

Fuzzification: the process of converting crisp input values into fuzzy sets by means of membership functions.

Defuzzification: the process of converting the fuzzy set produced by the inference engine back into a single crisp value.
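A minimal end-to-end sketch of fuzzification, a simplified inference step and centroid defuzzification is shown below for a hypothetical temperature-to-fan-speed controller; the membership functions, rule structure and value ranges are illustrative assumptions, not part of the material above.

import numpy as np

def triangular(x, a, b, c):
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

universe = np.linspace(0, 100, 1001)        # crisp output range (fan speed, %)

# Fuzzification: a crisp temperature reading becomes degrees of membership
temperature = 32.0
cold = triangular(np.array([temperature]), 0, 15, 30)[0]
warm = triangular(np.array([temperature]), 20, 35, 50)[0]
hot  = triangular(np.array([temperature]), 40, 70, 100)[0]

# Inference (simplified): each rule clips its output fuzzy set by its firing strength
low_speed  = np.minimum(triangular(universe, 0, 20, 40), cold)
mid_speed  = np.minimum(triangular(universe, 30, 50, 70), warm)
high_speed = np.minimum(triangular(universe, 60, 80, 100), hot)
aggregated = np.maximum.reduce([low_speed, mid_speed, high_speed])

# Defuzzification: centroid (centre of gravity) of the aggregated fuzzy set
crisp_output = np.sum(universe * aggregated) / np.sum(aggregated)
print("Fan speed:", round(crisp_output, 1), "%")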
Unit 5
Support Vector Machines

Support Vector Machine (SVM) Algorithm





Support Vector Machine (SVM) is a powerful machine learning algorithm used


for linear or nonlinear classification, regression, and even outlier detection
tasks. SVMs can be used for a variety of tasks, such as text classification, image
classification, spam detection, handwriting identification, gene expression
analysis, face detection, and anomaly detection. SVMs are adaptable and
efficient in a variety of applications because they can manage high-dimensional
data and nonlinear relationships.
SVM algorithms are very effective as we try to find the maximum separating
hyperplane between the different classes available in the target feature.
Support Vector Machine
Support Vector Machine (SVM) is a supervised machine learning algorithm used
for both classification and regression. Though we say regression problems as
well it’s best suited for classification. The main objective of the SVM algorithm is
to find the optimal hyperplane in an N-dimensional space that can separate the
data points of different classes in the feature space. The hyperplane is chosen so
that the margin between the closest points of different classes is as large as
possible. The dimension of the hyperplane depends upon the
number of features. If the number of input features is two, then the hyperplane
is just a line. If the number of input features is three, then the hyperplane
becomes a 2-D plane. It becomes difficult to imagine when the number of
features exceeds three.
Let’s consider two independent variables x1, x2, and one dependent variable
which is either a blue circle or a red circle.

Linearly Separable Data points

From the figure above it’s very clear that there are multiple lines (our
hyperplane here is a line because we are considering only two input features x1,
x2) that segregate our data points or do a classification between red and blue
circles. So how do we choose the best line or in general the best hyperplane
that segregates our data points?
How does SVM work?
One reasonable choice as the best hyperplane is the one that represents the
largest separation or margin between the two classes.
Multiple hyperplanes separate the data from two classes

So we choose the hyperplane whose distance from it to the nearest data point
on each side is maximized. If such a hyperplane exists it is known as
the maximum-margin hyperplane/hard margin. So from the above figure,
we choose L2. Let’s consider a scenario like shown below

Selecting hyperplane for data with outlier

Here we have one blue ball in the boundary of the red ball. So how does SVM
classify the data? It’s simple! The blue ball in the boundary of red ones is an
outlier of the blue balls. The SVM algorithm has the characteristic of ignoring such
outliers and finding the best hyperplane that maximizes the margin. SVM is robust
to outliers.

Hyperplane which is the most optimized one

So in this type of data set, what SVM does is find the maximum margin as
done with the previous data sets, and along with that it adds a penalty each time a point
crosses the margin. The margins in these types of cases are called soft
margins. When there is a soft margin to the data set, the SVM tries to
minimize (1/margin + λ(Σ penalty)). Hinge loss is a commonly used penalty: if there are
no violations there is no hinge loss, and if there are violations the hinge loss is
proportional to the distance of the violation.
Till now, we were talking about linearly separable data(the group of blue balls
and red balls are separable by a straight line/linear line). What to do if data are
not linearly separable?

Original 1D dataset for classification


Say, our data is as shown in the figure above. SVM solves this by creating a new
variable using a kernel. We take a point xi on the line and create a new
variable yi as a function of its distance from the origin o. If we plot this, we get
something like what is shown below

Mapping 1D data to 2D to become able to separate the two classes

In this case, the new variable y is created as a function of distance from the
origin. A non-linear function that creates a new variable is referred to as a
kernel.
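A toy numerical illustration of this idea, with made-up data values, is sketched below: 1-D points that no single threshold can separate become separable once a new variable y = x² (a function of the distance from the origin) is added.

import numpy as np

x = np.array([-4, -3, -2, -1, 0, 1, 2, 3, 4], dtype=float)
labels = np.where(np.abs(x) <= 1, 0, 1)   # inner points vs outer points

# No single threshold on x separates the two classes, but in the (x, y) plane
# with y = x**2 a horizontal line (around y = 2) does.
y = x ** 2
for xi, yi, li in zip(x, y, labels):
    print(f"x={xi:+.0f}  y=x^2={yi:4.0f}  class={li}")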
Support Vector Machine Terminology
1. Hyperplane: Hyperplane is the decision boundary that is used to separate
the data points of different classes in a feature space. In the case of linear
classifications, it will be a linear equation i.e. wx+b = 0.
2. Support Vectors: Support vectors are the data points closest to the
hyperplane; they play a critical role in deciding the hyperplane and the
margin.
3. Margin: Margin is the distance between the support vector and hyperplane.
The main objective of the support vector machine algorithm is to maximize
the margin. The wider margin indicates better classification performance.
4. Kernel: Kernel is the mathematical function, which is used in SVM to map
the original input data points into high-dimensional feature spaces, so that
the hyperplane can be easily found even if the data points are not
linearly separable in the original input space. Some of the common kernel
functions are linear, polynomial, radial basis function(RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane or the hard margin
hyperplane is a hyperplane that properly separates the data points of
different categories without any misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers,
SVM permits a soft margin technique. Each data point has a slack variable
introduced by the soft-margin SVM formulation, which softens the strict
margin requirement and permits certain misclassifications or violations. It
discovers a compromise between increasing the margin and reducing
violations.
7. C: Margin maximisation and misclassification fines are balanced by the
regularisation parameter C in SVM. The penalty for going over the margin or
misclassifying data items is decided by it. A stricter penalty is imposed with a
greater value of C, which results in a smaller margin and perhaps fewer
misclassifications.
8. Hinge Loss: A typical loss function in SVMs is hinge loss. It punishes
incorrect classifications or margin violations. The objective function in SVM is
frequently formed by combining it with the regularisation term.
9. Dual Problem: SVM can also be solved via the dual of the optimisation
problem, which requires finding the Lagrange multipliers associated with the
support vectors. The dual formulation enables the use of kernel tricks and more
efficient computation.
Mathematical intuition of Support Vector Machine
Consider a binary classification problem with two classes, labeled as +1 and -1.
We have a training dataset consisting of input feature vectors X and their
corresponding class labels Y.

The equation for the linear hyperplane can be written as:
w^T x + b = 0

The vector W represents the normal vector to the hyperplane. i.e the direction
perpendicular to the hyperplane. The parameter b in the equation represents
the offset or distance of the hyperplane from the origin along the normal
vector w.
The distance between a data point x_i and the decision boundary can be calculated as:
d_i = (w^T x_i + b) / ||w||
where ||w|| represents the Euclidean norm of the weight vector w (the normal vector to the hyperplane).

For the linear SVM classifier, the prediction ŷ is:
ŷ = 1 if w^T x + b ≥ 0
ŷ = 0 if w^T x + b < 0

Optimization:

 For the hard-margin linear SVM classifier:
minimize over w, b:  (1/2) w^T w = (1/2) ||w||^2
subject to:  t_i (w^T x_i + b) ≥ 1  for i = 1, 2, 3, …, m
The target variable or label for the ith training instance is denoted by the symbol
t_i. Here t_i = −1 for negative instances (when y_i = 0) and t_i = +1 for positive
instances (when y_i = 1), because we require a decision boundary that satisfies the
constraint t_i (w^T x_i + b) ≥ 1.

 For the soft-margin linear SVM classifier:
minimize over w, b:  (1/2) w^T w + C Σ_{i=1..m} ζ_i
subject to:  t_i (w^T x_i + b) ≥ 1 − ζ_i  and  ζ_i ≥ 0  for i = 1, 2, 3, …, m


 Dual Problem: SVM can also be solved via the dual of the optimisation problem,
which requires finding the Lagrange multipliers associated with the support
vectors. The optimal Lagrange multipliers α_i maximize the following dual objective function:
maximize over α:  Σ_{i=1..m} α_i − (1/2) Σ_{i=1..m} Σ_{j=1..m} α_i α_j t_i t_j K(x_i, x_j)
where,
 αi is the Lagrange multiplier associated with the ith training sample.
 K(xi, xj) is the kernel function that computes the similarity between two
samples xi and xj. It allows SVM to handle nonlinear classification problems
by implicitly mapping the samples into a higher-dimensional feature space.
 The term ∑αi represents the sum of all Lagrange multipliers.
Once the dual problem has been solved and the optimal Lagrange multipliers have
been found, the SVM decision boundary can be described in terms of these
multipliers and the support vectors. The training samples with α_i > 0 are the
support vectors, and the decision function is given by:
f(x) = Σ_{i=1..m} α_i t_i K(x_i, x) + b
The bias b can be recovered from any support vector x_i, since t_i (w^T x_i + b) = 1 implies b = t_i − w^T x_i.

Types of Support Vector Machine


Based on the nature of the decision boundary, Support Vector Machines (SVM)
can be divided into two main parts:
 Linear SVM: Linear SVMs use a linear decision boundary to separate the
data points of different classes. When the data can be precisely linearly
separated, linear SVMs are very suitable. This means that a single straight
line (in 2D) or a hyperplane (in higher dimensions) can entirely divide the
data points into their respective classes. A hyperplane that maximizes the
margin between the classes is the decision boundary.
 Non-Linear SVM: Non-Linear SVM can be used to classify data when it
cannot be separated into two classes by a straight line (in the case of 2D). By
using kernel functions, nonlinear SVMs can handle nonlinearly separable
data. The original input data is transformed by these kernel functions into a
higher-dimensional feature space, where the data points can be linearly
separated. A linear SVM is used to locate a nonlinear decision boundary in
this modified space.
Popular kernel functions in SVM
The SVM kernel is a function that takes low-dimensional input space and
transforms it into a higher-dimensional space, i.e. it converts nonseparable
problems into separable problems. It is mostly useful in non-linear separation
problems. Simply put, the kernel does some extremely complex data
transformations and then finds out the process to separate the data based on
the labels or outputs defined. Some popular kernel functions are:
Linear: K(x_i, x_j) = x_i^T x_j + b
Polynomial: K(x_i, x_j) = (γ x_i^T x_j + b)^N
Gaussian RBF: K(x_i, x_j) = exp(−γ ||x_i − x_j||²)
Sigmoid: K(x_i, x_j) = tanh(α x_i^T x_j + b)
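These kernels can be written directly as functions of two sample vectors; the snippet below is a small numpy sketch, with illustrative hyperparameter values (gamma, b, N and alpha are assumptions for the example).

import numpy as np

def linear(xi, xj, b=0.0):
    return xi @ xj + b

def polynomial(xi, xj, gamma=1.0, b=1.0, N=3):
    return (gamma * (xi @ xj) + b) ** N

def gaussian_rbf(xi, xj, gamma=0.5):
    return np.exp(-gamma * np.linalg.norm(xi - xj) ** 2)

def sigmoid(xi, xj, alpha=0.1, b=0.0):
    return np.tanh(alpha * (xi @ xj) + b)

x1, x2 = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear(x1, x2), polynomial(x1, x2), gaussian_rbf(x1, x2), sigmoid(x1, x2))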
Advantages of SVM
 Effective in high-dimensional cases.
 Its memory is efficient as it uses a subset of training points in the decision
function called support vectors.
 Different kernel functions can be specified for the decision function, and it is
possible to specify custom kernels.
SVM implementation in Python
Predict whether a cancer is benign or malignant. Using historical data about patients
diagnosed with cancer enables doctors to differentiate malignant cases from
benign ones, given the independent attributes.
Steps
 Load the breast cancer dataset from sklearn.datasets
 Separate input features and target variables.
 Build and train the SVM classifiers using RBF kernel.
 Plot the scatter plot of the input features.
 Plot the decision boundary.
Python
# Load the important packages
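# (A minimal illustrative sketch of the steps listed above; it uses only the
#  first two features so the decision boundary can be drawn in 2D. Parameter
#  values such as gamma and C are illustrative assumptions.)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

# Load the dataset and separate input features and the target variable
cancer = load_breast_cancer()
X = cancer.data[:, :2]          # first two features only, for plotting
y = cancer.target               # 0 = malignant, 1 = benign

# Build and train the SVM classifier using the RBF kernel
clf = SVC(kernel="rbf", gamma=0.5, C=1.0)
clf.fit(X, y)

# Scatter plot of the input features
plt.scatter(X[:, 0], X[:, 1], c=y, s=20, cmap="coolwarm", edgecolors="k")

# Plot the decision boundary on a grid covering the feature space
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300),
)
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.2, cmap="coolwarm")
plt.xlabel(cancer.feature_names[0])
plt.ylabel(cancer.feature_names[1])
plt.title("SVM (RBF kernel) decision boundary")
plt.show()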
