Softcomputing Course Material
The following are some of the reasons why soft computing is needed:
1. Complexity of real-world problems: Many real-world problems are complex and involve
uncertainty, vagueness, and imprecision. Traditional computing methods are not well-
suited to handle these complexities.
2. Incomplete information: In many cases, there is a lack of complete and accurate
information available to solve a problem. Soft computing techniques can provide
approximate solutions even in the absence of complete information.
3. Noise and uncertainty: Real-world data is often noisy and uncertain, and classical
methods can produce incorrect results when dealing with such data. Soft computing
techniques are designed to handle uncertainty and imprecision.
4. Non-linear problems: Many real-world problems are non-linear, and classical methods
are not well-suited to solve them. Soft computing techniques such as fuzzy logic and
neural networks can handle non-linear problems effectively.
5. Human-like reasoning: Soft computing techniques are designed to mimic human-like
reasoning, which is often more effective in solving complex problems.
Overall, soft computing provides an effective and efficient way to solve complex real-
world problems that are difficult or impossible to solve using classical computing methods.
In this article, we will cover the need for soft computing and why it is important. So, to
understand the need for soft computing let us first understand the concept of computing.
Concept of computing :
In computing, the input is called the antecedent and the output is called the consequent. Examples include adding a record to a database or computing the sum of two numbers using a C program.
There are two types of computing:
1. Hard computing
2. Soft computing
Characteristics of soft computing and hard computing :
In soft computing, you can consider examples such as the evolutionary changes of a specific species, the human nervous system, or the behavior of ants.
Many analytical models are valid only for ideal cases, whereas real-world problems exist in a non-ideal environment.
Soft computing provides insight into real-world problems and is not limited to theory.
Hard computing is best suited for solving mathematical problems that have precise answers.
Some important fields such as biology, medicine and the humanities are still intractable using conventional mathematical and analytical models.
It is possible to map the human mind with the help of soft computing, but it is not possible with conventional mathematical and analytical models.
Examples –
Consider a problem where string w1 is “abc” and string w2 is “abd”.
Problem-1 :
Is w1 the same as w2 or not?
Solution –
The answer is simply No; there is an exact algorithm by which we can decide it.
Problem-2 :
How similar are these two strings?
Solution –
Conventional computing can only answer YES or NO. But the two strings may be, say, 80% similar, and a graded answer like this can be given only by soft computing.
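As a small, hedged illustration of the soft-computing view of Problem-2, the sketch below uses Python's difflib to compute a graded similarity score between the two strings (the exact percentage depends on the similarity measure chosen, so the 80% above is only indicative):

from difflib import SequenceMatcher

w1, w2 = "abc", "abd"

# Hard-computing style answer: exact equality, YES or NO only.
print(w1 == w2)  # False

# Soft-computing style answer: a graded similarity score in [0, 1].
similarity = SequenceMatcher(None, w1, w2).ratio()
print(f"{similarity:.0%} similar")  # roughly 67% for "abc" vs "abd"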
1. In the field of Big Data, soft computing is used for data analysis models, data behavior models, data-driven decisions, etc.
2. In recommender systems, soft computing plays an important role in analyzing the problem algorithmically and producing precise results.
3. In behavioral and decision science, soft computing is used to analyze behavior, and the soft computing model works accordingly.
4. In mechanical engineering, soft computing is used for computing problems such as how a machine will work and how it will make a decision for a specific problem or input.
5. In computer engineering, soft computing is a core part of advanced computing such as machine learning and artificial intelligence.
1. Robustness: Soft computing techniques are robust and can handle uncertainty,
imprecision, and noise in data, making them ideal for solving real-world problems.
2. Approximate solutions: Soft computing techniques can provide approximate solutions
to complex problems that are difficult or impossible to solve exactly.
3. Non-linear problems: Soft computing techniques such as fuzzy logic and neural
networks can handle non-linear problems effectively.
4. Human-like reasoning: Soft computing techniques are designed to mimic human-like
reasoning, which is often more effective in solving complex problems.
5. Real-time applications: Soft computing techniques can provide real-time solutions to
complex problems, making them ideal for use in real-time applications.
The main difference between Soft Computing and Hard Computing is their
approach to solving complex problems:
1. Hard Computing: Hard computing uses traditional mathematical methods
to solve problems, such as algorithms and mathematical models. It is based
on deterministic and precise calculations and is ideal for solving problems
that have well-defined mathematical solutions.
2. Soft Computing: Soft computing, on the other hand, uses techniques such
as fuzzy logic, neural networks, genetic algorithms, and other heuristic
methods to solve problems. It is based on the idea of approximation and is
ideal for solving problems that are difficult or impossible to solve exactly.
In summary, Hard Computing is more precise and relies on mathematical
models, while Soft Computing is more flexible and relies on approximate
solutions.
Soft computing is a computing model that evolved to solve non-linear problems involving uncertain, imprecise and approximate solutions. These are real-life problems where human-like intelligence is required. Hard computing is the traditional approach used in computing, which needs an accurately stated analytical model. The outcome of the hard computing approach is a guaranteed, deterministic, accurate result, and it defines definite control actions using a mathematical model or algorithm. It deals with binary and crisp logic that requires precise input data. Hard computing on its own is often not capable of solving real-world problems. Difference between Soft Computing and Hard Computing:
S.NO | Soft Computing | Hard Computing
1. | Soft Computing is tolerant of imprecision, uncertainty, partial truth and approximation. | Hard computing needs an exactly stated analytical model.
2. | Soft Computing relies on formal logic and probabilistic reasoning. | Hard computing relies on binary logic and crisp systems.
3. | Soft computing has the features of approximation and dispositionality. | Hard computing has the features of exactitude (precision) and categoricity.
7. | Soft computing produces approximate results. | Hard computing produces precise results.
10. | Soft computing can use multivalued logic. | Hard computing uses two-valued logic.
2.a) Briefly explain the concepts of regression and classification with statistical approaches.
Regression in machine learning
Regression Types
There are two main types of regression:
Simple Regression
o Used to predict a continuous dependent variable based on a single
independent variable.
o Simple linear regression should be used when there is only a single
independent variable.
Multiple Regression
o Used to predict a continuous dependent variable based on multiple
independent variables.
o Multiple linear regression should be used when there are multiple
independent variables.
NonLinear Regression
o Relationship between the dependent variable and independent
variable(s) follows a nonlinear pattern.
o Provides flexibility in modeling a wide range of functional forms.
Regression Algorithms
There are many different types of regression algorithms, but some of the most
common include:
Linear Regression
o Linear regression is one of the simplest and most widely used
statistical models. This assumes that there is a linear relationship
between the independent and dependent variables. This means that
the change in the dependent variable is proportional to the change in
the independent variables.
Polynomial Regression
o Polynomial regression is used to model nonlinear relationships
between the dependent variable and the independent variables. It
adds polynomial terms to the linear regression model to capture
more complex relationships.
Support Vector Regression (SVR)
o Support vector regression (SVR) is a type of regression algorithm
that is based on the support vector machine (SVM) algorithm. SVM is
a type of algorithm that is used for classification tasks, but it can also
be used for regression tasks. SVR works by fitting a function (a hyperplane in feature space) that keeps the residuals between the predicted and actual values within a specified margin (epsilon), penalizing only the points that fall outside that margin.
Decision Tree Regression
o Decision tree regression is a type of regression algorithm that
builds a decision tree to predict the target value. A decision tree is a
tree-like structure that consists of nodes and branches. Each node
represents a decision, and each branch represents the outcome of
that decision. The goal of decision tree regression is to build a tree
that can accurately predict the target value for new data points.
Random Forest Regression
o Random forest regression is an ensemble method that combines
multiple decision trees to predict the target value. Ensemble
methods are a type of machine learning algorithm that combines
multiple models to improve the performance of the overall model.
Random forest regression works by building a large number of
decision trees, each of which is trained on a different subset of the
training data. The final prediction is made by averaging the
predictions of all of the trees.
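As a hedged, illustrative sketch (not taken from the original article), two of the regression algorithms listed above can be compared on synthetic non-linear data using scikit-learn; the dataset and hyperparameters below are assumptions made only for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic non-linear data: y = x^2 plus noise (assumed for illustration only).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)

for model in (LinearRegression(), RandomForestRegressor(n_estimators=100, random_state=0)):
    model.fit(X, y)
    mse = mean_squared_error(y, model.predict(X))
    print(type(model).__name__, "training MSE:", round(mse, 3))

The linear model cannot capture the quadratic pattern, while the ensemble of trees fits it closely, which is exactly the trade-off the list above describes.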
Characteristics of Regression
Here are the characteristics of the regression:
Continuous Target Variable: Regression deals with predicting continuous
target variables that represent numerical values. Examples include
predicting house prices, forecasting sales figures, or estimating patient
recovery times.
Error Measurement: Regression models are evaluated based on their
ability to minimize the error between the predicted and actual values of the
target variable. Common error metrics include mean absolute error (MAE),
mean squared error (MSE), and root mean squared error (RMSE).
Model Complexity: Regression models range from simple linear models to
more complex nonlinear models. The choice of model complexity depends on
the complexity of the relationship between the input features and the target
variable.
Overfitting and Underfitting: Regression models are susceptible to
overfitting and underfitting.
Interpretability: The interpretability of regression models varies depending
on the algorithm used. Simple linear models are highly interpretable, while
more complex models may be more difficult to interpret.
Examples
Which of the following is a regression task?
Predicting age of a person
Predicting nationality of a person
Predicting whether stock price of a company will increase tomorrow
Predicting whether a document is related to sighting of UFOs?
Solution : Predicting age of a person (because it is a real value, predicting
nationality is categorical, whether stock price will increase is discrete-yes/no
answer, predicting whether a document is related to UFO is again discrete- a
yes/no answer).
Regression Model Machine Learning
Let’s take an example of linear regression. We have a Housing data set and
we want to predict the price of the house. Following is the python code for it.
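The original code listing is not reproduced here; the following is a minimal sketch of the idea, assuming a hypothetical Housing.csv file with an "area" column and a "price" column (the file name and column names are assumptions):

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Hypothetical dataset: columns "area" (sq. ft.) and "price".
data = pd.read_csv("Housing.csv")
X = data[["area"]]
y = data["price"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

regr = LinearRegression()
regr.fit(X_train, y_train)

# Plot the test data and the fitted line (the "best fit line" described below).
plt.scatter(X_test, y_test, label="test data")
plt.plot(X_test, regr.predict(X_test), color="red", label="best fit line")
plt.xlabel("area")
plt.ylabel("price")
plt.legend()
plt.show()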
Output:
Here in this graph, we plot the test data. The red line indicates the best fit line for
predicting the price.
To make an individual prediction using the linear regression model (for example, for an area of 5000):
print(regr.predict([[5000]]))
Applications of Regression
Predicting prices: For example, a regression model could be used to predict the price
of a house based on its size, location, and other features.
Forecasting trends: For example, a regression model could be used to forecast the
sales of a product based on historical sales data and economic indicators.
Identifying risk factors: For example, a regression model could be used to identify
risk factors for heart disease based on patient data.
Making decisions: For example, a regression model could be used to recommend
which investment to buy based on market data.
Advantages of Regression
Easy to understand and interpret
Robust to outliers
Can handle both linear and nonlinear relationships.
Disadvantages of Regression
Assumes linearity
Sensitive to multicollinearity
May not be suitable for highly complex relationships
Unlike regression, the output variable of Classification is a category, not a value, such as
"Green or Blue", "fruit or animal", etc. Since the Classification algorithm is a Supervised
learning technique, hence it takes labeled input data, which means it contains input with
the corresponding output.
The main goal of the Classification algorithm is to identify the category of a given dataset,
and these algorithms are mainly used to predict the output for the categorical data.
Classification algorithms can be better understood using the below diagram. In the below
diagram, there are two classes, class A and Class B. These classes have features that are
similar to each other and dissimilar to other classes.
The algorithm which implements the classification on a dataset is known as a classifier.
There are two types of Classifications:
o Binary Classifier: If the classification problem has only two possible outcomes, then it is
called as Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
o Multi-class Classifier: If a classification problem has more than two outcomes, then it is
called as Multi-class Classifier.
Example: Classifications of types of crops, Classification of types of music.
1. Lazy Learners: Lazy Learner firstly stores the training dataset and wait until it receives
the test dataset. In Lazy learner case, classification is done on the basis of the most related
data stored in the training dataset. It takes less time in training but more time for
predictions.
Example: K-NN algorithm, Case-based reasoning
2. Eager Learners:Eager Learners develop a classification model based on a training dataset
before receiving a test dataset. Opposite to Lazy learners, Eager Learner takes more time
in learning, and less time in prediction. Example: Decision Trees, Naïve Bayes, ANN.
Types of ML Classification Algorithms:
Classification algorithms can be further divided into two main categories:
o Linear Models
o Logistic Regression
o Support Vector Machines
o Non-linear Models
o K-Nearest Neighbours
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
1. Log Loss or Cross-Entropy Loss:
o It is used for evaluating the performance of a classifier whose output is a probability value between 0 and 1.
o For a good binary classification model, the value of log loss should be near 0.
o The value of log loss increases if the predicted value deviates from the actual value.
o A lower log loss represents a higher accuracy of the model.
o For binary classification, cross-entropy can be calculated as:
−(y·log(p) + (1 − y)·log(1 − p))
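A short sketch of how this loss behaves, with the formula above coded by hand purely for illustration (the example probabilities are arbitrary):

import numpy as np

def log_loss(y_true, p_pred, eps=1e-15):
    # Binary cross-entropy: -(y*log(p) + (1-y)*log(1-p)), averaged over samples.
    p = np.clip(p_pred, eps, 1 - eps)          # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 1])
print(log_loss(y, np.array([0.9, 0.1, 0.8, 0.7])))  # small loss: predictions close to labels
print(log_loss(y, np.array([0.4, 0.6, 0.5, 0.3])))  # larger loss: predictions deviate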
2. Confusion Matrix:
o The confusion matrix provides us a matrix/table as output and describes the performance
of the model.
o It is also known as the error matrix.
o The matrix consists of the prediction results in a summarized form, giving the total number of correct and incorrect predictions. The matrix looks like the table below:
                     | Actual Positive      | Actual Negative
Predicted Positive   | True Positive (TP)   | False Positive (FP)
Predicted Negative   | False Negative (FN)  | True Negative (TN)
3. AUC-ROC curve:
o ROC curve stands for Receiver Operating Characteristics Curve and AUC stands
for Area Under the Curve.
o It is a graph that shows the performance of the classification model at different thresholds.
o To visualize the performance of the multi-class classification model, we use the AUC-ROC
Curve.
o The ROC curve is plotted with TPR and FPR, where TPR (True Positive Rate) on Y-axis and
FPR(False Positive Rate) on X-axis.
Classification | Regression
In this problem statement, the target variables are discrete. | In this problem statement, the target variables are continuous.
Problems like spam email classification and disease prediction are solved using classification algorithms. | Problems like house price prediction and rainfall prediction are solved using regression algorithms.
Evaluation metrics like Precision, Recall, and F1-Score are used here to evaluate the performance of the classification algorithms. | Evaluation metrics like Mean Squared Error, R2-Score, and MAPE are used here to evaluate the performance of the regression algorithms.
Input data are independent variables and a categorical dependent variable. | Input data are independent variables and a continuous dependent variable.
Example use cases are spam detection, image recognition, and sentiment analysis. | Example use cases are stock price prediction, house price prediction, and demand forecasting.
UNIT 2
Single – Layer Networks
Single Layer Perceptron in TensorFlow
The perceptron is a single processing unit of a neural network. First proposed by Frank Rosenblatt in 1958, it is a simple neuron used to classify its input into one of two categories. The perceptron is a linear classifier used in supervised learning, and it helps to organize the given input data.
A perceptron is a neural network unit that performs a precise computation to detect features in the input data. The perceptron is mainly used to classify data into two parts; therefore, it is also known as a Linear Binary Classifier.
The perceptron uses a step function that returns +1 if the weighted sum of its inputs is greater than or equal to 0, and −1 otherwise.
The activation function is used to map the input to the required range of values, such as (0, 1) or (−1, 1).
o Input value or One input layer: The input layer of the perceptron is made of
artificial input neurons and takes the initial data into the system for further
processing.
o Weights and Bias:
Weight: It represents the strength of the connection between units. If the weight from node 1 to node 2 is larger, then neuron 1 has a greater influence on neuron 2.
Bias: It is the same as the intercept added in a linear equation. It is an additional parameter whose task is to modify the output along with the weighted sum of the inputs to the neuron.
o Net sum: It calculates the total sum.
a. In the first step, all the inputs x are multiplied by their weights w.
b. In the second step, all these weighted values are added together; this is called the weighted sum.
c. In the last step, the weighted sum is applied to the appropriate activation function.
o Output: The activation function produces the result.
This was the first proposed neural model. The content of the neuron's local memory consists of a vector of weights.
The output of a single-layer perceptron is calculated by summing the products of each element of the input vector with the corresponding element of the weight vector. The value obtained from this weighted sum is then passed to an activation function, which produces the displayed output.
Let us focus on the implementation of a single-layer perceptron for an image classification
problem using TensorFlow. The best example of drawing a single-layer perceptron is
through the representation of "logistic regression."
o The weights are initialized with random values at the start of training.
o For each element of the training set, the error is calculated as the difference between the desired output and the actual output. The calculated error is used to adjust the weights.
o The process is repeated until the error made on the entire training set is less than the specified limit, or until the maximum number of iterations has been reached.
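A minimal NumPy sketch of this training procedure on a toy two-input problem (the learning rate, iteration limit, and AND-gate data are assumptions made only for illustration):

import numpy as np

# Toy data: AND gate, labels in {-1, +1}.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([-1, -1, -1, 1])

rng = np.random.RandomState(0)
w = rng.randn(2)          # weights initialized with random values
b = 0.0
lr, max_epochs = 0.1, 100

for epoch in range(max_epochs):
    errors = 0
    for x_i, t_i in zip(X, t):
        y_i = 1 if x_i @ w + b >= 0 else -1   # step activation on the weighted sum
        if y_i != t_i:                         # error = desired output - actual output
            w += lr * (t_i - y_i) * x_i        # adjust weights using the error
            b += lr * (t_i - y_i)
            errors += 1
    if errors == 0:                            # stop when no errors remain on the training set
        break

print("learned weights:", w, "bias:", b)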
1. Definition: A local minimum is a point within a given range such that the function value at that point is less than or equal to the function values at nearby points.
2. Characteristics:
o A function f(x) has a local minimum at x = c if f(c) ≤ f(x) for all x in some neighborhood around c.
o There can be multiple local minima in a function.
o The derivative of the function at a local minimum (if it exists) is zero, i.e., f′(c) = 0, and the second derivative is positive, i.e., f′′(c) > 0.
3. Example: For the function f(x) = x⁴ − x², the points x = −1/√2 and x = 1/√2 are local minima.
Global Minima:
1. Definition: A global minimum is a point at which the function value is the lowest over the entire domain of the function.
2. Characteristics:
o A function f(x) has a global minimum at x = c if f(c) ≤ f(x) for all x in the domain of f.
o There is only one global minimum value, but there can be multiple points where this minimum value occurs.
o A global minimum is also a local minimum, but the converse is not necessarily true.
3. Example: For the function f(x) = x⁴ − x², the global minimum value is f(±1/√2) = −0.25, the lowest value the function attains; it occurs at the two points x = −1/√2 and x = 1/√2 (the point x = 0 is a local maximum, not the global minimum).
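A short worked check of this example (standard calculus, included only to verify the stated values):
f′(x) = 4x³ − 2x = 2x(2x² − 1) = 0 gives critical points x = 0 and x = ±1/√2.
f′′(x) = 12x² − 2, so f′′(0) = −2 < 0 (local maximum) and f′′(±1/√2) = 4 > 0 (local minima).
f(±1/√2) = 1/4 − 1/2 = −0.25, which is the global minimum value, attained at the two points.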
Key Differences:
Scope: Local minima are determined within a neighborhood, while global minima are determined over
the entire domain.
Uniqueness: There can be multiple local minima, but there is only one global minimum value.
Comparison: All global minima are local minima, but not all local minima are global minima.
Visualization:
A graph of a function with several valleys shows multiple local minima, with the deepest valley being the global minimum.
Artificial neural networks (ANNs) and deep neural networks use backpropagation as
a learning algorithm to compute a gradient descent, which is an optimization
algorithm that guides the user to the maximum or minimum of a function.
In a machine learning context, the gradient descent helps the system minimize the
gap between desired outputs and achieved system outputs. The algorithm tunes the
system by adjusting the weight values for various inputs to narrow the difference
between outputs. This is also known as the error between the two.
More specifically, a gradient descent algorithm uses a gradual process to provide
information on how a network's parameters need to be adjusted to reduce the
disparity between the desired and achieved outputs. An evaluation metric called a
cost function guides this process. The cost function is a mathematical function that
measures this error. The algorithm's goal is to determine how the parameters must
be adjusted to reduce the cost function and improve overall accuracy.
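A minimal sketch of gradient descent on a one-parameter mean-squared-error cost (the toy data, learning rate, and iteration count are illustrative assumptions, not a full backpropagation implementation):

import numpy as np

# Toy data generated from y = 3x (the "desired outputs").
X = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * X

w = 0.0                      # single weight (parameter) to be tuned
lr = 0.01                    # learning rate

for step in range(200):
    y_pred = w * X                           # achieved outputs
    grad = np.mean(2 * (y_pred - y) * X)     # gradient of the MSE cost w.r.t. w
    w -= lr * grad                           # move against the gradient

print("learned weight:", round(w, 3))        # approaches 3.0 as the cost shrinks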
They're highly adaptable and efficient, and don't require prior knowledge about the network.
Data mining is sensitive to noisy data and other irregularities. Unclean data can affect the
backpropagation algorithm when training a neural network used for data mining.
Along with classifier algorithms such as naive Bayesian filters, K-nearest neighbors and support
vector machines, the backpropagation training algorithm has emerged as an important part of
machine learning applications that involve predictive analytics. While backpropagation techniques
are mainly applied to neural networks, they can also be applied to both classification and regression
problems in machine learning. In real-world applications, developers and machine learning experts
implement backpropagation algorithms for neural networks using programming languages such as
Python.
The architecture of Adaline, short for Adaptive Linear Neuron, consists of a single−layer
neural network. It typically comprises an input layer, a weight adjustment unit, and an output
layer. The input layer receives the input data, which is then multiplied by adjustable weights.
The weighted inputs are summed, and the result is passed through an activation function,
often a linear activation function. The output of the activation function is compared to the
desired output, and the network adjusts its weights using a supervised learning algorithm,
such as the Widrow−Hoff learning rule or delta rule. This iterative process continues until the
network reaches a satisfactory level of accuracy in making predictions or performing
regression tasks. The simplicity and linearity of the architecture allow Adaline to solve linearly
separable problems effectively.
Learning Algorithm
The Adaline network aims to minimize output disparities by fine−tuning weights using the
renowned Widrow−Hoff rule (Delta rule or LMS algorithm). Gradient descent is employed to
adjust weights, approaching optimal values iteratively. This continuous refinement enables
the network to align predictions with expected outcomes, showcasing its great learning and
adaptive abilities. Adaline is a powerful tool in pattern recognition and machine learning,
dynamically adapting weights based on feedback received.
Applications of Adaline
Architecture of Madaline
The Madaline architecture comprises multiple layers of Adaline units. Input data is initially
received by the input layer, which then transmits it through intermediate layers before
reaching the output layer. Within the intermediate layers, each Adaline unit calculates a
linear combination of inputs, followed by passing the unit's output through an activation
function. Ultimately, the output layer combines outputs from the intermediate layers to
generate the final output.
Learning Algorithm
The learning algorithm in Madaline networks follows a similar principle as Adaline but with
some modifications. The weights of each Adaline unit are updated using the Delta rule, and
the error is propagated backward through the layers using the backpropagation algorithm.
Backpropagation allows the network to adjust the weights in each layer based on the error
contribution of that layer, enabling the network to learn complex patterns.
Applications of Madaline
3.b) What is learning in a Neural Network? Explain supervised and unsupervised learning in neural networks with a neat diagram.
Neural networks extract identifying features from data, lacking pre-programmed
understanding. Network components include neurons, connections, weights, biases,
propagation functions, and a learning rule. Neurons receive inputs, governed by
thresholds and activation functions. Connections involve weights and biases regulating
information transfer. Learning, adjusting weights and biases, occurs in three stages: input
computation, output generation, and iterative refinement enhancing the network’s
proficiency in diverse tasks.
These include:
1. The neural network is simulated by a new environment.
2. Then the free parameters of the neural network are changed as a result of this
simulation.
3. The neural network then responds in a new way to the environment because of the
changes in its free parameters.
Forward Propagation
Input Layer: Each feature in the input layer is represented by a node on the network,
which receives input data.
Weights and Connections: The weight of each neuronal connection indicates how
strong the connection is. Throughout training, these weights are changed.
Hidden Layers: Each hidden layer neuron processes inputs by multiplying them by
weights, adding them up, and then passing them through an activation function. By
doing this, non-linearity is introduced, enabling the network to recognize intricate
patterns.
Output: The final result is produced by repeating the process until the output layer is
reached.
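A compact NumPy sketch of one forward pass through a single hidden layer (the layer sizes, random weights, and sigmoid activation are arbitrary choices for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
x = np.array([0.5, -1.2, 3.0])         # one input sample with 3 features

W1, b1 = rng.randn(4, 3), np.zeros(4)  # input -> hidden layer (4 neurons)
W2, b2 = rng.randn(1, 4), np.zeros(1)  # hidden -> output layer (1 neuron)

h = sigmoid(W1 @ x + b1)               # weighted sums plus non-linear activation
y_hat = sigmoid(W2 @ h + b2)           # the output layer repeats the process
print("network output:", y_hat)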
Backpropagation
Loss Calculation: The network’s output is evaluated against the real goal values, and
a loss function is used to compute the difference. For a regression problem, the Mean
Squared Error (MSE) is commonly used as the cost function.
Supervised Machine Learning
Regression
Regression is a supervised learning technique used to predict continuous numerical
values based on input features. It aims to establish a functional relationship between
independent variables and a dependent variable, such as predicting house prices based
on features like size, bedrooms, and location.
The goal is to minimize the difference between predicted and actual values using
algorithms like Linear Regression, Decision Trees, or Neural Networks, ensuring the
model captures underlying patterns in the data.
Classification
Classification is a type of supervised learning that categorizes input data into predefined
labels. It involves training a model on labeled examples to learn patterns between input
features and output classes. In classification, the target variable is a categorical value.
For example, classifying emails as spam or not.
The model’s goal is to generalize this learning to make accurate predictions on new,
unseen data. Algorithms like Decision Trees, Support Vector Machines, and Neural
Networks are commonly used for classification tasks.
NOTE: There are common supervised machine learning algorithms that can be used for both regression and classification tasks.
Supervised Machine Learning Algorithm
Supervised learning can be further divided into several different types, each with its own
unique characteristics and applications. Here are some of the most common types of
supervised learning algorithms:
Linear Regression: Linear regression is a type of regression algorithm that is used to
predict a continuous output value. It is one of the simplest and most widely used
algorithms in supervised learning. In linear regression, the algorithm tries to find a
linear relationship between the input features and the output value. The output value is
predicted based on the weighted sum of the input features.
Logistic Regression : Logistic regression is a type of classification algorithm that is
used to predict a binary output variable. It is commonly used in machine learning
applications where the output variable is either true or false, such as in fraud detection
or spam filtering. In logistic regression, the algorithm tries to find a linear relationship
between the input features and the output variable. The output variable is then
transformed using a logistic function to produce a probability value between 0 and 1.
Decision Trees: Decision tree is a tree-like structure that is used to model decisions
and their possible consequences. Each internal node in the tree represents a decision,
while each leaf node represents a possible outcome. Decision trees can be used to
model complex relationships between input features and output variables.
A decision tree is a type of algorithm that is used for both classification and regression
tasks.
o Decision Trees Regression: Decision Trees can be utilized for regression
tasks by predicting the value linked with a leaf node.
o Decision Trees Classification: A decision tree assigns a class label to each leaf node, and a new sample is classified by following the decisions from the root down to a leaf.
Random Forests: Random forests are made up of multiple decision trees that work
together to make predictions. Each tree in the forest is trained on a different subset of
the input features and data. The final prediction is made by aggregating the predictions
of all the trees in the forest.
Random forests are an ensemble learning technique that is used for both classification
and regression tasks.
o Random Forest Regression: It combines multiple decision trees to reduce
overfitting and improve prediction accuracy.
o Random Forest Classifier: Combines several decision trees to improve the
accuracy of classification while minimizing overfitting.
Support Vector Machine(SVM) : The SVM algorithm creates a hyperplane to
segregate n-dimensional space into classes and identify the correct category of new
data points. The extreme cases that help create the hyperplane are called support
vectors, hence the name Support Vector Machine.
A Support Vector Machine is a type of algorithm that is used for both classification and
regression tasks
o Support Vector Regression: It is an extension of Support Vector Machines
(SVM) used for predicting continuous values.
o Support Vector Classifier: It aims to find the best hyperplane that
maximizes the margin between data points of different classes.
K-Nearest Neighbors (KNN): KNN works by finding k training examples closest to a
given input and then predicts the class or value based on the majority class or average
value of these neighbors. The performance of KNN can be influenced by the choice of
k and the distance metric used to measure proximity. However, it is intuitive but can be
sensitive to noisy data and requires careful selection of k for optimal results.
A K-Nearest Neighbors (KNN) is a type of algorithm that is used for both classification
and regression tasks.
o K-Nearest Neighbors Regression: It predicts continuous values by
averaging the outputs of the k closest neighbors.
o K-Nearest Neighbors Classification: Data points are classified based on
the majority class of their k closest neighbors.
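As an illustrative sketch (assuming scikit-learn and its bundled Iris dataset), several of the classifiers listed above can be trained and compared in a few lines:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(),
          KNeighborsClassifier(n_neighbors=5), SVC()]
for model in models:
    model.fit(X_train, y_train)                    # learn from labeled data
    print(type(model).__name__, "test accuracy:", round(model.score(X_test, y_test), 3))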
Supervised and Unsupervised learning
Machine learning is a field of computer science that gives computers the ability
to learn without being explicitly programmed. Supervised learning and
unsupervised learning are two main types of machine learning.
In supervised learning, the machine is trained on a set of labeled data, which
means that the input data is paired with the desired output. The machine then
learns to predict the output for new input data. Supervised learning is often
used for tasks such as classification, regression, and object detection.
In unsupervised learning, the machine is trained on a set of unlabeled data,
which means that the input data is not paired with the desired output. The
machine then learns to find patterns and relationships in the data.
Unsupervised learning is often used for tasks such as clustering, dimensionality
reduction, and anomaly detection.
What is Supervised learning?
Supervised learning is a type of machine learning algorithm that learns from
labeled data. Labeled data is data that has been tagged with a correct answer
or classification.
Supervised learning, as the name indicates, has the presence of a supervisor as
a teacher. Supervised learning is when we teach or train the machine using
data that is well-labelled. Which means some data is already tagged with the
correct answer. After that, the machine is provided with a new set of
examples(data) so that the supervised learning algorithm analyses the training
data(set of training examples) and produces a correct outcome from labeled
data.
For example, a labeled dataset of images of Elephant, Camel and Cow would
have each image tagged with either “Elephant” , “Camel”or “Cow.”
Key Points:
Supervised learning involves training a machine from labeled data.
Labeled data consists of examples with the correct answer or classification.
The machine learns the relationship between inputs (e.g., the images) and outputs (their labels).
The trained machine can then make predictions on new, unlabeled data.
3.Learning with Reinforcement Learning
Through interaction with the environment and feedback in the form of rewards or
penalties, the network gains knowledge. Finding a policy or strategy that optimizes
cumulative rewards over time is the goal for the network. This kind is frequently utilized in
gaming and decision-making applications.
Step 4 − Obtain the net input with the following relation −
y_in = ∑ᵢ₌₁ⁿ xᵢ·wᵢ + b
Here 'b' is the bias and 'n' is the total number of input neurons.
Step 5 − Until the least mean square error (t − y_in) is obtained, adjust the weights and bias as follows −
wᵢ(new) = wᵢ(old) + α(t − y_in)xᵢ and b(new) = b(old) + α(t − y_in), where α is the learning rate and t is the desired output.
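A minimal NumPy sketch of these steps (the learning rate, stopping threshold, and toy data are assumptions made for illustration):

import numpy as np

# Toy training set: targets generated from t = 2*x1 - x2 (assumed for illustration).
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [0.5, 2.0]])
t = 2 * X[:, 0] - X[:, 1]

rng = np.random.RandomState(1)
w, b = rng.randn(2) * 0.1, 0.0
alpha = 0.1                                   # learning rate

for epoch in range(200):
    total_error = 0.0
    for x_i, t_i in zip(X, t):
        y_in = x_i @ w + b                    # Step 4: net input y_in = sum(x_i * w_i) + b
        w += alpha * (t_i - y_in) * x_i       # Step 5: Widrow-Hoff (delta / LMS) update
        b += alpha * (t_i - y_in)
        total_error += (t_i - y_in) ** 2
    if total_error < 1e-4:                    # stop when the squared error is small enough
        break

print("weights:", w, "bias:", b)              # approach w = [2, -1], b = 0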
UNIT 3
Multilayer Perceptron
Stepwise Implementation
Step 1: Import the necessary libraries.
Step 2: Download the dataset.
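The original step-by-step listing is not reproduced here; a minimal sketch of those steps using TensorFlow/Keras and the built-in MNIST dataset might look like the following (the model size, optimizer, and epoch count are assumptions):

# Step 1: Import the necessary libraries.
import tensorflow as tf

# Step 2: Download the dataset (MNIST ships with Keras).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixel values to [0, 1]

# Step 3: Build a small multilayer perceptron.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # output layer: 10 digit classes
])

# Step 4: Compile and train with backpropagation.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))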
MLP is the classical type of neural network. It consists of one or several hidden layers (depending on the abstraction required in deep learning). It performs a dot product between the inputs and the weights and applies a monotonic activation function such as sigmoid or ReLU. In an MLP, the fine-tuning of the weights (the training) is usually done through backpropagation for all layers.
RBF is a neural network consisting of just one hidden layer. For each of the neurons in the input layer, the hidden layer first computes the distance between the inputs and the weights, which can be viewed as centers, and then an activation function, usually a Gaussian function, is applied to the calculated radial distance. This is why it is called a "Radial Basis Function Network". Since the Gaussian function has its center at zero, RBF neurons have maximum activation when the input is equal to the weights (centers), i.e. when the distance is zero. The training of an RBF neural network can be done either through backpropagation or through RBF network hybrid learning. Typically, the RBF network also has a faster learning speed compared to the MLP and is less sensitive to the order of presentation of the training data.
Radial Basis Functions are a special class of feed-forward neural networks consisting of three layers: an input
layer, a hidden layer, and the output layer. This is fundamentally different from most neural network
architectures, which are composed of many layers and bring about nonlinearity by recurrently applying non-
linear activation functions. The input layer receives input data and passes it into the hidden layer, where the
computation occurs. The hidden layer of Radial Basis Functions Neural Network is the most powerful and very
different from most Neural networks. The output layer is designated for prediction tasks like classification or
regression.
Input Layer
The input layer consists of one neuron for every predictor variable. The input neurons pass the value
to each neuron in the hidden layer. N-1 neurons are used for categorical values, where N denotes the
number of categories. The range of values is standardized by subtracting the median and dividing by
the interquartile range.
Hidden Layer
The hidden layer contains a variable number of neurons (the ideal number determined by the training
process). Each neuron comprises a radial basis function centered on a point. The number of
dimensions coincides with the number of predictor variables. The radius or spread of the RBF function
may vary for each dimension.
When an x vector of input values is fed from the input layer, a hidden neuron calculates the Euclidean
distance between the test case and the neuron's center point. It then applies the kernel function using
the spread values. The resulting value gets fed into the summation layer.
The value obtained from the hidden layer is multiplied by a weight related to the neuron and passed to
the summation. Here the weighted values are added up, and the sum is presented as the network's
output. Classification problems have one output per target category, the value being the probability
that the case evaluated has that category.
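A small sketch of what one RBF hidden neuron computes (the centers, spreads, and output weights below are arbitrary illustrative values):

import numpy as np

def rbf_activation(x, center, spread):
    # Gaussian kernel applied to the Euclidean distance between x and the neuron's center.
    distance = np.linalg.norm(x - center)
    return np.exp(-(distance ** 2) / (2 * spread ** 2))

x = np.array([1.0, 2.0])                       # input vector from the input layer
centers = [np.array([1.0, 2.0]), np.array([4.0, 0.0])]
spreads = [1.0, 1.5]
output_weights = [0.7, -0.3]

hidden = [rbf_activation(x, c, s) for c, s in zip(centers, spreads)]
print("hidden activations:", hidden)           # the first neuron fires maximally (distance = 0)
print("network output:", sum(w * h for w, h in zip(output_weights, hidden)))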
Advantages of RBFN
Easy Design
Good Generalization
Faster Training
A straightforward interpretation of the meaning or function of each node in the hidden layer
Unit 4
Fuzzy Logic Systems
The term fuzzy refers to things that are not clear or are vague. In the real world we often encounter situations in which we cannot determine whether a state is true or false; in such cases, fuzzy logic provides very valuable flexibility for reasoning. In this way, we can account for the inaccuracies and uncertainties of any situation.
Fuzzy Logic is a form of many-valued logic in which the truth values of variables
may be any real number between 0 and 1, instead of just the traditional values
of true or false. It is used to deal with imprecise or uncertain information and is
a mathematical method for representing vagueness and uncertainty in
decision-making.
Fuzzy Logic is based on the idea that in many cases, the concept of true or false
is too restrictive, and that there are many shades of gray in between. It allows
for partial truths, where a statement can be partially true or false, rather than
fully true or false.
Fuzzy Logic is used in a wide range of applications, such as control systems,
image processing, natural language processing, medical diagnosis, and artificial
intelligence.
The fundamental concept of Fuzzy Logic is the membership function, which
defines the degree of membership of an input value to a certain set or
category. The membership function is a mapping from an input value to a
membership degree between 0 and 1, where 0 represents non-membership and
1 represents full membership.
Fuzzy Logic is implemented using Fuzzy Rules, which are if-then statements
that express the relationship between input variables and output variables in a
fuzzy way. The output of a Fuzzy Logic system is a fuzzy set, which is a set of
membership degrees for each possible output value.
In summary, Fuzzy Logic is a mathematical method for representing vagueness
and uncertainty in decision-making, it allows for partial truths, and it is used in
a wide range of applications. It is based on the concept of membership function
and the implementation is done using Fuzzy rules.
In the Boolean system, the truth value 1.0 represents absolute truth and 0.0 represents absolute falsehood. In the fuzzy system, however, truth is not restricted to these absolute values: intermediate values are also present, which are partially true and partially false.
ARCHITECTURE
Its Architecture contains four parts :
RULE BASE: It contains the set of rules and the IF-THEN conditions provided
by the experts to govern the decision-making system, on the basis of
linguistic information. Recent developments in fuzzy theory offer several
effective methods for the design and tuning of fuzzy controllers. Most of
these developments reduce the number of fuzzy rules.
FUZZIFICATION: It is used to convert inputs i.e. crisp numbers into fuzzy sets.
Crisp inputs are basically the exact inputs measured by sensors and passed
into the control system for processing, such as temperature, pressure, rpm’s,
etc.
INFERENCE ENGINE: It determines the matching degree of the current fuzzy
input with respect to each rule and decides which rules are to be fired
according to the input field. Next, the fired rules are combined to form the
control actions.
DEFUZZIFICATION: It is used to convert the fuzzy sets obtained by the
inference engine into a crisp value. There are several defuzzification methods
available and the best-suited one is used with a specific expert system to
reduce the error.
Membership function
Definition: A graph that defines how each point in the input space is mapped
to membership value between 0 and 1. Input space is often referred to as the
universe of discourse or universal set (u), which contains all the possible
elements of concern in each particular application.
There are largely three types of fuzzifiers:
Singleton fuzzifier
Gaussian fuzzifier
Trapezoidal or triangular fuzzifier
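As a hedged illustration of the membership-function idea above, triangular and Gaussian membership functions can be written directly in a few lines; the "warm temperature" set and its parameters below are assumptions made only for this example:

import numpy as np

def triangular(x, a, b, c):
    # Membership rises linearly from a to the peak at b, then falls back to zero at c.
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def gaussian(x, mean, sigma):
    return np.exp(-((x - mean) ** 2) / (2 * sigma ** 2))

temp = 23.0                                     # crisp input from a sensor (degrees C)
print("membership in 'warm' (triangular):", triangular(temp, 15, 25, 35))   # 0.8
print("membership in 'warm' (Gaussian):  ", round(gaussian(temp, 25, 5), 3))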
What is Fuzzy Control?
It is a technique to embody human-like thinkings into a control system.
It may not be designed to give accurate reasoning but it is designed to give
acceptable reasoning.
It can emulate human deductive thinking, that is, the process people use to
infer conclusions from what they know.
Any uncertainties can be easily dealt with the help of fuzzy logic.
Advantages of Fuzzy Logic System
This system can work with any type of inputs whether it is imprecise,
distorted or noisy input information.
The construction of Fuzzy Logic Systems is easy and understandable.
Fuzzy logic comes with mathematical concepts of set theory and the
reasoning of that is quite simple.
It provides a very efficient solution to complex problems in all fields of life as
it resembles human reasoning and decision-making.
The algorithms can be described with little data, so little memory is required.
Disadvantages of Fuzzy Logic Systems
Many researchers proposed different ways to solve a given problem through
fuzzy logic which leads to ambiguity. There is no systematic approach to
solve a given problem through fuzzy logic.
Proof of its characteristics is difficult or impossible in most cases because
every time we do not get a mathematical description of our approach.
Since fuzzy logic works on imprecise as well as precise data, accuracy is often compromised.
Application
It is used in the aerospace field for altitude control of spacecraft and
satellites.
It has been used in the automotive system for speed control, traffic control.
It is used for decision-making support systems and personal evaluation in the
large company business.
It has application in the chemical industry for controlling the pH, drying,
chemical distillation process.
Fuzzy logic is used in Natural language processing and various intensive
applications in Artificial Intelligence.
Fuzzy logic is extensively used in modern control systems such as expert
systems.
Fuzzy Logic is used with Neural Networks as it mimics how a person would
make decisions, only much faster. It is done by Aggregation of data and
changing it into more meaningful data by forming partial truths as Fuzzy
sets.
Neural Network:
Neural network is an information processing system that is inspired by the way
biological nervous systems such as brain process information. A neural network
is composed of a large number of interconnected processing elements known
as neurons which are used to solve problems. A neural network is an attempt to
make a computer model of the human brain and neural networks are parallel
computing devices. The simple diagram of the neural network is as shown
below:
Fuzzy Logic:
The term fuzzy represents things which are not clear. In the real world we often find situations where we cannot determine whether a state is true or false; in such cases, fuzzy logic provides very valuable flexibility for reasoning. In this way, we can account for the inaccuracies and uncertainties of any situation.
The simple diagram of fuzzy logic is as shown below:
Neural Network | Fuzzy Logic
This system cannot be easily modified. | This system can be easily modified.
It trains itself by learning from a data set. | Everything must be defined explicitly.
It is more complex than fuzzy logic. | It is simpler than a neural network.
Fuzzification:
Defuzzification:
Unit 5
Support Vector Machines
From the figure above it is very clear that there are multiple lines (our hyperplane here is a line because we are considering only two input features x1, x2) that segregate our data points or do a classification between red and blue circles. So how do we choose the best line, or in general the best hyperplane, that segregates our data points?
How does SVM work?
One reasonable choice as the best hyperplane is the one that represents the
largest separation or margin between the two classes.
Multiple hyperplanes separate the data from two classes
So we choose the hyperplane whose distance from it to the nearest data point
on each side is maximized. If such a hyperplane exists it is known as
the maximum-margin hyperplane/hard margin. So from the above figure,
we choose L2. Let’s consider a scenario like shown below
Here we have one blue ball in the boundary of the red ball. So how does SVM
classify the data? It’s simple! The blue ball in the boundary of red ones is an
outlier of blue balls. The SVM algorithm has the characteristics to ignore the
outlier and finds the best hyperplane that maximizes the margin. SVM is robust
to outliers.
So for this type of data point, what SVM does is find the maximum margin as it did with the previous data sets, and in addition it adds a penalty each time a point crosses the margin. The margins in these types of cases are called soft margins.
In this case, the new variable y is created as a function of distance from the
origin. A non-linear function that creates a new variable is referred to as a
kernel.
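A toy sketch of this idea: points that are not linearly separable in (x1, x2) become separable after adding a new variable equal to the squared distance from the origin (the synthetic data below is an assumption made purely for illustration):

import numpy as np

rng = np.random.RandomState(0)
# Inner cluster (one class) and outer ring (the other): not linearly separable in 2-D.
inner = rng.normal(scale=0.5, size=(50, 2))
angles = rng.uniform(0, 2 * np.pi, 50)
outer = np.column_stack([3 * np.cos(angles), 3 * np.sin(angles)]) + rng.normal(scale=0.2, size=(50, 2))

X = np.vstack([inner, outer])
y_new = np.sum(X ** 2, axis=1)        # new variable: squared distance from the origin

# A simple threshold on the new feature now separates the two groups.
print("inner cluster max distance^2:", y_new[:50].max().round(2))
print("outer ring   min distance^2:", y_new[50:].min().round(2))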
Support Vector Machine Terminology
1. Hyperplane: Hyperplane is the decision boundary that is used to separate
the data points of different classes in a feature space. In the case of linear
classifications, it will be a linear equation i.e. wx+b = 0.
2. Support Vectors: Support vectors are the closest data points to the
hyperplane, which makes a critical role in deciding the hyperplane and
margin.
3. Margin: Margin is the distance between the support vector and hyperplane.
The main objective of the support vector machine algorithm is to maximize
the margin. The wider margin indicates better classification performance.
4. Kernel: Kernel is the mathematical function, which is used in SVM to map
the original input data points into high-dimensional feature spaces, so, that
the hyperplane can be easily found out even if the data points are not
linearly separable in the original input space. Some of the common kernel
functions are linear, polynomial, radial basis function(RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane or the hard margin
hyperplane is a hyperplane that properly separates the data points of
different categories without any misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers,
SVM permits a soft margin technique. Each data point has a slack variable
introduced by the soft-margin SVM formulation, which softens the strict
margin requirement and permits certain misclassifications or violations. It
discovers a compromise between increasing the margin and reducing
violations.
7. C: Margin maximisation and misclassification fines are balanced by the
regularisation parameter C in SVM. The penalty for going over the margin or
misclassifying data items is decided by it. A stricter penalty is imposed with a
greater value of C, which results in a smaller margin and perhaps fewer
misclassifications.
8. Hinge Loss: A typical loss function in SVMs is hinge loss. It punishes
incorrect classifications or margin violations. The objective function in SVM is
frequently formed by combining it with the regularisation term.
9. Dual Problem: A dual Problem of the optimisation problem that requires
locating the Lagrange multipliers related to the support vectors can be used
to solve SVM. The dual formulation enables the use of kernel tricks and more
effective computing.
Mathematical intuition of Support Vector Machine
Consider a binary classification problem with two classes, labeled as +1 and -1.
We have a training dataset consisting of input feature vectors X and their
corresponding class labels Y.
The equation for the linear hyperplane can be written as:
wᵀx + b = 0
The vector W represents the normal vector to the hyperplane. i.e the direction
perpendicular to the hyperplane. The parameter b in the equation represents
the offset or distance of the hyperplane from the origin along the normal
vector w.
The distance between a data point xᵢ and the decision boundary can be calculated as:
dᵢ = (wᵀxᵢ + b) / ||w||
where ||w|| represents the Euclidean norm of the weight vector w. Euclidean
norm of the normal vector W
Optimization:
For the hard margin linear SVM classifier:
minimize over w, b: (1/2)wᵀw = (1/2)||w||², subject to yᵢ(wᵀxᵢ + b) ≥ 1 for i = 1, 2, 3, …, m
The target variable or label for the iᵗʰ training instance is denoted by the symbol tᵢ in this statement, with tᵢ = −1 for negative instances (when yᵢ = 0) and tᵢ = +1 for positive instances (when yᵢ = 1). This is because we require a decision boundary that satisfies the constraint: tᵢ(wᵀxᵢ + b) ≥ 1.
The optimisation problem can be solved through its dual, which maximises the following dual objective function over the Lagrange multipliers α:
maximize over α: ∑ᵢ₌₁ᵐ αᵢ − (1/2) ∑ᵢ₌₁ᵐ ∑ⱼ₌₁ᵐ αᵢαⱼtᵢtⱼK(xᵢ, xⱼ)
where,
αi is the Lagrange multiplier associated with the ith training sample.
K(xi, xj) is the kernel function that computes the similarity between two
samples xi and xj. It allows SVM to handle nonlinear classification problems
by implicitly mapping the samples into a higher-dimensional feature space.
The term ∑αi represents the sum of all Lagrange multipliers.
The SVM decision boundary can be described in terms of these optimal Lagrange multipliers and the support vectors once the dual problem has been solved and the optimal Lagrange multipliers have been discovered. The training samples that have αᵢ > 0 are the support vectors, and the decision function is given by:
f(x) = ∑ᵢ₌₁ᵐ αᵢtᵢK(xᵢ, x) + b
The offset b can be computed from any support vector using tᵢ(wᵀxᵢ − b) = 1 ⟺ b = wᵀxᵢ − tᵢ.
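In practice this optimisation is handled by a library; a minimal scikit-learn sketch is shown below (the moons dataset and the parameter values are illustrative assumptions):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Non-linearly separable toy data.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF kernel with soft-margin parameter C (see the terminology above).
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("number of support vectors:", clf.n_support_.sum())
print("test accuracy:", round(clf.score(X_test, y_test), 3))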