ML Unit 1-Notes
Two definitions of Machine Learning are offered. Arthur Samuel described it as: "the field of study that gives
computers the ability to learn without being explicitly programmed." This is an older, informal definition.
Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E
with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E."
In general, any machine learning problem can be assigned to one of two broad classifications:
Supervised Learning
In supervised learning, we are given a data set and already know what our correct output should look like,
having the idea that there is a relationship between the input and the output.
Supervised learning problems are categorized into "regression" and "classification" problems. In a
regression problem, we are trying to predict results within a continuous output, meaning that we are trying
to map input variables to some continuous function. In a classification problem, we are instead trying to
predict results in a discrete output. In other words, we are trying to map input variables into discrete
categories.
Example 1:
Given data about the size of houses on the real estate market, try to predict their price. Price as a function
of size is a continuous output, so this is a regression problem.
We could turn this example into a classification problem by instead making our output about whether the
house "sells for more or less than the asking price." Here we are classifying the houses based on price into
two discrete categories.
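To make the regression/classification distinction concrete, here is a minimal sketch that fits a regression model to a few invented (size, price) pairs and then re-frames the same data as a two-class problem. The numbers and the use of scikit-learn are illustrative assumptions, not part of the notes.

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: house sizes (sq. ft.) and prices (in lakhs).
sizes = [[650], [800], [1200], [1500], [2000]]
prices = [30, 38, 55, 68, 90]

# Regression: learn a continuous mapping from size to price.
reg = LinearRegression().fit(sizes, prices)
print(reg.predict([[1000]]))   # predicted price for a 1000 sq. ft. house

# Classification view: predict a discrete label instead, e.g. whether each
# house sells for more or less than a given asking price.
asking_price = 50
labels = ["above" if p > asking_price else "below" for p in prices]
print(labels)
```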
Example 2:
(a) Regression - Given a picture of a person, we have to predict their age on the basis of the given picture
(b) Classification - Given a patient with a tumor, we have to predict whether the tumor is malignant or
benign.
Unsupervised Learning
Unsupervised learning allows us to approach problems with little or no idea what our results should look
like. We can derive structure from data where we don't necessarily know the effect of the variables.
We can derive this structure by clustering the data based on relationships among the variables in the data.
Example:
Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these
genes into groups that are somehow similar or related by different variables, such as lifespan, location,
roles, and so on.
Non-clustering: The "Cocktail Party Algorithm", allows you to find structure in a chaotic environment.
(i.e. identifying individual voices and music from a mesh of sounds at a cocktail party).
Learning Process:
The learning process, whether by a human or a machine, can be divided into four basic components, namely data storage, abstraction, generalization and evaluation. Figure 1.1 illustrates the various components and the steps involved in the learning process.
Data storage : Facilities for storing and retrieving huge amounts of data are an important component of
the learning process. Humans and computers alike utilize data storage as a foundation for advanced
reasoning. In a human being, the data is stored in the brain and data is retrieved using electrochemical
signals. Computers use hard disk drives, flash memory, random access memory and similar devices to
store data and use cables and other technology to retrieve data.
Abstraction: The second component of the learning process is known as abstraction. Abstraction is the
process of extracting knowledge about stored data. This involves creating general concepts about the data
as a whole. The creation of knowledge involves application of known models and creation of new models.
The process of fitting a model to a dataset is known as training. When the model has been trained, the
data is transformed into an abstract form that summarizes the original information.
Generalization: The third component of the learning process is known as generalization. The term generalization describes the process of turning the knowledge about stored data into a form that can be utilized for future action. These actions are to be carried out on tasks that are similar, but not identical, to those that have been seen before. In generalization, the goal is to discover those properties of the data that will be most relevant to future tasks.
Evaluation : Evaluation is the last component of the learning process. It is the process of giving feedback
to the user to measure the utility of the learned knowledge. This feedback is then utilised to effect
improvements in the whole learning process.
Application of machine learning methods to large databases is called data mining. In data mining, a large volume of data is processed to construct a simple model with valuable use, for example one with high predictive accuracy. The following is a list of some typical applications of machine learning:
1. In finance, banks analyze their past data to build models to use in credit applications, fraud detection, and the stock market.
2. In manufacturing, learning models are used for optimization, control, and troubleshooting.
3. In telecommunications, call patterns are analyzed for network optimization and maximizing the quality of service.
4. In science, large amounts of data in physics, astronomy, and biology can only be analyzed fast enough by computers. The World Wide Web is huge; it is constantly growing, and searching for relevant information cannot be done manually.
5. In artificial intelligence, it is used to teach a system to learn and adapt to changes so that the system designer need not foresee and provide solutions for all possible situations.
6. It is used to find solutions to many problems in vision, speech recognition, and robotics.
7. Machine learning methods are applied in the design of computer-controlled vehicles to steer correctly when driving on a variety of roads.
8. Machine learning methods have been used to develop programmes for playing games such as chess, backgammon and Go.
Learning Models :
Machine learning is concerned with using the right features to build the right models that achieve the right tasks. The basic idea is that learning models can be divided into three categories. For a given problem, the collection of all possible outcomes represents the sample space or instance space.
Using a logical expression (Logical models)
Using the geometry of the instance space (Geometric models)
Using probability to classify the instance space (Probabilistic models)
A further, orthogonal distinction is between Grouping and Grading models (discussed below).
Logical models : Logical models use a logical expression to divide the instance space into segments and
hence construct grouping models. A logical expression is an expression that returns a Boolean value, i.e.,
a True or False outcome. Once the data is grouped using a logical expression, the data is divided into
homogeneous groupings for the problem we are trying to solve. For example, for a classification problem,
all the instances in the group belong to one class. There are mainly two kinds of logical models: Tree
models and Rule models. Rule models consist of a collection of implications or IF-THEN rules. For tree-
based models, the ‘if-part’ defines a segment and the ‘then-part’ defines the behaviour of the model for
this segment. Rule models follow the same reasoning.
Geometric models: Features could be described as points in two dimensions (x- and y-axis) or in a three-dimensional space (x, y and z). Even when features are not intrinsically geometric, they can be modelled in a geometric manner (for example, temperature as a function of time can be modelled along two axes). In geometric models, there are two ways we could impose similarity. We could use geometric concepts like lines or planes to segment (classify) the instance space; these are called Linear models. Alternatively, we can use the geometric notion of distance to represent similarity: if two points are close together, they have similar feature values and can thus be classed as similar. We call such models Distance-based models.
Linear models: Linear models are relatively simple. In this case, the function is represented as a
linear combination of its inputs. Thus, if x1 and x2 are two scalars or vectors of the same
dimension and a and b are arbitrary scalars, then ax1 + bx2 represents a linear combination of x1
and x2. In the simplest case where f(x) represents a straight line, we have an equation of the form
f (x) = mx + c where c represents the intercept and m represents the slope.
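As a minimal sketch of learning the parameters m and c of f(x) = mx + c from data (the data values below are made up for illustration):

```python
import numpy as np

# Made-up one-dimensional training data that roughly follows y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Learn the two parameters of f(x) = m*x + c by least squares.
m, c = np.polyfit(x, y, deg=1)
print(f"learned slope m = {m:.2f}, intercept c = {c:.2f}")

# The fixed-form (parametric) model can now be applied to new inputs.
print("f(5) =", m * 5 + c)
```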
Linear models are parametric, which means that they have a fixed form with a small number of numeric
parameters that need to be learned from data. For example, in f (x) = mx + c, m and c are the parameters
that we are trying to learn from the data. This technique is different from tree or rule models, where the
structure of the model (e.g., which features to use in the tree, and where) is not fixed in advance. Linear
models are stable, i.e., small variations in the training data have only a limited impact on the learned
model. In contrast, tree models tend to vary more with the training data, as the choice of a different split
at the root of the tree typically means that the rest of the tree is different as well. As a result of having
relatively few parameters, Linear models have low variance and high bias. This implies that Linear
models are less likely to overfit the training data than some other models. However, they are more likely
to underfit. For example, if we want to learn the boundaries between countries based on labelled data,
then linear models are not likely to give a good approximation.
Distance-based models: Distance-based models are the second class of Geometric models. Like Linear
models, distance based models are based on the geometry of data. As the name implies, distance-based
models work on the concept of distance. In the context of Machine learning, the concept of distance is not
based on merely the physical distance between two points. Instead, we could think of the distance
between two points considering the mode of transport between two points. Travelling between two cities
by plane covers less distance physically than by train because a plane is unrestricted. Similarly, in chess,
the concept of distance depends on the piece used – for example, a Bishop can move diagonally. Thus,
depending on the entity and the mode of travel, the concept of distance can be experienced differently.
The distance metrics commonly used are Euclidean, Minkowski, Manhattan, and Mahalanobis.
Distance is applied through the concept of neighbours and exemplars. Neighbours are points in proximity
with respect to the distance measure expressed through exemplars. Exemplars are either centroids that
find a centre of mass according to a chosen distance metric or medoids that find the most centrally located
data point. The most commonly used centroid is the arithmetic mean, which minimises squared Euclidean
distance to all other points.
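The sketch below computes two of the distance metrics listed above and the centroid (arithmetic mean) of a few invented points, assuming NumPy is available; it only illustrates the definitions, the data values are not from the notes.

```python
import numpy as np

# Two illustrative points in a 2-D feature space.
p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])

euclidean = np.sqrt(np.sum((p - q) ** 2))   # 5.0: square root of sum of squared differences
manhattan = np.sum(np.abs(p - q))           # 7.0: sum of absolute differences
r = 3
minkowski = np.sum(np.abs(p - q) ** r) ** (1.0 / r)  # order-r Minkowski generalises both

# The centroid (arithmetic mean) of a set of points minimises the total
# squared Euclidean distance to those points.
points = np.array([[1.0, 2.0], [4.0, 6.0], [7.0, 1.0]])
centroid = points.mean(axis=0)
print(euclidean, manhattan, minkowski, centroid)
```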
The centroid represents the geometric centre of a plane figure, i.e., the arithmetic mean position of all the points in the figure. This definition extends to any object in n-dimensional space: its centroid is the mean position of all the points.
Medoids are similar in concept to means or centroids. Medoids are most commonly used on data when a
mean or centroid cannot be defined. They are used in contexts where the centroid is not representative of
the dataset, such as in image data.
Examples of distance-based models include the nearest-neighbour models, which use the training data as
exemplars – for example, in classification. The K-means clustering algorithm also uses exemplars to
create clusters of similar data points.
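A minimal sketch of a nearest-neighbour model that keeps the training data as exemplars, assuming scikit-learn is available; the points and labels are invented for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Tiny made-up 2-D dataset: each row is an instance, labels are two classes.
X_train = [[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]]
y_train = ["A", "A", "B", "B"]

# A 3-nearest-neighbour model keeps the training data as exemplars and
# classifies a new point by the majority class among its closest neighbours.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print(knn.predict([[1.2, 1.5], [5.5, 5.0]]))  # expected roughly ['A', 'B']
```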
Probabilistic models: The third family of machine learning algorithms is the probabilistic models. We have seen before that the k-nearest neighbour algorithm uses the idea of distance (e.g., Euclidean distance) to classify entities, and logical models use a logical expression to partition the instance space. The
probabilistic models use the idea of probability to classify new entities. Probabilistic models see features
and target variables as random variables. The process of modelling represents and manipulates the level
of uncertainty with respect to these variables.
There are two types of probabilistic models: Predictive and Generative.
Predictive probability models use the idea of a conditional probability distribution P(Y | X) from which Y can be predicted from X.
Generative models estimate the joint distribution P(Y, X). Once we know the joint distribution for the generative models, we can derive any conditional or marginal distribution involving the same variables.
Thus, the generative model is capable of creating new data points and their labels, knowing the joint
probability distribution. The joint distribution looks for a relationship between two variables. Once this
relationship is inferred, it is possible to infer new data points. Naïve Bayes is an example of a
probabilistic classifier. We can do this using the Bayes rule defined as
P(H|D) = [P(D|H) × P(H)] / P(D)
where P(H|D) = posterior probability of hypothesis H given the data D; P(H) = prior probability of H; P(D) = probability of the data D (the evidence); and P(D|H) = likelihood, i.e., the probability of observing the data D given hypothesis H.
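A small worked example of the rule (all probability values below are invented for illustration; P(D) is obtained via the law of total probability):

```python
# Made-up worked example of Bayes' rule P(H|D) = P(D|H) * P(H) / P(D).
p_h = 0.01            # prior probability of the hypothesis H
p_d_given_h = 0.9     # probability of observing data D if H is true
p_d_given_not_h = 0.05

# Evidence P(D) via the law of total probability.
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

# Posterior probability of H after observing D.
p_h_given_d = p_d_given_h * p_h / p_d
print(round(p_h_given_d, 3))   # about 0.154
```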
The Naïve Bayes algorithm is based on the idea of Conditional Probability. Conditional probability is
based on finding the probability that something will happen, given that something else has already
happened. The task of the algorithm then is to look at the evidence and to determine the likelihood of a
specific class and assign a label accordingly to each entity.
Some broad categories of models:
Geometric models: e.g., K-nearest neighbours, linear regression, support vector machines, logistic regression.
Probabilistic models: e.g., Naïve Bayes, Gaussian process regression, conditional random fields.
Logical models: e.g., decision trees, random forests.
Grading vs grouping is an orthogonal categorization to geometric-probabilistic-logical-compositional.
Grouping models break the instance space up into groups or segments and in each segment apply a very
simple method (such as majority class). E.g. decision tree, KNN. Grading models form one global model
over the instance space. E.g. Linear classifiers – Neural networks.
Design a Learning System in Machine Learning
When we feed training data to a machine learning algorithm, the algorithm produces a mathematical model, and with the help of that model the machine can make predictions and take decisions without being explicitly programmed. Moreover, the more the machine works with the training data, the more experience it gains, and the more experience it gains, the better the results it produces.
Example: In a driverless car, the training data fed to the algorithm covers how to drive the car on highways and on busy and narrow streets, along with factors such as speed limits, parking and stopping at signals. A logical and mathematical model is then created on this basis, and afterwards the car operates according to that model. Again, the more data that is fed in, the better the output produced.
Designing a Learning System in Machine Learning :
Steps for designing a learning system are:
Step 1) Choosing the Training Experience: The first and most important task is to choose the training data or training experience that will be fed to the machine learning algorithm. It is important to note that the data or experience we feed to the algorithm has a significant impact on the success or failure of the model, so the training data or experience should be chosen wisely.
Below are the attributes that impact the success or failure of the model:
The training experience should be able to provide direct or indirect feedback regarding the choices made. For example, while playing chess, the training experience can provide feedback such as "if this move is chosen instead of that one, the chances of success increase."
The second important attribute is the degree to which the learner controls the sequence of training examples. For example, when training data is first fed to the machine, its accuracy is very low, but as it gains experience by playing again and again with itself or an opponent, the algorithm receives feedback and controls the chess game accordingly.
The third important attribute is how well the training experience represents the distribution of examples over which the final performance will be measured. A machine learning algorithm gains experience by working through many different cases and examples; the more examples it passes through, the more experience it gains and the better its performance becomes.
Step 2- Choosing the target function: The next important step is choosing the target function. This means that, based on the knowledge fed to the algorithm, the system chooses a target function, for example a NextMove function, that describes which type of legal move should be taken. For example, while playing chess against an opponent, when the opponent makes a move, the learning algorithm decides which of the possible legal moves to take in order to succeed.
Step 3- Choosing a representation for the target function: Once the algorithm knows all the possible legal moves, the next step is to choose a representation for the target function, e.g., linear equations, a hierarchical graph representation, a tabular form, etc. Using this representation, the NextMove function selects, from the available moves, the one that offers the highest success rate. For example, if the machine has 4 possible moves while playing chess, it chooses the optimised move that is most likely to lead to success.
Step 4- Choosing a function approximation algorithm: An optimised move cannot be chosen from the training data alone. The algorithm has to work through a set of examples, and through these examples it approximates the target function; the machine then receives feedback on the choices made. For example, when training data for playing chess is fed to the algorithm, it is not known at that point whether a move will fail or succeed; from each failure or success the algorithm estimates which step should be chosen for the next move and what its success rate is.
Step 5- Final Design: The final design is created once the system has gone through many examples, failures and successes, and correct and incorrect decisions, and has learned what the next step should be. Example: Deep Blue, an intelligent ML-based computer, won a chess match against the chess expert Garry Kasparov and became the first computer to beat a human chess champion.
Types of Learning:
In general, machine learning algorithms can be classified into three types: supervised learning, unsupervised learning and reinforcement learning.
Supervised learning: A training set of examples with the correct responses (targets) is provided and,
based on this training set, the algorithm generalises to respond correctly to all possible inputs. This is also
called learning from exemplars. Supervised learning is the machine learning task of learning a function
that maps an input to an output based on example input-output pairs. In supervised learning, each example
in the training set is a pair consisting of an input object (typically a vector) and an output value. A
supervised learning algorithm analyzes the training data and produces a function, which can be used for
mapping new examples. In the optimal case, the function will correctly determine the class labels for
unseen instances. Both classification and regression problems are supervised learning problems. A wide
range of supervised learning algorithms are available, each with its strengths and weaknesses. There is no
single learning algorithm that works best on all supervised learning problems.
“Supervised learning” is so called because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers (that is, the correct outputs); the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.
Example: Consider the following data regarding patients entering a clinic. The data consists of the gender and age of the patients, and each patient is labeled as “healthy” or “sick”.
Gender Age Label
M 40 Sick
F 37 Healthy
M 74 Sick
M 23 Healthy
F 39 Sick
F 44 Healthy
M 56 Healthy
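As a minimal sketch, the labelled table above can be used directly as a supervised training set. The numeric encoding of gender (M = 0, F = 1) and the choice of a decision tree classifier are illustrative assumptions, not part of the notes.

```python
from sklearn.tree import DecisionTreeClassifier

# The labelled clinic data from the table above, with gender encoded as 0 (M) / 1 (F).
X = [[0, 40], [1, 37], [0, 74], [0, 23], [1, 39], [1, 44], [0, 56]]
y = ["Sick", "Healthy", "Sick", "Healthy", "Sick", "Healthy", "Healthy"]

# A small decision tree learns a mapping from (gender, age) to the label.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X, y)

# Predict the label for a new, unseen patient (e.g. a 60-year-old male).
print(clf.predict([[0, 60]]))
```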
Unsupervised learning : Correct responses are not provided, but instead the algorithm tries to identify
similarities between the inputs so that inputs that have something in common are categorised together.
The statistical approach to unsupervised learning is known as density estimation. Unsupervised learning
is a type of machine learning algorithm used to draw inferences from datasets consisting of input data
without labeled responses. In unsupervised learning algorithms, a classification or categorization is not
included in the observations. There are no output values and so there is no estimation of functions. Since
the examples given to the learner are unlabeled, the accuracy of the structure that is output by the
algorithm cannot be evaluated. The most common unsupervised learning method is cluster analysis,
which is used for exploratory data analysis to find hidden patterns or grouping in data.
Example : Consider the following data regarding patients entering a clinic. The data consists of the gender
and age of the patients.
Gender Age
F 61
M 37
M 74
F 32
F 24
M 44
F 56
Based on this data, can we infer anything regarding the patients entering the clinic?
Reinforcement learning: This is somewhere between supervised and unsupervised learning. The
algorithm gets told when the answer is wrong, but does not get told how to correct it. It has to explore and
try out different possibilities until it works out how to get the answer right. Reinforcement learning is
sometimes called learning with a critic because of this monitor that scores the answer, but does not suggest
improvements. Reinforcement learning is the problem of getting an agent to act in the world so as to
maximize its rewards. A learner (the program) is not told what actions to take as in most forms of
machine learning, but instead must discover which actions yield the most reward by trying them. In the
most interesting and challenging cases, actions may affect not only the immediate reward but also the
next situations and, through that, all subsequent rewards.
Example Consider teaching a dog a new trick: we cannot tell it what to do, but we can reward/punish it if
it does the right/wrong thing. It has to find out what it did that made it get the reward/punishment. We can
use a similar method to train computers to do many tasks, such as playing backgammon or chess,
scheduling jobs, and controlling robot limbs. Reinforcement learning is different from supervised
learning. Supervised learning is learning from examples provided by a knowledgeable expert.
Issues in Machine Learning: The chess-playing example above raises a number of generic questions about machine learning. The field of machine learning is concerned with answering questions such as the following:
a. What algorithms exist for learning general target functions from specific training examples? In
what settings will particular algorithms converge to the desired function, given sufficient training
data? Which algorithms perform best for which types of problems and representations?
b. How much training data is sufficient? What general bounds can be found to relate the confidence
in learned hypotheses to the amount of training experience and the character of the learner's
hypothesis space?
c. When and how can prior knowledge held by the learner guide the process of generalizing from
examples? Can prior knowledge be helpful even when it is only approximately correct?
d. What is the best strategy for choosing a useful next training experience, and how does the choice
of this strategy alter the complexity of the learning problem?
e. What is the best way to reduce the learning task to one or more function approximation
problems? Put another way, what specific functions should the system attempt to learn? Can this
process itself be automated?
f. How can the learner automatically alter its representation to improve its ability to represent and
learn the target function?
A Brief History of Machine Learning:
1960’s and 70’s: Models of human learning – high-level symbolic descriptions of knowledge, e.g., logical expressions or graphs/networks; Winston’s (1975) structural learning system learned logic-based structural descriptions from examples.
1969 – Minsky and Papert’s critique of the perceptron.
1970’s: Genetic algorithms – developed by Holland (1975)
1970’s - present: Knowledge-intensive learning – A tabula rasa approach typically fares poorly. “To
acquire new knowledge a system must already possess a great deal of initial knowledge.” Lenat’s CYC
project is a good example.
1970’s - present: Alternative modes of learning (besides examples) – Learning from instruction, e.g.,
(Mostow, 1983) (Gordon & Subramanian, 1993) – Learning by analogy, e.g., (Veloso, 1990) – Learning
from cases, e.g., (Aha, 1991) – Discovery (Lenat, 1977) – 1991: The first of a series of workshops on
Multistrategy Learning (Michalski)
1970’s – present: Meta-learning – Heuristics for focusing attention, e.g., (Gordon & Subramanian, 1996)
– Active selection of examples for learning, e.g., (Angluin, 1987), (Gasarch & Smith, 1988), (Gordon,
1991) – Learning how to learn, e.g., (Schmidhuber, 1996)
1980 – The First Machine Learning Workshop was held at Carnegie-Mellon University in Pittsburgh.
1980 – Three consecutive issues of the International Journal of Policy Analysis and Information Systems
were specially devoted to machine learning.
1981 – Hinton, Jordan, Sejnowski, Rumelhart and McClelland at UCSD – the backpropagation algorithm; the PDP book.
1986 – The establishment of the Machine Learning journal.
1987 – The beginning of annual international conferences on machine learning (ICML); the Snowbird ML conference.
1988 – The beginning of regular workshops on computational learning theory (COLT).
1990’s – Explosive growth in the field of data mining, which involves the application of machine learning techniques.
Bottom line from history:
1960 – The Perceptron (Rosenblatt); its limitations analysed by Minsky and Papert (1969)
1960 – Bellman’s “curse of dimensionality”
1980 – Bounds on statistical estimators (C. Stone)
1990 – Beginning of high-dimensional data (hundreds of variables)
2000 – High-dimensional data (thousands of variables)
A Glimpse into the Future:
Today’s status: first-generation algorithms – neural nets, decision trees, etc.
Future: smart remote controls, phones and cars; data and communication networks; software.
Decision Trees:
In this case, this was a binary classification problem (a yes/no type of problem). There are two main types of Decision Trees:
1. Classification trees (Yes/No types): What we have seen above is an example of a classification tree, where the outcome was a variable like ‘fit’ or ‘unfit’. Here the decision variable is categorical.
2. Regression trees (Continuous data types): Here the decision or outcome variable is continuous, e.g., a number like 123.
Working: Now that we know what a Decision Tree is, we will see how it works internally. There are many algorithms that construct Decision Trees, but one of the best known is the ID3 algorithm. ID3 stands for Iterative Dichotomiser 3. Before discussing the ID3 algorithm, we will go through a few definitions.
Entropy, also called Shannon entropy and denoted by H(S) for a finite set S, is a measure of the amount of uncertainty or randomness in the data.
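The notes do not reproduce the formula, but the standard definition used by ID3 is H(S) = −Σ pᵢ log₂ pᵢ, where pᵢ is the proportion of examples in S belonging to class i. A minimal sketch of this computation (the 9-positive/5-negative split below is an invented example):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) = -sum(p_i * log2(p_i)) over the class proportions."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Example: a set with 9 positive and 5 negative examples.
labels = ["+"] * 9 + ["-"] * 5
print(round(entropy(labels), 3))   # about 0.940
```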
Neural Networks (ANN - Artificial Neural Network): The term "Artificial Neural Network" is derived from biological neural networks, which define the structure of the human brain. Just as the human brain has neurons interconnected with one another, artificial neural networks also have neurons that are interconnected with one another in the various layers of the network. These neurons are known as nodes. Figure 2 illustrates a typical diagram of a biological neural network.
Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks, cell nucleus
represents Nodes, synapse represents Weights, and Axon represents Output.
The typical Artificial Neural Network looks something like the given figure 3.
Input Layer: As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer: The hidden layer lies between the input and output layers. It performs all the calculations needed to find hidden features and patterns.
Output Layer: The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
The artificial neural network takes input x and computes the weighted sum of the inputs and includes a
bias b. This computation is represented in the form of a transfer function.
The weighted total is then passed as an input to an activation function to produce the output. The activation function determines whether a node should fire or not; only the nodes that fire contribute to the output layer. There are different activation functions available, and the choice depends on the kind of task we are performing.
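A minimal sketch of a single artificial neuron's forward pass as just described; the input values, weights, bias and the choice of a sigmoid activation are illustrative assumptions, not prescribed by the notes.

```python
import numpy as np

def sigmoid(z):
    # A common activation function; it squashes the weighted sum into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs x, weights w and bias b (all values are made up).
x = np.array([0.5, 0.3, 0.2])
w = np.array([0.4, 0.7, -0.5])
b = 0.1

# Transfer function: weighted sum of the inputs plus the bias.
z = np.dot(w, x) + b

# Activation function decides the neuron's output from the weighted total.
output = sigmoid(z)
print(z, output)
```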
Advantages of Artificial Neural Network (ANN)
1. Parallel processing capability: Owing to their structure, artificial neural networks can perform more than one task simultaneously.
2. Storing data on the entire network: Unlike traditional programming, the data used is stored across the whole network rather than in a single database. The disappearance of a few pieces of data in one place does not prevent the network from working.
3. Capability to work with incomplete knowledge: After training, an ANN may produce output even from incomplete data; the loss of performance depends on the significance of the missing information.
4. Having a distributed memory: For an ANN to be able to adapt, it is important to select suitable examples and to train the network towards the desired output by presenting these examples to it. The success of the network is directly proportional to the chosen instances; if the situation cannot be shown to the network in all its aspects, it can produce false output.
5. Having fault tolerance: Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.
Disadvantages of Artificial Neural Network:
1. Assurance of proper network structure: There is no particular guideline for determining the
structure of artificial neural networks. The appropriate network structure is accomplished through
experience, trial, and error.
2. Unrecognized behaviour of the network: This is the most significant issue with ANNs. When an ANN produces a solution, it does not provide any insight into why and how that solution was reached, which decreases trust in the network.
3. Hardware dependence: Artificial neural networks require processors with parallel processing power, in line with their structure; therefore, the realization of the network is hardware-dependent.
4. Difficulty of showing the problem to the network: ANNs can work only with numerical data, so problems must be converted into numerical values before being introduced to the ANN. The representation chosen here directly affects the performance of the network and depends on the user's abilities.
5. The duration of the network is unknown: The network is trained down to a specific value of the error, and this value does not guarantee optimum results. Artificial neural networks, which stepped into the world in the mid-20th century, are developing exponentially. We have examined the advantages of artificial neural networks and the issues encountered in the course of their use. It should not be overlooked that the disadvantages of ANNs, a flourishing branch of science, are being eliminated one by one, while their advantages are increasing day by day; this means that artificial neural networks will progressively become an irreplaceable part of our lives.
How do artificial neural networks work? Artificial Neural Network can be best represented as a
weighted directed graph, where the artificial neurons form the nodes. The association between the
neurons' outputs and neuron inputs can be viewed as directed edges with weights. The Artificial Neural Network receives the input signal from an external source in the form of a pattern or an image, represented as a vector. These inputs are then mathematically denoted by the notation x(n) for each of the n inputs.
Types of Artificial Neural Network: There are various types of Artificial Neural Networks (ANNs), modelled on the neurons and network functions of the human brain, and an artificial neural network performs tasks in a similar way. The majority of artificial neural networks have some similarities with their more complex biological counterparts and are very effective at their intended tasks, for example segmentation or classification.
Feedback ANN: In this type of ANN, the output is fed back into the network to achieve the best internally evolved results. According to the University of Massachusetts Lowell Centre for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize feedback ANNs.
Feed-Forward ANN: A feed-forward network is a basic neural network consisting of an input layer, an output layer, and at least one layer of neurons. By assessing its output in relation to its input, the strength of the network can be judged from the collective behaviour of the connected neurons, and the output is decided. The main advantage of this network is that it learns to evaluate and recognize input patterns.
Support Vector Machine (SVM):
Example: Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train the model with many images of cats and dogs so that it can learn the different features of cats and dogs, and then we test it on this strange creature. Because the support vector machine creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (the support vectors), it will look at the extreme cases of cats and dogs and, on the basis of the support vectors, classify the creature as a cat. Consider the diagram below:
SVM algorithm can be used for Face detection, image classification, text categorization, etc.
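A minimal sketch of training an SVM classifier, assuming scikit-learn's SVC is available; the 2-D points stand in for extracted image features and are invented for illustration.

```python
from sklearn.svm import SVC

# Tiny made-up 2-D dataset with two classes (e.g. "cat" = 0, "dog" = 1).
X = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],
     [6.0, 6.5], [6.5, 7.0], [7.0, 6.0]]
y = [0, 0, 0, 1, 1, 1]

# A linear-kernel SVM finds the maximum-margin hyperplane between the classes.
model = SVC(kernel="linear")
model.fit(X, y)

print(model.support_vectors_)       # the extreme points that define the margin
print(model.predict([[2.5, 2.5]]))  # classify a new, unseen point
```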
Types of SVM :
SVM can be of two types:
Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane: The hyperplane is the decision boundary that separates the classes, and its dimension depends on the number of features in the dataset: if there are 2 features, the hyperplane is a straight line, and if there are 3 features, the hyperplane is a two-dimensional plane. We always create the hyperplane that has the maximum margin, which means the maximum distance between the data points of the two classes.
Support Vectors: The data points or vectors that are closest to the hyperplane and which affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
How does SVM work?
Linear SVM: The working of the SVM algorithm can be understood using an example. Suppose we have a dataset with two classes (circle and diamond) and two features, x1 and x2. We want a classifier that can classify each pair (x1, x2) of coordinates into either the circle class or the diamond class. Consider the image below:
Since this is a 2-D space, we can easily separate the two classes just by using a straight line. But there can be multiple lines that separate these classes. Consider the image below:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the points of each class that are closest to the line; these points are called support vectors. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
Non-Linear SVM: If data is linearly arranged, we can separate it using a straight line, but for non-linear data we cannot draw a single straight line. Consider the image below:
To separate these data points, we need to add one more dimension. For linear data we have used the two dimensions x and y, so for non-linear data we add a third dimension z, calculated as z = x² + y². By adding the third dimension, the sample space becomes as shown in the image below:
So now, SVM will divide the datasets into classes in the following way. Consider the image below:
Since we are in 3-D space, the separating boundary looks like a plane parallel to the x-axis. If we convert it back into 2-D space with z = 1, the boundary becomes a circle of radius 1 (since z = x² + y² = 1).
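A minimal sketch of this idea: the invented 2-D data below is not linearly separable, but after adding the third dimension z = x² + y² a linear SVM in the lifted space can separate it. The data generation and the use of scikit-learn are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

# Invented non-linearly separable data: one class near the origin, one on a ring.
rng = np.random.default_rng(0)
inner = rng.normal(0.0, 0.5, size=(20, 2))                         # class 0
angles = rng.uniform(0.0, 2 * np.pi, size=20)
outer = np.column_stack([3 * np.cos(angles), 3 * np.sin(angles)])  # class 1
X = np.vstack([inner, outer])
y = np.array([0] * 20 + [1] * 20)

# Add the third dimension z = x^2 + y^2 described above.
z = (X ** 2).sum(axis=1, keepdims=True)
X3 = np.hstack([X, z])

# A linear SVM in the lifted (x, y, z) space corresponds to a circular boundary in 2-D.
clf = SVC(kernel="linear").fit(X3, y)

test = np.array([[0.2, 0.1], [2.8, -1.0]])
test3 = np.hstack([test, (test ** 2).sum(axis=1, keepdims=True)])
print(clf.predict(test3))  # expected roughly [0 1]
```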
Clustering :
Introduction to clustering
As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a training dataset. Instead, the models themselves find the hidden patterns and insights in the given data. It can be compared to the learning that takes place in the human brain while learning new things. It can be defined as: “Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision.” Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.
Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of
different types of cats and dogs. The algorithm is never trained upon the given dataset, which means it
does not have any idea about the features of the dataset. The task of the unsupervised learning
algorithm is to identify the image features on their own. Unsupervised learning algorithm will
perform this task by clustering the image dataset into the groups according to similarities between
images.
Below are some of the main reasons that describe the importance of unsupervised learning:
i. Unsupervised learning is helpful for finding useful insights from the data.
ii. Unsupervised learning is much closer to how a human learns to think from their own experiences, which makes it closer to real AI.
iii. Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.
iv. In the real world, we do not always have input data with corresponding outputs; to solve such cases, we need unsupervised learning.
Here, we have taken unlabeled input data, which means it is not categorized and no corresponding outputs are given. This unlabeled input data is fed to the machine learning model in order to train it. The model first interprets the raw data to find the hidden patterns in it and then applies a suitable algorithm such as k-means clustering. Once the algorithm is applied, it divides the data objects into groups according to the similarities and differences between the objects.
Types of Unsupervised Learning Algorithms: Unsupervised learning algorithms can be further categorized into two types of problems:
Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group. Cluster analysis finds the commonalities between data objects and categorizes them according to the presence or absence of those commonalities.
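A minimal clustering sketch, assuming scikit-learn's KMeans is available; the 2-D points and the choice of two clusters are invented for illustration.

```python
from sklearn.cluster import KMeans

# Unlabeled 2-D points (made up): two loose groups, no output labels given.
X = [[1.0, 1.2], [1.3, 0.9], [0.8, 1.1],
     [7.9, 8.1], [8.2, 7.8], [8.0, 8.3]]

# k-means groups the objects into k clusters around learned centroids,
# using only the similarity (distance) between the inputs.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                   # cluster index assigned to each point
print(kmeans.cluster_centers_)  # the two centroids
```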
Association: An association rule is an unsupervised learning method used for finding relationships between variables in a large database. It determines the set of items that occur together in the dataset. Association rules make marketing strategies more effective; for example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical application of association rules is Market Basket Analysis.
Advantages of Unsupervised Learning :
I. Unsupervised learning is used for more complex tasks as compared to supervised learning
because, in unsupervised learning, we don't have labeled input data.
II. Unsupervised learning is preferable as it is easy to get unlabeled data in comparison to labeled
data.
Disadvantages of Unsupervised Learning :
I. Unsupervised learning is intrinsically more difficult than supervised learning as it does not have
corresponding output.
II. The result of the unsupervised learning algorithm might be less accurate as input data is not
labeled, and algorithms do not know the exact output in advance.
***********************