
Greater Noida Institute of Technology

Department of CSE Session 2022-23


Unit-1 (KCS-055)
Machine Learning Techniques (KCS-055)
Syllabus Unit1 :
INTRODUCTION – Learning, Types of Learning, Well defined learning problems, Designing a Learning
System, History of ML, Introduction of Machine Learning Approaches – (Artificial Neural Network,
Clustering, Reinforcement Learning, Decision Tree Learning, Bayesian networks, Support Vector
Machine, Genetic Algorithm), Issues in Machine Learning and Data Science Vs Machine Learning;
Well-posed learning problem:
A problem can be categorized as a well-posed learning problem if it has three traits –
Task, Performance Measure and Experience
Some examples that illustrate well-posed learning problems are –
To better filter emails as spam or not
Task – Classifying emails as spam or not
Performance Measure – The fraction of emails accurately classified as spam or not spam.
Experience – Observing you label emails as spam or not spam.
Handwriting Recognition Problem
Task – Recognizing handwritten words within images.
Performance Measure – percent of words accurately classified.
Experience – a directory of handwritten words with given classifications.

What is Machine Learning?

Machine learning is programming computers to optimize a performance criterion using example data or past experience. We have a model defined up to some parameters, and learning is the execution of a computer program to optimize the parameters of the model using the training data or past experience. The model may be predictive to make predictions in the future, or descriptive to gain knowledge from data, or both.

Two definitions of Machine Learning are offered. Arthur Samuel described it as: "the field of study that gives
computers the ability to learn without being explicitly programmed." This is an older, informal definition.

Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E
with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E."

Example I: playing checkers.


Task T = the task of playing checkers.
Performance P = the probability that the program will win the next game.
Experience E = the experience of playing many games of checkers
Example II. Handwriting recognition learning problem
Task T: Recognising and classifying handwritten words within images
Performance P: Percent of words correctly classified
Training experience E: A dataset of handwritten words with given classifications
Example III: A robot driving learning problem
Task T: Driving on highways using vision sensors
Performance measure P: Average distance traveled before an error
Training experience: A sequence of images and steering commands recorded while
observing a human driver
Example IV: A chess learning problem
Task T: Playing chess
Performance measure P: Percent of games won against opponents
Training experience E: Playing practice games against itself

In general, any machine learning problem can be assigned to one of two broad classifications:

Supervised learning and Unsupervised learning. A third category, Reinforcement learning, is also commonly recognized.

Supervised Learning

In supervised learning, we are given a data set and already know what our correct output should look like,
having the idea that there is a relationship between the input and the output.

Supervised learning problems are categorized into "regression" and "classification" problems. In a
regression problem, we are trying to predict results within a continuous output, meaning that we are trying
to map input variables to some continuous function. In a classification problem, we are instead trying to
predict results in a discrete output. In other words, we are trying to map input variables into discrete
categories.

Example 1:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function
of size is a continuous output, so this is a regression problem.

We could turn this example into a classification problem by instead making our output about whether the
house "sells for more or less than the asking price." Here we are classifying the houses based on price into
two discrete categories.
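To make the regression/classification contrast concrete, here is a minimal sketch using scikit-learn on made-up house data; the sizes, prices and "above asking price" labels are illustrative assumptions, not data from the notes.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

sizes = np.array([[50], [80], [120], [160], [200]])   # house size in square metres (made up)
prices = np.array([110, 170, 250, 330, 410])          # price in thousands (made up)

# Regression: predict a continuous price from size.
reg = LinearRegression().fit(sizes, prices)
print(reg.predict([[100]]))                           # a continuous value, about 210

# Classification: predict a discrete label, e.g. "sells above asking price".
above_asking = np.array([0, 0, 1, 1, 1])              # hypothetical 0/1 labels
clf = LogisticRegression().fit(sizes, above_asking)
print(clf.predict([[100]]))                           # a discrete class, 0 or 1

The only difference between the two sub-problems is the type of output being predicted: a continuous number versus a discrete category.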

Example 2:

(a) Regression - Given a picture of a person, we have to predict their age on the basis of the given picture

(b) Classification - Given a patient with a tumor, we have to predict whether the tumor is malignant or
benign.

Unsupervised Learning

Unsupervised learning allows us to approach problems with little or no idea what our results should look
like. We can derive structure from data where we don't necessarily know the effect of the variables.

We can derive this structure by clustering the data based on relationships among the variables in the data.

With unsupervised learning there is no feedback based on the prediction results.

Example:

Clustering: Take a collection of 1,000,000 different genes, and find a way to automatically group these
genes into groups that are somehow similar or related by different variables, such as lifespan, location,
roles, and so on.

Non-clustering: The "Cocktail Party Algorithm" allows you to find structure in a chaotic environment
(i.e. identifying individual voices and music from a mesh of sounds at a cocktail party).

Learning Process:

Basic components of the learning process: The learning process, whether by a human or a machine, can be divided into four components, namely data storage, abstraction, generalization and evaluation. Figure 1.1 illustrates the various components and the steps involved in the learning process.

Data storage : Facilities for storing and retrieving huge amounts of data are an important component of
the learning process. Humans and computers alike utilize data storage as a foundation for advanced
reasoning. In a human being, the data is stored in the brain and data is retrieved using electrochemical
signals. Computers use hard disk drives, flash memory, random access memory and similar devices to
store data and use cables and other technology to retrieve data.

Abstraction: The second component of the learning process is known as abstraction. Abstraction is the
process of extracting knowledge about stored data. This involves creating general concepts about the data
as a whole. The creation of knowledge involves application of known models and creation of new models.
The process of fitting a model to a dataset is known as training. When the model has been trained, the
data is transformed into an abstract form that summarizes the original information.

Generalization: The third component of the learning process is known as generalization. The term generalization describes the process of turning the knowledge about stored data into a form that can be utilized for future action. These actions are to be carried out on tasks that are similar, but not identical, to those that have been seen before. In generalization, the goal is to discover those properties of the data that will be most relevant to future tasks.

Evaluation : Evaluation is the last component of the learning process. It is the process of giving feedback
to the user to measure the utility of the learned knowledge. This feedback is then utilised to effect
improvements in the whole learning process.

Applications of machine learning :

Application of machine learning methods to large databases is called data mining. In data mining, a large
volume of data is processed to construct a simple model with valuable use, for example, having high
predictive accuracy. The following is a list of some of the typical applications of machine learning.

1. In retail business, machine learning is used to study consumer behaviour.

2. In finance, banks analyze their past data to build models to use in credit applications, fraud detection,
and the stock market.

3. In manufacturing, learning models are used for optimization, control, and troubleshooting.

4. In medicine, learning programs are used for medical diagnosis.

5. In telecommunications, call patterns are analyzed for network optimization and maximizing the quality
of service.

6. In science, large amounts of data in physics, astronomy, and biology can only be analyzed fast enough by computers. The World Wide Web is huge and constantly growing, and searching it for relevant information cannot be done manually.

7. In artificial intelligence, it is used to teach a system to learn and adapt to changes so that the system
designer need not foresee and provide solutions for all possible situations.

8. It is used to find solutions to many problems in vision, speech recognition, and robotics.

9. Machine learning methods are applied in the design of computer-controlled vehicles to steer correctly
when driving on a variety of roads.

10. Machine learning methods have been used to develop programmes for playing games such as chess,
backgammon and Go.
Learning Models:

Machine learning is concerned with using the right features to build the right models that achieve the right tasks. Learning models can be divided into three basic categories. For a given problem, the collection of all possible outcomes represents the sample space or instance space.
Using a logical expression (Logical models)
Using the geometry of the instance space (Geometric models)
Using probability to classify the instance space (Probabilistic models)
A further, orthogonal distinction is between grouping and grading models (discussed later).

Logical models : Logical models use a logical expression to divide the instance space into segments and
hence construct grouping models. A logical expression is an expression that returns a Boolean value, i.e.,
a True or False outcome. Once the data is grouped using a logical expression, the data is divided into
homogeneous groupings for the problem we are trying to solve. For example, for a classification problem,
all the instances in the group belong to one class. There are mainly two kinds of logical models: Tree
models and Rule models. Rule models consist of a collection of implications or IF-THEN rules. For tree-
based models, the ‘if-part’ defines a segment and the ‘then-part’ defines the behaviour of the model for
this segment. Rule models follow the same reasoning.

Geometric models: Features could be described as points in two dimensions (x- and y-axes) or in a three-dimensional space (x, y, and z). Even when features are not intrinsically geometric, they can be modelled in a geometric manner (for example, temperature as a function of time can be modelled on two axes). In geometric models, there are two ways we could impose similarity. We could use geometric concepts like lines or planes to segment (classify) the instance space.
These are called Linear models. Alternatively, we can use the geometric notion of distance to represent similarity. In this case, if two points are close together, they have similar values for their features and thus can be classed as similar. We call such models Distance-based models.

Linear models: Linear models are relatively simple. In this case, the function is represented as a
linear combination of its inputs. Thus, if x1 and x2 are two scalars or vectors of the same
dimension and a and b are arbitrary scalars, then ax1 + bx2 represents a linear combination of x1
and x2. In the simplest case where f(x) represents a straight line, we have an equation of the form
f (x) = mx + c where c represents the intercept and m represents the slope.

Linear models are parametric, which means that they have a fixed form with a small number of numeric
parameters that need to be learned from data. For example, in f (x) = mx + c, m and c are the parameters
that we are trying to learn from the data. This technique is different from tree or rule models, where the
structure of the model (e.g., which features to use in the tree, and where) is not fixed in advance. Linear
models are stable, i.e., small variations in the training data have only a limited impact on the learned
model. In contrast, tree models tend to vary more with the training data, as the choice of a different split
at the root of the tree typically means that the rest of the tree is different as well. As a result of having
relatively few parameters, Linear models have low variance and high bias. This implies that Linear
models are less likely to overfit the training data than some other models. However, they are more likely
to underfit. For example, if we want to learn the boundaries between countries based on labelled data,
then linear models are not likely to give a good approximation.
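As a minimal illustration of learning the parameters m and c of f(x) = mx + c from data, the sketch below uses NumPy's least-squares polynomial fit on made-up points; the data values are assumptions chosen purely for illustration.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])   # roughly y = 2x, with a little noise (made up)

# polyfit with degree 1 returns the learned slope m and intercept c.
m, c = np.polyfit(x, y, 1)
print(f"learned model: f(x) = {m:.2f}*x + {c:.2f}")

# Prediction with a parametric model is just evaluation of the learned function.
print(m * 6.0 + c)   # prediction for x = 6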

Distance-based models: Distance-based models are the second class of Geometric models. Like Linear
models, distance based models are based on the geometry of data. As the name implies, distance-based
models work on the concept of distance. In the context of Machine learning, the concept of distance is not
based on merely the physical distance between two points. Instead, we could think of the distance
between two points considering the mode of transport between two points. Travelling between two cities
by plane covers less distance physically than by train because a plane is unrestricted. Similarly, in chess,
the concept of distance depends on the piece used – for example, a Bishop can move diagonally. Thus,
depending on the entity and the mode of travel, the concept of distance can be experienced differently.
The distance metrics commonly used are Euclidean, Minkowski, Manhattan, and Mahalanobis.
Distance is applied through the concept of neighbours and exemplars. Neighbours are points in proximity
with respect to the distance measure expressed through exemplars. Exemplars are either centroids that
find a centre of mass according to a chosen distance metric or medoids that find the most centrally located
data point. The most commonly used centroid is the arithmetic mean, which minimises squared Euclidean
distance to all other points.

The centroid represents the geometric centre of a plane figure, i.e., the arithmetic mean position of all the
points in the figure from the centroid point. This definition extends to any object in n-dimensional space:
its centroid is the mean position of all the points.

Medoids are similar in concept to means or centroids. Medoids are most commonly used on data when a
mean or centroid cannot be defined. They are used in contexts where the centroid is not representative of
the dataset, such as in image data.

Examples of distance-based models include the nearest-neighbour models, which use the training data as
exemplars – for example, in classification. The K-means clustering algorithm also uses exemplars to
create clusters of similar data points.
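The hedged sketch below illustrates the distance-based idea with 1-nearest-neighbour classification on toy points, computing Euclidean and Manhattan distances explicitly, together with the centroid exemplar of one class; all numbers are illustrative.

import numpy as np

exemplars = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.8]])  # training points (toy)
labels    = np.array(["A", "A", "B", "B"])
query     = np.array([4.8, 5.1])                                        # new point to classify

euclidean = np.sqrt(((exemplars - query) ** 2).sum(axis=1))   # L2 distance to each exemplar
manhattan = np.abs(exemplars - query).sum(axis=1)             # L1 (city-block) distance

# The nearest neighbour under either metric decides the class of the query.
print(labels[np.argmin(euclidean)])   # "B"
print(labels[np.argmin(manhattan)])   # "B"

# A centroid exemplar for class "B": the arithmetic mean, which minimises the
# squared Euclidean distance to the members of that class.
print(exemplars[labels == "B"].mean(axis=0))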
Probabilistic models: The third family of machine learning algorithms is the probabilistic models. We have seen before that the k-nearest neighbour algorithm uses the idea of distance (e.g., Euclidean distance) to classify entities, and logical models use a logical expression to partition the instance space. Probabilistic models use the idea of probability to classify new entities. Probabilistic models see features and target variables as random variables. The process of modelling represents and manipulates the level of uncertainty with respect to these variables.
There are two types of probabilistic models: Predictive and Generative.
Predictive probability models use the idea of a conditional probability distribution P (Y |X) from which Y
can be predicted from X.
Generative models estimate the joint distribution P (Y, X). Once we know the joint distribution for the
generative models, we can derive any conditional or marginal distribution involving the same variables.
Thus, the generative model is capable of creating new data points and their labels, knowing the joint
probability distribution. The joint distribution looks for a relationship between two variables. Once this
relationship is inferred, it is possible to infer new data points. Naïve Bayes is an example of a
probabilistic classifier. We can do this using the Bayes rule defined as
P(H|D) = P(D|H) * P(H) / P(D)
where P(H|D) is the posterior probability of hypothesis H given data D, P(H) is the prior probability of H, P(D|H) is the likelihood (the probability of observing data D given hypothesis H), and P(D) is the probability of the data (the evidence).

The Naïve Bayes algorithm is based on the idea of Conditional Probability. Conditional probability is
based on finding the probability that something will happen, given that something else has already
happened. The task of the algorithm then is to look at the evidence and to determine the likelihood of a
specific class and assign a label accordingly to each entity.
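A small worked example of the Bayes rule above, with made-up numbers for a spam-filtering hypothesis (all probabilities are assumptions chosen for illustration):

# H = "email is spam", D = "email contains the word 'offer'".
p_h = 0.20            # prior P(H): assume 20% of emails are spam
p_d_given_h = 0.60    # likelihood P(D|H): assume 60% of spam contains "offer"
p_d_given_not_h = 0.05

# Evidence P(D) from the law of total probability.
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)      # 0.16

# Posterior P(H|D) = P(D|H) * P(H) / P(D)
p_h_given_d = p_d_given_h * p_h / p_d
print(round(p_h_given_d, 3))                                # 0.75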
Some broad categories of models:
Geometric models: e.g., K-nearest neighbours, linear regression, support vector machines, logistic regression
Probabilistic models: e.g., Naïve Bayes, Gaussian process regression, conditional random fields
Logical models: e.g., decision trees, random forests
Grading vs grouping is an orthogonal categorization to geometric-probabilistic-logical-compositional.
Grouping models break the instance space up into groups or segments and in each segment apply a very simple method (such as majority class), e.g., decision trees and KNN. Grading models form one global model over the instance space, e.g., linear classifiers and neural networks.
Design a Learning System in Machine Learning
When we feed the training data to a machine learning algorithm, the algorithm produces a mathematical model, and with the help of that model the machine can make predictions and take decisions without being explicitly programmed. Also, the more the machine works with the training data, the more experience it gains, and the more experience it gains, the more efficient its results become.

Example: In a driverless car, the training data fed to the algorithm covers how to drive the car on highways and in busy, narrow streets, with factors such as speed limits, parking and stopping at signals. A logical and mathematical model is then created on that basis, and the car subsequently operates according to that model. Again, the more data that is fed, the more efficient the output produced.
Designing a Learning System in Machine Learning :
Steps for Designing Learning System are:

Step 1 - Choosing the Training Experience: The first and most important task is to choose the training data or training experience that will be fed to the machine learning algorithm. It is important to note that the data or experience fed to the algorithm has a significant impact on the success or failure of the model, so the training data or experience should be chosen wisely.
Below are the attributes that impact the success or failure of the model:
 Whether the training experience provides direct or indirect feedback regarding the choices made. For example, while playing chess the training experience can provide feedback to the learner such as "had this move been chosen instead, the chances of success would increase".
 The second important attribute is the degree to which the learner controls the sequence of training examples. For example, when training data is first fed to the machine its accuracy is very low, but as it gains experience by playing again and again against itself or an opponent, the algorithm receives feedback and adjusts its play accordingly.
 The third important attribute is how well the training experience represents the distribution of examples over which the final performance will be measured. A machine learning algorithm gains experience by going through a number of different cases and examples; the more examples it passes through, the more experience it gains and the better its performance becomes.
Step 2 - Choosing the Target Function: The next important step is choosing the target function. According to the knowledge fed to the algorithm, the machine learning system chooses a NextMove function that describes what type of legal move should be taken. For example, while playing chess against an opponent, when the opponent plays, the machine learning algorithm decides which of the possible legal moves to take in order to succeed.
Step 3 - Choosing a Representation for the Target Function: Once the machine knows all the possible legal moves, the next step is to choose a representation for selecting the optimized move, e.g., linear equations, a hierarchical graph representation, a tabular form, etc. The NextMove function then selects, out of the candidate moves, the one expected to give the highest success rate. For example, if the machine has four possible chess moves, it chooses the optimized move that gives it the best chance of success.
Step 4 - Choosing a Function Approximation Algorithm: An optimized move cannot be chosen with the training data alone. The system has to work through a set of examples; from these examples it approximates which steps should be chosen, and the outcomes then provide feedback. For example, when training data for playing chess is fed to the algorithm, it is not known in advance whether the machine
will fail or succeed; from each failure or success it estimates which step should be chosen for the next move and what its success rate is.
Step 5 - Final Design: The final design emerges after the system has gone through many examples, failures and successes, and correct and incorrect decisions, learning at each stage what the next step should be. Example: Deep Blue was an intelligent, ML-based computer that won a chess match against the chess expert Garry Kasparov, becoming the first computer to beat a human world chess champion.
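As a hedged illustration of Steps 2-4, the sketch below represents a board-game target function as a linear combination of hand-chosen board features, in the spirit of the classic checkers design; the feature names and weights are hypothetical and would normally be adjusted from the training experience.

def evaluate_board(features, weights):
    # V(b) = w0 + w1*x1 + ... + wn*xn for a board described by its features.
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

# Hypothetical features: [own pieces, opponent pieces, own kings, opponent kings]
board_features = [8, 6, 1, 0]
weights = [0.0, 1.0, -1.0, 2.0, -2.0]            # initial guess, to be refined by training
print(evaluate_board(board_features, weights))    # 4.0: this board looks favourable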

Types of Learning

In general, machine learning algorithms can be classified into three types: supervised learning, unsupervised learning and reinforcement learning.
Supervised learning: A training set of examples with the correct responses (targets) is provided and,
based on this training set, the algorithm generalises to respond correctly to all possible inputs. This is also
called learning from exemplars. Supervised learning is the machine learning task of learning a function
that maps an input to an output based on example input-output pairs. In supervised learning, each example
in the training set is a pair consisting of an input object (typically a vector) and an output value. A
supervised learning algorithm analyzes the training data and produces a function, which can be used for
mapping new examples. In the optimal case, the function will correctly determine the class labels for
unseen instances. Both classification and regression problems are supervised learning problems. A wide
range of supervised learning algorithms are available, each with its strengths and weaknesses. There is no
single learning algorithm that works best on all supervised learning problems.
A “supervised learning” is so called because the process of an algorithm learning from the training dataset
can be thought of as a teacher supervising the learning process. We know the correct answers (that is, the
correct outputs), the algorithm iteratively makes predictions on the training data and is corrected by the
teacher. Learning stops when the algorithm achieves an acceptable level of performance.
Example: Consider the following data regarding patients entering a clinic. The data consists of the gender and age of the patients, and each patient is labeled as "healthy" or "sick".
Gender Age Label
M 40 Sick
F 37 Healthy
M 74 Sick
M 23 Healthy
F 39 Sick
F 44 Healthy
M 56 Healthy
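A minimal sketch of supervised learning on the table above: the labelled rows are the training pairs, and a classifier (a decision tree here, chosen only for illustration; gender is encoded as M=0, F=1) learns a function that can label new, unseen patients.

from sklearn.tree import DecisionTreeClassifier

X = [[0, 40], [1, 37], [0, 74], [0, 23], [1, 39], [1, 44], [0, 56]]   # (gender, age) pairs
y = ["Sick", "Healthy", "Sick", "Healthy", "Sick", "Healthy", "Healthy"]

model = DecisionTreeClassifier().fit(X, y)    # learn from the labelled examples
print(model.predict([[0, 60]]))               # predicted label for a new 60-year-old male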
Unsupervised learning : Correct responses are not provided, but instead the algorithm tries to identify
similarities between the inputs so that inputs that have something in common are categorised together.
The statistical approach to unsupervised learning is known as density estimation. Unsupervised learning
is a type of machine learning algorithm used to draw inferences from datasets consisting of input data
without labeled responses. In unsupervised learning algorithms, a classification or categorization is not
included in the observations. There are no output values and so there is no estimation of functions. Since
the examples given to the learner are unlabeled, the accuracy of the structure that is output by the
algorithm cannot be evaluated. The most common unsupervised learning method is cluster analysis,
which is used for exploratory data analysis to find hidden patterns or grouping in data.
Example : Consider the following data regarding patients entering a clinic. The data consists of the gender
and age of the patients.
Gender Age
F 61
M 37
M 74
F 32
F 24
M 44
F 56
Based on this data, can we infer anything regarding the patients entering the clinic?
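With no labels, the most we can do is look for structure. A hedged sketch: clustering the patients above by age with k-means, using k = 2 clusters as an arbitrary illustrative choice.

import numpy as np
from sklearn.cluster import KMeans

ages = np.array([[61], [37], [74], [32], [24], [44], [56]])   # ages from the table above

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(ages)
print(km.labels_)            # cluster assignment for each patient (e.g. younger vs older)
print(km.cluster_centers_)   # the two age centroids the algorithm discovered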
Reinforcement learning: This is somewhere between supervised and unsupervised learning. The
algorithm gets told when the answer is wrong, but does not get told how to correct it. It has to explore and
try out different possibilities until it works out how to get the answer right. Reinforcement learning is
sometimes called learning with a critic because of this monitor that scores the answer but does not suggest
improvements. Reinforcement learning is the problem of getting an agent to act in the world so as to
maximize its rewards. A learner (the program) is not told what actions to take as in most forms of
machine learning, but instead must discover which actions yield the most reward by trying them. In the
most interesting and challenging cases, actions may affect not only the immediate reward but also the
next situations and, through that, all subsequent rewards.
Example Consider teaching a dog a new trick: we cannot tell it what to do, but we can reward/punish it if
it does the right/wrong thing. It has to find out what it did that made it get the reward/punishment. We can
use a similar method to train computers to do many tasks, such as playing backgammon or chess,
scheduling jobs, and controlling robot limbs. Reinforcement learning is different from supervised
learning. Supervised learning is learning from examples provided by a knowledgeable expert.
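A minimal, illustrative sketch of this reward-driven trial and error (a two-armed bandit, a simplified reinforcement-learning setting): the agent is never told the correct action, it only observes rewards and gradually prefers the action that pays off more. The reward probabilities and exploration rate below are assumptions chosen for illustration.

import random

estimates = [0.0, 0.0]            # the agent's estimated value of each action
counts = [0, 0]
true_reward_prob = [0.3, 0.7]     # hidden from the agent (part of the assumed environment)

for step in range(1000):
    # Explore occasionally; otherwise exploit the action that currently looks best.
    if random.random() < 0.1:
        action = random.randint(0, 1)
    else:
        action = 0 if estimates[0] >= estimates[1] else 1
    reward = 1.0 if random.random() < true_reward_prob[action] else 0.0
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]   # running average

print(estimates)   # estimates approach [0.3, 0.7]; the agent learns to prefer action 1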

Issues in Machine Learning : Our checkers example raises a number of generic questions about machine
learning. The field of machine learning is concerned with answering questions such as the following:
a. What algorithms exist for learning general target functions from specific training examples? In
what settings will particular algorithms converge to the desired function, given sufficient training
data? Which algorithms perform best for which types of problems and representations?
b. How much training data is sufficient? What general bounds can be found to relate the confidence
in learned hypotheses to the amount of training experience and the character of the learner's
hypothesis space?
c. When and how can prior knowledge held by the learner guide the process of generalizing from
examples? Can prior knowledge be helpful even when it is only approximately correct?
d. What is the best strategy for choosing a useful next training experience, and how does the choice
of this strategy alter the complexity of the learning problem?
e. What is the best way to reduce the learning task to one or more function approximation
problems? Put another way, what specific functions should the system attempt to learn? Can this
process itself be automated?
f. How can the learner automatically alter its representation to improve its ability to represent and
learn the target function?

History of Machine Learning

1960's and 70's: Models of human learning – high-level symbolic descriptions of knowledge, e.g., logical expressions or graphs/networks; Winston's (1975) structural learning system learned logic-based structural descriptions from examples. Minsky and Papert (1969) analysed the limitations of perceptrons.
1970's: Genetic algorithms – developed by Holland (1975).
1970’s - present: Knowledge-intensive learning – A tabula rasa approach typically fares poorly. “To
acquire new knowledge a system must already possess a great deal of initial knowledge.” Lenat’s CYC
project is a good example.
1970’s - present: Alternative modes of learning (besides examples) – Learning from instruction, e.g.,
(Mostow, 1983) (Gordon & Subramanian, 1993) – Learning by analogy, e.g., (Veloso, 1990) – Learning
from cases, e.g., (Aha, 1991) – Discovery (Lenat, 1977) – 1991: The first of a series of workshops on
Multistrategy Learning (Michalski)
1970’s – present: Meta-learning – Heuristics for focusing attention, e.g., (Gordon & Subramanian, 1996)
– Active selection of examples for learning, e.g., (Angluin, 1987), (Gasarch & Smith, 1988), (Gordon,
1991) – Learning how to learn, e.g., (Schmidhuber, 1996)
1980 – The First Machine Learning Workshop was held at Carnegie-Mellon University in Pittsburgh.
1980 – Three consecutive issues of the International Journal of Policy Analysis and Information Systems
were specially devoted to machine learning.
1981 - Hinton, Jordan, Sejnowski, Rumelhart, McClelland at UCSD – Backpropagation algorithm; PDP book
1986 – The establishment of the Machine Learning journal.
1987 – The beginning of annual international conferences on machine learning (ICML). Snowbird ML
conference
1988 – The beginning of regular workshops on computational learning theory (COLT).
1990's – Explosive growth in the field of data mining, which involves the application of machine learning techniques.
Bottom line from history:
1960 – The Perceptron (Rosenblatt; analysed by Minsky and Papert, 1969)
1960 – “Bellman Curse of Dimensionality”
1980 – Bounds on statistical estimators (C. Stone)
1990 – Beginning of high-dimensional data (hundreds of variables)
2000 – High-dimensional data (thousands of variables)
A glimpse into the future:
Today's status: first-generation algorithms – neural nets, decision trees, etc.
Future: smart remote controls, phones, cars – data and communication networks, software

Approaches of Machine Learning:


Decision Tree: Decision Trees are a type of Supervised Machine Learning (that is you explain what the
input is and what the corresponding output is in the training data) where the data is continuously split
according to a certain parameter. The tree can be explained by two entities, namely decision nodes and
leaves. The leaves are the decisions or the final outcomes. And the decision nodes are where the data is
split.
An example of a decision tree can be explained using a binary tree. Let's say you want to predict whether a person is fit given information such as their age, eating habits, and physical activity. The decision nodes are questions like 'What is the age?', 'Does the person exercise?', and 'Does the person eat a lot of pizza?', and the leaves are outcomes such as 'fit' or 'unfit'.

In this case it was a binary classification problem (a yes/no type problem). There are two main types of Decision Trees:
1. Classification trees (Yes/No types): What we have seen above is an example of a classification tree, where the outcome is a variable like 'fit' or 'unfit'. Here the decision variable is categorical.
2. Regression trees (continuous data types): Here the decision or outcome variable is continuous, e.g. a number like 123.
Working: Now that we know what a Decision Tree is, we'll see how it works internally. There are many algorithms that construct Decision Trees, but one of the best is
called the ID3 algorithm. ID3 stands for Iterative Dichotomiser 3. Before discussing the ID3 algorithm, we'll go through a few definitions.
Entropy, also called Shannon entropy and denoted by H(S) for a finite set S, is a measure of the amount of uncertainty or randomness in the data.
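A minimal sketch of computing H(S) for a small labelled set, the quantity ID3 uses when choosing splits; the label values are illustrative.

import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum over classes of p * log2(p), with p the class proportion in S.
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(entropy(["fit", "fit", "unfit", "unfit"]))            # 1.0  (maximum uncertainty)
print(entropy(["fit", "fit", "fit", "fit"]))                # 0.0  (a pure set)
print(round(entropy(["fit", "fit", "fit", "unfit"]), 3))    # 0.811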
Neural Networks (ANN - Artificial Neural Network) : The term "Artificial Neural Network" is derived
from Biological neural networks that develop the structure of a human brain. Similar to the human brain
that has neurons interconnected to one another, artificial neural networks also have neurons that are
interconnected to one another in various layers of the networks. These neurons are known as nodes. The
given figure 2 illustrates the typical diagram of Biological Neural Network.

Figure 2: Biological neural network

Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks, cell nucleus
represents Nodes, synapse represents Weights, and Axon represents Output.
The typical Artificial Neural Network looks something like the given figure 3.

Figure 3 : Artificial neural network

Relationship between Biological neural network and artificial neural network:

Biological neural network Artificial neural network


Dendrites Inputs
Cell nucleus Nodes
Synapse Weights
Axon Output
An Artificial Neural Network, in the field of Artificial Intelligence, attempts to mimic the network of neurons that makes up the human brain, so that computers can understand things and make decisions in a human-like manner. The artificial neural network is designed by programming computers to behave like interconnected brain cells. There are around 86 billion neurons in the human
brain. Each neuron has an association point somewhere in the range of 1,000 and 100,000. In the human
brain, data is stored in such a manner as to be distributed, and we can extract more than one piece of this
data when necessary from our memory parallelly. We can say that the human brain is made up of
incredibly amazing parallel processors. We can understand the artificial neural network with an example: consider a digital logic gate that takes inputs and gives an output, such as an "OR" gate with two inputs. If one or both inputs are "On," the output is "On"; if both inputs are "Off," the output is "Off." Here the output depends only on the input. Our brain does not work this way: the relationship between outputs and inputs keeps changing because the neurons in our brain are "learning."

The architecture of an artificial neural network:

An ANN consists of an input layer, one or more hidden layers, and an output layer.

Input Layer: As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer: The hidden layers sit between the input and output layers. They perform all the calculations needed to find hidden features and patterns.
Output Layer: The input goes through a series of transformations in the hidden layers, which finally result in the output that is conveyed through this layer.
The artificial neural network takes an input x, computes the weighted sum of the inputs and adds a bias b. This computation is represented in the form of a transfer function, z = Σ (wᵢ·xᵢ) + b.

The weighted total is then passed as input to an activation function, which produces the output. Activation functions decide whether a node should fire or not; only the nodes that fire pass their signal on towards the output layer. There are distinct activation functions available, chosen according to the sort of task we are performing.
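A minimal sketch of the computation just described for a single artificial neuron: a weighted sum of the inputs plus a bias, passed through an activation function (a sigmoid here; inputs, weights and bias are made-up values).

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [0.5, 0.2, 0.8]      # inputs (illustrative)
w = [0.4, -0.6, 0.9]     # weights (illustrative)
b = 0.1                  # bias

z = sum(wi * xi for wi, xi in zip(w, x)) + b   # weighted sum plus bias
output = sigmoid(z)                            # the activation decides the node's output
print(round(z, 3), round(output, 3))           # 0.9  0.711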
Advantages of Artificial Neural Network (ANN)
1. Parallel processing capability: Artificial neural networks can perform more than one task simultaneously.
2. Storing data on the entire network: The learned information is stored across the whole network rather than in a single database, so the disappearance of a few pieces of data in one place does not prevent the network from working.
3. Capability to work with incomplete knowledge: After training, an ANN may produce output even with inadequate data. The loss of performance depends on the significance of the missing data.
4. Having a memory distribution: For an ANN to be able to adapt, it is important to determine the examples and to train the network according to the desired output by showing these examples to the network. The success of the network is directly proportional to the chosen instances; if the event cannot be shown to the network in all its aspects, the network may produce false output.
5. Having fault tolerance: Corruption of one or more cells of an ANN does not prevent it from generating output; this feature makes the network fault-tolerant.
Disadvantages of Artificial Neural Network:
1. Assurance of proper network structure: There is no particular guideline for determining the
structure of artificial neural networks. The appropriate network structure is accomplished through
experience, trial, and error.
2. Unrecognized behavior of the network: It is the most significant issue of ANN. When ANN
produces a testing solution, it does not provide insight concerning why and how. It decreases trust
in the network.
3. Hardware dependence: Artificial neural networks need processors with parallel processing power, as per their structure. Therefore, the realization of the network depends on the availability of suitable hardware.
4. Difficulty of showing the issue to the network: ANNs can work with numerical data. Problems
must be converted into numerical values before being introduced to ANN. The presentation
mechanism to be resolved here will directly impact the performance of the network. It relies on
the user's abilities.
5. The duration of training is unknown: Training reduces the network's error to a specific value, but this value does not guarantee optimum results. Artificial neural networks, which entered the world in the mid-20th century, are developing rapidly; here we have reviewed their advantages and the issues encountered in their use. It should not be overlooked that the drawbacks of ANNs, a flourishing branch of science, are being eliminated one by one while their advantages grow day by day, which means that artificial neural networks will progressively become an irreplaceable part of our lives.
How do artificial neural networks work? Artificial Neural Network can be best represented as a
weighted directed graph, where the artificial neurons form the nodes. The association between the
neurons outputs and neuron inputs can be viewed as the directed edges with weights. The Artificial
Neural Network receives the input signal from the external source in the form of a pattern and image
in the form of a vector. These inputs are then mathematically assigned by the notations x(n) for every
n number of inputs.
Types of Artificial Neural Network: There are various types of Artificial Neural Networks (ANNs), which, modelled on how neurons and networks function in the human brain, perform tasks in a similar way. The majority of artificial neural networks have some similarities with their more complex biological counterpart and are very effective at their intended tasks, for example segmentation or classification.
Feedback ANN: In this type of ANN, the output is fed back into the network to internally arrive at the best-evolved result. According to the University of Massachusetts Lowell Center for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize feedback ANNs.
Feed-Forward ANN: A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one hidden layer of neurons. By assessing its output against its input, the strength of the network can be judged from the group behaviour of the associated neurons, and the output is decided. The primary advantage of this network is that it learns to evaluate and recognize input patterns.

Support Vector Machines


Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which
is used for Classification as well as Regression problems. However, primarily, it is used for
Classification problems in Machine Learning. The goal of the SVM algorithm is to create the best line
or decision boundary that can segregate n-dimensional space into classes so that we can easily put the
new data point in the correct category in the future. This best decision boundary is called a
hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed a Support Vector
Machine. Consider the diagram below, in which two different categories are classified
using a decision boundary or hyperplane:

Example: Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train our model with lots of images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. Since the support vectors define a decision boundary between the two classes (cat and dog) using the extreme cases, the model examines the extreme cases of cats and dogs. On the basis of the support vectors, it will classify the creature as a cat. Consider the diagram below:

SVM algorithm can be used for Face detection, image classification, text categorization, etc.
Types of SVM :
SVM can be of two types:
Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified into two classes using a single straight line, then such data is termed linearly separable data, and the classifier used is called a linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if a dataset cannot be classified using a straight line, then such data is termed non-linear data, and the classifier used is called a non-linear SVM classifier.
Hyperplane and Support Vectors in the SVM algorithm:

Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM. The dimension of the hyperplane depends on the number of features present in the dataset: if there are 2 features (as shown in the image), then the hyperplane is a straight line.

If there are 3 features, then the hyperplane is a 2-dimensional plane. We always create the hyperplane that has the maximum margin, i.e., the maximum distance between the hyperplane and the nearest data points of either class.

Support Vectors: The data points or vectors that are closest to the hyperplane and which affect the position of the hyperplane are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
How does SVM work?
Linear SVM: The working of the SVM algorithm can be understood using an example. Suppose we have a dataset with two classes (circle and diamond) and two features, x1 and x2. We want a classifier that can classify a pair (x1, x2) of coordinates into one of the two classes. Consider the image below:

Since this is a 2-D space, we can easily separate the two classes using a straight line. But there can be multiple lines that separate these classes. Consider the image below:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the points of each class closest to the boundary; these points are called support vectors. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
Non-Linear SVM: If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data we cannot draw a single straight line. Consider the image below:

To separate these data points, we need to add one more dimension. For linear data we have used two dimensions, x and y, so for non-linear data we add a third dimension z, calculated as z = x² + y². By adding the third dimension, the sample space becomes as shown in the image below:

Now SVM divides the dataset into classes in the following way (consider the image below):

Since we are in 3-D space, the separating boundary looks like a plane parallel to the x-axis. If we convert it back to 2-D space with z = 1, the boundary becomes a circle:

Hence we get a circumference of radius 1 in the case of non-linear data.


SVM Kernels: In practice, the SVM algorithm is implemented with a kernel that transforms the input data space into the required form. SVM uses a technique called the kernel trick, in which the kernel takes a low-dimensional input space and transforms it into a higher-dimensional space. In simple words, the kernel converts non-separable problems into separable problems by adding more dimensions. This makes SVM more powerful, flexible and accurate. The following are some of the types of kernels used by SVM.
Linear kernel: It can be used as a dot product between any two observations. The formula of the linear kernel is:
K(x, xi) = sum(x * xi)
From the formula, we can see that the product between two vectors x and xi is the sum of the multiplication of each pair of input values.
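A minimal sketch of the linear kernel above: K(x, xi) is just the dot product of the two vectors, and scikit-learn's SVC with kernel="linear" uses this similarity to find the maximum-margin boundary. The example data points are illustrative.

import numpy as np
from sklearn.svm import SVC

x  = np.array([1.0, 2.0, 3.0])
xi = np.array([0.5, 1.0, 2.0])
print(np.sum(x * xi))          # K(x, xi) = 1*0.5 + 2*1.0 + 3*2.0 = 8.5

X = [[1, 1], [2, 1], [6, 5], [7, 6]]   # two small, linearly separable groups (made up)
y = [0, 0, 1, 1]
clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)            # the extreme points that define the margin
print(clf.predict([[5, 5]]))           # classify a new point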

Clustering :
Introduction to clustering
As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a training dataset. Instead, the model itself finds the hidden patterns and insights in the given data. It can be compared to the learning which takes place in the human brain while learning
new things. It can be defined as: "Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision." Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.

Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of
different types of cats and dogs. The algorithm is never trained upon the given dataset, which means it
does not have any idea about the features of the dataset. The task of the unsupervised learning
algorithm is to identify the image features on their own. Unsupervised learning algorithm will
perform this task by clustering the image dataset into the groups according to similarities between
images.

Below are some of the main reasons that describe the importance of unsupervised learning:
i. Unsupervised learning is helpful for finding useful insights from the data.
ii. Unsupervised learning is much like how a human learns to think from their own experiences, which makes it closer to real AI.
iii. Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.
iv. In the real world, we do not always have input data with corresponding output, so to solve such cases we need unsupervised learning.

Working of unsupervised learning can be understood by the below diagram:



Here we have taken unlabeled input data, which means it is not categorized and the corresponding outputs are not given either. This unlabeled input data is fed to the machine learning model in order to train it. First the model interprets the raw data to find hidden patterns, and then it applies a suitable algorithm such as k-means clustering. Once the suitable algorithm is applied, it divides the data objects into groups according to the similarities and differences between the objects.

Types of Unsupervised Learning Algorithm: Unsupervised learning algorithms can be further categorized into two types of problems:

Clustering: Clustering is a method of grouping objects into clusters such that the objects with the most similarities remain in one group and have few or no similarities with the objects of another group. Cluster analysis finds the commonalities between the data objects and categorizes them according to the presence and absence of those commonalities.
Association: An association rule is an unsupervised learning method used for finding relationships between variables in a large database. It determines sets of items that occur together in the dataset. Association rules make marketing strategies more effective; for example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical example of association rules is Market Basket Analysis.
Advantages of Unsupervised Learning :
I. Unsupervised learning is used for more complex tasks as compared to supervised learning
because, in unsupervised learning, we don't have labeled input data.
II. Unsupervised learning is preferable as it is easy to get unlabeled data in comparison to labeled
data.
Disadvantages of Unsupervised Learning :
I. Unsupervised learning is intrinsically more difficult than supervised learning as it does not have
corresponding output.
II. The result of the unsupervised learning algorithm might be less accurate as input data is not
labeled, and algorithms do not know the exact output in advance.

Genetic Algorithm Learning:


Genetic algorithms (GAs) provide a learning method motivated by an analogy to biological evolution. Rather than searching from general-to-specific hypotheses, or from simple-to-complex, GAs generate successor hypotheses by repeatedly mutating and recombining parts of the best currently known hypotheses. At each step, a collection of hypotheses called the current population is updated by replacing some fraction of the population by offspring of the most fit current hypotheses. The process forms a generate-and-test beam search of hypotheses, in which variants of the best current hypotheses are most likely to be considered next. The popularity of GAs is motivated by a number of factors, including: evolution is known to be a successful, robust method for adaptation within biological systems; GAs can search spaces of hypotheses containing complex interacting parts, where the impact of each part on overall hypothesis fitness may be difficult to model; and genetic algorithms are easily parallelized and can take advantage of the decreasing cost of powerful computer hardware.
The problem addressed by GAs is to search a space of candidate hypotheses to identify the best hypothesis. In GAs the "best hypothesis" is defined as the one that optimizes a predefined numerical measure for the problem at hand, called the hypothesis fitness. For example, if the learning task is the problem of approximating an unknown function given training examples of its input and output, then fitness could be defined as the accuracy of the hypothesis over this training data. If the task is to learn a strategy for playing chess, fitness could be defined as the number of games won by the individual when playing against other individuals in the current population. Although different implementations of genetic algorithms vary in their details, they typically share the following structure: the algorithm operates by iteratively updating a pool of hypotheses, called the population. On each iteration, all members of the population are evaluated according to the fitness function. A new population is then generated by probabilistically selecting the fittest individuals from the current population. Some of these selected individuals are carried forward into the next-generation population intact. Others are used as the basis for creating new offspring individuals by applying genetic operators such as crossover and mutation.
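A hedged, minimal sketch of the generic GA loop just described, applied to a toy problem (maximise the number of 1s in a bit string). The fitness function, population size, mutation rate and selection scheme are illustrative choices, not the only possible ones.

import random

LENGTH, POP, GENERATIONS = 20, 30, 40

def fitness(h):                       # the predefined numerical measure to optimise
    return sum(h)

def crossover(a, b):                  # single-point crossover of two parent hypotheses
    cut = random.randint(1, LENGTH - 1)
    return a[:cut] + b[cut:]

def mutate(h, rate=0.02):             # flip each bit with a small probability
    return [1 - bit if random.random() < rate else bit for bit in h]

population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENERATIONS):
    ranked = sorted(population, key=fitness, reverse=True)   # evaluate all hypotheses
    parents = ranked[:POP // 2]                              # select the fitter half
    # Keep the selected individuals intact; fill the rest with mutated offspring.
    offspring = [mutate(crossover(random.choice(parents), random.choice(parents)))
                 for _ in range(POP - len(parents))]
    population = parents + offspring

print(max(fitness(h) for h in population))   # approaches 20, the best possible fitness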

Data Science vs. Machine Learning:


Data science is a concept used to tackle big data and includes data cleansing, preparation, and analysis. A
data scientist gathers data from multiple sources and applies machine learning, predictive analytics, and
sentiment analysis to extract critical information from the collected data sets. They understand data from a
business point of view and can provide accurate predictions and insights that can be used to power critical
business decisions. Data science is an umbrella term that encompasses data analytics, data mining,
machine learning, and several other related disciplines.
Machine learning can be defined as the practice of using algorithms to extract information from data, learn from it, and then forecast future trends on that topic. Traditional machine learning software applies statistical analysis and predictive analysis to spot patterns and uncover hidden insights based on the observed data.

A good example of a machine learning implementation is Facebook. Facebook's machine learning algorithms gather behavioral information for every user on the social platform. Based on one's past behavior, the algorithm predicts interests and recommends articles and notifications on the news feed. Similarly, when Amazon recommends products, or when Netflix recommends movies based on past behavior, machine learning is at work.

***********************
