Ai Module-5note
***************************************************************************
SYLLABUS- MODULE 5 (Machine Learning)
Learning from Examples –Forms of Learning, Supervised Learning, Learning Decision Trees,
Evaluating and choosing the best hypothesis, Regression and classification with Linear models.
***************************************************************************
Learning From Examples
Learning agents are agents that can improve their behavior through diligent study of their own
experiences.
An agent is learning if it improves its performance on future tasks after making observations
about the world. Machine Learning is defined as a technology that is used to train machines to
perform various actions such as predictions, recommendations, estimations, etc., based on
historical data or past experience.
Machine Learning enables computers to learn such behavior by training them on past experience
(historical data) rather than by explicit programming.
There are three key aspects of Machine Learning, which are as follows:
1. Task: The task is the main problem we are interested in solving, such as making
predictions, recommendations, or estimations.
TRACE KTU
2. Experience: Experience is the historical or past data that the machine learns from and
uses to estimate and resolve future tasks.
3. Performance: Performance is the capacity of a machine to resolve a machine learning task
and provide the best outcome for it. How performance is measured depends on the type of
machine learning problem.
Techniques in Machine Learning
Machine Learning techniques are divided mainly into the following 4 categories:
1. Supervised Learning
Supervised learning is applicable when a machine has sample data with correct labels, i.e.,
inputs together with their expected outputs. The labels are used to check the correctness of the
model's predictions. Supervised learning helps us to predict future events with the help of past
experience and labeled examples. The algorithm first analyses the known training dataset and
then produces an inferred function that makes predictions about output values. During
training, it compares its predictions with the correct labels to measure the error and adjusts
the model to reduce that error.
Example: Let's assume we have a set of images, each tagged as "dog" or "not dog". A machine
learning algorithm trained on these labeled images can then decide whether a new image shows a
dog or not.
2. Unsupervised Learning
In unsupervised learning, a machine is trained with input samples only; the corresponding
outputs (labels) are not known. Because the training information is neither classified nor
labeled, there is no direct way to check the result, so the machine may not always provide
correct output compared to supervised learning.
Example: Let's assume a machine is given a set of documents that belong to different
categories (Type A, B, and C), and we have to organize them into appropriate groups. Because
the machine is provided only with input samples and no output labels, it can organize the
documents into three groups, but there is no guarantee that these groups match the true
categories.
3. Reinforcement Learning
Reinforcement Learning is a feedback-based machine learning technique. In this type of
learning, agents (computer programs) explore the environment, perform actions, and receive
rewards as feedback on those actions. For each good action they get a positive reward, and for
each bad action they get a negative reward. The goal of a Reinforcement Learning agent is to
maximize the total reward it receives. Since there is no labeled data, the agent is bound to
learn from its experience only.
4. Semi-supervised Learning
Semi-supervised Learning is intermediate between supervised and unsupervised learning. It
operates on datasets that contain a few labeled examples together with unlabeled data, and the
unlabeled data usually makes up most of the dataset. Because labels are costly to obtain, using
only a few of them reduces the cost of building the machine learning model, while the unlabeled
data can still improve its accuracy and performance.
Semi-supervised learning helps data scientists to overcome the drawbacks of supervised and
unsupervised learning. Speech analysis, web content classification, protein sequence
classification, and text document classification are some important applications of
Semi-supervised learning.
Machine Learning is now used in almost every sector, including healthcare, marketing, finance,
infrastructure, and automation. Some important real-world examples of machine learning are as
follows:
Healthcare:
Self-learning neural networks help specialists provide quality treatment by analyzing external
data on a patient's condition, X-rays, CT scans, various tests, and screenings. Other than
treatment, machine learning is also helpful for tasks such as automatic billing, clinical
decision support, and the development of clinical care guidelines.
Marketing:
Machine learning helps marketers to create, test, and evaluate hypotheses and to analyze
datasets. It helps us to quickly make predictions from large volumes of data. It is also
helpful in stock trading, as much of the trading is done by bots based on calculations from
machine learning algorithms. Deep learning neural networks such as Convolutional Neural
Networks, Recurrent Neural Networks, and Long Short-Term Memory networks help to build trading
models.
Self-driving cars:
This is one of the most exciting applications of machine learning in today's world. Machine
learning plays a vital role in developing self-driving cars. Various automobile companies like
Tesla, Tata, etc., are continuously working on the development of self-driving cars. This is
made possible by supervised learning, in which a machine is trained to detect people and
objects while driving.
Speech Recognition:
Speech Recognition is one of the most popular applications of machine learning. Nowadays,
many mobile applications come with a voice search facility. This "Search By Voice"
facility is also a part of speech recognition. In this method, voice instructions are converted
into text, which is known as "Speech to text" or "Computer speech recognition".
Google Assistant, Siri, Alexa, Cortana, etc., are some famous applications of speech
recognition.
Traffic Prediction:
Machine Learning also helps us to find the shortest route to our destination in Google Maps.
It also helps in predicting traffic conditions, i.e., whether a route is clear or congested,
using real-time location data from the Google Maps app and from sensors.
Image Recognition:
Image recognition is also an important application of machine learning for identifying objects,
persons, places, etc. Face detection and automatic friend-tagging suggestions are among the best
known applications of image recognition, used by Facebook, Instagram, etc. Whenever we upload
photos with our Facebook friends, it automatically suggests their names through image
recognition technology.
Product Recommendations:
Machine Learning is widely used in business for the marketing of various products. Many
companies, such as Amazon, Alibaba, Walmart, and Netflix, use machine learning techniques to
recommend products to their users. Whenever we search for a product on their websites, we
start seeing advertisements for similar products. This is made possible by machine learning
algorithms that learn users' interests and, based on past data, suggest products to the user.
Automatic Translation:
Automatic language translation is also one of the most significant applications of machine
learning. It is based on sequence algorithms that translate text from one language into
another. Google Neural Machine Translation (GNMT) provides this feature using neural machine
learning. Further, you can also translate selected text in images as well as complete
documents through Google Lens.
Virtual Assistant:
A virtual personal assistant is also one of the most popular applications of machine learning.
First, it records our voice and sends it to a cloud-based server, which then decodes it with
the help of machine learning algorithms. Big companies like Amazon, Google, etc., use these
features for playing music, calling someone, opening an app, searching data on the internet,
etc.
Supervised learning
Supervised learning is the type of machine learning in which machines are trained using well
"labelled" training data, and on the basis of that data, machines predict the output. Labelled
data means input data that is already tagged with the correct output.
In supervised learning, the training data provided to the machines works as the supervisor that
teaches the machines to predict the output correctly. It applies the same concept as a student
learning under the supervision of a teacher.
Supervised learning is a process of providing input data as well as correct output data to the
machine learning model. The aim of a supervised learning algorithm is to find a mapping
function to map the input variable(x) with the output variable(y).
In the real world, supervised learning can be used for risk assessment, image classification,
fraud detection, spam filtering, etc.
Types of supervised Machine learning Algorithms:
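Supervised learning can be further divided into two types of problems:
1. Regression
Regression algorithms are used when the output variable is a continuous value, such as price,
salary, or age. Common regression algorithms include:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
2. Classification
Classification algorithms are used when the output variable is categorical, for example when
there are two classes such as Yes-No, Male-Female, or True-False. Spam filtering is a typical
example. Common classification algorithms include: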
• Random Forest
• Decision Trees
• Logistic Regression
• Support vector Machines
Advantages of Supervised learning:
With the help of supervised learning, the model can predict the output on the basis of prior
experience. In supervised learning, we have an exact idea about the classes of objects.
Supervised learning models help us to solve various real-world problems such as fraud
detection, spam filtering, etc.
Disadvantages of supervised learning:
Supervised learning models are not well suited to handling very complex tasks. Supervised
learning may not predict the correct output if the test data differs from the training dataset.
Training requires a lot of computation time. In supervised learning, we need enough
knowledge about the classes of objects.
Unsupervised Learning
As the name suggests, unsupervised learning is a machine learning technique in which models
are not supervised using a labeled training dataset. Instead, the models themselves find hidden
patterns and insights in the given data. It can be compared to the learning that takes place in
the human brain while learning new things.
Unsupervised learning cannot be directly applied to a regression or classification problem
because, unlike supervised learning, we have the input data but no corresponding output data.
The goal of unsupervised learning is to find the underlying structure of the dataset, group the
data according to similarities, and represent the dataset in a compressed format.
Example: Suppose the unsupervised learning algorithm is given an input dataset containing
images of different types of cats and dogs. The algorithm is never given labels for this
dataset, which means it has no prior idea about the features that distinguish the images. The
task of the unsupervised learning algorithm is to identify the image features on its own. It
performs this task by clustering the image dataset into groups according to the similarities
between images.
• Clustering: Clustering is a method of grouping objects into clusters such that objects
with the most similarities remain in one group and have few or no similarities with the
objects of another group. Cluster analysis finds the commonalities between the data
objects and categorizes them according to the presence or absence of those commonalities.
• Association: Association rule learning is an unsupervised learning method used for
finding relationships between variables in a large database. It determines the sets
of items that occur together in the dataset. Association rules make marketing strategy
more effective; for example, people who buy item X (say, bread) also tend to
purchase item Y (butter or jam). A typical example of association rule learning is Market
Basket Analysis.
Unsupervised Learning algorithms:
Reinforcement learning
Reinforcement learning is an area of Machine Learning. It is about taking suitable actions to
maximize reward in a particular situation. It is employed by various software and machines to
find the best possible behavior or path to take in a specific situation. Reinforcement
learning differs from supervised learning in that, in supervised learning, the training data
comes with an answer key, so the model is trained on the correct answers, whereas in
reinforcement learning there is no answer key and the reinforcement agent decides what to do to
perform the given task. In the absence of a training dataset, it is bound to learn from its
experience.
Example: We have an agent and a reward, with many hurdles in between. The agent is supposed
to find the best possible path to reach the reward.
Semi-supervised learning
In semi-supervised learning we are given a few labeled examples and must make what we can
of a large collection of unlabeled examples. Even the labels themselves may not be the oracular
truths that we hope for. Imagine that you are trying to build a system to guess a person’s age
from a photo. You gather some labeled examples by snapping pictures of people and asking
their age. That’s supervised learning. But in reality some of the people lied about their age. It’s
not just that there is random noise in the data; rather the inaccuracies are systematic, and to
uncover them is an unsupervised learning problem involving images, self-reported ages, and
true (unknown) ages. Thus, both noise and lack of labels create a continuum between
supervised and unsupervised learning.
Consider fitting a function of a single variable to some data points. The examples are points
in the (x, y) plane, where y = f(x). We don’t know what f is, but we will approximate it with a
function h selected from a hypothesis space, H, which for this example we will take to be the
set of polynomials, such as x^5 + 3x^2 + 2.
Figure (a) shows some data with an exact fit by a straight line. The line is called a consistent
hypothesis because it agrees with all the data.
Figure (b) shows a high degree polynomial that is also consistent with the same data. This
illustrates a fundamental problem in inductive learning: how do we choose from among
multiple consistent hypotheses? One answer is to prefer the simplest hypothesis consistent with
the data.
This principle is called Ockham’s razor, after the 14th-century English philosopher William
of Ockham, who used it to argue sharply against all sorts of complications. Defining simplicity
is not easy, but it seems clear that a degree-1 polynomial is simpler than a degree-7 polynomial,
and thus (a) should be preferred to (b).
Figure (c) shows a second data set. There is no consistent straight line for this data set; in fact,
it requires a degree-6 polynomial for an exact fit. There are just 7 data points, so a polynomial
with 7 parameters does not seem to be finding any pattern in the data and we do not expect it
to generalize well. A straight line that is not consistent with any of the data points, but might
generalize fairly well for unseen values of x, is also shown in (c). In general, there is a tradeoff
between complex hypotheses that fit the training data well and simpler hypotheses that may
generalize better.
In Figure (d) we expand the hypothesis space H to allow polynomials over both x and sin(x), and
find that the data in (c) can be fitted exactly by a simple function of the form ax + b + c sin(x).
This shows the importance of the choice of hypothesis space. We say that a learning problem
is realizable if the hypothesis space contains the true function. Unfortunately, we cannot
always tell whether a given learning problem is realizable, because the true function is not
known.
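The trade-off between a complex hypothesis that fits exactly and a simple one that generalizes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the seven data points are made up (they are not the data from the figures), and `fit_line` and `lagrange` are hypothetical helper names. The degree-6 polynomial is built by Lagrange interpolation, so it passes through all seven points.

```python
# Assumed toy data: seven points that roughly follow y = x, with small noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.2, 5.8]


def fit_line(xs, ys):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lagrange(xs, ys, x):
    """Evaluate at x the unique degree-(n-1) polynomial through all n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


a0, a1 = fit_line(xs, ys)


def line(x):
    return a0 + a1 * x


# Largest training error of each hypothesis.
train_err_line = max(abs(line(x) - y) for x, y in zip(xs, ys))
train_err_poly = max(abs(lagrange(xs, ys, x) - y) for x, y in zip(xs, ys))

# Prediction at an unseen input just outside the training range.
x_new = 7.0
print("line:", line(x_new), "degree-6 polynomial:", lagrange(xs, ys, x_new))
```

The polynomial is consistent with every training point while the line is not, yet at the unseen input the line stays close to the underlying trend and the polynomial swings far away from it.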
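Supervised learning can be done by choosing the hypothesis h∗ that is most probable given the
data:
$$h^{*} = \operatorname*{argmax}_{h \in H} P(h \mid \text{data})$$
By Bayes' rule this is equivalent to
$$h^{*} = \operatorname*{argmax}_{h \in H} P(\text{data} \mid h)\, P(h)$$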
Then we can say that the prior probability P(h) is high for a degree-1 or -2 polynomial, lower
for a degree-7 polynomial, and especially low for degree-7 polynomials with large, sharp spikes
as in Figure 18.1(b). We allow unusual-looking functions when the data say we really need
them, but we discourage them by giving them a low prior probability.
Regression vs. Classification in Machine Learning
Regression and Classification algorithms are Supervised Learning algorithms. Both are used
for prediction in Machine Learning and work with labeled datasets. The difference between
them lies in the kind of machine learning problem each is used for.
The main difference is that Regression algorithms are used to predict continuous values such
as price, salary, age, etc., whereas Classification algorithms are used to predict or classify
discrete values such as Male or Female, True or False, Spam or Not Spam, etc.
Classification:
Classification is a process of finding a function which helps in dividing the dataset into classes
based on different parameters. In Classification, a computer program is trained on the training
dataset and based on that training, it categorizes the data into different classes.
The task of the classification algorithm is to find the mapping function to map the input(x) to
the discrete output(y).
Example: The best example to understand the Classification problem is Email Spam Detection.
The model is trained on the basis of millions of emails on different parameters, and whenever
it receives a new email, it identifies whether the email is spam or not. If the email is spam, then
it is moved to the Spam folder.
Regression:
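Linear regression shows a linear relationship between variables, which means it finds how the
value of the dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship
between the variables. This line can be written as:
Y = a0 + a1X + ε
Here,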
Y= Dependent Variable (Target Variable)
X= Independent Variable (predictor Variable)
a0= intercept of the line (Gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor to each input value).
ε = random error
Linear Regression Line
A straight line showing the relationship between the dependent and independent variables is
called a regression line. A regression line can show two types of relationship:
Positive Linear Relationship:
If the dependent variable increases on the Y-axis as the independent variable increases on the
X-axis, then such a relationship is termed a positive linear relationship.
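For linear regression, the Mean Squared Error (MSE) cost function is the average of the squared
differences between the actual and predicted values:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\bigl(Y_i - (a_1 x_i + a_0)\bigr)^2$$
Where,
N = Total number of observations
Yi = Actual value
(a1xi + a0) = Predicted value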
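The quantities listed above (N, the actual values Yi, and the predicted values a1xi + a0) combine into the mean squared error. A minimal Python sketch, assuming a hypothetical helper name `mse` and made-up data that lies exactly on the line y = 2x + 1:

```python
def mse(xs, ys, a0, a1):
    """Mean squared error of the line y = a1*x + a0 on the points (xs, ys)."""
    n = len(xs)
    return sum((y - (a1 * x + a0)) ** 2 for x, y in zip(xs, ys)) / n


# Assumed toy data generated from y = 2x + 1 (no noise).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

print(mse(xs, ys, a0=1.0, a1=2.0))  # the true line: 0.0
print(mse(xs, ys, a0=0.0, a1=2.0))  # intercept off by 1: every residual is 1, so 1.0
```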
Gradient Descent
Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
A regression model uses gradient descent to update the coefficients of the line so as to reduce
the cost function. It starts from randomly selected coefficient values and then iteratively
updates them to reach the minimum of the cost function.
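The update loop can be sketched as follows. This is an illustration under assumptions, not a reference implementation: the learning rate `lr`, the number of `steps`, and the toy data are all chosen here for the example, and the coefficients start at zero instead of random values so that the run is reproducible.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y = a1*x + a0 by gradient descent on the mean squared error."""
    a0, a1 = 0.0, 0.0  # assumed fixed start; the notes describe a random start
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line: prediction minus actual value.
        errors = [(a1 * x + a0) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to a0 and a1.
        grad_a0 = 2.0 / n * sum(errors)
        grad_a1 = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        # Step against the gradient to reduce the cost.
        a0 -= lr * grad_a0
        a1 -= lr * grad_a1
    return a0, a1


# Assumed toy data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a0, a1 = gradient_descent(xs, ys)
print(a0, a1)  # approaches intercept 1 and slope 2
```

A learning rate that is too large makes the updates diverge, and one that is too small makes convergence slow, so in practice it is tuned for the data at hand.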
Model Performance
The goodness of fit determines how well the regression line fits the set of observations. The
process of finding the best model out of various models is called optimization. It can be
achieved with the R-squared method. R-squared is a statistical measure of goodness
of fit. It measures the strength of the relationship between the dependent and independent
variables on a scale of 0-100%. A high value of R-squared indicates a small difference
between the predicted values and actual values and hence represents a good model. It is also
called the coefficient of determination, or the coefficient of multiple determination for
multiple regression. It can be calculated as:
R-squared = Explained variation / Total variation
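One common way to compute R-squared, assumed here, is 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. For a least-squares line with an intercept this equals the fraction of total variation explained by the model. The helper name `r_squared` and the data are illustrative only.

```python
def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


ys = [3.0, 5.0, 7.0, 9.0]
perfect = [3.0, 5.0, 7.0, 9.0]    # predictions equal to the actual values
mean_only = [6.0, 6.0, 6.0, 6.0]  # always predicting the mean of ys

print(r_squared(ys, perfect))    # 1.0, i.e. 100%
print(r_squared(ys, mean_only))  # 0.0, i.e. 0%
```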
The relationship shown by a Simple Linear Regression model is linear, i.e., a sloped straight
line, hence it is called Simple Linear Regression.
The key point in Simple Linear Regression is that the dependent variable must be a
continuous/real value. However, the independent variable can be measured on continuous or
categorical values.
The Simple Linear Regression algorithm has mainly two objectives:
• Model the relationship between two variables, such as the relationship between
income and expenditure, or experience and salary.
• Forecast new observations, such as weather according to temperature, or the
revenue of a company according to its investments in a year.
Simple Linear Regression Model:
The Simple Linear Regression model can be represented using the below equation:
y = a0 + a1x + ε
Where,
a0 = the intercept of the regression line (obtained by putting x = 0)
a1 = the slope of the regression line, which tells whether the line is increasing or decreasing
ε = the error term (negligible for a good model)
Example:
Prediction of CO2 emission based on engine size and number of cylinders in a car.
For Multiple Linear Regression (MLR), the dependent or target variable (Y) must be
continuous/real, but the predictor or independent variables may be of continuous or
categorical form.
Each feature variable must have a linear relationship with the dependent variable.
Multivariate Regression
Multivariate regression allows one to have a different view of the relationship between various
variables from all the possible angles. It helps you to predict the behaviour of the response
variables depending on how the predictor variables move. The multivariate regression method
helps you find a relationship between multiple variables or features.
of the model if it is not removed.
Bias: Bias is a prediction error that is introduced in the model by oversimplifying the
machine learning algorithm. It can also be described as the difference between the predicted
values and the actual values.
Variance: If the machine learning model performs well with the training dataset but does not
perform well with the test dataset, the model is said to have high variance.
Generalization: It shows how well a trained model predicts unseen data.
Overfitting and underfitting are the two main problems that cause poor performance in a
machine learning model. Overfitting occurs when the model fits the training data more closely
than required and tries to capture each and every data point fed to it. Hence it starts
capturing noise and inaccurate data from the dataset, which degrades the performance of the
model. An overfitted model doesn't perform accurately with the test/unseen dataset and can’t
generalize well. An overfitted model is said to have low bias and high variance.
For example, suppose there are three students, X, Y, and Z, and all three are preparing for an
exam.
• X has studied only three sections of the book and left all other sections.
• Y has a good memory, hence memorized the whole book.
• And the third student, Z, has studied and practiced all the questions.
So, in the exam, X will only be able to solve the questions if they relate to the three
sections X has studied.
Student Y will only be able to solve questions if they appear exactly the same as given in the
book.
Student Z will be able to solve all the exam questions in a proper way.
The same happens with machine learning:
• If the algorithm learns from only a small part of the data, it is unable to capture the
required patterns and hence is underfitted.
• If the model memorizes the training dataset, like student Y, it performs very well on the
seen dataset but badly on unseen data or unknown instances. In such cases, the model is
said to be overfitting.
• If the model performs well with the training dataset and also with the test/unseen
dataset, similar to student Z, it is said to be a good fit.
How to detect Overfitting?
Overfitting in the model can only be detected once you evaluate it on data it was not trained
on. To detect the issue, we can perform a train/test split, in which we randomly divide our
dataset into training and test datasets. We train the model with the training dataset, which
is about 80% of the total dataset. After training the model, we test it with the test dataset,
which is the remaining 20% of the total dataset.
Now, if the model performs well with the training dataset but not with the test dataset, then it
is likely to have an overfitting issue. For example, if the model shows 85% accuracy with
training data and 50% accuracy with the test dataset, it means the model is not generalizing
well.
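The 80/20 split and the comparison of training and test accuracy can be sketched as below. The function names `train_test_split` and `looks_overfitted` are hypothetical helpers written for this example (not a library API), and the 10-point gap used as a warning threshold is an arbitrary assumption.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def looks_overfitted(train_acc, test_acc, max_gap=0.10):
    """Flag a model whose training accuracy exceeds test accuracy by more than max_gap."""
    return (train_acc - test_acc) > max_gap


data = list(range(100))  # stand-in for 100 labeled examples
train, test = train_test_split(data)
print(len(train), len(test))         # 80 20
print(looks_overfitted(0.85, 0.50))  # True: the example gap from the notes
```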
3. Feature Selection
We identify the most important features within the training data, and the other features are
removed.
This process helps to simplify the model and reduces noise from the data.
4. Cross-Validation
The dataset is divided into k equal-sized subsets of data; these subsets are known as folds.
Each fold is used once as the test set while the remaining k−1 folds are used for training.
5. Data Augmentation
An alternative to adding more data to prevent overfitting: slightly modified copies of already
existing data are added to the dataset.
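The cross-validation step listed above can be sketched as follows. The helper names are hypothetical, and the sketch works on example indices only; when the number of examples is not divisible by k, the first few folds get one extra example, so the folds are only approximately equal in size.

```python
def k_fold_indices(n_samples, k):
    """Split the indices 0..n_samples-1 into k contiguous folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def cross_validation_splits(n_samples, k):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    folds = k_fold_indices(n_samples, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(cross_validation_splits(10, 5))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

In practice the data is usually shuffled before the folds are formed, and the model's scores on the k test folds are averaged.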
Ensemble Methods
Predictions from different machine learning models are combined to identify the most popular
result.
The most commonly used ensemble methods are Bagging and Boosting.
In bagging, individual data points can be selected more than once. After the collection of
several sample datasets, these models are trained independently, and depending on the type of
In supervised learning techniques, the main aim is to determine the hypothesis, out of the
hypothesis space, that best maps inputs to the corresponding correct outputs.
There are some common methods for finding such a hypothesis in the hypothesis space, where the
hypothesis space is represented by uppercase H and a hypothesis by lowercase h.
Hypothesis space (H):
Hypothesis space is defined as the set of all possible legal hypotheses; hence it is also known
as a hypothesis set. It is used by supervised machine learning algorithms to determine the best
possible hypothesis, i.e., the one that best describes the target function or best maps input to
output. It is often constrained by the framing of the problem, the choice of model, and the
choice of model configuration.
Hypothesis (h)
It is defined as the approximate function that best describes the target in supervised machine
learning algorithms. It is primarily based on data as well as bias and restrictions applied to data.
Hence hypothesis (h) can be concluded as a single hypothesis that maps input to proper output
and can be evaluated as well as used to make predictions.
Now, assume we have some test data by which ML algorithms predict the outputs for input as
follows
Decision Tree Classification Algorithm
Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a
tree-structured classifier, where internal nodes represent the features of a dataset, branches
represent the decision rules, and each leaf node represents the outcome.
In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision
nodes are used to make a decision and have multiple branches, whereas Leaf nodes are the
output of those decisions and do not contain any further branches. The decisions or tests are
performed on the basis of features of the given dataset.
It is a graphical representation for getting all the possible solutions to a problem/decision
based on given conditions.
It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure. In order to build a tree, we
use the CART algorithm, which stands for Classification and Regression Tree algorithm. A
decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree
into subtrees.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: The root node of the tree is called the parent node, and other nodes are
called the child nodes.
How does the Decision Tree algorithm Work?
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where you cannot further classify the
nodes; call each such final node a leaf node.
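The five steps above can be sketched as a short recursive procedure. This is an illustrative sketch, not the CART algorithm itself: it uses information gain as the attribute selection measure and allows multi-way splits, and the six-row job-offer dataset, the attribute names, and all function names are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Step 2: choose the attribute with the highest information gain."""
    def gain(attr):
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    # Leaf node: all labels agree, or no attribute is left to split on.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)        # Step 2
    node = {"attribute": attr, "children": {}}             # Step 4
    remaining = [a for a in attributes if a != attr]
    for value in sorted(set(r[attr] for r in rows)):       # Step 3
        idx = [i for i, r in enumerate(rows) if r[attr] == value]
        node["children"][value] = build_tree(              # Step 5
            [rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return node


def predict(tree, row):
    """Follow branches from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attribute"]]]
    return tree


# Hypothetical job-offer data (Step 1: the root holds the complete dataset).
rows = [
    {"salary": "high", "distance": "near"},
    {"salary": "high", "distance": "near"},
    {"salary": "high", "distance": "far"},
    {"salary": "low", "distance": "near"},
    {"salary": "low", "distance": "far"},
    {"salary": "low", "distance": "near"},
]
labels = ["accept", "accept", "decline", "decline", "decline", "decline"]

tree = build_tree(rows, labels, ["salary", "distance"])
print(tree)
print(predict(tree, {"salary": "high", "distance": "near"}))
```

On this data the root splits on salary, and only the high-salary branch needs a further split on distance. The sketch does no pruning and raises a KeyError for attribute values it never saw during training.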
Example
Suppose there is a candidate who has a job offer and wants to decide whether to accept
the offer or not. To solve this problem, the decision tree starts with the root node (the Salary
attribute, chosen by ASM). The root node splits further into the next decision node (distance
from the office) and one leaf node based on the corresponding labels. The next decision node
further splits into one decision node (Cab facility) and one leaf node. Finally, that decision
node splits into two leaf nodes (Accepted offer and Declined offer).
Gini index is a measure of impurity or purity used while creating a decision tree in the
CART (Classification and Regression Tree) algorithm. An attribute with a low Gini index
should be preferred over one with a high Gini index. The CART algorithm uses the Gini index to
create splits, and it creates only binary splits. The Gini index can be calculated using
the formula below, where P_j is the proportion of examples that belong to class j:
$$\text{Gini Index} = 1 - \sum_{j} P_j^{2}$$
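The Gini index formula can be checked on small label lists. A minimal sketch, assuming a hypothetical helper name `gini_index` and made-up labels:

```python
from collections import Counter


def gini_index(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


mixed = ["yes", "yes", "yes", "no", "no", "no"]  # two equally likely classes
pure = ["yes", "yes", "yes", "yes"]              # a single class

print(gini_index(mixed))  # 0.5, the maximum for two classes
print(gini_index(pure))   # 0.0, a perfectly pure node
```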
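First find Entropy(S) for the whole dataset:
$$\text{Entropy}(S) = -P(\text{yes})\log_2 P(\text{yes}) - P(\text{no})\log_2 P(\text{no}) = -\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6} = 1$$
$$\text{Gain}(S, A_1) = \text{Entropy}(S) - \left\{ \frac{|S_{A_1=1}|}{|S|}\,\text{Entropy}(S_{A_1=1}) + \frac{|S_{A_1=0}|}{|S|}\,\text{Entropy}(S_{A_1=0}) \right\}$$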
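The entropy and gain computation can be sketched in Python. The six-example dataset below is hypothetical: it only reproduces the 3 "yes" / 3 "no" split used in the entropy calculation, and the values assigned to attribute A1 are assumed for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy(S) = -sum over classes of p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Assumed values of A1 for six examples.
rows = [{"A1": 1}, {"A1": 1}, {"A1": 1}, {"A1": 0}, {"A1": 0}, {"A1": 0}]
labels = ["yes", "yes", "no", "no", "no", "yes"]          # 3 yes, 3 no
separable = ["yes", "yes", "yes", "no", "no", "no"]       # A1 predicts the label exactly

print(entropy(labels))                          # 1.0 for a 3/3 split
print(information_gain(rows, labels, "A1"))     # small gain: A1 is only weakly informative
print(information_gain(rows, separable, "A1"))  # 1.0: A1 removes all uncertainty
```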