ML Notes

This document provides an introduction to machine learning, detailing its types, key components, and applications. It explains supervised, unsupervised, semi-supervised, and reinforcement learning, along with concepts like hypothesis space and inductive bias. The document also highlights real-world applications such as image recognition, speech recognition, traffic prediction, product recommendations, self-driving cars, email filtering, and virtual personal assistants.


UNIT I

INTRODUCTION TO MACHINE LEARNING

Introduction to Machine Learning: introduction, different types of learning, hypothesis space and inductive bias, evaluation, training and test sets, cross-validation, concepts of overfitting and underfitting, bias and variance. Linear Regression: introduction, linear regression, simple and multiple linear regression, polynomial regression, evaluating regression fit.

Machine learning is a subset of artificial intelligence (AI) that involves the development of
algorithms and statistical models that enable computers to perform tasks without explicit
instructions. Instead of being programmed with specific rules for every possible scenario,
machine learning algorithms learn patterns from data and make decisions based on that learning.

Key components of machine learning include:

1. Data: The raw information from which the machine learning models learn. This can
include text, images, audio, and other types of data.
2. Algorithms: The mathematical methods and processes used to find patterns in the data.
Common algorithms include decision trees, neural networks, and support vector machines.
3. Models: The output of the learning process, which can make predictions or decisions based
on new data.
4. Training: The process of feeding data into the machine learning algorithm to create a
model. This typically involves splitting data into training and testing sets to evaluate the
model's performance.
5. Features: The individual measurable properties or characteristics of the data being used in
the model. Feature engineering involves selecting, modifying, or creating features to
improve the model's performance.
6. Evaluation: Assessing the performance of the machine learning model using metrics like
accuracy, precision, recall, and F1 score.
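
To make these components concrete, here is a minimal sketch of training and evaluation, assuming Python with scikit-learn; the Iris dataset and decision-tree classifier are illustrative choices, not part of the original notes:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Data: features (X) and labels (y)
X, y = load_iris(return_X_y=True)

# Training: split into training and testing sets, then fit the model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Evaluation: assess performance on the held-out test set
y_pred = model.predict(X_test)
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, average="macro"))
print("recall   :", recall_score(y_test, y_pred, average="macro"))
print("F1 score :", f1_score(y_test, y_pred, average="macro"))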

There are several types of machine learning, including:

1. Supervised Learning: The algorithm is trained on labeled data, where the input data and
the corresponding correct output are provided. Examples include classification and
regression tasks.
2. Unsupervised Learning: The algorithm is trained on unlabeled data and tries to identify
patterns or structures within the data. Examples include clustering and association tasks.
3. Semi-supervised Learning: A combination of supervised and unsupervised learning,
where the algorithm is trained on a small amount of labeled data and a large amount of
unlabeled data.
4. Reinforcement Learning: The algorithm learns by interacting with an environment and
receiving feedback in the form of rewards or penalties. This approach is often used in
robotics, game playing, and other decision-making tasks.

Machine learning is widely used in various applications, including image and speech recognition,
natural language processing, recommendation systems, autonomous vehicles, and healthcare
diagnostics.

Machine learning can be categorized into several types based on the nature of the learning process
and the type of data used. The primary types of learning in machine learning are:

1. Supervised Learning:
o Definition: The algorithm is trained on labeled data, meaning the input data is paired
with the correct output.
o Examples:
▪ Classification: Predicting a categorical label (e.g., spam detection in emails,
image recognition).
▪ Regression: Predicting a continuous value (e.g., predicting house prices,
stock prices).
2. Unsupervised Learning:
o Definition: The algorithm is trained on unlabeled data and tries to find patterns or
structures within the data.
o Examples:
▪ Clustering: Grouping similar data points together (e.g., customer
segmentation, image compression).
▪ Association: Finding rules that describe large portions of the data (e.g.,
market basket analysis).
3. Semi-supervised Learning:
o Definition: Combines a small amount of labeled data with a large amount of
unlabeled data during training. It is useful when labeling data is expensive or
time-consuming.
o Example: Text classification with a small set of labeled documents and a large
corpus of unlabeled documents.
4. Reinforcement Learning:
o Definition: The algorithm learns by interacting with an environment and receiving
feedback in the form of rewards or penalties. The goal is to learn a strategy (policy)
that maximizes cumulative reward.
o Examples:
▪ Game playing: Algorithms learning to play games like chess or Go.
▪ Robotics: Robots learning to navigate or manipulate objects.

These types of learning provide a broad framework for understanding how machine learning
algorithms can be applied to different problems and datasets.
In machine learning, the concepts of hypothesis space and inductive bias are fundamental to
understanding how models learn from data and generalize to new, unseen examples.

Hypothesis Space

The hypothesis space refers to the set of all possible hypotheses (or models) that a learning
algorithm can choose from to make predictions based on the data. Each hypothesis in this space
represents a different way of mapping inputs to outputs. The size and nature of the hypothesis
space are determined by the choice of the learning algorithm and the model parameters.

For example:

• In linear regression, the hypothesis space consists of all possible linear functions.
• In decision trees, the hypothesis space consists of all possible trees that can be constructed
based on the features and splits.

The hypothesis space can be finite or infinite, and its complexity can greatly affect the learning
process and the model's ability to generalize.

Inductive Bias

Inductive bias refers to the set of assumptions that a learning algorithm makes to predict outputs
for new inputs that it has not encountered before. Since there are usually many hypotheses
consistent with the training data, inductive bias helps the algorithm choose the most appropriate
one from the hypothesis space.

Types of inductive biases include:

• Preference Bias (Search Bias): The algorithm prefers some hypotheses over others based
on criteria such as simplicity (Occam's razor), regularization, or prior knowledge. For
example, linear models prefer linear relationships even if more complex models might fit
the training data better.
• Restriction Bias: The algorithm is restricted to a subset of the hypothesis space, limiting
the types of models it can choose. For instance, a linear regression algorithm can only
choose linear models, even if the true relationship is nonlinear.

Inductive bias is essential because it allows machine learning algorithms to generalize from
limited data. Without inductive bias, the hypothesis space would be too large, and the algorithm
might overfit the training data without being able to generalize well to new data.

Relationship Between Hypothesis Space and Inductive Bias

• The hypothesis space defines the boundaries within which the learning algorithm searches
for the best model.
• Inductive bias guides the search process within the hypothesis space, helping the algorithm
choose the most appropriate model for the given problem.

A balance between a sufficiently rich hypothesis space and a well-chosen inductive bias is crucial
for effective learning. Too narrow a hypothesis space may lead to underfitting, where the model
is too simple to capture the underlying data patterns. Conversely, too broad a hypothesis space
without a strong inductive bias may lead to overfitting, where the model captures noise rather
than the underlying data distribution.

Hypothesis Space Definition:

The hypothesis space consists of all possible linear functions that can be used to predict the price
based on the size. Mathematically, this is represented as:

h(x) = θ0 + θ1x

where x is the size (sq ft), h(x) is the predicted price, and each choice of (θ0, θ1) is one hypothesis.
Example:

The dataset is:

Size (sq ft) | Price ($)
800          | 150,000
1000         | 180,000
1200         | 210,000
1400         | 240,000
We need to find the best linear function that fits this data. Here the data lies exactly on the line
h(x) = 150x + 30,000, i.e., θ1 = 150 and θ0 = 30,000.

Interpretation:

This means that for every additional square foot, the price increases by $150, starting from a base
price of $30,000.

Hypothesis Space:

• Finite vs. Infinite: In linear regression, the hypothesis space is infinite because there are
infinitely many possible values for θ0 and θ1.
• Bounded by Assumptions: The hypothesis space is restricted to linear functions;
nonlinear relationships cannot be captured by this space.
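
As a hedged illustration (assuming scikit-learn, which the notes do not mention), fitting this dataset recovers the hypothesis above:

import numpy as np
from sklearn.linear_model import LinearRegression

# House sizes (sq ft) and prices ($) from the table above
X = np.array([[800], [1000], [1200], [1400]])
y = np.array([150_000, 180_000, 210_000, 240_000])

# Search the space of linear functions for the best-fitting hypothesis
model = LinearRegression().fit(X, y)
print("theta1 (slope)    :", model.coef_[0])    # ≈ 150.0
print("theta0 (intercept):", model.intercept_)  # ≈ 30000.0
print("price at 1100 sqft:", model.predict(np.array([[1100]]))[0])  # ≈ 195000.0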

1. Introduction to machine learning


• Machine learning is the field of study that allows computers to learn without
being explicitly programmed.
• With machine learning, we do not need to give computers explicit instructions
for handling every special situation.
• Instead, we train the computers to find solutions to specific problems in real time.
• Chess is a famous example of machine learning in action: the code lets the
machine learn and optimize itself over repeated games.

Machine Learning is broadly classified into two main categories (with reinforcement
learning often treated as a third):

1. Supervised Machine Learning
2. Unsupervised Machine Learning

Supervised Learning

• Supervised learning is similar to having a trainer or teacher who supervises all
of the machine's responses and provides step-by-step solutions to specific problems.
• It is a hand-holding way of teaching computers what to do.
• One real-world example of supervised learning is recognizing different types of
images with computers.
• As humans, we also learn by this model: we are taught to recognize objects
such as a car through repeated exposure. Machines are taught in the same way.
• We feed a set of specific images into a machine, where each image carries an
identifier (label) that states the type of image.
• Computers are taught so that every time a particular blend of pixels appears,
the computer can recognize the type of image loaded into the model's dataset.
• Supervised learning works through previous exposure: if a computer sees a car
and recognizes it as a car, then next time it should be able to identify a different
image of a car by spotting the many features it shares with previously identified
images of cars.
• When we train a machine learning model for image recognition, we present
many images, each attached to a label, so that the data is clearly labeled and
stored in the model.
• Once training is complete, we present an image of an object that is not part
of the training data, and the machine should be able to identify it by drawing
on all of its previous learning.
• This most fundamental type of supervised learning is called classification.
• Our machine learning model must be able to classify different collections of images.
• It must be programmed so that different objects can be recognized according
to their unique characteristics.
• Ideally, we create a generic classifier that is not tied to a particular training set,
so we do not need to recode the entire model when the training data changes.
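
A minimal classification sketch follows, assuming scikit-learn and its bundled handwritten-digit images as stand-ins for the labeled image dataset described above:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 8x8 grayscale digit images, each stored with its correct label
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

# Learn from labeled examples, then identify an image held out of training
clf = SVC().fit(X_train, y_train)
print("predicted label:", clf.predict(X_test[:1])[0])
print("true label     :", y_test[0])
print("test accuracy  :", clf.score(X_test, y_test))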

Unsupervised Learning

• In supervised learning, a specific labeled dataset is loaded into the computer,
which learns through repeated exposure to it.
• In unsupervised learning, instead of training data where every item is clearly
labeled, we provide unstructured, unlabeled training data.
• We want the model to make sense of the dataset so that it learns to find
structure in unstructured data.
• In other words, in unsupervised learning we do not tell the computer what
kind of data it is seeing.
• Instead, we want the computer to discover structure in the data by observing
how the data is organized.
• One type of Unsupervised learning is called clustering, in which the computer
looks at the dataset and its features and can figure out the separate clusters in
which the data is maintained.
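
A minimal clustering sketch, assuming scikit-learn and synthetic unlabeled data; the model discovers the groups without ever seeing a label:

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Synthetic, unlabeled data containing three hidden groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k-means figures out the separate clusters from the features alone
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments (first 10 points):", kmeans.labels_[:10])
print("cluster centers:")
print(kmeans.cluster_centers_)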

Reinforcement Learning

• We have covered supervised learning, in which labeled training data is loaded
into machine learning models so that computers can classify the data and
perform regression on it.
• We have also covered unsupervised learning, in which an unstructured,
unlabeled dataset is loaded and we want the computer to be smart enough to
identify the separate clusters on its own.
• As humans, we are very experienced with reinforcement: we tend to learn
through it.
• For example, if we drive through a route full of traffic jams, we will avoid that
route on other days.
• There are two kinds of reinforcement we generally encounter: 1. positive and
2. negative reinforcement.
• Machines work the same way under reinforcement learning algorithms.
• One real-world example of reinforcement learning is a chess program, where a
computer with a reinforcement learning algorithm estimates the winning
probability of every move.
• The computer may receive positive as well as negative reinforcement with
every single move.
• Through many cycles of training and by playing more and more games, the
computer learns which moves in which situations increase its winning percentage.
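
The reward-driven update at the heart of reinforcement learning can be sketched with tabular Q-learning on a toy 5-state corridor; the environment, rewards, and hyperparameters below are illustrative assumptions, not the chess example itself:

import random

# Corridor of states 0..4; reward +1 only for reaching state 4 (the goal)
N_STATES, GOAL = 5, 4
alpha, gamma = 0.5, 0.9                      # learning rate and discount factor
Q = [[0.0, 0.0] for _ in range(N_STATES)]    # Q-table: one value per (state, action)

def step(state, action):                     # action 0 = left, 1 = right
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0)

random.seed(0)
for episode in range(500):
    s = 0
    while s != GOAL:
        a = random.randrange(2)              # explore randomly; Q-learning is off-policy
        s2, r = step(s, a)
        # Positive/negative feedback: nudge Q(s,a) toward reward + discounted future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print("learned policy (0=left, 1=right):", [q.index(max(q)) for q in Q[:GOAL]])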

3. Applications of Machine learning

Machine learning is one of today's most talked-about technologies, and it is growing rapidly.
We use machine learning in daily life, often without knowing it, in tools such as Google Maps,
Google Assistant, and Alexa. Below are some of the most prominent real-world applications
of machine learning:

1. Image Recognition:

Image recognition is one of the most common applications of machine learning. It is
used to identify objects, persons, places, digital images, etc. A popular use case of
image recognition and face detection is the automatic friend-tagging suggestion:

Facebook provides an automatic friend-tagging suggestion feature. Whenever we
upload a photo with our Facebook friends, we automatically get a tagging suggestion
with names, and the technology behind this is machine learning's face detection and
recognition algorithm.

It is based on the Facebook project named "DeepFace," which is responsible for
face recognition and person identification in pictures.

2. Speech Recognition

When using Google, we get a "Search by voice" option; this falls under speech
recognition, a popular application of machine learning.

Speech recognition is the process of converting voice instructions into text, and it is also
known as "speech to text" or "computer speech recognition." At present, machine
learning algorithms are widely used in speech recognition applications. Google
Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice
instructions.

3. Traffic prediction:

If we want to visit a new place, we take help of Google Maps, which shows us the correct
path with the shortest route and predicts the traffic conditions.


It predicts traffic conditions, such as whether traffic is clear, slow-moving, or
heavily congested, in two ways:

o Real-time locations of vehicles from the Google Maps app and road sensors
o Average time taken on past days at the same time of day

Everyone who uses Google Maps helps make the app better: it takes information
from users and sends it back to its database to improve performance.

4. Product recommendations:

Machine learning is widely used by e-commerce and entertainment companies such as
Amazon and Netflix for product recommendations. Whenever we search for a product
on Amazon, we start seeing advertisements for the same product while browsing the
web in the same browser, and this is because of machine learning.

Google infers user interest using various machine learning algorithms and suggests
products matching that interest.

Similarly, when we use Netflix, we see recommendations for series, movies, etc., and
this is also done with the help of machine learning.

5. Self-driving cars:

One of the most exciting applications of machine learning is self-driving cars.
Machine learning plays a significant role here. Tesla, a popular car manufacturer,
is working on self-driving cars, using unsupervised learning methods to train car
models to detect people and objects while driving.

6. Email Spam and Malware Filtering:

Whenever we receive a new email, it is automatically filtered as important, normal,
or spam. Important mail arrives in our inbox marked with the important symbol, while
spam emails land in the spam box; the technology behind this is machine learning.
Below are some spam filters used by Gmail:

o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters


Some machine learning algorithms such as Multi-Layer Perceptron, Decision
tree, and Naïve Bayes classifier are used for email spam filtering and malware
detection.

7. Virtual Personal Assistant:

We have various virtual personal assistants, such as Google Assistant, Alexa, Cortana,
and Siri. As the name suggests, they help us find information using voice instructions.
These assistants can help in many ways just through voice commands, such as playing
music, calling someone, opening an email, or scheduling an appointment.

Machine learning algorithms are an important part of these virtual assistants.

These assistants record our voice instructions, send them to a server in the cloud,
decode them using ML algorithms, and act accordingly.

8. Online Fraud Detection:

Machine learning makes our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, fraud can occur in various
ways, such as fake accounts, fake IDs, and money stolen in the middle of a transaction.
To detect this, a feed-forward neural network can check whether a transaction is
genuine or fraudulent.

For each genuine transaction, the output is converted into hash values, and these
values become the input for the next round. Genuine transactions follow a specific
pattern that changes for fraudulent ones; this lets the system detect fraud and makes
our online transactions more secure.

9. Stock Market trading:

Machine learning is widely used in stock market trading. Share prices constantly move
up and down, so machine learning's long short-term memory (LSTM) neural networks
are used to predict stock market trends.

10. Medical Diagnosis:

In medical science, machine learning is used for disease diagnosis. With it, medical
technology is advancing rapidly and can build 3D models that predict the exact
position of lesions in the brain.

It helps in finding brain tumors and other brain-related diseases easily.


11. Automatic Language Translation:

Nowadays, visiting a new place without knowing the language is not a problem, because
machine learning helps by converting text into languages we know. Google's GNMT
(Google Neural Machine Translation) provides this feature: a neural machine translation
system that translates text into a familiar language, known as automatic translation.

The technology behind automatic translation is a sequence-to-sequence learning
algorithm, which maps text in one language to text in another.

Give a detailed note on hypothesis in machine learning and statistics?

4. Hypothesis in Machine Learning

• The hypothesis is a common term in Machine Learning and data science projects.
• As we know, machine learning is one of the most powerful technologies
across the world, which helps us to predict results based on past experiences.
• Moreover, data scientists and ML professionals conduct experiments that aim to
solve a problem.
• These ML professionals and data scientists make an initial assumption for
the solution of the problem.
• This assumption in Machine learning is known as Hypothesis.
• In Machine Learning, at various times, Hypothesis and Model are
used interchangeably.
• However, a Hypothesis is an assumption made by scientists, whereas a
model is a mathematical representation that is used to test the hypothesis.

What is Hypothesis?

The hypothesis is defined as the supposition or proposed explanation based


on insufficient evidence or assumptions. It is just a guess based on some known
facts but has not yet been proven. A good hypothesis is testable, which results in either
true or false.

Example: Let's understand the hypothesis with a common example. A scientist claims
that ultraviolet (UV) light can damage the eyes, and so it may also cause blindness.

In this example, the scientist claims that UV rays are harmful to the eyes, and we
assume they may also cause blindness. However, this may or may not be true. Such
assumptions are called hypotheses.

4.1 Hypothesis in Machine Learning (ML)

The hypothesis is one of the commonly used concepts of statistics in Machine Learning.
It is specifically used in Supervised Machine learning, where an ML model learns a
function that best maps the input to corresponding outputs with the help of an available
dataset.


In supervised learning techniques, the main aim is to determine the possible hypothesis
out of hypothesis space that best maps input to the corresponding or correct outputs.

There are some common notations for the possible hypotheses in the hypothesis space,
where the hypothesis space is represented by an uppercase H and a hypothesis by a
lowercase h. These are defined as follows:

Hypothesis space (H):

Hypothesis space is defined as a set of all possible legal hypotheses; hence it


is also known as a hypothesis set. It is used by supervised machine learning
algorithms to determine the best possible hypothesis to describe the target function or
best maps input to output.

It is often constrained by choice of the framing of the problem, the choice of model,
and the choice of model configuration.

Hypothesis (h):

It is defined as the approximate function that best describes the target in supervised
machine learning algorithms. It is primarily based on data as well as bias and
restrictions applied to data.

Hence hypothesis (h) can be concluded as a single hypothesis that maps input to
proper output and can be evaluated as well as used to make predictions.

The hypothesis (h) can be formulated in machine learning as follows:

y = mx + c

Where:

y: range (predicted output)
m: slope of the line, i.e., change in y divided by change in x
x: domain (input)
c: intercept (constant)

Example: Let's understand the hypothesis (h) and hypothesis space (H) with a
two-dimensional coordinate plane showing the distribution of data.
[Figure: scatter of training data on a coordinate plane]

Now, assume we have some test data for which the ML algorithm must predict the outputs.
[Figure: test points added to the plane]

We can divide this coordinate plane in a way that helps predict the output for the test
data, and, based on the data, algorithm, and constraints, the plane can also be divided
in several other ways.
[Figure: alternative decision boundaries dividing the plane]

With this example, we can conclude that:

Hypothesis space (H) is the set of all legal, possible ways to divide the coordinate
plane so that it best maps inputs to the proper outputs.

Each individual possible way is called a hypothesis (h).
4.2 Hypothesis in Statistics

Similar to the hypothesis in machine learning, a statistical hypothesis is also an
assumption about the outcome. However, it is falsifiable, meaning it can fail in the
presence of sufficient evidence.

Unlike in machine learning, we cannot simply accept a hypothesis in statistics, because
it is a conjectured result based on probability. Before starting work on an experiment,
we must be aware of two important types of hypotheses:

o Null Hypothesis: A null hypothesis is a type of statistical hypothesis stating that
no statistically significant effect exists in the given set of observations. It is also
known as a conjecture and is used in quantitative analysis to test theories about
markets, investment, and finance, to decide whether an idea is true or false.
o Alternative Hypothesis: An alternative hypothesis is a direct contradiction of the
null hypothesis: if one of the two hypotheses is true, the other must be false. In
other words, an alternative hypothesis states that some significant effect exists
in the given set of observations.

Significance level

• The significance level is the primary thing that must be set before
starting an experiment.
• It defines the tolerance for error, i.e., the level at which an effect can be
considered statistically significant.
• During the testing process in an experiment, a 95% confidence level is commonly
used, which corresponds to a 5% significance level (α = 0.05).
• The significance level also determines the critical or threshold value. For example,
if the confidence level is set to 98%, then the significance level, and hence the
critical value, is 0.02.


P-value

• The p-value in statistics quantifies the evidence against a null hypothesis.
• More precisely, the p-value is the probability of obtaining data at least as
extreme as that observed, assuming the null hypothesis is true.
• The smaller the p-value, the stronger the evidence against the null hypothesis,
and the more readily it can be rejected in testing.
• It is always represented in decimal form, such as 0.035.
• Whenever a statistical test is carried out on a population or sample to find the
p-value, it is compared against the critical value.
• If the p-value is less than the critical value, the effect is significant and the
null hypothesis can be rejected.
• If it is higher than the critical value, there is no significant effect, and we
fail to reject the null hypothesis.
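
A minimal sketch of this procedure, assuming SciPy; the two samples and the 0.05 significance level are illustrative:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=50, scale=5, size=40)   # e.g., control group
group_b = rng.normal(loc=53, scale=5, size=40)   # e.g., treatment group

# Null hypothesis: the two groups have equal means
t_stat, p_value = stats.ttest_ind(group_a, group_b)
alpha = 0.05                                     # significance level
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("p < alpha: reject the null hypothesis (significant effect)")
else:
    print("p >= alpha: fail to reject the null hypothesis")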

What Is Inductive Bias in Machine Learning?

5. Inductive Bias:

Definition

Every machine learning model requires some type of architecture design and
possibly some initial assumptions about the data we want to analyze. Generally,
every building block and every belief that we make about the data is a form of
inductive bias.
• Inductive biases play an important role in the ability of machine learning
models to generalize to unseen data.
• A strong inductive bias can lead the model to converge to the global optimum.
• On the other hand, a weak inductive bias can cause the model to find only
local optima and be greatly affected by random changes in its initial state.

We can categorize inductive biases into two groups: relational and non-relational.
The former represents the relationships between entities in the network, while the
latter is a set of techniques that further constrain the learning algorithm.

Briefly explain the two main problems that degrade the performance of
machine learning models?
Overfitting

• Overfitting occurs when a machine learning model tries to cover all the data
points, or more data points than required, in the given dataset.
• Because of this, the model starts capturing the noise and inaccurate values
present in the dataset, which reduces its efficiency and accuracy.
• An overfitted model has low bias and high variance.
• The chance of overfitting grows with training: the more we train the model,
the more likely it is to become overfitted.

Overfitting is the main problem that occurs in supervised learning.

Example: The concept of overfitting can be understood with the linear regression
output below.
[Figure: regression curve passing through every training point]

As we can see from the graph, the model tries to cover all the data points present
in the scatter plot. It may look efficient, but in reality it is not, because the goal of
a regression model is to find the best-fit line; here we have not found a true best fit,
so the model will generate prediction errors on new data.

How to avoid Overfitting in a Model

Both overfitting and underfitting degrade the performance of a machine learning
model, but the more common cause is overfitting, so there are several ways to reduce
its occurrence (a cross-validation sketch follows the list):

o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling
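
Cross-validation, the first item above, can be sketched as follows, assuming scikit-learn; the diabetes dataset, ridge model, and 5-fold choice are illustrative:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# 5-fold cross-validation: each fold is held out once while the model trains
# on the rest, giving a more honest performance estimate than training error
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
print("per-fold R^2:", scores.round(3))
print("mean R^2    :", scores.mean().round(3))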

Underfitting

• Underfitting occurs when a machine learning model is not able to capture the
underlying trend of the data.
• To avoid overfitting, feeding of training data may be stopped at too early a
stage, so the model does not learn enough from the training data.
• As a result, it may fail to capture the dominant trend in the data.
• An underfitted model does not learn enough from the training data, which
reduces accuracy and produces unreliable predictions.

An underfitted model has high bias and low variance.

Example: We can understand underfitting using the linear regression output below.
[Figure: straight line failing to follow the data points]

As we can see from the diagram, the model is unable to capture the data points
present in the plot.

How to avoid underfitting:

o By increasing the training time of the model.


o By increasing the number of features.

Goodness of Fit

• The "Goodness of fit" term is taken from the statistics, and the goal of the
machine learning models to achieve the goodness of fit.
• In statistics modeling, it defines how closely the result or predicted values
match the true values of the dataset.
• The model with a good fit is between the underfitted and overfitted model,
and ideally, it makes predictions with 0 errors, but in practice, it is difficult to
achieve it.
• As when we train our model for a time, the errors in the training data go down,
and the same happens with test data.
• But if we train the model for a long duration, then the performance of the model
may decrease due to the overfitting, as the model also learn the noise present
in the dataset.
• The errors in the test dataset start increasing, so the point, just before the
raising of errors, is the good point, and we can stop here for achieving a good
model.

There are two other methods by which we can get a good point for our model,
which are the resampling method to estimate model accuracy and validation
dataset.

What are bias and variance? Explain their trade-off?


Bias and Variance in Machine Learning

• Machine learning is a branch of artificial intelligence that allows machines to
perform data analysis and make predictions.
• However, if the machine learning model is not accurate, it can make prediction
errors, and these prediction errors are usually known as bias and variance.
• In machine learning, these errors will always be present, as there is always a
slight difference between the model's predictions and the actual values.
• The main aim of ML/data science analysts is to reduce these errors in order
to get more accurate results.
Errors in Machine Learning?

In machine learning, an error is a measure of how accurately an algorithm can make


predictions for the previously unknown dataset. On the basis of these errors, the
machine learning model is selected that can perform best on the particular dataset.
There are mainly two types of errors in machine learning, which are:

o Reducible errors: These errors can be reduced to improve model accuracy.
They can be further classified into bias and variance.
o Irreducible errors: These errors will always be present in the model regardless
of which algorithm is used. They are caused by unknown variables whose
influence cannot be reduced.

What is Bias?

While making predictions, a difference occurs between the values predicted by the
model and the actual/expected values; this difference is known as bias error, or
error due to bias.

A model has either:

o Low Bias: A low bias model will make fewer assumptions about the form of the
target function.
o High Bias: A model with a high bias makes more assumptions, and the model
becomes unable to capture the important features of our dataset. A high bias
model also cannot perform well on new data.

Some examples of machine learning algorithms with low bias are Decision Trees,
k-Nearest Neighbours, and Support Vector Machines.

Algorithms with high bias include Linear Regression, Linear Discriminant Analysis,
and Logistic Regression.

Ways to reduce High Bias:

High bias mainly occurs when the model is too simple. Below are some ways to
reduce high bias:

o Increase the input features as the model is underfitted.


o Decrease the regularization term.
o Use more complex models, such as including some polynomial features.

What is a Variance Error?

Variance tells how much a random variable differs from its expected value. Ideally,
a model should not vary too much from one training dataset to another; the algorithm
should be good at capturing the underlying mapping between input and output variables.

Variance errors are either low variance or high variance.

Low variance means there is a small variation in the prediction of the target function
with changes in the training dataset.

High variance means there is a large variation in the prediction of the target function
with changes in the training dataset.

A model with high variance learns a lot and performs well on the training dataset but
does not generalize well to unseen data. As a result, such a model gives good results
on the training dataset but shows high error rates on the test dataset.

Since, with high variance, the model learns too much from the dataset, it leads to
overfitting. A model with high variance has the following problems:

o A high-variance model leads to overfitting.
o It increases model complexity.

Usually, nonlinear algorithms, which have a lot of flexibility in fitting the model,
have high variance.

Some examples of machine learning algorithms with low variance are Linear
Regression, Logistic Regression, and Linear Discriminant Analysis.

Algorithms with high variance include Decision Trees, Support Vector Machines,
and k-Nearest Neighbours.

Ways to Reduce High Variance:

o Reduce the number of input features or parameters, as the model is overfitted.
o Do not use an overly complex model.
o Increase the training data.
o Increase the regularization term.

Different Combinations of Bias-Variance


There are four possible combinations of bias and variance, represented in the diagram
below.
[Figure: 2x2 grid of bias-variance combinations]

1. Low Bias, Low Variance: The combination of low bias and low variance is the
ideal machine learning model; however, it is not practically possible.
2. Low Bias, High Variance: With low bias and high variance, model predictions
are inconsistent but accurate on average. This occurs when the model learns a
large number of parameters, leading to overfitting.
3. High Bias, Low Variance: With high bias and low variance, predictions are
consistent but inaccurate on average. This occurs when a model does not learn
well from the training dataset or uses few parameters, leading to underfitting.
4. High Bias, High Variance: With high bias and high variance, predictions are
both inconsistent and inaccurate on average.

How to identify high variance or high bias?

High variance can be identified if the model has:

o Low training error and high test error.

High bias can be identified if the model has:

o High training error, with test error almost similar to the training error.

Bias-Variance Trade-Off

• While building the machine learning model, it is really important to take care
of bias and variance in order to avoid overfitting and underfitting in the
model.
• If the model is very simple with fewer parameters, it may have low variance
and high bias.
• Whereas, if the model has a large number of parameters, it will have high
variance and low bias.
• So, it is required to make a balance between bias and variance errors, and this
balance between the bias error and variance error is known as the Bias-
Variance trade-off.

For accurate predictions, a model needs both low variance and low bias. But this is
generally not possible, because bias and variance are related to each other:

o If we decrease the variance, it will increase the bias.


o If we decrease the bias, it will increase the variance.

Bias-Variance trade-off is a central issue in supervised learning.

o Ideally, we need a model that accurately captures the regularities in the training
data and simultaneously generalizes well to unseen data.
o Unfortunately, doing both perfectly at the same time is not possible.
o A high-variance algorithm may perform well on training data, but it may
overfit to noisy data.
o A high-bias algorithm, on the other hand, produces a much simpler model that
may not even capture important regularities in the data.
o So, we need to find a sweet spot between bias and variance to build an optimal model.

Hence, the Bias-Variance trade-off is about finding the sweet spot to make
a balance between bias and variance errors.
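
The trade-off can be made visible with a small experiment, assuming scikit-learn and synthetic data (both illustrative choices): a degree-1 model underfits (high bias), a degree-15 model overfits (high variance), and a moderate degree sits near the sweet spot.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, size=60)  # noisy nonlinear data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    # Underfit: both errors high. Overfit: tiny train error, large test error.
    print(f"degree {degree:2d}:",
          f"train MSE = {mean_squared_error(y_tr, model.predict(X_tr)):.3f},",
          f"test MSE = {mean_squared_error(y_te, model.predict(X_te)):.3f}")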

LINEAR REGRESSION

Linear regression is a statistical method used to model the relationship between a dependent
variable (target) and one or more independent variables (predictors). The goal is to fit a linear
equation to the observed data. Here are the key points about linear regression:

Assumptions

1. Linearity: The relationship between the dependent and independent variables is linear.
2. Independence: Observations are independent of each other.
3. Homoscedasticity: The variance of errors is constant across all levels of the independent
variables.
4. Normality: The errors are normally distributed.
Simple Linear Regression

Simple linear regression models the relationship between one independent variable and
the dependent variable with the equation y = β0 + β1x + ε, where β0 is the intercept,
β1 the slope, and ε the error term.

Example

Predicting house prices based on the size of the house. Here, the size of the house is the
independent variable, and the house price is the dependent variable.

Multiple Linear Regression

Multiple linear regression models the relationship between two or more independent variables
and a dependent variable. The equation is:

y = β0 + β1x1 + β2x2 + ... + βnxn + ε

where y is the dependent variable, x1, ..., xn are the independent variables, β0 is the intercept,
β1, ..., βn are the coefficients, and ε is the error term.
Example

Predicting house prices based on multiple factors such as size, number of bedrooms, location, and
age of the house. Here, size, number of bedrooms, location, and age are the independent variables,
and the house price is the dependent variable.
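
A hedged sketch of this example, assuming scikit-learn; the feature values (size, bedrooms, age) and prices are made up for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: [size (sq ft), bedrooms, age (years)] -> price ($)
X = np.array([[800, 2, 20], [1000, 3, 15], [1200, 3, 10],
              [1400, 4, 8], [1600, 4, 5]])
y = np.array([150_000, 185_000, 215_000, 250_000, 280_000])

model = LinearRegression().fit(X, y)
print("coefficients (one per feature):", model.coef_)
print("intercept:", model.intercept_)
print("predicted price for [1100, 3, 12]:", model.predict(np.array([[1100, 3, 12]]))[0])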

Key Differences

• Number of Predictors: Simple linear regression uses one predictor, while multiple linear
regression uses two or more.
• Complexity: Multiple linear regression is more complex as it involves multiple predictors,
which can capture more nuanced relationships but also require careful consideration of
multicollinearity (when predictors are highly correlated).

Assumptions (common to both)

1. Linearity: The relationship between dependent and independent variables is linear.


2. Independence: Observations are independent of each other.
3. Homoscedasticity: The variance of residuals (errors) is constant.
4. Normality: Residuals are normally distributed.

Evaluation Metrics

• R-squared: Measures the proportion of variance in the dependent variable explained by


the model.
• Adjusted R-squared: Adjusted for the number of predictors in the model, providing a
more accurate measure when multiple predictors are involved.
• F-statistic: Tests the overall significance of the model.
• p-values: Tests the significance of individual predictors.

Simple and multiple linear regression are foundational techniques in statistics and machine
learning, used to understand relationships between variables and make predictions based on data.

Polynomial Regression

Polynomial regression is an extension of linear regression that models the relationship between
the independent variable x and the dependent variable y as an nth-degree polynomial. It
allows for more complex, non-linear relationships.

Equation

The general form of a polynomial regression model is:

y = β0 + β1x + β2x^2 + ... + βnx^n + ε
Example

Predicting the progression of a disease based on time. A linear model might not capture the
progression accurately, but a polynomial model can account for the accelerating or decelerating
rate of progression.

Key Concepts

Degree of the Polynomial

• The degree of the polynomial, n, determines the flexibility of the model.
• A higher degree polynomial can fit more complex data patterns but may also lead to
overfitting.
Basis Functions

• Polynomial regression can be viewed as linear regression with polynomial basis


functions.

Steps to Perform Polynomial Regression

1. Data Preprocessing: Standardize or normalize data if necessary.


2. Feature Transformation: Generate polynomial features from the original independent
variable(s).

3. Fit the Model: Use linear regression techniques to fit the model to the transformed features.

4. Model Evaluation: Assess the model's performance using metrics such as R², RMSE
(root mean square error), and cross-validation.
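
These steps can be sketched as follows, assuming scikit-learn; the quadratic data and the degree-2 choice are illustrative:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 3 + 2 * X.ravel() - 0.5 * X.ravel() ** 2 + rng.normal(0, 2, size=50)  # quadratic + noise

# Steps 2-3: generate polynomial features [x, x^2], then fit ordinary linear regression
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression()).fit(X, y)

# Step 4: evaluate the fit
print("R^2 on training data:", r2_score(y, model.predict(X)))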

Advantages

1. Flexibility: Can model non-linear relationships.


2. Simplicity: Easy to implement using existing linear regression techniques.

Disadvantages

1. Overfitting: High-degree polynomials can overfit the training data, leading to poor
generalization.
2. Extrapolation: Predictions outside the range of the training data can be unreliable.
3. Computational Complexity: High-degree polynomials require more computational
resources and can suffer from numerical instability.

Applications

• Economics: Modeling growth rates and economic trends.


• Engineering: Curve fitting in control systems and signal processing.
• Biology: Modeling population growth and biological processes.

Polynomial regression is a powerful tool for capturing non-linear relationships in data, but it
must be used with caution to avoid overfitting and ensure meaningful predictions.

Evaluating Regression Fit


Evaluating the fit of a regression model is crucial to understanding how well it describes the
relationship between the independent and dependent variables. This involves using various
metrics and techniques to assess the model's performance, accuracy, and predictive power.

Key Metrics for Evaluating Regression Fit

1. R-squared (Coefficient of Determination)


o Definition: R-squared measures the proportion of the variance in the dependent
variable that is predictable from the independent variables. It is given by:

R² = 1 − (SS_res / SS_tot) = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)²

• Interpretation: An R² value close to 1 indicates a good fit, meaning that the model
explains a large portion of the variance in the dependent variable. However, it doesn't
account for overfitting or the complexity of the model.

Adjusted R-squared

• Definition: Adjusted R-squared adjusts the R-squared value for the number of predictors
in the model, providing a more accurate measure of model performance.
• Formula:

Adjusted R² = 1 − (1 − R²) × (n − 1) / (n − p − 1)

where n is the number of observations and p is the number of predictors.
• Interpretation: Unlike R-squared, the adjusted R-squared can decrease if unnecessary
predictors are added to the model.

Mean Squared Error (MSE)

• Definition: MSE measures the average of the squares of the errors, i.e., the average
squared difference between the observed and predicted values.
• Formula:

MSE = (1/n) Σ (y_i − ŷ_i)²
• Interpretation: Lower MSE values indicate a better fit. However, it is sensitive to
outliers since errors are squared.

Root Mean Squared Error (RMSE)

• Definition: RMSE is the square root of the MSE, providing a measure of the average
magnitude of the errors in the same units as the dependent variable.
• Formula:

RMSE = √MSE = √[(1/n) Σ (y_i − ŷ_i)²]
• Interpretation: Lower RMSE values indicate a better fit.

Mean Absolute Error (MAE)

• Definition: MAE measures the average absolute difference between the observed and
predicted values.
• Formula:

MAE = (1/n) Σ |y_i − ŷ_i|
Interpretation: Lower MAE values indicate a better fit. MAE is less sensitive to outliers
compared to MSE.
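
A minimal sketch computing these metrics, assuming scikit-learn; the observed and predicted values are made up for illustration:

import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_true = np.array([150_000, 180_000, 210_000, 240_000])   # observed values
y_pred = np.array([152_000, 178_500, 214_000, 236_000])   # model predictions

mse = mean_squared_error(y_true, y_pred)
print("R-squared:", r2_score(y_true, y_pred))
print("MSE      :", mse)
print("RMSE     :", np.sqrt(mse))
print("MAE      :", mean_absolute_error(y_true, y_pred))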
PART A

1. Define Machine Learning. (CO1, K1)
Machine learning is a subfield of artificial intelligence, broadly defined as the capability of a machine to imitate intelligent human behavior.

2. List out the different types of learning methods. (CO1, K1)
Supervised learning, unsupervised learning, and reinforcement learning.
3. What is meant by supervised learning? (CO1, K1)
Supervised learning, also known as supervised machine learning, is a subcategory of machine learning and artificial intelligence. It is defined by its use of labeled datasets to train algorithms to classify data or predict outcomes accurately.

4. What is unsupervised learning? (CO1, K1)
Unsupervised learning is a type of machine learning in which models are trained on an unlabeled dataset and are allowed to act on that data without any supervision.

5. Differentiate supervised and unsupervised machine learning. (CO1, K1)
In supervised machine learning, the machine is trained using labeled data; a new dataset is then given to the learning model so that the algorithm provides an outcome by analyzing the labeled data. For example, we first need to label the data, which is necessary to train the model when performing classification.
In unsupervised machine learning, the machine is not trained using labeled data, and the algorithms make decisions without any corresponding output variables.
6. Define reinforcement learning. (CO1, K1)
Reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones. In general, a reinforcement learning agent is able to perceive and interpret its environment, take actions, and learn through trial and error.

7. Where is supervised learning used? (CO1, K1)
Linear regression is a supervised learning technique typically used in predicting, forecasting, and finding relationships between quantitative data.

8. Give examples of unsupervised learning. (CO1, K1)
Some examples of unsupervised learning algorithms include K-Means Clustering, Principal Component Analysis, and Hierarchical Clustering.

9. Give an example of reinforcement learning. (CO1, K1)
Reinforcement learning can be used in different fields such as healthcare, finance, and recommendation systems. Playing games like Go: Google has reinforcement learning agents that learn to solve problems by playing games like Go, which is a game of strategy.
10. List out real-time applications of ML. (CO1, K1)
Image recognition, speech recognition, medical diagnosis, statistical arbitrage, and predictive analytics.

11. Define data science. (CO1, K1)
Data science is the domain of study that deals with vast volumes of data using modern tools and techniques to find unseen patterns, derive meaningful information, and make business decisions.

12. What is PCA? (CO1, K1)
Principal component analysis (PCA) is a technique for reducing the dimensionality of large datasets, increasing interpretability while minimizing information loss.

13. What is the use of PCA in machine learning? / Write applications of PCA in machine learning. (CO1, K1)
PCA is used to reduce the number of dimensions in healthcare data. PCA can help resize an image. It can be used in finance to analyze stock data and forecast returns. PCA helps to find patterns in high-dimensional datasets.

14. What is a hypothesis? (CO1, K1)
A hypothesis is defined as a supposition or proposed explanation based on insufficient evidence or assumptions.
Example: A scientist claims that ultraviolet (UV) light can damage the eyes, and so it may also cause blindness. In this example, the scientist claims that UV rays are harmful to the eyes, and we assume they may also cause blindness; this may or may not be true. Such assumptions are called hypotheses.

15. Define hypothesis set. (CO1, K1)
Hypothesis space is defined as the set of all possible legal hypotheses; hence it is also known as a hypothesis set.

16. Write the hypothesis formula in machine learning. (CO1, K2)
The hypothesis (h) can be formulated in machine learning as follows:

y = mx + c

where y is the range, m is the slope of the line (change in y divided by change in x), x is the domain, and c is the intercept (constant).

17. Define inductive bias. (CO1, K1)
Every machine learning model requires some type of architecture design and possibly some initial assumptions about the data we want to analyze. Generally, every building block and every belief that we make about the data is a form of inductive bias.

18. What is the k-NN algorithm? (CO1, K1)
The k-Nearest Neighbors (k-NN) algorithm assumes that entities belonging to a particular category should appear near each other, and those that are part of different groups should be distant. In other words, we assume that similar data points are clustered near each other, away from the dissimilar ones.
19. Define bias and variance. (CO1, K1)
Bias: Bias is a prediction error introduced in the model by oversimplifying the machine learning algorithm; it is the difference between the predicted values and the actual values.
Variance: If the machine learning model performs well on the training dataset but does not perform well on the test dataset, variance occurs.

20. How to avoid overfitting in a model? (CO1, K2)
Both overfitting and underfitting degrade the performance of a machine learning model, but the more common cause is overfitting. Ways to reduce its occurrence include:
o Cross-validation
o Training with more data
o Removing features
o Early stopping of training
o Regularization
o Ensembling

21. How to avoid underfitting in a model? (CO1, K2)
o By increasing the training time of the model.
o By increasing the number of features.
22. Define VC dimension. (CO1, K1)
The Vapnik-Chervonenkis dimension is the capacity of a classification algorithm, defined as the maximum cardinality of a set of points that the algorithm is able to shatter.

23. What is PAC learning? (CO1, K1)
A good learner will learn, with high probability, a close approximation to the target concept; learning in which the selected hypothesis has low error ("approximately correct") with parameters ε and δ is called probably approximately correct (PAC) learning.

24. A hypothesis h generated the following errors with respect to price and engine power over 5 samples, given ε = 0.05 and δ = 0.20. (CO1, K3)

S.No     | 1     | 2    | 3     | 4     | 5
Error(h) | 0.001 | 0.07 | 0.045 | 0.065 | 0.036

Determine whether h is PAC or not.

PART B

1. Explain in detail machine learning concepts with the different learning types. (CO1, K3)
2. Discuss with examples some useful applications of machine learning. (CO1, K3)
3. Differentiate supervised, unsupervised, and reinforcement learning. (CO1, K3)
4. Explain in detail why linear algebra should be learned before machine learning. (CO1, K3)
5. Describe briefly some examples of linear algebra in machine learning. (CO1, K3)
6. Give a detailed note on hypothesis in machine learning and statistics. (CO1, K3)
7. What is inductive bias in machine learning? (CO1, K3)
8. Briefly explain the two main problems that degrade the performance of machine learning models. (CO1, K3)
9. What are bias and variance? Explain their trade-off. (CO1, K3)
10. Give a short note on VC dimension and PAC learning. (CO1, K3)
