Chapter 1

Machine Learning is a subset of artificial intelligence that enables computers to learn from data and improve their performance without explicit programming. It encompasses various types, including supervised, unsupervised, and reinforcement learning, and is increasingly essential for tasks too complex to implement by hand. Applications range from self-driving cars to recommendation systems, highlighting its importance in modern technology.

What is Machine Learning

In the real world, we are surrounded by humans who can learn from their experiences, while computers and machines simply follow our instructions. But can a machine also learn from experience or past data, the way a human does? This is where Machine Learning comes in.

Machine Learning is a subset of artificial intelligence that is mainly concerned with the development of algorithms which allow a computer to learn from data and past experience on its own. The term machine learning was first introduced by Arthur Samuel in 1959. We can define it concisely as follows:

Definition

Machine learning enables a machine to automatically learn from data, improve performance
from experiences, and predict things without being explicitly programmed.

With the help of sample historical data, known as training data, machine learning algorithms build a mathematical model that helps in making predictions or decisions without being explicitly programmed. Machine learning brings computer science and statistics together to create predictive models. It constructs or uses algorithms that learn from historical data: the more information we provide, the better the performance.

A machine has the ability to learn if it can improve its performance by gaining more
data.

How does Machine Learning work

A Machine Learning system learns from historical data, builds prediction models, and, whenever it receives new data, predicts the output for it. The accuracy of the predicted output depends on the amount of data: a larger amount of data helps to build a better model that predicts the output more accurately.

Suppose we have a complex problem that requires some predictions. Instead of writing code for it directly, we just need to feed the data to generic algorithms; the machine builds the logic from the data and predicts the output. Machine learning has changed the way we think about such problems. The workflow of a machine learning algorithm is: historical data → learning algorithm → model → prediction.
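The following is a minimal sketch of this workflow in Python with scikit-learn (the library choice and the toy numbers are illustrative assumptions, not part of the original text): historical data goes in, a generic algorithm builds the model, and the model predicts outputs for new data.

# Minimal sketch of the ML workflow, assuming Python with scikit-learn.
# The toy numbers below are invented for illustration.
from sklearn.neighbors import KNeighborsClassifier

# Historical (training) data: [hours studied, previous score] -> pass (1) / fail (0)
X_train = [[2, 50], [4, 60], [6, 70], [8, 85], [1, 40], [9, 90]]
y_train = [0, 0, 1, 1, 0, 1]

# Feed the data to a generic algorithm: it builds its logic from the data.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Whenever new data arrives, the model predicts the output for it.
print(model.predict([[7, 80]]))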

Features of Machine Learning:

o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is similar to data mining, as both deal with huge amounts of data.

Need for Machine Learning

The need for machine learning is increasing day by day. The reason is that it is capable of doing tasks that are too complex for a person to implement directly. As humans, we have limitations: we cannot process huge amounts of data manually, so we need computer systems, and this is where machine learning makes things easy for us.

We can train machine learning algorithms by providing them with huge amounts of data, letting them explore the data, construct models, and predict the required output automatically. The performance of a machine learning algorithm depends on the amount of data, and it can be measured by a cost function. With the help of machine learning, we can save both time and money.

The importance of machine learning can be easily understood through its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions on Facebook, and more. Top companies such as Netflix and Amazon have built machine learning models that use vast amounts of data to analyze user interests and recommend products accordingly.


Classification of Machine Learning


At a broad level, machine learning can be classified into three types:

1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

1) Supervised Learning

Supervised learning is a type of machine learning method in which we provide sample labeled data to the machine learning system in order to train it, and on that basis, it predicts the output.

The system creates a model using the labeled data to understand the datasets and learn about each example. Once training and processing are done, we test the model by providing sample data to check whether it predicts the correct output.

The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, just as a student learns under the supervision of a teacher. An example of supervised learning is spam filtering.

Supervised learning can be grouped further into two categories of algorithms (a brief sketch of the difference follows the list):

o Classification
o Regression
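To make the two categories concrete, here is a hedged sketch (Python with scikit-learn; the toy data and labels are invented for illustration): classification predicts a discrete label, while regression predicts a continuous value.

# Classification vs. regression, a minimal sketch (Python, scikit-learn).
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: discrete labels (here 0 = not spam, 1 = spam), toy features.
clf = DecisionTreeClassifier(random_state=0)
clf.fit([[0, 1], [1, 0], [1, 1], [0, 0]], [1, 0, 1, 0])
print(clf.predict([[1, 1]]))   # a discrete class label

# Regression: a continuous output (e.g., a price), toy features.
reg = LinearRegression()
reg.fit([[1], [2], [3], [4]], [10.0, 20.0, 30.0, 40.0])
print(reg.predict([[5]]))      # a continuous value (about 50.0 here)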
2) Unsupervised Learning

Unsupervised learning is a learning method in which a machine learns without any supervision.

The machine is trained on a set of data that has not been labeled, classified, or categorized, and the algorithm must act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or groups of objects with similar patterns.

In unsupervised learning, we don't have a predetermined result. The machine tries to find useful insights from the huge amount of data. It can be further classified into two categories of algorithms (see the clustering sketch below):

o Clustering
o Association
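As an illustrative sketch (Python with scikit-learn; the unlabeled points are invented), a clustering algorithm groups the data without being given any labels:

from sklearn.cluster import KMeans

# Unlabeled data: no classes are provided; the algorithm must find structure.
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

# K-means groups the points into 2 clusters with similar patterns.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                      # cluster index assigned to each point
print(kmeans.predict([[0, 0], [12, 3]]))   # assign new points to the clusters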

3) Reinforcement Learning

Reinforcement learning is a feedback-based learning method, in which a learning agent gets a reward for each right action and a penalty for each wrong action. The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the most reward points, and in doing so it improves its performance.

A robotic dog that automatically learns the movement of its limbs is an example of reinforcement learning.
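A minimal tabular Q-learning sketch of this reward-and-penalty loop (pure Python; the tiny corridor environment, the reward values, and all parameters are invented for illustration, not taken from the text):

import random

# Tiny corridor: states 0..3, goal at state 3; actions: 0 = left, 1 = right.
n_states, goal = 4, 3
alpha, gamma = 0.5, 0.9          # learning rate and discount factor
Q = [[0.0, 0.0] for _ in range(n_states)]

for _ in range(200):             # episodes of interaction with the environment
    s = 0
    while s != goal:
        a = random.randint(0, 1)                        # explore randomly
        s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
        r = 1.0 if s2 == goal else -0.1                 # reward or penalty
        # Learn from the feedback: nudge Q toward the observed return.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# After training, the greedy action in every non-goal state should be "right" (1).
print([max((0, 1), key=lambda a: Q[s][a]) for s in range(n_states - 1)])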

Artificial Intelligence (AI): Developing machines to mimic human intelligence and behaviour.

Machine Learning (ML): Algorithms that learn from structured data to predict outputs and discover patterns in that data.

Deep Learning (DL): Algorithms based on highly complex neural networks that mimic the way a human brain works to detect patterns in large unstructured data sets.

Applications of Machine Learning

1. Learning Associations
2. Classification
3. Pattern Recognition
4. Natural Language Processing
5. Spam filtering
6. Recommendation Systems

Learning a Class from Examples

Learning a class from examples is a fundamental concept in supervised machine learning. It involves teaching a model to identify a class by providing it with examples that have known labels.

How it works
 A model is trained by providing it with labeled examples
 The model analyzes patterns and relationships between the input and
output variables
 The model learns to make predictions based on the patterns it has
identified

Examples of learning a class from examples


 Image classification
A model can learn to classify images of cats and dogs by being shown
labeled examples of both
 Spam detection
A model can learn to identify spam emails by being shown examples of
spam and non-spam emails
 Churn prediction
A model can learn to predict whether a customer will churn by being
shown examples of customers who have churned and those who have
not
Vapnik-Chervonenkis Dimension

The VC dimension of a hypothesis class is the maximum number of points that the hypothesis class can shatter.

What is Shattering?
A hypothesis class is said to “shatter” a set of data points if, no matter how you label those
points (e.g., assign them as positive or negative), the hypothesis class has a function that can
correctly classify them.

Example of Shattering:
Imagine you have two points on a 2D plane.

 A straight line (linear hypothesis) can divide these two points in all possible ways based on
their labels (e.g., positive-negative or negative-positive). Hence, the hypothesis class of
straight lines shatters these two points.
 However, for four points labeled in a mixed (XOR-like) pattern, a straight line cannot shatter them: no single line can separate the two classes.
 If a model can shatter three points but not four, its VC dimension is 3 (the sketch below checks shattering by brute force).
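Shattering can be verified by enumerating all labelings. The sketch below (pure Python; the helper names are my own) uses the simpler hypothesis class of one-dimensional thresholds, h_t(x) = 1 if x > t, instead of 2-D lines: this class can shatter any single point but no pair of points, so its VC dimension is 1.

from itertools import product

def threshold_hypotheses(points):
    # Candidate thresholds: below, between, and above the sorted points.
    xs = sorted(points)
    cuts = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return [lambda x, t=t: 1 if x > t else 0 for t in cuts]

def shatters(points):
    # Shattering: every possible labeling is realized by some hypothesis.
    hs = threshold_hypotheses(points)
    return all(
        any(tuple(h(x) for x in points) == labels for h in hs)
        for labels in product([0, 1], repeat=len(points))
    )

print(shatters([5.0]))        # True: a single point is always shattered
print(shatters([3.0, 7.0]))   # False: labeling (1, 0) is impossible, so VC dim = 1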

Formal Definition of VC Dimension


The VC dimension of a hypothesis class $H$ is the largest number of data points that can be shattered by $H$.

In other words, for a dataset of size $n$:

 If $H$ can shatter $n$ points, but not $n+1$ points, the VC dimension of $H$ is $n$.

Importance of VC Dimension

In machine learning, understanding the capacity and performance of a model is critical. One
important concept that helps in this understanding is the Vapnik-Chervonenkis (VC)
dimension. The VC dimension measures the ability of a hypothesis space (the set of all
possible models) to fit different patterns in a dataset.

Introduced by Vladimir Vapnik and Alexey Chervonenkis, this concept plays a vital role in
assessing the trade-off between model complexity and generalization. It helps us understand
how well a model can balance learning from the training data and performing well on unseen
data.

VC Dimension and Model Complexity

The VC dimension is directly tied to a model’s complexity:

 Higher VC dimension: indicates a more complex model capable of learning intricate patterns.
 Lower VC dimension: suggests a simpler model with limited learning capacity.

Balance Between Complexity and Generalization:

 Overfitting: a model with a very high VC dimension may overfit, memorizing the training data instead of generalizing.
 Underfitting: a model with a very low VC dimension may underfit, failing to capture the patterns in data.


Probably Approximately Correct (PAC) learning

Probably Approximately Correct (PAC) learning is a theoretical framework introduced by Leslie Valiant in 1984. It addresses the problem of learning a function from a set of samples in a way that is both probably correct and approximately correct. In simpler terms, PAC learning formalizes the conditions under which a learning algorithm can be expected to perform well on new, unseen data after being trained on a finite set of examples.

Importance of PAC Learning


It helps determine the conditions under which a learning algorithm can generalize well from a limited number of samples, offering insights into the trade-offs between accuracy, confidence, and sample size. The PAC framework is widely applicable and serves as a basis for analyzing and designing many machine learning algorithms.

PAC learning addresses the question of how much data is necessary for a learning algorithm to perform well on new, unseen data. The core idea is that a learning algorithm is PAC if, given a sufficient number of training samples, it can produce a hypothesis that is likely (with high probability) to be approximately correct (within a specified error margin).

Key Components of PAC Learning


1. Hypothesis Space: This is the set of all possible hypotheses that a learning algorithm can
choose from. The complexity of the hypothesis space significantly impacts the sample
complexity required for learning.
2. Sample Complexity: This refers to the number of training examples needed to ensure that
the learned hypothesis will generalize well to new data. In PAC learning, it is crucial to
determine how many samples are required to achieve a desired level of accuracy and
confidence.
3. Generalization: This is the ability of a learning algorithm to perform well on unseen data. In
the PAC framework, generalization is quantified by the probability that the chosen hypothesis
will have an error rate within an acceptable range on new samples.
4. Error Rate: The error rate is defined as the probability that the hypothesis will misclassify
an example drawn from the underlying distribution. PAC learning aims to minimize this error
rate while ensuring that the hypothesis is consistent with the training data.
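These components combine in a standard sample-complexity bound. As a hedged illustration (a classical result for a finite hypothesis space $H$ and a learner that outputs a hypothesis consistent with the training data, not something stated in this text): the probability that some hypothesis with true error greater than $\epsilon$ stays consistent with $m$ i.i.d. training examples is at most $|H|(1-\epsilon)^m \le |H|e^{-\epsilon m}$. Requiring this to be at most $\delta$ gives

$$m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right),$$

so the required number of samples grows with the size (complexity) of the hypothesis space and with the desired accuracy $1-\epsilon$ and confidence $1-\delta$.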

Probably Approximately Correct (PAC) learning defines a mathematical relationship between the number of training samples, the error rate, and the probability that the available training data are large enough to attain the desired error rate.

The hypotheses that fit the data with zero error are called consistent with the training data. We call the set of hypotheses that are consistent with the training data the version space.

PAC learning is a framework for the mathematical analysis of machine learning that aims to select a hypothesis that has a low error, with high probability.

The PAC model has two parameters, ϵ and δ. It requires that, with probability at least 1−δ, the model learns a concept with an error of at most ϵ, i.e., with an accuracy of 1−ϵ. δ is the allowed probability of failing to achieve accuracy 1−ϵ; equivalently, 1−δ is the confidence of achieving accuracy 1−ϵ.

Example: Learn the concept “University Acceptance”


Suppose we are given the undergraduate GPA and GRE score of n students that previously
applied to this university (the training set). Each student is labeled as accepted or not
accepted.

The task is to build a supervised learning model that correctly predicts if a pair
[Undergraduate GPA, GRE score] will be accepted to the university or not.

Figure. True trend (T) and derived hypothesis (H) of university acceptance.

The figure plots GPA on the x-axis and GRE score on the y-axis. T represents the true trend line, while H represents the hypothesis formulated to closely approximate the true trend. The hypothesis misclassifies true positive records that fall in the false negative region as reject, and true negative records that fall in the false positive region as accept.

Error region = T ⊕ H

We want this error region to be at most ϵ, with probability at least 1−δ.

Approximately correct

A hypothesis is said to be approximately correct if the probability of error is at most ϵ, where 0 ≤ ϵ ≤ 1/2. That is, P(T⊕H) ≤ ϵ.

Probably approximately correct

As the training samples are drawn randomly, there is a chance that the selected training sample is misleading. Such training samples are not always faithful representations of the population, in which case the hypothesis generated may not be approximately correct. Therefore, the goal is to achieve a low generalization error with high probability:

P(Error(H) ≤ ϵ) > 1−δ
i.e., P(P(T⊕H) ≤ ϵ) > 1−δ

Now, suppose we have been given the errors below of a hypothesis H1 on 5 different sample sets of the university acceptance training set, with ϵ = 0.05 and δ = 0.2.

Sample set   Error(H1)
1            0.069
2            0.03
3            0.021
4            0.05
5            0.013

Out of the 5 sample sets, the error on sample set 1 is greater than the upper bound on the error, 0.05. Therefore, the hypothesis is incorrect for sample set 1, and we can say that the hypothesis is approximately correct 4/5 (0.8) of the time. Since 0.8 ≥ 1−0.2, we say that H1 is probably approximately correct.

Similarly, suppose the errors below are of another hypothesis, H2, on 5 different sample sets, with ϵ = 0.05 and δ = 0.2.

Sample set   Error(H2)
1            0.057
2            0.041
3            0.072
4            0.023
5            0.064

Out of the 5 sample sets, the errors on sample sets 1, 3, and 5 are greater than the upper bound on the error, 0.05. Therefore, the hypothesis is incorrect for sample sets 1, 3, and 5, and we can say that the hypothesis is approximately correct only 2/5 (0.4) of the time. Since 0.4 < 1−0.2, we say that H2 is not probably approximately correct.
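The bookkeeping in the two examples above can be reproduced with a short script (pure Python; the error lists are copied from the tables above, and the function name is my own):

def probably_approximately_correct(errors, eps, delta):
    # Fraction of sample sets on which the hypothesis is approximately
    # correct (error <= eps), compared against the required confidence 1 - delta.
    frac = sum(e <= eps for e in errors) / len(errors)
    return frac, frac >= 1 - delta

h1 = [0.069, 0.03, 0.021, 0.05, 0.013]
h2 = [0.057, 0.041, 0.072, 0.023, 0.064]

print(probably_approximately_correct(h1, eps=0.05, delta=0.2))  # (0.8, True)
print(probably_approximately_correct(h2, eps=0.05, delta=0.2))  # (0.4, False)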

Noise
Noise in machine learning is irrelevant or unpredictable data that can make
machine learning models less accurate. It can be caused by human error,
unreliable data collection tools, or random variations.
What causes noise?
 Human error: Mistakes made by people who collect or process data
 Unreliable data collection tools: Tools that don't collect data accurately
 Random variations: Natural variability in complex systems
 Attacks: Malicious attempts to introduce noise into data
How does noise affect machine learning?
 Noise can make it difficult to identify patterns in data
 Noise can cause algorithms to misinterpret data
 Noise can lead to performance issues like overfitting
 Noise can make machine learning models less robust and reliable
Learning Multiple Classes

Multiclass Classification

Multiclass classification is a machine learning classification task with more than two classes, or outputs.

Examples

 Using a model to identify animal types in images from an encyclopedia is a multiclass classification example, because there are many different animal classes to which each image could be assigned.
 Handwritten digit recognition
 Face classification
 Animal class classification
 Fruit classification
 Optical Character Recognition

In classification, we are presented with a number of training examples divided into K separate classes, and we build a machine learning model to predict which of those classes some previously unseen data belongs to (i.e., the animal types from the previous example). From the training dataset, the model learns patterns specific to each class and uses those patterns to predict the membership of future data.

Binary vs. Multiclass Classification

Binary classification is used to organize data into two classes. Examples of binary classification include: email spam detection, churn prediction, and conversion prediction.

Multiclass classification permits multiple classes. Examples of multiclass classification include: face classification, animal species classification, and optical character recognition.

Classifiers Used in Multiclass Classification

Naive Bayes
K-nearest neighbor
Decision Trees
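A hedged sketch using one of the classifiers listed above (Python with scikit-learn; the toy fruit measurements and class names are invented for illustration):

from sklearn.neighbors import KNeighborsClassifier

# Toy fruit data: [weight in grams, diameter in cm] -> one of K = 3 classes.
X = [[150, 7.0], [160, 7.5], [120, 6.0], [115, 6.2], [300, 10.0], [320, 11.0]]
y = ["apple", "apple", "orange", "orange", "melon", "melon"]

# The model learns patterns specific to each of the K classes.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([[140, 6.8]]))   # membership prediction for unseen data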
Regression

Machine learning regression is a technique for investigating the relationship between independent variables (features) and a dependent variable (outcome). It is used as a method for predictive modelling in machine learning, in which an algorithm predicts continuous outcomes.

Regression is a key element of predictive modelling, so it can be found within many different applications of machine learning. Whether powering financial forecasting or predicting healthcare trends, regression analysis can bring organisations key insight for decision-making. It is already used in different sectors to forecast house prices, stock or share prices, and to map salary changes.

Common uses for machine learning regression models include:

 Forecasting continuous outcomes like house prices, stock prices, or sales.
 Predicting the success of future retail sales or marketing campaigns to ensure resources are used effectively.
 Predicting customer or user trends, such as on streaming services or e-commerce websites.
 Analysing datasets to establish the relationships between variables and an output.
 Predicting interest rates or stock prices from a variety of factors.
 Creating time series visualisations.

Simple Linear Regression

Multiple Linear Regression

Example: Suppose there is a marketing company A that runs various advertisements every year and earns sales accordingly, and that we have records of the company's advertising spend and the corresponding sales for the last 5 years. Now the company wants to spend $200 on advertising in 2019 and wants a prediction of the sales for this year. To solve this type of prediction problem in machine learning, we need regression analysis.

In regression, we fit a line or curve that best matches the given data points; using this fit, the machine learning model can make predictions about the data. In simple words, regression finds a line or curve on the target-predictor graph such that the vertical distance between the data points and the regression line is minimal. The distance between the data points and the line tells whether the model has captured a strong relationship or not.

Some examples of regression are:

o Prediction of rain using temperature and other factors


o Determining Market trends
o Prediction of road accidents due to rash driving.

Linear Regression:

o Linear regression is a statistical regression method used for predictive analysis.
o It is one of the simplest algorithms; it works on regression problems and shows the relationship between continuous variables.
o It is used for solving regression problems in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.
o If there is only one input variable (x), it is called simple linear regression; if there is more than one input variable, it is called multiple linear regression.
o A typical example of such a linear relationship is predicting the salary of an employee from years of experience.
o The mathematical equation for linear regression is:

Y = aX + b

Here, Y = dependent variable (target variable), X = independent variable (predictor variable), and a and b are the linear coefficients.

Mean Squared Error

The Mean Squared Error (MSE) measures how close a regression line is to a set of data points. It is a risk function corresponding to the expected value of the squared error loss. The MSE is calculated by taking the average (specifically, the mean) of the squared errors between the data and the fitted function.

Fig: Regression Line

A larger MSE indicates that the data points are dispersed widely around its central moment
(mean), whereas a smaller MSE suggests the opposite. A smaller MSE is preferred because it
indicates that your data points are dispersed closely around its central moment (mean). It
reflects the centralized distribution of your data values, the fact that it is not skewed, and,
most importantly, it has fewer errors (errors measured by the dispersion of the data points
from its mean).

Lesser the MSE => Smaller is the error => Better the estimator.

The Mean Squared Error is calculated as:

MSE = (1/n) * Σ(actual – forecast)²

where:

 Σ – a symbol that means “sum”
 n – sample size
 actual – the actual data value
 forecast – the predicted data value

 This algorithm explains the linear relationship between the dependent (output) variable y and the independent (predictor) variable X using the straight line Y = B0 + B1 X.
 The goal of the linear regression algorithm is to find the best values for B0 and B1, giving the best-fit line. The best-fit line is the line with the least error, i.e., the error between the predicted values and the actual values should be minimal.
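A short sketch tying the best-fit line and MSE together (Python with NumPy; the experience/salary numbers are invented): fit B0 and B1 by least squares, then score the fit with the MSE formula above.

import numpy as np

# Invented data: x = years of experience, y = salary (in $1000s).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 42.0, 48.0, 55.0])

# Least-squares fit of Y = B0 + B1*X (np.polyfit returns [B1, B0]).
b1, b0 = np.polyfit(x, y, deg=1)
forecast = b0 + b1 * x

# MSE = (1/n) * sum((actual - forecast)^2)
mse = np.mean((y - forecast) ** 2)
print(f"B0 = {b0:.2f}, B1 = {b1:.2f}, MSE = {mse:.3f}")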



Ill-Posed Problem

An ill-posed problem in machine learning is a problem that has multiple solutions, no solution, or an unstable solution procedure. This can happen when there is not enough information in the training data to choose a single best model.

A well-posed problem is one for which:

1. a solution exists,
2. the solution is unique,
3. the solution’s behaviour changes continuously with the initial conditions.

Problems that are not well-posed are referred to as “ill-posed”.
Inductive Bias

Inductive bias can be defined as the set of assumptions or biases that a learning algorithm employs to make predictions on unseen data based on its training data. These assumptions are inherent in the algorithm's design and serve as a foundation for learning and generalization. The inductive bias of an algorithm influences how it selects a hypothesis (a possible explanation or model) from the hypothesis space (the set of all possible hypotheses) that best fits the training data.

Model Selection and Generalization

Model Selection

Model selection is the process of deciding which algorithm and model architecture is best suited for a particular task or dataset.

A model's ability to generalize to new, untested data may not be as strong as its ability to perform effectively on a single dataset or problem. Finding the right balance between model complexity and generalization is therefore key to model selection.

Model Generalization

Generalization in machine learning refers to the ability of a trained model to accurately make predictions on new, unseen data. The purpose of generalization is to equip the model to understand the patterns and relationships within its training data and apply them to previously unseen examples from the same distribution as the training set.

A spam email classifier is a good example of generalization in machine learning. Suppose you have a training dataset containing emails labeled as either spam or not spam, and your goal is to build a model that can accurately classify incoming emails as spam or legitimate based on their content.

 Overfitting refers to a scenario where a machine learning model memorizes the training data but does not correctly learn its underlying patterns. Overfit models perform exceptionally well on training data but fail to generalize to new, unseen data. This is because the model becomes too complex or too specialized to the training set, capturing noise, outliers, or random fluctuations in the data as meaningful patterns. Overfitting causes the model to be overly sensitive to small fluctuations in the training data, making it less robust to noise or variations in the real world.

 Underfitting occurs when a machine learning model is too simplistic and can’t capture the underlying patterns in the data. An underfit model typically exhibits high error on both the training and testing data. Underfit models also typically exhibit high bias because they are not expressive enough to accurately represent the data.

 Dietterich (2003, “Machine Learning”) describes the triple tradeoff for empirical (supervised) learning. That is, there exists a tradeoff between:
 1) the size or complexity of the learned classifier,
 2) the amount of training data, and
 3) the generalization accuracy on new examples.

Validation Set

The validation set is used to fine-tune the hyperparameters of the model and is considered part of the training process. The model sees this data only for evaluation and does not learn from it, which provides an objective, unbiased evaluation of the model.

Test Set
The final step is to use a test set to verify the model's functionality. Some publications refer to the
validation dataset as a test set, especially if there are only two subsets instead of three. Similarly, if
records in this final test set have not formed part of a previous evaluation or cross-validation, they
might also constitute a holdout set.
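One common way to carve out these subsets is sketched below (Python with scikit-learn; the 60/20/20 proportions are a conventional choice assumed here, not a rule from the text):

from sklearn.model_selection import train_test_split

X = [[i] for i in range(100)]    # stand-in features
y = [i % 2 for i in range(100)]  # stand-in labels

# First hold out the test set, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# 0.25 of the remaining 80% is 20% overall: a 60/20/20 split.
print(len(X_train), len(X_val), len(X_test))   # 60 20 20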

Cross Validation
