INT354 Unit 1 Part1

The document provides an introduction to machine learning, explaining how machines learn input-output functions and defining machine learning through historical definitions. It discusses the statistical learning framework, including concepts like instance space, data distribution, and the Bayes optimal classifier, as well as the empirical risk minimization (ERM) approach to minimize loss in training datasets. Additionally, it addresses overfitting and the importance of inductive bias in restricting the hypothesis space to improve model performance.


Introduction to Machine Learning

How Do Machines Learn?
Learning Input-Output Functions
• Goal: we are trying to learn a function f (the target function)
• Input → f → Output
• f takes a vector-valued input, an n-tuple x = (x1, x2, …, xn)
• f itself may be vector-valued, yielding a k-tuple as output (a minimal sketch follows below)
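To make the input-output view concrete, here is a minimal sketch in Python; the target function f below (a 3-tuple in, a 2-tuple out) is a hypothetical example, not one from the slides.

```python
# A minimal sketch of the input-output view of learning.
# The target function f is a hypothetical example: it maps
# an n-tuple (here n = 3) to a k-tuple (here k = 2).

def f(x):
    """Hypothetical target function: 3-tuple in, 2-tuple out."""
    x1, x2, x3 = x
    return (x1 + x2, x2 * x3)   # vector-valued output

# A learner never sees f itself; it only sees input-output pairs.
examples = [(x, f(x)) for x in [(1.0, 2.0, 3.0), (0.5, 1.5, 2.5)]]
print(examples)   # [((1.0, 2.0, 3.0), (3.0, 6.0)), ...]
```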
Example: Learning Input-Output Functions
[Figure: worked examples of input-output functions]
Well-Posed Learning
Definition #1
Arthur Samuel (1959) coined the term machine learning and defined it as 'the field of study that gives computers the ability to learn without being explicitly programmed.' This is an informal, early definition of machine learning.
Definition #2
In 1998, Tom Mitchell redefined machine learning as: 'A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.'
Well-Posed Learning Problem
• Task T: classifying emails as spam or not spam
• Experience E: watching the user mark emails as spam or not spam
• Performance P: the number of emails correctly classified as spam or not spam
Designing a Learning System
Statistical Learning Framework
• Define the instance space (X) and the label space (Y)
• Assume a data distribution D over instance-label pairs:
  D(x, y) = Pr(X = x) · Pr(Y = y | X = x)
  • Pr(X = x): the marginal distribution, µ(x)
  • Pr(Y = y | X = x): the conditional distribution, η(x)
• Goal: build a classifier c
• True vs. empirical error
  • True error: depends on the unknown distribution D
  • Empirical error: computed on the training data (see the sketch below)
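The distinction is easy to state in code. Below is a minimal sketch of empirical error, the fraction of training examples a classifier gets wrong; the sample S and the classifier c are hypothetical placeholders.

```python
# Minimal sketch: empirical error of a classifier on a sample.
# The sample S and classifier c below are hypothetical examples.

def empirical_error(c, S):
    """Fraction of labeled pairs (x, y) in S that c misclassifies."""
    mistakes = sum(1 for x, y in S if c(x) != y)
    return mistakes / len(S)

# Hypothetical 1-D sample: label +1 when x >= 2, else -1.
S = [(0.5, -1), (1.0, -1), (2.5, +1), (3.0, +1)]
c = lambda x: 1 if x >= 1.5 else -1   # a candidate classifier

print(empirical_error(c, S))   # 0.0 on this particular sample
```

The true error is the same quantity taken over the whole distribution D rather than the sample, which is why it cannot be computed directly when D is unknown.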
Statistical Learning Framework
• Bayes Optimal Classifier
  • The theoretical classifier that minimizes the true error:
    h*(x) = 1 if Pr(Y = 1 | x) ≥ 0.5, otherwise −1
    (a sketch of this rule follows below)
• Machine Learning Pipeline
  • Training phase: optimize the model on the training data
  • Testing phase: evaluate on disjoint test data
• Expected Results in the Statistical Framework
  • With sufficient data, the classifier's performance on future examples approximates its true error.
  • The classifier generalizes if its training error approaches its true error.
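As an illustration, here is a minimal sketch of the Bayes optimal rule under an assumed, known conditional distribution η(x) = Pr(Y = 1 | x); the logistic-shaped η below is a hypothetical choice, and in practice η is unknown, which is why this classifier is only a theoretical benchmark.

```python
# Minimal sketch of the Bayes optimal classifier, assuming the
# conditional distribution eta(x) = Pr(Y = 1 | x) were known.
# The logistic-shaped eta below is a hypothetical choice.

import math

def eta(x):
    """Hypothetical conditional distribution Pr(Y = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-x))

def bayes_optimal(x):
    """h*(x) = 1 if Pr(Y = 1 | x) >= 0.5, otherwise -1."""
    return 1 if eta(x) >= 0.5 else -1

print([bayes_optimal(x) for x in (-2.0, -0.1, 0.0, 3.0)])
# -> [-1, -1, 1, 1]
```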
Empirical Risk Minimization (ERM)
• ERM aims to find a model that minimizes the average loss on a given training dataset (see the sketch below).
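Concretely, here is a minimal ERM sketch over a tiny hypothesis class: 1-D threshold classifiers scored by average 0-1 loss on the training sample. The hypothesis class, the sample, and the grid of candidate thresholds are all hypothetical choices made for illustration.

```python
# Minimal ERM sketch: pick the hypothesis with the lowest average
# 0-1 loss (empirical risk) on the training set. Hypothesis class:
# threshold rules h_t(x) = 1 if x >= t else -1 (a hypothetical choice).

def make_threshold(t):
    return lambda x: 1 if x >= t else -1

def average_loss(h, S):
    """Average 0-1 loss of hypothesis h on sample S."""
    return sum(1 for x, y in S if h(x) != y) / len(S)

# Hypothetical training sample of (x, y) pairs.
S = [(0.2, -1), (0.9, -1), (1.4, +1), (2.0, +1), (2.6, +1)]

# ERM: search a grid of candidate thresholds for the minimizer.
candidates = [make_threshold(t / 10) for t in range(0, 31)]
erm_h = min(candidates, key=lambda h: average_loss(h, S))
print(average_loss(erm_h, S))   # 0.0 for any threshold in (0.9, 1.4]
```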
Visualizing ERM
• Imagine a target (the true function) and a set of darts (models).
ERM
• Goal: throw a dart as close to the center of the target as possible.
• The target: represents the true underlying function.
• The darts: represent different models, each with its own set of parameters.
• The bullseye: represents the optimal model, which minimizes the loss.
ERM with Inductive Bias
• Overfitting can be avoided by:
  • limiting the number of possible hypotheses
  • increasing the amount of training data
ERM with Inductive Bias
• Restrict the search space of ERM to avoid overfitting:
  • choose a hypothesis class (H)
  • run ERM within H
• By limiting the search space to a specific hypothesis class, we can reduce the risk of overfitting. This restriction is often referred to as inductive bias (see the sketch below).
[Figure: Venn diagram; the left circle is the hypothesis class H, the right circle is the set of all possible predictors, and the overlapping region is ERM(S).]
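As a rough illustration of inductive bias, the sketch below restricts least-squares ERM to polynomial hypothesis classes of bounded degree; the noisy data, the choice of degrees, and the use of numpy.polyfit are hypothetical choices, not from the slides.

```python
# Sketch: inductive bias as a restriction on the hypothesis class.
# We run least-squares ERM within polynomial classes of bounded
# degree. The noisy data and the degrees are hypothetical choices.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 12)
y = 2 * x + rng.normal(0, 0.1, size=x.shape)   # roughly linear data

for degree in (1, 9):
    coeffs = np.polyfit(x, y, degree)           # ERM within this class
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training MSE = {train_mse:.4f}")

# The high-degree class drives training error down by fitting noise;
# the restricted degree-1 class (a stronger inductive bias) matches
# the data-generating process better and generalizes better here.
```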
