
INF 8245E - Machine Learning
Fall 2021
Sarath Chandar

1. Introduction to Machine Learning
Study of algorithms that improve their performance P at some task T with experience E.

Learning task: <P, T, E>

Examples:

1) T: spam classification
   P: accuracy
   E: set of example mails labelled as spam or not spam.

2) T: credit card fraudulent transaction detection
   P: accuracy
   E: set of historical transactions marked as legit or fraud.

3) T: playing the game of chess
   P: number of games won
   E: set of game trajectories from expert players.
Prediction:

The most common ML application.

    input x (observation)  →  Agent  →  output y (prediction)

Examples:

① Given the size of the house, predict the selling price.
   x: size of the house
   y: selling price

② Given the current stock price of a company, predict the same after 10 minutes.
   x: stock price at time t
   y: stock price at time t+10

③ Given an image, predict the object in the image.
   x: image as pixels
   y: object name from a predefined set, e.g. {dog, cat, birds, flight, ball}

④ Given an image of a handwritten digit, predict the digit.
   x: image of the digit
   y: one of {0, 1, 2, ..., 9}

⑤ Given the temperature, humidity, and wind, predict whether it will rain or not.
   x: temperature, humidity, wind
   y: yes / no

Note:
① In examples ① and ②, y is a real number:  y ∈ R.
② In examples ③, ④, ⑤, y is categorical:
   y ∈ {dog, cat, birds, flight, ball} in ③
   y ∈ {0, 1, ..., 9} in ④
   y ∈ {yes, no} in ⑤

    y ∈ R              →  regression problem.
    y ∈ {c1, ..., cK}  →  classification problem.

In ⑤, y ∈ {yes, no}  →  binary classification problem.

Prediction task: Given some input x, make a good prediction of the output y, denoted by ŷ.

x can be a vector  x = <x1, x2, ..., xp>  →  p features.
y can be a scalar or a vector.

Note:  y  →  target
       ŷ  →  model's prediction

Supervised Learning:

Given a training set {x^(i), y^(i)}_{i=1}^{N} of N data points, learn a prediction
function f: x → y, such that given a new x, f can accurately predict the
corresponding y.

Note: The prediction function f is useful only if it can make accurate predictions
for unseen x. We will call this capability generalization to unseen instances.
Generalization is a key requirement for any ML algorithm.

Simple Example: House price prediction in Portland, Oregon

John, who lives in Portland, Oregon, wants to sell his house and wants to know what
a good market price would be. One way to do this is to first collect information on
recently sold houses and make a model of housing prices. This is an example of
regression.

Let us consider the following dataset:

    x                        y
    Size of the house        Price of the house
    (in square feet)
    ------------------       ------------------
    2104                     399,900
    1600                     329,900
    2400                     369,000
    ...                      ...
Let us say we have 47 data points. The first step would be to split the data into
training data and test data.

① We will use the training data to train the model.
② We will use the test data only to test the generalization of the model.

Note: The test set is a proxy for the true performance of the model when it sees a
new x after training.

Let us divide the dataset into 30 training instances and 17 test instances.

[Figure: visualization of the training data, house size vs. price.]
Now consider this simple model:

    ŷ = w0 + w1 x

What kind of curves can this model fit? This model can fit only lines.

[Figure: example lines for different parameter values, e.g. (w0, w1) = (1.5, 0),
(0, 0.5), (1, 0.5).]

→ Note the significance of adding w0.
→ If you don't add w0, you can only cover lines passing through the origin!
  w0 = bias in ML literature.

    ŷ = w0 + w1 x

Parameters of the model:  w = (w0, w1)

    ŷ(x; w) = w0 + w1 x

Given x, we want to learn the parameters w0, w1.

The first step is to define an objective function that the model should achieve.

Objective: We want ŷ(x^(i); w) to be as close as possible to y^(i).

This can be done by minimizing the following error function over the N training
instances:

    E(w) = (1/2) Σ_{i=1}^{N} ( ŷ(x^(i); w) - y^(i) )²
         = (1/2) Σ_{i=1}^{N} ( w0 + w1 x^(i) - y^(i) )²

This is the least squares error function: ŷ(x^(i); w) is the model's prediction and
y^(i) is the target.

Objective: minimize E(w) with respect to w.
How to find the minimum?

→ This error function is a quadratic function of the parameters, so there is only
  one minimum.
→ At the minimum, the derivative of the error function with respect to the
  parameters w will be zero, i.e.  dE/dw = 0.
→ The derivatives of E(w) will be linear in the elements of w.
→ min_w E(w) has a unique solution w*, which can be found in closed form.

The fitted model:  ŷ(x; w*) = w0* + w1* x

Note: This is an example of a linear model: the model is linear in terms of the
parameters. Linear models with quadratic error functions have a unique closed-form
solution!

→ More on this later.

Switching to vector notation:

    ŷ = w0 + w1 x
      = w0 · 1 + w1 · x

Let  w = (w0, w1)ᵀ  and  x = (1, x)ᵀ.   Then  ŷ = wᵀx.

Note: We will often prepend 1 to x to avoid treating the bias separately.

Now consider the entire training data:

    X = data matrix       (rows in X = examples, columns in X = features)
    Y = target vector     (y^(1), ..., y^(N))ᵀ

    X : N x p matrix
    Y : N x 1 vector
    w : p x 1 vector

    ŷ = X w
        (N x p)(p x 1) → (N x 1)

A single matrix-vector multiplication predicts y for all N examples.
Solution to least squares:

    E(w) = (1/2) (Y - Xw)ᵀ(Y - Xw)

Differentiate with respect to w and set it to zero to find the minimum:

    dE/dw = 0
    Xᵀ(Y - Xw) = 0
    XᵀY - XᵀXw = 0
    XᵀXw = XᵀY

    w* = (XᵀX)⁻¹ XᵀY

The quantity (XᵀX)⁻¹Xᵀ is the Moore-Penrose pseudo-inverse of X.

Now, given a new x, we can find the target y as follows:

    ŷ = w*ᵀ x
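A minimal sketch of this closed-form fit in NumPy (the house sizes, prices, and
variable names below are illustrative placeholders, not the actual dataset from
these notes):

    import numpy as np

    def fit_least_squares(X, y):
        # Closed-form solution w* = (X^T X)^{-1} X^T y, computed here via the
        # Moore-Penrose pseudo-inverse of X for numerical stability.
        return np.linalg.pinv(X) @ y

    # Illustrative data (house size in sq. ft. -> price); not the real 47-point dataset.
    sizes  = np.array([2104.0, 1600.0, 2400.0])
    prices = np.array([399900.0, 329900.0, 369000.0])

    # Prepend a column of ones so the bias w0 is handled like any other weight.
    X = np.column_stack([np.ones_like(sizes), sizes])
    w_star = fit_least_squares(X, prices)

    # Prediction for a new house: y_hat = w*^T x
    x_new = np.array([1.0, 1800.0])
    y_hat = x_new @ w_star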
Back to the example:

How good is the learnt linear regression model?

Least squares error (LSE):

    E(w) = (1/2) Σ_{i=1}^{N} ( y^(i) - wᵀx^(i) )²

    → not on the same scale as the prediction, and it depends on N.

A better metric for performance: Root Mean Square Error (RMSE)

    E_RMS = sqrt( 2 E(w*) / N )

Note: RMSE is on the same scale as y.

    Train RMSE = 68727.04
    Test RMSE  = 57976.80

The model is off by approx. 50k$. Still a decent performance for a simple model.
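A short sketch of computing this metric, reusing the hypothetical w_star from the
previous snippet (X_train, X_test, etc. are placeholder names for pre-made splits):

    import numpy as np

    def rmse(w, X, y):
        # E_RMS = sqrt(2 E(w) / N), i.e. the root of the mean squared residual.
        residuals = X @ w - y
        return np.sqrt(np.mean(residuals ** 2))

    # Hypothetical usage, assuming train/test design matrices built as above:
    # train_rmse = rmse(w_star, X_train, y_train)
    # test_rmse  = rmse(w_star, X_test, y_test)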

Summary: Linear regression

    Model:         ŷ = wᵀx   (linear model)
    Parameters:    w
    Objective fn:  E(w) = (1/2)(Y - Xw)ᵀ(Y - Xw)
    Solution:      w* = (XᵀX)⁻¹ XᵀY
    Metric:        RMSE = sqrt( 2 E(w*) / N )
Synthetic example:

Consider the following synthetic function:

    y = sin(2πx) + ε

where we have added a small Gaussian noise ε to the output of the sin function:

    ε ~ N(0, 0.3)      (mean 0, std deviation 0.3)

The training data consists of 10 data points sampled from this function.

Note 1: In real-life problems we will not have access to the true function y.

Note 2: Adding Gaussian noise to the sin function is reasonable, because even in
real life the observed y will have some noise, due to observation errors or the
inherent stochasticity of the process that generates y.
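A small sketch of how such a synthetic dataset could be generated (the seed, the
sampling range, and the variable names are arbitrary choices, not from the notes):

    import numpy as np

    rng = np.random.default_rng(0)        # arbitrary seed for reproducibility
    N = 10                                # number of training points

    x = rng.uniform(0.0, 1.0, size=N)     # inputs sampled in [0, 1] (a choice)
    noise = rng.normal(0.0, 0.3, size=N)  # Gaussian noise, mean 0, std 0.3
    y = np.sin(2 * np.pi * x) + noise     # noisy observations of sin(2*pi*x)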

Linear regression solution:

[Figure: the learnt line over the training points.]

Looks like the learnt function (a line) does not fit the data very well.

We can consider higher-order polynomials:

    ŷ(x; w) = w0 + w1 x + w2 x² + ... + wM x^M

Note on notation:
    x²    : x raised to the power of 2
    x^(2) : x of the second example
    x_2   : 2nd feature of x

Using the fact that x⁰ = 1, we can write

    ŷ(x; w) = w0 x⁰ + w1 x¹ + ... + wM x^M
            = Σ_{j=0}^{M} wj x^j

Let  w = (w0, ..., wM)ᵀ  and  x = (x⁰, x¹, ..., x^M)ᵀ.   Then

    ŷ(x; w) = wᵀx.

Note: ŷ(x; w) is a non-linear function of x, but ŷ(x; w) is still a linear function
of w! So this is still a linear model.
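A sketch of building the feature vector (1, x, x², ..., x^M) for every training
point and reusing the same closed-form fit; the degree M and the function names are
illustrative choices:

    import numpy as np

    def polynomial_features(x, M):
        # Map each scalar input x_i to the row (x_i^0, x_i^1, ..., x_i^M).
        return np.vander(x, N=M + 1, increasing=True)

    M = 3                                 # polynomial degree (an illustrative choice)
    X_poly = polynomial_features(x, M)    # shape (N, M + 1)
    w_star = np.linalg.pinv(X_poly) @ y   # same least-squares closed form as before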

    E(w) = (1/2) Σ_{i=1}^{N} ( ŷ(x^(i); w) - y^(i) )²

is still a quadratic function of w. Hence, a unique solution exists.

x is now a vector instead of being a scalar. However, we can still use the same
least squares solution.

M = 2 is still not good.

Solutions for M = 3, 4 look good.

The solution for M = 5 is slightly deviating from the sin curve.

What happens as we keep increasing M?

The solution for M = 9 achieves zero training error!

Is this a good solution? NO. The fitted curve oscillates wildly and gives a poor
representation of sin(2πx).

→ This is known as overfitting.

In this example, we know the true function. So we can tell that M = 9 is not a good
approximation of it.

How do we tell if a model is overfitting when we don't know the true function?

If a model is overfitting, its generalization performance should be bad.

[Figure: training and test error vs. M. M = 9 gives zero training error but high
test error; the good region is where both are low.]
Wait! A 9-degree polynomial contains the 3-degree polynomial as a special case.
Then why does M = 9 perform badly?

Let us look at the learnt w*:

As M increases, the magnitudes of the coefficients get larger. For M = 9, the 10
weights are heavily tuned to the given 10 data points!

The 9-degree polynomial contains the 3-degree polynomial, so a 9-degree model should
be able to recover the 3-degree solution by setting the remaining weights to zero.
In other words, the 9-degree polynomial model is expressive enough to model the
given data. But we are not able to learn that solution.

How to fix overfitting?

Solution 1: Add more data points / examples so that the model cannot overfit.

[Figure: fits with 15 examples and with 100 examples.]

With 100 data points, the M = 9 model approximates the true function very well!
What if you cannot obtain more data?

Solution 2: Add a penalty term to the error function in order to discourage the
coefficients from reaching large values.

    E(w) = (1/2) Σ_{i=1}^{N} ( ŷ(x^(i); w) - y^(i) )²  +  (λ/2) ||w||²

where  ||w||² = wᵀw = w0² + w1² + ... + wM².

This is known as regularization; (λ/2)||w||² is the regularization term.

λ controls the relative importance of the regularization term.

Note: E(w) is still a quadratic function of w. This error function can be minimized
exactly in closed form. We will derive this solution later.
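The derivation comes later in the notes; as a hedged preview, the standard closed
form for this regularized objective is w* = (XᵀX + λI)⁻¹XᵀY, sketched below with
illustrative names:

    import numpy as np

    def fit_ridge(X, y, lam):
        # Regularized least squares: w* = (X^T X + lam * I)^{-1} X^T y.
        p = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    # Hypothetical usage with the polynomial features from the earlier sketch:
    # w_reg = fit_ridge(X_poly, y, lam=1e-3)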


How to choose λ?

It is easier to work with ln λ and to compute λ as e^(ln λ).

[Figure: fits with no regularization and with too much regularization.]

When λ is too small, there is no regularization. When λ is too high, there is too
much regularization. It is very crucial to choose the right λ.

    λ = hyperparameter of the model.

Normally we will fix the hyperparameters and learn the parameters from the data.

But how to choose λ? Can we use the test set to choose λ?

NO! The test set is supposed to be a proxy for a completely new x. If you choose λ
based on the test set, then the model has seen the test set, so the test set will no
longer be a proxy for the real performance of the model.

What else can we do? Create a separate hold-out set.

① For different values of λ:
   - train the model on the training part.
   - compute the performance on the validation set.
② Pick the λ with the best validation performance.
③ Compute the test performance for the good region of λ.

Given a training set, we split the data into a training part and a validation part,
and use the validation set for model selection. This technique is known as hold-out
validation.
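A minimal sketch of this hold-out selection loop, assuming the hypothetical
fit_ridge and rmse helpers from the earlier sketches and pre-made training and
validation splits (X_train, X_valid, etc. are placeholder names):

    import numpy as np

    lambdas = [0.0, 1e-4, 1e-2, 1.0]      # candidate values (illustrative)
    best_lam, best_val = None, np.inf

    for lam in lambdas:
        w = fit_ridge(X_train, y_train, lam)    # learn parameters on the training part
        val_err = rmse(w, X_valid, y_valid)     # evaluate on the held-out validation part
        if val_err < best_val:
            best_lam, best_val = lam, val_err

    # Only after fixing best_lam do we touch the test set once, to report generalization.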

What if the chosen train/valid split is not representative? What if it was a bad
validation split?

We can do k-fold cross-validation:

    fold 1   fold 2   fold 3   fold 4   fold 5
    valid    train    train    train    train
    train    valid    train    train    train
    train    train    valid    train    train
    train    train    train    valid    train
    train    train    train    train    valid

- Divide the data into K disjoint folds.
- Use K-1 folds for training and the last fold for validation.
- Repeat the previous step so that every fold is used for validation once.
- Average the performance over all K folds.

→ Very costly!

Note: Leave-one-out cross-validation: when the dataset is too small, with n points,
we will do n-fold cross-validation.
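A compact sketch of k-fold cross-validation for a single hyperparameter value, again
reusing the hypothetical fit_ridge and rmse helpers from the earlier sketches:

    import numpy as np

    def kfold_rmse(X, y, lam, K=5):
        # Split the indices into K disjoint folds; train on K-1 folds,
        # validate on the held-out fold, and average the K errors.
        folds = np.array_split(np.random.permutation(len(y)), K)
        errors = []
        for k in range(K):
            valid_idx = folds[k]
            train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
            w = fit_ridge(X[train_idx], y[train_idx], lam)
            errors.append(rmse(w, X[valid_idx], y[valid_idx]))
        return np.mean(errors)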
-
In this example, we knew that the true function is a 3-degree polynomial, and hence
we can obtain it from the 9-degree polynomial by regularization.

What if the true function was a 15-degree polynomial and we don't know that? In
other words, how can we choose the value of M?

M is also a hyperparameter!

Solution 3: Try different values of M and select the value of M based on the
performance on the validation set. This is also model selection!

Machine Learning Pipeline:

① Define the input and output of the task.
② Collect examples for the task.
③ Divide the examples into train / valid / test sets.
④ Do data preprocessing.
⑤ Define your model. The model will consist of parameters and hyperparameters.
⑥ Define the error function you want to minimize.
⑦ For different values of the hyperparameters:
   - learn the model parameters by minimizing the error function.
   - compute the validation performance.
⑧ Pick the best model based on the validation performance.
⑨ Test the model with the test set.


Classification:

Consider the following binary classification problem. We are interested in
classifying the data points as either "blue" class or "orange" class.

There are 2 features: x1 and x2. We have 100 examples per class.

    class 1: blue
    class 2: orange

We will always convert the targets to numbers:

    Blue = 0, Orange = 1      (or)      Blue = -1, Orange = +1

Solution 1: Linear model

We can use the same idea from regression.

For {0, 1} classification:

    ŷ = 1  if  wᵀx > 0.5
    ŷ = 0  if  wᵀx ≤ 0.5

    wᵀx = 0.5 is the decision boundary.

For {-1, 1} classification:

    ŷ = -1  if  wᵀx < 0
    ŷ = +1  if  wᵀx ≥ 0

    wᵀx = 0 is the decision boundary.

[Figure: orange and blue regions separated by the decision boundary, for the linear
model without a bias term and with a bias term.]

Note: Some blues are wrongly classified as orange and vice versa. A line is clearly
not a good decision boundary for this classification problem.
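A tiny sketch of this thresholding rule on top of a least-squares fit, for the
{0, 1} encoding (variable names are placeholders):

    import numpy as np

    def predict_class(w, X, threshold=0.5):
        # Regress onto the {0, 1} targets, then threshold the raw output at 0.5.
        scores = X @ w
        return (scores > threshold).astype(int)

    # Hypothetical usage: w = fit_least_squares(X_train, y_train)
    # labels = predict_class(w, X_test)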

Solution 2: Nearest-neighbor methods

Use those observations in the training set T closest in input space to x to form ŷ.
If more neighbors are class 0, then predict class 0, and vice versa.

    ŷ(x) = (1/k) Σ_{x^(i) ∈ N_k(x)} y^(i)

    N_k(x) = neighborhood of x
           = the k closest points x^(i) in T.

Closest with respect to what metric? Euclidean distance.

    if ŷ(x) > 0.5, predict class 1
    else           predict class 0.

This is like majority voting from the neighbors. This model assumes that the class
distribution is locally smooth.

k = 10: the decision boundary is more irregular and responds to local clusters where
one class dominates. There are still some misclassifications.

k = 1: the decision boundary is even more irregular than before. There are no
misclassifications.
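A minimal k-NN classifier sketch under these definitions (Euclidean metric,
prediction by averaging the {0, 1} labels of the k nearest training points and
thresholding at 0.5); the function and variable names are illustrative:

    import numpy as np

    def knn_predict(X_train, y_train, x, k=10):
        # Euclidean distances from the query point x to every training point.
        dists = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k nearest neighbours: N_k(x).
        nearest = np.argsort(dists)[:k]
        # y_hat(x) = average of the neighbours' {0, 1} labels, then threshold at 0.5.
        y_hat = np.mean(y_train[nearest])
        return int(y_hat > 0.5)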

k = hyperparameter of the nearest neighbor algorithm.

Let us do model selection using a separate validation set.

k = 1 overfits the training data!

k-NN performs better than the linear model here. It is a non-linear model.

Are there any other hyperparameters for the k-NN algorithm?
→ The metric used to compute the neighborhood.
Least squares vs. nearest neighbors:

Least squares:
- Decision boundary is very smooth.
- More stable.
- Assumes that the decision boundary is linear.
- Strong assumption: high bias, low variance.

Nearest neighbors:
- Decision boundary depends on a handful of input points and their positions.
- Less stable.
- Assumes that the class distribution is locally smooth.
- Low bias, high variance.

-
You should know!

① Prediction problem
② Regression / classification
③ Supervised learning / generalization
④ Linear regression
⑤ Bias in a linear model
⑥ Parameter / hyperparameter of a model
⑦ Overfitting
⑧ Solutions to overfitting: add more data / regularization
⑨ Model selection
⑩ Cross-validation, k-fold cross-validation
⑪ ML pipeline
⑫ Nearest neighbor classifier
