
Introduction of

Machine / Deep Learning


Hung-yi Lee 李宏毅
Machine Learning
≈ Looking for a Function
• Speech Recognition
  f(an audio clip) = "How are you"
• Image Recognition
  f(an image) = "Cat"
• Playing Go
  f(a board position) = "5-5" (next move)


Different types of Functions
Regression: The function outputs a scalar.
  Example: f(PM2.5 today, temperature, concentration of O3) = predicted PM2.5 of tomorrow
Classification: Given options (classes), the function outputs the correct one.
  Example: spam filtering, f(an e-mail) = Yes/No
Different types of Functions
Classification: Given options (classes), the function outputs the correct one.
  Example: playing Go, f(a position on the board) = next move,
  where each position on the board is a class (19 x 19 classes).
Structured Learning
Beyond regression and classification: create something with structure (image, document).
How to find a function? A Case Study
YouTube Channel: https://www.youtube.com/c/HungyiLeeNTU
The function we want to find …
  y = f(…),   y: no. of views on 2/26

1. Function with Unknown Parameters
Model (based on domain knowledge):  y = b + w·x₁
  y: no. of views on 2/26,  x₁: no. of views on 2/25 (the feature)
  w and b are unknown parameters (learned from data): w is the weight, b is the bias.
2. Define Loss from Training Data
• Loss is a function of the parameters: L(b, w)
• Loss: how good a set of values is.
Example: L(0.5k, 1), i.e. y = 0.5k + 1·x₁. How good is it?
Data from 2017/01/01 – 2020/12/31:
  2017/01/01: 4.8k,  01/02: 4.9k,  01/03: 7.5k,  ……,  2020/12/30: 3.4k,  12/31: 9.8k
Prediction for 01/02 from 01/01:  y = 0.5k + 1·(4.8k) = 5.3k
Label (true views on 01/02):  ŷ = 4.9k
Error:  e₁ = |y − ŷ| = 0.4k
Likewise, predicting 01/03 from 01/02:  y = 0.5k + 1·(4.9k) = 5.4k, label ŷ = 7.5k, so e₂ = |y − ŷ| = 2.1k.
Continuing through the data (…, ŷ = 9.8k on 12/31) gives an error eₙ for every day.

2. Define Loss from Training Data
• Loss is a function of the parameters: L(b, w)
• Loss: how good a set of values is.
Collecting the error eₙ of every training example (prediction y = b + w·x₁ against label ŷ):

  Loss:  L = (1/N) Σₙ eₙ

  If e = |y − ŷ|, L is the mean absolute error (MAE).
  If e = (y − ŷ)², L is the mean square error (MSE).
  If y and ŷ are both probability distributions: cross-entropy.
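As a concrete illustration, here is a minimal Python sketch that evaluates L(b, w) with either MAE or MSE. The view counts below are made-up stand-ins, not the channel's real statistics:

```python
import numpy as np

# Hypothetical daily view counts (in thousands)
views = np.array([4.8, 4.9, 7.5, 5.3, 6.1, 9.8])

x1 = views[:-1]      # x1: views on day n       (feature)
y_hat = views[1:]    # y-hat: views on day n+1  (label)

def loss(b, w, kind="mae"):
    y = b + w * x1                                            # prediction y = b + w*x1
    e = np.abs(y - y_hat) if kind == "mae" else (y - y_hat) ** 2
    return e.mean()                                           # L = (1/N) * sum of e_n

print(loss(0.5, 1.0, "mae"))   # L(0.5k, 1) with mean absolute error
print(loss(0.5, 1.0, "mse"))   # same parameters with mean square error
```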
Error Surface
Trying many values of (b, w) for the model y = b + w·x₁ and computing the loss for each gives the error surface: a contour plot of L over the (w, b) plane, with regions of small L and regions of large L.
(Source of image: http://chico386.pixnet.net/album/photo/171572850)

3. Optimization
  w*, b* = arg min over w, b of L

Gradient Descent (consider a single parameter w first)
• (Randomly) pick an initial value w⁰
• Compute ∂L/∂w at w = w⁰:
  negative slope → increase w;  positive slope → decrease w

3. Optimization
  w*, b* = arg min over w, b of L

Gradient Descent
• (Randomly) pick an initial value w⁰
• Compute ∂L/∂w at w = w⁰
• Update:  w¹ ← w⁰ − η ∂L/∂w (evaluated at w = w⁰)
  η is the learning rate, a hyperparameter (set by you, not learned).

3. Optimization
  w*, b* = arg min over w, b of L

Gradient Descent
• (Randomly) pick an initial value w⁰
• Compute ∂L/∂w at w = w⁰, then update:  w¹ ← w⁰ − η ∂L/∂w (evaluated at w = w⁰)
• Update w iteratively: w⁰ → w¹ → w² → …
Gradient descent can stop at a local minimum instead of the global minimum. But do local minima truly cause the problem?
3. Optimization
  w*, b* = arg min over w, b of L

• (Randomly) pick initial values w⁰, b⁰
• Compute the partial derivatives and update:
    w¹ ← w⁰ − η ∂L/∂w (evaluated at w = w⁰, b = b⁰)
    b¹ ← b⁰ − η ∂L/∂b (evaluated at w = w⁰, b = b⁰)
  (Can be done in one line in most deep learning frameworks.)
• Update w and b iteratively.
Model:  y = b + w·x₁
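A minimal NumPy sketch of this update loop for the two-parameter model. It uses MSE so the gradients have a simple closed form (the slides' reported numbers use MAE on the real channel data, so results will differ), and the view counts are made up:

```python
import numpy as np

# Hypothetical view counts (thousands); stand-ins for the real channel data.
views = np.array([4.8, 4.9, 7.5, 5.3, 6.1, 9.8])
x1, y_hat = views[:-1], views[1:]

w, b = 0.0, 0.0        # initial values w0, b0 (could also be random)
eta = 0.01             # learning rate (a hyperparameter)

for step in range(1000):
    y = b + w * x1                          # model prediction
    # gradients of the MSE loss L = mean((y - y_hat)^2)
    dL_dw = 2 * np.mean((y - y_hat) * x1)
    dL_db = 2 * np.mean(y - y_hat)
    w -= eta * dL_dw                        # w <- w - eta * dL/dw
    b -= eta * dL_db                        # b <- b - eta * dL/db

print(w, b)   # parameters after 1000 updates
```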

3. Optimization
  w*, b* = arg min over w, b of L

Starting from an initial point on the error surface, repeatedly compute ∂L/∂w and ∂L/∂b and move by (−η ∂L/∂w, −η ∂L/∂b). On the training data this procedure ends up at
  w* = 0.97,  b* = 0.1k,  with  L(w*, b*) = 0.48k
Machine Learning is so simple ……
  Step 1: write a function with unknown parameters     y = b + w·x₁
  Step 2: define the loss from training data
  Step 3: optimization                                  w* = 0.97,  b* = 0.1k,  L(w*, b*) = 0.48k

Training
y = 0.1k + 0.97·x₁ achieves the smallest loss L = 0.48k on the data of 2017 – 2020 (training data).
How about the data of 2021 (unseen during training)?  L′ = 0.58k
Figure (2021/01/01 – 2021/02/14, views in k): red = real no. of views, blue = estimated no. of views.
Using more past days as features (L: loss on 2017 – 2020, L′: loss on 2021):

  y = b + w·x₁               (1 day)     L = 0.48k   L′ = 0.58k
  y = b + Σⱼ₌₁⁷ wⱼ·xⱼ        (7 days)    L = 0.38k   L′ = 0.49k
  y = b + Σⱼ₌₁²⁸ wⱼ·xⱼ       (28 days)   L = 0.33k   L′ = 0.46k
  y = b + Σⱼ₌₁⁵⁶ wⱼ·xⱼ       (56 days)   L = 0.32k   L′ = 0.46k

Learned parameters of the 7-day model:
  b = 0.05k,  w₁* = 0.79,  w₂* = −0.31,  w₃* = 0.12,  w₄* = −0.01,  w₅* = −0.10,  w₆* = 0.30,  w₇* = 0.18
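A sketch of how such lagged feature vectors could be assembled; the window length, random stand-in data, and least-squares fit are illustrative choices, not the lecture's actual procedure (which fits w and b by gradient descent on the MAE loss):

```python
import numpy as np

def make_lagged_dataset(views, k):
    """Each row of X holds the previous k days of views; y is the next day's views."""
    X = np.stack([views[i:i + k] for i in range(len(views) - k)])
    y = views[k:]
    return X, y

views = np.random.rand(1461) * 10          # stand-in for 2017-2020 daily views (k)
X, y = make_lagged_dataset(views, k=56)    # 56 past days as features

# One way to obtain w*, b*: least squares on y = b + sum_j w_j x_j
A = np.hstack([X, np.ones((len(X), 1))])   # extra column of 1s carries the bias b
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:-1], coef[-1]
```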
Linear models
Linear models are too simple … we need more sophisticated models.
In y = b + w·x₁, different w change the slope and different b shift the line, but the relation between x₁ and y is always a straight line; no setting of w and b can produce, say, a curve that rises and then falls.
This severe limitation of linear models is called Model Bias. We need a more flexible model!
red curve = constant + sum of a set of hard-sigmoid (blue) curves

All Piecewise Linear Curves
  = constant + sum of a set of hard-sigmoid curves
  More pieces require more hard-sigmoid curves.

Beyond Piecewise Linear?
  Approximate a continuous curve y(x₁) by a piecewise linear curve.
  To have a good approximation, we need sufficient pieces.
red curve = constant + sum of a set of hard-sigmoid curves

How to represent one such piece (the Hard Sigmoid)? Approximate it with the Sigmoid Function:

  y = c · 1 / (1 + e^(−(b + w·x₁)))  =  c · sigmoid(b + w·x₁)

  Different w → change the slope
  Different b → shift the curve
  Different c → change the height
red curve = constant + sum of a set of sigmoid curves

  y = b + Σᵢ cᵢ · sigmoid(bᵢ + wᵢ·x₁)

In the figure, curve 0 is the constant and curves 1, 2, 3 are c₁ sigmoid(b₁ + w₁x₁), c₂ sigmoid(b₂ + w₂x₁), c₃ sigmoid(b₃ + w₃x₁); adding them (0 + 1 + 2 + 3) reproduces the red curve.
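A small NumPy sketch of this construction; the cᵢ, bᵢ, wᵢ values are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x1 = np.linspace(-5, 5, 200)

b = 1.0                      # the constant ("curve 0")
c = [2.0, -1.5, 1.0]         # heights c_i
bi = [1.0, -0.5, 2.0]        # offsets b_i
wi = [1.0, 3.0, -2.0]        # slopes w_i

# y = b + sum_i c_i * sigmoid(b_i + w_i * x1)
y = b + sum(ci * sigmoid(b_i + w_i * x1) for ci, b_i, w_i in zip(c, bi, wi))
```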
New Model: More Features
With one feature:
  y = b + w·x₁           →    y = b + Σᵢ cᵢ · sigmoid(bᵢ + wᵢ·x₁)
With multiple features:
  y = b + Σⱼ wⱼ·xⱼ       →    y = b + Σᵢ cᵢ · sigmoid(bᵢ + Σⱼ wᵢⱼ·xⱼ)

  j = 1, 2, 3: index over features;  i = 1, 2, 3: index over sigmoids.
  y = b + Σᵢ cᵢ · sigmoid(bᵢ + Σⱼ wᵢⱼ·xⱼ)      (i: 1, 2, 3;  j: 1, 2, 3)

Write rᵢ for the input of the i-th sigmoid (wᵢⱼ: weight of feature xⱼ for the i-th sigmoid):

  r₁ = b₁ + w₁₁x₁ + w₁₂x₂ + w₁₃x₃
  r₂ = b₂ + w₂₁x₁ + w₂₂x₂ + w₂₃x₃
  r₃ = b₃ + w₃₁x₁ + w₃₂x₂ + w₃₃x₃

In matrix form:

  [r₁]   [b₁]   [w₁₁ w₁₂ w₁₃] [x₁]
  [r₂] = [b₂] + [w₂₁ w₂₂ w₂₃] [x₂]
  [r₃]   [b₃]   [w₃₁ w₃₂ w₃₃] [x₃]

  𝒓 = 𝒃 + 𝑊𝒙
  y = b + Σᵢ cᵢ · sigmoid(bᵢ + Σⱼ wᵢⱼ·xⱼ)      (i: 1, 2, 3;  j: 1, 2, 3)

The vector 𝒓 = 𝒃 + 𝑊𝒙 is then passed element-wise through the sigmoid:

  aᵢ = sigmoid(rᵢ) = 1 / (1 + e^(−rᵢ))

or, in vector form,  𝒂 = σ(𝒓).
Finally, the outputs aᵢ are weighted by cᵢ and summed together with the bias b:

  y = b + 𝒄ᵀ𝒂

Putting the pieces together (note that the scalar bias b and the bias vector 𝒃 are different parameters):

  𝒓 = 𝒃 + 𝑊𝒙
  𝒂 = σ(𝒓)
  y = b + 𝒄ᵀ𝒂 = b + 𝒄ᵀ σ(𝒃 + 𝑊𝒙)
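A minimal NumPy sketch of this forward pass; the randomly initialized parameters stand in for learned values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# 3 features and 3 sigmoids, as in the running example
W = rng.normal(size=(3, 3))    # W[i, j]: weight of feature x_j for the i-th sigmoid
b_vec = rng.normal(size=3)     # bias vector (bold b)
c = rng.normal(size=3)         # output weights c_i
b = rng.normal()               # scalar bias b

x = np.array([4.8, 4.9, 7.5])  # a feature vector, e.g. views of the past 3 days (k)

r = b_vec + W @ x              # r = b + W x
a = sigmoid(r)                 # a = sigma(r)
y = b + c @ a                  # y = b + c^T a
```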
Function with unknown parameters
  y = b + 𝒄ᵀ σ(𝒃 + 𝑊𝒙),   𝒙: feature vector

The unknown parameters are the rows of 𝑊, the vector 𝒃, the vector 𝒄, and the scalar b. Collect all of them into one long vector:
  𝜽 = (θ₁, θ₂, θ₃, …)ᵀ
Back to ML Framework
  Step 1: write a function with unknown parameters
  Step 2: define the loss from training data
  Step 3: optimization

  y = b + 𝒄ᵀ σ(𝒃 + 𝑊𝒙)

Loss
• Loss is a function of the parameters: L(𝜽)
• Loss means how good a set of values is.
Given a set of values for 𝜽, feed in each feature vector, compare the output y with the label ŷ to get the error e, and average over the data:

  Loss:  L = (1/N) Σₙ eₙ
Back to ML Framework
  Step 1: write a function with unknown parameters
  Step 2: define the loss from training data
  Step 3: optimization

  y = b + 𝒄ᵀ σ(𝒃 + 𝑊𝒙)

Optimization of New Model
  𝜽* = arg min over 𝜽 of L(𝜽),   𝜽 = (θ₁, θ₂, θ₃, …)ᵀ

• (Randomly) pick initial values 𝜽⁰
• Compute the gradient 𝒈 = ∇L(𝜽⁰); its i-th entry is ∂L/∂θᵢ evaluated at 𝜽 = 𝜽⁰
• Update every parameter at once:
    θ₁¹ ← θ₁⁰ − η ∂L/∂θ₁ (at 𝜽 = 𝜽⁰)
    θ₂¹ ← θ₂⁰ − η ∂L/∂θ₂ (at 𝜽 = 𝜽⁰)
    ⋮
  i.e.  𝜽¹ ← 𝜽⁰ − η𝒈
Optimization of New Model
  𝜽* = arg min over 𝜽 of L(𝜽)

• (Randomly) pick initial values 𝜽⁰
• Compute gradient 𝒈 = ∇L(𝜽⁰);  update 𝜽¹ ← 𝜽⁰ − η𝒈
• Compute gradient 𝒈 = ∇L(𝜽¹);  update 𝜽² ← 𝜽¹ − η𝒈
• Compute gradient 𝒈 = ∇L(𝜽²);  update 𝜽³ ← 𝜽² − η𝒈
• …
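In practice the gradient 𝒈 is obtained by automatic differentiation rather than derived by hand. A hedged PyTorch-style sketch of such an update loop; the data, shapes, and the choice of 16 sigmoids are illustrative, not the lecture's code:

```python
import torch

# Illustrative data: 56 past days of views -> next day's views
X = torch.rand(1000, 56)
y_hat = torch.rand(1000)

# Unknown parameters theta = {W, b_vec, c, b}, here with 16 sigmoids
W = torch.randn(16, 56, requires_grad=True)
b_vec = torch.randn(16, requires_grad=True)
c = torch.randn(16, requires_grad=True)
b = torch.randn(1, requires_grad=True)
eta = 0.01

for step in range(100):
    y = b + torch.sigmoid(X @ W.T + b_vec) @ c   # y = b + c^T sigmoid(b_vec + W x), batched
    L = (y - y_hat).abs().mean()                 # MAE loss
    L.backward()                                 # fills p.grad for every parameter
    with torch.no_grad():
        for p in (W, b_vec, c, b):
            p -= eta * p.grad                    # theta <- theta - eta * g
            p.grad.zero_()
```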
Optimization of New Model
  𝜽* = arg min over 𝜽 of L(𝜽)

In practice the N training examples are split into batches of size B, and each update uses only one batch (Lᵏ is the loss computed on the k-th batch):
• (Randomly) pick initial values 𝜽⁰
• Compute gradient 𝒈 = ∇L¹(𝜽⁰) on batch 1;  update 𝜽¹ ← 𝜽⁰ − η𝒈
• Compute gradient 𝒈 = ∇L²(𝜽¹) on batch 2;  update 𝜽² ← 𝜽¹ − η𝒈
• Compute gradient 𝒈 = ∇L³(𝜽²) on batch 3;  update 𝜽³ ← 𝜽² − η𝒈
• …
1 epoch = see all the batches once
Optimization of New Model
Example 1
• 10,000 examples (N = 10,000), batch size 10 (B = 10)
• How many updates in 1 epoch?  10,000 / 10 = 1,000 updates
Example 2
• 1,000 examples (N = 1,000), batch size 100 (B = 100)
• How many updates in 1 epoch?  1,000 / 100 = 10 updates
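A sketch of just the batching logic (the gradient computation and update are stubbed out; the counts match Example 1 above):

```python
import numpy as np

N, B = 10_000, 10                 # Example 1: 10,000 examples, batch size 10
indices = np.arange(N)

updates = 0
for epoch in range(3):            # run 3 epochs
    np.random.shuffle(indices)    # common practice: reshuffle the data every epoch
    for start in range(0, N, B):
        batch = indices[start:start + B]
        # ... compute the gradient of the loss on `batch` and update theta here ...
        updates += 1

print(updates)                    # 3 epochs x (N / B) = 3 x 1,000 = 3,000 updates
```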
Back to ML Framework
  Step 1: write a function with unknown parameters
  Step 2: define the loss from training data
  Step 3: optimization

  y = b + 𝒄ᵀ σ(𝒃 + 𝑊𝒙)

More variety of models …


Sigmoid → ReLU
How else can we represent the hard-sigmoid piece? With the Rectified Linear Unit (ReLU):

  c · max(0, b + w·x₁)

Adding two ReLUs gives exactly one hard sigmoid:

  c · max(0, b + w·x₁) + c′ · max(0, b′ + w′·x₁)
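A small sketch checking that two ReLUs sum to a hard-sigmoid shape; the particular parameter values are made up:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x1 = np.linspace(-5, 5, 201)

# Hard sigmoid rising from 0 to 2 between x1 = -1 and x1 = +1:
# one ReLU ramps up; a second, negated and shifted ReLU cancels the slope afterwards.
y = 1.0 * relu(1.0 + 1.0 * x1) + (-1.0) * relu(-1.0 + 1.0 * x1)
# y == 0 for x1 <= -1, rises linearly to 2, then stays flat at 2 for x1 >= 1.
```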
Sigmoid → ReLU
  y = b + Σᵢ cᵢ · sigmoid(bᵢ + Σⱼ wᵢⱼ·xⱼ)

The sigmoid (or ReLU) here is called the activation function. Replacing each sigmoid with two ReLUs:

  y = b + Σ₂ᵢ cᵢ · max(0, bᵢ + Σⱼ wᵢⱼ·xⱼ)      (the sum now runs over 2i terms)

Which one is better?

Experimental Results (loss on 2017 – 2020 training data and on unseen 2021 data):

                linear   10 ReLU   100 ReLU   1000 ReLU
  2017 – 2020   0.32k    0.32k     0.28k      0.27k
  2021          0.46k    0.45k     0.43k      0.43k
Back to ML Framework
  Step 1: write a function with unknown parameters
  Step 2: define the loss from training data
  Step 3: optimization

  y = b + 𝒄ᵀ σ(𝒃 + 𝑊𝒙)

Even more variety of models …


The block 𝒂 = σ(𝒃 + 𝑊𝒙) (with sigmoid or ReLU as the activation) can itself be repeated: feed 𝒂 into another set of parameters to get 𝒂′ = σ(𝒃′ + 𝑊′𝒂), and so on.
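A NumPy sketch of stacking such layers; the layer sizes, ReLU choice, and random parameters are illustrative only:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

def layer(in_dim, out_dim):
    # one (W, b) pair filled with random stand-in values
    return rng.normal(size=(out_dim, in_dim)), rng.normal(size=out_dim)

x = rng.normal(size=56)                  # e.g. views of the past 56 days
(W1, b1), (W2, b2), (W3, b3) = layer(56, 100), layer(100, 100), layer(100, 100)
c, b = rng.normal(size=100), rng.normal()

a1 = relu(b1 + W1 @ x)                   # a   = act(b   + W   x)
a2 = relu(b2 + W2 @ a1)                  # a'  = act(b'  + W'  a)
a3 = relu(b3 + W3 @ a2)                  # a'' = act(b'' + W'' a')
y = b + c @ a3                           # final scalar output
```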
Experimental Results
• Loss for multiple hidden layers, 100 ReLU per layer
• Input features are the no. of views in the past 56 days

                1 layer   2 layers   3 layers   4 layers
  2017 – 2020   0.28k     0.18k      0.14k      0.10k
  2021          0.43k     0.39k      0.38k      0.44k

Figure (3-layer model, 2021/01/01 – 2021/02/14, views in k): red = real no. of views, blue = estimated no. of views.
Back to ML Framework
  Step 1: write a function with unknown parameters
  Step 2: define the loss from training data
  Step 3: optimization

  y = b + 𝒄ᵀ σ(𝒃 + 𝑊𝒙)

It is not fancy enough.

Let’s give it a fancy name!


Each sigmoid or ReLU unit is called a Neuron; the neurons that share the same inputs form a hidden layer; connecting many of them gives a Neural Network. This mimics human brains … (???)
Many hidden layers means "Deep" → Deep Learning


Deep = Many hidden layers
  AlexNet (2012): 8 layers, 16.4% error
  VGG (2014): 19 layers, 7.3% error
  GoogleNet (2014): 22 layers, 6.7% error
  (Source: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf)

Deep = Many hidden layers
  Residual Net (2015): 152 layers (special structure), 3.57% error
  (For scale: Taipei 101 has 101 floors.)

Why do we want a "Deep" network, not a "Fat" network?
Why don’t we go deeper?
• Loss for multiple hidden layers
• 100 ReLU for each layer
• input features are the no. of views in the past 56
days
1 layer 2 layer 3 layer 4 layer
2017 – 2020 0.28k 0.18k 0.14k 0.10k
2021 0.43k 0.39k 0.38k 0.44k
Why don’t we go deeper?
• Loss for multiple hidden layers
• 100 ReLU for each layer
• input features are the no. of views in the past 56
days
1 layer 2 layer 3 layer 4 layer
2017 – 2020 0.28k 0.18k 0.14k 0.10k
2021 0.43k 0.39k 0.38k 0.44k

Better on training data, worse on unseen data


Overfitting
Let’s predict no. of views today!
• If we want to select a model for predicting no. of
views today, which one will you use?
1 layer 2 layer 3 layer 4 layer
2017 – 2020 0.28k 0.18k 0.14k 0.10k
2021 0.43k 0.39k 0.38k 0.44k

We will talk about model selection next time. J


To learn more ……
Backpropagation: computing gradients in an efficient way.
  Basic introduction: https://youtu.be/Dr-WRlEFefw
  Backpropagation: https://youtu.be/ibJpTrp5mcE
