
9/5/23

Learning from Data

[Figure: a model is trained on past data (perhaps with labels) and then used to predict on future data, e.g., recognizing a "car".]

Modeling & Computation

Machine Learning in Practice

[Figure: the two components of machine learning in practice: Modeling and Computation.]

Linear Models

• Feature vector: x ∈ R^d (e.g., features of a house):
  x = (x_1, x_2, x_3, ⋯, x_d) = (# of bedrooms, # of bathrooms, square feet, ⋯, age of house).
• Prediction: f(x) = x^T w (e.g., housing price), with weight vector w = (w_1, w_2, w_3, ⋯, w_d).

Linear Models

• Prediction: f(x) = x^T w = w_1 x_1 + w_2 x_2 + ⋯ + w_d x_d (e.g., housing price).
• w_1, w_2, ⋯, w_d: weights.

[Figure: features of a house x ∈ R^d → prediction f(x) = x^T w → Price = $0.5M.]

Question: How to find w?
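In code, the prediction f(x) = x^T w is a single dot product. A minimal NumPy sketch (the feature values and weights below are made up for illustration; in practice w is learned from data):

```python
import numpy as np

# Hypothetical features of one house: bedrooms, bathrooms, square feet, age.
x = np.array([3.0, 2.0, 1500.0, 20.0])

# Hypothetical weights (in practice, learned from training data).
w = np.array([10_000.0, 5_000.0, 300.0, -1_000.0])

# Prediction: f(x) = x^T w = w_1 x_1 + ... + w_d x_d.
price = x @ w
print(price)  # 470000.0
```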


Linear Models

Question: How to find w?

• Training features: x_1, ⋯, x_n ∈ R^d (n houses in total).
• Training targets: y_1, ⋯, y_n ∈ R.
• Loss function: L(w) = (1/n) ∑_{i=1}^{n} (x_i^T w − y_i)^2.
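The loss averages the squared prediction error over the n training houses. A sketch with made-up numbers:

```python
import numpy as np

# Hypothetical training set: n = 3 houses, d = 2 features each.
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [0.0, 4.0]])     # row i is x_i^T
y = np.array([5.0, 7.0, 9.0])  # targets y_1, ..., y_n

def loss(w):
    """L(w) = (1/n) * sum_i (x_i^T w - y_i)^2."""
    residuals = X @ w - y
    return np.mean(residuals ** 2)

print(loss(np.array([1.0, 2.0])))  # mean of [0, 4, 1] = 5/3
```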

Linear Models

• Least squares regression: w* = argmin_w L(w).

Linear Models Are Not Expressive

Example: Given a person's photo, predict their age.

[Figure: photo (features) → Age = 36.]
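The slides turn to numerical algorithms for this minimization later; for reference, least squares also has a standard closed-form solution via the normal equations, w* = (X^T X)^{-1} X^T y (valid when X^T X is invertible; the data below is made up):

```python
import numpy as np

# Hypothetical data: rows of X are the training features x_i^T.
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [0.0, 4.0]])
y = np.array([5.0, 7.0, 9.0])

# Solve the normal equations X^T X w = X^T y for the minimizer w*.
w_star = np.linalg.solve(X.T @ X, X.T @ y)

# Sanity check: at the minimizer, the gradient (2/n) X^T (X w - y) vanishes.
grad = 2 / len(y) * X.T @ (X @ w_star - y)
print(w_star, grad)
```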


Linear Models Are Not Expressive

Example: Given a person's photo, predict their age.

Question: Can we use linear regression?

• Linear regression assumes the target is a weighted sum of every pixel in the photo.
• Linear regression works poorly for age prediction.

Traditional Approaches

[Figure: hand-crafted image features: SIFT, HOG, LBP.]

Traditional Approaches

[Figure: photo → hand-crafted features (SIFT, HOG, LBP) → linear model → prediction.]

Convolutional Neural Networks (CNNs)

[Figure: photo → CNN → "dog".]


Applications of CNNs

• CNNs are suitable for image data.
• CNNs convert images to effective representations (feature extraction).
• Applications:
  • Image/video recognition.
  • Face recognition.
  • Image generation.
  • …

Applications of CNNs: Medical Diagnosis

Example: Skin cancer diagnosis

• Input: an image.
• Outputs:
  • Is it skin cancer?
  • Benign or malignant?
• The same accuracy as human experts [1].

Reference
[1] Esteva et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 2017.

Applications of CNNs: Self-Driving Cars

• CNNs play an important role in self-driving cars:
  • understanding images taken by the cameras;
  • recognizing signs, cars, pedestrians, and obstacles.
• The CNN is not everything; a self-driving car is a sophisticated system.


Recurrent Neural Networks (RNNs)

• RNNs naturally fit sequence data, e.g.,
  • time series data (e.g., stock prices, weather),
  • text data,
  • speech data, …

Applications of RNNs: Machine Translation

Applications of RNNs: Speech Recognition


Deep Reinforcement Learning (DRL)

• DRL has applications in robotics, video games, and finance.

Applications of DRL: Games

Applications of DRL: Robotics

Control Theory vs. DRL

Boston Dynamics' Atlas


What we have learned so far…

• ML tasks: regression, classification, …
• ML models: linear models, CNNs, RNNs, …

Computations

Example: least squares regression model
• Loss function: L(w) = (1/n) ∑_{i=1}^{n} (x_i^T w − y_i)^2.
• Optimization: w* = argmin_w L(w).
• How to solve the model?

Computational Methods

• ML tasks: regression, classification, …
• ML models: linear models, CNNs, RNNs, …

Example: least squares regression model
• Loss function: L(w) = (1/n) ∑_{i=1}^{n} (x_i^T w − y_i)^2.
• Optimization: w* = argmin_w L(w).
• Computations: solve the model using numerical algorithms, e.g., gradient descent (GD) or stochastic gradient descent (SGD).

Gradient Descent

Gradient: ∂L/∂w
• w is a d-dimensional vector.
• L(w) is a scalar.
• Thus ∂L/∂w is a d-dimensional vector.
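For the least squares loss, the gradient can be worked out by the chain rule as ∂L/∂w = (2/n) X^T (Xw − y), and its shape confirms the point above: one entry per weight. A quick check with random data (the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3                  # 5 samples, 3 features (arbitrary choices)
X = rng.normal(size=(n, d))  # rows are x_i^T
y = rng.normal(size=n)
w = rng.normal(size=d)

# Gradient of L(w) = (1/n) sum_i (x_i^T w - y_i)^2.
grad = 2 / n * X.T @ (X @ w - y)
print(grad.shape)  # (3,): a d-dimensional vector, like w
```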


Gradient Descent

Example: least squares regression model
• Loss function: L(w) = (1/n) ∑_{i=1}^{n} (x_i^T w − y_i)^2.
• Optimization: w* = argmin_w L(w).

Gradient descent algorithm
• Randomly initialize w_0.
• For t = 0 to T:
  • Gradient at w_t: g_t = ∂L/∂w |_{w = w_t};
  • Update: w_{t+1} = w_t − α g_t.
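The algorithm above, applied to the least squares loss (whose gradient is (2/n) X^T (Xw − y)); the data, step size α, and iteration count T below are illustrative choices:

```python
import numpy as np

# Synthetic data generated from a known weight vector, for illustration.
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = rng.normal(size=d)  # randomly initialize w_0
alpha, T = 0.1, 500     # step size and iteration count (illustrative)

for t in range(T):
    g = 2 / n * X.T @ (X @ w - y)  # gradient g_t at w_t
    w = w - alpha * g              # w_{t+1} = w_t - alpha * g_t

print(w)  # converges toward w_true = [1, -2, 0.5]
```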

Gradient Descent

Variants of Gradient Descent
• Stochastic gradient descent (SGD).
• SGD with momentum.
• RMSProp.
• ADAM, …

Computational Challenges

• Big data: too many training samples.
  • ImageNet: 14 million 256×256 images.
• Big model: too many model parameters.
  • ResNet-50 (a very popular CNN architecture) has 25 million parameters.
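Plain SGD directly addresses the big-data cost: each step uses the gradient of a single randomly chosen training sample instead of the full sum over n, making each iteration cheap even when n is huge. A minimal sketch (data, step size, and iteration count are illustrative):

```python
import numpy as np

# Synthetic data from a known weight vector, for illustration.
rng = np.random.default_rng(1)
n, d = 1000, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true

w = np.zeros(d)
alpha = 0.01  # illustrative constant step size
for t in range(20_000):
    i = rng.integers(n)               # pick one sample uniformly at random
    g = 2 * X[i] * (X[i] @ w - y[i])  # gradient of (x_i^T w - y_i)^2
    w = w - alpha * g

print(w)  # close to w_true = [2, -1, 0.5]
```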


Computational Challenges

• Big data + big model bring computational challenges.
  • Training ResNet-50 on ImageNet using a single GPU takes around 14 days.

⇒ Efficient algorithms and software systems are necessary.

Machine Learning in Practice

Modeling

• Choose a model that fits the data and the problem.
• Decide the network structure, activation functions, loss functions, etc.
• Improve the prediction accuracy.
• Requires experience with ML models and an understanding of the problem and data.


Machine Learning in Practice

Computation

• Design or apply efficient algorithms.
• Implement the algorithms using systems like TensorFlow, Hadoop, etc.
• Optimize your code.
• Requires experience with algorithms and systems.
