
Deep Learning from First Principles

Tan Minh Nguyen


Department of Mathematics, NUS
Logistics
• Discussion Forums
• CampusWire: https://campuswire.com/p/GE20115D5
(Access Code: 7410)
• Course Materials
• Slides, lecture notes, and textbooks
• Where: Files on CampusWire
Introduction
Can machines think?
I propose to consider the question, "Can machines think?" This should begin with definitions of the meaning of the terms "machine" and "think." The definitions might be framed so as to reflect so far as possible the normal use of the words, but this attitude is dangerous…

Alan Turing, 1950

Turing, A. M. (1950). "Computing Machinery and Intelligence". Mind 59 (236): 433–460.


The Imitation Game

I believe that in about fifty years' time it will be possible to programme computers … to make them play the imitation game so well that an average interrogator will not have more than 70 percent chance of making the right identification after five minutes of questioning.

Alan M. Turing, 1950
https://en.wikipedia.org/wiki/Turing_test#/media/File:Turing_test_diagram.png
From artificial intelligence to machine learning

Can machines think?

Can machines do what thinking beings do?

How can machines learn to do some things that thinking beings do?
In this short course, we are interested in the study of algorithmic and mathematical approaches to (deep) learning.
A concrete definition of learning
A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P if its
performance at tasks in T, as measured by P, improves with
experience E.
Tom Mitchell

[Figure: computation vs. learning. Given a program calc(*args) and the input 8 ÷ 2(2 + 2), computation produces the output 16; learning instead infers the program calc(*args) from the input 8 ÷ 2(2 + 2) and the output 16.]
Why do we need to study machine learning?
Machine learning: revolution in technology
Machine learning: revolution in science
Machine learning: revolution in engineering
Types of Learning
• Supervised learning:
Example: distinguish photos of cats from photos of dogs
• Unsupervised learning
Example: figure out that cat and dog photos show different
animals
• Reinforcement learning
Example: play Go
Types of Learning
• Supervised learning:
• Linear and nonlinear models
• Basic learning and approximation theory
• Learning/optimization algorithms
• Unsupervised learning
• Dimensionality reduction, clustering, and generative models
• Reinforcement learning
• Markov decision processes, reinforcement learning algorithms
What this course is
• A (hopefully) gentle introduction to machine learning and
deep learning.
• A holistic view of the modern interplay of deep learning
models with applied mathematics, including optimization,
differential equations, and control.

What this course isn't

• A comprehensive survey of state-of-the-art machine learning models and methods
• A "math class"
Preliminaries
Types of data (from our survey)
[Figure: survey results on students' math/Python backgrounds, prior use of ML, whether this is their introductory class, and their expectations.]


Representing data in computers
Many kinds of data are numerical in nature

Other examples
• Video captures
• Financial time series
• Numerical measurements from experiments
What about general discrete data?
We make an important distinction
• Ordinal data
Data that has a natural notion of order, e.g.
• Star ratings of a product
• Level of language proficiency
• Letter grades of a class
• Nominal data
Data that has no order, e.g.
• Categories of image classification
• Answers to True/False questions
We need to embed these discrete data into something we can represent on a computer, e.g. real/floating-point numbers.
The type of embedding depends on the nature of the data!
• Ordinal data
We want the embedding to preserve this ordering, so we typically use real numbers
⋆,⋆⋆,⋆⋆⋆ → 1, 2, 3
• Nominal data
This is somewhat the opposite: we want the embedding not to introduce spurious ordering, e.g. one-hot embedding

$\mathrm{apple}, \mathrm{orange}, \mathrm{pear} \;\to\; \begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}$
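A minimal sketch of both embeddings in Python (the category lists and helper names here are our own, purely for illustration):

```python
import numpy as np

def embed_ordinal(levels, value):
    # Ordinal data: map to real numbers that preserve the order.
    return float(levels.index(value) + 1)

def embed_one_hot(categories, value):
    # Nominal data: a one-hot vector introduces no spurious ordering.
    vec = np.zeros(len(categories))
    vec[categories.index(value)] = 1.0
    return vec

print(embed_ordinal(["*", "**", "***"], "**"))               # 2.0
print(embed_one_hot(["apple", "orange", "pear"], "orange"))  # [0. 1. 0.]
```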
Classes of machine learning problems
Supervised Learning: regression, classification, function approximation, inverse problems/design, …
Unsupervised Learning: clustering, dimensionality reduction, generative models, anomaly detection, …
Reinforcement Learning: value iteration, policy gradient, actor-critic, exploration, …

There are many intersections between them!


Evaluation and Selection using Data
In more quantitative terms, given a dataset $\mathcal{D}$, we split it into

$\mathcal{D} = \mathcal{D}_{\mathrm{train}} \cup \mathcal{D}_{\mathrm{test}}$

• $\mathcal{D}_{\mathrm{train}}$ is called the training set, and it is used to train our machine learning model
• $\mathcal{D}_{\mathrm{test}}$ is called the testing set, and it is used to evaluate the performance of our model. We should not peek at this set while training!
• An additional split into a validation set $\mathcal{D}_{\mathrm{valid}}$ is sometimes used to perform hyperparameter tuning and model selection
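A minimal sketch of such a split (the 80/10/10 proportions and the function name are illustrative assumptions, not fixed by the lecture):

```python
import numpy as np

def split_dataset(x, y, train_frac=0.8, valid_frac=0.1, seed=0):
    # Shuffle once, then carve the indices into train/valid/test pieces.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_train = int(train_frac * len(x))
    n_valid = int(valid_frac * len(x))
    tr, va, te = np.split(idx, [n_train, n_train + n_valid])
    return (x[tr], y[tr]), (x[va], y[va]), (x[te], y[te])
```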
Supervised Learning
What is supervised learning?
Supervised learning is the simplest and most prevalent type of machine learning problem.

It is about learning to make predictions

Examples
• Image recognition
• Weather prediction
• Stock price prediction
• …
Given dataset: $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$
Inputs: $x_i$. Outputs/labels: $y_i$. Data size: $N$.
Goal: learn the relationship $x_i \to y_i$

[Figure: the oracle $f^*$ maps inputs to labels, e.g. $x_1$ (a cat photo) $\to y_1 = \text{Cat} = (1, 0)^\top$ and $x_2$ (a dog photo) $\to y_2 = \text{Dog} = (0, 1)^\top$.]

The oracle can be
• Deterministic: $y_i = f^*(x_i)$
• Random: $y_i \sim p^*(\cdot \mid x_i)$, e.g. $y_i = f^*(x_i) + \epsilon_i$
Hypothesis space
The oracle $f^*$ is unknown to us, except through the dataset

$\mathcal{D} = \{(x_i, y_i = f^*(x_i))\}_{i=1}^{N}$

The supervised learning approach:
1. Define a hypothesis space $\mathcal{H}$ consisting of a set of candidate functions, e.g.
$\mathcal{H} = \{f : f(x) = w_0 + w_1 x\}$
2. Find the "best" function $\hat{f}$ in $\mathcal{H}$ that approximates $f^*$
What you get depends on ℋ!

Curve fitting methods and the message they send. https://xkcd.com/2048/


What does best approximation mean?
It is useful to define a loss function $L(y', y)$ which is small if $y \approx y'$ and large otherwise. Then, we can find the best approximation by solving an optimization problem

$\min_{f \in \mathcal{H}} R_{\mathrm{emp}}(f) = \frac{1}{N} \sum_{i=1}^{N} L\big(f(x_i), \underbrace{f^*(x_i)}_{y_i}\big)$

This is called empirical risk minimization (ERM).
So, is learning just optimization?
We want to do well on unseen data! In other words, our model
must generalize.
What we can solve: empirical risk minimization

$\min_{f \in \mathcal{H}} R_{\mathrm{emp}}(f) = \frac{1}{N} \sum_{i=1}^{N} L\big(f(x_i), f^*(x_i)\big), \qquad x_i \sim \mu$

What we really want to solve: population risk minimization

$\min_{f \in \mathcal{H}} R_{\mathrm{pop}}(f) = \mathbb{E}_{x \sim \mu}\big[L\big(f(x), f^*(x)\big)\big]$

The difference between the two is the generalization gap.
Three paradigms of supervised learning

[Figure: the three paradigms of supervised learning. Approximation: how well the best element of $\mathcal{H}$ can approximate $f^*$. Optimization: finding $\hat{f} \in \mathcal{H}$ using $\mathcal{D}$. Generalization: how $\hat{f}$ performs on unseen data.]
Linear Models
Simple linear regression

This is the simplest case, where the $x_i, y_i$ are all scalars.

Step 1: Define the hypothesis space

$\mathcal{H} = \{f : f(x) = w_0 + w_1 x, \; w_0 \in \mathbb{R}, w_1 \in \mathbb{R}\}$

Step 2: Find the best approximation

We need to define a loss function

$L(y', y) = \frac{1}{2}(y - y')^2$

Then, the empirical risk minimization problem is

$\min_{f \in \mathcal{H}} R_{\mathrm{emp}}(f) = \min_{w_0, w_1} \frac{1}{2N} \sum_{i=1}^{N} (w_0 + w_1 x_i - y_i)^2$
Empirical risk minimization problem:

$\min_{w_0, w_1} \frac{1}{2N} \sum_{i=1}^{N} (w_0 + w_1 x_i - y_i)^2$

Solution: set

$\frac{\partial R_{\mathrm{emp}}}{\partial w_0}(\hat{w}_0, \hat{w}_1) = 0 \quad \text{and} \quad \frac{\partial R_{\mathrm{emp}}}{\partial w_1}(\hat{w}_0, \hat{w}_1) = 0$

which yields the ordinary least squares formula (1D):

$\hat{w}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \quad \hat{w}_0 = \bar{y} - \hat{w}_1 \bar{x}, \quad \text{where } \bar{x} = \frac{1}{N}\sum_i x_i, \; \bar{y} = \frac{1}{N}\sum_i y_i$

$\hat{f}(x) = \hat{w}_0 + \hat{w}_1 x$
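A minimal sketch of this closed-form solution in NumPy (the synthetic data is our own, for illustration):

```python
import numpy as np

def ols_1d(x, y):
    # Closed-form ordinary least squares for f(x) = w0 + w1 * x.
    x_bar, y_bar = x.mean(), y.mean()
    w1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    w0 = y_bar - w1 * x_bar
    return w0, w1

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=20)
y = 2.0 + 3.0 * x + 0.1 * rng.standard_normal(20)  # noisy linear oracle
print(ols_1d(x, y))  # approximately (2.0, 3.0)
```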
Approximation
Is the linear hypothesis space large enough?

[Figure: the right panel shows a linear fit to clearly nonlinear data, an instance of under-fitting.]


Overfitting and generalization
Polynomial hypothesis space: $\mathcal{H} = \{f : f(x) = \sum_{j=0}^{M-1} w_j x^j\}$

If the hypothesis space is too big, over-fitting can happen, with or without noise!
The role of loss functions
So far, we have only considered the mean-square loss

$L(y', y) = \frac{1}{2}(y - y')^2$

There are many other choices, e.g. the Huber loss

$L(y', y) = \begin{cases} \frac{1}{2}(y - y')^2 & \text{if } |y - y'| \le \delta \\ \delta |y - y'| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases}$
Mean square vs Huber loss in regression
We perform a linear regression on a noisy dataset with outliers.
What do you observe?
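A minimal sketch comparing the two losses pointwise (NumPy; $\delta = 1$ is an arbitrary illustrative choice). Because the Huber loss grows only linearly in large residuals, outliers contribute far less to the empirical risk, so the fitted line is pulled around less:

```python
import numpy as np

def mse_loss(y_pred, y):
    return 0.5 * (y - y_pred) ** 2

def huber_loss(y_pred, y, delta=1.0):
    # Quadratic near zero, linear for large residuals.
    r = np.abs(y - y_pred)
    return np.where(r <= delta, 0.5 * r ** 2, delta * r - 0.5 * delta ** 2)

residuals = np.array([0.1, 1.0, 10.0])  # the last point is an outlier
print(mse_loss(0.0, residuals))    # [0.005  0.5  50. ]
print(huber_loss(0.0, residuals))  # [0.005  0.5   9.5]
```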
General linear basis models
The simple linear regression we have seen is quite limited
• Only for 1D inputs
• Can only fit linear relationships

It turns out that we can easily generalize the previous approach by considering linear basis models.
General linear basis models
Consider $x \in \mathbb{R}^d$ and the new hypothesis space

$\mathcal{H}_M = \left\{ f : f(x) = \sum_{j=0}^{M-1} w_j \phi_j(x) \right\}$

Each $\phi_j : \mathbb{R}^d \to \mathbb{R}$ is called a basis function or feature map.

Why is this a generalization?
• Take $d = 1$, $M = 2$, $\phi_0(x) = 1$, $\phi_1(x) = x$
• In general, $M$ can be large and the $\phi_j$'s can be highly nonlinear, but $f$ is linear in $w = (w_0, \ldots, w_{M-1})$
Examples of basis functions
Some choices of basis functions in 1D
• Polynomial basis: $\phi_j(x) = x^j$
• Gaussian basis: $\phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2\sigma^2} \right)$
• Sigmoid basis: $\phi_j(x) = \sigma\left( \frac{x - \mu_j}{s} \right)$ with $\sigma(b) = \frac{1}{1 + e^{-b}}$
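A minimal sketch of these three families in NumPy (the centers $\mu_j$ and width parameters are illustrative choices):

```python
import numpy as np

def polynomial_basis(x, M):
    # Columns: 1, x, x^2, ..., x^(M-1); one row per data point.
    return np.stack([x ** j for j in range(M)], axis=1)

def gaussian_basis(x, centers, sigma=0.2):
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * sigma ** 2))

def sigmoid_basis(x, centers, s=0.2):
    return 1.0 / (1.0 + np.exp(-(x[:, None] - centers[None, :]) / s))

x = np.linspace(0.0, 1.0, 5)
print(polynomial_basis(x, 3).shape)                   # (5, 3)
print(gaussian_basis(x, np.linspace(0, 1, 4)).shape)  # (5, 4)
```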
Ordinary least squares for linear basis models

The empirical risk minimization problem is now

$\min_{f \in \mathcal{H}_M} R_{\mathrm{emp}}(f) = \min_{w \in \mathbb{R}^M} R_{\mathrm{emp}}(w) = \min_{w \in \mathbb{R}^M} \frac{1}{2N} \sum_{i=1}^{N} \big( f(x_i) - y_i \big)^2 = \min_{w \in \mathbb{R}^M} \frac{1}{2N} \sum_{i=1}^{N} \left( \sum_{j=0}^{M-1} w_j \phi_j(x_i) - y_i \right)^2$
We can rewrite

$\min_{w \in \mathbb{R}^M} \frac{1}{2N} \sum_{i=1}^{N} \left( \sum_{j=0}^{M-1} w_j \phi_j(x_i) - y_i \right)^2$

in compact form

$\min_{w \in \mathbb{R}^M} \frac{1}{2N} \|\Phi w - y\|^2$

$\Phi = \begin{pmatrix} \phi_0(x_1) & \cdots & \phi_{M-1}(x_1) \\ \phi_0(x_2) & \cdots & \phi_{M-1}(x_2) \\ \vdots & \ddots & \vdots \\ \phi_0(x_N) & \cdots & \phi_{M-1}(x_N) \end{pmatrix}, \quad w = \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_{M-1} \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}$
We want to solve

$\min_{w \in \mathbb{R}^M} \frac{1}{2N} \|\Phi w - y\|^2$

We can do this by setting $\nabla R_{\mathrm{emp}}(\hat{w}) = 0$.

Suppose $\Phi^\top \Phi$ is invertible; then we have

$\Phi^\top (\Phi \hat{w} - y) = 0$

Rearranging, we obtain the general ordinary least squares formula

$\hat{w} = (\Phi^\top \Phi)^{-1} \Phi^\top y$
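A minimal sketch of this formula with a cubic polynomial basis (synthetic data for illustration; in practice np.linalg.lstsq is usually preferred over forming $\Phi^\top \Phi$ explicitly, for numerical stability):

```python
import numpy as np

def fit_ols(Phi, y):
    # Solve the normal equations Phi^T Phi w = Phi^T y.
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(50)  # noisy oracle

Phi = np.stack([x ** j for j in range(4)], axis=1)  # basis 1, x, x^2, x^3
w_hat = fit_ols(Phi, y)
print(w_hat)  # coefficients of the fitted cubic
```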

What happens if $\Phi^\top \Phi$ is not invertible, i.e. it is singular?

In the singular case, we have an infinite number of solutions, all of which have $R_{\mathrm{emp}}(\hat{w}) = 0$. They are given by

$\hat{w}(u) = \Phi^+ y + (I - \Phi^+ \Phi) u, \quad u \in \mathbb{R}^M$

Here, $\Phi^+$ denotes the Moore-Penrose pseudoinverse of $\Phi$.
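A minimal sketch of this solution family using NumPy's pseudoinverse (an underdetermined case with $M > N$ is assumed, so the residual can actually reach zero):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((3, 5))  # N = 3 samples, M = 5 basis functions
y = rng.standard_normal(3)

Phi_pinv = np.linalg.pinv(Phi)
w_min = Phi_pinv @ y  # the minimum-norm solution (u = 0)
u = rng.standard_normal(5)
w_other = w_min + (np.eye(5) - Phi_pinv @ Phi) @ u  # another exact solution

print(np.linalg.norm(Phi @ w_min - y))    # ~0
print(np.linalg.norm(Phi @ w_other - y))  # ~0 as well
```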

How do we pick a solution?


Regularization
Often, it is advantageous to consider the regularized least squares problem

$\min_{w \in \mathbb{R}^M} \frac{1}{2N} \|\Phi w - y\|^2 + \lambda \underbrace{C(w)}_{\text{regularizer}}$

Types of regularization
• $\ell^2$ regularization: $C(w) = \|w\|_2^2$ (ridge regression)
• $\ell^1$ regularization: $C(w) = \|w\|_1 = \sum_j |w_j|$ (least absolute shrinkage and selection operator, or lasso)
• …
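For the $\ell^2$ case, the minimizer is again available in closed form. A minimal sketch (with the $\frac{1}{2N}$ normalization above, setting the gradient to zero gives $(\Phi^\top \Phi + 2N\lambda I)\hat{w} = \Phi^\top y$; the $2N$ factor is specific to this normalization):

```python
import numpy as np

def fit_ridge(Phi, y, lam):
    # Minimize (1/2N) * ||Phi w - y||^2 + lam * ||w||_2^2 in closed form.
    N, M = Phi.shape
    return np.linalg.solve(Phi.T @ Phi + 2 * N * lam * np.eye(M), Phi.T @ y)
```

Note that adding $2N\lambda I$ makes the matrix invertible even when $\Phi^\top \Phi$ is singular, which also answers the question above of how to pick a solution in the singular case.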
Regularization and generalization
We apply $\ell^2$ regularization on the over-fitting examples.

Recall:
$\mathcal{H}_M = \{f : f(x) = \sum_{j=0}^{M-1} w_j x^j\}$
so $M = 100$, but $N = 10$

[Figure: the fitted polynomial without regularization vs. with regularization.]


Classification using linear basis models

In 𝐾-class classification problems, each 𝑦+ takes on the class label


of one of 𝐾 classes.

We will use the one-hot encoding introduced earlier to represent


each 𝑦+ that belongs to class 𝑘 as
𝑦+ = 0, … , 0, 1, 0, … , 0 ∈ ℝV
kth Position
We require a slight change of hypothesis space

$\mathcal{H}_M = \left\{ f : f(x) = g\left( \sum_{j=0}^{M-1} w_j \phi_j(x) \right), \; w_j \in \mathbb{R}^K \right\}$

The function $g : \mathbb{R}^K \to \mathbb{R}^K$ is called an activation function, and the most commonly used one is the soft-max function

$g(z)_k = \frac{\exp(z_k)}{\sum_j \exp(z_j)}$

Notice that $g$ always outputs a vector which can be interpreted as probabilities over the $K$ classes.
Everything else remains the same, and we can define the empirical risk minimization problem for classification as

$\min_{W \in \mathbb{R}^{M \times K}} R_{\mathrm{emp}}(W) = \min_{W \in \mathbb{R}^{M \times K}} \frac{1}{N} \sum_{i=1}^{N} L\big( g\big( (\Phi W)_i \big), y_i \big)$

What loss function should we use? We can always use the mean-square loss, but there is a better choice: the cross-entropy loss

$L(y', y) = -\sum_{k=1}^{K} y_k \log y'_k$
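A minimal sketch of the soft-max and cross-entropy computations in NumPy (subtracting the maximum before exponentiating is a standard numerical-stability trick, not something the formulas above require):

```python
import numpy as np

def softmax(z):
    # g(z)_k = exp(z_k) / sum_j exp(z_j), applied row-wise.
    z = z - z.max(axis=-1, keepdims=True)  # stability; result unchanged
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(y_prob, y_onehot, eps=1e-12):
    # L(y', y) = -sum_k y_k log y'_k, averaged over the batch.
    return -np.mean(np.sum(y_onehot * np.log(y_prob + eps), axis=-1))

logits = np.array([[2.0, 0.5, -1.0]])
y = np.array([[1.0, 0.0, 0.0]])  # one-hot label for class 1
p = softmax(logits)
print(p, cross_entropy(p, y))
```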
Summary
1. Machine learning vs AI
2. Types of Learning Problems
• Supervised learning
• Unsupervised learning
• Reinforcement learning
3. Linear models as a baseline for supervised learning
Useful Tools
Version control with Git
• https://www.freecodecamp.org/news/what-is-git-and-how-to-use-it-c341b049ae61/
Interactive python with Jupyter notebooks
• https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook
Data visualization using Seaborn and Pandas
• https://jakevdp.github.io/PythonDataScienceHandbook/04.14-visualization-with-seaborn.html
Further Reading
Matrix Cookbook
• https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
More on linear models (Pattern Recognition and Machine
Learning, Bishop)
• http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf
