Lecture 1
2023-2024
CHAPTER 1
LOGISTIC REGRESSION
Introduction to Deep Learning
The Machine Learning Approach
• Instead of writing a program by hand for each specific task, we collect lots
of examples that specify the correct output for a given input.
• A machine learning algorithm then takes these examples and produces a
program that does the job.
– The program produced by the learning algorithm may look very
different from a typical hand-written program. It may contain millions
of numbers.
– If we do it right, the program works for new cases as well as the ones
we trained it on.
– If the data changes, the program can change too by training on the new
data.
• Massive amounts of computation are now cheaper than paying someone
to write a task-specific program.
Example: it is very hard to say what makes a handwritten digit a "2".
Some examples of tasks best solved by learning
• Recognizing patterns:
– Objects in real scenes
– Facial identities or facial expressions
– Spoken words
• Recognizing anomalies:
– Unusual sequences of credit card transactions
– Unusual patterns of sensor readings in a nuclear power plant
• Prediction:
– Future stock prices or currency exchange rates
– Which movies will a person like?
Types of learning task
• Supervised learning
– Learn to predict an output when given an input vector.
• Reinforcement learning
– Learn to select an action to maximize payoff.
• Unsupervised learning
– Discover a good internal representation of the input.
What will you learn in this course?
What is a neural network?
It is a powerful learning algorithm inspired by how the brain works.
Example 1 – single neural network
• Given data about the sizes of houses on the real estate market, you want
to fit a function that predicts their price. It is a linear regression
problem, because the price as a function of size is a continuous output.
• We know prices can never be negative, so we create a function that starts
at zero: the Rectified Linear Unit (ReLU), as sketched in the code below.
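A ReLU is one line of NumPy; here is a minimal sketch (the function name and toy inputs are illustrative, not from the slides):

import numpy as np

def relu(z):
    # ReLU: returns z where z > 0, otherwise 0, so the output never goes negative
    return np.maximum(0, z)

sizes = np.array([-1.0, 0.0, 2.5])   # toy inputs
print(relu(sizes))                   # [0.  0.  2.5]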
Example 2 – Multiple neural network
The price of a house can be affected by other features, such as size, number of
bedrooms, zip code, and wealth. The role of the neural network is to predict
the price, and it automatically generates the hidden units. We only need to
give the inputs x and the output y.
Supervised learning for neural networks
In supervised learning, we are given a dataset and already know what our
correct output should look like, having the idea that there is a relationship
between the input and the output.
Supervised learning problems are categorized into "regression" and
"classification" problems.
In a regression problem, we are trying to predict results within a
continuous output, meaning that we are trying to map input variables to
some continuous function.
In a classification problem, we are instead trying to predict results in a
discrete output. In other words, we are trying to map input variables into
discrete categories.
Examples of supervised learning
Examples include predicting house prices from home features, recognizing
objects in images, and transcribing spoken words.
Why is deep learning taking off?
Deep learning is taking off due to the large amount of data available through
the digitization of society, faster computation, and innovation in the
development of neural network algorithms.
Binary Classification
In a binary classification problem, the result is a discrete value output.
For example:
• an account hacked (1) or not hacked (0)
• a tumor malignant (1) or benign (0)
Binary Classification
An image is stored in the computer as three separate matrices corresponding
to the Red, Green, and Blue color channels of the image.
The three matrices have the same size as the image; for example, the
resolution of the cat image is 64 pixels × 64 pixels, so the three RGB
matrices are 64 × 64 each.
The value in each cell represents the pixel intensity, which will be used to
create a feature vector of dimension n.
In pattern recognition and machine learning, a feature vector represents an
object; in this case, a cat or no cat.
To create the feature vector $x$, the pixel intensity values are "unrolled" or
"reshaped" from each color channel (Red, Green, Blue) into a single column.
The dimension of the input feature vector $x$ is:
$$n_x = 64 \times 64 \times 3 = 12288$$
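As a sketch of the unrolling step, assuming the image is already loaded as a 64 × 64 × 3 NumPy array (variable names are illustrative):

import numpy as np

image = np.random.randint(0, 256, size=(64, 64, 3))  # stand-in for a real RGB image
x = image.reshape(64 * 64 * 3, 1)                    # unroll the three channels into one column
print(x.shape)                                       # (12288, 1)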
Binary Classification
The classifier maps the feature vector to a label:
$$x \longrightarrow y, \qquad n = n_x = 64 \times 64 \times 3 = 12288$$
Notation
A single training example is a pair $(x, y)$ with $x \in \mathbb{R}^{n_x}$ and $y \in \{0, 1\}$. The $m$ training examples are stacked as columns of the input matrix:
$$X = \begin{bmatrix} | & | & \cdots & | \\ x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\ | & | & \cdots & | \end{bmatrix} \in \mathbb{R}^{n_x \times m}, \qquad X.\mathrm{shape} = (n_x, m)$$
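A small sketch of this stacking, with illustrative sizes:

import numpy as np

nx, m = 12288, 100                                    # features per example, number of examples
examples = [np.random.rand(nx, 1) for _ in range(m)]  # stand-ins for the unrolled images
X = np.hstack(examples)                               # columns are x(1), ..., x(m)
print(X.shape)                                        # (12288, 100)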
Logistic Regression
Logistic regression is a learning algorithm used in a supervised learning
problem when the output labels $y$ are all either zero or one. The goal of
logistic regression is to minimize the error between its predictions and the
training data.
Example: cat vs. no cat
Given an image represented by a feature vector $x$, the algorithm will
evaluate the probability of a cat being in that image:
$$\text{Given } x,\; \hat{y} = P(y = 1 \mid x), \quad \text{where } 0 \le \hat{y} \le 1$$
The parameters used in logistic regression are:
The input feature vector: $x \in \mathbb{R}^{n_x}$, where $n_x$ is the number of features
The training label: $y \in \{0, 1\}$
The weights: $w \in \mathbb{R}^{n_x}$, where $n_x$ is the number of features
The threshold: $b \in \mathbb{R}$
The output: $\hat{y} = \sigma(w^T x + b)$
The sigmoid function: $s = \sigma(w^T x + b) = \sigma(z) = \dfrac{1}{1 + e^{-z}}$, where $z = w^T x + b$
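A minimal sketch of this forward computation (all names and values are illustrative):

import numpy as np

def sigmoid(z):
    # logistic function: squashes any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

nx = 4
w = np.zeros((nx, 1))                # weights
b = 0.0                              # threshold
x = np.random.rand(nx, 1)            # one input feature vector
y_hat = sigmoid(np.dot(w.T, x) + b)  # estimated P(y = 1 | x)
print(y_hat.item())                  # 0.5 when w = 0 and b = 0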
Logistic Regression
An equivalent convention absorbs $b$ into the weights: set $x_0 = 1$, so that $x \in \mathbb{R}^{n_x + 1}$, and
$$\hat{y} = \sigma(\theta^T x), \qquad \Theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_{n_x} \end{bmatrix}$$
where $\theta_0$ plays the role of $b$.
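The two conventions compute the same $z$; a quick sketch with illustrative values:

import numpy as np

w = np.array([0.5, -1.0])
b = 0.3
x = np.array([2.0, 1.0])

theta = np.concatenate(([b], w))     # theta_0 = b, the rest is w
x_aug = np.concatenate(([1.0], x))   # x_0 = 1

print(np.dot(w, x) + b)              # 0.3
print(np.dot(theta, x_aug))          # 0.3, the same value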
Logistic regression cost function
Recap:
$$\hat{y}^{(i)} = \sigma(w^T x^{(i)} + b), \quad \text{where } \sigma(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}} \text{ and } z^{(i)} = w^T x^{(i)} + b$$
Logistic regression cost function
The loss function computes the error for a single training example. The squared error
$$\mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2 \quad \text{(mean squared error)}$$
is a natural candidate, but with the sigmoid it leads to a non-convex optimization problem, so logistic regression uses the cross-entropy loss instead:
$$\mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\left(y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log\left(1 - \hat{y}^{(i)}\right)\right)$$
Logistic regression cost function
The cost function is the average of the loss function over the entire training
set. The goal is to find the parameters $w$ and $b$ that minimize the overall
cost function:
$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log\left(1 - \hat{y}^{(i)}\right) \right]$$
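A vectorized sketch of computing $J$ (names illustrative; the predictions are assumed strictly between 0 and 1 so the logs are defined):

import numpy as np

def cost(y_hat, y):
    # average cross-entropy loss over the m training examples
    m = y.shape[0]
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m

y     = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.7])
print(cost(y_hat, y))                # ~0.23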
Gradient Descent
Recap:
$$\hat{y}^{(i)} = \sigma(w^T x^{(i)} + b), \quad \text{where } \sigma(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}}$$
$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log\left(1 - \hat{y}^{(i)}\right) \right]$$
[Figure: the cost surface $J(w, b)$ plotted over the $(w, b)$ plane.]
Gradient Descent
Repeat {
$$w \leftarrow w - \alpha \frac{dJ}{dw}$$
}
$\alpha$: learning rate; $J(w, b)$: cost function.
With two parameters, the derivatives are written (and abbreviated in code) as
$$\frac{\partial J(w, b)}{\partial w} = \frac{dJ(w, b)}{dw} = dw, \qquad \frac{\partial J(w, b)}{\partial b} = \frac{dJ(w, b)}{db} = db$$
so the updates are $w \leftarrow w - \alpha\, dw$ and $b \leftarrow b - \alpha\, db$.
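A minimal sketch of the update rule on a toy one-parameter cost $J(w) = (w - 3)^2$, whose derivative is $dJ/dw = 2(w - 3)$ (the cost and values are illustrative):

alpha = 0.1               # learning rate
w = 0.0                   # initial guess
for _ in range(100):
    dw = 2 * (w - 3)      # dJ/dw for J(w) = (w - 3)**2
    w = w - alpha * dw    # gradient descent step
print(w)                  # approaches the minimizer w = 3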
Computation Graph
$$J(a, b, c) = 3(a + bc) = 3(5 + 3 \times 2) = 33$$
The computation is split into steps, each a node of the graph:
$$u = bc, \qquad v = a + u, \qquad J = 3v$$
Forward pass (left to right), with $a = 5$, $b = 3$, $c = 2$:
$$u = 6 \;\longrightarrow\; v = 11 \;\longrightarrow\; J = 33$$
Computation Graph
Running the graph from right to left computes the derivatives of $J$ by the chain rule:
$$\frac{dJ}{du} = \frac{dJ}{dv}\frac{dv}{du} = 3, \qquad \frac{dJ}{db} = \frac{dJ}{du}\frac{du}{db} = 6, \qquad \frac{dJ}{dc} = \frac{dJ}{du}\frac{du}{dc} = 9$$
Logistic Regression Gradient Descent
$$z = w^T x + b, \qquad \hat{y} = a = \sigma(z), \qquad \mathcal{L}(a, y) = -\left(y \log a + (1 - y) \log(1 - a)\right)$$
Computation graph for two input features:
$$x_1, w_1, x_2, w_2, b \;\to\; z = w_1 x_1 + w_2 x_2 + b \;\to\; a = \sigma(z) \;\to\; \mathcal{L}(a, y)$$
Logistic Regression Gradient Descent
Propagating backwards through the computation graph:
$$da = \frac{d\mathcal{L}(a, y)}{da} = -\frac{y}{a} + \frac{1 - y}{1 - a}, \qquad \frac{da}{dz} = a(1 - a)$$
$$dz = \frac{d\mathcal{L}}{dz} = \frac{d\mathcal{L}}{da}\frac{da}{dz} = a - y$$
$$dw_1 = \frac{d\mathcal{L}}{dw_1} = x_1\, dz, \qquad dw_2 = \frac{d\mathcal{L}}{dw_2} = x_2\, dz, \qquad db = \frac{d\mathcal{L}}{db} = dz$$
One gradient descent step then updates the parameters:
$$w_1 \leftarrow w_1 - \alpha\, dw_1, \qquad w_2 \leftarrow w_2 - \alpha\, dw_2, \qquad b \leftarrow b - \alpha\, db$$
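These formulas translate directly into code; a sketch for one training example with two features (all values illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x1, x2, y = 1.0, 2.0, 1.0     # one training example
w1, w2, b = 0.1, -0.2, 0.0    # current parameters
alpha = 0.1                   # learning rate

z  = w1 * x1 + w2 * x2 + b    # forward pass
a  = sigmoid(z)
dz = a - y                    # dL/dz
dw1, dw2, db = x1 * dz, x2 * dz, dz

w1 -= alpha * dw1             # one gradient descent step
w2 -= alpha * dw2
b  -= alpha * db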
Logistic regression on m examples
$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)}), \qquad a^{(i)} = \hat{y}^{(i)} = \sigma(z^{(i)}) = \sigma(w^T x^{(i)} + b)$$
The derivatives of the cost are the averages of the per-example derivatives:
$$\frac{\partial J(w, b)}{\partial w_1} = \frac{1}{m}\sum_{i=1}^{m} \frac{\partial \mathcal{L}(a^{(i)}, y^{(i)})}{\partial w_1} = \frac{1}{m}\sum_{i=1}^{m} dw_1^{(i)}$$
$$\frac{\partial J(w, b)}{\partial b} = \frac{1}{m}\sum_{i=1}^{m} \frac{\partial \mathcal{L}(a^{(i)}, y^{(i)})}{\partial b} = \frac{1}{m}\sum_{i=1}^{m} db^{(i)}$$
Logistic regression on m examples (iterative)
A runnable version of the slide's loop for two features; it assumes x1, x2, y are arrays of m training values and that w1, w2, b, alpha are already initialized:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

J = 0.0; dw1 = 0.0; dw2 = 0.0; db = 0.0
for i in range(m):
    z = w1 * x1[i] + w2 * x2[i] + b                        # forward pass for example i
    a = sigmoid(z)
    J += -(y[i] * np.log(a) + (1 - y[i]) * np.log(1 - a))  # accumulate the loss
    dz = a - y[i]                                          # dL/dz for example i
    dw1 += x1[i] * dz
    dw2 += x2[i] * dz
    db += dz
J /= m; dw1 /= m; dw2 /= m; db /= m                        # average over the m examples
w1 -= alpha * dw1                                          # one gradient descent step
w2 -= alpha * dw2
b -= alpha * db
What is vectorization?
Computing $z = w^T x + b$, where $w \in \mathbb{R}^{n_x}$ and $x \in \mathbb{R}^{n_x}$.
Non-vectorized:
z = 0
for i in range(nx):
    z += w[i] * x[i]
z += b
Vectorized:
z = np.dot(w, x) + b
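The speed difference is easy to check; a small timing sketch (sizes illustrative):

import time
import numpy as np

nx = 1_000_000
w = np.random.rand(nx)
x = np.random.rand(nx)

t0 = time.time()
z = 0.0
for i in range(nx):           # explicit loop
    z += w[i] * x[i]
t_loop = time.time() - t0

t0 = time.time()
z_vec = np.dot(w, x)          # vectorized
t_vec = time.time() - t0

print(t_loop, t_vec)          # np.dot is typically orders of magnitude faster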
Neural network programming guideline
Whenever possible, avoid explicit for loops.
Example: the matrix-vector product $u = Av$, where $u_i = \sum_j A_{ij} v_j$.
Non-vectorized:
u = np.zeros((n, 1))
for i in range(n):
    for j in range(n):
        u[i] += A[i][j] * v[j]
Vectorized:
u = np.dot(A, v)
Vectors and matrix valued functions
Say you need to apply the exponential operation on every element of a
matrix/vector:
$$v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \quad\longrightarrow\quad u = \begin{bmatrix} e^{v_1} \\ e^{v_2} \\ \vdots \\ e^{v_n} \end{bmatrix}$$
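In NumPy this is a single call; np.log, np.abs, and np.maximum work elementwise in the same way:

import numpy as np

v = np.array([1.0, 2.0, 3.0])
u = np.exp(v)                 # elementwise exponential, no explicit loop
print(u)                      # [ 2.718...  7.389... 20.085...]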
Logistic regression derivatives
In vectorized form, stacking the activations $A = [a^{(1)} \; a^{(2)} \cdots a^{(m)}]$ and labels $Y = [y^{(1)} \; y^{(2)} \cdots y^{(m)}]$, the per-example derivatives $dz^{(i)} = a^{(i)} - y^{(i)}$ become
$$dZ = A - Y, \qquad dw = \frac{1}{m} X\, dZ^T, \qquad db = \frac{1}{m}\sum_{i=1}^{m} dz^{(i)}$$
Vectorizing Logistic Regression
$$X = \begin{bmatrix} | & | & \cdots & | \\ x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\ | & | & \cdots & | \end{bmatrix}$$
With all $m$ examples as columns of $X$, the forward pass over the whole training set needs no loop:
$$Z = w^T X + b = [z^{(1)} \; z^{(2)} \cdots z^{(m)}], \qquad A = \sigma(Z)$$
Implementing Logistic Regression
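A minimal vectorized training sketch, consistent with the formulas above (the function names, hyperparameters, and toy data are illustrative):

import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def train(X, Y, alpha=0.1, iterations=1000):
    # X: (nx, m) input matrix, Y: (1, m) labels in {0, 1}
    nx, m = X.shape
    w = np.zeros((nx, 1))
    b = 0.0
    for _ in range(iterations):
        Z = np.dot(w.T, X) + b       # (1, m) forward pass
        A = sigmoid(Z)
        dZ = A - Y                   # (1, m)
        dw = np.dot(X, dZ.T) / m     # (nx, 1)
        db = np.sum(dZ) / m
        w -= alpha * dw              # gradient descent step
        b -= alpha * db
    return w, b

# toy usage: 2 features, 4 examples
X = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0, 0.0]])
Y = np.array([[0, 0, 1, 1]])
w, b = train(X, Y)
predictions = sigmoid(np.dot(w.T, X) + b) > 0.5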
References
Andrew Ng, Deep Learning Specialization, DeepLearning.AI.
Geoffrey Hinton, Neural Networks for Machine Learning.