Lecture: AI
Mourad Gridach
Department of Computer Science
High Institute of Technology - Agadir
Last Lecture
Today’s Lecture
• We will cover classification
Notes on Notation
• In linear regression we used w to refer to the weights, or parameters, of the model
• In this lecture we will denote the parameters by θ
McCulloch-Pitts Model of a Neuron
Sigmoid Function
• We want the output to be a probability: 0 ≤ h_θ(x) ≤ 1
• Logistic regression: h_θ(x) = g(θᵀx)
• Where g is the sigmoid function: g(z) = 1 / (1 + e^(−z))  (a short code sketch follows the example data below)
Example data:
  x1      x2      y
  34.1    10.12   0
  30.11   43.21   1
  35.1    72.12   1
  60.2    86.78   1
  79.23   75.23   0
  45.08   96.67   1
  75.89   46.23   0
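A minimal sketch (NumPy assumed; the θ values are invented for illustration, not learned) of how the sigmoid turns the linear score θᵀx into a probability for rows like those in the table above:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)): squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def h_theta(theta, x):
    # Logistic regression hypothesis: h_theta(x) = g(theta^T x)
    return sigmoid(np.dot(theta, x))

# First three rows of the table (x1, x2), with a bias term x0 = 1 prepended.
X = np.array([[1.0, 34.1, 10.12],
              [1.0, 30.11, 43.21],
              [1.0, 35.1, 72.12]])

theta = np.array([-4.0, 0.02, 0.06])  # illustrative parameter values

for x in X:
    print(h_theta(theta, x))          # each output is interpreted as P(y = 1 | x; θ)
```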
Applications
• Sentiment classification
• Medical diagnosis
Probabilistic Interpretation
• Logistic regression: h_θ(x) = P(y = 1 | x; θ)
• Consequently: P(y = 0 | x; θ) = 1 − h_θ(x)
Logistic Regression Hypothesis
Formula: h_θ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx))
Linear Separating Hyper-planes
Bernoulli Distribution: a model of coins
• P(x; θ) = θ if x = 1, and P(x; θ) = 1 − θ if x = 0
• Compactly: P(x; θ) = θ^x (1 − θ)^(1−x)
Entropy
• In information theory, entropy H is a measure of the uncertainty associated with a random variable X. It is defined as:
  H(X) = −Σ_x p(x) log p(x)
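A short sketch of this definition (assuming natural logarithms and a distribution given as a probability vector); note that a fair coin has maximal entropy and a deterministic outcome has zero entropy:

```python
import numpy as np

def entropy(p):
    # H(X) = -sum_x p(x) * log p(x); terms with p(x) = 0 contribute 0
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]
    return -np.sum(nonzero * np.log(nonzero))

print(entropy([0.5, 0.5]))  # fair coin: log 2 ≈ 0.693 (maximum uncertainty)
print(entropy([0.9, 0.1]))  # biased coin: ≈ 0.325 (less uncertainty)
print(entropy([1.0, 0.0]))  # deterministic outcome: 0.0
```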
Logistic Regression
• The logistic regression model specifies the probability of a binary output yi ∈ {0, 1} given the input xi as follows:
  P(yi | xi; θ) = Ber(yi | h_θ(xi)) = h_θ(xi)^(yi) (1 − h_θ(xi))^(1 − yi)
Logistic Regression – Cost Function
J(θ) = −(1/m) Σ_{i=1}^{m} [ yi log h_θ(xi) + (1 − yi) log(1 − h_θ(xi)) ]
(Binary) Cross Entropy Loss - Intuition
Summary
• Correct answer → low loss; wrong answer → high loss
• Cost function: the binary cross-entropy J(θ) above
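A hedged sketch of the binary cross-entropy cost in code (NumPy assumed; the clipping is only there to avoid log(0)), together with two single-example evaluations that illustrate the intuition above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_cost(theta, X, y, eps=1e-12):
    # J(theta) = -(1/m) * sum_i [ y_i*log(h_i) + (1 - y_i)*log(1 - h_i) ]
    h = sigmoid(X @ theta)
    h = np.clip(h, eps, 1 - eps)  # numerical safety for log()
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Intuition: confident and correct -> small loss, confident but wrong -> large loss.
print(-np.log(0.99))  # predicted P(y=1) = 0.99 when y = 1: loss ≈ 0.01
print(-np.log(0.01))  # predicted P(y=1) = 0.01 when y = 1: loss ≈ 4.6
```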
Hessian of binary logistic regression - Exercise
• Cost function: NLL(θ) = −Σ_{i=1}^{m} [ yi log h_θ(xi) + (1 − yi) log(1 − h_θ(xi)) ]
• OR, with labels ỹi ∈ {−1, +1}: NLL(θ) = Σ_{i=1}^{m} log(1 + e^(−ỹi θᵀxi))
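For the exercise, one standard way to organize the computation (a sketch in the usual matrix notation, not a full derivation): the gradient of the negative log-likelihood is Xᵀ(h − y) and the Hessian is XᵀSX with S = diag(hi (1 − hi)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_and_hessian(theta, X, y):
    # For the (unaveraged) negative log-likelihood of binary logistic regression:
    #   gradient = X^T (h - y)
    #   Hessian  = X^T S X, with S = diag(h_i * (1 - h_i))
    h = sigmoid(X @ theta)
    grad = X.T @ (h - y)
    S = np.diag(h * (1.0 - h))
    hessian = X.T @ S @ X
    return grad, hessian
```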
Regularization
Training vs. Testing
• Students vs Exams
[Figure: three panels plotting Output (y) vs Input (x)]
Linear Regression Revisited
[Figure: three hypotheses fit to the same data, each plotted as Output (y) vs Input (x)]
Overfitting: if we have too many features, the learned hypothesis may fit the training set very well but fail to generalize to new examples (e.g., predicting prices for new examples).
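An illustrative sketch (NumPy assumed; the toy data are invented here) contrasting a low-degree and a high-degree polynomial fit: the high-degree curve matches the training points almost exactly but generalizes poorly to a new input:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 8)
y = 2 * x + rng.normal(scale=0.1, size=x.size)  # roughly linear data with noise

line = np.polyfit(x, y, deg=1)    # simple hypothesis
wiggle = np.polyfit(x, y, deg=7)  # one coefficient per point: fits the training data "too well"

x_new = 1.2                       # a new example outside the training range
print(np.polyval(line, x_new))    # close to the true trend (about 2.4)
print(np.polyval(wiggle, x_new))  # typically far off: fails to generalize
```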
Logistic Regression and Overfitting
Underfitting
Overfitting
Best Solution
How to Solve Overfitting
1. Reduce the number of features.
   – Manually select which features to keep.
   – Model selection algorithm (out of the scope of this course).
2. Regularization
   – Keep all the features, but reduce the magnitude/values of the parameters θj.
   – Works well when we have a lot of features, each of which contributes a bit to predicting y.
So let us apply the second solution:
Regularization
Intuition
[Figure: two fitted curves compared, plotted as Output (y) vs Input (x)]
Regularization
• Small values for the parameters θj give:
   – A "simpler" hypothesis
   – A smoother function
   – Less prone to overfitting
• In the last example, we penalized θ3 and θ4
• Let us take the earlier example of car prices:
   ✓ Features: x1, x2, …, xn
   ✓ Parameters: θ0, θ1, …, θn
• Question: how do we choose which parameters to penalize?
Regularization – General Mathematical Formula
• How do we choose which parameters to penalize? In general we cannot know in advance, so we penalize all of them (except θ0) with a single regularization term:
  J(θ) = (1/2m) [ Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² + λ Σ_{j=1}^{n} θj² ]
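A sketch of this regularized cost for linear regression (NumPy assumed; the convention of not penalizing the bias θ0 and the 1/(2m) scaling follow the standard formulation):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    # J(theta) = 1/(2m) * [ sum_i (h(x_i) - y_i)^2 + lambda * sum_{j>=1} theta_j^2 ]
    m = X.shape[0]
    residuals = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)  # theta_0 is conventionally not penalized
    return (residuals @ residuals + penalty) / (2 * m)
```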
Remark
• What if λ (the regularization parameter) is very large (e.g., λ = 10^20)?
• Let us take this example with 5 parameters θ0, θ1, θ2, θ3, θ4:
   – The penalized parameters are driven towards zero: θ1 ≈ θ2 ≈ θ3 ≈ θ4 ≈ 0
   – What will happen? The hypothesis reduces to h_θ(x) ≈ θ0, a flat line.
   – Answer: underfitting
[Figure: the nearly constant hypothesis h_θ(x) ≈ θ0 plotted as Output (y) vs Input (x)]
Regularization for Linear/Logistic Regression
• The new linear regression cost function, after adding the regularization term, will be:
  J(θ) = (1/2m) [ Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² + λ Σ_{j=1}^{n} θj² ]
• Recall: between linear and logistic regression, only the hypothesis h_θ(x) and the cost function J(θ) differ; the gradient descent update has the same form.
Gradient Descent in Action
• Add the derivative of the regularization term to gradient descent for linear regression:
Repeat {
  θ0 := θ0 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x0^(i)
  θj := θj − α [ (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) xj^(i) + (λ/m) θj ]    for j = 1, …, n
}
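A sketch of this regularized gradient descent loop (linear regression version; the learning rate α, λ, and the iteration count are placeholders to be tuned, not values from the slides):

```python
import numpy as np

def gradient_descent_regularized(X, y, alpha=0.01, lam=1.0, iters=1000):
    # Repeated update:
    #   theta_0 := theta_0 - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_i0
    #   theta_j := theta_j - alpha * [ (1/m) * sum_i (h(x_i) - y_i) * x_ij + (lam/m) * theta_j ]
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m     # unregularized gradient
        grad[1:] += (lam / m) * theta[1:]    # regularization term (theta_0 not penalized)
        theta -= alpha * grad
    return theta
```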
Summary
• Logistic regression
• Regularization as a remedy for overfitting
Questions