3-LG Eval

Uploaded by

vinay

EECS 836: Machine Learning

Zijun Yao
Assistant Professor, EECS Department
The University of Kansas

Anonymous Feedback (active till Feb 3)

Agenda

• Logistic Regression model
  • Model definition
  • Loss function
  • Optimizing parameters
• Model Evaluation
  • Metrics
  • Methods

House price prediction - regression

Feature x: size of house, # of bedrooms, …
Model: f
Target y: price of house, a value (scalar)

Linear regression recap
• The problem of predicting continuous values is called a regression problem
• Given
  • Data
  • Corresponding labels
• Find a continuous function that fits the data points

Model definition
$\hat{y} = w_0 x_0 + w_1 x_1 + \dots + w_d x_d = \mathbf{w}^T \mathbf{x}$   ($w_0$ is bias $b$, $x_0$ is 1 for all data)

Loss function
$L(\mathbf{w}, b) = \sum_{i=1}^{n} \left( y^{(i)} - \hat{y}^{(i)} \right)^2$

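The model and loss above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a hypothetical toy data set in which each row already contains the constant feature $x_0 = 1$, so `w[0]` plays the role of the bias $b$; the function names are illustrative, not from the course.

```python
def predict(w, x):
    """Linear model: y_hat = w_0*x_0 + w_1*x_1 + ... + w_d*x_d (x[0] == 1 holds the bias)."""
    return sum(wj * xj for wj, xj in zip(w, x))


def sse_loss(w, X, y):
    """Loss L(w, b) = sum_i (y_i - y_hat_i)^2 over all training examples."""
    return sum((yi - predict(w, xi)) ** 2 for xi, yi in zip(X, y))


# Hypothetical toy data: each row is [x0 = 1, size of house]
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [3.0, 5.0, 8.0]
w = [1.0, 2.0]  # bias b = 1, slope 2

print(sse_loss(w, X, y))  # predictions 3, 5, 7 -> squared errors 0, 0, 1 -> 1.0
```

Training means searching for the `w` that makes this number as small as possible.
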
House price prediction - classification

Feature x: size of house, # of bedrooms, …
Model: f
Target y: trend of price, a class (goes up or goes down)

Example targets (trend of price): Down, Up, Down, Down, Up, Up, Down, Up

Linear classifiers - 3 steps
• Model definition:
  Output = class 1 if $z(\mathbf{x}) > 0$; else Output = class 0
• Loss function: how good is a classifier?
  $L(f) = \sum_i \delta\left( f(\mathbf{x}^{(i)}) \neq y^{(i)} \right)$: the number of times $f(\mathbf{x})$ gets an incorrect result on the training data
• Find the best classifier parameters: optimization algorithm

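The 0-1 loss above simply counts training mistakes. A minimal sketch, assuming a linear score $z(\mathbf{x}) = \mathbf{w}^T\mathbf{x}$ with $x_0 = 1$ and hypothetical toy data:

```python
def linear_classifier(w, x):
    """Output class 1 if z(x) = w^T x > 0, else class 0 (x[0] == 1 holds the bias)."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z > 0 else 0


def zero_one_loss(w, X, y):
    """L(f) = sum_i delta(f(x_i) != y_i): the number of misclassified training examples."""
    return sum(1 for xi, yi in zip(X, y) if linear_classifier(w, xi) != yi)


# Hypothetical data: each row is [x0 = 1, feature]
X = [[1, -2], [1, -1], [1, 1], [1, 2]]
y = [0, 1, 1, 1]
w = [0, 1]  # z(x) = x1

print(zero_one_loss(w, X, y))  # only the example with x1 = -1 is misclassified
```

Because this count changes only in whole steps as `w` moves, it gives no useful gradient; that is one motivation for the smooth, probability-based loss developed for logistic regression.
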
Step 1: Logistic regression definition
• Weight all features with a linear function, as in linear regression:
  $z = w_0 x_0 + w_1 x_1 + w_2 x_2 + \dots + w_d x_d$   ($w_0$ is bias $b$, $x_0$ is 1)
• Pass the real value $z$ to a decision function to get the confidence of the classification

Sigmoid function
$\hat{y} = \sigma(z) = \dfrac{1}{1 + e^{-z}}$

Predicted class: $\begin{cases} 1 \ (\text{positive class}), & z \ge 0 \\ 0 \ (\text{negative class}), & z < 0 \end{cases}$

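A minimal sketch of Step 1 in Python, assuming hypothetical weights and an input row that starts with $x_0 = 1$. The sigmoid is written with a branch on the sign of $z$, a common trick so that `math.exp` is only ever called on a non-positive number and cannot overflow (in Python, `math.exp(1000)` raises `OverflowError`).

```python
import math


def sigmoid(z):
    """sigma(z) = 1 / (1 + e^{-z}), written so that exp() never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so e is in (0, 1) and cannot overflow
    return e / (1.0 + e)


def logistic_predict(w, x):
    """Return (probability y_hat = sigma(z), predicted class) for z = w^T x, x[0] == 1."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return sigmoid(z), (1 if z >= 0 else 0)


# Hypothetical weights: z = -1 + 0.5 * x1
w = [-1.0, 0.5]
for x1 in (-4.0, 2.0, 6.0):
    print(x1, logistic_predict(w, [1.0, x1]))
```

At `x1 = 2.0` the score is exactly $z = 0$, so the probability is 0.5 and the rule $z \ge 0$ assigns the positive class.
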
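Probability
• Many problems require a probability estimate as output $\hat{y}$
• Credit card application example
  • Probability of accepting an application: $p(\text{accept} \mid \text{age}, \text{income})$
  • $z = \text{age} + 1.25 \times \text{income} - 80$
  • Probability $\hat{y} = \sigma(z) = \dfrac{1}{1 + e^{-z}}$

age   income (k)   z       $\hat{y} = \sigma(z)$   class
47    55           35.75   > 0.9999                1
42    30           −0.5    0.3775                  0

[Figure: the two applicants plotted by income (x-axis, 50K marked) and age (y-axis, 45 marked) at (55, 47) and (30, 42)]
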
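The table can be reproduced with a few lines of Python. The weights (1, 1.25, −80) come from the example above; the function name is illustrative, and income is assumed to be in thousands as in the table.

```python
import math


def accept_probability(age, income_k):
    """Score from the example: z = age + 1.25 * income - 80 (income in thousands)."""
    z = age + 1.25 * income_k - 80
    y_hat = 1.0 / (1.0 + math.exp(-z))   # sigma(z)
    return z, y_hat, int(z >= 0)         # class 1 (accept) when z >= 0


for age, income in [(47, 55), (42, 30)]:
    z, y_hat, cls = accept_probability(age, income)
    print(f"age={age} income={income}k z={z} y_hat={y_hat:.4f} class={cls}")
```

The second applicant sits just below the boundary ($z = -0.5$), which the sigmoid turns into a moderate probability of about 0.38 rather than a hard "no".
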
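Interpretation of logistic regression
• $f(\mathbf{x})$ estimates the probability of the positive class
• Example: cancer diagnosis from tumor size
  $\mathbf{x} = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} 1 \\ \text{tumor size} \end{bmatrix}$, $f(\mathbf{x}) = 0.7$
  The probability of this patient having a malignant tumor is 70%
• Properties: the probabilities of all classes sum up to 1, so here $p(y = 0 \mid \mathbf{x}) = 1 - f(\mathbf{x}) = 0.3$
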
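Interpretation of logistic regression
• Learns the odds of the positive class: $\text{odds} = \dfrac{\hat{y}}{1 - \hat{y}}$
• Take the log of the odds, with $\hat{y} = \dfrac{1}{1 + e^{-z}}$:
  $\log \dfrac{\hat{y}}{1 - \hat{y}} = \log \dfrac{1/(1 + e^{-z})}{e^{-z}/(1 + e^{-z})} = \log e^{z} = z = \mathbf{w}^T \mathbf{x}$
• The logistic regression model assumes that the log odds is a linear function of $\mathbf{x}$
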
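A quick numerical check of the log-odds identity, assuming hypothetical weights and a hypothetical input whose first entry is $x_0 = 1$: applying the logit to the sigmoid output recovers the linear score $\mathbf{w}^T\mathbf{x}$.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """logit(p) = log(p / (1 - p)): the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


# Hypothetical weights and input; x[0] = 1 carries the bias term
w = [-1.0, 0.5, 2.0]
x = [1.0, 3.0, 0.25]
z = sum(wj * xj for wj, xj in zip(w, x))  # linear score w^T x = 1.0
y_hat = sigmoid(z)

print(y_hat / (1.0 - y_hat), math.exp(z))  # the odds equal e^z
print(log_odds(y_hat), z)                  # the log odds equal w^T x
```
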
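Decision boundary
• For class 0, $\mathbf{w}^T \mathbf{x}$ should be a large negative value; for class 1, $\mathbf{w}^T \mathbf{x}$ should be a large positive value
• Set a threshold on $f(\mathbf{x})$:
  • Predict $y = 1$ if $f(\mathbf{x}) \ge 0.5$
  • Predict $y = 0$ if $f(\mathbf{x}) < 0.5$
• Since $\sigma(z) \ge 0.5$ exactly when $z \ge 0$, the decision boundary is the hyperplane $\mathbf{w}^T \mathbf{x} = 0$
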
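A minimal sketch showing that thresholding $f(\mathbf{x})$ at 0.5 gives the same predictions as checking the sign of $\mathbf{w}^T\mathbf{x}$. The 2-D weights and test points are hypothetical; the boundary they define is the line $x_1 + x_2 = 3$.

```python
import math


def f(w, x):
    """Logistic regression output sigma(w^T x); x[0] = 1 carries the bias."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x, threshold=0.5):
    """Predict y = 1 if f(x) >= threshold, else y = 0."""
    return 1 if f(w, x) >= threshold else 0


# Hypothetical 2-D model whose boundary w^T x = 0 is the line x1 + x2 = 3
w = [-3.0, 1.0, 1.0]
points = [(0.0, 0.0), (1.0, 1.0), (1.5, 1.5), (2.0, 2.0), (4.0, 1.0)]
for x1, x2 in points:
    score = w[0] + w[1] * x1 + w[2] * x2   # w^T x
    print((x1, x2), "f(x) =", round(f(w, [1.0, x1, x2]), 3),
          "predict:", predict(w, [1.0, x1, x2]), "sign rule:", int(score >= 0))
```

The point (1.5, 1.5) lies exactly on the boundary, where $f(\mathbf{x}) = 0.5$. More generally, a threshold $t$ other than 0.5 moves the boundary to $\mathbf{w}^T\mathbf{x} = \log\frac{t}{1-t}$.
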
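Logistic regression as an artificial neuron: connection to neural networks

[Figure: inputs $x_1, \dots, x_i, \dots, x_d$ are multiplied by weights $w_1, \dots, w_i, \dots, w_d$ and summed with the bias $w_0$ (or $b$) to give $z$; the sigmoid function $\sigma(z) = \dfrac{1}{1 + e^{-z}}$ then outputs $p_{w,b}(C_1 \mid x)$]
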
Step 2: Goodness of a function
Learning the logistic regression model
• How are the parameters of the model (the weights $\mathbf{w}$) learned?
• We want to learn parameters $\mathbf{w}$ that make $\hat{y}$ for each training example as close as possible to the true class $y$:
  $D = \{ (\mathbf{x}^{(1)}, y^{(1)}), (\mathbf{x}^{(2)}, y^{(2)}), \dots, (\mathbf{x}^{(n)}, y^{(n)}) \}$, $\quad \hat{y} = \sigma(\mathbf{w}^T \mathbf{x})$
• Loss function: how close the classifier output ($\hat{y} = \sigma(\mathbf{w}^T \mathbf{x})$) is to the correct output ($y$, which is 0 or 1)
  $\mathcal{L}(\hat{y}, y)$ = how much the predicted $\hat{y}$ differs from the true $y$

Loss function in probability
• Maximum likelihood estimate given the training data:
  $D = \{ (\mathbf{x}^{(1)}, y^{(1)}), (\mathbf{x}^{(2)}, y^{(2)}), \dots, (\mathbf{x}^{(n)}, y^{(n)}) \}$
• Likelihood of a single data sample (given the parameters)
  • How likely are the features to produce the observed sample?
  $p(y \mid \mathbf{x}) = \begin{cases} \dfrac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}}}, & \text{if } y = 1 \\[6pt] 1 - \dfrac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}}} = \dfrac{1}{1 + e^{\mathbf{w}^T \mathbf{x}}}, & \text{if } y = 0 \end{cases}$
• The likelihood of the entire data set is the product over all samples: $\prod_{i=1}^{n} p(y^{(i)} \mid \mathbf{x}^{(i)})$

Deriving the loss function
• The likelihood of the data $D = \{ (\mathbf{x}^{(1)}, y^{(1)}), \dots, (\mathbf{x}^{(n)}, y^{(n)}) \}$ given the parameters $\mathbf{w}$ is $\prod_{i=1}^{n} p(y^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w})$, where
  $p(y \mid \mathbf{x}) = \begin{cases} \dfrac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}}}, & \text{if } y = 1 \ (\text{probability if class 1}) \\[6pt] \dfrac{1}{1 + e^{\mathbf{w}^T \mathbf{x}}}, & \text{if } y = 0 \ (\text{probability if class 0}) \end{cases}$
• Take the log of both sides:
  $\log \prod_{i=1}^{n} p(y^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}) = \sum_{i=1}^{n} \log p(y^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w})$
  Writing both cases at once as $p(y \mid \mathbf{x}) = p^{y} (1 - p)^{1 - y}$ with $p = p(y = 1 \mid \mathbf{x})$, this becomes $\sum_{i=1}^{n} \left[ y^{(i)} \log p^{(i)} + (1 - y^{(i)}) \log (1 - p^{(i)}) \right]$

Maximizing the likelihood of the data gives the best logistic regression model.

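A small numerical sketch of the likelihood and log-likelihood, assuming hypothetical toy data and two candidate weight vectors. It shows that a better-fitting $\mathbf{w}$ has a higher likelihood, and that the log of the product equals the sum of the per-sample logs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_likelihood(w, x, y):
    """p(y | x; w) = sigma(w^T x) if y == 1, else 1 - sigma(w^T x)."""
    p1 = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
    return p1 if y == 1 else 1.0 - p1


def likelihood(w, X, Y):
    """Product of the per-sample likelihoods over the whole data set."""
    result = 1.0
    for x, y in zip(X, Y):
        result *= sample_likelihood(w, x, y)
    return result


def log_likelihood(w, X, Y):
    """Sum of per-sample log-likelihoods, i.e. the log of the product above."""
    return sum(math.log(sample_likelihood(w, x, y)) for x, y in zip(X, Y))


# Hypothetical data: rows are [x0 = 1, feature], labels are 0/1
X = [[1.0, -2.0], [1.0, -0.5], [1.0, 1.0], [1.0, 3.0]]
Y = [0, 0, 1, 1]
good_w, bad_w = [0.0, 2.0], [0.0, -2.0]

print(likelihood(good_w, X, Y), likelihood(bad_w, X, Y))
print(math.log(likelihood(good_w, X, Y)), log_likelihood(good_w, X, Y))
```

In practice the log form is preferred: a product of many probabilities below 1 quickly underflows to 0.0 in floating point, while the sum of logs does not.
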
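Deriving the loss function
• Loss function (add a negative sign to the log-likelihood):
  $L(\mathbf{w}) = -\sum_{i=1}^{n} \left[ y^{(i)} \log p^{(i)} + (1 - y^{(i)}) \log (1 - p^{(i)}) \right]$
  • Substitute the probability $p$ with the decision function $\sigma$ of $\mathbf{w}^T \mathbf{x}$:
  $L(\mathbf{w}) = -\sum_{i=1}^{n} \left[ y^{(i)} \log \sigma(\mathbf{w}^T \mathbf{x}^{(i)}) + (1 - y^{(i)}) \log \left( 1 - \sigma(\mathbf{w}^T \mathbf{x}^{(i)}) \right) \right]$
  • Substitute the decision function $\sigma$ with the sigmoid function $1/(1 + e^{-\mathbf{w}^T \mathbf{x}})$:
  $L(\mathbf{w}) = -\sum_{i=1}^{n} \left[ y^{(i)} \log \dfrac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}^{(i)}}} + (1 - y^{(i)}) \log \dfrac{1}{1 + e^{\mathbf{w}^T \mathbf{x}^{(i)}}} \right]$
  • Finally, this shows how the loss function is determined by the parameters $\mathbf{w}$

That's it: we have derived the cross-entropy loss for classification!
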
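The final form of the loss can be checked numerically. This sketch, on hypothetical data and a hypothetical `w`, computes the cross-entropy loss directly and compares it with the simplified per-sample terms $\log(1 + e^{-\mathbf{w}^T\mathbf{x}})$ for $y = 1$ and $\log(1 + e^{\mathbf{w}^T\mathbf{x}})$ for $y = 0$, which follow from substituting the sigmoid.

```python
import math


def cross_entropy(y, p):
    """Per-sample loss -[y log p + (1 - y) log(1 - p)] for a label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def loss(w, X, Y):
    """L(w): sum of cross-entropy terms with p = 1 / (1 + e^{-w^T x})."""
    total = 0.0
    for x, y in zip(X, Y):
        z = sum(wj * xj for wj, xj in zip(w, x))
        total += cross_entropy(y, 1.0 / (1.0 + math.exp(-z)))
    return total


def loss_simplified(w, X, Y):
    """Same loss after substituting the sigmoid: log(1 + e^{-z}) if y = 1, log(1 + e^{z}) if y = 0."""
    total = 0.0
    for x, y in zip(X, Y):
        z = sum(wj * xj for wj, xj in zip(w, x))
        total += math.log1p(math.exp(-z)) if y == 1 else math.log1p(math.exp(z))
    return total


# Hypothetical data: rows are [x0 = 1, feature]
X = [[1.0, -2.0], [1.0, -0.5], [1.0, 1.0], [1.0, 3.0]]
Y = [0, 0, 1, 1]
w = [0.1, 1.5]
print(loss(w, X, Y), loss_simplified(w, X, Y))
```
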
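Interpreting the loss function
• Cross-entropy loss for classification, where $f(\mathbf{x}^{(i)})$ gives the prediction:
  $L(f) = \sum_{i} L\left( f(\mathbf{x}^{(i)}), y^{(i)} \right)$
• Loss of a single data instance $i$:
  $L\left( f(\mathbf{x}^{(i)}), y^{(i)} \right) = -\left[ y^{(i)} \log f(\mathbf{x}^{(i)}) + (1 - y^{(i)}) \log \left( 1 - f(\mathbf{x}^{(i)}) \right) \right]$

[Figure: plots of the logarithm function $\log_e(\cdot)$]
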
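Interpreting the loss
• Loss of a single data instance

When $y = 1$, the loss is $-\log f(\mathbf{x})$:
• loss = 0 if the prediction is correct ($f(\mathbf{x}) = 1$)
• As $f(\mathbf{x}) \to 0$, loss $\to \infty$
  • e.g., predict $f(\mathbf{x}) = 0$, but $y = 1$
Intuition: larger mistakes should get larger penalties

[Figure: loss plotted against $f(\mathbf{x})$]
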
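Interpreting the loss
• Loss of a single data instance

When $y = 0$, the loss is $-\log \left( 1 - f(\mathbf{x}) \right)$:
• loss = 0 if the prediction is correct ($f(\mathbf{x}) = 0$)
• As $1 - f(\mathbf{x}) \to 0$, loss $\to \infty$
  • e.g., predict $f(\mathbf{x}) = 1$, but $y = 0$
Larger mistakes get larger penalties as well

[Figure: loss plotted against $f(\mathbf{x})$]
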
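The two cases above can be tabulated directly. A minimal sketch over a few hypothetical predicted probabilities:

```python
import math


def instance_loss(y, fx):
    """Cross-entropy of one instance: -log f(x) if y == 1, -log(1 - f(x)) if y == 0."""
    return -math.log(fx) if y == 1 else -math.log(1.0 - fx)


print("f(x)   loss if y=1   loss if y=0")
for fx in (0.99, 0.9, 0.5, 0.1, 0.01):
    print(f"{fx:<6} {instance_loss(1, fx):<13.3f} {instance_loss(0, fx):.3f}")
```

At exactly $f(\mathbf{x}) = 0$ with $y = 1$ the loss is infinite, and `math.log(0.0)` raises `ValueError` rather than returning infinity; implementations commonly clip predicted probabilities slightly away from 0 and 1 for this reason.
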
Regularized logistic regression
• Add a regularization term to constrain model complexity
  • Prevents overfitting, as we do for linear regression
• L2 norm ($\|\cdot\|_2$): the square root of the sum of the squared vector values (the Euclidean length of the vector)
  • Measures the size of the parameter vector $\mathbf{w}$
• Overall loss function for optimization: the cross-entropy loss plus the regularization term

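The exact penalty is not spelled out in the text; a common choice, assumed here, is the cross-entropy plus a squared L2 penalty $\lambda \|\mathbf{w}\|_2^2$, where the hypothetical parameter `lam` plays the role of $\lambda$. Leaving the bias $w_0$ out of the penalty is also an assumption (a frequent convention, not stated on the slide).

```python
import math


def l2_norm(v):
    """||v||_2: square root of the sum of squared entries (Euclidean length)."""
    return math.sqrt(sum(vj * vj for vj in v))


def regularized_loss(w, X, Y, lam):
    """Cross-entropy + lam * ||w||_2^2 (assumed penalty; bias w[0] is not penalized)."""
    data_loss = 0.0
    for x, y in zip(X, Y):
        z = sum(wj * xj for wj, xj in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        data_loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return data_loss + lam * l2_norm(w[1:]) ** 2


# Hypothetical data; rows are [x0 = 1, feature]
X = [[1.0, -2.0], [1.0, -0.5], [1.0, 1.0], [1.0, 3.0]]
Y = [0, 0, 1, 1]
w = [0.1, 1.5]
for lam in (0.0, 0.1, 1.0):
    print(lam, regularized_loss(w, X, Y, lam))
```

A larger `lam` makes large weights more expensive, pushing the optimizer toward smaller $\|\mathbf{w}\|$. In scikit-learn's `LogisticRegression`, the strength is set through `C`, the inverse of the regularization strength (smaller `C` means stronger regularization).
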
Step 3: Gradient descent for optimization
• To find the optimal weights, minimize the loss function we have defined for the model:
  $\mathbf{w}^* = \arg\min_{\mathbf{w}} \dfrac{1}{n} \sum_{i=1}^{n} L\left( \sigma(\mathbf{w}^T \mathbf{x}^{(i)}), y^{(i)} \right)$
• Learn by the gradient descent method
  • Choose a starting point $\mathbf{w}^{(0)}$
  • Repeat until convergence:
    • Compute the gradient
    • Update the parameters: $w_j \leftarrow w_j - \alpha \dfrac{\partial L}{\partial w_j}$, for $j = 0, \dots, d$

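A minimal batch gradient descent sketch for the objective above. For the cross-entropy loss, the per-sample derivative with respect to $w_j$ is $(\sigma(\mathbf{w}^T\mathbf{x}) - y)\,x_j$; averaging it over the $n$ samples matches the $\frac{1}{n}$ in the objective. The learning rate `alpha`, the iteration cap, the stopping tolerance, and the toy data are hypothetical choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient_descent(X, Y, alpha=0.5, iters=2000, tol=1e-8):
    """Batch gradient descent on the mean cross-entropy loss.
    X rows start with x0 = 1, so w[0] plays the role of the bias b."""
    n, d = len(X), len(X[0])
    w = [0.0] * d                                  # starting point w^(0)
    for _ in range(iters):
        # gradient of the mean loss: (1/n) * sum_i (sigma(w^T x_i) - y_i) * x_ij
        grad = [0.0] * d
        for x, y in zip(X, Y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - y
            for j in range(d):
                grad[j] += err * x[j] / n
        w = [wj - alpha * gj for wj, gj in zip(w, grad)]  # update for j = 0..d
        if max(abs(g) for g in grad) < tol:        # converged: gradient is ~0
            break
    return w


# Hypothetical 1-D data that is NOT linearly separable, so a finite minimizer exists
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -0.5], [1.0, 1.0], [1.0, 2.0]]
Y = [0, 0, 0, 1, 1, 1]
w = gradient_descent(X, Y)
print(w)
```

With perfectly separable data the unregularized loss keeps decreasing as $\|\mathbf{w}\|$ grows, so this loop would only stop at `iters`; adding the L2 penalty is one way to keep the weights finite.
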
Logistic Regression vs. Linear Regression

Step 1: model
• Logistic regression: $f_{w,b}(x) = \sigma\left( \sum_i w_i x_i + b \right)$; output between 0 and 1
• Linear regression: $f_{w,b}(x) = \sum_i w_i x_i + b$; output: a real value

Step 2: loss
• Logistic regression: training data $(x, y)$ with $y$ = 1 for class 1, 0 for class 0
  $L(f) = \sum_n L\left( f(x^{(n)}), y^{(n)} \right)$, using the cross-entropy loss
  $L\left( f(x^{(i)}), y^{(i)} \right) = -\left[ y^{(i)} \log f(x^{(i)}) + (1 - y^{(i)}) \log \left( 1 - f(x^{(i)}) \right) \right]$
• Linear regression: training data $(x, y)$ with $y$ a real number
  SSE loss: $L(f) = \sum_n \left( f(x^{(n)}) - y^{(n)} \right)^2$

Step 3: update
• Logistic regression: $w_i \leftarrow w_i - \alpha \sum_n \left( f_{w,b}(x^{(n)}) - y^{(n)} \right) x_i^{(n)}$
• Linear regression: $w_i \leftarrow w_i - \alpha \sum_n \left( f_{w,b}(x^{(n)}) - y^{(n)} \right) x_i^{(n)}$ (the factor 2 from differentiating the square is absorbed into $\alpha$)

The two update rules have the same form; only the model $f_{w,b}$ differs.

Demo
class sklearn.linear_model.SGDClassifier(loss='log', *, penalty='l2', alpha=0.0001, l1_ratio=0.15, fit_intercept=True, max_iter=1000, tol=0.001, shuffle=True, verbose=0, epsilon=0.1, n_jobs=None, random_state=None, learning_rate='optimal', eta0=0.0, power_t=0.5, early_stopping=False, validation_fraction=0.1, n_iter_no_change=5, class_weight=None, warm_start=False, average=False)
(In scikit-learn 1.1 and later the logistic loss is spelled loss='log_loss'; loss='log' was removed in 1.3.)
class sklearn.linear_model.LogisticRegression(penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)

1. Prepare training and test data

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # the Iris data set described below
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

2. Specify the model and train it

# max_iter raised from the default 100 to give the lbfgs solver room to converge
clf = LogisticRegression(fit_intercept=True, max_iter=1000)
clf.fit(X_train, y_train)

3. Make predictions and evaluate the model

y_predict = clf.predict(X_test)
print(accuracy_score(y_test, y_predict))

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression
Iris sample data set
• Iris Plant data set
• Can be obtained from the UCI Machine Learning Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
• From the statistician Ronald Fisher
• Three flower types (classes):
  • Setosa
  • Versicolour
  • Virginica
• Four (non-class) attributes:
  • Sepal width and length
  • Petal width and length

[Figure: example flowers of Setosa, Versicolour, and Virginica]

Data example (sepal length, sepal width, petal length, petal width in cm, class):
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
…
Model evaluation
• Metrics for performance evaluation
  • How to evaluate the predictive capability of a model?
• Methods for the evaluation process
  • How to obtain reliable estimates?

Metrics for regression task
• How close is your prediction to the target value?

Mean squared error (MSE): $\text{MSE} = \dfrac{1}{n} \sum_{i=1}^{n} \left( y^{(i)} - \hat{y}^{(i)} \right)^2$
  Gives larger penalties to big prediction errors by squaring them
Mean absolute error (MAE): $\text{MAE} = \dfrac{1}{n} \sum_{i=1}^{n} \left| y^{(i)} - \hat{y}^{(i)} \right|$
  Treats all errors the same

o A single value cannot tell you on its own how well the model performs
o But it can be used to compare against other models

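Metrics for regression task
• How close is your prediction to the target value?

R-squared ($R^2$), also called the coefficient of determination:
$R^2 = 1 - \dfrac{\sum_{i=1}^{n} \left( y^{(i)} - \hat{y}^{(i)} \right)^2}{\sum_{i=1}^{n} \left( y^{(i)} - \bar{y} \right)^2}$, where $\bar{y}$ is the mean of the targets

[Figure: the predictions compared with the mean of the targets]

A goodness-of-fit measure that shows the percentage of the target variation that a linear model explains
o More informative than MSE and MAE because it reports a percentage rather than an absolute value (with an arbitrary range)
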
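The three regression metrics can be computed side by side. A minimal sketch on hypothetical targets and predictions that contain one large miss, which MSE penalizes much more heavily than MAE:

```python
def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R^2) for lists of targets and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n           # squaring: big errors dominate
    mae = sum(abs(e) for e in errors) / n          # every unit of error counts the same
    mean = sum(y_true) / n
    ss_res = sum(e * e for e in errors)            # variation left unexplained
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total variation around the mean
    return mse, mae, 1.0 - ss_res / ss_tot


# Hypothetical targets and predictions: one large miss (9 vs 12)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 7.0, 12.0]
print(regression_metrics(y_true, y_pred))  # -> (2.5, 1.0, 0.5)
```

scikit-learn provides the same metrics as `sklearn.metrics.mean_squared_error`, `mean_absolute_error`, and `r2_score`.
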
Metrics for classification task
• Confusion matrix (binary classification):

                              PREDICTED CLASS
                              Class=Yes    Class=No
ACTUAL CLASS    Class=Yes     a            b
                Class=No      c            d

a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)

Metrics for performance evaluation

                              PREDICTED CLASS
                              Class=Yes    Class=No
ACTUAL CLASS    Class=Yes     a (TP)       b (FN)
                Class=No      c (FP)       d (TN)

• Most widely used metric:
  $\text{Accuracy} = \dfrac{a + d}{a + b + c + d} = \dfrac{TP + TN}{TP + TN + FP + FN}$

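A minimal sketch that fills in the four cells of the confusion matrix from hypothetical labels and predictions, then computes accuracy. The class of interest (Class=Yes) is assumed to be labeled 1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (a, b, c, d) = (TP, FN, FP, TN) with `positive` as the Class=Yes label."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1  # a: actual Yes, predicted Yes
            else:
                fn += 1  # b: actual Yes, predicted No
        elif p == positive:
            fp += 1      # c: actual No, predicted Yes
        else:
            tn += 1      # d: actual No, predicted No
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical labels and predictions
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts))  # -> (3, 1, 1, 3) 0.75
```

Note that `sklearn.metrics.confusion_matrix` orders rows and columns by sorted label, so for labels 0/1 it returns `[[TN, FP], [FN, TP]]`, the reverse of the layout on the slide.
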
Limitation of accuracy
• Consider a 2-class problem
  • Number of Class 0 examples = 9990
  • Number of Class 1 examples = 10
• If the model predicts everything to be class 0, its accuracy is 9990/10000 = 99.9%
• Accuracy is misleading here because the model does not detect any class 1 example

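The example above can be reproduced directly; the trivial "always class 0" model is the one described on the slide.

```python
# The example: 9990 examples of class 0 and 10 examples of class 1
y_true = [0] * 9990 + [1] * 10
y_pred = [0] * len(y_true)  # trivial model that always predicts class 0

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
detected = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fraction_detected = detected / y_true.count(1)

print(f"accuracy = {accuracy:.1%}")                                 # 99.9%
print(f"class 1 detected: {detected}/10 ({fraction_detected:.0%})")  # 0/10 (0%)
```
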
Cost matrix

                              PREDICTED CLASS
C(i|j)                        Class=Yes     Class=No
ACTUAL CLASS    Class=Yes     C(Yes|Yes)    C(No|Yes)
                Class=No      C(Yes|No)     C(No|No)

C(i|j): cost of classifying a class j example as class i

Weighted accuracy

Confusion matrix:
                              PREDICTED CLASS
                              Class=Yes    Class=No
ACTUAL CLASS    Class=Yes     a (TP)       b (FN)
                Class=No      c (FP)       d (TN)

Cost matrix:
                              PREDICTED CLASS
C(i|j)                        Class=Yes          Class=No
ACTUAL CLASS    Class=Yes     w1 = C(Yes|Yes)    w2 = C(No|Yes)
                Class=No      w3 = C(Yes|No)     w4 = C(No|No)

$\text{Weighted Accuracy} = \dfrac{w_1 a + w_4 d}{w_1 a + w_2 b + w_3 c + w_4 d}$

Computing weighted accuracy

Cost matrix:
                     PREDICTED CLASS
C(i|j)               +        -
ACTUAL CLASS    +    1        100
                -    1        1

Model M1:
                     PREDICTED CLASS
                     +        -
ACTUAL CLASS    +    150      40
                -    60       250
Accuracy = 80%, Weighted Accuracy ≈ 8.97%

Model M2:
                     PREDICTED CLASS
                     +        -
ACTUAL CLASS    +    250      45
                -    5        200
Accuracy = 90%, Weighted Accuracy ≈ 9.08%

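The numbers for M1 and M2 can be checked with a short script. The weights are read off the cost matrix above; the function name is illustrative.

```python
def weighted_accuracy(a, b, c, d, w1, w2, w3, w4):
    """(w1*a + w4*d) / (w1*a + w2*b + w3*c + w4*d), with weights taken from the cost matrix."""
    return (w1 * a + w4 * d) / (w1 * a + w2 * b + w3 * c + w4 * d)


# Cost matrix: w1 = C(+|+) = 1, w2 = C(-|+) = 100, w3 = C(+|-) = 1, w4 = C(-|-) = 1
weights = (1, 100, 1, 1)
models = {"M1": (150, 40, 60, 250), "M2": (250, 45, 5, 200)}  # (a, b, c, d) = (TP, FN, FP, TN)

for name, (a, b, c, d) in models.items():
    acc = (a + d) / (a + b + c + d)
    wacc = weighted_accuracy(a, b, c, d, *weights)
    print(f"{name}: accuracy = {acc:.0%}, weighted accuracy = {wacc:.2%}")
```

This prints 80% / 8.97% for M1 and 90% / 9.08% for M2: the heavy cost on false negatives (b) dominates both denominators, so the 10-point accuracy gap almost disappears.
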
Precision-Recall

Count                         PREDICTED CLASS
                              Class=Yes    Class=No
ACTUAL CLASS    Class=Yes     a            b
                Class=No      c            d

$\text{Precision } (p) = \dfrac{a}{a + c} = \dfrac{TP}{TP + FP}$
$\text{Recall } (r) = \dfrac{a}{a + b} = \dfrac{TP}{TP + FN}$
$\text{F-measure } (F) = \dfrac{1}{\left( \frac{1/r + 1/p}{2} \right)} = \dfrac{2rp}{r + p} = \dfrac{2a}{2a + b + c} = \dfrac{2TP}{2TP + FP + FN}$

Assumption: the class YES is the one we care about.
• Precision is biased towards C(Yes|Yes) & C(Yes|No)
• Recall is biased towards C(Yes|Yes) & C(No|Yes)
• F-measure is biased towards all except C(No|No)

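A minimal sketch of the three formulas, using hypothetical counts. The true-negative count $d$ never appears, which is why the F-measure ignores C(No|No).

```python
def precision_recall_f(tp, fn, fp):
    """Precision, recall and F-measure for the class of interest (Class=Yes)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * r * p / (r + p)  # harmonic mean of r and p
    return p, r, f


# Hypothetical counts: a = TP = 6, b = FN = 4, c = FP = 2
p, r, f = precision_recall_f(tp=6, fn=4, fp=2)
print(round(p, 3), round(r, 3), round(f, 3))  # -> 0.75 0.6 0.667
print(round(2 * 6 / (2 * 6 + 2 + 4), 3))      # same F via 2TP / (2TP + FP + FN)
```
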
ROC (Receiver Operating Characteristic)
• The ROC curve plots the trade-off between the TPR and FPR of a classifier
• It is traced by changing the threshold the model uses to classify data (e.g., 0.5 for the sigmoid function)

Look at the positive predictions of the classifier and compute:

$\text{TPR} = \dfrac{TP}{TP + FN}$: what fraction of positive instances are predicted correctly?
$\text{FPR} = \dfrac{FP}{FP + TN}$: what fraction of negative instances were predicted incorrectly?

                    Prediction
                    Yes         No
Actual    Yes       a (TP)      b (FN)
          No        c (FP)      d (TN)

ROC curve
- Data set containing 2 classes (positive and negative)
- Any point located at x > t is classified as positive
- At threshold t: TPR = 0.5, FPR = 0.12

[Figure: ROC curve with the False Positive Rate (FPR) on the x-axis and the True Positive Rate (TPR) on the y-axis; the model's curve is traced by changing t, and the diagonal corresponds to a random guess]

- (TPR=0, FPR=0): the model predicts every instance to be negative
- (TPR=1, FPR=1): the model predicts every instance to be positive
- (TPR=1, FPR=0): the perfect model with zero misclassifications

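A minimal sketch of how the ROC points arise: sweep the threshold from high to low and record (FPR, TPR) at each step. The labels and scores are hypothetical, and the sketch predicts positive when score ≥ t (the slide uses x > t; the two differ only for ties).

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the threshold t from high to low.
    An instance is predicted positive when its score is >= t."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # t above every score: everything predicted negative
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical labels and classifier scores (e.g., sigmoid outputs)
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for fpr, tpr in roc_points(y_true, scores):
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

The curve starts at (0, 0), where every instance is predicted negative, and ends at (1, 1), where every instance is predicted positive. `sklearn.metrics.roc_curve` returns the same kind of FPR/TPR arrays together with the thresholds used.
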
Model evaluation
• Metrics for performance evaluation
  • How to evaluate the performance of a model?
• Methods for the evaluation process
  • How to obtain reliable estimates?

Methods for evaluation process
• How to obtain a reliable estimate of performance?
• The performance of a model may depend on factors other than the learning algorithm:
  • Class distribution
  • Cost of misclassification
  • Size of training and test sets

Methods of estimation
• Holdout
  • Reserve two disjoint sets for training/testing (e.g., 80%/20%)
  • One sample may be biased: use repeated holdout by random subsampling
• Cross-validation
  • Partition the data into k disjoint subsets
  • k-fold: train on k−1 partitions, test on the remaining one
  • Leave-one-out: k = n
  • Guarantees that each record is used exactly once for testing (and k−1 times for training)
• Bootstrap
  • Sample N records with replacement
  • Repeat b times

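The three estimation methods differ mainly in how they pick indices. A minimal sketch, assuming hypothetical function names and seeds; for simplicity the k-fold version forms folds by striding rather than shuffling.

```python
import random


def holdout_split(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and reserve a disjoint test set (80%/20% by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train, test)


def k_fold(n, k):
    """Yield (train, test) index lists; every index is tested exactly once.
    Folds are formed by striding (i, i + k, i + 2k, ...), without shuffling."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def bootstrap(n, b, seed=0):
    """b bootstrap samples, each of n indices drawn with replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(b)]


train, test = holdout_split(10)
print(len(train), len(test))                 # 8 2
print([len(te) for _, te in k_fold(10, 5)])  # five test folds of size 2
print(len(list(k_fold(10, 10))))             # leave-one-out: k = n = 10 splits
print(bootstrap(10, 2))
```

Leave-one-out is simply `k_fold(n, n)`. scikit-learn offers tested versions as `train_test_split`, `KFold`, `LeaveOneOut`, and `sklearn.utils.resample`.
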
Holdout evaluation

[Figure: training examples (features + labels) are used to "learn" a model; the model produces predicted labels for the holdout examples (features), which are compared with the true labels to estimate performance]

• Train the model on the training set (the examples not held out)
• Evaluate the model on the holdout set (aka test set)
• A small holdout set taken from the training set is the validation set, used for monitoring overfitting

Cross-validation
• Why: the data in a single test set may be too different from the data in the training set

[Figure: the examples are split into folds 1 … K; K−1 folds are used for training and the remaining fold for testing, repeated for all K train/test splits]

• Gives a better estimate of generalization performance

Cross-validation
Common choices of K:
• 10-fold cross-validation: very common
  [Figure: the data split into D1, D2, …, D10; each Di serves in turn as the test fold while the other nine are used for training]
• M-fold cross-validation (where M is #examples), training on M − 1 examples each time:
  • Aka "leave one out" cross-validation

Variations on cross-validation
• Repeated cross-validation
  • Perform cross-validation a number of times
  • Gives an estimate of the variance of the generalization error
• Stratified cross-validation
  • Guarantees the same percentage of class labels in the training and test sets
  • Important when classes are imbalanced and the sample is small

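A minimal stratified k-fold sketch: the indices of each class are dealt round-robin to the folds, so every test fold keeps roughly the overall class proportions. The function name, the optional `seed`, and the imbalanced toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=None):
    """Yield (train, test) index lists in which every test fold keeps roughly the
    overall class proportions. Indices of each class are dealt round-robin to the
    k folds; passing a seed shuffles within each class first."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    turn = 0  # shared counter keeps fold sizes balanced across classes
    for y in sorted(by_class):
        members = list(by_class[y])
        if seed is not None:
            rng.shuffle(members)
        for i in members:
            folds[turn % k].append(i)
            turn += 1
    for f in range(k):
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, sorted(folds[f])


# Hypothetical imbalanced labels: 18 of class 0, 2 of class 1
labels = [0] * 18 + [1] * 2
for train, test in stratified_k_fold(labels, k=2):
    print(len(test), "test examples,", sum(labels[i] for i in test), "of class 1")
```

Calling it with several different seeds and averaging the results is one simple way to run repeated stratified cross-validation; scikit-learn's `StratifiedKFold` and `RepeatedStratifiedKFold` provide tested versions.
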
Dealing with class imbalance
• If the class we are interested in is very rare, the classifier may ignore it
  • This is the class imbalance problem
• Solutions
  • Modify the optimization criterion by using a cost-sensitive metric
  • Balance the class distribution
    • Sample from the larger class so that the two classes have the same size (undersampling)
    • Replicate data of the class of interest so that the classes are balanced (oversampling)

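Both balancing strategies fit in one small function. A minimal sketch, assuming hypothetical names and a toy data set with a rare class: "under" draws a random subset of each larger class without replacement, and "over" replicates random examples of each smaller class.

```python
import random
from collections import Counter


def rebalance(X, y, method="under", seed=0):
    """Balance class sizes by random undersampling of larger classes ("under")
    or by replicating examples of smaller classes ("over")."""
    rng = random.Random(seed)
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    sizes = [len(items) for items in groups.values()]
    target = min(sizes) if method == "under" else max(sizes)
    new_X, new_y = [], []
    for label, items in groups.items():
        if len(items) > target:  # sample from the larger class without replacement
            chosen = rng.sample(items, target)
        else:                    # replicate the smaller class until it reaches the target
            chosen = items + [rng.choice(items) for _ in range(target - len(items))]
        new_X.extend(chosen)
        new_y.extend([label] * target)
    return new_X, new_y


# Hypothetical data: 95 examples of class 0, 5 of the rare class 1
X = list(range(100))
y = [0] * 95 + [1] * 5
print(Counter(rebalance(X, y, "under")[1]))  # 5 of each class
print(Counter(rebalance(X, y, "over")[1]))   # 95 of each class
```

Resampling should be applied to the training portion only; otherwise replicated examples can end up in both the training and test sets. For the cost-sensitive route, scikit-learn's `LogisticRegression(class_weight="balanced")` reweights the loss instead of changing the data.
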
Learning curve
• A learning curve shows how accuracy changes with varying sample size
• Creating a learning curve requires a sampling schedule

Effect of small sample size:
- Bias in the estimate
- Poor model
- Variance of the estimate
- Poor training data
