Week 04 Logistic Regression

31-08-2024

TOD 533
Logistic Regression
Amit Das
TODS / AMSOM / AU
[email protected]

Supervised Learning: Regression and Classification


• SUPERVISED LEARNING
• We use a set of training data to set the parameters of a model, e.g. a, b1, b2, …, bn of a regression model y = a + b1x1 + b2x2 + … + bnxn
• The training data contains known values of the dependent variable DV (the variable to be predicted)
• The trained model (with optimum values of the parameters) is used to predict the outcome for new (unseen) cases (whose DV values are not known)
• When the DV is interval-scaled (“continuous”) -> regression
• We built a model to predict the fuel economy of automobiles
• When the DV is nominal (“categorical”) -> classification
• We will build a model for a DV that has two classes, diabetic or healthy (a workflow sketch follows this list)
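As a minimal sketch of this train-then-predict workflow (my illustration, not part of the slides; assumes scikit-learn and synthetic data):

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 2))              # two predictors x1, x2

    # Regression: interval-scaled ("continuous") DV
    y_cont = 3 + 1.5 * X_train[:, 0] - 2.0 * X_train[:, 1] + rng.normal(size=100)
    reg = LinearRegression().fit(X_train, y_cont)    # learns a, b1, b2

    # Classification: nominal ("categorical") DV with two classes
    y_class = (y_cont > 3).astype(int)
    clf = LogisticRegression().fit(X_train, y_class)

    X_new = rng.normal(size=(5, 2))                  # unseen cases, DV unknown
    print(reg.predict(X_new))                        # predicted continuous values
    print(clf.predict(X_new))                        # predicted class labels (0 or 1)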


Why not regression?


• Consider the case of whether a person has diabetes* or not …
  Dependent variable = has(diabetes), possible values (yes, no)

• The nominal dependent variable can be transformed into an interval-scaled variable p[has(diabetes)], and we can build a model such as
  p[has(diabetes)] = a + b1X1 + b2X2 + b3X3 + … + bnXn

* WHO: 2-hour post-load plasma glucose at least 200 mg/dl

The bounds of probability


We need an equation of the type
p[has(diabetes)] = a + b1X1 + b2X2 + b3X3 + … + bnXn

• But how do we constrain it to stay between 0 and 1? Use the logistic transform:

  p = exp(a + b1x1 + …) / (1 + exp(a + b1x1 + …))

• The numerator e^(a + b1x1 + …) > 0, so p > 0, and
• the numerator is smaller than the denominator 1 + e^(a + b1x1 + …), so p < 1
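A quick numeric check (my sketch, not from the slides) that this transform stays strictly inside (0, 1) no matter what value the linear predictor takes:

    import numpy as np

    def logistic(z):
        # p = exp(z) / (1 + exp(z)) is strictly between 0 and 1 for any real z
        return np.exp(z) / (1 + np.exp(z))

    z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])   # z = a + b1*x1 + ... can be any real number
    p = logistic(z)
    print(p)                                      # 0.0000454, 0.119, 0.5, 0.881, 0.99995
    assert np.all((p > 0) & (p < 1))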


Shape of the logistic regression function

[Figure: the S-shaped (sigmoid) logistic curve, rising from 0 to 1 as a + b1x1 + … goes from -∞ to +∞]

Odds and log(odds)


With a little algebra,
  ln[p / (1-p)] = a + b1x1 + b2x2 + … + bnxn

Even though the probability p is not a linear function of the xi, the transform ln[p / (1-p)] is a linear function of the xi.

What is ln[p / (1-p)]? The log of the odds, also called the logit.

  p / (1-p) = e^(a + b1x1 + …) = e^a · e^(b1x1) · e^(b2x2) … e^(bnxn) (multiplicative model)
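A sketch with hypothetical coefficients (my example) showing both facts: the logit recovers the linear predictor, and each unit change in x multiplies the odds by e^b:

    import numpy as np

    a, b1 = -2.0, 0.8                       # hypothetical coefficients
    x = np.array([0.0, 1.0, 2.0])

    p = np.exp(a + b1 * x) / (1 + np.exp(a + b1 * x))
    odds = p / (1 - p)

    print(np.log(odds))                     # the logit: a + b1*x, linear in x -> [-2.0, -1.2, -0.4]
    print(odds[1] / odds[0], np.exp(b1))    # each unit of x multiplies the odds by e^b1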


The diabetes dataset


• Title: Pima Indians Diabetes Database
• Original owners: National Institute of Diabetes and Digestive and Kidney Diseases

• Attributes
1. preg = Number of times pregnant
2. plas = Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. pres = Diastolic blood pressure (mm Hg)
4. skin = Triceps skin fold thickness (mm)
5. insu = 2-Hour serum insulin (μU/ml)
6. mass = Body mass index (weight in kg/(height in m)^2)
7. pedi = Diabetes pedigree function
8. age = Age (years)
9. Class variable = tested positive for diabetes (1 (268 instances) or 0 (500 instances))
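A sketch of fitting the model on this dataset (my illustration; the file name pima-diabetes.csv is hypothetical, and the columns are assumed to match the attribute list above):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("pima-diabetes.csv",
                     names=["preg", "plas", "pres", "skin", "insu",
                            "mass", "pedi", "age", "class"])

    # Same five predictors as the coefficient table below
    X = sm.add_constant(df[["age", "pedi", "mass", "skin", "preg"]])
    model = sm.Logit(df["class"], X).fit()
    print(model.summary())                  # estimates, SEs, z statistics, p-values
    print(np.exp(model.params))             # odds ratios = e^coefficient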

Model Coefficients - diabetes

Predictor   Estimate   SE        Z         p        Odds ratio
Intercept   -5.71139   0.54034   -10.570   < .001   0.00331   (a)
age          0.03057   0.00821     3.723   < .001   1.03104   (b1)
pedi         1.00899   0.25928     3.891   < .001   2.74283   (b2)
mass         0.09974   0.01364     7.314   < .001   1.10489   (b3)
skin        -0.00520   0.00555    -0.936     .349   0.99482   (b4)
preg         0.09660   0.02857     3.381   < .001   1.10142   (b5)

Note. Estimates represent the log odds of "diabetes = 1" vs. "diabetes = 0"

ln[p / (1-p)] = -5.711 + 0.031*age + 1.009*pedi + 0.100*mass - 0.005*skin + 0.097*preg

p / (1-p) = e^(a + b1x1 + …)
          = e^(-5.711) * e^(0.031*age) * e^(1.009*pedi) * e^(0.100*mass) * e^(-0.005*skin) * e^(0.097*preg)
          = 0.00331*(1.031^age)*(2.743^pedi)*(1.105^mass)*(0.995^skin)*(1.101^preg)
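A sketch (my verification) that the odds-ratio column is just e raised to each estimate, using the coefficients from the table above:

    import numpy as np

    estimates = {"Intercept": -5.71139, "age": 0.03057, "pedi": 1.00899,
                 "mass": 0.09974, "skin": -0.00520, "preg": 0.09660}

    for name, b in estimates.items():
        print(f"{name:9s} exp({b:+.5f}) = {np.exp(b):.5f}")
    # exp(-5.71139) = 0.00331, exp(0.03057) = 1.03104, ..., matching the table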


Prediction for a new patient


Age = 50
Pedigree Score = 0.5 (some history of diabetes in the family)
BMI = 30
SkinFold Thickness = 22
Pregnancies = 3

What is the likelihood of her being diabetic?

p / (1-p) = 0.00331*(1.031^age)*(2.743^pedi)*(1.105^mass)*(0.995^skin)*(1.101^preg)
          = 0.00331*(1.031^50)*(2.743^0.5)*(1.105^30)*(0.995^22)*(1.101^3)
          = 0.00331*4.602*1.656*19.993*0.896*1.335 = 0.603
p = 0.603 / (1 + 0.603) = 37.6% < 50%
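The same arithmetic as a short Python sketch (values copied from this slide):

    # Multiplicative (odds-ratio) form, with the patient's values plugged in
    odds = 0.00331 * 1.031**50 * 2.743**0.5 * 1.105**30 * 0.995**22 * 1.101**3
    p = odds / (1 + odds)
    print(f"odds = {odds:.3f}, p = {p:.1%}")   # odds ~ 0.603, p ~ 37.6% (< 50%)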

Dimension reduction / feature selection


• LASSO (forces some regression coefficients to zero; see the sketch after this list)
• Stepwise regression (retains only those variables that improve model fit)
• Principal components analysis (PCA) (creates “combination” variables that capture the information in the original variables more parsimoniously)
• Demo in STATA
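The slides demo this in STATA; as a rough Python equivalent (my assumption, not the author's demo), scikit-learn's LogisticRegression with an L1 penalty performs LASSO-style variable selection:

    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # The L1 (LASSO-style) penalty drives some coefficients exactly to zero,
    # dropping those variables from the model; C controls penalty strength.
    lasso_logit = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
    )
    # lasso_logit.fit(X, y)   # X, y as in the diabetes model above;
    # zeroed coefficients mark the variables the model dropped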
