Log Reg

Regression is a statistical modeling technique used to evaluate relationships between variables, where one variable is dependent on one or more independent variables. Linear regression fits a linear equation to continuous data to minimize the sum of squared errors between observed and predicted values. Logistic regression applies a sigmoid curve to binary dependent data and uses linear regression on the logit transform of the odds to model relationships between predictors and the log odds of the dependent variable.


Regression

• A form of statistical modeling that attempts to evaluate the relationship between one variable (termed the dependent variable) and one or more other variables (termed the independent variables). It is a form of global analysis, as it produces a single equation for the relationship.
• A model for predicting one variable from another.
Linear Regression
• Regression used to fit a linear model to data where the dependent variable is continuous:

  Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon

• Given a set of points (Xᵢ, Yᵢ), we wish to find a linear function (a line in 2 dimensions) that "goes through" these points.
• In general, the points are not exactly aligned:
  – Find the line that best fits the points.
Residual

• Error or residual:
  – Observed value − Predicted value
(Figure: scatter plot of Observed values with a fitted Linear (Observed) trend line.)
Sum-squared Error (SSE)

SSE = \sum_{y} \left(y_{observed} - y_{predicted}\right)^2

TSS = \sum_{y} \left(y_{observed} - \bar{y}_{observed}\right)^2

R^2 = 1 - \frac{SSE}{TSS}
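These three quantities translate directly into code. A minimal Python sketch (the function names are mine, not from the slides):

```python
def sse(y_obs, y_pred):
    """Sum of squared errors between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))

def tss(y_obs):
    """Total sum of squares: squared deviations from the mean."""
    mean = sum(y_obs) / len(y_obs)
    return sum((o - mean) ** 2 for o in y_obs)

def r_squared(y_obs, y_pred):
    """Coefficient of determination: R^2 = 1 - SSE/TSS."""
    return 1 - sse(y_obs, y_pred) / tss(y_obs)
```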
What is Best Fit?

• The smaller the SSE, the better the fit.
• Hence, linear regression attempts to minimize SSE (or, equivalently, to maximize R²).
• Assume 2 dimensions:

  Y = \beta_0 + \beta_1 X
Analytical Solution

0 
 y   x1

1 
 xy   x y
n

n x    x
2
2
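The closed-form solution is only a few lines of Python. A sketch (the name fit_line is my own):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x via the closed-form solution."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b0 = (sum_y - b1 * sum_x) / n  # equivalently: mean(y) - b1 * mean(x)
    return b0, b1
```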
Example (I)

1 
 xy   x y
n

n x    x
2
2
x y x^2 xy
1.20 4.00 1.44 4.80 7  223.61  24.10  58.00

2.30 5.60 5.29 12.88 7  95.31  24.10 2
1565.27 1397.80
3.10 7.90 9.61 24.49 
667.17  580.81
3.40 8.00 11.56 27.20 167.47
 1.94
4.00 10.10 16.00 40.40 86.36
4.60 10.40 21.16 47.84
5.50 12.00 30.25 66.00 0 
 y   x
1

n
24.10 58.00 95.31 223.61 58.00 1.94  24.10

7
Target: y=2x+1.5 
11.27
1.61
7
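Feeding this data to the fit_line sketch from the Analytical Solution slide reproduces the coefficients:

```python
xs = [1.20, 2.30, 3.10, 3.40, 4.00, 4.60, 5.50]
ys = [4.00, 5.60, 7.90, 8.00, 10.10, 10.40, 12.00]

b0, b1 = fit_line(xs, ys)
print(b0, b1)  # ~1.61 and ~1.94, close to the generating target y = 2x + 1.5
```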
Example (II)

(Figure: scatter plot of the Observed points from Example (I); x ranges from 0 to 6, y from 0 to 14.)
Example (III)

R^2 = 1 - \frac{SSE}{TSS} = 1 - \frac{0.975}{47.369} = 0.98
Logistic Regression

• Regression used to fit a curve to data in which the dependent variable is binary, or dichotomous.
• Typical application: medicine
  – We might want to predict response to treatment, where we might code survivors as 1 and those who don't survive as 0.
Example

(Figure: scatter plot of NewOut against SurvRate with a fitted line.)

Observations: for each value of SurvRate, the number of dots is the number of patients with that value of NewOut.

Regression: standard linear regression.

Problem: extending the regression line a few units left or right along the X axis produces predicted probabilities that fall outside of [0, 1].
A Better Solution

Regression curve: a sigmoid function!
(bounded by the asymptotes y = 0 and y = 1)
Odds
• Given some event with probability p of being 1, the odds of that event are given by:

  odds = p / (1 − p)

• Consider the following data:

                     Delinquent
                   Yes     No    Total
  Testosterone
    Normal         402    3614    4016
    High           101     345     446
  Total            503    3959    4462

• The odds of being delinquent if you are in the Normal group are:

  p_delinquent / (1 − p_delinquent) = (402/4016) / (1 − 402/4016) = 0.1001 / 0.8999 = 0.111
Odds Ratio
• The odds of not being delinquent in the Normal group are the reciprocal of this:
  – 0.8999 / 0.1001 = 8.99
• Now, for the High testosterone group:
  – odds(delinquent) = 101/345 = 0.293
  – odds(not delinquent) = 345/101 = 3.416
• When we go from Normal to High, the odds of being delinquent nearly triple:
  – Odds ratio: 0.293 / 0.111 = 2.64
  – The odds of delinquency are 2.64 times higher with high testosterone levels
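The same arithmetic as a minimal Python sketch, using the counts from the table above (variable names are mine):

```python
def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)

p_normal = 402 / 4016           # P(delinquent | normal testosterone)
p_high = 101 / 446              # P(delinquent | high testosterone)

odds_normal = odds(p_normal)    # ~0.111
odds_high = odds(p_high)        # ~0.293 (equals 101/345)
print(odds_high / odds_normal)  # odds ratio: ~2.64
```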
Logit Transform

• The logit is the natural log of the odds:

  logit(p) = ln(odds) = ln(p / (1 − p))


Logistic Regression

• In logistic regression, we seek a model:

  logit(p) = \beta_0 + \beta_1 X

• That is, the log odds (logit) is assumed to be linearly related to the independent variable X.
• So, now we can focus on solving an ordinary (linear) regression!
Recovering Probabilities

• Solving logit(p) = β₀ + β₁X for p:

  p = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}

which gives p as a sigmoid function!
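The logit and its inverse, the sigmoid, in a minimal Python sketch:

```python
import math

def logit(p):
    """Log odds of probability p."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of the logit: recovers p from the log odds z."""
    return 1 / (1 + math.exp(-z))

assert abs(sigmoid(logit(0.31)) - 0.31) < 1e-12  # round trip
```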


Logistic Response Function

• When the response variable is binary, the shape of the response function is often sigmoidal.
Interpretation of β₁
• Let:
  – odds₁ = odds for value X (p/(1−p))
  – odds₂ = odds for value X + 1 unit
• Then:

  \frac{odds_2}{odds_1} = \frac{e^{\beta_0 + \beta_1 (X+1)}}{e^{\beta_0 + \beta_1 X}} = \frac{e^{\beta_0 + \beta_1 X} \, e^{\beta_1}}{e^{\beta_0 + \beta_1 X}} = e^{\beta_1}

• Hence, the exponent of the slope describes the proportionate rate at which the predicted odds ratio changes with each successive unit of X.
Sample Calculations
• Suppose a cancer study yields:
  – log odds = −2.6837 + 0.0812 · SurvRate
• Consider a patient with SurvRate = 40:
  – log odds = −2.6837 + 0.0812(40) = 0.5643
  – odds = e^0.5643 = 1.758
  – the patient is 1.758 times more likely to be improved than not
• Consider another patient with SurvRate = 41:
  – log odds = −2.6837 + 0.0812(41) = 0.6455
  – odds = e^0.6455 = 1.907
  – this patient's odds are 1.907/1.758 = 1.0846 times (or 8.5%) better than those of the previous patient
• Using probabilities:
  – p₄₀ = 0.6374 and p₄₁ = 0.6560
  – The improvement looks different when expressed in odds than in probabilities
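These calculations are easy to verify in Python. A sketch using the study's reported coefficients (the helper name is mine):

```python
import math

def improvement_odds(surv_rate):
    """Odds of improvement from the fitted log-odds model."""
    log_odds = -2.6837 + 0.0812 * surv_rate
    return math.exp(log_odds)

o40, o41 = improvement_odds(40), improvement_odds(41)
print(o40, o41)         # ~1.758 and ~1.907
print(o41 / o40)        # ~1.0846, i.e. exp(0.0812)
print(o40 / (1 + o40))  # p40 ~0.6374
print(o41 / (1 + o41))  # p41 ~0.6560
```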
Example 1 (I)

• A systems analyst studied the effect of computer programming experience on the ability to complete a task within a specified time.
• Twenty-five persons were selected for the study, with varying amounts of computer experience (in months).
• Results are coded in binary fashion: Y = 1 if the task was completed successfully; Y = 0 otherwise.

(Figure: scatter of outcomes against experience with a loess fit; loess is a form of local regression.)
Example 1 (II)

• Results from a standard package give:
  – β₀ = −3.0597 and β₁ = 0.1615
• Estimated logistic regression function:

  \hat{p} = \frac{1}{1 + e^{3.0597 - 0.1615 X}}

• For example, the fitted value for X = 14 is:

  \hat{p} = \frac{1}{1 + e^{3.0597 - 0.1615(14)}} = 0.31

(Estimated probability that a person with 14 months of experience will successfully complete the task)
Example 1 (III)
• We know that the probability of success increases sharply with experience:
  – Odds ratio: exp(β₁) = e^0.1615 = 1.175
  – Odds increase by 17.5% with each additional month of experience
• A unit increase of one month is quite small, and we might want to know the change in odds for a longer difference in time:
  – For c units of X: exp(cβ₁)
Example 1 (IV)

• Suppose we want to compare individuals with relatively little experience to those with extensive experience, say 10 months versus 25 months (c = 15):
  – Odds ratio: e^(15 × 0.1615) = 11.3
  – The odds of completing the task increase 11-fold!
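A short Python check of the arithmetic in Example 1, using the reported coefficients:

```python
import math

b0, b1 = -3.0597, 0.1615

# Fitted probability of completing the task at X = 14 months of experience
p14 = 1 / (1 + math.exp(-(b0 + b1 * 14)))
print(p14)                # ~0.31

print(math.exp(b1))       # odds ratio per month: ~1.175
print(math.exp(15 * b1))  # odds ratio for c = 15 months: ~11.3
```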
Example 2 (I)

• In a study of the effectiveness of coupons offering a price reduction, 1,000 homes were selected and coupons were mailed.
• Coupon price reductions: 5, 10, 15, 20, and 30 dollars.
• 200 homes were assigned at random to each coupon value.
• X: amount of price reduction
• Y: binary variable indicating whether or not the coupon was redeemed
Example 2 (II)

• Fitted response function:
  – β₀ = −2.04 and β₁ = 0.097
• Odds ratio: exp(β₁) = e^0.097 = 1.102
• The odds of a coupon being redeemed are estimated to increase by 10.2% with each $1 increase in coupon value (i.e., $1 in price reduction).
Putting it to Work

• For each value of X, you may not have a probability but rather a number of <x,y> pairs, from which you can extract frequencies and hence probabilities (a minimal aggregation sketch follows this list):
  – Raw data: <12,0>, <12,1>, <14,0>, <12,1>, <14,1>, <14,1>, <12,0>, <12,0>
  – Probability data (2nd entry is p(y=1), 3rd entry is the number of occurrences in the raw data): <12, 0.4, 5>, <14, 0.67, 3>
  – Odds ratio data…
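A minimal Python sketch of this aggregation, using the raw pairs from the slide:

```python
from collections import defaultdict

# Raw <x, y> pairs
raw = [(12, 0), (12, 1), (14, 0), (12, 1), (14, 1), (14, 1), (12, 0), (12, 0)]

counts = defaultdict(lambda: [0, 0])  # x -> [number of y=1, total occurrences]
for x, y in raw:
    counts[x][0] += y
    counts[x][1] += 1

# x -> (p(y=1), number of occurrences)
prob_data = {x: (ones / total, total) for x, (ones, total) in counts.items()}
print(prob_data)  # {12: (0.4, 5), 14: (0.667, 3)}
```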
Coronary Heart Disease (I)

             Coronary Heart Disease
Age Group        No    Yes    Total
1 (20-29)         9      1       10
2 (30-34)        13      2       15
3 (35-39)         9      3       12
4 (40-44)        10      5       15
5 (45-49)         7      6       13
6 (50-54)         3      5        8
7 (55-59)         4     13       17
8 (60-69)         2      8       10
Total            57     43      100
Coronary Heart Disease (II)

Age Group  p(CHD=1)  odds  log odds  #occ
1 0.1000 0.1111 -2.1972 10
2 0.1333 0.1538 -1.8718 15
3 0.2500 0.3333 -1.0986 12
4 0.3333 0.5000 -0.6931 15
5 0.4615 0.8571 -0.1542 13
6 0.6250 1.6667 0.5108 8
7 0.7647 3.2500 1.1787 17
8 0.8000 4.0000 1.3863 10
Coronary Heart Disease (III)

X (AG)  Y (log odds)  X^2  XY  #occ
1 -2.1972 1.0000 -2.1972 10
2 -1.8718 4.0000 -3.7436 15
3 -1.0986 9.0000 -3.2958 12
4 -0.6931 16.0000 -2.7726 15
5 -0.1542 25.0000 -0.7708 13
6 0.5108 36.0000 3.0650 8
7 1.1787 49.0000 8.2506 17
8 1.3863 64.0000 11.0904 10
Σ (weighted)  448  -37.6471  2504.0000  106.3981  100

Note: the sums reflect the number of occurrences
(e.g., Σ X = X₁·#occ(X₁) + … + X₈·#occ(X₈))
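Since the sums are already occurrence-weighted, the closed-form formulas from the Analytical Solution slide apply directly with n = 100. A Python sketch that reproduces the coefficients reported on the next slide:

```python
# Occurrence-weighted sums from the table (n counts all 100 patients)
n = 100
sum_x, sum_y = 448, -37.6471
sum_xx, sum_xy = 2504.0, 106.3981

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
b0 = sum_y / n - b1 * (sum_x / n)
print(b0, b1)  # ~-2.856 and ~0.5535
```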
Coronary Heart Disease (IV)

• Results from regression:
  – β₀ = −2.856 and β₁ = 0.5535

Age Group  p(CHD=1)  est. p
1 0.1000 0.0909
2 0.1333 0.1482
3 0.2500 0.2323
4 0.3333 0.3448
5 0.4615 0.4778
6 0.6250 0.6142
7 0.7647 0.7346
8 0.8000 0.8280

SSE = 0.0028
TSS = 0.5265
R² = 0.9946
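A quick Python check of these fit statistics, using the observed proportions from the table and the inverse logit (a sketch; values copied from the slides):

```python
import math

b0, b1 = -2.856, 0.5535
obs = [0.1000, 0.1333, 0.2500, 0.3333, 0.4615, 0.6250, 0.7647, 0.8000]

# Estimated p for age groups 1..8 via the inverse logit (sigmoid)
est = [1 / (1 + math.exp(-(b0 + b1 * ag))) for ag in range(1, 9)]

sse = sum((o - e) ** 2 for o, e in zip(obs, est))
mean = sum(obs) / len(obs)
tss = sum((o - mean) ** 2 for o in obs)
print(sse, tss, 1 - sse / tss)  # ~0.0028, ~0.5265, R^2 ~0.9946
```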
Summary

• Regression is a powerful data mining technique:
  – It provides prediction
  – It offers insight into the relative power of each variable
• We have focused on the case of a single independent variable
  – What about the general case?
