L11 LinearRegression

The document discusses the Pfizer COVID-19 vaccine trial, detailing the enrollment of 43,548 participants and the calculation of vaccine efficacy using risk ratios and p-values. It also introduces linear regression as a method for modeling relationships between variables, specifically focusing on predicting a daughter's height based on her mother's height. The document outlines the least squares method for fitting a linear model and the probabilistic approach to linear regression using maximum likelihood estimation.


Linear Regression

Foundations of Data Analysis

March 3, 2022
Pfizer COVID-19 Vaccine Trial

Pfizer enrolled 43,548 participants; half received the vaccine, half received a placebo.¹

Of the 18,508 completing vaccination, 9 got COVID-19.

Of the 18,435 completing placebo, 169 got COVID-19.

Was the vaccine effective?

¹ https://www.nejm.org/doi/full/10.1056/NEJMoa2034577
Pfizer COVID-19 Vaccine Trial

Risk ratio:

$$\mathrm{RR} = \frac{\text{risk of COVID-19 with vaccine}}{\text{risk of COVID-19 with placebo}} = \frac{9/18508}{169/18435} \approx 0.053$$

Vaccine Efficacy $= 1 - \mathrm{RR} \approx 0.947$
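The risk-ratio arithmetic above can be checked with a short script (a minimal sketch; the variable names are ours, not from the trial report):

```python
# Compute the risk ratio and vaccine efficacy from the trial counts.
cases_vaccine, n_vaccine = 9, 18508
cases_placebo, n_placebo = 169, 18435

risk_vaccine = cases_vaccine / n_vaccine
risk_placebo = cases_placebo / n_placebo

rr = risk_vaccine / risk_placebo   # risk ratio
efficacy = 1 - rr                  # vaccine efficacy

print(f"RR = {rr:.3f}, efficacy = {efficacy:.3f}")  # RR = 0.053, efficacy = 0.947
```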


Pfizer COVID-19 Vaccine Trial
Contingency table:

             Vaccine   Placebo
  Positive         9       169
  Negative    18,499    18,266
Using the hypergeometric probability, p(k), the p-value is:

$$P(X \le 9) = \sum_{k=0}^{9} p(k) < 2 \times 10^{-16}$$

This is the probability of a result this extreme, or more so, arising by random chance if the vaccine were not effective.
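The hypergeometric tail can be computed exactly with the standard library (a sketch under the null hypothesis that cases fall into the two groups at random; the variable names are ours):

```python
from math import comb

# Totals from the contingency table
M = 18508 + 18435   # participants completing the trial
K = 9 + 169         # total COVID-19 cases
n = 18508           # size of the vaccine group

# P(X <= 9): probability that at most 9 of the K cases land in the
# vaccine group purely by chance (hypergeometric tail probability).
# Summing exact integer numerators first, then dividing once, avoids
# intermediate rounding.
denom = comb(M, n)
p_value = sum(comb(K, k) * comb(M - K, n - k) for k in range(10)) / denom

print(p_value < 2e-16)   # True, matching the slide's bound
```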
[Diagram: algebra, geometry, and statistics — the three viewpoints on linear regression]
Is there a relationship between the heights of mothers
and their daughters?
If you know a mother’s height, can you predict her
daughter’s height with any accuracy?
Linear regression is a tool for answering these types of
questions.
It models the relationship as a straight line.
Regression Setup

When we are given real-valued data in pairs:

$$(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n) \in \mathbb{R}^2$$

Example:
$x_i$ is the height of the $i$th mother
$y_i$ is the height of the $i$th mother's daughter
Linear Regression
Model the data as a line:

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

$\alpha$ : intercept
$\beta$ : slope
$\varepsilon_i$ : error

[Figure: scatter plot of the points $(x_i, y_i)$ with a fitted line; the vertical offset of each point from the line is $\varepsilon_i$]
Geometry: Least Squares
We want to fit a line as close to the data as possible, which means we want to minimize the errors, $\varepsilon_i$.
Geometry: Least Squares

Taking the line equation: $y_i = \alpha + \beta x_i + \varepsilon_i$

Rearrange to get: $\varepsilon_i = y_i - \alpha - \beta x_i$

We want to minimize the sum-of-squared errors (SSE):

$$\mathrm{SSE}(\alpha, \beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$$
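The SSE objective translates directly into a function (a minimal sketch; the toy data are made up for illustration):

```python
def sse(alpha, beta, xs, ys):
    # Sum of squared errors for the line y = alpha + beta * x
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))

# Toy data lying exactly on y = 1 + 2x: SSE is zero at (alpha, beta) = (1, 2)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(sse(1.0, 2.0, xs, ys))   # 0.0
print(sse(0.0, 2.0, xs, ys))   # 4.0  (each of the four residuals is 1)
```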
Least Squares: Step 1

Center the data by removing the mean:

$$\tilde{y}_i = y_i - \bar{y}, \qquad \tilde{x}_i = x_i - \bar{x}$$

Note: $\sum_{i=1}^{n} \tilde{y}_i = 0$ and $\sum_{i=1}^{n} \tilde{x}_i = 0$

We'll first get a solution $\tilde{y} = \alpha + \beta\tilde{x}$, then shift it back to the original (uncentered) data at the end.
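Centering in code, checking that both centered sums vanish (a sketch; the heights are illustrative values, not data from the slides):

```python
xs = [60.0, 62.0, 64.0, 66.0, 68.0]   # e.g. mothers' heights (inches)
ys = [61.0, 63.0, 63.0, 67.0, 66.0]   # daughters' heights

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

x_tilde = [x - x_bar for x in xs]     # centered x
y_tilde = [y - y_bar for y in ys]     # centered y

# Both centered sums are (numerically) zero
print(abs(sum(x_tilde)) < 1e-12, abs(sum(y_tilde)) < 1e-12)
```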
Least Squares: Step 2
Take the derivative of $\mathrm{SSE}(\alpha, \beta)$ with respect to $\alpha$ and set it to zero:

$$0 = \frac{\partial}{\partial \alpha} \mathrm{SSE}(\alpha, \beta) = \frac{\partial}{\partial \alpha} \sum_{i=1}^{n} (\tilde{y}_i - \alpha - \beta\tilde{x}_i)^2$$
$$= -2 \sum_{i=1}^{n} (\tilde{y}_i - \alpha - \beta\tilde{x}_i)$$
$$= -2 \sum_{i=1}^{n} \tilde{y}_i + 2n\alpha + 2\beta \sum_{i=1}^{n} \tilde{x}_i$$

Using $\sum \tilde{y}_i = \sum \tilde{x}_i = 0$, we get
$$\hat{\alpha} = 0$$
Least Squares: Step 3

With $\alpha = 0$, we are left with

$$\tilde{y}_i = \beta\tilde{x}_i + \varepsilon_i$$

Or, in vector notation:

$$\begin{pmatrix} \tilde{y}_1 \\ \tilde{y}_2 \\ \vdots \\ \tilde{y}_n \end{pmatrix} = \beta \begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \\ \vdots \\ \tilde{x}_n \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
Least Squares: Step 3

In vector notation, $\tilde{y} = \beta\tilde{x} + \varepsilon$, so minimizing $\mathrm{SSE}(\alpha, \beta) = \sum \varepsilon_i^2 = \|\varepsilon\|^2$ is projection!

The solution is
$$\hat{\beta} = \frac{\langle \tilde{x}, \tilde{y} \rangle}{\|\tilde{x}\|^2}$$

[Figure: the vector $\tilde{y}$ projected orthogonally onto the line spanned by $\tilde{x}$; the projection is $\hat{\beta}\tilde{x}$]
Shifting Back to Uncentered Data

So far, we have:
$$\tilde{y}_i = \hat{\beta}\tilde{x}_i + \varepsilon_i$$
Expanding out $\tilde{x}_i$ and $\tilde{y}_i$ gives

$$(y_i - \bar{y}) = \hat{\beta}(x_i - \bar{x}) + \varepsilon_i$$

Rearranging gives

$$y_i = (\bar{y} - \hat{\beta}\bar{x}) + \hat{\beta}x_i + \varepsilon_i$$

So, for the uncentered data, $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$
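Putting the three steps together — center, compute the slope as the projection coefficient, then shift back for the intercept (a sketch; `fit_line` is our helper name, and the data are illustrative):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = alpha + beta * x via centering + projection."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    x_t = [x - x_bar for x in xs]   # centered x
    y_t = [y - y_bar for y in ys]   # centered y
    # beta_hat = <x_tilde, y_tilde> / ||x_tilde||^2
    beta = sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)
    # Shift back to the uncentered data: alpha_hat = y_bar - beta_hat * x_bar
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Data on an exact line y = 2 + 0.5 x is recovered exactly
alpha, beta = fit_line([1.0, 2.0, 3.0, 4.0], [2.5, 3.0, 3.5, 4.0])
print(alpha, beta)   # 2.0 0.5
```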


Probability: Maximum Likelihood
So far, we have only used geometry, but if our data is random, shouldn't we be talking about probability?

To make linear regression probabilistic, we model the errors as Gaussian:

$$\varepsilon_i \sim N(0, \sigma^2)$$

The likelihood is

$$L(\alpha, \beta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{\varepsilon_i^2}{2\sigma^2} \right)$$
Probability: Maximum Likelihood

The log-likelihood is then

$$\log L(\alpha, \beta) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \varepsilon_i^2 + \text{const.}$$

Maximizing this is equivalent to minimizing SSE!

$$\max_{\alpha,\beta} \log L = \min_{\alpha,\beta} \sum \varepsilon_i^2 = \min_{\alpha,\beta} \mathrm{SSE}$$
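A numeric check of the equivalence: because the Gaussian log-likelihood is a constant minus a positive multiple of SSE, the grid point maximizing log L is exactly the one minimizing SSE (a sketch with made-up noisy data; σ is fixed at 1, which does not affect the argmax):

```python
import math

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.9, 3.1, 4.9, 7.2, 8.9]   # roughly y = 1 + 2x with small noise

def sse(a, b):
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

def log_lik(a, b, sigma=1.0):
    # Gaussian log-likelihood: constant term minus SSE / (2 sigma^2)
    n = len(xs)
    return -n / 2 * math.log(2 * math.pi * sigma ** 2) - sse(a, b) / (2 * sigma ** 2)

# Search a coarse grid of (alpha, beta) pairs
grid = [(a / 10, b / 10) for a in range(-20, 21) for b in range(0, 41)]
best_ml = max(grid, key=lambda ab: log_lik(*ab))   # maximize log-likelihood
best_ls = min(grid, key=lambda ab: sse(*ab))       # minimize SSE
print(best_ml == best_ls)   # True: same optimizer
```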
