Categorical Slide2024

Categorical

Uploaded by

mereninnas

Simple Linear Regression

• Simple linear regression analysis involves only two variables, say X and Y.

• The data for such relationships are provided as pairs of observations on X and Y: (X1, Y1), (X2, Y2), …, (Xn, Yn).

• Suppose that we are interested in two quantitative variables, X and Y, of a population, where one of them influences the other.

• The mathematical relationship between X and Y is described by:

Y = a + βX

• where a is a constant and β is a non-zero real number.

• The two quantities a and β are called regression coefficients.
The Method of Least Squares
► The values ‘a’ and ‘b’ in the equation Ŷ = a + bX are constants, i.e., their values are fixed.

► The constant ‘a’ is the value of Ŷ when X = 0. It is also called the y-intercept.

► The value of ‘b’ is the slope of the regression line and measures the change in Y for a unit change in X.

► This slope (b) is frequently termed the regression coefficient of Y on X.

► If we know the values of ‘a’ and ‘b’, we can easily compute the value of Ŷ for any given value of X.
OLS estimates of coefficients
Based on least squares estimation, the coefficients of the estimated regression line ŷ = a + bx are given by:

$$b = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sum_{i=1}^{n}x_i y_i - \left(\sum_{i=1}^{n}x_i\right)\left(\sum_{i=1}^{n}y_i\right)/n}{\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2/n}$$

$$a = \bar{y} - b\bar{x}$$
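As a brief sketch of where these formulas come from (a standard derivation that the slides do not spell out): the least-squares line minimizes the sum of squared vertical deviations, written here as S(a, b), a symbol introduced only for this illustration. Setting both partial derivatives to zero gives the two normal equations, and solving them for a and b yields the expressions above.

```latex
\begin{aligned}
S(a,b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial S}{\partial a} = 0 \;&\Rightarrow\; \sum_{i=1}^{n} y_i = n a + b \sum_{i=1}^{n} x_i \\
\frac{\partial S}{\partial b} = 0 \;&\Rightarrow\; \sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2
\end{aligned}
```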

Advanced Biostatistics by: Yasin

SLR-example
Heights of 10 fathers (X) together with their oldest sons (Y) are given below (in inches). Find the regression of Y on X.

Father (X)   Oldest son (Y)   Product (XY)   X²
63           65               4095           3969
64           67               4288           4096
70           69               4830           4900
72           70               5040           5184
65           64               4160           4225
67           68               4556           4489
68           71               4828           4624
66           63               4158           4356
70           70               4900           4900
71           72               5112           5041

Total: ΣX = 676, ΣY = 679, ΣXY = 45967, ΣX² = 45784

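SLR-example

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sum (X-\bar{X})^2}$$

$$b = \frac{10(45967) - (676)(679)}{10(45784) - (676)^2} = \frac{459670 - 459004}{457840 - 456976} = \frac{666}{864} = 0.77$$

$$a = \bar{Y} - b\bar{X} = \frac{679}{10} - 0.77\left(\frac{676}{10}\right) = 67.9 - 52.05 = 15.85$$

Therefore, Ŷ = 15.85 + 0.77 X

The regression coefficient of Y on X (i.e., 0.77) tells us the change in Y due to a unit change in X: on average, the oldest son's height increases by 0.77 inches for each additional inch of the father's height.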
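The hand calculation can be checked with a short Python sketch. The function name `fit_line` and the variable names are illustrative, not part of the slides; the data are the ten father and son heights from the table.

```python
# Heights (inches) of the 10 fathers (X) and their oldest sons (Y) from the table.
fathers = [63, 64, 70, 72, 65, 67, 68, 66, 70, 71]
sons = [65, 67, 69, 70, 64, 68, 71, 63, 70, 72]


def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x (hypothetical helper)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope from the computational formula, intercept from a = mean(y) - b*mean(x).
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = fit_line(fathers, sons)
print(f"slope b = {b:.4f}, intercept a = {a:.4f}")
```

Carrying the unrounded slope (666/864 ≈ 0.7708) through gives a ≈ 15.79 rather than 15.85; the value on the slide arises from rounding b to 0.77 before computing a.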
SLR-example
Estimate the height of the oldest son for a father’s height of 70 inches.

Ŷ = 15.85 + 0.77 (70) = 69.75 inches

NB: 1) n is the number of pairs of X and Y scores used in determining the regression line. In the above example, n = 10.

2) Be careful to distinguish between (ΣX)² and ΣX².

Assumptions
The assumptions made when using this method are:

♣ The relationship between the outcome and the explanatory variable is linear or at least approximately linear;

♣ At each value of the explanatory variable the outcomes follow a normal distribution;

♣ The variance of the outcome is constant for all values of the explanatory variable.
Assumptions of linear regression
[Figure: three panels. Assumption 1: linear relationship. Assumption 2: Y normally distributed at each value of x. Assumption 3: same variance at each value of x.]
Checking Assumptions:
Assumption 1: linear relationship
Plot y against x to check for linearity

Checking Assumptions:
Assumption 2: Normality
[Figures: histogram of residuals; normal P-P plot of standardized residuals (dependent variable: BMI), expected cumulative probability against observed cumulative probability.]
Checking Assumptions:
Assumption 3: Spread of y values constant over range of x values
(plot of residuals against X)
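A minimal sketch of this residual check, assuming the fathers' and sons' heights from the SLR example as data (the slides show the plot only as a figure). It computes the residuals e = Y − Ŷ and lists them in order of X; a roughly constant spread across the range of X is what Assumption 3 asks for. All variable names are illustrative.

```python
# Heights (inches) from the SLR example: fathers (X) and oldest sons (Y).
fathers = [63, 64, 70, 72, 65, 67, 68, 66, 70, 71]
sons = [65, 67, 69, 70, 64, 68, 71, 63, 70, 72]

n = len(fathers)
mean_x = sum(fathers) / n
mean_y = sum(sons) / n

# Least-squares slope and intercept from the deviation formulas.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fathers, sons))
s_xx = sum((x - mean_x) ** 2 for x in fathers)
b = s_xy / s_xx
a = mean_y - b * mean_x

# Residual = observed Y minus fitted Y.
residuals = [y - (a + b * x) for x, y in zip(fathers, sons)]

# List residuals in order of X, which is what a residual-vs-X plot displays.
for x, e in sorted(zip(fathers, residuals)):
    print(f"X = {x}: residual = {e:+.2f}")
```

With only ten points the check is informal; the same pairs could be passed to any plotting tool to draw the residual plot itself.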

Exercise
Suppose we have the following dataset with the weight and height of seven individuals.

Let weight be the predictor variable and height be the response variable. Then
a. Fit the regression equation
b. Interpret the result
c. Predict the weight of an individual whose height is 82 inches
Layout

• Confidence interval estimation of β0 and β1

• Hypothesis testing

• Correlation coefficient

• Coefficient of determination
Interval estimation of the regression parameters
Example

Construct 95% confidence intervals for β1 and β0.

SSE = Syy − β̂1 Sxy = 32.1 − 1.15 × 23 = 5.65

The estimate for σε² is SSE/(n − 2) = 5.65/8 ≈ 0.71, so the estimated standard deviation is σ̂ε = √0.71 ≈ 0.84.
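A hedged sketch of the interval for the slope, using only the numbers in this example. Several things here are assumptions rather than content from the slides: n = 10 is implied by n − 2 = 8; Sxx is not stated, so it is inferred from β̂1 = Sxy/Sxx as 23/1.15 = 20; the interval uses the standard form β̂1 ± t·σ̂ε/√Sxx; and the critical value 2.306 is the usual t-table entry for 8 degrees of freedom at the two-sided 95% level. Note that SSE/(n − 2) ≈ 0.706, whose square root is about 0.84.

```python
import math

# Values given in the example.
s_yy = 32.1
s_xy = 23.0
beta1_hat = 1.15
n = 10  # assumption: implied by n - 2 = 8

sse = s_yy - beta1_hat * s_xy        # residual sum of squares
sigma2_hat = sse / (n - 2)           # estimate of the error variance
sigma_hat = math.sqrt(sigma2_hat)    # estimate of the error standard deviation

# Assumption: Sxx is not stated; it is inferred from beta1_hat = Sxy / Sxx.
s_xx = s_xy / beta1_hat
se_beta1 = sigma_hat / math.sqrt(s_xx)

# Two-sided 95% critical value of t with 8 degrees of freedom (t-table value).
t_crit = 2.306
lower = beta1_hat - t_crit * se_beta1
upper = beta1_hat + t_crit * se_beta1

print(f"SSE = {sse:.2f}, variance = {sigma2_hat:.3f}, sd = {sigma_hat:.2f}")
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the interval is roughly (0.72, 1.58). The interval for β0 also needs the mean of X, which the example does not give, so it is not computed here.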
Hypothesis testing
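The content of this slide did not survive extraction. As a sketch only, the usual t-test for the slope in simple linear regression is shown here; the notation SE(β̂1) and Sxx is an assumption and may differ from what the original slide used.

```latex
\begin{aligned}
&H_0:\ \beta_1 = 0 \quad \text{versus} \quad H_a:\ \beta_1 \neq 0 \\
&t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}, \qquad
SE(\hat{\beta}_1) = \frac{\hat{\sigma}_\varepsilon}{\sqrt{S_{xx}}} \\
&\text{Reject } H_0 \text{ at level } \alpha \text{ if } |t| > t_{\alpha/2,\; n-2}
\end{aligned}
```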
Correlation Analysis
• Correlation is the method of analysis to use when studying the possible association between two continuous variables

• The standard method (Pearson correlation) leads to a quantity called r that can take on any value from -1 to +1

• The correlation coefficient r measures the degree of 'straight-line' association between the values of two variables
Correlation Analysis (cont’d)
• The correlation between two variables is
– positive if higher values of one variable are associated with higher values of the other, and
– negative if one variable tends to be lower as the other gets higher
• A correlation of around zero indicates that there is no linear relation between the values of the two variables
Fig.1: Systolic Blood Pressure against Age
If we have two variables X and Y, the correlation between them, denoted by r(X, Y), is given by:

$$r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}} = \frac{\sum XY - \left(\sum X\right)\left(\sum Y\right)/n}{\sqrt{\left[\sum X^2 - \left(\sum X\right)^2/n\right]\left[\sum Y^2 - \left(\sum Y\right)^2/n\right]}}$$

where xi and yi are the values of X and Y for the ith individual.

The equation is clearly symmetric: it does not matter which variable is X and which is Y.
Pearson’s r Correlation
• As a rule of thumb, the following guidelines on strength of relationship are often useful (though many experts would disagree somewhat on the choice of boundaries).

Correlation value     Interpretation
±0.70 or higher       Strong relationship
±0.40 to ±0.69        Moderate relationship
±0.20 to ±0.39        Weak relationship
±0.01 to ±0.19        No or negligible relationship
Coefficient of determination (R²)
• The coefficient of determination (R²) measures how well a statistical model predicts an outcome.
• The outcome is represented by the model’s dependent variable.
• The lowest possible value of R² is 0 and the highest possible value is 1.

• It gives the proportion of the variation in the dependent variable that is explained by the independent variable. In simple linear regression, R² equals the square of the correlation coefficient r.
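To make this concrete, here is a small sketch that computes R² for the fathers' and sons' heights from the earlier SLR example. It assumes the common definition R² = 1 − SSE/SST, which the slides do not state explicitly, and compares the result with the squared Pearson correlation. Variable names are illustrative.

```python
import math

# Heights (inches) from the SLR example: fathers (X) and oldest sons (Y).
fathers = [63, 64, 70, 72, 65, 67, 68, 66, 70, 71]
sons = [65, 67, 69, 70, 64, 68, 71, 63, 70, 72]

n = len(fathers)
mean_x = sum(fathers) / n
mean_y = sum(sons) / n

s_xx = sum((x - mean_x) ** 2 for x in fathers)
s_yy = sum((y - mean_y) ** 2 for y in sons)   # total sum of squares (SST)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fathers, sons))

# Fitted line and its residual sum of squares (SSE).
b = s_xy / s_xx
a = mean_y - b * mean_x
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(fathers, sons))

r_squared = 1 - sse / s_yy            # assumed definition: 1 - SSE/SST
r = s_xy / math.sqrt(s_xx * s_yy)     # Pearson correlation

print(f"R^2 = {r_squared:.3f}, r^2 = {r ** 2:.3f}")
```

For these data R² comes out at about 0.60, meaning roughly 60% of the variation in sons' heights is accounted for by fathers' heights.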
Example: The following data show the weights of a sample of 12 fathers and their oldest sons. Compute the correlation coefficient between the two weight measurements.

Wt of father (X)   Wt of son (Y)   X²     Y²     XY
65                 68              4225   4624   4420
63                 66              3969   4356   4158
67                 68              4489   4624   4556
64                 65              4096   4225   4160
68                 69              4624   4761   4692
62                 66              3844   4356   4092
70                 68              4900   4624   4760
66                 65              4356   4225   4290
68                 71              4624   5041   4828
67                 67              4489   4489   4489
69                 68              4761   4624   4692
71                 70              5041   4900   4970
Scatter Plot
[Figure: scatter plot of sons' weight (Y) against fathers' weight (X)]
Calculating r
The correlation coefficient for the data on fathers’ and sons’ weights is computed as follows.
Basic values from the data:

$$\sum X = 800,\quad \sum X^2 = 53{,}418,\quad \sum Y = 811,\quad \sum Y^2 = 54{,}849,\quad \sum XY = 54{,}107$$

$$\sum (x-\bar{x})(y-\bar{y}) = \sum xy - \left(\sum x\right)\left(\sum y\right)/n = 54{,}107 - (800 \times 811)/12 = 40.33$$

$$\sum (x-\bar{x})^2 = \sum x^2 - \left(\sum x\right)^2/n = 53{,}418 - (800)^2/12 = 84.67$$

$$\sum (y-\bar{y})^2 = \sum y^2 - \left(\sum y\right)^2/n = 54{,}849 - (811)^2/12 = 38.92$$

$$r = \frac{40.33}{\sqrt{(84.67)(38.92)}} = 0.703$$
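The same calculation can be reproduced in a few lines of Python. This is a sketch using the twelve weight pairs from the example; the function name `pearson_r` is illustrative.

```python
import math

# Weights of the 12 fathers (X) and their oldest sons (Y) from the example table.
father_wt = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
son_wt = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]


def pearson_r(x, y):
    """Pearson correlation via the computational (sum-based) formula."""
    n = len(x)
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
    s_yy = sum(yi * yi for yi in y) - sum(y) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(father_wt, son_wt)
print(f"r = {r:.3f}")
```

Swapping the two arguments gives the same value, which illustrates the symmetry of r in X and Y.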
Exercise
• Given the following table on the relationship between the expected annual return on a corporate bond (Y) and its potential risk (X), both in percentage terms:

a. Write the fitted sample regression line.

b. Interpret the results obtained from the linear relationship.
c. Compute the values of r and r², then discuss what they imply.
d. Compute the variance and standard error of β0 and β1.
e. Construct a 95% confidence interval.
f. Test the hypothesis H0: β = 0 against the alternative Ha: β ≠ 0 at the 5% significance level, using both the t-test and the confidence interval approach.
