
Regression Analysis: STAT 367

By
Pels

Department of Statistics and Actuarial Science


KNUST

LECTURE SLIDES

January 24, 2024

Pels (KNUST) Regression Analysis: STAT 367 January 24, 2024 1 / 31


COURSE OUTLINE
Week 1: Basic Concepts of Regression and Correlation Analysis
1 Course introduction, provision of course outline and recommended
textbooks
2 Types of variables in correlation and regression analysis
3 Properties of Pearson’s Correlation Coefficient
4 Quiz
5 Test of hypothesis about correlation coefficient.
6 Scatter diagrams.
Week 2:
1 Simple Linear Regression
2 Estimation of the parameters
3 Group Assignment
Week 3
1 Decomposition of the Sum of Squares
2 Analysis of Variance
3 Coefficient of determination
4 Inferential Analysis on regression parameters
5 Confidence intervals
COURSE OUTLINE
Week 4
1 Practicals using Excel and SPSS
Week 5:
1 Multiple Linear Regression
2 Statistical Inference
Week 6
1 Multiple Linear Regression-Assessing Collinearity
2 Model Building
Week 7: Midsem Exams
Week 8: Regression on Dummy variables
Week 9: Residual Analysis and Lack of fit test
Week 10: Model Diagnostics
Week 11
1 Practicals using R and STATA
Week 11: Group presentations
Week 12: Group presentations
COURSE OBJECTIVES

This course serves as an introduction to the art of statistical modelling for
phenomena that can be described by a linear function.
After this course, students should be able to apply linear regression to
model appropriate data using a statistical package of their choice, carry
out inferential procedures, and properly interpret the results.



CORRELATION

Introduction
Correlation analysis looks for a relationship between two quantitative
variables; on its own it does not allow causal relationships to be inferred.
Correlation is a statistical method used to determine whether a linear
relationship exists between variables.

Real life Examples of Correlation


Taller people tend to have larger shoe sizes and shorter people tend to
have smaller shoe sizes.
Marks obtained in a test tend to increase with the number of hours a
student spends studying.
The more time you spend jogging, the more calories you burn.
As the temperature goes up, pure water sales also go up.



Real life Examples of Correlation
People with higher levels of education tend to have access to a wider
range of job prospects.
When companies increase their advertising and marketing activities, it
can lead to an increase in sales.
Higher-income individuals tend to have better access to healthcare
services, including insurance coverage, regular checkups, and medical
treatments.



QUIZ

Give 5 Real life Examples of Correlation



CORRELATION

Two Types of Variables here!!!


Dependent = explained = regressand = endogenous = response =
outcome = uncontrolled
Independent = explanatory = regressor = exogenous = stimulus =
predictor = covariate = controlled

Note
Correlation does not imply causation.
Correlation does not distinguish between independent and dependent
variables.
The phrase "correlation does not imply causation" refers to the
inability to legitimately deduce a cause-and-effect relationship
between two events or variables solely on the basis of an observed
association or correlation between them.
What’s the difference between correlation and causation?
While causation and correlation can exist at the same time,
correlation does not imply causation. Causation explicitly applies to
cases where action A causes outcome B. On the other hand,
correlation is simply a relationship. Action A relates to Action B but
one event doesn’t necessarily cause the other event to happen.

CORRELATION

Properties of Pearson’s Correlation Coefficient


Measures the relative strength of the linear association between two
variables
Takes on the same sign as the slope estimate from the linear
regression
Its magnitude is not affected by linear transformations of y or x
(multiplying exactly one variable by a negative constant only flips the sign)
Does not distinguish between dependent and independent variables (e.g.
height and weight)
Population Parameter ρ
The population parameter ρ is estimated by r
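
The symmetry and transformation properties listed above can be checked numerically. The following is a minimal sketch, not part of the original slides: the height and weight values are hypothetical, and `pearson_r` is a helper name introduced only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, chosen only for illustration
height_cm = [150, 160, 165, 172, 180]
weight_kg = [52, 58, 63, 70, 77]

r = pearson_r(height_cm, weight_kg)

# Swapping the roles of the two variables leaves r unchanged
r_swapped = pearson_r(weight_kg, height_cm)

# Positive linear transformations (unit changes, shifts) leave r unchanged
height_in = [h / 2.54 for h in height_cm]
weight_lb = [2.2046 * w + 1.0 for w in weight_kg]
r_rescaled = pearson_r(height_in, weight_lb)

# Multiplying one variable by a negative constant flips only the sign
r_flipped = pearson_r(height_cm, [-w for w in weight_kg])

print(r, r_swapped, r_rescaled, r_flipped)
```

The first three printed values agree up to floating-point rounding, and the last has the same magnitude with the opposite sign.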



CORRELATION
Pearson’s Correlation Coefficient
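\begin{align}
r &= \frac{\operatorname{Cov}(X, Y)}{\operatorname{sd}(X)\,\operatorname{sd}(Y)} \tag{1} \\
  &= \frac{\dfrac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}}{\sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}\,\sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}}} \tag{2} \\
  &= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} \tag{3} \\
  &= \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\sum X^2 - n\bar{X}^2}\,\sqrt{\sum Y^2 - n\bar{Y}^2}} \tag{4}
\end{align}
Range: $-1 \le r \le 1$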



CORRELATION

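Cont’d
\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} \tag{5} \]
where $S_{xy} = \sum XY - n\bar{X}\bar{Y}$, $S_{xx} = \sum X^2 - n\bar{X}^2$ and $S_{yy} = \sum Y^2 - n\bar{Y}^2$.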
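
To see that the deviation form in equation (3) and the computational form in equation (4) give the same value, here is a small sketch in Python. The function names and the five data points are assumptions made for illustration; they do not come from the slides.

```python
from math import sqrt

def r_deviation_form(x, y):
    """Equation (3): sums of products of deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - x_bar) ** 2 for xi in x)) * sqrt(
        sum((yi - y_bar) ** 2 for yi in y)
    )
    return num / den

def r_computational_form(x, y):
    """Equation (4): raw sums, corrected by n times the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    s_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    s_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: S_xy = 8, S_xx = 10, S_yy = 10, so r = 0.8
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r3 = r_deviation_form(x, y)
r4 = r_computational_form(x, y)
print(round(r3, 4), round(r4, 4))
```

The computational form is convenient for hand calculation because it needs only the raw sums, which is presumably why the worked examples in these slides use it.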



CORRELATION

Pearson’s Correlation Coefficient


Unit-less
Ranges between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to +1, the stronger the positive linear relationship
When r is 0, there is no linear relationship



Scatter Plot
A Scatter Plot is a mathematical diagram to display values of two
variables for a set of data. This plot helps in investigating the relationship
between two variables

Figure: Data

Figure: Plot
CORRELATION
Scatter plots of Data with Various Correlation Coefficients

Figure: Scatter plots of data with various correlation coefficients
Interpretation of Correlation Coefficient
Strength
The greater the absolute value of the correlation coefficient, the stronger
the relationship.
The extreme values of -1 and 1 indicate a perfectly linear relationship
where a change in one variable is accompanied by a perfectly consistent
change in the other.
For these relationships, all of the data points fall on a line. In practice,
you will rarely see either type of perfect relationship in real data.
A coefficient of zero represents no linear relationship. As one variable
increases, there is no tendency in the other variable to either increase or
decrease.
When the absolute value is strictly between 0 and 1, there is a relationship,
but the points don’t all fall on a line. As r approaches -1 or 1, the strength
of the relationship increases and the data points tend to fall closer to a line.
Interpretation of Correlation Coefficient

Direction
The sign of the correlation coefficient represents the direction of the
relationship.
1 Positive coefficients indicate that when the value of one variable
increases, the value of the other variable also tends to increase.
Positive relationships produce an upward slope on a scatterplot.
2 Negative coefficients indicate that when the value of one variable
increases, the value of the other variable tends to decrease. Negative
relationships produce a downward slope on a scatterplot.



CORRELATION

Example
Weight (Kg) 67 69 85 83 74 81 97 92 114 85
SBP (mmHg) 120 125 140 160 130 180 150 140 200 130

Solution
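\[ r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\sum X^2 - n\bar{X}^2}\,\sqrt{\sum Y^2 - n\bar{Y}^2}} \tag{6} \]

With $n = 10$:
\[ \sum XY = 127325, \quad \sum X^2 = 73495, \quad \sum Y^2 = 223525 \]
\[ \bar{X} = 84.7, \quad \bar{Y} = 147.5, \quad \bar{X}^2 = 7174.09, \quad \bar{Y}^2 = 21756.25 \]
\[ r = \frac{127325 - 10(84.7)(147.5)}{\sqrt{73495 - 10(7174.09)}\,\sqrt{223525 - 10(21756.25)}} = \frac{2392.5}{\sqrt{1754.1}\,\sqrt{5962.5}} \approx 0.7398 \]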
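
The hand calculation for the weight and systolic blood pressure example can be reproduced with a few lines of Python. This is only a checking sketch using the computational formula in equation (6); the variable names are chosen here for illustration.

```python
from math import sqrt

# Data from the example: weight (kg) and systolic blood pressure (mmHg)
weight = [67, 69, 85, 83, 74, 81, 97, 92, 114, 85]
sbp = [120, 125, 140, 160, 130, 180, 150, 140, 200, 130]

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(sbp) / n

# Raw sums used in the computational formula
sum_xy = sum(x * y for x, y in zip(weight, sbp))
sum_x2 = sum(x ** 2 for x in weight)
sum_y2 = sum(y ** 2 for y in sbp)

# Corrected sums of squares and products
s_xy = sum_xy - n * x_bar * y_bar
s_xx = sum_x2 - n * x_bar ** 2
s_yy = sum_y2 - n * y_bar ** 2

r = s_xy / sqrt(s_xx * s_yy)
print(sum_xy, sum_x2, sum_y2)  # 127325 73495 223525
print(round(r, 4))             # 0.7398
```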



CORRELATION
Cont’d



CORRELATION

cont’d
Nature of the scatter plot
What is the nature of the Scatter Plot?
Positive relationship
Negative relationship
No relationship



CORRELATION
cont’d



CORRELATION

Example 2
A sample of children was selected, and their ages (in years) and weights
(in kilograms) were recorded as shown below. Find the correlation between
age and weight.
No. Age(years) Weight(kg)
1 7 12
2 6 8
3 8 12
4 5 10
5 6 11
6 9 13



CORRELATION

Solution
Correlation Matrix
Age(years) Weight(kg)
Age(years) 1 0.75955452531275
Weight(kg) 0.75955452531275 1
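
A correlation matrix like the one above is normally produced by a statistical package. As a rough sketch of what such a routine does, the following Python code builds the same 2 x 2 matrix from the Example 2 data; `pearson_r` is a helper name assumed here, not something from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from corrected sums of squares and products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Example 2 data: age and weight of six children
data = {
    "Age(years)": [7, 6, 8, 5, 6, 9],
    "Weight(kg)": [12, 8, 12, 10, 11, 13],
}

names = list(data)
# Entry (i, j) is the correlation between variable i and variable j
matrix = [[pearson_r(data[a], data[b]) for b in names] for a in names]

for name, row in zip(names, matrix):
    print(f"{name:<12}" + "".join(f"{value:>10.4f}" for value in row))
```

The diagonal entries are 1 because every variable is perfectly correlated with itself, and the matrix is symmetric because r does not distinguish between the two variables.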



GROUP EXERCISE

Calculate the correlation coefficient of the given data:

x 12 15 18 21 27
y 2 4 6 8 12



Solution

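r = 1, because every point lies exactly on the line y = 2(x − 9)/3:
Sxy = 88.8, Sxx = 133.2, Syy = 59.2, so r = 88.8 / √(133.2 × 59.2) = 88.8 / 88.8 = 1.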



CORRELATION

Distribution of the Correlation Coefficient


\[ SE(\hat{r}) = \sqrt{\frac{1 - r^2}{n - 2}} \tag{7} \]
Under $H_0: \rho = 0$, the statistic $t = r / SE(\hat{r})$ follows a
t-distribution with n − 2 degrees of freedom (since the standard error has
to be estimated from the sample).

Note:
Like a proportion, the variance of the correlation coefficient depends on
the correlation coefficient itself, so the estimated r is substituted into
the formula.
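
The dependence of the standard error on r itself can be seen by evaluating equation (7) for a few values of r at a fixed sample size. This is an illustrative sketch; the function name `se_r`, the sample size and the chosen r values are assumptions.

```python
from math import sqrt

def se_r(r, n):
    """Estimated standard error of r, as in equation (7)."""
    return sqrt((1 - r ** 2) / (n - 2))

n = 10  # assumed sample size, for illustration only
for r in (0.0, 0.3, 0.6, 0.9):
    print(f"r = {r:.1f}  SE = {se_r(r, n):.4f}")
```

For a fixed n, the standard error is largest at r = 0 and shrinks as |r| approaches 1.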



CORRELATION
Test of Significance of the Correlation Coefficient
The hypotheses are:
H0 : ρ = 0 versus
H1 : ρ ≠ 0.
When the null hypothesis is not rejected, there is no significant evidence
of a linear relationship between the variables.
\[ t = \frac{r - 0}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \tag{8} \]
\[ t = r\sqrt{\frac{n - 2}{1 - r^2}} \tag{9} \]
We reject H0 if $|t| > t^{*}_{\alpha/2,\,n-2}$.
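
As a worked illustration of the test, the sketch below applies equation (9) to the weight and blood pressure example from earlier in these slides (r = 0.7398, n = 10). The critical value 2.306 is the usual two-sided 5% value for 8 degrees of freedom read from a standard t-table; it is hard-coded here as an assumption rather than computed, and the function name is hypothetical.

```python
from math import sqrt

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, as in equation (9)."""
    return r * sqrt((n - 2) / (1 - r ** 2))

r, n = 0.7398, 10          # values from the weight/SBP example
t = t_statistic(r, n)

# Assumed table value: t_{0.025, 8} = 2.306 (two-sided test, alpha = 0.05)
t_crit = 2.306
reject = abs(t) > t_crit   # two-sided rule: compare |t| with the critical value

print(f"t = {t:.3f}, critical value = {t_crit}, reject H0: {reject}")
```

Because |t| exceeds the tabulated value, H0 would be rejected at the 5% level for that example, suggesting a significant linear relationship between weight and systolic blood pressure.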
Trial questions

Olivia is studying for a test, and she wonders if her friend, Laney, is
also studying for the test. She calls Laney and asks her how long she
has been studying. Laney has been studying for her test all week,
approximately 8 hours total. Olivia has only been studying for her test
for a couple of hours. The next week, Olivia and Laney get their test
scores back. Laney got an A on her test, and Olivia got a C. Olivia
wonders if there is a correlation between the number of hours spent
studying and the grade a student earns. Take a look at the data Olivia
collected from her classmates, and see if you can find a correlation.
x 8 2 6 4 2
y 98 74 87 82 72



Trial questions

The following are the number of minutes it took 10 mechanics to
assemble a piece of machinery in the morning, x, and in the
afternoon, y. Calculate Pearson’s r.

x 11.1 10.3 12.0 15.1 13.7 18.5 17.3 14.2 14.8 15.2
y 10.9 14.2 13.8 21.5 13.2 21.1 16.4 19.3 17.4 19.0



Trial questions

In a study on the lowering of blood pressure, Sharon and Shiella
(1998) measured the systolic blood pressure (in mmHg) and the fall
in this blood pressure 30 minutes after giving the drug nifedipine. The
results were:
BP 180 200 215 235 190 205 200 210 220
Fall 30 49 75 42 65 59 50 69
a. Calculate the sample correlation coefficient between systolic blood
pressure and the fall in blood pressure 30 minutes after taking
nifedipine.
b. Test H0 : ρ = 0 against H1 : ρ ≠ 0. Use α = 0.05.



Thank You.
