ANALYTICAL TECHNIQUES LU4 Lecture Notes

This learning unit covers correlation and simple linear regression, focusing on identifying dependent and independent variables, interpreting scatterplots, calculating Pearson's correlation coefficient, and using regression models for predictions. Students will learn to perform these analyses using computational formulas and calculators, and understand the significance of the coefficient of determination. The unit emphasizes the importance of correctly identifying variable roles and the implications of linear relationships in data analysis.

Uploaded by

chisangachama18


LEARNING UNIT 4: Correlation and Simple Linear Regression

Learning objectives

• Distinguish between the dependent and independent variables

• Identify the type of relationship in a bivariate dataset by using a scatterplot

• Calculate and interpret Pearson’s correlation coefficient and the coefficient of determination

• Calculate and interpret regression coefficients of the regression model

• Use the regression model to predict the value of the dependent variable for valid values of the
independent variable

Textbook reference
• Chapter 12
o §12.1 – §12.4
o Exclude §12.5 – §12.7

ATE01A1 – LU 4 1
INTRODUCTION

Researchers often investigate the nature of the relationship between numerical variables to
see what kind of relationship exists, if it does, and how strong it is. This relationship can be
modelled mathematically and used for prediction purposes. This is done through correlation
analysis, linear regression analysis, and the coefficient of determination.

Students must be able to:

• perform and interpret these analyses from raw data, using both the computational formulae
and the calculator,

• interpret computer output for these analyses.

THE ROLES OF THE VARIABLES

For prediction purposes, it is important to correctly identify the roles that the variables have
in the relationship.

• Variable Y is called the dependent variable, as its value depends on the values of one or
more other variables.

• Variable X is called the independent variable as it impacts the value of the dependent
variable.

For example, a company’s sales for a particular product may depend on how much the
company spends on advertising. Therefore, sales is the dependent variable, and
advertising expenditure is the independent variable.

Exercise 4.1

Give two more examples of dependent and independent variables.

THE SCATTERPLOT

The scatterplot is a visual representation of the relationship between two numerical
variables. The independent variable is plotted on the x-axis and the dependent variable on
the y-axis. The nature of the relationship could be linear (positive or negative), non-linear or
non-existent.

[Figure: four example scatterplots showing positive linear, negative linear, non-linear and non-existent relationships]

CORRELATION ANALYSIS

The correlation coefficient is a numerical measure that quantifies the strength and direction
of the linear relationship between two variables.

A number of different correlation coefficients exist and depend on the nature of the
variables. The most commonly used correlation coefficient is Pearson’s Product Moment
Correlation Coefficient, denoted by r, which is a value between −1 and +1 (inclusive).

The computational formula for Pearson’s correlation is:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
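
The computational formula can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name pearson_r and the advertising/sales figures are made up for the example.

```python
import math

def pearson_r(x, y):
    """Pearson's r via the computational formula (sums of x, y, x^2, y^2 and xy)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: advertising expenditure (x) and sales (y).
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson_r(advertising, sales)
print(round(r, 4))  # 0.7746
```

Note that the denominator is zero when either variable is constant, in which case r is undefined and this sketch raises a ZeroDivisionError.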

• The roles of the variables are interchangeable and do not influence the value of r.
Therefore, if X is correlated with Y, then Y is correlated with X.

• Note that r is very sensitive to outliers.

• Two variables are positively correlated when large values of the one variable are
associated with large values of the other variable and small values of the one are
associated with small values of the other.

• Two variables are negatively correlated when large values of the one are associated
with small values of the other and vice versa.

• The closer the value of r is to 0, the weaker the linear relationship between the variables.

• When r = 0, there is no linear relationship between two variables, but it is still possible
that a non-linear relationship between the variables exists.
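
Three of the properties above (symmetry, sensitivity to outliers, and r = 0 despite a non-linear relationship) can be checked numerically. The sketch below uses made-up data and a hypothetical helper pearson_r that implements the computational formula for r.

```python
import math

def pearson_r(x, y):
    """Pearson's r via the computational formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# 1) Swapping the roles of the variables leaves r unchanged.
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)

# 2) A single outlier can change r drastically (here it even flips the sign).
r_outlier = pearson_r(x + [20], y + [0])

# 3) An exact quadratic relationship (v = u^2) can still give r = 0.
u = [-2, -1, 0, 1, 2]
v = [ui ** 2 for ui in u]
r_uv = pearson_r(u, v)

print(round(r_xy, 3), round(r_yx, 3), round(r_outlier, 3), r_uv)
```

The third case is why a scatterplot should be inspected alongside r: the points lie exactly on a curve, yet the linear measure reports no relationship.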

The value of r is interpreted as follows:

Value of r                               Interpretation
r = −1                                   Perfect negative linear relationship
r close to −1                            Strong negative linear relationship
r far from −1 and far from 0             Moderate negative linear relationship
r negative and relatively close to 0     Weak negative linear relationship
r negative and very close to 0           No linear relationship
r = 0                                    No linear relationship
r positive and very close to 0           No linear relationship
r positive and relatively close to 0     Weak positive linear relationship
r far from 0 and far from +1             Moderate positive linear relationship
r close to +1                            Strong positive linear relationship
r = +1                                   Perfect positive linear relationship
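
The table can be turned into a small lookup, sketched below. The notes do not give numeric cut-offs for "close to" or "far from", so the thresholds 0.1, 0.4 and 0.8 are purely illustrative assumptions, as is the function name describe_r.

```python
def describe_r(r, negligible=0.1, weak=0.4, strong=0.8):
    """Return a verbal label for r.

    The cut-offs (negligible, weak, strong) are assumptions chosen for
    illustration; they are not taken from the lecture notes.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1 (inclusive)")
    size = abs(r)
    if size == 1:
        strength = "Perfect"
    elif size >= strong:
        strength = "Strong"
    elif size >= weak:
        strength = "Moderate"
    elif size >= negligible:
        strength = "Weak"
    else:
        return "No linear relationship"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} linear relationship"

for value in (-1, -0.94, -0.5, 0.05, 0.3, 1):
    print(value, "->", describe_r(value))
```

Different textbooks use different cut-offs, so the labels near a boundary should be read as a judgment call, not a fixed rule.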

Steps to find the correlation coefficient using the calculator

1) SETUP → down arrow → 3:STAT → 2:OFF

2) MODE → 2:STAT → 2:A + BX

3) Enter the independent variable values in column X and the dependent variable values in
column Y

4) AC

5) SHIFT STAT → 5:REG → 3:r → =

SIMPLE LINEAR REGRESSION ANALYSIS

Linear regression analysis is a technique used to mathematically model the relationship
between two variables for the purpose of prediction.

• Specifically, one or more independent variables are used to predict the value of a single
numerical dependent variable.

• Simple linear regression has only one numerical independent variable.

• Multiple linear regression has multiple independent variables (categorical or numerical)
(this will be covered in ATE B).

• The main idea of simple linear regression is to fit a straight line through the data to
capture or describe the relationship with a simplistic mathematical model.

If a linear relationship exists, then each paired observation (x, y) can be written in the
form y = a + bx + e, i.e., the value of the straight line plus the deviation from the line (e),
also referred to as the error term.

• The equation obtained to describe the linear relationship between the variables X and Y
is called the least squares regression equation/model of Y on X.

• The least squares method ensures that the sum of the squared vertical distances between
the points on the scatterplot and the straight line is a minimum, i.e., the combined squared
error is a minimum.

• The resulting line is referred to as the line of best fit. This is the straight line that is
closest to all points simultaneously. This does not imply that the line is necessarily good;
it simply means it is the best of all possible lines.
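
The "best of all possible lines" idea can be checked numerically. The sketch below fits a line with the standard least squares formulas for a and b, then compares its sum of squared errors (SSE) with that of a few nearby lines. The data, the function names and the perturbation sizes are all made up for illustration.

```python
def least_squares(x, y):
    """Return (a, b) of the least squares line y-hat = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2 = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

def sse(a, b, x, y):
    """Sum of squared vertical deviations e = y - (a + b*x)."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
best = sse(a, b, x, y)

# Shift the intercept and/or slope slightly: every alternative line does worse.
others = [
    sse(a + da, b + db, x, y)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]

print(round(a, 2), round(b, 2), round(best, 2))  # 2.2 0.6 2.4
print(all(other > best for other in others))     # True
```

An SSE of 2.4 is the smallest achievable for these points, but whether that counts as a good fit is a separate question.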

• The equation of a straight line is ŷ = a + bx, where b denotes the slope (gradient) of the
line, a denotes the point on the y-axis through which the line passes (y-intercept), and
ŷ is the predicted value of Y on the straight line.
• The values a and b are the regression coefficients, estimated through the method of
least squares, and are calculated as follows:

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \qquad a = \frac{\sum y - b\sum x}{n}$$

• Note that the roles of the variables are not interchangeable in these formulae, so it is
very important to correctly identify the roles of the variables for regression analysis.
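
A short sketch of these formulas follows. It also shows why the roles matter: regressing Y on X and regressing X on Y give two different lines. The function name and the advertising/sales data are hypothetical.

```python
def regression_coefficients(x, y):
    """Return (a, b) for the least squares line y-hat = a + b*x of Y on X."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

# Hypothetical data: advertising expenditure (independent) and sales (dependent).
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

# Correct roles: sales regressed on advertising.
a, b = regression_coefficients(advertising, sales)

# Swapped roles: advertising regressed on sales. This is a different line,
# not simply the first equation solved for x.
a_swapped, b_swapped = regression_coefficients(sales, advertising)

print(f"sales-hat = {a:.2f} + {b:.2f} * advertising")
print(f"advertising-hat = {a_swapped:.2f} + {b_swapped:.2f} * sales")
```

Solving the first equation for advertising would give a slope of 1/0.6, about 1.67, whereas the swapped regression gives a slope of 1.00, which confirms that the two fits are not interchangeable.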

• The regression equation/model is used to predict the value of the dependent variable for
a given value of the independent variable.

• The x-values used to make predictions should fall within the range of the observed
values of X, as there is no guarantee that the regression model still applies to x-values
outside of the observed range. Predicting within the observed range is referred to as
interpolation and yields a valid prediction.

• Predicting for x-values outside the observed range of X is referred to as extrapolation;
such predictions are not valid.
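
A prediction helper can make the interpolation check explicit. In this sketch the function name predict, the coefficients a = 2.2 and b = 0.6, and the observed x-values are all assumed for illustration.

```python
def predict(a, b, x_new, x_observed):
    """Return (y_hat, is_interpolation) for the line y-hat = a + b*x.

    is_interpolation is True only when x_new lies within the observed
    range of X, i.e. when the prediction is considered valid.
    """
    y_hat = a + b * x_new
    is_interpolation = min(x_observed) <= x_new <= max(x_observed)
    return y_hat, is_interpolation

# Hypothetical fitted model and observed x-values.
a, b = 2.2, 0.6
x_observed = [1, 2, 3, 4, 5]

inside = predict(a, b, 3, x_observed)    # x = 3 lies inside [1, 5]
outside = predict(a, b, 10, x_observed)  # x = 10 lies outside [1, 5]

print(inside)   # prediction flagged as interpolation (valid)
print(outside)  # prediction flagged as extrapolation (not valid)
```

The formula still returns a number for x = 10; the flag is a reminder that the model gives no guarantee there.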

• A positive value for b means that, on average, the values of Y will increase as the values of
X increase, indicating a positive (direct) relationship between X and Y. Example: ŷ = 3 + 2x.

• A negative value for b means that, on average, the values of Y will decrease as the values of
X increase, indicating a negative (inverse) relationship between X and Y. Example: ŷ = 3 − 2x.

• It therefore follows that the sign of the slope always corresponds to the sign of the
correlation coefficient. Their numerical values generally differ, however: b = r × (s_y / s_x),
where s_x and s_y are the standard deviations of X and Y, so b depends on the units of
measurement whereas r does not.

• The slope of the regression line shows the predicted change in the dependent variable for a
1-unit change in the independent variable.
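
A short worked derivation may help to see why the slope is the predicted change for a 1-unit change in X. It uses only the line ŷ = a + bx and the two example equations; the values x = 4 and x = 5 are arbitrary choices for illustration.

```latex
\begin{align*}
\hat{y}(x+1) - \hat{y}(x) &= \bigl[a + b(x+1)\bigr] - \bigl[a + bx\bigr] = b \\[4pt]
\hat{y} = 3 + 2x:&\quad \hat{y}(5) - \hat{y}(4) = 13 - 11 = 2 \\
\hat{y} = 3 - 2x:&\quad \hat{y}(5) - \hat{y}(4) = -7 - (-5) = -2
\end{align*}
```

The intercept cancels in the difference, so the predicted change depends on b alone.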

Steps to find the regression coefficients using the calculator

1) Same steps as for the correlation coefficient (steps 1-4)

2) SHIFT STAT → 5:REG → 1:A (intercept) or 2:B (slope) → =

COEFFICIENT OF DETERMINATION

The coefficient of determination provides a measure of how well the regression line fits the
data, i.e. a goodness-of-fit measure.

• It is the square of the correlation coefficient, denoted by r².

• It is typically expressed as a percentage and interpreted as the percentage of variation
in the dependent variable that can be explained by the regression model.

• If all data points lie exactly on a regression line with a positive slope, there is no
unexplained variation. For such data, the correlation coefficient r = 1; therefore
r² × 100 = 1² × 100 = 100%. That is, 100% of the variation in Y is accounted for by the
variation in X in the regression model.

• On the other hand, if the points are so scattered that none of the variation can be
explained by the regression model, in other words r = 0, it follows that
r² × 100 = 0² × 100 = 0%, indicating that none of the variation in Y is accounted for by
the variation in X in the regression model.

• The value of r² will always be between 0 and 1 inclusive (or between 0% and 100% when
expressed as a percentage) and does not indicate the direction of the linear relationship,
but rather the goodness-of-fit of the regression model.

• Also, if r² = 80%, then 20% of the variation in Y is unexplained by the regression model.
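
The "explained variation" reading of r² can be illustrated numerically. The sketch below relies on a standard identity for simple linear regression that is not stated in these notes: r² = 1 − SSE/SST, where SST is the total variation of Y about its mean and SSE is the variation left unexplained by the fitted line. The data and variable names are made up.

```python
import math

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# Correlation coefficient and least squares coefficients.
numerator = n * sum_xy - sum_x * sum_y
r = numerator / math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
b = numerator / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Total variation in Y and the part left unexplained by the line.
y_mean = sum_y / n
sst = sum((yi - y_mean) ** 2 for yi in y)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = r ** 2
explained_share = 1 - sse / sst

print(f"r^2 = {r_squared:.2f}, explained = {explained_share:.0%}, "
      f"unexplained = {sse / sst:.0%}")
```

For these data the two routes agree: 60% of the variation in Y is explained by the model and 40% is not.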

Exercise 4.2

A sample of 10 athletes took a series of tests to measure their fitness. Their overall fitness
scores ranged from 1 to 3.5. After these tests, the athletes ran a standard marathon, and
their times were measured in hours.

1) Identify the dependent and the independent variables:

Dependent =

Independent =

2) Describe the nature of the relationship between the athletes’ fitness and their marathon
times based on the following scatterplot.

3) The following table shows the raw data for the 10 athletes:

Fitness score            2.4   1.7   2.8   2.8   3.5   2.0   1.8   2.5   2.2   1.0
Marathon time (hours)    5.7   7.4   5.6   5.2   3.2   9.0   9.3   6.5   6.5  10.6

Use your calculator to find the following values:

Correlation =

Coefficient of determination =

Regression equation =

4) Use the sums below in the computational formulae to calculate the following values:

Σx = 22.7    Σx² = 55.91

Σy = 69.0    Σy² = 520.24    Σxy = 143.59

Correlation:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Σx = 22.7    Σx² = 55.91

Σy = 69.0    Σy² = 520.24    Σxy = 143.59

Coefficient of determination =

Regression equation:

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \qquad a = \frac{\sum y - b\sum x}{n}$$
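
One way to check a hand calculation is to substitute the given sums into the computational formulae with a few lines of Python. This is only a sketch: the variable names are illustrative, and n = 10 is taken from the sample of 10 athletes.

```python
import math

# Sums given in the exercise (x = fitness score, y = marathon time in hours).
n = 10
sum_x, sum_x2 = 22.7, 55.91
sum_y, sum_y2 = 69.0, 520.24
sum_xy = 143.59

# Shared building blocks of the computational formulae.
numerator = n * sum_xy - sum_x * sum_y   # n*Sxy - Sx*Sy
x_part = n * sum_x2 - sum_x ** 2         # n*Sx^2 - (Sx)^2
y_part = n * sum_y2 - sum_y ** 2         # n*Sy^2 - (Sy)^2

r = numerator / math.sqrt(x_part * y_part)
r_squared = r ** 2
b = numerator / x_part
a = (sum_y - b * sum_x) / n

print(f"r = {r:.2f}")            # r = -0.94
print(f"r^2 = {r_squared:.2f}")  # r^2 = 0.88
print(f"y-hat = {a:.2f} + ({b:.2f})x")  # y-hat = 13.66 + (-2.98)x
```

Rounded to two decimals, these values agree with the computer output shown in question 5.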

5) The following computer output gives the results of a least squares regression analysis
of marathon time on fitness:

Regression Statistics
Multiple R           0.94
R Square             0.88
Adjusted R Square    0.86
Standard Error       0.82
Observations         10

                 Coefficients   Standard Error   t Stat   P-value
Intercept               13.66             0.92    14.82      0.00
Fitness score           −2.98             0.39    −7.64      0.00

a) Identify and interpret the strength of the linear relationship between fitness and
marathon time.

b) What proportion of variation in marathon time is explained by variation in fitness?

c) What proportion of variation in marathon time is not explained by variation in fitness?

d) Interpret the slope of the regression model.

e) Predict an athlete’s marathon time if his/her fitness score is equal to 2. Is this a valid
prediction?

f) Is it possible to predict an athlete’s marathon time if his/her fitness score is equal to 5?

g) Is it possible to predict an athlete’s fitness score if his/her marathon time is equal to 3
hours?
