
Lecture2 Xy 2025

The lecture covers key concepts in quantitative methods including covariance, correlation, and regression analysis. It explains how to calculate covariance and correlation coefficients, and how to find the line of best fit for data points. Additionally, it discusses the coefficient of determination (R²) to assess the effectiveness of the regression model.


BMAN10960 Quantitative Methods for Business and Management 2

Lecture 2 - Covariance, Correlation and Regression

Dr. Xian Yang


https://research.manchester.ac.uk/en/persons/xian.yang
What have we learned so far?
General linear functions

For any linear function expressed as y = ax + b


• a is the slope or gradient of the function (i.e. the slope of the line on a graph)
• b is the intercept on the vertical (y) axis

e.g. C = 24q + 500, where C is the cost

a = 24
b = 500
What if the samples do not fit perfectly on a straight line?

How do we find the best straight line to fit the linear function?

Year Output Actual Costs
1 50 2500
2 95 1800
3 200 5500
4 120 2800
5 150 4500

[Figure: scatter plot "Costs against Output over the last 5 years"; x-axis: Output, y-axis: Cost]
Learning outcomes for today

Covariance
Correlation
The “best fit” line (Regression)
Further considerations
Recap

There are eight values: 4, 7, 7, 8, 9, 12, 16, 17. Calculate the variance and standard deviation.
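Variance

$$ s^2 = \frac{\sum_{i=1}^{n} (\bar{x} - x_i)^2}{n - 1} $$

Standard deviation

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (\bar{x} - x_i)^2}{n - 1}} $$

Mean:

$$ \bar{x} = \frac{4 + 7 + 7 + 8 + 9 + 12 + 16 + 17}{8} = 10 $$

Deviations from the mean: 6, 3, 3, 2, 1, -2, -6, -7

Variance:

$$ s^2 = \frac{6^2 + 3^2 + 3^2 + 2^2 + 1^2 + 2^2 + 6^2 + 7^2}{7} = \frac{148}{7} \approx 21.14 $$

Standard deviation:

$$ s = \sqrt{\frac{148}{7}} \approx 4.60 $$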
Excel 1:
=
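The recap calculation can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide, not the Excel formula the slide refers to; the variable names are illustrative.

```python
import math

values = [4, 7, 7, 8, 9, 12, 16, 17]

n = len(values)
mean = sum(values) / n                      # 80 / 8
deviations = [mean - x for x in values]     # x-bar minus x_i, as on the slide
sum_sq = sum(d ** 2 for d in deviations)    # sum of squared deviations
variance = sum_sq / (n - 1)                 # sample variance (divide by n - 1)
std_dev = math.sqrt(variance)               # sample standard deviation

print(mean, sum_sq, round(variance, 2), round(std_dev, 2))
```

Dividing by n - 1 rather than n gives the sample variance, which is the version used throughout this lecture.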
Covariance

• Covariance is a measure of how much two random variables vary together.
• It is similar to variance, but where variance tells you how a single variable varies, covariance tells you how two variables vary together.

[Figure: three scatter plots of y against x]
Covariance

Covariance measures how much two random variables change together.

$$ \operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} $$

It is calculated as the sum of the product of the deviations of each pair of data points from their respective means, divided by the number of data points minus one.

[Figure: x-y plot divided into four quadrants marked + and -]
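As a minimal sketch, the definition translates directly into a short Python function. The function name `covariance` is illustrative, and the data are the output and cost figures from the lecture's five-year table rather than the S&P 500 example.

```python
def covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


output = [50, 95, 200, 120, 150]        # x: output per year
cost = [2500, 1800, 5500, 2800, 4500]   # y: actual costs per year

cov_xy = covariance(output, cost)
print(cov_xy)
```

A positive result means that years with above-average output tend to have above-average cost. The covariance of a variable with itself is its variance, so `covariance(output, output)` gives the sample variance of output.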
Example -- Calculating covariance

• John is an investor. His main investment is in the S&P 500, and he is considering buying the stocks of ABC Corp. However, before adding ABC Corp stocks to his portfolio, he wants to assess the relationship between ABC Corp stocks and the S&P 500.

John does not want to increase the unsystematic risk of his portfolio, so he does not want to own securities that move in the same direction.
Calculating covariance

1. Calculate the mean of x (S&P 500)

2. Calculate the mean of y (ABC Corp)


Calculating covariance

3. Subtract the respective mean from each value of x and y

Mean of S&P 500 = 2044.80
Mean of ABC Corp = 109.20

$x - \bar{x}$ and $y - \bar{y}$

4. Multiply the results (a and b) to obtain $(x - \bar{x})(y - \bar{y})$, sum up, and divide by n - 1.
Excel 2:
Quiz 1

Question: Given the covariance values for:

(1) Weight and Age: Cov (x,y) = 1.2 YrsKg


(2) Weight and Height: Cov (x,y) =2.1 FtKg

Which covariance, (1) or (2), indicates a stronger linear relationship?


Outline

Covariance
Correlation
The “best fit” line (Regression)
Further considerations
Correlations

• A correlation is a standardised (dimensionless) covariance

• Correlation coefficient: -1 ≤ R ≤ 1

• The correlation coefficient (R) is a measure of the strength and direction of the linear relationship

[Figure: three scatter plots illustrating R ≈ 1, R ≈ -1 and R ≈ 0]
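To illustrate what "standardised (dimensionless)" means, the sketch below computes both measures for the lecture's output and cost data, then repeats the calculation with cost divided by 1000. The rescaling is an assumption made purely for illustration, and the function names are illustrative.

```python
import math


def covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(covariance(xs, xs))
    s_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (s_x * s_y)


output = [50, 95, 200, 120, 150]
cost = [2500, 1800, 5500, 2800, 4500]
cost_thousands = [c / 1000 for c in cost]   # same costs, different units

cov_a = covariance(output, cost)
cov_b = covariance(output, cost_thousands)
r_a = correlation(output, cost)
r_b = correlation(output, cost_thousands)

print(cov_a, cov_b)                 # covariance changes with the units
print(round(r_a, 4), round(r_b, 4)) # correlation does not
```

Because the covariance carries the units of both variables, its size alone says little about the strength of a relationship; the correlation removes the units and can be compared across data sets.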
Calculating correlation R

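$$ R_{xy} = R_{yx} = R = \frac{\operatorname{cov}(x, y)}{s_x s_y} = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \, \sqrt{n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2}} $$

where

$$ s_x = \sqrt{\frac{\sum_{i=1}^{n} (\bar{x} - x_i)^2}{n - 1}}, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{n} (\bar{y} - y_i)^2}{n - 1}} $$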
Example -- Correlation

• In the last 5 years, the actual costs of manufacturing a product at various levels of output have been recorded in the table below. Determine the correlation between output and cost, assuming that the actual cost (y) depends on the output (x).

Year Output Actual Costs
1 50 2500
2 95 1800
3 200 5500
4 120 2800
5 150 4500

Year x y x*x y*y x*y
1 50 2500 2500 6250000 125000
2 95 1800 9025 3240000 171000
3 200 5500 40000 30250000 1100000
4 120 2800 14400 7840000 336000
5 150 4500 22500 20250000 675000
Sum 615 17100 88425 67830000 2407000

$$ R = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \, \sqrt{n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2}} $$

$$ R = \frac{5 \times 2407000 - 615 \times 17100}{\sqrt{5 \times 88425 - 615^2} \, \sqrt{5 \times 67830000 - 17100^2}} = \frac{1518500}{252.78 \times 6836.67} = 0.88 $$

Excel 4:
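The column sums and the resulting value of R in this example can be reproduced with a short Python sketch. It follows the sum-based formula from the slide; the variable names are illustrative, and this is not the Excel formula the slide refers to.

```python
import math

x = [50, 95, 200, 120, 150]         # output
y = [2500, 1800, 5500, 2800, 4500]  # actual costs
n = len(x)

# Column sums, as in the working table
sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_yy = sum(yi * yi for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

numerator = n * sum_xy - sum_x * sum_y
denominator = (math.sqrt(n * sum_xx - sum_x ** 2)
               * math.sqrt(n * sum_yy - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xx, sum_yy, sum_xy)
print(numerator, round(r, 2))
```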
Quiz 2: Correlations

Question: Which graphs have the absolute value of correlation equal to 1? (multiple choice)

[Figure: five scatter plots, Graph 1 to Graph 5]
Outline

Covariance
Correlation
The “best fit” line (Regression)
Further considerations
Motivation
In the last 5 years, the actual costs of manufacturing a product at various levels of output
have been recorded as:
Year Output Actual Costs
1 50 2500
2 95 1800
3 200 5500
4 120 2800
5 150 4500

• How do we find the best straight line to fit the linear function?

[Figure: scatter plot "Costs against Output over the last 5 years" with a red line drawn through the points; x-axis: Output, y-axis: Cost]

By 'eye' the red line seems to be the closest to all the points and looks like it has a slope of 1200/50 = 24 and an intercept of 500.
Line of Best Fit

• A line of best fit is a straight line drawn through the centre of a group of data points
plotted on a scatter plot of data from two variables.
• It is used to identify trends occurring within the dataset that produces a scatter plot.
• It tells us whether the changes in two variables are related.

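$$ \text{slope } a = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} $$

$$ \text{intercept } b = \frac{\sum_{i=1}^{n} y_i}{n} - a \, \frac{\sum_{i=1}^{n} x_i}{n} $$

$$ R_{xy} = R_{yx} = R = \frac{\operatorname{cov}(x, y)}{s_x s_y} = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \, \sqrt{n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2}} $$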
Regression example

• In the last 5 years, the actual costs of manufacturing a product at various levels of output have been recorded in Table 1. The managers want to know the regression line of best fit, which would tell them the slope and intercept of the relationship between output and cost, assuming that the actual cost (y) depends on the output (x).

Table 1.
Year Output Actual Costs
1 50 2500
2 95 1800
3 200 5500
4 120 2800
5 150 4500
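Slope and intercept of the ‘best’ fit line (Regression)

$$ \text{slope } a = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} $$

$$ \text{slope} = \frac{5 \times 2407000 - 615 \times 17100}{5 \times 88425 - 615 \times 615} = 23.76 $$

$$ \text{intercept } b = \frac{\sum_{i=1}^{n} y_i}{n} - a \, \frac{\sum_{i=1}^{n} x_i}{n} $$

$$ \text{intercept} = \frac{17100 - 23.76 \times 615}{5} \approx 497.5 $$

Carrying the unrounded slope (23.7637) through the calculation gives an intercept of 497.07.

Excel 6: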
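The slope and intercept formulas can be sketched as a small Python function and applied to the data in Table 1. The function name `fit_line` is illustrative; the code keeps the slope unrounded when computing the intercept, which is why the intercept comes out at about 497.07.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept using the sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


output = [50, 95, 200, 120, 150]        # x
cost = [2500, 1800, 5500, 2800, 4500]   # y

a, b = fit_line(output, cost)
print(round(a, 2), round(b, 2))
```

These values are close to the line estimated by eye earlier in the lecture (slope 24, intercept 500).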
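How good is the regression model?
• The coefficient of determination is a statistic that is used to determine to what extent the difference/variance in one variable can be explained by the difference/variance in a second variable. R² lies between 0 and 1.
• R² is the percentage of the variation in y explained by the model.

$$ R^2 = \frac{\sum (y_i' - \bar{y})^2}{\sum (y_i - \bar{y})^2} = \frac{\left( n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i \right)^2}{\left[ n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 \right] \times \left[ n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2 \right]} $$

[Figure: plot marking an observed value y_i and the corresponding value y_i' on the fitted line]

$$ R_{xy} = R_{yx} = R = \frac{\operatorname{cov}(x, y)}{s_x s_y} = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \, \sqrt{n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2}} $$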
Calculating R² manually - Example
i x_i y_i x_i^2 x_i*y_i y_i^2
1 50 2500 2500 125000 6250000
2 95 1800 9025 171000 3240000
3 200 5500 40000 1100000 30250000
4 120 2800 14400 336000 7840000
5 150 4500 22500 675000 20250000
Sums = 615 17100 88425 2407000 67830000

$$ R^2 = \frac{\left( n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i \right)^2}{\left[ n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 \right] \times \left[ n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2 \right]} $$

$$ R^2 = \frac{(5 \times 2407000 - 615 \times 17100)^2}{(5 \times 88425 - 615 \times 615) \times (5 \times 67830000 - 17100 \times 17100)} = 0.77 $$

• So, we’ve ‘explained’ 77% of the variation in cost using our linear regression model.
• It follows that the correlation coefficient is $R = \sqrt{R^2} = \sqrt{0.77} \approx 0.88$ (the positive root, because the slope is positive).
Excel 7:
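As a cross-check, R² for this example can be computed in two ways: from the column sums, and as explained variation divided by total variation using the fitted values. The sketch below assumes the fitted values come from the least-squares line for the same data; the variable names are illustrative.

```python
import math

x = [50, 95, 200, 120, 150]
y = [2500, 1800, 5500, 2800, 4500]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_yy = sum(yi * yi for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Route 1: the sum-based formula
r2_sums = ((n * sum_xy - sum_x * sum_y) ** 2
           / ((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)))

# Route 2: explained variation / total variation
a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)  # slope
b = sum_y / n - a * sum_x / n                                 # intercept
y_bar = sum_y / n
fitted = [a * xi + b for xi in x]                             # y_i'
explained = sum((f - y_bar) ** 2 for f in fitted)
total = sum((yi - y_bar) ** 2 for yi in y)
r2_variation = explained / total

print(round(r2_sums, 3), round(r2_variation, 3), round(math.sqrt(r2_sums), 2))
```

The two routes agree for a straight-line least-squares fit, which is why the slide can use the sum-based formula as a shortcut.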
Outline

Covariance
Correlation
The “best fit” line (Regression)
Further considerations
Further considerations

Y = f(X) or X = f(Y) …?
[Figure: scatter plot "Costs against Output over the last 5 years" with the fitted line Cost = 23.764*Output + 497.07, R² = 0.772; x-axis: Output, y-axis: Cost]

1. It is important to decide which is the independent variable

• Swapping the variables gives a different fit line, but the same R² and R
Excel 8:
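A short sketch of this point, using the lecture's output and cost data: the line is fitted once with cost depending on output and once with output depending on cost. The function name `fit` is illustrative.

```python
def fit(xs, ys):
    """Return (slope, intercept, R squared) for the regression of ys on xs."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y
    s_xx = n * sum(x * x for x in xs) - sum_x ** 2
    s_yy = n * sum(y * y for y in ys) - sum_y ** 2
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept, s_xy ** 2 / (s_xx * s_yy)


output = [50, 95, 200, 120, 150]
cost = [2500, 1800, 5500, 2800, 4500]

a1, b1, r2_1 = fit(output, cost)   # Cost = a1 * Output + b1
a2, b2, r2_2 = fit(cost, output)   # Output = a2 * Cost + b2

# Rearranging the second line into the form Cost = ... gives slope 1 / a2,
# which is not the same line as the first fit.
print(round(a1, 2), round(1 / a2, 2))
print(round(r2_1, 3), round(r2_2, 3))
```

The product of the two slopes equals R², so the two lines coincide only when the fit is perfect (R² = 1).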
Further considerations

2. Outliers can influence the slope and intercept of the best fit line and give poor correlations, e.g.

[Figure: scatter plot with an outlier. Annotation: removing this outlier moves the line and increases the correlation.]

Review the scattergram for possible outliers, question if they are really outliers or valid data points and, if necessary, exclude them.

Excel 9:
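The effect of a single outlier can be sketched with made-up numbers. The data below are hypothetical and are not taken from the lecture: five points lying exactly on y = 2x + 1, plus one invented outlier at (6, 2). The function name is illustrative.

```python
import math


def fit_and_correlate(xs, ys):
    """Return (slope, intercept, R) using the sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y
    s_xx = n * sum(x * x for x in xs) - sum_x ** 2
    s_yy = n * sum(y * y for y in ys) - sum_y ** 2
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n
    r = s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))
    return slope, intercept, r


# Hypothetical data (not from the lecture): points on y = 2x + 1
x_clean = [1, 2, 3, 4, 5]
y_clean = [3, 5, 7, 9, 11]

# The same points plus one made-up outlier at (6, 2)
x_all = x_clean + [6]
y_all = y_clean + [2]

slope_all, intercept_all, r_all = fit_and_correlate(x_all, y_all)
slope_clean, intercept_clean, r_clean = fit_and_correlate(x_clean, y_clean)

print(round(slope_all, 2), round(r_all, 2))      # with the outlier
print(round(slope_clean, 2), round(r_clean, 2))  # outlier removed
```

In this invented case one point is enough to flatten the line and pull R well below 1. Whether a real data point should be excluded is a judgement about the data, not something the calculation can decide.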
Further considerations

3. Non-linear regression, e.g. quadratic, polynomial (Lecture 3), hyperbolic, and exponential (Lecture 4)

4. Multivariate regression (Lecture 5)

• y = f(x1, x2, . . . , xn)
e.g. Total Cost = fixed cost + variable cost for product 1 + variable cost for product 2 + etc
Further considerations

5. Correlation is not causation!


Summary

Covariance
Correlation
The “best fit” line (Regression)
Further considerations
Reading: Dewhurst Sections 8.3 and 8.4 & Oakshott Section 10
Next week: Quadratic functions, y = ax² + bx + c
