Regression Analysis

Uploaded by

Regnold Munuo
Copyright
© All Rights Reserved

What is Regression Analysis?

Regression analysis is a set of statistical methods used for the estimation of
relationships between a dependent variable and one or more independent variables. It
can be utilized to assess the strength of the relationship between variables and for
modeling the future relationship between them.

Regression analysis includes several variations, such as linear, multiple linear, and
nonlinear. The most common models are simple linear and multiple linear. Nonlinear
regression analysis is commonly used for more complicated data sets in which the
dependent and independent variables show a nonlinear relationship.

Regression analysis offers numerous applications in various disciplines, including
finance.

Regression Analysis – Linear Model Assumptions

Linear regression analysis is based on six fundamental assumptions:

a) The dependent and independent variables show a linear relationship, described by
a slope and an intercept.
b) The independent variable is not random.
c) The expected value of the residual (error) is zero.
d) The variance of the residual (error) is constant across all observations.
e) The residual (error) values are not correlated across observations.
f) The residual (error) values follow the normal distribution.

Regression Analysis – Simple Linear Regression

Simple linear regression is a model that assesses the relationship between a dependent
variable and an independent variable. The simple linear model is expressed using the
following equation:

Y = a + bX + ϵ
Where:

Y – Dependent variable

X – Independent (explanatory) variable

a – Intercept

b – Slope

ϵ – Residual (error)
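
To make the role of the residual concrete, here is a minimal Python sketch. It estimates a and b by least squares in the deviation-from-mean form (b = Sxy / Sxx, a = mean of Y minus b times mean of X) and then computes ϵ = Y – (a + bX) for each observation. The data values and all function and variable names are hypothetical, chosen only for illustration.

```python
# Hypothetical illustrative data (not from the original document).
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_simple_linear(xs, ys):
    """Least-squares estimates (a, b) for the model Y = a + bX + error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx           # slope
    a = mean_y - b * mean_x   # intercept
    return a, b


a, b = fit_simple_linear(x_values, y_values)

# Residual for each observation: observed Y minus the fitted value a + bX.
residuals = [y - (a + b * x) for x, y in zip(x_values, y_values)]

print(f"a = {a:.3f}, b = {b:.3f}")
print("residuals:", [round(e, 3) for e in residuals])
print("sum of residuals:", round(sum(residuals), 10))
```

With an intercept in the model, the least-squares residuals sum to zero up to rounding error.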

Check out the following video to learn more about simple linear regression:

https://corporatefinanceinstitute.com/assets/REG_C1L02-Simple-Linear-Regression.mp4

Regression Analysis – Multiple Linear Regression

Multiple linear regression analysis is essentially similar to the simple linear model,
with the exception that multiple independent variables are used in the model. The
mathematical representation of multiple linear regression is:

Y = a + bX1 + cX2 + dX3 + ϵ

Where:

Y – Dependent variable

X1, X2, X3 – Independent (explanatory) variables

a – Intercept

b, c, d – Slopes

ϵ – Residual (error)

Multiple linear regression follows the same conditions as the simple linear model.
However, since there are several independent variables in multiple linear analysis,
there is another mandatory condition for the model:

Non-collinearity: Independent variables should show minimal correlation with
each other. If the independent variables are highly correlated with each other, it will
be difficult to assess the true relationships between the dependent and independent
variables.
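
One simple way to screen for collinearity is to compute the pairwise correlation between the independent variables before fitting the model. The sketch below does this in plain Python. The data for X1, X2 and X3 are hypothetical, and the 0.8 cut-off is an assumed rule of thumb for illustration, not a value taken from the text.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical independent variables; X2 is almost exactly 2 * X1.
predictors = {
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [2, 4, 6, 8, 10, 12.5],
    "X3": [5, 3, 6, 2, 7, 4],
}

THRESHOLD = 0.8  # assumed cut-off for "highly correlated"

flagged = []
for (name_a, col_a), (name_b, col_b) in combinations(predictors.items(), 2):
    r = pearson(col_a, col_b)
    print(f"corr({name_a}, {name_b}) = {r:.3f}")
    if abs(r) > THRESHOLD:
        flagged.append((name_a, name_b))

print("highly correlated pairs:", flagged)
```

A flagged pair suggests that one of the two variables carries little independent information and may be a candidate for removal.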

Regression Analysis in Finance

Regression analysis comes with several applications in finance. For example, the
statistical method is fundamental to the Capital Asset Pricing Model (CAPM).
Essentially, the CAPM equation is a model that determines the relationship between
the expected return of an asset and the market risk premium.

The analysis is also used to forecast the returns of securities, based on different
factors, or to forecast the performance of a business.

1. Beta and CAPM

In finance, regression analysis is used to calculate the Beta (volatility of returns
relative to the overall market) for a stock. This can be done in Excel using the SLOPE
function.
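
As a rough illustration, Beta is the slope of the regression of the stock's returns on the market's returns, which equals their covariance divided by the variance of the market returns. The Python sketch below uses hypothetical return series. The helper name `slope` and its argument order (y values first, then x values) are chosen here to mirror Excel's SLOPE(known_y's, known_x's).

```python
def slope(ys, xs):
    """Slope of the least-squares line of ys on xs: cov(x, y) / var(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x


# Hypothetical periodic returns (as decimals), for illustration only.
market_returns = [0.010, -0.020, 0.015, 0.030, -0.010]
stock_returns = [0.012, -0.030, 0.020, 0.040, -0.015]

beta = slope(stock_returns, market_returns)
print(f"beta = {beta:.4f}")
```

A Beta above 1 indicates that, in this sample, the stock's returns moved more than the market's.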

2. Forecasting Revenues and Expenses

When forecasting financial statements for a company, it may be useful to do a
multiple regression analysis to determine how changes in certain assumptions or
drivers of the business will impact revenue or expenses in the future. For example,
there may be a very high correlation between the number of salespeople employed by
a company, the number of stores they operate, and the revenue the business generates.

Excel’s FORECAST function can be used in the same way, for example to calculate a
company’s revenue based on the number of ads it runs.

Regression Tools

Excel remains a popular tool to conduct basic regression analysis in finance. However,
there are many more advanced statistical tools that can be used.

Python and R are both powerful coding languages that have become popular for all
types of financial modeling, including regression. These techniques form a core part
of data science and machine learning, where models are trained to detect these
relationships in data.

The Linear Regression Equation


Linear regression is a way to model the relationship between two
variables. You might also recognize the equation as the slope
formula. The equation has the form Y= a + bX, where Y is the
dependent variable (that’s the variable that goes on the Y axis), X is
the independent variable (i.e. it is plotted on the X axis), b is
the slope of the line and a is the y-intercept.
The first step in finding a linear regression equation is to determine if
there is a relationship between the two variables. This is often a
judgment call for the researcher. You’ll also need a list of your data in
x-y format (i.e. two columns of data—independent and dependent
variables).

How to Find a Linear Regression Equation: Steps

Step 1: Make a chart of your data, filling in the columns in the same way
as you would fill in the chart if you were finding Pearson’s
Correlation Coefficient.

Subject   Age (x)   Glucose Level (y)   xy      x²

1         43        99                  4257    1849
2         21        65                  1365    441
3         25        79                  1975    625
4         42        75                  3150    1764
5         57        87                  4959    3249
6         59        81                  4779    3481

Σ         247       486                 20485   11409

From the above table, Σx = 247, Σy = 486, Σxy = 20485, Σx² =
11409, Σy² = 40022. n is the sample size (6, in our case).

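Step 2: Use the following equations to find a and b.

a = (Σy · Σx² – Σx · Σxy) / (n · Σx² – (Σx)²)
b = (n · Σxy – Σx · Σy) / (n · Σx² – (Σx)²)

For this data set, a = 65.1416 and b = 0.385225.

Find a:

a = ((486 × 11,409) – (247 × 20,485)) / (6(11,409) – 247²)
  = (5,544,774 – 5,059,795) / (68,454 – 61,009)
  = 484,979 / 7,445
  = 65.14

Find b:

b = (6(20,485) – (247 × 486)) / (6(11,409) – 247²)
  = (122,910 – 120,042) / (68,454 – 61,009)
  = 2,868 / 7,445
  = 0.385225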

Step 3: Insert the values into the equation.

y’ = a + bx
y’ = 65.14 + 0.385225x

That’s how to find a linear regression equation by hand!
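
The hand calculation can be checked with a short Python sketch. It rebuilds the sums Σx, Σy, Σxy and Σx² from the age and glucose columns of the table and applies the usual least-squares summation formulas for a and b. The function and variable names are illustrative assumptions.

```python
# Age (x) and glucose level (y) for the six subjects in the table above.
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]


def fit_line(xs, ys):
    """Return (a, b) for y' = a + b*x using the summation formulas.

    a = (sum_y * sum_x2 - sum_x * sum_xy) / (n * sum_x2 - sum_x**2)
    b = (n * sum_xy - sum_x * sum_y)      / (n * sum_x2 - sum_x**2)
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b


a, b = fit_line(ages, glucose)
print(f"y' = {a:.4f} + {b:.6f}x")
```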


Solved Example Problems for Regression Analysis

Example 9.9

Calculate the regression coefficients and obtain the lines of regression for the following
data

Solution:

Regression coefficient of X on Y
(i) Regression equation of X on Y

(ii) Regression coefficient of Y on X

(iii) Regression equation of Y on X


Y = 0.929X–3.716+11

= 0.929X+7.284

The regression equation of Y on X is Y= 0.929X + 7.284

Example 9.10

Calculate the two regression equations of X on Y and Y on X from the data given below,
taking deviations from the actual means of X and Y.

Estimate the likely demand when the price is Rs.20.

Solution:

Calculation of Regression equation


(i) Regression equation of X on Y

(ii) Regression Equation of Y on X


When X is 20, Y will be

= –0.25 (20)+44.25

= –5+44.25

= 39.25 (when the price is Rs. 20, the likely demand is 39.25)

Example 9.11

Obtain regression equation of Y on X and estimate Y when X=55 from the following

Solution:
(i) Regression coefficients of Y on X
(ii) Regression equation of Y on X

Y–51.57 = 0.942(X–48.29 )

Y = 0.942X–45.49+51.57

Y = 0.942X+6.08

The regression equation of Y on X is Y= 0.942X+6.08

Estimation of Y when X= 55

Y= 0.942(55)+6.08=57.89

Example 9.12

Find the means of X and Y variables and the coefficient of correlation between them from
the following two regression equations:

2Y–X–50 = 0

3Y–2X–10 = 0.

Solution:

We are given

2Y–X–50 = 0 ... (1)

3Y–2X–10 = 0 ... (2)

Solving equation (1) and (2)

We get Y = 90

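Putting the value of Y in equation (1)

We get X = 130

Since the two regression lines intersect at the means, X-Bar = 130 and Y-Bar = 90.

Calculating correlation coefficient

Let us assume equation (1) to be the regression equation of Y on X

2Y = X+50

Y = 0.5X+25, so byx = 0.5

Let us assume equation (2) to be the regression equation of X on Y

2X = 3Y–10

X = 1.5Y–5, so bxy = 1.5

r = √(byx × bxy) = √(0.5 × 1.5) = √0.75 = 0.866

NOTE

It may be noted that in the above problem one of the regression coefficients is
greater than 1 and the other is less than 1, and their product does not exceed 1.
Therefore our assumption about the given equations is correct.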
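
The same procedure can be sketched in Python. Each line is written as a triple (p, q, c) standing for pX + qY + c = 0. The means are the intersection of the two lines, and the assignment of "Y on X" and "X on Y" is accepted only if the product of the two regression coefficients lies between 0 and 1. The function names and the tuple representation are illustrative assumptions.

```python
import math


def solve_means(line1, line2):
    """Intersection of p*X + q*Y + c = 0 for both lines: (mean of X, mean of Y)."""
    p1, q1, c1 = line1
    p2, q2, c2 = line2
    det = p1 * q2 - p2 * q1
    x_mean = (q1 * c2 - q2 * c1) / det
    y_mean = (p2 * c1 - p1 * c2) / det
    return x_mean, y_mean


def correlation_from_lines(line1, line2):
    """Try both assignments and keep the one with 0 <= byx * bxy <= 1."""
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        p1, q1, _ = y_on_x
        p2, q2, _ = x_on_y
        b_yx = -p1 / q1  # Y = -(p/q) X - c/q
        b_xy = -q2 / p2  # X = -(q/p) Y - c/p
        product = b_yx * b_xy
        if 0 <= product <= 1:
            # r takes the common sign of the two regression coefficients.
            r = math.copysign(math.sqrt(product), b_yx)
            return b_yx, b_xy, r
    raise ValueError("no valid assignment of the two lines")


# Example 9.12: 2Y - X - 50 = 0 and 3Y - 2X - 10 = 0
line1 = (-1, 2, -50)
line2 = (-2, 3, -10)

x_mean, y_mean = solve_means(line1, line2)
b_yx, b_xy, r = correlation_from_lines(line1, line2)
print(f"means: X = {x_mean}, Y = {y_mean}")
print(f"byx = {b_yx}, bxy = {b_xy}, r = {r:.3f}")
```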

Example 9.13

Find the means of X and Y variables and the coefficient of correlation between them from
the following two regression equations:

4X–5Y+33 = 0
20X–9Y–107 = 0

Solution:

We are given

4X–5Y+33 = 0 ... (1)

20X–9Y–107 = 0 ... (2)

Solving equation (1) and (2)

We get Y = 17

Putting the value of Y in equation (1)

We get X = 13

Calculating correlation coefficient

Let us assume equation (1) to be the regression equation of X on Y

4X = 5Y–33, so bxy = 5/4 = 1.25

Let us assume equation (2) to be the regression equation of Y on X

9Y = 20X–107, so byx = 20/9 = 2.22

But this is not possible because both the regression coefficients are greater than 1.

So our above assumption is wrong. Therefore, treating equation (1) as the regression equation
of Y on X and equation (2) as the regression equation of X on Y, we get

5Y = 4X+33, so byx = 4/5 = 0.8

20X = 9Y+107, so bxy = 9/20 = 0.45

r = √(byx × bxy) = √(0.8 × 0.45) = √0.36 = 0.6

Example 9.14

The following table shows the sales and advertisement expenditure of a firm.
The coefficient of correlation is r = 0.9. Estimate the likely sales for a proposed advertisement
expenditure of Rs. 10 crores.

Solution:

When the advertisement expenditure is Rs. 10 crores, i.e., Y=10, then sales X=6(10)+4=64,
which implies that the estimated sales are 64.

Example 9.15

There are two series of index numbers P for price index and S for stock of the commodity.
The mean and standard deviation of P are 100 and 8 and of S are 103 and 4 respectively.
The correlation coefficient between the two series is 0.4. With these data obtain the
regression lines of P on S and S on P.

Solution:
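Let us consider X for price P and Y for stock S. Then the mean and SD of P are
X-Bar = 100 and σx = 8, and the mean and SD of S are Y-Bar = 103 and σy = 4.
The correlation coefficient between the series is r(X,Y) = 0.4.

Let the regression line of X on Y be

X – X-Bar = r (σx/σy) (Y – Y-Bar)

X – 100 = 0.4 (8/4) (Y – 103)

X = 0.8Y + 17.6

Let the regression line of Y on X be

Y – Y-Bar = r (σy/σx) (X – X-Bar)

Y – 103 = 0.4 (4/8) (X – 100)

Y = 0.2X + 83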
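
For this example, both lines follow from the regression coefficients bxy = r·σx/σy and byx = r·σy/σx together with the two means. A minimal Python sketch of that calculation, using an illustrative function name and return layout, is:

```python
def lines_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Return ((b_xy, c_xy), (b_yx, c_yx)) such that
    X = b_xy * Y + c_xy   (regression of X on Y)
    Y = b_yx * X + c_yx   (regression of Y on X)
    """
    b_xy = r * sd_x / sd_y
    b_yx = r * sd_y / sd_x
    c_xy = mean_x - b_xy * mean_y
    c_yx = mean_y - b_yx * mean_x
    return (b_xy, c_xy), (b_yx, c_yx)


# Example 9.15: price index P is X (mean 100, SD 8),
# stock index S is Y (mean 103, SD 4), r = 0.4.
x_on_y, y_on_x = lines_from_summary(100, 8, 103, 4, 0.4)

print(f"X on Y: X = {x_on_y[0]:.2f}Y + {x_on_y[1]:.2f}")
print(f"Y on X: Y = {y_on_x[0]:.2f}X + {y_on_x[1]:.2f}")
```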

Example 9.16

For 5 pairs of observations the following results are obtained: ∑X=15, ∑Y=25, ∑X²=55,
∑Y²=135, ∑XY=83. Find the equation of the lines of regression and estimate the value
of X on the first line when Y=12 and the value of Y on the second line if X=8.

Solution:
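Here N = 5, so X-Bar = 15/5 = 3 and Y-Bar = 25/5 = 5.

bxy = (N∑XY – ∑X∑Y) / (N∑Y² – (∑Y)²) = (415 – 375) / (675 – 625) = 0.8

byx = (N∑XY – ∑X∑Y) / (N∑X² – (∑X)²) = (415 – 375) / (275 – 225) = 0.8

Regression equation of X on Y

X–3 = 0.8(Y–5)

X = 0.8Y–1

When Y=12 the value of X is estimated as

= 0.8(12)–1

= 8.6

Regression equation of Y on X

Y–5 = 0.8(X–3)

Y = 0.8X+2.6

When X=8 the value of Y is estimated as

= 0.8(8)+2.6

= 9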
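
The five sums given in this example are enough to compute both regression lines. The Python sketch below applies the standard summation formulas for the two regression coefficients and then evaluates each line at the requested point. The function name and argument names are illustrative assumptions.

```python
def lines_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return ((b_xy, c_xy), (b_yx, c_yx)) such that
    X = b_xy * Y + c_xy   and   Y = b_yx * X + c_yx.
    """
    mean_x = sum_x / n
    mean_y = sum_y / n
    cross = n * sum_xy - sum_x * sum_y
    b_yx = cross / (n * sum_x2 - sum_x ** 2)  # regression coefficient of Y on X
    b_xy = cross / (n * sum_y2 - sum_y ** 2)  # regression coefficient of X on Y
    return (b_xy, mean_x - b_xy * mean_y), (b_yx, mean_y - b_yx * mean_x)


# Example 9.16: N=5, sum X=15, sum Y=25, sum X^2=55, sum Y^2=135, sum XY=83
x_on_y, y_on_x = lines_from_sums(5, 15, 25, 55, 135, 83)

x_when_y_12 = x_on_y[0] * 12 + x_on_y[1]
y_when_x_8 = y_on_x[0] * 8 + y_on_x[1]

print(f"X on Y: X = {x_on_y[0]:.1f}Y + ({x_on_y[1]:.1f}); X at Y=12: {x_when_y_12:.1f}")
print(f"Y on X: Y = {y_on_x[0]:.1f}X + {y_on_x[1]:.1f}; Y at X=8: {y_when_x_8:.1f}")
```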

Example 9.17

The two regression lines are 3X+2Y=26 and 6X+3Y=31. Find the correlation coefficient.

Solution:

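Let the regression equation of Y on X be

3X+2Y = 26

2Y = –3X+26

Y = –1.5X+13, so byx = –1.5

Let the regression equation of X on Y be

6X+3Y = 31

6X = –3Y+31

X = –0.5Y+5.17, so bxy = –0.5

Since both regression coefficients are negative, the correlation coefficient is also negative:

r = –√((–1.5) × (–0.5)) = –√0.75 = –0.866
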
Example 9.18

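In a laboratory experiment on correlation research study, the equations of the two regression
lines were found to be 2X–Y+1=0 and 3X–2Y+7=0. Find the means of X and Y. Also work
out the values of the regression coefficients and the correlation between the
two variables X and Y.

Solution:
Solving the two regression equations we get the mean values of X and Y

X-Bar = 5 and Y-Bar = 11

Let 2X–Y+1 = 0 be the regression equation of X on Y

2X = Y–1, so bxy = 1/2 = 0.5

Let 3X–2Y+7 = 0 be the regression equation of Y on X

2Y = 3X+7, so byx = 3/2 = 1.5

r = √(byx × bxy) = √(1.5 × 0.5) = √0.75 = 0.866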
Example 9.19

For the given lines of regression 3X–2Y=5 and X–4Y=7, find

(i) Regression coefficients

(ii) Coefficient of correlation

Solution:
(i) First convert the given equations Y on X and X on Y in standard form and find their
regression coefficients respectively.

Given regression lines are

3X–2Y = 5 ... (1)

X–4Y = 7 ... (2)

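Let the line of regression of X on Y be

3X–2Y = 5

3X = 2Y+5

X = (2/3)Y + 5/3, so bxy = 2/3

Let the line of regression of Y on X be

X–4Y = 7

4Y = X–7

Y = (1/4)X – 7/4, so byx = 1/4

(ii) Coefficient of correlation

Since the two regression coefficients are positive, the correlation coefficient is also
positive and it is given by

r = √(byx × bxy) = √(1/4 × 2/3) = √(1/6) = 0.408
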
Find a Linear Regression Equation in Excel.

Linear Regression Equation Microsoft Excel: Steps


Step 1: Install the Data Analysis Toolpak, if it isn’t already
installed. It can be enabled from Excel’s Add-ins options.

Step 2: Type your data into two columns in Excel. For example,
type your “x” data into column A and your “y” data into column B. Do
not leave any blank cells between your entries.

Step 3: Click the “Data” tab on the Excel toolbar, then click
“Data Analysis.”

Step 4: Click “Regression” in the pop-up window and then click
“OK.”

Step 5: Select your input Y range. You can do this in two ways: either
select the data in the worksheet or type the location of your data into
the “Input Y Range” box. For example, if your Y data is in B2 through
B10, type “B2:B10” into the Input Y Range box.

Step 6: Select your input X range by selecting the data in the
worksheet or typing the location of your data into the “Input X Range”
box.

Step 7: Select the location where you want your output range to
go by selecting a blank area in the worksheet or typing the location of
where you want your data to go in the “Output Range” box.

Step 8: Click “OK”. Excel will calculate the linear regression and
populate your worksheet with the results.

Tip: The linear regression equation information is given in the last
output set (the coefficients column). The first entry in the “Intercept”
row is “a” (the y-intercept) and the first entry in the “X Variable” row
is “b” (the slope).

Exercise 9.2

1. From the data given below

Find (a) the two regression equations, (b) the coefficient of correlation between marks in
Economics and Statistics, (c) the most likely marks in Statistics when the marks in
Economics are 30.

2. The heights (in cm) of a group of fathers and sons are given below
Find the lines of regression and estimate the height of son when the height of the father is
164 cm.

3. The following data give the height in inches (X) and the weight in lb. (Y) of a random
sample of 10 students from a large group of students of age 17 years:

Estimate the weight of a student of height 69 inches.

4. Obtain the two regression lines from the following data: N=20, ∑X=80, ∑Y=40,
∑X²=1680, ∑Y²=320 and ∑XY=480

5. Given the following data, what will be the possible yield when the rainfall is 29″?

Coefficient of correlation between rainfall and production is 0.8

6. The following data relate to advertisement expenditure (in lakh of rupees) and the
corresponding sales (in crores of rupees)

Estimate the sales corresponding to advertising expenditure of Rs. 30 lakh.


7. You are given the following data:

If the Correlation coefficient between X and Y is 0.66, then find (i) the two regression
coefficients, (ii) the most likely value of Y when X=10

8. Find the equation of the regression line of Y on X, if the observations (Xi, Yi) are the
following: (1, 4), (2, 8), (3, 2), (4, 12), (5, 10), (6, 14), (7, 16), (8, 6), (9, 18)

9. A survey was conducted to study the relationship between expenditure on
accommodation (X) and expenditure on Food and Entertainment (Y) and the following
results were obtained:

Write down the regression equation and estimate the expenditure on Food and
Entertainment, if the expenditure on accommodation is Rs. 200.

10. For 5 observations of pairs of (X, Y) of variables X and Y the following results are
obtained: ∑X=15, ∑Y=25, ∑X²=55, ∑Y²=135, ∑XY=83. Find the equation of the lines of
regression and estimate the values of X and Y if Y=8 ; X=12.

11. The two regression lines were found to be 4X–5Y+33=0 and 20X–9Y–107=0. Find the
mean values and coefficient of correlation between X and Y.

12. The equations of two lines of regression obtained in a correlation analysis are the
following 2X=8–3Y and 2Y=5–X . Obtain the value of the regression coefficients and
correlation coefficient.
