Regression Analysis
Regression analysis includes several variations, such as linear, multiple linear, and
nonlinear. The most common models are simple linear and multiple linear. Nonlinear
regression analysis is commonly used for more complicated data sets in which the
dependent and independent variables show a nonlinear relationship.
Linear regression analysis rests on the following assumptions:
a) The dependent and independent variables show a linear relationship, described by the
slope and the intercept.
b) The independent variable is not random.
c) The expected value (mean) of the residual (error) is zero.
d) The variance of the residual (error) is constant across all observations.
e) The residual (error) values are not correlated across observations.
f) The residual (error) values follow the normal distribution.
Simple linear regression is a model that assesses the relationship between a dependent
variable and an independent variable. The simple linear model is expressed using the
following equation:
Y = a + bX + ϵ
Where:
Y – Dependent variable
X – Independent (explanatory) variable
a – Intercept
b – Slope
ϵ – Residual (error)
Multiple linear regression analysis is essentially similar to the simple linear model,
with the exception that multiple independent variables are used in the model. The
mathematical representation of multiple linear regression is:
Y = a + bX1 + cX2 + dX3 + ϵ
Where:
Y – Dependent variable
X1, X2, X3 – Independent (explanatory) variables
a – Intercept
b, c, d – Slopes
ϵ – Residual (error)
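Multiple linear regression follows the same conditions as the simple linear model.
However, since there are several independent variables in multiple linear analysis,
there is another mandatory condition for the model: the independent variables should
not be highly correlated with one another (non-collinearity). If they are, it becomes
difficult to assess the true relationship between the dependent variable and each
independent variable.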
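Correlation among the independent variables can be screened with a pairwise Pearson
correlation. The sketch below is a minimal illustration, assuming three hypothetical
predictor series (x1, x2, x3) and an assumed rule-of-thumb cutoff of 0.8; neither the
data nor the cutoff comes from this article.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical predictors (illustrative values only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8]   # roughly 2 * x1, so strongly collinear with x1
x3 = [5.0, 1.0, 4.0, 2.0, 3.0]   # only weakly related to x1

THRESHOLD = 0.8  # assumed rule-of-thumb cutoff, not a fixed standard

for name, other in (("x2", x2), ("x3", x3)):
    r = pearson_r(x1, other)
    flag = "high" if abs(r) > THRESHOLD else "ok"
    print(f"corr(x1, {name}) = {r:.3f} ({flag})")
```

A pair flagged as "high" suggests that one of the two variables may need to be dropped
or combined with the other before fitting the model.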
Regression analysis comes with several applications in finance. For example, the
statistical method is fundamental to the Capital Asset Pricing Model (CAPM).
Essentially, the CAPM equation is a model that determines the relationship between
the expected return of an asset and the market risk premium.
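In CAPM, the asset's beta plays the role of the regression slope b: it is the slope
obtained when the asset's excess returns are regressed on the market's excess returns.
The sketch below only evaluates the CAPM relationship for given inputs; the function
name and the input values are hypothetical and chosen for illustration.

```python
def capm_expected_return(risk_free_rate, beta, expected_market_return):
    """Expected asset return under CAPM: Rf + beta * (E[Rm] - Rf)."""
    market_risk_premium = expected_market_return - risk_free_rate
    return risk_free_rate + beta * market_risk_premium

# Hypothetical inputs: 3% risk-free rate, beta of 1.2, 8% expected market return.
expected = capm_expected_return(risk_free_rate=0.03, beta=1.2,
                                expected_market_return=0.08)
print(f"{expected:.4f}")  # prints 0.0900
```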
The analysis is also used to forecast the returns of securities, based on different
factors, or to forecast the performance of a business. For example, the FORECAST
function in Excel can be used to estimate a company’s revenue based on the number of
ads it runs.
Regression Tools
Excel remains a popular tool to conduct basic regression analysis in finance. However,
there are many more advanced statistical tools that can be used.
Python and R are both powerful coding languages that have become popular for all
types of financial modeling, including regression. These techniques form a core part
of data science and machine learning where models are trained to detect these
relationships in data.
Step 1: Make a chart of your data, filling in the columns in the same way
as you would fill in the chart if you were finding Pearson’s
Correlation Coefficient.
Subject   x     y     xy      x²
1         43    99    4257    1849
2         21    65    1365    441
3         25    79    1975    625
4         42    75    3150    1764
5         57    87    4959    3249
6         59    81    4779    3481
Σ         247   486   20485   11409
Find a and b, then write the regression equation:
y’ = a + bx
y’ = 65.14 + 0.385225x
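The intercept a and slope b can be reproduced from the column totals of the chart. The
sketch below uses the standard least-squares summation formulas, which is an assumption
about the step the text performs; the (x, y) pairs are the six rows of the chart above.

```python
# (x, y) pairs taken from the chart above.
data = [(43, 99), (21, 65), (25, 79), (42, 75), (57, 87), (59, 81)]

n = len(data)
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_xy = sum(x * y for x, y in data)
sum_x2 = sum(x * x for x, _ in data)

# Least-squares formulas for the line y' = a + bx.
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # intercept
b = (n * sum_xy - sum_x * sum_y) / denominator        # slope

print(f"y' = {a:.2f} + {b:.6f}x")  # prints y' = 65.14 + 0.385225x
```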
Example 9.9
Calculate the regression coefficient and obtain the lines of regression for the following
data.
Solution:
Regression coefficient of X on Y
(i) Regression equation of X on Y
= 0.929X+7.284
Example 9.10
Calculate the two regression equations of X on Y and Y on X from the data given below,
taking deviations from the actual means of X and Y.
Solution:
= –0.25 (20)+44.25
= –5+44.25
= 39.25 (when the price is Rs. 20, the likely demand is 39.25)
Example 9.11
Obtain regression equation of Y on X and estimate Y when X=55 from the following
Solution:
(i) Regression coefficients of Y on X
(ii) Regression equation of Y on X
Y–51.57 = 0.942(X–48.29 )
Y = 0.942X–45.49+51.57
Y = 0.942X+6.08
Y= 0.942(55)+6.08=57.89
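The last few lines convert the regression line from its deviation form into
slope-intercept form and then substitute X=55. A minimal sketch of that arithmetic,
using the means and regression coefficient quoted in the solution above (the variable
and function names are illustrative):

```python
# Values quoted in the worked solution above.
mean_x, mean_y = 48.29, 51.57
b_yx = 0.942  # regression coefficient of Y on X

# Y - mean_y = b_yx * (X - mean_x)  =>  Y = b_yx * X + (mean_y - b_yx * mean_x)
intercept = mean_y - b_yx * mean_x

def predict_y(x):
    """Estimate Y for a given X from the Y-on-X regression line."""
    return b_yx * x + intercept

print(f"Y = {b_yx}X + {intercept:.2f}")     # prints Y = 0.942X + 6.08
print(f"Y(55) = {predict_y(55):.2f}")       # prints Y(55) = 57.89
```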
Example 9.12
Find the means of X and Y variables and the coefficient of correlation between them from
the following two regression equations:
2Y–X–50 = 0
3Y–2X–10 = 0.
Solution:
We are given
We get Y = 90
We get X = 130
Calculating correlation coefficient
2Y = X+50
NOTE
It may be noted that in the above problem one of the regression coefficients is
greater than 1 and the other is less than 1. Therefore our assumption about the given
equations is correct.
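The reasoning in this kind of problem can be sketched in code. Both regression lines
pass through the point of means, so solving them together gives the means; the
assignment of "Y on X" and "X on Y" is the one for which the product of the two
regression coefficients lies between 0 and 1, and r is the square root of that product
with the sign of the coefficients. The function name and the (a, b, c) encoding of a
line aX + bY + c = 0 are assumptions made for this illustration, and the sketch assumes
neither line is horizontal or vertical.

```python
from math import sqrt

def analyse_regression_lines(line1, line2):
    """Each line is (a, b, c), meaning aX + bY + c = 0.

    Returns (mean_x, mean_y, r). Both assignments of 'Y on X' and
    'X on Y' are tried; the one whose coefficient product lies in
    [0, 1] is kept.
    """
    a1, b1, c1 = line1
    a2, b2, c2 = line2

    # The lines intersect at (mean_x, mean_y); solve by Cramer's rule.
    det = a1 * b2 - a2 * b1
    mean_x = (b1 * c2 - b2 * c1) / det
    mean_y = (a2 * c1 - a1 * c2) / det

    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = -y_on_x[0] / y_on_x[1]   # slope when the line is solved for Y
        b_xy = -x_on_y[1] / x_on_y[0]   # slope when the line is solved for X
        product = b_yx * b_xy
        if 0 <= product <= 1:
            r = sqrt(product)
            if b_yx < 0:                # r takes the sign of the coefficients
                r = -r
            return mean_x, mean_y, r
    raise ValueError("no assignment gives a coefficient product in [0, 1]")

# Example 9.12: 2Y - X - 50 = 0 and 3Y - 2X - 10 = 0, written as aX + bY + c = 0.
mean_x, mean_y, r = analyse_regression_lines((-1, 2, -50), (-2, 3, -10))
print(mean_x, mean_y, round(r, 3))  # prints 130.0 90.0 0.866
```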
Example 9.13
Find the means of X and Y variables and the coefficient of correlation between them from
the following two regression equations:
4X–5Y+33 = 0
20X–9Y–107 = 0
Solution:
We are given
We get Y = 17
So our above assumption is wrong. Therefore, we treat equation (1) as the regression equation
of Y on X and equation (2) as the regression equation of X on Y. So we get
Example 9.14
The following table shows the sales and advertisement expenditure of a firm.
Coefficient of correlation r= 0.9. Estimate the likely sales for a proposed advertisement
expenditure of Rs. 10 crores.
Solution:
When the advertisement expenditure is Rs. 10 crores, i.e., Y=10, the sales are
X=6(10)+4=64, so the likely sales are 64.
Example 9.15
There are two series of index numbers P for price index and S for stock of the commodity.
The mean and standard deviation of P are 100 and 8 and of S are 103 and 4 respectively.
The correlation coefficient between the two series is 0.4. With these data obtain the
regression lines of P on S and S on P.
Solution:
Let X denote the price index P and Y denote the stock S. Then the mean and SD of P are
X̄ = 100 and σx = 8, and the mean and SD of S are Ȳ = 103 and σy = 4. The correlation
coefficient between the two series is r(X,Y) = 0.4.
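With only means, standard deviations, and r available, the regression coefficients
follow from the standard relations b(X on Y) = r·σx/σy and b(Y on X) = r·σy/σx. The
sketch below applies them to the figures of Example 9.15; the variable names are
illustrative.

```python
# Summary statistics from Example 9.15 (X = price index P, Y = stock S).
mean_x, sd_x = 100, 8
mean_y, sd_y = 103, 4
r = 0.4

b_xy = r * sd_x / sd_y   # regression coefficient of X on Y
b_yx = r * sd_y / sd_x   # regression coefficient of Y on X

# X - mean_x = b_xy * (Y - mean_y)  and  Y - mean_y = b_yx * (X - mean_x)
intercept_x_on_y = mean_x - b_xy * mean_y
intercept_y_on_x = mean_y - b_yx * mean_x

print(f"X on Y: X = {b_xy:.1f}Y + {intercept_x_on_y:.1f}")
print(f"Y on X: Y = {b_yx:.1f}X + {intercept_y_on_x:.1f}")
```

Under these relations the sketch gives P = 0.8S + 17.6 for the line of P on S and
S = 0.2P + 83 for the line of S on P.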
Example 9.16
For 5 pairs of observations the following results are obtained: ∑X=15, ∑Y=25, ∑X²=55,
∑Y²=135, ∑XY=83. Find the equations of the lines of regression and estimate the value
of X on the first line when Y=12 and the value of Y on the second line if X=8.
Solution:
Y–5 = 0.8(X–3)
Y = 0.8X+2.6
When X=8, Y = 0.8(8)+2.6
= 9
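Both regression lines of Example 9.16 can be rebuilt from the five sums alone. The
sketch below assumes the usual summation formulas for the regression coefficients; the
helper names x_from_y and y_from_x are illustrative.

```python
# Sums given in Example 9.16.
n = 5
sum_x, sum_y = 15, 25
sum_x2, sum_y2, sum_xy = 55, 135, 83

mean_x, mean_y = sum_x / n, sum_y / n
cross_term = n * sum_xy - sum_x * sum_y          # n∑XY - ∑X∑Y

b_yx = cross_term / (n * sum_x2 - sum_x ** 2)    # coefficient of Y on X
b_xy = cross_term / (n * sum_y2 - sum_y ** 2)    # coefficient of X on Y

def x_from_y(y):
    """Estimate X from the regression line of X on Y."""
    return mean_x + b_xy * (y - mean_y)

def y_from_x(x):
    """Estimate Y from the regression line of Y on X."""
    return mean_y + b_yx * (x - mean_x)

print(f"X when Y=12: {x_from_y(12):.1f}")   # prints X when Y=12: 8.6
print(f"Y when X=8:  {y_from_x(8):.1f}")    # prints Y when X=8:  9.0
```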
Example 9.17
The two regression lines are 3X+2Y=26 and 6X+3Y=31. Find the correlation coefficient.
Solution:
3X+2Y = 26
Example 9.18
In a laboratory experiment on a correlation research study, the equations of the two
regression lines were found to be 2X–Y+1=0 and 3X–2Y+7=0. Find the means of X and Y. Also
work out the values of the regression coefficients and the correlation between the two
variables X and Y.
Solution:
Solving the two regression equations we get mean values of X and Y
Example 9.19
Solution:
(i) First convert the given equations Y on X and X on Y in standard form and find their
regression coefficients respectively.
3X–2Y = 5
3X = 2Y+5
Coefficient of correlation
Since both regression coefficients are positive, the correlation coefficient is also
positive, and it is given by the positive square root of the product of the two
regression coefficients.
Find a Linear Regression Equation in Excel.
Step 2: Type your data into two columns in Excel. For example,
type your “x” data into column A and your “y” data into column B. Do
not leave any blank cells between your entries.
Step 7: Select the location where you want your output range to
go by selecting a blank area in the worksheet or typing the location of
where you want your data to go in the “Output Range” box.
Step 8: Click “OK”. Excel will calculate the linear regression and
populate your worksheet with the results.
Exercise 9.2
Find (a) the two regression equations, (b) the coefficient of correlation between marks in
Economics and Statistics, (c) the most likely marks in Statistics when the marks in
Economics are 30.
2. The heights (in cm) of a group of fathers and sons are given below.
Find the lines of regression and estimate the height of the son when the height of the
father is 164 cm.
3. The following data give the height in inches (X) and the weight in lb. (Y) of a random
sample of 10 students from a large group of students of age 17 years:
4. Obtain the two regression lines from the following data: N=20, ∑X=80, ∑Y=40,
∑X²=1680, ∑Y²=320 and ∑XY=480.
5. Given the following data, what will be the possible yield when the rainfall is 29″?
6. The following data relate to advertisement expenditure (in lakhs of rupees) and the
corresponding sales (in crores of rupees).
If the Correlation coefficient between X and Y is 0.66, then find (i) the two regression
coefficients, (ii) the most likely value of Y when X=10
8. Find the equation of the regression line of Y on X, if the observations (Xi, Yi) are the
following: (1, 4), (2, 8), (3, 2), (4, 12), (5, 10), (6, 14), (7, 16), (8, 6), (9, 18).
Write down the regression equation and estimate the expenditure on Food and
Entertainment, if the expenditure on accommodation is Rs. 200.
10. For 5 observations of pairs (X, Y) of variables X and Y the following results are
obtained: ∑X=15, ∑Y=25, ∑X²=55, ∑Y²=135, ∑XY=83. Find the equations of the lines of
regression and estimate the values of X and Y if Y=8; X=12.
11. The two regression lines were found to be 4X–5Y+33=0 and 20X–9Y–107=0 . Find the
mean values and coefficient of correlation between X and Y.
12. The equations of two lines of regression obtained in a correlation analysis are the
following 2X=8–3Y and 2Y=5–X . Obtain the value of the regression coefficients and
correlation coefficient.