
Ankit Bansal CGT19005

This document discusses using multiple linear regression to understand the correlation between various variables that impact car mileage. Regression analysis is performed using data on different car models, with mileage as the dependent variable and factors like displacement, horsepower, and weight as independent variables. The analysis finds the regression model to be statistically significant and explains over 80% of the variation in mileage.

Multiple regression analysis

Quantitative Techniques for Managerial Decision Making

Submitted By:
Ankit Bansal - CGT19005
1. Problem Definition
The data below describe different car models in the USA. Each record has the following fields.

Model Name: Make and model of the car
Mileage: Fuel economy, defined as the number of miles the car runs per gallon of fuel
Displacement: Engine displacement in cubic inches. Engine displacement is the volume swept by all
the pistons inside the cylinders of a reciprocating engine in a single movement from top dead centre
(TDC) to bottom dead centre (BDC).
Horsepower: Gross engine power, i.e. the maximum power that the engine can put out
Weight: Weight of the empty car in units of 1000 pounds

It is believed that the mileage of a car is correlated with its engine displacement, engine power and
weight. The study aims to find out whether such a relationship exists and to gain further insight into
its nature. The analysis is performed using the Data Analysis tool of Microsoft Excel.

Model Name | Mileage (miles per gallon) | Displacement (cubic inches) | Horsepower | Weight (1000 pounds)


Mazda RX4 21.0 160.0 110 2.620
Mazda RX4 Wag 21.0 160.0 110 2.875
Datsun 710 22.8 108.0 93 2.320
Hornet 4 Drive 21.4 258.0 110 3.215
Hornet Sportabout 18.7 360.0 175 3.440
Valiant 18.1 225.0 105 3.460
Duster 360 14.3 360.0 245 3.570
Merc 240D 24.4 146.7 62 3.190
Merc 230 22.8 140.8 95 3.150
Merc 280 19.2 167.6 123 3.440
Merc 280C 17.8 167.6 123 3.440
Merc 450SE 16.4 275.8 180 4.070
Merc 450SL 17.3 275.8 180 3.730
Merc 450SLC 15.2 275.8 180 3.780
Cadillac Fleetwoo 10.4 472.0 205 5.250
Lincoln Continent 10.4 460.0 215 5.424
Chrysler Imperial 14.7 440.0 230 5.345
Fiat 128 32.4 78.7 66 2.200
Honda Civic 30.4 75.7 52 1.615
Toyota Corolla 33.9 71.1 65 1.835
Toyota Corona 21.5 120.1 97 2.465
Dodge Challenger 15.5 318.0 150 3.520
AMC Javelin 15.2 304.0 150 3.435
Camaro Z28 13.3 350.0 245 3.840
Pontiac Firebird 19.2 400.0 175 3.845
Fiat X1-9 27.3 79.0 66 1.935
Porsche 914-2 26.0 120.3 91 2.140
Lotus Europa 30.4 95.1 113 1.513
Ford Pantera L 15.8 351.0 264 3.170
Ferrari Dino 19.7 145.0 175 2.770
Maserati Bora 15.0 301.0 335 3.570
Volvo 142E 21.4 121.0 109 2.780

2. Multiple Linear Regression
To understand the relationship between the variables, multiple linear regression was carried out using
the Data Analysis tool of MS Excel. Mileage was taken as the dependent variable (Y), and displacement
(X1), horsepower (X2) and weight (X3) were taken as the independent variables. The confidence level
was set at 95%. The results obtained are summarised below.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.909305308
R Square 0.826836142
Adjusted R Square 0.808282872
Standard Error 2.638930213
Observations 32

ANOVA
df SS MS F Significance F
Regression 3 931.0565128 310.3521709 44.56551986 8.64959E-11
Residual 28 194.9906747 6.963952669
Total 31 1126.047188

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 37.10550527 2.110815245 17.57875558 1.16194E-16 32.78169625 41.42931429 32.78169625 41.42931429
Displacement -0.000937009 0.010349745 -0.09053451 0.928507029 -0.0221375 0.020263482 -0.0221375 0.020263482
Horsepower -0.031156551 0.011435794 -2.724476326 0.010971032 -0.054581714 -0.007731388 -0.054581714 -0.007731388
Weight -3.800890583 1.066190639 -3.564925861 0.001330991 -5.984883102 -1.616898063 -5.984883102 -1.616898063
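
The fit itself was produced by Excel. As a cross-check, the same least-squares coefficients can be obtained outside Excel. The sketch below is only an illustration and not the method used in this report: it assumes the 32 rows of the data table are typed in as tuples, and the helper names (solve, fit_ols) are made up for this example. It solves the normal equations (X'X)b = X'y with plain Python.

```python
# Rows copied from the data table: (mileage, displacement, horsepower, weight)
DATA = [
    (21.0, 160.0, 110, 2.620), (21.0, 160.0, 110, 2.875),
    (22.8, 108.0, 93, 2.320), (21.4, 258.0, 110, 3.215),
    (18.7, 360.0, 175, 3.440), (18.1, 225.0, 105, 3.460),
    (14.3, 360.0, 245, 3.570), (24.4, 146.7, 62, 3.190),
    (22.8, 140.8, 95, 3.150), (19.2, 167.6, 123, 3.440),
    (17.8, 167.6, 123, 3.440), (16.4, 275.8, 180, 4.070),
    (17.3, 275.8, 180, 3.730), (15.2, 275.8, 180, 3.780),
    (10.4, 472.0, 205, 5.250), (10.4, 460.0, 215, 5.424),
    (14.7, 440.0, 230, 5.345), (32.4, 78.7, 66, 2.200),
    (30.4, 75.7, 52, 1.615), (33.9, 71.1, 65, 1.835),
    (21.5, 120.1, 97, 2.465), (15.5, 318.0, 150, 3.520),
    (15.2, 304.0, 150, 3.435), (13.3, 350.0, 245, 3.840),
    (19.2, 400.0, 175, 3.845), (27.3, 79.0, 66, 1.935),
    (26.0, 120.3, 91, 2.140), (30.4, 95.1, 113, 1.513),
    (15.8, 351.0, 264, 3.170), (19.7, 145.0, 175, 2.770),
    (15.0, 301.0, 335, 3.570), (21.4, 121.0, 109, 2.780),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows):
    """Return [a, b1, b2, b3] minimising the sum of squared residuals."""
    X = [[1.0, d, h, w] for _, d, h, w in rows]  # leading 1.0 is the intercept column
    y = [row[0] for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


coefs = fit_ols(DATA)
for name, value in zip(["Intercept", "Displacement", "Horsepower", "Weight"], coefs):
    print(f"{name}: {value:.6f}")
```

The printed values should agree with the Coefficients column of the Excel output to several decimal places.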

3. Interpreting Results (Regression Table)

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.909305308
R Square 0.826836142
Adjusted R Square 0.808282872
Standard Error 2.638930213
Observations 32

The population regression equation is given by

Y = A + B1X1 + B2X2 + B3X3 + e

where e is the random error term. The sample regression plane, which estimates it from the data, is
given by the equation

ŷ = a + b1x1 + b2x2 + b3x3

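Multiple R: The Multiple R value of 0.91 is the multiple correlation coefficient, i.e. the correlation
between the observed mileage and the mileage predicted by the model. It is the square root of R Square.
It is close to 1 and therefore indicates a very strong linear relationship between the dependent
variable and the independent variables taken together.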

R Square: This is R^2, the coefficient of determination. It is the proportion of the variation of the
y-values around their mean that is explained by the x-values. For example, a value of 80% means that
the regression accounts for 80% of that variation. It does not mean that 80% of the data points lie
on the regression plane.

R^2 = 1 - SSE / SST

Where,

SSE = error sum of squares = Σ(y - ŷ)^2

SST = total sum of squares = Σ(y - ybar)^2

From the Excel data analysis, R^2 = 0.83

This means that 83% of the variation in mileage is explained by the linear regression model
formulated. In other words, 83% of the variation of yi around ybar (its mean) is explained by the
regressors (Xs), i.e. displacement, horsepower and weight.
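
The reported R Square can be checked by hand from the sums of squares in the ANOVA table. A minimal sketch, assuming only those two table values:

```python
# Sums of squares copied from the ANOVA table in the Excel output.
sse = 194.9906747   # residual (error) sum of squares
sst = 1126.047188   # total sum of squares

r_squared = 1 - sse / sst
print(f"R^2 = {r_squared:.4f}")  # prints R^2 = 0.8268
```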

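Adjusted R Square: The adjusted R^2 corrects R^2 for the number of terms in the model, so that adding
a regressor raises it only if the regressor improves the fit by more than chance would. When adjusted
for degrees of freedom,

Adjusted R^2 = 1 - [SSE/(n-k-1)] / [SST/(n-1)]

Where,

SSE = error sum of squares = Σ(y - ŷ)^2

SST = total sum of squares = Σ(y - ybar)^2

n = no. of observations = 32

k = no. of independent variables = 3

k+1 = 4 (degrees of freedom lost on calculating a, b1, b2 and b3)

From the Excel data analysis, adjusted R^2 = 0.81.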
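
As a worked check of the formula above, the adjusted value can be recomputed from the ANOVA table. This is a sketch using the table values only, with n = 32 observations and k = 3 regressors:

```python
# Values copied from the Excel output.
sse = 194.9906747   # residual (error) sum of squares
sst = 1126.047188   # total sum of squares
n, k = 32, 3        # observations, independent variables

adj_r_squared = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
print(f"Adjusted R^2 = {adj_r_squared:.4f}")  # prints Adjusted R^2 = 0.8083
```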

Standard error (Sε): An estimate of the standard deviation of the error e. The standard error of the
regression measures the typical distance between the observed mileage and the mileage predicted by
the model, in the units of y (miles per gallon). The smaller it is, the closer the data lie to the
regression plane.

Sε = √(SSE/(n-k-1))

Where,

SSE = error sum of squares = Σ(y - ŷ)^2

n = no. of observations = 32

k+1 = 4 (degrees of freedom lost on calculating a, b1, b2 and b3)

From the Excel data analysis, the standard error Sε is found to be 2.64.
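
The same figure follows from the residual sum of squares in the ANOVA table. A small sketch, again assuming only the table values:

```python
import math

sse = 194.9906747   # residual (error) sum of squares from the ANOVA table
n, k = 32, 3        # observations, independent variables

s_e = math.sqrt(sse / (n - k - 1))   # divide by the residual degrees of freedom (28)
print(f"Standard error = {s_e:.2f}")  # prints Standard error = 2.64
```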

4. Interpreting ANOVA Table


ANOVA
df SS MS F Significance F
Regression 3 931.0565128 310.3521709 44.56551986 8.64959E-11
Residual 28 194.9906747 6.963952669
Total 31 1126.047188

ANOVA is used to investigate whether the dependent variable is linearly related to at least one of the
independent variables. To test this, we follow the process of hypothesis testing, where

H0: B1 = B2 = B3 = 0

Ha: At least one Bi is not equal to zero.

If at least one Bi is not equal to zero, the model has explanatory value and is considered valid.

To test these hypotheses, we perform an analysis of variance procedure.

F test: The F statistic is defined as

F = MSR / MSE

Where,

MSR = Mean square due to regression = SSR / k

MSE = Mean square due to error = SSE / (n-k-1)

SSR = regression sum of squares = Σ(ŷ - ybar)^2

From the ANOVA table,

SSR = 931.06

Hence, MSR = 931.06 / 3 = 310.35

SSE = 194.99

Hence, MSE = 194.99 / (32 - 3 - 1) = 194.99 / 28 = 6.96

F = 310.35 / 6.96 = 44.57

Fα,k,n-k-1 = F0.05,3,28 = 2.95

F value > Fα,k,n-k-1

Therefore, there is sufficient evidence to reject the null hypothesis in favor of the alternative
hypothesis. At least one of the Bi is not equal to zero. Thus, at least one independent variable is
linearly related to y, and the linear regression model is valid.
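
The F test arithmetic can be reproduced from the ANOVA table as follows. This is a sketch only: the critical value 2.95 for F(0.05; 3, 28) is read from a standard F table and typed in as a constant, not computed by the code.

```python
# Values copied from the ANOVA table.
ssr = 931.0565128   # regression sum of squares
sse = 194.9906747   # residual (error) sum of squares
n, k = 32, 3        # observations, independent variables

msr = ssr / k              # mean square due to regression, df = 3
mse = sse / (n - k - 1)    # mean square due to error, df = 28
f_stat = msr / mse

F_CRITICAL = 2.95  # assumed: tabulated F(0.05; 3, 28)
reject_h0 = f_stat > F_CRITICAL

print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```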

The validity of the model can also be tested using the p-value of the F statistic.

From the ANOVA table,

p-value (Significance F) = 8.65 × 10^-11

Clearly, p-value < α = 0.05

Hence, the null hypothesis is rejected, and the linear regression model is valid.

5. Interpreting Regression Coefficients Table
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 37.10550527 2.110815245 17.57875558 1.16194E-16 32.78169625 41.42931429 32.78169625 41.42931429
Displacement -0.000937009 0.010349745 -0.09053451 0.928507029 -0.0221375 0.020263482 -0.0221375 0.020263482
Horsepower -0.031156551 0.011435794 -2.724476326 0.010971032 -0.054581714 -0.007731388 -0.054581714 -0.007731388
Weight -3.800890583 1.066190639 -3.564925861 0.001330991 -5.984883102 -1.616898063 -5.984883102 -1.616898063

With the help of the above table, the equation of the estimated regression plane can be written as

y = 37.11 - 0.000937x1 - 0.0312x2 - 3.801x3

Interpreting the coefficients

a = 37.11: This is the intercept, the value of y when all the independent variables take the value zero.
However, it is not possible for any of the independent variables to have a value of zero, so the
intercept has no practical interpretation here.

b1 = -0.000937: In this model, for each additional cubic inch of displacement, mileage reduces by
0.000937 miles per gallon, holding horsepower and weight constant.

b2 = -0.0312: In this model, for each additional unit of horsepower, mileage decreases by
0.0312 miles per gallon, holding displacement and weight constant.

b3 = -3.801: In this model, for each additional 1000 pounds of weight, mileage decreases by about 3.8
miles per gallon, holding displacement and horsepower constant.
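
To see how the equation is used, the sketch below plugs one row of the data table (Mazda RX4) into the fitted coefficients. The function name predict_mileage is made up for this illustration; the coefficients are the Excel values.

```python
# Coefficients copied from the Excel regression table.
A = 37.10550527     # intercept
B1 = -0.000937009   # displacement (per cubic inch)
B2 = -0.031156551   # horsepower (per hp)
B3 = -3.800890583   # weight (per 1000 pounds)


def predict_mileage(displacement, horsepower, weight):
    """Predicted miles per gallon from the estimated regression plane."""
    return A + B1 * displacement + B2 * horsepower + B3 * weight


# Mazda RX4: displacement 160.0, horsepower 110, weight 2.620, observed mileage 21.0
predicted = predict_mileage(160.0, 110, 2.620)
residual = 21.0 - predicted   # observed minus predicted

print(f"Predicted = {predicted:.2f}, residual = {residual:.2f}")
```

For this car the model predicts roughly 23.6 miles per gallon against an observed 21.0, a difference of about one standard error of the regression.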

Test of statistical significance


Intercept: The intercept has an estimated standard error of 2.11, t-statistic of 17.58 and p-value of
1.16 × 10^-16.

It is therefore statistically significant at significance level α = 0.05, as p < 0.05.

Displacement: b1 has an estimated standard error of 0.010, t-statistic of -0.091 and p-value of 0.928.

It is therefore statistically insignificant at significance level α = 0.05, as p > 0.05.

Horsepower: b2 has an estimated standard error of 0.011, t-statistic of -2.72 and p-value of 0.011.

It is therefore statistically significant at significance level α = 0.05, as p < 0.05.

Weight: b3 has an estimated standard error of 1.07, t-statistic of -3.56 and p-value of 0.0013.

It is therefore statistically significant at significance level α = 0.05, as p < 0.05.
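
Each t-statistic is simply the coefficient divided by its standard error. The sketch below recomputes them from the Excel table and compares them with the two-sided critical value t(0.025, 28) = 2.048, which is taken from a standard t table and typed in as a constant rather than computed.

```python
# (coefficient, standard error) pairs copied from the Excel regression table.
COEFFICIENTS = {
    "Intercept": (37.10550527, 2.110815245),
    "Displacement": (-0.000937009, 0.010349745),
    "Horsepower": (-0.031156551, 0.011435794),
    "Weight": (-3.800890583, 1.066190639),
}

T_CRITICAL = 2.048  # assumed: tabulated t(0.025, 28) for a two-sided test at alpha = 0.05

t_stats = {name: b / se for name, (b, se) in COEFFICIENTS.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict})")
```

Comparing |t| with the critical value leads to the same decisions as comparing the p-values with α.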

Confidence interval for coefficients

The 95% confidence intervals for the coefficients are calculated as follows

CI = bi ± t(α/2, n-k-1) × Standard Error(bi) = bi ± t(0.025, 28) × Standard Error(bi)

where t(0.025, 28) = 2.048.

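The confidence intervals for the different coefficients are

a: [32.78, 41.43]
b1: [-0.022, 0.020]
b2: [-0.0546, -0.0077]
b3: [-5.985, -1.617]

The interval for b1 contains zero, which is consistent with displacement being statistically
insignificant. The intervals for b2 and b3 lie entirely below zero.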
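
These intervals can be recomputed from the coefficients and standard errors in the Excel table. In this sketch the critical value t(0.025, 28) = 2.048407 is taken from a t table and hard-coded, and the function name confidence_interval is made up for the illustration.

```python
T_CRITICAL = 2.048407  # assumed: tabulated t(0.025, 28)


def confidence_interval(b, se, t_crit=T_CRITICAL):
    """Return (lower, upper) bounds of the 95% interval b ± t_crit * se."""
    margin = t_crit * se
    return b - margin, b + margin


# Coefficient and standard error copied from the Excel regression table.
weight_ci = confidence_interval(-3.800890583, 1.066190639)
horsepower_ci = confidence_interval(-0.031156551, 0.011435794)
displacement_ci = confidence_interval(-0.000937009, 0.010349745)

for name, (lo, hi) in [("Displacement", displacement_ci),
                       ("Horsepower", horsepower_ci),
                       ("Weight", weight_ci)]:
    contains_zero = lo <= 0.0 <= hi
    print(f"{name}: [{lo:.4f}, {hi:.4f}] contains zero: {contains_zero}")
```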

6. Conclusion
Multiple linear regression was performed to understand the effect of displacement, engine
power, and car weight (independent variables) on car mileage (dependent variable). The values of R
and R square suggest a strong linear relationship between the dependent and independent variables.
The overall significance of the model was confirmed using the ANOVA F test. The estimated regression
equation is

Mileage = 37.11 - 0.000937*displacement - 0.0312*horsepower - 3.801*weight

The results also suggest that the impact of engine displacement on mileage is statistically
insignificant, while engine power and car weight significantly impact mileage. Confidence intervals
at the 95% confidence level were also obtained for the coefficients.
