Ankit Bansal CGT19005
Submitted By:
Ankit Bansal - CGT19005
1. Problem Definition
The following data correspond to different car models in the USA.
2. Multiple Linear Regression
To understand the relationship between the variables, multiple linear regression was performed using
the Data Analysis tool of MS Excel. Mileage was taken as the dependent variable (Y), and displacement (X1),
horsepower (X2) and weight (X3) were taken as the independent variables. The confidence level was set at
95%. The following is a summary of the results obtained.
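The least-squares fit that a regression tool performs can be illustrated with a minimal sketch. The function name `fit_ols` and the toy data below are assumptions made for illustration only; the toy data are made up, are not the car data, and this is not a description of how Excel computes its results internally.

```python
def fit_ols(rows, y):
    """Return least-squares coefficients [intercept, b1, ..., bk]
    by solving the normal equations (X'X) b = X'y."""
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Made-up toy data (NOT the car data): y = 1 + 2*x1 + 3*x2 exactly.
toy_x = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
toy_y = [1 + 2 * a + 3 * b for a, b in toy_x]
coeffs = fit_ols(toy_x, toy_y)
print([round(c, 6) for c in coeffs])
```

With the real data, each row would hold the displacement, horsepower and weight of one car, and `y` would hold the corresponding mileage values.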
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.909305308
R Square 0.826836142
Adjusted R Square 0.808282872
Standard Error 2.638930213
Observations 32
ANOVA
df SS MS F Significance F
Regression 3 931.0565128 310.3521709 44.56551986 8.64959E-11
Residual 28 194.9906747 6.963952669
Total 31 1126.047188
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 37.10550527 2.110815245 17.57875558 1.16194E-16 32.78169625 41.42931429 32.78169625 41.42931429
Displacement -0.000937009 0.010349745 -0.09053451 0.928507029 -0.0221375 0.020263482 -0.0221375 0.020263482
Horsepower -0.031156551 0.011435794 -2.724476326 0.010971032 -0.054581714 -0.007731388 -0.054581714 -0.007731388
Weight -3.800890583 1.066190639 -3.564925861 0.001330991 -5.984883102 -1.616898063 -5.984883102 -1.616898063
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.909305308
R Square 0.826836142
Adjusted R Square 0.808282872
Standard Error 2.638930213
Observations 32
Multiple R: The Multiple R value of 0.91 is the multiple correlation coefficient, i.e. the correlation
between the observed values of the dependent variable and the values predicted from the independent
variables. It is close to 1 and therefore indicates a very strong linear relationship.
R Square: This is R², the Coefficient of Determination. It tells us what proportion of the variation in
the dependent variable is explained by the regression. For example, 80% means that 80% of the
variation of y-values around the mean is explained by the x-values.
R² = 1 - SSE/Σ(yi - ybar)²
Where,
SSE = error sum of squares = Σ(yi - ŷi)² = 194.99
Σ(yi - ybar)² = total sum of squares = 1126.05
R² = 1 - 194.99/1126.05 = 0.83
This means that 83% of the variation in mileage is explained by
the linear regression model formulated. In other words, 83% of the variation of yi
around ybar (its mean) is explained by the regressors (Xs), i.e. displacement, horsepower and weight.
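As a quick check, R Square and Multiple R can be recomputed from the sums of squares reported in the ANOVA table. This is only a sketch of the arithmetic; the variable names are chosen here for illustration.

```python
import math

# Sums of squares as reported in the ANOVA table.
ss_regression = 931.0565128
ss_residual = 194.9906747           # SSE
ss_total = ss_regression + ss_residual

# R Square = 1 - SSE / total sum of squares.
r_squared = 1 - ss_residual / ss_total
# Multiple R is the square root of R Square.
multiple_r = math.sqrt(r_squared)

print(round(r_squared, 4), round(multiple_r, 4))
```

Both values should agree with the Regression Statistics table up to rounding.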
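Adjusted R Square: The adjusted R² adjusts for the number of terms in a model. Unlike R², it does not
automatically increase when another independent variable is added.
Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) = 0.81
Where,
n = no. of observations = 32
k = no. of independent variables = 3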
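The adjusted value in the summary output can be reproduced with the standard adjustment formula, 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below assumes k = 3 independent variables, as stated in the model setup; the variable names are illustrative.

```python
# Values taken from the Regression Statistics table.
r_squared = 0.826836142
n = 32   # number of observations
k = 3    # number of independent variables (displacement, horsepower, weight)

# Penalise R Square for the number of regressors used.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(adjusted_r_squared, 4))
```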
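Standard error (Sε): An estimate of the standard deviation of the error ε. The standard error of the
regression measures the typical distance between the observed values and the values predicted by the
model, in the units of the dependent variable.
Sε = √(SSE/(n-k-1))
Where,
SSE = error sum of squares = Σ(y – ŷ)² = 194.99
n = no. of observations = 32
k = no. of independent variables = 3
Sε = √(194.99/28) = 2.64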
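The standard error reported by Excel can be checked from the residual sum of squares in the ANOVA table. A minimal sketch, assuming k = 3 independent variables:

```python
import math

sse = 194.9906747   # residual (error) sum of squares from the ANOVA table
n = 32              # number of observations
k = 3               # number of independent variables

# Standard error of the regression: sqrt(SSE / residual degrees of freedom).
residual_df = n - k - 1
standard_error = math.sqrt(sse / residual_df)

print(residual_df, round(standard_error, 4))
```

The residual degrees of freedom computed here match the df of the Residual row in the ANOVA table.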
ANOVA is used to investigate whether the dependent variable is linearly related to at least one of the
independent variables. To test this, we follow the process of hypothesis testing where
H0: B1 = B2 = B3 = 0
H1: At least one Bi is not equal to zero
F = MSR / MSE
Where,
MSR = SSR/k and MSE = SSE/(n-k-1)
SSR = 931.06
SSE = 194.99
F = 310.35/6.96 = 44.57
The Significance F (the p-value of the F test) is 8.65E-11, which is far below 0.05.
Therefore, there is sufficient evidence to reject the null hypothesis in favor of the alternative
hypothesis. At least one of the Bi is not equal to zero. Thus, at least one independent variable is
linearly related to y, and the linear regression model is valid.
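The F statistic above can be recomputed from the ANOVA sums of squares and degrees of freedom. In this sketch the Significance F is taken from the Excel output rather than computed, and the 0.05 threshold corresponds to the 95% confidence level used in the analysis.

```python
# Values from the ANOVA table.
ssr, df_regression = 931.0565128, 3
sse, df_residual = 194.9906747, 28
significance_f = 8.64959e-11   # p-value of the F test, as reported by Excel
alpha = 0.05                   # 1 - confidence level

msr = ssr / df_regression      # mean square due to regression
mse = sse / df_residual        # mean square error
f_statistic = msr / mse

# Reject H0 (all slope coefficients are zero) when the p-value is below alpha.
reject_h0 = significance_f < alpha

print(round(f_statistic, 2), reject_h0)
```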
Also, the validity of model can be tested using the following method
5. Interpreting Regression Coefficients Table
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 37.10550527 2.110815245 17.57875558 1.16194E-16 32.78169625 41.42931429 32.78169625 41.42931429
Displacement -0.000937009 0.010349745 -0.09053451 0.928507029 -0.0221375 0.020263482 -0.0221375 0.020263482
Horsepower -0.031156551 0.011435794 -2.724476326 0.010971032 -0.054581714 -0.007731388 -0.054581714 -0.007731388
Weight -3.800890583 1.066190639 -3.564925861 0.001330991 -5.984883102 -1.616898063 -5.984883102 -1.616898063
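With the help of the above table, the equation of the estimated regression plane can be written as
ŷ = 37.105 - 0.000937 X1 - 0.0312 X2 - 3.801 X3
where ŷ is the estimated mileage, X1 is displacement, X2 is horsepower and X3 is weight.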
b1 = -0.000937: In this model, for each additional unit of displacement, the estimated mileage reduces by
0.000937 miles per gallon, holding horsepower and weight constant.
b2 = -0.0312: In this model, for each additional unit of engine power, the estimated mileage decreases by
0.0312 miles per gallon, holding displacement and weight constant.
b3 = -3.801: In this model, for each additional unit of weight, the estimated mileage decreases by 3.8 miles
per gallon, holding displacement and horsepower constant.
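The interpretation of the coefficients can be illustrated by evaluating the fitted equation directly. In the sketch below, the function name `predict_mileage` and the input values are hypothetical, chosen only to show the arithmetic; the inputs must be expressed in the same units as the original data, which are not restated here.

```python
# Coefficients from the regression output.
INTERCEPT = 37.10550527
B_DISPLACEMENT = -0.000937009
B_HORSEPOWER = -0.031156551
B_WEIGHT = -3.800890583

def predict_mileage(displacement, horsepower, weight):
    """Estimated mileage from the fitted regression equation."""
    return (INTERCEPT
            + B_DISPLACEMENT * displacement
            + B_HORSEPOWER * horsepower
            + B_WEIGHT * weight)

# Hypothetical car, for illustration only.
base = predict_mileage(displacement=160, horsepower=110, weight=2.62)
# Same car with one additional unit of weight, other inputs held constant.
heavier = predict_mileage(displacement=160, horsepower=110, weight=3.62)

print(round(base, 2), round(heavier - base, 2))
```

The difference between the two predictions equals b3, which is what "holding the other variables constant" means in practice.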
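Displacement: b1 has an estimated standard error of 0.010, a t-statistic of -0.091 and a p-value of 0.928.
Since the p-value is greater than 0.05, b1 is not statistically significant.
Horsepower: b2 has an estimated standard error of 0.011, a t-statistic of -2.72 and a p-value of 0.011.
Since the p-value is less than 0.05, b2 is statistically significant.
Weight: b3 has an estimated standard error of 1.07, a t-statistic of -3.56 and a p-value of 0.0013.
Since the p-value is less than 0.05, b3 is statistically significant.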
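Each t-statistic in the table is the coefficient divided by its standard error, which can be verified as follows. The p-values are taken from the Excel output rather than recomputed, and the 0.05 threshold is the one implied by the 95% confidence level.

```python
# (coefficient, standard error, reported p-value) from the coefficients table.
reported = {
    "Displacement": (-0.000937009, 0.010349745, 0.928507029),
    "Horsepower": (-0.031156551, 0.011435794, 0.010971032),
    "Weight": (-3.800890583, 1.066190639, 0.001330991),
}
alpha = 0.05

t_stats = {}
significant = {}
for name, (coef, std_err, p_value) in reported.items():
    t_stats[name] = coef / std_err          # t = b / SE(b)
    significant[name] = p_value < alpha     # two-sided test at the 5% level
    print(name, round(t_stats[name], 3), significant[name])
```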
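The 95% confidence intervals for the coefficients are
a: [32.78, 41.43]
b1: [-0.022, 0.020]
b2: [-0.055, -0.0077]
b3: [-5.985, -1.617]
The interval for b1 contains zero, which is consistent with displacement not being statistically
significant, while the intervals for b2 and b3 lie entirely below zero.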
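These intervals follow from coefficient ± t × standard error. The sketch below assumes the two-sided 95% critical value of the t distribution with 28 degrees of freedom, about 2.0484, taken from a standard t-table rather than computed.

```python
# Two-sided 95% critical value of t with 28 degrees of freedom (from a t-table).
T_CRITICAL = 2.048407

# (coefficient, standard error) from the coefficients table.
estimates = {
    "Intercept": (37.10550527, 2.110815245),
    "Displacement": (-0.000937009, 0.010349745),
    "Horsepower": (-0.031156551, 0.011435794),
    "Weight": (-3.800890583, 1.066190639),
}

intervals = {}
for name, (coef, std_err) in estimates.items():
    margin = T_CRITICAL * std_err
    intervals[name] = (coef - margin, coef + margin)
    # An interval that contains zero means the coefficient is not
    # significantly different from zero at the 5% level.
    contains_zero = intervals[name][0] <= 0 <= intervals[name][1]
    print(name, round(intervals[name][0], 4), round(intervals[name][1], 4), contains_zero)
```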
6. Conclusion
Multiple linear regression was performed to understand the effect of displacement, engine
power, and car weight (independent variables) on car mileage (dependent variable). The values of R
and R square suggest a strong linear relationship between the dependent and independent variables. The
model was validated using ANOVA. The estimated regression equation is given by
ŷ = 37.105 - 0.000937 X1 - 0.0312 X2 - 3.801 X3
where ŷ is the estimated mileage, X1 is displacement, X2 is horsepower and X3 is weight.
The results also suggest that the impact of engine displacement on mileage is statistically insignificant,
while engine power and car weight significantly impact car mileage. Confidence intervals at the 95%
confidence level for the different coefficients were also analyzed.