
Application of Linear Regression

The document summarizes the application of a linear regression model to analyze factors influencing annual road accident deaths in several US states. It tests the assumptions of linearity, multivariate normality, lack of multicollinearity, lack of autocorrelation, and homoscedasticity. Most assumptions are met, though multivariate normality is only partially satisfied. A regression finds a statistically significant relationship between deaths and several predictors like number of drivers and road length. Population density and fuel consumption have negative relationships with deaths.

Uploaded by

wasifa

Jahangirnagar University

Department/Institute:
1st year 1st semester, Masters Final Examination-2020

Assignment for Final Examination


Course No.# BUS 502
Course Title# Advanced Research Methodology
Name of the Student:
Class Roll No. #
07

Examination Roll No. #

Registration No. #

Academic Session #
2019-2020

Total number of written pages in the assignment

Date of Submission: 31st July, 2021

Instructions:
1. Don’t copy from others’ assignments. Copying from others will be punished severely.

2. The student must submit the assignment online (Google classroom/email/google form
etc.) as the course-teacher prescribes.

3. You must use your name# your EXAM ID only for naming your submitted file.
Application of Linear Regression

The data used here come from an analysis of annual deaths in road accidents across half of the US states.
The dependent variable is the number of deaths in road accidents. The independent variables are the number
of drivers, population density in people per square mile, length of rural roads in 1000s of miles, average
maximum daily temperature in January, and fuel consumption in 10,000,000 US gallons per year.
A multiple regression analysis has been conducted on these data. Before the results of the regression are
interpreted, the following assumptions need to be tested.

Assumption 1: linear relationship


The relationship between each independent variable and the dependent variable needs to be linear. There
are five independent variables, so to test this assumption the relation of each independent variable with
the dependent variable is shown separately in a scatter diagram.
Relation between road accidents and number of drivers

From the scatter diagram we can see that the relationship between the independent variable (IV) and the
dependent variable (DV) can be modeled by a straight line, which means there is a linear relation between
the variables.
Relation between road accidents and population density per square mile
Here, from the scatter diagram, we can see that a straight line can be drawn through the points, though
some of them lie far from the line, which means they have large residuals. Most of the points nevertheless
show a linear relationship between the independent and dependent variable.
Relation between road accidents and length of rural roads

Here, from the scatter diagram, we can see that the points are scattered almost randomly. A single line
can be drawn through them, but only a few of the points lie near that line. So, it can be said that the
independent and dependent variables show only weak linearity.
Relation between road accidents and Average maximum daily temperature

Here, from the scatter diagram, we can see that the points are scattered randomly and the relationship
between IV and DV cannot be modeled by a straight line, suggesting that the relation between the variables
is not linear.
Relation between road accidents and Fuel consumption

Here, from the scatter diagram, we can see that most of the points follow a straight line, which suggests
that there is a linear relationship between the independent and dependent variable.
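
The visual checks above can be complemented with a number. The sketch below computes Pearson's correlation
coefficient r, which measures the strength of a linear association. It is not part of the original analysis,
and the values in it are hypothetical, not the assignment's data. A value of r near zero only says there is
no linear trend; it does not rule out a curved relationship, so the scatter diagram is still needed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical illustration values (NOT the data used in the assignment).
drivers = [10, 20, 30, 40, 50]
temperature = [30, 55, 25, 60, 40]
deaths = [12, 19, 33, 38, 52]

r_drivers = pearson_r(drivers, deaths)      # points close to a straight line
r_temp = pearson_r(temperature, deaths)     # points scattered with no clear trend

print(f"r(drivers, deaths)     = {r_drivers:.2f}")
print(f"r(temperature, deaths) = {r_temp:.2f}")
```

In this made-up example the first pair gives r close to 1 and the second gives r close to 0, which
mirrors the difference between a clearly linear scatter and a random one.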

Assumption 2: multivariate normality


Linear regression requires all variables to be multivariate normal; in practice this is checked through the
normality of the residuals. This assumption is tested with a P-P plot. In the plot, the closer the dots lie
to the diagonal line, the closer to normal the residuals are distributed.

Here, from the P-P plot, we can see that the data points stay near the diagonal line, but the majority of
them do not lie exactly on it. This indicates that the residuals are not perfectly normally distributed, so
the data set shows multivariate normality only to some extent.
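
To make the P-P plot concrete, here is a minimal sketch of how its coordinates can be computed. Each
residual is paired with its observed cumulative probability and with the cumulative probability expected
under a normal distribution. The residuals are hypothetical, and the plotting position (i - 0.5) / n is an
assumption; statistical packages may use a slightly different formula.

```python
from statistics import NormalDist, mean, pstdev

def pp_points(residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    dist = NormalDist(mean(residuals), pstdev(residuals))
    ordered = sorted(residuals)
    n = len(ordered)
    points = []
    for i, value in enumerate(ordered, start=1):
        observed = (i - 0.5) / n      # empirical cumulative probability (assumed formula)
        expected = dist.cdf(value)    # cumulative probability if residuals were normal
        points.append((observed, expected))
    return points

# Hypothetical residuals (NOT taken from the assignment's model).
residuals = [-1.8, -0.9, -0.4, -0.1, 0.2, 0.5, 1.0, 1.5]
points = pp_points(residuals)

# Largest vertical distance from the diagonal line y = x.
max_gap = max(abs(observed - expected) for observed, expected in points)
print(f"largest distance from the diagonal: {max_gap:.3f}")
```

Points that lie on the diagonal have observed equal to expected, so a small largest distance corresponds
to dots hugging the line in the plot.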

Assumption 3: No or little multicollinearity


This assumption states that the predictors should not be highly correlated with each other; only a small
amount of correlation between the independent variables is acceptable. Multicollinearity occurs when the
independent variables are not independent of each other. In addition, the error term must be independent of
the independent variables. Here multicollinearity is tested in two ways.
First, the correlation table is used to check that the predictors are not highly correlated. Correlations
of more than 0.8 are problematic. Here almost all of the predictors show correlations below 0.8. Only one of
the variables shows a correlation above 0.8, and this problem can be solved by removing that variable.
We can also test the assumption using the coefficient table.

In the coefficient table, the VIF and tolerance statistics are used to assess the assumption. For the
assumption to be met, the VIF should be below 10 and the tolerance should be above 0.2. From the output we
can see that all the VIF values are below 10 and all the tolerance values are above 0.2. So, it can be said
that there is no serious multicollinearity in the data.
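
As a rough illustration of where the VIF and tolerance values come from, the sketch below regresses each
predictor on the remaining predictors and uses the resulting R-squared: tolerance = 1 - R-squared and
VIF = 1 / tolerance. It assumes NumPy is available, the predictors are simulated (not the assignment's
data), and the function name is chosen only for this example.

```python
import numpy as np

def vif_and_tolerance(X):
    """Return (vif, tolerance) arrays with one entry per column of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vif = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r_squared = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vif[j] = 1.0 / (1.0 - r_squared)
    return vif, 1.0 / vif

# Simulated predictors (hypothetical): x3 is almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)

vif, tolerance = vif_and_tolerance(np.column_stack([x1, x2, x3]))
for name, v, t in zip(["x1", "x2", "x3"], vif, tolerance):
    print(f"{name}: VIF = {v:.1f}, tolerance = {t:.3f}")
```

In this simulated case x1 and x3 would fail the VIF-below-10 rule while x2 passes, which is the pattern
the coefficient table is checked for.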
Assumption 4: No auto-correlation
Autocorrelation occurs when the residuals are not independent of each other, in other words when the value
of y(x+1) is not independent of the value of y(x). Here, autocorrelation has been tested with the
Durbin-Watson test. The Durbin-Watson statistic, D, can take values between 0 and 4. When the value of D is
around 2, it means there is no autocorrelation. A value well below 2 indicates positive autocorrelation,
and a value well above 2 indicates negative autocorrelation.

Here, from the model summary we can see that the Durbin-Watson value is 2.07 which is around 2 and
this indicates that there is no auto-correlation in the data set.
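
For reference, the Durbin-Watson statistic is the sum of squared differences between consecutive residuals
divided by the sum of squared residuals. The sketch below computes it for two hypothetical residual
sequences (not the assignment's residuals) to show how the value moves away from 2.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residual sequences, for illustration only.
alternating = [1, -1, 1, -1, 1, -1]   # sign flips every step: negative autocorrelation
trending = [1, 1, 1, -1, -1, -1]      # long runs of the same sign: positive autocorrelation

dw_alternating = durbin_watson(alternating)  # 20/6, about 3.33 (above 2)
dw_trending = durbin_watson(trending)        # 4/6, about 0.67 (below 2)

print(f"alternating residuals: D = {dw_alternating:.2f}")
print(f"trending residuals:    D = {dw_trending:.2f}")
```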

Assumption 5: Homoscedasticity
For linear regression it is important for the data to be homoscedastic, which means the residuals should
have similar dispersion around the regression line at every point. So basically, it is an assumption that
the variation in the residuals is similar at each point across the model. It can also be said that the
spread of the residuals should be fairly constant across the values of the independent variables. The graph
plots the standardized values predicted by the model against the standardized residuals. As the predicted
values increase, the variation in the residuals should stay roughly the same. If the assumption holds, the
plot should look like a random array of dots, but if the plot has a funnel shape, it is likely that the
assumption is violated and the data are heteroscedastic.
From the scatter plot, it is a bit difficult to judge homoscedasticity because the data set is small. But
the pattern generally appears more random than funnel-shaped, so it can be said that the data set is not
strongly heteroscedastic and shows homoscedasticity to some extent.
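
When the plot is hard to read, a crude numeric companion is to compare the spread of the residuals for low
and high predicted values. The sketch below does this with a simple variance ratio. This is a rough
heuristic in the spirit of a split-sample comparison, not the method used in this assignment and not a
formal test; the values are hypothetical.

```python
from statistics import pvariance

def spread_ratio(fitted, residuals):
    """Residual variance in the upper half of fitted values divided by the lower half."""
    pairs = sorted(zip(fitted, residuals))      # order observations by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return pvariance(high) / pvariance(low)

# Hypothetical values, for illustration only.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
even_residuals = [1, -1, 1, -1, 1, -1, 1, -1]                    # constant spread
funnel_residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]  # spread grows with fitted value

ratio_even = spread_ratio(fitted, even_residuals)
ratio_funnel = spread_ratio(fitted, funnel_residuals)

print(f"constant spread: ratio = {ratio_even:.1f}")
print(f"funnel shape:    ratio = {ratio_funnel:.1f}")
```

A ratio near 1 is what a random array of dots would give, while a ratio far from 1 corresponds to the
funnel shape described above. With a small data set the ratio is itself noisy, so it should be read with
the same caution as the plot.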

Regression analysis:
Before conducting regression analysis, the data set should meet the above-mentioned assumptions. From the
assumption tests, it has been found that the data set is suitable for regression analysis, as it meets most
of the assumptions, although some of them (such as multivariate normality) are satisfied only partially.
Hypothesis development
Null hypothesis, H0: There is no relation between annual deaths in road accidents in the US and the number
of drivers, population density, length of rural roads, average maximum daily
temperature in January, and fuel consumption.
Hypothesis test:
To test the hypothesis, a multiple regression analysis has been conducted.
The regression output shows that the model is statistically significant, as the p-value is lower than .05
(.0 < .05). The null hypothesis is therefore rejected, and it can be concluded that there is a relation
between annual deaths in road accidents in the US and the set of predictors: number of drivers, population
density, length of rural roads, average maximum daily temperature in January, and fuel consumption.
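
The overall significance test above is based on the F statistic, which compares the variation explained by
the model with the residual variation. The sketch below shows the calculation on simulated data; it assumes
NumPy is available and does not use the assignment's data or reproduce its output. The p-value is then the
upper-tail probability of an F distribution with k and n - k - 1 degrees of freedom, which can be obtained,
for example, with scipy.stats.f.sf if SciPy is installed.

```python
import numpy as np

def ols_f_statistic(X, y):
    """Return (F statistic, R-squared) for an OLS regression of y on X with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((y - fitted) ** 2)           # unexplained variation
    ss_reg = np.sum((fitted - y.mean()) ** 2)    # variation explained by the model
    f_stat = (ss_reg / k) / (ss_res / (n - k - 1))
    r_squared = ss_reg / (ss_reg + ss_res)
    return f_stat, r_squared

# Simulated data (hypothetical): two predictors, 30 observations.
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=30)

f_stat, r_squared = ols_f_statistic(X, y)
print(f"F = {f_stat:.1f}, R-squared = {r_squared:.3f}")
```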

The beta coefficient tells us by how many units the dependent variable changes for a one-unit increase in
an independent variable, with the other independent variables held constant. From the coefficient table, we
can see that the beta values for population density and fuel consumption are negative, which means they
have a negative relation with annual deaths in road accidents in the US. An increase in any of the other
variables (number of drivers, length of rural roads, average maximum daily temperature in January) is
therefore associated with an increase in deaths from road accidents.
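
The interpretation of the coefficients can be illustrated with a small simulated example. The sketch below
assumes NumPy is available; the variable names echo two of the predictors, but the numbers are invented and
are not the assignment's coefficients. It fits a regression with one positive and one negative coefficient
and checks that raising one predictor by one unit, with the other held constant, changes the prediction by
exactly that predictor's coefficient.

```python
import numpy as np

# Simulated data (hypothetical, NOT the assignment's data).
rng = np.random.default_rng(1)
n = 40
drivers = rng.uniform(10, 100, n)
popden = rng.uniform(1, 50, n)
deaths = 50 + 4.0 * drivers - 2.0 * popden + rng.normal(0, 5, n)

# Ordinary least squares with an intercept.
design = np.column_stack([np.ones(n), drivers, popden])
coef, *_ = np.linalg.lstsq(design, deaths, rcond=None)
intercept, b_drivers, b_popden = coef

def predict(d, p):
    """Predicted deaths for given values of the two predictors."""
    return intercept + b_drivers * d + b_popden * p

# One extra unit of 'drivers' with 'popden' held constant.
effect = predict(51, 20) - predict(50, 20)

print(f"coefficient for drivers: {b_drivers:.2f} (positive relation)")
print(f"coefficient for popden:  {b_popden:.2f} (negative relation)")
print(f"change in prediction for +1 driver unit: {effect:.2f}")
```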
