
GGGGH

The report discusses the process of fitting a multiple linear regression model using the World Bank dataset, focusing on the relationships between employment variables. It includes tests for heteroscedasticity, autocorrelation, and multicollinearity, concluding that there is no evidence of heteroscedasticity or multicollinearity, while autocorrelation was detected. The regression model indicates that both active and child labor are significant predictors of total employment in the firm.

Report on "Fitting a regression line and testing for the heteroscedasticity, autocorrelation and multicollinearity problems using R programming"

• Problem Indicator:

1. Making a new CSV file from the given database ("World Bank dataset").
2. Fitting a multiple linear regression line.
3. Testing for:
   1. Heteroscedasticity
   2. Autocorrelation
   3. Multicollinearity

• Methodology:
• New CSV file:
Making a new CSV file (data set):
To make the new CSV file, we selected three variables from the World Bank dataset that contain no null or empty values.
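
The selection step described above could be sketched in R roughly as follows. The report does not give the file path or the exact column names, so the data frame `wb`, the column names, and the output file name are all hypothetical stand-ins, and a small synthetic data frame is used in place of the real World Bank file.

```r
# Synthetic stand-in for the World Bank data (hypothetical columns and values).
# In practice one would read the real file instead, for example:
#   wb <- read.csv("world_bank_dataset.csv", na.strings = c("", "NA"))  # hypothetical path
set.seed(1)
wb <- data.frame(
  worker_in_firm               = runif(20),
  worker_in_firm_active_person = runif(20),
  worker_in_firm_child_labour  = runif(20),
  other_indicator              = c(NA, runif(19))  # a column with a missing value
)

# Columns that contain no missing values at all.
complete_cols <- names(wb)[colSums(is.na(wb)) == 0]

# The three variables used in the report (names are assumptions).
keep <- c("worker_in_firm",
          "worker_in_firm_active_person",
          "worker_in_firm_child_labour")
stopifnot(all(keep %in% complete_cols))

# Keep only those three columns and write them to a new CSV file.
short_data <- wb[, keep]
out_file <- file.path(tempdir(), "short_data.csv")  # hypothetical output name
write.csv(short_data, out_file, row.names = FALSE)
```

Passing `na.strings = c("", "NA")` to `read.csv` is one way to treat empty cells as missing, so that the `is.na` check covers both null and empty values.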

• Fitting a multiple linear regression model and interpreting the result:


Consider a multiple linear regression model (our short CSV file has one dependent variable and two independent variables):

$$\text{worker in firm}_i = \beta_0 + \beta_1 \,(\text{worker in firm, active person})_i + \beta_2 \,(\text{worker in firm, child labour})_i + \varepsilon_i$$

Here,
Dependent variable = worker in firm
Regression coefficients = $\beta_0, \beta_1, \beta_2$
Independent variables = worker in firm, active person and worker in firm, child labour
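
A minimal sketch of fitting a model of this form with `lm()` is given here. It runs on simulated data, not on the World Bank file, so the numbers it prints will not match the report's result. The short column names `active_person` and `child_labour`, and the coefficients used to simulate the data, are assumptions made only for illustration.

```r
# Simulated data with the same structure as the short CSV file
# (one dependent variable, two independent variables). All values are synthetic.
set.seed(42)
n <- 50
active_person  <- runif(n, 0.2, 0.8)
child_labour   <- runif(n, 0.0, 0.3)
# Arbitrary illustrative coefficients; these are NOT the report's estimates.
worker_in_firm <- 0.1 + 0.5 * active_person + 0.9 * child_labour +
  rnorm(n, sd = 0.02)
data <- data.frame(worker_in_firm, active_person, child_labour)

# Fit worker_in_firm = b0 + b1 * active_person + b2 * child_labour + error.
model <- lm(worker_in_firm ~ active_person + child_labour, data = data)

# Coefficient table, residual summary and R-squared.
print(summary(model))
```

Column names containing spaces, as they appear in the printed call, are not valid in an R formula unless wrapped in backticks, which is why underscore names are used in this sketch.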
Result:
Call:
lm(formula = worker_in_firm ~ worker_in_firm _active person +
worker_in_firm _child labour, data = data)

Residuals:
      Min        1Q    Median        3Q       Max
-0.017017 -0.007594 -0.003687  0.008458  0.023209

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.013846   0.023402  -0.592    0.564

Comment: The multiple linear regression model indicates that both active workers and child labour workers in the firm are highly significant predictors of total employment in the firm. The intercept is -0.0138, which is not statistically significant (p = 0.564). The coefficient for active workers in the firm is 0.1876, which is statistically significant (p < 0.001). The coefficient for child labour in the firm is 0.8043, which is statistically significant (p < 0.001). The R-squared value of 1.000 (to three decimal places) indicates that the model reproduces the data almost exactly. A fit this close suggests that the dependent variable is nearly an exact linear combination of the two predictors, so the result should be interpreted with some caution.

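• Heteroscedasticity Test:
Heteroscedasticity:
The classical linear regression model rests on the following assumptions. When the constant-variance part of assumption 3 does not hold, i.e. Var(ui) is not the same for all i, the problem is known as heteroscedasticity.

1. X is measured without errors.
2. ui does not depend on xi; i = 1, 2, 3, ..., n.
3. $u \sim N_n(0, \sigma^2 I)$, i.e. $\mathrm{Var}(u_i) = \sigma^2$ for all $i$ and $E(u_i u_j) = 0$ for $i \neq j$.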

Result:
studentized Breusch-Pagan test

data: model

BP = 1.123, df = 2, p-value = 0.570

Comment: The Breusch-Pagan test for heteroscedasticity yielded a p-value of 0.570. Since the p-value is greater than 0.05, we do not reject the null hypothesis of homoscedasticity. Therefore, there is no evidence of heteroscedasticity in the residuals of the model, which is consistent with the variance of the residuals being constant.
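
As a rough illustration of what the studentized Breusch-Pagan test computes, the sketch below regresses the squared residuals on the predictors and uses n times the R-squared of that auxiliary regression as the test statistic, with degrees of freedom equal to the number of predictors. It runs on simulated data with assumed column names, so its output will differ from the result above. The call to `lmtest::bptest()` is only attempted if the `lmtest` package happens to be installed.

```r
# Simulated homoscedastic data (synthetic; not the report's data).
set.seed(42)
n <- 50
active_person  <- runif(n, 0.2, 0.8)
child_labour   <- runif(n, 0.0, 0.3)
worker_in_firm <- 0.1 + 0.5 * active_person + 0.9 * child_labour +
  rnorm(n, sd = 0.02)
data  <- data.frame(worker_in_firm, active_person, child_labour)
model <- lm(worker_in_firm ~ active_person + child_labour, data = data)

# Auxiliary regression: squared residuals on the same predictors.
data$e2 <- resid(model)^2
aux <- lm(e2 ~ active_person + child_labour, data = data)

# Studentized (Koenker) form: BP = n * R^2 of the auxiliary regression,
# compared with a chi-squared distribution with df = number of predictors.
bp_stat <- n * summary(aux)$r.squared
bp_df   <- 2
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
cat("BP =", bp_stat, " df =", bp_df, " p-value =", bp_p, "\n")

# Packaged equivalent, run only if lmtest is available.
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::bptest(model))
}
```

A p-value above the chosen significance level (0.05 in the report) means the null hypothesis of constant variance is not rejected.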

• Autocorrelation Test:
Autocorrelation, also known as serial correlation, is a statistical concept that measures the correlation between a time series and a lagged version of itself. In other words, autocorrelation assesses the degree of similarity between observations of a time series at different time points. In a regression setting, it is the residuals that are checked for such correlation.

Durbin-Watson d test:

The test statistic is calculated based on the residuals of the regression model and is interpreted as
follows:
• d close to 2: Indicates no autocorrelation.
• d significantly less than 2: Indicates positive autocorrelation.
• d significantly greater than 2: Indicates negative autocorrelation.
Let H0: there is no autocorrelation in the residuals.
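
The interpretation rules above follow from how the statistic is built: d is the sum of squared successive differences of the residuals divided by the sum of squared residuals, which is approximately 2(1 - r), where r is the lag-1 autocorrelation of the residuals. The sketch below computes both quantities by hand on simulated data with assumed column names, so its values will not match the report's result. `lmtest::dwtest()` is only called if the `lmtest` package is installed.

```r
# Simulated data (synthetic; not the report's data).
set.seed(42)
n <- 50
active_person  <- runif(n, 0.2, 0.8)
child_labour   <- runif(n, 0.0, 0.3)
worker_in_firm <- 0.1 + 0.5 * active_person + 0.9 * child_labour +
  rnorm(n, sd = 0.02)
data  <- data.frame(worker_in_firm, active_person, child_labour)
model <- lm(worker_in_firm ~ active_person + child_labour, data = data)

e <- resid(model)

# Durbin-Watson statistic: sum of squared successive differences
# divided by the sum of squared residuals.
d <- sum(diff(e)^2) / sum(e^2)

# Lag-1 autocorrelation of the residuals (same denominator as d).
r <- sum(e[-1] * e[-n]) / sum(e^2)

cat("d =", d, "\n")
cat("2 * (1 - r) =", 2 * (1 - r), "\n")

# Packaged test with a p-value, run only if lmtest is available.
# Its default alternative is positive autocorrelation ("greater").
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::dwtest(model))
}
```

Since r lies between -1 and 1, d lies between 0 and 4: r near 0 gives d near 2, positive r pushes d below 2, and negative r pushes d above 2.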

Result:
Durbin-Watson test

data: model
DW = 1.3568, p-value = 0.02213
alternative hypothesis: true autocorrelation is greater than 0

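Comment: The Durbin-Watson test for autocorrelation yielded a DW statistic of 1.357 and a p-value of 0.02213. Since the p-value is less than 0.05, we reject the null hypothesis of no autocorrelation. Because the statistic is below 2 and the alternative hypothesis is autocorrelation greater than 0, the evidence points to positive autocorrelation in the residuals.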

• Multicollinearity Test:
Multicollinearity refers to a high degree of correlation among the independent variables.
i. VIF (variance inflation factor): if the value of VIF is greater than 10, then we can say that there is a multicollinearity problem in the data.
Result:
worker in firm, active person    worker in firm, child labour
                     3.523043                        3.523043

Comment: Since both VIF values are less than 10, there is no evidence of a multicollinearity problem between the predictors "worker in firm, active person" and "worker in firm, child labour".
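
For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. With only two predictors, both VIFs equal 1 / (1 - r²), where r is the correlation between them, which is why the two values in the result are identical. A common value of 3.523 would correspond to a correlation of roughly 0.85 in absolute value. The sketch below checks this relationship on simulated data with assumed column names. `car::vif()` is only called if the `car` package is installed.

```r
# Simulated data with two correlated predictors (synthetic; not the report's data).
set.seed(42)
n <- 50
active_person  <- runif(n, 0.2, 0.8)
child_labour   <- 0.3 * active_person + rnorm(n, sd = 0.05)
worker_in_firm <- 0.1 + 0.5 * active_person + 0.9 * child_labour +
  rnorm(n, sd = 0.02)
data  <- data.frame(worker_in_firm, active_person, child_labour)
model <- lm(worker_in_firm ~ active_person + child_labour, data = data)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing
# predictor j on the remaining predictors.
r2_active  <- summary(lm(active_person ~ child_labour, data = data))$r.squared
r2_child   <- summary(lm(child_labour ~ active_person, data = data))$r.squared
vif_active <- 1 / (1 - r2_active)
vif_child  <- 1 / (1 - r2_child)

# With two predictors, both equal 1 / (1 - r^2).
r_xy <- cor(data$active_person, data$child_labour)
cat("VIF (active person):", vif_active, "\n")
cat("VIF (child labour): ", vif_child, "\n")
cat("1 / (1 - r^2):      ", 1 / (1 - r_xy^2), "\n")

# Packaged equivalent, run only if car is available.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::vif(model))
}
```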
