Problem Indicator:
1. Create a new CSV file from the given database ("World Bank dataset").
2. Fit a multiple linear regression line.
3. Test for:
   1. Heteroscedasticity
   2. Autocorrelation
   3. Multicollinearity
Methodology:
New CSV file:
To create the new CSV file, three variables with no null or empty values were selected
from the World Bank dataset.
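The selection step can be sketched as follows. This is a minimal illustration, not the procedure actually used for the report: the column names (`worker_in_firm`, `worker_active`, `worker_child`, `other`) and all values are hypothetical, and an in-memory string stands in for the real World Bank file. The sketch treats only empty cells as missing; markers such as "NA" would need extra handling.

```python
import csv
import io

# Hypothetical raw extract; names and values are invented for illustration.
raw = io.StringIO(
    "worker_in_firm,worker_active,worker_child,other\n"
    "0.52,0.61,0.47,\n"
    "0.48,0.58,0.43,1.2\n"
    "0.55,0.66,0.50,0.9\n"
)
records = list(csv.DictReader(raw))


def complete_columns(rows):
    """Return the column names that have no null or empty value in any row."""
    columns = list(rows[0].keys())
    return [c for c in columns if all((r[c] or "").strip() for r in rows)]


# Columns that qualify for the new file (here, "other" is excluded).
usable = complete_columns(records)

# Assumed choice of three variables among the complete ones.
keep = ["worker_in_firm", "worker_active", "worker_child"]
assert all(c in usable for c in keep)

# Write the reduced data set; `out` stands in for an opened output file.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=keep)
writer.writeheader()
writer.writerows({c: r[c] for c in keep} for r in records)
print(out.getvalue())
```

In practice `raw` and `out` would be replaced by files opened with `open(..., newline="")`.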
Worker in firm = β0 + β1 * (worker in firm, active person) + β2 * (worker in firm, child labour) + εi
Here,
Dependent variable = worker in firm
Regression coefficients = β0, β1, β2
Independent variables = worker in firm, active person and
worker in firm, child labour
Error term = εi
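For a model with exactly two predictors, the least-squares estimates of β0, β1 and β2 can be computed by hand from sums of squares and cross-products. The sketch below shows one way to do this; the function name and the data are hypothetical (the data are constructed so that y = 1 + 2*x1 + 3*x2 exactly), and it is not the code that produced the results in this report.

```python
def fit_two_predictor_ols(y, x1, x2):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2 + e."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n

    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))

    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Illustrative data only (not from the World Bank dataset).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_two_predictor_ols(y, x1, x2)
print(b0, b1, b2)  # close to 1, 2, 3 up to floating-point error
```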
Result:
Call:
lm(formula = worker_in_firm ~ worker_in_firm _active person +
worker_in_firm _child labour, data = data)
Residuals:
      Min        1Q    Median        3Q       Max
-0.017017 -0.007594 -0.003687  0.008458  0.023209
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.013846   0.023402  -0.592    0.564
Comment: The multiple linear regression model indicates that both the active workers and the
child labour workers in the firm are highly significant predictors of total employment in the
firm. The intercept is -0.0138, which is not statistically significant (p = 0.564). The
coefficient for active workers in the firm is 0.1876, which is statistically significant
(p < 0.001). The coefficient for child labour in the firm is 0.8043, which is also statistically
significant (p < 0.001). The R-squared value of 1.000 indicates that the model explains
essentially all of the variation in the dependent variable.
Heteroscedasticity Test:
Heteroscedasticity:
When the assumption of constant error variance (homoscedasticity) does not hold, the problem
is known as heteroscedasticity.
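One common form of the studentized Breusch-Pagan statistic is n times the R-squared from regressing the squared residuals on the predictors. Under the null hypothesis of constant variance it is compared with a chi-square distribution whose degrees of freedom equal the number of predictors (two here). The sketch below assumes that form; the function name, residuals and predictor values are all hypothetical and are not taken from the fitted model.

```python
import math


def studentized_bp(residuals, x1, x2):
    """Return (LM, p) where LM = n * R^2 of squared residuals on x1 and x2."""
    n = len(residuals)
    e2 = [e * e for e in residuals]
    me, m1, m2 = sum(e2) / n, sum(x1) / n, sum(x2) / n

    # Centered sums of squares and cross-products for the auxiliary regression.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1e = sum((a - m1) * (v - me) for a, v in zip(x1, e2))
    s2e = sum((b - m2) * (v - me) for b, v in zip(x2, e2))
    see = sum((v - me) ** 2 for v in e2)

    # Slopes of the auxiliary regression (Cramer's rule), then its R^2.
    det = s11 * s22 - s12 ** 2
    g1 = (s1e * s22 - s2e * s12) / det
    g2 = (s2e * s11 - s1e * s12) / det
    r2 = (g1 * s1e + g2 * s2e) / see

    lm = n * r2
    # Chi-square upper-tail probability with 2 degrees of freedom is exp(-x/2).
    p_value = math.exp(-lm / 2)
    return lm, p_value


# Illustrative values only: the residual spread grows with the predictors.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
residuals = [0.1, -0.1, 0.3, -0.2, 0.5, -0.6, 0.9, -0.9]

lm, p_value = studentized_bp(residuals, x1, x2)
print(lm, p_value)
```

A small p-value (for example below 0.05) would lead to rejecting the hypothesis of constant error variance.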
Result:
studentized Breusch-Pagan test
data: model
Autocorrelation Test:
Autocorrelation, also known as serial correlation, is a statistical concept that measures the
correlation between a time series and a lagged version of itself. In other words, autocorrelation
assesses the degree of similarity between observations of a time series at different time points.
Durbin-Watson d test:
The test statistic is calculated based on the residuals of the regression model and is interpreted as
follows:
• d close to 2: Indicates no autocorrelation.
• d significantly less than 2: Indicates positive autocorrelation.
• d significantly greater than 2: Indicates negative autocorrelation.
Let H0: there is no autocorrelation in the residuals, against H1: there is positive autocorrelation.
Result:
Durbin-Watson test
data: model
DW = 1.3568, p-value = 0.02213
alternative hypothesis: true autocorrelation is greater than 0
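Comment: The Durbin-Watson test for autocorrelation yielded a DW statistic of 1.357 and
a p-value of 0.02213. Since the p-value is less than 0.05, we reject the null hypothesis
of no autocorrelation at the 5% level. Because the statistic is below 2, the evidence points
to positive autocorrelation in the residuals.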
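The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals, and it is approximately 2 * (1 - r), where r is the lag-1 autocorrelation of the residuals. The sketch below computes d for hypothetical residuals and applies the approximation to the reported value DW = 1.3568; the function names and the sample residuals are illustrative assumptions.

```python
def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2), residuals in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def implied_lag1_autocorrelation(d):
    """Rough lag-1 autocorrelation from the approximation d = 2 * (1 - r)."""
    return 1 - d / 2


# Hypothetical residuals that alternate in sign (negative autocorrelation),
# so d comes out above 2.
alternating = [1.0, -1.0, 1.0, -1.0]
d_alternating = durbin_watson(alternating)

# Hypothetical residuals that drift slowly (positive autocorrelation),
# so d comes out below 2.
drifting = [1.0, 0.8, 0.5, -0.4, -0.9, -1.0]
d_drifting = durbin_watson(drifting)

# Applying the approximation to the statistic reported in this section.
r_reported = implied_lag1_autocorrelation(1.3568)
print(d_alternating, d_drifting, r_reported)
```

For the reported statistic the approximation gives a lag-1 autocorrelation of roughly 0.32, which agrees in sign with the conclusion of positive autocorrelation.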
Multicollinearity Test:
Multicollinearity refers to a high degree of correlation among the independent variables.
i. VIF (variance inflation factor): if the VIF value is greater than 10, we can say that there is
a multicollinearity problem in the data.
Result:
worker in firm, active person    worker in firm, child labour
                     3.523043                        3.523043
Comment: Since both VIF values are less than 10, there is no serious multicollinearity between the
predictors "worker in firm, active person" and "worker in firm, child labour".
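With only two predictors, each VIF equals 1 / (1 - r^2), where r is the correlation between the two predictors, which is why both values above are identical. The sketch below computes the VIF this way for hypothetical data and inverts the formula for the reported value 3.523043; the function names and sample values are assumptions made for illustration.

```python
import math


def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r * r)


def implied_abs_correlation(vif):
    """Absolute correlation between two predictors implied by their VIF."""
    return math.sqrt(1 - 1 / vif)


# Illustrative predictor values only (not from the World Bank dataset).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
vif_example = vif_two_predictors(x1, x2)

# Correlation implied by the VIF reported in this section.
r_reported = implied_abs_correlation(3.523043)
print(vif_example, r_reported)
```

For the reported VIF this gives an absolute correlation of about 0.85 between the two predictors, which is sizeable but still below the level implied by the VIF > 10 rule.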