Dummy Variable With Regression
RStudio
# Regression analysis (salary1 is the data frame created in the session below)
lml <- lm(income ~ years.phd + years.service + sex.female, data = salary1)
summary(lml)
Results
> #Import data set
> salary=read.table("clipboard", header=TRUE)
>
> #Create dummy variables
> female<- ifelse(salary$sex=='Female',1,0)
>
> #Create a new data table with dummy variables method 1
> salary1=data.frame(years.phd=salary$yrs.since.phd,
+ years.service=salary$yrs.service,
+ sex.female=female,
+ income=salary$salary)
>
> #Regression analysis
> lml=lm(income~years.phd+years.service+sex.female, data=salary1)
> summary(lml)
Call:
lm(formula = income ~ years.phd + years.service + sex.female,
    data = salary1)

Residuals:
   Min     1Q Median     3Q    Max
-79586 -19564  -3018  15071 105898

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    91333.0     2941.2  31.053  < 2e-16 ***
years.phd       1552.8      256.1   6.062 3.15e-09 ***
years.service   -649.8      254.0  -2.558   0.0109 *
sex.female     -8457.1     4656.1  -1.816   0.0701 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
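The session above builds the female dummy by hand with ifelse(). Here is a minimal sketch of an alternative that uses simulated toy data with the same column names. The numbers are invented and do not come from the real salary table. When sex is stored as a factor, lm() creates the same 0/1 column itself, and relevel() makes Male the reference group, so the coefficient is still the female-minus-male difference.

```r
# Hypothetical toy data with the same column names as the salary table;
# the values are simulated, NOT the real data set.
set.seed(1)
n <- 50
salary <- data.frame(
  yrs.since.phd = sample(1:40, n, replace = TRUE),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
salary$yrs.service <- pmax(0, salary$yrs.since.phd - sample(0:5, n, replace = TRUE))
salary$salary <- 90000 + 1500 * salary$yrs.since.phd - 600 * salary$yrs.service -
  8000 * (salary$sex == "Female") + rnorm(n, sd = 20000)

# Method 1 (as in the session above): build the 0/1 dummy by hand
salary1 <- data.frame(
  years.phd     = salary$yrs.since.phd,
  years.service = salary$yrs.service,
  sex.female    = ifelse(salary$sex == "Female", 1, 0),
  income        = salary$salary
)
fit_manual <- lm(income ~ years.phd + years.service + sex.female, data = salary1)

# Method 2: let lm() create the dummy from the factor, with "Male" as the reference level
salary$sex <- relevel(salary$sex, ref = "Male")
fit_factor <- lm(salary ~ yrs.since.phd + yrs.service + sex, data = salary)

# Same estimates, different names (sex.female vs sexFemale)
print(cbind(manual = coef(fit_manual), factor = coef(fit_factor)))
```

The two approaches differ mainly in naming and convenience. The factor version labels the coefficient sexFemale, and it extends to categorical variables with more than two levels without needing one ifelse() per level.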
Interpretation
Residual standard error: 27280 on 393 degrees of freedom
Multiple R-squared: 0.1951, Adjusted R-squared: 0.189
F-statistic: 31.75 on 3 and 393 DF, p-value: < 2.2e-16
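As a sanity check on the fit statistics, you can recompute the F-statistic and the adjusted R-squared from Multiple R-squared alone. The sketch below uses only numbers printed in the summary output. The sample size n = 397 is an inference: 393 residual degrees of freedom plus 4 estimated coefficients.

```r
# Figures copied from the summary(lml) output
r2 <- 0.1951      # Multiple R-squared
k <- 3            # number of predictors: years.phd, years.service, sex.female
df_resid <- 393   # residual degrees of freedom

# Overall F statistic: (R^2 / k) / ((1 - R^2) / residual df)
f_stat <- (r2 / k) / ((1 - r2) / df_resid)
# Upper-tail p-value of the F(k, df_resid) distribution
p_model <- pf(f_stat, k, df_resid, lower.tail = FALSE)

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1); n is inferred as 393 + 4 = 397
n <- df_resid + k + 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

cat(sprintf("F = %.2f, p = %.3g, adjusted R^2 = %.3f\n", f_stat, p_model, adj_r2))
```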
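2. Significance of model
According to these results, the p-values of the intercept, years.phd and years.service are all below 0.05, so these three terms are significant at the 5% level. The p-value of sex.female (0.0701) is not below 0.05, so the female dummy is not significant at the 5% level. It is significant only at the 10% level, which the output marks with '.'. The model as a whole is significant: the F-statistic of 31.75 on 3 and 393 DF has a p-value below 2.2e-16.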
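The t values and p-values in the coefficient table follow directly from the estimates and standard errors. Each t value is Estimate / Std. Error, and the two-sided p-value comes from a t distribution with 393 degrees of freedom. The sketch below reproduces the significance decisions from the printed, rounded numbers, so the last digit can differ slightly from the output. It also adds a 95% confidence interval for sex.female. That interval contains 0, which matches the non-significant result.

```r
# Estimates and standard errors copied from the coefficient table (rounded values)
est <- c(years.phd = 1552.8, years.service = -649.8, sex.female = -8457.1)
se  <- c(years.phd = 256.1,  years.service = 254.0,  sex.female = 4656.1)
df_resid <- 393

t_val <- est / se                             # t value = Estimate / Std. Error
p_val <- 2 * pt(-abs(t_val), df = df_resid)   # two-sided p-value
significant <- p_val < 0.05                   # decision at the 5% level

# 95% confidence interval for the female coefficient
ci <- unname(est["sex.female"] + c(-1, 1) * qt(0.975, df_resid) * se["sex.female"])

print(data.frame(t = round(t_val, 3), p = signif(p_val, 3), significant))
print(round(ci, 1))
```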
4. If female were significant
Holding years.phd and years.service constant, female workers would earn about 8457.1 less in income than male workers, who are the reference group (sex.female = 0).
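The model has no interaction terms, so the female coefficient is the same gap at every level of experience. The illustration below uses the printed coefficients. fitted_income() is a hypothetical helper written for this example and is not part of the original analysis. The experience profiles are also arbitrary.

```r
# Coefficients copied from the summary(lml) output
b <- c(intercept = 91333.0, years.phd = 1552.8,
       years.service = -649.8, sex.female = -8457.1)

# Hypothetical helper: fitted income for a given profile
fitted_income <- function(years.phd, years.service, sex.female) {
  unname(b["intercept"] + b["years.phd"] * years.phd +
         b["years.service"] * years.service + b["sex.female"] * sex.female)
}

# Compare a woman and a man with identical experience, for several arbitrary profiles
profiles <- data.frame(years.phd = c(5, 15, 30), years.service = c(2, 10, 25))
gap <- with(profiles, fitted_income(years.phd, years.service, 1) -
                      fitted_income(years.phd, years.service, 0))
print(cbind(profiles, gap))
```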
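5. Regression model
Income = 91333.0 + 1552.8(years.phd) - 649.8(years.service) - 8457.1(sex.female)
For female workers (sex.female = 1), the intercept becomes 91333.0 - 8457.1 = 82875.9:
Income = 82875.9 + 1552.8(years.phd) - 649.8(years.service)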
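Setting sex.female to 0 or 1 turns the single fitted model into two parallel equations, one for men and one for women. The sketch below uses the printed coefficients. The profile of 10 years since PhD and 5 years of service is hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary(lml) output
b <- c(intercept = 91333.0, years.phd = 1552.8,
       years.service = -649.8, sex.female = -8457.1)

# sex.female = 0 gives the male intercept; sex.female = 1 adds the female coefficient
intercept_male   <- unname(b["intercept"])
intercept_female <- unname(b["intercept"] + b["sex.female"])

# Hypothetical profile: 10 years since PhD, 5 years of service
x <- c(years.phd = 10, years.service = 5)
slope_part <- sum(b[c("years.phd", "years.service")] * x)
income_male   <- intercept_male + slope_part
income_female <- intercept_female + slope_part

cat("female intercept:", intercept_female, "\n")
cat("male:", income_male, " female:", income_female, "\n")
```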
Residual analysis
According to the residual analysis below, the underlying regression assumptions (linearity, constant variance and normality of the residuals) are not violated.
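[Figure: residual diagnostic plots of the fitted model, showing residuals vs fitted values, a normal Q-Q plot of standardized residuals, and standardized residuals with Cook's distance. Observations 44, 283 and 365 are labelled as the most extreme points, and 250 and 331 are also labelled.]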
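You can reproduce residual plots like these by calling plot() on the fitted lm object. By default it draws diagnostic panels and labels the three most extreme observations, which is presumably where numbers such as 44, 283 and 365 come from. The sketch below runs on simulated stand-in data, not the real salary1. It also computes numeric versions of the same checks. The |standardized residual| > 3 and Cook's distance > 4/n cut-offs are common rules of thumb, not fixed thresholds.

```r
# Hypothetical simulated data standing in for salary1 (NOT the real data)
set.seed(42)
n <- 100
salary1 <- data.frame(years.phd = runif(n, 1, 40), sex.female = rbinom(n, 1, 0.3))
salary1$years.service <- salary1$years.phd * runif(n, 0.5, 1)
salary1$income <- 90000 + 1500 * salary1$years.phd - 600 * salary1$years.service -
  8000 * salary1$sex.female + rnorm(n, sd = 20000)

lml <- lm(income ~ years.phd + years.service + sex.female, data = salary1)

# Standard diagnostic plots, drawn only in an interactive session
if (interactive()) {
  par(mfrow = c(2, 2))
  plot(lml)
}

# Numeric versions of what the plots show
std_res <- rstandard(lml)        # standardized residuals
cook    <- cooks.distance(lml)   # influence of each observation

outliers    <- which(abs(std_res) > 3)   # rule of thumb: |standardized residual| > 3
influential <- which(cook > 4 / n)       # rule of thumb: Cook's distance > 4/n
sw_p <- shapiro.test(residuals(lml))$p.value   # Shapiro-Wilk test of residual normality

print(outliers)
print(influential)
print(sw_p)
```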