
Dummy Variable With Regression

1. The document performs a regression analysis to predict income from years since PhD, years of service, and sex (dummy coded female = 1, male = 0).
2. The regression results show that years since PhD and years of service are significant predictors of income at the 5% level, but sex is not (p = 0.0701).
3. The model explains about 19% of the variation in income (multiple R-squared 0.1951, adjusted R-squared 0.189). Holding the other variables constant, each additional year since PhD is associated with an average income increase of $1552.80, and each additional year of service with an average decrease of $649.80.

Uploaded by

kasuweda
Copyright © All Rights Reserved

Dummy variable with regression

RStudio

#Import data set
#(copy the table, including its header row, to the clipboard first, e.g. from a spreadsheet)
salary=read.table("clipboard", header=TRUE)

#Create dummy variable: 1 = female, 0 = male (male is the reference group)
female<- ifelse(salary$sex=='Female',1,0)

#Create a new data table with the dummy variable (method 1)
salary1=data.frame(years.phd=salary$yrs.since.phd,
                   years.service=salary$yrs.service,
                   sex.female=female,
                   income=salary$salary)

#Regression analysis
lml=lm(income~years.phd+years.service+sex.female, data=salary1)
summary(lml)

#Check goodness of fit: the four standard diagnostic plots in a 2 x 2 grid
par(mfrow=c(2,2))
plot(lml)
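As a cross-check on method 1, the sketch below fits the same kind of model twice. The first fit uses a hand-made 0/1 dummy, as above. The second passes `sex` as a factor with `Male` as the reference level, so `lm()` builds the dummy itself. The real table is read from the clipboard, so the sketch uses a small simulated stand-in with the same column names (`yrs.since.phd`, `yrs.service`, `sex`, `salary`). The values are made up and will not reproduce the results below.

```r
# Simulated stand-in for the salary table (made-up values, NOT the real data)
set.seed(1)
n <- 100
salary <- data.frame(
  yrs.since.phd = sample(1:40, n, replace = TRUE),
  sex = sample(c("Female", "Male"), n, replace = TRUE)
)
salary$yrs.service <- pmax(0, salary$yrs.since.phd - sample(0:5, n, replace = TRUE))
salary$salary <- 90000 + 1500 * salary$yrs.since.phd -
  600 * salary$yrs.service + rnorm(n, sd = 20000)

# Method 1: a hand-made 0/1 dummy (1 = female, 0 = male)
salary$female <- ifelse(salary$sex == "Female", 1, 0)
fit_dummy <- lm(salary ~ yrs.since.phd + yrs.service + female, data = salary)

# Method 2: let lm() create the dummy from a factor; "Male" is set as the
# reference level so the coefficient is again "female minus male"
salary$sex <- relevel(factor(salary$sex), ref = "Male")
fit_factor <- lm(salary ~ yrs.since.phd + yrs.service + sex, data = salary)

# Side-by-side coefficients: the two columns should be identical
print(cbind(dummy = coef(fit_dummy), factor = coef(fit_factor)))
```

Both columns match. The only difference is the label: the factor approach names the fourth coefficient `sexFemale` instead of `female`.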

Results
> #Import data set
> salary=read.table("clipboard", header=TRUE)
>
> #Create dummy variables
> female<- ifelse(salary$sex=='Female',1,0)
>
> #Create a new data table with dummy variables method 1
> salary1=data.frame(years.phd=salary$yrs.since.phd,
+ years.service=salary$yrs.service,
+ sex.female=female,
+ income=salary$salary)
>
> #Regression analysis
> lml=lm(income~years.phd+years.service+sex.female, data=salary1)
> summary(lml)

Call:
lm(formula = income ~ years.phd + years.service + sex.female,
    data = salary1)

Residuals:
   Min     1Q Median     3Q    Max
-79586 -19564  -3018  15071 105898

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    91333.0     2941.2  31.053  < 2e-16 ***
years.phd       1552.8      256.1   6.062 3.15e-09 ***
years.service   -649.8      254.0  -2.558   0.0109 *
sex.female     -8457.1     4656.1  -1.816   0.0701 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 27280 on 393 degrees of freedom
Multiple R-squared:  0.1951,  Adjusted R-squared:  0.189
F-statistic: 31.75 on 3 and 393 DF,  p-value: < 2.2e-16
>
> #check goodness of fit
> par(mfrow=c(2,2))
> plot(lml)

Interpretation


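1. Dependent and independent variables
Dependent variable: income (salary). Independent variables: years.phd (years since PhD), years.service (years of service) and sex.female (dummy: 1 = female, 0 = male).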

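2. Significance of model
According to these results, the p-values of the intercept, years.phd and years.service are less than 0.05, so these three terms are significant at the 5% level. The p-value of sex.female (0.0701) is not less than 0.05, so sex is not significant at the 5% level (only at the 10% level). The overall F-test (F = 31.75 on 3 and 393 DF, p < 2.2e-16) shows that the model as a whole is significant.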
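The t values, p-values and overall F-test in the output can be recomputed from the printed figures. The sketch below uses only the rounded numbers shown in the summary, so the last digit may differ slightly. It also uses the residual degrees of freedom, 393. Each t value is the estimate divided by its standard error. The two-sided p-value comes from the t distribution with 393 degrees of freedom, and the F-statistic follows from R-squared.

```r
# Estimates and standard errors copied from summary(lml) (rounded as printed)
est <- c(years.phd = 1552.8, years.service = -649.8, sex.female = -8457.1)
se  <- c(years.phd = 256.1,  years.service = 254.0,  sex.female = 4656.1)
df_resid <- 393                       # 397 observations - 4 coefficients

# t statistic and two-sided p-value for H0: coefficient = 0
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df = df_resid)

# Decision at the 5% level, as used in the interpretation
print(data.frame(t = round(t_val, 3), p = signif(p_val, 3),
                 significant_5pct = p_val < 0.05))

# Overall F-test: H0 is that all three slope coefficients are zero
r2 <- 0.1951                          # Multiple R-squared from the output
k  <- 3                               # number of predictors
f_stat <- (r2 / k) / ((1 - r2) / df_resid)
f_p    <- pf(f_stat, k, df_resid, lower.tail = FALSE)
cat("F =", round(f_stat, 2), " p =", signif(f_p, 3), "\n")
```

Up to rounding, this gives t values of about 6.06, -2.56 and -1.82 and an F of about 31.75. Only sex.female fails the 0.05 cut-off.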

3. R-squared values
The adjusted R-squared is 0.189 (18.90%), so the model explains about 18.90% of the total variation in income through years.phd, years.service and sex.female. The unadjusted multiple R-squared is 0.1951 (19.51%).
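The adjusted value is the multiple R-squared penalised for the number of predictors. Here is a minimal check using only figures printed in the output: R-squared 0.1951, 3 predictors, and n = 397 observations. The 397 comes from the 393 residual degrees of freedom plus 4 estimated coefficients.

```r
r2 <- 0.1951        # Multiple R-squared from summary(lml)
k  <- 3             # predictors: years.phd, years.service, sex.female
n  <- 393 + k + 1   # residual df + estimated coefficients = 397

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))   # prints 0.189
```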

4. If sex.female were significant
Holding the other variables constant, female workers would earn on average 8457.1 less than male workers (the reference group).

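5. Regression model
Income = 91333.0 + 1552.8(years.phd) - 649.8(years.service) - 8457.1(sex.female)
For male workers (sex.female = 0) the intercept is 91333.0. For female workers (sex.female = 1) it is 91333.0 - 8457.1 = 82875.9.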
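Because sex.female is a 0/1 dummy, the female-male gap predicted by the fitted equation does not depend on the years values. The sketch below shows this with a hypothetical helper, `predict_income()`, that hard-codes the estimates printed in the output. It does not use the fitted model object, and the two worker profiles are illustrative, not taken from the data.

```r
# Hypothetical helper: the fitted equation with the printed estimates
predict_income <- function(years.phd, years.service, sex.female) {
  91333.0 + 1552.8 * years.phd - 649.8 * years.service - 8457.1 * sex.female
}

# Same years, different sex: female prediction minus male prediction
gap_a <- predict_income(10, 8, 1) - predict_income(10, 8, 0)
gap_b <- predict_income(30, 25, 1) - predict_income(30, 25, 0)
print(c(gap_a, gap_b))   # both equal the sex.female coefficient (up to floating-point rounding)
```

With the fitted object itself, `predict(lml, newdata = ...)` on two rows that differ only in sex.female would return the same gap.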

Residual analysis
According to the residual plots below, the underlying regression assumptions are not violated.
[Figure: diagnostic plots of lml in a 2 x 2 grid: Residuals vs Fitted (Residuals against Fitted values), Q-Q Residuals (Standardized residuals against Theoretical Quantiles), Scale-Location (square root of |Standardized residuals| against Fitted values) and Residuals vs Leverage (Standardized residuals against Leverage, with Cook's distance contours). Observations 44, 365 and 283 are labelled in the panels.]
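The plots can be complemented by a few numeric checks. The sketch below wraps them in a hypothetical helper, `check_residuals()`, which runs three checks:

- a Shapiro-Wilk test on the residuals;
- a list of standardized residuals beyond ±3;
- a list of observations whose Cook's distance exceeds the common 4/n rule of thumb (a convention, not a hard cut-off).

It is demonstrated on simulated data with the document's column names. On the real model you would call `check_residuals(lml)`.

```r
# Hypothetical helper: numeric companions to the four diagnostic plots
check_residuals <- function(fit) {
  n <- nobs(fit)
  list(
    # Shapiro-Wilk normality test of the residuals (valid for 3 <= n <= 5000)
    shapiro_p   = shapiro.test(residuals(fit))$p.value,
    # observations whose standardized residual exceeds 3 in absolute value
    large_resid = which(abs(rstandard(fit)) > 3),
    # observations with Cook's distance above the 4/n rule of thumb
    influential = which(cooks.distance(fit) > 4 / n)
  )
}

# Simulated stand-in data (made-up values, not the salary table)
set.seed(42)
n <- 200
toy <- data.frame(years.phd = runif(n, 1, 40), sex.female = rbinom(n, 1, 0.3))
toy$years.service <- toy$years.phd * runif(n, 0.5, 1)
toy$income <- 90000 + 1500 * toy$years.phd - 600 * toy$years.service -
  5000 * toy$sex.female + rnorm(n, sd = 25000)

fit <- lm(income ~ years.phd + years.service + sex.female, data = toy)
str(check_residuals(fit))
```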
