Dummy Variable With Regression

1. The document performs a regression analysis to predict income from years since PhD, years of service, and sex (female=1, male=0).
2. The regression results show that years since PhD and years of service are significant predictors of income at the 5% level, but sex is not (p = 0.0701).
3. The model explains 18.9% of the variation in income (adjusted R-squared). Each additional year since PhD is associated with an average increase in income of $1552.80, and each additional year of service with an average decrease of $649.80, holding the other variables constant.

Uploaded by

kasuweda
Copyright © All Rights Reserved
Dummy variable with regression

RStudio

#Import data set (copied to the clipboard)
salary=read.table("clipboard", header=TRUE)

#Create dummy variable: 1 = female, 0 = male (reference group)
female<- ifelse(salary$sex=='Female',1,0)

#Create a new data table with dummy variables method 1
salary1=data.frame(years.phd=salary$yrs.since.phd,
                   years.service=salary$yrs.service,
                   sex.female=female,
                   income=salary$salary)

#Regression analysis
lml=lm(income~years.phd+years.service+sex.female, data=salary1)
summary(lml)

#Check goodness of fit with the diagnostic plots
par(mfrow=c(2,2))
plot(lml)
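As a side note on the dummy variable step, the sketch below uses a small hypothetical data frame (toy, with invented values, not the salary data) to show that a manual 0/1 dummy built with ifelse gives the same coefficients as letting lm handle a factor whose reference level is "Male". The names toy, fit_dummy and fit_factor are assumptions for illustration only.

```r
# Hypothetical toy data (invented values, not the document's data set)
toy <- data.frame(
  sex    = c("Male", "Female", "Male", "Female", "Male", "Female"),
  years  = c(2, 3, 10, 12, 20, 25),
  income = c(50, 48, 70, 65, 90, 88)
)

# Method used in the document: manual 0/1 dummy (Male = 0 is the reference)
toy$female <- ifelse(toy$sex == "Female", 1, 0)
fit_dummy <- lm(income ~ years + female, data = toy)

# Alternative: a factor with "Male" set as the reference level
toy$sex_f <- relevel(factor(toy$sex), ref = "Male")
fit_factor <- lm(income ~ years + sex_f, data = toy)

# Same estimates; only the coefficient name differs (female vs sex_fFemale)
coef(fit_dummy)
coef(fit_factor)
```

Either approach should give the same fit; the manual dummy simply makes the 0/1 coding explicit.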

Results

> #Import data set
> salary=read.table("clipboard", header=TRUE)
>
> #Create dummy variables
> female<- ifelse(salary$sex=='Female',1,0)
>
> #Create a new data table with dummy variables method 1
> salary1=data.frame(years.phd=salary$yrs.since.phd,
+ years.service=salary$yrs.service,
+ sex.female=female,
+ income=salary$salary)
>
> #Regression analysis
> lml=lm(income~years.phd+years.service+sex.female, data=salary1)
> summary(lml)

Call:
lm(formula = income ~ years.phd + years.service + sex.female,
    data = salary1)

Residuals:
   Min     1Q Median     3Q    Max
-79586 -19564  -3018  15071 105898

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    91333.0     2941.2  31.053  < 2e-16 ***
years.phd       1552.8      256.1   6.062 3.15e-09 ***
years.service   -649.8      254.0  -2.558   0.0109 *
sex.female     -8457.1     4656.1  -1.816   0.0701 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 27280 on 393 degrees of freedom
Multiple R-squared: 0.1951, Adjusted R-squared: 0.189
F-statistic: 31.75 on 3 and 393 DF, p-value: < 2.2e-16

> #Check goodness of fit with the diagnostic plots
> par(mfrow=c(2,2))
> plot(lml)

Interpretation

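1. Dependent and independent variables
The dependent variable is income. The independent variables are years.phd, years.service and sex.female (a dummy variable: 1 = female, 0 = male).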

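2. Significance of model
According to these results, the p-values of the intercept, years.phd and years.service are less than 0.05, so these three terms are significant at the 5% level. The p-value of sex.female (0.0701) is greater than 0.05, so it is not significant at the 5% level. The overall F-test (F = 31.75 on 3 and 393 DF, p-value < 2.2e-16) shows that the model as a whole is significant.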
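The t values and p-values in the coefficient table can be rechecked by hand from the printed estimates and standard errors. The sketch below does this with the rounded numbers from the summary output, so the results should agree with the table only to about two or three decimals. The variable names (est, se, t_val, p_val, f_stat) are chosen here for illustration.

```r
# Estimates and standard errors copied (rounded) from the summary(lml) output
est <- c(years.phd = 1552.8, years.service = -649.8, sex.female = -8457.1)
se  <- c(years.phd = 256.1,  years.service = 254.0,  sex.female = 4656.1)
df_resid <- 393  # residual degrees of freedom from the output

# t value = estimate / standard error; two-sided p-value from the t distribution
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df = df_resid)
round(t_val, 3)
signif(p_val, 3)

# Overall F statistic rebuilt from the multiple R-squared (k = 3 predictors)
r2 <- 0.1951
k <- 3
f_stat <- (r2 / k) / ((1 - r2) / df_resid)
f_p <- pf(f_stat, k, df_resid, lower.tail = FALSE)
round(f_stat, 2)
```

Comparing each p-value with 0.05 reproduces the conclusion above: years.phd and years.service fall below the threshold and sex.female does not.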

3. R-squared values
The multiple R-squared is 0.1951 and the adjusted R-squared is 0.189 (18.90%). So, after adjusting for the number of predictors, the model explains about 18.90% of the total variation in income using years.phd, years.service and sex.female.
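The link between the two R-squared values can be checked with the usual adjustment formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1). The sketch below assumes n is recovered as the residual degrees of freedom plus the number of estimated coefficients (393 + 4); the variable names are illustrative.

```r
r2 <- 0.1951        # multiple R-squared from the output
df_resid <- 393     # residual degrees of freedom from the output
k <- 3              # number of predictors

# Sample size implied by the output: residual df + predictors + intercept
n <- df_resid + k + 1

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
round(adj_r2, 3)
```

The result rounds to the 0.189 reported in the output.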

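4. If female is significant
If sex.female were significant, the interpretation would be: holding the other variables constant, female workers earn on average 8457.1 less than male workers (the reference group). Here its p-value is 0.0701, so this difference is not significant at the 5% level.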
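Another way to see why the sex.female effect is not significant at the 5% level is to build an approximate 95% confidence interval from the printed estimate and standard error. This is a sketch using the rounded values from the output; the names est, se, t_crit and ci are illustrative.

```r
est <- -8457.1      # sex.female estimate from the output
se <- 4656.1        # its standard error
df_resid <- 393     # residual degrees of freedom

# Critical t value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Interval: estimate plus or minus t_crit * standard error
ci <- est + c(-1, 1) * t_crit * se
round(ci, 1)
```

The interval contains zero, which is consistent with a p-value above 0.05. With the fitted model available, confint(lml) should give the intervals for all coefficients directly.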

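5. Regression model
Income = 91333.0 + 1552.8(years.phd) - 649.8(years.service) - 8457.1(sex.female)
For female workers (sex.female = 1) the intercept becomes 91333.0 - 8457.1 = 82875.9, which gives:
Income = 82875.9 + 1552.8(years.phd) - 649.8(years.service)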
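To make the fitted equation concrete, the sketch below wraps the coefficients from the output in a small helper and evaluates it for a male and a female worker with the same experience. The function name predict_income and the example inputs (10 years since PhD, 5 years of service) are assumptions for illustration.

```r
# Fitted equation using the coefficients printed by summary(lml)
predict_income <- function(years.phd, years.service, sex.female) {
  91333.0 + 1552.8 * years.phd - 649.8 * years.service - 8457.1 * sex.female
}

# Hypothetical worker: 10 years since PhD, 5 years of service
male_income   <- predict_income(10, 5, 0)   # 103612.0
female_income <- predict_income(10, 5, 1)   # 95154.9

# The gap equals the sex.female coefficient (8457.1)
male_income - female_income
```

With the fitted object in the session, predict(lml, newdata = ...) would do the same job without retyping the coefficients.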

Residual analysis
According to the residual analysis below, the underlying assumptions are not violated.
[Figure: diagnostic plots from plot(lml) in a 2 x 2 layout. Panels: Residuals vs Fitted, Q-Q Residuals, Scale-Location, and Residuals vs Leverage (with Cook's distance). Observations 44, 283 and 365 are labelled in the plots.]
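The quantities behind these diagnostic plots can also be extracted as numbers. The sketch below uses simulated data (not the salary data) and base R functions rstandard, hatvalues and cooks.distance. The 4/n cut-off for Cook's distance is only a common rule of thumb, and all object names are illustrative.

```r
# Simulated data for illustration only (not the document's data set)
set.seed(1)
x <- runif(50, 0, 30)
y <- 90 + 1.5 * x + rnorm(50, sd = 5)
fit <- lm(y ~ x)

# Quantities shown in the diagnostic plots
std_res <- rstandard(fit)       # standardized residuals (Q-Q, Scale-Location)
lev     <- hatvalues(fit)       # leverage (Residuals vs Leverage)
cook    <- cooks.distance(fit)  # Cook's distance

# The three observations with the largest absolute standardized residuals
head(order(abs(std_res), decreasing = TRUE), 3)

# Observations above the rule-of-thumb cut-off 4/n for Cook's distance
which(cook > 4 / length(cook))
```

Applied to lml, the same calls would list the observations that plot(lml) labels in each panel.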
