Assignment 2
1) You previously classified the variables in the gapminder dataset as numerical and continuous, numerical
and discrete, ordinal, or nominal.
a) For the nominal variable “continent”, what are the categories? (1 mark)
ANSWER:
Asia
Africa
Americas
Europe
Oceania
b) Set up a table to show how dummy or indicator variables could be used as numeric stand-ins for
the continent variable in a regression model. (2 marks)
ANSWER:
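One way such a table could be laid out is sketched below in R. It assumes R's default treatment coding, where the first factor level alphabetically (Africa) becomes the reference category. The `continent` vector is a hypothetical stand-in with one row per category, not the gapminder data itself.

```r
# Hypothetical stand-in: one row per continent category from part (a)
continent <- factor(c("Asia", "Africa", "Americas", "Europe", "Oceania"))

# model.matrix() applies R's default treatment coding: the first factor
# level (Africa, alphabetically) is the reference and gets all zeros
dummies <- model.matrix(~ continent)[, -1]   # drop the intercept column
rownames(dummies) <- as.character(continent)
colnames(dummies) <- sub("^continent", "", colnames(dummies))

print(dummies)
```

Five categories need only four indicator columns: each non-reference continent has a 1 in its own column and 0 elsewhere, and the reference continent is identified by 0 in all four.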
2) The dataset called “us crime stats” is available on OWL. It gives a variety of variables by US state at two
time points 10 years apart:
Variable            Description
CrimeRate           Crime rate (number of offences per million population)
Youth               Young males (number of males aged 18-24 per 1000)
Southern            Southern state (1 = yes, 0 = no)
Education           Education time (average number of years schooling up to 25)
ExpenditureYear0    Expenditure (per capita expenditure on police)
LabourForce         Youth labour force (males employed 18-24 per 1000)
Males               Males (per 1000 females)
MoreMales           More males identified per 1000 females (1 = yes, 0 = no)
StateSize           State size (in hundred thousands)
YouthUnemployment   Youth Unemployment (number of males aged 18-24 per 1000)
MatureUnemployment  Mature Unemployment (number of males aged 35-39 per 1000)
HighYouthUnemploy   High Youth Unemployment (1 = yes, 0 = no; high if Youth > 3 * Mature)
Wage                Wage (median weekly wage)
BelowWage           Below Wage (number of families below half wage per 1000)
Note: The same variables are collected 10 years later and have 10 on the end.
Using R, make a linear regression model with CrimeRate as the outcome variable and Youth, Wage, and
ExpenditureYear0 as the predictors. (Use only data for the initial time point)
ANSWER:
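# Load readxl to import the Excel file
library(readxl)
uscrimestats <- read_excel("Desktop/CLASSES/MMASc/SEM 2/Comp/uscrimestats.xlsx")
View(uscrimestats)
# Keep only the initial-time-point columns (the names without "10" on the end)
uscrimestatsinitial <- subset(uscrimestats, select = c("CrimeRate", "Youth", "Wage", "ExpenditureYear0"))
# Fit the linear regression and print coefficients, p-values, and R-squared values
model <- lm(CrimeRate ~ Youth + Wage + ExpenditureYear0, data = uscrimestatsinitial)
summary(model)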
ANSWER:
c) Are your predictor variables significant predictors of the outcome? Explain. (2 marks)
ANSWER:
P-values:
Intercept - 0.14450
Youth - 0.00993
Wage - 0.78440
ExpenditureYear0 - <0.001
Comparing these p-values with the 0.05 significance level, Youth (p = 0.00993) and ExpenditureYear0 (p < 0.001)
are statistically significant predictors of CrimeRate. Wage (p = 0.78440) is not a statistically significant
predictor. The intercept is not a predictor; its p-value (0.14450) only indicates that the intercept is not
significantly different from zero at the 0.05 level.
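The comparison against 0.05 can also be done in code rather than by reading the summary printout. The sketch below is a rough illustration only: the data frame `sim` is simulated (the sample size, means, and effect sizes are arbitrary assumptions), so its p-values will not match the ones reported above.

```r
# Simulated stand-in data; the numbers here are arbitrary assumptions
set.seed(1)
n <- 50
sim <- data.frame(
  Youth = rnorm(n, 140, 12),
  Wage = rnorm(n, 520, 95),
  ExpenditureYear0 = rnorm(n, 85, 30)
)
sim$CrimeRate <- 0.9 * sim$Youth + 0.8 * sim$ExpenditureYear0 + rnorm(n, 0, 20)

fit <- lm(CrimeRate ~ Youth + Wage + ExpenditureYear0, data = sim)

# The coefficient table of summary.lm has a "Pr(>|t|)" column of p-values
p_values <- summary(fit)$coefficients[, "Pr(>|t|)"]

# Drop the intercept, then flag predictors below the 0.05 threshold
predictor_p <- p_values[-1]
significant <- names(predictor_p)[predictor_p < 0.05]

print(round(p_values, 5))
print(significant)
```

With the real data, replacing `sim` with `uscrimestatsinitial` should reproduce the values listed above.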
ANSWER:
The significant predictors, Youth and ExpenditureYear0, are related to CrimeRate as follows:
Youth: For each one-unit increase in Youth (one additional male aged 18-24 per 1000), CrimeRate is expected to
increase by approximately 0.886 units (offences per million population), holding the other predictors constant.
ExpenditureYear0: For each one-unit increase in ExpenditureYear0 (per capita expenditure on police), CrimeRate
is expected to increase by approximately 0.776 units, holding the other predictors constant.
Both coefficients are positive, so higher values of Youth and ExpenditureYear0 are associated with higher
CrimeRate. These are associations estimated by the model; they do not show that either predictor causes crime.
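The "holding other predictors constant" reading of a coefficient can be checked numerically: raising one predictor by one unit while fixing the others changes the predicted outcome by exactly that predictor's coefficient. This is a minimal sketch on simulated data (`sim`, `base`, and `bumped` are hypothetical names, and the simulated coefficients will differ from 0.886 and 0.776).

```r
# Simulated stand-in data; the numbers here are arbitrary assumptions
set.seed(2)
n <- 50
sim <- data.frame(
  Youth = rnorm(n, 140, 12),
  Wage = rnorm(n, 520, 95),
  ExpenditureYear0 = rnorm(n, 85, 30)
)
sim$CrimeRate <- 0.9 * sim$Youth + 0.8 * sim$ExpenditureYear0 + rnorm(n, 0, 20)

fit <- lm(CrimeRate ~ Youth + Wage + ExpenditureYear0, data = sim)

# Two hypothetical states that differ only by one unit of Youth
base <- data.frame(Youth = 140, Wage = 500, ExpenditureYear0 = 80)
bumped <- transform(base, Youth = Youth + 1)

# The difference in predictions equals the Youth coefficient
change <- unname(predict(fit, bumped) - predict(fit, base))
youth_coef <- unname(coef(fit)["Youth"])

print(c(change = change, youth_coef = youth_coef))
```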
e) Remove the Wage predictor from the model. How does the Multiple R-squared value change?
How does the Adjusted R-squared value change? Why does this happen? (2 marks)
ANSWER:
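Removing Wage leaves the Multiple R-squared almost unchanged. Multiple R-squared can only decrease or stay the
same when a predictor is dropped, and because Wage is not a significant predictor (p = 0.78440) it explained very
little additional variability in CrimeRate. The Adjusted R-squared increases, because it penalizes each extra
predictor and the penalty for including Wage outweighed its small contribution to the fit. The model without Wage
is therefore more parsimonious and potentially better at generalizing to new data.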
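The two R-squared values can be compared side by side in R. The sketch below uses simulated data in which Wage has no real effect (an assumption made for illustration; `sim`, `full`, and `reduced` are hypothetical names), and it recomputes Adjusted R-squared from the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1), where p is the number of predictors.

```r
# Simulated stand-in data in which Wage has no true effect (assumption)
set.seed(3)
n <- 50
sim <- data.frame(
  Youth = rnorm(n, 140, 12),
  Wage = rnorm(n, 520, 95),
  ExpenditureYear0 = rnorm(n, 85, 30)
)
sim$CrimeRate <- 0.9 * sim$Youth + 0.8 * sim$ExpenditureYear0 + rnorm(n, 0, 20)

full <- lm(CrimeRate ~ Youth + Wage + ExpenditureYear0, data = sim)
reduced <- update(full, . ~ . - Wage)   # same model without Wage

r2 <- c(full = summary(full)$r.squared,
        reduced = summary(reduced)$r.squared)
adj_r2 <- c(full = summary(full)$adj.r.squared,
            reduced = summary(reduced)$adj.r.squared)

# Adjusted R-squared by hand: p = 3 predictors in full, 2 in reduced
p <- c(full = 3, reduced = 2)
adj_by_hand <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(rbind(r2, adj_r2, adj_by_hand))
```

Multiple R-squared for the reduced model is never larger than for the full model. Whether Adjusted R-squared rises or falls depends on how much the dropped predictor was contributing.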
3) You learned that slope (e.g. β1) is mathematically related to the correlation coefficient. This is
Pearson’s correlation coefficient, since both Pearson’s R and regression depend on linear relationships and
numerical variables. Determine whether the relationship between CrimeRate and ExpenditureYear0 is
linear, but only for the Southern states. Paste your script and findings below. (3 marks)
ANSWER:
correlation
The Pearson correlation coefficient between CrimeRate and ExpenditureYear0 for the Southern states is 0.7275851
(about 0.73), a fairly strong positive correlation. The relationship in the Southern states is therefore
reasonably linear.
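A script along the following lines could produce such a coefficient. It is a sketch only: the data are simulated (`sim` and its values are assumptions, so the result will not equal 0.7275851), and it also checks the slope-correlation link mentioned in the question, namely that the simple-regression slope equals r multiplied by sd(y)/sd(x).

```r
# Simulated stand-in data; the numbers here are arbitrary assumptions
set.seed(4)
n <- 50
sim <- data.frame(
  Southern = rbinom(n, 1, 0.4),            # 1 = Southern state, 0 = not
  ExpenditureYear0 = rnorm(n, 85, 30)
)
sim$CrimeRate <- 50 + 0.8 * sim$ExpenditureYear0 + rnorm(n, 0, 20)

# Keep only the Southern states
southern <- subset(sim, Southern == 1)

# Pearson correlation between the two variables
r <- cor(southern$CrimeRate, southern$ExpenditureYear0, method = "pearson")

# Slope from simple linear regression, and the same slope rebuilt from r
slope <- coef(lm(CrimeRate ~ ExpenditureYear0, data = southern))[["ExpenditureYear0"]]
slope_from_r <- r * sd(southern$CrimeRate) / sd(southern$ExpenditureYear0)

print(c(r = r, slope = slope, slope_from_r = slope_from_r))
```

A correlation coefficient summarizes the strength of a linear association but does not by itself rule out curvature, so a scatterplot of the two variables (for example with `plot()`) is a useful companion check.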
4) Page 64 of your textbook lists 4 points describing different scenarios about confidence intervals.
Summarize these points in your own words, as a list, table, or figure, for easier interpretation. If your
answer to this question is more than 75% identical to another group's, both groups will receive -10 on this
assignment. (4 marks)
ANSWER: