Problem Set 7
1. Facing claims that city police were engaging in racial profiling, the city of
Grand Rapids hired a consulting firm to perform a study on traffic stops in
the city. The results of this study were released in April 2017, and the
consulting firm’s report is posted on Canvas in the Problem Sets folder.
In short, the study found that Black motorists were stopped at “close to
twice the rate that would be expected given their presence in the traffic.”
Hi, I apologize for submitting this problem set so late. I’d finished all
problems a while ago, except for one, and finally got around to it.
Measurement Methodology
The consulting firm measured the benchmark data by using a few methods to
reduce overall bias:
They also trained surveyors to visually identify the race and ethnicity of
drivers and recorded these driver demographics using a standardized
form.
Reliability
The method seems to be highly reliable since they took measures to avoid bias
and ensure proper data acquisition through:
Validity
The results make sense and align with other studies’ findings. However, visual
identification of race is inherently subjective and could vary between
surveyors.
(b) According to the data (p. 56), at the corner of Bridge &
Stocking, 15.0% of the 2,383 drivers were Black. Out of 673
traffic stops made in that vicinity, 32.8% of the drivers were
Black. Construct a 99% confidence interval for the difference
of proportions. Be sure to use the correct standard error for a
confidence interval.
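CI = (p1 - p2) ± z * SE
SE = sqrt[ p1(1 - p1)/n1 + p2(1 - p2)/n2 ]
SE = sqrt[ (0.328)(0.672)/673 + (0.150)(0.850)/2,383 ] = 0.0195
CI = 0.178 ± (2.576)(0.0195)
CI = 0.178 ± 0.0503
CI = (0.1277, 0.2283)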
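The interval can be checked numerically. The following is a minimal R sketch that assumes only the four figures quoted in the problem statement and the unpooled standard error; the variable names are illustrative, and small differences from hand-computed figures can arise from rounding.

```r
# Inputs quoted in the problem statement (report p. 56)
p_stops <- 0.328; n_stops <- 673    # share of Black drivers among traffic stops
p_bench <- 0.150; n_bench <- 2383   # share of Black drivers in the benchmark survey

# Point estimate of the difference of proportions
diff_hat <- p_stops - p_bench

# Unpooled standard error (the form used for a confidence interval)
se_unpooled <- sqrt(p_stops * (1 - p_stops) / n_stops +
                    p_bench * (1 - p_bench) / n_bench)

# Critical z value for 99% confidence (0.5% in each tail)
z_crit <- qnorm(0.995)

ci_99 <- diff_hat + c(-1, 1) * z_crit * se_unpooled
print(round(c(estimate = diff_hat, se = se_unpooled,
              lower = ci_99[1], upper = ci_99[2]), 4))
```

Because both limits are positive, the interval excludes 0 regardless of small rounding differences.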
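Z = ( p1 - p2 ) / SE0, where SE0 = sqrt[ p(1 - p)(1/n1 + 1/n2) ] uses the pooled proportion p = 0.189
SE0 = 0.0171
Z = 0.178 / 0.0171 = 10.41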
Since |z| > 2.576 and the p-value < 0.01, we reject the null hypothesis.
This means that Black drivers are stopped at a higher rate than their
presence in the general driving population would predict.
2. One way that analysts measure the level of education in a country is to
calculate the number of years of school, on average, that people in the
country have completed. Suppose that, across a sample of democracies,
the mean of this variable is 8.2 years (n=81; s=2.8), and in a sample of
non-democracies the mean is 7.1 years (n=31; s=3.1).
o H0: 𝜇1 − 𝜇2 = 0
o Ha: 𝜇1 − 𝜇2 ≠ 0
Calculate pooled standard deviation
Using calculator…
s = 2.88
Calculate standard error (SE) of difference in means
Using calculator…
SE = 0.61
Calculate t-statistic
t = 1.81
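The equal-variance (pooled) calculation can be reproduced from the summary statistics in the problem. This is a sketch under the assumption that the pooled standard deviation weights each sample variance by its degrees of freedom; the variable names are illustrative.

```r
# Summary statistics from the problem statement
m1 <- 8.2; n1 <- 81; s1 <- 2.8   # democracies
m2 <- 7.1; n2 <- 31; s2 <- 3.1   # non-democracies

# Pooled standard deviation: variances weighted by their degrees of freedom
s_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

# Standard error of the difference in means under equal variances
se_pooled <- s_pooled * sqrt(1 / n1 + 1 / n2)

# t statistic and two-sided p-value with n1 + n2 - 2 degrees of freedom
t_stat <- (m1 - m2) / se_pooled
df_pooled <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), df = df_pooled)

print(round(c(s_pooled = s_pooled, se = se_pooled, t = t_stat,
              df = df_pooled, p = p_value), 3))
```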
Calculate standard error
x̄1 = 8.2 (mean for democracies)
x̄2 = 7.1 (mean for non-democracies)
n1 = 81, s1 = 2.8
n2 = 31, s2 = 3.1
Using calculator…
SE = 0.64
Calculate degrees of freedom
Using calculator…
df = 50
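The unequal-variance standard error and degrees of freedom can be sketched in R as follows. The degrees of freedom here use the Welch-Satterthwaite approximation, which is an assumption about what the calculator computed; the variable names are illustrative.

```r
# Summary statistics from the problem statement
n1 <- 81; s1 <- 2.8   # democracies
n2 <- 31; s2 <- 3.1   # non-democracies

# Per-group variance of the sample mean
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Standard error without assuming equal variances
se_welch <- sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

print(round(c(se = se_welch, df = df_welch), 2))
```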
Comparison
Standard error: Unequal-variance confidence intervals tend to be wider due to
the larger standard error (0.64 vs. 0.61), meaning the unequal-variance
assumption is more cautious and less likely to result in Type I errors (false
positives).
Degrees of freedom: df = 50
CI = (-0.182, 2.382)
Interpretation: We are 95% confident that the true difference in the average
years of education between democracies and non-democracies is between -0.182
and 2.382 years. Because the interval contains 0, we cannot reject the null
hypothesis of no difference at the 0.05 level.
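As a check on the interval, here is a minimal R sketch that assumes the unequal-variance standard error and the rounded df = 50 reported above; the variable names are illustrative.

```r
# Summary statistics from the problem statement
m1 <- 8.2; n1 <- 81; s1 <- 2.8   # democracies
m2 <- 7.1; n2 <- 31; s2 <- 3.1   # non-democracies

# Unequal-variance standard error of the difference in means
se_welch <- sqrt(s1^2 / n1 + s2^2 / n2)

# Critical t value for 95% confidence, using the rounded df = 50 from the text
t_crit <- qt(0.975, df = 50)

ci_95 <- (m1 - m2) + c(-1, 1) * t_crit * se_welch
print(round(ci_95, 3))

# TRUE when the interval contains 0, i.e. the test fails to reject at the 0.05 level
contains_zero <- ci_95[1] < 0 && ci_95[2] > 0
print(contains_zero)
```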
3. Use the dataset InfantMortality for this question. In this dataset, the
variable infmort2000 is the level of infant mortality for each country in
the year 2000. The variable infmort2010 is the level of infant mortality for
the same set of countries in the year 2010.
So that you can see the result of this command, browse your results
with either browse in Stata or View(InfMort) in R. When done, use
commands to obtain the mean, standard deviation, and sample
size for IMdiff.
Mean = -12.08
o This mean indicates that on average, the 2010 infant mortality rate
was 12.08 deaths per 1,000 live births lower than in 2000 for the
countries in this dataset.
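# Load the dataset (the file extension is required)
load("InfantMortality.rdata")
# Create the difference variable: 2010 rate minus 2000 rate
InfMort$IMdiff <- InfMort$infmort2010 - InfMort$infmort2000
# Browse the data to confirm the new variable
View(InfMort)
# Summary statistics (the variable must be referenced through the data frame)
mean(InfMort$IMdiff)
sd(InfMort$IMdiff)
# Frequency table of the IMdiff values
table(InfMort$IMdiff)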
> load("InfantMortality")
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file 'InfantMortality', probable reason 'No such file or directory'
> setwd("C:/Users/sunny/OneDrive - Umich/2024-2025/PUBPOL 529 Augmented
Statistics/Working Directory")
> load("InfantMortality")
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file 'InfantMortality', probable reason 'No such file or directory'
> load("InfantMortality.rdata")
> InfMort$IMdiff<-InfMort$infmort2010-InfMort$infmort2000
> View(InfMort)
> mean(IMdiff)
Error: object 'IMdiff' not found
> mean(InfMort$IMdiff)
[1] -12.07809
> sd(InfMort$IMdiff)
[1] 12.15155
> sd(InfMort$IMdiff, na.rm = FALSE)
[1] 12.15155
> table(InfMort$IMdiff)
1 2 1 3
-11.2 -10.7 -10.6 -10.5
1 1 2 1
-10.4 -10.2 -10.1 -10
2 1 2 1
-9.5 -8.4 -8.2 -8.1
1 1 1 1
-7.80000000000001 -7.6 -7.5 -7.4
1 1 1 1
-7.2 -6.7 -6.6 -6.4
2 2 2 1
-6.3 -6.2 -5.8 -5.7
2 1 2 2
-5.6 -5.5 -5.3 -5.2
1 2 1 1
-5.1 -5 -4.8 -4.6
2 2 1 1
-4.5 -4.4 -4.2 -4.1
3 1 2 1
-4 -3.7 -3.69999999999999 -3.4
2 1 1 1
-3.1 -3 -2.9 -2.7
2 1 1 1
-2.6 -2.5 -2.4 -2.3
1 2 2 1
-2.2 -2.1 -2.09999999999999 -2
1 1 1 3
-1.9 -1.8 -1.7 -1.6
1 2 2 1
-1.5 -1.4 -1.3 -1.2
2 2 5 3
-1.1 -1 -0.9 -0.8
2 2 5 3
-0.5 -0.3 0.0999999999999996 0.100000000000001
1 1 1 1
(b) We learned that the mean of the differences is the same as the
difference of the means. Is that true? Use commands to find
the means of infmort2010 and infmort2000, then calculate the
difference between them. Compare this to the mean of IMdiff
that you found above.
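o Difference of means = mean of 2010 rates - mean of 2000 rates
  = 29.10 - 41.17 = -12.08
o This equals the mean of IMdiff (-12.08), so the mean of the differences is
  the same as the difference of the means.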
# Means of each year's rate (referenced through the data frame)
mean(InfMort$infmort2010)
mean(InfMort$infmort2000)
> mean(infmort2010)
Error: object 'infmort2010' not found
> mean(InfMort$infmort2010)
[1] 29.09663
> mean(InfMort$infmort2000)
[1] 41.17472
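The identity in part (b) holds for any paired data, not just this dataset. The sketch below illustrates it on a hypothetical toy data frame (made-up values, not the InfantMortality data); only the column names are borrowed from the problem.

```r
# Hypothetical toy data (made-up values, NOT the InfantMortality dataset)
toy <- data.frame(
  infmort2000 = c(50.0, 32.5, 12.0, 80.5, 6.0),
  infmort2010 = c(38.0, 25.5, 9.0, 61.5, 5.0)
)

# Paired difference for each row: 2010 rate minus 2000 rate
toy$IMdiff <- toy$infmort2010 - toy$infmort2000

mean_of_diffs <- mean(toy$IMdiff)
diff_of_means <- mean(toy$infmort2010) - mean(toy$infmort2000)

print(c(mean_of_diffs = mean_of_diffs, diff_of_means = diff_of_means))

# all.equal() compares with a numeric tolerance, avoiding floating-point surprises
print(isTRUE(all.equal(mean_of_diffs, diff_of_means)))

# Sample size: the number of non-missing differences
n_obs <- sum(!is.na(toy$IMdiff))
print(n_obs)
```

The equality follows from the linearity of the mean, so it holds whenever both variables are observed for the same set of rows.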
(c) This insight from part (b) tells us that testing whether the mean of
IMdiff=0 is the same as testing whether infmort2010 and infmort2000
have different means. By hand, perform the test of whether the
mean of IMdiff=0. You have the information you need to calculate
the standard error from your summary statistics in part (a). It’s just a
one-sample test of statistical significance. Produce a t statistic and
p-value for this test.
In Stata, the command is: ttest infmort2010 = infmort2000
The 95% confidence interval of (10.28, 13.88) does not contain 0, so we reject
the null hypothesis that the true mean difference is 0.
Paired t-test
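The by-hand one-sample test on the differences and a paired t-test give the same result. The sketch below shows this on hypothetical toy data (made-up values, not the InfantMortality data), using the base R function t.test with paired = TRUE.

```r
# Hypothetical toy data (made-up values, NOT the InfantMortality dataset)
toy <- data.frame(
  infmort2000 = c(50.0, 32.5, 12.0, 80.5, 6.0),
  infmort2010 = c(38.0, 25.5, 9.0, 61.5, 5.0)
)

# By hand: one-sample t test of H0: mean difference = 0
d <- toy$infmort2010 - toy$infmort2000
n <- length(d)
se <- sd(d) / sqrt(n)                      # standard error of the mean difference
t_hand <- mean(d) / se                     # t statistic
p_hand <- 2 * pt(-abs(t_hand), df = n - 1) # two-sided p-value

# Built-in paired t-test on the two original columns
tt <- t.test(toy$infmort2010, toy$infmort2000, paired = TRUE)

print(c(t_hand = t_hand, t_builtin = unname(tt$statistic)))
print(c(p_hand = p_hand, p_builtin = tt$p.value))
```

With the real dataset, the same steps would use the mean, standard deviation, and sample size of IMdiff from part (a).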
4. The 2016 American National Election Studies asked respondents whether
the federal government should make it easier or more difficult to buy a
gun. The possible answers were: more difficult, keep rules the same,
easier. It also asked whether they had a four-year college degree: yes,
no. The table below shows the joint frequency distribution of the
responses.
                      College Degree?
Gun Purchases         Yes       No      Total
More Difficult      1,014    1,245      2,259
Keep the Same         530    1,171      1,701
Easier                 86      189        275
Total               1,630    2,605      4,235
(a) Calculate fe, the expected frequency in each cell under the
scenario that the variables are independent.
df = (# of rows - 1) * (# of columns - 1)
df = ( 3 - 1 ) * ( 2 - 1 ) = 2
Based on 2 degrees of freedom and α = .05, the critical value is 5.99.
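The expected frequencies, degrees of freedom, and critical value can be sketched in R from the observed table. Each expected count is (row total × column total) / grand total; the object names are illustrative.

```r
# Observed counts from the table in the problem statement
observed <- matrix(c(1014, 1245,
                      530, 1171,
                       86,  189),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(
                     GunPurchases  = c("More Difficult", "Keep the Same", "Easier"),
                     CollegeDegree = c("Yes", "No")))

# Expected count in each cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 2))

# Degrees of freedom and the critical value at alpha = 0.05
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
crit <- qchisq(0.95, df = df)
print(round(c(df = df, critical_value = crit), 2))
```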
(c) Calculate the χ2 statistic from the data presented above. Can
you reject the null hypothesis?
Calculating each term for χ2 using the observed frequencies (given) and the
expected frequencies:
Since the chi-square statistic of 83.71 is greater than the critical value of 5.99,
the null hypothesis can be rejected.
There is strong evidence that having a college degree and opinions on whether
the federal government should make buying a gun easier or more difficult are not
independent of each other at the 0.05 significance level.
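The χ2 statistic can be cross-checked in R. The sketch below sums (observed - expected)^2 / expected over the six cells and compares the result with the base R function chisq.test; the object names are illustrative.

```r
# Observed counts from the table in the problem statement
observed <- matrix(c(1014, 1245,
                      530, 1171,
                       86,  189),
                   nrow = 3, byrow = TRUE)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic by hand: sum of (fo - fe)^2 / fe over all cells
chi_sq <- sum((observed - expected)^2 / expected)

# Built-in test for comparison (no continuity correction is applied to a 3 x 2 table)
test <- chisq.test(observed, correct = FALSE)

print(round(c(by_hand = chi_sq, builtin = unname(test$statistic)), 2))
print(test$p.value)

# TRUE when the statistic exceeds the critical value at alpha = 0.05
reject_null <- chi_sq > qchisq(0.95, df = 2)
print(reject_null)
```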