CONSCI 3940 Final Exam Review Questions
11/14/24 Questions
1. Define the following
a. Observational units – an observational unit is the object from
which data are collected. For example, a household can be an
observational unit and data collected can be household
income, total household members, and the ages of all those
who live in the household.
b. Random sample – a random sample is a sample of size n that is
as likely to be selected as any other sample of the same
size (n). For example, consider a researcher who wants to
study educational status in the US. This researcher
randomly chooses 1,000 individuals from a Census directory.
c. Target population – the target population is the set of individuals
for which a researcher intends to make inferences. It is also
the population from which a sample is drawn. For example, if a
researcher wishes to study the factors that influence teens to
drop out of high school, the target population is all
high-school-age teens who are enrolled or have been enrolled in
high school.
2. Define each type of data and give an example
a. Categorical – categorical data are groupings of observations
that have no ranking. For example, race groups individuals
into certain categories and there is no specific order to the
groups.
b. Ordinal – ordinal data are like categorical data but have an
order. For example, school rank (freshman, sophomore, junior,
senior) groups individuals into their respective level of school
and there is a clear order.
c. Interval – interval data are numeric data for which differences
between the values are meaningful, but there is no natural zero.
Temperature in degrees Fahrenheit or Celsius is a good example:
zero degrees is only a point on the scale and does not mean
that there is no temperature.
d. Ratio – Ratio data are numeric data that have a natural zero.
Income is an example. Your income can be zero, and that
means that you have no income.
3. Define each data structure and give an example
a. Cross-section – multiple observational units, one period of time.
An example would be the Football data we worked with
because each observational unit is a game and we only have
one season of data.
b. Time-series – time-series data follow one observational unit
over multiple periods of time. For example, Apple stock has
an opening, closing, high, and low price every weekday and
we can track all of those variables for Apple stock over
many days.
c. Panel – panel data follow multiple observational units over
multiple periods of time. For example, if we collected
open, close, high, and low prices each weekday for a year for
Apple, Dell, and Microsoft stocks, we would have a panel of
data.
4. What is the difference between univariate and multivariate data? –
Univariate data have only one variable whereas multivariate data
have two or more variables.
11/15/24
Review descriptive statistics
1. What is the formula for the sample mean (average)? \( \bar{X} = \frac{\sum x_i}{n} \)
2. What is the formula for the standard deviation of the sample mean?
\( s = \sqrt{\frac{\sum (x_i - \bar{X})^2}{n - 1}} \)
3. What is the formula for the sample proportion? \( p = \frac{x}{n} \), where x is the
number of successes from a binomially distributed random variable.
Proportions range from 0 to 1.
4. What is the formula for the standard deviation of the sample proportion?
\( SD(p) = \sqrt{p (1 - p)} \)
5. What is a correlation coefficient, and what is its range? When are two
variables highly positively correlated?
a. Correlation coefficient measures the linear association
between two variables.
b. Range is from -1 to 1.
c. Two variables are highly positively correlated when their
correlation coefficient is greater than 0.7.
6. What does it mean when data are skewed to the right? Left? When data are
skewed to the right, the data have a long right tail. When data are
skewed to the left, the data have a long left tail.
a. When data are skewed right, is the mean larger than the median? Yes,
because the mean is influenced by the extreme values in the
right tail.
b. When data are skewed left, is the mean larger than the median? No, it
is smaller than the median because the mean is influenced by the
extreme values in the left tail.
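The descriptive statistics above can be checked by hand against a short script. The following is a minimal sketch using only the Python standard library; the function names and the small data set are made up for illustration and are not from the course materials.

```python
import math

def sample_mean(xs):
    """Sample mean: sum of the values divided by n."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Sample standard deviation, using the n - 1 divisor."""
    xbar = sample_mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))

def sample_proportion(successes, n):
    """Sample proportion p = x / n."""
    return successes / n

def sd_proportion(p):
    """Standard deviation of a sample proportion: sqrt(p(1 - p))."""
    return math.sqrt(p * (1 - p))

def correlation(xs, ys):
    """Correlation coefficient: strength of linear association, from -1 to 1."""
    xbar, ybar = sample_mean(xs), sample_mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def median(xs):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

# Made-up, right-skewed data: one large value creates a long right tail.
right_skewed = [1, 2, 2, 3, 12]
print(sample_mean(right_skewed), median(right_skewed))  # mean 4.0 exceeds median 2
```

The single large value pulls the mean above the median, which is the pattern described for right-skewed data.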
11/19/24
Standard error and confidence intervals for the mean
Use the information in the table below to calculate the standard error and confidence intervals for each mean. Then
interpret each confidence interval. To practice for the exam, I suggest you use the calculator you plan to bring to the
exam.
Note that to generate the sample means, standard deviations, and margins of error, I used the 2019Q3 data from
the CES Interview 2018-2019 file.
                                        Xbar      Std Dev     N      MOE      Standard Error   LCI   UCI
Food at home                            $966.31   $806.17     5337   $21.63
Food away from home                     $503.24   $746.90     5337   $20.04
Men and Boys Clothing Expenditures      $22.97    $97.57      5337   $2.62
Women and Girls Clothing Expenditures   $35.46    $192.29     5337   $5.16
Footwear                                $28.18    $85.48      5337   $2.29
Apparel Purchases                       $29.01    $224.93     5337   $6.04
Entertainment                           $91.53    $986.01     5337   $26.46
Car                                     $140.85   $1,552.18   5337   $41.65

Worked example for Food at home: Standard Error = s/√n = 806.17/√5337; LCI = 966.31 - 21.63; UCI = 966.31 + 21.63.
Solution Table
                                        Xbar      Std Dev     N      MOE      Standard Error   LCI       UCI
Food at home                            $966.31   $806.17     5337   $21.63   $11.04           $944.68   $987.94
Food away from home                     $503.24   $746.90     5337   $20.04   $10.22           $483.19   $523.28
Men and Boys Clothing Expenditures      $22.97    $97.57      5337   $2.62    $1.34            $20.35    $25.59
Women and Girls Clothing Expenditures   $35.46    $192.29     5337   $5.16    $2.63            $30.30    $40.62
Footwear                                $28.18    $85.48      5337   $2.29    $1.17            $25.88    $30.47
Apparel Purchases                       $29.01    $224.93     5337   $6.04    $3.08            $22.98    $35.05
Entertainment                           $91.53    $986.01     5337   $26.46   $13.50           $65.07    $117.99
Car                                     $140.85   $1,552.18   5337   $41.65   $21.25           $99.20    $182.51
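The Food at home row of the solution table can be reproduced with a few lines of code. This is a minimal sketch, assuming the margin of error is 1.96 times the standard error (a 95% confidence level); the multiplier is an assumption inferred from the MOE and Standard Error values in the table, and the function name is made up for illustration.

```python
import math

Z_95 = 1.96  # assumed critical value for a 95% confidence interval

def mean_confidence_interval(xbar, s, n, z=Z_95):
    """Return (standard error, margin of error, lower limit, upper limit)."""
    se = s / math.sqrt(n)   # standard error of the mean: s / sqrt(n)
    moe = z * se            # margin of error
    return se, moe, xbar - moe, xbar + moe

# Food at home: Xbar = 966.31, Std Dev = 806.17, N = 5337
se, moe, lci, uci = mean_confidence_interval(966.31, 806.17, 5337)
print(round(se, 2), round(moe, 2), round(lci, 2), round(uci, 2))
```

One way to interpret the result: we are 95% confident that the population mean expenditure on food at home lies between the lower and upper limits.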
11/21/24
Use the information in the table below to calculate the relative frequency and
cumulative relative frequency for each race category. Note that to generate the
sample frequencies I used the 2019Q3 data from the CES Interview 2018-2019 file.
Race               Frequency   Relative Frequency (proportion)   Cumulative Relative Frequency
Asian              283         283/5337 = 0.053                  0.053
Black              570         570/5337 = 0.107                  0.053 + 0.107 = 0.160
Native American    39
Other Race         80
Pacific Islander   24
White              4341
11/22/24
Use the information in the table below to calculate the relative frequency, standard deviation, standard error, and
lower and upper limits on a 95% confidence interval for each race category. Note that to generate the sample
frequencies I used the 2019Q3 data from the CES Interview 2018-2019 file.
                               Relative            Standard    Standard
Race               Frequency   Frequency   N       Deviation   Error      MOE      LCI   UCI
Asian              283         0.053       5337                           0.0060
Black              570                     5337                           0.0083
Native American    39                      5337                           0.0023
Other Race         80                      5337                           0.0033
Pacific Islander   24                      5337                           0.0018
White              4341                    5337                           0.0105
Solution Table for the questions on 11/21/24 and 11/22/24
                               Relative    Cumulative Relative          Standard    Standard
Race               Frequency   Frequency   Frequency             N      Deviation   Error      MOE     LCI     UCI
Asian              283         0.053       0.053                 5337   0.224       0.003      0.006   0.047   0.059
Black              570         0.107       0.160                 5337   0.309       0.004      0.008   0.099   0.115
Native American    39          0.007       0.167                 5337   0.085       0.001      0.002   0.005   0.010
Other Race         80          0.015       0.182                 5337   0.122       0.002      0.003   0.012   0.018
Pacific Islander   24          0.004       0.187                 5337   0.067       0.001      0.002   0.003   0.006
White              4341        0.813       1.000                 5337   0.390       0.005      0.010   0.803   0.824
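The race solution table can be rebuilt from the frequencies alone. The sketch below assumes a 95% confidence level with a multiplier of 1.96 (an assumption inferred from the MOE values); the variable names are made up for illustration.

```python
import math

Z_95 = 1.96  # assumed critical value for a 95% confidence interval

# Frequencies from the race table
counts = {
    "Asian": 283,
    "Black": 570,
    "Native American": 39,
    "Other Race": 80,
    "Pacific Islander": 24,
    "White": 4341,
}
n = sum(counts.values())  # total sample size

rows = []
cumulative = 0.0
for race, count in counts.items():
    p = count / n                 # relative frequency (proportion)
    cumulative += p               # cumulative relative frequency
    sd = math.sqrt(p * (1 - p))   # standard deviation of the proportion
    se = sd / math.sqrt(n)        # standard error
    moe = Z_95 * se               # margin of error
    rows.append({
        "race": race, "p": p, "cumulative": cumulative,
        "sd": sd, "se": se, "moe": moe, "lci": p - moe, "uci": p + moe,
    })

for r in rows:
    print(f'{r["race"]:<17}{r["p"]:.3f} {r["cumulative"]:.3f} {r["sd"]:.3f} '
          f'{r["se"]:.3f} {r["moe"]:.4f} {r["lci"]:.3f} {r["uci"]:.3f}')
```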
11/26/24
Hypothesis tests
1. What are the 5 steps that we followed to conduct a hypothesis test?
a. Determine the null and alternative hypotheses
b. Determine alpha, the level of significance
c. Determine the type of test, calculate the critical values, and state the decision rule
d. Calculate the test statistic
e. Evaluate the outcome based on the test statistic and interpret the result.
2. Define the null and alternative hypotheses. The null hypothesis is what is given or what we assume to
be true, and the alternative hypothesis is what we want to find evidence for.
3. The level of significance of a test, α, is the probability of which type of error? Type I error, which is
incorrectly rejecting the null hypothesis.
4. What are the decision rules for a lower-tail, upper-tail, and two-tail test?
a. Lower-tail: Reject the null hypothesis when the test statistic < the lower critical value.
Otherwise, fail to reject the null.
b. Upper-tail: Reject the null hypothesis when the test statistic > the upper critical value.
Otherwise, fail to reject the null.
c. Two-tail: Reject the null hypothesis when the test statistic < the lower critical value OR >
the upper critical value. Otherwise, fail to reject the null.
5. What is the p-value of a hypothesis test? The p-value is the observed level of significance. In other
words, assuming the null is true, the p-value is the probability of obtaining a test statistic at
least as extreme as the one calculated from the data. It is the smallest alpha at which you would
reject the null. If the p-value is less than your alpha, you reject the null. A very small p-value
indicates strong evidence against the null, so rejecting it carries little risk of a Type I error.
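The decision rules for the three types of test, together with the p-value rule, can be sketched as two small functions. The function names and the example numbers are made up for illustration; the critical values must still come from a table or software.

```python
def reject_null(test_stat, tail, lower_cv=None, upper_cv=None):
    """Apply the decision rule for a lower-tail, upper-tail, or two-tail test.

    Returns True to reject the null and False to fail to reject it.
    """
    if tail == "lower":
        return test_stat < lower_cv
    if tail == "upper":
        return test_stat > upper_cv
    if tail == "two":
        return test_stat < lower_cv or test_stat > upper_cv
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

def reject_by_p_value(p_value, alpha):
    """Reject the null when the p-value is less than alpha."""
    return p_value < alpha

# Hypothetical examples
print(reject_null(2.1, "upper", upper_cv=1.645))               # True
print(reject_null(-1.0, "lower", lower_cv=-1.645))             # False
print(reject_null(-2.5, "two", lower_cv=-1.96, upper_cv=1.96)) # True
print(reject_by_p_value(0.03, alpha=0.05))                     # True
```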
11/27/24
One-sample test of means
1. What is the formula for a one-sample test of means? \( TS = \frac{\bar{X} - \mu}{s / \sqrt{n}} \)
2. What is the formula for a one-sample test of proportions? \( TS = \frac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)} / \sqrt{n}} \)
3. What is the formula for a two-sample test of proportions? \( TS = \frac{p_1 - p_2}{\sqrt{p (1 - p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \)
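The three test statistics translate directly into code. This is a minimal sketch with made-up function names and inputs. For the two-sample test it assumes that p is the pooled proportion, (x1 + x2) / (n1 + n2), which is the usual choice but is not defined in the review itself.

```python
import math

def one_sample_mean_ts(xbar, mu0, s, n):
    """Test statistic for a one-sample test of means."""
    return (xbar - mu0) / (s / math.sqrt(n))

def one_sample_proportion_ts(p, pi0, n):
    """Test statistic for a one-sample test of proportions."""
    return (p - pi0) / (math.sqrt(pi0 * (1 - pi0)) / math.sqrt(n))

def two_sample_proportion_ts(x1, n1, x2, n2):
    """Test statistic for a two-sample test of proportions.

    Assumption: p is the pooled proportion from both samples.
    """
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled proportion (assumed definition)
    return (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Hypothetical inputs
print(one_sample_mean_ts(105, 100, 10, 25))      # 5 / (10 / 5) = 2.5
print(one_sample_proportion_ts(0.6, 0.5, 100))   # about 2.0
print(two_sample_proportion_ts(60, 100, 40, 100))
```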
11/29/24
Probability concepts
1. When are two events mutually exclusive? Two events are mutually exclusive when they do not
overlap.
2. When are a set of events collectively exhaustive? A set of events is collectively exhaustive when all
data are contained within the events.
3. What are the four rules of probability?
a. Rule 1: The probability of any event is the sum of the probabilities of the outcomes that
compose the event.
b. Rule 2: The probability of the complement of any event A is P(A^c) = 1 - P(A).
c. Rule 3: If events A and B are mutually exclusive, then P(A or B) = P(A) + P(B).
d. Rule 4: If two events A and B are not mutually exclusive, then P(A or B) = P(A) + P(B) - P(A
and B)
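The four rules can be verified on a small example. The sketch below uses one roll of a fair six-sided die, which is an illustrative assumption and not an example from the course; exact fractions avoid rounding issues.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative assumption)
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
odd = {1, 3, 5}
greater_than_four = {5, 6}

# Rule 1: an event's probability is the sum of its outcomes' probabilities
rule1 = sum(prob({outcome}) for outcome in even)

# Rule 2: P(A^c) = 1 - P(A)
rule2 = prob(sample_space - even)

# Rule 3: mutually exclusive events, P(A or B) = P(A) + P(B)
rule3 = prob(even | odd)

# Rule 4: overlapping events, P(A or B) = P(A) + P(B) - P(A and B)
rule4 = prob(even | greater_than_four)

print(rule1, rule2, rule3, rule4)
```

Here even and odd do not overlap (mutually exclusive) and together cover every outcome (collectively exhaustive), while even and greater_than_four share the outcome 6.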
12/2/24
Use the table below to answer the following questions. Frequencies in this table were created based on data in the
2019 Q3 CES Interview 2018-2019 file.
                              White   Black   Native American   Asian   Pacific Islander   Other Race   Row Total
No Education                  10      1       0                 2       0                  0            13
8th Grade or Less             127     10      1                 3       2                  0            143
9-12 Grade                    278     56      2                 8       2                  4            350
HS Degree                     990     147     9                 42      5                  16           1209
Some College                  894     142     16                33      2                  21           1108
Associate's Degree            415     79      5                 23      2                  5            529
Bachelor's Degree             1010    85      4                 85      7                  24           1215
More than Bachelor's Degree   617     50      2                 87      4                  10           770
Column Total                  4341    570     39                283     24                 80           5337
What are the joint relative frequencies of each joint event? Fill in the table below.
Solution Table
Education                     White    Black    Native American   Asian   Pacific Islander   Other Race   Row Total
No Education                  0.19%    0.02%    0.00%             0.04%   0.00%              0.00%        0.24%
8th Grade or Less             2.38%    0.19%    0.02%             0.06%   0.04%              0.00%        2.68%
9-12 Grade                    5.21%    1.05%    0.04%             0.15%   0.04%              0.07%        6.56%
HS Degree                     18.55%   2.75%    0.17%             0.79%   0.09%              0.30%        22.65%
Some College                  16.75%   2.66%    0.30%             0.62%   0.04%              0.39%        20.76%
Associate's Degree            7.78%    1.48%    0.09%             0.43%   0.04%              0.09%        9.91%
Bachelor's Degree             18.92%   1.59%    0.07%             1.59%   0.13%              0.45%        22.77%
More than Bachelor's Degree   11.56%   0.94%    0.04%             1.63%   0.07%              0.19%        14.43%
Column Total                  81.34%   10.68%   0.73%             5.30%   0.45%              1.50%        100.00%
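Each joint relative frequency is the cell count divided by the grand total of 5337. The sketch below recomputes the joint and marginal relative frequencies from the frequency table; the variable names are made up for illustration.

```python
RACES = ["White", "Black", "Native American", "Asian",
         "Pacific Islander", "Other Race"]

# Frequencies by education level, in the column order given in RACES
COUNTS = {
    "No Education":                [10, 1, 0, 2, 0, 0],
    "8th Grade or Less":           [127, 10, 1, 3, 2, 0],
    "9-12 Grade":                  [278, 56, 2, 8, 2, 4],
    "HS Degree":                   [990, 147, 9, 42, 5, 16],
    "Some College":                [894, 142, 16, 33, 2, 21],
    "Associate's Degree":          [415, 79, 5, 23, 2, 5],
    "Bachelor's Degree":           [1010, 85, 4, 85, 7, 24],
    "More than Bachelor's Degree": [617, 50, 2, 87, 4, 10],
}

total = sum(sum(row) for row in COUNTS.values())  # grand total

# Joint relative frequency = cell count / grand total
joint = {edu: [c / total for c in row] for edu, row in COUNTS.items()}

# Marginal relative frequencies (the Row Total and Column Total entries)
row_marginal = {edu: sum(row) / total for edu, row in COUNTS.items()}
col_marginal = {race: sum(row[i] for row in COUNTS.values()) / total
                for i, race in enumerate(RACES)}

for edu, row in joint.items():
    cells = "  ".join(f"{100 * p:6.2f}%" for p in row)
    print(f"{edu:<28}{cells}  {100 * row_marginal[edu]:6.2f}%")
```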
12/3/24
Use the table below to answer the following questions. Frequencies in this table were created based on data in the
2019 Q3 CES Interview 2018-2019 file.
                              White   Black   Native American   Asian   Pacific Islander   Other Race   Row Total
No Education                  10      1       0                 2       0                  0            13
8th Grade or Less             127     10      1                 3       2                  0            143
9-12 Grade                    278     56      2                 8       2                  4            350
HS Degree                     990     147     9                 42      5                  16           1209
Some College                  894     142     16                33      2                  21           1108
Associate's Degree            415     79      5                 23      2                  5            529
Bachelor's Degree             1010    85      4                 85      7                  24           1215
More than Bachelor's Degree   617     50      2                 87      4                  10           770
Column Total                  4341    570     39                283     24                 80           5337
1. Given that someone is White, what is the chance their level of education is HS or less?
a. \( P(\text{HS or Less} \mid \text{White}) = \frac{990 + 278 + 127 + 10}{4341} = 0.324 \)
2. Given that someone has a level of education of HS or less, what is the chance they are White?
a. \( P(\text{White} \mid \text{HS or Less}) = \frac{990 + 278 + 127 + 10}{1209 + 350 + 143 + 13} = 0.819 \)
3. Given that someone is Asian, what is the chance they have more than a Bachelor’s Degree?
a. \( P(\text{More than Bachelor's Degree} \mid \text{Asian}) = \frac{87}{283} = 0.307 \)
4. Given that someone has an Associate’s degree, what is the chance they are not White?
a. \( P(\text{Not White} \mid \text{Associate's Degree}) = \frac{79 + 5 + 23 + 2 + 5}{529} = 0.216 \)
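Each conditional probability above is the count in the joint event divided by the count in the conditioning event. The following sketch recomputes all four answers from the frequency table; the helper name `count` and the variable names are made up for illustration.

```python
RACES = ["White", "Black", "Native American", "Asian",
         "Pacific Islander", "Other Race"]

# Frequencies by education level, in the column order given in RACES
COUNTS = {
    "No Education":                [10, 1, 0, 2, 0, 0],
    "8th Grade or Less":           [127, 10, 1, 3, 2, 0],
    "9-12 Grade":                  [278, 56, 2, 8, 2, 4],
    "HS Degree":                   [990, 147, 9, 42, 5, 16],
    "Some College":                [894, 142, 16, 33, 2, 21],
    "Associate's Degree":          [415, 79, 5, 23, 2, 5],
    "Bachelor's Degree":           [1010, 85, 4, 85, 7, 24],
    "More than Bachelor's Degree": [617, 50, 2, 87, 4, 10],
}

def count(educations=None, races=None):
    """Number of people in the given education levels AND the given races.

    Passing None for an argument means "all categories" of that variable.
    """
    educations = list(COUNTS) if educations is None else educations
    races = RACES if races is None else races
    return sum(COUNTS[e][RACES.index(r)] for e in educations for r in races)

hs_or_less = ["No Education", "8th Grade or Less", "9-12 Grade", "HS Degree"]
not_white = [r for r in RACES if r != "White"]

# P(A | B) = count(A and B) / count(B)
p1 = count(hs_or_less, ["White"]) / count(races=["White"])
p2 = count(hs_or_less, ["White"]) / count(educations=hs_or_less)
p3 = count(["More than Bachelor's Degree"], ["Asian"]) / count(races=["Asian"])
p4 = count(["Associate's Degree"], not_white) / count(educations=["Associate's Degree"])

print(round(p1, 3), round(p2, 3), round(p3, 3), round(p4, 3))
```

Questions 1 and 2 share the same numerator but use different denominators, which is why P(HS or Less | White) and P(White | HS or Less) differ.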
12/5/24
You are studying consumer preferences for healthy and unhealthy foods and want to know if after the Thanksgiving
and Christmas holidays, consumer spending on unhealthy foods returns to the expenditure amounts before the
holidays. The output below represents a test to determine whether average weekly expenditures on unhealthy foods
after the holiday exceed average weekly expenditures on unhealthy foods before the holiday. Let alpha = 0.05.
                               Unhealthy Expenditures   Unhealthy Expenditures
                               after Holiday            before Holiday
Mean                           52.63941827              46.05178626
Variance                       1395.773501              810.2883979
Observations                   208                      208
Hypothesized Mean Difference   0
df                             387
t Stat                         2.022796305
P(T<=t) one-tail               0.021890668
t Critical one-tail            1.648800515
P(T<=t) two-tail               0.043781335
t Critical two-tail            1.966112774
a. What are the null and alternative hypotheses?
a. H0: Average weekly expenditures on unhealthy food after the holiday <= average weekly
expenditures on unhealthy food before the holiday
b. Ha: Average weekly expenditures on unhealthy food after the holiday > average weekly
expenditures on unhealthy food before the holiday
b. What is the type of test?
a. This is an upper-tail test
c. What is (are) the critical value(s)?
a. UCV = 1.65
d. What is the decision rule?
a. Reject the null if the test statistic > the upper critical value. Otherwise, fail to reject the
null.
e. What is the test statistic?
a. 2.023
f. What is the p-value?
a. 0.022
g. What is the outcome of the test? Interpret the outcome in the context of the null and alternative hypotheses.
a. Reject the null since 2.023 > 1.65. Also, the p-value, 0.022, is less than alpha, 0.05.
b. This means that we have evidence that average weekly expenditures on unhealthy foods after the
holidays exceed pre-holiday expenditures. In other words, people do not revert to their
pre-holiday levels of spending on unhealthy foods.
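The t Stat and df in the output can be recomputed from the means, variances, and sample sizes. The sketch below assumes the output comes from a two-sample t-test with unequal variances (Welch's test). The review does not name the test, but the reported df of 387 is consistent with that formula. The function name is made up for illustration.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic and degrees of freedom, unequal variances assumed."""
    a = var1 / n1   # variance of the first sample mean
    b = var2 / n2   # variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# After-holiday sample first, before-holiday sample second
t_stat, df = welch_t(52.63941827, 1395.773501, 208,
                     46.05178626, 810.2883979, 208)

upper_cv = 1.648800515  # "t Critical one-tail" from the output
print(round(t_stat, 3), round(df), t_stat > upper_cv)
```

Because the test statistic exceeds the upper critical value, the upper-tail decision rule says to reject the null.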