SOLVED - Practice Test
Question 1:
A researcher conducts a survey to measure people's opinions on a political issue. The options for
responses are "Agree," "Disagree," and "Neutral." What is the measurement level of this variable?
a. Interval
b. Nominal
c. Ordinal
d. Ratio
Question 2:
x/y     0     1     Total
0       120   80    200
1       80    120   200
Total   200   200   400
Question 3:
I: Correlation measures the strength of the linear relationship between two variables.
II: Correlation can only range from -1 to +1.
Question 4:
Now, they are interested in estimating the range within which the population mean score is likely to
fall. Using a 95% confidence level, what is the confidence interval for the population mean score?
Question 6:
After the introduction of a new mobile app, a company receives feedback from its users. The product
development team wants to determine the proportion of users who preferred the previous version
of the app compared to the new one (assuming all users have a preference and none are indifferent).
The team asks you to design a study using a random sample and determine the required sample size.
They are willing to accept a margin of error of 3 percentage points.
How large should the sample be? (Rounding errors will be accepted).
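A minimal sketch of the sample size calculation for a proportion. The question does not state a confidence level or a planning value for the proportion, so this assumes 95% confidence and the conservative value p = 0.5 (both are assumptions).

```r
# Sample size for estimating a proportion with a given margin of error.
# Assumptions (not stated in the question): 95% confidence, p = 0.5.
z <- qnorm(0.975)          # critical value, about 1.96
p <- 0.5                   # worst case: maximises p * (1 - p)
margin <- 0.03             # 3 percentage points, from the question

n_exact <- z^2 * p * (1 - p) / margin^2
n_required <- ceiling(n_exact)   # always round up to a whole respondent
print(n_required)
```

With these assumptions the result is a little under 1,070 respondents; using z = 1.96 instead of the exact critical value changes it only through rounding.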
Which measure is most appropriate for assessing the relationship between study hours and exam
scores?
Question 8:
A researcher conducted a multiple regression analysis to examine the relationship between customer
satisfaction (Var satisfaction) and three predictor variables: service quality (Var quality), price (Var
price), and advertising expenditure (Var advertising). The researcher obtained the following
information:
- 2 possible formulas:
o 1st: Var(predicted) / Var(dependent variable) = 5.4/9.6 = 0.5625, so 56.25%
o 2nd: 1 – [Var(residual) / Var(dependent variable)] = 1 – [4.2/9.6] = 0.5625
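The two formulas can be checked quickly in R. This sketch only uses the three variances quoted in the solution (5.4, 4.2 and 9.6); the variable names are illustrative.

```r
# R-squared computed two ways from the variances given in the solution.
var_predicted <- 5.4   # variance of the predicted values
var_residual  <- 4.2   # variance of the residuals
var_dependent <- 9.6   # variance of the dependent variable (satisfaction)

r2_explained <- var_predicted / var_dependent        # explained share
r2_residual  <- 1 - var_residual / var_dependent     # 1 minus unexplained share

print(c(r2_explained, r2_residual))
```

Both give the same value because the predicted and residual variances add up to the variance of the dependent variable (5.4 + 4.2 = 9.6).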
Question 9:
Given:
In a large country some people want to support farmers in the transition towards a more sustainable
operation of their farm, while others think farmers should not get that support. Imagine selecting a
large number of samples (all sized 100) from a population. For each of the samples the percentage of
people who want to support farmers is calculated. You put all percentages in a histogram. The middle
of the histogram is at 0.5 (50%).
Two statements:
Call:
lm(formula = exam_grade ~ study_time, data = data)
Residuals:
Min 1Q Median 3Q Max
-1.4200 -0.6884 -0.1655 0.5024 3.2341
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.07059 0.18992 5.637 1.67e-07 ***
study_time 0.58888 0.03311 17.784 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
- What is the expected grade of a student who studied 5 hours? Expected Y = 1.07059 + 0.58888 × 5 = 4.0150
- Suppose one person who studied 10 hours for the exam had a residual of 2.5. What is the
observed (real) grade of this student? Observed Y = predicted + residual = (1.07059 + 0.58888 × 10) + 2.5 = 9.4594
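The two answers follow directly from the coefficient table. A small sketch using the estimates printed in the output (the variable names are illustrative):

```r
# Prediction from the fitted line: exam_grade = intercept + slope * study_time
intercept <- 1.07059
slope     <- 0.58888

expected_5h  <- intercept + slope * 5     # expected grade after 5 hours
expected_10h <- intercept + slope * 10    # expected grade after 10 hours
observed_10h <- expected_10h + 2.5        # observed = predicted + residual

print(round(c(expected_5h, observed_10h), 4))
```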
Question 11:
A team of researchers conducted a study to assess the fitness levels of individuals in a particular
population. They found that the mean fitness score in the population is 120, with a standard
deviation of 15. Now, they are interested in estimating the percentage of people in the population
who are likely to have a lower fitness level than John. John's fitness score is 150.
Using this information, what is the estimated percentage of people in the population who have a
lower fitness level than John?
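A sketch of the calculation, assuming the fitness scores are approximately normally distributed (the question does not say so explicitly, so this is an assumption):

```r
# Share of the population scoring below John, under a normal model.
pop_mean <- 120
pop_sd   <- 15
john     <- 150

z <- (john - pop_mean) / pop_sd   # John's z-score
share_below <- pnorm(z)           # area to the left of z
print(c(z = z, share_below = share_below))
```

John is 2 standard deviations above the mean, so under this assumption roughly 97.7% of the population scores lower.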
Question 12:
A researcher is studying the proportion of smartphone users who have installed a specific social
media app. In a random sample of 300 smartphone users, it was found that 180 of them had the app
installed. Given this information, what is the 95% confidence interval for the proportion ?
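A sketch of the hand calculation, using the normal-approximation (Wald) interval, which is an assumption about the method expected here. R's prop.test() uses a different formula and gives slightly different limits.

```r
# 95% confidence interval for a proportion (normal approximation).
x <- 180    # users with the app installed
n <- 300    # sample size

p_hat <- x / n
se    <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion
z     <- qnorm(0.975)                    # about 1.96
ci    <- p_hat + c(-1, 1) * z * se
print(round(ci, 4))
```

The interval runs from about 0.545 to about 0.655 around the sample proportion of 0.60.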
- Reference category
- Group 1
Question 14:
A researcher investigates the association between political affiliation (measured with three
categories: Democrat, Republican, Independent) and voting preference for a specific policy (four
options are considered, focusing on the "most preferred policy" only). The researcher utilizes the chi-
square statistic to analyze the significance of this relationship.
How many degrees of freedom are associated with the chi-square statistic in this test?
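The degrees of freedom depend only on the size of the table: (rows - 1) × (columns - 1). A short sketch; the counts in the table below are made up purely to let chisq.test() confirm the formula.

```r
# Degrees of freedom for a chi-square test of independence.
n_rows <- 3   # Democrat, Republican, Independent
n_cols <- 4   # four policy options
df <- (n_rows - 1) * (n_cols - 1)
print(df)

# Cross-check with a hypothetical 3 x 4 table of counts (invented numbers).
tab <- matrix(c(20, 30, 25, 25,
                30, 20, 25, 25,
                25, 25, 30, 20), nrow = 3, byrow = TRUE)
df_from_test <- unname(chisq.test(tab)$parameter)
print(df_from_test)
```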
Question 15:
a. Interval
b. Nominal
c. Ordinal
d. Ratio
Question 16:
x/y     0     1     Total
0       40    10    50
1       30    20    50
Total   70    30    100
Question 17:
Suppose you calculated the chi-square statistic by hand for a 3x3 table. The outcome of your
calculation is 7.14.
Coefficients:
             Estimate  Std. Error  t value    Pr(>|t|)
(Intercept)  -0.27     0.259       -1.068     0.287
x1           2.032     0.027       73.447     <2e-16 ***
x2           -2.965    0.026       -110.767   <2e-16 ***
x3           0.0402    0.026       ……….       0.125
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
- Reference category
- Group 1
- Group 2
Question 20:
A sociologist wants to examine the association between marital status (e.g., single, married,
divorced) and job satisfaction among employees in a company. The researcher aims to determine the
strength and nature of this relationship.
Which measure is the most suitable choice for analyzing the association between marital status and
job satisfaction?
a. Spearman's correlation
b. Pearson's correlation
c. Cramer's V
d. Kendall's tau-b
Open questions
Question 1: A researcher is interested in exploring the relationship between marital status and the
level of happiness in individuals. She believes that marital status could play a significant role in
shaping people's happiness levels. To investigate this relationship, the researcher gathers a dataset
consisting of information on individuals' marital status (married or single) and their corresponding
happiness scores, which are measured on a scale from 1 to 10. The researcher hypothesizes that
individuals who are married may experience higher levels of happiness compared to those who are
single.
a- Describe shortly which test is used to assess the impact of marriage. Two-sample t-test
b- Explain in a few (two or three) lines why you selected this test:
- We are measuring happiness (scale) and we have 2 groups of marital status (married and
single), therefore we should use a t-test for 2 samples. Which version depends on the
assumption of equal variance:
- Equal variance assumption: (using R)
o var(dataset$happiness[dataset$marital_status == "married"]) = 0.53
o var(dataset$happiness[dataset$marital_status == "single"]) = 0.32
o 0.53/0.32 = 1.66 < 2 so we can assume equal variance
- Since we assume equal variance, we use the two-sample t-test with pooled variance (not the Welch test).
d- Perform a test. Do you conclude the fact that people get married has an effect on the level of
happiness? Explain shortly. A simple yes/no answer will not give you points (but omitting a
simple yes/no answer to the question will also make you lose points). Also shortly discuss the
confidence interval.
- Yes. Using P-Value <0.05 we can reject the null hypothesis and conclude that mean happiness
differs by marital status. Married people scored on average 7 while single people scored 5.1 on
happiness.
- Using confidence interval: since 0 is not included in the 95% confidence interval, a difference
of 0 is not plausible. We are 95% confident that the mean difference in happiness between the
two groups (married minus single) lies between 1.71 and 2.05.
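The steps of this answer could be run roughly as follows. The column names dataset$happiness and dataset$marital_status come from the solution above; the data itself is simulated here because the real dataset is not included, so the numbers printed will not match the ones quoted.

```r
# Two-sample t-test with a variance-ratio check (simulated stand-in data).
set.seed(1)
dataset <- data.frame(
  marital_status = rep(c("married", "single"), each = 100),
  happiness = c(rnorm(100, mean = 7, sd = 0.7),     # hypothetical married scores
                rnorm(100, mean = 5.1, sd = 0.6))   # hypothetical single scores
)

# Rule of thumb used in the solution: larger variance / smaller variance < 2.
v_married <- var(dataset$happiness[dataset$marital_status == "married"])
v_single  <- var(dataset$happiness[dataset$marital_status == "single"])
ratio     <- max(v_married, v_single) / min(v_married, v_single)
equal_var <- ratio < 2

# Pooled t-test if the rule of thumb holds, Welch test otherwise.
result <- t.test(happiness ~ marital_status, data = dataset,
                 var.equal = equal_var)
print(result)   # p-value, group means, 95% CI for married minus single
```

The groups are ordered alphabetically, so the reported interval is for married minus single.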
Question 2: A researcher is interested in exploring the participation in yoga classes among individuals
in a certain population. Yoga is known to offer various benefits for physical and mental well-being.
The researcher conducts a survey to collect data on individuals' involvement in yoga classes, with
response options indicating whether they have applied to yoga classes (1) or not (0).
a- Describe shortly which statistical analysis would be appropriate to assess the participation in
yoga classes. Proportion test
b- Explain in a few lines why you selected this analysis.
- We are measuring whether people applied to yoga classes or not, which is a dichotomous
variable with possible answers being yes or no, so we analyze a single proportion.
d- Upload a screenshot of the commands or steps you would use to perform the statistical
analysis (even if you were unable to execute the analysis).
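A sketch of the commands for a one-sample proportion test. The column dataset$yoga follows the 0/1 coding described in the question; the data is simulated, and the null value p = 0.5 is an assumption because the question does not state a hypothesised proportion.

```r
# One-sample proportion test on a 0/1 variable (simulated stand-in data).
set.seed(2)
dataset <- data.frame(yoga = rbinom(200, size = 1, prob = 0.4))

n_yes   <- sum(dataset$yoga == 1)   # number who applied to yoga classes
n_total <- nrow(dataset)

# p = 0.5 is an assumed null value, not given in the question.
result <- prop.test(n_yes, n_total, p = 0.5)
print(result)   # sample proportion, p-value and 95% confidence interval
```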
Question 3: After the summer season, a group of people decided to focus on their weight (measured in kg) and
get healthier. They started a weight management program in September, which included changes in
their diet and exercise habits and ran for four months, until December.
To assess the effectiveness of the program, the participants' weights were measured first in
September and then in December. Let's analyze the data and find out if the weight management
program led to noticeable weight loss.
a- Describe shortly which statistical analysis would be appropriate to assess the effectiveness of
the program. Paired t-test (equivalently, a one-sample t-test on each participant's weight change)
b- Explain in a few lines why you selected this analysis
- We are measuring weight (scale) twice on the same people, so there is only one group and the
two measurements are paired. We therefore test whether the mean of the individual weight
changes (December minus September) differs from 0, which is a one-sample t-test on the differences.
d- Upload a screenshot of the commands or steps you would use to perform the statistical
analysis (even if you were unable to execute the analysis).
e- Based on your analysis, provide a conclusion regarding whether there was a significant
change in weight over the specified time period.
f- Use a 95% confidence interval to answer the question and explain what this interval
means.
- Using P-Value <0.05 we can reject the null hypothesis and conclude that weight changed
significantly during the weight management program. On average people lost 0.984
kilograms.
- Using confidence interval: since 0 is not included in the 95% confidence interval, a mean
change of 0 is not plausible. We are 95% confident that the mean weight change lies between
-1.058 and -0.91 kg, i.e. an average loss of between 0.91 and 1.058 kg.
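A sketch showing that a paired t-test on the two measurements and a one-sample t-test on the individual changes give the same result. The data is simulated; dataset$weight_september appears elsewhere in these solutions, while weight_december is an assumed column name.

```r
# Paired t-test vs one-sample t-test on the differences (simulated data).
set.seed(3)
n <- 150
weight_september <- rnorm(n, mean = 76, sd = 15)
weight_december  <- weight_september - rnorm(n, mean = 1, sd = 0.5)
dataset <- data.frame(weight_september, weight_december)

# Option 1: paired test on the two columns.
paired <- t.test(dataset$weight_december, dataset$weight_september,
                 paired = TRUE)

# Option 2: one-sample test on the change (December minus September).
change     <- dataset$weight_december - dataset$weight_september
one_sample <- t.test(change, mu = 0)

print(paired$conf.int)       # 95% CI for the mean change
print(one_sample$conf.int)   # identical interval
```

A negative interval here means weight went down, because the change is defined as December minus September.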
Question 4:
Remember the weight management program from the previous question? Before starting the
program, some participants were already practicing yoga while others were not. Now, a question
arises: Some researchers claim that the weight loss was actually due to the fact that some individuals
were already engaged in yoga before the program experience. To find out, in the dataset there are
multiple weights measured at different times. By examining this dataset, we may gain insights into
the potential impact of pre-existing yoga practice.
- We are measuring weight in September (scale) and we have 2 groups (yoga and no yoga),
therefore we should use a t-test for 2 samples. Which version depends on the assumption of
equal variance:
- Equal variance assumption: (using R)
o var(dataset$weight_september[dataset$yoga == 0]) = 223
o var(dataset$weight_september[dataset$yoga == 1]) = 218
o 223/218 = 1.02 < 2 so we can assume equal variance
- Since we assume equal variance, we use the two-sample t-test with pooled variance (not the Welch test).
d- Based on your analysis, provide a conclusion regarding the potential impact of pre-existing
yoga practice on weight loss.
e- Compute a 95% confidence interval and explain what this interval means in this context.
- Using P-Value = 0.8708 > 0.05 we can't reject the null hypothesis: there is no evidence that
the mean weights in September differed between the groups. People not doing yoga weighed
on average 75.65 kg while people attending yoga weighed 75.97 kg.
- Using confidence interval: since 0 is included in the 95% confidence interval, a difference of
0 is plausible. We are 95% confident that the mean difference in weight between the two
groups (no yoga minus yoga) lies between -4.12 and 3.49 kg.
Question 5: A company offers five different training programs to its employees: Program A to
Program E. The company is interested in assessing whether the distribution of employees across
these programs matches the expected distribution based on their preferences. A random sample of
employees is taken, and their program preferences are recorded. The company believes that the
distribution of program preferences in the sample may deviate from the expected distribution.
a- Describe shortly which test is used to answer this question (name of the test)
a. Chi-square goodness of fit test
b- Explain in a few (two or three) lines why you selected this test:
a. I have proportions about a nominal variable (we have 5 programs) and I want to
compare if the proportions of my sample (observed) match the expected proportion
in the population (expected).
c- Upload a screenshot displaying the relevant (numerical, not graphical) output of the test.
d- Upload a screenshot of the commands used to perform the test (even if you were unable to
execute the test).
e- Based on the output, do you conclude that the distribution of employees across training
programs matches the expected distribution?
H0: The sample proportions match the expected population proportions; any differences are small
enough to be explained by random sampling.
H1: The sample proportions differ from the expected population proportions by more than random
sampling would explain.
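A sketch of a goodness of fit test in R. The observed counts and the expected proportions below are invented for illustration, since neither is given in the question; equal expected proportions are an assumption.

```r
# Chi-square goodness of fit test (hypothetical counts and proportions).
observed   <- c(A = 30, B = 25, C = 20, D = 15, E = 10)   # invented counts
expected_p <- rep(1 / 5, 5)                               # assumed: equal shares

result <- chisq.test(observed, p = expected_p)
print(result)            # chi-square statistic, df = categories - 1, p-value
print(result$expected)   # expected counts under H0 (20 per program here)
```

With these invented numbers the p-value is below 0.05, so H0 would be rejected; with the real data the conclusion depends on the output.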
Question 6: A researcher is interested in investigating the potential benefits of yoga practice on
individuals' happiness levels. It is believed that yoga can promote relaxation, reduce stress, and
enhance overall well-being. The researcher gathers a dataset consisting of information on individuals'
participation in yoga classes (yes or no) and their corresponding happiness scores, measured on a
scale from 1 to 10. The researcher hypothesizes that individuals who participate in yoga classes may
experience higher levels of happiness compared to those who do not engage in yoga practice.
a- Describe shortly which test is used to assess the impact of participating in yoga classes on
happiness levels. Welch test
b- Explain in a few lines why you selected this test for evaluating the relationship between yoga
participation and happiness.
- We are measuring happiness (scale) and we have 2 groups of yoga (0 and 1), therefore we
should use a t-test for 2 samples. Which one depends on the assumption of equal variance:
- Equal variance assumption: (using R)
o var(dataset$happiness[dataset$yoga == "1"]) = 2.01
o var(dataset$happiness[dataset$yoga == "0"]) = 0.54
o 2.01/0.54 = 3.72 > 2 so we CAN’T assume equal variance
- Since we can’t assume equal variance we must use Welch test
d- Perform the test and draw conclusions regarding the effect of participating in yoga classes on
happiness levels. Avoid giving a simple yes/no answer and instead provide a brief
explanation based on the statistical analysis. Additionally, discuss the confidence interval and
its role in interpreting the results.
- Using P-Value <0.05 we can reject the null hypothesis and conclude that mean happiness differs
between people who do yoga and people who do not. People doing yoga scored on average 6.28
while people not doing yoga scored 5.92 on happiness.
- Using confidence interval: since 0 is not included in the 95% confidence interval, a difference
of 0 is not plausible. We are 95% confident that the mean difference in happiness between the
two groups (yoga minus no yoga) lies between 0.07 and 0.65.
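A sketch of the Welch test in R. The columns dataset$happiness and dataset$yoga come from the solution above; the data is simulated with deliberately unequal spreads, so the printed numbers will differ from the ones quoted.

```r
# Welch two-sample t-test (unequal variances), simulated stand-in data.
set.seed(4)
dataset <- data.frame(
  yoga = rep(c(0, 1), each = 120),
  happiness = c(rnorm(120, mean = 5.9, sd = 0.7),   # hypothetical non-yoga scores
                rnorm(120, mean = 6.3, sd = 1.4))   # hypothetical yoga scores
)

# var.equal = FALSE requests the Welch version (it is also R's default).
result <- t.test(happiness ~ yoga, data = dataset, var.equal = FALSE)
print(result)
```

R orders the groups as 0 then 1, so the reported interval is for no yoga minus yoga; flip the signs to read it as yoga minus no yoga.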
Question 7: The management of a wellness center is interested in understanding whether there are
significant differences in the happiness levels of its clients based on their diets. The center offers
various wellness programs and wants to explore if there is a connection between dietary choices and
happiness. The center believes that people with different dietary preferences may experience varying
levels of happiness and wants to validate this claim:
a- Which statistical test would you use to explore whether there are significant differences in
happiness levels among individuals with different diets?
ANOVA (F-test) from lm
b- Explain in a few lines why you selected this test.
Because happiness is a scale variable and diet is a nominal variable with three groups, and we want
to test the differences between all three group means at once.
c- Show (copy paste) the commands or steps you would use to perform the statistical analysis
(even if you were unable to execute the analysis). Also upload the output.
Residuals:
Min 1Q Median 3Q Max
-2.24648 -0.91446 -0.03047 0.73693 2.38554
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.01446 0.12596 47.750 <2e-16 ***
vegan 0.06481 0.17867 0.363 0.717
no_diet 0.23202 0.18550 1.251 0.212
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
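A sketch of how output like this could be produced. The data is simulated and the column name diet is an assumption; only the dummy names vegan and no_diet (with vegetarian as the reference) come from the output above.

```r
# One-way ANOVA through lm() with a three-level factor (simulated data).
set.seed(5)
dataset <- data.frame(
  diet = factor(rep(c("vegetarian", "vegan", "no_diet"), each = 60),
                levels = c("vegetarian", "vegan", "no_diet")),  # first = reference
  happiness = rnorm(180, mean = 6, sd = 1)
)

model <- lm(happiness ~ diet, data = dataset)
print(summary(model))   # dummy coefficients plus the overall F-statistic
print(anova(model))     # the same F-test shown as an ANOVA table
```

The overall F-test at the bottom of summary() answers the question about all three groups together; the individual coefficient rows only compare each group with the reference.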
A popular health magazine published a claim that individuals following a Vegan diet tend to exhibit
higher motivation levels compared to individuals with Vegetarian diets.
a- Which statistical test would you use to answer this question and why? lm with dummy variables,
because I want to compare specific groups with each other, not test all groups as a whole.
b- Show (copy paste) the commands or steps you would use to perform the statistical analysis
(even if you were unable to execute the analysis). Also upload the output.
Residuals:
Min 1Q Median 3Q Max
-2.24648 -0.91446 -0.03047 0.73693 2.38554
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.07927 0.12672 47.973 <2e-16 ***
vegetarian -0.06481 0.17867 -0.363 0.717
no_diet 0.16721 0.18602 0.899 0.370
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residuals:
Min 1Q Median 3Q Max
-2.24648 -0.91446 -0.03047 0.73693 2.38554
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.01446 0.12596 47.750 <2e-16 ***
vegan 0.06481 0.17867 0.363 0.717
no_diet 0.23202 0.18550 1.251 0.212
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
P-value = 0.717 > 0.05 so we can't reject H0. This means that there is no significant difference in
happiness levels between vegans and vegetarians (the reference category).
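The two outputs differ only in which diet is the reference category. A sketch of switching the reference with relevel(), on simulated data with an assumed column name diet:

```r
# Changing the reference category of a factor in lm() (simulated data).
set.seed(6)
dataset <- data.frame(
  diet = factor(rep(c("vegetarian", "vegan", "no_diet"), each = 60)),
  happiness = rnorm(180, mean = 6, sd = 1)
)

# Model 1: vegetarian as reference, so the "vegan" row is vegan minus vegetarian.
dataset$diet <- relevel(dataset$diet, ref = "vegetarian")
m_vegetarian <- lm(happiness ~ diet, data = dataset)

# Model 2: vegan as reference, so the "vegetarian" row is vegetarian minus vegan.
dataset$diet_vegan_ref <- relevel(dataset$diet, ref = "vegan")
m_vegan <- lm(happiness ~ diet_vegan_ref, data = dataset)

b_vegan      <- unname(coef(m_vegetarian)["dietvegan"])
b_vegetarian <- unname(coef(m_vegan)["diet_vegan_refvegetarian"])
print(c(b_vegan, b_vegetarian))   # same size, opposite sign
```

This mirrors the outputs above, where the vegan and vegetarian rows show 0.06481 and -0.06481 with the same p-value of 0.717.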