W3 (Extra) - Data 123 Practice Open Questions With Means
Question 1:
"The management of a company is interested in evaluating the impact of a recent training program
on the performance of its employees. To assess this, they collected data on the pre-training and post-
training scores of a random sample of employees. The company believes that, on average, the
training program should result in a positive improvement in the scores.
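- A: Which statistical test would be appropriate to investigate the company's belief about the
training program? One-sample t-test on the differences (equivalent to a paired t-test)
- B: Explain in a few lines why you chose this test.
o Because each employee is measured twice, the two scores are paired. Taking post minus pre leaves
1 sample (the one of the differences), and we test whether its mean is 0.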
- C: Show (copy paste) the commands or steps you would use to perform the statistical
analysis (even if you were unable to execute the analysis). Also upload the output
# One-sample t-test on the differences
# Step 1: Compute the differences between post and pre
data123$differences <- data123$Post_training - data123$Pre_training

# Step 2: Compute the test
t.test(data123$differences)
Output:
One Sample t-test
data: data123$differences
t = 6.0243, df = 49, p-value = 2.147e-07
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
4.112664 8.229868
sample estimates:
mean of x
6.171266
- D: Based on your analysis, provide a conclusion regarding whether there was a significant
change in performance."
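P-value < 0.05, so we reject H0 (mean difference = 0). The mean difference is 6.17 points and the
95% confidence interval (4.11 to 8.23) lies entirely above 0, so there is a significant improvement
in performance after the training program.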
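The one-sample test on the differences is the same computation as a paired t-test, and because the company's belief is directional, a one-sided alternative can also be requested. The sketch below uses simulated data: the data frame `sim` and its values are made up for illustration, only the column names follow the notes. The last two lines reuse the numbers from the reported output.

```r
# Simulated stand-in for data123 (illustrative values, not the course data)
set.seed(1)
sim <- data.frame(Pre_training = rnorm(50, mean = 70, sd = 10))
sim$Post_training <- sim$Pre_training + rnorm(50, mean = 6, sd = 7)

# One-sample t-test on the differences, one-sided: H1 is mean difference > 0
sim$differences <- sim$Post_training - sim$Pre_training
one_sample <- t.test(sim$differences, alternative = "greater")

# Equivalent paired t-test on the two columns
paired <- t.test(sim$Post_training, sim$Pre_training,
                 paired = TRUE, alternative = "greater")

# Both calls give the same t statistic and p-value
all.equal(unname(one_sample$statistic), unname(paired$statistic))
all.equal(one_sample$p.value, paired$p.value)

# One-sided p-value implied by the reported output (t = 6.0243, df = 49)
p_one_sided <- pt(6.0243, df = 49, lower.tail = FALSE)

# 95% CI rebuilt from the reported mean and t value: mean +/- t_crit * SE
se <- 6.171266 / 6.0243
ci <- 6.171266 + c(-1, 1) * qt(0.975, df = 49) * se
```

Because the reported t value is positive, the one-sided p-value is half of the two-sided one, so the conclusion does not change.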
Question 2:
Imagine you are tasked with analyzing the impact of employee experience on performance
before the training. In your dataset (training_data) the variable "Experience" represents the years of
experience of each employee and is used to split the employees into
experienced (Group A) and not experienced (Group B).
90.27/77.51
# The ratio between them is 1.16 < 2, so we can assume equal variances
P-value > 0.05, so we cannot reject H0. We do not have enough evidence to say that there is a
significant difference between the 2 groups in performance before training.
- E: Use a 95% confidence interval to answer the question and explain what this interval
means.
0 lies inside the confidence interval, so a difference of 0 between the group means is plausible and
we cannot reject H0. The confidence interval tells us we are 95% confident that the true difference in
average performance (before training) between the 2 groups is between -4 and 7.
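The steps of this question (variance check, two-sample t-test, confidence interval) can be sketched as follows. This is only an illustration on simulated data: the column names `Group` and `Pre_training`, the group sizes and all values are assumptions, and the sketch assumes that the two numbers in the ratio are the group variances.

```r
# Simulated stand-in for training_data (hypothetical values and group sizes)
set.seed(2)
sim <- data.frame(
  Group = rep(c("A", "B"), each = 25),
  Pre_training = c(rnorm(25, mean = 71, sd = 9.5),
                   rnorm(25, mean = 70, sd = 8.8))
)

# Rule of thumb from the notes: larger variance / smaller variance < 2
vars <- tapply(sim$Pre_training, sim$Group, var)
ratio <- max(vars) / min(vars)
equal_var <- ratio < 2

# var.equal = TRUE gives the pooled (Student) test, FALSE gives Welch's test
res <- t.test(Pre_training ~ Group, data = sim, var.equal = equal_var)
res$conf.int  # 95% CI for mean(A) - mean(B)
res$p.value

# The 95% CI contains 0 exactly when the two-sided p-value is above 0.05
contains_zero <- res$conf.int[1] < 0 && res$conf.int[2] > 0
contains_zero == (res$p.value > 0.05)
```

The last line shows why the p-value and the confidence interval always lead to the same decision at the 5% level.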
Question 3:
"A multinational company is curious about the motivational levels of its employees from different
nationalities. The company collected data on motivation scores, categorizing employees into three
nationalities: Spanish, Dutch, and German. The company hypothesizes that there might be
differences in motivation levels among these nationalities.
- A: Which statistical test would you use to explore whether there are significant differences in
motivation levels among the three nationalities? F-test from a one-way ANOVA
- B: Explain in a few lines why you selected this test. We chose this because we are interested
in the overall effect of nationality (one categorical variable with 3 groups) on average motivation,
so we compare all group means at once.
- C: Show (copy paste) the commands or steps you would use to perform the statistical
analysis (even if you were unable to execute the analysis). Also upload the output
Residuals:
Min 1Q Median 3Q Max
-3.2604 -0.7765 -0.0648 1.1413 2.2651
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.5845 0.2844 9.086 6.42e-12 ***
NationalityGerman 1.0661 0.4345 2.454 0.0179 *
NationalitySpanish 3.4010 0.4345 7.827 4.62e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
For Elke:
P-value < 0.05, so we reject H0 and can say that the average motivational level differs between at
least two of the nationalities.
Extra question: Imagine I ask you whether Spanish are more motivated than Dutch?
We can see that when comparing Spanish to the reference (Dutch), the P-value < 0.05, which means
that there is a significant difference in motivation between these 2 groups. The coefficient is positive:
on average, Spanish score 3.4 higher in motivation than Dutch.
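How the overall F-test and the coefficient table come from the same model can be sketched as follows. The data are simulated (group sizes and values are made up); `model_original` is the name used in the code section of these notes.

```r
# Simulated stand-in for data123 (hypothetical values, 15 employees per group)
set.seed(3)
sim <- data.frame(
  Nationality = factor(rep(c("Dutch", "German", "Spanish"), each = 15)),
  Motivation = c(rnorm(15, 2.6), rnorm(15, 3.6), rnorm(15, 6.0))
)

# Dummy-coded regression; the first factor level (Dutch) is the reference
model_original <- lm(Motivation ~ Nationality, data = sim)

# Overall F-test: H0 is that all three group means are equal
anova(model_original)

# The same F statistic is printed on the last line of summary()
summary(model_original)$fstatistic

# Intercept = Dutch mean; the other coefficients = difference from Dutch
coef(model_original)
```

The F-test answers the general question about all groups, while each coefficient row only compares one nationality with the reference.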
Question 3.5:
Some people claim that Spanish people are more motivated than the rest of the nationalities. Can
you answer this question with your current output? If not, do something about it and answer the
following questions:
Call:
lm(formula = Motivation ~ Dutch_dummy + German_dummy, data = data123)
Residuals:
Min 1Q Median 3Q Max
-3.2604 -0.7765 -0.0648 1.1413 2.2651
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.9855 0.3284 18.224 < 2e-16 ***
Dutch_dummy -3.4010 0.4345 -7.827 4.62e-10 ***
German_dummy -2.3349 0.4645 -5.027 7.68e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So the claim is indeed supported: Spanish are the most motivated. Both P-values are < 0.05 and both
coefficients are negative: on average Germans scored 2.3 lower in motivation than Spanish and Dutch
scored 3.4 lower in motivation than Spanish.
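Changing the reference category can be done either with hand-made dummies, as in the output above, or by re-ordering the factor levels with `relevel()`. A minimal sketch on simulated data follows; the values, the column `Nationality_sp` and the object `model_relevel` are made up for illustration.

```r
# Simulated stand-in for data123 (hypothetical values)
set.seed(4)
sim <- data.frame(
  Nationality = factor(rep(c("Dutch", "German", "Spanish"), each = 15)),
  Motivation = c(rnorm(15, 2.6), rnorm(15, 3.6), rnorm(15, 6.0))
)

# Option 1: hand-made dummies; leaving out a Spanish dummy makes Spanish the reference
sim$Dutch_dummy  <- as.numeric(sim$Nationality == "Dutch")
sim$German_dummy <- as.numeric(sim$Nationality == "German")
model_spanish_ref <- lm(Motivation ~ Dutch_dummy + German_dummy, data = sim)

# Option 2: let R build the dummies after changing the reference level
sim$Nationality_sp <- relevel(sim$Nationality, ref = "Spanish")
model_relevel <- lm(Motivation ~ Nationality_sp, data = sim)

# Both parameterisations give the same estimates
coef(model_spanish_ref)
coef(model_relevel)
```

In both models the intercept is the Spanish mean and each coefficient is the difference between that nationality and Spanish.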
#Codes:
# Data simulation
set.seed(123)
n_employees <- 50
library(tidyverse)
t.test(difference)
t.test(data123$difference2.0)
# Interpretation:
# P-value is < 0.05, so we reject H0 (µ(diff) = 0) and can say that there is a significant improvement.
# Using the confidence interval, we see that 0 is outside it, so we reject H0.
# We have 2 samples: we investigate the difference in pre-training scores between group A and B
90.27/77.51
# Since the result 1.16 is < 2 we can assume equal variances, so in the following code we set
# var.equal = TRUE in t.test()
# 3 samples:
summary(model_original)
# If the question is about general differences between all groups -> look at the F-test
# The implied null hypothesis H0 is that all means are equal: µ(german) = µ(dutch) = µ(spanish)
# In our example: F = 0.328 and the corresponding P-value = 0.7 > 0.05, so we don't reject H0
# Conclusion: I can't say that the average happiness levels are different between nationalities
# If the question is about two specific nationalities, check which one is the reference category
# With this output (model_original) you can interpret the differences between German and Dutch or
# Spanish and Dutch
# However, we can't say much about the difference between German and Spanish, so we need to
# change the reference
# Step 2: Re-create the model, leaving out one of the dummies; the category of the omitted dummy
# automatically becomes the reference
summary(model_spanish_ref)