W3 (Extra) - Data 123 Practice Open Questions With Means

Uploaded by

z13612909240


Question 1:

"The management of a company is interested in evaluating the impact of a recent training program
on the performance of its employees. To assess this, they collected data on the pre-training and post-
training scores of a random sample of employees. The company believes that, on average, the
training program should result in a positive improvement in the scores.

- A: Which statistical test would be appropriate to investigate the company's belief about the
training program? A one-sample t-test on the post-minus-pre differences (equivalent to a paired t-test)
- B: Explain in a few lines why you chose this test.
o Because the pre- and post-training scores come from the same employees, the data reduce to
1 sample (the one of the differences)
- C: Show (copy paste) the commands or steps you would use to perform the statistical
analysis (even if you were unable to execute the analysis). Also upload the output
# One-sample t-test
# Step 1: Compute the differences between post and pre
data123$differences <- data123$Post_training - data123$Pre_training

# Step 2: Compute the test
t.test(data123$differences)

Output:
One Sample t-test

data: data123$differences
t = 6.0243, df = 49, p-value = 2.147e-07
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
4.112664 8.229868
sample estimates:
mean of x
6.171266
- D: Based on your analysis, provide a conclusion regarding whether there was a significant
change in performance."

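P-value < 0.05, so we reject H0 (mean difference = 0). The mean difference is positive (6.17), so there
is a significant improvement in performance after the training program. Note that the test shown is
two-sided; because the company expects an improvement, a one-sided test would also be reasonable, and
since the observed mean difference is positive its p-value would be half of the two-sided one.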

- E: Use a 95% confidence interval to

o 1st: answer the question: Since 0 is outside the confidence interval (4.11 to 8.23), we reject H0
and conclude that there is a significant improvement in performance after the training
program.
o 2nd: explain what this interval means: We are 95% confident that the true mean
difference (post minus pre) lies between 4.1 and 8.2 points.
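
The one-sample test on the differences can be cross-checked against R's paired t-test, and the company's directional belief can be tested with a one-sided alternative. The sketch below is not part of the original answer: it assumes the data simulation from the code section at the end of this document (only the two score columns are regenerated), and the object names `two_sided`, `paired`, and `one_sided` are illustrative.

```r
# Recreate the pre/post scores as in the document's data simulation
set.seed(123)
n_employees <- 50
Pre_training <- rnorm(n_employees, mean = 70, sd = 10)
Post_training <- Pre_training + rnorm(n_employees, mean = 5, sd = 8)
data123 <- data.frame(Pre_training, Post_training)

# One-sample t-test on the differences (two-sided, as in the answer above)
data123$differences <- data123$Post_training - data123$Pre_training
two_sided <- t.test(data123$differences)

# Equivalent paired t-test on the two columns
paired <- t.test(data123$Post_training, data123$Pre_training, paired = TRUE)

# One-sided version: H1 is that the mean difference is greater than 0
one_sided <- t.test(data123$differences, alternative = "greater")

print(two_sided$statistic)
print(paired$statistic)
print(one_sided$p.value)
```

The two t statistics should agree, and when the mean difference is positive the one-sided p-value is half of the two-sided one.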
Question 2:

Imagine you are tasked with analyzing the impact of employee experience on performance
before the training. In your dataset (data123) the variable "Experience" represents the years of
experience that each employee has, and it is used to split the employees into
less experienced (Group A) and more experienced (Group B).

Your objective is to determine whether there is a significant difference in pre-training performance
between employees with less than 5 years of experience (Group A) and those with 5 or more years
of experience (Group B).

- A: Which statistical test would be appropriate? We choose a two sample t-test

- B: Explain in a few lines why you chose this test. Because we have 2 independent samples
(Group A and Group B) whose variances are similar
- C: Show (copy paste) the commands or steps you would use to perform the statistical
analysis (even if you were unable to execute the analysis). Also upload the output

var(data123$Pre_training[data123$Group == "Group A"])

#result for group A = 77.51

var(data123$Pre_training[data123$Group == "Group B"])

#result for group B = 90.27

90.27/77.51

# The ratio between them is 1.16 < 2, so we can assume equal variances

t.test(Pre_training ~ Group, data = data123, var.equal = TRUE)

Two Sample t-test

data: Pre_training by Group

t = 0.61765, df = 48, p-value = 0.5397
alternative hypothesis: true difference in means between group Group A and
group Group B is not equal to 0
95 percent confidence interval:
-4.088114 7.713497
sample estimates:
mean in group Group A mean in group Group B
71.64917 69.83648

- D: Based on your analysis, provide a conclusion regarding whether there is a significant
difference in effectiveness.

P-value > 0.05, so we cannot reject H0. We don't have enough evidence to say that there is a
significant difference between the 2 groups in performance before training.

- E: Use a 95% confidence interval to answer the question and explain what this interval
means.

0 lies inside the 95% confidence interval (-4.09 to 7.71), so we cannot reject H0: the data are
compatible with no difference in average pre-training performance between the two groups. The
interval tells us we are 95% confident that the true difference in means (Group A minus Group B)
lies between -4.1 and 7.7.
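
The variance check can also be computed directly from the data instead of typing the two variances by hand. The following is a sketch under assumptions, not the original answer: it reuses the document's data simulation (results may differ slightly across R versions because of how `sample()` draws), the names `group_vars`, `var_ratio`, `pooled`, and `welch` are illustrative, and the Welch test is shown only as the usual fallback when the rule of thumb is not met.

```r
# Recreate the document's simulated data (same call order as the code section)
set.seed(123)
n_employees <- 50
Pre_training <- rnorm(n_employees, mean = 70, sd = 10)
Post_training <- Pre_training + rnorm(n_employees, mean = 5, sd = 8)
Age <- sample(25:45, n_employees, replace = TRUE)
Experience <- sample(1:10, n_employees, replace = TRUE)
data123 <- data.frame(Pre_training, Post_training, Age, Experience)
data123$Group <- ifelse(data123$Experience < 5, "Group A", "Group B")

# Variance of the pre-training score within each group
group_vars <- tapply(data123$Pre_training, data123$Group, var)

# Rule of thumb: largest variance divided by smallest variance
var_ratio <- max(group_vars) / min(group_vars)
print(var_ratio)

# Pooled (equal-variance) test, as used in the answer
pooled <- t.test(Pre_training ~ Group, data = data123, var.equal = TRUE)

# Welch test (R's default), which does not assume equal variances
welch <- t.test(Pre_training ~ Group, data = data123)

print(pooled$p.value)
print(welch$p.value)
```

When the ratio is well below 2, the pooled and Welch results are typically very close, so the choice rarely changes the conclusion.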
Question 3:

"A multinational company is curious about the motivational levels of its employees from different
nationalities. The company collected data on motivation scores, categorizing employees into three
nationalities: Spanish, Dutch, and German. The company hypothesizes that there might be
differences in motivation levels among these nationalities.

- A: Which statistical test would you use to explore whether there are significant differences in
motivation levels among the three nationalities? The F-test from a one-way ANOVA
- B: Explain in a few lines why you selected this test. We chose this because we are interested
in the overall effect of nationality (all three groups at once) on motivation.
- C: Show (copy paste) the commands or steps you would use to perform the statistical
analysis (even if you were unable to execute the analysis). Also upload the output

model_original = lm(Motivation ~ Nationality, data = data123)

summary(model_original)

- D: Based on your analysis, provide a conclusion regarding whether there is a significant
difference in effectiveness."
Call:
lm(formula = Motivation ~ Nationality, data = data123)

Residuals:
Min 1Q Median 3Q Max
-3.2604 -0.7765 -0.0648 1.1413 2.2651

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.5845 0.2844 9.086 6.42e-12 ***
NationalityGerman 1.0661 0.4345 2.454 0.0179 *
NationalitySpanish 3.4010 0.4345 7.827 4.62e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.272 on 47 degrees of freedom

Multiple R-squared: 0.5698, Adjusted R-squared: 0.5515
F-statistic: 31.13 on 2 and 47 DF, p-value: 2.456e-09

For Elke:

- 1st df = number of groups - 1 -> 3-1 = 2

- 2nd df = total people – total groups = 50-3 = 47

P-value < 0.05, so we reject H0 (all group means equal): mean motivation differs between at least two of the nationalities.

Extra question: Imagine I ask you whether the Spanish are more motivated than the Dutch?

We can see that when comparing Spanish to the reference category (Dutch), the p-value (4.62e-10) is
< 0.05, which means that there is a significant difference in motivation between these 2 groups. On
average, Spanish employees score 3.4 points higher in motivation than Dutch employees.
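
The degrees of freedom and the F statistic can be read from an ANOVA table as well as from the last line of `summary()`. The sketch below uses hypothetical data (the data frame `demo`, its group means, and the seed are assumptions for illustration, not the document's `data123`) to show that `anova()` on the fitted `lm` object reports the same F-test.

```r
# Hypothetical data: 50 employees spread over three nationalities
set.seed(1)
n <- 50
demo <- data.frame(
  Nationality = rep(c("Dutch", "German", "Spanish"), length.out = n)
)
# Assumed group means, for illustration only
group_means <- c(Dutch = 3, German = 3.5, Spanish = 6)
demo$Motivation <- rnorm(n, mean = group_means[demo$Nationality], sd = 1.2)

model <- lm(Motivation ~ Nationality, data = demo)

# ANOVA table: one row for Nationality, one for the residuals
anova_table <- anova(model)
print(anova_table)

# Degrees of freedom by hand
k <- length(unique(demo$Nationality))  # number of groups
df_between <- k - 1                    # 3 - 1 = 2
df_within <- n - k                     # 50 - 3 = 47

# The F statistic reported at the bottom of summary(model)
f_from_summary <- summary(model)$fstatistic
print(f_from_summary)
```

The `Df` column of the ANOVA table matches the two hand-computed values, which is the same calculation as in the note above.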
Question 3.5

Some people claim that Spanish people are more motivated than the rest of the nationalities. Can
you answer this question with your current output? If not, do something about it and answer the
following questions:

- C: Which statistical test would you use to answer the question?

- D: Explain in a few lines why you selected this test.
- C+D answer: Since we are interested in comparing Spanish to each of the other nationalities, we
refit the lm model with Spanish as the reference category, so that both coefficients are
comparisons against Spanish
- E: Show (copy paste) the commands or steps you would use to answer the new question 3.5
Also upload the output

data123$German_dummy = ifelse(data123$Nationality == "German", 1, 0)

data123$Dutch_dummy = ifelse(data123$Nationality == "Dutch", 1, 0)
data123$Spanish_dummy = ifelse(data123$Nationality == "Spanish", 1, 0)

model_spanish_ref<- lm(Motivation ~ Dutch_dummy + German_dummy, data = data123)

summary(model_spanish_ref)

- F: Based on your analysis, provide a conclusion regarding the claim.

Call:
lm(formula = Motivation ~ Dutch_dummy + German_dummy, data = data123)

Residuals:
Min 1Q Median 3Q Max
-3.2604 -0.7765 -0.0648 1.1413 2.2651

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.9855 0.3284 18.224 < 2e-16 ***
Dutch_dummy -3.4010 0.4345 -7.827 4.62e-10 ***
German_dummy -2.3349 0.4645 -5.027 7.68e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.272 on 47 degrees of freedom

Multiple R-squared: 0.5698, Adjusted R-squared: 0.5515
F-statistic: 31.13 on 2 and 47 DF, p-value: 2.456e-09

So the claim is supported: Spanish employees are the most motivated. Both p-values are below 0.05,
and on average German employees scored 2.3 lower in motivation than Spanish employees, while Dutch
employees scored 3.4 lower.
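
As an alternative to building the dummies by hand, the reference category can be changed with `relevel()` on a factor. This is a sketch on hypothetical data (the data frame `demo`, the column `Nationality_f`, the group means, and the seed are assumptions, not taken from the original answer); it only illustrates that both approaches give the same coefficients.

```r
# Hypothetical data with the same column names as in the document
set.seed(1)
n <- 50
demo <- data.frame(
  Nationality = rep(c("Dutch", "German", "Spanish"), length.out = n)
)
group_means <- c(Dutch = 3, German = 3.5, Spanish = 6)  # assumed values
demo$Motivation <- rnorm(n, mean = group_means[demo$Nationality], sd = 1.2)

# Approach from the answer: manual dummies, leaving Spanish out
demo$Dutch_dummy <- ifelse(demo$Nationality == "Dutch", 1, 0)
demo$German_dummy <- ifelse(demo$Nationality == "German", 1, 0)
model_dummies <- lm(Motivation ~ Dutch_dummy + German_dummy, data = demo)

# Alternative: make Spanish the first (reference) level of a factor
demo$Nationality_f <- relevel(factor(demo$Nationality), ref = "Spanish")
model_relevel <- lm(Motivation ~ Nationality_f, data = demo)

print(coef(model_dummies))
print(coef(model_relevel))
```

Only the coefficient names differ between the two fits; the intercept is the Spanish mean in both.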
#Codes:

# Data simulation

set.seed(123)

n_employees <- 50

Employee_ID <- 1:n_employees

Pre_training <- rnorm(n_employees, mean = 70, sd = 10)

Post_training <- Pre_training + rnorm(n_employees, mean = 5, sd = 8)

Age <- sample(25:45, n_employees, replace = TRUE)

Experience <- sample(1:10, n_employees, replace = TRUE)

Motivation <- rnorm(n_employees, mean = 3, sd = 1)

data123 <- data.frame(Employee_ID, Pre_training, Post_training, Age, Experience, Motivation)

data123$Group <- ifelse(data123$Experience < 5, "Group A", "Group B")

data123$Nationality <- sample(c("Spanish", "Dutch", "German"), nrow(data123), replace = TRUE)

data123$Motivation <- with(data123,
  ifelse(Nationality == "Spanish", rnorm(n_employees, mean = 6, sd = 1.53),
  ifelse(Nationality == "Dutch", rnorm(n_employees, mean = 3, sd = 1.12),
         rnorm(n_employees, mean = 3.5, sd = 0.97))))

library(tidyverse)

##Question 1: One sample t-test

#Step 1: Create your new sample

difference <- data123$Post_training- data123$Pre_training

#Step 2: Do the t-test

t.test(difference)

#Other method, but same result

data123$difference2.0 <- data123$Post_training - data123$Pre_training

t.test(data123$difference2.0)

#Interpretation:

# P-value is < 0.05, so we reject H0 (mu_diff = 0) and can say that there is a significant improvement.

# Using the confidence interval, we see that 0 is outside it, so we reject H0.

##Question 2: Two sample t-test

# We have 2 samples: We investigate the difference in pretraining scores between group a and b

# The samples are independent

# We need to see whether there is equal variance or not, HOW ?

# we compute the variances and calculate the rule of thumb

var(data123$Pre_training[data123$Group == "Group A"])

var(data123$Pre_training[data123$Group == "Group B"])

# rule of thumb : var(biggest)/var(smallest)

90.27/77.51

# Since the result 1.16 is < 2 we can assume equal variance, so in the following code we write var.equal = TRUE

t.test(Pre_training ~ Group, data=data123, var.equal = TRUE)

##Question 3: More than 2 samples

# 3 samples:

model_original <- lm(Motivation ~ Nationality, data = data123)

summary(model_original)

# If question is about general differences between all groups -> Look at the F-test

# the implied null hypothesis H0 is that all means are equal: µ(german) = µ(dutch) = µ(spanish)

# In our example: F = 31.13 and the corresponding p-value = 2.456e-09 < 0.05, so we reject H0
# Conclusion: the average motivation levels differ between nationalities

# If the question is about two specific nationalities, check which one is the reference category

# (it is the one that is missing in the output)

# Originally (model_original) the missing category is Dutch, so it is the REFERENCE.

# With this output (model_original) you can interpret the differences between German and Dutch or
# Spanish and Dutch

# However, it does not directly test the difference between German and Spanish, so we need to
# change the reference

#Step 1: Make dummy variables

data123$German_dummy = ifelse(data123$Nationality == "German", 1, 0)

data123$Dutch_dummy = ifelse(data123$Nationality == "Dutch", 1, 0)

data123$Spanish_dummy = ifelse(data123$Nationality == "Spanish", 1, 0)

#Step 2: Re-create the model. Excluding one of the dummies automatically makes that category
# the reference

#Here we want Spanish as reference, so we don't include its dummy in the model

model_spanish_ref<- lm(Motivation ~ Dutch_dummy + German_dummy, data = data123)

summary(model_spanish_ref)
