(EMPTY) - Practice Test 2

Uploaded by

z13612909240

Practice test 2:

Q1:

Levene's Test for Homogeneity of Variance (center = median)


      Df F value  Pr(>F)
group  2  4.0539 0.02038 *
      97
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

A- There is a clear sign of non-normality in the residuals
B- There is a sign of homoskedasticity in the residuals
C- The data is normally distributed
D- There is a clear violation of the equal variance assumption

Q2:
Q3:

In an article called “The influence of cultural context on job satisfaction” from the Journal of
Organizational Behavior, it is argued that individuals from different cultural backgrounds experience
job satisfaction differently. The dependent variable in the study is the level of job satisfaction
reported by individuals. The main independent variable is "cultural alignment" which may vary based
on the extent to which it aligns with the individual's cultural values.

To examine this idea, a study was conducted with the following variables:

- y = level of job satisfaction reported by an individual (job satisfaction)


- x1 = cultural alignment between individual and organizational culture (ranging from 0 to 10)
- x3 = a dummy variable with two groups. 0 (the reference category) represents individuals
from the local culture, while 1 represents individuals from a different cultural background.

Call:
lm(formula = y ~ x1cult_align + x3 + x1cult_align * x3)

Residuals:
     Min       1Q   Median       3Q      Max
-2.77367 -0.74912  0.02706  0.74357  3.04931

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.06475    0.07430   0.871    0.384
x1cult_align  5.02042    0.07555  66.456   <2e-16 ***
x3            3.95939    0.07571  52.294   <2e-16 ***
x1:x3         0.09430    0.08853   1.065    0.288
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.16 on 246 degrees of freedom


Multiple R-squared: 0.968, Adjusted R-squared: 0.9676
F-statistic: 2480 on 3 and 246 DF, p-value: < 2.2e-16

Ignoring significance:

- What is the effect of cultural alignment?


- What is the level of satisfaction among individuals from a local culture with cultural
alignment of 4 out of 10?
- What is the effect of cultural alignment among individuals from a different cultural
background?
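
One way to approach these three questions is to plug the reported estimates into the fitted equation y = b0 + b1*x1 + b3*x3 + b13*x1*x3. The R sketch below does this with the coefficients copied from the output above. The names `b0`, `b1`, `b3`, `b13` and the helper `predict_y` are hypothetical labels introduced for the illustration, not part of the original output.

```r
# Estimates copied from the regression output above
b0  <- 0.06475  # intercept: local culture (x3 = 0) with alignment 0
b1  <- 5.02042  # slope of cultural alignment in the reference group (x3 = 0)
b3  <- 3.95939  # shift of the intercept for x3 = 1
b13 <- 0.09430  # change in the alignment slope for x3 = 1

# Hypothetical helper: fitted value for a given alignment score and group
predict_y <- function(x1, x3) b0 + b1 * x1 + b3 * x3 + b13 * x1 * x3

slope_local     <- b1        # effect of alignment when x3 = 0
slope_different <- b1 + b13  # effect of alignment when x3 = 1
y_local_4       <- predict_y(x1 = 4, x3 = 0)

print(c(slope_local = slope_local,
        slope_different = slope_different,
        y_local_4 = y_local_4))
```

In a model with an interaction term, the coefficient of x1 on its own is the slope for the reference group only; the slope for the other group adds the interaction coefficient.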
Q4: In the graph below, X is a scale variable and the lines represent the group variable: the red line
is the reference category (Group 1, coded 0) and the blue line is Group 2 (coded 1).

- The value of the intercept (the reference category) is:
- The value of the b-coefficient associated with the group variable is:
- The value of the b-coefficient associated with the x variable is:
- The value of the b-coefficient associated with the interaction term is:
- If the x value is 2, the predicted (= expected) y value for people in Group 2 is:
Q5:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.11676    0.08404   1.389    0.167
x1           2.84154    0.43105   6.592 1.25e-09 ***
x2           2.09099    0.42788   4.887 3.23e-06 ***
x3           3.89287    0.07576  51.387  < 2e-16 ***
x4          -0.02173    0.09293  -0.234    0.815
x5           0.07909    0.07737   1.022    0.309
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> vif(lm(y ~ x1 + x2 + x3 + x4 + x5))
       x1        x2        x3        x4        x5
21.334892 21.406599  1.053488  1.055034  1.032924

According to the output above, which statements are correct?

- 1st statement: x1 and x4 are highly correlated.
- 2nd statement: We could remove x1 to improve our model.
- 3rd statement: x1 and x2 are highly correlated.
- 4th statement: We could combine x1 and x2 to improve the model.
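
To see why a large VIF points to strongly related predictors, it can help to compute it by hand. The sketch below uses simulated data (not the data behind the output above) and the definition VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor on the others. The helper `vif_manual` and the simulated variables are assumptions made for this illustration; with the `car` package loaded, `vif()` on a fitted `lm` should report the same quantities.

```r
set.seed(1)                     # arbitrary seed, for reproducibility only
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)   # constructed to be strongly correlated with x1
x3 <- rnorm(n)                  # unrelated to x1 and x2

# Hypothetical helper: VIF of `target` given the other predictors
vif_manual <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

vifs <- c(
  x1 = vif_manual(x1, data.frame(x2, x3)),
  x2 = vif_manual(x2, data.frame(x1, x3)),
  x3 = vif_manual(x3, data.frame(x1, x2))
)

print(round(vifs, 2))           # x1 and x2 are inflated, x3 stays close to 1
print(round(cor(x1, x2), 3))    # the correlation that drives the inflation
```

Note that the two correlated predictors get similarly large values, which matches the pattern for x1 and x2 in the output above.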

Q6:

Suppose a researcher named Mark wants to examine whether there is a difference in the average
satisfaction levels among three different customer service departments in a company. The researcher
collects satisfaction ratings from customers who have interacted with Department A, Department B,
and Department C. He could not collect much data on the different departments: only 35 observations.

Which statistical procedure do you recommend?

- ANOVA
- Independent samples t-test
- One-sample t-test
- Kruskal-Wallis test
- Mann-Whitney-Wilcoxon test
- Welch t-test
- Wilcoxon signed-rank test
Q7: Hypothesis

- Y: dependent variable
- X: Ratio independent variable
o called x in the graph
o called x2 in the equations

Which of the following formulas best summarizes the model above?

A- Y = b1*x1 + b2*x2 + b3*x2^2
B- Y = b1*x1 + b2*x2 + b3*x2
C- Y = b0 + b1*x1^2 + b2*x2 + b3*x2^2
D- Y = b0 + b1*x1 + b2*x2 + b3*x2^2
E- Y = b0 + b1*x1^2 + b2*x2 + b3*x2

Q8:

A- There is a clear sign of non-normality in the residuals
B- There is a sign of heteroskedasticity in the residuals
C- The data seems to be normally distributed
D- There is equal variance in the residuals
Q9: In the graph below, X is a scale variable and the lines represent the group variable: the red line
is the reference category (Group 1, coded 0) and the blue line is Group 2 (coded 1).

- The value of the intercept (the reference category) is:
- The value of the b-coefficient associated with the group variable is:
- The value of the b-coefficient associated with the x variable is:
- The value of the b-coefficient associated with the interaction term is:
- If the x value is -6, the predicted (= expected) y value for people in Group 2 is:

Q10:

After the introduction of a new mobile app, a company receives feedback from its users. The product
development team wants to determine the proportion of users who preferred the previous version
of the app compared to the new one (assuming all users have a preference and none are indifferent).
The team asks you to design a study using a random sample and determine the required sample size.
They are willing to accept a margin of error of 3 percentage points.

How large should the sample be? (Rounding errors will be accepted).
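
A common way to set this up is the sample-size formula for a proportion, n = z^2 * p * (1 - p) / E^2. The sketch below assumes a 95% confidence level (the question does not state one) and the conservative guess p = 0.5, since no prior estimate of the proportion is given. The variable names are chosen for this illustration.

```r
conf_level <- 0.95  # assumption: the question does not state a confidence level
p_guess    <- 0.5   # conservative choice when the proportion is unknown
margin     <- 0.03  # accepted margin of error (3 percentage points)

z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value, about 1.96

# Round up, because a fractional respondent is not possible
n_required <- ceiling(z^2 * p_guess * (1 - p_guess) / margin^2)
print(n_required)
```

Using p = 0.5 maximizes p * (1 - p), so the resulting n is large enough whatever the true proportion turns out to be.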
Q11:

In a research paper titled "The Impact of Leadership Styles on Employee Motivation in the Hospitality
Industry" published in the Journal of Hospitality and Tourism Management, the authors explore how
different leadership styles affect the motivation levels of employees. The dependent variable in the
study is the level of employee motivation. The main independent variable is "leadership style," which
may vary based on the dominant approach adopted by supervisors.

To investigate this concept, a study was conducted with the following variables:

- y = level of employee motivation (employee motivation)


- x4 = leadership style adopted by supervisors (ranging from autocratic = 0 to democratic = 10)
- x3 = gender where men = 0 and women = 1.

Call:
lm(formula = y ~ x4 + x3 + x4 * x3)

Residuals:
     Min       1Q   Median       3Q      Max
-15.9310  -3.4420   0.1234   3.0036  14.1565

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    -0.5993     0.3196  -1.875   0.0620 .
x4:leadership   0.5884     0.3403   1.729   0.0851 .
x3gender        4.2074     0.3287  12.799   <2e-16 ***
x4:x3          -0.4049     0.3281  -1.234   0.2183
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.019 on 246 degrees of freedom


Multiple R-squared: 0.4009, Adjusted R-squared: 0.3935
F-statistic: 54.86 on 3 and 246 DF, p-value: < 2.2e-16

Ignoring significance:

- What is the effect of leadership style on employee motivation in the hospitality industry?
- What is the level of motivation among men working under completely autocratic leaders
(i.e. x4 = 0)?
- What is the effect of leadership style among women?
- What is the level of employee motivation among women working under leaders who scored
5/10 in the leadership style?
Q12:

A- There is a clear sign of non-normality in the residuals
B- The variance of the residuals is uniform (homoskedasticity)
C- The data is normally distributed
D- There is a clear violation of the equal variance assumption (heteroskedasticity)

Q13: Cook's distance

True/false questions

- There are no influential cases
- Case 3 has a higher residual than case 6
- Case 6 has lower leverage than case 3
- Case 9 has a lower residual than case 6
Q14: Shapiro-Wilk

Shapiro-Wilk normality test

data: residuals
W = 0.99547, p-value = 0.9855

A- The residuals look skewed, not normally distributed
B- There is a sign of heteroskedasticity
C- The data is normally distributed
D- The equal variance assumption is not met

Q15:

Which one of the following statements is correct ?

A- The variance is the same within the groups, so the researcher is safe to use an independent
samples t-test.
B- The variance is different within the groups, so the researcher should use the non-parametric
alternative, the Mann-Whitney-Wilcoxon test.
C- The variance is the same within the groups, so the researcher should use the parametric test
ANOVA.
D- The variance is different within the groups, so the researcher should use the non-parametric
alternative, the Kruskal-Wallis test.
Q16:

Suppose a researcher named Alex wants to investigate the effectiveness of a new teaching method in
improving students' reading comprehension skills. Alex collects reading comprehension test scores
from a group of students before and after implementing the new teaching method. However, upon
inspecting the data, Alex notices that the distribution of the reading comprehension scores is highly
skewed.

Which statistical procedure do you recommend?

- ANOVA
- Independent samples t-test
- One-sample t-test
- Kruskal-Wallis test
- Mann-Whitney-Wilcoxon test
- Welch t-test
- Wilcoxon signed-rank test

Q17: In the graph below, X is a scale variable and the lines represent the group variable: the red line
is the reference category (Group 1, coded 0) and the blue line is Group 2 (coded 1).

- The value of the intercept (the reference category) is:
- The value of the b-coefficient associated with the group variable is:
- The value of the b-coefficient associated with the x variable is:
- The value of the b-coefficient associated with the interaction term is:
- If the x value is 1, the predicted (= expected) y value for people in Group 2 is:
Q18: Breusch-Pagan test

studentized Breusch-Pagan test

data: model
BP = 6.8297, df = 3, p-value = 0.07753

A- There is a clear sign of non-normality in the residuals
B- There is a sign of heteroskedasticity in the residuals
C- The data seems to be normally distributed
D- There is equal variance in the residuals

Q19:

A study is being conducted to determine the proportion of patients who experience a certain side
effect from a medication. In a sample of 125 patients, 18 experienced the side effect.

a- Create a 95% confidence interval for the proportion of patients who experience the side
effect.
b- What sample size do we need if we want the confidence interval to be half its current size?
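
The calculation can be sketched in R as follows. This assumes the normal-approximation (Wald) interval, p ± z * sqrt(p * (1 - p) / n), and keeps the observed proportion fixed when the sample size changes. Other interval types (for example the one reported by `prop.test`) would give slightly different bounds. The variable names are chosen for this illustration.

```r
x <- 18    # patients with the side effect
n <- 125   # sample size
p_hat <- x / n

z  <- qnorm(0.975)                    # critical value for 95% confidence
se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion

ci <- c(lower = p_hat - z * se, upper = p_hat + z * se)
print(round(ci, 3))

# The interval width is proportional to 1 / sqrt(n), so halving the width
# requires four times as many observations (with p_hat held fixed).
n_half  <- 4 * n
se_half <- sqrt(p_hat * (1 - p_hat) / n_half)
print(c(n_half = n_half, width_ratio = se_half / se))
```

The same reasoning applies to any sample size: multiplying n by 4 divides the margin of error by 2.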

Q20:

A- There is a clear sign of non-normality in the residuals
B- There is a sign of homoskedasticity in the residuals
C- The data is normally distributed
D- There is a clear violation of the equal variance assumption

In case of non-normality, what would you do? Give at least 2 arguments.

A- First argument:
B- Second argument:
Q21: Hypothesis

Q22: Cook's distance

True/false questions

- There are no influential cases
- Case 64 has a higher residual than case 49
- Case 43 has lower leverage than case 64
- Both 43 and 49 are influential cases
Q23:

In a research paper titled "The Impact of Salary on Employee Engagement in the Technology Sector,"
published in the Journal of Organizational Psychology, the authors examine how different salary
levels influence the level of employee engagement in technology companies. The dependent variable
in the study is the level of employee engagement. The main independent variable is "salary," which
varies based on the monetary compensation received by employees.

To investigate this concept, a study was conducted with the following variables:

- y = level of employee engagement (employee engagement)


- x2 = salary level of employees (ranging from low = 0 to high = 10)
- x5 = a dummy variable representing job position, where 0 represents non-managerial
positions and 1 represents managerial positions.
Call:
lm(formula = y ~ x2_salary + x5 + x2 * x5)

Residuals:
    Min      1Q  Median      3Q     Max
-10.318  -3.421   0.352   3.311   9.957

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.1678     0.4132  -0.406    0.685
x2_salary     4.8912     0.4642  10.538   <2e-16 ***
x5            0.5163     0.3823   1.351    0.179
x2:x5         0.5506     0.4565   1.206    0.230
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.547 on 121 degrees of freedom


Multiple R-squared: 0.4911, Adjusted R-squared: 0.4784
F-statistic: 38.92 on 3 and 121 DF, p-value: < 2.2e-16

Ignoring significance:

- What is the level of employee engagement among managers with a salary of 8000$?
- What is the effect of salary among non-managers?
- What is the level of employee engagement among non-managers with a salary of 5000$?
- What is the effect of salary on employee engagement?
- What is the effect of salary among managers?
Q24:

Suppose a researcher named Olivia wants to investigate whether there is a difference in the
happiness levels between individuals who practice mindfulness meditation (Group M) and individuals
who do not practice any form of meditation (Group N). Olivia administers a happiness questionnaire
to participants from both groups, but she could not collect much data. Additionally, she suspects
that the answers of the people who do not meditate are quite skewed.

Which statistical procedure do you recommend?

- ANOVA
- Independent samples t-test
- One-sample t-test
- Kruskal-Wallis test
- Mann-Whitney-Wilcoxon test
- Welch t-test
- Wilcoxon signed-rank test

Q25:

A researcher wants to estimate the proportion of adults who support the legalization of
marijuana. Based on a sample of 500 adults, the proportion who support legalization is 0.60.
What is the sample size we need, if we want a confidence interval to be half its current size
(based on n = 500) and by taking the study with a proportion of 0.6 as our starting point?
Open questions from the practice test:

NB: You need the Excel files data_change_motivation and data_risk

Question 1:

1- What is the null hypothesis of the Kruskal-Wallis test?
2- Why did the researcher choose a Kruskal-Wallis test?
3- What conclusion can be drawn based on this test? What is the most specific conclusion that
can be drawn? Make explicit use of the output provided here.
Question 2:

1- Create a linear model to test the aforementioned idea.


a. Copy then paste the commands you used:
b. Clarify which linear model you are estimating:
2- Upload a screenshot of the model outcome (including P-values and F-statistic)
3- Interpret the output in words. A few sentences is enough. Discuss the expectations (plural)
mentioned in the introduction. Are they supported by evidence ?
4- Store the residuals and the predicted values and make a scatterplot relating the predicted values
to the residuals. What do you observe? Is there a pattern? What are the consequences of this?

Question 3:

1- Select the most appropriate test to consider given this context


2- Explain in a few lines why you selected that test. Make sure your answer does not also fit
another test. Make use of the information provided about the study.
3- Import the data into R and do the test you selected.
4- Upload a screenshot of the commands used to transform variables (if needed) and to perform the
test. Then upload the output.
5- Interpret the output: What is most likely (95%) the outcome of a test against the null
hypothesis?
More open questions: copy and paste the following code:

library(tidyverse)
library(haven)
library(broom)
library(modelr)
library(car)
library(lmtest)
library(dplyr)

## CREATED DATASET
# Set seed for reproducibility
set.seed(98765)
# Number of observations
n <- 79
# Create a dataset
dataset <- tibble(
  happiness = round(runif(n, 0, 10), 1),                      # Dependent variable: Happiness
  marital_status = sample(c("married", "single"), n, TRUE),   # Independent variable: Marital status
  experiences_abroad = round(runif(n, 0, 10), 1),             # Independent variable: Experiences abroad
  age = sample(18:70, n, TRUE),                               # Independent variable: Age
  fitness = round(runif(n, 0, 100)),                          # Independent variable: Fitness
  gender = sample(c("male", "female"), n, TRUE),              # Independent variable: Gender
  exam = round(runif(n, 0, 10), 1),                           # Independent variable: Exam score
  retake = exam + round(runif(n, 0, 2), 1)                    # Independent variable: Retake score (generally higher than exam)
)

# Calculate happiness based on fitness and experiences abroad
dataset$happiness <- dataset$happiness + 0.5 * dataset$fitness + 0.3 * dataset$experiences_abroad
# Dummy variable for marital status: 1 = married, 0 = single
dataset$marital_status_dummy <- ifelse(dataset$marital_status == "married", 1, 0)
Open question 4:

"Imagine you are conducting a study on factors influencing happiness levels in individuals. You have
collected data on several variables for a sample of individuals. The dependent variable is 'happiness,'
which is measured on a scale from 0 to 10. The independent variables include 'Marital_Status' (a
dummy variable with 'married' or 'single' as possible outcomes; you expect married people to be
happier). We also have 'experiences_abroad' (measured on a scale of 0 to 10, where you expect
people who have travelled a lot to be happier in general).

We also control for 'age' (ranging from 18 to 70, where we think that older people tend to be
grumpier), and 'fitness' (measured on a scale from 0 to 100, where people who are fitter usually
also score higher in happiness)."

- A – Which model can be used in this study?
- B – Perform the test in R,
o copy and paste the code used below:
o take a screenshot of the output
- C – Interpret the output in words. A few sentences is enough. Discuss the expectations
(plural) mentioned in the introduction. Are they supported by evidence?
- D – Store the residuals and the predicted values and make a scatterplot relating the
predicted values to the residuals. What do you observe? Is there a pattern? What are the
consequences of this? In other words, we ask you to investigate the assumptions of normality
and equal variance. Do it in two ways:
o Statistically and graphically
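
A possible workflow for parts B and D is sketched below. It is only an illustration: the data frame `d` is a stand-in built with base R using the same column names as `dataset` (an assumption, so that the block runs on its own), and in practice you would fit the model on the `dataset` object created by the code above. The Breusch-Pagan step is run only if the `lmtest` package is installed.

```r
# Stand-in data with the same column names as `dataset` (assumption)
set.seed(1)
n <- 79
d <- data.frame(
  marital_status     = sample(c("married", "single"), n, TRUE),
  experiences_abroad = round(runif(n, 0, 10), 1),
  age                = sample(18:70, n, TRUE),
  fitness            = round(runif(n, 0, 100))
)
d$happiness <- round(runif(n, 0, 10), 1) + 0.5 * d$fitness + 0.3 * d$experiences_abroad
d$marital_status_dummy <- ifelse(d$marital_status == "married", 1, 0)

# Multiple linear regression with one dummy and three scale predictors
model <- lm(happiness ~ marital_status_dummy + experiences_abroad + age + fitness,
            data = d)
print(summary(model))

# Store predicted values and residuals
d$predicted <- fitted(model)
d$residual  <- resid(model)

# Graphical check: residuals against predicted values
plot(d$predicted, d$residual,
     xlab = "Predicted happiness", ylab = "Residual")
abline(h = 0, lty = 2)

# Statistical checks: normality of residuals and equal variance
sw <- shapiro.test(d$residual)
print(sw)
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::bptest(model))  # Breusch-Pagan test for heteroskedasticity
}
```

A funnel or curve in the scatterplot would point to unequal variance or a misspecified model; a random cloud around zero is what the assumptions predict.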

Open question 5:

In this study, we focus on a group of students who took two exams—a first attempt and a retake—
within a specific subject. We want to see if their performance significantly changed after the retake
exam. The first exam served as a starting point, where we measured how well the students did
initially. We then identified the students who struggled and needed extra attention. The scores from
both exams were not evenly distributed. Only a few students showed a big improvement, while most
only saw a slight increase or even a decrease in their scores.

Our main research question is whether the identified group of students showed a significant change
in grades between the first attempt and the retake exam.

1- Select the most appropriate test to consider given this context:


a. Not evenly distributed: so we will use non-parametric
b. We have 2 samples , they are related “paired”
2- Explain in a few lines why you selected that test. Make sure your answer does not also fit
another test. Make use of the information provided about the study.
3- Import the data into R and do the test you selected
4- Upload the output of the test
5- Interpret the output: What is most likely (95%) the outcome of a test against the null
hypothesis?
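
Following the two hints under item 1 (non-parametric, two paired samples), the test could be run as in the sketch below. The vectors `exam` and `retake` are regenerated here as stand-ins so the block runs on its own (an assumption); in practice you would use `dataset$exam` and `dataset$retake`. `exact = FALSE` is set because ties and zero differences prevent an exact p-value.

```r
# Stand-in scores, built the same way as the exam and retake columns (assumption)
set.seed(1)
n <- 79
exam   <- round(runif(n, 0, 10), 1)
retake <- exam + round(runif(n, 0, 2), 1)

# Paired, non-parametric comparison of the two attempts
res <- wilcox.test(retake, exam, paired = TRUE, exact = FALSE)
print(res)
```

The null hypothesis of this test is that the distribution of the paired differences is symmetric around zero, so a small p-value indicates a systematic change between the two attempts.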
Open question 6:

“Imagine you are conducting a research study on happiness levels among individuals, and you want
to explore the impact of 'marital status' and 'age' on happiness. The dependent variable, 'happiness,'
is measured on a scale from 0 to 10. The independent variable 'marital status' has two categories:
'married' and 'single,' while 'age' ranges from 18 to 70.”

Your task is to investigate the relationship between happiness and age as you imagine that older
people tend to be less happy. However, you suspect that this relationship is especially significant
among married people as older people who are still married are happier.

- A – Which model can be used in this study?
- B – Perform the test in R,
o copy and paste your commands
o take a screenshot of the output
- C – Interpret the outcome of the test
- D – Examine the assumptions graphically.
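
Because the expected effect of age differs by marital status, one candidate is a linear model with an interaction term. The sketch below shows how such a model could be fitted and inspected. The data frame `d` and the dummy `married` are stand-ins created for this illustration (an assumption); in practice the model would be fitted on `dataset` with `marital_status_dummy`.

```r
# Stand-in data with the relevant columns (assumption)
set.seed(1)
n <- 79
d <- data.frame(
  happiness      = round(runif(n, 0, 10), 1),
  marital_status = sample(c("married", "single"), n, TRUE),
  age            = sample(18:70, n, TRUE)
)
d$married <- ifelse(d$marital_status == "married", 1, 0)

# age * married expands to age + married + age:married
model_int <- lm(happiness ~ age * married, data = d)
print(summary(model_int))

# Slope of age in each group
b <- coef(model_int)
slope_single  <- unname(b["age"])
slope_married <- unname(b["age"] + b["age:married"])
print(c(slope_single = slope_single, slope_married = slope_married))

# Graphical look at the assumptions: the standard lm diagnostic plots
par(mfrow = c(2, 2))
plot(model_int)
```

The interaction coefficient is the difference between the two slopes, so its sign and p-value speak directly to the expectation that the age effect differs for married people.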

Open question 7:

Understanding the impact of fitness on individuals is a topic of great interest, particularly when
considering potential differences between genders. In this study, we aim to investigate the effect of
fitness levels on males and females. To assess fitness levels, a series of physical tests were conducted
on a group of participants, consisting of both males and females. The data collected from these tests
revealed that the distributions of fitness scores within each gender group were not evenly
distributed.

The central objective of this study is to examine whether there is a significant difference in fitness
levels between genders based on the collected data.

1- Select the most appropriate test to consider given this context


2- Explain in a few lines why you selected that test. Make sure your answer does not also fit
another test. Make use of the information provided about the study.
3- Import the data into R and do the test you selected
4- Upload the output of the test
5- Interpret the output: What is most likely (95%) the outcome of a test against the null
hypothesis?
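
With two independent groups and skewed scores, one option is the Mann-Whitney-Wilcoxon (rank-sum) test, which in R is also run through `wilcox.test`. The sketch below uses a stand-in data frame `d` with `fitness` and `gender` columns (an assumption, so the block runs on its own); in practice the same call would be made with `data = dataset`.

```r
# Stand-in data with the two relevant columns (assumption)
set.seed(1)
n <- 79
d <- data.frame(
  fitness = round(runif(n, 0, 100)),
  gender  = sample(c("male", "female"), n, TRUE)
)

# Two independent groups, non-parametric comparison
res <- wilcox.test(fitness ~ gender, data = d, exact = FALSE)
print(res)
```

Unlike the paired version, this call compares the ranks of all observations across the two groups, so the groups do not need to have the same size.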
