SOLVED - Practice Test

Uploaded by

z13612909240

Practice test

Question 1:

A researcher conducts a survey to measure people's opinions on a political issue. The options for
responses are "Agree," "Disagree," and "Neutral." What is the measurement level of this variable?

a. Interval
b. Nominal
c. Ordinal
d. Ratio

Question 2:

Given the following table, calculate the chi-square statistic. Answer: 16

x/y      0      1      Total
0        120    80     200
1        80     120    200
Total    200    200    400
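
The stated answer can be checked by hand in R. This is a minimal sketch that assumes the uncorrected Pearson chi-square (no continuity correction); the variable names are illustrative only.

```r
# Observed counts from the Question 2 table (rows = x, columns = y)
observed <- matrix(c(120,  80,
                      80, 120),
                   nrow = 2, byrow = TRUE)

# Expected count per cell = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square: sum over all cells of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq
```

Every expected count is 100 here, so each cell contributes 20^2 / 100 = 4 to the statistic.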

Question 3:

Which of the following statements about correlation is correct?

I: Correlation measures the strength of the linear relationship between two variables.
II: Correlation can only range from -1 to +1.

a. Both I and II are correct
b. Both I and II are incorrect
c. Only II is correct
d. Only I is correct

Question 4:

A group of researchers conducted a study to examine the academic performance of students in a
particular course. They collected data from a random sample of 180 students who took the course
and calculated the mean score to be 20, with a variance of 9.

Now, they are interested in estimating the range within which the population mean score is likely to
fall. Using a 95% confidence level, what is the confidence interval for the population mean score?

- Standard error: sqrt(9 / 180) = 0.2236
- Using a critical value (t) of 2: [19.55, 20.45]
- Using a critical value (t) of 1.96: [19.56, 20.44]

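
The interval can be reproduced in a few lines of R. This is a sketch under the assumption that the interval is mean ± critical value × standard error, with the standard error taken as sqrt(variance / n); the variable names are not from the original.

```r
n <- 180          # sample size
mean_score <- 20  # sample mean
variance <- 9     # sample variance

# Standard error of the mean
se <- sqrt(variance / n)

# Interval with the rule-of-thumb critical value of 2
ci_t2 <- mean_score + c(-1, 1) * 2 * se

# Interval with the critical value 1.96
ci_196 <- mean_score + c(-1, 1) * 1.96 * se

round(ci_t2, 2)
round(ci_196, 2)
```
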
Question 5: Linear equation.

a- The value of the intercept is: 3
b- The value of the slope is: (6 - 0) / (-2 - 2) = 6 / -4 = -1.5
c- The sign of the relationship is: negative
d- If the x value is 2, the predicted y value is: 0

Question 6:

After the introduction of a new mobile app, a company receives feedback from its users. The product
development team wants to determine the proportion of users who preferred the previous version
of the app compared to the new one (assuming all users have a preference and none are indifferent).
The team asks you to design a study using a random sample and determine the required sample size.
They are willing to accept a margin of error of 3 percent points.

How large should the sample be? (Rounding errors will be accepted).

- Using a critical value (t) of 2: n = 1111
- Using a critical value (t) of 1.96: n = 1067

\[
n = \frac{(\text{critical value})^2 \, p(1-p)}{(\text{margin of error})^2} = \frac{1.96^2 (0.5)(1-0.5)}{0.03^2} \approx 1067
\]

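
The same calculation can be sketched in R. The helper `sample_size()` is a hypothetical name introduced for illustration, and p = 0.5 is used as in the formula for this question (it is the most conservative choice when the true proportion is unknown).

```r
# Required sample size for estimating a proportion with a given margin of error
sample_size <- function(critical_value, p, margin_of_error) {
  critical_value^2 * p * (1 - p) / margin_of_error^2
}

n_t2  <- sample_size(2,    p = 0.5, margin_of_error = 0.03)
n_196 <- sample_size(1.96, p = 0.5, margin_of_error = 0.03)

n_t2    # about 1111.1
n_196   # about 1067.1
```

In practice the result is often rounded up to the next whole respondent; the question states that rounding differences are accepted.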
Question 7:

Which measure is most appropriate for assessing the relationship between study hours and exam
scores?

a. Pearson’s or Spearman's correlation
b. Kendall's tau-c
c. Cramer's V
d. Kendall's tau-b

Question 8:

A researcher conducted a multiple regression analysis to examine the relationship between customer
satisfaction (Var satisfaction) and three predictor variables: service quality (Var quality), price (Var
price), and advertising expenditure (Var advertising). The researcher obtained the following
information:

- Variance of customer satisfaction (Var satisfaction) = 9.6
- Variance of service quality (Var quality) = 12.3
- Variance of price (Var price) = 8.9
- Variance of advertising expenditure (Var advertising) = 6.5
- Variance of residuals (Var residuals) = 4.2
- Variance of predicted values (Var predicted) = 5.4

What is the unadjusted R squared of the model?

- Two possible formulas:
o 1st: Var(predicted) / Var(dependent variable) = 5.4 / 9.6 = 0.5625, so 56.25%
o 2nd: 1 – [Var(residuals) / Var(dependent variable)] = 1 – [4.2 / 9.6] = 0.5625
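
Both formulas can be checked side by side in R. This is a small sketch using only the variances given in the question; the variable names are illustrative.

```r
var_satisfaction <- 9.6  # variance of the dependent variable
var_predicted    <- 5.4  # variance of the predicted values
var_residuals    <- 4.2  # variance of the residuals

# 1st formula: share of the variance that is explained
r2_explained <- var_predicted / var_satisfaction

# 2nd formula: one minus the share that is left unexplained
r2_residual <- 1 - var_residuals / var_satisfaction

c(r2_explained, r2_residual)
```

The two formulas agree because the predicted and residual variances add up to the total variance (5.4 + 4.2 = 9.6). The variances of the three predictors are not needed.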

Question 9:

Given:

In a large country some people want to support farmers in the transition towards a more sustainable
operation of their farm, while others think farmers should not get that support. Imagine selecting a
large number of samples (all sized 100) from a population. For each of the samples the percentage of
people who want to support farmers is calculated. You put all percentages in a histogram. The middle
of the histogram is at 0.5 (50%).

Two statements:

I: The middle of the histogram is the same as the population proportion
II: Its shape very much looks like a normal distribution

a. Both I and II are correct
b. Both I and II are incorrect
c. Only I is correct
d. Only II is correct

Question 10: R output (expected, predicted and residual values)

Call:
lm(formula = exam_grade ~ study_time, data = data)

Residuals:
Min 1Q Median 3Q Max
-1.4200 -0.6884 -0.1655 0.5024 3.2341

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.07059 0.18992 5.637 1.67e-07 ***
study_time 0.58888 0.03311 17.784 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.939 on 98 degrees of freedom


Multiple R-squared: 0.7634, Adjusted R-squared: 0.761
F-statistic: 316.3 on 1 and 98 DF, p-value: < 2.2e-16

- What is the expected grade of a student who studied 5 hours?
Expected Y = 1.07059 + 0.58888 * 5 = 4.0150
- Suppose one person who studied 10 hours for the exam had a residual of 2.5. What is the
observed (real) grade of this student?
Observed Y = predicted + residual = (1.07059 + 0.58888 * 10) + 2.5 = 9.4594
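
The two answers follow directly from the coefficients in the output. A minimal sketch, with `predict_grade()` as a hypothetical helper name:

```r
# Coefficients copied from the regression output above
intercept <- 1.07059
slope     <- 0.58888

# Predicted (expected) grade for a given number of study hours
predict_grade <- function(study_time) intercept + slope * study_time

# Expected grade after 5 hours of study
expected_5h <- predict_grade(5)

# Residual = observed - predicted, so observed = predicted + residual
observed_10h <- predict_grade(10) + 2.5

expected_5h
observed_10h
```
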

Question 11:

A team of researchers conducted a study to assess the fitness levels of individuals in a particular
population. They found that the mean fitness score in the population is 120, with a standard
deviation of 15. Now, they are interested in estimating the percentage of people in the population
who are likely to have a lower fitness level than John. John's fitness score is 150.

Using this information, what is the estimated percentage of people in the population who have a
lower fitness level than John?

- z (number of standard deviations from the population mean) = (150 - 120) / 15 = 2
- By rule of thumb, about 95% of a normal distribution lies within 2 standard deviations of the mean,
which leaves about 2.5% in each tail
- The percentage with a lower score is everything to the left of John: 95% + 2.5% = 97.5% (draw the
distribution to see this)
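
The rule-of-thumb answer can be compared with the exact area under a normal curve. This sketch assumes the fitness scores are normally distributed, which the question implies but does not state.

```r
population_mean <- 120
population_sd   <- 15
john_score      <- 150

# Number of standard deviations John lies above the mean
z <- (john_score - population_mean) / population_sd

# Exact share of a normal distribution below John's score
share_below <- pnorm(z)

z
round(share_below, 4)
```

The exact value is slightly above 97.5% because the rule of thumb rounds the critical value 1.96 up to 2.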

Question 12:

A researcher is studying the proportion of smartphone users who have installed a specific social
media app. In a random sample of 300 smartphone users, it was found that 180 of them had the app
installed. Given this information, what is the 95% confidence interval for the proportion ?

- Using R: prop.test(180, 300, alternative = "two.sided", conf.level = 0.95)
- Result: [0.5419, 0.6554]

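
For comparison, the textbook interval (estimate ± 1.96 × standard error) can be computed by hand next to `prop.test()`. This is a sketch: `prop.test()` uses a score-based interval with a continuity correction by default, so its bounds can differ slightly from the hand calculation.

```r
successes <- 180
n <- 300

# Sample proportion and its standard error
p_hat <- successes / n
se <- sqrt(p_hat * (1 - p_hat) / n)

# Textbook (Wald) interval
wald_ci <- p_hat + c(-1, 1) * 1.96 * se

# Interval reported by prop.test (default settings)
score_ci <- as.numeric(prop.test(successes, n, conf.level = 0.95)$conf.int)

round(wald_ci, 4)
round(score_ci, 4)
```
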
Question 13: Linear equation:

- Reference category
- Group 1

a- The value of the intercept (reference category) is: 0


b- The value of the statistic associated with group 1 is: -3
c- The value of the general slope is: When x increases by 4, y increases by 6 so 6/4 = 1.5

Question 14 :

A researcher investigates the association between political affiliation (measured with three
categories: Democrat, Republican, Independent) and voting preference for a specific policy (four
options are considered, focusing on the "most preferred policy" only). The researcher utilizes the chi-
square statistic to analyze the significance of this relationship.

How many degrees of freedom are associated with the chi-square statistic in this test?

- Df = (3-1) * (4-1) = 2*3 = 6

Question 15:

The variable ‘income’ is measured in euros.


What is the measurement level of the variable ‘income’?

a. Interval
b. Nominal
c. Ordinal
d. Ratio
Question 16:

Given the following table, calculate the chi-square statistic. Answer: 4.76

x/y 0 1 Total
0 40 10 50
1 30 20 50
Total 70 30 100
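
The result can also be obtained with `chisq.test()`. This is a sketch; note that for 2x2 tables R applies Yates' continuity correction by default, so `correct = FALSE` is needed to match the hand calculation.

```r
# Observed counts from the Question 16 table (rows = x, columns = y)
observed <- matrix(c(40, 10,
                     30, 20),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(x = c("0", "1"), y = c("0", "1")))

# Pearson chi-square without continuity correction
result <- chisq.test(observed, correct = FALSE)

chi_sq <- unname(result$statistic)
round(chi_sq, 2)
result$expected   # expected counts: 35 and 15 in each row
```
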

Question 17 :

Suppose you calculated the chi square in a table by hand in a 3x3 table. The outcome of your
calculation is 7.14.

a- By using R, what is the associated p-value (2 decimals only)?
a. Degrees of freedom for a 3x3 table: (3-1) * (3-1) = 4
b. Code: pchisq(7.14, 4, lower.tail = FALSE) -> result = 0.13
b- Does this mean the association is significant at the 95% confidence level?
a. P-value = 0.1287 > 0.05, so the association is not significant.

Question 18: Output Regression and confidence interval

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.27 0.259 -1.068 0.287
x1 2.032 0.027 73.447 <2e-16 ***
x2 -2.965 0.026 -110.767 <2e-16 ***
x3 0.0402 0.026 ………. 0.125
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.05 on 232 degrees of freedom


Multiple R-squared: 0.9874, Adjusted R-squared: 0.9872
F-statistic: 6046 on 3 and 232 DF, p-value: < 2.2e-16

a- Create a 95% confidence interval for the variable x1 (estimate ± 1.96 * SE)
a. What is the lower bound: 1.979
b. What is the upper bound: 2.085
b- Create a 95% confidence interval for the variable x3
a. What is the lower bound: -0.01
b. What is the upper bound: 0.09
c- Which of the following statements are correct?
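a. A high (and significant) F value means that the model estimated here is correct? NO. A significant F only shows that at least one predictor is associated with the dependent variable; it does not prove that the model is correctly specified.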
b. Using the estimate and SE you can calculate t? YES
i. Try to do it yourself for variable x3 -> t = 0.0402 / 0.026 = 1.546
c. In this model all variables are associated with the dependent variable?
i. NO only x1 and x2 are significant, x3 is not since P-Value = 0.125>0.05
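
The intervals and the t value can be reproduced from the estimates and standard errors in the output. A minimal sketch, assuming the critical value 1.96 (with 232 degrees of freedom, `qt(0.975, 232)` would give a very similar value); `coef_ci()` is a hypothetical helper name.

```r
# Confidence interval for a regression coefficient: estimate ± critical value * SE
coef_ci <- function(estimate, std_error, critical_value = 1.96) {
  estimate + c(-1, 1) * critical_value * std_error
}

ci_x1 <- coef_ci(2.032, 0.027)
ci_x3 <- coef_ci(0.0402, 0.026)

# t value = estimate / standard error
t_x3 <- 0.0402 / 0.026

round(ci_x1, 3)
round(ci_x3, 2)
round(t_x3, 3)
```

The interval for x3 contains 0, which matches its non-significant p-value of 0.125.
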
Question 19: Linear equation

- Reference category
- Group 1
- Group 2

a- The value of the intercept (reference category) is: -1


b- The value of the b-coefficient associated with the dummy of group 1 is: -3
c- The value of the b-coefficient associated with the dummy of group 2 is: +1
d- If x is -4, the predicted y value for people in group 1 is: 4

Question 20:

A sociologist wants to examine the association between marital status (e.g., single, married,
divorced) and job satisfaction among employees in a company. The researcher aims to determine the
strength and nature of this relationship.

Which measure is the most suitable choice for analyzing the association between marital status and
job satisfaction?

a. Spearman's correlation
b. Pearson's correlation
c. Cramer's V
d. Kendall's tau-b
Open questions

Question 1: A researcher is interested in exploring the relationship between marital status and the
level of happiness in individuals. She believes that marital status could play a significant role in
shaping people's happiness levels. To investigate this relationship, the researcher gathers a dataset
consisting of information on individuals' marital status (married or single) and their corresponding
happiness scores, which are measured on a scale from 1 to 10. The researcher hypothesizes that
individuals who are married may experience higher levels of happiness compared to those who are
single.

a- Describe shortly which test is to be used to assess the impact of marriage. Two-sample t-test
b- Explain in a few (two or three) lines why you selected this test:

- We are measuring happiness (scale) and we have 2 groups of marital status (married and
single), therefore we should use a t-test for 2 samples. Which one depends on the
assumption of equal variance:
- Equal variance assumption: (using R)
o var(dataset$happiness[dataset$marital_status == "married"]) = 0.53
o var(dataset$happiness[dataset$marital_status == "single"]) = 0.32
o 0.53/0.32 = 1.66 < 2 so we can assume equal variance
- Since we can assume equal variance, we use the two-sample t-test with pooled variance (var.equal = TRUE).

c- Show the output and commands


a. t.test(happiness ~ marital_status, data = dataset, var.equal = TRUE)

d- Perform a test. Do you conclude the fact that people get married has an effect on the level of
happiness? Explain shortly. A simple yes/no answer will not give you points (but omitting a
simple yes/no answer to the question will also make you lose points). Also shortly discuss the
confidence interval.

- Yes. Since the p-value is < 0.05 we can reject the null hypothesis and conclude that there is an
effect of marital status on happiness. In other words, whether you are married or not matters for
your happiness. Married people scored on average 7 while single people scored 5.1 on
happiness.
- Using the confidence interval: since 0 is not included in the 95% confidence interval, a difference
of 0 between the two groups is not plausible. We are 95% confident that the true difference in mean
happiness lies between 1.71 and 2.05.

Question 2: A researcher is interested in exploring the participation in yoga classes among individuals
in a certain population. Yoga is known to offer various benefits for physical and mental well-being.
The researcher conducts a survey to collect data on individuals' involvement in yoga classes, with
response options indicating whether they have applied to yoga (1) classes or not (0).

a- Describe shortly which statistical analysis would be appropriate to assess the participation in
yoga classes. Proportion test
b- Explain in a few lines why you selected this analysis.

- We are measuring whether people applied to yoga classes or not, which is a dichotomous
variable (yes or no), so we estimate a proportion and its confidence interval.

c- Upload a screenshot displaying the relevant output

d- Upload a screenshot of the commands or steps you would use to perform the statistical
analysis (even if you were unable to execute the analysis).

- 1st: we calculate the number of participants in yoga classes: sum(dataset$yoga == 1)
o The result was 119
- 2nd: we compute a proportion test:
o prop.test(119, 236, alternative = "two.sided", conf.level = 0.95)

e- Based on the output from the dataset:


a. Estimate the percentage of individuals who have applied to yoga classes: 0.504 (50.4%)
b. Provide the lower bound of the conf. interval for the estimated percentage: 0.44 (0.4388)
c. Provide the upper bound of the conf. interval for the estimated percentage: 0.57 (0.5695)

Question 3:

"After the summer season, a group of people decided to focus on their weight (measured in kg) and
get healthier. They started a weight management program in September, which included changes in
their diet and exercise habits during 4 months so until December.

To assess the effectiveness of the program, the participants' weights were measured first in
September and then in December. Let's analyze the data and find out if the weight management
program led to noticeable weight loss."

a- Describe shortly which statistical analysis would be appropriate to assess the effectiveness of
the program. One-sample t-test on the weight differences (equivalent to a paired t-test)
b- Explain in a few lines why you selected this analysis

- We are measuring weight (scale) twice on the same people, so there is only 1 group and the two
measurements are paired. Therefore we compute each person's weight change and run a one-sample
t-test on it against 0.

c- Upload a screenshot displaying the relevant output

d- Upload a screenshot of the commands or steps you would use to perform the statistical
analysis (even if you were unable to execute the analysis).

- 1st: we compute the difference in weight between December and September
dataset$weight_diff <- dataset$weight_december - dataset$weight_september
- 2nd: we compute a one-sample t-test on the newly generated weight_diff variable
t.test(dataset$weight_diff, mu = 0, conf.level = 0.95)

e- Based on your analysis, provide a conclusion regarding whether there was a significant
change in weight over the specified time period.
f- Use a 95% confidence interval to answer the question and explain what does this interval
mean.

- Since the p-value is < 0.05 we can reject the null hypothesis and conclude that weight changed
significantly during the weight management program. On average people lost 0.984 kilograms.
- Using the confidence interval: since 0 is not included in the 95% confidence interval, a mean
change of 0 is not plausible. We are 95% confident that the true mean change in weight lies between
-1.058 and -0.91 kg, i.e. an average loss of between 0.91 and 1.058 kg.

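
A one-sample t-test on the December minus September differences gives the same result as a paired t-test on the two measurements. The sketch below illustrates this with simulated weights; the numbers are hypothetical and are not the exam dataset.

```r
set.seed(1)

# Hypothetical weights for 30 participants (NOT the exam dataset)
weight_september <- rnorm(30, mean = 76, sd = 15)
weight_december  <- weight_september - rnorm(30, mean = 1, sd = 0.5)

# Approach used in the answer: one-sample t-test on the differences
one_sample <- t.test(weight_december - weight_september, mu = 0)

# Equivalent approach: paired t-test on the two measurements
paired <- t.test(weight_december, weight_september, paired = TRUE)

# Same t statistic and p-value in both cases
all.equal(unname(one_sample$statistic), unname(paired$statistic))
all.equal(one_sample$p.value, paired$p.value)
```
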
Question 4:

Remember the weight management program from the previous question? Before starting the
program, some participants were already practicing yoga while others were not. Now, a question
arises: Some researchers claim that the weight loss was actually due to the fact that some individuals
were already engaged in yoga before the program experience. To find out, in the dataset there are
multiple weights measured at different times. By examining this dataset, we may gain insights into
the potential impact of pre-existing yoga practice.

a- Describe shortly which statistical analysis would be appropriate in this scenario.
Two-sample t-test
b- Explain in a few lines why you selected this analysis

- We are measuring weight in September (scale) and we have 2 groups (yes yoga and no yoga),
therefore we should use a t-test for 2 samples. Which one depends on the assumption of
equal variance:
- Equal variance assumption: (using R)
o var(dataset$weight_september[dataset$yoga == 0]) = 223
o var(dataset$weight_september[dataset$yoga == 1]) = 218
o 223/218 = 1.02 < 2 so we can assume equal variance
- Since we can assume equal variance, we use the two-sample t-test with pooled variance (var.equal = TRUE).

c- Upload a screenshot displaying the relevant output and commands


a. t.test(weight_september ~ yoga, data = dataset, var.equal = TRUE)

d- Based on your analysis, provide a conclusion regarding the potential impact of pre-existing
yoga practice on weight loss.
e- Compute a 95% confidence interval and explain what does this interval mean in this context.

- Since the p-value = 0.8708 > 0.05 we can’t reject the null hypothesis: we have no evidence that the
weights in September differed between the groups. People not doing yoga weighed on average 75.65 kg
while people attending yoga weighed 75.97 kg.
- Using the confidence interval: since 0 is included in the 95% confidence interval, a difference of 0
between the two groups is plausible. We are 95% confident that the true difference in mean weight
lies between -4.12 and 3.49 kg.

Question 5: A company offers five different training programs to its employees: Program A to
Program E. The company is interested in assessing whether the distribution of employees across
these programs matches the expected distribution based on their preferences. A random sample of
employees is taken, and their program preferences are recorded. The company believes that the
distribution of program preferences in the sample may deviate from the expected distribution.

a- Describe shortly which test is used to answer this question (name of the test)
a. Goodness-of-fit test (chi-square)
b- Explain in a few (two or three) lines why you selected this test:
a. I have proportions about a nominal variable (we have 5 programs) and I want to
compare if the proportions of my sample (observed) match the expected proportion
in the population (expected).
c- Upload a screenshot displaying the relevant (numerical, not graphical) output of the test.

Chi-squared test for given probabilities


data: observed
X-squared = 44.7, df = 4, p-value = 4.59e-09

d- Upload a screenshot of the commands used to perform the test (even if you were unable to
execute the test).

observed <- c(50, 60, 45, 40, 55)
expected <- c(0.20, 0.25, 0.20, 0.25, 0.10)
chisq.test(x = observed, p = expected)

e- Based on the output, do you conclude that the distribution of employees across training
programs matches the expected distribution?

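H0: It is likely that the data come from a random sample of the population (the sample fractions and
population fractions are “equal = close enough”)

H1: It is UNlikely that the data come from a random sample of the population (the fractions or
proportions are statistically “different”)

Conclusion: p-value = 4.59e-09 < 0.05, so we reject H0. The distribution of employees across the
training programs does not match the expected distribution.
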
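
The test statistic can be rebuilt by hand to see where the value comes from. This is a sketch that uses the observed counts and expected proportions from answer d; the variable names are illustrative.

```r
observed      <- c(50, 60, 45, 40, 55)
expected_prop <- c(0.20, 0.25, 0.20, 0.25, 0.10)

# Expected counts = sample size * expected proportion
expected_count <- sum(observed) * expected_prop

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected_count)^2 / expected_count)

# Degrees of freedom = number of categories - 1
df <- length(observed) - 1
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

chi_sq
df
p_value
```

Most of the statistic comes from the last program: 55 employees were observed where only 25 were expected.
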
Question 6: A researcher is interested in investigating the potential benefits of yoga practice on
individuals' happiness levels. It is believed that yoga can promote relaxation, reduce stress, and
enhance overall well-being. The researcher gathers a dataset consisting of information on individuals'
participation in yoga classes (yes or no) and their corresponding happiness scores, measured on a
scale from 1 to 10. The researcher hypothesizes that individuals who participate in yoga classes may
experience higher levels of happiness compared to those who do not engage in yoga practice.

a- Describe shortly which test is used to assess the impact of participating in yoga classes on
happiness levels. Welch test
b- Explain in a few lines why you selected this test for evaluating the relationship between yoga
participation and happiness.

- We are measuring happiness (scale) and we have 2 groups of yoga (0 and 1), therefore we
should use a t-test for 2 samples. Which one depends on the assumption of equal variance:
- Equal variance assumption: (using R)
o var(dataset$happiness[dataset$yoga == "1"]) = 2.01
o var(dataset$happiness[dataset$yoga == "0"]) = 0.54
o 2.01/0.54 = 3.72 > 2 so we CAN’T assume equal variance
- Since we can’t assume equal variance we must use Welch test

c- Provide the output and commands used to conduct the test.

t.test(happiness ~ yoga, data = dataset, var.equal = FALSE)

d- Perform the test and draw conclusions regarding the effect of participating in yoga classes on
happiness levels. Avoid giving a simple yes/no answer and instead provide a brief
explanation based on the statistical analysis. Additionally, discuss the confidence interval and
its role in interpreting the results."

- Since the p-value is < 0.05 we can reject the null hypothesis and conclude that there is an effect of
yoga on happiness. In other words, whether you do yoga or not matters for your
happiness. People doing yoga scored on average 6.28 while people not doing yoga scored
5.92 on happiness.
- Using the confidence interval: since 0 is not included in the 95% confidence interval, a difference
of 0 between the two groups is not plausible. We are 95% confident that the true difference in mean
happiness lies between 0.07 and 0.65.

Question 7: The management of a wellness center is interested in understanding whether there are
significant differences in the happiness levels of its clients based on their diets. The center offers
various wellness programs and wants to explore if there is a connection between dietary choices and
happiness. The center believes that people with different dietary preferences may experience varying
levels of happiness and wants to validate this claim:

a- Which statistical test would you use to explore whether there are significant differences in
happiness levels among individuals with different diets?
ANOVA (F-test) from lm
b- Explain in a few lines why you selected this test.
Because we want to compare the mean happiness of all three diet groups at once.
c- Show (copy paste) the commands or steps you would use to perform the statistical analysis
(even if you were unable to execute the analysis). Also upload the output.

question7 <- lm(happiness ~ diet, data = dataset)
summary(question7)
Call:
lm(formula = happiness ~ vegan + no_diet, data = dataset)

Residuals:
Min 1Q Median 3Q Max
-2.24648 -0.91446 -0.03047 0.73693 2.38554

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.01446 0.12596 47.750 <2e-16 ***
vegan 0.06481 0.17867 0.363 0.717
no_diet 0.23202 0.18550 1.251 0.212
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.148 on 233 degrees of freedom


Multiple R-squared: 0.006975, Adjusted R-squared: -0.001549
F-statistic: 0.8183 on 2 and 233 DF, p-value: 0.4424

d- Based on your analysis, provide a conclusion regarding whether there is a significant
difference in happiness levels.
P-value = 0.4424 > 0.05 so we can’t reject H0. This means that we don’t have enough
evidence to say that the average level of happiness differs between the different
diet groups. In other words, we find no evidence of an effect of the type of diet on
happiness.

A popular health magazine published a claim that individuals following a Vegan diet tend to exhibit
higher motivation levels compared to individuals with Vegetarian diets.

a- Which statistical test would you use to answer this question and why? Lm with dummies
because I want to see the differences between each group with each other, not as a whole.
b- Show (copy paste) the commands or steps you would use to perform the statistical analysis
(even if you were unable to execute the analysis). Also upload the output.

1st we create dummies:
dataset$vegan <- ifelse(dataset$diet == "Vegan", 1, 0)
dataset$vegetarian <- ifelse(dataset$diet == "Vegetarian", 1, 0)
dataset$no_diet <- ifelse(dataset$diet == "No diet", 1, 0)
2nd we create the model:
# You can answer this question with either vegan or vegetarian as the reference category.
# Example with vegan as reference:

model_vegan_ref <- lm(happiness ~ vegetarian + no_diet, data = dataset)
summary(model_vegan_ref)
Call:
lm(formula = happiness ~ vegetarian + no_diet, data = dataset)

Residuals:
Min 1Q Median 3Q Max
-2.24648 -0.91446 -0.03047 0.73693 2.38554

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.07927 0.12672 47.973 <2e-16 ***
vegetarian -0.06481 0.17867 -0.363 0.717
no_diet 0.16721 0.18602 0.899 0.370
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.148 on 233 degrees of freedom


Multiple R-squared: 0.006975, Adjusted R-squared: -0.001549
F-statistic: 0.8183 on 2 and 233 DF, p-value: 0.4424

# Example with vegetarian as reference:
model_vegetarian_ref <- lm(happiness ~ vegan + no_diet, data = dataset)
summary(model_vegetarian_ref)
Call:
lm(formula = happiness ~ vegan + no_diet, data = dataset)

Residuals:
Min 1Q Median 3Q Max
-2.24648 -0.91446 -0.03047 0.73693 2.38554

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.01446 0.12596 47.750 <2e-16 ***
vegan 0.06481 0.17867 0.363 0.717
no_diet 0.23202 0.18550 1.251 0.212
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.148 on 233 degrees of freedom


Multiple R-squared: 0.006975, Adjusted R-squared: -0.001549
F-statistic: 0.8183 on 2 and 233 DF, p-value: 0.4424

c- Based on your analysis, provide a conclusion regarding the claim.

P-value of the vegan vs. vegetarian coefficient = 0.717 > 0.05 so we can’t reject H0. This means that
there is no significant difference in happiness levels between vegans and vegetarians (reference), so
the data do not support the magazine's claim.
