Chapter 1 Solutions

Uploaded by

ciel33shum9
1. Drug A is a new drug created to treat Congenital Amegakaryocytic Thrombocytopenia
(CAMT). An experiment was done to evaluate survival outcomes of drug A, against the
current standard, Rituximab, for treating CAMT. 150 CAMT patients were assigned
to the drug A group, and 150 CAMT patients were assigned to the Rituximab group.
A portion of the experimental design is summarised in the table below.
          Drug A   Rituximab
Male        82        85
Female      68        65
Total      150       150
Determine if the statement below is true or false:
“The table shows that random assignment was not done, because the two groups (drug
A and Rituximab) do not have the same number of females.”
The statement is false. Random assignment does not guarantee that the number of
females will be exactly the same in both groups (drug A and Rituximab). If random
assignment was done on a large number of subjects, the two groups will tend to be
similar (not necessarily the same) in all aspects.
2. In a study on the relationship between watching television and obesity, 3000 individ-
uals were recruited via an advertisement put up by ABC newspaper in Singapore.
Afterwards, the investigator of the study separated the participants into two groups.
One group consists of people who, on average, watch television for
at least 4 hours a day, while the other group consists of people who, on average, watch
television strictly less than 4 hours a day. The investigator then records how many
participants are obese in each group. Which of the following is/are true?
(I) The result of this study is generalisable to the population who reads ABC news-
paper.
(II) This is an observational study.
(A) Only (I).
(B) Only (II).
(C) Neither (I) nor (II).
(D) Both (I) and (II).
Answer is (B). The result is not generalisable because the investigator uses
non-probability sampling (volunteer sampling). The study is clearly an observational study
because the researchers do not assign the subjects to either group; instead, the grouping
is determined by the subjects' own television-watching habits (self-assignment).
3. In a large scale experiment, a researcher randomly assigned 6000 subjects to receive
either a drug or a placebo. 4000 patients were assigned to receive the drug, and the
other 2000 patients received the placebo. The researcher did a quick headcount in the
drug-receiving group and noted that there were 3002 males who received the drug.
The researcher does not have time to do a headcount in the placebo group. Which is
the most reasonable number of males to be expected in the placebo group?
(A) 1000.
(B) 1500.
(C) 2000.
(D) 2998.

Answer is (B). Randomised assignment of a large number of subjects tends to produce
groups which are similar in all aspects (including the proportion of males in each
group). 6000 is a reasonably large number, so we would expect the proportion of
males and females in each group to be similar. Among the 4000 subjects who received
the drug, 3002 (about 75%) were males. Hence among the 2000 patients who received
the placebo, about 75% (1500) of them should be male.
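
The arithmetic behind this estimate can be sketched in a few lines of Python. This is only an illustration of the reasoning above; the variable names are assumptions introduced here and are not part of the original question.

```python
# Illustrative check of the reasoning for Question 3 (variable names are hypothetical).
drug_group_size = 4000
drug_group_males = 3002
placebo_group_size = 2000
options = [1000, 1500, 2000, 2998]

# Proportion of males observed in the drug group.
male_proportion = drug_group_males / drug_group_size

# With random assignment of many subjects, a similar proportion is expected
# in the placebo group.
expected_placebo_males = male_proportion * placebo_group_size

# Pick the answer option closest to the expected count.
closest_option = min(options, key=lambda o: abs(o - expected_placebo_males))

print(f"Proportion of males in drug group: {male_proportion:.4f}")
print(f"Expected males in placebo group: about {expected_placebo_males:.0f}")
print(f"Closest option: {closest_option}")
```

The expected count need not match an option exactly; the question only asks for the most reasonable choice.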
4. Virus X has been known to cause very severe symptoms in its patients. Previously
there has been no anti-viral medicine to treat virus X. Recently, researchers have
finally managed to produce a trial drug in the form of a tablet. Researchers want
to investigate if the trial drug helps to reduce the duration of symptoms (number of
days) in patients. 1000 patients were sampled for the study, and all consented to join
the study.
Which of the following statements is/are true? Select all that apply.
(A) Random sampling should be done to ensure that the subjects’ demographics/characteristics
are similar (in the treatment and control groups).
(B) Blinding the researchers to the subjects’ assigned groups (treatment or control
group) is important because the researchers may have certain bias for/against
the drug.
(C) If the study randomly assigns 400 subjects into the treatment group, and 600
subjects into the control group, the result of the study will be biased due to the
unequal number in the two groups.
Only (B) is correct. Random assignment (not random sampling) should be done to
ensure that the subjects’ demographics/characteristics are similar in the treatment and
control groups. Furthermore, random assignment does not require the same number
of subjects in both treatment and control groups. As long as the number of subjects
is large, the treatment and control groups will likely have similar
demographics/characteristics. In fact, since we are comparing rates and not numbers, it does not matter
if we have unequal group sizes.
5. A researcher has invited 500 people to participate in his study. He uses random assign-
ment to assign the subjects into the Treatment and Control groups. The Treatment
group has 200 subjects, and the demographics of the 200 subjects are as follows:
         Male   Female
Old       83      18
Young     32      67
The 300 remaining subjects are in the Control group. The researcher should expect
the number of young males in the Control group to be around ____.
(A) 32.
(B) 48.
(C) 51.
(D) 173.
Answer is (B). When random assignment is conducted, we can expect the percentage
of males to be similar in both Treatment and Control groups. We can also expect the
percentage of young people to be similar in both Treatment and Control groups. The
proportion of young males in the Treatment group is 32/200 = 0.16. We can expect a
similar proportion in the Control group, implying that we expect about 0.16 × 300 = 48
young males in the Control group.
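
As a quick check of the calculation above, here is a minimal Python sketch. The variable names are hypothetical and chosen only for this illustration.

```python
# Illustrative check for Question 5 (variable names are hypothetical).
treatment_size = 200
control_size = 300
young_males_treatment = 32

# Scale the Treatment-group count to the Control-group size. Multiplying
# before dividing keeps the arithmetic exact for these numbers.
expected_young_males_control = young_males_treatment * control_size / treatment_size

print(expected_young_males_control)  # 48.0
```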

6. Xiao Lian is a frequent TikTok user, and she wants to compare the effect of watching
TikTok dance videos versus watching maths videos on the happiness levels of all NUS
students.
She video calls 200 of her closest NUS friends to take part in her study, but only 100
of her friends responded. Half of these 100 friends were chosen at random to watch
a TikTok dance video while the other half were shown a maths video, and she asks
them to rate their happiness levels before and after watching the videos.
Which of the following statements is true?
(A) This is an observational study.
(B) This study is likely to contain selection bias.
(C) If all 200 of her friends responded, the study’s findings are generalisable to all
NUS students.
(D) Random assignment was conducted, so the study does not contain selection bias.
Answer is (B). This study is likely to contain both selection bias and non-response
bias. It is likely to have selection bias because Xiao Lian chose from only her friends,
and not the entire NUS population of students. For example, Xiao Lian might have
more friends in the same faculty as her as compared to other faculties, or they might
have the same interests (TikTok) as her. The study is likely to have non-response
bias because 100 of them did not respond. Random assignment is a technique used
to remove the effects of potential confounding variables. Random assignment will not
remove selection bias in a study.
7. A study was conducted to investigate if memory foam pillows helped to improve one’s
sleep quality. A large number of subjects were sampled via simple random sampling.
The study aimed to randomly assign the subjects into the treatment and control
groups. Thus, a fair coin was flipped for each subject to determine if the subject
should be assigned into the treatment or control group – if “heads”, the subject is
assigned into the treatment group, if “tails”, the subject is assigned into the control
group. Subjects in the treatment group received a memory foam pillow, while subjects
in the control group received a regular pillow. Below is the description of the number
of subjects between the two groups.
                Treatment   Control   Total
Sex   Male          q          r        s
      Female        t          u        v
      Total         w          x        y
From the above table, which of the following statements is true?
(A) q/w should be approximately equal to r/x.
(B) q must be equal to r.
(C) s should be approximately equal to v.
(D) w must be equal to x.
Answer is (A) because we expect that if random assignment was done and the coin
is fair, we would see a similar proportion (may not be the exact proportion) of males
between both treatment and control groups. None of the other statements results from
random assignment using a fair coin or from simple random sampling.
8. Which of the following statements is true regarding observational studies and experi-
mental studies?

(A) Observational studies involve manipulating variables to establish cause-and-effect
relationships.
(B) Experimental studies are only used to study rare events or long-term trends.
(C) Observational studies rely on random assignment to groups to control for con-
founding variables.
(D) Observational studies can establish associations.
Answer is (D). We are not able to manipulate variables in observational studies. The
use of experimental studies is not restricted to the study of rare events nor long-term
trends. Random assignment is not feasible in observational studies.
9. Patch Z is a new medicine created to remove muscle soreness. A study was done to
investigate the effectiveness of patch Z. The population of interest was Singaporean
adult males. For this study, the researchers requested the Singapore Sports Association
to sample all male athletes who reported for training over the week. 200 male athletes
were sampled. There was no non-response. The 200 subjects had their identity tags
randomly shuffled in a box. The first 100 tags picked from the box were assigned to
the treatment group - administered patch Z. The remaining 100 were administered a
placebo. 72% of the group that received patch Z had their muscle soreness alleviated,
while 34% of the other group had their soreness alleviated. Which of the following
is/are true?
(I) We are not able to generalise these results to the population of interest.
(II) Random assignment was not conducted.
(A) Only (I).
(B) Only (II).
(C) Neither (I) nor (II).
(D) Both (I) and (II).
Answer is (A). The result cannot be generalised, because of the sampling method
and the fact that the sampling frame does not contain the population of interest.
The method is non-probability sampling, and the sampling frame contains only male
athletes, or those who went for training that week, but the population of interest is all
Singaporean adult males. Take note that random assignment was actually conducted.

10. A researcher is trying to study the happiness level of all current NUS students. Which
of the following is/are (an) example(s) of probability sampling methods?
(I) The researcher gets a list of all current NUS students’ emails from the adminis-
trative office and randomly selects 100 students’ emails. He then emails them a
link to a short e-survey. 50 students replied to his survey.
(II) The researcher invites all final year NUS students in his faculty to visit his lab
for a psychological test to determine their happiness level, with a promise to
compensate them for their time with a $10 voucher. 200 students turned up for
the test.
(A) Only (I).
(B) Only (II).
(C) Neither (I) nor (II).
(D) Both (I) and (II).

Answer is (A). Randomly selecting 100 emails from the list of student emails is a
probability sampling method, even if the response rate is not high. On the other
hand, the process of sending an email to only final year students in the researcher’s
faculty is done out of convenience, and convenience sampling is a non-probability
sampling method.
11. To find out the employment status of fresh graduates from University ABC, a ques-
tionnaire was sent to all of them. 30% of fresh graduates responded to the survey.
The employment rate was calculated from the responses. Which of the following is
likely to cause the calculated rate to differ from the population rate?
(I) Selection bias.
(II) Non-response bias.
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).
Answer is (B). 30% of fresh graduates may not be representative of the whole popu-
lation. It is possible that those who did not respond may have different employment
status from those who responded. Although the response rate is low, it is still a census.
There is no selection bias in a census.
12. Suppose I wish to find the average intelligence quotient (IQ) of all Primary 5 children
studying in local schools in Singapore. I first selected a random sample of 10 schools
out of all local primary schools in Singapore. Then I asked all the Primary 5 children
in these chosen 10 schools to take an IQ test. Finally, I obtained the average value of
all the IQ scores of children who took the test, which was 106. Which of the following
statements is/are correct?
(I) The parameter in this study is the average IQ of all Primary 5 children who took
the IQ test.
(II) Stratified sampling was employed in this study.
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).
Answer is (D). The parameter in this study is the average IQ of all Primary 5 children
studying in local schools in Singapore. 106 is a sample estimate of the actual
parameter. Hence statement (I) is incorrect. In stratified sampling, the population is
divided into groups (strata) and then we randomly obtain a sample from each group.
In cluster sampling, the population is first divided into groups (clusters). Then we
take a random selection of clusters from all clusters, and include all units in the chosen
clusters to comprise our sample. Here, cluster sampling is employed, where each school
is a cluster. Hence statement (II) is also incorrect.
13. An airline would like to find out the quality of service provided by their staff on one
of its flights. The airline posted an interviewer right after the plane landed and told
the interviewer to start her interview when the third passenger came out from the
plane. Thereafter, she would interview the next customer who came out from the
plane each time she had finished interviewing the previous customer. What is the
sampling method employed by the interviewer?
(A) Systematic sampling.
(B) Simple random sampling.
(C) Cluster sampling.
(D) None of the other given options.
Answer is (D). This sampling method is not a probability sampling method since there
is no probability involved in selecting passengers to be interviewed. All the methods
given in the other options are probability sampling methods.
14. In a drug factory, pills were manufactured in 1000 batches, with 20 units per batch,
forming a total of 20000 units. You decide to sample some of these pills to ensure that
the dosage is right. Suppose you randomly sample two batches and then select every
unit in these batches to be in your sample. What sampling method did you use?
(A) Systematic sampling.
(B) Stratified sampling.
(C) Cluster sampling.
(D) Simple random sampling.
Answer is (C). Since every unit from the randomly selected batches (the clusters) is
included in the sample, this is an example of the cluster sampling method.
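
The procedure described above can be sketched in Python as follows. This is a minimal illustration, assuming each pill is labelled by a hypothetical (batch, unit) pair; the fixed seed is there only to make the sketch repeatable.

```python
import random

# Hypothetical labelling: pill (b, u) is unit u of batch b.
NUM_BATCHES = 1000
UNITS_PER_BATCH = 20


def cluster_sample(num_clusters, rng):
    """Randomly pick whole batches, then keep every unit in the chosen batches."""
    chosen_batches = rng.sample(range(NUM_BATCHES), num_clusters)
    return [(b, u) for b in chosen_batches for u in range(UNITS_PER_BATCH)]


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
sample = cluster_sample(2, rng)
batches_in_sample = {b for b, _ in sample}

print(len(sample))             # 2 batches x 20 units = 40
print(len(batches_in_sample))  # 2
```

Randomness enters only when the batches are chosen. Once a batch is selected, all of its units are taken.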
15. Assume you have a sampling frame of your entire population of interest, which
comprises 100 people’s names. Which of the following methods can be used to select a
simple random sample of 10 people from this population?
(I) Assign, without replacement, to each person a random number from 1 to 100 such
that none of them share the same number. Choose the people assigned numbers
1 to 10.
(II) Write the names on equal sized pieces of paper, put the papers in a hat. Shake
the hat, mix the papers well and draw out 10 names.
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).
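Answer is (C). In simple random sampling, every unit in the population of interest
must have the same chance of being selected in the sample. Both methods satisfy this:
in (I) the numbers are assigned at random, and in (II) the well-mixed papers are drawn
at random.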
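
Both methods can be mimicked in Python. The sketch below is an illustration only: the placeholder names and the fixed seed are assumptions, and `random.shuffle` and `random.sample` stand in for the physical randomisation described in the question.

```python
import random

names = [f"Person {i}" for i in range(1, 101)]  # hypothetical sampling frame
rng = random.Random(1)  # fixed seed only so the sketch is repeatable

# Method (I): give each person a distinct random number from 1 to 100,
# then choose the people assigned numbers 1 to 10.
numbers = list(range(1, 101))
rng.shuffle(numbers)
assigned = dict(zip(names, numbers))
sample_one = [name for name, number in assigned.items() if number <= 10]

# Method (II): draw 10 names from a well-mixed hat, without replacement.
sample_two = rng.sample(names, 10)

print(len(sample_one), len(sample_two))  # 10 10
```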
16. Caleb resides in Stamford Hall, which comprises 400 university students. He wishes
to understand the association between Cumulative Average Point (CAP) and sleeping
hours among students residing there. There are a total of 100 rooms, with rooms
numbered 1 to 100, and each having 4 resident students. Rooms 1 to 50 contain male
residents, while rooms 51 to 100 contain female residents. Caleb picks all rooms that
are multiples of 5 (i.e., rooms 5, 10, 15, . . ., to 100), and all students in the selected
rooms were asked to do the study.
Which of the following must be true?
(I) The above is an example of systematic sampling.

(II) The response rate for the study is 100%.
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).
Answer is (D). It is not stated explicitly that the selection of the rooms is done via a
randomised mechanism. Therefore, the sampling process is non-probability in nature.
Also, it is not stated explicitly if all who were asked to do the study will respond.
Therefore, the response rate for the study may not necessarily be 100%.
17. A researcher wants to know the average weight of year 1 students in University A. The
researcher does not have access to such information, hence he decided to do a survey.
All University A year 1 students have to take a compulsory module in the first semester
of their studies, and hence have to be present for an in-person examination on 24th
April at 1pm. The researcher stood outside the examination venue’s only exit with a
weighing scale and waited for the examination to end.
There were too many students for the researcher to weigh. Hence, to decide whom
to weigh, while students were exiting, the researcher used a random integer generator
to produce a random integer for each student. If the random integer was even, the
researcher would measure the student’s weight. If the random integer was odd, the
researcher would not measure the student’s weight. Assume that all students exited the
venue in an orderly line and were compliant with the researcher. There were 800 students
exiting the venue, and the random integer generator produced 200 even numbers
and thus 200 students were weighed.
What is an/are issue(s) that the study is likely to face? Select all that apply.
(A) Bias present due to the non-probability sampling method.
(B) Bias present due to the random integer generator producing unlikely random
numbers.
(C) Bias present due to a poor sampling frame chosen.
(D) None of the other options.
Answer is (D). The study uses a form of probability sampling, where every unit has a
non-zero and known probability of being selected. Also, the random integer generator
is not always expected to produce 50% odd numbers and 50% even numbers, as it
depends on the settings. The population of interest matches the sampling frame for
this situation - the year 1 students.
18. A class of 150 men and 250 women is seated in an examination hall that has 25 rows
of chairs, with 16 chairs in each row. The men are seated in chairs numbered 1 to
150, and the women are seated in chairs numbered 151 to 400. Which of the following
scenarios will NOT produce a simple random sample of students from this class of
400? Select all that apply.
(A) Choose a row at random and select all the students in that row.
(B) Use a random number generator to generate 10 integers from 1 to 400. Select the
students seated in chairs corresponding to these numbers.
(C) Use a random number generator to generate 10 integers from 1 to 150, and 10
integers from 151 to 400. Select the students seated in chairs corresponding to
these numbers.

(D) Randomly select a letter from the English alphabet. Select for the sample stu-
dents whose family names begin with that letter. If no family name begins with
that letter, randomly choose another letter from the alphabet.
(A), (C) and (D) will not produce a simple random sample of students. Recall that
an SRS is a sample such that any sample of size n is equally likely to be chosen. (A)
does not produce an SRS, since not every group of size 16 has the same chance of being
selected. For example, students from different rows have no chance of being selected
together. (B) produces an SRS, since every group of size 10 has the same chance of
being selected. (C) does not produce an SRS, since not every group of size 20 has the
same chance of being selected. For example, a group of 20 women cannot all be in the
sample. This is actually a stratified sample, with sex as the strata. (D) does not produce
an SRS, since any two students with family names starting with different letters cannot
both be in the sample.
19. Which of the following scenarios involve(s) the use of probability in the sampling
process?
Select all that apply.
(A) A student wants to know how receptive bus commuters in Singapore are to the
recent announcement of a price increase. To obtain a sample, he went to a
nearby bus interchange and looked for commuters wearing white clothing (his
favourite colour). For each commuter wearing white, if the commuter was wearing
spectacles, he approached the commuter. Otherwise, he did not approach the
commuter. We may assume that every commuter that he approached completed
the survey.
(B) An event organiser wants to survey a sample of the participants of his event to find
out if they liked the activities he planned. He announced to all the participants
that anyone who completed the survey would win a prize with a probability of
0.8. The sample was made up of 75 participants who responded to the survey.
(C) The principal of a primary school wants to select a sample of his primary one
students to find out how they are coping with formal school education. There are
10 primary one classes and each class has 42 students. The principal went into
each class and rolled a six-sided die. If the die showed k (k = 1, 2, 3, 4, 5 or 6), then
students in the class with register numbers k, k + 6, k + 12, k + 18, k + 24, k + 30
and k + 36 would be included in the sample. A final sample of 70 students, 7
from each class, was formed.
(D) A researcher is trying to study how much time students from the Faculty of
Science sleep every day. He obtained a list of all Science students from the ad-
ministration and for each student on the list, he generated a random number from
1 to 10. If the number generated was less than 7, the student was not selected.
If the number was 7 or more, the student was selected, and the researcher sent
an email to the student with a few survey questions. A total of 700 students
received the survey email and 500 of them replied.
(C) and (D) are correct as both employ a randomised mechanism in the selection
process. (B) is non-probability sampling (volunteer sampling) while (A) is also
non-probability sampling as the student conveniently went to a nearby bus interchange
(convenience sampling).
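
To make the randomised mechanism in option (C) concrete, the principal's procedure can be sketched in Python. This is an illustration only: the function name is hypothetical, and `randint(1, 6)` stands in for the roll of the six-sided die.

```python
import random

NUM_CLASSES = 10
CLASS_SIZE = 42
STEP = 6


def sample_one_class(rng):
    """Roll a die to get k, then take register numbers k, k + 6, ..., k + 36."""
    k = rng.randint(1, 6)  # stands in for the six-sided die
    return list(range(k, CLASS_SIZE + 1, STEP))


rng = random.Random(2)  # fixed seed only so the sketch is repeatable
samples = [sample_one_class(rng) for _ in range(NUM_CLASSES)]
total_selected = sum(len(s) for s in samples)

print(total_selected)  # 7 students from each of 10 classes = 70
```

Whatever value the die shows, exactly 7 register numbers fall between 1 and 42. The sample size is therefore fixed even though the selected students are determined by chance.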
20. The Registry of Marriages is interested to see the relationship between the ages of
husbands and wives in City X. They randomly sampled 1000 pairs of husbands and
wives from the population of City X and obtained data of their ages (in years). Looking
through the data, they found that men always marry women who are younger than
them.
Based only on the information given above, which of the following statements must
be true?
(I) The average age of the husbands is more than the average age of the wives.
(II) The standard deviation of husband’s age is more than the standard deviation of
wife’s age.
(A) Only (I).
(B) Only (II).
(C) Neither (I) nor (II).
(D) Both (I) and (II).
Answer is (A). We cannot tell anything about the spread of either the husbands’ ages
or the wives’ ages just by knowing that men are marrying women younger than them.
If all husbands are older than their wives, then it follows that the average age of the
husbands is going to be more than the average age of the wives.
21. A teacher has just finished marking the final examination scripts for her class of 50
Secondary 1 students. She informs the students that the class average is 67.3. The
maximum mark for the examination is 100 and the passing mark is 50. A student
receives his examination script and realises his score is 65 which is lower than the
average score. Based only on the information given above, which of the following
statements must be true?
(I) The student has performed worse than half the class.
(II) Everyone in the class has passed the test.
(A) Only (I).
(B) Only (II).
(C) Neither (I) nor (II).
(D) Both (I) and (II).
Answer is (C). A score lower than the mean does not imply that the student has
performed worse than half the class. Consider the following set of scores for 10 students
45, 55, 60, 62, 64, 64, 65, 85, 86, 87.
The average is 67.3. The student who has scored 65 clearly has not performed worse
than half the class. Neither has everyone in the class passed the test since one student
has scored 45. A similar data set can also be constructed for 50 students. Therefore,
neither statement is true.
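
The counterexample above can be verified with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen only for this illustration.

```python
from statistics import mean, median

# The 10 scores from the counterexample above.
scores = [45, 55, 60, 62, 64, 64, 65, 85, 86, 87]
student_score = 65
passing_mark = 50

class_mean = mean(scores)
class_median = median(scores)
num_below_student = sum(score < student_score for score in scores)
anyone_failed = any(score < passing_mark for score in scores)

print(class_mean)         # 67.3
print(class_median)       # 64.0
print(num_below_student)  # 6 of the 10 students scored below 65
print(anyone_failed)      # True
```

The student's score of 65 is below the mean but above the median, which is why statement (I) need not hold.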
22. We have learnt that the standard deviation and interquartile range (IQR) are examples
of summary statistics that help us to quantify the spread of data points. However, they
are not the only ways of quantifying spread and there are other summary statistics
that can also help us to do this. For a numerical variable x, we can define the Mean
Absolute Deviation (commonly abbreviated as MAD) using the formula
\[
\text{Mean Absolute Deviation of } x = \frac{|x_1 - \bar{x}| + |x_2 - \bar{x}| + \cdots + |x_n - \bar{x}|}{n},
\]
where $x_1, x_2, \ldots, x_n$ are values for the variable in a data set, $\bar{x}$ is their mean, and n is the number of
data points in the data set. The MAD is sometimes used in place of the standard
deviation as a measure of quantifying the spread of the data. Based on the above
formula, which properties must the MAD possess? Select all that apply.

(A) The MAD cannot take a negative value.
(B) The MAD does not change when a constant is added to all the data points.
(C) The MAD does not change when a constant is multiplied to all the data points.
(D) If the MAD is zero, then all the values of $x_1, x_2, \ldots, x_n$ in the data set are the
same.
(A), (B) and (D) are correct. Based on the formula above, the MAD behaves very
similarly to the standard deviation. Since we are taking absolute values, the MAD
can never be negative.
If a constant is added to all the data points, then the same constant is also added to the
mean of the new data. Therefore, the absolute difference between each point and the
mean remains the same.
When we multiply all the data points by a constant, the mean is also multiplied by
the same constant, so the difference between each point and the mean is also
multiplied by the same constant. Hence, the MAD is multiplied by the absolute value of
the constant. The constant can be a number other than 1 or −1, so the MAD can change.
Finally, if the MAD of x is zero, it means
\[
\frac{|x_1 - \bar{x}| + |x_2 - \bar{x}| + \cdots + |x_n - \bar{x}|}{n} = 0,
\]
which means
\[
|x_1 - \bar{x}| + |x_2 - \bar{x}| + \cdots + |x_n - \bar{x}| = 0.
\]
Since the absolute value of any number cannot be negative, we are adding numbers
which cannot be negative to give us 0. Therefore, each number can only be zero which
means every data point is equal to the mean.
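
The four properties can be checked numerically with a short Python sketch. The function name and the small data set are assumptions made for this illustration, and a numerical check on one data set is not a proof.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(v - mean) for v in values) / n


data = [1, 2, 3, 4, 5]  # hypothetical data set

base = mean_absolute_deviation(data)
shifted = mean_absolute_deviation([v + 10 for v in data])    # add a constant
scaled = mean_absolute_deviation([3 * v for v in data])      # multiply by 3
negated = mean_absolute_deviation([-2 * v for v in data])    # multiply by -2
all_equal = mean_absolute_deviation([7, 7, 7])               # identical values

print(base, shifted, scaled, negated, all_equal)
```

Here `shifted` agrees with `base`, `scaled` is 3 times `base`, and `negated` is 2 times `base`. The MAD of identical values is 0.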
23. Suppose X is a numerical variable and the following are 10 data points for this variable.

4, 7, 4, 14, 10, 11, 17, 3, 8, r,

where r is a positive whole number that is unknown. Which of the following statements
is/are always correct? Select all that apply.
(A) If the mean is greater than 8 then r must be greater than 2.
(B) If r is greater than 2, then the median must be greater than 8.
(C) The mean is always greater than the median regardless of the value of r.
(D) The mode is always greater than the median regardless of the value of r.
Only (A) is correct. The sum of the 10 data points is

4 + 7 + 4 + 14 + 10 + 11 + 17 + 3 + 8 + r = 78 + r.

If the mean is greater than 8, then 78 + r must be greater than 80, which implies r
must be greater than 2. Arranging the 9 numbers excluding r in increasing order, we
have
3, 4, 4, 7, 8, 10, 11, 14, 17.
Recall that the median is the middle value of the ordered data; with 10 data points, it is the average of the 5th and 6th values.
For example, if r = 3, the median is 7.5 thus (B) is incorrect. From above, since r is
at least 1, the mean is at least 7.9. But if r = 10 for example, then the median is 9
which is higher than the mean. So (C) is incorrect. (D) is also incorrect since if r = 4,
the mode is 4 while the median is 7.5.
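
The counterexamples above can be checked with Python's standard `statistics` module. This is a minimal sketch; the helper name `summarise` is hypothetical, and the check of (A) covers only a finite range of r.

```python
from statistics import mean, median, mode

known_values = [4, 7, 4, 14, 10, 11, 17, 3, 8]  # the nine known data points


def summarise(r):
    """Return (mean, median, mode) of the ten data points for a given r."""
    data = known_values + [r]
    return mean(data), median(data), mode(data)


# (B) fails: r = 3 is greater than 2, but the median is 7.5.
mean_r3, median_r3, _ = summarise(3)

# (C) fails: with r = 10 the median exceeds the mean.
mean_r10, median_r10, _ = summarise(10)

# (D) fails: with r = 4 the mode is 4 but the median is 7.5.
_, median_r4, mode_r4 = summarise(4)

# (A) holds on this range: the mean exceeds 8 exactly when r exceeds 2.
statement_a_holds = all((summarise(r)[0] > 8) == (r > 2) for r in range(1, 101))

print(median_r3, mean_r10, median_r10, mode_r4, median_r4, statement_a_holds)
```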

24. Consider a data set consisting of values for a numerical variable x. Let the values
be $x_1, x_2, \ldots, x_n$, arranged in ascending order. A value y is said to be the balancing
point of x in the data set if the following condition is satisfied:

\[
(y - x_1) + (y - x_2) + \cdots + (y - x_k) = (x_{k+1} - y) + (x_{k+2} - y) + \cdots + (x_n - y),
\]

where $x_1, x_2, \ldots, x_k$ are the values of x in the data set that are smaller than or
equal to y and $x_{k+1}, x_{k+2}, \ldots, x_n$ are the values of x in the data set that are larger
than y. For example, consider a small data set {1, 3, 5, 5, 5, 7, 9}. In this case the value
5 is the balancing point of the data set since

(5 − 5) + (5 − 5) + (5 − 5) + (5 − 3) + (5 − 1) = (7 − 5) + (9 − 5).

Which of the two statements below is/are true?


(I) The median of x is always the balancing point of x in any data set.
(II) The mode of x is always the balancing point of x in any data set.
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).
Answer is (D). Neither median nor mode is always the balancing point of a data set.
Consider a small data set {1, 1, 3, 4, 5}. The median of the data set is 3. However, we
observe that (3 − 1) + (3 − 1) is not the same as (4 − 3) + (5 − 3). Similarly, the mode is
1 but once again we see that (1−1)+(1−1) is not the same as (3−1)+(4−1)+(5−1).
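
The balancing-point condition is easy to test directly. The sketch below uses a hypothetical helper, `is_balancing_point`, to apply the definition to the two data sets discussed above.

```python
def is_balancing_point(y, values):
    """Check the balancing condition from the question for a candidate value y."""
    left = sum(y - x for x in values if x <= y)   # values smaller than or equal to y
    right = sum(x - y for x in values if x > y)   # values larger than y
    return left == right


example = [1, 3, 5, 5, 5, 7, 9]
counterexample = [1, 1, 3, 4, 5]

print(is_balancing_point(5, example))         # True: 6 on each side
print(is_balancing_point(3, counterexample))  # False: median 3 gives 4 versus 3
print(is_balancing_point(1, counterexample))  # False: mode 1 gives 0 versus 9
```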

25. City planners wanted to know how many people lived in a typical housing unit so they
compiled data from hundreds of forms that had been submitted in various city offices.
Summary statistics are shown in the table below.
Mean   Standard Deviation   Min   Q1   Median   Q3   Max
2.53          1.4            1     1     2       3     8
The city bases its garbage disposal fee on the occupancy level of the home or
apartment. The annual fee is $50 plus $4 per person, so a single-occupant home pays $54
and a home with 10 people pays $50 + $4 × 10 = $90 a year.
The median fee paid is (1) ____ and the IQR of the fee paid is (2) ____.
Fill in the blanks for the statement above, giving your answers correct to 2 decimal
places.
The answer to the first blank is 50 + 2 × 4 = 58 since the median number of occupants
is 2. The IQR of the fee paid is (50 + 3 × 4) − (50 + 1 × 4) = 8.
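
Because the fee increases with the number of occupants, the quartiles of the fee are simply the fees at the quartiles of occupancy. A minimal Python sketch of the calculation, with a hypothetical function name:

```python
def annual_fee(occupants):
    """Annual fee in dollars: $50 plus $4 per person."""
    return 50 + 4 * occupants


# Summary statistics for occupancy, taken from the table above.
q1_occupants, median_occupants, q3_occupants = 1, 2, 3

median_fee = annual_fee(median_occupants)
fee_iqr = annual_fee(q3_occupants) - annual_fee(q1_occupants)

print(f"{median_fee:.2f}")  # 58.00
print(f"{fee_iqr:.2f}")     # 8.00
```

The constant $50 cancels in the IQR, leaving 4 × (3 − 1) = 8.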
26. An examination was given to Class A and Class B, which consisted of 20 students each.
The minimum possible score for the examination is 0 and the maximum possible score
is 100. The range of scores in Class A is from 75 to 95 marks. All the students in
Class B scored less than or equal to 50 marks. Due to shortage of teachers, Class A
and Class B were combined to form Class C.
Consider the following statements:
(I) The median score in Class C must be greater than the median score in Class B.

(II) The interquartile range of scores in Class C must be greater than the interquartile
range of scores in Class A.
Which of the statements is/are true?
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).
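Answer is (C). Since Class A and Class B have the same number of students, and all
students in Class A scored strictly higher than the maximum score of Class B, the
lower 20 scores in Class C are exactly the Class B scores and the upper 20 are the
Class A scores. The median for Class C lies between the maximum score of Class B
and the minimum score of Class A, so it is greater than the maximum score of Class B.
Hence the median of Class C is greater than the median of Class B. The interquartile
range cannot exceed the range, so the largest possible interquartile range for Class A
is 95 − 75 = 20. In Class C, the 1st quartile is at most 50 and the 3rd quartile is at
least 75, so the smallest possible interquartile range for Class C is 75 − 50 = 25.
Hence the interquartile range for Class C must be greater than the interquartile
range for Class A.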
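As an illustration of the argument, here is a minimal Python sketch with hypothetical scores chosen so that Class A's interquartile range is as large as possible. The scores and the `quartiles` helper are assumptions, not part of the original question. The helper takes Q1 and Q3 as the medians of the lower and upper halves, which is one of several quartile conventions.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3), with Q1 and Q3 taken as the medians of the
    lower and upper halves (the median is excluded when n is odd)."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return median(lower), median(s), median(upper)

# Hypothetical scores (not from the question)
class_a = [75] * 10 + [95] * 10   # range from 75 to 95
class_b = [50] * 20               # every score is at most 50
class_c = class_a + class_b

q1_a, med_a, q3_a = quartiles(class_a)
q1_b, med_b, q3_b = quartiles(class_b)
q1_c, med_c, q3_c = quartiles(class_c)

print("IQR of Class A:", q3_a - q1_a)
print("Median of Class B:", med_b, "| Median of Class C:", med_c)
print("IQR of Class C:", q3_c - q1_c)
```

Even in this extreme case, Class A's interquartile range is 20 while Class C's is 35, and the median of Class C (62.5) exceeds the median of Class B (50).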
27. Which of the following statements is true?
(A) The 3rd quartile is a measure of central tendency.
(B) The 2nd quartile is a measure of central tendency.
(C) The 3rd quartile is a measure of dispersion.
(D) The 2nd quartile is a measure of dispersion.
Answer is (B). The 2nd quartile is the median, which is a measure of central tendency.
The 3rd quartile is neither a measure of central tendency nor a measure of dispersion.

28. A multiple-choice mid-term examination was conducted for 2000 students in a General
Education Module GEB1000. There were 20 questions. Students were awarded 1 mark
for each correct answer and 0 marks for any wrong answer. No partial credit was
awarded for any question. A teaching assistant helped with the collation
of the scores of the paper, and provided the following summary statistics:
• Minimum = 2.0.
• 1st quartile = 7.5.
• Median = 11.5.
• Mean = 9.0.
• Mode = 12.0.
• 3rd quartile = 13.2.
• Maximum = 20.0.
Which of the following statements is/are true? Select all that apply.
(A) The 3rd quartile is incorrect.
(B) Based on the above information, we can conclude that the range is 18.0.
(C) Based on the above information, we can conclude that the coefficient of variation
is 2.
(D) None of the other statements is true.

(A) and (B) are correct. (A) is correct: every score is a whole number between 0 and
20 inclusive because no partial credit is given, so the 3rd quartile cannot be 13.2.
Note, however, that the 1st quartile, median or 3rd quartile can be of the form
x + 0.5, where x is an integer. For example, consider the values 1, 2, 3, 4, 5, 7, 12,
13, 15. To derive the 3rd quartile, we look at the values above the median, that is,
7, 12, 13 and 15. The median of these four values is the average of 12 and 13, which
is 12.5. (B) is correct as the range is the difference between the maximum and
minimum values, which is 20.0 − 2.0 = 18.0. (C) is incorrect. The coefficient of
variation (CV) is the standard deviation divided by the mean. Since we have no
information about the standard deviation, we cannot determine the value of the CV.
(D) is incorrect because (A) and (B) are true.
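The quartile example in this solution can be reproduced with a short Python sketch. The `quartiles` and `is_multiple_of_half` helpers are hypothetical names introduced here. The sketch assumes the method described above: the 3rd quartile is the median of the values above the median.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3), with Q1 and Q3 taken as the medians of the
    lower and upper halves (the median is excluded when n is odd)."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return median(lower), median(s), median(upper)

def is_multiple_of_half(x):
    """True if x is an integer or an integer plus 0.5."""
    return 2 * x == int(2 * x)

values = [1, 2, 3, 4, 5, 7, 12, 13, 15]
q1, q2, q3 = quartiles(values)
print(q1, q2, q3)

# With whole-number scores, each quartile under this method is either a
# data value or the average of two data values.
print(is_multiple_of_half(q3))    # 12.5 is attainable
print(is_multiple_of_half(13.2))  # 13.2 is not
```

Under this method a reported 3rd quartile of 13.2 cannot arise from whole-number scores.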
29. There are 20 students in each of the groups: A and B. All the students in both groups
took a test. The minimum possible score is 0 and the maximum attainable score is
50. The range of scores obtained from the students in group A was 28 to 40. All
the students in group B scored at least 42 marks. Let M be the average value of the
combined scores of all the 40 students. Which of the following statements must be
true?
(A) M is equal to the average score in group B.
(B) M is higher than the average score in group A.
(C) M is lower than the average score in group A.
(D) There is insufficient information to deduce the relationship between M and the
average score in group A or group B.
Answer is (B). Since group A and group B have the same number of students, and
all the students in group A scored strictly lower than the lowest score in group B, it
implies that the average for group A is strictly lower than that for group B. Therefore,
when the scores of both groups are combined, the overall average score will be the
‘sum of the average score in group A and the average score in group B’ divided by
2, which will be strictly greater than the average score in group A and strictly lower
than the average score in group B.
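A minimal Python sketch of this argument follows, using hypothetical scores that satisfy the conditions in the question. The score lists are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical scores (not from the question): group A ranges from 28 to 40,
# and every student in group B scores at least 42 (maximum possible is 50).
group_a = [28, 40] + [34] * 18
group_b = [42] * 10 + [50] * 10

mean_a = mean(group_a)
mean_b = mean(group_b)
m = mean(group_a + group_b)   # M, the average of all 40 scores

print("Average of A:", mean_a)
print("Average of B:", mean_b)
print("Combined average M:", m)

# Equal group sizes mean M is the simple average of the two group means.
print(m == (mean_a + mean_b) / 2)
```

With these scores the group averages are 34 and 46, and M is 40, which lies strictly between them.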
30. A study was done to determine the relationship between perceived obstetric violence
and the risk of postpartum depression (PPD). A total of 782 women were asked to
report on their baseline characteristics including the following: Age, Education level
(Secondary School / Pre-university / University), Family monthly wage (Euros), and
Nationality (Spanish / Portuguese / French / Others). Which of the following state-
ments is true about the types of variables of Age, Education level, Family monthly
wage and Nationality, respectively?
(A) Numerical, Categorical ordinal, Numerical, Categorical ordinal
(B) Numerical, Categorical ordinal, Numerical, Categorical nominal
(C) Numerical, Categorical nominal, Numerical, Categorical ordinal
(D) Categorical ordinal, Categorical nominal, Categorical ordinal, Categorical nominal
Answer is (B). Education level and Nationality are categorical variables. There is some
natural ordering for Education level but not for Nationality. Therefore, Education level
is categorical ordinal and Nationality is categorical nominal. Age and Family monthly
wage are numerical variables.
