Chapter 3 Solutions

Uploaded by

26YiJie


1. Outliers are observations that fall well above or below the overall bulk of the data.

Consider a set of 50 (univariate) data points with a single outlier. Suppose the
outlier is removed from the data set. Which of the following is/are always true?
Select all that apply.
(A) The removal will cause the mean to decrease.
(B) The removal will cause the interquartile range to decrease.
(C) The removal will cause the standard deviation to decrease.
(D) The removal will cause the range to change.
(C) and (D) are true. (A) is not always true: the mean will increase if the outlier
falls below the bulk of the data. (B) is not always true either. The interquartile
range depends on the values of Q1 and Q3. Suppose the outlier falls above the bulk
of the data. When it is removed, Q1 and Q3 can each either remain the same or become
smaller. Depending on which happens, and on the magnitude of the changes in Q1 and
Q3, the interquartile range can increase, remain the same or decrease. The same
argument applies if the outlier falls below the bulk of the data. For example,
the IQR decreases when 60 (the outlier) is removed from 4, 6, 6, 7, 11, 60;
the IQR increases when 60 (the outlier) is removed from 1, 5, 6, 7, 7, 60;
the IQR remains unchanged when 60 (the outlier) is removed from 2, 2, 2, 4, 4, 60.
The standard deviation will decrease if the outlier is removed. The range is the
difference between the maximum and minimum values, and the single outlier is either
the maximum or the minimum, so the removal will cause a decrease in the range.
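As a rough check of the three IQR examples, the Python sketch below recomputes the IQR before and after removing 60. It assumes Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd); other quartile conventions can give different numbers. The helper names are illustrative and not part of the original solutions.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumption: when the number of values is odd, the middle value
    is excluded from both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])


def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1


examples = {
    "decreases": [4, 6, 6, 7, 11, 60],
    "increases": [1, 5, 6, 7, 7, 60],
    "unchanged": [2, 2, 2, 4, 4, 60],
}

for label, data in examples.items():
    trimmed = [v for v in data if v != 60]  # drop the outlier
    print(label, iqr(data), "->", iqr(trimmed))
```

Under this quartile convention the IQR goes from 5 to 4, from 2 to 4, and from 2 to 2 respectively.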
2. The GEA1000 midterm results for the year 2050 Semester 1 are shown in the boxplot
below. There were 50 students who took the test, and the test scores are out of 100.
No outliers were removed.

Which of the following can be derived from the boxplot? Select all that apply.
(A) There is at least one outlier.
(B) The range is 40.
(C) The interquartile range is 40.
(D) The standard deviation is 14.
Only (A) is correct. There is one outlier shown at 100. The range is the difference
between the maximum and minimum, which is 50. The interquartile range is the
difference between the 3rd and 1st quartiles, which is 14. The standard deviation
cannot be derived from the boxplot.
3. Suppose that there are 76 pairs of siblings living in a particular block in Ang Sua,
where the older sibling is always heavier than the younger sibling. Consider a scatter
plot using the younger sibling’s weight to predict the older sibling’s weight, where each
point in the scatter plot represents the weights of a pair of two siblings in the block.
Which of the following statements must be true?
(I) There is a positive association between the older and younger siblings’ weights.
(II) All the points lie above the line y = x in the scatter plot.
(A) Only (I).
(B) Only (II).
(C) Neither (I) nor (II).
(D) Both (I) and (II).
Answer is (B). Since each older sibling is heavier than each younger sibling, for each
point (x, y), we must have y > x. Thus, all the points will lie above the line y = x in
the scatter plot. With only this condition that all the points lie above the line y = x,
it is not possible to determine the direction of association between the two variables.
For example, y can be negatively associated with x as shown in the plot below.

4. Consider data sets A, B and C, each consisting of 10,000 numbers with mean 5. The
histograms for A, B and C are shown below.

Order the data sets according to the values of their standard deviations, from the
smallest to the largest.
(A) A, B, C.
(B) A, C, B.
(C) B, A, C.
(D) B, C, A.
Answer is (C). Note that B has a smaller standard deviation compared to A, since more
of the values are closer to the mean compared to A. C has a larger standard deviation
compared to A, since more of the values are further from the mean compared to A.
Thus the required order is B, A, C.
5. The five-number summary for a numerical variable X with 77 values is given as
57, 68, 70, 72, 77. Define Y = 10 − 2X. What is the IQR of Y ?
(A) −8.
(B) −2.
(C) 4.
(D) 8.
Answer is (D). The IQR of X is Q3 − Q1 = 72 − 68 = 4. Transforming X into Y
reverses the order of the data points, scales them by a factor of 2, and finally
translates them by 10. This means that, for example, the maximum value of X is
mapped to the minimum value of Y, Q3 of X becomes Q1 of Y, and Q1 of X becomes
Q3 of Y. Thus the five-number summary of Y = 10 − 2X is
−144, −134, −130, −126, −104 and the IQR of Y is −126 − (−134) = 8.
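The transformation can be checked with a few lines of Python. This is only a sketch: it applies Y = 10 − 2X directly to the five summary values of X and re-sorts them, which is valid here because the map is linear and decreasing. The variable names are illustrative.

```python
# Five-number summary of X: min, Q1, median, Q3, max
x_summary = [57, 68, 70, 72, 77]


def transform(x):
    """Y = 10 - 2X, the transformation given in the question."""
    return 10 - 2 * x


# A decreasing linear map reverses the order, so sort again for Y.
y_summary = sorted(transform(x) for x in x_summary)
y_iqr = y_summary[3] - y_summary[1]

print(y_summary)  # [-144, -134, -130, -126, -104]
print(y_iqr)      # 8
```

Note that the IQR of Y equals |−2| times the IQR of X, which is why a negative option such as −8 cannot be correct.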
6. The boxplot below shows the distribution of the marks of 30 students.

Which of the following statements must be true?
(I) There is only one student who scored higher than 23.5 marks.
(II) The range of the marks of the 30 students is 17.5.
(A) Only (I).
(B) Only (II).
(C) Both (I) and (II).
(D) Neither (I) nor (II).
Answer is (D). As there can be more than one student who scored 40, statement
(I) may not be true. Statement (II) is not true since the range is 40 − 6 = 34, not
17.5 (= 23.5 − 6).
7. Professor X conducted a test for his class of 16 students, and tabulated the following
five-number summary for the test scores:
Minimum    Q1       Median    Q3       Maximum
41.20      45.00    50.75     54.12    58.90
Two days later, he discovered, to his horror, that he had made a mistake in the
computation of the test scores, and everyone should get 10 marks more.
The new (and correct) median score is __(1)__ and the IQR is __(2)__.
Fill in the blanks for the statement above, giving your answers correct to 2 decimal
places.
Since all the scores are shifted by the same amount of 10, the new median score
is shifted by 10 as well to 50.75 + 10 = 60.75. The IQR remained unchanged at
54.12 − 45.00 = 9.12, as Q1 and Q3 are shifted by the same amount.
8. Consider the following data set, which we will refer to as set A:

{15, 23, 13, 17, 8, 42, 4, 37, 12, 16}.

A student decided to do a check for outliers, after which such value(s) was/were
removed. Let us designate the set of remaining data points as set B. Which of the
following statements is/are true? Select all that apply.
(A) The range of B is 19.
(B) The median of B is lower than the median of A.
(C) The median of B is greater than the mean of B.
(D) The median of B is lower than the mean of A.
Only (B) and (D) are true. Rearranging set A in ascending order, we get

{4, 8, 12, 13, 15, 16, 17, 23, 37, 42}.

As such, Q3 is 23, Q1 is 12, and the IQR is 11 (which makes 1.5 × IQR = 16.5). Only the
value 42 qualifies as an outlier, since 42 > 23 + 16.5 = 39.5, while no value falls
below 12 − 16.5 = −4.5. This gives us the set B as

{4, 8, 12, 13, 15, 16, 17, 23, 37}.

Hence the range of B is 37 − 4 = 33 and thus (A) is incorrect. The median of A is
15.5 while the median of B is 15; hence (B) is correct. The mean of A is 18.7 while
the mean of B is 16.11 (to 2 d.p.); hence (C) is incorrect and (D) is correct.
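The outlier check and the summary statistics for sets A and B can be reproduced with the following Python sketch. It assumes the same quartile convention as the working above (Q1 and Q3 as the medians of the lower and upper halves of the sorted data); the function and variable names are illustrative.

```python
from statistics import mean, median


def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])


set_a = [15, 23, 13, 17, 8, 42, 4, 37, 12, 16]

q1, q3 = quartiles(set_a)
fence = 1.5 * (q3 - q1)

# Keep only the values inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
set_b = [v for v in set_a if q1 - fence <= v <= q3 + fence]

print("Q1, Q3:", q1, q3)                      # 12 23
print("range of B:", max(set_b) - min(set_b))  # 33
print("medians:", median(set_a), median(set_b))
print("means:", round(mean(set_a), 2), round(mean(set_b), 2))
```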
9. The following histogram is constructed using 100 observations of a discrete numerical
variable X. For the first bin, [0, 1], both 0 and 1 are included in the bin. For every
other bin, the left endpoint is excluded while the right endpoint is included.

Based on the histogram, which of the following statements is/are definitely true?
Select all that apply.
(A) The distribution is right-skewed.
(B) The maximum value is 8.
(C) The value that occurs the most often in this data set is in the bin (1, 2].
(D) Only a quarter of the observations is larger than 3.
(A) and (D) are correct. The distribution is right-skewed, hence (A) is true. (B) is
false as the maximum value could be any value between 7 and 8 (excluding 7). (C) is
false. Although the histogram shows that the bin (1, 2] has the highest frequency,
it does not mean that the most frequent value in the data set must be in the bin
(1, 2]. As an example, there could be 16 values at 1.3, 16 values at 1.4 and 24 values
at 2.5, making 2.5 the most frequent value, which is not in (1, 2]. (D) is true
as there are exactly 25 (14 + 8 + 2 + 1 = 25) values greater than 3, out of 100 observations.
10. The following two diagrams are adapted from a paper published in Nature titled
“Irregular sleep/wake patterns are associated with poorer academic performance and
delayed circadian and sleep/wake timing”, which studied a group of students. The
diagrams describe the association between the variables Grade Point Average (GPA),
Sleep Regularity Index (SRI) and Actual Dim Light Melatonin Onset (Actual DLMO).
Actual DLMO was not recorded for participants who have neither regular nor irregular
sleep/wake patterns.

Based only on the two diagrams above, which of the following is necessarily true?
(A) If the researchers collected information about the average household income
amongst the participants and found a positive association between average household
income and Grade Point Average, then they may conclude that average
household income and SRI are positively associated amongst the participants.
(B) The predicted Actual DLMO value for a student who has neither regular nor
irregular sleep/wake patterns is less than 24.
(C) A higher SRI value is associated with a lower Actual DLMO value for students
who have regular sleep/wake patterns.
(D) Given a student’s GPA, the researchers should use the equation of the regression
line of GPA against SRI to predict SRI.
Answer is (C). (A) may not be true since associations are not necessarily transitive.
Since Actual DLMO was not recorded for participants who have neither regular nor
irregular sleep/wake patterns, (B) is incorrect. (C) is correct since we can see from
the 2nd diagram that the best fit line for that group will have a negative slope. (D)
is incorrect because we would need the regression line of SRI against Grade Point
Average to predict SRI.
11. You’ve been helping a friend generate some nice-looking figures in Radiant.
Unfortunately, you lost track of which data sets were being used for which histograms and
boxplots. You don’t want to make them all over again (though it would be trivial if
you saved your script). Which boxplot (1-4) goes with which histogram (A-D)?

(A) 1C, 2A, 3D, 4B

(B) 1C, 2B, 3D, 4A
(C) 1D, 2A, 3C, 4B
(D) 1B, 2C, 3D, 4A
Answer is (B).
Based on the boxplot, distribution 1 clearly has the highest median, equal to about
the 3rd quartile for the other three. This corresponds pretty clearly to histogram C.
This leaves only answers (A) and (B). Histograms A and B show roughly symmetric
distributions, but with a wider IQR for B. Among the two remaining boxplots, 2 has
the wider IQR compared to 4 so that the remaining pairs must be 2B and 4A.
12. Suppose that the following are 10 data points for a numerical variable X:

16, 82, 72, 100, r, 22, 83, 62, −2, 99,

where r is an unknown whole number less than 72. It is known that there is only one
outlier in this data set. An outlier is defined as a data point having a value greater
than Q3 + 1.5*IQR or less than Q1 – 1.5*IQR. What is the maximum possible value
of r?
Answer is -85.
Since r is less than 72, the largest 5 numbers in the data set can be arranged as 72,
82, 83, 99, 100. Q3 is therefore 83. Excluding r, the other 4 numbers are arranged as
-2, 16, 22 and 62. The Q1 value will change depending on the value of r. If r is less
than 16, the Q1 will be 16. If r is greater than 22, the Q1 will be 22. If r is between
16 and 22, then r will be the Q1.
Case 1: Suppose r is less than 16. IQR = 83 – 16 = 67. Values greater than 83 +
1.5*67 = 183.5 or less than 16 – 1.5*67 = -84.5 will be considered as outliers. Since
the greatest value is 100 which is not an outlier, r has to be less than -84.5 for the
data set to have an outlier. Therefore, the maximum possible value of r is -85 in this
case.
Case 2: Suppose r is greater than 22 but less than 72. IQR = 83 – 22 = 61. Values
greater than 83 + 1.5*61 = 174.5 or less than 22 – 1.5*61 = -69.5 will be considered
as outliers. However, this would mean there is no outlier in the data set, since the
minimum value is -2 and the maximum value is 100. Hence this case is not possible.
Case 3: Suppose r is between 16 and 22 (inclusive). Note that Q3 + 1.5*IQR will
not be less than the value in case 2, i.e. 174.5, since the IQR is longer or of the same
length. On the other hand, the Q1 - 1.5*IQR will not be greater than the value in
case 2, i.e. -69.5, because of the same reason. This also implies that there is no outlier
in the data set, since the minimum value is -2 and the maximum value is 100. Hence
this case is also not possible.
Therefore, r has to be less than 16, and the maximum value of r is -85.
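The case analysis can be cross-checked by brute force. The Python sketch below tries every whole number r in an arbitrarily chosen search window (−500 to 71, an assumption made only for illustration) and keeps those for which the data set has exactly one outlier. It assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data, as in the working above; the helper names are illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])


def count_outliers(values):
    """Count values beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR."""
    q1, q3 = quartiles(values)
    fence = 1.5 * (q3 - q1)
    return sum(1 for v in values if v < q1 - fence or v > q3 + fence)


known = [16, 82, 72, 100, 22, 83, 62, -2, 99]

# r must be a whole number less than 72; the lower bound of -500
# is an arbitrary search limit, not part of the question.
valid_r = [r for r in range(-500, 72) if count_outliers(known + [r]) == 1]

print(max(valid_r))  # -85
```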
13. 3000 students took a multiple-choice quiz in school. The quiz consisted of 10 questions.
Each student answered all 10 questions. For each question answered correctly, 1 mark
was awarded, and for each question answered wrongly, no marks were awarded. There
was no partial credit awarded. The average score was 5, and the standard deviation
of the scores was 2.
The number of correct answers and wrong answers for each student was plotted in a
scatter plot, with the number of correct answers represented on the horizontal axis
and the number of wrong answers on the vertical axis.
The correlation coefficient between the number of correct answers and number of
wrong answers is:
(A) 1.
(B) −1.
(C) 0.
(D) Unable to tell from the information provided.
Answer is (B). Sketch a scatter plot for some values of correct and wrong answers,
with the number of correct answers represented on the horizontal axis and the number
of wrong answers on the vertical axis. Since the standard deviation of the scores is 2,
not all the students will have exactly the same number of right and wrong answers, so
the scatter plot will not have a congregation of all points at one spot.

You can see from the scatter plot sketch that all points will lie on a straight line with
negative gradient, so the value of r is −1. Alternatively, observe that the numbers
of correct and wrong answers follow a deterministic linear relationship given by the
equation

number of wrong answers = 10 − number of correct answers.

This is because if a question is not answered correctly, then it is answered wrongly,
and vice versa.
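A small numerical illustration of this argument is sketched below in Python. The list of scores is made up purely for illustration (it is not the quiz data), and the `pearson_r` helper is a plain implementation of the usual correlation formula.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical numbers of correct answers for eight students.
correct = [3, 5, 5, 7, 2, 8, 6, 4]
# Every question is either right or wrong, so wrong = 10 - correct.
wrong = [10 - c for c in correct]

r = pearson_r(correct, wrong)
print(round(r, 10))  # -1.0
```

Any other set of scores that is not constant gives the same result, since the points always lie on the line with gradient −1.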
14. Of the four values below, which would be that of a correlation coefficient with the
strongest correlation?
(A) −1.4.
(B) −0.9.
(C) 0.3.
(D) 0.7.
Answer is (B). A correlation coefficient always lies between −1 and 1 (inclusive). The
higher the magnitude of the correlation coefficient, the stronger the correlation.
15. What will happen to the correlation coefficient between X and Y if a point with
coordinates (80, 110) is added to the scatter plot shown below?

(A) It will increase.

(B) It will decrease.
(C) It will remain the same.
Answer is (A). The correlation coefficient between X and Y from the scatter plot
above is very close to −1. Adding a point with coordinates (80, 110) will decrease the
strength of the correlation. Thus, the correlation coefficient will increase.
16. A system for marking students’ R computer programs, called markeR, has been used
successfully at a university. markeR takes into account both program correctness and
program style when marking students’ assignments.
To evaluate its effectiveness, markeR was used to grade the R assignments of a class
of 40 students. These scores, which range from 10.5 to 19, were then compared to the
scores given by the instructor of the class. The results are summarised below.

Variable               Sample mean   Sample standard deviation
markeR score (x)       16.5          1.5
Instructor score (y)   14.5          2.25
The sample correlation between y and x is 0.85. A least squares regression line is used
to predict the average instructor score from the markeR score. We are given that the
regression line passes through the point (16.5, 14.5).
(Fill in the blank.) When the markeR score is 15, the predicted average instructor
score is ____ (rounded to 2 decimal places).
The answer is 12.59. Suppose the regression line is given by y = a + bx. Then

$$b = r \frac{s_y}{s_x} = 0.85 \times \frac{2.25}{1.5} = 1.275.$$

Since the line passes through (16.5, 14.5), we have

14.5 = a + 1.275 × 16.5,

which gives a = −6.5375. When x = 15, the predicted (average) value for y is
−6.5375 + 1.275 × 15 = 12.5875. Rounded to 2 decimal places, the answer is 12.59.
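The same arithmetic can be sketched in Python. The code uses only the summary statistics given in the question and the fact that the least squares line passes through the point of means; the variable names are illustrative.

```python
# Summary statistics from the question
r = 0.85
mean_x, sd_x = 16.5, 1.5    # markeR score
mean_y, sd_y = 14.5, 2.25   # instructor score

# Gradient of the least squares line: b = r * s_y / s_x
slope = r * sd_y / sd_x
# The line passes through (mean_x, mean_y), which fixes the intercept.
intercept = mean_y - slope * mean_x

predicted = intercept + slope * 15
print(round(slope, 3), round(intercept, 4), round(predicted, 2))
```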
17. Below is a scatter plot showing preliminary exam and final exam scores for students
in a secondary school along with the linear regression line.

The average scores for the preliminary exam and final exam were both 60, with standard
deviations of 5.1 and 6.6 respectively. What does the slope of 0.98 of the linear
regression line predict?
(A) The increase in average final exam scores, corresponding to an increase of 1 mark
in the preliminary exam.
(B) The correlation between the final and preliminary exam scores.
(C) The average final exam score of students who scored 0 on the preliminary exam.
(D) None of the other options.

Answer is (A). The gradient in a linear regression equation gives the difference in
average Y values for two groups who differ by one unit in the X value. It is not
the correlation coefficient between the final and preliminary exam scores as visually,
the correlation coefficient should not be so close to 1. In fact, it is about 0.76. In
general, the correlation coefficient is not equal to the slope of the regression line. 0.98
is also not the average final exam score for students who scored 0 in the preliminary
exam because that is the Y-intercept of the regression line, which is theoretically the
predicted average Y value when the X value is equal to 0.
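The figure of about 0.76 for the correlation coefficient can be recovered from the slope and the two standard deviations. The short Python sketch below assumes that the regression line predicts the final exam score (y) from the preliminary exam score (x), as option (A) implies, so that slope = r × s_y / s_x.

```python
slope = 0.98       # gradient of the regression line (final vs preliminary)
sd_prelim = 5.1    # standard deviation of x
sd_final = 6.6     # standard deviation of y

# Rearranging slope = r * sd_final / sd_prelim for r
r = slope * sd_prelim / sd_final
print(round(r, 2))  # 0.76
```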
18. The scatter plot below shows the relationship between height and shoulder girth
(circumference of shoulders measured over deltoid muscles).

The equation of the regression line for height vs shoulder girth is given by y = 0.6x +
106, where y refers to the height and x refers to shoulder girth. Which of the following
statements below is/are correct? Select all that apply.
(A) If we were to predict average shoulder girth from height using simple linear
regression, the gradient of the regression line is also positive.
(B) Using simple linear regression, when the shoulder girth is equal to 141cm, the
predicted average height is 190.6cm.
(C) Using simple linear regression, when the height of the individual is 170cm, the
predicted average shoulder girth is 106.67cm.
(D) If the shoulder girth of every individual above is 2cm shorter, then the gradient
of the regression line for height vs shoulder girth is 0.6.
(A) and (D) are correct. Interchanging x and y does not change the correlation
coefficient, and the gradient of a regression line has the same sign as the
correlation coefficient, so the gradient is also positive. As 141cm is outside the range of the
x variable, in general, we cannot use the equation above to predict average height.
Similarly, we have remarked in Chapter 3 that a regression line for using x to predict
y cannot be used to predict x using y. So we cannot use the equation above to predict
average shoulder girth from height. Recall that adding/subtracting a constant to/from
x does not change the standard deviation for x, nor the correlation coefficient r.
As the gradient and correlation coefficient are related by

$$m = r \frac{s_y}{s_x},$$

we see that m does not change.

19. A researcher examined the relationship between variables X and Y among 250 male
and female subjects. He graphed the relationship in the scatter plot shown below. Let
r be the correlation coefficient for all 250 subjects, r1 be the correlation coefficient
among male subjects only and r2 be the correlation coefficient among female subjects
only.

Which of the following correctly describes the relationship between r, r1 and r2 ?

(A) r1 < r < r2 .
(B) r1 > r > r2 .
(C) r > r1 > r2 .
(D) r < r1 < r2 .
Answer is (A). The correlation coefficient for all subjects is closer to zero when
compared to either r1 or r2. The correlation coefficient for males only is negative, while
the correlation coefficient for females only is positive.
20. A researcher is interested in the correlation between the amount of time an
individual spends on social media and the individual’s level of happiness. Suppose that she
observed that the correlation coefficient r1 for males only is 0.8, and that the
correlation coefficient r2 for females only is also 0.8. Which of the following statements
must be true for r, the correlation coefficient when the data for males and females are
combined?
(A) 0 ≤ r ≤ 0.8.
(B) r = 0.8.
(C) 0.8 < r ≤ 1.
(D) None of the other given options is correct.
Answer is (D). It is possible that the correlation coefficient in the combined data set
is negative (see example below), so none of the other three options is correct.

21. Based on the scatter plot shown below, which of the following is closest to the equation
for the regression line? Here, W is the weight of the car and C is the consumption.

(A) W = 3 − 0.1C.
(B) W = 5 − 0.1C.
(C) W = 3 + 0.8C.
(D) W = 5 + 0.8C.
Answer is (B). The regression line should pass through the cloud of points in the
scatter diagram. So its slope should be negative. Also, from the scatter plot, its
y-intercept is more likely to be 5 than 3. Hence W = 5 − 0.1C is the correct answer.
22. Which of the following is/are true about a non-zero correlation coefficient? Select all that apply.
(A) The correlation coefficient does not change when we add 5 to all the values of
one variable.
(B) The correlation coefficient is positive when the slope of the regression line is
positive.
(C) The correlation coefficient does not change when we multiply all the values of
one variable by 2.
(D) A correlation of −0.3 is stronger than a correlation of −0.8.
(A), (B) and (C) are correct. The correlation coefficient r does not change when we
add a number to all the values of one variable, or when we multiply all the values
of one variable by a positive number. Since

$$m = r \frac{s_y}{s_x},$$

and standard deviations are positive, r and m will have the same sign. Only (D) is
incorrect, as a correlation of −0.8 is stronger (since it is closer to −1) than a
correlation of −0.3.
23. The relationship between the number of glasses of beer consumed daily (x) and blood
alcohol content in percentage (y) was studied in young adults. The equation of the
regression line is y = −0.015 + 0.02x for 1 ≤ x ≤ 10. The legal limit to drive in
Singapore is having a blood alcohol content below 0.08%. Des, a young adult, had
just finished 5 glasses of beer. After that, he wanted to take his car out for a drive. Is
it legal for him to drive in Singapore?
(A) Yes.
(B) No.
(C) Unable to determine.
Answer is (C). The regression line only provides the predicted average blood alcohol
content for someone who drank 5 glasses of beer, which is 0.085%. Although the value
is in the illegal range, Des’ blood alcohol content may have been below average, and
not have hit 0.08%.
24. Three father-son pairs had their heights measured. The following table shows their
heights:
Pair   Father (inches)   Son (inches)
A      68                72
B      70                71
C      66                70
Using these three data points, the standard deviation for the fathers would be 2 and
for the sons it would be 1. From the table, what is the standard unit for the son from
pair A?
(A) −1.
(B) 0.
(C) 1.
(D) 1.88.
Answer is (C). Since the sons’ average height is 71, the standard unit for the son from
pair A is

$$SU = \frac{72 - 71}{1} = 1.$$
25. Suppose that there are 40 male students in a class and each student scored 5 fewer
marks for his maths test than he scored for his science test. What can we say
about their maths and science test marks? Select all that apply.
(A) The interquartile range of science test marks is higher than that for maths test
marks.
(B) If student A scored a higher mark for the maths test than student B, then he
must have scored a higher mark than student B for the science test.
(C) The science test marks and maths test marks are perfectly negatively correlated.
(D) The standard deviation of maths test marks is equal to that of science test marks.
(B) and (D) are correct. Since quartile 1 and quartile 3 of the maths test marks
decrease by the same amount (5 marks) as compared to quartile 1 and quartile 3 of
the science test marks, there is no difference in the interquartile ranges of the maths
and science test marks.
As standard deviation does not change when we subtract or add a number to every
data point in a data set, the standard deviations of the maths and science test marks
are equal.
Suppose that we let x and y denote the maths and science test marks of the students,
respectively. Then we see that y = x + 5, that is, there is a perfect positive correlation
between the science and maths test marks. In addition, y increases as x increases.
Thus, if student A scored a higher mark for maths than student B, then he must have
scored a higher mark than student B for the science test as well.
26. The regression line for Y vs X is given by Y = 0.82X + 59.1. The standard deviations
for X and Y are 1.5 and 2.2 respectively. Suppose now we construct a regression line
that uses Y to predict X.
The predicted average increase of X when Y is increased by 1 unit is ____.
(Give your answer correct to 2 decimal places.)
Answer is 0.38. Writing m = 0.82 for the gradient of the regression line for Y vs X,
the correlation coefficient can be determined by

$$r = \frac{m s_X}{s_Y} = \frac{0.82 \times 1.5}{2.2}.$$

We find that r = 0.559. The gradient m′ of the regression line for X vs Y is given by

$$m' = \frac{r s_X}{s_Y} = \frac{0.559 \times 1.5}{2.2} = 0.38.$$

It should be noted that if we simply rearrange Y = 0.82X + 59.1 to X = (Y − 59.1)/0.82, we
obtain 1/0.82 = 1.22 as the gradient, which is different from 0.38. This shows that in
general, we cannot use the regression line for Y vs X to predict X as a function of Y.
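The two gradients can be compared side by side with a short Python sketch. It uses only the numbers given in the question; the variable names are illustrative.

```python
slope_y_on_x = 0.82   # gradient of the regression line for Y vs X
sd_x, sd_y = 1.5, 2.2

# Correlation coefficient recovered from slope = r * sd_y / sd_x
r = slope_y_on_x * sd_x / sd_y

# Gradient of the regression line that uses Y to predict X
slope_x_on_y = r * sd_x / sd_y

# Gradient obtained by (incorrectly) rearranging the Y vs X line
naive_slope = 1 / slope_y_on_x

print(round(r, 3))             # 0.559
print(round(slope_x_on_y, 2))  # 0.38
print(round(naive_slope, 2))   # 1.22
```

The product of the two regression gradients equals r squared, so they are reciprocals of each other only when r is 1 or −1.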

27. A professor wants to know the percentage of right-handed students in NUS. Since he is
teaching a course in NUS this semester, he decides to do a survey in his class. From the
single survey, he concluded that eighty percent of students in NUS are right-handed.
Which one of the following fallacies was committed by the professor?
(A) Atomistic fallacy.
(B) Ecological fallacy.
(C) None of the other options.

Answer is (C). Atomistic fallacy occurs when a person generalises the correlation
about individuals towards ecological correlation. Ecological fallacy occurs when a
person deduces inferences on correlation about individuals based on ecological
correlation. In this question, the professor merely generalises the result of the class to
the entire NUS. Note that neither correlation nor ecological correlation was computed
by the professor. Hence, neither fallacy was committed.
28. The total number of people who are infected by a disease (denoted by y) can be
predicted using the regression model $y = 2^{x+1} - 1$, where x is the number of days from
the first infection, up till the 30th day. Based on the information above, which of the
following is true?
(A) After 3 days from the first infection, there will be exactly 15 people infected.
(B) If there were 7 people infected, it means that exactly 2 days have passed from
the first infection.
(C) After exactly 20 days, there will be approximately less than 2 million people
infected.
(D) The relationship can be modelled as a simple linear regression Y = mX + c,
where Y = y, $X = 2^x$, m = 2, and c = −1.
Answer is (D).
(A) is incorrect as the regression model only predicts and does not guarantee that 15 are
infected. For (B), like the reasoning given for (A), the y values in a regression equation
are only predicted numbers. In addition, the regression model uses x to
predict y and so the equation may not be the same when we model a line using y
to predict x. (C) is incorrect as there will be approximately 2097151 people infected
after exactly 20 days, which is more than 2 million. (D) is correct as
$y = 2^{x+1} - 1 = 2(2^x) - 1$.
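The predicted values quoted for options (A) to (D) can be checked with the following Python sketch. The function name is illustrative; the formula is the regression model given in the question.

```python
def predicted_infected(day):
    """Predicted total infected after `day` days: y = 2**(x + 1) - 1."""
    return 2 ** (day + 1) - 1


print(predicted_infected(3))   # 15 (a prediction, not a guarantee)
print(predicted_infected(2))   # 7
print(predicted_infected(20))  # 2097151, i.e. more than 2 million

# Option (D): with X = 2**x the model is linear, Y = 2*X - 1.
for day in range(0, 31):
    big_x = 2 ** day
    assert predicted_infected(day) == 2 * big_x - 1
```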
29. Bivariate numerical data can be represented in the form (x, y). Which of these 4 data
sets, after having added an additional data point (2, 8), would have the magnitude of
their correlation coefficient decrease as a result? Select all that apply.
(A) (2, 2), (8, 2), (8, 8)
(B) (2, 2), (4, 5), (6, 2)
(C) (2, 2), (5, 5), (8, 8)
(D) (2, 8), (5, 5), (8, 2)
(A) and (C) are correct.
(A) is correct because the addition of (2, 8) nullifies the existing sloping trend in
the data set. (B) is incorrect because the present three points are symmetric about a
vertical line and display zero correlation, and the addition of (2, 8) breaks the
symmetry and moves the magnitude of correlation away from 0. (C) is correct
because the addition of (2, 8) will cause the magnitude of the existing perfect
correlation to decrease. (D) is incorrect because (2, 8) already lies on the same line
as the other points, so adding it again will not cause the existing
perfect correlation to change.
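These four claims can be verified numerically. The Python sketch below computes the correlation coefficient of each data set before and after adding (2, 8) and reports which magnitudes decrease; the `pearson_r` helper and the small tolerance used for the comparison are illustrative choices.

```python
from math import sqrt


def pearson_r(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)


data_sets = {
    "A": [(2, 2), (8, 2), (8, 8)],
    "B": [(2, 2), (4, 5), (6, 2)],
    "C": [(2, 2), (5, 5), (8, 8)],
    "D": [(2, 8), (5, 5), (8, 2)],
}

decreased = []
for label, points in data_sets.items():
    before = pearson_r(points)
    after = pearson_r(points + [(2, 8)])
    print(label, round(before, 3), "->", round(after, 3))
    # Small tolerance so rounding error is not counted as a decrease.
    if abs(after) < abs(before) - 1e-9:
        decreased.append(label)

print(decreased)  # ['A', 'C']
```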
30. “The relation between anxiety and BMI - is it all in our curves?” was published in
the journal Psychiatry Research in 2016. As stated in the abstract of that research
paper, “The relation between anxiety and excessive weight is unclear. The aims of the
present study were three-fold: First, we examined the association between anxiety and
Body Mass Index (BMI). Second, we examined this association separately for female
and male participants...”
The first result reported was: No linear correlation between anxiety scores and BMI
among all the participants was observed. If the researchers had not proceeded to
investigate the association between anxiety scores and BMI separately for female and
male participants, but concluded straightaway from their first result that “there is no
linear correlation between anxiety scores and BMI among the females and among the
males separately”, what mistake would they have committed?
(A) Ecological fallacy
(B) Atomistic fallacy
(C) Confusing correlation and causation
(D) None of the other options is correct
Answer is (D).
Atomistic fallacy occurs when a person claims that the ecological correlation
(correlation between two sets of averages across certain subgroups) will be the same as the
correlation obtained for individuals. Ecological fallacy occurs when a person deduces
inferences on correlation for individuals based on ecological correlation. We can
see that ecological correlation was neither mentioned nor relevant, and neither of these
fallacies was committed. In addition, the researchers did not indicate causation but
only correlation in the results. Hence, none of the other options is correct.

4 pages
Stats Mcqs Calculations
No ratings yet
Stats Mcqs Calculations
21 pages
U1-Ch1-6 - Extra Prac
No ratings yet
U1-Ch1-6 - Extra Prac
5 pages
STA 122 CBE Past Questions
No ratings yet
STA 122 CBE Past Questions
41 pages
Microsoft Word - Unit 1 Extra Prac A
100% (1)
Microsoft Word - Unit 1 Extra Prac A
5 pages
MCQ Statistics
No ratings yet
MCQ Statistics
8 pages
Probability & Statistics For P-8 Teachers: MATH 4713 Review Questions (Chapter 3)
No ratings yet
Probability & Statistics For P-8 Teachers: MATH 4713 Review Questions (Chapter 3)
6 pages
đề trí
No ratings yet
đề trí
7 pages
Stochastic Sample Space. Deterministic Probability
100% (1)
Stochastic Sample Space. Deterministic Probability
7 pages
Iie 3017 02
No ratings yet
Iie 3017 02
35 pages
Output 2 - Stat-Analysis
No ratings yet
Output 2 - Stat-Analysis
5 pages
2 Mcqs - Bank Statistics
No ratings yet
2 Mcqs - Bank Statistics
66 pages
Check Your Progress - 3
No ratings yet
Check Your Progress - 3
13 pages
Statistic I New
No ratings yet
Statistic I New
8 pages
QNT 351 Final Exam Correct Answers 100%
100% (1)
QNT 351 Final Exam Correct Answers 100%
4 pages
Part A: Mcqs (10 Questions) : Gen-Z Iitian Stats 1 Most Important Qs
No ratings yet
Part A: Mcqs (10 Questions) : Gen-Z Iitian Stats 1 Most Important Qs
4 pages
MQM100 MultipleChoice Chapter3
100% (2)
MQM100 MultipleChoice Chapter3
21 pages
Statistics I Essentials
From Everand
Statistics I Essentials
Emil G. Milewski
No ratings yet
Calculus III Essentials
From Everand
Calculus III Essentials
Editors of REA
1/5 (2)
Sat Mathematics Review And Practice
From Everand
Sat Mathematics Review And Practice
Addison Shaw
1/5 (1)
An Overview and Comparative Analysis of Recurrent Neural Networks For Short Term Load Forecasting
No ratings yet
An Overview and Comparative Analysis of Recurrent Neural Networks For Short Term Load Forecasting
41 pages
Regression
No ratings yet
Regression
4 pages
q2 Activity Sheets - Grade 3
100% (2)
q2 Activity Sheets - Grade 3
13 pages
Signal Integrity Measurements and Network Analysis
No ratings yet
Signal Integrity Measurements and Network Analysis
55 pages
UoS BABS 3 HRM Assignment
No ratings yet
UoS BABS 3 HRM Assignment
15 pages
Geological Map Symbol
No ratings yet
Geological Map Symbol
5 pages
FFBL FML FPCL Answer Key
No ratings yet
FFBL FML FPCL Answer Key
19 pages
SOP (Mahi - Project Coordinator)
No ratings yet
SOP (Mahi - Project Coordinator)
1 page
Y9 2. Possibility Diagram
No ratings yet
Y9 2. Possibility Diagram
13 pages
Lab1 Syarifuddin 2016490588 PDF
No ratings yet
Lab1 Syarifuddin 2016490588 PDF
14 pages
A Genetic Algorithm Tutorial
No ratings yet
A Genetic Algorithm Tutorial
42 pages
137-E Blank Form
No ratings yet
137-E Blank Form
3 pages
TNX Tower Manual
No ratings yet
TNX Tower Manual
265 pages
Homework Riddles
100% (1)
Homework Riddles
5 pages
15-Nguyen Van Thin-Bai Bao28!3!2007
No ratings yet
15-Nguyen Van Thin-Bai Bao28!3!2007
8 pages
Prueba Modelo Diagnostica Optativa Ingles
No ratings yet
Prueba Modelo Diagnostica Optativa Ingles
5 pages
Faculty Profile: Professional (Industry) Experience (32 Years)
No ratings yet
Faculty Profile: Professional (Industry) Experience (32 Years)
1 page
Lectures On Divergent Series (Emile Borel)
No ratings yet
Lectures On Divergent Series (Emile Borel)
129 pages
Short Essay On Abraham Lincoln
100% (2)
Short Essay On Abraham Lincoln
3 pages
Angela Ales Bello The Divine in Husserl and Other Explorations 1st Edition Angela Ales Bello Auth Instant Download
No ratings yet
Angela Ales Bello The Divine in Husserl and Other Explorations 1st Edition Angela Ales Bello Auth Instant Download
29 pages
Masterprotect 1813: Amine-Cured, Pitch Free Epoxy
100% (1)
Masterprotect 1813: Amine-Cured, Pitch Free Epoxy
2 pages
Diversity Race Module Allen
No ratings yet
Diversity Race Module Allen
47 pages
Answer Sheets
No ratings yet
Answer Sheets
4 pages
Table.1 Demographic Profile of The Respondents in Terms of Age
No ratings yet
Table.1 Demographic Profile of The Respondents in Terms of Age
5 pages
Learning Area Grade Level 7 Quarter Date: English 4
No ratings yet
Learning Area Grade Level 7 Quarter Date: English 4
4 pages
THESESAASTU 2019 Diversion Weir
50% (2)
THESESAASTU 2019 Diversion Weir
68 pages
A Model of Self-Regulation
No ratings yet
A Model of Self-Regulation
15 pages
San Chit
No ratings yet
San Chit
2 pages
Beta Catalog Et b1 2005
No ratings yet
Beta Catalog Et b1 2005
317 pages