Statistics For GMAT

Uploaded by

12q23

Statistics: Mean, Median and Standard Deviation

CONTENT

1. The Meaning of Arithmetic Mean


2. Some Mean Questions
3. Finding Arithmetic Mean Using Deviations
4. Application of Arithmetic Means
5. Means Questions on Median
6. A Range of Questions
7. Dealing with Standard Deviation
8. Dealing with Standard Deviation II
9. Some Tricky Standard Deviation Questions
10. 3 Important Concepts for Statistics Questions on the GMAT
11. How to Quickly Solve Standard Deviation Questions on the GMAT
12. A 750 Level GMAT Question on Statistics!
13. A 750+ Level Question on SD
14. Other Resources on Statistics
15. Using the Standard Deviation Formula on the GMAT
16. Solving GMAT Standard Deviation Problems By Using as Little Math as
Possible

The Meaning of Arithmetic Mean

Let’s start today with statistics – mean, median, mode, range and standard deviation. The
topics are simple but the fun lies in the questions. Some questions on these topics can be
extremely tricky especially those dealing with median, range and standard deviation.
Anyway, we will tackle mean today.

So what do you mean by the arithmetic mean of some observations? I guess most of you
will reply that it is the ‘Sum of Observations/Total number of observations’. But that is how
you calculate mean. My question is ‘what is mean?’ Loosely, arithmetic mean is the number
that represents all the observations. Say, if I know that the mean age of a group is 10, I
would guess that the age of Robbie, who is a part of that group, is 10. Of course Robbie’s
actual age could be anything but the best guess would be 10.

Say, I tell you that the average age of a group of 10 people is 15 yrs. Can you tell me the
sum of the ages of all 10 people? I am sure you will say that it is 10*15 = 150. You can think
of it in two ways:

Mean = Sum of all ages/No of people

So Sum of all ages = Mean * (No of people) = 15*10

Or

Since there are 10 people and each person’s age is represented by 15, the sum of their
ages = 10*15. Basically, the total sum was distributed evenly among the 10 people and each
person got 15 yrs.

Now, let’s say you made a mistake. A boy whose age you thought was 20 was actually 30.
What is the correct mean? Again, you can think of it in two ways:
New sum = 150 + 10 = 160

New average = 160/10 = 16

Or

You can say that there is an extra 10 that has to be distributed evenly among the 10 people,
so each person gets 1 extra. Hence, the average becomes 15 + 1 = 16.

As you might have guessed, we will work on the second interpretation. Let’s look at an
example now.

Example 1: The average age of a group of n people is 15 yrs. One more person aged
39 joins the group and the new average is 17 yrs. What is the value of n?

(A) 9
(B) 10
(C) 11
(D) 12
(E) 13

Solution: First tell me, if the age of the additional person were 15 yrs, what would have
happened to the average? The average would have remained the same since this new
person’s age would have been the same as the age that represents the group. But his age is
39 – 15 = 24 more than the average. We know that we need to evenly split the extra among
all the people to get the new average. When 24 is split evenly among all the people
(including the new guy), everyone gets 2 extra (since average age increased from 15 to 17).
There must be 24/2 = 12 people now (including the new guy) i.e. n must be 11 (without
including the new guy).
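
If you want to see the "split the extra evenly" idea as a calculation, here is a minimal Python sketch. The function name and its parameters are made up for this illustration; they are not part of the question.

```python
def group_size_before_join(old_mean, new_value, new_mean):
    """Return n, the group size before the new person joined.

    The newcomer brings (new_value - old_mean) extra, and that extra is
    split evenly so that everyone, newcomer included, gains
    (new_mean - old_mean).
    """
    extra = new_value - old_mean            # 39 - 15 = 24
    gain_per_person = new_mean - old_mean   # 17 - 15 = 2
    people_now = extra / gain_per_person    # 24 / 2 = 12
    return people_now - 1                   # leave out the newcomer


n = group_size_before_join(old_mean=15, new_value=39, new_mean=17)
print(n)  # 11.0

# Cross-check with the plain formula: (15*n + 39) / (n + 1) should be 17.
print((15 * n + 39) / (n + 1))  # 17.0
```

The cross-check at the end is just the usual 'Sum/Number' formula, so both ways of thinking agree.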

Let’s look at another similar example though a little trickier. Try solving it on your own first. If
not logically, try using the formula approach. Then see how elegant the solution becomes
once you start ‘thinking’ instead of just ‘calculating’.

Example 2: When a person aged 39 is added to a group of n people, the average age
increases by 2. When a person aged 15 is added instead, the average age decreases
by 1. What is the value of n?

(A) 7
(B) 8
(C) 9
(D) 10
(E) 11

Solution: What is the first thing you can say about the initial average? It must have been
between 39 and 15. When a person aged 39 is added to the group, the average increases
and when a person aged 15 is added, the average decreases.

Let’s look at the second case first. When the person aged 15 is added to the group, the
average becomes (initial average – 1). If instead, the person aged 39 were added to the
group, there would be 39 – 15 = 24 extra which would make the average = (initial average +
2). This difference of 24 creates a difference of 3 in the average. This means there must
have been 24/3 = 8 people (after adding the extra person). The value of n must be 8 – 1 = 7.

Means Questions on Median

Conceptually, the median is very simple. It is just the middle number. Arrange all the
numbers in increasing/decreasing order and the number you get right in the middle, is the
median. So it is quite straightforward when you have an odd number of numbers since you
have a “middle” number. What about the case when you have an even number of numbers? In
that case, it is just the average of the two middle numbers.

Median of [2, 5, 10] is 5

Median of [3, 78, 102, 500] is (78+102)/2=90

If it’s that simple, why are we discussing it? – because it isn’t “that simple”! Conceptually it is,
but when the test writers make questions using median and arithmetic mean together, they
make some very mean questions! I will show you with an example, but first, we will look at a
simpler question.

Question 1: A, B and C have received their Math midterm scores today. They find that
the arithmetic mean of the three scores is 78. What is the median of the three scores?

(1) A scored a 73 on her exam.

(2) C scored a 78 on her exam.

Solution: Recall from the arithmetic mean post that the sum of the deviations of all scores
from the mean is 0, i.e. if one score is less than the mean, there has to be at least one
score that is more than the mean.

For example, if the mean is 78, one of the following must be true:

All scores are equal to 78.

At least one score is less than 78 and at least one is greater than 78.

If one score is 70, i.e. 8 less than 78, the other scores have to make up this deficit
of 8. Therefore, there could be a score of 86 (8 more than 78), or there could be two
scores of 82 each, etc.

Statement 1: A scored 73 on her exam.

For the mean to be 78, there must be at least one score higher than 78. But what exactly are
the other two scores? We have no idea! Various cases are possible:

73, 78, 83 or

73, 74, 87 or

70, 73, 91 etc.

In each case, the median will be different. Hence this statement alone is not sufficient.

Statement 2: C scored 78 on her exam.

Now we know that one score is 78. Either the other two will also be 78 or one will be less
than 78 and the other will be greater than 78. In either case, 78 will be the middle number
and hence will be the median. This statement alone is sufficient.
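
Both statements can be sanity-checked by brute force. The sketch below assumes whole-number scores from 0 to 100 (an assumption made only to keep the search finite; the question does not say so) and collects every median that is possible once one score is known.

```python
from statistics import median

TOTAL = 3 * 78  # the three scores must add up to 234


def possible_medians(known_score):
    """Medians of all score triples with mean 78 that contain known_score."""
    medians = set()
    for second in range(0, 101):
        third = TOTAL - known_score - second
        if 0 <= third <= 100:
            medians.add(median([known_score, second, third]))
    return medians


medians_if_73 = possible_medians(73)  # Statement 1
medians_if_78 = possible_medians(78)  # Statement 2

print(len(medians_if_73) > 1)  # True: several medians are possible
print(medians_if_78)           # {78}
```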

Answer (B).

Were you tempted to say (C) is the answer? I hope this question shows you that median can
be a little tricky. Let’s go on to the tougher question now.

Question 2: Five logs of wood have an average length of 100 cm and a median length
of 116 cm. What is the maximum possible length, in cm, of the shortest piece of
wood?

(A) 50
(B) 76
(C) 84
(D) 96
(E) 100

Solution:

First thing that comes to mind – median is the 3rd term out of 5 so the lengths arranged in
increasing order must look like this:

___ ___ 116 ___ ___

The mean is given and we need to maximize the smallest number. Basically, the smallest
number should be as close to the mean as possible. This means the greatest number should
be as close to the mean as possible too (if the shortfall deviation is small, the excess
deviation should be equally small).

If this doesn’t make sense, think of a set with mean 20:

19, 20, 21 (smallest number is very close to mean; greatest number is very close to the
mean too)
1, 20, 39 (smallest number is far away from the mean, greatest number is far away too)

Using the same logic, let’s make the greater numbers as small as possible (so the smallest
number can be as large as possible). The two greatest numbers should both be at least 116
(since 116 is the median). Now the lengths arranged look like this:

___ ___ 116 116 116

Since the mean is 100 and each of the 3 large numbers is already 16 more than 100 i.e.
total 16*3 = 48 more than the mean (excess deviation is 48), the deviations of the two small
numbers should be a total of 48 less than the mean. To make the smallest number as great
as possible, each of the small numbers should be 48/2 = 24 less than the mean i.e. they
both should be 76.
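
The balancing of deviations above can be written out as a short Python sketch. The function name is hypothetical, and the sketch assumes an odd number of pieces and a median that is not below the mean, as in this question.

```python
def max_shortest(count, mean, median):
    """Largest possible minimum for an odd-sized list with given mean and median."""
    upper = count // 2 + 1              # the median and everything above it
    lower = count - upper               # the pieces below the median
    excess = upper * (median - mean)    # 3 * 16 = 48 above the mean
    return mean - excess / lower        # each short piece takes an equal share


shortest = max_shortest(count=5, mean=100, median=116)
print(shortest)  # 76.0

lengths = [shortest, shortest, 116, 116, 116]
print(sum(lengths) / len(lengths))  # 100.0
```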

Answer (B).

Some Mean Questions

Question 1: For the past n days the average daily production at a company was 60
units. If today’s production of 100 units raises the average to 65 units per day, what is
the value of n?

(A) 30
(B) 18
(C) 10
(D) 9
(E) 7

Solution: If today’s production were also 60 units, what would have happened to the
average? Obviously, it would have stayed the same! But today’s production is 40 units extra
and hence it raised the average. It raised the average by 5 units which means that each one
of the n observations and today’s observation got an extra 5. Since 40 got distributed and
each was given 5, there must have been a total of 40/5 = 8 observations including today’s.
Therefore, the value of n must have been 8 – 1 = 7.

Answer (E)

Question 2: When Anna makes a contribution to a charity fund at school, the average
contribution size increases by 50%, reaching $75 per person. If there were 5 other
contributions made before Anna’s, what is the size of her donation?

(A) $100
(B) $150
(C) $200
(D) $250
(E) $450

Solution: After Anna’s contribution, the average size increases by 50% and reaches $75.
What must have been the average size of contribution before Anna’s donation? It must have
been $50 since a 50% increase would lead us to $75. So, $50 was the average size of 5
donations before Anna made her donation. Had Anna donated $50 as well, the average
would have stayed the same i.e. $50. But the average increased to $75 which means that
Anna donated an extra $25 for each of the 6 observations (including her) in addition to the
$50 she would have donated to keep the average same.

Hence, the amount Anna donated = 50 + 6*25 = $200
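
Here is the same calculation as a small Python sketch, followed by a check with the ordinary average formula. The function and parameter names are invented for this illustration.

```python
def new_contribution(old_count, new_mean, percent_increase):
    """Size of the newest contribution, given how it moved the average."""
    old_mean = new_mean / (1 + percent_increase / 100)   # 75 / 1.5 = 50
    rise = new_mean - old_mean                           # 25 per person
    # Match the old average, then give every contributor (newcomer too) the rise.
    return old_mean + (old_count + 1) * rise             # 50 + 6 * 25


donation = new_contribution(old_count=5, new_mean=75, percent_increase=50)
print(donation)  # 200.0

# Check: five donations averaging $50 plus Anna's should average $75.
print((5 * 50 + donation) / 6)  # 75.0
```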

Answer (C).

Again, this was a relatively straightforward question. Let’s look at a tricky one now.

Question 3: A set of numbers has an average of 50. If the largest element is 4 greater
than 3 times the smallest element, which of the following values cannot be in the set?

(A) 85
(B) 90
(C) 123
(D) 150
(E) 155

Solution: This question might look a little ominous but it isn’t very tough, really! The set has
an average of 50 so that already tells us that we can represent each element of the set by
50. If there is an element which is a little less than 50, there will be another element which is
a little more than 50.

The largest element is 4 greater than 3 times the smallest element so L = 4 + 3S.

The smallest element must be less than 50 and the largest must be greater than 50. Say, if
the smallest element is 20, the largest will be 4 + 3*20 = 64.
Is there any limit imposed on the largest value of the largest element? Yes, because there is
a limit on the largest value of the smallest element. The smallest element must be less than
50. The smallest member of the set can be 49.9999… The limiting value of the smallest
number is 50. As long as the smallest number is a tiny bit less than 50, you can have the
greatest number a tiny bit less than 4 + 3*50 = 154. The number 154 and all numbers
greater than 154 cannot be a part of the set. Say if the smallest element is 49, the largest
element will be 4 + 3*49 = 151. So the set could look something like this:

S = {49, 49, 49, 49, … (101 times to balance out the extra 101 in 151), 50, 50, 151}
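
A quick Python check of this example set, and of the limit of 154 on the largest element, might look like this (the variable names are only for illustration):

```python
smallest, largest = 49, 151
example = [smallest] * 101 + [50, 50, largest]

print(largest == 4 + 3 * smallest)   # True
print(sum(example) / len(example))   # 50.0

# The smallest element stays below 50, so the largest stays below 4 + 3*50.
limit = 4 + 3 * 50
ruled_out = [v for v in (85, 90, 123, 150, 155) if v >= limit]
print(limit, ruled_out)              # 154 [155]
```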

Only option (E) cannot be a part of the set.

These were some of the basic (and not so basic) questions on the mean that you could come
across on the GMAT.

Application of Arithmetic Means

In the above post we discussed arithmetic means of arithmetic progressions in GMAT math
problems. Now, let’s see those concepts in action.

Question 1: If x is the sum of the even integers from 200 to 600 inclusive, and y is the
number of even integers from 200 to 600 inclusive, what is the value of x + y?

(A) 200*400
(B) 201*400
(C) 200*402
(D) 201*401
(E) 400*401

Solution:

There are various ways of getting the answer here. We will use the concepts we learned last
week.

The given sequence is 200, 202, 204, … 600

It is an arithmetic progression. What is the total number of terms here?

You can use one of two methods to get the number of terms here:

Method 1: Using Logic

In every 100 consecutive integers, there are 50 odd integers and 50 even integers. So we
will get 50 even integers from each of 200 – 299, 300 – 399, 400 – 499 and 500 – 599 i.e. a
total of 50*4 = 200 even integers. Also, since the sequence includes 600, number of even
integers = 200 + 1 = 201

Method 2:

Recall that in our arithmetic progressions post, we saw that the last term of a sequence
which has n terms will be first term + (n – 1)* common difference.

600 = 200 + (n – 1)*2
n = 201

Hence y = 201 (because y is the number of even integers from 200 to 600)

Let’s go on now. What is the average of the sequence? Since it is an arithmetic progression
with an odd number of integers, the average must be the middle number i.e. 400.

Notice that since this arithmetic progression looks like this:

(n – m), … (n – 6), (n – 4), ( n – 2), n, (n + 2), (n + 4), (n + 6), … (n + m)

We can find the middle number i.e. the average by just averaging the first and the last terms.

{(n–m)+(n+m)}/2=2n/2=n

Average=(200+600)/2=400

Sum of all terms in the sequence = x = Arithmetic Mean * Number of terms = 400*201

x+y=400∗201+201=401∗201
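
If you would like to confirm the 'mean times number of terms' shortcut, a few lines of Python will do it. This is only a check of the arithmetic, not something you would do on the test.

```python
evens = range(200, 601, 2)               # 200, 202, ..., 600

y = len(evens)                           # number of terms
average = (evens[0] + evens[-1]) // 2    # middle of the first and last terms
x = average * y                          # sum = mean * number of terms

print(y, average)           # 201 400
print(x == sum(evens))      # True
print(x + y == 201 * 401)   # True
```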

Answer (D)

This question was simple. You could have found the sum using the
formula (n/2)∗(2a+(n−1)d) that we saw in the AP post. But this method is more intuitive since
if you don’t want to, you don’t have to use any formula here. Anyway, let’s go on to our
second question for today.

Question 2: The sum of n consecutive positive integers is 45. What is the value of n?

Statement I: n is even

Statement II: n < 9

Solution: First I will give the solution of this question and then discuss the logic used to
solve it.

In how many ways can you write n consecutive integers such that their sum is 45? Let’s see
whether we can get such numbers for some values of n.

n = 1 -> Numbers: 45
n = 2 -> Numbers: 22 + 23 = 45
n = 3 -> Numbers: 14 + 15 + 16 = 45
n = 4 -> No such numbers
n = 5 -> Numbers: 7 + 8 + 9 + 10 + 11 = 45
n = 6 -> Numbers: 5 + 6 + 7 + 8 + 9 + 10 = 45

Let’s stop right here.

Statement I: n must be even.

n could be 2 or 6. Statement I alone is not sufficient.

Statement II: n < 9

n can take many values less than 9; hence Statement II alone is not sufficient.

Both statements together: Since n can take values 2 or 6 which are even and less than 9,
both statements together are not sufficient.

Answer (E).

Now, the interesting thing is how do we get these numbers for different values of n. How do
we know the values that n can take? It’s pretty easy really. Follow my thought here.

Of course, n can be 1. In that case we have only one number i.e. 45.

n can be 2. Why? When we divide 45 by 2, we get 22.5. Since 2*22.5 is 45, we have to find
2 consecutive integers such that their arithmetic mean is 22.5. The integers are obviously 22
and 23.

n can be 3. When we divide 45 by 3, we get 15. So we need 3 consecutive integers such
that their mean is 15. They are 14, 15, 16.

When we divide 45 by 4, we get 11.25. Do we have 4 consecutive integers such that their
mean is 11.25? No, because mean of even number of consecutive integers is always of the
form x.5.

n can be 5. When we divide 45 by 5, we get 9 so we need 5 consecutive integers such that
their mean is 9. They must be 7, 8, 9, 10, 11.

n can be 6. When we divide 45 by 6, we get 7.5. We need 6 consecutive integers such that
their mean is 7.5. The integers are 5, 6, 7, 8, 9, 10

Obviously, we just need to focus on getting 2 even values of n which are less than 9. So we
check for 2, 4 and 6 and we immediately know that the answer is (E). We don’t have to do
this process for all numbers less than 9 and we don’t have to do it for odd values of n.
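
The search for valid values of n can also be sketched in Python. The helper below is hypothetical; it uses the fact that n consecutive integers starting at 'first' add up to n*first + n*(n – 1)/2.

```python
def consecutive_runs(total, max_n):
    """Map n -> first term, for runs of n consecutive positive integers
    that add up to total."""
    runs = {}
    for n in range(1, max_n + 1):
        # n terms starting at `first` sum to n*first + n*(n-1)/2
        remainder = total - n * (n - 1) // 2
        if remainder > 0 and remainder % n == 0:
            runs[n] = remainder // n
    return runs


runs = consecutive_runs(45, max_n=8)       # Statement II: n < 9
print(runs)                                # {1: 45, 2: 22, 3: 14, 5: 7, 6: 5}

even_n = [n for n in runs if n % 2 == 0]   # add Statement I: n is even
print(even_n)                              # [2, 6]
```

Two even values survive both statements, which is why the answer is (E).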

A 750 Level GMAT Question on Statistics!

In this post, we have a very interesting statistics question for you. Above, we have already
discussed statistics concepts such as mean, median, range.

This question needs you to apply all these concepts but can still be easily done in under two
minutes. Now, without further ado, let’s go on to the question – there is a lot to discuss there.

Question: An automated manufacturing unit employs N experts such that the range of
their monthly salaries is $10,000. Their average monthly salary is $7000 above the
lowest salary while the median monthly salary is only $5000 above the lowest salary.
What is the minimum value of N?

(A)10
(B)12
(C)14
(D)15
(E)20

Solution: Let’s first assimilate the information we have. We need to find the minimum
number of experts that must be there. Why should there be a minimum number of people
satisfying these statistics? Let’s try to understand that with some numbers.

Say, N cannot be 1 i.e. there cannot be a single expert in the unit because then you cannot
have the range of $10,000. You need at least two people to have a range – the difference of
their salaries would be the range in that case.

So there are at least 2 people – say one with salary 0 and the other with 10,000. No salary
will lie outside this range.

Median is $5000 – i.e. when all salaries are listed in increasing order, the middle salary (or
average of middle two) is $5000. With 2 people, one at 0 and the other at 10,000, the
median will be the average of the two i.e. (0 + 10,000)/2 = $5000. Since the answer choices
tell us there are at least 10 people, there is probably someone earning $5000. Let’s put in
5000 there for reference.

0 … 5000 … 10,000

Arithmetic mean of all the salaries is $7000. Now, mean of 0, 5000 and 10,000 is $5000, not
$7000 so this means that we need to add some more people. We need to add them more
toward 10,000 than toward 0 to get a higher mean. So we will try to get a mean of $7000.

Let’s use deviations from the mean method to find where we need to add more people.

0 is 7000 less than 7000 and 5000 is 2000 less than 7000 which means we have a total of
$9000 less than 7000. On the other hand, 10,000 is 3000 more than 7000. The deviations
on the two sides of mean do not balance out. To balance, we need to add two more people
at a salary of $10,000 so that the total deviation on the right of 7000 is also $9000. Note that
since we need the minimum number of experts, we should add new people at 10,000 so that
they quickly make up the deficit in the deviation. If we add them at 8000 or 9000 etc, we will
need to add more people to make up the deficit at the right.

Now we have

0 … 5000 … 10000, 10000, 10000

Now the mean is 7000 but note that the median has gone awry. It is 10,000 now instead of
the 5000 that is required. So we will need to add more people at 5000 to bring the median
back to 5000. But that will disturb our mean again! So when we add some people at 5000,
we will need to add some at 10,000 too to keep the mean at 7000.

5000 is 2000 less than 7000 and 10,000 is 3000 more than 7000. We don’t want to disturb
the total deviation from 7000. So every time we add 3 people at 5000 (which will be a total
deviation of 6000 less than 7000), we will need to add 2 people at 10,000 (which will be a
total deviation of 6000 more than 7000), to keep the mean at 7000 – this is the most
important step. Ensure that you have understood this before moving ahead.

When we add 3 people at 5000 and 2 at 10,000, we are in effect adding an extra person at
5000 and hence it moves our median a bit to the left.

Let’s try one such set of addition:

0 … 5000, 5000, 5000, 5000 … 10000, 10000, 10000, 10000, 10000

The median is not $5000 yet. Let’s try one more set of addition.

0 … 5000, 5000, 5000, 5000, 5000, 5000, 5000 … 10000, 10000, 10000, 10000, 10000, 10000, 10000

The median now is $5000 and we have maintained the mean at $7000.

This gives us a total of 15 people.
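
The two sets we built can be checked with Python's statistics module. This sketch takes the lowest salary as 0, exactly as in the discussion above; it only verifies the sets we constructed and does not by itself prove that 15 is the minimum.

```python
from statistics import mean, median

first_try = [0] + [5000] * 4 + [10000] * 5   # 10 people
final_set = [0] + [5000] * 7 + [10000] * 7   # 15 people

for salaries in (first_try, final_set):
    spread = max(salaries) - min(salaries)
    print(len(salaries), mean(salaries), median(salaries), spread)

# 10 people: mean 7000, median 7500 (not 5000 yet), range 10000
# 15 people: mean 7000, median 5000, range 10000
```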

Answer (D).

Finding Arithmetic Mean Using Deviations

This post is again focused on the arithmetic mean. Let’s start our discussion by considering
the case of arithmetic mean of an arithmetic progression.

We will start with an example. What is the mean of 43, 44, 45, 46, 47? (Hint: If you are
thinking about adding the numbers, that’s not the way I want you to go.)

As we discussed in our previous posts, arithmetic mean is the number that can
represent/replace all the numbers of the sequence. Notice in this sequence, 44 is one less
than 45 and 46 is one more than 45. So essentially, two 45s can replace both 44 and 46.
Similarly, 43 is 2 less than 45 and 47 is 2 more than 45 so two 45s can replace both these
numbers too.

The sequence is essentially 45, 45, 45, 45, 45.

Hence, the arithmetic mean of this sequence must be 45! (If you have doubts, you can
calculate and find out.)

It makes sense, doesn’t it? The middle number in the sequence of consecutive positive
integers will be the mean. The deviations of all numbers to the left of the middle number will
balance out the deviations of all the numbers to the right of the middle number.

(In this post, we will assume that the given numbers are in increasing/decreasing order. If
that is not the case, you can always put them in increasing order and use these concepts.)

Once again, what is the mean of 192, 193, 194, 195, 196, 197, 198?

It is 195 since it is the middle number!

Ok, what about 192, 193, 194, 195, 196, 197? What is the mean in this case? There is no
middle number here since there are 6 numbers. The mean here will be the middle of the two
middle numbers which is 194.5 (the middle of the third and the fourth number). It doesn’t
matter that 194.5 is not a part of this list. If you think about it, arithmetic mean of some
numbers needn’t be one of the numbers.

What about 71, 73, 75, 77, 79? What will be the mean in this case? Even though these
numbers are not consecutive integers, the difference between two adjacent numbers in the
list is the same (it is an arithmetic progression). So the deviations of the numbers on the left
of the middle number will cancel out the deviations of the numbers on the right of the middle
number (71 is 4 less than 75 and 79 is 4 more than 75. 73 is 2 less than 75 and 77 is 2 more
than 75). Hence, the mean here will be 75 (just like our first example).

Just to reinforce:

102, 106, 110 -> Mean = 106

102, 106, 110, 114 -> Mean = 108 (Middle of the second and third numbers)

Let’s twist this concept a little now. What is the mean of 36, 40, 42, 43, 44, 47?

This is not an arithmetic progression. So do we need to sum and then divide by 6 to get the
mean? Not so fast! Let’s try and use the deviations concept we have just learned.

Given sequence: 36, 40, 42, 43, 44, 47

It seems that the mean would be around 42, right? Some numbers are less than 42 and
others are more than 42.

36 is 6 less than 42.

40 is 2 less than 42.

Overall, the numbers less than 42 are 6+2 = 8 less than 42.

43 is 1 more than 42.

44 is 2 more than 42.

47 is 5 more than 42

Overall, the numbers more than 42 are 1+2+5 = 8 more than 42.

The deviations of the numbers less than 42 get balanced out by deviations of the numbers
greater than 42! Hence, the average must be 42.

This method is especially useful in cases involving big numbers which are close to each
other.

Example 1: What is the average of 452, 453, 463, 467, 480, 499, 504?

What would you say the average is here? Perhaps, around 470?

Let’s see:

452 is 18 less than 470.

453 is 17 less than 470.

463 is 7 less than 470.

467 is 3 less than 470.

Overall, the numbers less than 470 are 18 + 17 + 7 + 3 = 45 less.

480 is 10 more than 470.

499 is 29 more than 470.

504 is 34 more than 470.


Overall, the numbers more than 470 are 10 + 29 + 34 = 73 more than 470.

The shortfall is not balanced by the excess. There is an excess of 73 – 45 = 28.

So what is the average? If we assume the average of these 7 numbers to be 470, there is an
excess of 28. We need to distribute the excess evenly among all the numbers and hence the
average will increase by 28/7 = 4. (Go back to the first post on arithmetic mean if this is not
clear.)

Hence, the required mean is 470 + 4 = 474.

(If we had assumed the mean to be 474, the shortfall would have balanced the excess.)

Let’s go through one more example using this concept:

Example 2: What is the mean of 99, 103, 104, 109, 120, 123, 128, 130?

Let’s start by guessing a mean for this sequence. Say, around 115?

Let’s see if the shortfall is balanced by the excess.

99 is 16 less, 103 is 12 less, 104 is 11 less and 109 is 6 less than 115.

Overall shortfall = 16 + 12 + 11 + 6 = 45

120 is 5 more, 123 is 8 more, 128 is 13 more and 130 is 15 more than 115.

Overall excess = 5 + 8 + 13 + 15 = 41

We are close, but not quite there yet! There is a shortfall of 4. Since there are a total of 8
numbers, the average must be 4/8 = 0.5 less than 115. Hence, the average here is 114.5

Once you get a hang of this method and understand what you are doing, it is much faster
than adding all the big numbers and then dividing the sum since you only deal with small
numbers in this method.
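
For those who like to see the method written down, here is a minimal Python sketch of 'guess a mean, then spread the net deviation'. The function name is made up for this illustration.

```python
def mean_by_deviations(values, guess):
    """Correct a guessed mean by spreading the net deviation over all values."""
    net = sum(v - guess for v in values)   # excess minus shortfall
    return guess + net / len(values)


example_1 = [452, 453, 463, 467, 480, 499, 504]
example_2 = [99, 103, 104, 109, 120, 123, 128, 130]

print(mean_by_deviations(example_1, guess=470))  # 474.0
print(mean_by_deviations(example_2, guess=115))  # 114.5
```

Any guess gives the same final answer; a guess close to the true mean simply keeps the deviations small.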

Some Tricky Standard Deviation Questions

In the above post, we promised you a couple of tricky standard deviation (SD) GMAT
questions. We start with a 600-700 level question and then look at a 700 – 800 level one.

Question 1: During an experiment, some water was removed from each of the 8 water
tanks. If the standard deviation of the volumes of water in the tanks at the beginning
of the experiment was 20 gallons, what was the standard deviation of the volumes of
water in the tanks at the end of the experiment?

Statement 1: For each tank, 40% of the volume of water that was in the tank at the
beginning of the experiment was removed during the experiment.

Statement 2: The average volume of water in the tanks at the end of the experiment was 80
gallons.

Solution:
We have 8 water tanks. This implies that we have 8 elements in the set (volume of water in
each of the 8 tanks). SD of the volume of water in the tanks is 20 gallons. We need to find
the new SD i.e. the SD after water was removed from the tanks.

Statement 1: For each tank, 40% of the volume of water that was in the tank at the
beginning of the experiment was removed during the experiment.

Initial SD is 20. When 40% of the water is removed from each tank, the leftover water is 60%
of the initial volume of water i.e. 0.6*initial volume of water. This means that each element of
the initial set was multiplied by 0.6 to obtain the new set. The SD will change. It will become
0.6*previous SD i.e. 0.6*20 = 12 (think of the formula of SD we discussed in the first SD
post). This statement alone is sufficient.
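
To see the scaling at work, here is a short Python sketch. The starting volumes are invented for the illustration (the question never lists them, and these do not have an SD of exactly 20); what matters is the ratio of the two SDs.

```python
from math import isclose
from statistics import pstdev

# Hypothetical starting volumes in gallons; not given in the question.
before = [60, 80, 100, 120, 140, 160, 180, 200]
after = [0.6 * v for v in before]       # 40% removed from every tank

ratio = pstdev(after) / pstdev(before)  # population SD, i.e. divide by n
print(isclose(ratio, 0.6))              # True
```

So whatever the initial volumes were, an initial SD of 20 becomes 0.6*20 = 12.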

Statement 2: The average volume of water in the tanks at the end of the experiment was 80
gallons.

The average volume doesn’t give us the SD of the new set. Hence, this statement alone is
not sufficient.

Answer (A)

Now that we are done with the easier one, let’s go on to the tougher one.

Question 2: M is a collection of four odd integers. The range of set M is 4. How many
distinct values can standard deviation of M take?

(A) 3
(B) 4
(C) 5
(D) 6
(E) 7

Solution:

Since the range of M is 4, the greatest difference between any two elements is 4.
One way of writing M is {1, x, y, 5} (obviously, there are innumerable ways of
writing M).

Here, x and y can each take one of 3 different values: 1, 3 and 5 (x and y cannot be less
than 1 or greater than 5 because the range of the set is 4).

Both x and y could be same. This can be done in 3 ways. Or x and y could be different. This
can be done in 3C2 = 3 ways. Total x and y can take values in 3 + 3 = 6 ways.

(Note here that the number of ways in which you can select x and y is not 3*3 = 9. Why?)

For clarification, let me enumerate the 6 ways in which you can get the desired set:
{1, 1, 1, 5}, {1, 3, 3, 5}, {1, 5, 5, 5}, {1, 1, 3, 5}, {1, 1, 5, 5}, {1, 3, 5, 5}

Note here that the standard deviations of {1, 1, 1, 5} and {1, 5, 5, 5} are the same. Why?
Because SD measures deviation from the mean. It has nothing to do with the actual value of
the mean or the actual values of the numbers.

Mean of {1, 1, 1, 5} is 2. Three of the numbers are distance 1 away from mean and one
number is distance 3 away from mean. Mean of {1, 5, 5, 5} is 4. Three of the numbers are
distance 1 away from mean and one number is distance 3 away from mean. Sum of the
squared deviations will be the same in both the cases and the number of elements is also
the same in both the cases. Therefore, both these sets will have the same SD.

Similarly, {1, 1, 3, 5} and {1, 3, 5, 5} will have the same SD.

From the leftover sets, {1, 3, 3, 5} will have a distinct SD and {1, 1, 5, 5} will have a distinct
SD.

In all, there are 4 different values that SD can take in such a case.

Note: It doesn’t matter what the actual numbers are. Since we have found 4 distinct values
for SD, we will always have 4 distinct values of SD for a set under the given constraints.
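
The count of 4 can be confirmed by enumeration. The Python sketch below uses exact fractions and compares variances rather than SDs; since SD is just the square root of the variance, distinct variances mean distinct SDs. The helper name is made up for this illustration.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def variance(values):
    """Population variance as an exact fraction (SD is its square root)."""
    m = Fraction(sum(values), len(values))
    return sum((v - m) ** 2 for v in values) / len(values)


# All collections of four values from {1, 3, 5} whose range is 4.
collections = [c for c in combinations_with_replacement((1, 3, 5), 4)
               if max(c) - min(c) == 4]
distinct = {variance(c) for c in collections}

print(len(collections))  # 6
print(len(distinct))     # 4
```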

Answer (B)

A Range of Questions

Let’s discuss the idea of “range” today. It is simply the difference between the smallest and
the greatest number in a set. Consider the following examples:

Range of {2, 6, 10, 25, 50} is 50 – 2 = 48

Range of {-20, 100, 80, 30, 600} is 600 – (-20) = 620

and so on…

That’s all the theory we have on the concept of range! So let’s jump on to some questions
now (therein lies the challenge)!

Question 1: Which of the following cannot be the range of a set consisting of 5 odd
multiples of 9?

(A) 72
(B) 144
(C) 288
(D) 324
(E) 436

Solution:

There are infinitely many possibilities regarding the multiples of 9 that can be included in the set.
The set could be any one of the following (or any one of the other infinite possibilities):

S = {9, 27, 45, 63, 81} or

S = {9, 63, 81, 99, 153} or

S = {99, 135, 153, 243, 1071}

The range in each case will be different. The question asks us for the option that ‘cannot’ be
the range. Let’s figure out the constraints on the range.

A set consisting of only odd multiples of 9 will have a range that is an even number (Odd
Number – Odd Number = Even Number).

Also, the range will be a multiple of 9 since both the smallest and the greatest numbers will
be multiples of 9. So their difference will also be a multiple of 9.

Only one option will not satisfy these constraints. Do you remember the divisibility rule of 9?
The sum of the digits of the number should be divisible by 9 for the number to be divisible by
9. The sum of the digits of 436 is 4 + 3 + 6 = 13 which is not divisible by 9. Hence 436
cannot be divisible by 9 and therefore, cannot be the range of the set.
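
The two constraints are easy to test against the options in Python. Note that the sketch only checks the necessary conditions discussed above (even, and a multiple of 9); the names are invented for this illustration.

```python
options = {"A": 72, "B": 144, "C": 288, "D": 324, "E": 436}


def could_be_range(value):
    """Necessary conditions from the discussion: even and a multiple of 9."""
    return value % 2 == 0 and value % 9 == 0


impossible = [label for label, value in options.items()
              if not could_be_range(value)]
print(impossible)                      # ['E']

# Divisibility rule of 9: the digit sum of 436 is not a multiple of 9.
print(sum(int(d) for d in "436"))      # 13
```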

Answer (E).

On to another one now:

Question 2: If the arithmetic mean of n consecutive odd integers is 20, what is the
greatest of the integers?

(1) The range of the n integers is 18.

(2) The least of the n integers is 11.

Solution: We have discussed mean in case of arithmetic progressions in the previous posts.
If mean of consecutive odd integers is 20, what do you think the integers will look like?

19, 21 or
17, 19, 21, 23 or
15, 17, 19, 21, 23, 25 or
13, 15, 17, 19, 21, 23, 25, 27 or
11, 13, 15, 17, 19, 21, 23, 25, 27, 29
etc.

Does it make sense that the required numbers will represent one such sequence? The
numbers in the sequence will be equally distributed around 20. Every time you add a
number to the left, you need to add one to the right to keep the mean 20. The smallest
sequence will have 2 numbers, 19 and 21, and there is no upper limit on the length. Did you
notice that each one of these sequences has a unique “range,” a unique “least number” and
a unique “greatest number?” So if you are given any one statistic of the sequence, you will
know the entire desired sequence.

Statement 1: Only one possible sequence: 11, 13, 15, 17, 19, 21, 23, 25, 27, 29 will have the
range 18. The greatest number here is 29. This statement alone is sufficient.

Statement 2: Only one possible sequence: 11, 13, 15, 17, 19, 21, 23, 25, 27, 29 will have 11
as the least number. The greatest number here is 29. This statement alone is sufficient too.

Answer (D).
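
The "one statistic pins down the sequence" idea can be sketched in Python. The helper odd_run_with_mean_20 is a hypothetical name of ours, and we only look at the first ten candidate sequences, which is enough to cover both statements.

```python
def odd_run_with_mean_20(pairs):
    """Return the 2 * pairs consecutive odd integers centred on 20."""
    return list(range(20 - (2 * pairs - 1), 20 + 2 * pairs, 2))

# First ten candidates: [19, 21], [17, 19, 21, 23], and so on.
candidates = [odd_run_with_mean_20(k) for k in range(1, 11)]

# Statement 1: the range is 18.  Statement 2: the least integer is 11.
by_range = [seq for seq in candidates if seq[-1] - seq[0] == 18]
by_least = [seq for seq in candidates if seq[0] == 11]

print(by_range[0][-1], by_least[0][-1])  # 29 29
```

Each filter leaves exactly one sequence, so either statement alone gives the greatest integer.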

Dealing with Standard Deviation II

In this post, we pick from where we left in the post above. Let’s discuss the last 3 cases first.

Question: Which set, S or T, has higher SD?


Case 5: S = {1, 3, 5} or T = {1, 3, 3, 5}

The standard deviation (SD) of T will be less than the SD of S. Why? The mean of 1, 3 and 5
is 3. If you add another 3 to the list, the mean stays the same and the sum of the squared
deviations is also the same but the number of elements increases. Hence, the SD
decreases.

Case 6: S = {6, 8, 10} or T = {12, 16, 20}

Put the numbers on the number line. You will see that the SD of T is greater than the SD of
S. When you multiply each element of a set by the same number greater than 1 (T is obtained by
multiplying each element of S by 2), the SD increases.

Case 7: S = {6, 8, 10} or T = {3, 4, 5}

Put the numbers on the number line. You will see that the SD of T is less than the SD of S.
When you divide each element of a set by the same number greater than 1 (T is obtained by
dividing each element of S by 2 OR you can say that S is obtained by multiplying each element
of T by 2), the SD decreases.
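
You will not need to calculate these on the test, but if you would like to confirm cases 5, 6 and 7, a short Python check is sketched below. It uses pstdev from the standard statistics module, which divides by the number of elements, the same convention as in this discussion.

```python
from statistics import pstdev

cases = {
    5: ([1, 3, 5], [1, 3, 3, 5]),
    6: ([6, 8, 10], [12, 16, 20]),
    7: ([6, 8, 10], [3, 4, 5]),
}

for case, (s, t) in cases.items():
    print(case, round(pstdev(s), 3), round(pstdev(t), 3))
# 5 1.633 1.414
# 6 1.633 3.266
# 7 1.633 0.816
```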

Now that we have an understanding of how SD behaves, let’s look at a question.

Question 1: A certain list of 300 test scores has an arithmetic mean of 75 and a
standard deviation of d, where d is positive. Which of the following two test scores,
when added to the list, must result in a list of 302 test scores with a standard
deviation less than d?

(A) 75 and 80
(B) 80 and 85
(C) 70 and 75
(D) 75 and 75
(E) 70 and 80

Solution: As discussed above, the standard deviation of a set measures the deviation from
the mean. A low standard deviation indicates that the data points are very close to the mean
whereas a high standard deviation indicates that the data points are spread far apart from
the mean.

When we add numbers that are far from the mean, we are stretching the set and hence,
increasing the SD. When we add numbers which are close to the mean, we are shrinking the
set and hence, decreasing the SD.

Therefore, adding two numbers which are closest to the mean will shrink the set the most,
thus decreasing SD by the greatest amount.

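Numbers closest to the mean are 75 and 75 (they are equal to the mean) and thus adding
them will decrease SD the most. In fact, adding two 75s leaves the mean and the sum of the
squared deviations unchanged while the number of elements rises from 300 to 302, so the SD
must fall whatever the value of d. Every other option includes a score that is 5 or 10 away
from the mean, which could increase the SD if d is small.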

Answer: D.
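
To see why the word "must" matters, here is a rough Python sketch. The list of 300 scores is entirely hypothetical (150 scores of 74 and 150 scores of 76, so the mean is 75 and d = 1); we chose a small d on purpose.

```python
from statistics import pstdev

# Hypothetical list: 300 scores with mean 75 and a small SD (d = 1).
scores = [74, 76] * 150
d = pstdev(scores)

pairs = {"A": (75, 80), "B": (80, 85), "C": (70, 75), "D": (75, 75), "E": (70, 80)}

# For each option, does adding the two scores give an SD below d?
lowers_sd = {label: pstdev(scores + list(pair)) < d for label, pair in pairs.items()}
print(lowers_sd)
# {'A': False, 'B': False, 'C': False, 'D': True, 'E': False}
```

With this tightly packed list, only option D lowers the SD. With a widely spread list, some of the other options could lower it too, which is why only D works in every case.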

Now that we have seen that difficult looking questions on SD can be quite simple, I want you
to think about something – when you add some new numbers to a set, how do you decide
whether SD increases or decreases? If you notice, we have seen two different cases (case 4
and case 5) – in one of them SD increases when you add two numbers to the set and in the
other, SD decreases. So how do you decide whether SD will increase or decrease? Say,
what happens in case S = {3, 4, 5, 6, 7} and T = {3, 4, 4, 5, 6, 6, 7}? Will SD increase or
decrease in this case? How do you decide the point at which the increase in the numerator
offsets the increase in the denominator?

Meanwhile, let’s look at one more question.

Question 2: If 100 is included in each of sets A, B and C (given A= {30, 50, 70, 90, 110},
B = {-20, -10, 0, 10, 20} and C= {30, 35, 40, 45, 50}), which of the following represents
the correct ordering (largest to smallest) of the sets in terms of the absolute increase
in their standard deviation?

(A) A, C, B
(B) A, B, C
(C) C, A, B
(D) B, A, C
(E) B, C, A

Solution: The question looks a little convoluted but actually you don’t have to calculate
anything. SD measures the deviation of the elements from the mean. If a new element is
added which is far away from the mean, it will add much more to the deviations than if it
were added close to the mean.
The means of A, B and C are 70, 0 and 40, respectively.
100 is farthest from 0 so it will change the SD of set B the most (in terms of absolute
increase). It is closest to 70 so it will change the SD of set A the least. Hence the correct
ordering is B, C, A.

Answer (E)
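
For the curious, the ordering can be verified with a few lines of Python. This is just a check of the reasoning, not something you would do on the test.

```python
from statistics import pstdev

sets = {
    "A": [30, 50, 70, 90, 110],
    "B": [-20, -10, 0, 10, 20],
    "C": [30, 35, 40, 45, 50],
}

# Change in SD when 100 is added to each set.
increase = {name: pstdev(values + [100]) - pstdev(values) for name, values in sets.items()}
ordering = sorted(increase, key=increase.get, reverse=True)
print(ordering)  # ['B', 'C', 'A']
```

Interestingly, the SD of set A dips very slightly when 100 is added, since 100 is only a little farther from the mean than the existing elements are on average. It still comes last in the ordering.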

3 Important Concepts for Statistics Questions on the GMAT

– Arithmetic mean is the number that can represent/replace all the numbers of the
sequence. It lies somewhere in between the smallest and the largest values.

– Median is the middle number (in case the total number of numbers is odd) or the average
of two middle numbers (in case the total number of numbers is even).

– Standard deviation is a measure of the dispersion of the values around the mean.

A conceptual question is how these three measures change when all the numbers of the set
are varied in a similar fashion.

For example, how does the mean of a set change when all the numbers are increased by
say, 10? How does the median change? And what about the standard deviation? What
happens when you multiply each element of a set by the same number?

Let’s discuss all these cases in detail but before we start, we would like to point out that the
discussion will be conceptual. We will not get into formulas though you can arrive at the
answer by manipulating the respective formulas.

When you talk about mean or median or standard deviation of a list of numbers, imagine the
numbers lying on the number line. They would be spread on the number line in a certain
way. For example,

——0—a———b—c———————d———e————————f—g———————
Case I:

When you add the same positive number (say x) to all the elements, the entire bunch of
numbers moves ahead together on the number line. The new numbers a’, b’, c’, d’, e’, f’ and
g’ would look like this

——0——————a’———b’—c’———————d’———e’————————f’—g’——————

The relative placement of the numbers does not change. They are still at the same distance
from each other. Note that the numbers have moved further to the right of 0 now to show
that they have moved ahead on the number line.

The mean lies somewhere in the middle of the bunch and will move forward by the added
number. Say, if the mean was d, the new mean will be d′ = d + x.

So when you add the same number to each element of a list,

New mean = Old mean + Added number.

On similar lines, the median is the middle number (d in this case) and will move ahead by
the added number. The new median will be d′ = d + x.

So when you add the same number to each element of a list,

New median = Old median + Added number

Standard deviation is a measure of dispersion of the numbers around the mean and this
dispersion does not change when the whole bunch moves ahead as it is. Standard deviation
does not depend on where the numbers lie on the number line. It depends on how far the
numbers are from the mean. So standard deviation of 3, 5, 7 and 9 is the same as the
standard deviation of 13, 15, 17 and 19. The relative placement of the numbers in both the
cases will be the same. Hence, if you add the same number to each element of a list,
the standard deviation will stay the same.
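
Case I can be checked with the 3, 5, 7, 9 example from the paragraph above. The sketch below uses Python's standard statistics module and adds 10 to every element.

```python
from statistics import mean, median, pstdev

original = [3, 5, 7, 9]
shifted = [value + 10 for value in original]  # [13, 15, 17, 19]

# Mean and median move by the added number; the SD does not move at all.
print(mean(original), median(original), pstdev(original))
print(mean(shifted), median(shifted), pstdev(shifted))
```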

Case II:

Let’s now move on to the discussion of multiplying each element by the same positive
number.

The original placing of the numbers on the number line looked like this:

——0—a———b—c———————d———e————————f—g———————

The new placing of the numbers on the number line will look something like this:

——0———a’——————b’———c’————————————d’—————————e’— etc.

The numbers spread out. To understand this, take an example. Say, the initial numbers
were 10, 20 and 30. If you multiply each number by 2, the new numbers are 20, 40 and 60.
The difference between them has increased from 10 to 20.

If you multiply each number by x, the mean also gets multiplied by x. So, if d was the mean
initially, d’ will be the new mean which is x∗d.

New mean = Old mean * Multiplied number

Similarly, the median will also get multiplied by x.

New median = Old median * Multiplied number

What happens to standard deviation in this case? It changes! If x is greater than 1, the
numbers are now further apart from the mean, so their dispersion increases and hence the
standard deviation also increases. (If x is between 0 and 1, the numbers bunch closer together
and the standard deviation decreases.) Either way, the new standard deviation will be x times
the old standard deviation. You can also establish this using the standard deviation formula.

New standard deviation = Old standard deviation * Multiplied number

The same concept is applicable when you increase each number by the same percentage. It
is akin to multiplying each element by the same number. Say, if you increase each number
by 20%, you are, in effect, multiplying each number by 1.2. So our case II applies here.
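
Here is a small Python sketch of Case II using the 10, 20, 30 example. The helper name scaled is ours, for illustration only. It multiplies every element by 2 and then by 1.2 (a 20% increase).

```python
from math import isclose
from statistics import mean, median, pstdev

original = [10, 20, 30]

def scaled(values, factor):
    """Multiply every element by the same positive factor."""
    return [factor * value for value in values]

doubled = scaled(original, 2)           # [20, 40, 60]
raised_20_pct = scaled(original, 1.2)   # a 20% increase on every element

# Mean, median and SD are all multiplied by the same factor.
for new_list, factor in ((doubled, 2), (raised_20_pct, 1.2)):
    print(factor, mean(new_list), median(new_list), pstdev(new_list))
```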

Now, think about what happens when you subtract/divide each element by the same
number.

Solving GMAT Standard Deviation Problems By Using as Little Math as Possible

The other night I taught our Statistics lesson, and when we got to the section of class that
deals with standard deviation, there was a familiar collective groan – not unlike the groan
one encounters when doing compound interest, or any mathematical concept that, when we
learned it in school, involved an intimidating-looking formula.

So, I think it’s time for me to coin an axiom: the more painful the traditional formula
associated with a given topic, the simpler the actual calculations will be on the GMAT.
(Please note, though the axiom is awaiting official mathematical verification by Veritas’ hard-
working team of data scientists, the anecdotal evidence in support of the axiom is
overwhelming.)

So, let’s talk standard deviation. If you’re like my students, your first thought is to start
assembling a list of increasingly frantic questions: Do we need to know that horrible formula I
learned in Stats class? (No.) Do we need to know the relationship between variance and
standard deviation? (You just need to know that there is a relationship, and that if you can
solve for one, you can solve for the other.) Etc.

So, rather than droning on about what we don’t need to know, let’s boil down what we do
need to know about standard deviation. The good news – it isn’t much. Just make sure
you’ve internalized the following:

* The standard deviation is a measure of the dispersion of the elements of the
set around the mean. The farther away the terms are from the mean, the larger
the standard deviation.
* If we were to increase or decrease each element of the set by “x,” the
standard deviation would remain unchanged.
* If we were to multiply each element of the set by a positive number “x,” the
standard deviation would also be multiplied by “x.”
* If the mean of a set is “m” and the standard deviation is “d,” then to say
that something is within 3 standard deviations of the mean is to say that it falls
within the interval of (m – 3d) to (m + 3d). And to say that something is
within 2 standard deviations of the mean is to say that it falls within the
interval of (m – 2d) to (m + 2d).

That’s basically it. Not anything to get too worked up about. So, let’s see some of these
principles in action to substantiate the claim that we won’t have to do too much arithmetical
grinding on these types of questions:

If d is the standard deviation of x, y, z, what is the standard deviation of x+5, y+5, z+5?
A) d
B) 3d
C) 15d
D) d+5
E) d+15

If our initial set is x, y, z, and our new set is x+5, y+5, and z+5, then we’re adding the same
value to each element of the set. We already know that adding the same value to each
element of the set does not change the standard deviation. Therefore, if the initial standard
deviation was d, the new standard deviation is also d. We’re done – the answer is A. (You
can see this with a simple example. If your initial set is {1, 2, 3} and your new set is {6, 7, 8}
the dispersion of the set clearly hasn’t changed.)

Surely the questions get harder than this, you say. They do, but if you know the
aforementioned core concepts, they’re all quite manageable. Here’s another one:

Some water was removed from each of 6 tanks. If the standard deviation of the volumes
of water at the beginning was 10 gallons, what was the standard deviation of the
volumes at the end?

1) For each tank, 30% of the water at the beginning was removed.

2) The average volume of water in the tanks at the end was 63 gallons.

We know the initial standard deviation. We want to know if it’s possible to determine the new
standard deviation after water is removed. To the statements we go!

Statement 1: If 30% of the water is removed from each tank, we know that each term in the
set is multiplied by the same value: 0.7. Well, if each term in a set is multiplied by 0.7, then
the standard deviation of the set is also multiplied by 0.7. If the initial standard deviation was
10 gallons, then the new standard deviation would be 10*(0.7) = 7 gallons. And we don’t
even need to do the math – it’s enough to see that it’s possible to calculate this number.
Therefore, Statement 1 alone is sufficient.

Statement 2: Knowing the average of a set is not going to tell us very much about the
dispersion of the set. To see why, imagine a simple case in which we have two tanks, and
the average volume of water in the tanks is 63 gallons. It’s possible that each tank has
exactly 63 gallons and, if so, the standard deviation would be 0, as everything would equal
the mean. It’s also possible to have one tank that had 126 gallons and another tank that was
empty, creating a standard deviation that would, of course, be significantly greater than 0.
So, simply knowing the average cannot possibly give us our standard deviation. Statement 2
alone is not sufficient to answer the question.
And the answer is A.
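
Both statements can be illustrated with a quick Python sketch. The six starting volumes are hypothetical numbers of our own; the question never gives them, and the point is that the ratio comes out the same whatever they are.

```python
from statistics import mean, pstdev

# Statement 1: hypothetical starting volumes; 30% is removed from each tank.
start = [40, 50, 60, 70, 80, 90]
end = [0.7 * volume for volume in start]
ratio = pstdev(end) / pstdev(start)  # about 0.7, whatever the volumes are

# Statement 2: the two-tank cases from the text, same mean, different SD.
equal_tanks = [63, 63]
uneven_tanks = [126, 0]
print(round(ratio, 3), pstdev(equal_tanks), pstdev(uneven_tanks))
```

So a known starting SD of 10 gallons becomes 7 gallons under Statement 1, while the mean in Statement 2 is compatible with very different SDs.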

Maybe at this point you’re itching for more of a challenge. Let’s look at a slightly tougher
one:

7.51; 8.22; 7.86; 8.36
8.09; 7.83; 8.30; 8.01
7.73; 8.25; 7.96; 8.53

A vending machine is designed to dispense 8 ounces of coffee into a cup. After a test
that recorded the number of ounces of coffee in each of 1000 cups dispensed by the
vending machine, the 12 listed amounts, in ounces, were selected from the data
above. If the 1000 recorded amounts have a mean of 8.1 ounces and a standard
deviation of 0.3 ounces, how many of the 12 listed amounts are within 1.5 standard
deviations of the mean?
A) Four
B) Six
C) Nine
D) Ten
E) Eleven

Okay, so the standard deviation is 0.3 ounces. We want the values that are within 1.5
standard deviations of the mean. 1.5 standard deviations would be (1.5)(0.3) = 0.45 ounces,
so we want all of the values that are within 0.45 ounces of the mean. If the mean is 8.1
ounces, this means that we want everything that falls between a lower bound of (8.1 – 0.45)
and an upper bound of (8.1 + 0.45). Put another way, we want the number of values that fall
between 8.1 – 0.45 = 7.65 and 8.1 + 0.45 = 8.55.

Looking at our 12 values, we can see that only one value, 7.51, falls outside of this range. If
we have 12 total values and only 1 falls outside the range, then the other 11 are clearly
within the range, so the answer is E.
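
The count is easy to confirm in Python. The variable names below are ours; the 12 amounts, the mean of 8.1 and the SD of 0.3 come straight from the question.

```python
amounts = [7.51, 8.22, 7.86, 8.36,
           8.09, 7.83, 8.30, 8.01,
           7.73, 8.25, 7.96, 8.53]

mean_ounces, sd_ounces = 8.1, 0.3

# Bounds of the interval "within 1.5 standard deviations of the mean".
low = mean_ounces - 1.5 * sd_ounces   # about 7.65
high = mean_ounces + 1.5 * sd_ounces  # about 8.55

within = [a for a in amounts if low <= a <= high]
outside = [a for a in amounts if not (low <= a <= high)]
print(len(within), outside)  # 11 [7.51]
```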
As you can see, there’s very little math involved, even on the more difficult questions.

Takeaway: remember the axiom that the more complex-looking the formula is for a
concept, the simpler the calculations are likely to be on the GMAT. An intuitive
understanding of a topic will always go a lot further on this test than any amount of
arithmetical virtuosity.

Dealing with Standard Deviation

In this post, we will work our way through the concepts of Standard Deviation (SD). Let’s
take a look at how you calculate standard deviation first:

SD = √[ ((A1 – Aavg)² + (A2 – Aavg)² + … + (An – Aavg)²) / n ]

Ai – The numbers in the list

Aavg – Arithmetic mean of the list

n – Number of numbers in the list

Say you have 3 numbers : 11, 13 and 15. Their standard deviation is the “square root of the
average of their squared deviations from the arithmetic mean.” Let’s see what we mean by
this.

Mean of 11, 13 and 15 is 13.

Focus on these words: “deviations from mean”

The important point to note is that SD is a measure of dispersion or deviation from the mean
(the mean is approximately the middle of the list if there are no outliers). In other words, SD
is a measure of whether the numbers are very far away from the mean or close together.
Since GMAT isn’t calculation intensive, you probably won’t need to calculate the actual SD
in the test. The calculations are shown here only to illustrate the concept. But you must have
a feel for how the numbers are distributed around the mean and what that implies for the
SD.
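
For the three numbers 11, 13 and 15 used above, the calculation can be sketched step by step in Python. This is only to make "the square root of the average of the squared deviations" concrete.

```python
import math

numbers = [11, 13, 15]

mean = sum(numbers) / len(numbers)            # 13.0
deviations = [x - mean for x in numbers]      # [-2.0, 0.0, 2.0]
squared = [dev ** 2 for dev in deviations]    # [4.0, 0.0, 4.0]
variance = sum(squared) / len(numbers)        # 8/3, about 2.67
sd = math.sqrt(variance)                      # about 1.63

print(round(sd, 2))  # 1.63
```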

Your statistics book explains how to visualize SD using the number line in detail, therefore, I
am not going to delve deep into it but will quickly recap so that we can move ahead. Recall
that if you plot the numbers on the number line, it gives you a sense of how far the numbers
are from the mean. The farther the numbers are from the mean, the higher the SD.

Let’s check out a few different cases to internalize the SD concept. Do not calculate anything
in these questions. Just look at the number line for each case and figure out whether it
makes sense to you.

Question: Which set, S or T, has higher SD?

Case 1: S = {3, 3, 3} or T = {0, 10, 20}

Case 2: S = {3, 4, 5} or T = {5, 6, 7}

Case 3: S = {3, 4, 5, 6} or T = {2, 3, 4, 5, 6, 7}

Case 4: S = {1, 3, 5} or T = {1, 1, 3, 5, 5}

Case 5: S = {1, 3, 5} or T = {1, 3, 3, 5}

Case 6: S = {6, 8, 10} or T = {12, 16, 20}


Case 7: S = {6, 8, 10} or T = {3, 4, 5}

Let me represent the first four cases on the number line. Check them out and then think
which set should have the higher SD.

Let’s discuss each of these four cases now.

Case 1: S = {3, 3, 3} or T = {0, 10, 20}

T has higher SD. We will obtain the SD of T by calculating as shown in the example above.
But we don’t really need to calculate it because we see that for set S, SD = 0. Each number
is at the mean and hence has 0 deviation from the mean. Since SD cannot be negative,
whatever the SD of T, it will be higher than the SD of S which is 0.

Case 2: S = {3, 4, 5} or T = {5, 6, 7}

Both sets have the same SD. We can see from the number line that they are equally
dispersed around their respective means.

Case 3: S = {3, 4, 5, 6} or T = {2, 3, 4, 5, 6, 7}

Set T has higher SD. T has two extra numbers which are farther from the mean. Hence
these 2 numbers will add to the total deviation. (There is a caveat here which we will discuss
next week.)

Case 4: S = {1, 3, 5} or T = {1, 1, 3, 5, 5}

T has higher SD. It has two extra numbers far from the mean. (There is a caveat here too!)
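
Once you have reasoned through the four cases on the number line, you can confirm the verdicts with a short Python sketch. The compare helper is a made-up name of ours; pstdev is the population standard deviation from the standard library.

```python
from statistics import pstdev

cases = {
    1: ([3, 3, 3], [0, 10, 20]),
    2: ([3, 4, 5], [5, 6, 7]),
    3: ([3, 4, 5, 6], [2, 3, 4, 5, 6, 7]),
    4: ([1, 3, 5], [1, 1, 3, 5, 5]),
}

def compare(s, t):
    """Say which set has the higher SD, or 'same' if they match."""
    sd_s, sd_t = pstdev(s), pstdev(t)
    if abs(sd_s - sd_t) < 1e-12:
        return "same"
    return "T" if sd_t > sd_s else "S"

verdicts = {case: compare(s, t) for case, (s, t) in cases.items()}
print(verdicts)  # {1: 'T', 2: 'same', 3: 'T', 4: 'T'}
```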

What do you think about cases 5, 6, and 7? I will give you the answers to these three cases
in the next post!

How to Quickly Solve Standard Deviation Questions on the GMAT

The quantitative section of the GMAT is designed to test your understanding and application
of concepts you learned in high school. The exam focuses on core mathematical concepts
such as algebra, geometry and statistics. However, some concepts are more ingrained in
the high school curriculum than others. Everyone’s done addition, multiplication, subtraction
and division, but sometimes figuring out factorials or square roots may be a little more
unusual.

Perhaps no concept perplexes students on the GMAT more than the standard deviation. The
standard deviation (often represented by σ) is a measure of dispersion around the mean. It
indicates how close the numbers in a set are to the set’s average. As a simple example, the
sets {5, 10, 15} and {8, 10, 12} both have the same mean (10); however, they do not have
the same standard deviation.

Knowing how to calculate the standard deviation is not required on the GMAT, but knowing
how it’s calculated gives you a tremendous edge in answering questions. It’s a four-step
process:

1) Find the average (mean) of the set.

2) Find the differences between each element of the set and that average.

3) Square all the differences and take the average of the squared differences. This
gives you the variance.

4) Take the square root of the variance.

In this example, the average of the first set is clearly 10. The differences between the three
elements and the average are (-5, 0 and 5). Taking the square of these numbers, we get
(25, 0 and 25). The average of these numbers is 50/3 or 16.67. The square root of this number
will not be an integer, but it will be very close to 4. So we can estimate it as ~4 or ~4.1.

In contrast, the second set of numbers will have a much smaller standard deviation. The
average is still 10, but the differences are now (-2, 0 and 2). Taking the square of these
numbers, we get (4, 0 and 4). The average of these numbers is 8/3 or 2.67. The square root
of 2.67 is roughly ~1.6 or ~1.7, but it’s very hard to pin down without a calculator or a lot of
extra time.
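
The four steps translate almost line for line into Python. The function name population_sd is our own, chosen for illustration; it follows the steps listed above and divides by the number of elements.

```python
import math

def population_sd(values):
    # Step 1: find the average (mean) of the set.
    average = sum(values) / len(values)
    # Step 2: find the difference between each element and that average.
    differences = [value - average for value in values]
    # Step 3: square the differences and average them to get the variance.
    variance = sum(diff ** 2 for diff in differences) / len(values)
    # Step 4: take the square root of the variance.
    return math.sqrt(variance)

wide = population_sd([5, 10, 15])    # about 4.08
narrow = population_sd([8, 10, 12])  # about 1.63
print(round(wide, 2), round(narrow, 2))  # 4.08 1.63
```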

This example should help highlight why the standard deviation is not explicitly calculated on
an exam without a calculator: the chances of it being an integer are relatively low. However,
the concept it represents and the idea behind it are fair game on the test. One of the simple
takeaways from the math behind the process is that the farther a number is from the
mean of the set, the more it will increase the standard deviation. Specifically, each number’s
contribution grows with the square of its distance from the mean, so 5 counts for much more than 2.

This kind of concept can be tested on the exam, but if you know what you’re looking for, you
can answer standard deviation questions very quickly. Let’s look at an example:

For the set {2, 2, 3, 3, 4, 4, 5, 5, x}, which of the following values of x will most
increase the standard deviation?

(A) 1
(B) 2
(C) 3
(D) 4
(E) 5

If you recall the steps to calculating the standard deviation, what we really need to do first is
to calculate the mean. (i.e. how mean are you?) You can add the eight elements together
and divide by eight, but the fact that these elements follow a fairly obvious pattern helps us
as well. The numbers each appear twice, and they are evenly spaced. This means that the
average will be the same as the median, and the median is 3.5. Even if you take the long
way, it shouldn’t take you more than 20 seconds to find that the mean of this set is 3.5.

The next step would be to take each element and find the difference from the mean, but that is
only necessary if the goal is to actually calculate the standard deviation. All we’re being
tasked to do here is to determine which number will increase the standard deviation the
most. In this regard, all we need to do is figure out which answer choice is furthest from the
mean. That number will produce the biggest distance, which will then be squared and in turn
produce the biggest difference in standard deviation. So although you can spend a lot of
time calculating every last detail of this question, what it actually comes down to is “which of
these numbers is furthest from 3.5”.

Asking about distance from a specific number is much more straightforward, and probably
an elementary school level question. Yet, if you understand the concept, you can turn a
GMAT question into something a 5th grader could answer (Are you smarter than a 5th
grader?). The answer is thus obviously choice A, as 1 is as far from 3.5 as possible given
only these five choices.
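
If you do want to see the numbers, here is a brute-force Python sketch that tries each answer choice in turn. It is the long way around, shown only to confirm the shortcut.

```python
from statistics import pstdev

base = [2, 2, 3, 3, 4, 4, 5, 5]

# SD of the nine-element set for each candidate value of x.
sd_by_x = {x: pstdev(base + [x]) for x in (1, 2, 3, 4, 5)}
best = max(sd_by_x, key=sd_by_x.get)

print({x: round(sd, 3) for x, sd in sd_by_x.items()})
print(best)  # 1
```

Note that x = 2 and x = 5 give the same SD, since both are 1.5 away from the mean of 3.5.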
The important thing about the standard deviation is that you will never have to formally
calculate it, but understanding the underlying concept will help you excel at the quantitative
section of the GMAT. Most standard deviation questions hinge primarily on the distance from
the mean, as everything else is just a rote division or addition. Much like taking five practice
exams and getting wildly different scores, having a high variance is bad for knowing what to
expect. Understanding the way standard deviations are tested on the GMAT will help you
consistently get the questions right and reduce the variance of your results (hopefully with a
very high mean).

A 750+ Level Question on SD

Above, we looked at a 750+ level question on mean, median and range concepts of
Statistics. Here we have a 750+ level question on standard deviation concept of Statistics.
We do hope you enjoy checking it out.

Question: Given that set S has four odd integers and their range is 4, how
many distinct values can the standard deviation of S take?

(A) 3
(B) 4
(C) 5
(D) 6
(E) 7

Solution: Recall what standard deviation is. It measures the dispersion of all the elements
from the mean. It doesn’t matter what the actual elements are and what the arithmetic mean
is – the standard deviation of set {1, 3, 5} will be the same as the standard deviation of set
{6, 8, 10} since in each set there are 3 elements such that one is at mean, one is 2 below the
mean and one is 2 above the mean. So when we calculate the standard deviation, it will give
us exactly the same value for both sets. Similarly, standard deviation of set {1, 3, 3, 5, 6} will
be the same as standard deviation of {10, 12, 12, 14, 15} and so on. But note that the
standard deviation of set {25, 27, 29, 29, 30} will be different because it represents a
different arrangement on the number line.
Let’s look at the given question now.

Set S has four odd integers such that their range is 4. So it could look something like this {1,
x, y, 5} when the elements are arranged in ascending order. Note that we have taken just
one example of what set S could look like. There are innumerable other ways of
representing it such as {3, x, y, 7} or {11, x, y, 15} etc.

Now in our example, x and y can take 3 different values: 1, 3 or 5

x and y could be same or different but x would always be smaller than or equal to y.

- If x and y were same, we could select the values of x and y in 3 different ways: both could
be 1; both could be 3; both could be 5

- If x and y were different, we could select the values of x and y in 3C2 ways: x could be 1
and y could be 3; x could be 1 and y could be 5; x could be 3 and y could be 5.

For clarification, let’s enumerate the different ways in which we can write set S:

{1, 1, 1, 5}, {1, 3, 3, 5}, {1, 5, 5, 5}, {1, 1, 3, 5}, {1, 1, 5, 5}, {1, 3, 5, 5}

These are the 6 ways in which we can choose the numbers in our example.

Will all of them have unique standard deviations? Do all of them represent different
distributions on the number line? Actually, no!

Standard deviations of {1, 1, 1, 5} and {1, 5, 5, 5} are the same. Why?

Standard deviation measures distance from mean. It has nothing to do with the actual value
of mean and actual value of numbers. Note that the distribution of numbers on the number
line is the same in both cases. The two sets are just mirror images of each other.

For the set {1, 1, 1, 5}, mean is 2. Three of the numbers are distance 1 away from mean and
one number is distance 3 away from mean.

For the set {1, 5, 5, 5}, mean is 4. Three of the numbers are distance 1 away from mean and
one number is distance 3 away from mean.

The deviations in both cases are the same -> 1, 1, 1 and 3. So when we square the
deviations, add them up, divide by 4 and then find the square root, the figure we will get will
be the same.

Similarly, {1, 1, 3, 5} and {1, 3, 5, 5} will have the same SD. Again, they are mirror images of
each other on the number line.

The remaining two sets, {1, 3, 3, 5} and {1, 1, 5, 5}, will have distinct standard deviations
since their distributions on the number line are unique.

In all, there are 4 different values that standard deviation can take in such a case.

Answer (B)
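
The counting argument can be verified with a short Python sketch. We work with exact variances (using Fraction) rather than standard deviations, since two sets have the same SD exactly when they have the same variance. The variance helper is our own.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def variance(values):
    """Exact population variance: average of squared deviations from the mean."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n

# All sets {1, x, y, 5} with x <= y and x, y chosen from 1, 3, 5.
sets = [(1, x, y, 5) for x, y in combinations_with_replacement((1, 3, 5), 2)]
distinct = {variance(s) for s in sets}

print(len(sets), len(distinct))  # 6 4
```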

Using the Standard Deviation Formula on the GMAT


We have discussed standard deviation (SD) in detail above. We know what the formula is for
finding the standard deviation of a set of numbers, but we also know that GMAT will not ask
us to actually calculate the standard deviation because the calculations involved would be
way too cumbersome. It is still a good idea to know this formula, though, as it will help us
compare standard deviations across various sets – a concept we should know well.

Today, we will look at some GMAT questions that involve sets with similar standard
deviations such that it is hard to tell which will have a higher SD without properly
understanding the way it is calculated. Take a look at the following question:

Which of the following distributions of numbers has the greatest standard deviation?

(A) {-3, 1, 2}
(B) {-2, -1, 1, 2}
(C) {3, 5, 7}
(D) {-1, 2, 3, 4}
(E) {0, 2, 4}

At first glance, these sets all look very similar. If we try to plot them on a number line, we will
see that they also have similar distributions, so it is hard to say which will have a higher SD
than the others. Let’s quickly review their deviations from the arithmetic means:

For answer choice A, the mean = 0 and the deviations are 3, 1, 2
For answer choice B, the mean = 0 and the deviations are 2, 1, 1, 2
For answer choice C, the mean = 5 and the deviations are 2, 0, 2
For answer choice D, the mean = 2 and the deviations are 3, 0, 1, 2
For answer choice E, the mean = 2 and the deviations are 2, 0, 2

We don’t need to worry about the arithmetic means (they just help us calculate the deviation
of each element from the mean); our focus should be on the deviations. The SD formula
squares the individual deviations and then adds them, then the sum is divided by the
number of elements and finally, we find the square root of the whole term. So if a deviation is
greater, its square will be even greater and that will increase the SD.

If the deviation increases and the number of elements increases, too, then we cannot be
sure what the final effect will be – an increased deviation increases the SD but an increase
in the number of elements increases the denominator and hence, actually decreases the
SD. The overall effect as to whether the SD increases or decreases will vary from case to
case.

First, we should note that answers C and E have identical deviations and numbers of
elements, hence, their SDs will be identical. This means the answer is certainly not C or E,
since Problem Solving questions have a single correct answer.

Let’s move on to the other three options:

For answer choice A, the mean = 0 and the deviations are 3, 1, 2
For answer choice B, the mean = 0 and the deviations are 2, 1, 1, 2
For answer choice D, the mean = 2 and the deviations are 3, 0, 1, 2

Comparing answer choices A and D, we see that they have the same non-zero deviations (3, 1
and 2), but D has one more element, with a deviation of 0. This means the sum of the squared
deviations is the same while D’s denominator is greater, and therefore, the SD of
answer D is smaller than the SD of answer A. This leaves us with options A and B:

For answer choice A, the mean = 0 and the deviations are 3, 1, 2
For answer choice B, the mean = 0 and the deviations are 2, 1, 1, 2

Now notice that A and B share the deviations 1 and 2, but A’s remaining deviation is 3,
whereas B’s remaining two deviations are only 1 and 2. So A has a larger sum of squared
deviations (9 + 1 + 4 = 14 versus 4 + 1 + 1 + 4 = 10) and fewer elements to divide it by.
This means the SD of A will be higher than the SD of B, so the SD of A will be the highest.
Hence, our answer must be A.
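
As a sanity check on the elimination above, the five SDs can be computed directly in Python (not something to attempt on test day):

```python
from statistics import pstdev

options = {
    "A": [-3, 1, 2],
    "B": [-2, -1, 1, 2],
    "C": [3, 5, 7],
    "D": [-1, 2, 3, 4],
    "E": [0, 2, 4],
}

sds = {label: pstdev(values) for label, values in options.items()}
greatest = max(sds, key=sds.get)

print({label: round(sd, 3) for label, sd in sds.items()})
# {'A': 2.16, 'B': 1.581, 'C': 1.633, 'D': 1.871, 'E': 1.633}
print(greatest)  # A
```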
Let’s try another one:

Which of the following data sets has the third largest standard deviation?

(A) {1, 2, 3, 4, 5}
(B) {2, 3, 3, 3, 4}
(C) {2, 2, 2, 4, 5}
(D) {0, 2, 3, 4, 6}
(E) {-1, 1, 3, 5, 7}

How would you answer this question without calculating the SDs? We need to arrange the
sets in increasing SD order. Upon careful examination, you will see that the number of
elements in each set is the same, and the mean of each set is 3.

Deviations of answer choice A: 2, 1, 0, 1, 2
Deviations of answer choice B: 1, 0, 0, 0, 1 (lowest SD)
Deviations of answer choice C: 1, 1, 1, 1, 2
Deviations of answer choice D: 3, 1, 0, 1, 3
Deviations of answer choice E: 4, 2, 0, 2, 4 (highest SD)

Obviously, option B has the lowest SD (the deviations are the smallest) and option E has the
highest SD (the deviations are the greatest). This means we can automatically rule these
answers out, as they cannot have the third largest SD.

Deviations of answer choice A: 2, 1, 0, 1, 2
Deviations of answer choice C: 1, 1, 1, 1, 2
Deviations of answer choice D: 3, 1, 0, 1, 3

Out of these options, answer choice D has a higher SD than answer choice A, since it has
higher deviations of two 3s (whereas A has deviations of two 2s). Also, C is more tightly
packed than A, with four deviations of 1. If you are not sure why, consider this:

The sum of the squared deviations for C will be 1 + 1 + 1 + 1 + 4 = 8
The sum of the squared deviations for A will be 4 + 1 + 0 + 1 + 4 = 10

So, A will have a higher SD than C but a lower SD than D. Arranging from lowest to highest
SDs, we get: B, C, A, D, E. Answer choice A has the third highest SD, and therefore, A is
our answer.
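
The full ranking can be confirmed with a short Python sketch that sorts the five sets by standard deviation, largest first:

```python
from statistics import pstdev

data_sets = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 3, 3, 3, 4],
    "C": [2, 2, 2, 4, 5],
    "D": [0, 2, 3, 4, 6],
    "E": [-1, 1, 3, 5, 7],
}

# Sort the labels from largest SD to smallest.
ranked = sorted(data_sets, key=lambda label: pstdev(data_sets[label]), reverse=True)
third_largest = ranked[2]

print(ranked, third_largest)  # ['E', 'D', 'A', 'C', 'B'] A
```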
