Math 133 - Unit 10 Summary Statistics

Uploaded by

Astro Hajeer

Math 133 – Engineering Mathematics 1

Unit 10 – Summary Statistics

10.1 Frequency Distribution Table and its Histogram

Let us consider a set of grades from a sample of 150 Math 133 students:

MATH 133 Grades (Raw Data)

What do we see from the above data? All we can tell is that some students scored 83.8, some scored 52.3,
some 62.6, and so on. Presented this way, the grades are just a clutter of numbers.

Just as we tidy a cluttered bedroom, we need to organise the data we have collected before we can make
sense of it.

Organising Data into a Distribution

A distribution summarises the organisation of raw data into

• What values each group or class takes, and
• How often each class occurs (i.e. the frequency of each class).

We can summarise the results of the grades of students into a Frequency Distribution as follows. First of
all, we need to know the following:

How to draw a frequency distribution for Quantitative Data: Continuous Case

• This distribution is defined as a Grouped Frequency Distribution.
• Categories are defined as intervals of numbers called classes or groups.
• Classes (or groups) must not overlap.
• Classes (or groups) of equal width are preferred.
Step 1: Determine the minimum (Min) and the maximum (Max) values in the data set.

It is always a good idea to look through the grades and see what the minimum observation is and what
the maximum observation is in the data set.

In the above data set, we see that the minimum grade is

𝑀𝑖𝑛 = 34.6
and the maximum grade is
𝑀𝑎𝑥 = 97.5

Step 2: One good way to define the classes would be as follows:

• 30 to <40 – Grades that are from 30 to less than 40 are included in this class.
• 40 to <50 – Grades that are from 40 to less than 50 are included in this class.
• 50 to <60 – Grades that are from 50 to less than 60 are included in this class.
• 60 to <70 – Grades that are from 60 to less than 70 are included in this class.
• 70 to <80 – Grades that are from 70 to less than 80 are included in this class.
• 80 to <90 – Grades that are from 80 to less than 90 are included in this class.
• 90 to 100 – Grades that are from 90 to 100 are included in this class.
Each class has a width of 10 grades.
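
The class definitions in Step 2 can be sketched as a small lookup, shown here only as an illustration. The function name `class_label` and the constants are assumptions introduced for this sketch; the one rule taken from the list above is that every class excludes its upper limit except the last class, which includes 100.

```python
# Lower class limits from Step 2; every class has a width of 10 grades.
LOWER_LIMITS = [30, 40, 50, 60, 70, 80, 90]
WIDTH = 10


def class_label(grade):
    """Return the label of the Step 2 class that a grade falls into.

    `class_label` is a hypothetical helper written for this sketch.
    """
    if not 30 <= grade <= 100:
        raise ValueError("grade lies outside the classes 30 to 100")
    for lower in LOWER_LIMITS:
        upper = lower + WIDTH
        is_last = lower == LOWER_LIMITS[-1]
        # Each class excludes its upper limit, except the last one (90 to 100).
        if lower <= grade < upper or (is_last and grade == upper):
            return f"{lower} to {upper}" if is_last else f"{lower} to <{upper}"


print(class_label(34.6))  # the minimum grade from Step 1
print(class_label(97.5))  # the maximum grade from Step 1
```

Because the classes do not overlap, each grade receives exactly one label, which is what makes the tally in the next step unambiguous.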

Step 3: We generate the Frequency Distribution Table.

Class Limits  Tally                                               Frequency  Relative Frequency  Cumulative Relative Frequency
30 – <40      /                                                   1          0.0067              0.0067
40 – <50      ///// ////                                          9          0.0600              0.0667
50 – <60      ///// ///// ///// ///// ///// ///// ////            34         0.2267              0.2933
60 – <70      ///// ///// ///// ///// ///// ////                  29         0.1933              0.4867
70 – <80      ///// ///// ///// ///// ///// ///// ///// ///// //  42         0.2800              0.7667
80 – <90      ///// ///// ///// ///// ///// ///                   28         0.1867              0.9533
90 – 100      ///// //                                            7          0.0467              1
Total                                                             150        1.0001              100%

Note: The total for the Relative Frequency should be exactly 1. Sometimes, we may be off slightly because
of rounding errors.
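
As a rough check of the table and of the rounding note, the relative and cumulative relative frequencies can be recomputed from the class frequencies. This is a minimal sketch: the frequencies are copied from the table, while the variable names are assumptions made for the illustration.

```python
# Class frequencies copied from the Frequency Distribution Table.
frequencies = {
    "30 - <40": 1,
    "40 - <50": 9,
    "50 - <60": 34,
    "60 - <70": 29,
    "70 - <80": 42,
    "80 - <90": 28,
    "90 - 100": 7,
}

total = sum(frequencies.values())

relative = {}    # frequency / total, rounded to 4 decimal places
cumulative = {}  # running total / total, rounded to 4 decimal places
running = 0
for label, count in frequencies.items():
    running += count
    relative[label] = round(count / total, 4)
    cumulative[label] = round(running / total, 4)

for label in frequencies:
    print(f"{label:>9}  {frequencies[label]:>3}  "
          f"{relative[label]:.4f}  {cumulative[label]:.4f}")

# Adding the ROUNDED relative frequencies overshoots 1 slightly.
rounded_sum = round(sum(relative.values()), 4)
print(total, rounded_sum)
```

The cumulative column is computed from the running count rather than by adding rounded relative frequencies, which is why it ends at exactly 1 even though the rounded relative frequencies add up to 1.0001.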
Step 4: We draw the Histogram.

[Figure: histogram titled "Math 133 - Frequency vs Grades". Horizontal axis: Grades, with classes 30 - <40, 40 - <50, 50 - <60, 60 - <70, 70 - <80, 80 - <90 and 90 - 100. Vertical axis: Frequency, from 0 to 45.]

This is my favourite way of depicting the data. Its advantage is that the class intervals can be read directly
from the graph, so no time is wasted deciphering any “hidden” clues.

10.2 Shapes of Distributions:

Four basic distribution shapes:

1. Symmetric – Three types of symmetric shapes:
i. Bell-shaped (or Mound-shaped)
ii. U-shaped
iii. Uniform
2. Skewed Right
3. Skewed Left
4. Irregular
Examples:

• Symmetric bell-shaped
[Figure: histogram of Frequency against Value of Variable]

• Symmetric U-shaped
[Figure: histogram of Frequency against Value of Variable]

• Symmetric Uniform
[Figure: histogram of Frequency against Value of the Variable]

• Skewed Right – Tail to the right.
[Figure: histogram of Frequency against Value of Variable]

• Skewed Left – Tail to the left.
[Figure: histogram of Frequency against Value of Variable]

10.3 Outlier

• An outlier is an observation that lies outside the overall pattern of a distribution.

• A large gap in the distribution is typically a sign of an outlier.

Example:

Question: Are Alaska and Florida outliers because of human error during the collection of data?
Thought: Maybe not.
• Alaska is too cold so not many seniors may want to live there.
• Florida is warm so many seniors may want to live there.

What happens when you get outliers?

• If you have the resources to collect or verify the data again, do so. It is frowned upon to reject outliers
without verification.
• If not, use robust statistical methods, whose results are not affected too much by outliers.

So if possible, go to Alaska and Florida and verify the percentage of seniors in these two states.
10.4 Dot Plots

A dot plot is a graph in which each observation is represented by a dot placed over its numerical value; a
value that is observed several times gets one dot for each occurrence.

Example: Thirty-nine participants took part in a fishing competition. Each participant is represented by a
dot placed over the number of fish caught by that participant.

[Figure: dot plot; horizontal axis: Number of fish caught]

We can see that the above distribution can be considered as right skewed since the tail is somewhat to
the right.

10.5 Describing Data with Numerical Measures

Symbols: Size of Population, N

Size of Sample, n

10.5.1 The Mean (or Average)

Symbols: Population mean, 𝜇

Sample mean, 𝑥̅

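Formula: $\text{Mean} = \dfrac{\text{Sum of observations}}{\text{Number of observations}}$

Let $x_i$ denote the $i$th observation in the data set. Thus,

• Population mean, $\mu = \dfrac{\sum x_i}{N} = \dfrac{x_1 + x_2 + x_3 + \cdots + x_N}{N}$

• Sample mean, $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$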

Note: Sometimes, we just simplify the notation ∑ 𝑥𝑖 to simply ∑ 𝑥.

Example: We are given the heights in inches of a simple random sample of 25 women:
58.2, 59.5, 60.7, 60.9, 61.9, 61.9, 62.2, 62.2, 62.4, 62.9, 63.1, 63.9, 63.9, 64.0, 64.5, 64.1, 64.8, 65.2, 65.7,
66.2, 66.7, 67.1, 67.8, 68.9 and 69.6

∴ The sample mean height of the 25 women is

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{58.2 + 59.5 + 60.7 + \cdots + 69.6}{25} = \dfrac{1598.3}{25} = 63.932$ in
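
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the heights are copied from the example, and the variable names are assumptions.

```python
# Heights in inches of the 25 women, copied from the example.
heights = [58.2, 59.5, 60.7, 60.9, 61.9, 61.9, 62.2, 62.2, 62.4, 62.9,
           63.1, 63.9, 63.9, 64.0, 64.5, 64.1, 64.8, 65.2, 65.7, 66.2,
           66.7, 67.1, 67.8, 68.9, 69.6]

n = len(heights)                 # number of observations
sample_mean = sum(heights) / n   # sum of observations / number of observations

print(n, round(sum(heights), 1), round(sample_mean, 3))
```

The results are rounded for printing only because decimal values such as 58.2 are not stored exactly in binary floating point.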

10.5.2 The Median

Symbol: M

Note: The first step in determining the median is to rearrange the data in ascending order, i.e. from
smallest to largest. The median is the middle-most value when the data values are arranged in ascending
order.

Location Formula: $\text{Position of median} = \dfrac{n+1}{2}$

Note: The above formula is NOT for the value of the median. It is for the position of the median.

Example 1: Let us consider the heights in inches of the 25 women in the previous example.
58.2, 59.5, 60.7, 60.9, 61.9, 61.9, 62.2, 62.2, 62.4, 62.9, 63.1, 63.9, 63.9, 64.0, 64.5, 64.1, 64.8, 65.2, 65.7,
66.2, 66.7, 67.1, 67.8, 68.9 and 69.6
What is the median of this data set?

Solution:
To find the median, the first step is to rearrange the data in ascending order, i.e. from smallest to largest.

Position 1 2 3 4 5 6 7 8 9 10 11 12 13
Height 58.2 59.5 60.7 60.9 61.9 61.9 62.2 62.2 62.4 62.9 63.1 63.9 63.9

Position 14 15 16 17 18 19 20 21 22 23 24 25
Height 64.0 64.1 64.5 64.8 65.2 65.7 66.2 66.7 67.1 67.8 68.9 69.6

The sample size is 𝑛 = 25 observations.

∴ The median is located at the $\dfrac{n+1}{2} = \dfrac{25+1}{2} = \dfrac{26}{2} = 13$th position.

And the value at the 13th position, i.e. the median is 𝑀 = 63.9 inches.
Example 2: A student reported her 10 grades,
77, 86, 58, 67, 75, 77, 71, 65, 77 and 92
What is the median of this data set?

Solution:
The first step is to rearrange the data in ascending order:

Position 1 2 3 4 5 6 7 8 9 10
Grade 58 65 67 71 75 77 77 77 86 92

The sample size is $n = 10$ observations.

∴ The median is located at the $\dfrac{n+1}{2} = \dfrac{10+1}{2} = \dfrac{11}{2} = 5\tfrac{1}{2}$th position.

Therefore, we take the value at the $5\tfrac{1}{2}$th position to be exactly halfway between the value at the 5th
position (75) and the value at the 6th position (77), i.e. the midpoint of 75 and 77.

Thus, the median of the above data set is $M = \dfrac{75+77}{2} = \dfrac{152}{2} = 76$
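
Both median examples follow the same recipe: sort, find the position (n + 1)/2, then read off the value or the midpoint. A minimal sketch of that recipe is given below; the function name `median_by_position` is an assumption made for this illustration.

```python
def median_by_position(values):
    """Median via the location formula: position = (n + 1) / 2 (1-based)."""
    if not values:
        raise ValueError("the data set is empty")
    ordered = sorted(values)      # step 1: ascending order
    n = len(ordered)
    position = (n + 1) / 2        # a position, NOT the value of the median
    lower = int(position)         # whole part of the position
    if position == lower:         # odd n: the position is a whole number
        return ordered[lower - 1]
    # Even n: the position ends in 1/2, so take the midpoint of its neighbours.
    return (ordered[lower - 1] + ordered[lower]) / 2


heights = [58.2, 59.5, 60.7, 60.9, 61.9, 61.9, 62.2, 62.2, 62.4, 62.9,
           63.1, 63.9, 63.9, 64.0, 64.5, 64.1, 64.8, 65.2, 65.7, 66.2,
           66.7, 67.1, 67.8, 68.9, 69.6]
grades = [77, 86, 58, 67, 75, 77, 71, 65, 77, 92]

print(median_by_position(heights))  # value at the 13th position
print(median_by_position(grades))   # midpoint of the 5th and 6th values
```

The subtraction of 1 when indexing is needed only because Python lists count from 0, whereas the positions in the tables count from 1.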

10.5.3 The Mode

Symbol: m

The mode of a variable is the value that occurs most often in the data set.
It is a good idea to rearrange the data in order so that we can look for values that occur most often.

Example 1: Let us look at the 10 grades shown in the previous example:

77, 86, 58, 67, 75, 77, 71, 65, 77 and 92
What is the mode of the data set?

Solution:
The first step is to rearrange the data in order:

Position 1 2 3 4 5 6 7 8 9 10
Grade 58 65 67 71 75 77 77 77 86 92

We can see that the value that occurs most often is 77, which occurs three times.

Hence, the mode of the data set is 𝑚 = 77

It is possible to have more than one mode in some data sets.

Example 2: Let us consider the heights in inches of the 25 women in a previous example.
58.2, 59.5, 60.7, 60.9, 61.9, 61.9, 62.2, 62.2, 62.4, 62.9, 63.1, 63.9, 63.9, 64.0, 64.5, 64.1, 64.8, 65.2, 65.7,
66.2, 66.7, 67.1, 67.8, 68.9 and 69.6
What is the mode of this data set?

Solution: The first step is to rearrange the data set in order:

Position 1 2 3 4 5 6 7 8 9 10 11 12 13
Height 58.2 59.5 60.7 60.9 61.9 61.9 62.2 62.2 62.4 62.9 63.1 63.9 63.9

(The values 61.9, 62.2 and 63.9 each occur twice.)

Position 14 15 16 17 18 19 20 21 22 23 24 25
Height 64.0 64.1 64.5 64.8 65.2 65.7 66.2 66.7 67.1 67.8 68.9 69.6

We can see that three values, 61.9, 62.2 and 63.9, each occur twice, which is the highest number of
occurrences in this data set.

Therefore, there are three modes in this data set,

𝑚1 = 61.9, 𝑚2 = 62.2 and 𝑚3 = 63.9

We call these:
• Data with one mode = Unimodal data
• Data with two modes = Bimodal data
• Data with three modes = Trimodal data

If a data set has more than three values tied for the highest number of occurrences, the mode stops being
a practical summary. In that case, it is no longer useful to report its modes.
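
Counting occurrences by hand is error-prone for longer lists, so the two mode examples can be checked with a short sketch. It uses `collections.Counter` from the Python standard library; the function name `modes` is an assumption made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest number of occurrences."""
    counts = Counter(values)           # value -> number of occurrences
    highest = max(counts.values())     # the largest count in the data set
    return sorted(v for v, c in counts.items() if c == highest)


grades = [77, 86, 58, 67, 75, 77, 71, 65, 77, 92]
heights = [58.2, 59.5, 60.7, 60.9, 61.9, 61.9, 62.2, 62.2, 62.4, 62.9,
           63.1, 63.9, 63.9, 64.0, 64.5, 64.1, 64.8, 65.2, 65.7, 66.2,
           66.7, 67.1, 67.8, 68.9, 69.6]

print(modes(grades))   # unimodal data
print(modes(heights))  # trimodal data
```

Note that when every value occurs exactly once, this sketch returns all of the values, which is the impractical situation described above.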

10.5.3.1 Impact of skewness on Unimodal data

1. Symmetric bell-shaped (mound-shaped) data

2. Right-skewed data

3. Left-skewed data

Notice from the above three diagrams that:

• The mean is most affected by skewness and outliers, i.e. the mean is pulled in the direction of the
tail or the direction of the skewness.
• The median is somewhat affected by skewness and outliers.
• The mode is unaffected by skewness and outliers.
10.6 Measures of Spread or Dispersion

Given a data set, we usually want to know its mean and a measure of the spread.

[Figure: two distributions, each with its mean marked]

• Data is not so spread out; it is quite concentrated near the mean, i.e. variability is small.
• Data is quite spread out; it is not so concentrated near the mean, i.e. variability is large.

There are a few ways to define the spread of which perhaps the most popular definition is the Standard
Deviation. Before we can define the Standard Deviation, we need to calculate the Variance.

10.6.1 The Variance

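Symbol: For population, $\sigma^2$

For sample, $s^2$

Definition: For population variance, $\sigma^2 = \dfrac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$

For sample variance, $s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Formulas: For population variance, $\sigma^2 = \dfrac{1}{N} \left( \sum x_i^2 - \dfrac{1}{N} \left( \sum x_i \right)^2 \right)$

For sample variance, $s^2 = \dfrac{1}{n-1} \left( \sum x_i^2 - \dfrac{1}{n} \left( \sum x_i \right)^2 \right)$

What is the difference between $\sum_{i=1}^{n} x_i^2$ and $\left( \sum_{i=1}^{n} x_i \right)^2$?

• $\sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2$. We square each value before adding the results.

• $\left( \sum_{i=1}^{n} x_i \right)^2 = (x_1 + x_2 + x_3 + \cdots + x_n)^2$. We add the values, then we square the result.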

Example: Let us look at the sample of 10 grades reported by the student in a previous example:
58, 65, 67, 71, 75, 77, 77, 77, 86 and 92
What is the variance of this data set?

Solution: As before, it is a good idea to do our calculation in a table.

Since we also require the sum $\sum_{i=1}^{n} x_i^2$, we need the square of each $x_i$ value. So we generate another
row, $x_i^2$.

𝒊 1 2 3 4 5 6 7 8 9 10 Sum, ∑

𝒙𝒊 58 65 67 71 75 77 77 77 86 92 745
𝒙𝒊 𝟐 3364 4225 4489 5041 5625 5929 5929 5929 7396 8464 56391

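Thus, the variance of the sample of the student’s ten grades is

$s^2 = \dfrac{1}{n-1} \left( \sum x_i^2 - \dfrac{1}{n} \left( \sum x_i \right)^2 \right)$

$= \dfrac{1}{10-1} \left( 56391 - \dfrac{1}{10} (745)^2 \right)$

$= \dfrac{1}{9} \left( 56391 - \dfrac{1}{10} (555025) \right)$

$= \dfrac{1}{9} (56391 - 55502.5) = \dfrac{888.5}{9} \approx 98.722222$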
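
To see that the computational formula and the definition agree, the sketch below evaluates both for the ten grades and compares them with `statistics.variance` from the Python standard library, which computes the sample variance (divisor n − 1). The variable names are assumptions made for the illustration.

```python
import statistics

grades = [58, 65, 67, 71, 75, 77, 77, 77, 86, 92]
n = len(grades)

sum_x = sum(grades)                         # add the values
sum_x_squared = sum(x * x for x in grades)  # square each value, then add

# Computational formula: s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)
s2_formula = (sum_x_squared - sum_x ** 2 / n) / (n - 1)

# Definition: s^2 = sum of (x - mean)^2, divided by (n - 1)
mean = sum_x / n
s2_definition = sum((x - mean) ** 2 for x in grades) / (n - 1)

print(sum_x, sum_x_squared)
print(round(s2_formula, 6), round(s2_definition, 6))
print(round(statistics.variance(grades), 6))
```

The computational formula needs only the two sums from the table, which is why it is convenient for hand calculation.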

10.6.2 The Standard Deviation

Symbol: For population, 𝜎

For sample, 𝑠

Definition: The standard deviation is the square root of the variance.

Formulas: For population std. dev., $\sigma = \sqrt{\sigma^2}$

For sample std. dev., $s = \sqrt{s^2}$

Example: Let us look at the sample of 10 grades reported by the student:
58, 65, 67, 71, 75, 77, 77, 77, 86 and 92
What is the standard deviation of this data set?

Solution: From the previous example, we calculated the variance of the data set to be

$s^2 = \dfrac{888.5}{9} \approx 98.722222$

∴ The std. dev. of the data set is $s = \sqrt{s^2} = \sqrt{\dfrac{888.5}{9}} \approx 9.935906$
10.6.3 Large Populations or Large Samples

For large populations or large samples where values are usually repeated, it is often necessary to tabulate
the values and their corresponding frequencies instead of writing a long list of values. Suppose 𝑥1 appears
𝑓1 number of times, 𝑥2 appears 𝑓2 number of times, 𝑥3 appears 𝑓3 number of times, … and so on as in the
following table,

Value, 𝑥𝑖 𝑥1 𝑥2 𝑥3 𝑥4 ⋯ ⋯ Sum, Σ

Frequency, 𝑓𝑖 𝑓1 𝑓2 𝑓3 𝑓4 ⋯ ⋯ ∑ 𝑓𝑖

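If the data is from a population, we have

Population size, $N = \sum f_i$

Mean, $\mu = \dfrac{\sum x}{N} = \dfrac{\sum (x_i f_i)}{\sum f_i} = \dfrac{\sum (x_i f_i)}{N}$

Variance, $\sigma^2 = \dfrac{\sum x^2 - \dfrac{(\sum x)^2}{N}}{N} = \dfrac{\sum (x_i^2 f_i) - \dfrac{\left( \sum (x_i f_i) \right)^2}{\sum f_i}}{\sum f_i} = \dfrac{\sum (x_i^2 f_i) - \dfrac{\left( \sum (x_i f_i) \right)^2}{N}}{N}$

If the data is from a sample, we have

Sample size, $n = \sum f_i$

Mean, $\bar{x} = \dfrac{\sum x}{n} = \dfrac{\sum (x_i f_i)}{\sum f_i} = \dfrac{\sum (x_i f_i)}{n}$

Variance, $s^2 = \dfrac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n-1} = \dfrac{\sum (x_i^2 f_i) - \dfrac{\left( \sum (x_i f_i) \right)^2}{\sum f_i}}{\sum f_i - 1} = \dfrac{\sum (x_i^2 f_i) - \dfrac{\left( \sum (x_i f_i) \right)^2}{n}}{n-1}$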

Therefore, we need to expand the table as follows:

Value, 𝑥𝑖 𝑥1 𝑥2 𝑥3 𝑥4 ⋯ ⋯ Sum, Σ

Frequency, 𝑓𝑖 𝑓1 𝑓2 𝑓3 𝑓4 ⋯ ⋯ ∑ 𝑓𝑖

𝑥𝑖 𝑓𝑖 𝑥1 𝑓1 𝑥2 𝑓2 𝑥3 𝑓3 𝑥4 𝑓4 ⋯ ⋯ ∑(𝑥𝑖 𝑓𝑖 )

𝑥𝑖 2 𝑓𝑖 𝑥1 2 𝑓1 𝑥2 2 𝑓2 𝑥3 2 𝑓3 𝑥4 2 𝑓4 ⋯ ⋯ ∑(𝑥𝑖 2 𝑓𝑖 )
As usual, the standard deviation is the square root of the variance,

Population standard deviation, 𝜎 = √𝜎 2

and
Sample standard deviation, 𝑠 = √𝑠 2

Example: The owner of a DVD rental shop meticulously kept a record of how many DVDs were
rented by customers who entered his shop from the day he opened his business until the day he closed.
The following table shows the number of DVDs, 𝑥𝑖 , rented by the corresponding number of customers, 𝑓𝑖 ,
out of every 100 customers who entered his shop.

No. of DVDs rented, 𝑥𝑖 0 1 2 3 4 5

No. of customers, 𝑓𝑖 6 58 22 10 3 1

Find the average number of DVDs rented by each customer and the standard deviation of the number of
DVDs rented by his customers.

Solution: We expand the above table as follows:

Value, 𝑥𝑖 0 1 2 3 4 5 Sum, Σ

Frequency, 𝑓𝑖 6 58 22 10 3 1 100

𝑥𝑖 𝑓𝑖 0 58 44 30 12 5 149

𝑥𝑖 2 𝑓𝑖 0 58 88 90 48 25 309
The size of this population being studied is
𝑁 = ∑ 𝑓𝑖 = 100
The total number of DVDs rented is
∑ 𝑥 = ∑ 𝑥𝑖 𝑓𝑖 = 149

∴ The average number of DVDs rented per customer is

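$\mu = \dfrac{\sum x_i f_i}{N} = \dfrac{149}{100} = 1.49$

The variance of the number of DVDs rented by his customers is

$\sigma^2 = \dfrac{\sum (x_i^2 f_i) - \dfrac{\left( \sum (x_i f_i) \right)^2}{N}}{N} = \dfrac{309 - \dfrac{149^2}{100}}{100} = \dfrac{309 - \dfrac{22201}{100}}{100} = \dfrac{309 - 222.01}{100} = \dfrac{86.99}{100} = 0.8699$

∴ The std dev of the number of DVDs rented by his customers is

$\sigma = \sqrt{\sigma^2} = \sqrt{0.8699} \approx 0.932684$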
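
The expanded table translates directly into code: each row of the table becomes one sum. The sketch below reworks the DVD example with the population formulas; the values and frequencies are copied from the table, and the variable names are assumptions made for the illustration.

```python
import math

# Values x_i and frequencies f_i copied from the DVD rental table.
dvds_rented = [0, 1, 2, 3, 4, 5]
customers = [6, 58, 22, 10, 3, 1]

N = sum(customers)                                               # sum of f_i
sum_xf = sum(x * f for x, f in zip(dvds_rented, customers))      # sum of x_i f_i
sum_x2f = sum(x * x * f for x, f in zip(dvds_rented, customers)) # sum of x_i^2 f_i

mu = sum_xf / N                              # population mean
variance = (sum_x2f - sum_xf ** 2 / N) / N   # population variance
sigma = math.sqrt(variance)                  # population standard deviation

print(N, sum_xf, sum_x2f)
print(round(mu, 2), round(variance, 4), round(sigma, 6))
```

For data from a sample, only the final divisor in the variance line would change, from N to n − 1, as in the sample formulas given earlier.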
10.7 Percentiles
- Measures of relative standing.

The 𝒑𝒕𝒉 percentile of a set of data is a value 𝒙𝒑 with 𝒑% of the data values less than it and (𝟏𝟎𝟎 − 𝒑)%
of the data values greater than it.

Example: If 80% of students have marks lower than yours and 20% have marks higher than yours,
then your mark is at the 80th percentile.

10.7.1 Quartiles

Quartiles divide a data set, ordered from smallest to largest, into four groups, each containing as close to
25% of the data as possible.

Graphical example:

25% 25% 25% 25%

𝑄1 𝑄2 = 𝑀 𝑄3

We can now define the following quartiles:

• 𝑄1 = 1st (Lower) Quartile = 25𝑡ℎ percentile = 𝑥25

• 𝑄2 = 2nd Quartile = 50𝑡ℎ percentile = 𝑥50 . This is also the Median, M.

• 𝑄3 = 3rd (Upper) Quartile = 75𝑡ℎ percentile = 𝑥75

Location Formulas: Recall that n is the number of observations in the data set.

• Position of $Q_1 = \dfrac{n+1}{4}$

• Position of $Q_2$ (or $M$) $= \dfrac{2(n+1)}{4} = \dfrac{n+1}{2}$

• Position of $Q_3 = \dfrac{3(n+1)}{4}$

Example: Let us consider the following data set of the number of years that 24 HIV patients lived
before succumbing to AIDS: 0.6, 1.6, 2.1, 2.3, 2.9, 3.4, 3.7, 4.1, 4.5, 5.3, 1.2, 1.9, 2.3, 2.5, 3.3, 3.6, 3.8, 4.2,
4.7, 7.4, 1.5, 2.8, 3.9 and 4.9.
What are the quartiles of the above data set?

Solution: The data set must be rearranged in order from smallest to largest before we proceed to
determine the positions of the quartiles:

Position 1 2 3 4 5 6 7 8 9 10 11 12
Years 0.6 1.2 1.5 1.6 1.9 2.1 2.3 2.3 2.5 2.8 2.9 3.3

Position 13 14 15 16 17 18 19 20 21 22 23 24
Years 3.4 3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 7.4

(In the table, the $6\tfrac{1}{4}$th position lies between the 6th and 7th positions, and the $18\tfrac{3}{4}$th position lies
between the 18th and 19th positions.)

There are altogether 𝑛 = 24 observations.

• The location of $Q_1$ is at the $\dfrac{n+1}{4} = \dfrac{24+1}{4} = \dfrac{25}{4} = 6\tfrac{1}{4}$th position.

Since $Q_1$ is the value at the $6\tfrac{1}{4}$th position, we can calculate that

$Q_1 = 2.1 + \tfrac{1}{4}(2.3 - 2.1) = 2.1 + \tfrac{1}{4}(0.2) = 2.1 + 0.05 = 2.15$

• The location of $Q_2$ or $M$ is at the $\dfrac{n+1}{2} = \dfrac{24+1}{2} = \dfrac{25}{2} = 12\tfrac{1}{2}$th position.

Since $Q_2$ or $M$ is the value at the $12\tfrac{1}{2}$th position, we can calculate that

$Q_2 = M = 3.3 + \tfrac{1}{2}(3.4 - 3.3) = 3.3 + \tfrac{1}{2}(0.1) = 3.3 + 0.05 = 3.35$

• The location of $Q_3$ is at the $\dfrac{3(n+1)}{4} = \dfrac{3(24+1)}{4} = \dfrac{(3)(25)}{4} = 18\tfrac{3}{4}$th position.

Since $Q_3$ is the value at the $18\tfrac{3}{4}$th position, we can calculate that

$Q_3 = 4.1 + \tfrac{3}{4}(4.2 - 4.1) = 4.1 + \tfrac{3}{4}(0.1) = 4.1 + 0.075 = 4.175$
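
The three quartile calculations share one idea: find a (possibly fractional) position, then move that fraction of the way from the value below to the value above. The sketch below illustrates this under stated assumptions: the helper names `value_at_position` and `quartiles` are hypothetical, and the comparison with `statistics.quantiles` relies on that function's default "exclusive" method, which also uses positions based on n + 1.

```python
import statistics


def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    if not 1 <= position <= len(ordered):
        raise ValueError("position falls outside the data set")
    lower = int(position)            # whole part of the position
    fraction = position - lower      # e.g. 0.25 for the 6 1/4 th position
    if fraction == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


def quartiles(values):
    """Q1, Q2 and Q3 located at positions k(n + 1)/4 for k = 1, 2, 3."""
    ordered = sorted(values)
    n = len(ordered)
    return tuple(value_at_position(ordered, k * (n + 1) / 4) for k in (1, 2, 3))


# Survival times in years, in the order given in the example.
years = [0.6, 1.6, 2.1, 2.3, 2.9, 3.4, 3.7, 4.1, 4.5, 5.3, 1.2, 1.9,
         2.3, 2.5, 3.3, 3.6, 3.8, 4.2, 4.7, 7.4, 1.5, 2.8, 3.9, 4.9]

q1, q2, q3 = quartiles(years)
print(round(q1, 3), round(q2, 3), round(q3, 3))

# Cross-check against the standard library (default method="exclusive").
print([round(q, 3) for q in statistics.quantiles(years, n=4)])
```

Other textbooks and software packages use slightly different position rules, so quartiles reported elsewhere for the same data may differ a little from these values.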

10.7.2 Five-Number Summary

The 5-number summary of a data set consists of

𝑀𝑖𝑛 𝑄1 𝑀 𝑄3 𝑀𝑎𝑥

Where 𝑀𝑖𝑛 = the smallest observation in the data set,

𝑄1 = the 1st quartile of the data set,
𝑀 = the median of the data set,
𝑄3 = the 3rd quartile of the data set, and
𝑀𝑎𝑥 = the largest observation in the data set.

10.7.2.1 Interquartile Range

Symbol: 𝐼𝑄𝑅

Formula: 𝐼𝑄𝑅 = 𝑄3 − 𝑄1
10.7.2.2 Lower Fence and Upper Fence

Symbols: Lower Fence, 𝐿𝐹

Upper Fence, 𝑈𝐹

Formulas: 𝐿𝐹 = 𝑄1 − 1.5 × 𝐼𝑄𝑅

𝑈𝐹 = 𝑄3 + 1.5 × 𝐼𝑄𝑅

Example: Let us consider again the data in the previous example:

Position 1 2 3 4 5 6 7 8 9 10 11 12
Years 0.6 1.2 1.5 1.6 1.9 2.1 2.3 2.3 2.5 2.8 2.9 3.3

Position 13 14 15 16 17 18 19 20 21 22 23 24
Years 3.4 3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 7.4

We have already calculated that 𝑄1 = 2.15, 𝑀 = 3.35 and 𝑄3 = 4.175,

and we note that the smallest observation is 0.6 years and the largest observation is 7.4 years.

Therefore, we can write the 5-number summary for the above data set as

𝑀𝑖𝑛 = 0.6 𝑄1 = 2.15 𝑀 = 3.35 𝑄3 = 4.175 𝑀𝑎𝑥 = 7.4

The Interquartile Range is

𝐼𝑄𝑅 = 𝑄3 − 𝑄1 = 4.175 − 2.15 = 2.025

The Lower Fence is

𝐿𝐹 = 𝑄1 − 1.5 × 𝐼𝑄𝑅 = 2.15 − 1.5 × 2.025 = −0.8875

The Upper Fence is

𝑈𝐹 = 𝑄3 + 1.5 × 𝐼𝑄𝑅 = 4.175 + 1.5 × 2.025 = 7.2125
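
The same numbers can be reproduced in a few lines, shown here as a sketch. The quartiles are taken as given from the worked example rather than recomputed, and the variable names are assumptions made for the illustration.

```python
# Ordered survival times in years, copied from the table above.
years = [0.6, 1.2, 1.5, 1.6, 1.9, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3,
         3.4, 3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 7.4]

# Quartiles quoted from the worked example (not recomputed here).
q1, median, q3 = 2.15, 3.35, 4.175

five_number_summary = (min(years), q1, median, q3, max(years))

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # LF = Q1 - 1.5 x IQR
upper_fence = q3 + 1.5 * iqr     # UF = Q3 + 1.5 x IQR

print(five_number_summary)
print(round(iqr, 4), round(lower_fence, 4), round(upper_fence, 4))
```

The rounding is applied only when printing, since differences of decimal values are not exact in floating point.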

10.7.3 Boxplot (or Box-and-Whiskers Plot)

This is a graphical presentation of the 5-number summary, and it identifies outliers if any exist in the data
set. The following are the steps to construct a boxplot:
1. Obtain the 5-Number Summary, the Interquartile Range, and the Lower Fence and the Upper
Fence.
2. Draw and label a horizontal line (axis) to represent the scale of measurement, and locate Q1, M
and Q3 on it.
3. Draw a box between Q1 and Q3 above the axis and a line within the box indicating M.
4. Draw a dotted line to indicate the Lower Fence (LF) and another dotted line to indicate the Upper
Fence (UF).
5. Draw whiskers from the left side of the box to the first data value above the LF and from the right
side of the box to the first data value below the UF.
6. Observations to the left of the LF or to the right of the UF are outliers and are marked with
asterisks.

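[Figure: schematic boxplot above a labelled axis x, showing from left to right: Min, LF, first data above LF, Q1, M, Q3, first data below UF, UF and Max. Observations beyond the fences are marked with asterisks as outliers.]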

Example: Let us draw a boxplot based on the data in the previous example:

Position 1 2 3 4 5 6 7 8 9 10 11 12
Years 0.6 1.2 1.5 1.6 1.9 2.1 2.3 2.3 2.5 2.8 2.9 3.3

Position 13 14 15 16 17 18 19 20 21 22 23 24
Years 3.4 3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 7.4

We have already obtained the 5-number summary, i.e.

𝑀𝑖𝑛 = 0.6 𝑄1 = 2.15 𝑀 = 3.35 𝑄3 = 4.175 𝑀𝑎𝑥 = 7.4

As well as the Lower Fence and the Upper Fence, i.e.

𝐿𝐹 = −0.8875 and 𝑈𝐹 = 7.2125

The boxplot based on the given data set is as follows:

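[Figure: boxplot of t, the no. of years a HIV patient survives, on an axis from −2 to 8, marking LF = −0.8875, Min = 0.6, Q1 = 2.15, M = 3.35, Q3 = 4.175, UF = 7.2125 and Max = 7.4. The first data below UF is 5.3, and one observation is marked with an asterisk as an outlier.]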
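
Steps 5 and 6 of the construction can be sketched in code: the whiskers end at the most extreme observations that still lie within the fences, and anything beyond a fence is an outlier. The fence values are quoted from the example, and the variable names are assumptions made for the illustration.

```python
# Ordered survival times in years, copied from the example.
years = [0.6, 1.2, 1.5, 1.6, 1.9, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3,
         3.4, 3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 7.4]

# Fences quoted from the example.
lower_fence, upper_fence = -0.8875, 7.2125

# Observations within the fences decide where the whiskers end (step 5).
inside = [y for y in years if lower_fence <= y <= upper_fence]
left_whisker_end = min(inside)    # first data above LF
right_whisker_end = max(inside)   # first data below UF

# Observations beyond either fence are outliers (step 6).
outliers = sorted(y for y in years if y < lower_fence or y > upper_fence)

print(left_whisker_end, right_whisker_end, outliers)
```

Here the left whisker reaches the minimum because no observation lies below the Lower Fence, while the right whisker stops short of the maximum.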
