
Unit 1 - Examining Distributions

Uploaded by

wendee1911
Copyright
© © All Rights Reserved

Basic Statistical Analysis
Unit 1: Examining Distributions

Instructor: Farhan Islam


Statistics and Types of Statistics
 Statistics is the science of collecting, analyzing,
presenting, and interpreting data, as well as of making
decisions based on such analyses.
 Data consists of information coming from observations,
counts, measurements, or responses.
 Elements are the entities on which data are collected.
 A variable is a characteristic of interest for the elements.
 The set of measurements obtained for a particular element
is called an observation.
 A population is the collection of all outcomes, responses,
measurements, or counts that are of interest.
 A sample is a subset of units in a population that we
examine in order to gather information about the
population.
Statistics and Types of Statistics
Data Set (each row is an observation on one element; each column is a variable)

Company (element name)   Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram                  NQ               73.10
EnergySouth              N                74.00               1.67
Keystone                 N                365.70              0.86
LandCare                 NQ               111.40
Psychemedics             N                17.60               0.13
Population VS Sample
How Many Variables Have You Measured?
 Univariate data: one variable is measured on a single experimental unit
 Bivariate data: two variables are measured on a single experimental unit
 Multivariate data: more than two variables are measured on a single experimental unit
Descriptive and inferential
statistics
 Descriptive statistics summarize information already present
in data
 Visualizations like boxplots, histograms, etc.
 Summary measures like averages, standard deviation, median, etc.
 Inferential statistics use a sample of data to make
predictions about larger populations or about
unobserved/future trends
 Any measurements made in the presence of noise or variation
 Generalizations from a sample to a population
 Confidence intervals, hypothesis tests, etc.
 Comparisons made between datasets
 Comparisons, correlations, regression, etc.
 The average grade of STAT 1000 is C+, the average grade of SCM
1000 is B+
 Should I take STAT 1000 instead of SCM 1000 for better grades?
 What other statistics might you need to make this decision?
Scales of Measurement
Scales of measurement include: Nominal, Ordinal

The scale determines the amount of information contained in the data.

The scale indicates the data summarization and statistical analyses that are most appropriate.
Measurement Scale of Qualitative

Nominal level: classifies data into mutually exclusive, exhaustive categories in which no order or ranking can be imposed on the data.
For example: Eye color, Gender, Political party, blood types, etc.

Ordinal level: classifies data into categories that can be ranked.
For example: Grade of course (A, B, C), Size (S, M, L), Rating scale (Poor, Good, Excellent), Ranking of tennis players, etc.
Types of Data

Copyright © 2019 by Nelson Education Ltd.
Types of Variables
Categorical Variables (Qualitative Variables)
 Categorical variables are variables that have distinct
categories, according to some characteristic or attribute.

Quantitative Variables (Discrete and Continuous)


 Quantitative variables are variables which have values that are
a count or are obtained by measurement, and it makes sense to
perform arithmetic calculations, such as taking the average, on
those values.
 Discrete, if measuring how many. E.g., number of 6-packs
consumed at tail-gate party
 Continuous, if measuring how much. E.g., pounds of hamburger
consumed at tail-gate party
 Quantitative data are always numeric.
Examples of Quantitative
Variables (Choice: either
Continuous Or Discrete)
 Measurement in cm of all the men in this classroom.–
Continuous
 The distance from your home to the nearest grocery store.–
Continuous
 Square footage of your house.– Continuous
 How many packs of cigarettes do you smoke a day? –
Discrete
 The number of phone calls you receive for each day of the
week.– Discrete
 Number of children in a family.– Discrete
 Number of rainy days in a month.– Discrete
 The numbers of books in the backpacks.– Discrete
 The amount of time a student spent on studying for an
exam– Continuous
Examples of Categorical
(Qualitative) Variables (Choice:
either Nominal or Ordinal)
 Gender (male or female) – Nominal
 Marital Status (single, married, widowed, divorced) –
Nominal
 Service rating of a restaurant (scale of one to five) – Ordinal
 How many packs of cigarettes do you smoke a day? (0-1),
(2-5), (5-7) – Ordinal
 License plates – Nominal
 Hair Colour (black, dark brown, light brown, blonde, gray
and red)–Nominal
 Size of French fries ordered at McDonald (small, medium,
large.)– Ordinal
 Birth month for people born in 1985 – Ordinal
 The arrival status of an airplane flight (early, on time, late,
canceled) at an airport–Nominal
Displaying Distributions
 The distribution of a variable tells us what values
it takes and how often it takes on these values.
 Categorical Variables
We can use:
 bar charts.
 pie charts.

 Quantitative Variables
We can use:
 histograms.
 Time Series chart.
Graphing Qualitative Variables
Use a data distribution to describe:
What values of the variable have been measured
How often each value has occurred:
 Frequency
 Relative frequency = Frequency/n (where n = sample size)
 Percent = 100 × Relative frequency



Graphs for Categorical Data
Example: in a survey concerning public education, 400
school administrators were asked to rate the quality of
education in Canada



Graph Types: Bar Chart
 Bar Graph - Variable values on one axis and
frequency (count) on the other axis.



Pie Chart
 Pie Chart gives us a visual representation of the relative
frequency of the observed values for a categorical variable.
Angle = Relative Frequency×360°
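
The frequency, relative frequency, percent, and pie-chart angle formulas can be sketched in a few lines of Python. The list `orders` is made-up illustrative data (fry sizes), not data from these slides, and the dictionary keys are names chosen only for this example.

```python
from collections import Counter

# Hypothetical categorical data (not from the slides): sizes of fries ordered.
orders = ["S", "M", "L", "M", "M", "S", "L", "M", "S", "M"]

n = len(orders)              # sample size
frequency = Counter(orders)  # how often each category occurs

summary = {}
for category, count in frequency.items():
    relative = count / n     # Relative frequency = Frequency / n
    summary[category] = {
        "frequency": count,
        "relative_frequency": relative,
        "percent": 100 * relative,            # Percent = 100 x Relative frequency
        "pie_angle_degrees": 360 * relative,  # Angle = Relative frequency x 360 degrees
    }

for category, row in summary.items():
    print(category, row)
```

The pie-chart angles always add up to 360 degrees because the relative frequencies add up to 1.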



Graphing Time Series
Example
Time series: a single quantitative variable measured over time; can
be graphed using a line or bar chart
 Put time on the horizontal scale (X-axis) and the variable we are
measuring on the vertical scale (Y-axis). Connecting the data points by
lines helps emphasize any change over time.
 Canadian population growth projections for age group 65–69



Histogram

A histogram is a graph in which classes are marked on the horizontal axis and the frequencies, relative frequencies, or percentages are marked on the vertical axis.

The frequencies, relative frequencies, or percentages are represented by the heights of the bars.

In a histogram, the bars are drawn adjacent to each other.
Frequency Distribution
Example: Frequency Distribution of STAT 1000 Exam Marks
Frequency Distribution and Relative
Frequency Distribution

Let X = Exam Marks

What is the probability (chance) that a randomly picked student scored at least 80?

$P(X \ge 80) = 0.18 + 0.06 = 0.24$

What is the probability (chance) that a random student's exam mark is at least 50 but less than 70?

$P(50 \le X < 70) = 0.16 + 0.30 = 0.46$
Grades on a statistics exam
Data: 75 66 77 66 64 73 91 65 59 86 61 86 61 58 70 77 80 58
94 78 62 79 83 54 52 45 82 48 67 55
Class Limits Frequency

40 up to 50 2
50 up to 60 6
60 up to 70 8
70 up to 80 7
80 up to 90 5
90 up to 100 2
Total 30
Relative Frequency Distribution
of Grades
Class Limits Relative Frequency
40 up to 50 2/30 = .067
50 up to 60 6/30 = .200
60 up to 70 8/30 = .267
70 up to 80 7/30 = .233
80 up to 90 5/30 = .167
90 up to 100 2/30 = .067
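
As a rough sketch, the two tables above can be rebuilt from the raw grades. The class boundaries (start at 40, width 10, six classes) follow the slide; the variable names are chosen only for this illustration.

```python
# Grades from the slide above.
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61, 58, 70,
          77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45, 82, 48, 67, 55]

lower, width, num_classes = 40, 10, 6   # classes: 40 up to 50, ..., 90 up to 100
counts = [0] * num_classes

for g in grades:
    # "40 up to 50" includes 40 but excludes 50, so integer division picks the class.
    counts[(g - lower) // width] += 1

n = len(grades)
for k, count in enumerate(counts):
    start = lower + k * width
    print(f"{start} up to {start + width}: frequency {count}, relative frequency {count / n:.3f}")
```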
Relative Frequency Histogram of Grades
[Figure: histogram with Grade (40 to 100) on the horizontal axis and relative frequency (0 to .30) on the vertical axis]
Figure - Histograms (SHAPES)
Symmetric = Bell Shaped
Skewed to the Right / Positively Skewed
Skewed to the Left / Negatively Skewed
Measures of Center for Ungrouped Data

 Measure of Center: a measure along the horizontal axis of the data distribution that locates the center of the distribution
Characteristics of the Mean
For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:

$\mu = \frac{\text{Total value of the observations}}{\text{Number of observations}} = \frac{\sum x_i}{N} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$

The POPULATION MEAN is affected by extremely large or extremely small values (NOT ROBUST to OUTLIERS)
EXAMPLE – Population Mean

Sample Mean
 For ungrouped data, the sample mean is the sum of all the
sample values divided by the number of sample values:
 Let X be a random variable

$\bar{X} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$

The SAMPLE MEAN is affected by extremely large or extremely small values (NOT ROBUST to OUTLIERS)
Sample Mean - EXAMPLE

$\bar{X} = \frac{\sum X}{n} = \frac{90 + 77 + 94 + \dots + 113 + 83}{12} = 97.5$
Example 4
A random sample of eight newborns was selected, and their lengths (in inches) were as follows:
20.4 18.5 16.3 17.9 19.2 21.2 17.3 ???
It is known that the sample mean of all eight newborns equals 18.825 inches.
What is the 8th baby's length?
Solution:

$\bar{x} = \frac{\sum x_i}{8} = 18.825$
Sum of all eight lengths: $\sum x_i = 18.825(8) = 150.6$
Sum of the seven known lengths: $20.4 + 18.5 + 16.3 + 17.9 + 19.2 + 21.2 + 17.3 = 130.8$
Subtract the sums:
$150.6 - 130.8 = 19.8$ inches is the 8th baby's length.
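
The same back-calculation can be checked with a short Python sketch. The variable names are illustrative; the numbers are those of Example 4.

```python
# Seven known lengths (inches) from Example 4; the eighth is unknown.
known = [20.4, 18.5, 16.3, 17.9, 19.2, 21.2, 17.3]
n = 8
sample_mean = 18.825

total = sample_mean * n        # the sum of all eight lengths must equal mean * n
missing = total - sum(known)   # what is left over belongs to the eighth newborn

print(round(missing, 1))
```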
Median
 Median: the middle measurement when the measurements are ranked from smallest to largest
 The position of the median is 0.5(n + 1) once the measurements have been ordered

The MEDIAN is NOT affected by extremely large or extremely small values (ROBUST to OUTLIERS)
Example 5
The following table lists the 2014 compensations of female CEOs of 11
American companies (USA TODAY, May 1, 2015). (The compensation of
Carol Meyrowitz of TJX is for the fiscal year ending in January 2015.).
Find the median for these data.
Table 2 Compensations of 11 Female CEOs
Example 5: Solution
To calculate the median, we perform the following two steps.
Step 1: We rank the given data in increasing order as follows:
16.2 16.9 19.3 19.3 19.6 21.0 22.2 22.5 28.7 33.7 42.1

Step 2: There are 11 data values. The sixth value divides these 11 values into two equal parts. Hence, the sixth value, 21.0, gives the median.

Thus, the median of 2014 compensations for these 11 female CEOs is $21.0 million.
Example 6
 The following data give the cell phone minutes used last month by 12 randomly
selected persons.
 230 2053 160 397 510 380 263 3864 184 201 326 721
 Find the median for these data.
 Solution:
To calculate the median, we perform the following two steps.
Step 1: We rank the given data in increasing order as follows:
160 184 201 230 263 326 380 397 510 721 2053 3864
Step 2: The value that divides the 12 data values into two equal parts falls between the sixth and the seventh values. Thus, the median is the average of the sixth and the seventh values:

$\text{Median} = \frac{326 + 380}{2} = 353$ minutes
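
The two-step procedure used in Examples 5 and 6 (rank the data, then take the middle value or the average of the two middle values) might be sketched as follows. The function name `median` is chosen for illustration; Python's standard `statistics.median` gives the same results.

```python
def median(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ordered = sorted(values)           # Step 1: rank in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                     # odd n: position 0.5(n + 1) is a single value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values

# Example 5: 11 compensations (millions of dollars)
compensation = [16.2, 16.9, 19.3, 19.3, 19.6, 21.0, 22.2, 22.5, 28.7, 33.7, 42.1]
# Example 6: 12 cell phone usage values (minutes)
minutes = [230, 2053, 160, 397, 510, 380, 263, 3864, 184, 201, 326, 721]

print(median(compensation))
print(median(minutes))
```

Note that the two very large values in Example 6 (2053 and 3864) have no effect on the result, which is what "robust to outliers" means here.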
Mode

The value or values that occur most frequently in a data set. If all values occur with the same frequency, the data set is said to have no mode.
 The MODE is not affected by outliers
 It can be calculated for both kinds of data, quantitative and qualitative

Example
The status of five students who are members of the student senate at a college
are senior, sophomore, senior, junior, and senior, respectively. Find the mode.

Solution:
Because senior occurs more frequently than the other categories, it is the mode
for this data set. We cannot calculate the mean and median for this data set.
Example
The number of liters of milk purchased by 25 households:
0 0 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3
3 3 3 3 4 4 4 5
 Mean?

 Median?

 Mode? (Highest peak)
Mode = 2
Weighted Mean
 When different values of a data set occur with different frequencies,
that is, each value of a data set is assigned different weight, then
we calculate the weighted mean to find the center of the given
data set.

$\text{Weighted Mean} = \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n}$
Weighted Mean (Example)
Example

Suppose your midterm test score is 83 and your final exam score is 95. Using weights of 40% for the midterm and 60% for the final exam, compute the weighted average of your scores.

If the minimum average for an A is 90, will you earn an A?

Solution

$\text{Weighted Average} = \frac{83(0.40) + 95(0.60)}{0.40 + 0.60} = 90.2$

Your average is high enough to earn an A.
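
A minimal sketch of the weighted mean formula, applied to the midterm and final exam example. The function name `weighted_mean` is chosen for this illustration.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    numerator = sum(w * x for x, w in zip(values, weights))
    return numerator / sum(weights)

scores = [83, 95]        # midterm, final exam
weights = [0.40, 0.60]   # 40% and 60%

average = weighted_mean(scores, weights)
print(round(average, 1))   # compare against the minimum of 90 needed for an A
```

Dividing by the sum of the weights means the weights do not have to add up to 1; weights of 40 and 60, or 2 and 3, give the same answer.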


Example 7
Example 8
Extreme Values (OUTLIERS)
 The mean is more easily affected by extremely large or small values than the median

 If a distribution is skewed to the right, the mean shifts to the right; if a distribution is skewed to the left, the mean shifts to the left

 When a distribution is symmetric, the mean and the median are equal

 The median is often used as a measure of center when the distribution is skewed
The Relative Positions of the Mean, Median and the Mode

Symmetric: Mean = Median
If the two are only approximately equal, the distribution is called approximately symmetric.

Skewed right: Mean > Median

Skewed left: Mean < Median
Measures of Variability
 Measure of variability: a measure along
the horizontal axis of the data distribution
that describes the spread of the
distribution from the center



The Range
 Range (R): the difference between the largest and
smallest measurements in a set
Finding the Range for Ungrouped Data

 Example: a botanist records the number of petals on five flowers: 5, 12, 6, 8, 14
 The range is R = 14 – 5 = 9

The RANGE is affected by extremely large or extremely small values (NOT ROBUST to OUTLIERS)
The Variance
 Variance: a measure of variability that uses all the
measurements; it measures the average squared deviation of
the measurements about their mean
 Example: a botanist records the number of petals on five
flowers: 5, 12, 6, 8, 14

The Variance
 The variance of a population of N
measurements is the average of the
squared deviations of the measurements
about their mean μ

POPULATION VARIANCE value is affected by extremely large or extremely small values (NOT ROBUST to OUTLIERS)
The Variance
 The variance of a sample of n measurements is
the sum of the squared deviations of the
measurements about their mean, divided by (n – 1)

Why divide by n – 1?
The sample standard deviation s is often used to estimate the
population standard deviation σ
Dividing by n – 1 gives us a better estimate of σ

SAMPLE VARIANCE value is affected by extremely large or extremely small values (NOT ROBUST to OUTLIERS)
The Standard Deviation
 In calculating the variance, we squared all of the deviations,
and in doing so changed the scale of the measurements
 To return this measure of variability to the original units of
measure, we calculate the standard deviation, the positive
square root of the variance

POPULATION and SAMPLE STANDARD DEVIATION value is affected by extremely large or extremely small values (NOT ROBUST to OUTLIERS)
Variance and Standard Deviation
Example: The number of traffic citations issued during the
last five months in Beaufort County, South Carolina, is
38, 26, 13, 41, and 22. What is the population variance?
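
The worked solution for this slide did not survive extraction. As a sketch, applying the definition given earlier (the average of the squared deviations about the mean, dividing by N) to the five citation counts looks like this; the variable names are illustrative.

```python
# Traffic citations over the last five months, treated as the whole population.
citations = [38, 26, 13, 41, 22]

N = len(citations)
mu = sum(citations) / N                              # population mean

squared_deviations = [(x - mu) ** 2 for x in citations]
population_variance = sum(squared_deviations) / N    # divide by N for a population

print(mu, population_variance)
```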

EXAMPLE – Sample Variance
The hourly wages for a sample of part-time
employees at Home Depot are: $12, $20, $16, $18,
and $19. What is the sample variance?
(Sample Mean is calculated and is $17)
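
The worked solution for this slide is also missing from the extracted text. A brief sketch using the sample definition (dividing by n – 1 rather than n) on the five hourly wages; the variable names are illustrative.

```python
# Hourly wages (dollars) for the sample of five part-time employees.
wages = [12, 20, 16, 18, 19]

n = len(wages)
x_bar = sum(wages) / n                               # sample mean (17, as stated on the slide)

sum_sq = sum((x - x_bar) ** 2 for x in wages)        # sum of squared deviations
sample_variance = sum_sq / (n - 1)                   # divide by n - 1 for a sample

print(x_bar, sum_sq, sample_variance)
```

Dividing the same sum of squares by n instead of n – 1 would give 8 rather than 10, which shows how the two definitions differ on a small data set.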

Example 9 – Variance and Standard deviation
Consider a small sample dataset: 4, 16, 9, 7, 0, 1, 10, 8
Find the standard deviation of the above set.
Solution
 Step 1: Find the sample mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{55}{8} = 6.875$

i             x_i    x_i − x̄    (x_i − x̄)²
1             4      -2.875     8.265625
2             16     9.125      83.265625
3             9      2.125      4.515625
4             7      0.125      0.015625
5             0      -6.875     47.265625
6             1      -5.875     34.515625
7             10     3.125      9.765625
8             8      1.125      1.265625
Total (Sum)   55     0          188.875

 Step 2: Find the standard deviation:

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{188.875}{8 - 1}} = \sqrt{26.982143} = 5.194434$
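
The deviation table and the final square root in Example 9 can be reproduced with a short script. This is a sketch of the hand calculation, with illustrative variable names; `statistics.stdev` from the standard library would return the same value.

```python
import math

data = [4, 16, 9, 7, 0, 1, 10, 8]

n = len(data)
x_bar = sum(data) / n                       # Step 1: sample mean

rows = []
for i, x in enumerate(data, start=1):
    deviation = x - x_bar
    rows.append((i, x, deviation, deviation ** 2))
    print(i, x, deviation, deviation ** 2)  # one line of the table per observation

sum_sq = sum(row[3] for row in rows)        # total of the squared deviations
s = math.sqrt(sum_sq / (n - 1))             # Step 2: sample standard deviation

print(x_bar, sum_sq, round(s, 6))
```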
Measures of Position
 Quartiles and Interquartile Range
 Percentiles and Percentile Rank
Definition
Quartiles are three summary measures that divide a ranked data
set into four equal parts.
The second quartile is the same as the median of a data set.
The first quartile is the value of the middle term among the
observations that are less than the median, and the third quartile
is the value of the middle term among the observations that are
greater than the median.
Interquartile Range (IQR)
 Interquartile Range (IQR) is a measure of variability which is
the difference between the third and the first quartiles
 IQR = Interquartile range = Q3 – Q1

IQR value is NOT affected by extremely large or extremely small values (ROBUST to OUTLIERS)

Example 12:
A sample of 12 commuter students was selected from a college.
The following data give the typical one-way commuting times (in
minutes) from home to college for these 12 students.

29 14 39 17 7 47 63 37 42 18 24 55

(a) Find the values of the three quartiles.


(b) Where does the commuting time of 47 fall in relation to the
three quartiles?
(c) Find the interquartile range.
Example 12: Solution
(a) We perform the following steps to find the three quartiles.

Step 1. First, we rank the given data in increasing order as follows:

7 14 17 18 24 29 37 39 42 47 55 63

Step 2. We find the median (second quartile):

Median = Q2 = (29 + 37) / 2 = 33

Step 3. We find the median of the data values that are smaller than Q2, and this gives the value of the first quartile.

The values that are smaller than Q2 are:

7 14 17 18 24 29

Thus, the first quartile is: Q1 = (17 + 18) / 2 = 17.5

Example 12: Solution
Step 4. We find the median of the data values that are larger than Q2, and this gives the value of the third quartile.

The values that are larger than Q2 are:

37 39 42 47 55 63

Thus, the third quartile is: Q3 = (42 + 47) / 2 = 44.5

(b) By looking at the position of 47 minutes, which is above Q3 = 44.5, we can state that this value lies in the top 25% of the commuting times.

(c) The interquartile range is given by the difference between the values of the third and first quartiles. Thus

IQR = Q3 – Q1 = 44.5 – 17.5 = 27 minutes

Interpretation: The range of the middle half of commuting times in the sample is 27 minutes.
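
The quartile procedure of Example 12 (median of the whole data set, then the medians of the lower and upper halves) can be sketched as below. The function name `quartiles` is chosen for illustration, and the sketch assumes the median itself is left out of both halves when n is odd, in line with the definition above ("observations that are less than the median"). Statistical software often uses other quartile conventions, so its results may differ slightly.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]               # values below the median position
    upper = ordered[half + (n % 2):]     # values above it (skips the median if n is odd)
    return median(lower), median(ordered), median(upper)

# One-way commuting times (minutes) for the 12 students in Example 12.
commute = [29, 14, 39, 17, 7, 47, 63, 37, 42, 18, 24, 55]

q1, q2, q3 = quartiles(commute)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```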
Identifying OUTLIERS
In addition to serving as a measure of spread, the interquartile range (IQR) is
used as part of a rule of thumb for identifying outliers.
The 1.5 x IQR Rule for Outliers
Any value below Q1 – 1.5 x IQR is considered a low outlier, and any value above Q3 + 1.5 x IQR is considered a
high outlier.

Example: Use the data below, the commuting times (in minutes) of 20 randomly selected New York workers, to identify any outliers.
10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45

Solutions: In the New York travel time data, we found Q1=15 minutes, Q3=42.5 minutes, and
IQR=27.5 minutes.
For these data, 1.5 x IQR = 1.5(27.5) = 41.25
Q1 - 1.5 x IQR = 15 – 41.25 = -26.25
Q3+ 1.5 x IQR = 42.5 + 41.25 = 83.75
Any travel time shorter than -26.25 minutes or longer than 83.75 minutes is considered an
outlier. 85 is an outlier.
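
A short sketch of the 1.5 x IQR rule applied to the New York travel times. The quartile values Q1 = 15 and Q3 = 42.5 are taken from the solution above rather than recomputed; the variable names are illustrative.

```python
# Commuting times (minutes) of the 20 New York workers.
times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
         15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

q1, q3 = 15, 42.5                 # quartiles as stated in the solution above
iqr = q3 - q1

low_cutoff = q1 - 1.5 * iqr       # anything below this is a low outlier
high_cutoff = q3 + 1.5 * iqr      # anything above this is a high outlier

outliers = [t for t in times if t < low_cutoff or t > high_cutoff]
print(low_cutoff, high_cutoff, outliers)
```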
FIVE NUMBER SUMMARY

The minimum and maximum values alone tell us little about the
distribution as a whole. Likewise, the median and quartiles tell us little
about the tails of a distribution.

To get a quick summary of both center and spread, combine all five
numbers.

The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
Minimum Q1 Median (Q2) Q3 Maximum
Example 13: Five Number Summary
Consider this sample dataset of nine observations:

6, 8, 1, 5, 7, 4, 4, 9, 2

Find the five-number summary.


Solution:
Step 1: Order the data: 1, 2, 4, 4, 5, 6, 7, 8, 9
Step 2: Minimum is 1 and maximum is 9
Step 3: The median is in the 5th position and is therefore 5.
Step 4: Concentrating on the four numbers to the left of the
median, we have a first quartile of (2 + 4) / 2 = 3, and the four numbers to
the right yield a third quartile of (7 + 8) / 2 = 7.5.
Five number summary is:
1 3 5 7.5 9
Percentiles and Percentile Rank
 A percentile is the score at which a specified percentage of scores in a distribution fall
below
 To say a score 53 is in the 75th percentile is to say that 75% of all scores are less
than 53
 The percentile rank of a score indicates the percentage of scores in the distribution
that fall at or below that score.
 Thus, for example, to say that the percentile rank of 53 is 75, is to say that 75% of
the scores on the exam are at or below 53.

Example:
Sadie, a student in a large class with 250 students, receives a score of 80% on her
exam. Sadie learns that her score is at the 70th percentile in the class. How many
students scored higher than Sadie?

Solution: If Sadie is at the 70th percentile, then about 30% of the students in the class
scored higher than Sadie.
Therefore, 0.30 × 250 = 75 students scored higher than Sadie.
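
The arithmetic behind this answer, as a small sketch. It follows the slide's simplification that everyone not at or below Sadie's percentile scored higher than she did; the variable names are illustrative.

```python
class_size = 250
percentile = 70                          # Sadie's score is at the 70th percentile

percent_above = 100 - percentile         # share of the class above her score
students_above = class_size * percent_above // 100   # integer arithmetic avoids rounding issues

print(students_above)
```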
BOX and WHISKER PLOT
 The center of the boxplot shows us the middle half of
the data between the quartiles.
 The height of the box is equal to the IQR.
 If the median is roughly centered between the
quartiles, then the middle half of the data is roughly
symmetric. Thus, if the median is not centered, the
distribution is skewed.
 The whiskers also show the skewness if they are not the
same length.
 Outliers are plotted separately so that they do not distort your judgment of
skewness, but give them special attention.
Interpreting Box Plots
 Median line in centre of box and whiskers of
equal length: symmetric distribution

 Median line left of centre and long right whisker: skewed right

 Median line right of centre and long left whisker: skewed left



BOX and WHISKER PLOT
Boxplot - Example
Boxplot Example
Step 1: Create an appropriate scale along the horizontal axis.
Step 2: Draw a box that starts at Q1 (15 minutes) and ends at Q3 (22
minutes). Inside the box we place a vertical line to represent the
median (18 minutes).
Step 3: Extend horizontal lines from the box out to the minimum
value (13 minutes) and the maximum value (30 minutes).

The Shape of the distribution is Right skewed (positively skewed)


Box and Whisker Modified / Outlier Box Plot
Step 1: Create an appropriate scale along the horizontal axis.
Step 2: Find the Quartiles and IQR
Step 3: Determine the Lower Fence (LF) and Upper Fence (UF)
Step 4: Draw a box that starts at Q1 and ends at Q3. Inside the
box we place a vertical line to represent the median.
Step 5: Extend horizontal lines from the box out to the smallest
observation that is not below LF and to the largest observation that is not above UF
Step 6: Outliers are marked with a special symbol such as an
asterisk (*), dot, or square box.
Box and Whisker Outlier Box Plot
(Example)
The following data are the incomes (in thousands of dollars)
for a sample of 12 households.

75 69 84 112 74 104 81 90 94 144 79 98

Construct an outlier box-and-whisker plot for these data.

Solution:

Step 1 and 2. First, rank the data in increasing order and calculate the
values of the median, the first quartile, the third quartile, and the
interquartile range. The ranked data are

69 74 75 79 81 84 90 94 98 104 112 144


Box and Whisker Outlier Box Plot
(Example)
 69 74 75 79 81 84 90 94 98 104 112 144

Median (Q2) = (84 + 90) / 2 = 87


 Q1 = (75 + 79) / 2 = 77
 Q3 = (98 + 104) / 2 = 101
 IQR = Q3 – Q1 = 101 – 77 = 24

Step 3. Find the points that are 1.5 x IQR below Q1 and 1.5 x
IQR above Q3.

 1.5 x IQR = 1.5 x 24 = 36

 Lower fence (LF) = Q1 – 36 = 77 – 36 = 41


 Upper fence (UF)= Q3 + 36 = 101 + 36 = 137
Box and Whisker Outlier Box Plot
(Example)
Step 4. Draw a horizontal line and mark the income levels
on it such that all the values in the given data set are
covered. The result of this step is shown in the figure below.

[Figure: number line of income levels with Q1, Q2, and Q3 marked]
Box and Whisker Outlier Box Plot
(Example)
 Step 5. By drawing two lines, join the smallest and the largest values within the two inner
fences to the box.
 These values are 69 and 112 in this example.
 This completes the box-and-whisker plot.
[Figure: box-and-whisker plot with Q1, Q2, and Q3 marked, whiskers ending at 69 and 112, and 144 plotted as an outlier]

 Interpretation:
 Right Skewed with one outlier. The outlier is 144.
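
The fence and whisker calculations for this income example can be sketched as follows. The quartiles Q1 = 77 and Q3 = 101 are taken from the solution above rather than recomputed, and the variable names are illustrative.

```python
# Household incomes (thousands of dollars) from the example above.
incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]

q1, q3 = 77, 101                       # quartiles found in Steps 1 and 2
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr           # Step 3: fences
upper_fence = q3 + 1.5 * iqr

inside = [x for x in incomes if lower_fence <= x <= upper_fence]
outliers = sorted(x for x in incomes if x < lower_fence or x > upper_fence)

# Step 5: whiskers end at the most extreme observations that are still inside the fences.
whisker_low, whisker_high = min(inside), max(inside)

print(lower_fence, upper_fence, whisker_low, whisker_high, outliers)
```

The right whisker (from 101 to 112) is longer than the left whisker (from 77 down to 69), and the only outlier lies on the high side, which is consistent with the right-skewed interpretation.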
Example
Amount of sodium in 8 brands of cheese:
260 290 300 320 330 340 340 520

Q1 = 295 Q2 = 325 Q3 = 340

Example (cont’d)
 IQR = 340 – 295 = 45
 Lower fence (LF): 295 – 1.5(45) = 295 – 67.5 = 227.5
 Upper fence (UF): 340 + 1.5(45) = 340 + 67.5 = 407.5
 Outlier: x = 520

Draw “whiskers” connecting the largest (that is, 340) and smallest (that is, 260) observation
values that are NOT outliers to the box
Example 15: Side by Side Boxplot (Male vs Female Weight Distribution)
Example 16
The sport of boxing divides its athletes into different weight classes in
order to make the competition fairer. The side-by-side basic (quantile)
boxplots shown below display the weights (in pounds) of a random
sample of 16 Cruiserweight boxers and 17 Heavyweight boxers.

            Cruiserweight   Heavyweight
Minimum     204             270
Q1          220             294
Median      226             304
Q3          230             312
Maximum     250             320
n           16              17
Example 16
Which of the following statements is/are true?

(I) The distribution of weights for the Heavyweights is skewed to the left.
(II) There are 12 Cruiserweights in the sample who weigh at least 220 pounds.
(III) The mean weight for the Heavyweights is likely greater than the median
weight.

(A) I only
(B) III only
(C) I and II only
(D) II and III only
(E) I, II and III

Answer: C

Example 16
What is the median weight of all of the boxers in the sample (Cruiserweight and
Heavyweight) combined?

(A) 260
(B) 262
(C) 265
(D) 268
(E) 270
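Answer: E. The combined sample has 16 + 17 = 33 boxers, so the median is the 17th ordered weight. All 16 Cruiserweights weigh at most 250 pounds, so the 17th weight is the lightest Heavyweight, 270 pounds.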

