
Unit 1 - Examining Distributions

Uploaded by

wendee1911


Basic Statistical Analysis
Unit 1: Examining Distributions

Instructor: Farhan Islam

Statistics and Types of Statistics
• Statistics is the science of collecting, analyzing, presenting, and interpreting data, as well as of making decisions based on such analyses.
• Data consist of information coming from observations, counts, measurements, or responses.
• Elements are the entities on which data are collected.
• A variable is a characteristic of interest for the elements.
• The set of measurements obtained for a particular element is called an observation.
• A population is the collection of all outcomes, responses, measurements, or counts that are of interest.
• A sample is a subset of units in a population that we examine in order to gather information about the population.
Statistics and Types of Statistics
Data set (the company names are the elements, the remaining columns are the variables, and each row of measurements is an observation):

Company        Stock Exchange   Annual Sales ($M)   Earnings per Share ($)
Dataram        NQ               73.10
EnergySouth    N                74.00               1.67
Keystone       N                365.70              0.86
LandCare       NQ               111.40
Psychemedics   N                17.60               0.13
Population VS Sample
How Many Variables Have You Measured?
• Univariate data: one variable is measured on a single experimental unit
• Bivariate data: two variables are measured on a single experimental unit
• Multivariate data: more than two variables are measured on a single experimental unit
Descriptive and inferential statistics
• Descriptive statistics summarize information already present in data
    • Visualizations like boxplots, histograms, etc.
    • Summary measures like averages, standard deviation, median, etc.
• Inferential statistics use a sample of data to make predictions about larger populations or about unobserved/future trends
    • Any measurements made in the presence of noise or variation
    • Generalizations from a sample to a population
        • Confidence intervals, hypothesis tests, etc.
    • Comparisons made between datasets
        • Comparisons, correlations, regression, etc.
• The average grade in STAT 1000 is C+; the average grade in SCM 1000 is B+
    • Should I take STAT 1000 instead of SCM 1000 for better grades?
    • What other statistics might you need to make this decision?
Scales of Measurement
Scales of measurement include: Nominal, Ordinal

The scale determines the amount of information contained in the data.

The scale indicates the data summarization and statistical analyses that are most appropriate.
Measurement Scale of Qualitative Data

Nominal level: classifies data into mutually exclusive, exhaustive categories in which no order or ranking can be imposed on the data.
For example: eye color, gender, political party, blood types, etc.

Ordinal level: classifies data into categories that can be ranked.
For example: grade of course (A, B, C), size (S, M, L), rating scale (Poor, Good, Excellent), ranking of tennis players, etc.
Types of Data

Copyright © 2019 by Nelson Education Ltd.
Types of Variables
Categorical Variables (Qualitative Variables)
• Categorical variables are variables that have distinct categories, according to some characteristic or attribute.

Quantitative Variables (Discrete and Continuous)
• Quantitative variables are variables whose values are a count or are obtained by measurement, and for which it makes sense to perform arithmetic calculations, such as taking the average.
• Discrete, if measuring how many. E.g., number of 6-packs consumed at a tailgate party
• Continuous, if measuring how much. E.g., pounds of hamburger consumed at a tailgate party
• Quantitative data are always numeric.
Examples of Quantitative Variables (Choice: either Continuous or Discrete)
• Measurement in cm of all the men in this classroom – Continuous
• The distance from your home to the nearest grocery store – Continuous
• Square footage of your house – Continuous
• How many packs of cigarettes do you smoke a day? – Discrete
• The number of phone calls you receive for each day of the week – Discrete
• Number of children in a family – Discrete
• Number of rainy days in a month – Discrete
• The number of books in the backpacks – Discrete
• The amount of time a student spent studying for an exam – Continuous
Examples of Categorical (Qualitative) Variables (Choice: either Nominal or Ordinal)
• Gender (male or female) – Nominal
• Marital status (single, married, widowed, divorced) – Nominal
• Service rating of a restaurant (scale of one to five) – Ordinal
• How many packs of cigarettes do you smoke a day? (0-1), (2-5), (5-7) – Ordinal
• License plates – Nominal
• Hair colour (black, dark brown, light brown, blonde, gray and red) – Nominal
• Size of French fries ordered at McDonald's (small, medium, large) – Ordinal
• Birth month for people born in 1985 – Ordinal
• The arrival status of an airplane flight (early, on time, late, canceled) at an airport – Nominal
Displaying Distributions
• The distribution of a variable tells us what values it takes and how often it takes on these values.
• Categorical variables: we can use
    • bar charts
    • pie charts
• Quantitative variables: we can use
    • histograms
    • time series charts
Graphing Qualitative Variables
Use a data distribution to describe:
• What values of the variable have been measured
• How often each value has occurred:
    • Frequency
    • Relative frequency = Frequency / n (where n = sample size)
    • Percent = 100 × Relative frequency
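The three measures above can be sketched in a few lines of Python. This is only an illustration: the `frequency_table` helper and the ratings list are hypothetical and do not come from the slides.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (frequency, relative frequency, percent)}."""
    n = len(values)  # sample size
    counts = Counter(values)
    return {
        category: (freq, freq / n, 100 * freq / n)
        for category, freq in counts.items()
    }


# Hypothetical ratings, invented for illustration only.
ratings = ["A", "B", "B", "C", "B", "A", "D", "C", "B", "C"]
table = frequency_table(ratings)

for category in sorted(table):
    freq, rel, pct = table[category]
    print(f"{category}: frequency={freq}, relative frequency={rel:.2f}, percent={pct:.0f}%")
```

The relative frequencies always add up to 1, and the percents to 100, which is a quick check on any hand-built table.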


Graphs for Categorical Data
Example: in a survey concerning public education, 400
school administrators were asked to rate the quality of
education in Canada


Graph Types: Bar Chart
• Bar graph: variable values on one axis and frequency (count) on the other axis.

Pie Chart
• A pie chart gives us a visual representation of the relative frequency of the observed values for a categorical variable.
Angle = Relative frequency × 360°
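As a small sketch of the angle formula, the snippet below converts category counts into sector angles. The `pie_angles` helper and the counts are hypothetical and used only for illustration.

```python
def pie_angles(frequencies):
    """Map each category to its sector angle in degrees."""
    n = sum(frequencies.values())
    # Angle = relative frequency x 360 degrees
    return {category: freq / n * 360 for category, freq in frequencies.items()}


# Hypothetical counts, invented for illustration only.
counts = {"A": 2, "B": 4, "C": 3, "D": 1}
angles = pie_angles(counts)

for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

Because the relative frequencies sum to 1, the sector angles sum to 360°.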


Graphing Time Series
Time series: a single quantitative variable measured over time; can be graphed using a line or bar chart
• Put time on the horizontal scale (x-axis) and the variable we are measuring on the vertical scale (y-axis). Connecting the data points by lines helps emphasize any change over time.
• Example: Canadian population growth projections for age group 65–69

Histogram

A histogram is a graph in which classes are marked on the horizontal axis and the frequencies, relative frequencies, or percentages are marked on the vertical axis.

The frequencies, relative frequencies, or percentages are represented by the heights of the bars.

In a histogram, the bars are drawn adjacent to each other.
Frequency Distribution
Example: Frequency Distribution of STAT 1000 Exam Marks
Frequency Distribution and Relative Frequency Distribution

Let X = Exam Marks

What is the probability (chance) that a randomly picked student scored at least 80?

$P(X \ge 80) = 0.18 + 0.06 = 0.24$

What is the probability (chance) that a random student's exam mark is at least 50 but less than 70?

$P(50 \le X < 70) = 0.16 + 0.30 = 0.46$
Grades on a statistics exam
Data: 75 66 77 66 64 73 91 65 59 86 61 86 61 58 70 77 80 58 94 78 62 79 83 54 52 45 82 48 67 55

Class Limits     Frequency
40 up to 50      2
50 up to 60      6
60 up to 70      8
70 up to 80      7
80 up to 90      5
90 up to 100     2
Total            30

Relative Frequency Distribution of Grades

Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067
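The two tables above can be reproduced from the raw grades. The sketch below assumes that "a up to b" means a ≤ x < b; the function name `frequency_distribution` is a hypothetical helper, not part of the slides.

```python
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61, 58, 70,
          77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45, 82, 48, 67, 55]


def frequency_distribution(data, start, stop, width):
    """Count observations in classes [lo, lo + width) from start up to stop."""
    classes = [(lo, lo + width) for lo in range(start, stop, width)]
    counts = {c: 0 for c in classes}
    for x in data:
        for lo, hi in classes:
            if lo <= x < hi:  # "lo up to hi": lower limit included, upper excluded
                counts[(lo, hi)] += 1
                break
    return counts


counts = frequency_distribution(grades, 40, 100, 10)
n = len(grades)

for (lo, hi), freq in counts.items():
    print(f"{lo} up to {hi}: frequency={freq}, relative frequency={freq / n:.3f}")
```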
Relative Frequency Histogram of Grades
[Figure: relative frequency histogram of the grades. Horizontal axis: Grade, 40 to 100 in steps of 10. Vertical axis: Relative frequency, 0 to .30.]
Figure - Histograms (SHAPES)
• Symmetric = Bell Shaped
• Skewed to the Right / Positively Skewed
• Skewed to the Left / Negatively Skewed
Measures of Center for Ungrouped Data

• Measure of center: a measure along the horizontal axis of the data distribution that locates the center of the distribution
Characteristics of the Mean
For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:

$$\mu = \frac{\text{Total value of the observations}}{\text{Number of observations}} = \frac{\sum x_i}{N} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$$

The POPULATION MEAN is affected by extremely large or extremely small values (NOT ROBUST to OUTLIERS)
EXAMPLE – Population Mean
Sample Mean
• For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:
• Let X be a random variable

$$\bar{X} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

The SAMPLE MEAN is affected by extremely large or extremely small values (NOT ROBUST to OUTLIERS)
Sample Mean - EXAMPLE

$$\bar{X} = \frac{\sum X}{n} = \frac{90 + 77 + 94 + \dots + 113 + 83}{12} = 97.5$$
Example 4
A random sample of eight newborns was selected, and their lengths (in inches) were as follows:
20.4 18.5 16.3 17.9 19.2 21.2 17.3 ???
It is known that the sample mean of all eight newborns equals 18.825 inches.
What is the 8th baby's length?
Solution:

Sample mean: $\bar{x} = \frac{\sum x_i}{8} = 18.825$
Sum of all eight lengths: $\sum x_i = 18.825(8) = 150.6$
Sum of the seven known lengths: $20.4 + 18.5 + 16.3 + 17.9 + 19.2 + 21.2 + 17.3 = 130.8$
Subtract the sums: $150.6 - 130.8 = 19.8$ inches is the 8th baby's length.
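The same back-calculation can be checked numerically. This is a minimal sketch; the variable names are illustrative only.

```python
known_lengths = [20.4, 18.5, 16.3, 17.9, 19.2, 21.2, 17.3]  # seven known lengths (inches)
n = 8                 # total number of newborns in the sample
sample_mean = 18.825  # given sample mean (inches)

total = sample_mean * n              # sum of all eight lengths
missing = total - sum(known_lengths)  # length of the 8th baby

print(f"Sum of all eight lengths: {total:.1f}")
print(f"8th baby's length: {missing:.1f} inches")
```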
Median
• Median: the middle measurement when the measurements are ranked from smallest to largest
• The position of the median is 0.5(n + 1) once the measurements have been ordered

The MEDIAN is NOT affected by extremely large or extremely small values (ROBUST to OUTLIERS)
Example 5
The following table lists the 2014 compensations of female CEOs of 11
American companies (USA TODAY, May 1, 2015). (The compensation of
Carol Meyrowitz of TJX is for the fiscal year ending in January 2015.).
Find the median for these data.
Table 2 Compensations of 11 Female CEOs
Example 5: Solution
To calculate the median, we perform the following two steps.
Step 1: We rank the given data in increasing order as follows:
16.2 16.9 19.3 19.3 19.6 21.0 22.2 22.5 28.7 33.7 42.1

Step 2: There are 11 data values. The sixth value divides these 11 values into two equal parts. Hence, the sixth value gives the median.

Thus, the median of 2014 compensations for these 11 female CEOs is $21.0 million.
Example 6
The following data give the cell phone minutes used last month by 12 randomly selected persons.
230 2053 160 397 510 380 263 3864 184 201 326 721
Find the median for these data.
Solution:
To calculate the median, we perform the following two steps.
Step 1: We rank the given data in increasing order as follows:
160 184 201 230 263 326 380 397 510 721 2053 3864
Step 2: The value that divides 12 data values into two equal parts falls between the sixth and the seventh values. Thus, the median is given by the average of the sixth and the seventh values:

$$\text{Median} = \text{average of the two middle values} = \frac{326 + 380}{2} = 353 \text{ minutes}$$
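The two-step procedure used in Examples 5 and 6 can be sketched as a small function. The `median` helper below is a minimal illustration, not a reference implementation; it handles the odd case (one middle value) and the even case (average of the two middle values).

```python
def median(data):
    """Middle value of the ranked data (average of the two middle values if n is even)."""
    ordered = sorted(data)  # Step 1: rank the data in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:          # odd n: position 0.5(n + 1) is a single value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values


# Example 5: compensations of 11 female CEOs (in millions of dollars)
compensations = [16.2, 16.9, 19.3, 19.3, 19.6, 21.0, 22.2, 22.5, 28.7, 33.7, 42.1]
# Example 6: cell phone minutes used by 12 persons
minutes = [230, 2053, 160, 397, 510, 380, 263, 3864, 184, 201, 326, 721]

print(median(compensations))  # 21.0
print(median(minutes))        # 353.0
```

Note that the two very large values in Example 6 (2053 and 3864) do not move the median, which is what robustness to outliers means in practice.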
Mode

The value or values that occur most frequently in a data set. If all values occur with the same frequency, the data set is said to have no mode.
• The MODE is not affected by outliers
• It can be calculated for both kinds of data: quantitative and qualitative

Example
The status of five students who are members of the student senate at a college
are senior, sophomore, senior, junior, and senior, respectively. Find the mode.

Solution:
Because senior occurs more frequently than the other categories, it is the mode
for this data set. We cannot calculate the mean and median for this data set.
Example
The number of liters of milk purchased by 25 households:
0 0 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3
3 3 3 3 4 4 4 5
 Mean?

 Median?

 Mode? (Highest peak)

mode 2
Weighted Mean
• When different values of a data set occur with different frequencies, that is, each value of a data set is assigned a different weight, we calculate the weighted mean to find the center of the given data set.

$$\text{Weighted Mean} = \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n}$$
Weighted Mean (Example)
Example

Suppose your midterm test score is 83 and your final exam score is 95. Using weights of 40% for the midterm and 60% for the final exam, compute the weighted average of your scores.

If the minimum average for an A is 90, will you earn an A?

Solution

$$\text{Weighted Average} = \frac{83(0.40) + 95(0.60)}{0.40 + 0.60} = 90.2$$

Your average is high enough to earn an A.
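The weighted mean formula translates directly into code. The `weighted_mean` helper below is a minimal sketch applied to the midterm and final exam scores from this example.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


scores = [83, 95]       # midterm, final exam
weights = [0.40, 0.60]  # 40% and 60%

average = weighted_mean(scores, weights)
print(f"Weighted average: {average:.1f}")
print("Earns an A" if average >= 90 else "Does not earn an A")
```

Dividing by the sum of the weights means the weights do not have to add up to 1; for instance, weights of 2 and 3 would give the same result.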

Example 7
Example 8
Extreme Values (OUTLIERS)
• The mean is more easily affected by extremely large or small values than the median
• If a distribution is skewed to the right, the mean shifts to the right; if a distribution is skewed to the left, the mean shifts to the left
• When a distribution is symmetric, the mean and the median are equal
• The median is often used as a measure of center when the distribution is skewed
The Relative Positions of the Mean, Median and the Mode
• Symmetric: Mean = Median. If the equality holds only approximately, the distribution is called approximately symmetric.
• Skewed right: Mean > Median
• Skewed left: Mean < Median
Measures of Variability
• Measure of variability: a measure along the horizontal axis of the data distribution that describes the spread of the distribution from the center

The Range
• Range (R): the difference between the largest and smallest measurements in a set
Finding the Range for Ungrouped Data
• Example: a botanist records the number of petals on five flowers: 5, 12, 6, 8, 14
• The range is R = 14 – 5 = 9

The RANGE is affected by extremely large or extremely small values (NOT ROBUST to OUTLIERS)
The Variance
• Variance: a measure of variability that uses all the measurements; it measures the average squared deviation of the measurements about their mean
• Example: a botanist records the number of petals on five flowers: 5, 12, 6, 8, 14
[Figure: dot plot of the five petal counts on an axis running from 4 to 14]
The Variance
• The variance of a population of N measurements is the average of the squared deviations of the measurements about their mean μ

The POPULATION VARIANCE is affected by extremely large or extremely small values (NOT ROBUST to OUTLIERS)
The Variance
• The variance of a sample of n measurements is the sum of the squared deviations of the measurements about their mean, divided by (n – 1)

Why divide by n – 1?
The sample variance s² is often used to estimate the population variance σ².
Dividing by n – 1 rather than n gives a better (unbiased) estimate of σ².

The SAMPLE VARIANCE is affected by extremely large or extremely small values (NOT ROBUST to OUTLIERS)
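Written as formulas, the two verbal definitions of variance read as follows. The notation (σ² for the population variance, s² for the sample variance) is the standard one and is assumed here, since the slide formulas did not survive extraction. The worked line treats the five petal counts from the earlier example as a sample.

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N},
\qquad
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}

% Petal counts 5, 12, 6, 8, 14 treated as a sample:
\bar{x} = \frac{5 + 12 + 6 + 8 + 14}{5} = 9,
\qquad
s^2 = \frac{(-4)^2 + 3^2 + (-3)^2 + (-1)^2 + 5^2}{5 - 1} = \frac{60}{4} = 15
```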
The Standard Deviation
• In calculating the variance, we squared all of the deviations, and in doing so changed the scale of the measurements
• To return this measure of variability to the original units of measure, we calculate the standard deviation, the positive square root of the variance

The POPULATION and SAMPLE STANDARD DEVIATIONS are affected by extremely large or extremely small values (NOT ROBUST to OUTLIERS)
Variance and Standard Deviation
Example: The numbers of traffic citations issued during the last five months in Beaufort County, South Carolina, are 38, 26, 13, 41, and 22. What is the population variance?

EXAMPLE – Sample Variance
The hourly wages for a sample of part-time
employees at Home Depot are: $12, $20, $16, $18,
and $19. What is the sample variance?
(Sample Mean is calculated and is $17)
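Both variance examples above can be worked through with two short functions. This is a sketch following the verbal definitions (divide by N for a population, by n – 1 for a sample); the function names are illustrative.

```python
def population_variance(data):
    """Average of the squared deviations about the population mean."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def sample_variance(data):
    """Sum of squared deviations about the sample mean, divided by n - 1."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)


citations = [38, 26, 13, 41, 22]  # traffic citations, treated as a population
wages = [12, 20, 16, 18, 19]      # hourly wages, treated as a sample

print(f"Population variance of citations: {population_variance(citations):.1f}")
print(f"Sample variance of wages: {sample_variance(wages):.1f}")
```

For the citations the mean is 28 and the squared deviations sum to 534, so the population variance is 534 / 5 = 106.8. For the wages the mean is 17 and the squared deviations sum to 40, so the sample variance is 40 / 4 = 10.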

Example 9 – Variance and Standard Deviation
Consider a small sample dataset: 4, 16, 9, 7, 0, 1, 10, 8
Find the standard deviation of the above set.
Solution
• Step 1: Find the sample mean: x̄ = 55 / 8 = 6.875

i             x_i    x_i - x̄    (x_i - x̄)^2
1             4      -2.875     8.265625
2             16     9.125      83.265625
3             9      2.125      4.515625
4             7      0.125      0.015625
5             0      -6.875     47.265625
6             1      -5.875     34.515625
7             10     3.125      9.765625
8             8      1.125      1.265625
Total (Sum)   55     0          188.875

• Step 2: Find the standard deviation:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{188.875}{8 - 1}} = \sqrt{26.982143} = 5.194434$$
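The two steps of Example 9 can be reproduced in code and cross-checked against `statistics.stdev`, which also divides by n – 1. This is a verification sketch, not part of the original solution.

```python
import math
import statistics

data = [4, 16, 9, 7, 0, 1, 10, 8]
n = len(data)

# Step 1: sample mean
x_bar = sum(data) / n

# Step 2: squared deviations, then divide by n - 1 and take the square root
squared_deviations = [(x - x_bar) ** 2 for x in data]
s = math.sqrt(sum(squared_deviations) / (n - 1))

print(f"mean = {x_bar}, sum of squared deviations = {sum(squared_deviations)}")
print(f"s = {s:.6f}, statistics.stdev = {statistics.stdev(data):.6f}")
```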
Measures of Position
• Quartiles and Interquartile Range
• Percentiles and Percentile Rank
Definition
Quartiles are three summary measures that divide a ranked data set into four equal parts.
The second quartile is the same as the median of a data set.
The first quartile is the value of the middle term among the observations that are less than the median, and the third quartile is the value of the middle term among the observations that are greater than the median.
Interquartile Range (IQR)
• The interquartile range (IQR) is a measure of variability which is the difference between the third and the first quartiles
• IQR = Interquartile range = Q3 – Q1

The IQR is NOT affected by extremely large or extremely small values (ROBUST to OUTLIERS)

Example 12:
A sample of 12 commuter students was selected from a college.
The following data give the typical one-way commuting times (in
minutes) from home to college for these 12 students.

29 14 39 17 7 47 63 37 42 18 24 55

(a) Find the values of the three quartiles.

(b) Where does the commuting time of 47 fall in relation to the
three quartiles?
(c) Find the interquartile range.
Example 12: Solution
(a) We perform the following steps to find the three quartiles.

Step 1. First, we rank the given data in increasing order as follows:

7 14 17 18 24 29 37 39 42 47 55 63

Step 2. We find the median (second quartile). With 12 values, it is the average of the sixth and seventh values:

Median = Q2 = (29 + 37) / 2 = 33

Step 3. We find the median of the data values that are smaller than Q2, and this gives the value of the first quartile.

The values that are smaller than Q2 are:

7 14 17 18 24 29

Thus, the first quartile is: Q1 = (17 + 18) / 2 = 17.5

Step 4. We find the median of the data values that are larger than Q2, and this gives the value of the third quartile.

The values that are larger than Q2 are:

37 39 42 47 55 63

Thus, the third quartile is: Q3 = (42 + 47) / 2 = 44.5

(b) By looking at the position of 47 minutes, which is above Q3 = 44.5, we can state that this value lies in the top 25% of the commuting times.

(c) IQR = Q3 – Q1 = 44.5 – 17.5 = 27 minutes

Interpretation: The range of the middle half of commuting times in the sample is 27 minutes.
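The quartile procedure of Example 12 (median of the lower half, overall median, median of the upper half) can be sketched as below. Software packages often use other interpolation rules and may return slightly different quartiles, so this hand-written `quartiles` helper is an assumption chosen to match the method in these slides.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                 # values below the median position
    upper = ordered[len(ordered) - half:]  # values above the median position
    return median(lower), median(ordered), median(upper)


times = [29, 14, 39, 17, 7, 47, 63, 37, 42, 18, 24, 55]
q1, q2, q3 = quartiles(times)
iqr = q3 - q1

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```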
Identifying OUTLIERS
In addition to serving as a measure of spread, the interquartile range (IQR) is used as part of a rule of thumb for identifying outliers.
The 1.5 × IQR Rule for Outliers
Any value below Q1 – 1.5 × IQR is considered a low outlier, and any value above Q3 + 1.5 × IQR is considered a high outlier.

Example: Use the 1.5 × IQR rule to identify any outliers among the commuting times (in minutes) of 20 randomly selected New York workers.
10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45

Solution: In the New York travel time data, we found Q1 = 15 minutes, Q3 = 42.5 minutes, and IQR = 27.5 minutes.
For these data, 1.5 × IQR = 1.5(27.5) = 41.25
Q1 – 1.5 × IQR = 15 – 41.25 = -26.25
Q3 + 1.5 × IQR = 42.5 + 41.25 = 83.75
Any travel time shorter than -26.25 minutes or longer than 83.75 minutes is considered an outlier. 85 is an outlier.
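The 1.5 × IQR rule for the New York data can be sketched as follows. The quartiles are computed with the median-of-halves method used in these slides; the helper names are illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def iqr_fences(data):
    """Return (Q1, Q3, lower fence, upper fence) for the 1.5 x IQR rule."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[len(ordered) - half:])
    step = 1.5 * (q3 - q1)
    return q1, q3, q1 - step, q3 + step


commute = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
           15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

q1, q3, lower_fence, upper_fence = iqr_fences(commute)
outliers = [x for x in commute if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, fences = ({lower_fence}, {upper_fence})")
print(f"Outliers: {outliers}")
```

A negative lower fence, as here, simply means that no commuting time can be a low outlier.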
FIVE NUMBER SUMMARY

The minimum and maximum values alone tell us little about the distribution as a whole. Likewise, the median and quartiles tell us little about the tails of a distribution.

To get a quick summary of both center and spread, combine all five numbers.

The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
Minimum Q1 Median (Q2) Q3 Maximum
Example 13: Five Number Summary
Consider this sample dataset of nine observations:

6, 8, 1, 5, 7, 4, 4, 9, 2

Find the five-number summary.

Solution:
Step 1: Order the data: 1, 2, 4, 4, 5, 6, 7, 8, 9
Step 2: The minimum is 1 and the maximum is 9.
Step 3: The median is in the 5th position and is therefore 5.
Step 4: Concentrating on the four numbers to the left of the median, we have a first quartile of (2 + 4) / 2 = 3; the four numbers to the right yield a third quartile of (7 + 8) / 2 = 7.5.
The five-number summary is:
1 3 5 7.5 9
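The four steps of Example 13 can be sketched as one function. With an odd number of observations, the median itself is left out of both halves, as in the solution above. The `five_number_summary` name is illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)                 # Step 1: order the data
    half = len(ordered) // 2
    lower = ordered[:half]                 # numbers to the left of the median
    upper = ordered[len(ordered) - half:]  # numbers to the right of the median
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


sample = [6, 8, 1, 5, 7, 4, 4, 9, 2]
summary = five_number_summary(sample)
print(summary)  # (1, 3.0, 5, 7.5, 9)
```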
Percentiles and Percentile Rank
• A percentile is the score below which a specified percentage of scores in a distribution fall
    • To say a score of 53 is at the 75th percentile is to say that 75% of all scores are less than 53
• The percentile rank of a score indicates the percentage of scores in the distribution that fall at or below that score.
    • Thus, for example, to say that the percentile rank of 53 is 75 is to say that 75% of the scores on the exam are at or below 53.

Example:
Sadie, a student in a large class with 250 students, receives a score of 80% on her exam. Sadie learns that her score is at the 70th percentile in the class. How many students scored higher than Sadie?

Solution: If Sadie is at the 70th percentile, then 70% of the students in the class scored at or below her score, so 30% scored higher.
Therefore, 0.30 × 250 = 75 students scored higher than Sadie.
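A short sketch of both ideas: the percentile rank as the percentage of scores at or below a given score, and the count of students above a known percentile. The `percentile_rank` helper and the four-score list are hypothetical; only the class size and the 70th percentile come from the example.

```python
def percentile_rank(scores, score):
    """Percentage of scores that fall at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


# Hypothetical scores, invented for illustration only.
exam_scores = [40, 45, 53, 60]
print(percentile_rank(exam_scores, 53))  # 75.0

# Sadie's example: 250 students, score at the 70th percentile
class_size = 250
sadie_percentile = 70
students_above = class_size * (100 - sadie_percentile) // 100
print(students_above)  # 75
```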
BOX and WHISKER PLOT
• The center of the boxplot shows us the middle half of the data between the quartiles.
• The height of the box is equal to the IQR.
• If the median is roughly centered between the quartiles, then the middle half of the data is roughly symmetric. If the median is not centered, the distribution is skewed.
• The whiskers also show skewness if they are not the same length.
• Outliers are plotted separately so that they do not distort your judgment of skewness, but give them special attention.
Interpreting Box Plots
• Median line in centre of box and whiskers of equal length: symmetric distribution
• Median line left of centre and long right whisker: skewed right
• Median line right of centre and long left whisker: skewed left
Boxplot Example
Step 1: Create an appropriate scale along the horizontal axis.
Step 2: Draw a box that starts at Q1 (15 minutes) and ends at Q3 (22 minutes). Inside the box we place a vertical line to represent the median (18 minutes).
Step 3: Extend horizontal lines from the box out to the minimum value (13 minutes) and the maximum value (30 minutes).

The shape of the distribution is right skewed (positively skewed).

Box and Whisker Modified / Outlier Box Plot
Step 1: Create an appropriate scale along the horizontal axis.
Step 2: Find the quartiles and the IQR.
Step 3: Determine the lower fence (LF) and the upper fence (UF).
Step 4: Draw a box that starts at Q1 and ends at Q3. Inside the box we place a vertical line to represent the median.
Step 5: Extend horizontal lines from the box out to the closest observation inside LF and to the closest observation inside UF.
Step 6: Outliers are marked with a special symbol such as an asterisk (*), a dot, or a square.
Box and Whisker Outlier Box Plot (Example)
The following data are the incomes (in thousands of dollars) for a sample of 12 households.

75 69 84 112 74 104 81 90 94 144 79 98

Construct an outlier box-and-whisker plot for these data.

Solution:

Steps 1 and 2. First, rank the data in increasing order and calculate the values of the median, the first quartile, the third quartile, and the interquartile range. The ranked data are

69 74 75 79 81 84 90 94 98 104 112 144

• Median (Q2) = (84 + 90) / 2 = 87
• Q1 = (75 + 79) / 2 = 77
• Q3 = (98 + 104) / 2 = 101
• IQR = Q3 – Q1 = 101 – 77 = 24

Step 3. Find the points that are 1.5 × IQR below Q1 and 1.5 × IQR above Q3.

• 1.5 × IQR = 1.5 × 24 = 36
• Lower fence (LF) = Q1 – 36 = 77 – 36 = 41
• Upper fence (UF) = Q3 + 36 = 101 + 36 = 137

Step 4. Draw a horizontal line and mark the income levels on it such that all the values in the given data set are covered.
[Figure: income scale with the box marked at Q1, Q2 and Q3]

Step 5. By drawing two lines, join the box to the smallest and the largest values that lie within the two fences. These values are 69 and 112 in this example. This completes the box-and-whisker plot.
[Figure: box-and-whisker plot with whiskers ending at 69 and 112 and the outlier 144 marked separately]

• Interpretation: Right skewed with one outlier. The outlier is 144.
Example
Amount of sodium in 8 brands of cheese:
260 290 300 320 330 340 340 520

Q1 = 295 Q2 = 325 Q3 = 340

Example (cont'd)
• IQR = 340 – 295 = 45
• Lower fence (LF): 295 – 1.5(45) = 295 – 67.5 = 227.5
• Upper fence (UF): 340 + 1.5(45) = 340 + 67.5 = 407.5
• Outlier: x = 520

Draw "whiskers" connecting the box to the smallest (260) and largest (340) observations that are NOT outliers.
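The numbers needed for an outlier boxplot (quartiles, fences, whisker ends, and outliers) can be computed with one helper, sketched below for the household income data and the cheese sodium data. The `outlier_boxplot_stats` function is hypothetical and uses the median-of-halves quartiles from these slides; plotting libraries may compute quartiles differently.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def outlier_boxplot_stats(data):
    """Quartiles, fences, whisker ends and outliers for an outlier boxplot."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[len(ordered) - half:])
    step = 1.5 * (q3 - q1)
    lower_fence, upper_fence = q1 - step, q3 + step
    # Whiskers end at the most extreme observations that are inside the fences.
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "median": median(ordered), "q3": q3,
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
sodium = [260, 290, 300, 320, 330, 340, 340, 520]

for name, data in (("incomes", incomes), ("sodium", sodium)):
    print(name, outlier_boxplot_stats(data))
```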
Example 15: Side by Side Boxplot (Male vs
Female Weight Distribution)
Example 16
The sport of boxing divides its athletes into different weight classes in order to make the competition fairer. The side-by-side basic (quantile) boxplots display the weights (in pounds) of a random sample of 16 Cruiserweight boxers and 17 Heavyweight boxers.

             Cruiserweight   Heavyweight
Minimum      204             270
Q1           220             294
Median       226             304
Q3           230             312
Maximum      250             320
n            16              17

Which of the following statements is/are true?

(I) The distribution of weights for the Heavyweights is skewed to the left.
(II) There are 12 Cruiserweights in the sample who weigh at least 220 pounds.
(III) The mean weight for the Heavyweights is likely greater than the median weight.

(A) I only
(B) III only
(C) I and II only
(D) II and III only
(E) I, II and III

Answer: C

What is the median weight of all of the boxers in the sample (Cruiserweight and Heavyweight) combined?

(A) 260
(B) 262
(C) 265
(D) 268
(E) 270

Answer: E. With 16 + 17 = 33 boxers, the combined median is the 17th ranked weight. All 16 Cruiserweights weigh at most 250 pounds, so the 17th weight is the lightest Heavyweight, 270 pounds.

57 pages
Stat 1000
No ratings yet
Stat 1000
117 pages
Processing and Interpretation of Data
No ratings yet
Processing and Interpretation of Data
12 pages
Statistics: Prepared By: Larry Jay B. Valero, LPT
No ratings yet
Statistics: Prepared By: Larry Jay B. Valero, LPT
139 pages
U1 Note
No ratings yet
U1 Note
69 pages
(FREE PDF Sample) Clinical Research Methods in Speech Language Pathology and Audiology 3rd Edition David L Irwin Norman J Lass Mary Pannbacker Mary Ellen Tekieli Koay Jennifer S Whited Ebooks
100% (2)
(FREE PDF Sample) Clinical Research Methods in Speech Language Pathology and Audiology 3rd Edition David L Irwin Norman J Lass Mary Pannbacker Mary Ellen Tekieli Koay Jennifer S Whited Ebooks
24 pages
C. True Experimental: Nres 1 Semi-Final Exam
0% (1)
C. True Experimental: Nres 1 Semi-Final Exam
17 pages
ECON2030 Winter2025 Ch2
No ratings yet
ECON2030 Winter2025 Ch2
27 pages
The Correct Answers Are Highlighted in Green
No ratings yet
The Correct Answers Are Highlighted in Green
11 pages
MPPU1034 01 Introduction
No ratings yet
MPPU1034 01 Introduction
27 pages
BRM 9e PPT CH 13 - Measurement and Scale
No ratings yet
BRM 9e PPT CH 13 - Measurement and Scale
23 pages
Math2130 Test 2 - Fall24
No ratings yet
Math2130 Test 2 - Fall24
3 pages
Math2130 Test 1 - Fall24
No ratings yet
Math2130 Test 1 - Fall24
2 pages
Business Statistics I Essentials
From Everand
Business Statistics I Essentials
Louise Clark
5/5 (5)
Introduction To Business Statistics Through R Software: Software
From Everand
Introduction To Business Statistics Through R Software: Software
Editor IJSMI
No ratings yet
Introduction To Non Parametric Methods Through R Software
From Everand
Introduction To Non Parametric Methods Through R Software
Editor IJSMI
No ratings yet
Machine Learning - A Complete Exploration of Highly Advanced Machine Learning Concepts, Best Practices and Techniques: 4
From Everand
Machine Learning - A Complete Exploration of Highly Advanced Machine Learning Concepts, Best Practices and Techniques: 4
Peter Bradley
No ratings yet

Basic Statistical Analysis

Instructor: Farhan Islam

 Bivariate data: two variables are measured

 Multivariate data: more than two variables

classifies data into mutually classifies data into categories

Copyright © 2019 by Nelson Education Ltd.

Quantitative Variables (Discrete and Continuous)

A histogram is a graph in which

The frequencies, relative

In a histogram, the bars are drawn

Let X = Exam Marks

P(X ≥ 80) = 0.18 + 0.06 = 0.24

Skewed to the right / positively skewed

Skewed to the left / negatively skewed

 Measure of Center: a measure along the horizontal

POPULATION MEAN value is affected by extremely large or

SAMPLE MEAN value is affected by extremely large or

once the measurements have been ordered

MEDIAN value is NOT affected by extremely large or

Step 2: There are 11 data values. The sixth value divides

Thus, the median of 2014 compensations for these 11 female

The value or values that occur

 Mode? (Highest peak)

Suppose your midterm test score is 83 and your final exam

If the minimum average for an A is 90, will you earn an A?

Your average is high enough to earn an A.

 If a distribution is skewed to the right, the mean

 When a distribution is symmetric, the mean and

 The median is often used as a measure of center

Symmetric: Mean = Median

If the mean and median are only approximately equal, the distribution is called approximately symmetric.

Skewed right: Mean > Median

Skewed left: Mean < Median
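The relationship between skewness and the ordering of mean and median can be checked numerically. The following is a minimal Python sketch; the three small data sets are hypothetical values chosen only for illustration and do not come from the slides.

```python
from statistics import mean, median

# Hypothetical illustration data (not from the slides).
symmetric = [1, 2, 3, 4, 5]                  # balanced around 3
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]     # one large value stretches the right tail
left_skewed = [-x for x in right_skewed]     # mirror image: long left tail


def center_summary(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)


for name, data in [
    ("symmetric", symmetric),
    ("right skewed", right_skewed),
    ("left skewed", left_skewed),
]:
    m, med = center_summary(data)
    print(f"{name}: mean = {m}, median = {med}")
```

In the right-skewed set the single large value pulls the mean above the median, while the median stays near the bulk of the data; mirroring the data reverses the inequality.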

 Example: a botanist records the number of petals on five

RANGE value is affected by extremely large or extremely

POPULATION VARIANCE value is affected by extremely large

SAMPLE VARIANCE value is affected by extremely large or

POPULATION and SAMPLE STANDARD DEVIATION value is

IQR value is NOT affected by extremely large or extremely small

(a) Find the values of the three quartiles.

Step 1. First, we rank the given data in increasing order as follows:

The values that are smaller than are:

Thus, the first quartile is:

The values that are larger than are:

Thus, the third quartile is:

The five-number summary of a distribution consists of the smallest

Find the five-number summary.

 Median line left of centre and long right whisker:

 Median line right of centre and long left whisker:

The shape of the distribution is right-skewed (positively skewed).

75 69 84 112 74 104 81 90 94 144 79 98

Construct an outlier box-and-whisker plot for these data.

69 74 75 79 81 84 90 94 98 104 112 144

Median (Q2) = (84 + 90) / 2 = 87
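As a cross-check on this example, the median, quartiles, 1.5 × IQR, and fences can be recomputed from the twelve values. The sketch below is in Python and assumes the quartile convention that matches the slide's Q1 = 77: each quartile is the median of the lower or upper half of the ordered data, with the middle value excluded when the count is odd. The function name `box_plot_summary` is hypothetical, and the upper fence is assumed to be Q3 + 1.5 × IQR, by symmetry with the lower fence.

```python
from statistics import median


def box_plot_summary(values):
    """Quartiles, IQR, fences, and outliers using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    upper_half = data[half + (n % 2):]   # skip the middle value when n is odd

    q1, q2, q3 = median(lower_half), median(data), median(upper_half)
    iqr = q3 - q1
    step = 1.5 * iqr
    lower_fence = q1 - step
    upper_fence = q3 + step              # assumed: Q3 + 1.5 * IQR
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "q2": q2, "q3": q3, "iqr": iqr, "step": step,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "outliers": outliers,
    }


data = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
summary = box_plot_summary(data)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Library quantile routines often use interpolation rules that give slightly different quartiles, so the median-of-halves rule is written out explicitly here to stay consistent with the hand calculation.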

 1.5 x IQR = 1.5 x 24 = 36

 Lower fence (LF) = Q1 – 36 = 77 – 36 = 41

Q1 = 295 Q2 = 325 Q3 = 340
