Math236 Lecture 2
Engineering Statistics
2022 – 2023 Fall
What is Statistics?
Statistics is the science of data…
Section 1.3
Measures of Location: The Sample Mean and Median
Measures of Location: Sample Mean, Mode
Measures of Variation: Range, Variance, Standard Deviation
Mean
• The arithmetic mean (mean) is the most common
measure of central tendency
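• For a population of N values:
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} \]
where x_1, ..., x_N are the population values and N is the population size
• For a sample of size n:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
where x_1, ..., x_n are the observed values and n is the sample size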
Arithmetic Mean (continued)
[Figure: two dot plots on an axis from 0 to 10; left panel Mean = 3, right panel Mean = 4]
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-11
Remark: Outliers
• Outliers are points that are much larger or smaller than the rest of
the sample points.
• Outliers may be data entry errors or they may be points that really are
different from the rest.
• Outliers should not be deleted without considerable thought. Sometimes
calculations and analyses are done both with and without the outliers,
and the results are then compared.
Median
• In an ordered list, the median is the “middle” number
(50% above, 50% below)
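[Figure: two dot plots on an axis from 0 to 10; left panel Median = 3, right panel Median = 3]
[Figure: two dot plots illustrating the mode, one on an axis from 0 to 14 and one on an axis from 0 to 6; panel labels: Mode = 9, No Mode]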
Copyright © 2017 Pearson Education Ltd. All rights reserved.
Mean-Median-Mode-Trimmed Mean for Example 1.2
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Review Example: Summary Statistics
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum: $3,000,000
• Mean: $3,000,000 / 5 = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
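The three summary statistics for the house prices can be checked with a short Python sketch. It uses only the standard-library `statistics` module; the variable names are illustrative and not part of the lecture.

```python
import statistics

# House prices from the example, in dollars
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(prices)      # sum of the values divided by their count
median_price = statistics.median(prices)  # middle value of the ranked data
mode_price = statistics.mode(prices)      # most frequently occurring value

print(f"Mean:   ${mean_price:,.0f}")
print(f"Median: ${median_price:,.0f}")
print(f"Mode:   ${mode_price:,.0f}")
```

The single $2,000,000 house pulls the mean to twice the median, which is why the choice of measure matters for data like these.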
Which measure of location is the “best”?
• Mean is generally used, unless extreme values
(outliers) exist . . .
• Then median is often used, since the median is not
sensitive to extreme values.
• Example: Median home prices may be reported for a
region – less sensitive to outliers
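A minimal sketch of this sensitivity, using two small data sets that are assumptions chosen for illustration (they are not taken from the slides). The sets differ only in their largest value, so any change in a statistic is due to that one extreme point.

```python
import statistics

# Assumed illustrative data: identical except for the largest value
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

# Compute the mean and median of each data set
summary = {
    "without outlier": (statistics.mean(without_outlier),
                        statistics.median(without_outlier)),
    "with outlier": (statistics.mean(with_outlier),
                     statistics.median(with_outlier)),
}

for label, (mean_value, median_value) in summary.items():
    print(f"{label}: mean = {mean_value}, median = {median_value}")
```

The mean moves when the extreme value is introduced, while the median stays where it was.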
Shape of a Distribution
• Describes how data are distributed
• Measures of shape
• Symmetric or skewed
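• Population variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
where \mu is the population mean and N is the population size
• Sample variance:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
where \bar{x} is the sample mean and n is the sample size; the sample standard deviation is s = \sqrt{s^2}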
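The two definitions differ only in the divisor: the population variance divides the sum of squared deviations by N, while the sample variance divides by n − 1. The sketch below makes that explicit. The data values are an assumption for illustration (eight values with mean 16), and the helper name `variance` is hypothetical.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1) if sample else squared_deviations / n

# Assumed illustrative data (not from the slides): n = 8, mean = 16
data = [10, 12, 14, 15, 17, 18, 18, 24]

sample_var = variance(data, sample=True)
population_var = variance(data, sample=False)
sample_sd = math.sqrt(sample_var)  # standard deviation is the square root of the variance

print(f"sample variance     = {sample_var:.4f}")
print(f"population variance = {population_var:.4f}")
print(f"sample std. dev.    = {sample_sd:.4f}")
```

The standard deviation is in the same units as the data, which usually makes it easier to interpret than the variance.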
n = 8, Mean = x̄ = 16
[Figure: dot plots of three data sets on a common axis from 11 to 21]
First data set: s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.570
Advantages of Variance and Standard Deviation
• Categorical
  • Nominal
  • Ordinal
• Numerical
  • Discrete
  • Continuous
Scatterplot
• Data in which each item consists of a pair of values are
called bivariate.
• The graphical summary for bivariate data is a
scatterplot.
• Display of a scatterplot:
Stem | Leaf
4 | 259
5 | 0111133556678
6 | 067789
7 | 01233455556666699
8 | 000012223344456668
9 | 013
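The rows above have the form of a stem-and-leaf display. As a sketch, the code below reads the values back out of the display and rebuilds it, assuming each stem is a tens digit and each leaf a units digit (so the first row holds 42, 45, 49). That reading is an assumption, and the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by tens digit (stem); leaves are the sorted units digits."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(groups.items())}

# The display as given, keyed by stem
display = {
    4: "259",
    5: "0111133556678",
    6: "067789",
    7: "01233455556666699",
    8: "000012223344456668",
    9: "013",
}

# Recover the underlying values (assumes stem = tens digit, leaf = units digit)
values = [10 * stem + int(leaf) for stem, leaves in display.items() for leaf in leaves]

rebuilt = stem_and_leaf(values)
for stem, leaves in rebuilt.items():
    print(f"{stem} | {leaves}")
```

Unlike a histogram, this display keeps every individual value recoverable while still showing the overall shape.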
Creating a Histogram
• Choose boundary points for the class intervals.
Usually these intervals are the same width.
• Compute the frequencies: this is the number of
observations that occur in each interval.
• Compute the relative frequencies for each class: this
is the number of observations in each interval divided
by the total number of observations.
• If the class intervals are the same width, then draw a
rectangle for each class, whose height is equal to the
frequencies or relative frequencies.
• If the class intervals are of unequal widths, the heights of
the rectangles must be set equal to the densities, where
density is the relative frequency divided by the class width.
McGraw-Hill ©2014 by The McGraw-Hill Companies, Inc. All rights reserved.
Example of Histogram
Table 1.4 Car Battery Life
[Figure: three histograms, each on a horizontal axis from 0 to 90]
• Distribution skewed to the left: typically mean < median < mode