Chapter 1
1 Descriptive Statistics
1.1 Populations, Samples, and Processes
Because resources such as time and money are limited, only a subset
of the population, called a sample, is selected.
Examples:
1. A sample of bearings from a production run to investigate
whether bearings are conforming to manufacturing
specifications
2. A sample of last year’s engineering graduates to obtain feedback
about the quality of the engineering curricula.
Characteristic:
Categorical: Gender (value = female), type of malfunction
(value = insufficient solder)
Numerical: Age (value = 23), Diameter (value = 0.502 cm)
A variable: Any characteristic whose value may change
from one object to another in the population.
(Denoted by lowercase letters)
A univariate data set: Observations on a single variable.
Example:
1. The type of transmission, automatic (A) or manual (M),
on each of ten automobiles recently purchased at a certain
dealership (categorical data set)
M A A A M A A M A A
We have bivariate data when observations are made on
each of two variables.
Examples
1. The (height, weight) pair for each basketball player on a
team, with the first observation as (72, 168), the second
as (75, 212), and so on.
2. An engineer determines the value of both x =
component lifetime and y = reason for component
failure. The data set is bivariate, with one variable
numerical and the other categorical.
Branches of Statistics
Descriptive statistics: histograms, boxplots, scatter plots, and
similar displays.
Uses: Investigators apply these to summarize and describe
important features of the data.
Other descriptive methods are numerical summary measures: means,
standard deviations, and correlation coefficients.
Stem-and-Leaf Displays
Consider a numerical data set x1, x2, x3, ..., xn in which each xi
consists of at least two digits.
Example 1 cont’d
6.1 12.6 34.7 1.6 18.8 2.2 3.0 2.2 5.6 3.8
2.2 3.1 1.3 1.1 14.1 4.0 21.0 6.1 1.3 20.4
7.5 3.9 10.1 8.1 19.5 5.2 12.0 15.8 10.4 5.2
6.4 10.8 83.1 3.6 6.2 6.3 16.3 12.7 1.3 0.8
8.8 5.1 3.7 26.3 6.0 48.0 8.2 11.7 7.2 3.9
15.3 16.6 8.8 12.0 4.7 14.7 6.4 17.0 2.5 16.2
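The following is a minimal sketch of how a stem-and-leaf display could be built for the Example 1 data. It assumes the tens digit is the stem and the ones digit is the leaf, with the decimal part truncated; the function name `stem_and_leaf` is illustrative and not part of the original slides.

```python
from collections import defaultdict

# Example 1 data, copied from the table above (60 percentages).
data = [
    6.1, 12.6, 34.7, 1.6, 18.8, 2.2, 3.0, 2.2, 5.6, 3.8,
    2.2, 3.1, 1.3, 1.1, 14.1, 4.0, 21.0, 6.1, 1.3, 20.4,
    7.5, 3.9, 10.1, 8.1, 19.5, 5.2, 12.0, 15.8, 10.4, 5.2,
    6.4, 10.8, 83.1, 3.6, 6.2, 6.3, 16.3, 12.7, 1.3, 0.8,
    8.8, 5.1, 3.7, 26.3, 6.0, 48.0, 8.2, 11.7, 7.2, 3.9,
    15.3, 16.6, 8.8, 12.0, 4.7, 14.7, 6.4, 17.0, 2.5, 16.2,
]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (ones digit).

    Assumption: the decimal part is truncated, so 34.7 has stem 3 and leaf 4.
    """
    leaves = defaultdict(list)
    for v in values:
        whole = int(v)  # truncate the decimal part
        leaves[whole // 10].append(whole % 10)
    # Include empty stems so that gaps in the data remain visible.
    stems = range(min(leaves), max(leaves) + 1)
    return {stem: sorted(leaves.get(stem, [])) for stem in stems}


display = stem_and_leaf(data)
for stem, leaf_list in display.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaf_list)}")

# Share of observations below 20%.
share_below_20 = sum(v < 20 for v in data) / len(data)
print(f"Share below 20%: {share_below_20:.0%}")
```

Keeping the empty stems in the output is a deliberate choice: it makes the gap between the bulk of the data and the few large percentages easy to see.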
Deductions:
A substantial majority of the charities spend
less than 20% on fundraising.
Only a few percentages are beyond the
bounds of sensible practice.
Stem-and-leaf display for the percentage of binge drinkers at each of the 140 colleges
A stem-and-leaf display conveys information about the
following aspects of the data:
Dotplots
Example 3
Data on state-by-state appropriations for higher education as a
percentage of state and local tax revenue for the fiscal year
10.8 6.9 8.0 8.8 7.3 3.6 4.1 6.0 4.4 8.3
8.1 8.0 5.9 5.9 7.6 8.9 8.5 8.1 4.2 5.7
4.0 6.7 5.8 9.9 5.6 5.8 9.3 6.2 2.5 4.5
12.8 3.5 10.0 9.1 5.0 8.1 5.3 3.9 4.0 8.0
7.4 7.5 8.4 8.3 2.6 5.1 6.0 7.0 6.5 10.3
Histograms
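The relative frequency of a value is the fraction or
proportion of times the value occurs:
\[ \text{relative frequency of a value} = \frac{\text{number of times the value occurs}}{\text{number of observations in the data set}} \]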
Example:
x = number of courses a college student is taking this term.
Total number of students observed: 200
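As a small illustration of the relative frequency definition, the sketch below tabulates the ten transmission types from the earlier univariate example (the course-load counts for the 200 students are not reproduced in these notes, so that data set is used instead). The variable names are illustrative.

```python
from collections import Counter

# Transmission types from the univariate example (A = automatic, M = manual).
transmissions = "M A A A M A A M A A".split()

counts = Counter(transmissions)
n = len(transmissions)

# Relative frequency = number of times the value occurs / number of observations.
relative_frequencies = {value: count / n for value, count in counts.items()}

for value in sorted(relative_frequencies):
    print(f"{value}: frequency = {counts[value]}, "
          f"relative frequency = {relative_frequencies[value]:.2f}")
```

The relative frequencies of all distinct values should add up to 1, apart from rounding.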
Example 4 cont’d
Determining proportions:
= .6361
Construction of a histogram for continuous data:
Subdivide the measurement axis into a suitable number of class
intervals or classes, so that each observation is contained in
exactly one class.
Example: 50 observations
x = fuel efficiency of an automobile (mpg),
smallest is 27.8, largest is 31.4.
Class boundaries: 27.5, 28.0, 28.5, . . . , and 31.5.
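The class boundaries in this example can be generated and used as sketched below. The convention that an observation falling exactly on a boundary goes into the class to its right is an assumption made here so that every observation lands in exactly one class; `class_index` is a hypothetical helper name.

```python
# Class boundaries from the example: 27.5, 28.0, ..., 31.5 (class width 0.5).
start, width, n_classes = 27.5, 0.5, 8
boundaries = [start + width * i for i in range(n_classes + 1)]


def class_index(x):
    """Return the 0-based index of the class containing x.

    Assumed convention: class i covers boundaries[i] <= x < boundaries[i + 1],
    so a value on a boundary is placed in the class to its right.
    """
    if not boundaries[0] <= x < boundaries[-1]:
        raise ValueError(f"{x} lies outside the class intervals")
    return int((x - start) // width)


print(boundaries)
print(class_index(27.8))  # smallest observation falls in the first class
print(class_index(31.4))  # largest observation falls in the last class
```

Counting how many observations fall in each class, and dividing by 50, would give the relative frequencies that determine the rectangle heights.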
1.3 Measures of Location
The Mean
For a given set of numbers x1, x2, ..., xn, the most useful
measure of the center is the mean, or arithmetic average, of
the set.
Definition: The sample mean \bar{x} of observations x1, x2, ..., xn
is given by
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Example 5 cont’d
Drawback of the mean:
Its value can be greatly affected by the presence of even a single outlier.
The Median
Definition
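The sample median \tilde{x} is obtained by first ordering the n
observations from smallest to largest. Then
\[ \tilde{x} = \begin{cases} \text{the } \left(\frac{n+1}{2}\right)\text{th ordered value} & \text{if } n \text{ is odd} \\ \text{the average of the } \left(\frac{n}{2}\right)\text{th and } \left(\frac{n}{2}+1\right)\text{th ordered values} & \text{if } n \text{ is even} \end{cases} \]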
Example 6 cont’d
62.3 62.8 63.6 65.2 65.7 66.4 67.4 68.4 68.8 70.8 75.7 79.0
Note: Both the sample mean and the sample median describe where the
data are centered, but the sample median is very insensitive
to outliers.
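To make the comparison concrete, here is a short sketch that computes both measures for the twelve Example 6 observations using Python's standard `statistics` module. The modified sample `with_outlier` is hypothetical and not part of the original example; it only illustrates how the two measures react when the largest value is inflated.

```python
import statistics

# Example 6 data (already in increasing order).
sample = [62.3, 62.8, 63.6, 65.2, 65.7, 66.4,
          67.4, 68.4, 68.8, 70.8, 75.7, 79.0]

mean = statistics.mean(sample)
median = statistics.median(sample)  # n is even: average of 6th and 7th values
print(f"mean = {mean:.2f}, median = {median:.2f}")

# Hypothetical modification (NOT in the source data): replace the largest
# value, 79.0, by 179.0 to mimic an outlier.
with_outlier = sample[:-1] + [179.0]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)
print(f"with outlier: mean = {mean_out:.2f}, median = {median_out:.2f}")
```

In this sketch the mean moves by several units while the median does not change, because the median depends only on the middle ordered values.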
Comparing Sample mean and Median:
Other Measures of Location: Quartiles, Percentiles, and Trimmed Means
Measures of Location: Quartiles
Measures of Location: Percentiles
Measures of Location: Trimmed Means
The sample mean is sensitive to outliers, whereas the sample median
largely ignores extreme values.
A trimmed mean is a compromise between \bar{x} and \tilde{x}.
How?
Example: A 10% trimmed mean would be computed by
eliminating the smallest 10% and the largest 10% of the
sample and then averaging what remains.
Example 7
2.0 2.4 2.5 2.6 2.6 2.7 2.7 2.8 3.0 3.1 3.2 3.3 3.3
3.4 3.4 3.6 3.6 3.6 3.6 3.7 4.4 4.6 4.7 4.8 5.3 10.1
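The sketch below applies the three measures of center to the 26 observations of Example 7. The helper `trimmed_mean` is illustrative; it assumes that when the trimming proportion times n is not a whole number, the number of observations dropped from each end is rounded down. With n = 26, a nominal 10% trim therefore drops 2 values from each end, which is closer to a 7.7% trim.

```python
import statistics

# Example 7 data (in increasing order).
sample = [2.0, 2.4, 2.5, 2.6, 2.6, 2.7, 2.7, 2.8, 3.0, 3.1, 3.2, 3.3, 3.3,
          3.4, 3.4, 3.6, 3.6, 3.6, 3.6, 3.7, 4.4, 4.6, 4.7, 4.8, 5.3, 10.1]


def trimmed_mean(values, proportion):
    """Average what remains after trimming `proportion` from each end.

    Assumption: the number dropped from each end is int(proportion * n),
    i.e. rounded down when proportion * n is not a whole number.
    """
    ordered = sorted(values)
    k = int(proportion * len(ordered))
    if 2 * k >= len(ordered):
        raise ValueError("trimming proportion removes every observation")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


mean = statistics.mean(sample)
median = statistics.median(sample)
trimmed = trimmed_mean(sample, 0.10)
print(f"mean = {mean:.2f}, median = {median:.2f}, trimmed mean = {trimmed:.2f}")
```

For this sample the trimmed mean falls between the median and the mean, which is pulled upward by the largest observation, 10.1.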
1.4 Measures of Variability
Measures of Variability for Sample Data
Range: Difference between the largest and smallest sample
values.
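The sample variance, denoted by s^2, is given by
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{S_{xx}}{n - 1} \]
The sample standard deviation, denoted by s, is the positive square root of the variance.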
Example 8
Consider the following sample of n = 11 efficiencies for the 2009 Ford
Focus equipped with an automatic transmission (for this model, EPA
reports an overall rating of 27 mpg, with 24 mpg for city driving and
33 mpg for highway driving). Find s.
Motivation for s^2
Chapter 6 explains the rationale for the divisor n − 1 in s^2.
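A Computing Formula for s^2
\[ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \]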
Example 9
The following data relate to postsurgical range of motion.
Calculate the standard deviation.
154 142 137 133 122 126 135 135 108 120 127 134 122
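One way to carry out the Example 9 calculation is sketched below. It computes the sum of squared deviations both from the definition and from the shortcut form (sum of squares minus the squared sum divided by n), then divides by n − 1. The variable names are illustrative, and the range is included only for comparison with the other measures of variability.

```python
import math
import statistics

# Example 9 data: postsurgical range of motion.
sample = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 134, 122]
n = len(sample)
mean = sum(sample) / n

# Sum of squared deviations, straight from the definition.
s_xx_definition = sum((x - mean) ** 2 for x in sample)

# The same quantity from the computing formula.
s_xx_shortcut = sum(x * x for x in sample) - sum(sample) ** 2 / n

variance = s_xx_definition / (n - 1)   # divisor n - 1, not n
std_dev = math.sqrt(variance)
sample_range = max(sample) - min(sample)

print(f"Sxx = {s_xx_definition:.4f} (definition), {s_xx_shortcut:.4f} (shortcut)")
print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}, range = {sample_range}")

# Cross-check against the standard library's sample variance.
assert math.isclose(variance, statistics.variance(sample))
```

Both routes to Sxx agree up to floating-point rounding. The shortcut avoids computing each deviation but can lose precision when the values are large relative to their spread.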
Boxplots
Construction:
Summary:
smallest xi = 40, lower fourth = 72.5,
upper fourth = 96.5, largest xi = 125
Example 11
Boxplots That Show Outliers
Definition
Any observation farther than 1.5fs from the closest fourth is
an outlier, where the fourth spread fs is the upper fourth minus
the lower fourth. An outlier is extreme if it is more than 3fs from
the nearest fourth, and it is mild otherwise.
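The outlier rule can be sketched in code as follows. The helper names `fourths` and `classify` are hypothetical. The definition of the fourths used here (median of each half of the ordered data, with the median included in both halves when n is odd) is an assumption, since that definition does not appear in these notes. The demonstration reuses the fourths 72.5 and 96.5 from the boxplot summary above; the test values 140 and 170 are made up purely to exercise the mild and extreme branches.

```python
import statistics


def fourths(values):
    """Return (lower fourth, upper fourth).

    Assumed definition: split the ordered data into a lower and an upper
    half, including the median in both halves when n is odd; each fourth
    is the median of its half.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2  # size of each half (median shared when n is odd)
    return statistics.median(ordered[:half]), statistics.median(ordered[n - half:])


def classify(x, lower_fourth, upper_fourth):
    """Label x as 'extreme', 'mild', or 'not an outlier'."""
    f_s = upper_fourth - lower_fourth  # fourth spread
    # Distance beyond the nearer fourth; zero or negative inside the box.
    distance = max(lower_fourth - x, x - upper_fourth)
    if distance > 3 * f_s:
        return "extreme"
    if distance > 1.5 * f_s:
        return "mild"
    return "not an outlier"


# Fourths taken from the boxplot summary above.
lower, upper = 72.5, 96.5
for x in (40, 125, 140, 170):  # 140 and 170 are hypothetical values
    print(x, classify(x, lower, upper))
```

With fs = 96.5 − 72.5 = 24, anything below 36.5 or above 132.5 counts as an outlier, so neither the smallest observation (40) nor the largest (125) in that summary is flagged.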
Example 12
Among the data considered is the following sample of TN
(total nitrogen) loads (kg N/day) from a particular
Chesapeake Bay location, in increasing order.
Relevant summary quantities