
Chapter 1

Uploaded by

李昊哲

1 Overview and Descriptive Statistics

Copyright © Cengage Learning. All rights reserved.


Importance of Statistics to Engineers
Engineers and scientists are constantly exposed to collections of facts, or data.

Statistics provides methods for:

▪ Organizing data
▪ Summarizing data
▪ Drawing conclusions based on information in the data

1.1 Populations, Samples, and Processes

Population: a well-defined collection of objects of interest.

Examples:
1. All gelatin capsules of a particular type produced during a specified period.
2. All individuals who received a B.S. in engineering during the most recent academic year.

Populations, Samples, and Processes
Sample: a subset of the population, selected because constraints on scarce resources (time, money, etc.) usually make it impractical to examine the whole population.
Examples:
1. A sample of bearings from a production run, to investigate whether the bearings conform to manufacturing specifications.
2. A sample of last year’s engineering graduates, to obtain feedback about the quality of the engineering curricula.

Populations, Samples, and Processes

Interest usually centers on certain characteristics of the objects in a population.
Examples:
1. The number of flaws on the surface of each casing.
2. The thickness of each capsule wall.
3. The gender of an engineering graduate.
4. The age at which the individual graduated, and so on.

Populations, Samples, and Processes
A characteristic may be categorical or numerical:
Categorical: gender (value = female), type of malfunction (value = insufficient solder)
Numerical: age (value = 23), diameter (value = .502 cm)

Populations, Samples, and Processes
A variable: any characteristic whose value may change from one object to another in the population. Variables are denoted by lowercase letters.

x = brand of calculator owned by a student
y = number of visits to a particular Web site during a specified period
z = braking distance of an automobile under specified conditions

Populations, Samples, and Processes
A univariate data set: observations on a single variable.
Examples:
1. The type of transmission, automatic (A) or manual (M), on each of ten automobiles recently purchased at a certain dealership (categorical data set):
M A A A M A A M A A

2. A sample of lifetimes (hours) of brand D batteries put to a certain use (numerical data set):
5.6 5.1 6.2 6.0 5.8 6.5 5.8 5.5

Populations, Samples, and Processes
We have bivariate data when observations are made on each of two variables.

Examples:
1. The (height, weight) pair for each basketball player on a team, with the first observation as (72, 168), the second as (75, 212), and so on.
2. An engineer determines the value of both x = component lifetime and y = reason for component failure. The data set is bivariate, with one variable numerical and the other categorical.

Populations, Samples, and Processes

Multivariate data: observations are made on more than one variable (bivariate data is a special case of multivariate data).
Examples:
1. A researcher determines the systolic blood pressure, diastolic blood pressure, and serum cholesterol level for each patient participating in a study. Each observation is a triple of numbers, such as (120, 80, 146).
2. The annual automobile issue of Consumer Reports gives values of such variables as type of vehicle (small, sporty, compact, mid-size, large), city fuel efficiency (mpg), highway fuel efficiency (mpg), drivetrain type (rear wheel, front wheel, four wheel), and so on.

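The univariate, bivariate, and multivariate examples above could be held in code roughly as follows. This is only a sketch: the variable names are illustrative choices, not part of the slides, and only values quoted on the slides are used.

```python
from collections import Counter

# Univariate, categorical: transmission type for ten automobiles.
transmissions = list("MAAAMAAMAA")

# Univariate, numerical: lifetimes (hours) of brand D batteries.
battery_lifetimes = [5.6, 5.1, 6.2, 6.0, 5.8, 6.5, 5.8, 5.5]

# Bivariate: the first two (height, weight) pairs quoted for the basketball team.
players = [(72, 168), (75, 212)]

# Multivariate: one (systolic, diastolic, cholesterol) triple.
patients = [(120, 80, 146)]

# A categorical data set is naturally summarized by counting each category.
transmission_counts = Counter(transmissions)
print(transmission_counts)
```

Each observation is one element of the list; a tuple groups the values recorded on the same object.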
Branches of Statistics

Descriptive statistics: methods for summarizing and describing important features of the data.
Graphical methods: histograms, boxplots, scatter plots, etc.
Numerical summary measures: means, standard deviations, correlation coefficients, etc.

Inferential statistics: methods for drawing conclusions about a population based on evidence from a sample.
Uses: testing a new drug on a small number of people, assessing the likelihood of success of a new product, etc.

Stem-and-Leaf Displays

A stem-and-leaf display can be constructed for a numerical data set x1, x2, x3, …, xn in which each xi consists of at least two digits.

Example 1

Data on fundraising expenses as a percentage of total expenditures for a random sample of 60 charities:

6.1 12.6 34.7 1.6 18.8 2.2 3.0 2.2 5.6 3.8
2.2 3.1 1.3 1.1 14.1 4.0 21.0 6.1 1.3 20.4
7.5 3.9 10.1 8.1 19.5 5.2 12.0 15.8 10.4 5.2
6.4 10.8 83.1 3.6 6.2 6.3 16.3 12.7 1.3 0.8
8.8 5.1 3.7 26.3 6.0 48.0 8.2 11.7 7.2 3.9
15.3 16.6 8.8 12.0 4.7 14.7 6.4 17.0 2.5 16.2
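
A stem-and-leaf display for these 60 percentages can be sketched as below. The choice of stem (tens digit) and leaf (ones digit, with the decimal part dropped) is an assumption made for illustration, and the function name is hypothetical.

```python
from collections import defaultdict

charity_pct = [
    6.1, 12.6, 34.7, 1.6, 18.8, 2.2, 3.0, 2.2, 5.6, 3.8,
    2.2, 3.1, 1.3, 1.1, 14.1, 4.0, 21.0, 6.1, 1.3, 20.4,
    7.5, 3.9, 10.1, 8.1, 19.5, 5.2, 12.0, 15.8, 10.4, 5.2,
    6.4, 10.8, 83.1, 3.6, 6.2, 6.3, 16.3, 12.7, 1.3, 0.8,
    8.8, 5.1, 3.7, 26.3, 6.0, 48.0, 8.2, 11.7, 7.2, 3.9,
    15.3, 16.6, 8.8, 12.0, 4.7, 14.7, 6.4, 17.0, 2.5, 16.2,
]

def stem_and_leaf(values):
    """Return one text line per stem: tens digit | ones digits in order."""
    leaves = defaultdict(list)
    for value in sorted(values):
        whole = int(value)                 # drop the decimal part
        leaves[whole // 10].append(whole % 10)
    largest_stem = max(leaves)
    lines = []
    for stem in range(largest_stem + 1):   # keep empty stems so gaps stay visible
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {digits}")
    return lines

for line in stem_and_leaf(charity_pct):
    print(line)
```

Empty stems are printed on purpose: the gap between the 40s and the 80s is part of what the display is meant to reveal.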

Branches of Statistics
Observations from the charity data:
A substantial majority of the charities spend less than 20% on fundraising.
Only a few percentages might be viewed as beyond the bounds of sensible practice.

Generally, investigators use sample information to draw some type of conclusion (make an inference of some sort) about the population.

Example 2

A study of heavy drinking on campuses across the United States defined a binge episode as five or more drinks in a row for males and four or more for females.

[Figure: stem-and-leaf display for the percentage of binge drinkers at each of the 140 colleges]

Stem-and-Leaf Displays
A stem-and-leaf display conveys information about the following aspects of the data:

• identification of a typical or representative value
• extent of spread about the typical value
• presence of any gaps in the data
• extent of symmetry in the distribution of values
• number and location of peaks
• presence of any outlying values

Dotplots

Example 3
Data on state-by-state appropriations for higher education as a
percentage of state and local tax revenue for the fiscal year

10.8 6.9 8.0 8.8 7.3 3.6 4.1 6.0 4.4 8.3
8.1 8.0 5.9 5.9 7.6 8.9 8.5 8.1 4.2 5.7
4.0 6.7 5.8 9.9 5.6 5.8 9.3 6.2 2.5 4.5
12.8 3.5 10.0 9.1 5.0 8.1 5.3 3.9 4.0 8.0
7.4 7.5 8.4 8.3 2.6 5.1 6.0 7.0 6.5 10.3
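
A dotplot places one dot per observation above its position on a measurement scale, stacking dots for repeated values. The sketch below prints a rough text version turned on its side (one row per distinct value); the function name and the layout are illustrative assumptions.

```python
from collections import Counter

appropriations = [
    10.8, 6.9, 8.0, 8.8, 7.3, 3.6, 4.1, 6.0, 4.4, 8.3,
    8.1, 8.0, 5.9, 5.9, 7.6, 8.9, 8.5, 8.1, 4.2, 5.7,
    4.0, 6.7, 5.8, 9.9, 5.6, 5.8, 9.3, 6.2, 2.5, 4.5,
    12.8, 3.5, 10.0, 9.1, 5.0, 8.1, 5.3, 3.9, 4.0, 8.0,
    7.4, 7.5, 8.4, 8.3, 2.6, 5.1, 6.0, 7.0, 6.5, 10.3,
]

def text_dotplot(values):
    """Return one line per distinct value, with one dot per occurrence."""
    counts = Counter(values)
    return [f"{value:5.1f} | {'•' * counts[value]}" for value in sorted(counts)]

for line in text_dotplot(appropriations):
    print(line)
```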

Histograms
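The relative frequency of a value is the fraction or proportion of times the value occurs:

relative frequency of a value = (number of times the value occurs) / (number of observations in the data set)

Example:
x = number of courses a college student is taking this term.
Total number of students observed: 200.
Suppose 70 of these x values are 3. Then

relative frequency of the x value 3 = 70/200 = .35
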
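The relative frequency calculation can be sketched in a few lines. Only the figures 200 and 70 come from the slide; the split of the remaining 130 students is invented purely so the example runs, and the function name is hypothetical.

```python
from collections import Counter

def relative_frequencies(observations):
    """Map each distinct value to (count of that value) / (number of observations)."""
    n = len(observations)
    counts = Counter(observations)
    return {value: counts[value] / n for value in sorted(counts)}

# 200 students, 70 of whom take 3 courses (from the slide).
# The other 130 values are hypothetical filler.
courses = [3] * 70 + [4] * 60 + [5] * 50 + [2] * 20

rel = relative_frequencies(courses)
print(rel[3])             # relative frequency of the x value 3
print(sum(rel.values()))  # relative frequencies add up to 1 (up to rounding)
```
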
Example 4

A frequency distribution for the number of hits per team per game for all nine-inning games:

[Table: Frequency Distribution for Hits in Nine-Inning Games]

Example 4 cont’d

Determining proportions:

proportion of games with at most two hits = .0010 + .0037 + .0108 = .0155

proportion of games with between 5 and 10 hits (inclusive) = .6361

Histograms
Construction of a histogram for continuous data:
Subdivide the measurement axis into a suitable number of class intervals, or classes, so that each observation is contained in exactly one class.

Example: 50 observations on
x = fuel efficiency of an automobile (mpg),
the smallest of which is 27.8 and the largest 31.4.
Class boundaries: 27.5, 28.0, 28.5, . . . , 31.5.

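The class boundaries in the fuel-efficiency example, and the class an observation falls into, might be computed as follows. The rule that a value equal to a boundary goes into the class on its right is an assumption made here so that every observation lands in exactly one class; the function names are hypothetical.

```python
from bisect import bisect_right

def class_boundaries(start, stop, width):
    """Equally spaced boundaries from start to stop inclusive."""
    count = round((stop - start) / width)
    return [start + i * width for i in range(count + 1)]

def class_index(x, boundaries):
    """Index of the class [boundaries[i], boundaries[i+1]) containing x."""
    if not boundaries[0] <= x < boundaries[-1]:
        raise ValueError(f"{x} lies outside the class intervals")
    return bisect_right(boundaries, x) - 1

boundaries = class_boundaries(27.5, 31.5, 0.5)
print(boundaries)
print(class_index(27.8, boundaries))  # smallest observation: first class
print(class_index(31.4, boundaries))  # largest observation: last class
```

Counting how many observations fall in each class, then dividing by n, gives the relative frequencies that set the heights of the histogram rectangles.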
1.3 Measures of Location

The Mean
For a given set of numbers x1, x2, . . ., xn, the most useful measure of the center is the mean, or arithmetic average, of the set.
Definition: The sample mean $\bar{x}$ of observations x1, x2, . . ., xn is given by

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example 5

Observations on x = crack length (μm) as a result of constant load stress corrosion tests on smooth bar tensile specimens for a fixed length of time:

x1 = 16.1 x2 = 9.6 x3 = 24.9 x4 = 20.4 x5 = 12.7
x6 = 21.2 x7 = 30.2 x8 = 25.8 x9 = 18.5 x10 = 10.3
x11 = 25.3 x12 = 14.0 x13 = 27.1 x14 = 45.0 x15 = 23.3
x16 = 24.2 x17 = 14.6 x18 = 8.9 x19 = 32.4 x20 = 11.8
x21 = 28.5

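The sample mean of the 21 crack lengths can be checked with a short script. The function name is illustrative; the second calculation, which drops the largest observation, is added here only to show how much one value can move the mean.

```python
crack_lengths = [
    16.1, 9.6, 24.9, 20.4, 12.7,
    21.2, 30.2, 25.8, 18.5, 10.3,
    25.3, 14.0, 27.1, 45.0, 23.3,
    24.2, 14.6, 8.9, 32.4, 11.8,
    28.5,
]

def sample_mean(observations):
    """Sum of the observations divided by the number of observations."""
    return sum(observations) / len(observations)

x_bar = sample_mean(crack_lengths)

# Same calculation with the single largest value (45.0) left out.
largest = max(crack_lengths)
x_bar_without_largest = sample_mean([x for x in crack_lengths if x != largest])

print(round(x_bar, 2))
print(round(x_bar_without_largest, 2))
```
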
The Mean
A drawback of the mean: its value can be greatly affected by the presence of even a single outlier (an unusually large or small observation).

The Median
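Definition
The sample median $\tilde{x}$ is obtained by first ordering the n observations from smallest to largest. Then

$$\tilde{x} = \begin{cases} \text{the } \left(\frac{n+1}{2}\right)\text{th ordered value} & \text{if } n \text{ is odd} \\ \text{the average of the } \left(\frac{n}{2}\right)\text{th and } \left(\frac{n}{2}+1\right)\text{th ordered values} & \text{if } n \text{ is even} \end{cases}$$
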
Example 6

A sample of 12 recordings of Beethoven’s Symphony yielded the following durations (min), listed in increasing order:

62.3 62.8 63.6 65.2 65.7 66.4 67.4 68.4 68.8 70.8 75.7 79.0

Since n = 12 is even, the sample median is the average of the 6th and 7th ordered values: (66.4 + 67.4)/2 = 66.9.

If the largest observation (79.0) is not included in the sample, then n = 11 and the median is the [(n + 1)/2]th = 6th ordered value, 66.4.

Example 6 cont’d

Comparing the sample mean and median:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{816.1}{12} = 68.01$$

Note: The mean exceeds the median (66.9) because the sample “stretches out” more on the upper end than on the lower end, which pulls the mean upward.

Note: Both the sample mean and the sample median describe where the data is centered, but the median is very insensitive to outliers, whereas the mean is not.

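The comparison of mean and median for the 12 durations can be reproduced as below. The function names are illustrative, and the value 179.0 is a deliberately exaggerated, hypothetical replacement for the largest observation, used only to show how differently the two measures react to an outlier.

```python
durations = [62.3, 62.8, 63.6, 65.2, 65.7, 66.4,
             67.4, 68.4, 68.8, 70.8, 75.7, 79.0]

def sample_mean(observations):
    return sum(observations) / len(observations)

def sample_median(observations):
    """Middle ordered value (n odd) or average of the two middle values (n even)."""
    ordered = sorted(observations)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(round(sample_mean(durations), 2), round(sample_median(durations), 2))

# Hypothetical: replace the largest duration by a far more extreme value.
with_outlier = durations[:-1] + [179.0]
print(round(sample_mean(with_outlier), 2), round(sample_median(with_outlier), 2))
```

The mean moves by more than 8 minutes while the median does not move at all.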
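The Median
Comparing the population mean and median:

The relationship between the population mean $\mu$ and the population median $\tilde{\mu}$ depends on the shape of the population distribution.

[Figure: three different shapes for a population distribution: (a) negative skew, $\mu < \tilde{\mu}$; (b) symmetric, $\mu = \tilde{\mu}$; (c) positive skew, $\mu > \tilde{\mu}$]
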
Other Measures of Location: Quartiles, Percentiles, and Trimmed Means

Measures of Location: Quartiles

Quartiles divide the data set into four equal parts:

First quartile (lower fourth): separates the smallest 25% of the data from the rest
Second quartile: the median
Third quartile (upper fourth): separates the largest 25% of the data from the rest

Measures of Location: Percentiles

Percentiles divide the data set into 100 equal parts.

A data set (sample or population) can be divided using percentiles; the 99th percentile separates the highest 1% from the bottom 99%, and so on.

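Quartiles and percentiles can be obtained from Python's standard `statistics.quantiles` function, sketched here on the 12 symphony durations from Example 6. One caveat: the library interpolates between ordered values, and its default convention need not agree exactly with the "fourths" used for boxplots later in these slides, so small differences in the first and third quartiles are to be expected.

```python
import statistics

durations = [62.3, 62.8, 63.6, 65.2, 65.7, 66.4,
             67.4, 68.4, 68.8, 70.8, 75.7, 79.0]

# n=4 returns the three cut points that split the data into four parts.
quartiles = statistics.quantiles(durations, n=4)

# n=100 returns 99 cut points; index 89 is the 90th percentile.
percentiles = statistics.quantiles(durations, n=100)
p90 = percentiles[89]

print(quartiles)
print(p90)
```
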
Measures of Location: Trimmed Means
The sample mean is sensitive to even a single outlier, whereas the median ignores extreme values altogether.
A trimmed mean is a compromise between the mean $\bar{x}$ and the median $\tilde{x}$.

How?
Example: A 10% trimmed mean is computed by eliminating the smallest 10% and the largest 10% of the sample and then averaging what remains.

Example 7

Consider the following observations on copper content (%) for a sample of Bidri artifacts in London’s Victoria and Albert Museum, listed in increasing order. Find the 10% trimmed mean.

2.0 2.4 2.5 2.6 2.6 2.7 2.7 2.8 3.0 3.1 3.2 3.3 3.3
3.4 3.4 3.6 3.6 3.6 3.6 3.7 4.4 4.6 4.7 4.8 5.3 10.1
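
With n = 26 observations, 10% of the sample is 2.6 values, which is not a whole number. One common way around this, assumed here because the slide does not show its method, is to compute the trimmed means that delete 2 and 3 observations from each end and interpolate linearly between them. The function name is illustrative.

```python
copper = [2.0, 2.4, 2.5, 2.6, 2.6, 2.7, 2.7, 2.8, 3.0, 3.1, 3.2, 3.3, 3.3,
          3.4, 3.4, 3.6, 3.6, 3.6, 3.6, 3.7, 4.4, 4.6, 4.7, 4.8, 5.3, 10.1]

def trimmed_mean(ordered, k):
    """Mean after deleting the k smallest and k largest ordered observations."""
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

n = len(copper)
tm_low = trimmed_mean(copper, 2)    # trims 2/26, about 7.7%, from each end
tm_high = trimmed_mean(copper, 3)   # trims 3/26, about 11.5%, from each end

# Linear interpolation between the two trimming percentages to reach 10%.
p_low, p_high = 2 / n, 3 / n
weight = (0.10 - p_low) / (p_high - p_low)
tm_10 = tm_low + weight * (tm_high - tm_low)

print(round(sum(copper) / n, 2))    # ordinary mean, pulled up by 10.1
print(round(tm_low, 2), round(tm_high, 2), round(tm_10, 2))
```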

1.4 Measures of Variability

Measures of Variability for Sample Data
Range: the difference between the largest and smallest sample values.

[Figure: samples with identical measures of center but different amounts of variability]

In the figure, the range for sample 1 is larger than it is for sample 3, reflecting more variability in the first sample.

Measures of Variability for Sample Data
Primary measures of variability involve the deviations from the mean, $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}$.

One way to combine the deviations into a single quantity is to average them.

However, this is not useful, because the deviations always sum to zero:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

What then? How can negative and positive deviations from the mean be prevented from counteracting one another?

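The reason the average deviation is useless follows directly from the definition of the sample mean. A short derivation, using only $\bar{x} = \frac{1}{n}\sum x_i$:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = \sum_{i=1}^{n} x_i - n \cdot \frac{1}{n} \sum_{i=1}^{n} x_i
  = 0
```

So the average deviation is 0 for every sample, whatever its spread. Squaring each deviation before combining them is one way to keep positive and negative deviations from cancelling.
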
Measures of Variability for Sample Data
The sample variance, denoted by $s^2$, is given by

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{S_{xx}}{n - 1}$$

The sample standard deviation, denoted by s, is the (positive) square root of the variance:

$$s = \sqrt{s^2}$$

Note: $s^2$ and s are both nonnegative.
Note: The unit for s is the same as the unit for each of the $x_i$.

Example 8
Consider the following sample of n = 11 fuel efficiencies (mpg) for the 2009 Ford Focus equipped with an automatic transmission (for this model, EPA reports an overall rating of 27 mpg: 24 mpg for city driving and 33 mpg for highway driving). Find s.

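Motivation for $s^2$
Chapter 6 explains the rationale for the divisor n – 1 in $s^2$.

A Computing Formula for $s^2$

$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \qquad s^2 = \frac{S_{xx}}{n - 1}$$
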
Example 9
The following data relate to postsurgical range of motion. Calculate the standard deviation.

154 142 137 133 122 126 135 135 108 120 127 134 122
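
The calculation can be sketched as follows, computing the sum of squared deviations both from the definition and from the standard shortcut (sum of squares minus the squared sum divided by n) so the two can be compared. Variable names are illustrative.

```python
import math

rom = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 134, 122]
n = len(rom)
mean = sum(rom) / n

# Definition: sum of squared deviations from the mean.
sxx_definition = sum((x - mean) ** 2 for x in rom)

# Shortcut: sum of x^2 minus (sum of x)^2 / n.
sxx_shortcut = sum(x * x for x in rom) - sum(rom) ** 2 / n

variance = sxx_shortcut / (n - 1)   # divisor n - 1, not n
std_dev = math.sqrt(variance)

print(round(sxx_definition, 3), round(sxx_shortcut, 3))
print(round(variance, 2), round(std_dev, 2))
```

Both routes give the same sum of squares up to rounding; the shortcut avoids computing each deviation separately.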

Boxplots

Boxplot: a pictorial summary used to describe several of a data set’s most prominent features.

The features include:
(1) center,
(2) spread,
(3) the extent and nature of any departure from symmetry, and
(4) identification of “outliers,” observations that lie unusually far from the main body of the data.

Boxplots
Construction:

1. Order the n observations from smallest to largest and separate the smallest half from the largest half. The median is included in both halves if n is odd.
2. The lower fourth is the median of the smallest half and the upper fourth is the median of the largest half.
3. A measure of spread that is resistant to outliers is the fourth spread $f_s$, given by
$f_s$ = upper fourth – lower fourth

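The construction steps can be sketched in code as below, applied for illustration to the 12 symphony durations from Example 6 (the pit-depth data for the boxplot examples are not reproduced in these slides). Function names are hypothetical.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def fourths(values):
    """Return (lower fourth, upper fourth); the median joins both halves if n is odd."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2              # size of each half, counting the median when n is odd
    lower = median(ordered[:half])
    upper = median(ordered[n - half:])
    return lower, upper

durations = [62.3, 62.8, 63.6, 65.2, 65.7, 66.4,
             67.4, 68.4, 68.8, 70.8, 75.7, 79.0]

lower_fourth, upper_fourth = fourths(durations)
fourth_spread = upper_fourth - lower_fourth
print(lower_fourth, upper_fourth, round(fourth_spread, 1))
```
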
Example 10
Ultrasound was used to gather corrosion data on the thickness of the floor plate of an aboveground tank used to store crude oil. Each observation is the largest pit depth in the plate, expressed in milli-in.

Summary:
smallest xi = 40    lower fourth = 72.5
upper fourth = 96.5    largest xi = 125
fourth spread $f_s$ = 96.5 – 72.5 = 24

Example 11

[Figure: a boxplot of the corrosion data]

Observations:
1. The right edge of the box is much closer to the median than is the left edge, indicating a very substantial skew in the middle half of the data.
2. The box width ($f_s$) is reasonably large relative to the range of the data (the distance between the tips of the whiskers).

Example 11 cont’d

[Figure: Minitab description of the pit-depth data]

Boxplots That Show Outliers

Definition
Any observation farther than 1.5$f_s$ from the closest fourth is an outlier. An outlier is extreme if it is more than 3$f_s$ from the nearest fourth, and it is mild otherwise.

Example 12
Consider a sample of TN (total nitrogen) loads (kg N/day) from a particular Chesapeake Bay location, listed in increasing order.

Example 12 cont’d
Relevant summary quantities (the two cutoffs below imply upper fourth = 167.79 and $f_s$ = 122.15):

Lower fourth – 1.5$f_s$ is negative, so there are no outliers on the lower end of the data.

Upper fourth + 1.5$f_s$ = 351.015; upper fourth + 3$f_s$ = 534.24

Conclusion: the four largest observations (563.92, 690.11, 826.54, and 1529.35) are extreme outliers, and 352.09, 371.47, 444.68, and 460.86 are mild outliers.

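The outlier rule can be sketched as a small classifier and checked against the eight observations named above. The fourths used here are not printed on the slide: they are backed out from the two stated cutoffs (351.015 and 534.24), which give $f_s$ = 122.15, upper fourth = 167.79, and hence lower fourth = 45.64. The function name is hypothetical.

```python
def classify(x, lower_fourth, upper_fourth):
    """Label x as 'extreme', 'mild', or 'not an outlier' using the 1.5 fs / 3 fs rule."""
    fs = upper_fourth - lower_fourth
    # Distance beyond the nearest fourth (0 if x lies between the fourths).
    distance = max(lower_fourth - x, x - upper_fourth, 0.0)
    if distance > 3 * fs:
        return "extreme"
    if distance > 1.5 * fs:
        return "mild"
    return "not an outlier"

# Derived from the cutoffs stated on the slide (an inference, not quoted data).
LOWER_FOURTH = 45.64
UPPER_FOURTH = 167.79

flagged = [352.09, 371.47, 444.68, 460.86, 563.92, 690.11, 826.54, 1529.35]
for value in flagged:
    print(value, classify(value, LOWER_FOURTH, UPPER_FOURTH))
```
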
