Lectures 11 12 13 - Engineering Statistics 2017 - Handouts


Uploaded by

Stephen Alao

ECOR1010

Engineering Statistics
Motivation
• To learn methods of statistical analysis:
– What is available.
– How to interpret results.
– What are the pitfalls and caveats.
– The necessity of good experimental design,
technique and data-analysis to achieve high
quality results.
Overview of Statistics
• Statistical Analysis is the science of data
collection and data interpretation.
– Uses formal probabilistic methods for drawing
inferences and making decisions from these
data.
• Summarize data
• Maximize information derived from data
• Test alternate hypotheses or models
• Compute probabilities of future occurrences
• Make rational decisions based on data and information.
– Quantifies our Ignorance (uncertainty)
What are Statistics?
• Statistics are about ways to describe our
previous experience
• Statistics are useful as guides and motivators
• Statistics are generally poorly understood and
often abused
• “Oh, people can come up
with statistics to prove
anything, 14% of people
know that”, Homer Simpson.
Definition of a Statistic
• A statistic is a specified, determinable function of a
set of observations (data set)
- Given n measures of some "random" quantity $x_i$,
you can calculate various statistics such as these examples:
$n$ is the sample size
$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n$ is the sample sum
$\frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean (arithmetic average)
$\sum_{i=1}^{n} x_i^2$ is the sample sum of squares
$R = \max(x) - \min(x)$ is the sample range
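The example statistics listed above can be sketched in a few lines of Python. The data values are made up for illustration and do not come from the lecture.

```python
# Hypothetical measurements x_i (illustrative values, not from the lecture)
x = [2.0, 4.0, 4.0, 5.0, 10.0]

n = len(x)                                  # sample size
sample_sum = sum(x)                         # x_1 + x_2 + ... + x_n
sample_mean = sample_sum / n                # arithmetic average
sum_of_squares = sum(xi ** 2 for xi in x)   # sum of x_i^2
sample_range = max(x) - min(x)              # R = max(x) - min(x)

print(n, sample_sum, sample_mean, sum_of_squares, sample_range)
```

Each quantity is a function of the data only, which is what makes it a statistic.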

Definition of a Statistic
• Statistics are numbers that we use to describe a
large number of measurements
• Depending on what is being measured, some
statistics are better than others, and some statistics
are of no real value.
– Just because you can calculate something does not make
it useful!
• An example of a silly statistic:
– If you have one hand in cold water and the other
hand in hot water, does the mean (average)
temperature tell you anything about how
comfortable you feel?
$T_{avg} = \frac{T_{cold} + T_{hot}}{2}$
Sometimes an arithmetic average
(Mean) is relevant:
• because it gives an indication of where the
‘center’ of the measurement distribution is.
• Median is another statistic used to indicate
the ‘center’
• Mode is a statistic used to indicate the most
probable measurement
Mean
• Mean (or arithmetic average) is the sum of all
the data divided by the number of data points
– Notation depends on whether we are computing
the mean of a population (N) or of a sample (n)

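Recall:
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ (population mean)
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (sample mean)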
Median
• Median is the value of the data point in the
centre of the data set when arranged in
ascending (or descending) order
• E.g., 2.98, 3.18, 3.25, 3.50 and 3.74
– Median is 3.25
• If there is an even number of points, the median
is the mean of the two centre data points
– E.g., 2, 3, 6, 7, 12, and 15
– Median is (6 + 7)/2 = 6.5
Mode
• Mode is the value (or values) in the data set that
occurs with the greatest frequency
• e.g., 2, 2, 5, 7, 9, 9, 9, 10, 10, 11, 12, and 18
– Mode is 9 (is unimodal)
• e.g., 2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9
– Modes are 4 and 7 (called bimodal)
• A data set can have no mode (e.g., uniform)
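The median and mode examples above can be checked with Python's standard `statistics` module. This is only a sketch; `multimode` (Python 3.8 or later) returns a list, so unimodal and bimodal data are handled the same way.

```python
import statistics

# Data sets taken from the examples above
odd_set = [2.98, 3.18, 3.25, 3.50, 3.74]
even_set = [2, 3, 6, 7, 12, 15]
unimodal = [2, 2, 5, 7, 9, 9, 9, 10, 10, 11, 12, 18]
bimodal = [2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9]

median_odd = statistics.median(odd_set)      # the centre data point
median_even = statistics.median(even_set)    # mean of the two centre points
modes_uni = statistics.multimode(unimodal)   # list of most frequent values
modes_bi = statistics.multimode(bimodal)

print(median_odd, median_even, modes_uni, modes_bi)
```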
How to make a Histogram
• Start with a data set of
observations
• Find the big Range
(= max value - min value);
• Make a ‘number’ of bins
with smaller equal-size
ranges that span the big
Range;
• Count how many of your
data fit in the ranges of
your bins
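The binning steps above can be sketched as a small Python function. The function name `histogram_counts`, the height values, and the choice of four bins are assumptions made for illustration. The maximum value is put in the last bin, which is one of several possible conventions.

```python
def histogram_counts(data, num_bins):
    """Count how many values fall in each of num_bins equal-width bins."""
    lo, hi = min(data), max(data)
    if hi == lo:                       # all values identical: a single bin
        return [lo, hi], [len(data)]
    width = (hi - lo) / num_bins       # the big Range split into equal bins
    counts = [0] * num_bins
    for value in data:
        # Bin index; the maximum value is placed in the last bin
        index = min(int((value - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

# Hypothetical heights in feet (illustrative only)
heights = [5.0, 5.2, 5.4, 5.5, 5.5, 5.6, 5.8, 5.9, 6.0, 6.4]
edges, counts = histogram_counts(heights, 4)
print(edges)
print(counts)
```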
Heights of 100 Residents in Anytown, Canada (ft)
Introduction to Engineering Analysis, 3rd Edition, K. D. Hagen, 2008
Classification of Height Data
Histogram for Heights in Anytown
Histograms
• Valuable statistical tool for showing the
frequency distribution of data
– Information about location, spread, and shape
that is portrayed can provide clues about the
underlying process that generated the data
Frequency Distributions (Histograms)

Note that Figure 18.4 in the textbook has left skew defined incorrectly.
The distribution looks ‘smoother’ as the number of
students in the sample gets larger.

[Figure: histogram of Midterm 2017 marks (Count vs. Mark out of 27), N = 1067]
• Mean: 17.5/27 (65%)
• Median: 18/27 (67%)
• Std.Dev: 3.8
• N = 1067

The red line is the “bell curve”, which is also the Normal distribution,
which is also called a Gaussian distribution.
Distributions
• Many types of distributions
• The distribution that best describes your data
will depend on the ‘physics’
• Many engineering measurements have
normal (or near normal) distributions
– Normal distributions are also called Gaussian
distributions, and sometimes Bell curves
• Other distributions
– Bimodal, multimodal, flat, skewed (either right or
left), etc.
Aside on Histograms
• The appearance of histograms is quite dependent on
the number of bins and how the bin boundaries are
computed.

[Figure: three histograms of the same marks (Count vs. Mark out of 30; Mean 20.2, Median 21.0, SD 4.3, N 907), each drawn with different bins]

Do you label each bin at its front end or its back end?
For example, does the bar labelled 30 count marks from 30 up to the next bin, or
marks above 29 up to 30? Or is the label in the middle of the bin?
Abuse of Statistics
• There is a famous expression that you might hear
when people talk about statistics
• It is often attributed to Benjamin Disraeli and was popularized by
Mark Twain in the early 1900s.
• “There are three types of falsehoods, each worse
than the one before – lies, damned lies, and
statistics.”
• This is meant to be funny, but it does reflect a feeling
people have that statistics can be used to prove
anything, hence, people mistrust statistics.
• “Oh, people can come up
with statistics to prove
anything, 14% of people
know that”, Homer Simpson.
Stretching the truth with statistics
• A politician in power says: “My town is wealthy: the average
salary is $71 000.”
• The leader of the opposition replies: “But half of our citizens make
less than $56 000.”
• The third party leader says: “Most of the people make $30 000”
[Figure: histogram of Salaries in Anytown (Count vs. Salary (x1000$) per year). Mean: $71 000, Median: $56 000, Mode: $30 000]
Everyone is correct: the first quotes the mean, the second the median and the
third quotes the mode; but the messages are ‘different’.
Bill Gates moves into town ‘for an hour’
• A politician in power says: “My town is wealthy: the average
salary is $86 000.”
• The leader of the opposition replies: “But half of our citizens make
less than $56 000.”
• The third party leader says: “Most of the people make $50 000”
[Figure: two histograms (Count vs. Salary (x1000$) per year). Salaries in Anytown: Mean: $71 000, Median: $56 000, Mode: $30 000. Salaries in Anytown after Bill moves in: Mean: $86 000, Median: $56 000, Mode: $50 000]

Bill Gates would be called an outlier
• An outlying observation, or outlier, is one that
appears to deviate markedly from other
members of the sample in which it occurs.
• Often people make arguments to ‘ignore’
outliers
• You should only eliminate outliers after very,
very careful consideration:
– Ignoring Bill Gates might hurt his feelings!
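The effect of a single outlier on the mean and the median can be seen in a short Python sketch. The salary values are hypothetical and are not the Anytown data.

```python
import statistics

# Hypothetical salaries in thousands of dollars (illustrative only)
salaries = [30, 30, 30, 45, 56, 60, 75, 90, 120]

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)

# One extreme earner moves into town
with_outlier = salaries + [10_000]
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)
print(mean_after, median_after)
```

The mean is pulled strongly toward the outlier while the median barely moves, which is why the two politicians can quote such different numbers.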
Interlude on Outliers and
Censoring
• Extreme values (outliers) pose a problem
– Are they valid extreme values or gross errors of
measurement?
– Extreme values should never be deleted (censored)
without careful investigation.
• Sometimes the extreme values are the most important data in
the experiment; don’t delete your Nobel Prize mindlessly!
• Including erroneous data can seriously bias the results.
• Censoring of good data always introduces bias and
always makes the data look better than it really is.
– Automatic censoring, as is done by some instrument
software, is not generally an acceptable procedure.
Descriptive Statistics
When you wish to use a few numbers to
describe your data set:
• Mean and Median give an idea of ‘central
tendency’
– where is the middle?
• Mode tells you the most probable ‘bin’ range
for your data
• Standard Deviation and Range give
indications of width
– how wide is the distribution?

Measures of Variation (Width)
• A measure of variation is a number that
indicates the extent to which data are spread
out around the mean
– Standard deviation
– Variance
– Range
• How far is a given data point from the mean?
i.e., what is its deviation from the mean?

Three ways of indicating with a number how far $x_i$ is from the (population) mean?
Standard Deviation (Population)
The Standard Deviation (abbreviated Std.Dev below) gives an
indication of the width of the distribution
[Figure: histogram of Salaries in Anytown (Count vs. Salary (x1000$) per year), Population Descriptive Statistics. Mean: $71 000; Mean + 1 Std.Dev = $127 000; Mean - 1 Std.Dev = $15 000]
This is a ‘population’ because we have included the entire town
population. If we only considered a small number of people in the town
we would calculate statistics for a sample, see later slide.
The Standard Deviation (abbreviated Std.Dev below) gives an
indication of the width of the distribution
[Figure: histogram of Salaries in Anytown (Count vs. Salary (x1000$) per year), Population Descriptive Statistics. Mean: $71 000; Mean + 1 Std.Dev = $94 000; Mean - 1 Std.Dev = $48 000]
If the width of the distribution is smaller then the Std.Dev is smaller
This is a ‘population’ because we have included the entire town
population. If we only considered a small number of people in the town
we would calculate statistics for a sample, see next slide.
Standard Deviation (Sample)

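Statisticians have shown that measuring deviations from the sample mean
$\bar{x}$ (rather than from the true mean $\mu$) underestimates the true
standard deviation, so $n-1$ is used in the denominator rather than $n$:
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$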
Variance
• Variance is simply the square of the standard
deviation (population and sample variance is
shown below, respectively)

$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2$

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$
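The difference between the population and sample formulas is only the divisor, as this Python sketch shows. The data values are hypothetical. The standard-library functions `statistics.pvariance` and `statistics.variance` are used as a cross-check.

```python
import math
import statistics

# Hypothetical measurements (illustrative only)
x = [2.0, 4.0, 4.0, 5.0, 10.0]
n = len(x)
mean = sum(x) / n

ss = sum((xi - mean) ** 2 for xi in x)   # sum of squared deviations
pop_var = ss / n                         # sigma^2: divide by N (population)
samp_var = ss / (n - 1)                  # s^2: divide by n - 1 (sample)
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

# The standard library gives the same results
assert math.isclose(pop_var, statistics.pvariance(x))
assert math.isclose(samp_var, statistics.variance(x))
print(pop_var, samp_var, pop_sd, samp_sd)
```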
Descriptive Data Analysis
1. Graphics Displays:
– Dot plots,
– Box and Whisker Plots
– Scatter plots,
– Frequency plots (histograms)
2. Summary statistics:
– mean,
– median,
– mode,
– standard deviation,
– range (and interquartile range)
Descriptive Graphics of Data
[Figure: two box-and-whisker plots for BELT1, BELT2, BELT3 and BELT4. Left: Median, IQR (25%-75%), Range (Min-Max). Right: Mean, ±SD, ±1.96*SD (95% CL)]
[Figure: Correlation Plot of Oxy. Measurements for BELT1 to BELT4 (ChMPIngotOxygen.sta 7v*47c)]
Descriptive Graphics of Data
[Figure: box-and-whisker plot of MeanOxy showing Median, IQR, Range. Median = 1177.5; 25%-75% = (1147.5, 1217.5); Min-Max = (1042.5, 1285)]
[Figure: box-and-whisker plot of MeanOxy showing Mean, SD, 95% CL. Mean = 1175.2128; ±SD = (1122.5375, 1227.888); ±1.96*SD = (1071.9693, 1278.4563)]
[Figure: Frequency Distr. of Mean Oxy. (histogram of MeanOxy, No. of obs.). K-S d=.10137, p> .20; Lilliefors p> .20; Shapiro-Wilk W=.98245, p=.69545]
[Figure: Scatter Plot of Mean Oxy. vs Seq. Num. (ChMPIngotOxygen.sta 7v*47c). MeanOxy = 1163.4459+0.4903*x]
Summary Statistics (samples)
Measures of Location
• Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Median: middle value when you order the values:
$x_1 \le x_2 \le x_3 \le \dots \le x_n$
• Mode: position or positions of maximum probability
– Not always easy to define in experimental data
• Remember: a statistic is any function of the data.
Measures of Spread
• Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$
• Standard Deviation: $s = \sqrt{\mathrm{Var}} = \sqrt{s^2}$
• Range: R = max(x) – min(x)
• Interquartile Range
• Full Width Half Max (FWHM)
– Half width half max. (HWHM)
The first “Golden Rule” of Data
Analysis
• Study the data
– Use the descriptive methods above to get a ‘feel’ for the
data.
– Complicated data sets deserve several hours, days, or
even weeks of study.
– ‘Outliers’ must be carefully vetted.
– Data are not simply numbers but rather measurements or
counts of real entities.
– Tentative conclusions should be made in the contexts of
their meaning in relation to these entities, the real
background of the data, and how the data were collected.
• Valid conclusions are unlikely to be obtained from
poor data.
Summary
• For the descriptive analysis to be valid
– The data must be independent (exchangeable)
– Arbitrary inclusion of repeat measurements is
not allowed.
– The data cannot be correlated such that
exchangeability is violated.
• If these conditions aren’t met, the standard
interpretation of the results could lead to
very, very, wrong conclusions.
What Statistics can’t do
• Can’t get blood from a stone
– Can’t rescue you from bad data.
– Can’t get good results from poorly designed and
badly executed experiments.
– Can’t generate information where none exists.
• Statistics might help you understand and
quantify just how bad your results are from
your bad data.
• In other words, it is not a magic Mr. Fix-it in a
black box.
How well do we know the mean value
(i.e., arithmetic average) we determine
from our data?
The mean mark on the midterm was 18/30
The mean salary in Anytown, before Bill, was
$71 000.
? How many sig figs should we quote for these
mean values ?
? What is the error in these mean values ?
As an example:
The goal is to determine the average (mean)
weight of students on campus
• You could weigh the entire population of students at Carleton
and then calculate their average weight, but this takes time,
and would cost a lot of money.
• Or, you can choose a sample of students, say 5 students,
weigh them and calculate an average.
• How far would this average be from the real average of the
total population of students?
• If you chose a second group of 5 students, how different
would the second group’s average weight be?
– You would expect that the average weight of each group of 5 students
would be ‘slightly’ different.

Each group of 5 students must be chosen randomly. Why?

The n5 Student Weighing Machine
[Figure: the n5 student weighing (massing) machine; labels: 10 kg, 350 kg]
Standard Error in the mean value
• Start with any ‘arbitrary’
distribution that represents
your population
1. Take, for example, n=5
samples from the starting
distribution
2. Take the average of these n=5
samples
3. Put a ‘blue’ box in the lower
histogram showing the
average value calculated in 2.
4. Goto 1 until the histogram
doesn’t change much.
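The procedure above can be simulated directly. The sketch below assumes a uniform parent population between 40 and 110 (hypothetical student weights in kg) and replaces "until the histogram doesn't change much" with a fixed 10 000 repeats. The function name `sd_of_sample_means` is made up for this example.

```python
import math
import random
import statistics

def sd_of_sample_means(n, repeats=10_000, seed=1):
    """Draw n values, average them, repeat, and return the
    standard deviation of the collected averages."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        sample = [rng.uniform(40.0, 110.0) for _ in range(n)]  # step 1
        means.append(sum(sample) / n)                          # steps 2 and 3
    return statistics.stdev(means)

# Standard deviation of the assumed uniform parent population
sigma = (110.0 - 40.0) / math.sqrt(12)

sd5 = sd_of_sample_means(5)
sd20 = sd_of_sample_means(20)
print(sd5, sigma / math.sqrt(5))     # the two values should be close
print(sd20, sigma / math.sqrt(20))
```

The width of the histogram of averages shrinks as n grows, roughly as the parent standard deviation divided by the square root of n.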

Now we are taking 10 000 samples and averaging each.

What if we choose n=20?
The super-duper n20 Student weighing machine
The n5 Weighing Machine versus the n20 Weighing Machine

Averages of 5 samples taken Averages of 20 samples taken

from the student population. from the student population.

Which machine is better?

Which machine is more precise?
Which machine is more accurate?
What number (statistic) would you use to
quantify the difference in these machines?
Ans: Some measure of the width.
Precision and Accuracy
• High accuracy
(low bias)
• High precision
(low standard
deviation)
Precision and Accuracy
• Low accuracy
(large bias)
• High precision
(low standard
deviation)
Precision and Accuracy
• High accuracy
(low bias)
• Low precision
(high standard
deviation)
The n5 Weighing Machine versus the n20 Weighing Machine

• The standard error is related to the width of the sampled distribution. The
standard error is the standard deviation of the distribution of the means, i.e., the
‘sd’ in the lower plots.
• The lower plots approach normal curves (Gaussian curves) as n grows, regardless of the
parent population (provided its variance is finite): this is the Central Limit Theorem.
• The standard error indicates how well you know the mean.

Standard Error for the mean of x: $s_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
You can do this at home:
• Go to:
http://onlinestatbook.com/rvls/index.html
• Go to the ‘Simulations/Demonstrations’ link
and then to ‘Central Limit Theorem’ and then
to ‘Sampling Distribution Simulation’.

Sampling Distribution Simulation

Standard Error
• The standard deviation of the means is called
the standard error
– Describes the variation of the means about the
estimated mean
– We usually do not know the true standard
deviation, σ, of the parent population, so
– The standard error is estimated as follows
$s_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}$
Standard Error
• Thus, the confidence interval associated with the
estimated population mean is
$\mu = \bar{x} \pm z_c s_{\bar{x}}$
where
$z_c$ = z-value for the chosen confidence level
e.g. $z_{95} = 1.96$ for 95% confidence

At 95% confidence, the true mean, μ, lies

within this ± interval.
So, what is the point of all of this?
• Almost all of our measurements are made with instruments
that give us ‘average values’.
• If you take many measurements and plot a histogram of these
values, then you will get a normal (Gaussian) distribution
• If we only do one N5, or one N20, measurement then we will
have an estimate of the true mean, based on a single sample.
• The single N5, or N20, measurement distribution will also have
a width, which we calculate as the standard deviation: ‘s’.
• From the standard deviation, s, we calculate the standard error
in the estimated mean:
$s_{\bar{x}} = s/\sqrt{n}$
• We can never be 100% confident that the mean of the sample
is the ‘true mean’, but …
• We can give the sample mean value, which is our best estimate,
along with a $\pm z_c s_{\bar{x}}$ range, which is a confidence interval
depending on the choice of $z_c$.
Common Confidence Values
'true value' = measured mean value $\pm\, z_c \sigma_{measured}$
where
$z_c$ = z-value for the chosen confidence level
$z_{90} = 1.645$ for 90% confidence
$z_{95} = 1.96$ for 95% confidence (or 19 times out of 20)
$z_{99} = 2.58$ for 99% confidence

So, up to now, you simply put error bars on your

measurements. Now the size of the error bar depends
on how confident you are about the measurement.
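The common z values can be recovered from the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `z_value` is an assumption made for this example.

```python
from statistics import NormalDist

def z_value(confidence):
    """Two-sided z_c: the area from -z_c to +z_c equals `confidence`."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

z90 = z_value(0.90)
z95 = z_value(0.95)
z99 = z_value(0.99)
print(round(z90, 3), round(z95, 2), round(z99, 2))
```

A larger confidence level gives a larger z value, so the error bar grows as you ask for more confidence.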
Evolution of Reporting
Numbers
What you see on your calculator:
13.923
13.9 to 3 sig figs, because of the least
significant number in your calculation
13.9 ± 0.05, because of an assumed
error based on sig figs
13.9 ± 0.3, (2σ, 95% confidence)
Gaussian (normal) Distributions are
everywhere, so let’s look at them in
some detail
Gaussian Distribution
• Also called Normal
distribution
– Named after Carl
Friedrich Gauss (1777-
1855)
• Common distribution in
many engineering
applications
• Symmetric about a
central value
Continuous Form of the
Gaussian Distribution
Standard Normal Distribution
• To make the calculation easier, we create a
special table that eliminates the need to
perform a new integration every time
• Use a standard form, otherwise you need a
table for every new mean and standard
deviation
– Apply a change of variable: $z = (x-\mu)/\sigma$
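Population vs. Sample

Population: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \;\rightarrow\; f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$

Sample: $f(x) = \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{(x-\bar{x})^2}{2s^2}} \;\rightarrow\; f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$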
Standard Normal Distribution
• In the new variable z, the mean is at z = 0
• We also set the standard deviation to one (1)
• We get the standard normal distribution:

$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$

Called z-statistics
Standard Normal (Gaussian) Distribution
The total area under the
standard normal distribution is
unity, which means equal to 1.

$\mathrm{AREA} = \int_{z_1}^{z_2} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$

z-statistics: use z scale
How do we know that Zc=1.96
corresponds to 95% of the area?
• We integrate the equation for the ‘standard’
Gaussian curve from -1.96 to +1.96 and we get
0.95
• Or, integrate from 0 to 1.96, and multiply by 2,
because the Gaussian curve is symmetric
about the mean.
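The integration described above can be checked numerically. The sketch below uses a simple trapezoid rule with 10 000 steps. Both the method and the step count are choices made for this example, not something prescribed by the lecture.

```python
import math

def standard_normal_pdf(z):
    """f(z) = exp(-z^2 / 2) / sqrt(2 * pi)"""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def trapezoid_area(z1, z2, steps=10_000):
    """Approximate the area under the standard normal curve from z1 to z2."""
    h = (z2 - z1) / steps
    total = 0.5 * (standard_normal_pdf(z1) + standard_normal_pdf(z2))
    for i in range(1, steps):
        total += standard_normal_pdf(z1 + i * h)
    return total * h

full_area = trapezoid_area(-1.96, 1.96)   # integrate from -1.96 to +1.96
half_area = trapezoid_area(0.0, 1.96)     # integrate from 0 to 1.96
print(round(full_area, 4), round(2 * half_area, 4))
```

Both routes give the same area to four decimal places, as the symmetry argument predicts.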
z-Statistics
(half-areas)
0.4750 is the area under the
standard curve from zero,
which is the mean, to z=1.96

$\int_{0}^{1.96} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\, dz = 0.475$

[Figure: standard normal curve, area from 0 to 1.96 on the z-scale]
Source: Introduction to Engineering Analysis, 3rd Ed., Hagen, 2008. Similar
tables are in Chapter 10 of your textbook.
Normal (Gaussian) Bell Curve

Compare these numbers with the $z_c$ values:

$z_{95} = 1.96$ for 95% confidence (or 19 times out of 20)
$z_{99} = 2.58$ for 99% confidence
Normal (Gaussian) Distributions
[Figure: histogram of Section C marks (Count vs. Mark out of 30) with the σ and 2σ ranges marked]
68% of the students had marks in the range: mean ± σ
95% of the students had marks in the range: mean ± 1.96σ
The Usual Values People Know

Most computer programs, like Excel for example, have
functions that give you the areas, so tables are ‘old-fashioned’.
Random Number Generation
• We can use a computer to generate “pseudo-random”
numbers
– Not actually random, but have approximately random
statistical properties
• Random numbers according to a distribution can be
generated:
• Normal (Gaussian)
• Lognormal
• Poisson
• Binomial
• Uniform
• ChiSq , etc.
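Python's standard `random` module is one example of such a generator. The sketch below draws from three of the distributions listed above. The seed, sample size and distribution parameters are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(2017)   # fixed seed: the "random" sequence is reproducible

normal_draws = [rng.gauss(0.0, 1.0) for _ in range(10_000)]              # Normal
uniform_draws = [rng.uniform(0.0, 1.0) for _ in range(10_000)]           # Uniform
lognormal_draws = [rng.lognormvariate(0.0, 0.5) for _ in range(10_000)]  # Lognormal

print(statistics.mean(normal_draws), statistics.stdev(normal_draws))
print(statistics.mean(uniform_draws))

# Re-seeding reproduces exactly the same first number: pseudo-random, not random
assert random.Random(2017).gauss(0.0, 1.0) == normal_draws[0]
```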

Estimating with Small Samples

Introduction to t-Statistics

Small Samples
• Small samples are those with n< 30 elements
• Our z-statistics are no longer accurate
• Must use t-statistics
– The sample variance is weighted
– Result is a different distribution from the normal
distribution, but with a similar shape
– Appropriate for small samples
Student t-Distribution
• The distribution for
small samples is
generally called the
Student t-distribution
– Based on a weighting
described by W. S.
Gosset (1876-1937)
– He was interested in
beer (yes, beer)
– He worked for a beer
company (Guinness)
Student t-Distribution
• Because of variability in the
ingredients of beer, samples
that come from the same
population are generally
small
• Important for Quality
Control
• Gosset’s company did not
allow him to publish, so he
did so under the pseudonym
“Student”
Student t-Distribution
• Gosset showed that small samples
taken from an essentially normal
population have a wider confidence
interval than we would predict with
z-statistics.
• For small samples, we must use
t-statistics.
• The t-distribution is shorter and
fatter than the normal distribution,
but when DOF=∞ (i.e., for large n)
the t-distribution becomes a normal
(Gaussian) distribution
DOF = $\nu = n - 1$ = Degrees of Freedom
[Figure: t-distribution curves for DOF = 1, DOF = 2, DOF = 3 and DOF = ∞]
Student t-Distribution

$\mu_x = \bar{x} \pm t_{\nu,c}\, s_{\bar{x}}$
where
$t_{\nu,c}$ = t-statistic for $\nu$ degrees of freedom
$\nu = n - 1$ = DOF = Degrees of Freedom
t-Statistics

See the EXCEL functions:

TDIST & TINV

There are ‘calculators’ on the web. See, for example:

http://www.tutor-homework.com/statistics_tables/statistics_tables.html
Example
• Consider a sample of eighteen batteries from the
entire population of batteries (each battery
should be about 9 V)
• In order to establish the true mean of the
voltages, you measure the voltage across each
battery with a voltmeter
• Using the collected data, calculate:
– Mean and standard deviation
– 95% and 99% confidence interval estimates
– Population mean with 90% confidence
Meas. #   Voltage (V)
1         6.51
2         8.45
3         11.76
4         8.36
5         9.35
6         9.23
7         7.85
8         8.59
9         9.05
10        8.08
11        10.59
12        9.87
13        8.04
14        8.38
15        10.01
16        12.84
17        10.48
18        7.27

[Figure: 9-Volt Battery Histogram (Count vs. Battery Voltage (V)), with the mean marked]

Sample Mean
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}(x_1 + x_2 + x_3 + \dots + x_n)$
$= \frac{1}{18}(6.51 + 8.45 + 11.76 + \dots + 7.27)$
$= 9.1498$ (Do not round yet!)
Sample Variance and Standard Deviation (same 18 voltage measurements, $\bar{x} = 9.1498$)
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$
$= \frac{1}{18-1}\left[(6.51-9.1498)^2 + (8.45-9.1498)^2 + \dots + (7.27-9.1498)^2\right]$
$= 2.4888$
$s = 1.5776$ (Do not round yet!)
Sample Standard Error (same 18 voltage measurements, $\bar{x} = 9.1498$, $s = 1.5776$)
$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{1.5776}{\sqrt{18}} = 0.3718$ (Do not round yet!)
t-Statistics

$\mu_x = \bar{x} \pm t_{17,95}\, s_{\bar{x}}$
where
$t_{17,95} = 2.110$ = Student's t-value for 95% confidence with 17 degrees of freedom
Confidence Intervals (same 18 voltage measurements, $\bar{x} = 9.1498$, $s = 1.5776$, $s_{\bar{x}} = 0.3718$)
$\mu_x = \bar{x} \pm t_{17,c}\, s_{\bar{x}}$
95% Confidence:
$\mu_x = 9.1498 \pm (2.110 \times 0.3718) = 9.1498 \pm 0.7845 = 9.15 \pm 0.8$
99% Confidence:
$\mu_x = 9.1498 \pm (2.898 \times 0.3718) = 9.1498 \pm 1.076 = 9.1 \pm 1$
We were asked for 90% confidence
estimate of the Population Mean
• What will be the difference in the calculations
between what was just done, and what we are
now being asked to do?
• The previous estimates were also for the
population mean.
• The only difference is we use $\mu_x = \bar{x} \pm t_{17,90}\, s_{\bar{x}}$ with $t_{17,90} = 1.740$
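The battery example can be reworked in Python as a sketch. The voltages are the 18 tabulated values and the t-values for 17 degrees of freedom are copied from the slides rather than computed. Because the table shows the voltages to only two decimals, the results agree with the slide values (9.1498, 1.5776, 0.3718) only to within rounding.

```python
import math
import statistics

# Voltages (V) as tabulated in the example above
voltages = [6.51, 8.45, 11.76, 8.36, 9.35, 9.23, 7.85, 8.59, 9.05,
            8.08, 10.59, 9.87, 8.04, 8.38, 10.01, 12.84, 10.48, 7.27]

# t-values for 17 degrees of freedom, copied from the slides
T17 = {90: 1.740, 95: 2.110, 99: 2.898}

n = len(voltages)
mean = statistics.mean(voltages)
s = statistics.stdev(voltages)     # sample standard deviation (n - 1 divisor)
std_err = s / math.sqrt(n)         # standard error of the mean

for level, t in T17.items():
    half_width = t * std_err
    print(f"{level}%: {mean:.2f} +/- {half_width:.2f} V")
```

Only the t-value changes between the 90%, 95% and 99% estimates. The mean and standard error are computed once.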
Comparison of z-statistics
with t-statistics
• On the next slide, note that as the sample size
nears 30, one can see how the results
obtained using t-statistics get closer to those
from z-statistics
• Using z-statistics overestimates confidence
when the sample size is small:
• the error bars are too small if you use z-statistics for
samples of less than 30.
t-Statistics

$\mu_x = \bar{x} \pm t_{17,95}\, s_{\bar{x}}$
where
$t_{17,95} = 2.110$ = Student's t-value for 95% confidence with 17 degrees of freedom
Compare this with $z_{95} = 1.96$, which is for 95% confidence for large samples
Estimating Proportional
Mean Values

Tossing a Coin
• P(H) = P(T) = 0.5 (we know this is true for a
‘fair coin’)
• Try 100 tosses and find:
– 58/100 are heads in one trial
– 45/100 are heads in another trial
• We call these ratios proportions

Population Mean for Proportions
• Estimate the proportion p by sampling
$p \pm z_c \sqrt{\frac{p(1-p)}{n}}$

Must be satisfied to use this method:
$np \ge 5$
$n(1-p) \ge 5$

• Use z-statistics for n > 30 and t-statistics for smaller n

Proportion Example
• Suppose we take n = 150 coin flips
• 80 of the 150 are ‘heads’
• Then p = 80/150 is the sample proportion of heads
• p = 80/150 = 53%
What error do we put on p?

In other words: given that we are sampling, how

well do we think we know the ‘true’ proportion of
heads?
Example
• For our coin-flipping example, find the 90%
confidence interval for the proportion p
Check:
$np = 150 \times 80/150 = 80 \ge 5$
$n(1-p) = 150 \times (1 - 80/150) = 70 \ge 5$
$z_{90} = 1.645$ (see next slide)

$p \pm z_c\sqrt{\frac{p(1-p)}{n}} = 80/150 \pm 1.645\sqrt{(80/150) \times (70/150)/150} = 0.53 \pm 0.07$
From this measurement, we cannot say the coin is unfair.
1.645 corresponds to a half
area of 0.4500. This means
that the total area under the
curve from –z to +z is twice
this number, which is 0.90.
So, z90=1.645
Polling (e.g., in Politics)
• A poll is conducted with a sample size of
n = 200 to test the support for candidate A
• The sample suggests that A will receive 32% of
the votes
• Estimate the population proportion, with 95%
confidence
$p \pm z_c\sqrt{\frac{p(1-p)}{n}} = 0.32 \pm 1.960\sqrt{0.32 \times 0.68 / 200} = 0.32 \pm 0.06$
Confident to within six percentage points, 19 times out of 20

Check:
$np = 200 \times 0.32 = 64 \ge 5$
$n(1-p) = 200 \times 0.68 = 136 \ge 5$
Manufacturing
• In a manufacturing operation, 200 parts were
examined and 8/200 were found to be defective
• Estimate the population proportion, with 95%
confidence
$p \pm z_c\sqrt{\frac{p(1-p)}{n}} = 0.040 \pm 1.960\sqrt{0.04 \times 0.96 / 200} = 0.04 \pm 0.027 \approx 0.04 \pm 0.03$

Check:
$np = 200 \times 0.04 = 8 \ge 5$
$n(1-p) = 200 \times 0.96 = 192 \ge 5$
• With 95% confidence, the true defect rate is between about 1 and
7 parts per 100 parts made
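The three proportion examples (coin flips, polling, manufacturing) follow the same recipe, sketched here in Python. The function name `proportion_interval` is made up for this example, and the poll's 32% of 200 is entered as 64 supporters.

```python
import math

def proportion_interval(successes, n, z_c):
    """Return (p, half_width) for p +/- z_c * sqrt(p(1-p)/n)."""
    p = successes / n
    # Rule of thumb from the slides: np and n(1-p) must both be at least 5
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("np and n(1-p) must both be at least 5")
    return p, z_c * math.sqrt(p * (1 - p) / n)

coin = proportion_interval(80, 150, 1.645)    # 90% confidence
poll = proportion_interval(64, 200, 1.960)    # 95% confidence
parts = proportion_interval(8, 200, 1.960)    # 95% confidence

for name, (p, hw) in [("coin", coin), ("poll", poll), ("parts", parts)]:
    print(f"{name}: {p:.3f} +/- {hw:.3f}")
```

The half-width depends on both p and n, so the same sample size of 200 gives a tighter interval for the defect rate (p near 0.04) than for the poll (p near 0.32).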

No ratings yet
Lectures 16 17 18 Linear Systems and MATLAB 2016 Handouts
111 pages
Assignment
75% (4)
Assignment
13 pages
Lecture 8 - Introduction To Design - 2017
No ratings yet
Lecture 8 - Introduction To Design - 2017
52 pages
Absolutelycomplete
No ratings yet
Absolutelycomplete
27 pages
tmp1413 TMP
No ratings yet
tmp1413 TMP
25 pages
Capability Ratios Vary
No ratings yet
Capability Ratios Vary
10 pages
Lecture 2 - Intro To Engineering 2016 PDF
No ratings yet
Lecture 2 - Intro To Engineering 2016 PDF
43 pages
PH.D Thesis M. Usha
100% (1)
PH.D Thesis M. Usha
185 pages
Optimization of The Process of Drying of Corn Seeds With The Use of Microwaves
No ratings yet
Optimization of The Process of Drying of Corn Seeds With The Use of Microwaves
10 pages
Indigenous HIV:AIDS Annotated Bibliography 2-2024 2
No ratings yet
Indigenous HIV:AIDS Annotated Bibliography 2-2024 2
213 pages
Lecture 6 - Engineering Graphics 2017 - 2
No ratings yet
Lecture 6 - Engineering Graphics 2017 - 2
45 pages
Development of Checklist For Evaluating Sustainability Characteristics of Manufacturing Processes
No ratings yet
Development of Checklist For Evaluating Sustainability Characteristics of Manufacturing Processes
21 pages
Understanding Bland-Altman Analyses
No ratings yet
Understanding Bland-Altman Analyses
11 pages
Safety in European Gas Transmission Pipelines Egig: European Gas Pipeline Incident Data Group
No ratings yet
Safety in European Gas Transmission Pipelines Egig: European Gas Pipeline Incident Data Group
21 pages
Vocal Function Exercises For Presbylaryngis: A Multidimensional Assessment of Treatment Outcomes
No ratings yet
Vocal Function Exercises For Presbylaryngis: A Multidimensional Assessment of Treatment Outcomes
10 pages
Lecture 7 - Engineering Graphics 2017 - 3
No ratings yet
Lecture 7 - Engineering Graphics 2017 - 3
52 pages
Lecture 19 Maple 2017
No ratings yet
Lecture 19 Maple 2017
101 pages
Discussion and Criticism of Cartesian Epistemology
No ratings yet
Discussion and Criticism of Cartesian Epistemology
65 pages
Bios Tat
No ratings yet
Bios Tat
20 pages
Lecture Week 5 - Confidence Intervals Hypothesis Testing and Pvalues
No ratings yet
Lecture Week 5 - Confidence Intervals Hypothesis Testing and Pvalues
49 pages
Statistics 2593 Review
No ratings yet
Statistics 2593 Review
6 pages
CME 106 - Statistics Cheatsheet
No ratings yet
CME 106 - Statistics Cheatsheet
13 pages
HO4 Estimation
No ratings yet
HO4 Estimation
9 pages
Calculator Instructions - Casio PDF
No ratings yet
Calculator Instructions - Casio PDF
14 pages
Robust Bayesian Allocation
No ratings yet
Robust Bayesian Allocation
18 pages
Mantel Haenszel Test
No ratings yet
Mantel Haenszel Test
11 pages
An Application of RP-SP Data For Joint Estimation of Mode Choice Models
No ratings yet
An Application of RP-SP Data For Joint Estimation of Mode Choice Models
22 pages
Hazard Ratio.
No ratings yet
Hazard Ratio.
13 pages
7 Assessing The Motor Component of The Gcs Scoring System
No ratings yet
7 Assessing The Motor Component of The Gcs Scoring System
11 pages
Spss 1. Uji Normalitas Data: One-Sample Kolmogorov-Smirnov Test
No ratings yet
Spss 1. Uji Normalitas Data: One-Sample Kolmogorov-Smirnov Test
4 pages
Ashrafian, Hutan Sunzi Surgical Philosophy Concepts of Modern Surgery Paralleled To Sun Tzus Art of War
No ratings yet
Ashrafian, Hutan Sunzi Surgical Philosophy Concepts of Modern Surgery Paralleled To Sun Tzus Art of War
4 pages
Illuminating Data: A hands on guide to data visualization in R
From Everand
Illuminating Data: A hands on guide to data visualization in R
Eman Ahmad
No ratings yet
This is The Statistics Handbook your Professor Doesn't Want you to See. So Easy, it's Practically Cheating...
From Everand
This is The Statistics Handbook your Professor Doesn't Want you to See. So Easy, it's Practically Cheating...
S. Deviant
4.5/5 (6)


Σ xᵢ² is the sample sum of squares

R = max(x) - min(x), the sample range
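As a minimal sketch of these two statistics, the snippet below computes them for a small data set. It takes the sum of squares to be the sum of the squared observations (an assumption, since the formula is not fully legible in the handout), and the `marks` values are hypothetical.

```python
def sum_of_squares(x):
    """Sum of the squared observations (assumed definition)."""
    return sum(v * v for v in x)


def sample_range(x):
    """R = max(x) - min(x), the sample range."""
    return max(x) - min(x)


# Hypothetical marks out of 30, for illustration only.
marks = [12, 18, 25, 30, 21]

print("sum of squares:", sum_of_squares(marks))
print("range:", sample_range(marks))
```

Both functions depend only on the observations themselves, which is what makes them statistics in the sense defined earlier.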

See also: onlinestatbook.com/rvls/index.html

• The appearance of histograms is quite dependent on

[Figure: two panels, each with x-axis “Mark out of 30”]

… says: “My town is wealthy: the average salary is $86 000.”

• The leader of the opposition replies: “But half of our citizens make less than $56 000.”

… people make $50 000”

… town ‘for an hour’

[Figures: “Salaries in Anytown” and “Salaries in Anytown after Bill moves in”; x-axis: Salary (x1000$) per year; y-axis: Count]
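The Anytown exchange turns on the difference between the average and the middle value. A minimal sketch, using hypothetical salaries (not the handout's data) and one very large hypothetical newcomer, shows how a single extreme value moves the mean far more than the median:

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars; not the handout's data.
salaries = [40, 45, 50, 50, 50, 55, 60, 65, 70]

# One extremely wealthy newcomer (hypothetical value).
with_newcomer = salaries + [10_000]

print("before: mean =", round(mean(salaries), 1), "median =", median(salaries))
print("after:  mean =", round(mean(with_newcomer), 1), "median =", median(with_newcomer))
```

The median barely changes when the newcomer is added, while the mean no longer describes what a typical citizen earns.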

– how wide is the distribution?

Three ways of indicating with a number how far xi

[Figure: x-axis: Salary (x1000$) per year]

If the width of the

Statisticians have discovered

[Figure: Correlation Plot of Oxy. Measurements (ChMPIngotOxygen.sta 7v*47c), with histogram of MeanOxy and scatterplot; summaries shown as Median, IQR, Range and as Mean, SD, 95%CL]

Each group of 5 students must be chosen randomly. Why?

Now we are taking 10 000 samples and averaging each.

[Figure panels: “Averages of 5 samples taken” and “Averages of 20 samples taken”]
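The comparison between averaging 5 and averaging 20 values can be sketched as a small simulation. This is an illustration under stated assumptions, not the handout's own simulation: the population is a hypothetical set of uniformly distributed marks out of 30, sampling is random with replacement, and the helper name `sample_means` is invented here.

```python
import random
from statistics import mean, stdev


def sample_means(population, n, trials, rng):
    """Draw `trials` random samples of size n and return the mean of each."""
    return [mean(rng.choices(population, k=n)) for _ in range(trials)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 1000 marks spread uniformly between 0 and 30.
population = [rng.uniform(0, 30) for _ in range(1000)]

means_5 = sample_means(population, 5, 10_000, rng)
means_20 = sample_means(population, 20, 10_000, rng)

spread_5 = stdev(means_5)
spread_20 = stdev(means_20)

print("spread of averages, n = 5: ", round(spread_5, 2))
print("spread of averages, n = 20:", round(spread_20, 2))
print("ratio:", round(spread_5 / spread_20, 2))
```

The spread of the averages should shrink roughly as 1/sqrt(n), so going from 5 to 20 values per average is expected to halve it approximately.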

Which machine is better?

Sampling Distribution Simulation

At 95% confidence, the true mean, μ, lies

So, up to now, you simply put error bars on your

Source: Introduction to Engineering Practice, 3rd Ed., Hagen, 2008. Similar

Compare these numbers with the zc values:

Mark out of 30 range:

Most computer programs, like Excel for example, have

See the EXCEL functions:

[Figures: x-axis: Battery Voltage (V)]

• Use z-statistics for n > 30 and t-statistics for smaller n
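The rule above can be sketched as a 95% confidence interval for the mean that switches between the two critical values. This is a minimal illustration, not the handout's procedure: the function names are invented here, the t values are standard two-sided 95% table entries for only a few degrees of freedom, and the battery voltages are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Two-sided 95% t critical values, keyed by degrees of freedom (n - 1).
# Only a few standard table entries are included in this sketch.
T_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}


def critical_value_95(n):
    """Use z for n > 30, otherwise look up t for n - 1 degrees of freedom."""
    if n > 30:
        return NormalDist().inv_cdf(0.975)  # about 1.96
    df = n - 1
    if df not in T_95:
        raise ValueError(f"no t value stored for {df} degrees of freedom")
    return T_95[df]


def confidence_interval_95(sample):
    """Return (low, high) for mean +/- critical value * s / sqrt(n)."""
    n = len(sample)
    m = mean(sample)
    half_width = critical_value_95(n) * stdev(sample) / sqrt(n)
    return m - half_width, m + half_width


# Hypothetical battery voltages in volts.
voltages = [1.48, 1.52, 1.50, 1.47, 1.53]
low, high = confidence_interval_95(voltages)
print(f"95% interval for the mean: {low:.3f} V to {high:.3f} V")
```

With only five readings the t value (2.776) is noticeably larger than the z value (about 1.96), so the interval is wider than a z-based one would be.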

In other words: given that we are sampling, how
