Statistics From PLTW
Statistics
The collection, evaluation, and interpretation of data
Statistical analysis of measurements can help verify the quality of a set of measurements.
Summary Statistics
Central Tendency
Center of a distribution
Mean, median, mode
Standard Deviation
Variation
Measure of data variation
The standard deviation is a measure of the spread of data values.
A larger standard deviation indicates a wider spread in data values.
Standard Deviation
Variation
$\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$
$\sigma$ = standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\mu$ = mean
$N$ = size of population
Standard Deviation
Variation
$\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$
Procedure
1. Calculate the mean, $\mu$
2. Subtract the mean from each value and then square each difference
3. Sum all squared differences
4. Divide the summation by the size of the population (number of data values), $N$
5. Calculate the square root of the result
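The five steps above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `population_std` and the sample data are assumptions chosen so the result comes out to a whole number.

```python
import math
import statistics


def population_std(values):
    """Population standard deviation, following the five-step procedure."""
    n = len(values)
    mean = sum(values) / n                               # step 1: mean
    squared_diffs = [(x - mean) ** 2 for x in values]    # step 2: squared differences
    total = sum(squared_diffs)                           # step 3: sum them
    variance = total / n                                 # step 4: divide by N
    return math.sqrt(variance)                           # step 5: square root


# Illustrative data (an assumption, not taken from the slides).
data = [2, 4, 4, 4, 5, 5, 7, 9]
sigma = population_std(data)
print(sigma)
```

The standard library's `statistics.pstdev` performs the same population calculation and can be used to cross-check the hand-written version.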
Standard Deviation
Calculate the standard deviation for the data array
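$\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$
With the mean $\mu = 47.63$, subtract the mean from each value and square each difference, for example:
$(59 - 47.63)^2$, $(60 - 47.63)^2$, $(62 - 47.63)^2$, $(63 - 47.63)^2$, $(63 - 47.63)^2$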
Standard Deviation
Variation
3. Sum all squared differences
$\sum (x_i - \mu)^2$ = 2082.6777 + 1817.8595 + 0.1322 + 1.8595 + 54.2231 + 107.4050 + 129.1322 + 152.8595 + 206.3140 + 236.0413 + 236.0413 = 5,024.5455
4. Divide the summation by the number of data values
$\dfrac{\sum (x_i - \mu)^2}{N} = \dfrac{5024.5455}{11} = 456.7769$
5. Calculate the square root of the result
$\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}} = \sqrt{456.7769} = 21.4$
Histogram
Distribution
A histogram is a common data distribution chart that is used to show the frequency with which specific values, or values within ranges, occur in a set of data. A scientist might use a histogram to show the variation of a measurement that exists when an experiment is repeated.
[Figure: histogram of Frequency (0 to 5) versus Length (in.), with the horizontal axis running from 0.745 to 0.760]
Histogram
Distribution
Large sets of data are often divided into a limited number of groups. These groups are called class intervals.
Class Intervals: -16 to -6, -5 to 5, 6 to 16
Histogram
Distribution
The number of data elements in each class interval is shown by the frequency, which is indicated along the Y-axis of the graph.
[Figure: histogram with Frequency on the Y-axis and the class intervals -16 to -6, -5 to 5, and 6 to 16 on the X-axis]
Histogram
Example
Distribution
[Figure: histogram with class intervals 1 to 5, 6 to 10, and 11 to 15; the bars are bounded at 0.5, 5.5, 10.5, and 15.5]
Histogram
Distribution
The height of each bar in the chart indicates the number of data elements, or frequency of occurrence, within each range.
Data: 1, 4, 5, 7, 8, 8, 10, 12, 15
[Figure: histogram of Frequency for the class intervals 1 to 5, 6 to 10, and 11 to 15]
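As a rough sketch of how the bar heights are obtained, the data values above can be counted into the three class intervals. The helper name `frequency_table` is an assumption made for this illustration, and the text bars only approximate the chart.

```python
data = [1, 4, 5, 7, 8, 8, 10, 12, 15]
class_intervals = [(1, 5), (6, 10), (11, 15)]


def frequency_table(values, intervals):
    """Count how many values fall inside each inclusive class interval."""
    counts = {}
    for low, high in intervals:
        counts[(low, high)] = sum(1 for v in values if low <= v <= high)
    return counts


table = frequency_table(data, class_intervals)

# Print a simple text histogram: one '#' per data element in the interval.
for (low, high), count in table.items():
    print(f"{low:>2} to {high:<2} | {'#' * count} ({count})")
```

Every data value lands in exactly one interval, so the frequencies add up to the number of data values.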
Histogram
Distribution
[Figure: histogram of Frequency (0 to 5) versus Length (in.); each bar covers one class interval, for example $0.7495 < x \le 0.7505$]
Research and Statistics
Often we do not have information on the entire population of interest.
Population versus sample
Population = all members of a group
Sample = part of a population
Inferential statistics involves estimating, forecasting, or predicting the odds of an outcome based on an incomplete set of data; it uses sample statistics.
$\sigma$ = population standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\mu$ = population mean
$N$ = size of population
$s$ = sample standard deviation
$x_i$ = individual data value ($x_1, x_2, x_3, \ldots$)
$\bar{x}$ = sample mean
$n$ = size of sample
Variation
$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$
Procedure:
1. Calculate the sample mean, $\bar{x}$.
2. Subtract the mean from each value and then square each difference.
3. Sum all squared differences.
4. Divide the summation by the number of data values minus one, $n - 1$.
5. Calculate the square root of the result.
Sample Mean
Central Tendency
$\bar{x} = \dfrac{\sum x_i}{n}$
$\bar{x}$ = sample mean
xi = individual data value
Estimate the standard deviation for a population for which the following data is a sample.
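$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$
With the sample mean $\bar{x} = 47.63$, subtract the mean from each value and square each difference, for example:
$(59 - 47.63)^2$, $(60 - 47.63)^2$, $(62 - 47.63)^2$, $(63 - 47.63)^2$, $(63 - 47.63)^2$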
Variation
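3. Sum all squared differences.
$\sum (x_i - \bar{x})^2$ = 5,024.5455
4. Divide the summation by the number of sample data values minus one.
$\dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{5024.5455}{10} = 502.4545$
5. Calculate the square root of the result.
$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{502.4545} = 22.4$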
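The only difference between the population and sample calculations is the divisor in step 4. A short sketch, starting from the sum of squared differences stated above (5,024.5455 for 11 data values), shows both results side by side; the variable names are assumptions made for this illustration.

```python
import math

sum_sq = 5024.5455   # sum of squared differences from step 3
n = 11               # number of data values

# Treat the data as the whole population: divide by N.
sigma = math.sqrt(sum_sq / n)

# Treat the data as a sample of a larger population: divide by n - 1.
s = math.sqrt(sum_sq / (n - 1))

print(f"population standard deviation: {sigma:.1f}")
print(f"sample standard deviation:     {s:.1f}")
```

Dividing by the smaller number, n - 1, always gives a slightly larger estimate than dividing by N.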
As $n \to N$, $s \to \sigma$
So for very large numbers of measurements, $s \approx \sigma$
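One way to see why the two results agree for large data sets: when both formulas are applied to the same n values, they differ only by the factor $\sqrt{n / (n - 1)}$. The sketch below (the function name is an assumption for this illustration) prints that factor for a few sizes.

```python
import math


def sample_to_population_ratio(n):
    """Ratio s / sigma when both formulas are applied to the same n values."""
    return math.sqrt(n / (n - 1))


# The ratio shrinks toward 1 as the number of measurements grows.
for n in (2, 11, 100, 10_000):
    print(n, round(sample_to_population_ratio(n), 4))
```

For a handful of measurements the choice of divisor matters noticeably; for thousands of measurements it is negligible.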
Probability Distribution
Distribution
A distribution of all possible values of a variable with an indication of the likelihood that each will occur
A probability distribution can be represented by a probability density function.
Normal Distribution: the most commonly used probability distribution
http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg
Normal Distribution
Is the data distribution normal?
Distribution
[Figure: histogram of Frequency versus Data Elements]
Normal Distribution
Does the greatest frequency of the data values occur at about the mean value?
Distribution
[Figure: the histogram of Frequency versus Data Elements with the mean value marked]
Normal Distribution
Does the curve decrease on both sides away from the mean?
Normal Distribution
Is the curve symmetric about the mean?
Empirical Rule
If the data are normally distributed:
68% of the observations fall within 1 standard deviation of the mean.
95% of the observations fall within 2 standard deviations of the mean.
99.7% of the observations fall within 3 standard deviations of the mean.
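The three percentages can be checked against an ideal normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `fraction_within` is an assumption for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

The empirical rule quotes rounded versions of these fractions, and real data that is only approximately normal will match them only approximately.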
Mean = $\bar{x}$ = 0.08
Standard Deviation = s = 1.77 (sample)
Normal Distribution
$\bar{x} \pm s$: 0.08 - 1.77 = -1.69 and 0.08 + 1.77 = 1.85
[Figure: histogram of Data Elements with the 68% band from -1.69 to 1.85 marked]
Normal Distribution
$\bar{x} \pm 2s$: 0.08 - 3.54 = -3.46 and 0.08 + 3.54 = 3.62
[Figure: histogram of Data Elements with the 95% band from -3.46 to 3.62 marked]
Your Turn
Revisit the data you collected during the Fling Machine Instant Challenge.
Assume that you repeatedly launch cotton balls with your device. Using the mean and sample standard deviation of your data:
Predict the range of travel distances within which 68% of cotton balls would fall.
Predict the range of travel distances within which 95% of cotton balls would fall.
Example
Assume that a statistical analysis resulted in the following:
Mean = $\bar{x}$ = 2.35 ft
Sample standard deviation = s = 0.76 ft
Predict the range of travel distances within which 68% of cotton balls would fall
$\bar{x} \pm s$:
2.35 - 0.76 = 1.59 ft
2.35 + 0.76 = 3.11 ft
Prediction: Approximately 68% of the launches will result in a travel distance between 1.59 ft and 3.11 ft.
Example
Assume that a statistical analysis resulted in the following:
Mean = $\bar{x}$ = 2.35 ft
Sample standard deviation = s = 0.76 ft
Predict the range of travel distances within which 95% of cotton balls would fall
$\bar{x} \pm 2s$:
2.35 - 2(0.76) = 0.83 ft
2.35 + 2(0.76) = 3.87 ft
Prediction: Approximately 95% of the launches will result in a travel distance between 0.83 ft and 3.87 ft.
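The same predictions can be computed for any mean and sample standard deviation. The sketch below uses the example values (2.35 ft and 0.76 ft); the function name `empirical_range` is an assumption for this illustration, and the ranges only hold to the extent that the launch distances are normally distributed.

```python
def empirical_range(mean, s, k):
    """Return (low, high) for the interval mean - k*s to mean + k*s."""
    return (mean - k * s, mean + k * s)


mean_ft = 2.35   # sample mean of travel distance, in feet
s_ft = 0.76      # sample standard deviation, in feet

low68, high68 = empirical_range(mean_ft, s_ft, 1)   # about 68% of launches
low95, high95 = empirical_range(mean_ft, s_ft, 2)   # about 95% of launches

print(f"68%: {low68:.2f} ft to {high68:.2f} ft")
print(f"95%: {low95:.2f} ft to {high95:.2f} ft")
```

Substituting the mean and standard deviation from your own Fling Machine data gives the ranges asked for in the Your Turn exercise.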
Uncertainty in Measurements
Scientists and engineers often use significant digits to indicate the uncertainty of a measurement
A measurement is recorded such that all certain digits are reported and one uncertain (estimated) digit is reported
Uncertainty in Measurements
Another (more definitive) method to indicate uncertainty is to use plus/minus notation.
THIS IS THE FORMAT YOU WILL USE IN COLLEGE
IF YOU WANT TO ADOPT IT SOONER BE MY GUEST
Example: $3.84 \pm 0.05$ cm
$3.79 \le$ true value $\le 3.89$
This means that we are certain the true measurement lies between 3.79 cm and 3.89 cm.
Uncertainty in Measurement
In some cases the uncertainty from a digital or analog instrument is greater than indicated by the scale or reading display
Resolution of the instrument is better than the accuracy
Example: Speedometers
How can we determine, with confidence, how close a measurement is to the true value?
Uncertainty in Measurement
Uncertainty of a single measurement
How close is this measurement to the true value?
Uncertainty is dependent on the instrument and scale.
Precision is dependent on the capabilities of the measuring device and its use.
Reproducibility
Poor precision is associated with random error.
Your Turn
Two students each measure the length of a credit card four times. Student A measures with a plastic ruler, and student B measures with a precision measuring instrument called a micrometer.
Student A: 85.1 mm, 85.0 mm, 85.2 mm, 85.1 mm
Student B: 85.301 mm, 85.298 mm, 85.299 mm, 85.301 mm
Your Turn
Plot Student A's data on a number line.
Plot Student B's data on a number line.
Student A: 85.1 mm, 85.0 mm, 85.2 mm, 85.1 mm
Student B: 85.301 mm, 85.298 mm, 85.299 mm, 85.301 mm
Your Turn
Student A's data ranges from 85.0 mm to 85.2 mm.
Student B's data ranges from 85.298 mm to 85.301 mm.
The accepted length of the credit card is 85.105 mm.
Your Turn
Which student's data is more accurate?
Student A
Quantifying Accuracy
The accuracy of a measurement is related to the error between the measurement value and the accepted value.
Error = measured value - accepted value
Student A: 85.1 mm, 85.0 mm, 85.2 mm, 85.1 mm
Student B: 85.301 mm, 85.298 mm, 85.299 mm, 85.301 mm
Quantifying Accuracy
Calculate the error of Student A's measurements
Error A = mean of measured values - accepted value
$\bar{x}_A$ = 85.10 mm
Error A = 85.10 mm - 85.105 mm = -0.005 mm
Quantifying Accuracy
Calculate the error of Student B's measurements
Error B = mean of measured values - accepted value
$\bar{x}_B$ = 85.2998 mm
Error B = 85.2998 mm - 85.105 mm = 0.1948 mm
Quantifying Accuracy
Compare the magnitudes of the errors
|Error A| = |-0.005 mm| = 0.005 mm
|Error B| = |0.1948 mm| = 0.1948 mm
Student A's error is smaller in magnitude, so Student A's data is more accurate.
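The accuracy comparison can be sketched in a few lines, using the two students' measurements and the accepted length of 85.105 mm. The function name `error_of_mean` is an assumption for this illustration.

```python
from statistics import mean

accepted_mm = 85.105
student_a = [85.1, 85.0, 85.2, 85.1]
student_b = [85.301, 85.298, 85.299, 85.301]


def error_of_mean(measurements, accepted):
    """Error = mean of measured values - accepted value."""
    return mean(measurements) - accepted


error_a = error_of_mean(student_a, accepted_mm)
error_b = error_of_mean(student_b, accepted_mm)
print(f"Error A = {error_a:+.3f} mm, Error B = {error_b:+.3f} mm")

# The smaller error magnitude indicates the more accurate data set.
more_accurate = "A" if abs(error_a) < abs(error_b) else "B"
print(f"More accurate: Student {more_accurate}")
```

The sign of the error shows whether the measurements run low or high; the absolute value is what is compared when judging accuracy.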
Quantifying Precision
Precision is related to the variation in measurement data due to random errors that produce differing values when a measurement is repeated
Quantifying Precision
The precision of a measurement device can be related to the standard deviation of repeated measurement data
Quantifying Precision
Use the empirical rule to express precision
True value is within one standard deviation of the mean with 68% confidence.
True value is within two standard deviations of the mean with 95% confidence.
Quantifying Precision
Express the precision indicated by Student A's data at the 68% confidence level
True value is $85.10 \pm 0.08$ mm with 68% confidence
$85.10 - 0.08$ mm $\le$ true value $\le$ $85.10 + 0.08$ mm
Quantifying Precision
Express the precision indicated by Student A's data at the 95% confidence level
True value is $85.10 \pm 2(0.08)$ mm with 95% confidence
$85.10 - 0.16$ mm $\le$ true value $\le$ $85.10 + 0.16$ mm
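The 0.08 mm figure is the sample standard deviation of Student A's four measurements. A minimal sketch of the calculation, using `statistics.stdev` from the Python standard library (which divides by n - 1), with variable names that are assumptions for this illustration:

```python
from statistics import mean, stdev

student_a = [85.1, 85.0, 85.2, 85.1]   # Student A's measurements, in mm

x_bar = mean(student_a)
s = stdev(student_a)   # sample standard deviation (divides by n - 1)

# One standard deviation for 68% confidence, two for 95%.
for k, confidence in ((1, 68), (2, 95)):
    half_width = k * s
    print(f"{x_bar:.2f} ± {half_width:.2f} mm ({confidence}% confidence): "
          f"{x_bar - half_width:.2f} mm to {x_bar + half_width:.2f} mm")
```

Running the same code on Student B's data would give a much smaller standard deviation, reflecting the higher precision of the micrometer.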
B: High Accuracy, High Precision
C: Low Accuracy, Low Precision
D: High Accuracy, Low Precision