STI - 03 - Data Presentation & Parameter
Other summary measures:
Skewness
Kurtosis
Measures of Central Tendency or Location
Mean (average)
Example – Median (Data from Example 1)

Sales   Sorted Sales
  9          6
  6          9
 12         10
 10         12
 13         13
 15         14
 16         14
 14         15
 14         16
 16         16
 17         16
 16         17
 24         17
 21         18
 22         18
 18         19
 19         20
 18         21
 20         22
 17         24

Median = 50th Percentile
Position: (20 + 1)(50/100) = 10.5
Median: 16 + (.5)(0) = 16

The median is the middle value of data sorted in order of magnitude. It is the 50th percentile.
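The position rule used above, (n + 1)P/100 with linear interpolation between neighbouring sorted values, can be sketched in Python. This is a minimal sketch and not part of the original slides; the function name `percentile` and the clamping at the extremes are assumptions made for illustration.

```python
def percentile(data, p):
    """P-th percentile using the (n + 1) * P / 100 position rule."""
    values = sorted(data)
    n = len(values)
    position = (n + 1) * p / 100   # 1-based position, may be fractional
    whole = int(position)          # whole part of the position
    fraction = position - whole    # fractional part of the position
    # Assumption: positions outside 1..n fall back to the extreme values.
    if whole < 1:
        return values[0]
    if whole >= n:
        return values[-1]
    below, above = values[whole - 1], values[whole]
    return below + fraction * (above - below)


# Sales data from Example 1
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

median = percentile(sales, 50)
print(median)  # 16.0
```

For the 50th percentile the position is 10.5, so the result is halfway between the 10th and 11th sorted values, both of which are 16.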
Example – Mode (Data from Example 1)

                               .
                       .       .   .   .
   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .
------------------------------------------------------------
   6   9  10  12  13  14  15  16  17  18  19  20  21  22  24

Mode = 16 (the most frequent value; it occurs three times)
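The mode can be found by counting how often each value occurs. The following is a minimal sketch using Python's standard `collections.Counter`; the variable names are illustrative, and returning a list is an assumption made so that ties (more than one mode) would also be reported.

```python
from collections import Counter

# Sales data from Example 1
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

counts = Counter(sales)            # value -> number of occurrences
highest = max(counts.values())     # largest frequency in the data
modes = sorted(v for v, c in counts.items() if c == highest)

print(modes, highest)  # [16] 3
```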
Example – Mean (Data from Example 1)

Sum of the n = 20 observations: 317

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{317}{20} = 15.85 \]
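The same arithmetic can be checked in a few lines of Python. This is only an illustrative sketch with assumed variable names.

```python
# Sales data from Example 1
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

total = sum(sales)   # sum of the observations
n = len(sales)       # number of observations
mean = total / n     # arithmetic mean

print(total, n, mean)  # 317 20 15.85
```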
Measures of Variability or Dispersion
Range
Difference between maximum and minimum values
Interquartile Range
Difference between third and first quartile (Q3 - Q1)
Variance
Average* of the squared deviations from the mean
Standard Deviation
Square root of the variance
* Definitions of population variance and sample variance differ slightly.
Example - Range and Interquartile Range
(Data from Example 1)

Sales   Sorted Sales   Rank
  9          6           1    Minimum
  6          9           2
 12         10           3
 10         12           4
 13         13           5
 15         14           6
 16         14           7
 14         15           8
 14         16           9
 16         16          10
 17         16          11
 16         17          12
 24         17          13
 21         18          14
 22         18          15
 18         19          16
 19         20          17
 18         21          18
 20         22          19
 17         24          20    Maximum

Range
Maximum - Minimum = 24 - 6 = 18

First Quartile
Q1 = 13 + (.25)(1) = 13.25

Third Quartile
Q3 = 18 + (.75)(1) = 18.75

Interquartile Range
Q3 - Q1 = 18.75 - 13.25 = 5.5
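The range and interquartile range of the sales data can be reproduced with a short Python sketch. It assumes the quartile positions (n + 1)(25/100) = 5.25 and (n + 1)(75/100) = 15.75 with linear interpolation, which matches the values shown in the example; the helper name `value_at_position` is hypothetical.

```python
def value_at_position(sorted_values, position):
    """Interpolate at a 1-based, possibly fractional, position."""
    whole = int(position)
    fraction = position - whole
    below = sorted_values[whole - 1]
    # Guard the upper neighbour so the last position stays in range.
    above = sorted_values[min(whole, len(sorted_values) - 1)]
    return below + fraction * (above - below)


# Sales data from Example 1
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

ordered = sorted(sales)
n = len(ordered)

data_range = ordered[-1] - ordered[0]                 # maximum - minimum
q1 = value_at_position(ordered, (n + 1) * 25 / 100)   # first quartile
q3 = value_at_position(ordered, (n + 1) * 75 / 100)   # third quartile
iqr = q3 - q1                                         # interquartile range

print(data_range, q1, q3, iqr)  # 18 13.25 18.75 5.5
```

Other software may use different quartile conventions and give slightly different values. Python's `statistics.quantiles` with its default `exclusive` method should agree with the positions used here.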
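Variance and Standard Deviation

Population Variance

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N} \]

\[ \sigma = \sqrt{\sigma^2} \]

Sample Variance

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} \]

\[ s = \sqrt{s^2} \]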
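Both forms of the variance, the definitional form and the shortcut form, can be compared numerically. The sketch below is illustrative only; the function names and the `sample` flag are assumptions, not part of the slides.

```python
import math


def variance(data, sample=True):
    """Definitional form: sum of squared deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n - 1 if sample else n)


def variance_shortcut(data, sample=True):
    """Shortcut form: sum of squares minus (sum squared) / n."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1 if sample else n)


# Sales data from Example 1
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

s2 = variance(sales)                    # sample variance, divides by n - 1
sigma2 = variance(sales, sample=False)  # population variance, divides by N
s = math.sqrt(s2)                       # sample standard deviation

print(round(s2, 6), round(sigma2, 4))   # 19.923684 18.9275
```

The two functions agree up to floating-point rounding; the only difference between the population and sample versions is the divisor.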
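Calculation of Sample Variance

  x     x - x̄     (x - x̄)^2    x^2
  6     -9.85      97.0225      36
  9     -6.85      46.9225      81
 10     -5.85      34.2225     100
 12     -3.85      14.8225     144
 13     -2.85       8.1225     169
 14     -1.85       3.4225     196
 14     -1.85       3.4225     196
 15     -0.85       0.7225     225
 16      0.15       0.0225     256
 ...

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{378.55}{20 - 1} = \frac{378.55}{19} = 19.923684 \]

Equivalently, by the shortcut formula:

\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} \]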
Mutually exclusive
Not overlapping - every observation is assigned to only one group
Exhaustive
Every observation is assigned to a group
Equal-width (if possible)
First or last group may be open-ended
Frequency Distribution
Table with two columns listing:
Each and every group or class or interval of values
Associated frequency of each group
Number of observations assigned to each group
Sum of frequencies is number of observations
N for population
n for sample
Relative frequency: proportion of observations in each class
Sum of relative frequencies = 1
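As an illustration of these properties, the sketch below builds a frequency distribution for the Example 1 sales data. The class limits (four equal-width classes of width 5 starting at 5) are hypothetical and chosen only for this sketch; they are not taken from the slides.

```python
# Sales data from Example 1
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

# Hypothetical equal-width classes: 5 to under 10, 10 to under 15, ...
lower, width, n_classes = 5, 5, 4

frequencies = [0] * n_classes
for x in sales:
    index = (x - lower) // width   # each observation falls in exactly one class
    frequencies[index] += 1

n = len(sales)
relative = [f / n for f in frequencies]   # relative frequency of each class

for i, (f, r) in enumerate(zip(frequencies, relative)):
    start = lower + i * width
    print(f"{start:2d} to under {start + width:2d}: {f:2d}  {r:.2f}")
```

Because the classes are mutually exclusive and exhaustive, the frequencies add up to n and the relative frequencies add up to 1.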
Example 2: Frequency Distribution

x                     f(x)                      f(x)/n
Spending Class ($)    Frequency                 Relative Frequency
                      (number of customers)
Total                 184                       1.000
Chebyshev’s Theorem
Applies to any distribution, regardless of shape
Places lower limits on the percentages of observations
within a given number of standard deviations from the
mean
Empirical Rule
Applies only to roughly mound-shaped and symmetric
distributions
Specifies approximate percentages of observations
within a given number of standard deviations from the
mean
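Chebyshev’s Theorem
At least \( 1 - \frac{1}{k^2} \) of the elements of any distribution lie within k standard deviations of the mean.

At least                                                                       Lie within
\( 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\% \)                 2 standard deviations of the mean
\( 1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} = 89\% \)                 3 standard deviations of the mean
\( 1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} = 94\% \)              4 standard deviations of the mean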
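The bound can be computed for any k > 1 and compared with what is actually observed in a data set. The sketch below is illustrative: the function name `chebyshev_lower_bound` is an assumption, and the comparison uses the Example 1 sales data with the sample standard deviation.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


# Sales data from Example 1
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

mean = statistics.mean(sales)
sd = statistics.stdev(sales)   # sample standard deviation (divides by n - 1)

rows = []
for k in (2, 3, 4):
    inside = sum(1 for x in sales if abs(x - mean) <= k * sd)
    observed = inside / len(sales)          # proportion actually inside
    bound = chebyshev_lower_bound(k)        # guaranteed minimum proportion
    rows.append((k, bound, observed))
    print(k, f"{bound:.0%}", f"{observed:.0%}")
```

In this data set the observed proportions are at or above the guaranteed minimums, as the theorem requires.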
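Empirical Rule
For roughly mound-shaped and symmetric distributions, approximately:
68% of the observations lie within 1 standard deviation of the mean
95% lie within 2 standard deviations of the mean
Nearly all (about 99.7%) lie within 3 standard deviations of the mean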
[Figure: chart of Average Revenues and Average Expenses]
[Figure: two plots with Sales (0 to 50) on the horizontal axis]
Time Plot
[Figure: Monthly Steel Production (Problem 1-46); vertical axis: Millions of Tons (5.5 to 8.5); horizontal axis: Month]
Exploratory Data Analysis - EDA
Techniques to determine relationships and trends,
identify outliers and influential observations, and quickly
describe or summarize data sets.
Stem-and-Leaf Displays
Quick-and-dirty listing of all observations
Conveys some of the same information as a
histogram
Box Plots
Median
Lower and upper quartiles
Maximum and minimum
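As an illustration of a stem-and-leaf display, the sketch below lists the Example 1 sales data with the tens digit as the stem and the units digit as the leaf. The choice of stem unit and the output layout are assumptions made for this sketch.

```python
from collections import defaultdict

# Sales data from Example 1
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

stems = defaultdict(list)
for x in sorted(sales):
    stem, leaf = divmod(x, 10)   # tens digit is the stem, units digit the leaf
    stems[stem].append(leaf)

lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
# 0 | 6 9
# 1 | 0 2 3 4 4 5 6 6 6 7 7 8 8 9
# 2 | 0 1 2 4
```

Every observation remains visible in the display, while the length of each row gives the same impression of shape as a histogram bar.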
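Box Plot
Elements of a Box Plot
Box: spans Q1 to Q3 (the Interquartile Range, IQR), with the Median marked inside
Whiskers: extend to X, the smallest data point not below the inner fence, and to X, the largest data point not exceeding the inner fence
Inner Fences: Q1-1.5(IQR) and Q3+1.5(IQR)
Outer Fences: Q1-3(IQR) and Q3+3(IQR)
* Suspected outlier: a point between the inner and outer fences
o Outlier: a point beyond the outer fences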
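The fence formulas can be applied to the quartiles found earlier for the Example 1 sales data (Q1 = 13.25, Q3 = 18.75). The sketch below is illustrative; the function names and the labels returned by `classify` are assumptions, with points beyond the outer fences treated as outliers and points between the inner and outer fences as suspected outliers.

```python
def box_plot_fences(q1, q3):
    """Inner and outer fences computed from the quartiles."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3 * iqr, q3 + 3 * iqr),
    }


def classify(x, fences):
    """Label a data point relative to the fences."""
    inner_low, inner_high = fences["inner"]
    outer_low, outer_high = fences["outer"]
    if x < outer_low or x > outer_high:
        return "outlier"
    if x < inner_low or x > inner_high:
        return "suspected outlier"
    return "within inner fences"


fences = box_plot_fences(13.25, 18.75)   # quartiles of the sales data
print(fences["inner"])   # (5.0, 27.0)
print(fences["outer"])   # (-3.25, 35.25)

# 6 and 24 are the smallest and largest sales values; 30 and 40 are
# hypothetical values added only to show the other two labels.
for x in (6, 24, 30, 40):
    print(x, classify(x, fences))
```

All twenty sales values lie within the inner fences, so a box plot of this data would show no outliers.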
Example: Box Plot
Box Plot to Compare Two Data Sets