STI - 03 - Data Presentation & Parameter

The document discusses percentiles, quartiles, and other statistical measures used to summarize data. It provides an example dataset of sales data for 20 salespeople. It then calculates the 50th, 80th, and 90th percentiles for this dataset. The 50th percentile, or median, is 16. The document also defines quartiles as the 25th, 50th, and 75th percentiles, which divide the data into quarters. In the sales data example, the first, second, and third quartiles are calculated.

Statistik Industri (Industrial Statistics)

Data Presentation & Parameter


Percentiles and Quartiles
• Given any set of numerical observations, order them according to magnitude.
• The Pth percentile in the ordered set is that value below which lie P% (P percent) of the observations in the set.
• The position of the Pth percentile is given by (n + 1)P/100, where n is the number of observations in the set.
Example 1
Sales and Sorted Sales
A large department store collects data on sales made by each of its salespeople. The number of sales made on a given day by each of 20 salespeople is shown below, together with the same data sorted in order of magnitude.

Sales   Sorted Sales
    9        6
    6        9
   12       10
   10       12
   13       13
   15       14
   16       14
   14       15
   14       16
   16       16
   17       16
   16       17
   24       17
   21       18
   22       18
   18       19
   19       20
   18       21
   20       22
   17       24
Example 1
Percentiles
• Find the 50th, 80th, and 90th percentiles of this data set.
Example 1
Percentiles
• To find the 50th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(50/100) = 10.5.
• Thus, the percentile is located at the 10.5th position.
• The 10th observation is 16, and the 11th observation is also 16.
• The 50th percentile lies halfway between the 10th and 11th values and is thus 16.
Example 1
Percentiles
• To find the 90th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(90/100) = 18.9.
• Thus, the percentile is located at the 18.9th position.
• The 18th observation is 21, and the 19th observation is 22.
• The 90th percentile is the point lying 0.9 of the way from 21 to 22 and is thus 21.9.
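The position rule used in Example 1 can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `percentile` and the clamping of positions that fall outside 1..n are assumptions made here. It also covers the 80th percentile, which the exercise asks for: position (20 + 1)(80/100) = 16.8, between the 16th value (19) and the 17th value (20), giving 19.8.

```python
import math

def percentile(data, p):
    """Pth percentile via the (n + 1)P/100 position rule with linear interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * p / 100      # 1-based position, possibly fractional
    # Assumption (not in the slides): clamp positions outside 1..n to the extremes.
    if position <= 1:
        return float(ordered[0])
    if position >= n:
        return float(ordered[-1])
    lower = math.floor(position)      # rank of the observation just below
    fraction = position - lower       # how far to move toward the next observation
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

for p in (50, 80, 90):
    print(p, round(percentile(sales, p), 2))
# 50 16.0
# 80 19.8
# 90 21.9
```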
Quartiles – Special Percentiles
• Quartiles are the percentage points that break down the ordered data set into quarters.
• The first quartile is the 25th percentile. It is the point below which lie 1/4 of the data.
• The second quartile is the 50th percentile. It is the point below which lie 1/2 of the data. This is also called the median.
• The third quartile is the 75th percentile. It is the point below which lie 3/4 of the data.
Quartiles and Interquartile Range
• The first quartile, Q1 (25th percentile), is often called the lower quartile.
• The second quartile, Q2 (50th percentile), is often called the median or the middle quartile.
• The third quartile, Q3 (75th percentile), is often called the upper quartile.
• The interquartile range is the difference between the third and the first quartiles (Q3 - Q1).


Example 1: Finding Quartiles

Sales  Sorted Sales  Quartile        Position (n+1)P/100     Value
    9       6
    6       9
   12      10
   10      12
   13      13        First Quartile  (20+1)25/100 = 5.25     13 + (.25)(1) = 13.25
   15      14
   16      14
   14      15
   14      16
   16      16        Median          (20+1)50/100 = 10.5     16 + (.5)(0) = 16
   17      16
   16      17
   24      17
   21      18
   22      18        Third Quartile  (20+1)75/100 = 15.75    18 + (.75)(1) = 18.75
   18      19
   19      20
   18      21
   20      22
   17      24
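The quartiles of Example 1 can be checked with Python's standard library. As a sketch: `statistics.quantiles` with `method="exclusive"` interpolates at position k(n + 1)/4 for the k-th quartile, which matches the (n + 1)P/100 rule for P = 25, 50, 75. The variable names are chosen here for illustration.

```python
from statistics import quantiles

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

# method="exclusive" places the k-th quartile at position k(n + 1)/4
q1, q2, q3 = quantiles(sales, n=4, method="exclusive")
iqr = q3 - q1                      # interquartile range, Q3 - Q1

print(q1, q2, q3, iqr)             # 13.25 16.0 18.75 5.5
```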
Summary Measures: Population Parameters and Sample Statistics
• Measures of Central Tendency
  Median
  Mode
  Mean
• Measures of Variability
  Range
  Interquartile range
  Variance
  Standard Deviation
• Other summary measures:
  Skewness
  Kurtosis
Measures of Central Tendency or Location
• Median
  Middle value when sorted in order of magnitude
  50th percentile
• Mode
  Most frequently occurring value
• Mean
  Average
Example – Median (Data from Example 1)

Sales  Sorted Sales
    9       6
    6       9
   12      10
   10      12
   13      13
   15      14
   16      14
   14      15
   14      16
   16      16        Median (50th percentile): position (20+1)50/100 = 10.5, value 16 + (.5)(0) = 16
   17      16
   16      17
   24      17
   21      18
   22      18
   18      19
   19      20
   18      21
   20      22
   17      24

The median is the middle value of data sorted in order of magnitude. It is the 50th percentile.
Example – Mode (Data from Example 1)

                      .
 .  .  .  .  .  :  .  :  :  :  .  .  .  .  .
---------------------------------------------
 6  9 10 12 13 14 15 16 17 18 19 20 21 22 24

Mode = 16

The mode is the most frequently occurring value. It is the value with the highest frequency.
Arithmetic Mean or Average
The mean of a set of observations is their average: the sum of the observed values divided by the number of observations.

Population Mean
$$ \mu = \frac{\sum_{i=1}^{N} x_i}{N} $$

Sample Mean
$$ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} $$
Example – Mean (Data from Example 1)
Sales: 9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17 (sum = 317)

$$ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{317}{20} = 15.85 $$
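The three measures of central tendency for the Example 1 data can be reproduced with a short Python sketch. The variable names are illustrative; `Counter` is used here only to tally how often each value occurs.

```python
from collections import Counter
from statistics import median

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

# Mean: sum of the observations divided by their number
mean_sales = sum(sales) / len(sales)

# Median: middle value of the sorted data (average of the 10th and 11th here)
median_sales = median(sales)

# Mode: value with the highest frequency
counts = Counter(sales)
mode_sales, mode_frequency = counts.most_common(1)[0]

print(mean_sales)                    # 15.85
print(median_sales)                  # 16.0
print(mode_sales, mode_frequency)    # 16 3
```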
Measures of Variability or Dispersion
• Range
  Difference between maximum and minimum values
• Interquartile Range
  Difference between third and first quartile (Q3 - Q1)
• Variance
  Average* of the squared deviations from the mean
• Standard Deviation
  Square root of the variance
* Definitions of population variance and sample variance differ slightly.
Example - Range and Interquartile Range (Data from Example 1)

Sales  Sorted Sales  Rank
    9       6          1    Minimum
    6       9          2
   12      10          3
   10      12          4
   13      13          5
   15      14          6
   16      14          7
   14      15          8
   14      16          9
   16      16         10
   17      16         11
   16      17         12
   24      17         13
   21      18         14
   22      18         15
   18      19         16
   19      20         17
   18      21         18
   20      22         19
   17      24         20    Maximum

Range: Maximum - Minimum = 24 - 6 = 18
First Quartile: Q1 = 13 + (.25)(1) = 13.25
Third Quartile: Q3 = 18 + (.75)(1) = 18.75
Interquartile Range: Q3 - Q1 = 18.75 - 13.25 = 5.5
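Variance and Standard Deviation

Population Variance
$$ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N} $$
$$ \sigma = \sqrt{\sigma^2} $$

Sample Variance
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} $$
$$ s = \sqrt{s^2} $$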
Calculation of Sample Variance

    x    x - \bar{x}    (x - \bar{x})^2     x^2
    6       -9.85           97.0225          36
    9       -6.85           46.9225          81
   10       -5.85           34.2225         100
   12       -3.85           14.8225         144
   13       -2.85            8.1225         169
   14       -1.85            3.4225         196
   14       -1.85            3.4225         196
   15       -0.85            0.7225         225
   16        0.15            0.0225         256
   16        0.15            0.0225         256
   16        0.15            0.0225         256
   17        1.15            1.3225         289
   17        1.15            1.3225         289
   18        2.15            4.6225         324
   18        2.15            4.6225         324
   19        3.15            9.9225         361
   20        4.15           17.2225         400
   21        5.15           26.5225         441
   22        6.15           37.8225         484
   24        8.15           66.4225         576
  317        0             378.5500        5403    (column totals)

$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{378.55}{20 - 1} = \frac{378.55}{19} = 19.923684 $$

$$ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{5403 - \frac{317^2}{20}}{20 - 1} = \frac{5403 - \frac{100489}{20}}{19} = \frac{5403 - 5024.45}{19} = \frac{378.55}{19} = 19.923684 $$

$$ s = \sqrt{s^2} = \sqrt{19.923684} = 4.46 $$
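Both routes to the sample variance, the definitional form and the computational shortcut, can be compared in a small Python sketch. The variable names are illustrative assumptions; the two sums of squares agree up to floating-point rounding.

```python
import math

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

n = len(sales)
x_bar = sum(sales) / n

# Definitional form: sum of squared deviations from the mean
ss_deviations = sum((x - x_bar) ** 2 for x in sales)

# Shortcut form: sum of squares minus (sum)^2 / n
ss_shortcut = sum(x * x for x in sales) - sum(sales) ** 2 / n

sample_variance = ss_deviations / (n - 1)   # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)

print(round(ss_deviations, 2), round(ss_shortcut, 2))  # 378.55 378.55
print(round(sample_variance, 6))                       # 19.923684
print(round(sample_sd, 2))                             # 4.46
```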


Group Data and the Histogram
• Dividing data into groups or classes or intervals
• Groups should be:
  Mutually exclusive
    Not overlapping - every observation is assigned to only one group
  Exhaustive
    Every observation is assigned to a group
  Equal-width (if possible)
    First or last group may be open-ended
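As a sketch of these grouping rules, the Example 1 sales data can be assigned to equal-width classes of the form "lower to less than upper". The class limits used here (starting at 5, width 5, four classes) and the function name `frequency_table` are assumptions chosen for illustration; they do not come from the slides.

```python
def frequency_table(data, start, width, n_classes):
    """Count observations in equal-width classes [lower, upper)."""
    counts = [0] * n_classes
    for x in data:
        index = int((x - start) // width)   # each value maps to exactly one class
        if not 0 <= index < n_classes:
            # The classes must be exhaustive: every observation needs a group
            raise ValueError(f"{x} falls outside all classes")
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

table = frequency_table(sales, start=5, width=5, n_classes=4)
for lower, upper, count in table:
    print(f"{lower} to less than {upper}: {count}")
# 5 to less than 10: 2
# 10 to less than 15: 5
# 15 to less than 20: 9
# 20 to less than 25: 4
```

Half-open intervals make the classes mutually exclusive, and the range check enforces exhaustiveness.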
Frequency Distribution
• Table with two columns listing:
  Each and every group or class or interval of values
  Associated frequency of each group
    Number of observations assigned to each group
    Sum of frequencies is number of observations
      N for population
      n for sample
• Class midpoint is the middle value of a group or class or interval
• Relative frequency is the proportion of total observations in each class
  Sum of relative frequencies = 1
Example 2: Frequency Distribution

x                       f(x)                      f(x)/n
Spending Class ($)      Frequency                 Relative Frequency
                        (number of customers)
0 to less than 100       30                       0.163
100 to less than 200     38                       0.207
200 to less than 300     50                       0.272
300 to less than 400     31                       0.168
400 to less than 500     22                       0.120
500 to less than 600     13                       0.070
Total                   184                       1.000

• Example of relative frequency: 30/184 = 0.163
• Sum of relative frequencies = 1
Cumulative Frequency Distribution

x                       F(x)                      F(x)/n
Spending Class ($)      Cumulative Frequency      Cumulative Relative Frequency
0 to less than 100       30                       0.163
100 to less than 200     68                       0.370
200 to less than 300    118                       0.641
300 to less than 400    149                       0.810
400 to less than 500    171                       0.929
500 to less than 600    184                       1.000

The cumulative frequency of each group is the sum of the frequencies of that and all preceding groups.
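The relative and cumulative columns follow mechanically from the class frequencies of Example 2. A minimal Python sketch, with illustrative variable names, using `itertools.accumulate` for the running totals:

```python
from itertools import accumulate

classes = ["0 to less than 100", "100 to less than 200", "200 to less than 300",
           "300 to less than 400", "400 to less than 500", "500 to less than 600"]
frequencies = [30, 38, 50, 31, 22, 13]

n = sum(frequencies)                                  # total number of customers
relative = [f / n for f in frequencies]               # f(x)/n
cumulative = list(accumulate(frequencies))            # F(x): running sum of frequencies
cumulative_relative = [F / n for F in cumulative]     # F(x)/n

for label, f, rel, F, cum_rel in zip(classes, frequencies, relative,
                                     cumulative, cumulative_relative):
    print(f"{label:22} {f:4d} {rel:6.3f} {F:4d} {cum_rel:6.3f}")
```

The last cumulative frequency equals n, and the last cumulative relative frequency equals 1.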
Histogram
• A histogram is a chart made of bars of different heights.
  Widths and locations of bars correspond to widths and locations of data groupings
  Heights of bars correspond to frequencies or relative frequencies of data groupings
[Figure: Histogram Example - Frequency Histogram]
[Figure: Histogram Example - Relative Frequency Histogram]
Skewness and Kurtosis
• Skewness
  • Measure of asymmetry of a frequency distribution
  • Skewed to left
  • Symmetric or unskewed
  • Skewed to right
• Kurtosis
  • Measure of flatness or peakedness of a frequency distribution
  • Platykurtic (relatively flat)
  • Mesokurtic (normal)
  • Leptokurtic (relatively peaked)
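The slides give no formula for either measure. One common convention, assumed here purely for illustration, uses central moments: skewness as m3 / m2^1.5 and excess kurtosis as m4 / m2^2 - 3, so that a normal (mesokurtic) distribution scores 0 on both. The function names and the small data sets below are hypothetical.

```python
def central_moment(data, r):
    """r-th central moment (divisor n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n

def skewness(data):
    # Negative: skewed to left; zero: symmetric; positive: skewed to right
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    # Negative: platykurtic; near zero: mesokurtic; positive: leptokurtic
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]     # long tail toward large values
left_skewed = [1, 9, 10, 10, 10]    # long tail toward small values

print(skewness(symmetric))          # 0.0
print(skewness(right_skewed) > 0)   # True
print(skewness(left_skewed) < 0)    # True
print(excess_kurtosis(symmetric) < 0)   # True (flat, evenly spread values)
```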
[Figure: Skewness - skewed to left]
[Figure: Skewness - symmetric]
[Figure: Skewness - skewed to right]
[Figure: Kurtosis - platykurtic (flat distribution)]
[Figure: Kurtosis - mesokurtic (not too flat and not too peaked)]
[Figure: Kurtosis - leptokurtic (peaked distribution)]
Relations between the Mean and Standard Deviation
• Chebyshev’s Theorem
  Applies to any distribution, regardless of shape
  Places lower limits on the percentages of observations within a given number of standard deviations from the mean
• Empirical Rule
  Applies only to roughly mound-shaped and symmetric distributions
  Specifies approximate percentages of observations within a given number of standard deviations from the mean
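Chebyshev’s Theorem
• At least $1 - \frac{1}{k^2}$ of the elements of any distribution lie within k standard deviations of the mean.

k    At least                                                                  Lie within
2    $1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\%$                2 standard deviations of the mean
3    $1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} = 89\%$                3 standard deviations of the mean
4    $1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} = 94\%$             4 standard deviations of the mean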
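The bound is simple enough to tabulate directly. In this sketch the function name `chebyshev_lower_bound` is an assumption, as is the choice to return 0 for k ≤ 1, where the theorem gives no useful information.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the expression is <= 0, so the bound is uninformative
    return max(0.0, 1 - 1 / k ** 2)

for k in (2, 3, 4):
    print(k, f"{chebyshev_lower_bound(k):.0%}")
# 2 75%
# 3 89%
# 4 94%
```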
Empirical Rule
• For roughly mound-shaped and symmetric distributions, approximately:
  68% lie within 1 standard deviation of the mean
  95% lie within 2 standard deviations of the mean
  Almost all (about 99.7%) lie within 3 standard deviations of the mean
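As a rough check of these percentages, the share of the Example 1 sales data lying within k sample standard deviations of the mean can be counted directly. This is only an illustration on 20 observations; the function name `share_within` is an assumption.

```python
import math

def share_within(data, k):
    """Fraction of observations within k sample standard deviations of the mean."""
    n = len(data)
    x_bar = sum(data) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))
    inside = sum(1 for x in data if abs(x - x_bar) <= k * s)
    return inside / n

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

for k in (1, 2, 3):
    print(k, share_within(sales, k))
# 1 0.7
# 2 0.95
# 3 1.0
```

With a mean of 15.85 and a standard deviation of about 4.46, the observed shares (70%, 95%, 100%) sit close to the empirical rule and well above the Chebyshev lower limits.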
Methods of Displaying Data
• Pie Charts
  Categories represented as percentages of total
• Bar Graphs
  Heights of rectangles represent group frequencies
• Frequency Polygons
  Height of line represents frequency
• Ogives
  Height of line represents cumulative frequency
• Time Plots
  Represent values over time
Pie Chart
[Figure: pie chart]
Bar Chart
[Figure: Fig. 1-11 Airline Operating Expenses and Revenues. Bars show average revenues and average expenses by airline: American, Continental, Delta, Northwest, Southwest, United, USAir.]
Frequency Polygon and Ogive
[Figure: relative frequency polygon (vertical axis 0.0 to 0.3) and ogive (vertical axis 0.0 to 1.0), each plotted against Sales from 0 to 50.]
Time Plot
[Figure: Monthly Steel Production (Problem 1-46). Vertical axis: millions of tons (5.5 to 8.5); horizontal axis: month.]
Exploratory Data Analysis - EDA
Techniques to determine relationships and trends, identify outliers and influential observations, and quickly describe or summarize data sets.

• Stem-and-Leaf Displays
  Quick-and-dirty listing of all observations
  Conveys some of the same information as a histogram
• Box Plots
  Median
  Lower and upper quartiles
  Maximum and minimum
Box Plot
Elements of a Box Plot
• Box: spans Q1 to Q3 (the interquartile range), with the median marked inside
• Inner fences: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)
• Outer fences: Q1 - 3(IQR) and Q3 + 3(IQR)
• X (left): smallest data point not below the inner fence
• X (right): largest data point not exceeding the inner fence
• *: suspected outlier
• o: outlier
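The fence positions can be computed for the Example 1 sales data as a sketch. The quartiles come from `statistics.quantiles` with `method="exclusive"`, which follows the (n + 1)P/100 position rule; the variable names are illustrative.

```python
from statistics import quantiles

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

q1, med, q3 = quantiles(sales, n=4, method="exclusive")
iqr = q3 - q1

inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer_fences = (q1 - 3 * iqr, q3 + 3 * iqr)

# Whisker ends: extreme data points that still lie inside the inner fences
whisker_low = min(x for x in sales if x >= inner_fences[0])
whisker_high = max(x for x in sales if x <= inner_fences[1])

# Points beyond the inner fences would be flagged for a closer look
flagged = [x for x in sales if x < inner_fences[0] or x > inner_fences[1]]

print(inner_fences)               # (5.0, 27.0)
print(outer_fences)               # (-3.25, 35.25)
print(whisker_low, whisker_high)  # 6 24
print(flagged)                    # []
```

For this data set no observation falls outside the inner fences, so the whiskers end at the minimum and maximum.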
[Figure: Example: Box Plot]
[Figure: Box Plot to Compare Two Data Sets]
THANK YOU
