Descriptive Statistics 1
Business Statistics
Measures of Central Tendency
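Arithmetic Mean
Sample mean:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
Population mean:
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} \]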
Arithmetic Mean (Contd.)
Applicable for interval and ratio data.
Not applicable for nominal or ordinal data.
Affected by each value in the data set,
including extreme values (also known as
outliers).
Example (two dot plots on a 0 to 10 axis):
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
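The effect of a single extreme value on the mean can be checked with a short Python sketch. It uses the two five-value data sets from the example (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10); the variable names are illustrative only.

```python
from statistics import mean

# The two example data sets: identical except for the last value.
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

mean_a = mean(without_outlier)  # 15 / 5
mean_b = mean(with_outlier)     # 20 / 5

# Replacing 5 by the extreme value 10 shifts the mean by (10 - 5) / 5 = 1.
shift = mean_b - mean_a
print(mean_a, mean_b, shift)
```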
Weighted Arithmetic Mean
Considers the importance of each value.
Weighted mean:
\[ \bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \]
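A minimal Python sketch of the weighted mean follows. The function name `weighted_mean` and the scores and credit-hour weights are hypothetical, chosen only to illustrate the formula.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical data: three exam scores weighted by credit hours.
scores = [70, 80, 90]
credits = [1, 2, 3]

result = weighted_mean(scores, credits)  # (70 + 160 + 270) / 6
print(result)
```

With equal weights the function reduces to the ordinary arithmetic mean.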
Median
Example (two dot plots): Median = 3 in both data sets.
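The sketch below assumes the two dot plots show the same data sets used for the mean (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10); that pairing is an assumption, since the data behind the plots are not listed here. Under that assumption the median stays at 3 even though the mean changes.

```python
from statistics import median

# Assumed data sets (same as in the arithmetic mean example).
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

median_a = median(without_outlier)  # middle value of the ranked data
median_b = median(with_outlier)     # unchanged by the extreme value 10
print(median_a, median_b)
```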
Mode
The most frequently occurring value in a data set.
Applicable to all levels of data measurement (nominal,
ordinal, interval, and ratio).
Not affected by extreme values.
There may be no mode, single mode (uni-modal), two
modes (bi-modal) or several modes (multi-modal).
Example (dot plot on a 0 to 14 axis): Mode = 9
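The cases listed above (no mode, uni-modal, bi-modal, multi-modal) can be sketched in Python with `statistics.multimode`. The helper `describe_modes` and the sample data are hypothetical. Treating "every distinct value occurs equally often" as "no mode" is a convention assumed here.

```python
from statistics import multimode

def describe_modes(data):
    """Return (modes, label) for a data set at any measurement level."""
    values = list(data)
    if not values:
        return [], "no mode"
    modes = multimode(values)          # all most-frequent values, in first-seen order
    distinct = set(values)
    # Assumed convention: if all distinct values tie, report no mode.
    if len(distinct) > 1 and len(modes) == len(distinct):
        return [], "no mode"
    labels = {1: "uni-modal", 2: "bi-modal"}
    return modes, labels.get(len(modes), "multi-modal")

# Hypothetical numeric data with a single mode of 9.
print(describe_modes([1, 2, 2, 3, 9, 9, 9]))
# Hypothetical nominal data: the mode also works for categories.
print(describe_modes(["red", "blue", "red", "green", "blue"]))
```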
Quartiles
Quartiles split the ranked data into four segments with
an equal number of values per segment.
[Figure: ranked data divided into four segments at Q1, Q2, and Q3]
The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger.
Q2 is the same as the median (50% are smaller, 50% are
larger).
Only 25% of the observations are greater than the third
quartile, Q3.
Quartiles
Find a quartile by determining the value in the
appropriate position in the ranked data, where
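First quartile position:
Position of Q1 = (n+1)/4
Second quartile position:
Position of Q2 = (n+1)/2 (the median position)
Third quartile position:
Position of Q3 = 3(n+1)/4
where n is the number of observed values. Each formula gives a position in the ranked data, not the quartile value itself.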
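The position formulas can be sketched in Python as follows. The function name `quartile` and the nine data values are hypothetical. When a position falls between two ranked values, this sketch interpolates linearly between them; that rule is an assumption, since the handling of fractional positions is not specified here and other conventions exist.

```python
def quartile(data, k):
    """Return the k-th quartile (k = 1, 2, 3) from position k(n+1)/4."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    ranked = sorted(data)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two observations")
    position = k * (n + 1) / 4              # 1-based position in the ranked data
    position = min(max(position, 1), n)     # keep the position inside the data
    lower = int(position)                   # rank of the value at or below the position
    fraction = position - lower
    if lower >= n:
        return ranked[-1]
    # Assumed rule: linear interpolation between neighbouring ranked values.
    return ranked[lower - 1] + fraction * (ranked[lower] - ranked[lower - 1])

# Hypothetical ranked sample with n = 9, so the positions are 2.5, 5, and 7.5.
sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = (quartile(sample, k) for k in (1, 2, 3))
print(q1, q2, q3)
```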
Percentiles
Note that:
25th percentile = Q1 (first quartile)
50th percentile = median = Q2 (Second quartile)
75th percentile = Q3 (third quartile)
Measures of Variability
Measures of variability describe the spread or
the dispersion of a set of data.
Same center,
different variation
Range
Simplest measure of variation.
Difference between the largest and the
smallest values in a set of data:
\[ \text{Range} = x_{\text{largest}} - x_{\text{smallest}} \]
Example (dot plot on a 0 to 14 axis, smallest value 1, largest value 14):
Range = 14 - 1 = 13
Range
Disadvantages:
Ignores the way in which data are distributed:
[Figure: two differently distributed data sets on the same 7 to 12 axis]
Range = 12 - 7 = 5 for both data sets
Sensitive to outliers:
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
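A short Python sketch makes the sensitivity to outliers concrete. It uses the two 25-value data sets listed above, which differ only in the last value; the helper name `value_range` is illustrative.

```python
def value_range(data):
    """Range = largest value - smallest value."""
    return max(data) - min(data)

# The two data sets from the example: 11 ones, 8 twos, 4 threes, one 4, and a final value.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

range_base = value_range(base)             # 5 - 1
range_outlier = value_range(with_outlier)  # 120 - 1
print(range_base, range_outlier)
```

Changing one observation out of 25 moves the range from 4 to 119.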
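Interquartile Range
Example:
Measure    x_minimum   Q1   Median (Q2)   Q3   x_maximum
Value      12          30   45            57   70
Each of the four segments between these values contains 25% of the observations.
Interquartile range = Q3 - Q1 = 57 - 30 = 27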
Variance
The variance of a set of observations is the
average squared deviation of the data points
from their mean.
Sample variance:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
Population variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
Standard Deviation
The standard deviation of a set of
observations is the (positive) square root of
the variance of the set.
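Sample standard deviation:
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
Population standard deviation:
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]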
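The variance and standard deviation definitions can be sketched in Python as below. The function names and the eight observations are hypothetical. The only difference between the sample and population versions is the divisor (n - 1 versus N).

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

def population_variance(data):
    """Sum of squared deviations from the population mean, divided by N."""
    size = len(data)
    if size == 0:
        raise ValueError("population variance needs at least one observation")
    mu = sum(data) / size
    return sum((x - mu) ** 2 for x in data) / size

# Hypothetical observations (mean 5, sum of squared deviations 32).
observations = [2, 4, 4, 4, 5, 5, 7, 9]

s_squared = sample_variance(observations)          # 32 / 7
sigma_squared = population_variance(observations)  # 32 / 8
s = math.sqrt(s_squared)                           # sample standard deviation
sigma = math.sqrt(sigma_squared)                   # population standard deviation

# Cross-check against the standard library implementations.
assert math.isclose(s, statistics.stdev(observations))
assert math.isclose(sigma, statistics.pstdev(observations))
print(s_squared, sigma_squared, s, sigma)
```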
Coefficient of Variation
Measures relative variation
Always in percentage (%)
Shows variation relative to mean
Can be used to compare two or more sets of
data measured in different units
\[ CV = \left( \frac{s}{\bar{x}} \right) \cdot 100\% \]
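Comparing Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $7
\[ CV_A = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{\$7}{\$50} \cdot 100\% = 14\% \]
Stock B:
Average price last year = $100
Standard deviation = $10
\[ CV_B = \left( \frac{S}{\bar{X}} \right) \cdot 100\% = \frac{\$10}{\$100} \cdot 100\% = 10\% \]
In absolute terms stock B is more variable than stock A (standard deviation $10 vs. $7), but stock B is less variable relative to its price.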
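The stock comparison can be reproduced with a few lines of Python. The function name `coefficient_of_variation` is illustrative; the prices and standard deviations are the ones given for stocks A and B.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Return the coefficient of variation as a percentage."""
    if mean_value == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / mean_value

cv_a = coefficient_of_variation(7, 50)     # stock A
cv_b = coefficient_of_variation(10, 100)   # stock B

# Stock B has the larger standard deviation but the smaller relative variation.
print(cv_a, cv_b, cv_b < cv_a)
```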
Z Scores
\[ Z = \frac{X - \bar{X}}{S} \]
Z Scores (continued)
Example:
If the mean is 14.0 and the standard deviation is 3.0, what is
the Z score for the value 18.5?
\[ Z = \frac{X - \bar{X}}{S} = \frac{18.5 - 14.0}{3.0} = 1.5 \]
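The same calculation as a small Python sketch (the function name `z_score` is illustrative):

```python
def z_score(x, mean_value, std_dev):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean_value) / std_dev

z = z_score(18.5, 14.0, 3.0)  # (18.5 - 14.0) / 3.0
print(z)
```

A value of 18.5 therefore lies 1.5 standard deviations above the mean.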
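The Empirical Rule
If the data distribution is approximately bell-shaped, then:
\( \mu \pm 1\sigma \) contains about 68% of the values in
the population or the sample
\( \mu \pm 2\sigma \) contains about 95% of the values in
the population or the sample
\( \mu \pm 3\sigma \) contains about 99.7% of the values
in the population or the sample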
Problem
A cold drink bottling plant fills bottles of 500 ml
capacity with mean of 500 ml and standard deviation
of 5 ml. At least what percentage of bottles would
contain cold drink between 490 and 510 ml?
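One way to approach the problem is sketched below. The phrase "at least" suggests a distribution-free bound such as Chebyshev's theorem (at least 1 - 1/k^2 of the values lie within k standard deviations of the mean, for k > 1). That theorem is not stated in this section, so treating it as the intended method is an assumption. The function name is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is informative only for k > 1")
    return 1 - 1 / k ** 2

mean_fill = 500          # ml
sd_fill = 5              # ml
lower, upper = 490, 510  # ml, symmetric about the mean

k = (upper - mean_fill) / sd_fill      # 10 / 5 = 2 standard deviations
bound = chebyshev_lower_bound(k)       # 1 - 1/4
print(f"At least {bound:.0%} of bottles contain between {lower} and {upper} ml")
```

Under this assumption at least 75% of the bottles fall in the interval, whatever the shape of the distribution. If the fill volumes were known to be approximately bell-shaped, the empirical rule would instead give about 95% for the same interval.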
Skewness
Absence of symmetry
Extreme values in one side of a distribution
Kurtosis
Peakedness of a distribution
Leptokurtic: high and thin
Mesokurtic: normal shape
Platykurtic: flat and spread out
Box and Whisker Plots
Graphic display of a distribution
Reveals skewness and outliers
[Figures: example distribution shapes illustrating skewness, a mesokurtic curve, and a platykurtic curve]
Box and Whisker Plot
Five summary measures are used:
Median, Q2
First quartile, Q1
Third quartile, Q3
Minimum value in the data set
Maximum value in the data set
The Box
Median (Vertical line across the box)
First Quartile
Third Quartile
The Whisker
Lower whisker end = smallest observation within the lower inner fence, Q1 - 1.5 IQR
Upper whisker end = largest observation within the upper inner fence, Q3 + 1.5 IQR
Outer Fences
Lower outer fence = Q1 - 3.0 IQR
Upper outer fence = Q3 + 3.0 IQR
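The fence formulas can be sketched in Python as follows. The quartile values Q1 = 30 and Q3 = 57 reuse the interquartile range example earlier in this section. The function names, the test observations, and the labelling rule (beyond the outer fences is an "outlier", between the inner and outer fences is a "suspected outlier") are assumptions made for illustration.

```python
def fences(q1, q3):
    """Return the inner and outer fences computed from the quartiles."""
    iqr = q3 - q1
    return {
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3.0 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }

def classify(x, f):
    """Label an observation relative to the fences (assumed convention)."""
    if x < f["lower_outer"] or x > f["upper_outer"]:
        return "outlier"
    if x < f["lower_inner"] or x > f["upper_inner"]:
        return "suspected outlier"
    return "within inner fences"

f = fences(30, 57)  # IQR = 27
print(f)
for value in (70, 100, 150):  # hypothetical observations
    print(value, classify(value, f))
```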
Box and Whisker Plot
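[Figure: box-and-whisker plot. Labels: the box spanning the IQR from Q1 to Q3 with Q2 marked inside; whiskers reaching the smallest observation within Q1 - 1.5 IQR and the largest observation within Q3 + 1.5 IQR; left and right inner fences; left and right outer fences; a suspected outlier and an outlier.]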
Example: Frequency Distribution of the Raw Data
Class          Frequency
15.2 - 15.4    2
15.5 - 15.7    5
15.8 - 16.0    11
16.1 - 16.3    6
16.4 - 16.6    3
16.7 - 16.9    3
Relative Frequency Distribution
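Class          Frequency   Relative frequency   Percentage
15.2 - 15.4    2           2/30 = 0.07          7
15.5 - 15.7    5           5/30 = 0.17          17
15.8 - 16.0    11          11/30 = 0.37         37
16.1 - 16.3    6           6/30 = 0.20          20
16.4 - 16.6    3           3/30 = 0.10          10
16.7 - 16.9    3           3/30 = 0.10          10
Total          30          1.00                 100
Note: 11/30 = 0.3667 rounds to 0.37. Because of rounding, the displayed relative frequencies add to 1.01; the exact total is 1.00.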
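The relative frequency and percentage columns can be derived from the class frequencies with a short Python sketch. The class labels and frequencies are those of the table; the variable names are illustrative. Rounded to two decimals, 11/30 comes out as 0.37.

```python
classes = ["15.2-15.4", "15.5-15.7", "15.8-16.0", "16.1-16.3", "16.4-16.6", "16.7-16.9"]
frequencies = [2, 5, 11, 6, 3, 3]

total = sum(frequencies)                          # number of observations
relative = [f / total for f in frequencies]       # relative frequency = frequency / total
percentages = [round(100 * r) for r in relative]  # relative frequency as a whole percentage

for label, f, r, p in zip(classes, frequencies, relative, percentages):
    print(f"{label:<11} {f:>3} {r:>6.2f} {p:>4}")
print(f"{'Total':<11} {total:>3} {sum(relative):>6.2f}")
```

The unrounded relative frequencies always sum to 1; rounded values may be off by a small amount.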
The Histogram
A graph of the data in a frequency distribution is
called a histogram
The class boundaries (or class midpoints) are
shown on the horizontal axis
The vertical axis shows frequency, relative
frequency, or percentage
Bars of the appropriate heights are used to
represent the number of observations within
each class
Histogram Example
[Figure: histogram of the frequency distribution, with no gaps between bars. Vertical axis: Frequency (0 to 12). Horizontal axis: Production level in Yards (15.2 to 17.0 in steps of 0.3).]
Frequency Polygon
[Figure: frequency polygon. Vertical axis: Frequency (0 to 12). Horizontal axis: Production level in Yards (15.0 to 17.1 in steps of 0.3).]
Cumulative Frequency Distribution
Enables us to see how many observations lie
above or below a certain value.
Two types: less-than type and more-than type.
A graph of a cumulative frequency distribution
is called an ogive.
The ogive of a less-than type cumulative
frequency distribution slopes upward to the right.
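Both types of cumulative frequency distribution can be sketched in Python from the class frequencies of the production-level example. The class limits and frequencies come from the frequency distribution table; the variable names are illustrative. Counting "at or below the upper class limit" for the less-than type and "at or above the lower class limit" for the more-than type is one common convention, assumed here.

```python
from itertools import accumulate

lower_limits = [15.2, 15.5, 15.8, 16.1, 16.4, 16.7]
upper_limits = [15.4, 15.7, 16.0, 16.3, 16.6, 16.9]
frequencies = [2, 5, 11, 6, 3, 3]
total = sum(frequencies)

# Less-than type: observations at or below each upper class limit.
less_than = list(accumulate(frequencies))

# More-than type: observations at or above each lower class limit.
more_than = [total - below for below in [0] + less_than[:-1]]

for lo, hi, lt, mt in zip(lower_limits, upper_limits, less_than, more_than):
    print(f"<= {hi}: {lt:>2}    >= {lo}: {mt:>2}")
```

The less-than counts never decrease, which is why the corresponding ogive rises to the right; the more-than counts never increase.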
Cumulative Frequency Distribution
[Figure: ogive of the cumulative frequency distribution. Horizontal axis: Production level in yards (15.2 to 17.0 in steps of 0.3).]
Pie Chart
[Figure: Investor's Portfolio. Categories: Savings, CD, Bonds, Stocks. Axis: Amount in $1000's (0 to 50).]