Descriptive Stats Part2
Statistics103110
Chapter Three: Numerical Measures of the Data
Example
▪ Compute the range of 6, 1, 2, 6, 11, 7, 3, 3
▪ The largest value is 11
▪ The smallest value is 1
▪ Range = 11 − 1 = 10
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
where
X = individual value
μ = population mean
N = population size
Example #1
Calculate the population variance from the following 5
observations: 50, 55, 45, 60, 40.
Solution:
Use the following data for the calculation of population variance.
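The calculation can be sketched in Python as follows. This is a minimal illustration of the population variance formula applied to the five observations; the function name `population_variance` is an assumption made for the example, and the standard-library `statistics.pvariance` is used only as a cross-check.

```python
from statistics import pvariance

# The five observations from Example #1.
observations = [50, 55, 45, 60, 40]


def population_variance(values):
    """Return sigma^2 = sum((X - mu)^2) / N for a whole population."""
    n = len(values)
    mu = sum(values) / n                      # population mean
    squared_deviations = [(x - mu) ** 2 for x in values]
    return sum(squared_deviations) / n        # divide by N, not N - 1


sigma_squared = population_variance(observations)
print(sigma_squared)                          # 50.0
print(round(sigma_squared ** 0.5, 4))         # population standard deviation
print(pvariance(observations) == sigma_squared)  # cross-check with the stdlib
```

The mean is 50, the squared deviations are 0, 25, 25, 100, and 100, and their sum of 250 divided by N = 5 gives a population variance of 50.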
Standard Deviation
Source                          Sample    Population
Textbook                        s         σ
Some graphics calculators       Sx        σx
Some non-graphics calculators   xσn-1     xσn
x        f     x²    f·x    f·x²
2       10      4     20      40
3        7      9     21      63
4        5     16     20      80
5        4     25     20     100
Total   26            81     283
$$s^2 = \frac{n\sum f x^2 - \left(\sum f x\right)^2}{n(n-1)} = \frac{26(283) - 81^2}{26(26-1)} = \frac{797}{650} = 1.2262$$

$$s = \sqrt{1.2262} = 1.1073$$
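The shortcut formula above can be checked with a short Python sketch. The dictionary `freq` and the function name `sample_variance_from_frequencies` are illustrative assumptions; the numbers are the ones in the frequency table.

```python
import math

# Frequency table from the example: value x -> frequency f.
freq = {2: 10, 3: 7, 4: 5, 5: 4}


def sample_variance_from_frequencies(table):
    """Shortcut formula: [n*sum(f*x^2) - (sum(f*x))^2] / [n*(n - 1)]."""
    n = sum(table.values())                              # n = sum of f
    sum_fx = sum(f * x for x, f in table.items())        # sum of f*x
    sum_fx2 = sum(f * x * x for x, f in table.items())   # sum of f*x^2
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


variance = sample_variance_from_frequencies(freq)
print(round(variance, 4))             # 1.2262
print(round(math.sqrt(variance), 4))  # 1.1073
```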
Sample variance of grouped data
Class    Frequency
5-11     1
11-17    2
17-23    3
23-29    5
29-35    4
35-41    3
41-47    2
Total    20
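$$s^2 = \frac{n\sum f X_m^2 - \left(\sum f X_m\right)^2}{n(n-1)} = \frac{20(17336) - 556^2}{20(20-1)} = \frac{37584}{380} = 98.905$$

$$s = \sqrt{98.905} = 9.95$$

where $X_m$ is the midpoint of each class.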
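For grouped data, each class is represented by its midpoint before the same shortcut formula is applied. The sketch below reproduces the sums 556 and 17336 from the class table; the list `classes` and the function name `grouped_sample_variance` are assumptions made for illustration.

```python
import math

# (lower limit, upper limit, frequency) for each class in the table.
classes = [
    (5, 11, 1), (11, 17, 2), (17, 23, 3), (23, 29, 5),
    (29, 35, 4), (35, 41, 3), (41, 47, 2),
]


def grouped_sample_variance(rows):
    """Sample variance of grouped data, using class midpoints as x values."""
    n = sum(f for _, _, f in rows)
    midpoints = [((lo + hi) / 2, f) for lo, hi, f in rows]
    sum_fx = sum(f * m for m, f in midpoints)        # sum of f * midpoint
    sum_fx2 = sum(f * m * m for m, f in midpoints)   # sum of f * midpoint^2
    variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
    return n, sum_fx, sum_fx2, variance


n, sum_fx, sum_fx2, variance = grouped_sample_variance(classes)
print(n, sum_fx, sum_fx2)              # 20 556.0 17336.0
print(round(variance, 3))              # 98.905
print(round(math.sqrt(variance), 2))   # 9.95
```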
Advantages
The coefficient of variation is useful because the standard deviation of data must always be understood in the context of the mean of the data.
The coefficient of variation is a unitless (dimensionless) number. So when comparing data sets with different units or widely different means, one should use the coefficient of variation instead of the standard deviation.
Disadvantages
When the mean value is near zero, the coefficient of variation is sensitive to small changes in the mean, limiting its usefulness.
Example: Data on the annual salary (in thousands) and age of CEOs in a number of firms have been collected. The means and standard deviations are as follows:

          Mean     SD
Salary    404.2    220.5
Age       51.47    8.92

The coefficient of variation (SD divided by the mean) is about 54.6% for salary and about 17.3% for age. Comparing CVs, we can now see clearly that the dispersion or variability relative to the mean is greater for CEO annual salary than for age.
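The comparison can be reproduced with a few lines of Python. This is a minimal sketch that takes the coefficient of variation as the standard deviation divided by the mean, expressed as a percentage; the function name `coefficient_of_variation` is an illustrative assumption.

```python
def coefficient_of_variation(mean, sd):
    """Return the CV as a percentage: (sd / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100


salary_cv = coefficient_of_variation(404.2, 220.5)
age_cv = coefficient_of_variation(51.47, 8.92)

print(round(salary_cv, 1))   # 54.6
print(round(age_cv, 1))      # 17.3
print(salary_cv > age_cv)    # True: salary varies more relative to its mean
```

Because both results are percentages, they can be compared directly even though salary is measured in thousands of currency units and age in years.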
Measures of position:
Measures of position are used to locate the relative position of a data value in the data set.
1. Standard Scores
To compare values measured in different units, a z-score is obtained for each value and the z-scores are then compared.
A z-score or standard score for each value is obtained by

For a sample:
$$z = \frac{x - \bar{x}}{s}$$
or, for a population:
$$z = \frac{x - \mu}{\sigma}$$

The z-score represents the number of standard deviations that a data value falls above or below the mean.
E.g., a z-score of -1.3 tells us that the raw score fell 1.3 standard deviations below the mean.
A positive z-score means that the raw score lies above the mean, not below it.
$$z_{\text{Cal}} = \frac{x - \bar{x}}{s} = \frac{65 - 50}{10} = 1.5$$

$$z_{\text{stat}} = \frac{x - \bar{x}}{s} = \frac{30 - 25}{5} = 1.0$$

Since the z-score for calculus is larger, her relative position in the calculus class is higher than her relative position in the statistics class.
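The same comparison can be written as a small Python sketch. The function name `z_score` is an illustrative assumption; the scores, means, and standard deviations are the ones used in the calculation above.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_calculus = z_score(65, 50, 10)
z_statistics = z_score(30, 25, 5)

print(z_calculus)     # 1.5
print(z_statistics)   # 1.0

# The larger z-score marks the better relative position.
better = "calculus" if z_calculus > z_statistics else "statistics"
print(better)         # calculus
```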
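2. Quartiles
Quartiles divide the data set into 4 groups.
Quartiles are denoted by Q1, Q2, and Q3.
To find the quartile Qk (k = 1, 2, 3), let n be the sample size.
Step 1: Arrange the data in order.
Step 2: Compute c = ((n+1)k)/4.
Step 3: When c is a whole number, the value in position c of the ordered data is the value of the required quartile.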
Example:
For the following data set: 2, 3, 5, 6, 8, 10, 12
Find Q1 and Q3.
n = 7, so for Q1 we have c = ((7+1) × 1)/4 = 2.
Hence the value of Q1 is the 2nd value.
Thus Q1 for the data set is 3.
For Q3 we have c = ((7+1) × 3)/4 = 6.
Hence the value of Q3 is the 6th value, so Q3 for the data set is 10.
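The position formula c = ((n+1)k)/4 can be sketched in Python as follows. The function name `quartile_by_position` is an illustrative assumption. The sketch handles only the case where c is a whole number, as in this example; the rule for a fractional c is not shown in the text, so the function raises an error rather than guessing.

```python
def quartile_by_position(data, k):
    """Return Qk as the value in position c = (n + 1) * k / 4 of the sorted data."""
    ordered = sorted(data)                  # Step 1: arrange the data in order
    n = len(ordered)
    c, remainder = divmod((n + 1) * k, 4)   # Step 2: compute c
    if remainder != 0:
        # Assumption: no rule is applied here for a fractional position.
        raise ValueError("c is not a whole number for this data set")
    if not 1 <= c <= n:
        raise ValueError("position c falls outside the data")
    return ordered[c - 1]                   # positions are 1-based


data = [2, 3, 5, 6, 8, 10, 12]
print(quartile_by_position(data, 1))   # 3
print(quartile_by_position(data, 3))   # 10
```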
Example:
For the following data set: 2, 3, 5, 6, 8, 10, 12
Find Q1 and Q3.
The median of the data is 6.
The median of the lower group of data, the values less than the median, is 3.
So the value of Q1 is the 2nd value, which means that Q1 = 3.
The median of the upper group of data, the values greater than the median, is 10.
So the value of Q3 is the 6th value, which means that Q3 = 10.
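The median-of-halves method can be sketched in Python as below. The function name `quartiles_by_halves` is an illustrative assumption. For an odd number of values the median itself is left out of both halves, as in this example; splitting an even-sized data set into two equal halves is an assumption, since the text does not show that case.

```python
from statistics import median


def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the medians of the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                  # values below the median
    # For odd n, skip the middle value; for even n, take the top half as is.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


print(quartiles_by_halves([2, 3, 5, 6, 8, 10, 12]))   # (3, 6, 10)
```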
Outliers
A An outlier is an extremely high or an extremely low data
value when compared with the rest of the data values.
Example
Given the data set 5, 6, 12, 13, 15, 18, 22, 50,
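One common way to check such a data set for outliers is the 1.5 × IQR rule. The surviving text does not state which rule it uses, so both that rule and the median-of-halves quartiles in the sketch below are assumptions; the function name `find_outliers` is also hypothetical.

```python
from statistics import median


def find_outliers(data, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed 1.5 x IQR rule)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    q1, q3 = median(lower), median(upper)   # quartiles as medians of the halves
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]


data = [5, 6, 12, 13, 15, 18, 22, 50]
print(find_outliers(data))   # [50]
```

Under these assumptions Q1 = 9 and Q3 = 20, so the IQR is 11 and the fences are -7.5 and 36.5; only the value 50 falls outside them.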
Skewness is a lack of symmetry in a distribution.
Coefficient of Skewness
A unitless number that measures the degree and direction of asymmetry of a distribution.
There are several ways of measuring skewness:
Pearson’s coefficient of skewness
$$sk_2 = \frac{3(\text{mean} - \text{median})}{s}$$
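Pearson's coefficient sk2 = 3(mean − median)/s can be computed with the standard library, as in the sketch below. The function name `pearson_skewness` is an illustrative assumption, and `s` is taken to be the sample standard deviation (`statistics.stdev`). The second data set is the one given in the outlier example.

```python
from statistics import mean, median, stdev


def pearson_skewness(data):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (mean(data) - median(data)) / stdev(data)


symmetric = [1, 2, 3, 4, 5]
right_tailed = [5, 6, 12, 13, 15, 18, 22, 50]

print(pearson_skewness(symmetric))         # 0.0: mean equals median
print(pearson_skewness(right_tailed) > 0)  # True: mean is pulled above the median
```

A positive value indicates a longer right tail, a negative value a longer left tail, and a value near zero an approximately symmetric distribution.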
Skewness in statistics represents an imbalance and asymmetry from the mean of a data distribution. If you look at a normal data distribution using a bell curve, the curve will be perfectly symmetrical. Now, this doesn't happen all that often! In order to fully understand when a data distribution is imperfect and skewed, let's look at a normal data distribution and symmetrical bell curve.
In a normal data distribution, the mean is directly in the middle (and at the top point) of the bell curve. Imagine that, on the first day, Mrs. Thomas wants to teach her high school statistics class about data distributions, standard deviations, and bell curves. She asks her 16-student class to secretly divulge their summer job incomes. Each student gives Mrs. Thomas a piece of paper with their income. She rounds each income to the nearest 500 and makes a chart.
Now that we see the data on a
chart, we can see that four of
the students made about $2,000
Five-number summary:
{ 32.56, 59.03, 63.29, 70.60, 82.87 }
The final product: a simple box plot. Only quartile information is displayed.
A mathematical rule designates “outliers.” These are plotted using special symbols.
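A five-number summary (minimum, Q1, median, Q3, maximum) can be computed with Python's standard library, as sketched below. The raw data behind the summary above are not shown, so the sketch uses the seven-value data set from the quartile example instead. The function name `five_number_summary` is an illustrative assumption; `statistics.quantiles` (Python 3.8+) with its default "exclusive" method places quartiles at positions based on n + 1 and interpolates when a position is fractional.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, q2, q3 = quantiles(data, n=4)   # default method="exclusive"
    return min(data), q1, q2, q3, max(data)


summary = five_number_summary([2, 3, 5, 6, 8, 10, 12])
print(summary)                   # (2, 3.0, 6.0, 10.0, 12)
print(summary[3] - summary[1])   # interquartile range: 7.0
```

These five values are exactly what a simple box plot draws: the box spans Q1 to Q3, a line inside it marks the median, and the lines extend to the minimum and maximum.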
Now find the interquartile range (IQR). The interquartile range is the difference between the upper quartile and the lower quartile. In this case the IQR = 87 − 52 = 35. The IQR is a very useful measurement because it is less influenced by extreme values: it limits the range to the middle 50% of the values.
With the interquartile range of 35 in hand, begin to draw the box-plot graph.
Example 2
Consider two datasets:
A1={0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74, 1.72, 0.38, -0.17,
1. If the median is near the center of the box, the distribution is approximately symmetric.
2. If the median falls to the left of the center of the box, the distribution is positively skewed.
3. If the median falls to the right of the center of the box, the distribution is negatively skewed.
Similarly:
1. If the lines are about the same length, the distribution is approximately symmetric.
2. If the right line is longer than the left line, the distribution is positively skewed.
3. If the left line is longer than the right line, the distribution is negatively skewed.
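These reading rules can be expressed as a short Python sketch that takes a five-number summary and reports what the box and the lines each suggest. The function names and the `tolerance` parameter are assumptions made for illustration: the text says "about the same length" without giving a threshold, so the 10% tolerance is arbitrary.

```python
def compare_sides(left, right, tolerance=0.1):
    """Classify skew from a left-side length and a right-side length."""
    if abs(left - right) <= tolerance * max(left, right):
        return "approximately symmetric"
    return "positively skewed" if right > left else "negatively skewed"


def describe_boxplot(minimum, q1, med, q3, maximum, tolerance=0.1):
    """Return (verdict from median position, verdict from line lengths)."""
    # Median left of the box center means the right part of the box is longer.
    box_verdict = compare_sides(med - q1, q3 - med, tolerance)
    # Compare the left line (min to Q1) with the right line (Q3 to max).
    line_verdict = compare_sides(q1 - minimum, maximum - q3, tolerance)
    return box_verdict, line_verdict


print(describe_boxplot(32.56, 59.03, 63.29, 70.60, 82.87))
# ('positively skewed', 'negatively skewed')
```

The two checks can disagree, as they do for the summary { 32.56, 59.03, 63.29, 70.60, 82.87 }: the median sits left of the box center, but the left line is longer than the right one. In such cases the box plot alone does not give a clear verdict.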