2. Descriptive Statistics
Class content
In this class:
• The central tendency
• The variation
• The shape
Descriptive Statistics
THE CENTRAL TENDENCY is the extent to which all the data values
group around a typical or central value.
THE SHAPE is the pattern of the distribution of values from the lowest
value to the highest value.
Measures of Central Tendency:
The Mean
THE ARITHMETIC MEAN (often just called the “mean”) is the most
common measure of central tendency
For a sample of size n:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
Where n = sample size
$X_i$ = ith observed value
Measures of Central Tendency:
The Mean (cont.)
• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
Example 1: values 11, 12, 13, 14, 15
Mean = (11 + 12 + 13 + 14 + 15) / 5 = 65 / 5 = 13
Example 2: values 11, 12, 13, 14, 20
Mean = (11 + 12 + 13 + 14 + 20) / 5 = 70 / 5 = 14
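A minimal Python sketch of how one extreme value shifts the mean, using the two data sets from the slide (11, 12, 13, 14, 15 and 11, 12, 13, 14, 20). The function name `sample_mean` is illustrative and not part of the original material.

```python
def sample_mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


without_extreme = [11, 12, 13, 14, 15]
with_extreme = [11, 12, 13, 14, 20]  # 15 replaced by the extreme value 20

print(sample_mean(without_extreme))  # 13.0
print(sample_mean(with_extreme))     # 14.0
```

Changing a single value from 15 to 20 moves the mean from 13 to 14.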
Measures of Central Tendency:
The Median
In an ordered array, THE MEDIAN is the “middle” number
(50% above, 50% below)
[Figure: two dot plots on an axis from 11 to 20; Median = 13 in both panels]
$$\text{Median position} = \frac{n + 1}{2} \text{ position in the ordered data}$$
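The position rule can be sketched in Python as follows. The function name `median_by_position` is hypothetical, and averaging the two middle values when n is even is the usual convention, assumed here because the slide text does not state it.

```python
def median_by_position(values):
    """Median located with the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    position = (n + 1) / 2              # 1-based position in the ordered data
    lower = ordered[int(position) - 1]
    if position == int(position):       # odd n: the position is a whole number
        return lower
    upper = ordered[int(position)]      # even n: assumed convention, average
    return (lower + upper) / 2          # the two middle values


print(median_by_position([11, 12, 13, 14, 15]))  # 13
print(median_by_position([11, 12, 13, 14, 20]))  # 13
```

Both data sets have the same median even though their largest values differ, because only the middle position matters.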
[Figure: two dot plots. Left panel, axis from 0 to 14: Mode = 9. Right panel, axis from 0 to 6: No Mode.]
Measures of Central Tendency:
Review Examples
• Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
• Median: middle value in the ordered array
• Mode: most frequently observed value
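A small Python sketch of the "most frequently observed value" idea. Both data sets are hypothetical (the slide's plotted values are not recoverable), and treating data in which no value repeats as having no mode is an assumption of this sketch.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:  # assumption: all-unique data has no mode
        return []
    return [value for value, count in counts.items() if count == highest]


with_mode = [0, 3, 5, 9, 9, 9, 11, 14]  # hypothetical data, 9 occurs most often
no_mode = [0, 1, 2, 3, 4, 5, 6]         # hypothetical data, no repeated value

print(modes(with_mode))  # [9]
print(modes(no_mode))    # []
```

A list is returned because a data set can have more than one mode.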
Measures of Variation
[Figure: distributions with the same center but different variation]
Measures of Variation:
The Range
Simplest measure of variation
Difference between the largest and the smallest values:
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
Example: smallest value 1, largest value 13
Range = 13 - 1 = 12
Measures of Variation:
Why The Range Can Be Misleading
Ignores the way in which data are distributed
[Figure: two dot plots on an axis from 7 to 12 with differently distributed values; Range = 12 - 7 = 5 in both panels]
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
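The outlier sensitivity shown above can be checked with a short Python sketch. The two lists are copied from the slide; the function name `value_range` is illustrative.

```python
def value_range(values):
    """Range: difference between the largest and the smallest values."""
    return max(values) - min(values)


data = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
data_with_outlier = data[:-1] + [120]  # last value 5 replaced by 120

print(value_range(data))               # 4
print(value_range(data_with_outlier))  # 119
```

Only one of the 25 values changed, yet the range grew from 4 to 119.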
Measures of Variation:
The Sample Variance
• Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
Where $\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
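A minimal Python sketch of the sample variance formula, cross-checked against `statistics.variance` from the standard library. The data set is borrowed from the earlier mean example; the function name `sample_variance` is illustrative.

```python
from statistics import variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [11, 12, 13, 14, 15]

# Squared deviations from the mean 13 are 4, 1, 0, 1, 4, which sum to 10.
print(sample_variance(data))  # 2.5
# The standard library uses the same n - 1 denominator.
print(abs(sample_variance(data) - variance(data)) < 1e-12)  # True
```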
Measures of Variation:
The Sample Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
• Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
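As a rough illustration, the sketch below computes the sample standard deviation for two hypothetical data sets that share the same mean (13) but differ in spread. The data and the function name `sample_std` are assumptions made for this example.

```python
import math


def sample_std(values):
    """Square root of the sample variance (n - 1 denominator)."""
    n = len(values)
    x_bar = sum(values) / n
    s_squared = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    return math.sqrt(s_squared)


tight = [12, 13, 13, 13, 14]  # hypothetical: values close to the mean
wide = [9, 11, 13, 15, 17]    # hypothetical: values far from the mean

print(round(sample_std(tight), 4))  # 0.7071
print(round(sample_std(wide), 4))   # 3.1623
```

The result is in the same units as the data, so the two values can be read directly as typical distances from the mean.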
Measures of Variation:
Comparing Standard Deviations
[Figure: data sets compared by spread, including one labeled “Smaller standard deviation”]
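Numerical Descriptive Measures For A
Population: The Mean μ
Population mean:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$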
Where μ = population mean
N = population size
Xi = ith value of the variable X
Numerical Descriptive Measures For A
Population: The Variance σ²
Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Where μ = population mean
N = population size
Xi = ith value of the variable X
Numerical Descriptive Measures For A
Population: The Standard Deviation σ
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the population variance
• Has the same units as the original data
• Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
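The only computational difference between the population and sample formulas is the denominator (N versus n - 1). A minimal Python sketch, treating the five values as a whole population purely for illustration; the function names are hypothetical.

```python
import math


def population_variance(values):
    """Sum of squared deviations from mu, divided by N."""
    size = len(values)
    mu = sum(values) / size
    return sum((x - mu) ** 2 for x in values) / size


def sample_variance(values):
    """Same numerator, but divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [11, 12, 13, 14, 15]

print(population_variance(data))                       # 2.0
print(sample_variance(data))                           # 2.5
print(round(math.sqrt(population_variance(data)), 4))  # 1.4142
```

For the same values the population variance is always the smaller of the two, because it divides the same sum by a larger number.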
Sample statistics versus population parameters
Measure              Population parameter   Sample statistic
Mean                 μ                      X̄
Variance             σ²                     S²
Standard deviation   σ                      S
Quartiles split the ranked data into 4 segments with an equal number of
values per segment
[Figure: ranked data split at Q1, Q2 and Q3 into four segments of 25% each]
The first quartile, Q1, is the value for which 25% of the observations are
smaller and 75% are larger
Q2 is the same as the median (50% of the observations are smaller and
50% are larger)
Only 25% of the observations are greater than the third quartile
Quartiles: Locating Quartiles
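• Find a quartile by determining its position in the ranked data:
First quartile position: Q1 = (n+1)/4 ranked value
Second quartile position: Q2 = (n+1)/2 ranked value
Third quartile position: Q3 = 3(n+1)/4 ranked value
• If the result is a whole number, it is the ranked position to use.
• If the result is a fractional half (e.g. 2.5, 7.5), average the two
corresponding ranked values.
• If the result is neither a whole number nor a fractional half, round the
result to the nearest integer to find the ranked position.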
Quartile Measures
Calculating The Quartiles: Example
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so Q1 = (12+13)/2 = 12.5
Q2 is in the (9+1)/2 = 5th position of the ranked data,
so Q2 = median = 16
Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data,
so Q3 = (18+21)/2 = 19.5
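The locating rules used in this example can be sketched in Python. The function name `quartile` is hypothetical, and the three-value minimum is a simplification of this sketch, not something the slides state.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) located at position k(n + 1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch expects at least three values")
    position = k * (n + 1) / 4        # 1-based position, a multiple of 0.25
    whole = int(position)
    fraction = position - whole
    if fraction == 0:                 # whole number: use that ranked value
        return ordered[whole - 1]
    if fraction == 0.5:               # fractional half: average two neighbours
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[round(position) - 1]  # otherwise round to nearest position


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

print(quartile(data, 1))  # 12.5
print(quartile(data, 2))  # 16
print(quartile(data, 3))  # 19.5
```

Statistical packages often interpolate instead of rounding, so their quartiles can differ slightly from this rule on other data sets.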
• The interquartile range (IQR) is Q3 – Q1 and measures the spread in the middle 50% of the data
• The IQR is also called the midspread because it covers the middle 50% of
the data
• Measures like Q1, Q3, and the IQR that are not influenced by outliers are called
resistant measures
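A short Python sketch of what "resistant" means in practice, using the ordered array from the quartile example (11 12 13 16 16 17 18 21 22). Replacing the largest value with 220 is a hypothetical change made for illustration, and the helper names are not from the slides.

```python
def quartile(values, k):
    """k-th quartile at position k(n + 1)/4, using the slide's locating rules."""
    ordered = sorted(values)
    position = k * (len(ordered) + 1) / 4
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    if fraction == 0.5:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[round(position) - 1]


def iqr(values):
    """Interquartile range: Q3 - Q1."""
    return quartile(values, 3) - quartile(values, 1)


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
data_with_outlier = data[:-1] + [220]  # hypothetical outlier replaces 22

print(iqr(data), max(data) - min(data))  # 7.0 11
print(iqr(data_with_outlier),
      max(data_with_outlier) - min(data_with_outlier))  # 7.0 209
```

The IQR stays at 7 while the range jumps from 11 to 209, since Q1 and Q3 depend only on the middle of the ranked data.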
Calculating The Interquartile Range
Example:
[Figure: box plot in which each of the four segments holds 25% of the data]
X minimum = 12
Q1 = 30
Median (Q2) = 45
Q3 = 57
X maximum = 70
Interquartile range = 57 – 30 = 27
The Five-Number Summary
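• The five numbers that help describe the center, spread and shape
of data are:
• X smallest
• First quartile (Q1)
• Median (Q2)
• Third quartile (Q3)
• X largest
[Figure: three box plots, each marked with Q1, Q2 and Q3]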
Measures Of The Relationship Between
Two Numerical Variables
• THE COVARIANCE
• THE COEFFICIENT OF CORRELATION
The Covariance
• Sample covariance:
$$\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$
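A minimal Python sketch of the sample covariance. The paired data are hypothetical and chosen so the arithmetic is easy to follow; the function name `sample_covariance` is illustrative.

```python
def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, over n - 1."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]   # hypothetical: Y increases with X
y_falling = [10, 8, 6, 4, 2]  # hypothetical: Y decreases as X increases

print(sample_covariance(x, y_rising))   # 5.0
print(sample_covariance(x, y_falling))  # -5.0
```

The sign shows the direction of the linear relationship. The magnitude depends on the units of X and Y, so it does not indicate strength by itself.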
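The Coefficient of Correlation
• Sample coefficient of correlation:
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$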
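The coefficient of correlation can be sketched in Python directly from sums of deviations. All data below are hypothetical, and the function name `correlation` is illustrative.

```python
import math


def correlation(xs, ys):
    """Sample coefficient of correlation r, from sums of deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]

print(correlation(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive line)
print(correlation(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative line)
print(correlation(x, [2, 1, 4, 3, 5]))   # 0.8  (positive, not perfect)
```

Unlike the covariance, r has no units and always lies between -1 and +1.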
[Figure: scatter plots of Y against X for r = -1, r = -0.6, r = +1, r = +0.3 and r = 0]
The Coefficient of Correlation Using
Microsoft Excel Function
Test #1 Score   Test #2 Score
78              82
92              88
86              91
83              90
95              92
85              85
91              89
76              81
88              96
79              77
Correlation Coefficient: 0.7332, from =CORREL(A2:A11,B2:B11)
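As a cross-check outside Excel, the same coefficient can be reproduced in Python from the ten score pairs above. This sketch uses the equivalent form r = cov(X, Y) / (S_X · S_Y), in which the n - 1 factors cancel. The helper names are illustrative.

```python
import math

test1 = [78, 92, 86, 83, 95, 85, 91, 76, 88, 79]
test2 = [82, 88, 91, 90, 92, 85, 89, 81, 96, 77]


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(values):
    """Sample standard deviation: sqrt of the covariance of X with itself."""
    return math.sqrt(sample_covariance(values, values))


r = sample_covariance(test1, test2) / (sample_std(test1) * sample_std(test2))
print(round(r, 4))  # 0.7332
```

The result agrees with the value reported by `=CORREL(A2:A11,B2:B11)`.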
The Coefficient of Correlation Using
Microsoft Excel Data Analysis Tool
1. Select Data
2. Choose Data Analysis
3. Choose Correlation and click OK
The Coefficient of Correlation
Using Microsoft Excel
Excel function: =CORREL
Or use the Data Analysis Tool, “Covariance” and “Correlation”.
Thank you for your attention