Lecture 5
Review
• To calculate a percentile:
1. Arrange the observations in increasing order (smallest value to
largest value)
2. Compute an index $i = \frac{p}{100}\, n$
• Where p is the percentile of interest and n is the number of observations
3. Two cases:
• If i is not an integer, round up. This denotes the position of the pth percentile in
our ordered list.
• If i is an integer, the pth percentile is the average of the values in positions i and
i+1
• Note: There is not always a value with exactly p percent of the data at or below it.
Example: Calculate the 85th percentile
• Arrange the data in increasing order:
• 3710 3755 3850 3880 3880 3890 3920 3940 3950 4050 4130 4325
• $i = \frac{85}{100} \times 12 = 10.2$
• Not integer, so round up to 11.
• The data value in the 11th position is 4130, so the 85th percentile is
4130
NOTE: If the index had been an integer, say i = 10, the percentile would be the average of the values in the 10th and 11th positions.
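The index rule above can be sketched in a few lines of Python. This is only an illustration of the lecture's rule; the function name `percentile` is an assumption made here, and it assumes 0 < p < 100. Library routines often use other interpolation rules by default, so their results may differ.

```python
import math

def percentile(data, p):
    """Return the p-th percentile using the lecture's index rule (0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)          # step 1: arrange in increasing order
    n = len(ordered)
    i = p * n / 100                 # step 2: index i = (p/100) * n
    if i.is_integer():
        k = int(i)
        # integer index: average the values in positions i and i+1 (1-based)
        return (ordered[k - 1] + ordered[k]) / 2
    # non-integer index: round up to the next position (1-based)
    return ordered[math.ceil(i) - 1]

# Data from the 85th-percentile example
data = [3710, 3755, 3850, 3880, 3880, 3890, 3920, 3940, 3950, 4050, 4130, 4325]
print(percentile(data, 85))  # i = 10.2, round up to position 11 -> 4130
print(percentile(data, 50))  # i = 6, average of positions 6 and 7 -> 3905.0
```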
Measure of spread: the quartiles
• The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (⇔ it is the median of the lower half of the sorted data, excluding M).
• The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (⇔ it is the median of the upper half of the sorted data, excluding M).
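As a rough sketch of the "median of each half" description, the code below computes Q1, M, and Q3 for the data set from the percentile example. The function name `quartiles` is an assumption made here. When n is even there is no single middle observation, so the data simply split into two equal halves.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, M, Q3) as medians of the lower half, all data, and upper half."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # values below the median position
    upper = ordered[(n + 1) // 2 :]    # skips M itself when n is odd
    return median(lower), median(ordered), median(upper)

data = [3710, 3755, 3850, 3880, 3880, 3890, 3920, 3940, 3950, 4050, 4130, 4325]
q1, m, q3 = quartiles(data)
print(q1, m, q3)  # 3865.0 3905.0 4000.0
```

For this data set the result agrees with the index rule for the 25th and 75th percentiles; with other sample sizes the two approaches can give slightly different values.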
Example 1: Percentiles in a histogram
What is the 10th percentile?
Answer: 0 (10% of the observations)
Five-number summary and boxplot
• The five-number summary consists of:
• min, Q1, M, Q3, max
• These five statistics can be used to create a boxplot
[Figure: boxplot with the upper whisker and lower whisker labeled]
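A minimal sketch of the five-number summary, reusing the data from the percentile example. The function name `five_number_summary` is an assumption made here, and the quartiles are taken as medians of the lower and upper halves of the sorted data.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, M, Q3, max) for a list of numbers."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # lower half, excluding M when n is odd
    upper = ordered[(n + 1) // 2 :]    # upper half, excluding M when n is odd
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

data = [3710, 3755, 3850, 3880, 3880, 3890, 3920, 3940, 3950, 4050, 4130, 4325]
print(five_number_summary(data))  # (3710, 3865.0, 3905.0, 4000.0, 4325)
```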
Skewness in boxplots
[Figure: example boxplots labeled "right skewed / positively skewed" and "symmetric"]
Example 2: Side-by-side boxplots
All of them are right skewed (positively skewed distributions).
[Figure: side-by-side boxplots with the 75%, 50%, and 25% levels marked]
Identifying outliers
• IQR = Q3 - Q1
• Upper whisker limit: Q3 + 1.5 × IQR
• Lower whisker limit: Q1 - 1.5 × IQR
• Observations beyond these limits are flagged as potential outliers
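The 1.5 × IQR limits can be sketched as follows, again using the data from the percentile example. The function name `iqr_outliers` is an assumption made here, and the quartiles are computed as medians of the lower and upper halves of the sorted data.

```python
from statistics import median

def iqr_outliers(data):
    """Return (lower_limit, upper_limit, outliers) using the 1.5 x IQR rule."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2 :])
    iqr = q3 - q1                       # IQR = Q3 - Q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # anything outside the two limits is flagged
    outliers = [x for x in ordered if x < lower_limit or x > upper_limit]
    return lower_limit, upper_limit, outliers

data = [3710, 3755, 3850, 3880, 3880, 3890, 3920, 3940, 3950, 4050, 4130, 4325]
print(iqr_outliers(data))  # (3662.5, 4202.5, [4325])
```

Here Q1 = 3865 and Q3 = 4000, so IQR = 135 and only 4325 lies beyond the upper limit.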
Measure of spread: standard deviation
• The standard deviation, “s”, is
used to describe variation
around the mean. Like the
mean, it is not resistant to
skewness or outliers
1. First calculate the variance $s^2$
Mean = 16.33
Sum of squared deviations from mean = 199.99
Degrees of freedom (df) = (n − 1) = 8
$s^2$ = variance = 199.99/8 = 25.00 (dollars per hour) squared
2. Then take the square root: s = standard deviation = √25.00 = 5.00 dollars per hour
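The calculation above follows the usual sample formulas, variance = (sum of squared deviations) / (n − 1) and standard deviation = square root of the variance. The sketch below first reproduces the arithmetic from the summary numbers on the slide, then applies the same steps to a small list. The function name `sample_variance` and the list `wages` are hypothetical illustrations, not the lecture's data.

```python
import math

def sample_variance(data):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
    return ss / (n - 1)                       # divide by the degrees of freedom

# Summary numbers from the slide: sum of squares = 199.99, n = 9
variance = 199.99 / (9 - 1)
sd = math.sqrt(variance)
print(round(variance, 2), round(sd, 2))       # 25.0 5.0

# Hypothetical data, for illustration only
wages = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(wages), math.sqrt(sample_variance(wages)))
```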
Example 3
Answer: Since $x_i$ is 3% for every country and $\bar{x}$ is also 3%, the expression $(x_i - \bar{x})^2$ equals 0 for each country, because $(3\% - 3\%)^2 = 0$.
Therefore, the sum of these squared differences is 0, and so are the variance and the standard deviation.
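A quick numerical check of this reasoning is sketched below. The number of countries (five) is an assumption made here purely for illustration.

```python
from statistics import mean, stdev

# Hypothetical: five countries, each with a value of 3 (percent)
values = [3.0] * 5
x_bar = mean(values)
squared_devs = [(x - x_bar) ** 2 for x in values]

print(sum(squared_devs))  # 0.0
print(stdev(values))      # 0.0
```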
• The coefficient of variation is useful for comparing the variability of variables that
have different standard deviations and different means
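The coefficient of variation is commonly defined as the standard deviation divided by the mean (sometimes multiplied by 100 to give a percentage). A minimal sketch, assuming that definition and a nonzero mean; the function name and both data sets are hypothetical:

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """CV = s / x-bar; assumes the mean is not zero."""
    return stdev(data) / mean(data)

# Hypothetical variables measured on very different scales
small_scale = [10, 12, 14]        # s = 2, mean = 12
large_scale = [1000, 1200, 1400]  # s = 200, mean = 1200

print(stdev(small_scale), stdev(large_scale))
print(coefficient_of_variation(small_scale), coefficient_of_variation(large_scale))
```

The standard deviations differ by a factor of 100, yet both variables have the same CV (about 0.167), so their variability relative to the mean is the same.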
Example 5
Which histogram represents data with the largest standard deviation?
The smallest?
Why? Sample 3 has the largest standard deviation because its values vary the most around the mean. The more the data vary around the mean, the larger the SD.
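The link between spread around the mean and the size of the SD can be checked numerically. The three samples below are hypothetical (they are not the data behind the lecture's histograms); all share the same mean of 5 but differ in spread.

```python
from statistics import mean, stdev

# Hypothetical samples with the same mean but different spread
samples = {
    "tight": [4, 5, 5, 5, 6],
    "medium": [3, 4, 5, 6, 7],
    "wide": [1, 3, 5, 7, 9],
}

sds = {name: stdev(values) for name, values in samples.items()}
largest = max(sds, key=sds.get)
smallest = min(sds, key=sds.get)

for name, values in samples.items():
    print(name, mean(values), round(sds[name], 3))
print("largest SD:", largest, "| smallest SD:", smallest)
```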
Summary
• Measures of central tendency
• Mean
• Median
• Measures of spread
• Percentiles
• Quartiles
• Standard deviation
• Summarizing both center and spread
• Five-number summary
• Boxplot
• Error bars around mean
For next class
• Read: Alwan 1.4 and Krauth Chapter 6
• For practice: Alwan 1.55, 1.56, 1.87, 1.93