Lecture 5

The document discusses measures of central tendency and spread for economic data, including the mean, median, percentiles, quartiles, range, standard deviation, and boxplots. It provides examples and explanations of how to calculate and interpret these statistics. Key topics are measures of central tendency, measures of spread, summarizing data using five-number summaries and boxplots, and identifying outliers.

Uploaded by

Luna eukharis

ECON 225: Data and Statistics for Economics
Lecture 5
Review
An abstract from an NBER paper, Han et al. (2024):
• This paper estimates the value of urban trees. The empirical strategy
exploits an ecological catastrophe — the Emerald Ash Borer (EAB)
infestation in Toronto to isolate exogenous variation in neighborhood
tree canopy changes. Adding one tree to a postcode increases
property prices by 0.40%; the hardest-hit areas lost 7% tree cover,
resulting in a 6% property price decline. The tree premium includes
the value of tree services and aesthetics. Our results demonstrate a
significant impact of trees on mitigating urban heat and generating
energy savings. However, the total amenity value of trees exceeds the
combined value of these services.
Review
[Figure from Han et al. (2024). Type: histogram. Shape: bimodal distribution.]

Review
[Figure from Han et al. (2024). Type: timeplot, showing seasonal variation.]



Plan for today
• Measures of spread
• Range
• Percentiles
• Quartiles
• Illustrating the spread in a boxplot
• Another measure of spread: standard deviation
• Excel: Quantitative summary of data
Measuring spread: Range
• The simplest measure of spread is the range
• Range = Largest value – Smallest value
• Seldom used by itself
• It is based on only two of the observations and is highly influenced by
extreme values
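A minimal Python sketch of the range, using made-up numbers (the data below are hypothetical, not from the lecture). It also shows how a single extreme value can dominate the result:

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Hypothetical observations, for illustration only.
sample = [12, 15, 11, 14, 13]
with_extreme = sample + [90]  # same data plus one extreme value

print(data_range(sample))        # 4
print(data_range(with_extreme))  # 79
```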
Measuring spread: percentiles (percentage of observations)

• To calculate a percentile:
1. Arrange the observations in increasing order (smallest value to
largest value)
2. Compute an index $i = \frac{p}{100} \cdot n$
• Where p is the percentile of interest and n is the number of observations
3. Two cases:
• If i is not an integer, round up. This denotes the position of the pth percentile in
our ordered list.
• If i is an integer, the pth percentile is the average of the values in positions i and
i+1
• Note: There is not always a value with exactly p percent of the data at or below it.
Example: Calculate the 85th percentile
• Arrange the data in increasing order:
• 3710 3755 3850 3880 3880 3890 3920 3940 3950 4050 4130 4325
• $i = \frac{85}{100} \cdot 12 = 10.2$
• Not integer, so round up to 11.
• The data value in the 11th position is 4130, so the 85th percentile is
4130

NOTE: if i had been an integer, say i = 10, the percentile would be the average of the 10th and 11th values
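The percentile rule above can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `percentile` is our own, p is taken to be a whole number strictly between 0 and 100, and integer arithmetic is used to decide whether i is an integer. Statistical software often uses a different interpolation rule, so its results may not match this one.

```python
def percentile(values, p):
    """p-th percentile using the index rule i = (p / 100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)          # step 1: increasing order
    n = len(ordered)
    # step 2: i = p * n / 100, split into whole part and remainder
    whole, remainder = divmod(p * n, 100)
    if remainder != 0:
        # i is not an integer: round up to position whole + 1
        # (1-based), which is index `whole` in a 0-based list.
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2

# Data from the 85th percentile example.
data = [3710, 3755, 3850, 3880, 3880, 3890,
        3920, 3940, 3950, 4050, 4130, 4325]

print(percentile(data, 85))  # 4130
print(percentile(data, 50))  # 3905.0
```

For p = 50 the index is i = 6, an integer, so the result is the average of the 6th and 7th values (3890 and 3920).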
Measure of spread: the quartiles
• The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (equivalently, it is the median of the lower half of the sorted data, excluding M).
• The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (equivalently, it is the median of the upper half of the sorted data, excluding M).
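A short Python sketch of the "median of each half" definition. The function name `quartiles` is our own, and the treatment of an even number of observations (the data simply split into two equal halves, since M is then not an observation) is an assumption that the slide does not spell out.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using medians of the lower and upper halves.

    Needs at least two observations so that both halves are non-empty.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd n the middle observation is M itself and is excluded.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

# The 12 observations used in the percentile example.
data = [3710, 3755, 3850, 3880, 3880, 3890,
        3920, 3940, 3950, 4050, 4130, 4325]

print(quartiles(data))                    # (3865.0, 3905.0, 4000.0)
print(quartiles([1, 2, 3, 4, 5, 6, 7]))  # (2, 4, 6)
```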
Example 1: Percentiles in a histogram

What is the 10th percentile?
Ans: 0 (10% of the observations)
Five-number summary and boxplot
• The five-number summary consists of:
• min, Q1, M, Q3, max
• These five statistics can be used to create a boxplot
[Figure: boxplot with the upper whisker and lower whisker labeled]
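As an illustration, the five-number summary could be computed in Python like this. The function name and the data are hypothetical, and the quartiles are taken as medians of the lower and upper halves of the sorted data, which is one convention among several.

```python
from statistics import median

def five_number_summary(values):
    """Return min, Q1, M, Q3, max as a dictionary."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]  # skip M when n is odd
    return {
        "min": ordered[0],
        "Q1": median(lower),
        "M": median(ordered),
        "Q3": median(upper),
        "max": ordered[-1],
    }

# Hypothetical data, for illustration only.
summary = five_number_summary([7, 1, 3, 9, 5, 11, 13])
print(summary)
# {'min': 1, 'Q1': 3, 'M': 7, 'Q3': 11, 'max': 13}
```

In a boxplot, Q1 and Q3 form the ends of the box and M is the line inside it.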
Skewness in boxplots
[Figure: boxplots labeled "right skewed / positively skewed" and "symmetric"]
Example 2: Side-by-side boxplots
[Figure: side-by-side boxplots with the 75%, 50%, and 25% levels marked]
Ans: all of them are right skewed (positively skewed distributions)
Identifying outliers
• It is important to identify outliers since they can be troublesome data points that will have a big impact on your analysis
• One way to identify an outlier is to use the distance from the data point to the nearest quartile (Q1 or Q3). Then compare this distance to the interquartile range (distance between Q1 and Q3).
• We call an observation a suspected outlier if it falls more than 1.5 times the size of the interquartile range (IQR) above the third quartile or below the first quartile. This is called the “1.5 * IQR rule for outliers.”
NOTE:
IQR: Q3 - Q1
upper whisker limit: Q3 + 1.5 x IQR
lower whisker limit: Q1 - 1.5 x IQR
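A minimal Python sketch of the 1.5 * IQR rule, flagging values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR. The function name and the data are hypothetical, and the quartiles are computed as medians of the lower and upper halves of the sorted data.

```python
from statistics import median

def suspected_outliers(values):
    """Return the observations flagged by the 1.5 * IQR rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + (n % 2):])  # skip M when n is odd
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return [x for x in ordered if x < lower_limit or x > upper_limit]

# Hypothetical data: Q1 = 12.5, Q3 = 17, IQR = 4.5,
# so the limits are 5.75 and 23.75.
data = [10, 12, 13, 14, 15, 16, 18, 40]
print(suspected_outliers(data))  # [40]
```

A flagged point is only a suspected outlier. Whether to keep it is a judgment about the data, not something the rule decides.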
Measure of spread: standard deviation
• The standard deviation, “s”, is used to describe variation around the mean. Like the mean, it is not resistant to skewness or outliers
1. First calculate the variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
2. Then take the square root to get the standard deviation: $s = \sqrt{s^2}$
Calculating the standard deviation

Mean = 16.33
Sum of squared deviations from mean = 199.99
Degrees of freedom (df) = (n − 1) = 8
$s^2$ = variance = 199.99/8 ≈ 25.00 (dollars per hour) squared
s = standard deviation = √25.00 = 5.00 dollars per hour
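The same steps can be sketched in Python. The slide's underlying observations are not shown, so the nine hourly wages below are hypothetical, chosen only so that the arithmetic is easy to follow (mean 16, sum of squared deviations 72, df 8).

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1)

def sample_sd(values):
    """Standard deviation = square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical hourly wages in dollars (not the lecture's data).
wages = [11, 13, 15, 15, 16, 17, 17, 19, 21]

print(sample_variance(wages))  # 9.0 (dollars per hour) squared
print(sample_sd(wages))        # 3.0 dollars per hour

# Cross-check against the standard library's n - 1 versions.
print(statistics.variance(wages), statistics.stdev(wages))
```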
Example 3
Suppose we have a dataset with the inflation rates for 34 countries in the OECD. Suppose also that the inflation rate is 3% in every country. What would be the numerator in the variance expression? What is the sample variance and standard deviation?
Ans: Since $x_i$ is 3% for every country and $\bar{x}$ is also 3%, the expression $(x_i - \bar{x})^2$ equals 0 for each country, because (3% − 3%)^2 = 0. Therefore the numerator, the sum of these squared deviations, is 0. Consequently, the sample variance and standard deviation are also 0.
Example 4
Which of the following samples has the smallest standard deviation?
a) 101, 102, 103, 104, 105
b) 11, 12, 13, 14, 15
c) 1, 2, 3, 4, 5
Ans: The standard deviation is the same for all three samples, because they have the same numerator (sum of squared deviations from the mean) and the same denominator (n − 1).
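A quick Python check of Example 4, using `statistics.stdev` from the standard library (which divides by n − 1). Each sample has deviations of −2, −1, 0, 1, 2 from its own mean, so the variance is 10/4 = 2.5 in every case.

```python
from statistics import stdev

samples = {
    "a": [101, 102, 103, 104, 105],
    "b": [11, 12, 13, 14, 15],
    "c": [1, 2, 3, 4, 5],
}

# Sample standard deviation of each sample, rounded for display.
results = {label: round(stdev(data), 4) for label, data in samples.items()}
print(results)  # {'a': 1.5811, 'b': 1.5811, 'c': 1.5811}
```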
Properties of the standard deviation
• s measures spread about the mean and should be used only when
the mean is the measure of center.
• s = 0 only when all observations have the same value and there is no
spread. Otherwise, s > 0
• s is location invariant.
• s is not resistant to outliers.
• s has the same units of measurement as the original observations.
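Two of these properties can be illustrated with a short Python sketch. The first dataset mirrors the scenario in Example 3 (34 identical values of 3); the second pair of samples is hypothetical and differs only in one extreme value.

```python
from statistics import median, stdev

# s = 0 when every observation has the same value.
inflation = [3] * 34
print(stdev(inflation))  # 0.0

# s is not resistant to outliers: one extreme value inflates it,
# while the median stays where it was.
base = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]  # hypothetical extreme value

print(stdev(base), stdev(with_outlier))
print(median(base), median(with_outlier))  # 3 3
```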
Options for displaying central tendency and
spread
• To display central tendency and
spread, we can
1. Plot the five-number summary in
a boxplot
2. Plot the mean and use the
standard deviation for error bars
• Since the mean and sd are not
resistant to outliers and skewness, use
option 2 to describe distributions that
are fairly symmetrical and don’t have
outliers. Otherwise use the boxplot.
NOTE: unlike the mean and the standard deviation, the median is NOT sensitive to outliers
Coefficient of Variation
• In some situations we may be interested in a descriptive statistic that indicates
how large the standard deviation is relative to the mean
• This measure is called the coefficient of variation and is usually expressed as a percentage: $CV = \frac{s}{\bar{x}} \times 100\%$

• The coefficient of variation is useful for comparing the variability of variables that
have different standard deviations and different means
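A minimal Python sketch of the coefficient of variation, computed as the sample standard deviation divided by the mean, times 100. The function name and both datasets are hypothetical, and the measure is only meaningful when the mean is not zero.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

# Hypothetical variables measured on very different scales.
small_scale = [10, 20, 30]        # mean 20, s = 10
large_scale = [1000, 1100, 1200]  # mean 1100, s = 100

print(coefficient_of_variation(small_scale))            # 50.0
print(round(coefficient_of_variation(large_scale), 2))  # 9.09
```

The second variable has the larger standard deviation (100 versus 10) but the smaller variability relative to its mean.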
Example 5
Which histogram represents data with the largest standard deviation? The smallest?
[Figure: three histograms, labeled in order: smallest, medium, largest]
Ans: Sample 3 has the largest standard deviation because its observations vary the most around the mean. The more the data vary around the mean, the larger the SD.
Summary
• Measures of central tendency
• Mean
• Median
• Measures of spread
• Percentiles
• Quartiles
• Standard deviation
• Summarizing both center and spread
• Five-number summary
• Boxplot
• Error bars around mean
For next class
• Read: Alwan 1.4 and Krauth Chapter 6
• For practice: Alwan 1.55, 1.56, 1.87, 1.93
