Lecture 4
[Figure: frequency (0 to 70) on the vertical axis against dollar amount ($, 25 to 300) on the horizontal axis]
Stemplots
Characteristics:
- time on the horizontal (x) axis
- the variable on the vertical (y) axis
Describing time plots
A trend is a rise or fall that persists over
time, despite small irregularities.
NOTE: the line represents the overall increasing trend.
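One way to make "a rise or fall that persists over time" concrete is to fit a straight line and look at the sign of its slope. The sketch below uses an ordinary least-squares line, which is a technique chosen here for illustration and not necessarily how the line on the slide was drawn. The `years` and `sales` values are hypothetical.

```python
def trend_slope(times, values):
    """Slope of the least-squares line through the points (time, value)."""
    n = len(times)
    t_bar = sum(times) / n
    v_bar = sum(values) / n
    numerator = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    denominator = sum((t - t_bar) ** 2 for t in times)
    return numerator / denominator


# Hypothetical time series: it wobbles from year to year but rises overall.
years = [1, 2, 3, 4, 5, 6]
sales = [10, 12, 11, 14, 13, 16]

slope = trend_slope(years, sales)
direction = "increasing" if slope > 0 else "decreasing or flat"
print(f"slope = {slope:.2f} per year, overall trend: {direction}")
```

A positive slope despite the year-to-year dips is what the definition above calls an increasing trend.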
mode: 12-14
roughly symmetric / bell-shaped
A large gap in the distribution is typically a sign of an outlier.
Scales matter
• How you stretch the axes and choose your scales can give a different
impression
• Always look at the scales
a) Pareto chart
b) Histogram
c) Bar chart
d) Pie chart
e) Time plot
c) symmetric
Describing Distributions with Numbers
1. Measures of central tendency: mean, median
2. Measures of spread: quartiles, standard deviations
3. Five-number summary and boxplots
Measure of central tendency: the mean
• To calculate the average, or
mean, add all values, then divide
by the number of individuals
• It is the “center of mass” of a
distribution
• Sum is 391
• Divided by 24 observations: 391 / 24 ≈ 16.292
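The arithmetic can be checked with a short sketch. The 24 individual observations are not listed on the slide, so the code starts from the two totals quoted there; `mean_of` is a hypothetical helper name showing the same rule applied to a list of values.

```python
# Mean = (sum of all values) / (number of individuals).
total = 391  # sum of the observations, as quoted on the slide
n = 24       # number of observations, as quoted on the slide

mean = total / n
print(round(mean, 3))


def mean_of(values):
    """Add all values, then divide by how many there are."""
    return sum(values) / len(values)
```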
The mean of a variable in a sample
Q2 = (20 + 30) / 2 = 25
Steps:
1. Arrange the observations in ascending order
2. Because the number of observations is even, the median is the average of the two middle
values
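The steps above can be sketched in a few lines. The data set is hypothetical, chosen only so that its two middle values are 20 and 30 as in the calculation of Q2. The odd-count branch is not covered by the steps on the slide; it is included so the function works for any sample size, returning the single middle value.

```python
def median(values):
    """Median of a list of numbers."""
    ordered = sorted(values)  # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        # step 2: even count, so average the two middle values
        return (ordered[mid - 1] + ordered[mid]) / 2
    # odd count: the single middle value
    return ordered[mid]


data = [40, 10, 30, 20]  # hypothetical observations
print(median(data))
```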
Median vs Mean
• The mean and median are close together if the distribution is
symmetrical
• The mean and median move farther apart as the distribution becomes more skewed
Mean and median of a distribution with
outliers
The outliers (the ones dying after 13 years) pull the mean strongly to the right,
from 3.4 to 4.2. The median, by contrast, increases only from 3.4 to 3.6 when the
outliers are included.
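The same effect can be reproduced on a small made-up sample. The numbers below are hypothetical and are not the survival data from the slide; the sketch only shows that adding one extreme value moves the mean much more than the median. It uses `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical sample, first without and then with one extreme value.
without_outlier = [2, 3, 3, 4, 5]
with_outlier = without_outlier + [13]

mean_shift = mean(with_outlier) - mean(without_outlier)
median_shift = median(with_outlier) - median(without_outlier)

print(f"mean:   {mean(without_outlier):.1f} -> {mean(with_outlier):.1f}")
print(f"median: {median(without_outlier):.1f} -> {median(with_outlier):.1f}")
print(f"mean moved by {mean_shift:.1f}, median moved by {median_shift:.1f}")
```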
Summary of mean vs. median
• Mean is not resistant to outliers or skewness
• Any extreme value will have a big effect on the mean
• Median is very resistant to outliers and skewness
• Both mean and median are useful measures of central tendency
• Report both and let the reader decide
Example 3
• A realtor selling homes calculates two measures of central tendency
for the price of a home in her area. She gets $127,312 and $105,100.
• One is the mean, and one is the median. Can you guess which is
which?