Descriptive Statistics SV
Measures of Central Tendency
(the various averages)
Some ‘central’ aspect of the data
Measures of Dispersion
How ‘spread-out’ or ‘dispersed’ the data is
Measures of Central Tendency
Mean (the Arithmetic mean)
Median
Mode
Mean
The mean of the following set of observations,
5, 0.9, 0.2, 2, 1
is (5 + 0.9 + 0.2 + 2 + 1) / 5 = 9.1 / 5 = 1.82
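As a quick check of the arithmetic above, the same mean can be computed in Python. This is a minimal sketch using the standard-library statistics module; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean

observations = [5, 0.9, 0.2, 2, 1]

# Arithmetic mean: sum of the observations divided by how many there are.
manual_mean = sum(observations) / len(observations)
library_mean = mean(observations)

print(round(manual_mean, 2), round(library_mean, 2))
```

Both the hand calculation and the library call should agree with the value on the slide after rounding to two decimals.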
Median
The Median of a set of ordered observations is
the middle number: it divides the data into two
equal halves, with as many observations below it as above it.
=MEDIAN(number1, number2, …)
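For readers working outside Excel, the sketch below shows one way to find the median in Python. The helper name middle_value and the even-sized sample are illustrative assumptions. When the number of observations is even there is no single middle number, so the two middle values are averaged, which is also what statistics.median does.

```python
from statistics import median

def middle_value(values):
    """Return the median: the middle of the sorted values,
    or the average of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [5, 0.9, 0.2, 2, 1]   # the observations from the Mean example
even_sample = [5, 0.9, 0.2, 2]     # same data with one value dropped (illustrative)

print(middle_value(odd_sample), median(odd_sample))
print(middle_value(even_sample), median(even_sample))
```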
Mean versus Median
When is a Median a better summary description
of data as compared to the Mean?
Consider a small firm with seven employees and the
following salaries:
$28,000
$33,000
$33,000
$34,000
$37,000
$40,000
$400,000
What is the ‘typical’ salary in this group?
Mean ≈ $86,429
Median = $34,000
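The salary example can be reproduced with a few lines of Python. This is a sketch, not part of the original slides; the without_top list is an illustrative addition that shows how much the single largest salary moves each statistic.

```python
from statistics import mean, median

salaries = [28_000, 33_000, 33_000, 34_000, 37_000, 40_000, 400_000]

print(f"Mean:   {mean(salaries):,.0f}")
print(f"Median: {median(salaries):,.0f}")

# Dropping the single extreme salary shows how strongly it pulls the mean.
without_top = salaries[:-1]
print(f"Mean without the 400,000 salary:   {mean(without_top):,.0f}")
print(f"Median without the 400,000 salary: {median(without_top):,.0f}")
```

Removing one observation shifts the mean by tens of thousands of dollars, while the median barely moves.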
Mean versus Median
The Mean is influenced to a greater extent by
extreme observations (outliers) than the Median is,
so the Median is the better summary when the data contain outliers.
=MODE.SNGL(number1, number2, …)
Mode of your responses about organic products
4, 3, 5, 3, 3, 2, 3
Mode is 3
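The same count can be sketched in Python. statistics.mode plays a role similar to Excel's MODE.SNGL here, and Counter shows the tally behind the answer; the variable names are illustrative.

```python
from collections import Counter
from statistics import mode, multimode

responses = [4, 3, 5, 3, 3, 2, 3]

# Tally how often each response occurs, most frequent first.
print(Counter(responses).most_common())

print(mode(responses))       # the single most frequent value
print(multimode(responses))  # all values tied for most frequent
```

multimode is useful when two or more values share the highest count, a case that a single mode value hides.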
What is the most popular
pie flavor in the US?
Mode
Not a very relevant descriptive statistic when the data is essentially
continuous.
Daily exchange rate, Dollar to Euro
Date        Rate
1-Jan-16    0.920945
2-Jan-16    0.920555
4-Jan-16    0.920725
5-Jan-16    0.926355
6-Jan-16    0.931012
7-Jan-16    0.929196
8-Jan-16    0.921235
9-Jan-16    0.917684
11-Jan-16   0.91533
12-Jan-16   0.918274
13-Jan-16   0.923063
…           …
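A short sketch of why the mode says little about data like these: in the rates listed above, no value repeats, so every observation is tied for "most frequent". The two-decimal rounding in the second step is an illustrative assumption, showing that a mode only becomes meaningful once continuous values are grouped into bins.

```python
from collections import Counter

rates = [0.920945, 0.920555, 0.920725, 0.926355, 0.931012, 0.929196,
         0.921235, 0.917684, 0.91533, 0.918274, 0.923063]

# Every rate occurs exactly once, so the highest count is 1.
highest_count = max(Counter(rates).values())
print(highest_count)

# Grouping the rates (here: rounding to 2 decimals) produces repeated values.
binned = Counter(round(r, 2) for r in rates)
print(binned.most_common())
```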
Case Study
Occupancy
Time   Flow   Counter Flow
8:05   114    64
8:10   141    71
8:15   145    68
8:20   101    70
8:25   113    53
8:30   117    65
8:35   141    72
8:40   918    73
8:45   1000   54
8:50   134    60
8:55   137    72
9:00   129    57
9:05   132    73
9:10   114    63
9:15   143    55
9:20   124    53
9:25   114    73
9:30   136    62
9:35   111    62
Case Study
Based on the data provided and using basic descriptive statistics, can you
explain why people perceive the London Tube as always crowded, with more
than 1,000 passengers at a time? The London government insists that the
average occupancy is between 130 and 150 people.
How would you analyze and present the data to portray a more realistic
view of what is actually happening? More importantly, what would you
suggest to avoid this perception?
• Crush capacity is 1,000+ people, but the average occupancy is 130 (the
overall average, not the occupancy at quiet moments)
• You can make an educated guess about passenger numbers or distance traveled
based on data proxies: return journeys, use of a connecting service, or use
of the Wi-Fi network
• Rush-hour trains are full, but there are only a few of them. Depending on
the range you choose, you can get a mean closer to 250
• What about the trains running counter to the flow of commuters?
• The two trains at peak time carry as many passengers as the other 20+ in
that range
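One possible way to explore these points is sketched below, using the Flow column from the case-study table. The peak_threshold of 500 is an illustrative cut-off chosen only to separate the two crush-loaded trains from the rest; it does not come from the slides.

```python
from statistics import mean, median

# Occupancy per train in the direction of commuter flow (Flow column of the table).
flow = [114, 141, 145, 101, 113, 117, 141, 918, 1000, 134,
        137, 129, 132, 114, 143, 124, 114, 136, 111]

peak_threshold = 500  # illustrative cut-off, not from the slides
peak = [x for x in flow if x > peak_threshold]
others = [x for x in flow if x <= peak_threshold]

print(f"Mean, all trains:    {mean(flow):.1f}")
print(f"Median, all trains:  {median(flow)}")
print(f"Mean, other trains:  {mean(others):.1f}")

# Share of all passengers who rode on one of the peak trains.
peak_share = sum(peak) / sum(flow)
print(f"Passengers on the {len(peak)} peak trains: {peak_share:.0%}")
```

In this sample, nearly half of all passengers rode on just two trains, which may help explain why so many people report a crush-loaded journey even though the typical train is far emptier.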
Real Estate
• Mean: Real estate agents calculate the mean price of houses in a particular
area so they can inform their clients of what they can expect to spend on a
house.
• Median: Real estate agents also calculate the median price of houses to
gain a better idea of the “typical” home price, since the median is less
influenced by outliers (like multi-million-dollar homes) compared to the
mean.
• Mode: Real estate agents also calculate the mode of the number of
bedrooms per house so they can inform their clients on how many
bedrooms they can expect to have in houses in a particular area.
Advertising
• Mean: Marketers often calculate the mean revenue earned per advertisement so
they can understand how much money their company is making on each ad.
• Median: Marketers also calculate the median revenue earned per advertisement
so they can understand how well the median ad performs.
• Mode: Marketers also calculate the mode of the type of ad used (e.g. newspaper,
TV, radio, digital) so they can know which type of ads their company uses most
often.
Histogram
[Figure: histogram of CEO Salaries (in thousands); y-axis: Frequency]
What is the “n” of this distribution?
Do you see any trends?
Is 525 an outlier?
What is your best bet on mean (average) and median?
Histogram
[Figure: histogram of CEO Salaries (in thousands); y-axis: Frequency]
Skewed to the right
Mean = 404.17
Median = 350
Mean > Median
Histogram
[Figure: histogram with x-axis values from 35 to 100; y-axis: Frequency]
Skewed to the left
Mean = 74
Median = 79
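The relationship between skew and the two averages in the histograms above can be written as a small rule of thumb. The function name skew_direction is hypothetical, and the rule is only a heuristic: it usually holds for single-peaked distributions but is not guaranteed for every data set.

```python
def skew_direction(mean_value, median_value):
    """Rule-of-thumb reading of skew from the mean and the median."""
    if mean_value > median_value:
        return "skewed to the right"
    if mean_value < median_value:
        return "skewed to the left"
    return "roughly symmetric"

print(skew_direction(404.17, 350))  # CEO salaries histogram
print(skew_direction(74, 79))       # second histogram
```

The long tail pulls the mean toward it, while the median stays closer to the bulk of the data.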
[Figure: Salaries in Firm 1 and Salaries in Firm 2]
Mean = $33,500
Median = $33,800
[Figure: Salaries at a small firm, with the Mean marked]
=STDEV.P(number1, number2,…)
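STDEV.P returns the population standard deviation. As a sketch of what that calculation involves, the Python below computes it by hand and with statistics.pstdev, reusing the seven salaries from the Mean versus Median example as illustrative input.

```python
from math import sqrt
from statistics import mean, pstdev

salaries = [28_000, 33_000, 33_000, 34_000, 37_000, 40_000, 400_000]

# Population standard deviation by hand:
# the square root of the average squared deviation from the mean.
mu = mean(salaries)
manual_sd = sqrt(sum((x - mu) ** 2 for x in salaries) / len(salaries))

print(round(manual_sd, 2), round(pstdev(salaries), 2))
```

For a sample rather than a whole population, statistics.stdev divides by n - 1 instead of n, which corresponds to Excel's STDEV.S.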
Excel Exercise
Rule of Thumb
For roughly bell-shaped (normal) data, approximately 68% of the
observations lie within one standard deviation of the mean, and
approximately 95% lie within two standard deviations of the mean.
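The rule of thumb can be checked with a small simulation. This is a sketch under stated assumptions: the data are simulated from a normal distribution with an arbitrary mean of 100 and standard deviation of 15, and the seed is fixed only so the run is repeatable. Strongly skewed data will not follow the rule as closely.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(100, 15) for _ in range(100_000)]  # illustrative normal data

mu = fmean(sample)
sd = pstdev(sample)

# Fraction of observations within one and two standard deviations of the mean.
within_1sd = sum(abs(x - mu) <= sd for x in sample) / len(sample)
within_2sd = sum(abs(x - mu) <= 2 * sd for x in sample) / len(sample)

print(f"Within 1 SD: {within_1sd:.1%}")
print(f"Within 2 SD: {within_2sd:.1%}")
```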
1) Apply the proposed changes and, once again, find the mean, median,
mode, and standard deviation of the salaries
2) What changed? Do you think you made the right decision? Why?
3) What other pieces of data would you require to make a better decision?