SOB 1040 Lecture 2 - Data Organisation and Descriptive Statistics
LECTURE 2
DESCRIPTIVE STATISTICS
                                   Fee
Type of health facility        Yes     No    Total
Government hospital/clinic      34     53       87
Private hospital/clinic         20     77       97
Total                           54    130      184
Form of payment
Other/don't know 3
Electronic/online 28
Check 54
Cash 15
• The charts used to visualize numerical data include the stem-and-leaf
display, the histogram, the percentage polygon, and the cumulative
percentage polygon (ogive).
132 – 145 13
145 – 157 5
[Figure: histogram of frequency by price per plate category (80 – 93, 93 – 106, 106 – 119, 119 – 132, 132 – 145, 145 – 157). The slide annotations are only partly recoverable: "... the prices are above 119.", "... are a few prices that stray ...", "... and 145", "... to 132".]
\[ \sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n \]
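Day:              1   2   3   4   5   6   7   8   9   10
Time (minutes):   39  29  43  52  39  44  40  31  44  35

\[ \bar{X} = \frac{1}{10} \sum_{i=1}^{10} X_i = \frac{1}{10} \left( X_1 + X_2 + \cdots + X_{10} \right) \]
\[ \bar{X} = \frac{1}{10} \left( 39 + 29 + 43 + 52 + 39 + 44 + 40 + 31 + 44 + 35 \right) \]
\[ \bar{X} = \frac{1}{10} (396) = 39.6 \]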
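The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the names `times` and `sample_mean` are illustrative and do not come from the slides.

```python
# Times in minutes for the 10 days listed on the slide
times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]


def sample_mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    return sum(values) / len(values)


total = sum(times)
mean_time = sample_mean(times)
print(total, mean_time)  # 396 39.6
```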
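When the weights sum to 1:
\[ \bar{X}_w = w_1 X_1 + w_2 X_2 + \cdots + w_n X_n = \sum_{i=1}^{n} w_i X_i \]
In general:
\[ \bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i} \]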
Group              Share of population (%)    Income
Very poor          30                         0.4
Poor               40                         2
Middle             20                         8
Relatively rich    8                          40
Rich               2                          200
Business Statistics Graduate School of Business
Solution
• Ordinary average
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1}{5} \sum_{i=1}^{5} X_i = \frac{1}{5} (250.4) = 50.08 \]
• Weighted average
\[ \bar{X}_w = \sum_{i=1}^{5} w_i X_i = (0.3 \times 0.4) + (0.4 \times 2) + (0.2 \times 8) + (0.08 \times 40) + (0.02 \times 200) = 9.72 \]
Solution cont…
• The ordinary average overstates the typical income of
individuals in the country. This is because the average is
pulled toward extreme values (outliers) in the tail of the
distribution. Since the distribution in this case is skewed to
the right, the average gravitates towards the large values in
the right tail.
• The weighted average, on the other hand, takes into account
the proportion (weight) of each socio-economic group and so
gives a more realistic picture.
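The two averages in this solution can be reproduced with a short Python sketch. The lists `incomes` and `shares` are taken from the table of socio-economic groups; the function name `weighted_mean` is illustrative only.

```python
# Income per group and each group's share of the population (as proportions)
incomes = [0.4, 2, 8, 40, 200]
shares = [0.30, 0.40, 0.20, 0.08, 0.02]


def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


ordinary = sum(incomes) / len(incomes)     # every group counted equally
weighted = weighted_mean(incomes, shares)  # groups counted by population share
print(round(ordinary, 2), round(weighted, 2))  # 50.08 9.72
```

Dividing by `sum(weights)` means the same function also works when the weights are given as percentages (30, 40, 20, 8, 2) rather than proportions.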
\[ HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{5}{\frac{1}{4} + \frac{1}{6} + \frac{1}{9} + \frac{1}{12} + \frac{1}{3}} = \frac{5}{0.94444444} = 5.29 \]
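As a quick check of the harmonic mean calculation, the sketch below applies the same formula to the five values 4, 6, 9, 12 and 3. The function name is illustrative; Python's standard library also provides `statistics.harmonic_mean`, which should agree with the hand-written version.

```python
import statistics

values = [4, 6, 9, 12, 3]


def harmonic_mean(xs):
    """n divided by the sum of the reciprocals of the observations."""
    return len(xs) / sum(1 / x for x in xs)


hm = harmonic_mean(values)
print(round(hm, 2))  # 5.29

# Cross-check against the standard library implementation
hm_library = statistics.harmonic_mean(values)
```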
• The median is the central value; 50% of the measurements lie above it
and 50% lie below it.
\[ Median = L + \left( \frac{\frac{N}{2} - F}{f_m} \right) c \]
\[ Mode = L + \left( \frac{d_1}{d_1 + d_2} \right) c \]
• Use the table on slide (page) 43 to find the median and mode for that
dataset.
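The grouped-data formulas for the median and mode can be sketched in Python as follows. The slides do not define the symbols here, so the code assumes their usual textbook meanings: L is the lower boundary of the median (or modal) class, N the total frequency, F the cumulative frequency before the median class, f_m the frequency of the median class, c the class width, and d1 and d2 the differences between the modal class frequency and the frequencies of the classes before and after it. The frequency table in the example is hypothetical and is not the table from slide 43.

```python
def grouped_median(lower_bounds, freqs, width):
    """Median = L + ((N/2 - F) / f_m) * c, assuming equal-width classes."""
    total = sum(freqs)  # N
    if total == 0:
        raise ValueError("at least one class must have a positive frequency")
    cumulative = 0      # F: cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= total / 2:  # first class to reach N/2 is the median class
            return lower + (total / 2 - cumulative) / f * width
        cumulative += f


def grouped_mode(lower_bounds, freqs, width):
    """Mode = L + (d1 / (d1 + d2)) * c, assuming equal-width classes."""
    m = freqs.index(max(freqs))  # modal class (the first one if there is a tie)
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    return lower_bounds[m] + d1 / (d1 + d2) * width


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40, 40-50
lowers = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 9, 6]
median = grouped_median(lowers, freqs, width=10)
mode = grouped_mode(lowers, freqs, width=10)
print(round(median, 2), round(mode, 2))  # 25.83 25.71
```

In this hypothetical table N = 40, so the median class is 20-30 (F = 13, f_m = 12), and the same class is modal with d1 = 4 and d2 = 3.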
• If a distribution is clumped on one side of its range and has a long tail
on the other, then it is considered to be skewed in the direction of the
tail
[Figure: histogram; x-axis values 1 to 13, y-axis 0 to 20]
• Symmetry implies that the mean and median are both at the center: mean = median.
[Figure: histogram of ages of children (x-axis 1 to 13), DISTRIBUTION SKEWED RIGHT]
[Figure: histogram of ages of children (x-axis 1 to 13), DISTRIBUTION SKEWED LEFT]
Skewness & Median vs. Mean
• If a distribution is skewed left, the data clump on the right and the
long tail extends to the left. The clump “pulls” the median to the right,
while the extreme values in the long tail pull the mean toward the tail.
Thus the mean lies to the left of the median.
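A small numerical illustration of this relationship, using Python's `statistics` module. The ages below are hypothetical and were chosen only to produce a long left tail; they are not the data behind the histograms on the slides.

```python
import statistics

# Hypothetical ages of children: most values clump at 10-13, with a tail toward 2
skewed_left = [2, 5, 9, 10, 11, 11, 12, 12, 12, 13, 13]

# Mirror the data around 7 (14 - age) to get the matching right-skewed data set
skewed_right = [14 - age for age in skewed_left]

left_mean = statistics.mean(skewed_left)
left_median = statistics.median(skewed_left)
right_mean = statistics.mean(skewed_right)
right_median = statistics.median(skewed_right)

# Skewed left: the tail drags the mean below the median
print(left_mean < left_median)
# Skewed right: the tail drags the mean above the median
print(right_mean > right_median)
```

Here the left-skewed data have mean 10 and median 11, and the mirrored data have mean 4 and median 3.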
[Figure: histogram of ages of children (x-axis 1 to 13), DISTRIBUTION SKEWED LEFT, with the median and the mean marked]
Average Yes - -
• The range measures the total spread of the data set and is computed as follows:
\[ Range = \max - \min \]
where max is the largest value in the data series and min is the smallest.
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2 \]
\[ \sum_{i=1}^{10} \left( X_i - \bar{X} \right)^2 = 1081.09225 \]
\[ s^2 = \frac{1}{10-1} (1081.09225) = 120.121361 \]
The variance is difficult to interpret because its unit of measurement is the square of the unit of the original data.
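The variance calculation can be traced step by step in Python. This sketch uses the ten student incomes (K) from the lecture's data table; the variable names are illustrative.

```python
# Incomes (K) of the ten students in the lecture's example
incomes = [1.2, 5, 7.5, 7.5, 7.5, 12.5, 15, 15.25, 20, 40]

n = len(incomes)
mean = sum(incomes) / n                       # 13.145

# Deviation of each observation from the mean, then squared
deviations = [x - mean for x in incomes]
squared = [d ** 2 for d in deviations]

sum_of_squares = sum(squared)                 # about 1081.09225
sample_variance = sum_of_squares / (n - 1)    # divide by n - 1 for a sample

print(round(sum_of_squares, 5), round(sample_variance, 6))
```

The deviations always sum to zero (up to rounding), which is why they are squared before being added.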
... point is from the mean
• Same units as the underlying data instead of the square of the
underlying data
• The standard deviation for a sample is measured as follows:
\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2} \]

Student    Income (K)    Deviation from mean    Squared deviation
1          1.2           -11.945                142.683025
2          5             -8.145                 66.341025
3          7.5           -5.645                 31.866025
4          7.5           -5.645                 31.866025
5          7.5           -5.645                 31.866025
6          12.5          -0.645                 0.416025
7          15            1.855                  3.441025
8          15.25         2.105                  4.431025
9          20            6.855                  46.991025
10         40            26.855                 721.191025
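For comparison with the hand calculation, the sample standard deviation of the ten incomes can be obtained from Python's standard library. This is a sketch only; `statistics.variance` and `statistics.stdev` both use the n - 1 divisor, matching the sample formulas in this lecture.

```python
import math
import statistics

# Incomes (K) of the ten students in the table
incomes = [1.2, 5, 7.5, 7.5, 7.5, 12.5, 15, 15.25, 20, 40]

s_squared = statistics.variance(incomes)  # sample variance, in squared units
s = statistics.stdev(incomes)             # sample standard deviation, in K

# The standard deviation is the square root of the variance
s_from_variance = math.sqrt(s_squared)

print(round(s, 2))  # 10.96
```

Unlike the variance, the result is in the same unit as the incomes themselves.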
2/3 of data
\[ i = k\% \times n = \frac{k}{100} \times n \]
❖ If the index obtained in Step 2 is a whole number, count the values in your
data set from the smallest to the largest until you reach the position
indicated by the index. The kth percentile is the average of the value at that
position and the value that directly follows it.
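The index rule can be sketched in Python as below. Two parts are assumptions rather than content from the slides: Step 1 is taken to be sorting the data in ascending order, and for an index that is not a whole number the code rounds up to the next position, which is a common textbook convention. The function name `percentile` is illustrative, and the data are the ten times (in minutes) from the mean example.

```python
def percentile(data, k):
    """kth percentile using the index i = (k / 100) * n, for whole k from 1 to 99."""
    if not (isinstance(k, int) and 0 < k < 100):
        raise ValueError("k must be a whole number between 1 and 99")
    values = sorted(data)  # ASSUMPTION: Step 1 sorts the data in ascending order
    n = len(values)
    if n == 0:
        raise ValueError("data must not be empty")
    # Integer arithmetic avoids floating-point error when testing for a whole number
    index, remainder = divmod(k * n, 100)
    if remainder == 0:
        # Whole-number index: average the value at that position and the next one
        return (values[index - 1] + values[index]) / 2
    # ASSUMPTION (not on the slide): otherwise round the index up to the next position
    return values[index]


times = [39, 29, 43, 52, 39, 44, 40, 31, 44, 35]
p50 = percentile(times, 50)  # i = 5, so average the 5th and 6th values
p90 = percentile(times, 90)  # i = 9, so average the 9th and 10th values
p25 = percentile(times, 25)  # i = 2.5, rounded up to the 3rd value (assumed rule)
print(p50, p90, p25)  # 39.5 48.0 35
```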