Idl 3
Numerical Summary
Raw data
The data represent the highest temperatures recorded by remote sensors in 50 countries.
112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
110, 118, 117, 116, 118, 112, 114, 114, 105, 109,
107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
120, 113, 120, 117, 105, 110, 118, 112, 114, 114.
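The Arithmetic Mean
Population Mean
$\mu = \frac{\sum X}{N}$
where
µ is the population mean
N is the number of values in the population
X is a particular value.
Example: $\mu = \frac{56{,}000 + \ldots + 73{,}000}{4} = 48{,}500$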
Sample Mean
$\bar{X} = \frac{\sum X}{n}$
Example: $\bar{X} = \frac{14 + 15 + 17 + 16 + 15}{5} = \frac{77}{5} = 15.4$
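The population and sample means use the same arithmetic; only the notation (µ and N versus X̄ and n) changes. A minimal Python sketch of the sample-mean calculation, where `arithmetic_mean` is an illustrative helper name rather than anything defined in these notes:

```python
def arithmetic_mean(values):
    """Return sum(X) / n for a non-empty list of numbers (illustrative helper)."""
    if not values:
        raise ValueError("the mean needs at least one value")
    return sum(values) / len(values)


# The five sample values from the worked example
sample = [14, 15, 17, 16, 15]
x_bar = arithmetic_mean(sample)
print(x_bar)  # 15.4
```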
The Arithmetic Mean
For grouped data:
$\bar{X} = \frac{\sum Xf}{n}$
where X and f represent the midpoint and frequency of each class, respectively.
The Arithmetic Mean for grouped data
Class limits   Class boundaries   Frequency (f)   Cumulative frequency (CF)
6 – 10         5.5 – 10.5         1               1
11 – 15        10.5 – 15.5        2               3
16 – 20        15.5 – 20.5        3               6
21 – 25        20.5 – 25.5        5               11
26 – 30        25.5 – 30.5        4               15
31 – 35        30.5 – 35.5        3               18
36 – 40        35.5 – 40.5        2               20
n = 20
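The grouped-mean formula can be applied to this table directly. Using class midpoints 8, 13, 18, 23, 28, 33 and 38 gives ΣXf = 490 and X̄ = 490/20 = 24.5. A short Python sketch of that calculation (the helper name `grouped_mean` is illustrative):

```python
# Class limits and frequencies from the grouped frequency table
classes = [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35), (36, 40)]
freqs = [1, 2, 3, 5, 4, 3, 2]


def grouped_mean(classes, freqs):
    """Mean for grouped data: sum(X * f) / n, with X the class midpoint (illustrative helper)."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    return sum(x * f for x, f in zip(midpoints, freqs)) / n


mean = grouped_mean(classes, freqs)
print(mean)  # 24.5
```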
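The Median: Example
For the grouped data above, n = 20, so n/2 = 10 falls in the class 21 – 25 (boundaries 20.5 – 25.5), the first class whose cumulative frequency reaches 10.
$\text{Median} = L + \frac{\frac{n}{2} - CF}{f_m}(w) = 20.5 + \frac{\frac{20}{2} - 6}{5}(5) = 24.5$
where L = 20.5 is the lower class boundary of the median class, CF = 6 is the cumulative frequency before the median class, f_m = 5 is the frequency of the median class and w = 5 is the class width.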
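The steps of the grouped-median calculation (locate the class containing the n/2-th observation, then interpolate within it) can be sketched in Python as follows; `grouped_median` is an illustrative name, and the boundaries and frequencies are those of the table:

```python
def grouped_median(boundaries, freqs):
    """Median for grouped data: L + ((n/2 - CF) / f_m) * w (illustrative helper).

    boundaries: list of (lower, upper) class boundaries
    freqs: class frequencies, in the same order
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cum = 0  # cumulative frequency before the current class (CF)
    for (lower, upper), f in zip(boundaries, freqs):
        if cum + f >= half:
            w = upper - lower  # class width
            return lower + (half - cum) / f * w
        cum += f
    raise ValueError("median class not found")


boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]
median = grouped_median(boundaries, freqs)
print(median)  # 24.5
```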
The Median
Measures of dispersion include the following: range, mean deviation, variance, standard deviation and coefficient of variation.
The Range
Mean deviation: $MD = \frac{\sum |X - \bar{X}|}{n}$
The Mean Deviation: Example
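The data for the slide's own mean-deviation example are not recoverable. As an assumption, this sketch reuses the five sample values 14, 15, 17, 16, 15 from the sample-mean example: with X̄ = 15.4 the absolute deviations are 1.4, 0.4, 1.6, 0.6 and 0.4, so MD = 4.4/5 = 0.88. It also computes the range, taken here as the largest value minus the smallest (the standard definition; the range slide's content is not shown). Function names are illustrative.

```python
def value_range(values):
    """Range: largest value minus smallest value (illustrative helper)."""
    return max(values) - min(values)


def mean_deviation(values):
    """Mean deviation: sum(|X - X_bar|) / n (illustrative helper)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n


sample = [14, 15, 17, 16, 15]  # assumed data, reused from the sample-mean example
print(value_range(sample))               # 3
print(round(mean_deviation(sample), 2))  # 0.88
```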
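Variance: the arithmetic mean of the squared deviations from the mean.
Population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
where
X is an observed value in the population
μ is the arithmetic mean of the population
N is the number of observations in the population
Population standard deviation: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Example: $\sigma^2 = 42.227$, so $\sigma = \sqrt{42.227} = 6.498$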
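The values behind σ² = 42.227 are not shown, so this sketch applies the population formulas to the 50 temperature readings from the raw data instead, treating them as a whole population (an assumption). By hand, μ = 5695/50 = 113.9 and Σ(X − μ)² = 1840.5, so σ² = 1840.5/50 = 36.81 and σ ≈ 6.067. The name `population_variance` is illustrative; Python's standard `statistics.pvariance` computes the same quantity.

```python
import math

# The 50 temperature readings from the raw data at the start of the section
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 112, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]


def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N (illustrative helper)."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


sigma2 = population_variance(temps)
sigma = math.sqrt(sigma2)
print(round(sigma2, 3), round(sigma, 3))  # 36.81 6.067
```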
The Variance and Standard Deviation
Sample variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{s^2}$
The Variance and Standard Deviation: Example
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{21.2}{5 - 1} = 5.30$
$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$
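A minimal Python sketch of the sample formulas, dividing by n − 1 rather than N. The raw values behind Σ(X − X̄)² = 21.2 are not shown, so the example's arithmetic is reproduced directly and the raw-data function is checked on the five values from the sample-mean example instead (Σ(X − X̄)² = 5.2, so s² = 5.2/4 = 1.3). The name `sample_variance` is illustrative; the standard library's `statistics.variance` also divides by n − 1.

```python
import math


def sample_variance(values):
    """s^2 = sum((X - X_bar)^2) / (n - 1) (illustrative helper)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Arithmetic from the example: sum of squared deviations 21.2 with n = 5
s2_example = 21.2 / (5 - 1)
s_example = math.sqrt(s2_example)
print(round(s2_example, 2), round(s_example, 2))  # 5.3 2.3

# Raw-data version, checked on the five sample-mean values
s2 = sample_variance([14, 15, 17, 16, 15])
print(round(s2, 2))  # 1.3
```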
The Coefficient of Variation
The coefficient of variation (CV) is the ratio of the standard deviation to the arithmetic mean. It is usually expressed as a percentage:
$CV = \frac{S}{\bar{X}} \times 100\%$
• Stock A:
– Average price last year = $50
– Standard deviation = $5
$CV_A = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\%$
• Stock B:
– Average price last year = $100
– Standard deviation = $5
$CV_B = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$100} \times 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
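The stock comparison reduces to one line of arithmetic per stock. A short Python sketch (the name `coefficient_of_variation` is illustrative):

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (S / X_bar) * 100%, returned as a percentage (illustrative helper)."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(5, 50)   # Stock A
cv_b = coefficient_of_variation(5, 100)  # Stock B
print(cv_a, cv_b)  # 10.0 5.0
```

Because CV divides by the mean, it is only meaningful for data measured on a scale with a true zero and a positive mean, such as prices.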
The Measures of Position
Measures of Position: These measures describe the location (position) of a particular value in a given distribution of data. The position is described by quartiles, deciles and percentiles.
The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.
Q2 is the same as the median (50% are smaller, 50% are larger).
Only 25% of the observations are greater than the third quartile, Q3.
The Quartiles
Semi-interquartile range: $\text{Semi-IQR} = \frac{Q_3 - Q_1}{2}$
Position of $Q_k$ in the ranked data: $\frac{k(n + 1)}{4}$, for k = 1, 2, 3
where n is the number of observed values
The Deciles and Percentiles
Deciles: The deciles are the values that divide the
set of data into ten equal parts.
Example (n = 9): Q1 is at position (9 + 1)/4 = 2.5 of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = 12.5.
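The ranked data for the n = 9 example are not shown, so this sketch applies the same position rule, p(n + 1) (which gives (n + 1)/4 for Q1), to the 50 temperature readings instead. With n = 50, Q1, Q2 and Q3 sit at positions 12.75, 25.5 and 38.25, giving 110, 114 and 118, and a semi-IQR of (118 − 110)/2 = 4. The helper `positional_quantile` is illustrative; it also yields deciles with p = 0.1, …, 0.9. Python's `statistics.quantiles` uses the same (n + 1) positions with its default "exclusive" method, while other software (for example NumPy's default percentile method) uses a different convention, so results can differ slightly.

```python
import math

# The 50 temperature readings from the raw data at the start of the section
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 112, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]


def positional_quantile(values, p):
    """Value at 1-based position p * (n + 1) of the ranked data (illustrative helper).

    p = 0.25, 0.5, 0.75 gives Q1, Q2, Q3; p = 0.1, ..., 0.9 gives the deciles.
    A fractional position interpolates linearly, so position 2.5 lies halfway
    between the 2nd and 3rd ranked values.
    """
    ranked = sorted(values)
    n = len(ranked)
    pos = p * (n + 1)
    if pos <= 1:
        return ranked[0]
    if pos >= n:
        return ranked[-1]
    whole = math.floor(pos)
    frac = pos - whole
    return ranked[whole - 1] + frac * (ranked[whole] - ranked[whole - 1])


q1 = positional_quantile(temps, 0.25)  # position 12.75
q2 = positional_quantile(temps, 0.50)  # position 25.5 (the median)
q3 = positional_quantile(temps, 0.75)  # position 38.25
semi_iqr = (q3 - q1) / 2
print(q1, q2, q3, semi_iqr)  # 110.0 114.0 118.0 4.0
```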
• The formula for the kth percentile for grouped data is given by
$P_k = l_k + \frac{C_k}{f_k}\left(kn - F_k\right)$
where
• k = the percentile, expressed as a proportion (0.1, 0.2, …)
• n = the total frequency (number of observations)
• $l_k$ = lower class boundary of the class in which the kth percentile lies
• $C_k$ = the class width of the kth percentile class
• $F_k$ = the cumulative frequency just before the kth percentile class
• $f_k$ = the frequency of the kth percentile class
Measures of Position for Grouped Data
Example
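$P_{0.1} = l_{0.1} + \frac{C_{0.1}}{f_{0.1}}\left(0.1 \times 40 - F_{0.1}\right) = 126.5 + \frac{9}{5}\left(0.1 \times 40 - 3\right) = 128.3$
$P_{0.45} = l_{0.45} + \frac{C_{0.45}}{f_{0.45}}\left(0.45 \times 40 - F_{0.45}\right) = 144.5 + \frac{9}{12}\left(0.45 \times 40 - 17\right) = 145.25$
$P_{0.9} = l_{0.9} + \frac{C_{0.9}}{f_{0.9}}\left(0.9 \times 40 - F_{0.9}\right) = 162.5 + \frac{9}{4}\left(0.9 \times 40 - 34\right) = 167$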
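The class table for this example is not reproduced, so this sketch takes $l_k$, $C_k$, $f_k$ and $F_k$ exactly as they appear in the three calculations, with n = 40. It doubles as an arithmetic check: for $P_{0.45}$, (9/12)(0.45 × 40 − 17) = 0.75, so the result is 144.5 + 0.75 = 145.25. The function name and argument names are illustrative.

```python
def grouped_percentile(k, n, lower_boundary, class_width, freq, cum_before):
    """P_k = l_k + (C_k / f_k) * (k * n - F_k) (illustrative helper).

    k: percentile as a proportion (e.g. 0.1 for the 10th percentile)
    n: total frequency
    lower_boundary: l_k, lower boundary of the percentile class
    class_width: C_k
    freq: f_k, frequency of the percentile class
    cum_before: F_k, cumulative frequency before the percentile class
    """
    return lower_boundary + class_width / freq * (k * n - cum_before)


p10 = grouped_percentile(0.1, 40, 126.5, 9, 5, 3)
p45 = grouped_percentile(0.45, 40, 144.5, 9, 12, 17)
p90 = grouped_percentile(0.9, 40, 162.5, 9, 4, 34)
print(round(p10, 2), round(p45, 2), round(p90, 2))  # 128.3 145.25 167.0
```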
The Measure of Shape
Measures of shape determine whether the distribution of data exhibits a symmetric pattern or stretches out in a particular direction.
Two such measures of shape are:
• Skewness
• Kurtosis (peakedness)
Skewness
[Figure: symmetrical, positively skewed and negatively skewed distributions, with the positions of the mean, median and mode marked]
• Negatively skewed: the mean and median are to the left of the mode (Mean < Median < Mode).
• Positively skewed: the mean and median are to the right of the mode (Mean > Median > Mode).
Example
The lengths of stay on the cancer floor of a hospital were organised into a frequency distribution. The mean length of stay was 28 days, the median 25 days and the mode 23 days. The standard deviation was computed to be 4.2 days. Compute the skewness.
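The slides do not state which skewness formula this example expects. This sketch assumes Pearson's coefficients, which use exactly the quantities given: the mode coefficient (mean − mode)/s = 5/4.2 ≈ 1.19 and the median coefficient 3(mean − median)/s = 9/4.2 ≈ 2.14. Both are positive, matching Mean > Median > Mode for a positively skewed distribution. Function names are illustrative.

```python
def pearson_mode_skewness(mean, mode, sd):
    """Pearson's first (mode) coefficient: (mean - mode) / s (illustrative helper)."""
    return (mean - mode) / sd


def pearson_median_skewness(mean, median, sd):
    """Pearson's second (median) coefficient: 3 * (mean - median) / s (illustrative helper)."""
    return 3 * (mean - median) / sd


# Lengths of stay: mean 28, median 25, mode 23, standard deviation 4.2 days
sk1 = pearson_mode_skewness(28, 23, 4.2)
sk2 = pearson_median_skewness(28, 25, 4.2)
print(round(sk1, 2), round(sk2, 2))  # 1.19 2.14
```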
Kurtosis/Peakedness
• If k < 3, the distribution is flatter at the centre than the normal distribution.
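The slide compares k with 3, but the formula for k is not shown. This sketch assumes the moment coefficient of kurtosis, k = m₄/m₂², with central moments divided by n; under that definition the normal distribution has k = 3. For the evenly spread (hypothetical) values 1 to 5, m₂ = 2 and m₄ = 6.8, so k = 1.7 < 3, a flat-topped shape. The function name is illustrative.

```python
def moment_kurtosis(values):
    """k = m4 / m2^2, where m_r is the r-th central moment (illustrative helper).

    Under this definition the normal distribution has k = 3.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


k = moment_kurtosis([1, 2, 3, 4, 5])  # hypothetical, evenly spread values
print(round(k, 2))  # 1.7, so k < 3: flatter at the centre than the normal curve
```

Library routines often report excess kurtosis (k − 3) instead; `scipy.stats.kurtosis`, for example, does so by default.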
Assignment
[Figure: normal curve showing about 68%, 95% and 99.7% of values within 1, 2 and 3 standard deviations of the mean]
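The figure summarises the empirical rule for bell-shaped data. As a sketch of how the rule can be checked against real readings, this Python snippet counts how many of the 50 temperature values fall within 1, 2 and 3 standard deviations of the mean, treating the readings as a population (an assumption). With μ = 113.9 and σ ≈ 6.067 the counts are 35, 47 and 49 out of 50, that is 70%, 94% and 98%, close to the 68%, 95% and 99.7% in the figure.

```python
import statistics

# The 50 temperature readings from the raw data
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 112, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

mu = statistics.mean(temps)       # 113.9
sigma = statistics.pstdev(temps)  # population standard deviation, about 6.067

counts = []
for m in (1, 2, 3):
    # Count readings in the interval [mu - m*sigma, mu + m*sigma]
    inside = sum(1 for x in temps if mu - m * sigma <= x <= mu + m * sigma)
    counts.append(inside)
    print(f"within {m} SD: {inside}/{len(temps)} = {inside / len(temps):.0%}")
```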
The data represent the recorded high temperature from a
sensor for 50 districts in Ghana.
112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
110, 118, 117, 116, 118, 112, 114, 114, 105, 109,
107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
120, 113, 120, 117, 105, 110, 118, 112, 114, 114.
Box-and-Whisker Plot:
A graphical display of data using the five-number summary:
Minimum -- Q1 -- Median -- Q3 -- Maximum
Example:
Data: 0 2 2 2 3 3 4 5 5 10 27
Five-number summary: Minimum = 0, Q1 = 2, Median = 3, Q3 = 5, Maximum = 27
• The data are right-skewed, as the plot shows.
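The five-number summary for this example can be reproduced with the (n + 1)/4 position rule used for Q1 in the n = 9 example: with n = 11, Q1, the median and Q3 sit at positions 3, 6 and 9 of the ranked data. Both helper names below are illustrative.

```python
import math


def positional_quantile(values, p):
    """Value at 1-based position p * (n + 1) of the ranked data, interpolating linearly (illustrative helper)."""
    ranked = sorted(values)
    n = len(ranked)
    pos = p * (n + 1)
    if pos <= 1:
        return ranked[0]
    if pos >= n:
        return ranked[-1]
    whole = math.floor(pos)
    frac = pos - whole
    return ranked[whole - 1] + frac * (ranked[whole] - ranked[whole - 1])


def five_number_summary(values):
    """(Minimum, Q1, Median, Q3, Maximum) for a box-and-whisker plot (illustrative helper)."""
    ranked = sorted(values)
    return (
        ranked[0],
        positional_quantile(ranked, 0.25),
        positional_quantile(ranked, 0.50),
        positional_quantile(ranked, 0.75),
        ranked[-1],
    )


data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]
summary = five_number_summary(data)
print(summary)  # (0, 2.0, 3.0, 5.0, 27)
```

Here Median − Q1 = 1 while Q3 − Median = 2, and the upper whisker (5 to 27) is far longer than the lower one (0 to 2), which is what makes the plot look right-skewed.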
Distribution Shape and Box-and-whisker plot
[Figure: three box-and-whisker plots, each marking Q1, Q2 and Q3, illustrating different distribution shapes]