Measures of Variability: Range
Range
Variance
Standard Deviation
Coefficient of Variation
Range
The range is a simple measure that tells you the spread of values in a data set. It has a simple definition:
• Found by subtracting the smallest value from the largest value in a data set
So if you have a set of data such as 4, 2, 5, 8, 12, 15, the range is the highest number (15) minus
the lowest number (2). In this case:
Range = 15-2 = 13
Range = largest value – smallest value
= $456,250 – $108,000
= $348,250
• Drawback: Range is based on only two of the observations and thus is highly influenced by
extreme values
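The range calculation and its drawback can be sketched in a few lines of Python. The data set is the one from the example above; the helper name `value_range` and the extra value 100 are assumptions added only for illustration.

```python
# Data set from the example above.
data = [4, 2, 5, 8, 12, 15]


def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


print(value_range(data))  # 15 - 2 = 13

# Hypothetical extreme value (not part of the original data) to show the
# drawback: a single unusual observation changes the range from 13 to 98.
with_extreme = data + [100]
print(value_range(with_extreme))  # 100 - 2 = 98
```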
Variance
• The deviations about the mean are squared while computing the variance
• Sample variance: \( s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} \)
• Population variance: \( \sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N} \)
Table 2.12: Computation of Deviations and Squared Deviations about the Mean for the Class Size Data
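As a rough sketch of how deviations and squared deviations about the mean lead to the variance, the snippet below works through the two formulas step by step. It does not use the class size data (which is not reproduced in these notes); it assumes the five observations from the coefficient of variation illustration in these notes instead, and all variable names are illustrative.

```python
# Five observations taken from the coefficient of variation illustration
# in these notes (an assumption; this is not the class size data).
data = [46, 54, 42, 46, 32]

n = len(data)
mean = sum(data) / n  # 44.0

# Deviation of each observation about the mean, and its square.
deviations = [x - mean for x in data]
squared_deviations = [d ** 2 for d in deviations]

print("x_i  deviation  squared deviation")
for x, d, sq in zip(data, deviations, squared_deviations):
    print(f"{x:>3}  {d:>9.1f}  {sq:>17.1f}")

sum_of_squares = sum(squared_deviations)  # 256.0

# Sample variance divides by n - 1.
sample_variance = sum_of_squares / (n - 1)  # 64.0

# Population variance divides by N (here the same five values are treated
# as the whole population, purely for comparison).
population_variance = sum_of_squares / n  # 51.2

print("sample variance:", sample_variance)
print("population variance:", population_variance)
```

The only difference between the two results is the divisor, which is why the sample variance is always the larger of the two for the same data.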
Standard Deviation
• For a sample, \( s = \sqrt{s^2} \)
Coefficient of Variation
• Coefficient of variation = \( \left( \dfrac{\text{Standard deviation}}{\text{Mean}} \times 100 \right)\% \)
• Expressed as a percentage
Illustration:
46 54 42 46 32
• Mean, 𝑥̅ = 44
• Standard deviation, s = 8
• Coefficient of variation = \( \left( \dfrac{8}{44} \times 100 \right)\% \approx 18.2\% \)
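The illustration can be checked with Python's standard `statistics` module. This is only a minimal sketch: `statistics.stdev` computes the sample standard deviation (divisor n - 1), and the variable names are illustrative.

```python
import statistics

# Data from the illustration above.
data = [46, 54, 42, 46, 32]

mean = statistics.mean(data)   # sample mean
s = statistics.stdev(data)     # sample standard deviation (divides by n - 1)

# Coefficient of variation: standard deviation relative to the mean,
# expressed as a percentage.
cv = s / mean * 100

print(f"mean = {mean}")
print(f"s = {s:.1f}")
print(f"coefficient of variation = {cv:.1f}%")
```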
Analyzing Distributions
Percentiles
Empirical Rule
Percentiles
• Approximately p percent of the observations have values less than the pth percentile
• Approximately (100 – p) percent of the observations have values greater than the pth
percentile
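• Compute k = (n + 1) × p, where n is the number of observations and p is written as a decimal (for example, 0.85 for the 85th percentile)
• Divide k into its integer component, i, and its decimal component, d
• If d = 0, the pth percentile is the value in position k of the data sorted in ascending order
• If d > 0, the percentile is between the values in positions i and i + 1 in the sorted
data; to find this percentile, we must interpolate between these two values:
• Compute t = d × (value in position i + 1 – value in position i)
• To find the pth percentile, add t to the value in position i of the sorted
data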
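The percentile procedure above can be sketched as a small Python function. Several details are assumptions made for this sketch rather than part of the notes: the function name `percentile`, passing p as a percentage (so the code divides by 100), returning the smallest or largest value when position i or i + 1 falls outside the data, and the twelve made-up values used to try it out.

```python
import math


def percentile(values, p):
    """Return the pth percentile (p given in percent, e.g. 85).

    Uses k = (n + 1) * p / 100, split into an integer part i and a
    decimal part d, with interpolation between positions i and i + 1.
    """
    data = sorted(values)          # ascending order
    n = len(data)
    k = (n + 1) * p / 100
    i = math.floor(k)              # integer component
    d = k - i                      # decimal component

    # Assumption: if the required position falls outside the data,
    # fall back to the smallest or largest observation.
    if i < 1:
        return data[0]
    if i >= n:
        return data[-1]

    lower = data[i - 1]            # value in position i (1-based)
    if d == 0:
        return lower
    upper = data[i]                # value in position i + 1
    t = d * (upper - lower)        # interpolation amount
    return lower + t


# Hypothetical data: twelve made-up values, not the home sales data.
sample = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
print(percentile(sample, 85))           # k = 11.05, so i = 11 and d is about 0.05
print(percentile([1, 2, 3, 4, 5], 50))  # k = 3, so the value in position 3
```

With the twelve made-up values, k = 11.05, so the result lies 5 percent of the way from the 11th value (110) to the 12th value (120).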
• Illustration
• To determine the 85th percentile for the home sales data in Table 2.9.
1. Compute k = (n + 1) × p = (12 + 1) × 0.85 = 11.05
2. Dividing 11.05 into the integer and decimal components gives us i = 11 and d = 0.05
3. Because d > 0, interpolate between the values in the 11th and 12th positions in the sorted data
Quartiles
z-score
• Helps to determine how far a particular value is from the mean relative to the data set’s
standard deviation
• Standardized value
• If \( x_1, x_2, \ldots, x_n \) is a sample of n observations, then
\( z_i = \dfrac{x_i - \bar{x}}{s} \)
• \( z_i \) = z-score for \( x_i \)
• \( \bar{x} \) = sample mean
• \( s \) = sample standard deviation
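A minimal sketch of the z-score calculation, assuming the five observations from the coefficient of variation illustration in these notes (mean 44, standard deviation 8). The function name `z_scores` is illustrative.

```python
import statistics


def z_scores(values):
    """Return the z-score of every observation in a sample."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)   # sample standard deviation
    return [(x - mean) / s for x in values]


# Observations from the coefficient of variation illustration.
data = [46, 54, 42, 46, 32]
scores = z_scores(data)

for x, z in zip(data, scores):
    print(f"x = {x}, z = {z:+.2f}")
```

For example, the value 32 lies 12 below the mean of 44, which is 1.5 standard deviations, so its z-score is -1.50.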
Identifying Outliers
• Any data value with a z-score less than –3 or greater than +3 is an outlier
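The z-score rule for outliers could be applied along the following lines. This is a sketch only: the function name `find_outliers`, the `threshold` parameter, and the made-up data set (twenty values of 10 and one value of 100) are assumptions for illustration.

```python
import statistics


def find_outliers(values, threshold=3.0):
    """Return the observations whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)   # sample standard deviation
    return [x for x in values if abs((x - mean) / s) > threshold]


# Hypothetical data: twenty typical values and one extreme value.
hypothetical = [10] * 20 + [100]
print(find_outliers(hypothetical))            # [100]

# The five-value illustration from these notes contains no outliers.
print(find_outliers([46, 54, 42, 46, 32]))    # []
```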
Box Plots
Figure 2.22: Box Plot for the Home Sales Data
Figure 2.23: Box Plots Comparing Home Sale Prices in Different Communities