Basic Statistical Descriptions of Data
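Sample mean: $\bar{x} = \frac{\sum x_i}{n}$ (sum of the values of the n observations divided by the number of observations in the sample)
Population mean: $\mu = \frac{\sum x_i}{N}$ (sum of the values of the N observations divided by the number of observations in the population)
2. Median: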
• The median of a data set is the value in the middle when the data items are
arranged in ascending order. Whenever a data set has extreme values, the
median is the preferred measure of central location.
• The median is the measure of location most often reported for annual
income and property value data. A few extremely large incomes or property
values can inflate the mean.
• For an odd number of observations:
7 observations = 26, 18, 27, 12, 14, 29, 19
Numbers in ascending order = 12, 14, 18, 19, 26, 27, 29
• The median is the middle value.
Median = 19
• For an even number of observations:
8 observations = 26, 18, 29, 12, 14, 27, 30, 19
Numbers in ascending order = 12, 14, 18, 19, 26, 27, 29, 30
• The median is the average of the middle two values.
Median = (19 + 26) / 2 = 22.5
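The odd and even cases above can be sketched in a few lines of Python. The function name `median` and the variable names are illustrative choices, not part of the original notes; the data are the two observation sets listed above.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [26, 18, 27, 12, 14, 29, 19]
even_data = [26, 18, 29, 12, 14, 27, 30, 19]

print(median(odd_data))   # 19
print(median(even_data))  # 22.5
```

Python's standard library also provides `statistics.median`, which handles both cases the same way.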
3. Mode:
• The mode of a data set is the value that occurs with greatest frequency. The
greatest frequency can occur at two or more different values. If the data have
exactly two modes, the data are bimodal. If the data have more than two
modes, the data are multimodal.
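As a minimal sketch of this definition, the snippet below counts frequencies and returns every value that ties for the greatest frequency. The helper names `modes` and `describe_modality` and the sample lists are hypothetical. Note that under this literal reading, a data set in which every value occurs once would report every value as a mode, a case the notes do not address.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the greatest frequency, sorted."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def describe_modality(values):
    """Label the data by the number of modes found."""
    k = len(modes(values))
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal" if k > 2 else "no data"


one_mode = [1, 2, 2, 3]               # hypothetical data
two_modes = [1, 2, 2, 3, 3, 4]        # hypothetical data
three_modes = [5, 5, 6, 6, 7, 7, 8]   # hypothetical data

print(modes(two_modes), describe_modality(two_modes))      # [2, 3] bimodal
print(modes(three_modes), describe_modality(three_modes))  # [5, 6, 7] multimodal
```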
• Weighted mean: Sometimes each value in a set is associated with a
weight. The weights reflect the significance, importance, or occurrence
frequency attached to their respective values.
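A weighted mean is conventionally computed as the sum of each value times its weight, divided by the sum of the weights. The sketch below assumes that convention; the function name `weighted_mean` and the example scores and weights are made up for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


scores = [80, 90, 70]   # hypothetical values
weights = [2, 1, 1]     # hypothetical weights: the first score counts double

print(weighted_mean(scores, weights))  # 80.0
```

With all weights equal, the result reduces to the ordinary arithmetic mean.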
• Trimmed mean: A major problem with the mean is its sensitivity to
extreme (e.g., outlier) values. Even a small number of extreme values can
corrupt the mean. The trimmed mean is the mean obtained after cutting off
values at the high and low extremes.
• For example, we can sort the values and remove the top and bottom 2%
before computing the mean. We should avoid trimming too large a portion
(such as 20%) at both ends, as this can result in the loss of valuable
information.
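The trimming procedure can be sketched as follows. This is one possible reading: `proportion` is the fraction cut from each end, and the number of values dropped per end is rounded down. The function name and the sample data (with one deliberately extreme value) are hypothetical.

```python
import math


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the sorted values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    if len(values) == 0:
        raise ValueError("trimmed mean of empty data is undefined")
    ordered = sorted(values)
    k = math.floor(len(ordered) * proportion)  # values to drop per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical data: 500 is an extreme value that inflates the plain mean
data = [12, 14, 18, 19, 26, 27, 29, 30, 31, 500]

print(sum(data) / len(data))     # 70.6  (plain mean)
print(trimmed_mean(data, 0.10))  # 24.25 (drops 12 and 500)
```

With a small sample, a 2% cut rounds down to zero values per end, so the example uses 10% purely to make the effect visible.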
• A holistic measure is a measure that must be computed on the entire data set
as a whole. It cannot be computed by partitioning the given data into subsets
and merging the values obtained for the measure in each subset.
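To make the distinction concrete, the sketch below contrasts the mean with the median, which is commonly cited as a holistic measure. The mean can be rebuilt from per-subset sums and counts, while merging per-subset medians does not, in general, give the overall median. The partitioning shown is an arbitrary, hypothetical split of the eight observations used earlier.

```python
from statistics import median

# Hypothetical partition of the data into two subsets
part_a = [12, 14, 18]
part_b = [19, 26, 27, 29, 30]
full = part_a + part_b

# Mean: merging per-subset sums and counts reproduces the overall mean
merged_mean = (sum(part_a) + sum(part_b)) / (len(part_a) + len(part_b))
overall_mean = sum(full) / len(full)
print(merged_mean, overall_mean)          # 21.875 21.875

# Median: merging per-subset medians does not reproduce the overall median
median_of_medians = median([median(part_a), median(part_b)])
overall_median = median(full)
print(median_of_medians, overall_median)  # 20.5 22.5
```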
Standard Deviation:
• The standard deviation of a data set is the positive square root of the
variance. It is measured in the same units as the data, making it
more easily interpreted than the variance.
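• The standard deviation is computed as follows:
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
where $s^2$ is the sample variance and $\sigma^2$ is the population variance.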
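A minimal Python sketch of this computation is given here. It assumes the usual definitions of variance, which these notes do not spell out: squared deviations from the mean are summed and divided by n - 1 for a sample or by N for a population. The function names and the example data are hypothetical.

```python
import math


def variance(values, sample=True):
    """Variance with divisor n - 1 (sample) or n (population)."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


def standard_deviation(values, sample=True):
    """Positive square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

print(variance(data, sample=False))            # 4.0
print(standard_deviation(data, sample=False))  # 2.0
print(standard_deviation(data, sample=True))   # sqrt(32 / 7), about 2.14
```

The standard library offers the same calculations as `statistics.stdev` (sample) and `statistics.pstdev` (population).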
Difference between Standard Deviation and Variance