Tutorial 15
Refer to the topic above: measures of central tendency and dispersion are statistical
methods that make data easily digestible. Form groups of 5:
1. Justify the choice of using mean, median and mode as measures of central
tendency.
Mean
-the average of the numbers: a calculated “central” value of a set of numbers.
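Ungrouped data:
$\bar{x} = \frac{\sum x}{n}$
Grouped data:
$\bar{x} = \frac{\sum fx}{N}$, where $f$ is the frequency of each value $x$ and $N = \sum f$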
Advantages:
-It is simple to understand and easy to calculate.
-It is rigidly defined.
-It is suitable for further algebraic treatment.
-It is least affected by fluctuations of sampling.
-It takes into account all the values in the series.
Disadvantages:
-It is highly affected by the presence of a few abnormally high or abnormally low scores.
-If even a single item is missing, its value becomes inaccurate.
-It cannot be determined by inspection.
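As a minimal sketch of the two mean formulas, the Python below computes the mean of raw (ungrouped) observations as sum(x)/n and the mean of a frequency table (grouped data) as sum(fx)/N. The function names and the sample scores are hypothetical, chosen only for illustration. The last line also shows how one extreme score pulls the mean away from the bulk of the data.

```python
def mean_ungrouped(values):
    """Mean of raw observations: sum(x) / n."""
    return sum(values) / len(values)


def mean_grouped(freq_table):
    """Mean from (value, frequency) pairs: sum(f * x) / N, with N = sum(f)."""
    total_frequency = sum(f for _, f in freq_table)
    return sum(x * f for x, f in freq_table) / total_frequency


# Hypothetical data: the same five scores, raw and as a frequency table.
scores = [4, 5, 5, 6, 10]
table = [(4, 1), (5, 2), (6, 1), (10, 1)]

print(mean_ungrouped(scores))          # 6.0
print(mean_grouped(table))             # 6.0
print(mean_ungrouped(scores + [100]))  # one abnormally high score inflates the mean
```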
Median
-The value that occupies the middle position when all the observations are arranged in
ascending or descending order.
-Fifty percent of observations in a distribution have scores at or below the median. Hence the
median is the 50th percentile.
-If the number of observations n is odd, the ((n + 1)/2)th observation (in the ordered set) is the
median.
-If the number of observations n is even, the median is the mean of the (n/2)th and
(n/2 + 1)th observations.
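The position rule above can be sketched in Python as follows. The function name median_by_position and the sample lists are assumptions made for this illustration; positions in the notes count from 1, while Python indexes from 0, which is why the indices are shifted by one.

```python
def median_by_position(values):
    """Median using the odd/even position rule on the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1)/2)th observation, which is index n // 2 from 0.
        return ordered[mid]
    # Even n: mean of the (n/2)th and (n/2 + 1)th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_position([7, 1, 5]))     # 5
print(median_by_position([7, 1, 5, 3]))  # 4.0
```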
Advantages:
Disadvantages:
-It does not take into account the precise value of each observation and hence does not use
all information available in the data.
-Unlike the mean, the median is not amenable to further mathematical calculation and hence is
not used in many statistical tests.
-If we pool the observations of two groups, the median of the pooled group cannot be expressed
in terms of the individual medians of the two groups.
Mode
-the value that occurs most frequently in the data.
-Some data sets do not have a mode because each value occurs only once.
-On the other hand, some data sets can have more than one mode.
-This happens when the data set has two or more values of equal frequency which is greater
than that of any other value.
-In a bimodal distribution, the taller peak is called the major mode and the shorter one is the
minor mode.
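The cases described above (no mode, one mode, several modes) can be sketched in Python with collections.Counter. The function name modes is hypothetical, and treating "every value occurs once" as "no mode" follows the convention stated in these notes rather than any library default. The last call uses nominal data, for which mean and median are not defined.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] when every value occurs once."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # each value occurs only once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 3]))               # []
print(modes([1, 2, 2, 3, 3]))         # [2, 3]
print(modes(["red", "blue", "red"]))  # ['red']
```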
Advantages:
-It is the only measure of central tendency that can be used for data measured in a nominal
scale.
Disadvantages:
-It is rarely used in further statistical analysis because it is not algebraically defined, and
it fluctuates more from sample to sample when the sample size is small.
2. Justify the choice of using range, midrange, interquartile range and standard
deviation as suitable measures of dispersion.
Range
-The simplest measure of dispersion, defined as the difference between the
largest and the smallest item in a given distribution. The formula for the range is:
Range = Maximum value - minimum value
Advantages:
-Very easy to calculate.
Disadvantages:
-Very sensitive to outliers.
-Does not use all the observations in the data, so it gives no idea of how the values are
distributed between the two extremes.
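A small Python sketch of the range formula, using hypothetical height data, shows why the range is so sensitive to outliers: adding one extreme value changes the result sharply, while the other observations play no part in the calculation.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


heights = [150, 152, 155, 158, 160]  # hypothetical data

print(value_range(heights))          # 10
print(value_range(heights + [210]))  # 60
```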
Midrange
-Based only on the least value and the greatest value of the data set.
-The arithmetic mean of the minimum and maximum values of the data set.
Formula:
Midrange = (Maximum value + minimum value) / 2
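As a minimal sketch of the definition of the midrange as the arithmetic mean of the minimum and maximum values, the Python below uses a hypothetical function name and data set.

```python
def midrange(values):
    """Arithmetic mean of the smallest and largest observations."""
    return (min(values) + max(values)) / 2


print(midrange([150, 152, 155, 158, 160]))  # 155.0
```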
Advantages:
Disadvantages:
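Interquartile range
-The difference between the third quartile (Q3) and the first quartile (Q1), i.e. the spread of
the middle 50% of the observations.
Formula:
IQR = Q3 - Q1
Advantages: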
-If the interquartile range is small, the middle observations are close to each other,
and vice versa.
-Suitable for open-ended distributions.
-Very suitable for highly skewed distributions, as it is not affected by extreme values.
Disadvantages:
-Not amenable to mathematical manipulation.
-Varies widely from sample to sample, even when the samples come from the same population.
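The claim that the interquartile range is not affected by extreme values can be illustrated with a short Python sketch. It assumes Python 3.8 or later for statistics.quantiles, whose default "exclusive" method is only one of several quartile conventions, so a textbook method may give slightly different quartiles. The function name and the data are hypothetical.

```python
import statistics


def interquartile_range(values):
    """IQR = Q3 - Q1, using the default quartile method of statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


data = list(range(1, 12))          # 1, 2, ..., 11
with_outlier = data[:-1] + [1000]  # replace the largest value with an extreme one

print(interquartile_range(data))          # 6.0
print(interquartile_range(with_outlier))  # 6.0, unchanged by the extreme value
print(max(with_outlier) - min(with_outlier))  # 999, the range is not so robust
```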
Standard deviation, σ
-The positive square root of the arithmetic mean of the squares of the deviations of the given
values from their arithmetic mean.
-The sum of the deviations from the mean is not a suitable measure of dispersion for a set of
data because it is always 0.
-There are two formulas for standard deviation which are standard deviation for population
and sample.
If the set of data x₁, x₂, x₃, ... is the population with mean $\mu$, then the standard deviation
of the population is $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$
If the set of data x₁, x₂, x₃, ... is a sample with mean $\bar{x}$, then the standard deviation of
the sample is $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
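The two formulas can be sketched in Python as below. The function names and the data set are hypothetical. The sketch also checks that the plain sum of the deviations is 0, which is why the deviations are squared, and compares the sample result with the standard library's statistics.stdev.

```python
import math
import statistics


def population_sd(values):
    """sigma = sqrt(sum((x - mu)^2) / n)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


def sample_sd(values):
    """s = sqrt(sum((x - x_bar)^2) / (n - 1))."""
    x_bar = sum(values) / len(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
mean = sum(data) / len(data)

print(sum(x - mean for x in data))  # 0.0, deviations cancel out
print(population_sd(data))          # 2.0
print(sample_sd(data))              # larger than 2.0 because of the n - 1 divisor
print(math.isclose(sample_sd(data), statistics.stdev(data)))  # True
```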
Advantages:
-Can detect skewness of the graph
-It is rigidly defined
-Rarely affected by sampling fluctuations
-Suitable for algebraic operations
Disadvantages:
-It is more complex to compute than the other measures
-Highly influenced by extreme values.