Div-A - Unit 2 - DS
Rupali R. Patil
Measure of Central Tendency
• The central tendency is a statistical
measure that represents an entire
distribution or dataset with a single value. It
aims to provide an accurate description of all
the data in the distribution.
• Measures of Central Tendency
• The central tendency of a dataset can be
found using three important measures,
namely mean, median and mode.
• Mean
• The mean represents the average value of the dataset. It is calculated
as the sum of all the values in the dataset divided by the number of values.
Unless stated otherwise, "mean" refers to the arithmetic mean. Some other
means used to find the central tendency are as follows:
• Geometric Mean
• Harmonic Mean
• Weighted Mean
• If all the values in the dataset are the same, then the arithmetic,
geometric and harmonic means are all equal. If there is variability in
the data, the three means differ. The arithmetic mean is straightforward
to calculate. The formula is given by:
• Mean = (x1 + x2 + … + xn)/n
• The histogram given below shows the mean value of symmetric
continuous data and of skewed continuous data.
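The claim about the three means can be checked with a short Python sketch. The sample lists `same` and `varied` are made up for illustration, and the geometric and harmonic means are taken from Python's standard `statistics` module rather than from these slides.

```python
import math
import statistics


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


# Illustrative data (not from the slides).
same = [4, 4, 4, 4]   # no variability
varied = [2, 4, 8]    # positive values with variability

# (arithmetic, geometric, harmonic) for each dataset.
means_same = (
    arithmetic_mean(same),
    statistics.geometric_mean(same),
    statistics.harmonic_mean(same),
)
means_varied = (
    arithmetic_mean(varied),
    statistics.geometric_mean(varied),
    statistics.harmonic_mean(varied),
)

print("identical values:", means_same)
print("varied values:   ", means_varied)
```

For the varied data the three results differ, with the harmonic mean smallest and the arithmetic mean largest.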
• Median Formulas:
• Let “n” be the number of observations, arranged in ascending order.
• If “n” is odd,
• Median = [(n +1)/2]th term
• If “n” is even,
• Median = [(n/2)th term + ((n/2) + 1)th term]/2
• 1. How to find the median?
• Solution:
• The steps to find the median are as follows:
• Step 1: Arrange the given data in ascending order.
• Step 2: Count the number of observations (n) to
check whether it is odd or even.
• Step 3: If the number of observations (n) is odd,
use the formula [(n +1)/2]th term to find the
median.
• Step 4: If the number of observations (n) is even,
use the formula [(n/2)th term + ((n/2) +
1)th term]/2 to find the median value.
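The four steps above can be sketched in Python as follows. The function name `median` and the two sample lists are illustrative choices, not part of the slides. Note that the formulas count terms from 1, while Python lists count from 0.

```python
def median(values):
    """Median following the steps above: sort, count, then pick the middle."""
    data = sorted(values)              # Step 1: ascending order
    n = len(data)                      # Step 2: number of observations
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    if n % 2 == 1:
        # Step 3: [(n + 1)/2]th term (1-based), so index (n + 1)/2 - 1
        return data[(n + 1) // 2 - 1]
    # Step 4: average of the (n/2)th and ((n/2) + 1)th terms (1-based)
    return (data[n // 2 - 1] + data[n // 2]) / 2


odd_example = median([3, 1, 2])       # 3 observations
even_example = median([4, 1, 3, 2])   # 4 observations
print(odd_example, even_example)
```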
• 2. The runs scored by 11 players in the cricket
match are as follows:
• 7, 16, 121, 51, 101, 81, 1, 16, 9, 11, 16
• Find the median of the data.
• Solution:
• Given data: 7, 16, 121, 51, 101, 81, 1, 16, 9, 11, 16.
• Now, arrange the data in ascending order, we get
• 1, 7, 9, 11, 16, 16, 16, 51, 81, 101, 121.
• Here, the number of observations is 11, which is
odd.
• Thus, median = [(11 + 1)/2]th term = 6th term
• Hence, the median of the given data is 16.
• 3. Find the median for the data 8, 5, 7, 10, 15, 21.
• Solution:
• Arranging the given data in ascending order, we
get,
• 5, 7, 8, 10, 15, 21.
• Here, the number of observations is 6, which is
even.
• Hence, Median = [(n/2)th term + ((n/2) +
1)th term]/2
• Median = [(6/2)th term + ((6/2) + 1)th term]/2
• Median = (3rd term + 4th term)/2
• Here, 3rd term = 8 and 4th term = 10
• Therefore, median = (8+10)/2 = 18/2 = 9
• Hence, the median of the given data is 9.
• 4. What is the relation between mean,
median and mode?
• Solution:
• For a moderately skewed distribution, the
empirical relation between mean, median and
mode is (Mean – Median) = 1/3 (Mean –
Mode).
• This can also be written as follows:
• 3 (Mean – Median) = (Mean – Mode)
• 3 Mean – 3 Median = Mean – Mode
• 3 Median = 3 Mean – Mean + Mode
• 3 Median = 2 Mean + Mode
• 5. For a moderately skewed distribution, mean =
12 and mode = 6. Using these values, find the
value of the median.
• Solution:
• Given that, mean = 12 and mode = 6
• We know that, 3 Median = 2 Mean + Mode
• Now, substitute the values in the formula, we get
• 3 Median = 2(12) + 6
• 3 Median = 24 + 6
• 3 Median = 30
• Median = 30/3 = 10.
• Hence, the value of median is 10.
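The empirical relation 3 Median = 2 Mean + Mode used above can be wrapped in two small helper functions. This is only a sketch: the function names are hypothetical, and the relation is an approximation for moderately skewed distributions, not an exact identity.

```python
def median_from_mean_and_mode(mean, mode):
    """Estimate the median from 3 Median = 2 Mean + Mode."""
    return (2 * mean + mode) / 3


def mode_from_mean_and_median(mean, median):
    """Rearranged form of the same relation: Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


# Values from the worked example above: mean = 12, mode = 6.
estimated_median = median_from_mean_and_mode(12, 6)
print(estimated_median)
```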
• What is the median of two numbers?
• Solution:
• For a set of two numbers, the value of the
median will be the same as the value of the
mean.
• For example, 2 and 10 are the two numbers.
• Here, median = (2+10)/2 = 12/2 = 6
• Also, mean = (2+10)/2 = 12/2 = 6.
• Hence, the median of two numbers is equal to
the mean.
• Here 12 is the middle or
median number that has 6
values above it and 6 values
below it.
• Now, consider another
example with an even number
of observations that are
arranged in descending order
– 40, 38, 35, 33, 32, 30, 29,
27, 26, 24, 23, 22, 19, and 17
• When you look at the given dataset, the two
middle values obtained are 27 and 29.
• Now, find out the mean value for these two
numbers.
• i.e.,(27+29)/2 =28
• Therefore, the median for the given data
distribution is 28.
Homework
• Find the median of the first 5 whole
numbers.
• What is the median of 4, 2, 7, 3, 10, 9, 13?
• The marks scored by a student in different
subjects are 45, 91, 62, 71, 55. Find the
median of the given data using the median
formula.
• The weights of 8 students in kg are 54, 49, 51,
58, 61, 52, 54, 60. Find the median weight.
• Mode
• The mode represents the most
frequently occurring value in the
dataset. A dataset may contain
more than one mode, and in some
cases it does not contain any
mode at all.
• Consider the given dataset 5, 4, 2,
3, 2, 1, 5, 4, 5
• The mode represents the most
common value. The value 5 occurs
three times, more often than any
other value, so the mode of the
given dataset is 5.
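A minimal Python sketch of finding the mode by counting frequencies is given below. The function name `modes` is illustrative. It assumes the convention that a dataset in which every value occurs exactly once has no mode; other textbooks may treat that case differently.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency, in ascending order.

    Assumed convention: if no value repeats, the dataset has no mode
    and an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


single_mode = modes([5, 4, 2, 3, 2, 1, 5, 4, 5])   # dataset from the slide above
two_modes = modes([1, 1, 2, 2, 3])                 # illustrative bimodal data
no_mode = modes([1, 2, 3])                         # illustrative data with no repeats
print(single_mode, two_modes, no_mode)
```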
Example 5.21
The following are the marks scored by 20 students in
the class.
Find the mode 90, 70, 50, 30, 40, 86, 65, 73, 68, 90, 90,
10, 73, 25, 35, 88, 67, 80, 74, 46
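Solution:
The mark 90 occurs three times, more often than any other mark
(73 occurs twice and every other mark occurs once).
Therefore 90 is the mode.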
Solution:
Here, 7 is the maximum frequency, hence the
value of x corresponding to 7 is 8.
Therefore 8 is the mode.
Measures of Dispersion
• Measures of dispersion describe how
scattered the data are. Each measure is a
number that summarises a different aspect of
the spread of the data.
• Range: It is defined as the difference between the
largest and the smallest value in the distribution
• Mean Deviation: It is the arithmetic mean of the
absolute differences between the values and their mean.
• Standard Deviation: It is the square root of the
arithmetic average of the squares of the deviations
measured from the mean.
• Variance: It is defined as the average of the squared
deviations from the mean of the given data set.
• Quartile Deviation: It is defined as half of the
difference between the third quartile and the first
quartile in a given data set.
• Interquartile Range: The difference between the
upper (Q3) and lower (Q1) quartiles is called the
Interquartile Range. Its formula is given as Q3 – Q1.
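The definitions above can be gathered into one Python sketch. The function name `dispersion_summary` and the sample data are illustrative. Quartiles can be computed under several conventions; this sketch assumes the "inclusive" method of `statistics.quantiles`, which may differ from the convention used in class.

```python
import statistics


def dispersion_summary(values):
    """Compute the measures of dispersion defined above (population versions)."""
    data = sorted(values)
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    # Assumption: quartiles by the "inclusive" method; other conventions exist.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "range": data[-1] - data[0],
        "mean_deviation": sum(abs(x - mean) for x in data) / n,
        "variance": variance,
        "standard_deviation": variance ** 0.5,
        "interquartile_range": q3 - q1,
        "quartile_deviation": (q3 - q1) / 2,
    }


# Illustrative data (not from the slides).
summary = dispersion_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
for name, value in summary.items():
    print(name, value)
```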
• 1. Find the mean deviation from the mean for the following data
set.
• 57, 64, 43, 67, 49, 59, 44, 47, 61, 59
• Solution:
• Given,
• 57, 64, 43, 67, 49, 59, 44, 47, 61, 59
• Mean = (57 + 64 + 43 + 67 + 49 + 59 + 44 + 47 + 61 + 59)/10 =
550/10 = 55
• |xi – x̄| = |57 – 55|, |64 – 55|, |43 – 55|, |67 – 55|, |49 – 55|,
|59 – 55|, |44 – 55|, |47 – 55|, |61 – 55|, |59 – 55|
• = 2, 9, 12, 12, 6, 4, 11, 8, 6, 4
• Mean deviation = ∑|xi – x̄|/ n
• = (2 + 9 + 12 + 12 + 6 + 4 + 11 + 8 + 6 + 4)/10
• = 74/10
• = 7.4
• Therefore, the mean deviation of the given data is 7.4.
• 2. Calculate the mean deviation about the mean of the
set of the first 10 natural numbers.
• Solution:
• First 10 natural numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
• Mean = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10)/10
• = 55/10
• = 5.5
• |xi – x̄| = |1 – 5.5|, |2 – 5.5|, |3 – 5.5|, |4 – 5.5|, |5 –
5.5|, |6 – 5.5|, |7 – 5.5|, |8 – 5.5|, |9 – 5.5|, |10 –
5.5|
• = 4.5, 3.5, 2.5, 1.5, 0.5, 0.5, 1.5, 2.5, 3.5, 4.5
• Mean deviation = ∑|xi – x̄|/ n
• = (4.5 + 3.5 + 2.5 + 1.5 + 0.5 + 0.5 + 1.5 + 2.5 + 3.5 +
4.5)/10
• = 25/10
• = 2.5
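The two worked examples above follow the same recipe, which can be sketched in Python. The function name `mean_deviation_about_mean` is an illustrative choice.

```python
def mean_deviation_about_mean(values):
    """Mean deviation = sum of |xi - mean| divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


# Data from the two worked examples above.
example_1 = mean_deviation_about_mean([57, 64, 43, 67, 49, 59, 44, 47, 61, 59])
example_2 = mean_deviation_about_mean(list(range(1, 11)))  # first 10 natural numbers
print(example_1, example_2)
```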
3. Find the mean deviation about the median for the following data.
6, 15, 4, 10, 12, 11, 5, 3, 16
Solution:
Given,
6, 15, 4, 10, 12, 11, 5, 3, 16
Ascending order of the given data is: 3, 4, 5, 6, 10, 11, 12, 15, 16
Number of data values = 9
Median = (n + 1)/2 th observation
= (9 + 1)/2
= 5th observation
Median (M) = 10
The absolute values of the respective deviations from the median,
i.e., |xi − M| are:
|3 – 10|, |4 – 10|, |5 – 10|, |6 – 10|, |10 – 10|, |11 – 10|, |12 –
10|, |15 – 10|, |16 – 10|
= 7, 6, 5, 4, 0, 1, 2, 5, 6
Mean deviation = ∑|xi – M|/ n
= (7 + 6 + 5 + 4 + 0 + 1 + 2 + 5 + 6)/9
= 36/9
=4
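The same calculation about the median can be sketched in Python. The function name is illustrative, and `statistics.median` from the standard library is used for the sorting and middle-value step.

```python
import statistics


def mean_deviation_about_median(values):
    """Mean deviation = sum of |xi - M| divided by n, where M is the median."""
    m = statistics.median(values)
    return sum(abs(x - m) for x in values) / len(values)


# Data from the worked example above.
result = mean_deviation_about_median([6, 15, 4, 10, 12, 11, 5, 3, 16])
print(result)
```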
4. Calculate the mean deviation from the mean for the following distribution.
x 1 3 4 5 6 7
f 4 9 16 14 11 6
Solution:
xi 1 3 4 5 6 7
fi 4 9 16 14 11 6
fixi 4 27 64 70 66 42
Mean (x̄) = ∑fixi/∑fi
= (4 + 27 + 64 + 70 + 66 + 42)/ (4 + 9 + 16 + 14 + 11 + 6)
= 273/60
= 4.55
|xi – x̄| 3.55 1.55 0.55 0.45 1.45 2.45
fi|xi – x̄| 14.2 13.95 8.8 6.3 15.95 14.7
Mean deviation = ∑fi |xi – x̄|/ N
= (14.2 + 13.95 + 8.8 + 6.3 + 15.95 + 14.7)/60
= 73.9/60
= 1.2317 (approximately)
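For a frequency distribution, each deviation is weighted by its frequency. A minimal Python sketch using the x and f rows of the table for this problem is shown below. The function name `grouped_mean_deviation` is illustrative.

```python
def grouped_mean_deviation(xs, fs):
    """Return (mean, mean deviation about the mean) for values xs with frequencies fs."""
    total = sum(fs)                                        # N = sum of fi
    mean = sum(f * x for x, f in zip(xs, fs)) / total      # sum of fi*xi / N
    deviation = sum(f * abs(x - mean) for x, f in zip(xs, fs)) / total
    return mean, deviation


xs = [1, 3, 4, 5, 6, 7]
fs = [4, 9, 16, 14, 11, 6]
mean, deviation = grouped_mean_deviation(xs, fs)
print(round(mean, 4), round(deviation, 4))
```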
Marks 20 29 28 33 42 38 43 25
Number of students 6 28 24 15 2 4 1 20
Marks (xi) 20 25 28 29 33 38 42 43
Age 5 10 15 20 25 30
Shoe Size 7 8 9 10
Frequency 5 10 7 13
Frequency 19 23 31 26
Variance
• Standard Deviation
• Standard deviation measures how far the values in a
given set of data vary from the mean of the data.
Thus, we define standard deviation as
the “spread of the statistical data from the mean
or average position”. We denote the standard
deviation of the data using the symbol σ.
• We can also define the standard deviation as the
square root of the variance.
• Properties of Standard Deviation
• Various properties of the standard deviation of a group
of data are,
• Standard Deviation is the square root of the variance
of the given data set. It is also called root mean square
deviation.
• Standard Deviation is a non-negative quantity, i.e. it
is always either positive or zero.
• If all the values in a data set are similar then Standard
Deviation has a value close to zero. Whereas if the
values in a data set are very different from each other
then standard deviation has a high positive value.
• Formula for Population Standard Deviation
• The mathematical formula to find the
standard deviation of the given data is
σ = √(∑(xi – μ)²/N), where μ is the mean of
the data and N is the number of values.
Practice Numerical:
• 1. Calculate the standard deviation of the
following values:5, 10, 25, 30, 50.
• 2. The numbers of televisions sold on each day of
a week are 13, 8, 4, 9, 7, 12, 10.
Find the standard deviation.
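Answers to the practice numericals can be checked with a short Python sketch of the population standard deviation (square root of the average squared deviation from the mean). The function name `population_std` is illustrative. The standard library function `statistics.pstdev` computes the same quantity and is used here only as a cross-check.

```python
import math
import statistics


def population_std(values):
    """Square root of the average of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance)


practice_1 = [5, 10, 25, 30, 50]
practice_2 = [13, 8, 4, 9, 7, 12, 10]

for data in (practice_1, practice_2):
    # The hand-written function and the library function should agree.
    print(round(population_std(data), 4), round(statistics.pstdev(data), 4))
```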
Relation between Standard Deviation
and Variance
• Variance and standard deviation are the most
common measures of dispersion of a given set of
data. They are used to find the deviation of the
values from their mean value, or the spread of all
the values of the data set.
• Variance is the average of the squared deviations
of the values of a given data set from the mean
value.
• Standard deviation is the square root of the
variance. It expresses the spread of the values
about the mean in the same units as the data.
• What is Range?
• The range of the data is given as the
difference between the maximum and the
minimum values of the observations in the
data. For example, let’s say we have data on
the number of customers walking in the store
in a week.
• 10, 14, 8, 10, 15, 7
• Minimum value in data = 7
• Maximum Value in the data = 15
• Range = Maximum Value in the data – Minimum
value in the data
• = 15 – 7
• =8
• Now we can say that the range of the data is 8.
This gives us an idea about the spread of the data
but doesn’t tell how the data is distributed.
Question 1: Find the range of the following data:
-4, 5, -10, 6, 9
Solution:
For calculating the range of a data, we need to find out the
maximum and the minimum of the data:
Max = 9
Min = -10
Range = Max – Min
= 9 -(-10)
= 19
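Both range calculations above can be reproduced with a very small Python sketch. The function name `data_range` is illustrative.

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


customers = data_range([10, 14, 8, 10, 15, 7])   # store example above
question_1 = data_range([-4, 5, -10, 6, 9])      # Question 1 above
print(customers, question_1)
```

Subtracting a negative minimum increases the range, which is why the second result is larger than the maximum value itself.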
Question 2: Find the mean and the
median of the following data:
-4, 5, -10, 6, 9
Solution:
Mean of the data = (−4 + 5 − 10 + 6 + 9)/5
= 6/5
= 1.2
The median is the middle element of
the data once it is arranged in ascending
order: -10, -4, 5, 6, 9.
Median = 5.
Coefficients of Variation
• As Standard Deviation is an absolute measure of dispersion,
one cannot use it for comparing the variability of two or
more series when they are expressed in different units.
Therefore, in order to compare the variability of two or
more series with different units it is essential to determine
the relative measure of Standard Deviation. Two of the
relative measures of Standard Deviation are Coefficient of
Standard Deviation and Coefficient of Variation.
• Coefficient of Variation is a relative measure introduced by
Karl Pearson (also known as Karl Pearson’s Coefficient of
Variation) through which two or more groups of similar data
are compared with respect to stability or homogeneity or
consistency. It is the most appropriate measure and
indicates the relationship between the standard deviation
and the arithmetic mean of the given distributions/series.
• Example 1:
• The number of points scored by two teams in
a hockey match is given below. With the help
of Coefficient of Variation, determine which
team is more consistent.
• Example 2:
• Coefficient of Variation and Standard
Deviation of two series X and Y are 55.43%
and 48.86%, and 25.5 and 24.43, respectively.
Find the means of series X and Y.
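Since the coefficient of variation is the standard deviation divided by the mean, multiplied by 100, the mean in Example 2 can be recovered by rearranging that formula. A minimal Python sketch follows. The function name `mean_from_cv` is illustrative.

```python
def mean_from_cv(std_dev, cv_percent):
    """Rearrange CV = (sigma / mean) * 100 to get mean = (sigma / CV) * 100."""
    return std_dev / cv_percent * 100


# Values from Example 2 above.
mean_x = mean_from_cv(25.5, 55.43)
mean_y = mean_from_cv(24.43, 48.86)
print(round(mean_x, 2), round(mean_y, 2))
```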
Example: Two plants C and D of a factory show the following results about the number
of workers and the wages paid to them.
Plant C D
No. of workers 5000 6000
Average wages 2500 2500
Standard deviation 9 10
Using the coefficient of variation formula, find which plant, C or D, has greater
variability in individual wages.
Solution:
To Find: Which plant has greater variability.
For this, we need to find the coefficient of variation. The plant that has a
higher coefficient of variation will have greater variability.
Coefficient of variation for plant C.
Using coefficient of variation formula,
CV = (σ/μ) × 100, μ≠0
CV = (9/2500) × 100
CV = 0.36%
Now, CV for plant D
CV = (σ/μ) × 100
CV = (10/2500) × 100
CV = 0.4%
Plant C has CV = 0.36% and plant D has CV = 0.4%
Answer: Hence plant D has greater variability in individual wages.
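The comparison above can be sketched in Python using the formula CV = (σ/μ) × 100. The function name `coefficient_of_variation` is illustrative, and the mean of 2500 for both plants is taken from the solution above.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (sigma / mu) * 100, defined only when the mean is non-zero."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return std_dev / mean * 100


cv_c = coefficient_of_variation(9, 2500)    # plant C
cv_d = coefficient_of_variation(10, 2500)   # plant D

# The plant with the higher CV has the greater relative variability.
more_variable = "C" if cv_c > cv_d else "D"
print(cv_c, cv_d, more_variable)
```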
Coefficient of Variation and Standard
Deviation