
Div-A - Unit 2 - DS

The document discusses descriptive measures in statistics, focusing on measures of central tendency, including mean, median, and mode, along with their calculations and examples. It also covers measures of dispersion such as range, mean deviation, standard deviation, variance, quartile deviation, and interquartile range, providing formulas and examples for each. Additionally, it includes homework exercises for practice.

Uploaded by

darshanakadu786
Copyright © All Rights Reserved

Descriptive Measures

Rupali R. Patil
Measure of Central Tendency
• A measure of central tendency is a single value that represents an entire distribution or dataset. It aims to describe the whole distribution accurately by one typical value.
• Measures of Central Tendency
• The central tendency of a dataset can be found using three important measures: the mean, the median and the mode.
• Mean
• The mean represents the average value of the dataset. It is calculated as the sum of all the values in the dataset divided by the number of values. Unless stated otherwise, the mean refers to the arithmetic mean. Other means used to describe central tendency are:
• Geometric Mean
• Harmonic Mean
• Weighted Mean
• If all the values in the dataset are the same, the arithmetic, geometric and harmonic means are all equal. If the values vary, these means differ; for positive data, arithmetic mean ≥ geometric mean ≥ harmonic mean. The arithmetic mean is straightforward to calculate:
• Mean (x̄) = (x1 + x2 + … + xn)/n
• [Figure: histograms showing the mean of symmetric continuous data and of skewed continuous data]
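The means listed above can be compared with a short Python sketch. The function names are hypothetical helpers written for these notes, not a library API, and the geometric and harmonic means assume every value is positive.

```python
import math


def arithmetic_mean(xs):
    # Sum of the values divided by the number of values
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # n-th root of the product, computed through logarithms (values must be > 0)
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    # Number of values divided by the sum of their reciprocals (values must be > 0)
    return len(xs) / sum(1 / x for x in xs)


def weighted_mean(xs, ws):
    # Each value contributes in proportion to its weight
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)


data = [2, 8]
print(arithmetic_mean(data))                   # 5.0
print(round(geometric_mean(data), 6))          # 4.0
print(harmonic_mean(data))                     # 3.2
print(weighted_mean([70, 80, 90], [1, 2, 1]))  # 80.0
```

For [2, 8] the three means differ (5 ≥ 4 ≥ 3.2), while for identical values such as [5, 5, 5] all three return 5, up to floating-point rounding in the geometric mean.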
• Median Formulas:
• The median is the middle value of the data arranged in order. Let n be the number of observations.
• If “n” is odd,
• Median = [(n +1)/2]th term
• If “n” is even,
• Median = [(n/2)th term + ((n/2) + 1)th term]/2
• 1. How to find the median?
• Solution:
• The steps to find the median are as follows:
• Step 1: Arrange the given data in ascending order.
• Step 2: Count the number of observations (n) to
check whether it is odd or even.
• Step 3: If the number of observations (n) is odd,
use the formula [(n +1)/2]th term to find the
median.
• Step 4: If the number of observations (n) is even, use the formula [(n/2)th term + ((n/2) + 1)th term]/2 to find the median value.
• 2. The runs scored by 11 players in the cricket
match are as follows:
• 7, 16, 121, 51, 101, 81, 1, 16, 9, 11, 16
• Find the median of the data.
• Solution:
• Given data: 7, 16, 121, 51, 101, 81, 1, 16, 9, 11, 16.
• Now, arrange the data in ascending order, we get
• 1, 7, 9, 11, 16, 16, 16, 51, 81, 101, 121.
• Here, the number of observations is 11, which is odd.
• Thus, median = [(11 + 1)/2]th term = 6th term
• Hence, the median of the given data is 16.
• 3. Find the median for the data 8, 5, 7, 10, 15, 21.
• Solution:
• Arranging the given data in ascending order, we
get,
• 5, 7, 8, 10, 15, 21.
• Here, the number of observations is 6, which is
even.
• Hence, Median = [(n/2)th term + ((n/2) + 1)th term]/2
• Median = [(6/2)th term + ((6/2) + 1)th term]/2
• Median = (3rd term + 4th term)/2
• Here, 3rd term = 8 and 4th term = 10
• Therefore, median = (8 + 10)/2 = 18/2 = 9
• Hence, the median of the given data is 9.
• 4. What is the relation between mean,
median and mode?
• Solution:
• The empirical relation between mean, median and mode, which holds approximately for moderately skewed distributions, is (Mean – Median) = 1/3 (Mean – Mode).
• This can also be written as follows:
• 3 (Mean – Median) = (Mean – Mode)
• 3 Mean – 3 Median = Mean – Mode
• 3 Median = 3 Mean – Mean + Mode
• 3 Median = 2 Mean + Mode
• 5. For a moderately skewed distribution, mean =
12 and mode = 6. Using these values, find the
value of the median.
• Solution:
• Given that, mean = 12 and mode = 6
• We know that, 3 Median = 2 Mean + Mode
• Now, substitute the values in the formula, we get
• 3 Median = 2(12) + 6
• 3 Median = 24 + 6
• 3 Median = 30
• Median = 30/3 = 10.
• Hence, the value of median is 10.
• 6. What is the median of two numbers?
• Solution:
• For a set of two numbers, the value of the
median will be the same as the value of the
mean.
• For example, 2 and 10 are the two numbers.
• Here, median = (2+10)/2 = 12/2 = 6
• Also, mean = (2+10)/2 = 12/2 = 6.
• Hence, the median of two numbers is equal to
the mean.
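The empirical relation 3 Median = 2 Mean + Mode can be rearranged to recover any one measure from the other two. A minimal Python sketch follows; the function names are hypothetical, and because the relation is only empirical, results for real data are approximations.

```python
def median_from_mean_mode(mean, mode):
    # 3 Median = 2 Mean + Mode  =>  Median = (2 Mean + Mode) / 3
    return (2 * mean + mode) / 3


def mode_from_mean_median(mean, median):
    # Mode = 3 Median - 2 Mean
    return 3 * median - 2 * mean


def mean_from_median_mode(median, mode):
    # 2 Mean = 3 Median - Mode  =>  Mean = (3 Median - Mode) / 2
    return (3 * median - mode) / 2


# Question 5: mean = 12, mode = 6
print(median_from_mean_mode(12, 6))  # 10.0
```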
• Here, 12 is the middle or median number: it has 6 values above it and 6 values below it, so it is the 7th of 13 ordered observations.
• Now, consider another example with an even number of observations, arranged in descending order: 40, 38, 35, 33, 32, 30, 29, 27, 26, 24, 23, 22, 19 and 17
• There are 14 observations, so the two middle values (the 7th and 8th) are 29 and 27.
• The median is the mean of these two numbers:
• i.e., (27 + 29)/2 = 28
• Therefore, the median of the given data distribution is 28.
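The four steps for finding the median can be sketched in Python as follows. `median` is a hypothetical helper written for these notes (the standard library's `statistics.median` does the same job); it reproduces the worked answers above.

```python
def median(values):
    # Step 1: arrange the data in ascending order
    xs = sorted(values)
    # Step 2: count the observations
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    # Python lists are 0-indexed, so the k-th term is xs[k - 1]
    if n % 2 == 1:
        # Step 3: n odd -> the [(n + 1)/2]th term
        return xs[(n + 1) // 2 - 1]
    # Step 4: n even -> average of the (n/2)th and (n/2 + 1)th terms
    return (xs[n // 2 - 1] + xs[n // 2]) / 2


print(median([7, 16, 121, 51, 101, 81, 1, 16, 9, 11, 16]))                # 16
print(median([8, 5, 7, 10, 15, 21]))                                      # 9.0
print(median([40, 38, 35, 33, 32, 30, 29, 27, 26, 24, 23, 22, 19, 17]))  # 28.0
```

Sorting first means the descending list in the last example needs no special handling.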
Homework
• Find the median of the first 5 whole
numbers.
• What is the median of 4, 2, 7, 3, 10, 9, 13?
• The marks scored by a student in different
subjects are 45, 91, 62, 71, 55. Find the
median of the given data using the median
formula.
• The weights of 8 students (in kg) are 54, 49, 51, 58, 61, 52, 54, 60. Find the median weight.
• Mode
• The mode is the most frequently occurring value in the dataset. A dataset may have more than one mode, and in some cases it has no mode at all.
• Consider the dataset 5, 4, 2, 3, 2, 1, 5, 4, 5
• The mode is the most common value. Here 5 occurs three times, more often than any other value, so the mode of the given dataset is 5.
Example 5.21
The following are the marks scored by 20 students in the class. Find the mode.
90, 70, 50, 30, 40, 86, 65, 73, 68, 90, 90, 10, 73, 25, 35, 88, 67, 80, 74, 46
Solution:
Since the mark 90 occurs the maximum number of times (three times, compared with at most two times for any other mark), the mode is 90.
Example 5.22
The sugar levels of 10 patients checked by a doctor are given below. Find the mode of the sugar levels.
80, 112, 110, 115, 124, 130, 100, 90, 150, 180
Solution:
Since each value occurs only once, there is no mode.
Example 5.23
Compute the mode for the following observations.
2, 7, 10, 12, 10, 19, 2, 11, 3, 12
Solution:
Here, the observations 2, 10 and 12 each occur twice, more often than any other value, so the data set has three modes: 2, 10 and 12.
For a discrete frequency distribution, the mode is the value of the variable corresponding to the maximum frequency.
Example 5.24
Calculate the mode from the following data

Solution:
Here, the maximum frequency is 7, and the value of x corresponding to it is 8.
Therefore, the mode is 8.
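The mode rules above can be sketched in Python with `collections.Counter`. This is a minimal sketch under one stated convention: if every value occurs only once, it reports no mode (an empty list). Some texts also say there is no mode when all values occur equally often (e.g. 1, 1, 2, 2), whereas this sketch would list every value. The function names are hypothetical; the standard library's `statistics.multimode` (Python 3.8+) is similar but returns all values when each occurs once.

```python
from collections import Counter


def modes(values):
    # Count how often each value occurs
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Convention used here: if every value occurs only once, there is no mode
    if top == 1:
        return []
    # Every value that reaches the maximum frequency is a mode
    return sorted(v for v, c in counts.items() if c == top)


def modes_from_frequencies(xs, fs):
    # Discrete frequency distribution: value(s) of x with the maximum frequency
    top = max(fs)
    return [x for x, f in zip(xs, fs) if f == top]


print(modes([5, 4, 2, 3, 2, 1, 5, 4, 5]))                        # [5]
print(modes([80, 112, 110, 115, 124, 130, 100, 90, 150, 180]))  # []
print(modes([2, 7, 10, 12, 10, 19, 2, 11, 3, 12]))              # [2, 10, 12]
```

For a frequency table, such as the shoe sizes 7, 8, 9, 10 with frequencies 5, 10, 7, 13 used later in these notes, `modes_from_frequencies` returns [10].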
Measures of Dispersion
• Measures of dispersion describe how scattered the data are, i.e. how widely the values are spread around a central value. The common measures are:
• Range: It is defined as the difference between the
largest and the smallest value in the distribution
• Mean Deviation: It is the arithmetic mean of the absolute differences between the values and their mean (or another central value such as the median).
• Standard Deviation: It is the square root of the arithmetic average of the squares of the deviations measured from the mean.
• Variance: It is defined as the average of the squared deviations from the mean of the given data set.
• Quartile Deviation: It is defined as half of the difference between the third quartile and the first quartile in a given data set, i.e. (Q3 – Q1)/2.
• Interquartile Range: The difference between the upper (Q3) and lower (Q1) quartiles is called the interquartile range. Its formula is Q3 – Q1.
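The range, interquartile range and quartile deviation can be computed together in Python (3.8+). Quartile conventions differ between textbooks and software, so treat this as a sketch under one assumption: it uses `statistics.quantiles` with `method="exclusive"`, which places the i-th quartile at position i(n + 1)/4 of the sorted data and interpolates between neighbours. `method="inclusive"` can give different values for small samples. `dispersion_summary` is a hypothetical helper, and the data are taken from the mean-deviation-about-median example in these notes.

```python
import statistics


def dispersion_summary(values):
    xs = sorted(values)
    # Quartiles by the "exclusive" method: position i(n + 1)/4 in the sorted data
    q1, _, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    return {
        "range": xs[-1] - xs[0],  # largest - smallest
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,           # interquartile range = Q3 - Q1
        "QD": (q3 - q1) / 2,      # quartile deviation = (Q3 - Q1) / 2
    }


print(dispersion_summary([6, 15, 4, 10, 12, 11, 5, 3, 16]))
```

For these nine values, Q1 lies at position 2.5 (between 4 and 5) and Q3 at position 7.5 (between 12 and 15), giving a range of 13, Q1 = 4.5, Q3 = 13.5, IQR = 9 and QD = 4.5.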
• 1. Find the mean deviation from the mean for the following data
set.
• 57, 64, 43, 67, 49, 59, 44, 47, 61, 59
• Solution:
• Given,
• 57, 64, 43, 67, 49, 59, 44, 47, 61, 59
• Mean = (57 + 64 + 43 + 67 + 49 + 59 + 44 + 47 + 61 + 59)/10 =
550/10 = 55
• |xi – x̄| = |57 – 55|, |64 – 55|, |43 – 55|, |67 – 55|, |49 – 55|, |59 – 55|, |44 – 55|, |47 – 55|, |61 – 55|, |59 – 55|
• = 2, 9, 12, 12, 6, 4, 11, 8, 6, 4
• Mean deviation = ∑|xi – x̄|/ n
• = (2 + 9 + 12 + 12 + 6 + 4 + 11 + 8 + 6 + 4)/10
• = 74/10
• = 7.4
• Therefore, the mean deviation of the given data is 7.4.
• 2. Calculate the mean deviation about the mean of the set of the first 10 natural numbers.
• Solution:
• First 10 natural numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
• Mean = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10)/10
• = 55/10
• = 5.5
• |xi – x̄| = |1 – 5.5|, |2 – 5.5|, |3 – 5.5|, |4 – 5.5|, |5 – 5.5|, |6 – 5.5|, |7 – 5.5|, |8 – 5.5|, |9 – 5.5|, |10 – 5.5|
• = 4.5, 3.5, 2.5, 1.5, 0.5, 0.5, 1.5, 2.5, 3.5, 4.5
• Mean deviation = ∑|xi – x̄|/ n
• = (4.5 + 3.5 + 2.5 + 1.5 + 0.5 + 0.5 + 1.5 + 2.5 + 3.5 + 4.5)/10
• = 25/10
• = 2.5
3. Find the mean deviation about the median for the following data.
6, 15, 4, 10, 12, 11, 5, 3, 16
Solution:
Given,
6, 15, 4, 10, 12, 11, 5, 3, 16
Ascending order of the given data is: 3, 4, 5, 6, 10, 11, 12, 15, 16
Number of data values = 9
Median = (n + 1)/2 th observation
= (9 + 1)/2
= 5th observation
Median (M) = 10
The absolute values of the respective deviations from the median,
i.e., |xi − M| are:
|3 – 10|, |4 – 10|, |5 – 10|, |6 – 10|, |10 – 10|, |11 – 10|, |12 – 10|, |15 – 10|, |16 – 10|
= 7, 6, 5, 4, 0, 1, 2, 5, 6
Mean deviation = ∑|xi – M|/ n
= (7 + 6 + 5 + 4 + 0 + 1 + 2 + 5 + 6)/9
= 36/9
= 4
Therefore, the mean deviation about the median of the given data is 4.
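The three worked examples of mean deviation for individual data can be checked with one small Python sketch. `mean_deviation` is a hypothetical helper whose `about` parameter chooses the central value (the arithmetic mean, or the median via `statistics.median`).

```python
import statistics


def mean_deviation(values, about="mean"):
    # Choose the central value: arithmetic mean or median
    if about == "mean":
        center = sum(values) / len(values)
    elif about == "median":
        center = statistics.median(values)
    else:
        raise ValueError("about must be 'mean' or 'median'")
    # Average of the absolute deviations |x_i - center|
    return sum(abs(x - center) for x in values) / len(values)


print(mean_deviation([57, 64, 43, 67, 49, 59, 44, 47, 61, 59]))          # 7.4
print(mean_deviation(list(range(1, 11))))                                # 2.5
print(mean_deviation([6, 15, 4, 10, 12, 11, 5, 3, 16], about="median"))  # 4.0
```

The median minimizes the average absolute deviation. For the same data, the mean deviation about the median is therefore never larger than the mean deviation about the mean.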
Calculate the mean deviation from the mean for the following distribution.

x 1 3 4 5 6 7

f 4 9 16 14 11 6

Solution:

xi 1 3 4 5 6 7

fi 4 9 16 14 11 6

fixi 4 27 64 70 66 42

|xi – x̄| 3.55 1.55 0.55 0.45 1.45 2.45

fi |xi – x̄| 14.2 13.95 8.8 6.3 15.95 14.7

Mean (x̄) = ∑fixi/∑fi
= (4 + 27 + 64 + 70 + 66 + 42)/(4 + 9 + 16 + 14 + 11 + 6)
= 273/60
= 4.55
Mean deviation = ∑fi |xi – x̄|/ N, where N = ∑fi = 60
= (14.2 + 13.95 + 8.8 + 6.3 + 15.95 + 14.7)/60
= 73.9/60
≈ 1.2317
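The calculation for a discrete series can be reproduced with a small Python sketch. The function name is hypothetical. It computes x̄ = ∑fx/∑f and then ∑f|x – x̄|/N. For x = 7 the product is 6 × 2.45 = 14.7, so ∑f|x – x̄| = 73.9 and the mean deviation is about 1.2317.

```python
def mean_deviation_discrete(xs, fs):
    n_total = sum(fs)                                    # N = Σ f_i
    mean = sum(f * x for x, f in zip(xs, fs)) / n_total  # x̄ = Σ f_i x_i / N
    # Σ f_i |x_i - x̄| / N
    md = sum(f * abs(x - mean) for x, f in zip(xs, fs)) / n_total
    return mean, md


mean, md = mean_deviation_discrete([1, 3, 4, 5, 6, 7], [4, 9, 16, 14, 11, 6])
print(mean)          # 4.55
print(round(md, 4))  # 1.2317
```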


Find the mean deviation about the median for the following data.

Marks 20 29 28 33 42 38 43 25

Number of students 6 28 24 15 2 4 1 20

Solution:
Let us write the given data in ascending order of marks and calculate the cumulative frequency, as follows:

Marks (xi) 20 25 28 29 33 38 42 43

Number of students (fi) 6 20 24 28 15 4 2 1

Cumulative frequency 6 26 50 78 93 97 99 100

Here, N = 100, which is even, so the median is the average of the 50th and 51st observations. From the cumulative frequencies, the 50th observation is 28 and the 51st is 29.
Median (M) = (28 + 29)/2 = 28.5
The absolute values of the respective deviations from the median, i.e., |xi − M|, are:
|20 – 28.5|, |25 – 28.5|, |28 – 28.5|, |29 – 28.5|, |33 – 28.5|, |38 – 28.5|, |42 – 28.5|, |43 – 28.5|
= 8.5, 3.5, 0.5, 0.5, 4.5, 9.5, 13.5, 14.5
Mean deviation = ∑fi|xi – M|/ N
= [6(8.5) + 20(3.5) + 24(0.5) + 28(0.5) + 15(4.5) + 4(9.5) + 2(13.5) + 1(14.5)]/ 100
= 294/100
= 2.94
Therefore, the mean deviation about the median of the given data is 2.94.
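The cumulative-frequency method for the median of a frequency table, and the mean deviation about it, can be sketched in Python. The function names are hypothetical. The table is passed in its original unsorted order; the sketch sorts the (value, frequency) pairs before accumulating frequencies.

```python
def median_discrete(xs, fs):
    # Sort (value, frequency) pairs by value before accumulating frequencies
    pairs = sorted(zip(xs, fs))
    total = sum(fs)

    def kth(k):
        # k-th observation (1-based): first value whose cumulative frequency reaches k
        cumulative = 0
        for x, f in pairs:
            cumulative += f
            if cumulative >= k:
                return x
        raise IndexError(k)

    if total % 2 == 1:
        return kth((total + 1) // 2)
    return (kth(total // 2) + kth(total // 2 + 1)) / 2


def mean_deviation_about_median_discrete(xs, fs):
    m = median_discrete(xs, fs)
    # Σ f_i |x_i - M| / N
    return m, sum(f * abs(x - m) for x, f in zip(xs, fs)) / sum(fs)


marks = [20, 29, 28, 33, 42, 38, 43, 25]
students = [6, 28, 24, 15, 2, 4, 1, 20]
print(mean_deviation_about_median_discrete(marks, students))  # (28.5, 2.94)
```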


Difference between Mean Deviation and Standard Deviation

| Mean Deviation | Standard Deviation |
|---|---|
| We use a central value (mean, median or mode) to calculate the mean deviation. | To calculate the standard deviation we use only the mean. |
| To calculate the mean deviation, we take the absolute values of the deviations. | We use the squares of the deviations to calculate the standard deviation. |
| It is less frequently used. | It is one of the most commonly used measures of variability. |
| When there are a greater number of outliers in the data, the mean absolute deviation is employed. | When there are fewer outliers in the data, the standard deviation is employed. |
Mean Deviation Types
• Individual Data Series – when the data set is given as individual observations.
Age 5 10 15 20 25 30
• Discrete Data Series – when the data set is given along with the frequencies of its values.
Shoe Size 7 8 9 10
Frequency 5 10 7 13
• Continuous Data Series – when the data set is given as class intervals (ranges) along with their frequencies.
Age 5-10 10-15 15-20 20-25
Frequency 19 23 31 26
Variance

• In probability and statistics, variance is the expected value of the squared deviation of a random variable from its mean. Informally, variance measures how far a set of (random) numbers is spread out from its mean value.
• The variance is equal to the square of the standard deviation, another central measure of dispersion.
• Variance is symbolically represented by σ², s² or Var(X).
• The formula for the variance of n values x1, x2, …, xn with mean x̄ is given by:
• σ² = ∑(xi – x̄)²/n
• Definition
• Variance is a measure of how data points differ from the mean. In layman's terms, variance measures how far a set of data (numbers) is spread out from its mean (average) value.
• Variance is the average of the squared deviations of the values from their mean. Its square root is the standard deviation, so the two measures are directly related.
• The larger the variance, the more the data are scattered around the mean; the smaller the variance, the more closely the data cluster around it. Therefore, variance is called a measure of the spread of data about the mean.
• How to Calculate Variance
• Variance can be calculated by following the steps given below:
• Find the mean (average) of the given data set.
• Subtract the mean from each value and square the result.
• Find the average of these squared values; this is the variance.
• Say x1, x2, x3, x4, …, xn are the given values.
• Then the mean of all these values is:
• x̄ = (x1 + x2 + x3 + … + xn)/n
• Now subtract the mean from each value of the given data set and square the result:
• (x1 – x̄)², (x2 – x̄)², (x3 – x̄)², …, (xn – x̄)²
• Find the average of the above values to get the variance:
• Var(X) = [(x1 – x̄)² + (x2 – x̄)² + (x3 – x̄)² + … + (xn – x̄)²]/n
• Hence, the variance is calculated. Dividing by n gives the population variance; when the values are a sample used to estimate the variance of a larger population, the sum of squares is divided by n – 1 instead (the sample variance s²).
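The steps above can be sketched in Python. The function names are hypothetical; the standard library's `statistics.pvariance` and `statistics.variance` implement the same population (÷ n) and sample (÷ (n – 1)) formulas. The data are those of Example 1 below.

```python
import math


def population_variance(values):
    n = len(values)
    mean = sum(values) / n                       # step 1: the mean
    squared = [(x - mean) ** 2 for x in values]  # step 2: squared deviations
    return sum(squared) / n                      # step 3: their average


def sample_variance(values):
    # Same sum of squares, divided by n - 1 (estimate from a sample)
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [1, 3, 5, 5, 6, 7, 9, 10]
var = population_variance(data)
print(var)                       # 7.6875
print(round(math.sqrt(var), 2))  # 2.77
```

With this data every intermediate value is an exact binary fraction, so the population variance comes out as exactly 61.5/8 = 7.6875.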
• Example 1: Find the variance and standard deviation of the following numbers: 1, 3, 5, 5, 6, 7, 9, 10.
• Solution:
• The mean = (1 + 3 + 5 + 5 + 6 + 7 + 9 + 10)/8 = 46/8 = 5.75
• Step 1: Subtract the mean from each value:
• (1 – 5.75), (3 – 5.75), (5 – 5.75), (5 – 5.75), (6 – 5.75), (7 – 5.75), (9 – 5.75), (10 – 5.75)
• = -4.75, -2.75, -0.75, -0.75, 0.25, 1.25, 3.25, 4.25
• Step 2: Squaring the above values, we get 22.5625, 7.5625, 0.5625, 0.5625, 0.0625, 1.5625, 10.5625, 18.0625
• Step 3: Sum of squares = 22.5625 + 7.5625 + 0.5625 + 0.5625 + 0.0625 + 1.5625 + 10.5625 + 18.0625 = 61.5
• Step 4: n = 8, therefore variance (σ²) = 61.5/8 = 7.6875 ≈ 7.69
• Now, standard deviation (σ) = √7.6875 ≈ 2.77
• Example 2: Calculate the range and coefficient of
range for the following data values.
• 45, 55, 63, 76, 67, 84, 75, 48, 62, 65
• Solution:
• Let Xi values be: 45, 55, 63, 76, 67, 84, 75, 48, 62, 65
• Here,
• Maximum value (Xmax) = 84
• Minimum or least value (Xmin) = 45
• Range = Maximum value – Minimum value
• = 84 – 45
• = 39
• Coefficient of range = (Xmax – Xmin)/(Xmax + Xmin)
• = (84 – 45)/(84 + 45)
• = 39/129
• = 0.302 (approx)
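Example 2 can be reproduced with a few lines of Python. `range_and_coefficient` is a hypothetical helper. The coefficient of range is intended for positive data; the sketch refuses the case Xmax + Xmin = 0, where the formula divides by zero.

```python
def range_and_coefficient(values):
    x_max, x_min = max(values), min(values)
    data_range = x_max - x_min
    # Coefficient of range = (Xmax - Xmin) / (Xmax + Xmin); meaningful for positive data
    if x_max + x_min == 0:
        raise ValueError("coefficient of range is undefined when Xmax + Xmin = 0")
    return data_range, data_range / (x_max + x_min)


r, coeff = range_and_coefficient([45, 55, 63, 76, 67, 84, 75, 48, 62, 65])
print(r, round(coeff, 3))  # 39 0.302
```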
• Practice Problems
1. You have found the following ages (in years) of 4 zebras, randomly selected from the 45 zebras at your local zoo: 7, 1, 9, 14.
Based on your sample, what is the average age of the zebras? What is the estimated variance of the ages?
2. Finding the Mean and Variance
Find μ and σ² of the number of students in each classroom at Toni’s school:
Classroom Number of Students
A 6
B 5
C 9
D 13
E 12
F 16
G 14
Standard Deviation

• The standard deviation measures how far a given set of data varies around the mean of the data. Thus, we define standard deviation as the “spread of the statistical data from the mean or average position”. We denote the standard deviation of the data by the symbol σ.
• We can also define the standard deviation as the square root of the variance.
• Properties of Standard Deviation
• Various properties of the standard deviation of a group of data are:
• Standard deviation is the square root of the variance of the given data set. It is also called the root mean square deviation.
• Standard deviation is a non-negative quantity, i.e. it is always positive or zero.
• If all the values in a data set are close to each other, the standard deviation is close to zero (it is exactly zero when all the values are equal). If the values in a data set are very different from each other, the standard deviation has a large positive value.
• Formula for Population Standard Deviation
• The mathematical formula to find the standard deviation of a population of N values x1, x2, …, xN with mean μ is:
• σ = √[∑(xi – μ)²/N]
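The population formula can be checked in Python against the standard library, using the data from Variance Example 1 (1, 3, 5, 5, 6, 7, 9, 10). `population_sd` is a hypothetical helper. `statistics.pstdev` computes the same population formula, while `statistics.stdev` computes the sample version that divides by N – 1.

```python
import math
import statistics


def population_sd(values):
    # σ = sqrt( Σ (x_i - μ)² / N )
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


data = [1, 3, 5, 5, 6, 7, 9, 10]
print(round(population_sd(data), 2))      # 2.77
print(round(statistics.pstdev(data), 2))  # 2.77 (library version of the same formula)
print(round(statistics.stdev(data), 2))   # 2.96 (sample SD: divides by N - 1)
```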
Practice Numerical:
• 1. Calculate the standard deviation of the following values: 5, 10, 25, 30, 50.
2. The numbers of televisions sold on each day of a week are 13, 8, 4, 9, 7, 12, 10. Find their standard deviation.
Relation between Standard Deviation and Variance
• Variance and standard deviation are the most common measures of the spread of a given set of data. They describe how far the values deviate from their mean, i.e. the spread of all the values of the data set.
• Variance is defined as the average of the squared deviations of the values of a data set from the mean value.
• Standard deviation is the degree to which the values in a data set are spread out with respect to the mean value; it is the square root of the variance.
| Standard Deviation | Variance |
|---|---|
| Standard deviation is defined as the square root of the variance. | Variance is defined as the average of the squared differences from the mean. |
| Standard deviation provides a measure of the typical distance between data points and the mean. | Variance provides a measure of the average squared distance between data points and the mean. |
| It is represented by the Greek symbol σ. | It is represented by the square of the Greek symbol sigma, i.e. σ². |
| It has the same unit as the data set. | Its unit is the square of the unit of the data set. |
| It represents the volatility in the market or given data set. | It represents the degree to which the average return varies according to the long-term |
Range

• What is Range?
• The range of the data is the difference between the maximum and the minimum values of the observations. For example, say we have data on the number of customers walking into a store during a week:
• 10, 14, 8, 10, 15, 7
• Minimum value in data = 7
• Maximum Value in the data = 15
• Range = Maximum value in the data – Minimum value in the data
• = 15 – 7
• = 8
• Now we can say that the range of the data is 8.
This gives us an idea about the spread of the data
but doesn’t tell how the data is distributed.
Question 1: Find the range of the following data: -4, 5, -10, 6, 9
Solution:
For calculating the range of a data, we need to find out the
maximum and the minimum of the data:
Max = 9
Min = -10
Range = Max – Min
= 9 -(-10)
= 19
Question 2: Find the mean and the median of the same data: -4, 5, -10, 6, 9
Solution:
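Mean of the data = (−4 + 5 − 10 + 6 + 9)/5
= 6/5
= 1.2
The median is the middle element of the data when it is arranged in order.
Arranging the data in ascending order: −10, −4, 5, 6, 9
Here n = 5 is odd, so the median is the [(5 + 1)/2]th = 3rd term.
Median = 5.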
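Python's standard `statistics` module can confirm both answers for the data -4, 5, -10, 6, 9. This is only a quick check, not part of the method. `statistics.median` sorts the data internally, which is why its answer is the middle value 5 and not the middle of the list as written.

```python
import statistics

data = [-4, 5, -10, 6, 9]
print(max(data) - min(data))    # 19  (range: 9 - (-10))
print(statistics.mean(data))    # 1.2 (6 / 5)
print(statistics.median(data))  # 5   (middle of -10, -4, 5, 6, 9)
```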
Coefficients of Variation
• As standard deviation is an absolute measure of dispersion, it cannot be used to compare the variability of two or more series expressed in different units. To compare such series, a relative measure based on the standard deviation is needed. Two such relative measures are the coefficient of standard deviation (σ/x̄) and the coefficient of variation.
• The coefficient of variation is a relative measure introduced by Karl Pearson (also known as Karl Pearson’s coefficient of variation), through which two or more groups of similar data are compared with respect to stability, homogeneity or consistency. It is the most appropriate measure for such comparisons and expresses the standard deviation as a percentage of the arithmetic mean: CV = (σ/x̄) × 100. The series with the lower CV is the more consistent one.
• Example 1:
• The number of points scored by two teams in
a hockey match is given below. With the help
of Coefficient of Variation, determine which
team is more consistent.
• Example 2:
• Coefficient of Variation and Standard
Deviation of two series X and Y are 55.43%
and 48.86%, and 25.5 and 24.43, respectively.
Find the means of series X and Y.
Example: Two plants C and D of a factory show the following results about the number of workers and the wages paid to them.

Plant C D

No. of workers 5000 6000

Average monthly wages $2500 $2500

Standard deviation 9 10

Using the coefficient of variation formula, find in which plant, C or D, there is greater variability in individual wages.
Solution:
To Find: Which plant has greater variability.
For this, we need to find the coefficient of variation. The plant that has a
higher coefficient of variation will have greater variability.
Coefficient of variation for plant C.
Using coefficient of variation formula,
CV = (σ/μ) × 100, μ≠0
CV = (9/2500) × 100
CV = 0.36%
Now, CV for plant D
CV = (σ/μ) × 100
CV = (10/2500) × 100
CV = 0.4%
Plant C has CV = 0.36% and plant D has CV = 0.4%.
Answer: Since plant D has the higher coefficient of variation, plant D has greater variability in individual wages.
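The plant comparison can be reproduced in Python. The function names are hypothetical. `mean_from_cv` simply rearranges CV = (σ/μ) × 100 to μ = σ × 100/CV, which is the step Example 2 (series X and Y) needs.

```python
def coefficient_of_variation(sd, mean):
    # CV = (σ / μ) × 100, defined only for μ ≠ 0
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd / mean * 100


def mean_from_cv(sd, cv_percent):
    # Rearranged formula: μ = σ × 100 / CV
    return sd * 100 / cv_percent


cv_c = coefficient_of_variation(9, 2500)   # plant C
cv_d = coefficient_of_variation(10, 2500)  # plant D
print(round(cv_c, 2), round(cv_d, 2))      # 0.36 0.4
print("D" if cv_d > cv_c else "C")         # D has greater variability

# Example 2: means of series X and Y from their SD and CV (about 46.0 and 50.0)
print(round(mean_from_cv(25.5, 55.43), 2), round(mean_from_cv(24.43, 48.86), 2))
```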
Coefficient of Variation and Standard Deviation

| Coefficient of Variation | Standard Deviation |
|---|---|
| It is a relative measure of dispersion. | It is an absolute measure of dispersion. |
| It measures the ratio of the standard deviation to the mean. | It measures how far data points typically lie from the mean. |
| The coefficient of variation is usually used to compare the variation of different data sets. | The standard deviation is used to measure the dispersion of data in a single data set. |
What is a Moment in Statistics?
• Moments are used in statistics, machine learning, mathematics and other fields to describe the characteristics of a distribution.
• If X is the variable of interest, its moments are the expected values of powers of X, for example E(X), E(X²), E(X³), E(X⁴), etc. Moments taken about the mean, E[(X – μ)^k], are called central moments.
• Moments in statistics:
• 1) First moment: measure of central location (mean).
• 2) Second (central) moment: measure of dispersion/spread (variance).
• 3) Third (central) moment: measure of asymmetry.
• 4) Fourth (central) moment: measure of outliers/tailedness.
• The standardized third central moment is called the skewness, and the standardized fourth central moment is known as the kurtosis.
• Skewness measures the asymmetry of a distribution, while kurtosis measures how heavy its tails are. Physicists also use higher-order moments in physics applications. Let’s look at visualizations of the third and fourth moments.
• Third Moment (Skewness): [Figures: 1) no skew, 2) positive skew, 3) negative skew]
Fourth Moment (Kurtosis): [Figure: kurtosis]
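Central moments, skewness and kurtosis can be computed in Python as a minimal sketch. The function names are hypothetical, and the formulas are the plain moment (population) versions without small-sample corrections. Some software reports excess kurtosis (kurtosis – 3) instead, so results from other tools may differ by that offset or by bias corrections.

```python
def central_moment(values, k):
    # k-th central moment: average of (x - x̄)^k
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    # Standardized third central moment: m3 / m2^(3/2)
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    # Standardized fourth central moment: m4 / m2^2 (equals 3 for a normal distribution)
    return central_moment(values, 4) / central_moment(values, 2) ** 2


print(skewness([1, 2, 3, 4, 5]))    # 0.0 (symmetric data)
print(kurtosis([1, 2, 3, 4, 5]))    # 1.7
print(skewness([1, 1, 1, 10]) > 0)  # True (long right tail: positive skew)
```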
Thank You
