Notes Stats Unit1
Listed below are a few important points that help to summarize our learning
on this concept of mode.
Mode value can sometimes be the same as mean and/or median, but not always.
The mode is very useful for summarizing categorical data.
There can be no mode for data that does not have any repeating numbers.
The mode can also be found for data sets that are not numeric.
It is easy to find the mode when the given set of numbers is arranged in ascending order.
Mode for ungrouped data can be found by observation, whereas mode for grouped data can
be found using the formula.
For example, the modes of Set A = {2,2,2,3,4,4,5,5,5} are 2 and 5, because both 2 and 5
are repeated three times in the given set. A set with two modes is called bimodal.
When there are three modes in a data set, then the set is called trimodal.
When there are four or more modes in a data set, then the set is called multimodal.
Example: The following table represents the number of wickets taken by a bowler
in 10 matches. Find the mode of the given set of data.
It can be seen that 2 wickets were taken by the bowler frequently in different matches.
Hence, the mode of the given data is 2.
Example 1: Find the mode of the given data set: 3, 3, 6, 9, 15, 15, 15, 27, 27, 37, 48.
15 is the mode since it appears more often in the set than any other
number.
Example 2: Find the mode of 4, 4, 4, 9, 15, 15, 15, 27, 37, 48 data set.
Solution: Given: 4, 4, 4, 9, 15, 15, 15, 27, 37, 48 is the data set.
As we know, a data set can have more than one mode if two or more
values occur with the same highest frequency compared to the other values in
the set.
Hence, here both 4 and 15 are modes of the set.
Solution: If no value or number in a data set appears more than once, then the set has
no mode.
Hence, for set 3, 6, 9, 16, 27, 37, 48, there is no mode available.
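The three mode examples above can be checked with a short Python sketch. The helper name `modes` is an illustrative assumption, not part of the notes; it returns every value tied for the highest frequency, and an empty list when no value repeats.

```python
from collections import Counter


def modes(values):
    """Return all modes of `values`, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs exactly once, so the set has no mode.
        return []
    # Counter keeps first-seen order, so modes come out in data order.
    return [value for value, count in counts.items() if count == highest]


print(modes([3, 3, 6, 9, 15, 15, 15, 27, 27, 37, 48]))  # [15]
print(modes([4, 4, 4, 9, 15, 15, 15, 27, 37, 48]))      # [4, 15]
print(modes([3, 6, 9, 16, 27, 37, 48]))                 # []
```

Because it only counts occurrences, the same helper also works for categorical (non-numeric) data.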
Where,
Solution:
The maximum class frequency is 12 and the class interval corresponding to this
frequency is 20 – 30. Thus, the modal class is 20 – 30.
Measure of Dispersion: Range, quartile deviation, standard deviation, variance, coefficient of variation
Dispersion (also called 'variability', 'scatter' or 'spread') is the state of getting dispersed or spread.
Statistical dispersion means the extent to which numerical data is likely to vary about an
average value. In other words, dispersion helps to understand the distribution of the data.
In statistics, the measures of dispersion help to interpret the variability of data, i.e. to know how
homogeneous or heterogeneous the data is. In simple terms, they show how squeezed or
scattered the variable is.
There are two main types of dispersion measures in statistics: absolute measures of dispersion
and relative measures of dispersion.
An absolute measure of dispersion has the same unit as the original data set. The absolute
dispersion method expresses the variation in terms of the average of deviations of
observations, such as standard deviation or mean deviation. It includes range, standard deviation, quartile
deviation, etc.
The types of absolute measures of dispersion are:
1. Range: It is simply the difference between the maximum value and the minimum value
given in a data set. Example: 1, 3,5, 6, 7 => Range = 7 -1= 6
2. Variance: Subtract the mean from each value in the set, square each difference, add
the squares, and finally divide the sum by the total number of values in the data set to get the
variance. Variance (σ²) = ∑(X−μ)²/N
3. Standard Deviation: The square root of the variance is known as the standard deviation,
i.e. S.D. (σ) = √(σ²).
4. Quartiles and Quartile Deviation: The quartiles are values that divide a list of numbers
into quarters. The quartile deviation is half of the distance between the third and the
first quartile.
5. Mean and Mean Deviation: The average of numbers is known as the mean and the
arithmetic mean of the absolute deviations of the observations from a measure of
central tendency is known as the mean deviation (also called mean absolute deviation).
The relative measures of dispersion (coefficients of dispersion) are:
1. Co-efficient of Range
2. Co-efficient of Variation
3. Co-efficient of Standard Deviation
4. Co-efficient of Quartile Deviation
5. Co-efficient of Mean Deviation
Range: It is defined as the difference between the largest and the smallest value in the
distribution.
Mean Deviation: It is the arithmetic mean of the absolute differences between the values and their
mean.
Standard Deviation: It is the square root of the arithmetic average of the square of the
deviations measured from the mean.
Variance: It is defined as the average of the square deviation from the mean of the given data
set.
Quartile Deviation: It is defined as half of the difference between the third quartile and the
first quartile in a given data set.
Interquartile Range: The difference between the upper (Q3) and lower (Q1) quartiles is called the
Interquartile Range. Its formula is given as Q3 – Q1.
Example: Find out the range for the following observations, 20, 24, 31, 17, 45, 39, 51, 61.
Solution:
Largest Value = 61
Smallest Value = 17
Thus, the range of the data set is
Range = 61 – 17 = 44
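As a quick check of the arithmetic above, the range can be computed in Python. The function name `data_range` is an assumption chosen for illustration.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


observations = [20, 24, 31, 17, 45, 39, 51, 61]
print(data_range(observations))  # 44
```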
Range for Grouped Data
The range of the grouped data set is found by studying the following example,
Example: Find out the range for the following frequency distribution table for the marks
scored by class 10 students.
Marks Intervals    Number of Students
0-10               5
10-20              8
20-30              15
30-40              9
Solution:
For example :
Let us consider this set of data : -5, 10, 25
Mean = (-5 + 10 + 25)/3 = 10
Now a deviation from the mean for different values is,
(-5 -10) = -15
(10 – 10) = 0
(25 – 10) = 15
Adding these deviations gives zero, which would wrongly suggest that there is no deviation
from the mean. To counter this problem, only the absolute values of the
differences are taken while calculating the mean deviation.
M.D = ∑|x − x̄| / n
⇒ M.D = (4+2+0+2+4)/(5)
⇒ M.D = 12/5 = 2.4
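The point about signed deviations cancelling can be seen with the data set -5, 10, 25 used earlier. The sketch below is only an illustration and `mean_deviation` is an assumed helper name. It shows that the signed deviations sum to zero, while the mean of the absolute deviations does not.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


data = [-5, 10, 25]
mean = sum(data) / len(data)            # 10.0
signed = [x - mean for x in data]       # [-15.0, 0.0, 15.0]

print(sum(signed))           # 0.0, the signed deviations cancel out
print(mean_deviation(data))  # 10.0, i.e. (15 + 0 + 15) / 3
```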
Quartile deviation:-
Statistical dispersion means the extent to which numerical data is likely to vary about an
average value. Quartile deviation is a statistic that measures the spread of the middle of the
data. Quartiles are the values that divide an ordered list of numerical data into four equal parts:
Q1, Q2 and Q3.
Quartile deviation depends on the difference between the first quartile and the third quartile in
the frequency distribution. The difference is also known as the interquartile range. The
difference divided by two is known as quartile deviation or semi-interquartile range.
Suppose Q1 is the lower quartile, Q2 is the median, and Q3 is the upper quartile for the given
data set, then its quartile deviation can be calculated using the following formula.
Quartile Deviation = (Q3 − Q1)/2
Q1 = [(n+1)/4]th item
Q2 = [(n+1)/2]th item
Q3 = [3(n+1)/4]th item
Where n represents the total number of observations in the given data set.
Also, Q2 is the median of the given data set, Q1 is the median of the lower half of the
data set and Q3 is the median of the upper half of the data set.
Before estimating the quartiles, we have to arrange the given data values in ascending
order. If the value of n is even, we can follow the same procedure as for finding the
median.
The quartiles are found differently for ungrouped data and for grouped data,
but in both cases the quartile deviation is
Quartile Deviation = (Q3 − Q1)/2
Here, Q1 is the lower quartile and Q3 is the upper quartile.
Example 1:
Solution:
Given data:
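Median, Q2 = (10 + 14)/2
= 24/2
= 12
Q2 = 12
Lower quartile, Q1 = (7 + 8)/2
= 15/2
= 7.5
Upper half of the data: 14, 15, 17, 18, 24, 27, 28, 48 (even number of observations)
Upper quartile, Q3 = (18 + 24)/2
= 42/2
= 21
Quartile deviation = (Q3 − Q1)/2
= (21 – 7.5)/2
= 13.5/2
= 6.75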
Therefore, the quartile deviation for the given data set is 6.75.
Example 2:
Class 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100
Frequency 5 3 4 3 3 4 7 9 7 8
Solution:
Let us calculate the cumulative frequency for the given distribution of data.
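We know that,
Qr = l1 + [(r(N/4) − c)/f] × (l2 − l1)
Finding Q1:
r = 1, N = 53, so N/4 = 13.25 and Q1 lies in the class 30-40 (cumulative frequency before it c = 12, class frequency f = 3)
Q1 = 30 + (1.25/3) × 10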
= 30 + (12.5/3)
= 30 + 4.167
= 34.167
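Finding Q3:
r = 3, so 3N/4 = 3 × 53/4 = 39.75 and Q3 lies in the class 80-90 (cumulative frequency before it c = 38, class frequency f = 7)
Q3 = 80 + (1.75/7) × 10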
= 80 + (17.5/7)
= 80 + 2.5
= 82.5
QD = (82.5 – 34.167)/2
= 48.333/2
= 24.1665
Hence, the quartile deviation of the given distribution is 24.167 (approximately).
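The grouped-data calculation above can be reproduced with a small Python sketch. It assumes equal class widths and uses the interpolation Qr = l1 + ((r × N/4 − c)/f) × width, where c is the cumulative frequency before the quartile class and f is that class's frequency. The function and variable names are illustrative assumptions.

```python
def grouped_quartile(lower_limits, width, frequencies, r):
    """Interpolated r-th quartile (r = 1, 2, 3) for equal-width classes."""
    total = sum(frequencies)
    target = r * total / 4
    cumulative = 0  # cumulative frequency before the current class
    for lower, freq in zip(lower_limits, frequencies):
        if cumulative + freq >= target:
            # First class whose cumulative frequency reaches the target.
            return lower + (target - cumulative) / freq * width
        cumulative += freq
    raise ValueError("r must be 1, 2 or 3 and frequencies must be non-empty")


lower_limits = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
frequencies = [5, 3, 4, 3, 3, 4, 7, 9, 7, 8]

q1 = grouped_quartile(lower_limits, 10, frequencies, 1)
q3 = grouped_quartile(lower_limits, 10, frequencies, 3)
quartile_deviation = (q3 - q1) / 2

print(round(q1, 3), round(q3, 3), round(quartile_deviation, 3))
```

Rounded to three decimals, this gives the same Q1, Q3 and quartile deviation as the worked solution.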
Formally, the Quartile Deviation is equal to the half of the Inter-Quartile Range and thus we can
write it as –
Qd = (Q3 – Q1)/2
Therefore, we also call it the Semi Inter-Quartile Range.
The Quartile Deviation doesn’t take into account the extreme points of the distribution.
Thus, the dispersion or the spread of only the central 50% data is considered.
If the scale of the data is changed, the Qd also changes in the same ratio.
It is the best measure of dispersion for open-ended systems (which have open-ended
extreme ranges).
Also, it is less affected by sampling fluctuations in the dataset as compared to the range
(another measure of dispersion).
Since it is solely dependent on the central values in the distribution, if in any experiment,
these values are abnormal or inaccurate, the result would be affected drastically.
Quartile Deviation = (Q3 – Q1)/2
Q1 = lower quartile
Q3 = upper quartile
Q1 = [(n+1)/4]th item
Q2 = [(n+1)/2]th item
Q3 = [3(n+1)/4]th item
Here, n is the total number of observations.
It is important to note here that students need to arrange the given data values in ascending order
before estimating the quartiles.
Quartile Deviation for Grouped Data
For a grouped data, the quartiles can be calculated using the following formula:
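Qr = l1 + [(r(N/4) − c)/f] × (l2 − l1)
Here,
Qr = rth quartile
l1 = lower limit of the quartile class
l2 = upper limit of the quartile class
f = frequency of the quartile class
c = cumulative frequency of the class preceding the quartile class
N = total number of observations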
Based on the quartiles, a relative measure of dispersion, known as the Coefficient of Quartile
Deviation, can be defined for any distribution. It is formally defined as –
Coefficient of Quartile Deviation = (Q3 – Q1)/(Q3 + Q1)
Since it involves a ratio of two quantities of the same dimensions, it is unitless. Thus, it can act
as a suitable parameter for comparing two or more different datasets which may or may not
involve quantities with the same dimensions.
So, now let’s go through the solved examples below to get a better idea of how to apply these
concepts to various distributions.
Question 1: The number of vehicles sold by a major Toyota Showroom in a day was recorded for
10 working days. The data is given as –
Day Frequency
1 20
2 15
3 18
4 5
5 10
6 17
7 21
8 19
9 25
10 28
Find the Quartile Deviation and its coefficient for the given discrete distribution case.
Solution: We first need to sort the frequency data given to us before proceeding with the quartiles
calculation –
Sorted Data – 5, 10, 15, 17, 18, 19, 20, 21, 25, 28
n(number of data points) = 10
Now, to find the quartiles, we use the logic that the first quartile lies halfway between the lowest
value and the median; and the third quartile lies halfway between the median and the largest
value.
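One way to apply this halving logic to the sorted sales data is sketched below. It takes Q1 as the median of the lower half and Q3 as the median of the upper half. This is an assumption about the intended method: the [(n+1)/4]th-item rule would give slightly different quartiles. The helper name `quartiles_by_halves` is hypothetical.

```python
from statistics import median


def quartiles_by_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves of the data."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd count, the middle value is left out of both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)


sales = [20, 15, 18, 5, 10, 17, 21, 19, 25, 28]
q1, q2, q3 = quartiles_by_halves(sales)

quartile_deviation = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(q1, q2, q3, quartile_deviation, round(coefficient, 3))
```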
Marks    No. of Students
0-10 10
10-20 20
20-30 30
30-40 50
40-50 40
50-60 30
Solution: For the case of a grouped-data distribution, we can find the quartiles through the
following steps –
⇒ Construct a cumulative frequency table for the given data alongside the given distribution
⇒ From the total number of data values, estimate the groups/classes of the Lower and Upper
Quartiles
⇒ Use the following formulae to then calculate the quartiles:
For the given data, we can form the required table with the cumulative frequency as –
Marks    Frequency    Cumulative Frequency
0-10 10 10
10-20 20 30
20-30 30 60
30-40 50 110
40-50 40 150
50-60 30 180
Since the total number of students is 180, the first quartile must lie at the position of 180/4 = 45th
student. Similarly, the third quartile must lie at the position of 180×3/4 = 135th student. By the
distribution of our data into groups, we can note that the first quartile will lie in the 20-30 marks
range.
Calculation –
Q1 = LB + w × (n/4 – f_c)/f
Here, LB = 20; w = 10
f_c = 30; f = 30; n = 180
Thus, Q1 = 20 + 10 × (180/4 – 30)/30
= 20 + (15/30) × 10
= 25
Similarly, the third quartile will lie in the 40-50 marks range. Calculation –
Q3 = LB + w × (3n/4 – f_c)/f
Here, LB = 40; w = 10
f_c = 110; f = 40; n = 180
Thus, Q3 = 40 + 10 × (3 × 180/4 – 110)/40
= 40 + (25/40) × 10
= 46.25
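Using the values of Q1 and Q3, we can now calculate the Quartile Deviation and its
coefficient as follows –
Quartile Deviation = (Q3 – Q1)/2 = (46.25 – 25)/2 = 10.625
Coefficient of Quartile Deviation = (Q3 – Q1)/(Q3 + Q1) = 21.25/71.25 ≈ 0.298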
https://www.toppr.com/guides/business-mathematics-and-statistics/measures-of-central-
tendency-and-dispersion/quartile-deviation/
Variance is a measure of how data points differ from the mean. In layman's terms,
variance is a measure of how far a set of data (numbers) is spread out from its mean
(average) value.
Variance is the expected value of the squared deviation from the mean. It is therefore
the square of the standard deviation of the given data set.
The higher the variance, the more scattered the data is around its mean; the lower the
variance, the less scattered it is. Therefore, it is called a
measure of spread of data from the mean.
There can be two types of variances in statistics, namely, sample variance and population
variance.
Population Variance - All the members of a group are known as the population. When we want
to find how each data point in a given population varies or is spread out then we use
the population variance. It is used to give the squared distance of each data point from the
population mean.
Sample Variance - If the size of the population is too large then it is difficult to take each data
point into consideration. In such a case, a select number of data points are picked from the
population to form a sample that can describe the entire group. The sample
variance is the sum of the squared distances from the sample mean divided by n − 1, where n is the
sample size. It is always calculated with respect to the sample mean.
A general definition of variance is that it is the expected value of the squared differences from
the mean.
Variance Example
Suppose we have the data set {3, 5, 8, 1} and we want to find the population variance. The
mean is given as (3 + 5 + 8 + 1) / 4 = 4.25. Then by using the definition of variance we get [(3 -
4.25)² + (5 - 4.25)² + (8 - 4.25)² + (1 - 4.25)²] / 4 = 26.75 / 4 = 6.6875. Thus, variance ≈ 6.69.
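The variance example can be verified with a short Python sketch. The function names are illustrative assumptions. The sample variance and standard deviation are included only to contrast the two denominators (N versus n − 1); they are not part of the worked example.

```python
import math


def population_variance(values):
    """Average squared deviation from the mean (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [3, 5, 8, 1]
pop_var = population_variance(data)
print(pop_var)                           # 6.6875
print(round(math.sqrt(pop_var), 3))      # population standard deviation
print(round(sample_variance(data), 3))   # larger, because of n - 1
```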
Standard Deviation
Standard deviation is the positive square root of the variance. It is one of the basic methods of
statistical analysis. Standard Deviation is commonly abbreviated as SD and denoted by the
symbol 'σ’ and it tells about how much data values are deviated from the mean value. If we get
a low standard deviation then it means that the values tend to be close to the mean whereas a
high standard deviation tells us that the values are far from the mean value.
It tells how the values are spread across the data sample and it is the measure of the variation
of the data points from the mean. The standard deviation of a data set, sample, statistical
population, random variable, or probability distribution is the square root of its variance.
Standard Deviation of Ungrouped Data
The calculations for standard deviation differ for different data. Distribution measures the
deviation of data from its mean or average position. There are three methods to find the standard
deviation.
Probability means possibility. It is a branch of mathematics that deals with the occurrence of a
random event. The value is expressed from zero to one. Probability has been introduced in
Maths to predict how likely events are to happen. The meaning of probability is basically the
extent to which something is likely to happen. This is the basic probability theory, which is also
used in the probability distribution, where you will learn the possibility of outcomes for a
random experiment. To find the probability of a single event occurring, we should first know the total number of possible outcomes.
The probability is the measure of the likelihood of an event to happen. It measures the
certainty of the event. The formula for probability is given by;
P(E) = n(E)/n(S)
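Here,
n(E) = number of outcomes favourable to event E
n(S) = total number of outcomes in the sample space S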
Sample Space
A sample space is the set of all possible results or outcomes of a random experiment.
Suppose we throw a die at random; then the sample space for this
experiment is the set of all possible outcomes of the throw, i.e.
S = {1, 2, 3, 4, 5, 6}
Random Variables
The variables which denote the possible outcomes of a random experiment are called
random variables. They are of two types: discrete random variables and continuous
random variables.
Discrete random variables take only distinct, countable values, whereas
continuous random variables can take any value within an interval.
Independent Event
When the probability of occurrence of one event has no impact on the probability of
another event, then both the events are termed as independent of each other. For
example, if you flip a coin and at the same time you throw a dice, the probability of
getting a ‘head’ is independent of the probability of getting a 6 in dice.
Mean
Mean of a random variable is the average of the random values of the possible
outcomes of a random experiment. In simple terms, it is the expectation of the possible
outcomes of the random experiment, repeated again and again or n number of times. It
is also called the expectation of a random variable.
Expected Value
Expected value is the mean of a random variable. It is the assumed value which is
considered for a random experiment. It is also called expectation, mathematical
expectation or first moment. For example, if we roll a dice having six faces, then the
expected value will be the average value of all the possible outcomes, i.e. 3.5.
Variance
Basically, the variance tells us how the values of the random variable are spread around
the mean value. It specifies the distribution of the sample space across the mean.
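The expected value and variance of a random variable can be illustrated with a fair six-sided die. The sketch below assumes every face has probability 1/6 and uses exact fractions. The variable names are illustrative.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)  # assumed: a fair die, every face equally likely

# Expected value: sum of value × probability.
expected = sum(x * p for x in faces)

# Variance: expected squared distance from the mean.
variance = sum((x - expected) ** 2 * p for x in faces)

print(expected, float(expected))   # 7/2 3.5
print(variance)                    # 35/12
```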
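P(A) = P(drawing a King from a deck of 52 cards) = 4/52
P(B) = P(drawing a Queen from a deck of 52 cards) = 4/52
P(A) = P(getting a 3) = 1/6
P(A) = P(getting one of 5, 10, 15, 20, 25, 30) = 6/30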
Example #1
Suppose an investor considers investing in two stocks, A and B. The
probability of stock A increasing in value over the next year is 0.4, and
the probability of stock B increasing in value over the next year is 0.6.
The investor wants to know the probability that at least one of the two
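stocks will increase in value. By using the addition rule of probability,
and assuming the two stocks move independently so that P(A and B) = P(A) × P(B),
we can calculate the probability as follows:
P(A or B) = P(A) + P(B) − P(A and B) = 0.4 + 0.6 − (0.4 × 0.6)
= 0.76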
So, there is a 76% probability that at least one of the two stocks will
increase in value over the next year.
Let’s say a bank is considering giving loans to two borrowers, X and Y.
The probability of borrower X defaulting on the loan is 0.3, and the
probability of borrower Y defaulting on the loan is 0.4. The bank wants
to know the probability that at least one borrower will default. By
using the addition rule of probability, we can calculate the probability
as follows:
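P(X or Y) = P(X) + P(Y) − P(X and Y) = 0.3 + 0.4 − (0.3 × 0.4), assuming the two defaults are independent
= 0.58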
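Both results can be reproduced with a small sketch of the addition rule. It assumes the two events are independent, so that P(A and B) = P(A) × P(B). That assumption is what yields 0.76 and 0.58; the function name is hypothetical.

```python
def prob_at_least_one(p_a, p_b):
    """Addition rule, assuming A and B are independent events."""
    p_both = p_a * p_b              # P(A and B) under independence
    return p_a + p_b - p_both       # P(A or B)


print(round(prob_at_least_one(0.4, 0.6), 2))  # stocks A and B
print(round(prob_at_least_one(0.3, 0.4), 2))  # borrowers X and Y
```

The same value follows from the complement: 1 − (1 − P(A)) × (1 − P(B)).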
If A and B are dependent events, then the probability of both events occurring
simultaneously is given by:
P(A∩B) = P(A) × P(B|A)
If A and B are two independent events in an experiment, then the probability of both
events occurring simultaneously is given by:
P(A∩B) = P(A) × P(B)
Proof
We know that the conditional probability of event A given that B has occurred is denoted
by P(A|B) and is given by:
P(A|B) = P(A∩B)/P(B)
Where, P(B) ≠ 0
P(A∩B) = P(B)×P(A|B) ...(1)
Similarly,
P(B|A) = P(B∩A)/P(A)
Where, P(A) ≠ 0.
P(B∩A) = P(A)×P(B|A) ...(2)
P(A) ≠ 0, P(B) ≠ 0.
For independent events A and B, P(B|A) = P(B). Equation (2) can then be written as
P(A∩B) = P(A)×P(B)
Let us learn here the multiplication theorems for independent events A and B.
If A and B are two independent events for a random experiment, then the probability of
simultaneous occurrence of two independent events will be equal to the product of their
probabilities. Hence,
P(A∩B) = P(A).P(B)
Now, from multiplication rule we know;
P(A∩B) = P(A)×P(B|A)
P(B|A) = P(B)
P(A∩B) = P(A).P(B)
Hence, proved.
Solution: Let A and B denote the events that the first and the second balls drawn
are red, respectively. We have to find P(A∩B) or P(AB).
Now, only 19 red balls and 10 blue balls are left in the bag. The probability of drawing a
red ball in the second draw too is an example of conditional probability where the
drawing of the second ball depends on the drawing of the first ball.
P(B|A) = 19/29
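The multiplication rule P(A∩B) = P(A) × P(B|A) can be applied to this draw with exact fractions. The problem statement is not shown above, so the sketch assumes the bag starts with 20 red and 10 blue balls. That is inferred from the 19 red and 10 blue balls left after the first red ball is drawn.

```python
from fractions import Fraction

# Assumed starting contents, inferred from "19 red and 10 blue left".
red, blue = 20, 10

p_first_red = Fraction(red, red + blue)                   # P(A)
p_second_red_given_first = Fraction(red - 1, red + blue - 1)  # P(B|A) = 19/29

# Multiplication rule for dependent events.
p_both_red = p_first_red * p_second_red_given_first
print(p_both_red)  # 38/87
```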
Conditional Probability
Conditional probability, as its name suggests, is the probability of an event happening
given a condition. It is the likelihood of an event or
outcome happening, based on the occurrence of a previous event or outcome. The probability that
both events occur is calculated by multiplying the probability of the preceding event by the
conditional probability of the succeeding event.
For example, assume that the probability of a boy playing tennis in the evening is 95% (0.95)
whereas the probability that he plays given that it is a rainy day is less which is 10% (0.1). Then
the former case is just normal probability whereas the latter case is the conditional probability.
In this example, we represent the two probabilities as P(Play tennis) = 0.95 and P(Play tennis |
Rainy day) = 0.1.
If A and B are two events associated with the same sample space of a random experiment, the
conditional probability of event A given that B has occurred is given by P(A|B) = P(A ∩ B)/P(B),
provided P(B) ≠ 0.
Let us understand conditional probability with an example. Let us find the conditional
probability of getting at least two tails given that the first toss is a head when 3 coins are
tossed. The sample space, S (the list of all outcomes) when 3 coins are tossed is given as
follows:
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Let A be the event of getting at least two tails and B the event of getting a head on the first toss.
Then, A = {HTT, THT, TTH, TTT} and B = {HHH, HHT, HTH, HTT}.
We have to find the probability of getting at least two tails given that it is a head on the first
toss. It means, out of all elements of B, we have to choose only the ones with two tails. We can
see that among the elements of B, there is only one element (which is HTT) with two tails. Thus,
the required probability is P(A | B) = 1/4 (only 1 outcome of B is favorable to A out of 4
outcomes of B).
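The three-coin example can be checked by listing the sample space in Python and counting outcomes. This is a minimal sketch that assumes all eight outcomes are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All outcomes of tossing 3 coins, e.g. "HTT".
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]

A = {s for s in sample_space if s.count("T") >= 2}  # at least two tails
B = {s for s in sample_space if s[0] == "H"}        # head on the first toss

# P(A|B) = P(A ∩ B) / P(B) = n(A ∩ B) / n(B) for equally likely outcomes.
p_a_given_b = Fraction(len(A & B), len(B))

print(sorted(A & B))   # ['HTT']
print(p_a_given_b)     # 1/4
```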
Bayes' theorem is also known as Bayes' Rule or Bayes' Law. It is used to determine
the conditional probability of event A when event B has already happened. The
general statement of Bayes' theorem is “The conditional probability of an event A,
given the occurrence of another event B, is equal to the product of the probability of B
given A and the probability of A, divided by the probability of event B.” i.e.
P(A|B) = P(B|A)P(A) / P(B), provided P(B) ≠ 0
where,
P(A) and P(B) are the probabilities of events A and B
P(A|B) is the probability of event A when event B happens
P(B|A) is the probability of event B when A happens
Example 1: A person has undertaken a job. The probabilities of completion of the job on
time with and without rain are 0.44 and 0.95 respectively. If the probability that it will rain
is 0.45, then determine the probability that the job will be completed on time.
Solution:
Let E be the event that the job will be completed on time, A the event that it rains, and B the
event that it does not rain. We have,
P(A) = 0.45,
P(no rain) = P(B) = 1 − P(A) = 1 − 0.45 = 0.55
The conditional probabilities of completing the job on time are
P(E|A) = 0.44
P(E|B) = 0.95
Since events A and B form a partition of the sample space S, by the total probability theorem, we
have
P(E) = P(A) P(E|A) + P(B) P(E|B)
= 0.45 × 0.44 + 0.55 × 0.95
= 0.198 + 0.5225 = 0.7205
So, the probability that the job will be completed on time is 0.7205.
Example 2: There are three urns containing 3 white and 2 black balls; 2 white and 3 black
balls; 1 black and 4 white balls respectively. There is an equal probability of each urn being
chosen. One ball is chosen at random from the selected urn. What is the probability that a
white ball is drawn?
Solution:
Let E1, E2, and E3 be the events of choosing the first, second, and third urn respectively. Then,
P(E1) = P(E2) = P(E3) =1/3
Let E be the event that a white ball is drawn. Then,
P(E/E1) = 3/5, P(E/E2) = 2/5, P(E/E3) = 4/5
By theorem of total probability, we have
P(E) = P(E/E1) . P(E1) + P(E/E2) . P(E2) + P(E/E3) . P(E3)
= (3/5 × 1/3) + (2/5 × 1/3) + (4/5 × 1/3)
= 9/15 = 3/5
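Both worked examples use the theorem of total probability, and Bayes' theorem then gives the reverse probability. The sketch below reproduces the two results with exact fractions. As an extra illustration that is not part of the original examples, it also computes the probability that each urn was chosen given that a white ball was drawn. The function names are assumptions.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """P(E) = sum of P(Ei) × P(E|Ei) over a partition E1, E2, ..."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def posteriors(priors, likelihoods):
    """Bayes' theorem: P(Ei|E) = P(Ei) × P(E|Ei) / P(E)."""
    p_e = total_probability(priors, likelihoods)
    return [p * l / p_e for p, l in zip(priors, likelihoods)]


# Example 1: rain / no rain, job completed on time.
p_job = total_probability(
    [Fraction("0.45"), Fraction("0.55")],
    [Fraction("0.44"), Fraction("0.95")],
)
print(float(p_job))  # 0.7205

# Example 2: three equally likely urns, white ball drawn.
urn_priors = [Fraction(1, 3)] * 3
white_given_urn = [Fraction(3, 5), Fraction(2, 5), Fraction(4, 5)]
p_white = total_probability(urn_priors, white_given_urn)
print(p_white)                                  # 3/5
print(posteriors(urn_priors, white_given_urn))  # P(urn i | white)
```

The posterior probabilities sum to 1, which is a useful sanity check when applying Bayes' theorem by hand.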