Notes Stats Unit1

The mode is the value that occurs most frequently in a data set. It is one of three measures of central tendency, along with the mean and median. The mode of the data set {3, 7, 8, 8, 9} is 8, as it occurs most frequently. There can be no mode if no values are repeated. The mode can also be calculated for grouped frequency data using a formula involving the modal class frequency and the frequencies of neighboring classes.

In statistics, the mode is the value that occurs most often in a given data set.

In other words, the value with the highest frequency (the one that appears most often) is called the mode or modal value.
It is one of the three measures of central tendency, along with the mean and the median. For example,
the mode of the set {3, 7, 8, 8, 9} is 8.

Listed below are a few important points that summarize the concept of the mode.

• The mode can sometimes equal the mean and/or the median, but not always.
• The mode is especially useful for categorical data.
• A data set in which no value repeats has no mode.
• The mode can also be found for data sets that contain no numbers (for example, colours or names).
• The mode is easy to spot when the numbers are arranged in ascending order.
• The mode of ungrouped data can be found by observation, whereas the mode of grouped data is found using a formula.
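The observation rule above can be sketched in Python with `collections.Counter`. This is a minimal sketch, assuming that a data set with no repeated value is reported as having no mode (an empty list), as stated in the points above; the function name `modes` is hypothetical. It returns every value that shares the highest frequency, so bimodal and trimodal sets come back with two or three values.

```python
from collections import Counter

def modes(data):
    """Return all values with the highest frequency (sorted), or [] if no value repeats."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:                      # no value repeats, so the set has no mode
        return []
    return sorted(value for value, count in counts.items() if count == top)

print(modes([3, 7, 8, 8, 9]))               # [8]
print(modes([2, 2, 2, 3, 4, 4, 5, 5, 5]))   # [2, 5]  (bimodal)
print(modes([3, 6, 9, 16, 27, 37, 48]))     # []      (no mode)
print(modes(["red", "blue", "red"]))        # ['red'] (categorical data)
```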

Bimodal, Trimodal & Multimodal

• When a data set has two modes, it is called bimodal.

For example, the modes of Set A = {2, 2, 2, 3, 4, 4, 5, 5, 5} are 2 and 5, because both 2 and 5 appear three times in the set.

• When a data set has three modes, it is called trimodal.

For example, the modes of Set A = {2, 2, 2, 3, 4, 4, 5, 5, 5, 7, 8, 8, 8} are 2, 5 and 8.

• When a data set has four or more modes, it is called multimodal.

Mode Formula in Statistics (Ungrouped Data)


Let us look at an example to get a better insight.

Example: The following table represents the number of wickets taken by a bowler in 10 matches. Find the mode of the given data.
It can be seen that the bowler took 2 wickets in more matches than any other number of wickets.
Hence, the mode of the given data is 2.

Example 1: Find the mode of the given data set: 3, 3, 6, 9, 15, 15, 15, 27, 27, 37, 48.

Solution: In the following list of numbers,

3, 3, 6, 9, 15, 15, 15, 27, 27, 37, 48

15 is the mode, since it appears more often in the set than any other number.

Example 2: Find the mode of the data set 4, 4, 4, 9, 15, 15, 15, 27, 37, 48.

Solution: Given data set: 4, 4, 4, 9, 15, 15, 15, 27, 37, 48.

A data set can have more than one mode when two or more values share the highest frequency.

Here, 4 and 15 each occur three times, more often than any other value. Hence, both 4 and 15 are modes of the set (the set is bimodal).

Example 3: Find the mode of 3, 6, 9, 16, 27, 37, 48.

Solution: If no value or number in a data set appears more than once, then the set has
no mode.

Hence, for set 3, 6, 9, 16, 27, 37, 48, there is no mode available.

Mode Formula For Grouped Data


In a grouped frequency distribution, the mode cannot be found just by looking at the frequencies. To determine the mode in such cases, we first find the modal class, the class with the highest frequency; the mode lies inside the modal class. The mode of the data is then given by the formula:

$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$$

Where,

l = lower limit of the modal class

h = size of the class interval

f1 = frequency of the modal class

f0 = frequency of the class preceding the modal class

f2 = frequency of the class succeeding the modal class

Example 4: In a class of 30 students, the marks obtained in mathematics (out of 50) are tabulated below. Calculate the mode of the data.

Solution:

The maximum class frequency is 12 and the class interval corresponding to this
frequency is 20 – 30. Thus, the modal class is 20 – 30.

Lower limit of the modal class (l) = 20

Size of the class interval (h) = 10

Frequency of the modal class (f1) = 12


Frequency of the class preceding the modal class (f0) = 5

Frequency of the class succeeding the modal class (f2)= 8

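Substituting these values in the formula, we get:

$$\text{Mode} = 20 + \left(\frac{12 - 5}{2 \times 12 - 5 - 8}\right) \times 10 = 20 + \frac{70}{11} \approx 26.36$$

Hence, the mode of the data is approximately 26.36.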
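The grouped-data mode formula translates directly into a small function. This is a minimal sketch, assuming the modal class has already been identified and that its neighbouring classes have the same width h; the name `grouped_mode` is hypothetical. The values in the usage line are those of Example 4.

```python
def grouped_mode(l, h, f1, f0, f2):
    """Mode of grouped data: l + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: 2*f1 - f0 - f2 is zero")
    return l + (f1 - f0) / denominator * h

# Example 4: modal class 20-30, so l = 20, h = 10, f1 = 12, f0 = 5, f2 = 8
print(round(grouped_mode(l=20, h=10, f1=12, f0=5, f2=8), 2))  # 26.36
```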



Measure of Dispersion: Range, quartile deviation, standard deviation, variance, coefficient of variation

Dispersion (also called variability, scatter, or spread) is the state of being dispersed or spread out.
Statistical dispersion is the extent to which numerical data are likely to vary about an
average value. In other words, dispersion helps us understand how the data are distributed.

In statistics, measures of dispersion help us interpret the variability of data, that is, how
homogeneous or heterogeneous the data are. In simple terms, they show how squeezed or
scattered the values of a variable are.

Types of Measures of Dispersion

There are two main types of measures of dispersion in statistics:

• Absolute Measure of Dispersion
• Relative Measure of Dispersion

Absolute Measure of Dispersion

An absolute measure of dispersion has the same unit as the original data set. Absolute
measures express variation in terms of the average of the deviations of observations, such as the
standard deviation or the mean deviation. They include the range, standard deviation, quartile
deviation, etc.
The types of absolute measures of dispersion are:

1. Range: The difference between the maximum and minimum values in a data set.
Example: for 1, 3, 5, 6, 7, Range = 7 − 1 = 6.
2. Variance: Subtract the mean from each value in the set, square each difference, add the
squares, and finally divide by the total number of values to get the variance:
σ² = ∑(X − μ)²/N.
3. Standard Deviation: The square root of the variance is the standard deviation,
i.e. S.D. = σ = √σ².
4. Quartiles and Quartile Deviation: The quartiles are values that divide an ordered list of numbers
into quarters. The quartile deviation is half the distance between the third and the
first quartile.
5. Mean and Mean Deviation: The average of the numbers is the mean, and the
arithmetic mean of the absolute deviations of the observations from a measure of
central tendency is the mean deviation (also called mean absolute deviation).
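The absolute measures above can be checked with Python's standard `statistics` module. This is a minimal sketch applied to the data set 1, 3, 5, 6, 7 from the range example; `statistics.pvariance` and `statistics.pstdev` divide by N, matching the population formula σ² = ∑(X − μ)²/N, and the mean deviation is taken about the mean (an assumption, since it can also be taken about the median).

```python
import statistics

data = [1, 3, 5, 6, 7]

data_range = max(data) - min(data)                         # largest minus smallest
mean = statistics.mean(data)
variance = statistics.pvariance(data)                      # sum((x - mean)**2) / N
std_dev = statistics.pstdev(data)                          # square root of the variance
mean_dev = sum(abs(x - mean) for x in data) / len(data)    # mean absolute deviation about the mean

print(data_range)           # 6
print(mean)                 # 4.4
print(round(variance, 2))   # 4.64
print(round(std_dev, 3))    # 2.154
print(round(mean_dev, 2))   # 1.92
```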

Relative Measure of Dispersion


Relative measures of dispersion are used to compare the spread of two or more data
sets. They are unitless ratios, so data sets measured in different units can be compared. Common
relative measures include:

1. Co-efficient of Range
2. Co-efficient of Variation
3. Co-efficient of Standard Deviation
4. Co-efficient of Quartile Deviation
5. Co-efficient of Mean Deviation

Range: It is defined as the difference between the largest and the smallest value in the
distribution.

Mean Deviation: It is the arithmetic mean of the absolute differences between the values and
their mean.
Standard Deviation: It is the square root of the arithmetic average of the squares of the
deviations measured from the mean.
Variance: It is defined as the average of the squared deviations from the mean of the given data
set.
Quartile Deviation: It is defined as half of the difference between the third quartile and the
first quartile in a given data set.
Interquartile Range: The difference between the upper (Q3) and lower (Q1) quartiles is called the
interquartile range. Its formula is Q3 − Q1.

Here are some of the relative measures of dispersion:


Coefficient of Range: It is defined as the ratio of the difference between the highest and
lowest value in a data set to the sum of the highest and lowest value.
Coefficient of Variation: It is defined as the ratio of the standard deviation to the mean of the
data set, usually expressed as a percentage.
Coefficient of Mean Deviation: It is defined as the ratio of the mean deviation to the average
(mean or median) about which the deviations are measured.
Coefficient of Quartile Deviation: It is defined as the ratio of the difference between the third
quartile and the first quartile to the sum of the third and first quartiles.
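Since the relative measures are simple ratios, they can be sketched in a few lines. This is a minimal sketch, assuming the population standard deviation for the coefficient of variation and the mean as the central value for the coefficient of mean deviation (both are conventions chosen here); the function names are hypothetical, and the data are the observations 20, 24, 31, 17, 45, 39, 51, 61 used in the range example of these notes.

```python
import statistics

def coefficient_of_range(data):
    """(largest - smallest) / (largest + smallest)."""
    largest, smallest = max(data), min(data)
    return (largest - smallest) / (largest + smallest)

def coefficient_of_variation(data):
    """Population standard deviation divided by the mean, as a percentage."""
    return statistics.pstdev(data) / statistics.mean(data) * 100

def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1); multiply by 100 for a percentage."""
    return (q3 - q1) / (q3 + q1)

def coefficient_of_mean_deviation(data):
    """Mean deviation about the mean, divided by the mean."""
    mu = statistics.mean(data)
    md = sum(abs(x - mu) for x in data) / len(data)
    return md / mu

data = [20, 24, 31, 17, 45, 39, 51, 61]
print(round(coefficient_of_range(data), 3))       # 0.564
print(round(coefficient_of_variation(data), 1))   # 40.8
```

Because each result is a ratio of quantities in the same unit, the unit cancels, which is what allows data sets in different units to be compared.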

Range of Data Set


The range is the difference between the largest and the smallest values in the
distribution. Thus, it can be written as
R=L–S
where,
L is the largest value in the Distribution
S is the smallest value in the Distribution
• A higher range implies greater variation in the data set.
• One drawback of this measure is that it takes into account only the maximum and
minimum values, which may not properly indicate how the values of the
distribution are scattered.
Example: Find the range of the data set 10, 20, 15, 0, 100.
Solution:
• Smallest value in the data = 0
• Largest value in the data = 100
Thus, the range of the data set is
R = 100 − 0
R = 100

Example: Find the range of the observations 20, 24, 31, 17, 45, 39, 51, 61.
Solution:
• Largest value = 61
• Smallest value = 17
Thus, the range of the data set is
Range = 61 − 17 = 44
Range for Grouped Data
The range of grouped data is illustrated by the following example.
Example: Find the range of the following frequency distribution of the marks
scored by class 10 students.

Marks Interval    Number of Students
0-10              5
10-20             8
20-30             15
30-40             9

Solution:

• Largest value: the upper limit of the highest class = 40
• Smallest value: the lower limit of the lowest class = 0

Thus, the range of the given data set is
Range = 40 − 0 = 40

For example, consider the data set −5, 10, 25.
Mean = (−5 + 10 + 25)/3 = 10
The deviations of the values from the mean are:
• (−5 − 10) = −15
• (10 − 10) = 0
• (25 − 10) = 15
Adding these deviations gives −15 + 0 + 15 = 0, which would wrongly suggest that there is
no deviation from the mean. To avoid this problem, only the absolute values of the
differences are used when calculating the mean deviation.

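Mean Deviation Formula:

$$\text{M.D.} = \frac{\sum_{i=1}^{n} |x_i - \mu|}{n}$$
where μ is the arithmetic mean and n is the number of observations.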

Mean Deviation for Ungrouped Data


To calculate the mean deviation for ungrouped data, follow these steps:
Step 1: Calculate the arithmetic mean μ of all the values in the data set.
Step 2: Calculate the difference between each value and the mean, taking only the
absolute values |d| = |xi − μ|.
Step 3: Calculate the arithmetic mean of these absolute deviations: M.D. = ∑|xi − μ| / n.

This can be explained with an example.

Example: Calculate the mean deviation for the ungrouped data 2, 4, 6, 8, 10.
Solution:
Mean (μ) = (2 + 4 + 6 + 8 + 10)/5
μ = 6

M.D. = ∑|xi − μ| / n
⇒ M.D. = (|2 − 6| + |4 − 6| + |6 − 6| + |8 − 6| + |10 − 6|)/5
⇒ M.D. = (4 + 2 + 0 + 2 + 4)/5
⇒ M.D. = 12/5 = 2.4
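Steps 1 to 3 map onto three lines of Python. This is a minimal sketch with a hypothetical function name `mean_deviation`; it also shows, on the same data, why the absolute values are needed: the signed deviations from the mean always add up to zero.

```python
def mean_deviation(data):
    """Mean absolute deviation about the arithmetic mean."""
    mu = sum(data) / len(data)                 # Step 1: arithmetic mean
    abs_devs = [abs(x - mu) for x in data]     # Step 2: absolute deviations |d|
    return sum(abs_devs) / len(abs_devs)       # Step 3: mean of |d|

values = [2, 4, 6, 8, 10]
signed = [x - sum(values) / len(values) for x in values]
print(sum(signed))             # 0.0  (signed deviations cancel out)
print(mean_deviation(values))  # 2.4
```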

Quartile deviation:
Statistical dispersion is the extent to which numerical data are likely to vary about an
average value. Quartile deviation is a statistic that measures the spread of the middle part of the
data. Quartiles are the three values Q1, Q2 and Q3 that divide an ordered list of numerical data
into four equal parts (quarters).

Quartile deviation is based on the difference between the third quartile and the first quartile of
the frequency distribution. This difference is known as the interquartile range. Half of this
difference is the quartile deviation, also called the semi-interquartile range.

Suppose Q1 is the lower quartile, Q2 the median, and Q3 the upper quartile of the given
data set; then its quartile deviation can be calculated using the following formula.

Quartile Deviation = (Q3 – Q1) / 2

Coefficient of Quartile Deviation = (Q3 – Q1) / (Q3 + Q1)

Quartile Deviation for Ungrouped Data


For ungrouped data, quartiles can be obtained using the following formulas:
Q1 = [(n+1)/4]th item

Q2 = [(n+1)/2]th item

Q3 = [3(n+1)/4]th item

Where n represents the total number of observations in the given data set.

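Also, Q2 is the median of the given data set. A common alternative is to take Q1 as the median
of the lower half of the data and Q3 as the median of the upper half (the method used in
Example 1 below); the two approaches can give slightly different values.

Before estimating the quartiles, we have to arrange the data values in ascending order. When a
position such as (n+1)/4 is not a whole number, we interpolate between the neighbouring items,
just as the median of an even number of values is found from the two middle values.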
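The two descriptions above are two different conventions, and it helps to see them side by side. This is a minimal sketch with hypothetical function names: `quartiles_positional` applies the (n+1)/4, (n+1)/2, 3(n+1)/4 positions with linear interpolation (as done in Question 1 later in these notes), and `quartiles_halves` takes medians of the lower and upper halves, assuming that for odd n the middle value is left out of both halves (one common convention).

```python
from statistics import median

def quartiles_positional(data):
    """Q1, Q2, Q3 from the (n+1)/4, (n+1)/2, 3(n+1)/4 position rule with interpolation."""
    xs = sorted(data)
    n = len(xs)

    def item(pos):                  # pos is a 1-based, possibly fractional, position
        k = int(pos)                # whole part: the k-th item
        frac = pos - k              # fractional part: how far towards item k+1
        if k < 1:
            return xs[0]
        if k >= n:
            return xs[-1]
        return xs[k - 1] + frac * (xs[k] - xs[k - 1])

    return item((n + 1) / 4), item((n + 1) / 2), item(3 * (n + 1) / 4)

def quartiles_halves(data):
    """Q2 = median; Q1 and Q3 = medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    # For odd n, the middle value is excluded from both halves.
    return median(xs[: n // 2]), median(xs), median(xs[(n + 1) // 2 :])

data = [17, 2, 7, 27, 15, 5, 14, 8, 10, 24, 48, 10, 8, 7, 18, 28]
print(quartiles_halves(data))      # (7.5, 12.0, 21.0)
print(quartiles_positional(data))  # (7.25, 12.0, 22.5)
```

Neither result is wrong; statistics texts and software packages simply adopt different quartile conventions, so it is worth stating which one is used.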

How to Find Quartile Deviation?

The quartile deviation is calculated differently for ungrouped data and for grouped data,
depending on the type of data given.

Quartile Deviation for Grouped Data


For grouped data, we can find the quartiles using the formula:

$$Q_r = l_1 + \frac{\frac{rN}{4} - c}{f} \times (l_2 - l_1)$$

Here,

Qr = the rth quartile

l1 = the lower limit of the quartile class

l2 = the upper limit of the quartile class

f = the frequency of the quartile class

c = the cumulative frequency of the class preceding the quartile class


N = Number of observations in the given data set

Example 1:

Find the quartiles and quartile deviation of the following data:

17, 2, 7, 27, 15, 5, 14, 8, 10, 24, 48, 10, 8, 7, 18, 28

Solution:

Given data:

17, 2, 7, 27, 15, 5, 14, 8, 10, 24, 48, 10, 8, 7, 18, 28

Ascending order of the given data is:

2, 5, 7, 7, 8, 8, 10, 10, 14, 15, 17, 18, 24, 27, 28, 48

Number of data values = n = 16

Q2 = Median of the given data set

Since n is even, median = (1/2)[(n/2)th observation + (n/2 + 1)th observation]

= (1/2)[8th observation + 9th observation]

= (10 + 14)/2

= 24/2

= 12

Q2 = 12

Now, lower half of the data is:

2, 5, 7, 7, 8, 8, 10, 10 (even number of observations)

Q1 = Median of lower half of the data

= (1/2)[4th observation + 5th observation]

= (7 + 8)/2
= 15/2

= 7.5

Also, the upper half of the data is:

14, 15, 17, 18, 24, 27, 28, 48 (even number of observations)

Q3= Median of upper half of the data

= (1/2)[4th observation + 5th observation]

= (18 + 24)/2

= 42/2

= 21

Quartile deviation = (Q3 – Q1)/2

= (21 – 7.5)/2

= 13.5/2

= 6.75

Therefore, the quartile deviation for the given data set is 6.75.

Example 2:

Calculate the quartile deviation for the following distribution.

Class 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100
Frequency 5 3 4 3 3 4 7 9 7 8
Solution:

Let us calculate the cumulative frequency for the given distribution of data.

Class Frequency Cumulative Frequency


0 – 10 5 5
10 – 20 3 5+3=8
20 – 30 4 8 + 4 = 12
30 – 40 3 12 + 3 = 15
40 – 50 3 15 + 3 = 18
50 – 60 4 18 + 4 = 22
60 – 70 7 22 + 7 = 29
70 – 80 9 29 + 9 = 38
80 – 90 7 38 + 7 = 45
90 – 100 8 45 + 8 = 53
Here, N = 53

We know that

$$Q_r = l_1 + \frac{\frac{rN}{4} - c}{f} \times (l_2 - l_1)$$

Finding Q1:

r=1

N/4 = 53/4 = 13.25

Thus, Q1 lies in the interval 30 – 40.

In this case, quartile class = 30 – 40

l1 = the lower limit of the quartile class = 30

l2 = the upper limit of the quartile class = 40

f = the frequency of the quartile class = 3

c = the cumulative frequency of the class preceding the quartile class = 12

Now, by substituting these values in the formula we get:

Q1 = 30 + [(13.25 – 12)/3] × (40 – 30)

= 30 + (1.25/3) × 10
= 30 + (12.5/3)

= 30 + 4.167

= 34.167

Finding Q3:

r=3

3N/4 = 3 × 13.25 = 39.75

Thus, Q3 lies in the interval 80 – 90.

In this case, quartile class = 80 – 90

l1 = the lower limit of the quartile class = 80

l2 = the upper limit of the quartile class = 90

f = the frequency of the quartile class = 7

c = the cumulative frequency of the class preceding the quartile class = 38

Now, by substituting these values in the formula we get:

Q3 = 80 + [(39.75 – 38)/7] × (90 – 80)

= 80 + (1.75/7) × 10

= 80 + (17.5/7)

= 80 + 2.5

= 82.5

Finally, the quartile deviation = (Q3 – Q1)/2

QD = (82.5 – 34.167)/2

= 48.333/2

≈ 24.167
Hence, the quartile deviation of the given distribution is approximately 24.167.
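The grouped-data quartile formula first requires locating the quartile class from the cumulative frequencies, which is easy to automate. This is a minimal sketch, assuming contiguous classes listed in ascending order and taking the quartile class as the first class whose cumulative frequency reaches rN/4; the name `grouped_quartile` is hypothetical. The usage lines reproduce Example 2.

```python
def grouped_quartile(r, classes, freqs):
    """r-th quartile of grouped data: Qr = l1 + (r*N/4 - c) / f * (l2 - l1).

    classes: list of (lower, upper) class limits in ascending order
    freqs:   frequency of each class
    """
    N = sum(freqs)
    target = r * N / 4              # position r*N/4
    cum = 0                         # c: cumulative frequency of the preceding classes
    for (l1, l2), f in zip(classes, freqs):
        if cum + f >= target:       # first class whose cumulative frequency reaches r*N/4
            return l1 + (target - cum) / f * (l2 - l1)
        cum += f
    raise ValueError("target position exceeds total frequency")

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
freqs = [5, 3, 4, 3, 3, 4, 7, 9, 7, 8]

q1 = grouped_quartile(1, classes, freqs)
q3 = grouped_quartile(3, classes, freqs)
print(round(q1, 3), round(q3, 3), round((q3 - q1) / 2, 3))  # 34.167 82.5 24.167
```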

The Quartile Deviation

Formally, the quartile deviation is equal to half of the interquartile range, and thus we can
write it as

$$Q_d = \frac{Q_3 - Q_1}{2}$$
Therefore, it is also called the semi-interquartile range.

• The quartile deviation does not take into account the extreme values of the distribution.
Thus, only the spread of the central 50% of the data is considered.
• If the scale of the data is changed, Q_d also changes in the same ratio.
• It is the best measure of dispersion for open-ended distributions (those whose extreme
classes are open-ended).
• It is less affected by sampling fluctuations in the data set than the range
(another measure of dispersion).
• Since it depends solely on the central values of the distribution, if these values are
abnormal or inaccurate in an experiment, the result is affected drastically.

Quartile Deviation Formula

$$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2}$$
Q1 = lower quartile

Q3 = upper quartile

Q2 is also known as the median.

Quartile Deviation for Ungrouped Data


For ungrouped data, the formulas to calculate the quartiles are:

Q1 = [(n+1)/4]th item
Q2 = [(n+1)/2]th item
Q3 = [3(n+1)/4]th item
Here, n is the total number of observations.

It is important to note here that students need to arrange the given data values in ascending order
before estimating the quartiles.
Quartile Deviation for Grouped Data
For grouped data, the quartiles can be calculated using the following formula:

$$Q_r = l_1 + \frac{\frac{rN}{4} - c}{f} \times (l_2 - l_1)$$
Here,

Qr = rth quartile

l1 = the lower limit of the quartile class

l2 = the upper limit of the quartile class

f = the frequency of the quartile class

c = the cumulative frequency of the class preceding the quartile class

N = Number of observations in the given data set

The Coefficient of Quartile Deviation

Based on the quartiles, a relative measure of dispersion, known as the Coefficient of Quartile
Deviation, can be defined for any distribution. It is formally defined as –

$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100$$

Since it involves a ratio of two quantities of the same dimensions, it is unitless. Thus, it can act
as a suitable parameter for comparing two or more different datasets which may or may not
involve quantities with the same dimensions.
So, now let’s go through the solved examples below to get a better idea of how to apply these
concepts to various distributions.

Importance of Quartile Deviation


Statistics is a tool that helps us understand data, its frequency, and the distribution of
trends. The difference between the third quartile and the first quartile of a frequency
distribution is known as the interquartile range. It is important because it describes the spread of
the middle 50% of the data, which helps assess the characteristics of the data without being
distorted by extreme values. When we divide the interquartile range by two, we get the quartile
deviation, or semi-interquartile range.

Solved Examples on Quartile Deviation

Question 1: The number of vehicles sold each day by a major Toyota showroom was recorded for
10 working days. The data are given below:

Day    Frequency (vehicles sold)
1      20
2      15
3      18
4      5
5      10
6      17
7      21
8      19
9      25
10     28

Find the Quartile Deviation and its coefficient for the given discrete distribution case.

Solution: We first need to sort the frequency data given to us before proceeding with the quartiles
calculation –

Sorted Data – 5, 10, 15, 17, 18, 19, 20, 21, 25, 28
n(number of data points) = 10

Now, to find the quartiles, we use the logic that the first quartile lies halfway between the lowest
value and the median; and the third quartile lies halfway between the median and the largest
value.

First Quartile Q1 = [(n+1)/4]th term
= [(10+1)/4]th term = 2.75th term
= 2nd term + 0.75 × (3rd term − 2nd term)
= 10 + 0.75 × (15 − 10)
= 10 + 3.75
= 13.75
Third Quartile Q3 = [3(n+1)/4]th term
= [3(10+1)/4]th term = 8.25th term
= 8th term + 0.25 × (9th term − 8th term)
= 21 + 0.25 × (25 − 21)
= 21 + 1
= 22
Using the values of Q1 and Q3, we can now calculate the quartile deviation and its coefficient as
follows:
Quartile Deviation = Semi-Interquartile Range
= (Q3 − Q1)/2
= (22 − 13.75)/2
= 8.25/2
= 4.125
Coefficient of Quartile Deviation
= (Q3 − Q1)/(Q3 + Q1) × 100
= (22 − 13.75)/(22 + 13.75) × 100
= 8.25/35.75 × 100
≈ 23.08
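Python's standard library can reproduce this computation: `statistics.quantiles` with `method="exclusive"` (its default) places the quartiles at the same (n+1)-based positions and interpolates linearly between neighbouring items, and it sorts the data itself. A minimal check, assuming Python 3.8 or later, where `statistics.quantiles` is available.

```python
import statistics

sales = [20, 15, 18, 5, 10, 17, 21, 19, 25, 28]

# "exclusive" uses positions (n+1)/4, (n+1)/2 and 3(n+1)/4
q1, q2, q3 = statistics.quantiles(sales, n=4, method="exclusive")
qd = (q3 - q1) / 2                     # quartile deviation
coeff = (q3 - q1) / (q3 + q1) * 100    # coefficient of quartile deviation, in %
print(q1, q3, qd, round(coeff, 2))     # 13.75 22.0 4.125 23.08
```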
Question 2:
For the following open-ended data, calculate the quartile deviation and its coefficient.

Marks    No. of Students
0-10     10
10-20    20
20-30    30
30-40    50
40-50    40
50-60    30
Solution: For a grouped-data distribution, we can find the quartiles through the
following steps:

⇒ Construct a cumulative frequency table alongside the given distribution.
⇒ From the total number of data values, identify the classes containing the lower and upper
quartiles.
⇒ Use the following formulas to calculate the quartiles:

The Lower Quartile $$Q_1 = LB + w \times \frac{\frac{1}{4}n - f_c}{f}$$

The Upper Quartile $$Q_3 = LB + w \times \frac{\frac{3}{4}n - f_c}{f}$$
where LB – the lower bound of the class in which the respective quartile lies
w – the class width
f_c – the cumulative frequency of the classes preceding that class
f – the frequency of that particular class
n – the total number of observations

For the given data, we can form the required table with the cumulative frequency:

Marks    Frequency    Cumulative Frequency
0-10     10           10
10-20    20           30
20-30    30           60
30-40    50           110
40-50    40           150
50-60    30           180

Since the total number of students is 180, the first quartile must lie at the position of 180/4 = 45th
student. Similarly, the third quartile must lie at the position of 180×3/4 = 135th student. By the
distribution of our data into groups, we can note that the first quartile will lie in the 20-30 marks
range.

Calculation:
Q1 = LB + w × (¼n − f_c)/f
Here, LB = 20; w = 10
f_c = 30; f = 30; n = 180
Thus, Q1 = 20 + 10 × (¼ × 180 − 30)/30
= 20 + (15/30) × 10
= 25
Similarly, the third quartile lies in the 40-50 marks range. Calculation:

Q3 = LB + w × (¾n − f_c)/f
Here, LB = 40; w = 10
f_c = 110; f = 40; n = 180
Thus, Q3 = 40 + 10 × (¾ × 180 − 110)/40
= 40 + (25/40) × 10
= 46.25
Now, using the values of Q1 and Q3, we can calculate the quartile deviation and its
coefficient as follows:

Quartile Deviation = Semi-Interquartile Range
= (Q3 − Q1)/2
= (46.25 − 25)/2
= 21.25/2
= 10.625
Coefficient of Quartile Deviation
= (Q3 − Q1)/(Q3 + Q1) × 100
= (46.25 − 25)/(46.25 + 25) × 100
= 21.25/71.25 × 100
≈ 29.82
This concludes our discussion on this topic.

https://www.toppr.com/guides/business-mathematics-and-statistics/measures-of-central-tendency-and-dispersion/quartile-deviation/

Variance is a measure of how data points differ from the mean. In layman's terms, variance
measures how far a set of numbers is spread out from its mean (average) value.

Variance is the expected squared deviation of the values from their mean. The standard
deviation is the square root of the variance, so the two measures are closely related.

The larger the variance, the more the data are scattered about the mean; a small variance means
the data lie close to the mean. Hence, variance is a measure of the spread of data about the mean.

There are two types of variance in statistics: sample variance and population
variance.
Population Variance - All the members of a group make up the population. When we want
to find how the data points of an entire population are spread out, we use
the population variance. It is the average of the squared distances of the data points from the
population mean: σ² = ∑(X − μ)²/N.
Sample Variance - If the population is too large, it is difficult to consider every data
point. In that case, a selection of data points is taken from the population to form a
sample that describes the whole group. The sample variance is computed from the squared
distances to the sample mean x̄, but the sum is divided by n − 1 instead of n:
s² = ∑(x − x̄)²/(n − 1).
A general definition of variance is that it is the expected value of the squared differences from
the mean.
Variance Example

Suppose we have the data set {3, 5, 8, 1} and we want to find the population variance. The
mean is (3 + 5 + 8 + 1)/4 = 4.25. Then, by the definition of variance,
[(3 − 4.25)² + (5 − 4.25)² + (8 − 4.25)² + (1 − 4.25)²]/4 = 26.75/4 = 6.6875.
Thus, the variance ≈ 6.69.
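Python's `statistics` module provides both variances discussed above: `pvariance` divides by N (population variance) and `variance` divides by n − 1 (sample variance). A minimal sketch on the example data {3, 5, 8, 1}, showing that the two conventions give different numbers for the same four values.

```python
import statistics

data = [3, 5, 8, 1]

pop_var = statistics.pvariance(data)    # sum of squared deviations / N     (N = 4)
sample_var = statistics.variance(data)  # sum of squared deviations / (n-1) (n - 1 = 3)
print(pop_var)                 # 6.6875
print(round(sample_var, 4))    # 8.9167
```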
Standard Deviation

Standard deviation is the positive square root of the variance. It is one of the basic methods of
statistical analysis. Standard deviation is commonly abbreviated as SD and denoted by the
symbol 'σ'; it tells us how much the data values deviate from the mean value. A low standard
deviation means that the values tend to be close to the mean, whereas a high standard deviation
means that the values are spread far from the mean.

It tells how the values are spread across the data sample and it is the measure of the variation
of the data points from the mean. The standard deviation of a data set, sample, statistical
population, random variable, or probability distribution is the square root of its variance.
Standard Deviation of Ungrouped Data
The method of calculating the standard deviation depends on the type of data. There are three
methods to find the standard deviation:

• Actual mean method
• Assumed mean method
• Step deviation method

Standard Deviation by Assumed Mean Method


When the values of x are very large, finding the mean is a tedious task, so we assume an
arbitrary value A as the mean and then calculate the standard deviation from the deviations
about A. Suppose that for a group of n data values (x1, x2, x3, …, xn) the assumed mean is A;
then the deviation is
di = xi − A
Now, the standard deviation by the assumed mean method is

$$\sigma = \sqrt{\frac{\sum d_i^2}{n} - \left(\frac{\sum d_i}{n}\right)^2}$$
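The assumed mean method should give exactly the same standard deviation as working from the actual mean; it only keeps the intermediate numbers small. This is a minimal sketch with hypothetical function names and illustrative values chosen for this example (they are not from the notes), computing the population standard deviation both ways.

```python
import math

def sd_actual_mean(xs):
    """Population SD from deviations about the actual mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def sd_assumed_mean(xs, A):
    """Population SD from deviations d = x - A about an assumed mean A."""
    n = len(xs)
    d = [x - A for x in xs]
    return math.sqrt(sum(di ** 2 for di in d) / n - (sum(d) / n) ** 2)

values = [1020, 1045, 1030, 1060, 1035]   # illustrative values, not from the notes
print(sd_actual_mean(values))
print(sd_assumed_mean(values, A=1040))    # same value, with much smaller deviations to square
```

The choice of A does not affect the result, because the second term in the formula corrects for the difference between A and the true mean.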
Probability

Probability means possibility. It is a branch of mathematics that deals with the occurrence of
random events, and its value lies between zero and one. Probability was introduced in
mathematics to predict how likely events are to happen; the probability of an event is the extent
to which it is likely to happen. This basic probability theory is also used in probability
distributions, where you will study the possible outcomes of a random experiment. To find the
probability of a single event, we first need to know the total number of possible outcomes.

Probability measures the likelihood that an event will happen. When all the outcomes of an
experiment are equally likely, the probability of an event E is given by:

P(E) = Number of Favourable Outcomes/Number of total outcomes

P(E) = n(E)/n(S)

Here,

n(E) = Number of outcomes favourable to event E

n(S) = Total number of outcomes
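The formula P(E) = n(E)/n(S) can be applied by counting outcomes. This is a minimal sketch, assuming every outcome in the sample space is equally likely (which the formula requires); the function name `probability` and the event "an even number on a die" are illustrative choices. `fractions.Fraction` keeps the result exact.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(E) = n(E) / n(S) for equally likely outcomes; `event` is a predicate."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

die = [1, 2, 3, 4, 5, 6]
print(probability(lambda x: x % 2 == 0, die))  # 1/2
```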

Terms Used in Probability and Statistics


Random Experiment
An experiment whose result cannot be predicted until it is observed is called a random
experiment. For example, when we throw a die, the result is uncertain: we can get any
number from 1 to 6. Hence, this experiment is random.

Sample Space
A sample space is the set of all possible outcomes of a random experiment.
For example, if we throw a die, the sample space consists of all the possible outcomes
of the throw:

Sample Space = { 1,2,3,4,5,6}

Random Variables
The variables which denote the possible outcomes of a random experiment are called
random variables. They are of two types:

1. Discrete Random Variables


2. Continuous Random Variables

Discrete random variables take only distinct, countable values, whereas continuous
random variables can take any value in an interval, i.e. infinitely many possible values.

Independent Event
When the probability of occurrence of one event has no impact on the probability of
another event, then both the events are termed as independent of each other. For
example, if you flip a coin and throw a die at the same time, the probability of
getting a 'head' is independent of the probability of getting a 6 on the die.

Mean
The mean of a random variable is the average of its possible values, each weighted by its
probability. In simple terms, it is the average outcome we expect when the random experiment
is repeated many times. It is also called the expectation of the random variable.
Expected Value
Expected value is the mean of a random variable. It is also called expectation, mathematical
expectation, or the first moment. For example, if we roll a fair six-sided die, the expected
value is the average of all the possible outcomes, i.e. (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.

Variance
Basically, the variance tells us how the values of the random variable are spread around
the mean value. It is the expected squared deviation from the mean: Var(X) = E[(X − μ)²],
where μ = E[X].
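The expected value 3.5 of a fair die, and its variance, can be computed exactly with fractions. This is a minimal sketch, assuming a fair six-sided die (each face has probability 1/6) and using Var(X) = E[(X − μ)²], the expected squared deviation from the mean.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                                   # fair die: every face equally likely

mean = sum(x * p for x in faces)                     # E[X]
variance = sum((x - mean) ** 2 * p for x in faces)   # E[(X - E[X])^2]
print(mean, variance)  # 7/2 35/12
```

Here 7/2 is the 3.5 quoted for the expected value, and 35/12 ≈ 2.92.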

The addition rule for probability is a fundamental principle of probability theory that gives the
probability that at least one of two or more events occurs, whether or not the events are
mutually exclusive. It is also known as the "OR" rule in probability.

Given below are the various terms used in probability:

1. Event: An event is an outcome (or a set of outcomes) of an experiment.
2. Sample space: It denotes all the possible outcomes of an experiment.
3. Mutually exclusive events: Events of an experiment that cannot occur simultaneously.
4. Not mutually exclusive events: Events that can occur simultaneously.
5. Independent events: Events whose occurrence does not depend on the
occurrence of other events.
6. The addition rule for probability (addition theorem, or "OR" rule)
determines the probability that at least one of several events in a
sample space will occur.
7. For mutually exclusive events, P(A or B) = P(A) + P(B); for events that
are not mutually exclusive, P(A or B) = P(A) + P(B) − P(A and B), where
P(A and B) is the probability of the intersection of events A and B.
8. The addition rule can be illustrated on a Venn diagram, separately for
mutually exclusive and not mutually exclusive events.

1. Addition Rule For Mutually Exclusive Events


When A and B are mutually exclusive events:
P(A or B) = P(A) + P(B) or, P(A ∪ B) = P(A) + P(B)

What is the probability that getting the card either king or


Queen?

P(A)=King 4/52

P(B)=Queen 4/52

P(A or B) = P(A) + P(B) 4/52+4/52=2/13

What is the probability that getting either 3 or 5 after throwing a


dice?

P(A)=3=> 1/6

P(B)= 5=> 1/6

P(A or B) = P(A) + P(B)=> 1/6+1/6=1/3

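A ball is picked at random from 30 balls numbered 1 to 30. What is the probability that its
number is a multiple of 5 or a multiple of 8?

P(A) = multiples of 5: 5, 10, 15, 20, 25, 30 => 6/30

P(B) = multiples of 8: 8, 16, 24 => 3/30

No number from 1 to 30 is a multiple of both 5 and 8 (the smallest common multiple is 40),
so the two events are mutually exclusive:

P(A or B) = P(A) + P(B) = 6/30 + 3/30 = 9/30 = 3/10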
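A quick enumeration confirms both that the two events are disjoint and that the answer is 9/30 = 3/10. The sketch assumes, as the listed multiples suggest, that the balls are numbered 1 to 30.

```python
from fractions import Fraction

balls = range(1, 31)  # assumed: balls numbered 1 to 30

multiples_of_5 = {n for n in balls if n % 5 == 0}
multiples_of_8 = {n for n in balls if n % 8 == 0}

# The two events share no outcomes (the first common multiple, 40, exceeds 30),
# so they are mutually exclusive.
assert multiples_of_5.isdisjoint(multiples_of_8)

p_a = Fraction(len(multiples_of_5), len(balls))
p_b = Fraction(len(multiples_of_8), len(balls))
p_a_or_b = Fraction(len(multiples_of_5 | multiples_of_8), len(balls))
print(sorted(multiples_of_5), sorted(multiples_of_8))
print(p_a_or_b)  # 3/10
```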


2. Addition Rule For Not Mutually Exclusive Events
Say A and B are events that are not mutually exclusive (they can occur together):

P(A or B) = P(A) + P(B) – P(A and B) or, P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

Where P(A and B), or P(A ∩ B), is the probability that both events occur. If A and B are
also independent, P(A ∩ B) = P(A) × P(B).
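To see why the overlap must be subtracted, here is a sketch with two overlapping events on one roll of a fair die. The events ("even" and "greater than 3") are hypothetical choices for illustration; the check compares the addition rule against directly counting the union.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}            # one roll of a fair die
A = {n for n in sample_space if n % 2 == 0}  # even number: {2, 4, 6}
B = {n for n in sample_space if n > 3}       # greater than 3: {4, 5, 6}

def P(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))

# A and B overlap in {4, 6}, so simply adding P(A) + P(B) would count
# those outcomes twice; the addition rule subtracts the overlap once.
p_union = P(A) + P(B) - P(A & B)
print(P(A), P(B), P(A & B), p_union)  # 1/2 1/2 1/3 2/3
assert p_union == P(A | B)
```

Note that here P(A ∩ B) = 1/3 differs from P(A) × P(B) = 1/4, because these two events are not independent; the product shortcut for P(A ∩ B) applies only to independent events.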

Example #1
Suppose an investor considers investing in two stocks, A and B. The
probability of stock A increasing in value over the next year is 0.4, and
the probability of stock B increasing in value over the next year is 0.6.
The investor wants to know the probability that at least one of the two
stocks will increase in value. Assuming the two stocks move independently,
so that P(A and B) = P(A) × P(B), the addition rule of probability gives:

P(A or B) = P(A) + P(B) – P(A and B)

= 0.4 + 0.6 – (0.4 x 0.6)

= 0.76

So, there is a 76% probability that at least one of the two stocks will
increase in value over the next year.
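The calculation above uses P(A and B) = 0.4 × 0.6, which treats the two stocks as independent. A small Monte Carlo sketch under that same independence assumption should land close to 0.76. The function name, the seed (42) and the number of trials (200,000) are arbitrary illustrative choices.

```python
import random

def simulate_at_least_one(p_a, p_b, trials=200_000, seed=42):
    """Estimate P(A or B) by simulation, drawing A and B independently (hypothetical helper)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a_rises = rng.random() < p_a  # stock A increases with probability p_a
        b_rises = rng.random() < p_b  # stock B increases with probability p_b
        if a_rises or b_rises:
            hits += 1
    return hits / trials

estimate = simulate_at_least_one(0.4, 0.6)
exact = 0.4 + 0.6 - 0.4 * 0.6
print(f"simulated {estimate:.3f}, exact {exact:.2f}")
```

If the stocks tended to rise together, the simulation would need to draw them jointly, and the answer would no longer be 0.76.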
Example #2
Let’s say a bank is considering giving loans to two borrowers, X and Y.
The probability of borrower X defaulting on the loan is 0.3, and the
probability of borrower Y defaulting on the loan is 0.4. The bank wants
to know the probability that at least one borrower will default. Assuming
the two defaults are independent, the addition rule of probability gives:

P(X or Y) = P(X) + P(Y) – P(X and Y)

= 0.3 + 0.4 – (0.3 x 0.4)

= 0.58

So, there is a 58% probability that at least one borrower will default.
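Under independence, "at least one defaults" is the complement of "neither defaults", so the addition rule must agree with 1 − (1 − P(X))(1 − P(Y)). The sketch below checks this with exact fractions to avoid floating-point noise; the variable names are illustrative.

```python
from fractions import Fraction

p_x = Fraction("0.3")  # P(borrower X defaults)
p_y = Fraction("0.4")  # P(borrower Y defaults)

# Addition rule, using P(X and Y) = P(X) * P(Y) under the independence assumption.
p_either = p_x + p_y - p_x * p_y

# Cross-check with the complement: "at least one defaults" is the opposite of
# "neither defaults", and under independence P(neither) = (1 - P(X)) * (1 - P(Y)).
p_neither = (1 - p_x) * (1 - p_y)
assert p_either == 1 - p_neither
print(float(p_either))  # 0.58
```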

Multiplication Rule of Probability

According to the multiplication rule of probability, the probability that both events A and B
occur is equal to the product of the probability of B occurring and the conditional probability
of A occurring given that B has occurred.

In general (and in particular when A and B are dependent events), the probability of both
events occurring simultaneously is given by:

P(A ∩ B) = P(B) × P(A|B)

If A and B are two independent events in an experiment, then the probability of both
events occurring simultaneously is given by:

P(A ∩ B) = P(A) × P(B)

Proof
We know that the conditional probability of event A given that B has occurred is denoted
by P(A|B) and is given by:

P(A|B) = P(A∩B) / P(B), where P(B) ≠ 0

P(A∩B) = P(B) × P(A|B)   ... (1)

Similarly, the conditional probability of event B given that A has occurred is:

P(B|A) = P(B∩A) / P(A), where P(A) ≠ 0

P(B∩A) = P(A) × P(B|A)

Since P(A∩B) = P(B∩A),

P(A∩B) = P(A) × P(B|A)   ... (2)

From (1) and (2), we get:

P(A∩B) = P(B) × P(A|B) = P(A) × P(B|A), where P(A) ≠ 0, P(B) ≠ 0.

The above result is known as the multiplication rule of probability.

For independent events A and B, P(B|A) = P(B), so equation (2) becomes:

P(A∩B) = P(B) × P(A)
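Both forms of the multiplication rule, and the independent-event special case, can be verified by counting outcomes. This sketch uses one die for a dependent pair of events and two dice for an independent pair; the specific events and the helpers `P` and `P_given` are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

def P(event, space):
    """Classical probability of `event` within the equally likely outcomes `space`."""
    return Fraction(len(event), len(space))

def P_given(event, given, space):
    """Conditional probability P(event | given) = P(event ∩ given) / P(given)."""
    return P(event & given, space) / P(given, space)

# Dependent events on one roll of a fair die.
die = set(range(1, 7))
A = {n for n in die if n % 2 == 0}  # even
B = {n for n in die if n > 3}       # greater than 3
assert P(A & B, die) == P(B, die) * P_given(A, B, die) == P(A, die) * P_given(B, A, die)

# Independent events on two rolls: first roll even, second roll a six.
two_dice = set(product(range(1, 7), repeat=2))
C = {r for r in two_dice if r[0] % 2 == 0}
D = {r for r in two_dice if r[1] == 6}
assert P_given(D, C, two_dice) == P(D, two_dice)  # knowing C does not change P(D)
assert P(C & D, two_dice) == P(C, two_dice) * P(D, two_dice)
print(P(A & B, die), P(C & D, two_dice))  # 1/3 1/12
```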

Multiplication Theorem of Probability


We have already learned the multiplication rules we follow in probability:

P(A∩B) = P(A)×P(B|A) ; if P(A) ≠ 0

P(A∩B) = P(B)×P(A|B) ; if P(B) ≠ 0

Let us now learn the multiplication theorem for independent events A and B.

If A and B are two independent events for a random experiment, then the probability of
simultaneous occurrence of the two events is equal to the product of their
probabilities. Hence,

P(A∩B) = P(A) × P(B)
Now, from the multiplication rule we know:

P(A∩B) = P(A) × P(B|A)

Since A and B are independent:

P(B|A) = P(B)

Therefore, again we get:

P(A∩B) = P(A) × P(B)

Hence, proved.

Solved Example of Multiplication Rule of Probability


Example 1: An urn contains 20 red and 10 blue balls. Two balls are drawn from the urn
one after the other without replacement. What is the probability that both balls
drawn are red?

Solution: Let A and B denote the events that the first and the second balls drawn
are red. We have to find P(A∩B) or P(AB).

P(A) = P(red balls in first draw) = 20/30

Now, only 19 red balls and 10 blue balls are left in the urn. The probability of drawing a
red ball in the second draw is therefore a conditional probability, because the second
draw depends on the result of the first.

Hence the conditional probability of B given A is,

P(B|A) = 19/29

By the multiplication rule of probability,

P(A∩B) = P(A) × P(B|A) = 20/30 × 19/29 = 38/87
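The urn answer can be confirmed by listing every ordered pair of distinct balls (all 30 × 29 = 870 equally likely outcomes of two draws without replacement) and counting the pairs where both are red. The labelling of balls by index is an illustrative choice.

```python
from fractions import Fraction
from itertools import permutations

# Label the balls: 20 red and 10 blue.
balls = ["red"] * 20 + ["blue"] * 10

# Every ordered pair of distinct balls is an equally likely outcome
# of two draws without replacement (30 * 29 = 870 outcomes).
draws = list(permutations(range(len(balls)), 2))
both_red = sum(1 for i, j in draws if balls[i] == "red" and balls[j] == "red")

p_enumerated = Fraction(both_red, len(draws))
p_rule = Fraction(20, 30) * Fraction(19, 29)  # P(A) * P(B|A)
print(p_enumerated, p_rule)  # 38/87 38/87
```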


Example 2: A bag contains 19 tickets, numbered from 1 to 19. A ticket is
drawn and then another ticket is drawn without replacement. Find the
probability that both tickets will show even numbers.
Solution:
Let A be the event of drawing an even numbered ticket in the first draw and B
be the event of drawing an even numbered ticket in the second draw. Then,
Required probability = P(A ∩ B) = P(A) P(B/A) … (i)
Since there are 19 tickets, numbered 1 to 19, in the bag out of which 9 are even
numbered viz. 2, 4, 6, 8, 10, 12, 14, 16, 18.
∴ P(A) = 9/19
Since the ticket drawn in the first draw is not replaced, therefore second ticket
drawn is from the remaining 18 tickets, out of which 8 are even numbered.
∴ P(B/A) = 8/18 = 4/9
Substituting these values in (i), we get
Required probability = P(A ∩ B) = P(A) P(B/A) = 9/19 × 4/9 = 4/19
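The same reasoning generalizes to several draws: each further draw without replacement leaves one fewer even ticket and one fewer ticket overall. The sketch below applies P(A ∩ B) = P(A) P(B/A) repeatedly and cross-checks by enumerating all ordered pairs; the function name `prob_all_successes` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def prob_all_successes(successes, total, draws):
    """P(every one of `draws` draws without replacement is a success),
    built by repeatedly applying P(A ∩ B) = P(A) * P(B | A)."""
    p = Fraction(1)
    for i in range(draws):
        # After i successful draws: one fewer success and one fewer ticket each time.
        p *= Fraction(successes - i, total - i)
    return p

tickets = range(1, 20)                      # tickets numbered 1 to 19
evens = [t for t in tickets if t % 2 == 0]  # 2, 4, ..., 18 (9 tickets)

p_formula = prob_all_successes(len(evens), len(tickets), 2)

# Brute-force check over all ordered pairs of distinct tickets.
pairs = list(permutations(tickets, 2))
p_enumerated = Fraction(sum(1 for a, b in pairs if a % 2 == 0 and b % 2 == 0), len(pairs))
print(p_formula, p_enumerated)  # 4/19 4/19
```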

Conditional Probability
The conditional probability, as its name suggests, is the probability of an event happening
given that some condition holds: it is the likelihood of an event or outcome occurring, given
that a previous event or outcome has occurred. It is calculated by dividing the probability that
both events occur by the probability of the preceding (conditioning) event. Equivalently, the
probability of both events equals the probability of the preceding event multiplied by the
conditional probability of the succeeding event.

For example, assume that the probability of a boy playing tennis in the evening is 95% (0.95),
whereas the probability that he plays given that it is a rainy day is much lower, 10% (0.1). The
former is an ordinary probability, whereas the latter is a conditional probability.
In this example, we represent the two probabilities as P(Play tennis) = 0.95 and P(Play tennis |
Rainy day) = 0.1.
If A and B are two events associated with the same sample space of a random experiment, the
conditional probability of event A given that B has occurred is given by
P(A|B) = P(A ∩ B) / P(B), provided P(B) ≠ 0.

Let us understand conditional probability with an example. Let us find the conditional
probability of getting at least two tails given that the first toss is a head when 3 coins are
tossed. The sample space, S (the list of all outcomes), when 3 coins are tossed is:

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Let us assume the two events A and B as follows:

• A = the event of getting at least two tails
• B = the event of getting a head on the first toss

Then, A = {HTT, THT, TTH, TTT} and B = {HHH, HHT, HTH, HTT}.

Then P(A) = 4/8 = 1/2 and P(B) = 4/8 = 1/2.

We have to find the probability of getting at least two tails given that the first toss is a
head. This means that, out of all elements of B, we keep only the ones with at least two tails.
Among the elements of B, there is only one such element (HTT). Thus, the required probability
is P(A | B) = 1/4 (only 1 outcome of B is favorable to A out of 4 outcomes of B). Equivalently,
P(A | B) = P(A ∩ B) / P(B) = (1/8) / (1/2) = 1/4.
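The coin example can be reproduced by generating the 8-outcome sample space and applying P(A | B) = P(A ∩ B) / P(B) directly. Representing outcomes as strings such as "HTT" is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing 3 coins: 8 equally likely outcomes such as "HTT".
S = ["".join(t) for t in product("HT", repeat=3)]

A = {s for s in S if s.count("T") >= 2}  # at least two tails
B = {s for s in S if s[0] == "H"}        # head on the first toss

def P(event):
    return Fraction(len(event), len(S))

p_a_given_b = P(A & B) / P(B)  # P(A | B) = P(A ∩ B) / P(B)
print(sorted(A & B), p_a_given_b)  # ['HTT'] 1/4
```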

Bayes’ Theorem is named after Reverend Thomas Bayes. It is a very important theorem in
mathematics that is used to find the probability of an event based on prior knowledge of
conditions that might be related to that event. It is an application of conditional probability.

Bayes’ theorem is also known as Bayes’ Rule or Bayes’ Law. It is used to determine
the conditional probability of event A when event B has already happened. The
general statement of Bayes’ theorem is “The conditional probability of an event A,
given the occurrence of another event B, is equal to the product of the conditional
probability of B given A and the probability of A, divided by the probability of event B.” i.e.
P(A|B) = P(B|A)P(A) / P(B), provided P(B) ≠ 0
where,
P(A) and P(B) are the probabilities of events A and B
P(A|B) is the probability of event A when event B happens
P(B|A) is the probability of event B when A happens
Example 1: A person has undertaken a job. The probabilities of completing the job on
time with and without rain are 0.44 and 0.95 respectively. If the probability that it will rain
is 0.45, determine the probability that the job will be completed on time.
Solution:
Let E be the event that the job will be completed on time, A be the event that it rains,
and B be the event that it does not rain. We have,
P(A) = 0.45,
P(no rain) = P(B) = 1 − P(A) = 1 − 0.45 = 0.55
From the given data,
P(E|A) = 0.44
P(E|B) = 0.95
Since events A and B form a partition of the sample space S, by the total probability theorem,
we have
P(E) = P(A) P(E|A) + P(B) P(E|B)
= 0.45 × 0.44 + 0.55 × 0.95
= 0.198 + 0.5225 = 0.7205
So, the probability that the job will be completed on time is 0.7205.
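The total-probability step can be written out directly, and the same numbers then feed Bayes’ theorem to answer the reverse question "given that the job was finished on time, how likely is it that it rained?". That second question goes beyond the original example and is included only to illustrate the formula; the variable names are illustrative.

```python
from fractions import Fraction

p_rain = Fraction("0.45")
p_no_rain = 1 - p_rain  # 0.55
p_on_time_given_rain = Fraction("0.44")
p_on_time_given_no_rain = Fraction("0.95")

# Total probability: weight each conditional probability by the chance of its case.
p_on_time = p_rain * p_on_time_given_rain + p_no_rain * p_on_time_given_no_rain
print(float(p_on_time))  # 0.7205

# Bayes' theorem then reverses the conditioning: P(rain | job finished on time).
p_rain_given_on_time = p_on_time_given_rain * p_rain / p_on_time
print(round(float(p_rain_given_on_time), 4))
```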
Example 2: There are three urns containing 3 white and 2 black balls; 2 white and 3 black
balls; and 1 black and 4 white balls, respectively. Each urn is equally likely to be
chosen. One urn is chosen at random and a ball is drawn from it. What is the probability that a
white ball is drawn?
Solution:
Let E1, E2, and E3 be the events of choosing the first, second, and third urn respectively. Then,
P(E1) = P(E2) = P(E3) =1/3
Let E be the event that a white ball is drawn. Then,
P(E/E1) = 3/5, P(E/E2) = 2/5, P(E/E3) = 4/5
By theorem of total probability, we have
P(E) = P(E/E1) . P(E1) + P(E/E2) . P(E2) + P(E/E3) . P(E3)
= (3/5 × 1/3) + (2/5 × 1/3) + (4/5 × 1/3)
= 9/15 = 3/5
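The urn calculation can be sketched in a few lines: compute P(E) with the theorem of total probability, then use Bayes’ theorem to find how likely each urn is once a white ball has been seen. The posterior step is an illustrative extension of the example, not part of the original solution.

```python
from fractions import Fraction

# (white, black) counts for the three urns; each urn is chosen with probability 1/3.
urns = {"E1": (3, 2), "E2": (2, 3), "E3": (4, 1)}
prior = {name: Fraction(1, 3) for name in urns}
likelihood = {name: Fraction(w, w + b) for name, (w, b) in urns.items()}  # P(E | Ei)

# Theorem of total probability: P(E) = sum of P(E | Ei) * P(Ei).
p_white = sum(likelihood[n] * prior[n] for n in urns)
print(p_white)  # 3/5

# Bayes' theorem: which urn was probably used, given that the ball is white?
posterior = {n: likelihood[n] * prior[n] / p_white for n in urns}
print(posterior)
```

The posteriors (1/3, 2/9 and 4/9) sum to 1, and the urn with the most white balls becomes the most likely source.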
