Probability and Statistics
Probability is the study of how likely an event is to happen. It's a measure of the
chance that a specific outcome will occur out of all possible outcomes.
Examples:
Statistics:
Statistics is the study of collecting, organizing, presenting, analyzing, summarizing,
interpreting data, and making decisions from that study.
Examples:
● Average Score (Collect data in the form of marks, find the average, and then
draw a conclusion about performance)
● Survey Reports (Collect surveys, organize them in tabular form with
frequencies, and then make a decision)
Sources of Data:
● Primary Data: Primary data is raw data that you collect directly from the
original source. This data is in its raw form, meaning it hasn’t been
processed, analyzed, or manipulated through any statistical process.
Examples:
▪ Conducting Surveys (When you distribute a survey to gather responses,
the data you collect such as customer satisfaction feedback is
considered primary data because it is directly obtained from
participants)
▪ Performing Experiments (In scientific research, data collected directly
from experiments, like measuring the effect of a new drug on blood
pressure, is primary data as it’s unprocessed and directly obtained from
test subjects)
● Secondary Data: Data or information that has already been collected by someone else
and is available for use after being processed through some statistical process is called
secondary data.
Examples:
▪ Survey Reports (These are processed summaries of survey data
collected by others. For example, a government report on national
health statistics is secondary data that you can use without conducting
your own surveys)
▪ Experiment Results (Published research results become secondary data.
For example, a study on diet effects on heart health can be used as a
reference in your research, saving time and resources)
Types of Statistics:
Types of Data:
Frequency Distribution:
● Frequency Distribution (Qualitative Data):
▪ F = Frequency
▪ RF = Relative Frequency (proportion)
▪ % = Percentage = RF * 100
Grade F RF %
A 6 6/20 30
B 2 2/20 10
C 2 2/20 10
D 4 4/20 20
F 6 6/20 30
▪ % = Percentage
▪ CB = Class bound
▪ X = midpoint
▪ CF = Cumulative Frequency
Step 2: Select the number of classes: minimum 4, ideal 5-7, maximum 10-11
Step 3: Class width (difference between each interval) = Range / number of classes
✔ The smallest value must fall into the first interval and largest
must fall into last interval. Suitable if they are not on edges and
are in between the interval.
✔ Quantitative data can be summarized into range.
C-I F CB X CF RF %
2-5 8 1.5 - 5.5 3.5 8 8/30 26.6
6-9 7 5.5 - 9.5 7.5 15 7/30 23.3
10 - 13 8 9.5 - 13.5 11.5 23 8/30 26.6
14 - 17 7 13.5 - 17.5 15.5 26 7/30 23.3
18 - 21 9 17.5 - 21.5 19.5 30 9/30 30
▪ To find the class boundaries (CB), take the difference between the end of one interval and the start
of the next interval and divide it by 2. E.g. the first interval ends at 5 and the second starts at
6, so the adjustment is (6 - 5)/2 = 0.5.
▪ Now subtract this number from the start of each interval and add it to
the end of each interval. E.g. the first interval converts from 2 - 5 to 1.5 -
5.5
Graphical Representation:
● Histogram
● Pie chart
● Bar graph
● Box plot (using EDA)
1. Histogram:
Diagram:
Types of histogram graphs include.
Example:
Get more info at:
https://www.atlassian.com/data/charts/histogram-complete-guide
2. Pie Chart:
Diagram:
Items Expenses
Food 70
Clothing 50
Fuel 30
Rent 40
Others 10
In this table, we see that Total = 200. So, we convert each expense to a percentage of the total:
● Food = 70/200 x 100% = 35%
● Clothing = 50/200 x 100% = 25%
● Fuel = 30/200 x 100% = 15%
● Rent = 40/200 x 100% = 20%
● Others = 10/200 x 100% = 5%
Based on these percentages we draw the pie chart by dividing the sectors based on
calculated percentages.
✔ If we have different objects, then we have different pie charts. Each pie
chart represents every single individual.
3. Bar Chart:
Bar charts are used for predictions and can show both quantitative and
qualitative data. They are often used to represent data that changes over time.
Diagram:
● Mode: This method aims to find the most frequent value in the data set.
❖ Mean:
The mean is the value obtained by summing all the data items and then
dividing this sum by the number of items in the data set.
The representation of mean for sample data:
$\bar{x} = \frac{\sum X_i}{n} = \frac{\text{sum of all values}}{\text{no. of observations}}$
The representation of mean for population data:
$\mu = \frac{\sum X_i}{N} = \frac{\text{sum of all values}}{\text{no. of observations}}$
✔ The data set we have is always treated as a sample unless it is stated to be the
population.
Example:
Suppose I missed the marks of 1 student, which are 39. If we add this item to the
data set, then:
$\bar{x} = \frac{\sum X_i}{n} = \frac{174}{19} = 9.16$
This shows how strongly the mean is affected by an extreme value.
So, if the data set has extreme values, we should use a different average,
like the median or mode, which is not affected by extreme values.
Question: The mean of 10 values is 13. After finding the mean, we observe
that one value was missed. After adding that value, the mean becomes 19. Find that
value.
Solution:
First mean:
$\sum X = 130$, since $\bar{x} = 13$, $n = 10$
Second mean:
$\sum X = 130 + A$, with $\bar{x} = 19$, $n = 11$
Now,
$19 = \frac{130 + A}{11}$
$\Rightarrow 209 = 130 + A \Rightarrow A = 209 - 130 = 79$
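The same reasoning can be sketched in a few lines of Python. This is only an illustrative check; the function name `missing_value` is made up for this example and is not part of the notes.

```python
def missing_value(old_mean, old_n, new_mean):
    """Return the single value that moves the mean from old_mean to new_mean."""
    old_total = old_mean * old_n          # sum of the original values
    new_total = new_mean * (old_n + 1)    # sum after one more value is added
    return new_total - old_total

print(missing_value(13, 10, 19))  # 79
```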
Properties of Mean:
● Weighted mean
● Combined mean
Weighted Mean:
When values are not equally important and each carries a different weight,
the mean computed in this situation is called the weighted mean.
Example: Aggregate calculation (Matric: 10% weight, FSc: 40% weight, Test:
50% weight).
Weights may be in the form of:
▪ Percentage
▪ Ratio
▪ Proportion
▪ Rank
✔ If weights are not assigned, we give our own weights.
$\bar{X}_w = \frac{\sum W_i X_i}{\sum W_i} = \frac{W_1 X_1 + W_2 X_2 + \dots}{W_1 + W_2 + \dots}$
Item X W
Food 70 1
Fuel 30 3
Clothing 50 4
Rent 40 2
Other 10 5
$\bar{X}_w = \frac{1 \cdot 70 + 3 \cdot 30 + 4 \cdot 50 + 2 \cdot 40 + 5 \cdot 10}{1 + 3 + 4 + 2 + 5} = \frac{490}{15} = 32.67$
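As a quick check of the weighted mean formula, here is a minimal Python sketch using the expense table above. The helper name `weighted_mean` is an assumption made for illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

expenses = [70, 30, 50, 40, 10]   # Food, Fuel, Clothing, Rent, Other
weights = [1, 3, 4, 2, 5]
print(round(weighted_mean(expenses, weights), 2))  # 32.67
```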
Combined Mean:
Here we combine the means of different groups whose means are already
known, where all these means are computed on the same type of item, e.g.
heights.
$\bar{X}_c = \frac{\sum n_i \bar{X}_i}{\sum n_i}$
Example:
Groups A B C
No of observations 15 25 10
Mean 5 7 3
nX 75 175 30
$\bar{X}_c = \frac{75 + 175 + 30}{50} = \frac{280}{50} = 5.6$
❖ Median:
Median is a value which divides any data set into two equal parts after
arranging the data in ascending or descending order. 50% observations are
below the median and 50% observations are above the median.
$M = \left(\frac{n+1}{2}\right)\text{th value in the data}$
Example:
Raw data: 7, 9, 11, 15, 7, 5, 2, 75, 7
Arranging in ascending order: 2, 5, 7, 7, 7, 9, 11, 15, 75
With n = 9, the median is the (9 + 1)/2 = 5th value, so M = 7.
Extensions:
● Quartile: Divides the data into 4 equal parts using 3 points known as Q1, Q2, Q3,
where Q2 is the median of the data set.
● Decile: Divides the data set into 10 equal parts using 9 points known as D1,
D2, D3, D4, D5, D6, D7, D8, D9, where D5 is the median of the
data set.
● Percentile: Divide data set into 100 equal parts. It contains 99 points.
Percentile is used when we want to study data individually.
Quartile:
Formula:
$Q_v = \frac{v(n+1)}{4}\text{th value}$, where v = 1, 2, 3
Example:
Raw data: 7, 5, 3, 7, 9, 11, 15, 12, 20, 17, 13, 15, 10, 9
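Arranging in ascending order: 3, 5, 7, 7, 9, 9, 10, 11, 12, 13, 15, 15, 17, 20
Now,
$Q_1 = \frac{1(14+1)}{4} = 3.75\text{th} = 3\text{rd} + 0.75(4\text{th} - 3\text{rd}) = 7 + 0.75(7 - 7) = 7$
$Q_2 = \frac{2(14+1)}{4} = 7.5\text{th} = \frac{7\text{th} + 8\text{th}}{2} = \frac{10 + 11}{2} = 10.5$
$Q_3 = \frac{3(14+1)}{4} = 11.25\text{th} = 11\text{th} + 0.25(12\text{th} - 11\text{th}) = 15 + 0.25(15 - 15) = 15$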
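The position rule v(n+1)/4 with linear interpolation can be sketched as follows. This is a minimal illustration of the rule used in these notes; library routines such as `statistics.quantiles` may use a different convention, and the name `quartile` is an assumption.

```python
def quartile(data, v):
    """v-th quartile (v = 1, 2, 3) using the v(n+1)/4 position rule."""
    values = sorted(data)
    n = len(values)
    pos = v * (n + 1) / 4            # 1-based, possibly fractional position
    lower = int(pos)                 # whole part of the position
    frac = pos - lower               # fractional part used to interpolate
    lower = max(1, min(lower, n))    # clamp for very small data sets
    upper = min(lower + 1, n)
    lo_val, hi_val = values[lower - 1], values[upper - 1]
    return lo_val + frac * (hi_val - lo_val)

raw = [7, 5, 3, 7, 9, 11, 15, 12, 20, 17, 13, 15, 10, 9]
print(quartile(raw, 1), quartile(raw, 2), quartile(raw, 3))  # 7.0 10.5 15.0
```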
Deciles:
Formula:
Dv = v(n+1)/10, Where v = 1,2,3,4,5,6,7,8,9
Percentile:
Formula:
Pv = v(n+1)/100, Where v = 1, 2, 3, ..., 99
❖ Mode:
The mode is the value in a data set that appears most often. By contrast, the
mean of a set of numbers is the sum of all the numbers divided by the
number of values in the set, and is also known as the average.
2. Measure of Dispersion:
● Range
● Variance
● Standard deviation (SD)
● Interquartile range (IQR)
❖ Range:
R = Max - Min => suitable only for small data sets,
because if we have large data that also contains outliers, then the range is
not useful.
❖ Variance:
$S^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2$
❖ Standard Deviation:
$S = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$
Example:
Data of group A: 3, 5, 7, 5, 2, 6, 4
Now,
X    $X^2$
3    9
5    25
7    49
5    25
2    4
6    36
4    16
$\sum X = 32$, $\sum X^2 = 164$, $n = 7$
$S^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{164}{7} - \left(\frac{32}{7}\right)^2 = 23.43 - 20.90 = 2.53 \Rightarrow S = 1.59$
Now,
X    $X^2$
5    25
9    81
11   121
10   100
11   121
10   100
7    49
$\sum X = 63$, $\sum X^2 = 597$, $n = 7$
$S^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{597}{7} - \left(\frac{63}{7}\right)^2 = 85.29 - 81 = 4.29 \Rightarrow S = 2.07$
Mean = 9, SD = 2.07
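The shortcut formula for the variance (mean of the squares minus the square of the mean) is easy to check numerically. The sketch below uses the two data sets from the example; the function names are assumptions made for illustration.

```python
from math import sqrt

def variance(data):
    """Variance via the shortcut formula: sum(X^2)/n - (sum(X)/n)^2."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2

def std_dev(data):
    # max(..., 0.0) guards against tiny negative values from rounding error
    return sqrt(max(variance(data), 0.0))

first = [3, 5, 7, 5, 2, 6, 4]
second = [5, 9, 11, 10, 11, 10, 7]
for name, group in (("first", first), ("second", second)):
    print(name, round(variance(group), 2), round(std_dev(group), 2))
```

Computing both groups side by side makes it easy to compare their spread with the same formula.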
❖ IQR:
✔ IQR = Q3 – Q1 = Inter Quartile Range
✔ Best suited for variation in SD.
Coefficient of Variation:
CV is used when we must compare two or more data sets having different
units.
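$CV = \frac{S}{\bar{x}} \times 100$
Let,
$CV = \frac{5\,\text{kg}}{5\,\text{kg}} \times 100 \rightarrow \text{unitless} = 100\%$
$CV = \frac{10\,\text{cm}}{3\,\text{cm}} \times 100 \rightarrow \text{unitless} = 333.3\%$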
1. CV1 = 10%
2. CV2 = 30%
3. CV3 = 20%
Data 2 is more variable than data 3, and data 3 is more variable than data 1.
$Min = Q_1 - 1.5(IQR)$
$Max = Q_3 + 1.5(IQR)$
If a value is greater than the Max value or less than the Min
value, then the value is an outlier.
❖ Box Plot:
Probability
o Probability is the chance of an event occurring.
o It is calculated and expressed as quantitative data
o It can’t be greater than 1 or 100%
$P(A) = \frac{\text{Favourable Outcomes}}{\text{Total Outcomes}}$
E.g.
Male = 15
Total = 45
$P(M) = \frac{15}{45} = \frac{1}{3}$ or 0.33 or 33.3%
E.g.
❖ Sample Point:
o A sample point is an element or member of the sample space; all sample
points combine to make the sample space
❖ Event:
o Set of all possible Favorable outcomes from sample space
o n = number of elements in that set
E.g.
𝑆 = {1 , 2 , 3 , 4 , 5 , 6}
𝐴 = 𝑜𝑑𝑑 𝑛𝑢𝑚 = {1 , 3 , 5}
𝐵 = 𝑚𝑢𝑙𝑡𝑖𝑝𝑙𝑒 𝑜𝑓 4 = { 4 }
𝐶 = 𝑚𝑢𝑙𝑡𝑖𝑝𝑙𝑒 𝑜𝑓 7 = ϕ
𝑛(𝐴) = 3 , 𝑛(𝑆) = 6
$P(A) = \frac{n(A)}{n(S)} = \frac{3}{6} = \frac{1}{2}$ or 50%
Types of Events
❖ Simple event:
o An event consisting of only 1 sample point (element / unit / Outcome)
e.g. 𝐵 = { 4 }
❖ Compound Event:
o An event consisting of more than 1 sample point e.g. 𝐴 = { 1 , 3 , 5}
❖ Sure event:
o An event that contains all possible outcomes from Sample space
o $C = \{x \mid 1 \le x \le 6\} = S$
❖ Impossible Event:
o An event whose outcomes are not present in the sample space
o 𝐷 = {7} , 𝑆 ∩ 𝐷 = ∅
E.g.
𝑆 = {1, 2, 3, 4, 5, 6}
𝐴 = {3 , 6 } , 𝐵 = {2 , 4 , 6} , 𝐶 = {1 , 3 , 5 }
B and C are equally likely events
❖ Mutually exclusive events:
o If two events have no common elements
o If two events can’t occur together
o 𝐵∩𝐶 = ϕ
❖ Exhaustive events:
o When two events combine to make complete sample space
o 𝐵∪𝐶 = 𝑆
❖ Independent Event:
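o If the occurrence of an event A does not affect the probability of an event B, then A and B are
independent. (If B is affected by an earlier event A, then B depends on A.)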
o 𝑃 ( 𝐴 ∩ 𝐵 ) = 𝑃(𝐴) ⋅ 𝑃(𝐵)
Example:
𝑆 = {(𝐻, 1), (𝐻, 2), (𝐻, 3), (𝐻, 4), (𝐻, 5), (𝐻, 6), (𝑇, 1), (𝑇, 2), (𝑇, 3), (𝑇, 4), (𝑇, 5), (𝑇, 6)}
Questions:
Solution:
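$P(A) = \frac{n(A)}{n(S)} = \frac{6}{12} = \frac{1}{2}$
$P(B) = \frac{n(B)}{n(S)} = \frac{6}{12} = \frac{1}{2}$
Now,
$\frac{1}{4} = P(A \cap B) = P(A) \cdot P(B) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$
$P(A) = \frac{n(A)}{n(S)} = \frac{6}{12} = \frac{1}{2}$
$P(C) = \frac{n(C)}{n(S)} = \frac{2}{12} = \frac{1}{6}$
Now,
$\frac{1}{12} = P(A \cap C) = P(A) \cdot P(C) = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12}$
$P(B) = \frac{n(B)}{n(S)} = \frac{6}{12} = \frac{1}{2}$
$P(C) = \frac{n(C)}{n(S)} = \frac{2}{12} = \frac{1}{6}$
Now,
$\frac{1}{12} = P(B \cap C) = P(B) \cdot P(C) = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12}$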
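The independence check P(A ∩ B) = P(A) · P(B) can also be done by enumerating the sample space. In the sketch below the two events are hypothetical (heads on the coin, an even number on the die), since the notes do not state which events were used; they are chosen only because each contains 6 of the 12 outcomes.

```python
from fractions import Fraction
from itertools import product

# Sample space for one coin toss together with one die roll (12 outcomes).
S = set(product("HT", range(1, 7)))

# Hypothetical events, chosen only for illustration.
A = {s for s in S if s[0] == "H"}      # coin shows heads
B = {s for s in S if s[1] % 2 == 0}    # die shows an even number

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event), len(S))

print(prob(A), prob(B), prob(A & B))        # 1/2 1/2 1/4
print(prob(A & B) == prob(A) * prob(B))     # True
```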
Addition law:
Question: A & B can solve 70% and 80% of the problems using software
respectively. Find the chance that a problem chosen at random is solved by at least
one of them?
Solution:
P(A) = 0.7
P(B) = 0.8
Now,
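The remaining step is not shown in these notes. A minimal sketch of it, assuming that A and B solve problems independently (so P(A ∩ B) = P(A) · P(B)), applies the addition law P(A ∪ B) = P(A) + P(B) − P(A ∩ B):

```python
def prob_union(p_a, p_b, p_both):
    """Addition law: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_both

p_a, p_b = 0.7, 0.8
p_both = p_a * p_b   # assumption: A and B are independent
print(round(prob_union(p_a, p_b, p_both), 2))  # 0.94
```

If the two solvers were not independent, `p_both` would have to be given separately.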
Question: A card is selected from a deck. Find the probability that the chosen card
is King or Queen.
Solution:
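$P(K) = \frac{n(K)}{n(\text{Cards})} = \frac{4}{52} = 0.077$
and,
$P(Q) = \frac{n(Q)}{n(\text{Cards})} = \frac{4}{52} = 0.077$
A card cannot be both a King and a Queen, so the two events are mutually exclusive and
$P(K \cup Q) = P(K) + P(Q) = \frac{4}{52} + \frac{4}{52} = \frac{8}{52} = 0.154$ or 15.4%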
Conditional probability:
If an event has already occurred, then the probability of a following event
that is affected by the previous event is called conditional probability.
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Example:
$P(\text{CarPhUsr} \cup \text{NoVio}) = \left(\frac{305}{755} + \frac{685}{755}\right) - \frac{280}{755} = 0.94$ or 94%
Find the probability that a driver is a car phone user, given that a speed violation
has already occurred:
{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (3,1), (3,2),
(3,3), (3,4), (3,5), (3,6), (4,1), (4,2), (4,3), (4,4), (4,5), (4,6), (5,1), (5,2), (5,3), (5,4),
(5,5), (5,6), (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
$P(G) = \frac{\text{Favourable}}{\text{Total}} = \frac{5}{6}$
Favorable = 0
$P(A) = \frac{\text{Favourable}}{\text{Total}} = 0$
Question: One bag contains 6 red and 4 black balls, a second bag contains 4 red and 6 black balls,
and a third bag contains 5 red and 5 black balls. One bag is selected at random, and a
ball is drawn. If the ball is red, find the probability that it was drawn from bag 1.
Solution:
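$P(B_1) = \frac{1}{3}$
$P(B_2) = \frac{1}{3}$
$P(B_3) = \frac{1}{3}$
$P(B_1 \mid A) = \frac{P(A \mid B_1) P(B_1)}{P(A)}$
$P(B_1 \mid A) = \frac{P(A \mid B_1) P(B_1)}{P(A \mid B_1) P(B_1) + P(A \mid B_2) P(B_2) + P(A \mid B_3) P(B_3)}$
$P(A \mid B_1) = \frac{6}{10}$
$P(A \mid B_2) = \frac{4}{10}$
$P(A \mid B_3) = \frac{5}{10}$
$P(B_1 \mid A) = \frac{\left(\frac{6}{10}\right)\left(\frac{1}{3}\right)}{\left(\frac{6}{10}\right)\left(\frac{1}{3}\right) + \left(\frac{4}{10}\right)\left(\frac{1}{3}\right) + \left(\frac{5}{10}\right)\left(\frac{1}{3}\right)}$
$P(B_1 \mid A) = \frac{6}{15} = \frac{2}{5}$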
Or,
        Red   Black   Total
Bag1    6     4       10
Bag2    4     6       10
Bag3    5     5       10
Total   15    15      30
$P(B_1 \mid A) = \frac{6}{15} = \frac{2}{5}$
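Bayes' theorem as used above follows one pattern: multiply each prior by its likelihood, then divide the term of interest by the total. A minimal Python sketch of that pattern, with the hypothetical helper name `posterior`, reproduces the bag example with exact fractions:

```python
from fractions import Fraction

def posterior(priors, likelihoods, target):
    """Bayes' theorem: P(target | evidence) over a set of hypotheses."""
    # Total probability of the evidence across all hypotheses.
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return priors[target] * likelihoods[target] / evidence

priors = {"B1": Fraction(1, 3), "B2": Fraction(1, 3), "B3": Fraction(1, 3)}
p_red = {"B1": Fraction(6, 10), "B2": Fraction(4, 10), "B3": Fraction(5, 10)}
print(posterior(priors, p_red, "B1"))  # 2/5
```

The same function works with plain floats, e.g. `posterior({"R": 0.3, "NR": 0.7}, {"R": 0.8, "NR": 0.2}, "R")` for a two-hypothesis problem.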
Question: There are 2000 scooter drivers, 4000 car drivers, and 6000 truck drivers. The
probabilities of an accident involving a scooter driver, a car driver, and a truck driver are
0.01, 0.03, and 0.15 respectively. If an accident happens, find the probability that
the driver is a scooter driver.
Solution:
𝑇𝑜𝑡𝑎𝑙 = 12000
𝑆𝑐𝑜𝑜𝑡𝑒𝑟 = 2000
𝐶𝑎𝑟 = 4000
𝑇𝑟𝑢𝑐𝑘 = 6000
Now,
$P(S) = \frac{2000}{12000} = \frac{1}{6}$
$P(C) = \frac{4000}{12000} = \frac{1}{3}$
$P(T) = \frac{6000}{12000} = \frac{1}{2}$
And,
𝑃(𝐴 |𝑆) = 0. 01
𝑃(𝐴 |𝐶) = 0. 03
𝑃(𝐴 |𝑇) = 0. 15
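$P(S \mid A) = \frac{P(A \mid S) P(S)}{P(A \mid S) P(S) + P(A \mid C) P(C) + P(A \mid T) P(T)} = \frac{0.01 \cdot \frac{1}{6}}{0.01 \cdot \frac{1}{6} + 0.03 \cdot \frac{1}{3} + 0.15 \cdot \frac{1}{2}} = 0.0192 \approx 2\%$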
Weather Forecasting:
A weather service predicts rain with an accuracy of 80%. However, the actual
probability of rain on any given day is 30%. What is the probability that it will rain
given that the service has predicted rain?
Solution:
𝑃(𝑊𝑃 |𝑅) = 0. 8
𝑃(𝑊𝑃 |𝑁𝑅) = 0. 2
𝑃(𝑅) = 0. 3
𝑃(𝑁𝑅) = 0. 7
Now,
$P(R \mid WP) = \frac{P(WP \mid R) P(R)}{P(WP \mid R) P(R) + P(WP \mid NR) P(NR)} = \frac{0.8 \cdot 0.3}{0.8 \cdot 0.3 + 0.2 \cdot 0.7} = 0.6316$
Spam Filtering:
A certain email is classified as spam based on certain keywords. If 70% of spam
emails contain a specific keyword, and only 10% of legitimate emails contain it,
what is the probability that an email with the keyword is spam, given that 20% of
all emails are spam?
Solution:
𝑃(𝐾𝑊 |𝑆𝐸) = 0. 7
𝑃(𝐾𝑊 |𝐿𝐸) = 0. 1
𝑃(𝑆𝐸) = 0. 2
𝑃(𝐿𝐸) = 0. 8
Now,
$P(SE \mid KW) = \frac{P(KW \mid SE) P(SE)}{P(KW \mid SE) P(SE) + P(KW \mid LE) P(LE)} = \frac{0.7 \cdot 0.2}{0.7 \cdot 0.2 + 0.1 \cdot 0.8} = 0.6364$
Quality Control:
A factory produces two types of products, A and B. Product A is defective 1% of the
time, and Product B is defective 2% of the time. If 60% of the products are A and
40% are B, what is the probability that a randomly chosen defective product is from
type A?
Solution:
𝑃(𝐷 |𝐴) = 0. 01
𝑃(𝐷 |𝐵) = 0. 02
𝑃(𝐴) = 0. 6
𝑃(𝐵) = 0. 4
Now,
$P(A \mid D) = \frac{P(D \mid A) P(A)}{P(D \mid A) P(A) + P(D \mid B) P(B)} = \frac{0.01 \cdot 0.6}{0.01 \cdot 0.6 + 0.02 \cdot 0.4} = 0.4286$
𝑃(𝐷𝑁𝑂) = 0. 99
𝑃(𝑇 |𝐷𝑂) = 0. 9
𝑃(𝑇 |𝐷𝑁𝑂) = 0. 05
Now,
$P(DO \mid T) = \frac{P(T \mid DO) P(DO)}{P(T \mid DO) P(DO) + P(T \mid DNO) P(DNO)} = \frac{0.9 \cdot 0.01}{0.9 \cdot 0.01 + 0.05 \cdot 0.99} = 0.1538$
Legal Cases:
In a certain region, 5% of people are criminals. If a person is a criminal, there is a
75% chance that they will leave evidence. If they are not criminals, there is a 10%
chance they will leave evidence. If evidence is found, what is the probability that the
person is a criminal?
Solution:
𝑃(𝐶) = 0. 05
𝑃(𝑁𝐶) = 0. 95
𝑃(𝐸 |𝐶) = 0. 75
𝑃(𝐸 |𝑁𝐶) = 0. 10
Now,
$P(C \mid E) = \frac{P(E \mid C) P(C)}{P(E \mid C) P(C) + P(E \mid NC) P(NC)} = \frac{0.75 \cdot 0.05}{0.75 \cdot 0.05 + 0.1 \cdot 0.95} = 0.283$
False Alarm:
A security system has a 98% chance of correctly detecting a break-in but also has a
3% false alarm rate. If an alarm goes off, what is the probability that there was a
break-in, given that break-ins occur in 1% of cases?
Solution:
𝑃(𝐷 |𝐵) = 0. 98
𝑃(𝐷 |𝑁𝐵) = 0. 03
𝑃(𝐵) = 0. 01
𝑃(𝑁𝐵) = 0. 99
Now,
$P(B \mid D) = \frac{P(D \mid B) P(B)}{P(D \mid B) P(B) + P(D \mid NB) P(NB)} = \frac{0.98 \cdot 0.01}{0.98 \cdot 0.01 + 0.03 \cdot 0.99} = 0.2481$
Counterfeit Coins:
You have two coins: one is a fair coin (50% heads, 50% tails), and the other is a
biased coin (75% heads. 25% tails). You pick one at random and flip it, getting
heads. What is the probability that you picked the biased coin?
Solution:
𝑃(𝐹) = 0. 5
𝑃(𝐵) = 0. 5
𝑃(𝐻 |𝐹) = 0. 5
𝑃(𝐻 |𝐵) = 0. 75
Now,
$P(B \mid H) = \frac{P(H \mid B) P(B)}{P(H \mid B) P(B) + P(H \mid F) P(F)} = \frac{0.75 \cdot 0.5}{0.75 \cdot 0.5 + 0.5 \cdot 0.5} = 0.6$
Urn Problem:
An urn contains 3 red balls and 2 blue balls. You draw a ball at random, note its
color, and put it back. Then you draw a second ball. If the second ball is red, what is
the probability that the first ball was also red?
Solution:
𝑃(𝑅1 |𝑅2) =?
Given,
$P(R_1) = \frac{3}{5} = 0.6$
$P(B_1) = \frac{2}{5} = 0.4$
$P(R_2 \mid R_1) = \frac{3}{5} = 0.6$
$P(R_2 \mid B_1) = \frac{3}{5} = 0.6$
Now,
$P(R_1 \mid R_2) = \frac{P(R_2 \mid R_1) P(R_1)}{P(R_2 \mid R_1) P(R_1) + P(R_2 \mid B_1) P(B_1)} = \frac{0.6 \cdot 0.6}{0.6 \cdot 0.6 + 0.6 \cdot 0.4} = 0.6$
Question: A person goes to work daily. He uses his car 70% of the time, walks
30% of the time, and uses the bus 40% of the time. He is late 10% of the time
when he walks to work, 3% of the time when he uses his car, and 7% of the time
when he uses the bus. Given that the person is late, find the probability that he used his car.
$P(C \mid L) = \frac{P(L \mid C) \cdot P(C)}{P(L \mid C) \cdot P(C) + P(L \mid W) \cdot P(W) + P(L \mid B) \cdot P(B)}$
2. ∑ 𝐹(𝑥) = 1
3. P (X = x) = F(x)
● x is a random variable.
● A function F(x) generates the different probabilities, and if $\sum F(x) = 1$ then
it is a probability distribution.
Example Question
$P(X = x) = \frac{\binom{n}{x}\binom{N-n}{k-x}}{\binom{N}{k}}$
● n = defective items = 3
● x = number of defective items selected = 0, 1, 2
● k = selected items = 2
● N = total population = 20
$P(X = 0) = \frac{\binom{3}{0}\binom{17}{2}}{\binom{20}{2}} = 0.716$
$P(X = 1) = \frac{\binom{3}{1}\binom{17}{1}}{\binom{20}{2}} = 0.268$
$P(X = 2) = \frac{\binom{3}{2}\binom{17}{0}}{\binom{20}{2}} = 0.016$
Probability Distribution
X       P(X)
0       0.716
1       0.268
2       0.016
Total   1.000
a) $P(X = 2) = 0.016$
b) $P(X \ge 1) = P(1) + P(2) = 0.268 + 0.016 = 0.284$
c) $P(X = 0) = 0.716$
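These probabilities can be recomputed with `math.comb` from the Python standard library. This is a minimal sketch; the function name `hypergeom_pmf` and its argument names are assumptions for this example and differ from the letters used in the notes (here `N` is the population size, `K` the number of defective items, `n` the number of items selected).

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x): x defective items in n draws without replacement
    from N items of which K are defective."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

N, K, n = 20, 3, 2
probs = [hypergeom_pmf(x, N, K, n) for x in range(n + 1)]
for x, p in enumerate(probs):
    print(x, round(p, 4))
print("total:", round(sum(probs), 4))
print("P(X >= 1):", round(probs[1] + probs[2], 4))
```

Keeping four decimal places avoids the rounding drift that appears when each probability is cut to three places before summing.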
Question:
a) A bag contains 6 red, 4 white and 5 blue balls. 3 balls are randomly selected.
Find the probability that all balls are different color
x = 0, 1, 2, 3
Discrete-Probability Distribution
Steps:
Formula:
$P(X = x) = \binom{n}{x} p^x q^{n-x}$
X = random variable
p = probability of success
q = probability of failure
n = number of fixed trials
Parameters:
1) n = number of fixed trials
2) p = probability of success
Question: The probability of generating a true code using one specific algorithm is
65%. For quality checking we randomly selected 3 algorithms. What is the
probability that exactly 2 generate true code, and what is the probability that at
least one algorithm generates true code?
Solution:
● p=0.65
● n=3
● x=0,1,2,3
$P(x = 0) = \binom{3}{0}(0.65)^0(0.35)^3$
$P(x = 1) = \binom{3}{1}(0.65)^1(0.35)^2$
$P(x = 2) = \binom{3}{2}(0.65)^2(0.35)^1$
$P(x = 3) = \binom{3}{3}(0.65)^3(0.35)^0$
1) $P(x = 2) = 0.4436$
2) $P(x \ge 1) = 1 - P(x = 0) = 1 - 0.0429 = 0.957$
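The binomial formula translates directly into code. The sketch below is illustrative only (the name `binom_pmf` is an assumption) and evaluates both parts of the question for n = 3 and p = 0.65.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 3, 0.65
print(round(binom_pmf(2, n, p), 4))        # exactly 2 successes: 0.4436
print(round(1 - binom_pmf(0, n, p), 4))    # at least 1 success: 0.9571
```

Using the complement 1 − P(X = 0) for "at least one" avoids summing three separate terms.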
Binomial Probability
o Formula: $P(X = x) = \binom{n}{x} p^x q^{n-x}$
o Parameters: [ n & p ]
o Mean: $n \cdot p$
o Variance: $n \cdot p \cdot q$
o Standard deviation: $\sqrt{n \cdot p \cdot q}$
o $p + q = 1$
Question: A company runs an online ad campaign and estimates that 20% of
individuals who see the ad will click on it. Suppose 200 individuals see the ad.
$P(30) = \binom{200}{30}(0.2)^{30}(0.8)^{170} = 0.0147$
$P(x \le 198) = 1 - \left[\binom{200}{199}(0.2)^{199}(0.8)^{1} + \binom{200}{200}(0.2)^{200}(0.8)^{0}\right] \approx 1$
$\text{Mean} = n \cdot p = 200 \times 0.2 = 40$
𝑀𝑒𝑎𝑛 = 𝑛 . 𝑝
10 = 𝑛 . 𝑝→𝐸𝑞𝑢𝑎𝑡𝑖𝑜𝑛 1
𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 = 𝑛 . 𝑝 . 𝑞
3 = 𝑛 . 𝑝 . 𝑞→𝐸𝑞𝑢𝑎𝑡𝑖𝑜𝑛 2
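Now, dividing Equation 2 by Equation 1 gives $q = \frac{3}{10}$, so:
$p = 1 - q = 1 - \frac{3}{10} = \frac{7}{10}$
Also:
$\text{mean} = n \cdot p$
$n = \frac{10}{0.7} = 14.29 \approx 14$
Using $n = 14$, $p = 0.7$ & $q = 0.3$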
Question: 20 patients are ill. A medicine is effective 70% of the time. What is the
probability that at least 5 people will be cured?
$P(x \ge 5) = 1 - \left[\binom{20}{0}(0.7)^0(0.3)^{20} + \binom{20}{1}(0.7)^1(0.3)^{19} + \binom{20}{2}(0.7)^2(0.3)^{18} + \binom{20}{3}(0.7)^3(0.3)^{17} + \binom{20}{4}(0.7)^4(0.3)^{16}\right]$
$P(x = 5) = \binom{50}{5}(0.1)^5(0.9)^{45}$
Poisson Probability
o Formula: $P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$
o Parameter: λ = mean / average / variance
o $x = 0, 1, 2, \dots$
o Used in place of binomial if $n \ge 100$ and $P(\text{success}) < 0.1$
o $\lambda = n \cdot p$ = mean of binomial
λ = 30 , 𝑛 = 5
Question: A person makes a website and observes that customers visit the
website four times per minute on average. What is the probability that there will be exactly 2
visits in the next 2 minutes? How many visits are expected in the next 5 minutes? What is the
probability that there will be at most 2 visits?
Solution:
λ = 4 per minute
1) For 2 minutes, λ = 2 × 4 = 8
$P(x = 2) = \frac{e^{-8} 8^2}{2!} = 0.0107$
2) Expected visits in 5 minutes = 4 × 5 = 20
3) Using λ = 4 (one minute):
$P(x \le 2) = P(0) + P(1) + P(2) = \frac{e^{-4} 4^0}{0!} + \frac{e^{-4} 4^1}{1!} + \frac{e^{-4} 4^2}{2!} = 0.2381$
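The three parts of this question can be checked with a short Poisson sketch. The function name `poisson_pmf` is an assumption for illustration; the rate of 4 visits per minute is scaled by the length of the interval, and the last part uses a one-minute interval (λ = 4).

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

rate_per_minute = 4
print(round(poisson_pmf(2, rate_per_minute * 2), 4))   # exactly 2 visits in 2 minutes: 0.0107
print(rate_per_minute * 5)                             # expected visits in 5 minutes: 20
print(round(sum(poisson_pmf(x, rate_per_minute) for x in range(3)), 4))  # at most 2 visits: 0.2381
```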
o Formula: $P(X = x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$
o Parameter:
▪ N = sum of objects in all categories
▪ n = number of selected objects
▪ k = successful objects in total population (N)
o Range:
▪ Lower bound: $\max(0, n + k - N)$
▪ Upper bound: $\min(n, k)$
Example Questions:
Question: A committee of size 3 is selected from 4 teachers and 2 doctors. Find the
probability that at least 2 teachers are selected.
Solution:
𝑁 = 𝑡𝑜𝑡𝑎𝑙 𝑠𝑢𝑚 𝑜𝑏𝑗 = 4 + 2 = 6
𝑛 = 𝑠𝑖𝑧𝑒 𝑜𝑓 𝑐𝑜𝑚𝑚𝑖𝑡𝑡𝑒𝑒 = 3
𝑘 = 𝑠𝑢𝑐𝑐𝑒𝑠𝑠𝑓𝑢𝑙 𝑜𝑏𝑗 = 4
𝑈𝑝𝑝𝑒𝑟: 𝑚𝑖𝑛(3, 4) = 3
𝑅𝑎𝑛𝑔𝑒 = (1 , 3) 𝑖𝑛𝑐𝑙𝑢𝑠𝑖𝑣𝑒
𝑋 ~ 𝐻( 𝑁 = 6, 𝑛 = 3, 𝑘 = 4)
$P(X \ge 2) = 1 - P(X = 1) = 1 - \left[\frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}\right]$
$= 1 - \left[\frac{\binom{4}{1}\binom{6-4}{3-1}}{\binom{6}{3}}\right]$
$= 1 - \left[\frac{4}{20}\right]$
$= 0.8$
𝑈𝑝𝑝𝑒𝑟 : 𝑚𝑖𝑛(3, 2) = 2
𝑅𝑎𝑛𝑔𝑒 = (0 , 2) 𝑖𝑛𝑐𝑙𝑢𝑠𝑖𝑣𝑒
𝑆𝑜 , 𝑥 = 0, 1, 2
$P(X = 0) = \frac{\binom{2}{0}\binom{6-2}{3-0}}{\binom{6}{3}} = 0.2$
$P(X = 1) = \frac{\binom{2}{1}\binom{6-2}{3-1}}{\binom{6}{3}} = 0.6$
$P(X = 2) = \frac{\binom{2}{2}\binom{6-2}{3-2}}{\binom{6}{3}} = 0.2$
X Probability
0 0.2
1 0.6
2 0.2
Total 1
𝑓(𝑥) >= 0
o Parameter:
▪ a = lower limit
▪ b = upper limit
o Mean = $\frac{a + b}{2}$
o Variance = $\frac{(b - a)^2}{12}$
Question: The total duration of basketball game in 2011 is between 447 and 521
hrs
$X \sim U(447, 521)$
$f(x) = \frac{1}{521 - 447} = \frac{1}{74}$
a) What is the probability that the duration is between 480 and 500 hrs?
a = 480, b = 500
$P(480 \le x \le 500) = \int_{480}^{500} \frac{1}{74}\,dx = \frac{1}{74}[500 - 480] = 0.27$
b) What is the probability that the duration is greater than or equal to 500 hrs?
a = 500, b = 521
$P(500 \le x \le 521) = \int_{500}^{521} \frac{1}{74}\,dx = \frac{1}{74}[521 - 500] = 0.28$
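Because the uniform density is constant, each probability is just the length of the interval divided by the total width. A minimal sketch follows; the helper name `uniform_prob` is an assumption for this example.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X ~ U(a, b); the interval is clipped to [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    if hi <= lo:
        return 0.0
    return (hi - lo) / (b - a)

a, b = 447, 521
print(round(uniform_prob(480, 500, a, b), 2))     # 0.27
print(round(uniform_prob(500, 521, a, b), 2))     # 0.28
print((a + b) / 2, round((b - a) ** 2 / 12, 2))   # mean 484.0, variance 456.33
```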
Easier method:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
Parameters:
● Mean: 𝜇
● Standard deviation: σ
To convert a normal variable X into the standard normal form (denoted as Z), use
this formula:
$Z = \frac{X - \mu}{\sigma}$
● Z: The Z-score (how many standard deviations X is away from the mean).
● X: The actual value from the distribution.
● μ: The mean of the distribution.
● σ: The standard deviation of the distribution.
Calculating Probabilities:
● Use the standard normal distribution table (Z-table)
● Example: To find P(X ≤ x), standardize x to Z, then look up the cumulative
probability for Z in the Z-table.
Question:
The test scores of a physics class with 800 students are distributed normally with a
mean of 75 and a standard deviation of 7.
(a) What percentage of the class has a test score between 68 and 82?
(b) Approximately how many students have a test score between 61 and 89?
(c) What is the probability that a student chosen at random has a test score
between 54 and 75?
(d) Approximately how many students have a test score greater than or equal 96?
Solution:
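● Mean = μ = 75
● Standard Deviation = σ = 7
● Total Students = 800
(a)
● For X=68,
$Z = \frac{68 - 75}{7} = -1$
$P(Z \le -1) = 0.15866$
● For X=82,
$Z = \frac{82 - 75}{7} = 1$
$P(Z \le 1) = 0.84134$
$P(68 \le X \le 82) = 0.84134 - 0.15866 = 0.68268$, i.e. about 68.27% of the class
(b)
● For X=61,
$Z = \frac{61 - 75}{7} = -2$
$P(Z \le -2) = 0.02275$
● For X=89,
$Z = \frac{89 - 75}{7} = 2$
$P(Z \le 2) = 0.97725$
$P(61 \le X \le 89) = 0.97725 - 0.02275 = 0.9545$
Students = 0.9545 × 800 = 763.6 ≈ 764
(c)
● For X=54,
$Z = \frac{54 - 75}{7} = -3$
$P(Z \le -3) = 0.00135$
● For X=75,
$Z = \frac{75 - 75}{7} = 0$
$P(Z \le 0) = 0.50000$
$P(54 \le X \le 75) = 0.50000 - 0.00135 = 0.49865$
(d)
● For X=96,
$Z = \frac{96 - 75}{7} = 3$
$P(Z \le 3) = 0.9987$
The probability $P(Z \ge 3) = 1 - 0.9987 = 0.0013$
Students = 0.0013 × 800 = 1.04 ≈ 1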
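Instead of a Z-table, the cumulative probability can be computed from the error function in Python's standard library. This is a sketch under that substitution; the function name `normal_cdf` is an assumption for this example.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal variable, via the standard normal Z-score."""
    z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, students = 75, 7, 800
p_a = normal_cdf(82, mu, sigma) - normal_cdf(68, mu, sigma)
p_b = normal_cdf(89, mu, sigma) - normal_cdf(61, mu, sigma)
p_c = normal_cdf(75, mu, sigma) - normal_cdf(54, mu, sigma)
p_d = 1 - normal_cdf(96, mu, sigma)
print(round(p_a, 4))           # (a) 0.6827
print(round(p_b * students))   # (b) 764
print(round(p_c, 4))           # (c) 0.4987
print(round(p_d * students))   # (d) 1
```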
Correlation:
● If both variables have the same direction (both increasing or both decreasing)
then there is positive correlation.
● If both variables have different directions (if one increases, other decreases)
then there is negative correlation.
● If no variable is affected by another then there is no relation.
Notation: r
Where,
− 1 <= 𝑟 <= 1
If,
Formula:
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$
Question: Find the relation between the marks of students and the number of lectures
they attended.
n = 5
X (Marks)   Y (Lectures)   XY   $X^2$   $Y^2$
3           1              3    9       1
5           2              10   25      4
7           2              14   49      4
2           1              2    4       1
5           3              15   25      9
$\sum X = 22$, $\sum Y = 9$, $\sum XY = 44$, $\sum X^2 = 112$, $\sum Y^2 = 19$
Thus,
$r = \frac{5 \cdot 44 - (22)(9)}{\sqrt{\left(5 \cdot 112 - (22)^2\right)\left(5 \cdot 19 - (9)^2\right)}} = 0.67$
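The correlation formula can be evaluated directly from the five sums in the table. A minimal sketch, with the hypothetical function name `pearson_r`:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r computed from the raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))

marks = [3, 5, 7, 2, 5]
lectures = [1, 2, 2, 1, 3]
print(round(pearson_r(marks, lectures), 2))  # 0.67
```

A value of 0.67 indicates a moderately strong positive relation between the two variables.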
Regression:
Formula:
$y = a + b(x)$
● a = value of y when x is 0 (intercept)
● b = slope formed by x and y
Thus,
$a = \bar{y} - b\bar{x}$
$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
Interpretation:
If x (independent) is increased/decreased by one unit, then y (dependent)
changes by b (the slope) units.
n = 5
X (Lectures)   Y (Marks)   XY   $X^2$
5              11          55   25
7              10          70   49
3              15          45   9
6              4           24   36
4              12          48   16
$\sum X = 25$, $\sum Y = 52$, $\sum XY = 242$, $\sum X^2 = 135$
$\bar{x} = \frac{25}{5} = 5$
$\bar{y} = \frac{52}{5} = 10.4$
As,
$y = a + b(x)$
Then,
$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{5 \cdot 242 - (25)(52)}{5 \cdot 135 - (25)^2} = -1.8$
$a = \bar{y} - b\bar{x} = 10.4 + 1.8 \times 5 = 19.4$
Thus,
$y = 19.4 - 1.8(x)$
So, if the number of lectures increases by one, then the marks decrease by 1.8.
$y = 19.4 - 1.8(10) = 1.4$
Thus, if a student attends 10 lectures, the model predicts that he gains 1.4 marks.
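The slope and intercept formulas can be sketched in Python as below. The function name `fit_line` is an assumption for illustration; the data are the lecture and mark values from the table.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x computed from the raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx**2)   # slope
    a = sy / n - b * sx / n                       # intercept: y_bar - b * x_bar
    return a, b

lectures = [5, 7, 3, 6, 4]
marks = [11, 10, 15, 4, 12]
a, b = fit_line(lectures, marks)
print(round(a, 1), round(b, 1))   # 19.4 -1.8
print(round(a + b * 10, 1))       # predicted marks for 10 lectures: 1.4
```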
Hypothesis Testing
Null-Hypothesis:
Alternative-Hypothesis:
● Represents the claim we want to test.
● It states that there is an effect, difference, or relationship.
● Example: Ha: μ ≠ 50 (the population mean is not 50).
● Inequality must lie in alternative hypothesis
Level of Significance:
● The threshold for accepting/rejecting the null hypothesis.
● Common values are 0.1(10%), 0.05 (5%) or 0.01 (1%).
Test-Statistic:
● A standardized value used to decide whether to reject H0.
● We commonly use T-statistic(t-test)
Formula: $t = \frac{\bar{x} - \mu}{S / \sqrt{n}}$
Where,
𝑥 = 𝑚𝑒𝑎𝑛 𝑜𝑓 𝑠𝑎𝑚𝑝𝑙𝑒
𝜇 = 𝑚𝑒𝑎𝑛 𝑜𝑓 𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛
𝑆 = 𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑠𝑎𝑚𝑝𝑙𝑒
𝑛 = 𝑠𝑖𝑧𝑒 𝑜𝑓 𝑠𝑎𝑚𝑝𝑙𝑒
Rejection/Critical Region:
Case A(2-tail):
If H0 is 𝜇 = 𝜇𝑜
then, Reject if
● t-value>+table-value
● t-value <-table-value
Case B:
If H0 is 𝜇≤𝜇𝑜
then, Reject if
● t-value>+table-value
Case C :
If H0 is 𝜇≥𝜇𝑜
then, Reject if
● t<-table-value
Conclusion:
● Draw conclusion in context of the problem
Question: From the data available, it is observed that 400 out of 850 customers
purchased the groceries online. Can we say that most of the customers are moving
towards online shopping even for groceries?
Solution:
$\bar{x} = \frac{400}{850} = 0.47$
$\mu = 0.5$ (a customer category is in the majority if it crosses 50%)
Step5: (Conclusion)
Hence, we reject the null hypothesis. With the given data, we can conclude
at the chosen level of significance that most of the customers are not moving towards online shopping
for groceries.
Confidence Interval:
$\bar{x} - z_{\alpha/2}\sqrt{\mu\left(\frac{1 - \mu}{n}\right)} < \mu < \bar{x} + z_{\alpha/2}\sqrt{\mu\left(\frac{1 - \mu}{n}\right)}$
Question: A teacher claims that the average score of students in the class is greater
than 6. To test this claim, we took a sample of 5 students with a mean of 5 and a sample standard
deviation of 2. The level of significance is 5%, i.e. α = 0.05.
Solution:
α = 0. 05, 𝑥 = 5, 𝑆 = 2, 𝑛 = 5, µ = 6
Step5: (Conclusion)
As from step 4, we are unable to reject the null hypothesis, so by this test the
average score of the class is less than or equal to 6.
Confidence Interval:
$\bar{x} - t_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right) < \mu < \bar{x} + t_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right)$