Unit3 L2
• Modeling Uncertain Events: Probability allows data scientists to model and analyze
uncertain events.
For example, in insurance risk assessment, probability models are used to estimate the
likelihood of certain events, such as car accidents or property damage, helping insurance
companies determine premium rates.
• Decision Making: Probability helps in making optimal decisions based on available data.
For instance, in medical diagnosis, probability models can be used to calculate the probability
of a patient having a particular disease based on symptoms and test results, aiding doctors in
making accurate diagnoses.
- Data Science
- Insurance
- Genetic Science
- Clinical trials
- Confidence Intervals
It helps to know the probability of our data lying within a given interval.
- Probability distribution
Most data follow some probability distribution. Common examples are the normal,
Poisson, Bernoulli, binomial, and exponential distributions.
- Bayes Theorem AND Naive Bayes Algorithm
Bayes Theorem is used in machine learning, specifically in the Naive Bayes algorithm.
This algorithm is commonly used for text classification, spam filtering, and sentiment
analysis. It calculates the probability of a certain event occurring given the prior
knowledge of related events.
- Conditional Probability
Conditional probability is used in various statistical and data science applications, such
as recommendation systems. It helps determine the likelihood of a certain event
occurring given the occurrence of another event. For example, in a movie
recommendation system, conditional probability can be used to predict the probability
of a user liking a certain movie based on their previous ratings and preferences.
This material is prepared and edited by Dr. Khaled Mohamad
Usage of probability in statistical analysis and data science (cont.)
- Markov Chains
Markov Chains are used in a variety of Machine Learning applications, such as natural language processing and speech recognition. They model
sequential data where the probability of transitioning from one state to another depends only on the current state.
These concepts play crucial roles in various techniques and algorithms, enabling scientists to make predictions, classify data, analyze
patterns, and gain insights from complex datasets.
Sample space: It is the universal set that consists of all possible outcomes of an experiment. Example: for the outcome of a college
application, S = {admitted, not admitted}.
Event: It is a subset of the sample space. Probability is usually calculated with respect to an event.
Example: The chance of getting heads on a fair coin.
$$P(A) = \frac{n}{N} = \frac{\text{number of outcomes in } A}{\text{total number of outcomes in the sample space}}$$
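The counting formula above can be sketched in a few lines of Python. This is a minimal illustration that assumes all outcomes are equally likely; the function name `classical_probability` and the die example are not from the notes and are only for illustration.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) = n / N, assuming every outcome in the sample space is equally likely."""
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))

# Fair coin: the event "head" is one of two outcomes.
coin = {"head", "tail"}
p_head = classical_probability({"head"}, coin)

# Hypothetical second example: an even number on a fair six-sided die.
die = {1, 2, 3, 4, 5, 6}
p_even = classical_probability({2, 4, 6}, die)

print(p_head, p_even)
```

Exact fractions are used instead of floats so that results such as 1/2 are not blurred by rounding.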
Rules of probability
Rule I:
The probability of an impossible event is zero, and the probability of a certain event is one. Therefore, for any event A, the probability lies in the
range
0 ≤ P(A) ≤ 1
Answer: P(yellow) = 0
Rule III:
For any event A, P(A^C) = 1 − P(A). It follows that P(A) = 1 − P(A^C), where A^C is the complement of the event A.
Example:
The probability of not getting red is P(Red^C).
To illustrate these ideas, consider an example based on a Netflix survey with 500 responses.
Frequency distribution table
To convert it to a probability distribution table, simply divide every entry by 500.
Joint probability is the likelihood of two events occurring together.
Marginal probability is the probability of an event irrespective of the outcome of another variable.
Question 3: What is the probability of a Netflix subscriber preferring Breaking Bad and being Male?
Answer: 0.2
Question 5: Nur is a new Netflix subscriber. What is the chance that he would like Breaking Bad?
Answer: Nur is male, so we need the probability of liking Breaking Bad given that the subscriber is male.
This scenario is called conditional probability.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
The probability of event A given that B has occurred equals the probability of the intersection of A and B (both occurring together) divided by the
probability of event B.
$$P(\text{Breaking Bad} \mid \text{Male}) = \frac{0.2}{0.46} \approx 0.43$$
Question 6: Marie is a new Netflix subscriber. What is the chance that she would like Money Heist?
Answer: This is again a conditional probability.
$$P(\text{Money Heist} \mid \text{Female}) = \frac{0.24}{0.54} \approx 0.44$$
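The two conditional probabilities can be reproduced with a small helper. This sketch uses only the joint and marginal values quoted in the answers; the function name `conditional` and the variable names are illustrative assumptions, and the full survey table is not reconstructed here.

```python
def conditional(p_joint, p_given):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    if p_given <= 0:
        raise ValueError("P(B) must be positive")
    return p_joint / p_given

# Joint and marginal probabilities quoted in the answers above.
p_bb_and_male = 0.20      # P(Breaking Bad and Male)
p_male = 0.46             # P(Male)
p_mh_and_female = 0.24    # P(Money Heist and Female)
p_female = 0.54           # P(Female)

p_bb_given_male = conditional(p_bb_and_male, p_male)
p_mh_given_female = conditional(p_mh_and_female, p_female)

print(round(p_bb_given_male, 2), round(p_mh_given_female, 2))
```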
Disjoint events are events that cannot happen at the same time. They are also known as mutually exclusive events.
In probability notation, events A and B are disjoint if the probability of their intersection is zero:
P(A and B) = P(A ∩ B) = 0
Examples:
- The outcome of a single coin toss cannot be both heads and tails at the same time.
- A student cannot both fail and pass the same subject.
We know that P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Since the events A and B are disjoint, P(A ∩ B) = 0, so P(A ∪ B) = P(A) + P(B).
Example (events that are not disjoint): A student can get an A in Statistics and an A in History at the same time. This means
P(A and B) = P(A ∩ B) ≠ 0
and therefore
P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
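The addition rule for both cases can be checked with sets. The sketch below uses a single roll of a fair die as a hypothetical example (it is not taken from the notes); the event names are illustrative.

```python
from fractions import Fraction

sample_space = set(range(1, 7))    # one roll of a fair die (hypothetical example)

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
greater_than_3 = {4, 5, 6}
odd = {1, 3, 5}

# General addition rule: subtract the overlap so it is not counted twice.
p_union = prob(even) + prob(greater_than_3) - prob(even & greater_than_3)

# Disjoint events: the intersection is empty, so the rule reduces to a plain sum.
p_disjoint_union = prob(even) + prob(odd)

print(p_union, p_disjoint_union)
```

Comparing `p_union` with `prob(even | greater_than_3)` confirms that the formula and a direct count of the union agree.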
Dependent events are events whose probabilities are affected by the outcomes of other events. If the outcome of one event changes, the probability
of the other changes as well.
Examples:
- Getting into a traffic accident is dependent upon driving or riding in a vehicle
- If you park your car illegally, you are more likely to get a parking ticket
- You must buy a lottery ticket to have a chance of winning, and the likelihood of winning increases if you buy more than one ticket.
Independent Events are those events whose occurrence is not dependent on any other event. If the probability of occurrence of an event A is not
affected by the occurrence of another event B, then A and B are said to be independent events.
For example, the color of your hair has no effect on where you work. The two events “having black hair” and “working at Google” are
independent of one another.
- Taking an Uber and getting a free meal at your favorite restaurant.
The multiplicative rule of probability, or product rule, is used to find the probability of an intersection of events, that is, of two events
happening together.
There are two forms of this rule:
- Specific Multiplication Rule
- General Multiplication Rule
Example 1:
Calculate the probability of obtaining heads on two consecutive coin flips.
Answer: P(H) = 0.5 on the first flip and P(H) = 0.5 on the second flip. The probability of getting heads on both flips is
P(Head ∩ Head) = 0.5 × 0.5 = 0.25
Answer: The probability of getting blue pants is P(blue_pants) = 3/10 = 0.3 and the probability of getting a red shirt is P(red_shirt) = 4/16 = 0.25. The
probability of getting blue pants and a red shirt is P(red_shirt ∩ blue_pants) = 0.25 × 0.3 = 0.075.
In these situations the events are independent, so we apply the specific multiplication rule.
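The two answers above can be reproduced as follows. This is a minimal sketch that assumes the events really are independent; the helper name `p_both_independent` is an illustrative choice, and the coin result is cross-checked by listing all outcomes.

```python
from fractions import Fraction
from itertools import product

def p_both_independent(p_a, p_b):
    """Specific multiplication rule: P(A and B) = P(A) * P(B) for independent A and B."""
    return p_a * p_b

# Two fair coin flips.
p_two_heads = p_both_independent(Fraction(1, 2), Fraction(1, 2))

# Cross-check by listing all equally likely outcomes of two flips.
outcomes = list(product("HT", repeat=2))
p_enumerated = Fraction(outcomes.count(("H", "H")), len(outcomes))

# Blue pants (3 of 10) and red shirt (4 of 16), chosen independently.
p_outfit = p_both_independent(Fraction(3, 10), Fraction(4, 16))

print(p_two_heads, p_enumerated, float(p_outfit))
```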
Answer:
Total number of students = 23 + 25 = 48
Then,
P(B1 and B2) = P(B1) × P(B2|B1)
= (25/48) × (24/47) ≈ 0.266
Answer a):
There are six red balls and a total of fifteen balls.
P (red) = 6 / 15
P(drawing red, then blue) = P (drawing red) * P (blue after red) = (6/15)*(5/14)= 0.143
Answer b):
Now, there are 4 blue balls left and a total of 14 balls left.
P(blue, then blue) = P(drawing a blue ball) × P(drawing a blue ball after a blue ball) = (5/15) × (4/14) ≈ 0.095
Answer:
P (drawing a 5-dollar bill) ?
Number of 5-dollar bills = 4
Total number of bills = 12
P (drawing a 5-dollar bill) = 4 / 12
P(drawing a 5-dollar bill followed by a 5-dollar bill) = P(drawing a 5-dollar bill) × P(drawing a 5-dollar bill after a 5-dollar bill) = (4/12) × (3/11) ≈
0.091
The result of the first draw will affect the probability of the second draw.
P(red, then pink) = P(red) x P(pink after red) = (4/13) × (1/4) = 1/13
Hence, the probability of drawing a red ball and then a pink ball is 1/13.
Answer b):
P(pink) = 3/13
The result of the first draw will affect the probability of the second draw.
P(pink, then pink) = P(pink) · P(pink after pink)= (3/13) × (1/6) = 3/78 = 1/26
Hence, the probability of drawing a pink ball and then a pink ball is 1/26.
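The draws without replacement in the ball examples follow the general multiplication rule, P(A and B) = P(A) × P(B|A). The sketch below is one way to express it; the function name `p_two_draws` is hypothetical, and the counts (6 red and 5 blue among 15 balls; 4 red and 3 pink among 13 balls) are read off the fractions used in the answers.

```python
from fractions import Fraction

def p_two_draws(first_count, second_count, total):
    """General multiplication rule for two draws without replacement.

    first_count  -- items of the first wanted kind before any draw
    second_count -- items of the second wanted kind left after the first draw
    total        -- items in the container before any draw
    """
    p_first = Fraction(first_count, total)
    p_second_given_first = Fraction(second_count, total - 1)
    return p_first * p_second_given_first

# 6 red and 5 blue among 15 balls.
p_red_then_blue = p_two_draws(6, 5, 15)
p_blue_then_blue = p_two_draws(5, 4, 15)   # one blue ball is already gone

# 4 red and 3 pink among 13 balls.
p_red_then_pink = p_two_draws(4, 3, 13)
p_pink_then_pink = p_two_draws(3, 2, 13)   # one pink ball is already gone

print(p_red_then_blue, p_blue_then_blue, p_red_then_pink, p_pink_then_pink)
```

The denominator of the second factor is `total - 1` because the first ball is not put back, which is exactly what makes the two draws dependent.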
The Law of Total Probability
- The Law of Total Probability is a fundamental concept in probability theory that allows us to calculate the probability of an event by
considering all possible ways or scenarios that can lead to that event.
- It is often used when the event of interest can occur under different conditions or circumstances.
- Formally, consider an event B and a set of events {A₁, A₂, ..., Aₙ} that are mutually exclusive (disjoint: only one of them can occur at a time)
and exhaustive (together they cover all possible outcomes or scenarios).
The Law of Total Probability states that the probability of event B is the sum of the probabilities of B given each condition Aᵢ, weighted by the
probability of that condition. Mathematically:
$$P(B) = \sum_{i=1}^{n} P(B \mid A_i)\, P(A_i)$$
Example:
A company manufactures two types of smartphones: Type A and Type B. The company produces 60% Type A smartphones and 40% Type B
smartphones. The defect rate for Type A smartphones is 5%, while the defect rate for Type B smartphones is 3%. Suppose that a customer buys a
smartphone from this company, and we want to calculate the probability that the purchased smartphone is defective.
We want to calculate P(B), the probability that the purchased smartphone is defective.
Therefore, the probability that the purchased smartphone is defective is 0.042 or 4.2%.
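The 4.2% figure comes from weighting each defect rate by the production share of its type. A minimal sketch of that calculation, with an illustrative helper named `total_probability` (not from the notes):

```python
def total_probability(priors, conditionals):
    """P(B) = sum over i of P(B | A_i) * P(A_i), for a partition A_1..A_n."""
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("the conditions must be exhaustive: priors must sum to 1")
    return sum(conditionals[name] * priors[name] for name in priors)

share = {"Type A": 0.60, "Type B": 0.40}          # P(A_i): production shares
defect_rate = {"Type A": 0.05, "Type B": 0.03}    # P(defective | A_i)

p_defective = total_probability(share, defect_rate)
print(round(p_defective, 3))
```

The check on the priors guards against a partition that is not exhaustive, in which case the weighted sum would not be a valid probability of B.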
Bayes Theorem
It is a mathematical formula for determining conditional probability. It relates a conditional probability to its reverse form.
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
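To prove it, start from the two definitions of conditional probability:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{and} \quad P(B \mid A) = \frac{P(B \cap A)}{P(A)}$$
Since P(A ∩ B) = P(B ∩ A), the second equation gives P(A ∩ B) = P(B|A) P(A). Substituting this into the first equation yields Bayes' theorem.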
Examples:
- Given that a test was positive, what is the probability of having the disease?
- Given that a person likes action, what is the probability that they would like to watch Kung Fu Panda?
Answer:
A: Player is lying
B: Player wins the game
We are given:
P(A) = 0.75 (probability that a player is lying)
P(not A) = 0.25 (probability that a player is not lying)
P(B|A) = 0.43 (probability of winning when lying)
P(B|not A) = 0.57 (probability of winning when not lying)
We need to find P(A|B), the probability that a player was lying given that they won the game.
According to Bayes' theorem:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
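To calculate P(B), we can use the law of total probability:
$$P(B) = P(B \mid A)\, P(A) + P(B \mid \text{not } A)\, P(\text{not } A) = 0.43 \times 0.75 + 0.57 \times 0.25 = 0.465$$
$$P(A \mid B) = \frac{0.43 \times 0.75}{0.465} \approx 0.694$$
Therefore, the probability that a player was lying given that they won the game is approximately 0.694, or 69.4%.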
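For a two-way split (A and not A), Bayes' theorem and the law of total probability combine into one small function. The sketch below applies it to the given values; the function name `posterior` and its parameter names are illustrative assumptions.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Return P(A | B) from P(A), P(B | A) and P(B | not A) via Bayes' theorem."""
    # P(B) by the law of total probability over A and not A.
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# A: player is lying, B: player wins the game (values given above).
p_lying_given_win = posterior(prior=0.75, likelihood=0.43, likelihood_if_not=0.57)

# The same function applied to "not A" gives the complementary posterior.
p_honest_given_win = posterior(prior=0.25, likelihood=0.57, likelihood_if_not=0.43)

print(round(p_lying_given_win, 3), round(p_honest_given_win, 3))
```

A useful sanity check is that the two posteriors sum to 1, since a winning player was either lying or not.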
1. Testing Hypothesis
A hypothesis approximates a target function and needs to be tested on data. For example: the mean weight of a newborn baby is 3.5 kg.
Bayes' theorem lets us test whether the hypothesis holds for given data through P(h|D), the probability of the hypothesis h being true given the
data D.
2. Classification
When the possible values are categorical, Bayes' theorem can be applied to classification problems.
For example: whether a customer defaults on a credit card payment or not, based on the account balance.
- Naïve Bayes classifier is one implementation of Bayes' theorem.
3. Model Optimization
Optimizing machine learning models involves finding an input that minimizes or maximizes an objective function. For example, if the objective
function measures accuracy, the goal is to maximize it; if it measures a cost, the goal is to minimize it.
- Bayes' theorem applies probability to find these values.
- Bayesian optimization is a technique used to improve the performance of a machine learning model.
In the previous topics in this chapter, our explanation was based on coins, dice and other random experiments with a few outcomes.
In reality, our main treatment is of data such as:
- Web: subscribers, clicks, viewers, …
- Sales: yield, weight, sales, …
- Traffic: time, congestion, delay, …
- Medicine: age, temperature, heart rate, …
- Students: GPA, tuition, assignments, …
A random variable is a real-valued function defined over the sample space of a random experiment.
- It is a rule that assigns a numerical value to each outcome in a sample space.
- Generally, a random variable is denoted by a capital letter such as X or Y, and its values are the possible outcomes of a random
phenomenon.
▪ A discrete random variable takes values from a finite set, such as {1, 2, 3} or {e, π}, or from a countably infinite set, such as N
(the natural numbers) or Z (the integers).
X    Outcomes         P(X)
0    ttt              1/8
1    tth, tht, htt    3/8
2    thh, hth, hht    3/8
3    hhh              1/8
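A table like this can be generated by enumerating the sample space. The sketch assumes that X counts the heads in three tosses of a fair coin, which is what the outcome strings suggest; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: X is the number of heads in three tosses of a fair coin.
outcomes = ["".join(toss) for toss in product("ht", repeat=3)]   # 8 equally likely outcomes
counts = Counter(outcome.count("h") for outcome in outcomes)

# Probability mass function: P(X = x) = (outcomes with x heads) / (all outcomes).
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```

The probabilities of a discrete random variable must sum to 1, which is a quick way to spot a missing row in such a table.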
X 1 2 3 4
P(X) 0.1 0.2 0.3 0.4
It is explicit that P(1) = 0.1, P(2) = 0.2, P(3) = 0.3, P(4) = 0.4
- Without listing the values, you can write it as a function: $P(X) = \frac{X}{10}, \quad X \in \{1, 2, 3, 4\}$
For example, the height of students in a class, the amount of tea in a glass, the change in temperature throughout a day, and the number of hours a
person works in a week all take values in an interval, and are thus continuous random variables.
- The main difference between continuous and discrete random variables is that continuous probability is measured over intervals, while
discrete probability is calculated on exact points.
- For example, it would make no sense to find the probability that it took exactly 32 minutes to finish an exam. It might take you 32.012342472…
minutes. The probability of a single point no longer makes sense when we move from discrete to continuous random variables.
- Instead, you could find the probability of taking at least 32 minutes for the exam,
or the probability of taking between 31 and 33 minutes to complete the exam. Probability
distribution functions help us here: they determine the probability as the area under
the function, where the total area under the curve must equal 1.
- It is a function that describes the relative likelihood for a random variable to take on a given value. It is a mapping from the sample space to
the real numbers, P : Ω → R.
- It is non-negative, and its integral over the entire sample space equals 1 (or, in the discrete case, its values sum to one): $\sum_{X \in \Omega} P(X) = 1$
The probability density function produces the likelihood of values of the continuous random variable.
- The probability that the random variable falls in a given range is defined as the integral of the density function over that range. The density is
denoted by f(x).
o This function is non-negative at every point: $f(x) \ge 0$
o The definite integral of the PDF over the entire space is always equal to one: $\int_{-\infty}^{\infty} f(x)\,dx = 1$
- For a continuous random variable, P(X = x) is zero at any single point, so it is not useful. Instead, we calculate the probability of X lying in an
interval (a, b). The probability density function formula is given as
$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$$
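The interval formula can be tried numerically. The sketch below uses a hypothetical density, f(x) = 2x on [0, 1], chosen only because its integrals are easy to verify by hand; the midpoint-rule integrator is a simple stand-in for a numerical library.

```python
def density(x):
    """Hypothetical PDF for illustration: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

total_area = integrate(density, 0.0, 1.0)        # whole range: should be close to 1
p_interval = integrate(density, 0.25, 0.5)       # P(0.25 <= X <= 0.5)
p_tiny = integrate(density, 0.5, 0.5 + 1e-9)     # a shrinking interval has probability near 0

print(round(total_area, 6), round(p_interval, 6), p_tiny)
```

The last value illustrates why P(X = x) is not useful for a continuous variable: as the interval shrinks to a point, the area under the curve shrinks to zero.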
A simple function that helps to determine interval probabilities is the cumulative distribution function (CDF):
$$F(x) = P(X \le x) = \sum_{u \le x} P(u)$$
- It is a function that describes the probability that a random variable will take on a value less than or equal to a certain value.
- It is a way to describe the discrete or continuous probability distribution of a random variable.
- An example of CDF is Exponential distribution
In other words, the area under a PDF over a range gives the probability of a random variable being in that range, while a CDF gives the probability
of a random variable being less than or equal to a certain value.
Properties of CDF
- Non decreasing
It means that the values of the CDF increase or stay the same as the input variable increases. In other words, as the input variable moves along the
range of possible values, the CDF either remains constant or increases.
Mathematically, for any two values x₁ and x₂ in the domain of the random variable, if x₁ ≤ x₂, then the corresponding CDF values F(x₁) and F(x₂)
satisfy F(x₁) ≤ F(x₂).
We mentioned that the CDF can be used for both discrete and continuous random variables.
Solution:
To calculate the CDF, we need to sum up the probabilities of achieving a score less than or equal to a specific value.
CDF(x) = P(X ≤ x)
Score    Number of students
60       5
70       10
80       15
90       7
100      3
For example, to find the probability that a student scored less than or equal to 80, we calculate the CDF as follows:
CDF(80) = P(X ≤ 80) = P(X = 60) + P(X = 70) + P(X = 80)
To calculate each probability, we divide the number of students who achieved that score by the total number of students:
P(X = 60) = 5 / (5 + 10 + 15 + 7 + 3)
P(X = 70) = 10 / (5 + 10 + 15 + 7 + 3)
P(X = 80) = 15 / (5 + 10 + 15 + 7 + 3)
Then, we sum up these probabilities:
CDF(80) = (5 / 40) + (10 / 40) + (15 / 40) = 30 / 40 = 0.75
Therefore, the CDF at 80 is 0.75, indicating that there is a 75% probability that a student scored 80 or less on the final exam.
By calculating the CDF for different values, the professor can determine the probabilities associated with specific score ranges and make informed
decisions about grading or assessing the performance of their students
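The whole CDF for the score table can be built with running sums. This is a minimal sketch using the five scores and student counts from the solution; the helper `cdf_at` is an illustrative name, and it treats the CDF as a step function between listed scores.

```python
from itertools import accumulate

scores = [60, 70, 80, 90, 100]
students = [5, 10, 15, 7, 3]

total = sum(students)
pmf = [n / total for n in students]              # P(X = score)
cdf = dict(zip(scores, accumulate(pmf)))         # running sum gives P(X <= score)

def cdf_at(x):
    """P(X <= x): the CDF value at the largest listed score not exceeding x."""
    eligible = [s for s in scores if s <= x]
    return cdf[eligible[-1]] if eligible else 0.0

for s in scores:
    print(s, round(cdf[s], 3))
```

Range probabilities follow by subtraction, for example P(70 < X ≤ 90) = `cdf_at(90) - cdf_at(70)`.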
Probability Distribution Function for Discrete Random variables
Binomial Distribution
- It is a discrete probability distribution for the number of successes in repeated trials that each have two outcomes, either Success or Failure.
- The objective is to find the probability of getting x successes out of n-trials
For example, if we toss a coin, there could be only two possible outcomes: heads or tails, and if any test is taken, then there could be only two
results: pass or fail.
(b) At least 4 heads.
Solution:
The repeated tossing of the coin is an example of a Bernoulli trial. According to the problem:
Number of trials: n=5
Probability of head: p= 1/2 and hence the probability of tail, (1-p) =1/2
(a) For exactly two heads:
x=2 (number of getting head/success)
$$P(x = 2) = {}^{n}C_{x}\, p^{x} (1 - p)^{n - x} = \frac{5!}{2!\,3!} \times 0.5^{2} \times 0.5^{3}$$
P(x=2) = 5/16
(b) For at least four heads,
P(x ≥ 4) = P(x = 4) + P(x=5)
Hence,
$$P(x = 4) = \frac{5!}{4!\,1!} \times 0.5^{4} \times 0.5^{1} = \frac{5}{32}$$
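Both parts of the coin problem can be checked with the binomial formula. The sketch below uses exact fractions; the function name `binomial_pmf` is an illustrative choice. The remaining term for part (b) is P(x = 5) = (1/2)^5 = 1/32, so P(x ≥ 4) = 5/32 + 1/32 = 3/16.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, Fraction(1, 2)      # five tosses of a fair coin

p_exactly_two = binomial_pmf(2, n, p)
p_at_least_four = binomial_pmf(4, n, p) + binomial_pmf(5, n, p)

print(p_exactly_two, p_at_least_four)
```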
Uniform Distribution
- It is a probability distribution that assigns the same probability to every value (or equal-length interval) in its range
- It is also known as Rectangular Distribution
- Used to generate random values
An example of uniform distribution
Random Number Generation: When generating random numbers within a specified range, a uniform distribution is often used. For example, if you
need to generate random integers between 1 and 100, a uniform distribution ensures that each number in that range has an equal probability of
being selected.
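A quick simulation illustrates the equal-probability claim for integers between 1 and 100. This is a sketch with Python's standard `random` module; the seed and the sample size of 100,000 are arbitrary choices made for repeatability and speed.

```python
import random
from collections import Counter

rng = random.Random(0)                  # fixed seed so the run is repeatable
draws = [rng.randint(1, 100) for _ in range(100_000)]   # randint includes both endpoints

frequencies = Counter(draws)
relative = {value: frequencies[value] / len(draws) for value in range(1, 101)}

# Under a uniform distribution every value should appear about 1% of the time.
largest_gap = max(abs(freq - 0.01) for freq in relative.values())
print(min(draws), max(draws), round(largest_gap, 4))
```

The relative frequencies are not exactly 0.01 because the sample is finite, but the largest deviation shrinks as the number of draws grows.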
Exponential Distribution
- It provides a way to find the probabilities in time for a process
- Traditionally, it is used for modelling time to failure of electronic components
- This represents a process in which events occur continuously and independently at a constant average rate
Exponential Distribution Formula
A continuous random variable x has an exponential distribution if it has the following probability density function:
$$PDF(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$
where λ (lambda) is the rate parameter, the average number of events per unit of time, and x is the time variable.
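Integrating this density from 0 to x gives the closed form 1 − e^(−λx) for the probability that the event occurs within time x. The sketch below compares that closed form with a numerical integral; the rate of 1.5 and the horizon of 2.0 are hypothetical values chosen only for illustration.

```python
import math

def exp_pdf(x, lam):
    """Exponential density: lam * exp(-lam * x) for x >= 0, zero otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam):
    """Closed form of the integral of the density from 0 to x."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def midpoint_integral(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

lam = 1.5     # hypothetical rate: 1.5 events per unit of time
t = 2.0       # hypothetical time horizon

numeric = midpoint_integral(lambda x: exp_pdf(x, lam), 0.0, t)
closed_form = exp_cdf(t, lam)
print(round(numeric, 6), round(closed_form, 6))
```

Note that the mean waiting time between events is 1/λ, so the rate and the mean time are reciprocals of each other.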
Example: Assume that you usually get 2 phone calls per hour. Calculate the probability that a phone call will come within the next hour.
Solution:
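We are given an average of 2 phone calls per hour. Since λ is the average number of events per unit of time, with time measured in hours we take
λ = 2
(equivalently, the mean waiting time between calls is 1/λ = 0.5 hours). The computation is as follows:
$$P(0 \le x \le 1) = \int_{0}^{1} \lambda e^{-\lambda x}\,dx = \int_{0}^{1} 2 e^{-2x}\,dx = 1 - e^{-2} \approx 0.864665$$
Therefore, the probability of receiving a phone call within the next hour is approximately 0.8647.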
Normal Distribution
- One of the most important probability distributions in the field of statistics
- It is also called the Gaussian distribution or bell curve.
- Fits several natural phenomena, such as measurement error, heights, IQ scores, and blood pressure
Normal Distribution Formula
The probability density function of normal or gaussian distribution is given by
$$f(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}$$
Where, x is the variable, μ is the mean, and σ is the standard deviation
Answer:
a)
Using the PDF of the normal distribution, the probability density at a specific point x is given by the formula:
$$f(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}$$
For this problem, x = 54, μ = 65, and σ = 9. Plugging these values into the formula, we get:
$$f(54) = \frac{1}{9\sqrt{2\pi}}\, e^{-\frac{121}{162}} \approx 0.021$$
The probability that a student gets a mark of at most 54 is the area under the curve to the left of 54, which is the cumulative
probability. We can obtain this by integrating the PDF from negative infinity to 54:
$$P(X \le 54) = \int_{-\infty}^{54} f(x)\,dx$$
[Figure: normal curve with the probability shown as the shaded area]
Converting to a z-score, Z = (54 − 65)/9 ≈ −1.22, and the standard normal table gives
$$P(X \le 54) = P(Z \le -1.22) = 0.1112$$
b) At least 80
- The z-score for the value 80 is
$$Z = \frac{x - \mu}{\sigma} = \frac{80 - 65}{9} \approx 1.667$$
$$P(X \ge 80) = 1 - P(X \le 80) = 1 - P(Z \le 1.67) = 1 - 0.9525 = 0.0475$$
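The density value and the two tail probabilities for μ = 65 and σ = 9 can be cross-checked without a z-table. The sketch below writes the normal CDF with the error function from Python's `math` module, using the standard identity Φ(z) = ½(1 + erf(z/√2)); the function names are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """P(X <= x), written with the error function instead of a z-table."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 65, 9

density_at_54 = normal_pdf(54, mu, sigma)
p_at_most_54 = normal_cdf(54, mu, sigma)
p_at_least_80 = 1.0 - normal_cdf(80, mu, sigma)

print(round(density_at_54, 3), round(p_at_most_54, 3), round(p_at_least_80, 3))
```

Small differences from the table-based answers are expected, because the table lookup rounds the z-score to two decimals while the code uses the unrounded value.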
c) Between 70 and 86
$$Z = \frac{x - \mu}{\sigma} = \frac{70 - 65}{9} \approx 0.556$$
$$Z = \frac{x - \mu}{\sigma} = \frac{86 - 65}{9} \approx 2.333$$
$$P(70 \le X \le 86) = P(X \le 86) - P(X \le 70)$$