
Unit3 L2

Probability is the likelihood of an event occurring. It is quantified on a scale from 0 to 1, with 0 being impossible and 1 being certain. Probability is used in many domains like data science, insurance, and clinical trials to model uncertain events and make predictions. Key concepts in probability include sample spaces, events, rules of probability, and types of probabilities like marginal and joint probabilities. Probability provides a foundation for statistical analysis and machine learning algorithms.


Probability

This material is prepared and edited by Dr. Khaled Mohamad


Probability is simply how likely something is to happen.

Whenever we are unsure about the outcome of an event, we can talk about the probabilities of certain outcomes, that is, how likely they are.

Example: if you flip a fair coin, how can you quantify which side you will get, heads or tails? In such a scenario we make use of probability.



Why is probability important?

• Accurate Predictions: Probability helps scientists make accurate predictions by quantifying the likelihood of different outcomes.
For example, in weather forecasting, meteorologists use probability models to predict the
chance of rain, enabling people to plan their activities accordingly.

• Understanding Uncertainty: Probability helps in understanding and managing uncertainty in data.
For instance, in stock market analysis, data scientists use probability models to estimate the
probability of a stock's price going up or down, enabling investors to make informed decisions.

• Modeling Uncertain Events: Probability allows data scientists to model and analyze
uncertain events.
For example, in insurance risk assessment, probability models are used to estimate the
likelihood of certain events, such as car accidents or property damage, helping insurance
companies determine premium rates.

• Decision Making: Probability helps in making optimal decisions based on available data.
For instance, in medical diagnosis, probability models can be used to calculate the probability
of a patient having a particular disease based on symptoms and test results, aiding doctors in
making accurate diagnoses.



Probability is important …

• Framework for Other Concepts: Probability provides a foundation for various other concepts in data science. For example, machine learning algorithms often use probability theory to estimate the likelihood of different outcomes and make predictions.

Uses of probability: probability is used in various domains. Some of them are:

- Data Science

- Insurance

- Genetic Science

- Clinical trials

- Vaccination efficacy testing


Usage of probability in Statistical analysis and Data Science

- Confidence Intervals
A confidence interval gives a range of values that is expected to contain an unknown population parameter (such as the mean) with a stated level of confidence.
- Probability distribution
Most data follow some probability distribution: for example, a normal, Poisson, Bernoulli, Binomial, or Exponential distribution.
- Bayes Theorem AND Naive Bayes Algorithm
Bayes Theorem is used in machine learning, specifically in the Naive Bayes algorithm.
This algorithm is commonly used for text classification, spam filtering, and sentiment
analysis. It calculates the probability of a certain event occurring given the prior
knowledge of related events.
- Conditional Probability
Conditional probability is used in various Statistical and data science applications, such
as recommendation systems. It helps determine the likelihood of a certain event
occurring given the occurrence of another event. For example, in a movie
recommendation system, conditional probability can be used to predict the probability
of a user liking a certain movie based on their previous ratings and preferences.
Usage of probability in Statistical analysis and Data Science Cont….

- Central limit Theorem


The Central Limit Theorem is a fundamental concept in statistics and is widely used in statistical analysis and data science. It will be discussed in
detail in the next lecture. This theorem allows scientists to make inferences about a population based on a sample, enabling hypothesis testing and
confidence interval estimation.

- Markov Chains
Markov Chains are used in a variety of Machine Learning applications, such as natural language processing and speech recognition. They model
sequential data where the probability of transitioning from one state to another depends only on the current state.

These concepts play crucial roles in various techniques and algorithms, enabling scientists to make predictions, classify data, analyze
patterns, and gain insights from complex datasets.



Key Terminology of Probability Theory

Random experiment: An experiment whose outcome is not known with certainty.

Sample space: The universal set of all possible outcomes of an experiment. Example: the outcome of a college
application, S = {admitted, not admitted}.

Event: A subset of the sample space. Probability is usually calculated with respect to an event.
Example: getting heads when flipping a fair coin.



Probability and Rules of Probability
In general, when all outcomes in the sample space are equally likely, the probability of an event A is the number of outcomes in A divided by the total number of outcomes in the sample space:

$$P(A) = \frac{n}{N} = \frac{\text{number of outcomes in } A}{\text{total number of outcomes in the sample space}}$$

Rules of probability

Rule I:
The probability of an impossible event is zero, and the probability of a certain event is one. Therefore, for any event A, the range of possible
probabilities is
$$0 \le P(A) \le 1$$

Example: Calculate the probability of drawing a yellow ball from a bag that contains no yellow balls.

Answer: P(yellow) = 0



Rule II:
For S, the sample space of all possibilities, P(S) = 1. That is, the probabilities of all possible outcomes sum to one.

Example: A bag holds 20 balls: 10 green, 7 red, and 3 blue. Calculate P(Green), P(Red), P(Blue), and P(S).

P(Green) = 10/20 = 0.5
P(Red) = 7/20 = 0.35
P(Blue) = 3/20 = 0.15
P(S) = P(Green) + P(Red) + P(Blue) = 0.5 + 0.35 + 0.15 = 1

Rule III:
For any event A, $P(A^c) = 1 - P(A)$. It follows that $P(A) = 1 - P(A^c)$, where $A^c$ is the complement of the event A.

Example:
The probability of not drawing a red ball, $P(\text{Red}^c)$:

$P(\text{Red}^c) = 1 - P(\text{Red}) = 1 - 0.35 = 0.65$
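
The three rules can be checked numerically. The sketch below is illustrative only: it assumes a bag of 10 green, 7 red, and 3 blue balls (the counts implied by the fractions 10/20, 7/20 and 3/20), and the names `bag` and `prob` are hypothetical, not part of the lecture.

```python
from fractions import Fraction

# Assumed bag contents, inferred from the fractions in the examples.
bag = {"green": 10, "red": 7, "blue": 3}
total = sum(bag.values())

def prob(color):
    """Classical probability: outcomes in the event / total outcomes."""
    return Fraction(bag.get(color, 0), total)

p_yellow = prob("yellow")                    # Rule I: impossible event
p_sample_space = sum(prob(c) for c in bag)   # Rule II: whole sample space
p_not_red = 1 - prob("red")                  # Rule III: complement

print(p_yellow, p_sample_space, float(p_not_red))
```

Using `Fraction` keeps the arithmetic exact, so the sum over the sample space is exactly 1 rather than a rounded float.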



Marginal Probability and Joint Probability

To understand these two concepts, consider a Netflix survey with 500 responses.
Frequency distribution table

                Male    Female   Total
Breaking Bad    100     25       125
Money Heist     80      120      200
Other           50      125      175
Total           230     270      500

To convert it to a probability distribution table, simply divide every entry by 500.

                Male    Female   Total
Breaking Bad    0.20    0.05     0.25
Money Heist     0.16    0.24     0.40
Other           0.10    0.25     0.35
Total           0.46    0.54     1.00

Joint probability is the probability of two events occurring together, for example P(Breaking Bad and Male) = 0.20. The joint probabilities are the inner cells of the table.
Marginal probability is the probability of an event irrespective of the outcome of the other variable, for example P(Male) = 0.46. The marginal probabilities are the row and column totals.
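
As a rough illustration, the conversion from the frequency table to joint and marginal probabilities can be sketched in Python. The dictionary layout and the helper name `marginal` are assumptions made for this example; only the survey counts come from the table.

```python
# Survey counts keyed by (show, gender); values taken from the frequency table.
counts = {
    ("Breaking Bad", "Male"): 100, ("Breaking Bad", "Female"): 25,
    ("Money Heist", "Male"): 80,   ("Money Heist", "Female"): 120,
    ("Other", "Male"): 50,         ("Other", "Female"): 125,
}
n = sum(counts.values())  # total number of responses

# Joint probabilities: each cell divided by the grand total.
joint = {key: value / n for key, value in counts.items()}

def marginal(value, axis):
    """Sum the joint probabilities over the other variable.

    axis=0 selects a show (row total), axis=1 selects a gender (column total).
    """
    return sum(p for key, p in joint.items() if key[axis] == value)

print(round(marginal("Male", 1), 2))         # column total for Male
print(round(marginal("Money Heist", 0), 2))  # row total for Money Heist
```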



Question 1: What is the probability of a Netflix subscriber being Male?
Answer: 0.46

Question 2: What is the probability of a Netflix subscriber preferring Money Heist?
Answer: 0.4

Question 3: What is the probability of a Netflix subscriber preferring Breaking Bad and being Male?
Answer: 0.2

This is referred to as the intersection of the two events, $P(A \cap B)$.



Question 4: What is the probability of a Netflix subscriber preferring Breaking Bad or being Female?

                Male    Female   Total
Breaking Bad    0.20    0.05     0.25
Money Heist     0.16    0.24     0.40
Other           0.10    0.25     0.35
Total           0.46    0.54     1.00

Answer: Add every cell in the Breaking Bad row or the Female column: 0.2 + 0.05 + 0.24 + 0.25 = 0.74

This is referred to as the union of the two events: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Applying the formula with A = Breaking Bad and B = Female gives
$P(A \cup B) = 0.25 + 0.54 - 0.05 = 0.74$

Question 5: Nur is a new Netflix subscriber. What is the chance that he would like Breaking Bad?

Answer: Nur is male, so we need to calculate the probability of liking Breaking Bad given that the subscriber is male.
This scenario is called conditional probability.

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

The probability of event A given that B has occurred equals the probability of the intersection of A and B (both occurring together) divided by the
probability of event B.

$$P(\text{Breaking Bad} \mid \text{Male}) = \frac{0.2}{0.46} \approx 0.43$$

Question 6: Marie is a new Netflix subscriber. What is the chance that she would like Money Heist?
Answer: This is also a conditional probability.

$$P(\text{Money Heist} \mid \text{Female}) = \frac{0.24}{0.54} \approx 0.44$$
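
The union and conditional probability formulas used in Questions 4 to 6 can be sketched as two small functions. The function names `union` and `conditional` are hypothetical; the numbers are the table entries quoted in the answers.

```python
def union(p_a, p_b, p_a_and_b):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

# Question 4: Breaking Bad or Female
p_bb_or_female = union(0.25, 0.54, 0.05)

# Question 5: Breaking Bad given Male
p_bb_given_male = conditional(0.20, 0.46)

# Question 6: Money Heist given Female
p_mh_given_female = conditional(0.24, 0.54)

print(round(p_bb_or_female, 2), round(p_bb_given_male, 2), round(p_mh_given_female, 2))
```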



Disjoint and Non-Disjoint Events

Disjoint events are events that cannot happen at the same time. They are also known as mutually exclusive events.
In probability notation, events A and B are disjoint if the probability of their intersection is zero. This can be written as:

P(A and B) = 0

P(A∩B) = 0
Examples:
- The outcome of a single coin toss cannot be both heads and tails at the same time.
- A student cannot both fail and pass the same subject.

Union of Disjoint Events

We know that $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. Since the events A and B are disjoint, the probability of their intersection is zero.

The union of disjoint events is therefore $P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$.



Non-disjoint events are events that can happen at the same time.

Example: A student can get an A in Statistics and an A in History at the same time. This means

P(A and B) ≠ 0

P(A∩B) ≠ 0

Union of Non-Disjoint Events

$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$



Dependent and Independent Events

Dependent events are events that are affected by the outcomes of other events. In other words, two or more events that depend on one another are
known as dependent events: if one event occurs, the probability of the other changes.

Examples:
- Getting into a traffic accident is dependent upon driving or riding in a vehicle.

- If you park your car illegally, you are more likely to get a parking ticket.

- You must buy a lottery ticket to have a chance at winning, and the likelihood of winning increases if you buy more than one ticket.

Independent events are events whose occurrence does not depend on any other event. If the probability of occurrence of an event A is not
affected by the occurrence of another event B, then A and B are said to be independent events.

Examples:
- The color of your hair has no effect on where you work. The two events "having black hair" and "working at Google" are
independent of one another.
- Taking an Uber and getting a free meal at your favorite restaurant.

- Rolling a 6 on a die and getting a promotion at work.


Product Rule for Dependent and Independent Events

The multiplicative rule of probability or product rule is used to find the probability of an intersection of events or two events happening at the
same time.
There are two forms of this rule
- Specific Multiplication Rule
- General Multiplication Rule

Specific Multiplication Rule

- It is used to calculate the joint probability of independent events.
$P(A \cap B) = P(A) \times P(B)$
Remember that for independent events, the occurrence of event A does not affect the likelihood of event B.

Example 1:
Calculate the probability of obtaining heads on two consecutive coin flips.
Answer: In the first flip P(H) = 0.5, and in the second flip P(H) = 0.5. The probability of getting heads on the first flip and heads on the second
flip is P(Head ∩ Head) = 0.5 × 0.5 = 0.25



Example 2:
You have 10 pairs of pants, 3 of which are blue, and 16 shirts, 4 of which are red. Selecting the pants does not affect the likelihood of
drawing a red shirt. What is the probability of picking blue pants and a red shirt?

Answer: The probability of getting blue pants is P(blue_pants) = 3/10 = 0.3, and the probability of getting a red shirt is P(red_shirt) = 4/16 = 0.25. The
probability of getting blue pants and a red shirt is P(red_shirt ∩ blue_pants) = 0.25 × 0.3 = 0.075
In situations like these, we apply the specific multiplication rule.
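
One way to convince yourself that the specific multiplication rule holds for independent events is to enumerate the sample space directly. The sketch below does this for two coin flips and then applies the rule to the pants-and-shirt example; all variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: HH, HT, TH, TT (equally likely).
flips = list(product("HT", repeat=2))
p_hh_by_counting = Fraction(sum(1 for f in flips if f == ("H", "H")), len(flips))

# The same probability from the specific multiplication rule.
p_head = Fraction(1, 2)
p_hh_by_rule = p_head * p_head

# Pants and shirts are chosen independently, so the rule applies again.
p_blue_pants = Fraction(3, 10)
p_red_shirt = Fraction(4, 16)
p_both = p_blue_pants * p_red_shirt

print(p_hh_by_counting, p_hh_by_rule, float(p_both))
```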

General Multiplication Rule

This rule is used to calculate the joint probability for either independent or dependent events.
$P(A \cap B) = P(A) \times P(B \mid A)$
The joint probability of A and B occurring equals the probability of A occurring multiplied by the conditional probability of B occurring given that
A occurred.
Remember that
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
In the case of independent events,
$$P(B \mid A) = P(B)$$
because the occurrence of A has no effect on the probability of B.
Example 1: Nur has to select two students from a class of 23 girls and 25 boys. What is the probability that both students chosen
are boys?

Answer:
Total number of students = 23 + 25 = 48

The probability of choosing a boy first: P(B1) = 25/48

The probability of choosing a boy second, given that the first was a boy: P(B2|B1) = 24/47

Then,
P(B1 and B2) = P(B1) × P(B2|B1)

= (25/48) × (24/47) ≈ 0.266



Example 2: A bag contains 6 red, 5 blue, and 4 yellow balls. Two balls are drawn, and the first ball is not replaced before the second is drawn.
Find the following:
a) P (red, then blue)

b) P (blue, then blue)

Answer a):
There are six red balls and a total of fifteen balls.
P (red) = 6 / 15

The result of the first draw affects the probability of the second draw.

Number of blue balls = 5

Total number of balls left = 14

P (drawing blue after red) = 5 /14

P (drawing red, then blue) = P (drawing red) × P (blue after red) = (6/15) × (5/14) = 1/7 ≈ 0.143

Answer b):

Number of blue balls = 5

Total number of balls = 15

The probability of drawing a blue ball = 5/15

The result of the first draw affects the probability of the second draw.

Now, there are 4 blue balls left and a total of 14 balls left.

P (drawing a blue ball after a blue ball) = 4/14

P (blue, then blue) = P (drawing a blue ball) × P (drawing a blue ball after a blue ball) = (5/15) × (4/14) = 2/21 ≈ 0.095

Hence, the probability of drawing a blue ball followed by a blue ball is 2/21.
Example 3: A wallet contains 4 bills of 5 dollars, 5 bills of 10 dollars and 3 bills of 20 dollars. 2 bills are chosen randomly without replacement.
Find P (drawing a 5-dollar bill followed by a 5-dollar bill).

Answer:
P (drawing a 5-dollar bill)?
Number of 5-dollar bills = 4
Total number of bills = 12
P (drawing a 5-dollar bill) = 4 / 12

P (drawing a 5-dollar bill after a 5-dollar bill)?

The result of the first draw affects the probability of the second draw.
Number of 5-dollar bills left = 3
A total of 11 bills are left.
P (drawing a 5-dollar bill after a 5-dollar bill) = 3/11

P (drawing a 5-dollar bill followed by a 5-dollar bill) = P (drawing a 5-dollar bill) × P (drawing a 5-dollar bill after a 5-dollar bill) = (4/12) × (3/11) = 1/11 ≈
0.091

Hence, P (drawing a 5-dollar bill followed by a 5-dollar bill) ≈ 0.091.
Example 4: A bag contains 4 red, 3 pink and 6 green balls. Two balls are drawn, but the first ball drawn is not replaced.
a) Find P(red, then pink)

b) Find P(pink, then pink)


Answer a):
Number of red balls = 4
Total number of balls = 13
P(red) = 4/13

The result of the first draw will affect the probability of the second draw.

There are 3 pink balls.


Number of balls left = 12
P(pink after red) = 3/12 = ¼

P(red, then pink) = P(red) x P(pink after red) = (4/13) × (1/4) = 1/13

Hence, the probability of drawing a red ball and then a pink ball is 1/13.
Answer b):

Number of pink balls = 3


Total number of balls = 13

P(pink) = 3/13

The result of the first draw will affect the probability of the second draw.

Number of pink balls left = 2


Total number of balls left = 12

P(pink after pink) = 2/12 = 1/6

P(pink, then pink) = P(pink) · P(pink after pink)= (3/13) × (1/6) = 3/78 = 1/26

Hence, the probability of drawing a pink ball and then a pink ball is 1/26.
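
All of the "drawing without replacement" examples follow the same pattern, which can be sketched as one helper that applies the general multiplication rule draw by draw. The function name `draw_sequence_prob` and the dictionary layout are assumptions for this illustration, not part of the lecture.

```python
from fractions import Fraction

def draw_sequence_prob(counts, sequence):
    """Probability of drawing the given sequence without replacement.

    Applies P(A and B) = P(A) * P(B | A) one draw at a time, updating
    the remaining counts after each draw.
    """
    remaining = dict(counts)          # copy so the caller's data is untouched
    total = sum(remaining.values())
    p = Fraction(1)
    for item in sequence:
        p *= Fraction(remaining[item], total)
        remaining[item] -= 1          # the drawn item is not put back
        total -= 1
    return p

students = {"girl": 23, "boy": 25}
balls = {"red": 6, "blue": 5, "yellow": 4}
bills = {5: 4, 10: 5, 20: 3}
balls_2 = {"red": 4, "pink": 3, "green": 6}

print(draw_sequence_prob(students, ["boy", "boy"]))   # Example 1
print(draw_sequence_prob(balls, ["red", "blue"]))     # Example 2 a)
print(draw_sequence_prob(balls, ["blue", "blue"]))    # Example 2 b)
print(draw_sequence_prob(bills, [5, 5]))              # Example 3
print(draw_sequence_prob(balls_2, ["pink", "pink"]))  # Example 4 b)
```

Exact fractions make it easy to compare the results with hand calculations before rounding to decimals.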
The Law of Total Probability

- The Law of Total Probability is a fundamental concept in probability theory that allows us to calculate the probability of an event by
considering all possible ways or scenarios that can lead to that event.
- It is often used when the event of interest can occur under different conditions or circumstances.
- Formally, consider an event B and a set of events {A₁, A₂, ..., Aₙ} that are mutually exclusive (disjoint: only one of them can occur at a time)
and exhaustive (together they cover all possible outcomes or scenarios).

The Law of Total Probability states that the probability of event B can be calculated as the sum of the probabilities of B occurring given each
possible condition Aᵢ, weighted by the probability of each condition Aᵢ occurring. Mathematically, it can be expressed as follows:

P(B) = P(A₁) × P(B|A₁) + P(A₂) × P(B|A₂) + ... + P(Aₙ) × P(B|Aₙ)

Example:
A company manufactures two types of smartphones: Type A and Type B. The company produces 60% Type A smartphones and 40% Type B
smartphones. The defect rate for Type A smartphones is 5%, while the defect rate for Type B smartphones is 3%. Suppose that a customer buys a
smartphone from this company, and we want to calculate the probability that the purchased smartphone is defective.



Answer:
A: The smartphone is of Type A.
B: The smartphone is defective.
We are given:
P(A) = 0.6 (probability of Type A smartphone)
P(B|A) = 0.05 (probability of defect given Type A smartphone)
P(not A) = 0.4 (probability of Type B smartphone)
P(B|not A) = 0.03 (probability of defect given Type B smartphone)

We want to calculate P(B), the probability that the purchased smartphone is defective.

According to the Law of Total Probability:


P(B) = P(A) × P(B|A) + P(not A) × P(B|not A)

Substituting the given values:


P(B) = (0.6 x 0.05) + (0.4 x 0.03) = 0.03 + 0.012 = 0.042

Therefore, the probability that the purchased smartphone is defective is 0.042 or 4.2%.
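
The same calculation can be written as a short function that works for any number of conditions. This is a minimal sketch: the function name `total_probability` and the dictionary keys are assumptions chosen for this example.

```python
def total_probability(priors, likelihoods):
    """Law of Total Probability: sum of P(A_i) * P(B | A_i) over a partition.

    priors maps each condition A_i to P(A_i);
    likelihoods maps each condition A_i to P(B | A_i).
    """
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 (disjoint and exhaustive events)")
    return sum(priors[a] * likelihoods[a] for a in priors)

priors = {"Type A": 0.6, "Type B": 0.4}          # production shares
defect_rates = {"Type A": 0.05, "Type B": 0.03}  # P(defective | type)

p_defective = total_probability(priors, defect_rates)
print(round(p_defective, 3))  # 0.042
```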
Bayes Theorem

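It is a mathematical formula for determining conditional probability. It shows the relation between a conditional probability and its reverse form.

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

To prove it, start from the definition of conditional probability:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{and} \quad P(B \mid A) = \frac{P(B \cap A)}{P(A)}$$

Since $P(A \cap B)$ and $P(B \cap A)$ are the same, then

$$P(A \mid B)\, P(B) = P(B \mid A)\, P(A)$$

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$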

Examples:
- Given that test was positive, what is the probability of having a disease.
- Given that a person likes action, what is the probability that he would like to watch Kung Fu Panda.



Example:
In a certain game, the probability that a player is lying is 0.75 and the probability that a player is not lying is 0.25. The probability that a player
wins when lying is 0.43, and the probability that a player wins when not lying is 0.57. Find the probability that a player was lying given that they
won the game.

Answer:
A: Player is lying
B: Player wins the game

We are given:
P(A) = 0.75 (probability that a player is lying)
P(not A) = 0.25 (probability that a player is not lying)
P(B|A) = 0.43 (probability of winning when lying)
P(B|not A) = 0.57 (probability of winning when not lying)

We need to find P(A|B), the probability that a player was lying given that they won the game.
According to Bayes' theorem:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
To calculate P(B), we can use the law of total probability:

P(B) = P(A) x P(B|A) + P(not A) x P(B|not A)

Substituting the given values:


P(B) = 0.75 * 0.43 + 0.25 * 0.57
= 0.3225 + 0.1425
= 0.465

Now we can calculate P(A|B):

P(A|B) = (P(A) x P(B|A)) / P(B)

= (0.75 * 0.43) / 0.465
≈ 0.694

Therefore, the probability that a player was lying given that they won the game is approximately 0.694, or 69.4%.
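
For a two-case problem like this one, Bayes' theorem and the law of total probability combine into a single small function. The sketch below is an illustration under that assumption (an event A and its complement); the name `bayes_posterior` is hypothetical.

```python
def bayes_posterior(prior, likelihood, likelihood_given_not):
    """P(A | B) for an event A and its complement.

    prior                = P(A)
    likelihood           = P(B | A)
    likelihood_given_not = P(B | not A)
    """
    # Law of total probability gives the denominator P(B).
    evidence = prior * likelihood + (1 - prior) * likelihood_given_not
    return prior * likelihood / evidence

p_win = 0.75 * 0.43 + 0.25 * 0.57
p_lying_given_win = bayes_posterior(0.75, 0.43, 0.57)

print(round(p_win, 3))              # 0.465
print(round(p_lying_given_win, 3))  # 0.694
```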



Application of Bayes Theorem in Data Science

1. Testing Hypothesis
A hypothesis approximates a target function and needs to be tested on data. For example: the mean weight of a newborn baby is 3.5 kg.
Bayes' theorem allows us to evaluate how well the hypothesis holds for given data through P(h|D), the probability of the "hypothesis" being true given
the "data".

2. Classification
When the possible values of the target are categorical, Bayes' theorem can be applied to classification problems.
For example: whether or not a customer defaults on a credit card payment, based on the account balance.
- The Naïve Bayes classifier is one implementation of Bayes' theorem.

3. Model Optimization
Optimizing machine learning models involves finding an input that minimizes or maximizes an objective function. For example, if the objective
function is related to accuracy, the goal is to maximize it. If it is related to a cost, the goal is to minimize it.
- Bayes' theorem applies probability to find these values.
- Bayesian optimization is a technique used to improve the performance of a machine learning model.



Random Variable

In the previous topics in this chapter, our explanation was based on coins, dice and other random experiments with a few outcomes.
In reality, we mainly deal with quantities such as:
- Web: subscribers, clicks, viewers, ...
- Sales: yield, weight, sales, ...
- Traffic: time, congestion, delay, ...
- Medicine: age, temperature, heart rate, ...
- Students: GPA, tuition, assignments, ...

What do all of the above have in common?

"Numbers"

A random variable is a real-valued function defined over the sample space of a random experiment.
- It is a rule that assigns a numerical value to each outcome in a sample space.
- Generally, a random variable is denoted by a capital letter such as X or Y and defines the possible outcome values of an uncertain
phenomenon.



Why Random Variable ?

With numerical values (numbers), the distribution P(X) can be

- Viewed on a line
- Expressed as a function, for example $P(X) = \frac{1}{2^X}$
- Easily examined for properties such as increasing or decreasing

A random variable X helps to

- Perform operations such as addition (X + 1) and squaring (X²)
- Combine variables, such as X + Y
- Consider properties like the average value of X

There are two main types of random variable.

Types of Random Variable

- Discrete random variable

- Continuous random variable


Discrete random variable

▪ It is a variable that takes only a finite or countably infinite number of distinct values. Examples: the finite sets {1, 2, 3} and {e, π}, or the countably infinite sets N
(natural numbers) and Z (integers).

Example: Suppose we are interested in the number of heads when tossing 3 fair coins.

Ω = {ttt, tth, tht, thh, htt, hth, hht, hhh}, |Ω| = 8

Each outcome has the same probability, P = 1/8.
Let X denote the number of heads.

X    Outcomes         P(X)
0    ttt              1/8
1    tth, tht, htt    3/8
2    thh, hth, hht    3/8
3    hhh              1/8
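
The table above can be reproduced by enumerating the sample space and applying the random variable "number of heads" to each outcome. This is a small sketch using only the standard library; the variable names `omega` and `pmf` are choices made for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of three coin tosses, written as strings such as "tth".
omega = ["".join(outcome) for outcome in product("th", repeat=3)]

# The random variable X maps each outcome to its number of heads.
heads_count = Counter(outcome.count("h") for outcome in omega)

# Every outcome is equally likely, so P(X = x) = (#outcomes with x heads) / |omega|.
pmf = {x: Fraction(c, len(omega)) for x, c in sorted(heads_count.items())}

for x, p in pmf.items():
    print(x, p)
```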



Example 2: For a four-sided die, we have the following table

X      1     2     3     4
P(X)   0.1   0.2   0.3   0.4

Explicitly, P(1) = 0.1, P(2) = 0.2, P(3) = 0.3, P(4) = 0.4
- Instead of listing the numbers, you can write it as a function: $P(X) = \frac{X}{10}, \quad X \in \{1, 2, 3, 4\}$

- It can be presented with graphs

[Figure: stem plot, line plot, and histogram of P(X)]



Continuous random variable
It is a variable that can take on any value in a specified domain (i.e., any value in an interval). Its set of possible values is uncountably infinite, e.g., [0, 2], (-1, 4) ∪ [4, 5), or R.

For example, the height of students in a class, the amount of tea in a glass, the change in temperature throughout a day, and the number of hours a
person works in a week all take values in an interval, and thus are continuous random variables.

- The main difference between continuous and discrete random variables is that continuous probability is measured over intervals, while
discrete probability is calculated at exact points.

- For example, it would make no sense to find the probability that it took exactly 32 minutes to finish an exam. It might take you 32.012342472…
minutes. The probability of a single point is zero when we move from discrete to continuous random variables.

- Instead, you could find the probability of taking at least 32 minutes for the exam,
or the probability of taking between 31 and 33 minutes to complete the exam. Probability
density functions help us in this regard. The probability of an interval is
the area under the function over that interval, where the total area under the curve
must equal 1.



These are different ways of specifying the distribution of a random variable

Probability Mass Function (PMF)

- It is a function that gives the probability that a discrete random variable takes on a given value. It is a mapping from the sample space to
the real numbers, P : Ω → R.

- It is a way to describe the discrete probability distribution of a random variable.

- Its values sum to one over the entire sample space: $\sum_{X \in \Omega} P(X) = 1$

- It is non-negative: $P(X) \ge 0$ where $X \in \Omega$

- An example of a PMF is the Binomial distribution.



Probability Density Function (PDF)

The probability density function describes the relative likelihood of the values of a continuous random variable.
- Probabilities are obtained by integrating the density function of the random variable over a given range. The density is denoted by f(x).
o This function is non-negative at every point, $f(x) \ge 0$,

o and the definite integral of the PDF over the entire space is always equal to one: $\int_{-\infty}^{\infty} f(x)\, dx = 1$
- P(X = x) is zero for every single point x, so it is not useful. Instead, we must calculate the probability of X lying in an interval (a, b). The formula
is given as
$$P(a \le X \le b) = \int_a^b f(x)\, dx$$

- An example of a PDF is the normal distribution.

In short, a PDF (probability density function) is defined for continuous random variables, whereas a PMF (probability mass function) is defined for
discrete random variables.
Cumulative Distribution Function (CDF)
When we talk about a random variable, sometimes we are interested in a specific value, like the probability that we will get exactly an A+ in the class or
that the temperature is going to be exactly 27 degrees Celsius. More often we are interested not in a particular value but rather in a range or
an interval of values.
- Temperature between 25 and 30 degrees Celsius
- Salary > $80k
- GPA > 3.0

A simple function called the cumulative distribution function (CDF) helps to determine interval probabilities. For a discrete random variable:

$$F(x) = P(X \le x) = \sum_{u \le x} P(u)$$

- It is a function that describes the probability that a random variable will take on a value less than or equal to a certain value.
- It is a way to describe the discrete or continuous probability distribution of a random variable.
- An example is the CDF of the Exponential distribution.

In other words, a PDF gives the probability of a random variable being in a certain range when integrated over that range, while a CDF gives the probability of a random variable
being less than or equal to a certain value.
Properties of CDF
- Non-decreasing
The values of the CDF increase or stay the same as the input variable increases. In other words, as the input variable moves along the
range of possible values, the CDF either remains constant or increases.
Mathematically, for any two values x₁ and x₂ in the domain of the random variable, if x₁ ≤ x₂, then the corresponding CDF values F(x₁) and F(x₂)
satisfy F(x₁) ≤ F(x₂).

- Limits: $\lim_{x \to \infty} F(x) = 1$ and $\lim_{x \to -\infty} F(x) = 0$

- Right continuous but not necessarily left continuous: $\lim_{x \to a^{+}} F(x) = F(a)$

Question: Why is the CDF useful?

Because it allows us to calculate the probability of intervals easily: $P(a < X \le b) = F(b) - F(a)$.

We mentioned that the CDF can be used for both discrete and continuous random variables.

How does the CDF work with discrete random variables?


Let us elaborate on this question with an example.
Example: A professor wants to analyze the performance of their students in a class. The final exam scores are discrete and range from 0 to 100.
The professor wants to determine the probability that a student scored less than or equal to a specific grade. The professor has the following data on the number of students who achieved each score:

Score    Number of Students
60       5
70       10
80       15
90       7
100      3

Solution:
To calculate the CDF, we need to sum up the probabilities of achieving a score less than or equal to a specific value.
CDF(x) = P(X ≤ x)
For example, to find the probability that a student scored less than or equal to 80, we calculate the CDF as follows:
CDF(80) = P(X ≤ 80) = P(X = 60) + P(X = 70) + P(X = 80)
To calculate each probability, we divide the number of students who achieved that score by the total number of students:
P(X = 60) = 5 / (5 + 10 + 15 + 7 + 3) = 5/40
P(X = 70) = 10 / (5 + 10 + 15 + 7 + 3) = 10/40
P(X = 80) = 15 / (5 + 10 + 15 + 7 + 3) = 15/40
Then, we sum up these probabilities:
CDF(80) = (5 / 40) + (10 / 40) + (15 / 40) = 30/40 = 0.75
Therefore, the CDF at 80 is 0.75, indicating that there is a 75% probability that a student scored 80 or less on the final exam.
By calculating the CDF for different values, the professor can determine the probabilities associated with specific score ranges and make informed
decisions about grading or assessing the performance of their students.
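
The CDF of the exam scores can be computed directly from the counts. The sketch below is one possible way to do it; the names `score_counts` and `cdf` are illustrative assumptions, and only the five scores and their counts come from the example.

```python
# Number of students who achieved each score (from the professor's data).
score_counts = {60: 5, 70: 10, 80: 15, 90: 7, 100: 3}
total_students = sum(score_counts.values())

def cdf(x):
    """P(X <= x): share of students whose score is at most x."""
    at_most_x = sum(count for score, count in score_counts.items() if score <= x)
    return at_most_x / total_students

print(cdf(80))            # probability of scoring 80 or less
print(cdf(90) - cdf(70))  # interval probability P(70 < X <= 90)
```

The second line of output shows why the CDF is convenient: an interval probability is just the difference of two CDF values.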
Probability Distribution Function for Discrete Random variables

Binomial Distribution
- It is the discrete probability distribution of the number of successes in n independent trials, where each trial has two possible outcomes, either Success or Failure.
- The objective is to find the probability of getting x successes out of n trials

For example, if we toss a coin, there could be only two possible outcomes: heads or tails, and if any test is taken, then there could be only two
results: pass or fail.

- Probability of success is p, hence probability of failure is (1-p)


- It makes use of PMF and CDF

Binomial Distribution formula


$Bi(X = x) = {}^{n}C_{x} \, p^{x} (1 - p)^{n - x}$

where n = the number of experiments, p = probability of success in a single experiment,

and ${}^{n}C_{x} = \frac{n!}{x! \, (n - x)!}$

Example 1: If a coin is tossed 5 times, find the probability of:

(a) Exactly 2 heads

(b) At least 4 heads.
Solution:
Each toss of the coin is a Bernoulli trial, and the tosses are independent. According to the problem:
Number of trials: n=5
Probability of head: p= 1/2 and hence the probability of tail, (1-p) =1/2
(a) For exactly two heads:
x = 2 (the number of heads, i.e. successes)

$P(x = 2) = {}^{n}C_{x} \, p^{x} (1 - p)^{n - x} = \frac{5!}{2! \, 3!} \times 0.5^{2} \times 0.5^{3}$

P(x=2) = 5/16
(b) For at least four heads,
P(x ≥ 4) = P(x = 4) + P(x=5)
Hence,

$P(x = 4) = \frac{5!}{4! \, 1!} \times 0.5^{4} \times 0.5^{1} = 5/32$

$P(x = 5) = 0.5^{5} = 1/32$


Therefore,
P(x ≥ 4) = 5/32 + 1/32 = 6/32 = 3/16

We will look at what this distribution looks like with an example in Python.
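A minimal sketch of the coin-toss example, using only the Python standard library. The function name `binomial_pmf` is chosen for this illustration; `math.comb` computes nCx. The crude text bars are just a stand-in for a proper plot.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 5, 0.5  # five tosses of a fair coin

# PMF for every possible number of heads, 0 through n.
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

p_exactly_2 = pmf[2]            # part (a): exactly 2 heads
p_at_least_4 = pmf[4] + pmf[5]  # part (b): at least 4 heads

# Text "bar chart": one '#' per 1/32 of probability.
for x, prob in enumerate(pmf):
    print(f"P(X = {x}) = {prob:.5f}  " + "#" * round(prob * 32))

print("Exactly 2 heads: ", p_exactly_2)
print("At least 4 heads:", p_at_least_4)
```

For a fair coin the bars are symmetric around x = 2 and x = 3, which is the typical shape of a binomial distribution with p = 0.5.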


Geometric Distribution
- It arises when you count the number of independent Bernoulli trials needed to get the first success
- X = number of trials until the first success
- In this case, X follows a Geometric Distribution
In other words,
it gives the probability that the first success occurs on the kth trial. If p is the probability of success on each trial,
then the probability that the first success occurs on the kth trial is given by the formula

$PMF(k) = (1 - p)^{k - 1} \, p$

Example: A person is throwing a die and will stop once he gets a 5. Calculate the probability that the person gets the number 5 for the first time
(a) on the first throw and (b) on the second throw.
Answer:
Since there are 6 possible outcomes, the probability of success is p = 1/6 ≈ 0.17. Therefore, the probability of failure is 1 - p = 1 - 0.17 = 0.83.
(a) The person gets the number 5 on the first throw, so k = 1. The number of failures before the first success is zero, so x = k - 1 = 0.
Substituting the values of k, p and 1 - p into the distribution, we have

$PMF(1) = (1 - p)^{k - 1} \, p = 0.83^{0} \times 0.17 = 0.17$

(b) The person gets the number 5 for the first time on the second throw, so k = 2. The number of failures before the first success is 1, so x = k - 1 = 1.
Substituting the values of k, p and 1 - p into the distribution, we have

$PMF(2) = 0.83^{1} \times 0.17 = 0.14$

We will look at what this distribution looks like with an example in Python.
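A minimal sketch of the dice example in plain Python. The function name `geometric_pmf` is chosen for this illustration. It uses the exact value p = 1/6 rather than the rounded 0.17, so the results (about 0.1667 and 0.1389) differ slightly from the hand calculation.

```python
def geometric_pmf(k, p):
    """P(first success happens on trial k), for k = 1, 2, 3, ..."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # k - 1 failures, each with probability (1 - p), followed by one success.
    return (1 - p) ** (k - 1) * p


p = 1 / 6  # probability of rolling a 5 on a fair die

p_first_throw = geometric_pmf(1, p)
p_second_throw = geometric_pmf(2, p)

# Text "bar chart" for the first ten throws: one '#' per 0.01 of probability.
for k in range(1, 11):
    prob = geometric_pmf(k, p)
    print(f"P(K = {k:2d}) = {prob:.4f}  " + "#" * round(prob * 100))
```

The bars shrink by the same factor (1 - p) at every step, so the first throw is always the single most likely throw for the first success.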
Probability Distribution Function for Continuous Random variables

Uniform Distribution
- It is a probability distribution whose probability density is constant over the interval on which it is defined
- It is also known as Rectangular Distribution
- Used to generate random values
An example of uniform distribution
Random Number Generation: When generating random numbers within a specified range, a uniform distribution is often used. For example, if you
need to generate random integers between 1 and 100, a uniform distribution ensures that each number in that range has an equal probability of
being selected. (Strictly, the integer case is the discrete uniform distribution; drawing a real number between 1 and 100 is the continuous case.)
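A minimal sketch of uniform random number generation with Python's standard `random` module. The seed, the sample size and the helper `uniform_pdf` are choices made for this illustration only.

```python
import random

rng = random.Random(42)  # fixed seed (arbitrary) so the run is repeatable


def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0


# Discrete case: random integers from 1 to 100, both ends included.
integers = [rng.randint(1, 100) for _ in range(10_000)]

# Continuous case: random real numbers between 1 and 100.
reals = [rng.uniform(1, 100) for _ in range(10_000)]

print("First five integers:", integers[:5])
print("First five reals:   ", [round(v, 2) for v in reals[:5]])
print("Density anywhere inside [1, 100]:", uniform_pdf(50, 1, 100))
print("Mean of the real samples:", sum(reals) / len(reals))
```

The density is the same constant, 1 / (b - a), at every point of the interval, which is why the plot of this distribution is a rectangle.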

Exponential Distribution
- It models the waiting time between events in a process
- Traditionally, it is used for modelling time to failure of electronic components
- This represents a process in which events occur continuously and independently at a constant average rate
Exponential Distribution Formula
A continuous random variable x has an exponential distribution if it has the following probability density function:

$PDF(x) = \lambda e^{-\lambda x}, \quad x \ge 0$

where λ (lambda) is the rate parameter, which determines the average number of events per unit of time, and x is the time variable.
Example: Assume that you usually get 2 phone calls per hour. Calculate the probability that a phone call will come within the next hour.
Solution:
We are given 2 phone calls per hour, and λ is the average number of events per unit of time, so
λ = 2
calls per hour. Equivalently, we expect one phone call every 1/λ = 0.5 hours on average.
So, the computation is as follows:

$P(0 \le x \le 1) = \int_{0}^{1} \lambda e^{-\lambda x} \, dx = \int_{0}^{1} 2 e^{-2x} \, dx = 1 - e^{-2} \approx 0.864665$

Therefore, the probability of receiving a phone call within the next hour is approximately 0.8647.
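A minimal sketch of this calculation in Python, assuming that λ is measured in calls per hour, so that 2 calls per hour means λ = 2 and an average gap of 1/λ = 0.5 hours between calls. The function names are chosen for this illustration. The closed form of the integral from 0 to t is 1 - e^(-λt), and a simple midpoint sum is included as a cross-check.

```python
from math import exp


def exponential_pdf(x, rate):
    """Density rate * exp(-rate * x) for x >= 0, and 0 otherwise."""
    return rate * exp(-rate * x) if x >= 0 else 0.0


def exponential_cdf(x, rate):
    """P(X <= x) = 1 - exp(-rate * x) for x >= 0, and 0 otherwise."""
    return 1 - exp(-rate * x) if x >= 0 else 0.0


rate = 2.0  # assumed: average number of calls per hour

# Probability that the next call arrives within 1 hour.
p_within_hour = exponential_cdf(1.0, rate)

# Cross-check: integrate the density from 0 to 1 with a midpoint sum.
steps = 10_000
h = 1.0 / steps
numeric_check = h * sum(exponential_pdf((i + 0.5) * h, rate) for i in range(steps))

print(f"Closed form:      {p_within_hour:.6f}")
print(f"Numeric integral: {numeric_check:.6f}")
```

Changing `rate` shows how sensitive the answer is to the rate parameter: the higher the rate, the more likely a call arrives within the hour.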

Normal Distribution
- One of the most important probability distributions in the field of statistics
- It is also called the Gaussian distribution or bell curve.
- Fits many natural phenomena, such as measurement error, heights, IQ scores, and blood pressure
Normal Distribution Formula
The probability density function of the normal (Gaussian) distribution is given by

$f(x, \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}$

where x is the variable, μ is the mean, and σ is the standard deviation.



Example: In an exam, the marks are normally distributed with a mean of 65 and a standard deviation of 9. Find the probability that a student's mark is
a) Less than 54
b) At least 80
c) Between 70 and 86

Answer:
a)
Using the PDF of the normal distribution, the probability density at a specific point x is given by the formula:

$f(x, \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}$

For this problem, x = 54, μ = 65, and σ = 9. Plugging these values into the formula, we get:

$f(54) = \frac{1}{9 \sqrt{2\pi}} \, e^{-\frac{121}{162}} = 0.021$

This value is a density, not a probability. The probability that a student gets a mark less than 54 is equal to the area under the curve to the
left of 54, which is the cumulative probability. We can obtain this by integrating the PDF from negative infinity to 54:

$P(X \le 54) = \int_{-\infty}^{54} f(x) \, dx$

The result is approximately 0.1112.
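The density and the integral above can be checked numerically. This is a rough sketch: the helper names `normal_pdf` and `integrate` are made up for this illustration, and the lower limit of negative infinity is replaced by μ - 10σ, below which the density is negligible. The integral comes out at about 0.1108; the value 0.1112 quoted from the table corresponds to rounding the z-score to -1.22.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


mu, sigma = 65, 9

# Height of the curve at x = 54 (a density, not a probability).
density_at_54 = normal_pdf(54, mu, sigma)

# Area under the curve to the left of 54.
p_below_54 = integrate(lambda x: normal_pdf(x, mu, sigma), mu - 10 * sigma, 54)

print(f"f(54)      = {density_at_54:.4f}")
print(f"P(X <= 54) = {p_below_54:.4f}")
```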


An easy way to solve this problem is to use a Z-score table

- The z-score for the value 54 is

$Z = \frac{x - \mu}{\sigma} = \frac{54 - 65}{9} = -1.2222$

From the Z-score table, we look up the cumulative probability for z = -1.22, which is the shaded area to the left of z under the standard normal curve:

$P(X \le 54) = P(Z \le -1.22) = 0.1112$

b) At least 80
- The z-score for the value 80 is

$Z = \frac{x - \mu}{\sigma} = \frac{80 - 65}{9} = 1.666 \approx 1.67$

$P(X \ge 80) = 1 - P(X \le 80) = 1 - P(Z \le 1.67) = 1 - 0.9525 = 0.0475$

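c) Between 70 and 86

- The z-score for the value 70 is

$Z = \frac{x - \mu}{\sigma} = \frac{70 - 65}{9} = 0.555$

- The z-score for the value 86 is

$Z = \frac{x - \mu}{\sigma} = \frac{86 - 65}{9} = 2.333$

$P(70 \le X \le 86) = P(X \le 86) - P(X \le 70)$

Using the table values for z = 2.33 and z = 0.55:

$P(Z \le 2.33) - P(Z \le 0.55) = 0.9901 - 0.7088 = 0.281$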
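The three answers can be cross-checked without a table by using `statistics.NormalDist` from the Python standard library (available from Python 3.8). This is a sketch under the example's assumptions (mean 65, standard deviation 9); the variable names and the helper `z_score` are chosen for this illustration. The results are roughly 0.111, 0.048 and 0.279, slightly different from the table-based values because the table requires cutting the z-scores to two decimals.

```python
from statistics import NormalDist

marks = NormalDist(mu=65, sigma=9)  # distribution of exam marks
standard = NormalDist()             # standard normal: mean 0, sd 1


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above the mean."""
    return (x - mu) / sigma


# a) Less than 54: area to the left of 54.
p_less_than_54 = marks.cdf(54)

# b) At least 80: complement of the area to the left of 80.
p_at_least_80 = 1 - marks.cdf(80)

# c) Between 70 and 86: difference of two cumulative probabilities.
p_between_70_and_86 = marks.cdf(86) - marks.cdf(70)

# Same answer for (a) via the z-score and the standard normal curve.
p_a_via_z = standard.cdf(z_score(54, 65, 9))

print(f"a) P(X < 54)        = {p_less_than_54:.4f}")
print(f"b) P(X >= 80)       = {p_at_least_80:.4f}")
print(f"c) P(70 <= X <= 86) = {p_between_70_and_86:.4f}")
print(f"a) via z-score      = {p_a_via_z:.4f}")
```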



Next,
we are going to generate random numbers and visualize various data distributions using Python.
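As a starting point, here is a rough sketch that draws samples from each distribution covered in this lecture using only the standard `random` module, and prints a crude text histogram in place of a plot. The seed, the sample size, the parameter values (taken from the worked examples, with an assumed rate of 2 for the exponential case) and all helper names are choices made for this illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
N = 10_000              # samples per distribution (arbitrary)


def sample_binomial(n, p):
    """Count the successes in n independent Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def sample_geometric(p):
    """Count trials up to and including the first success."""
    k = 1
    while rng.random() >= p:  # a draw >= p counts as a failure
        k += 1
    return k


binomial_samples = [sample_binomial(5, 0.5) for _ in range(N)]
geometric_samples = [sample_geometric(1 / 6) for _ in range(N)]
uniform_samples = [rng.uniform(1, 100) for _ in range(N)]
exponential_samples = [rng.expovariate(2.0) for _ in range(N)]
normal_samples = [rng.gauss(65, 9) for _ in range(N)]


def text_histogram(values, bins=10, width=40):
    """Print one row of '#' per bin, scaled so the tallest bin has `width` marks."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / span * bins), bins - 1)
        counts[index] += 1
    peak = max(counts)
    for i, count in enumerate(counts):
        left_edge = lo + i * span / bins
        print(f"{left_edge:8.2f} | " + "#" * round(count / peak * width))


for name, values in [
    ("Binomial(n=5, p=0.5)", binomial_samples),
    ("Geometric(p=1/6)", geometric_samples),
    ("Uniform(1, 100)", uniform_samples),
    ("Exponential(rate=2)", exponential_samples),
    ("Normal(mean=65, sd=9)", normal_samples),
]:
    print(f"\n{name}: sample mean = {mean(values):.3f}")
    text_histogram(values)
```

With a plotting library available, the call to `text_histogram` could be swapped for a real histogram; the sampling part stays the same.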
