5 Probability and Probability Distribution

The document provides an overview of probability distributions, including discrete and continuous types, and their applications in statistics. It covers key concepts such as random variables, probability mass functions, binomial distributions, and normal distributions, along with examples and calculations. Additionally, it discusses the Central Limit Theorem and its significance in statistical analysis.

Uploaded by jiexfg
Copyright © All Rights Reserved

PROBABILITY DISTRIBUTIONS

IT MS02: Quantitative Methods


Learning Objectives
At the end of the unit, the students should be able to:
• Identify the probability distributions of different variables
• Compute probabilities for variables with different probability distributions
Topics
• Probability Distributions
• Discrete Probability Distribution
• Continuous Probability Distribution
• Normal Distribution
Probability Distributions
• A random variable is a variable whose value is subject to variation due to random chance. It represents a possible numeric value from an uncertain event.
• Examples: the number of dots you get when rolling a die, the number of heads when flipping a coin multiple times, a measured distance, or a number picked from a given interval.
• A probability distribution is a function that describes how likely it is to obtain each of the possible values of the random variable.
• It shows the possible values a variable can take and how frequently they occur.
• Distributions come in families; the members of a family are differentiated by their parameters.
Notation
• X (capital letter X) = the actual outcome of an event
• x (small letter x) = one of the possible outcomes
• P(X = x) or p(x) = the probability that x is the actual outcome
• Ex: X is the number of dots we get when we roll a die. If we want the probability of getting a 3 (x = 3), we write P(X = 3).
Probability Distributions
• Discrete Probability Distribution: the set of possible values is finite or countably infinite.
  • Discrete Uniform Distribution
  • Binomial Distribution
• Continuous Probability Distribution: the variable can take on any value from a continuum, such as the set of all real numbers or an interval.
  • Normal Distribution
Discrete Probability Distribution
• A probability distribution for a discrete random variable is a mutually exclusive list of all possible numerical outcomes of the random variable, together with the probability of occurrence associated with each outcome.
• The probability distribution function of a discrete random variable is also called the Probability Mass Function (PMF).
• It is usually denoted as P(x) and must satisfy $0 \le P(x) \le 1$ for every x and $\sum_x P(x) = 1$.
Discrete Probability Distribution
• Example: If we roll two six-sided dice and let X be the sum, then X can take on any value in the set {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
• The probability distribution function for this X is

x     2     3     4     5     6     7     8     9     10    11    12
P(x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
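The table can be checked by brute force. This is a minimal Python sketch, assuming fair dice; the helper names two_dice_sum_pmf and is_valid_pmf are our own. It enumerates all 36 equally likely (die 1, die 2) pairs, counts each sum, and then checks the two PMF conditions from the previous slide using exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_sum_pmf():
    """Return {sum: probability} for the sum of two fair six-sided dice."""
    # Enumerate all 36 equally likely (die1, die2) outcomes and count each sum.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    total = sum(counts.values())  # 36 outcomes
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


def is_valid_pmf(pmf):
    """Check the PMF conditions: 0 <= P(x) <= 1 for every x, and the P(x) sum to 1."""
    return all(0 <= p <= 1 for p in pmf.values()) and sum(pmf.values()) == 1


pmf = two_dice_sum_pmf()
print(pmf[7])             # 1/6  (that is, 6/36)
print(is_valid_pmf(pmf))  # True
```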
Common Types of Discrete Distributions
• The appropriate discrete distribution depends on the properties of your data.
• Some of the common discrete distributions are:
  • the uniform distribution, to model outcomes that all have the same probability, such as rolling a die;
  • the binomial distribution, to model binary data, such as coin tosses.
Discrete Uniform Distribution
• A probability distribution in which a finite number (n) of values are equally likely to be observed
• Every one of the n values has a probability of 1/n
• Denoted as X ~ U(a, b), where a is the minimum value and b is the maximum value (for integer values, n = b − a + 1; the formula below takes a = 1 and b = n)
$$P(x) = \begin{cases} \dfrac{1}{n}, & x = 1, 2, 3, \dots, n \\ 0, & \text{otherwise} \end{cases}$$
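The piecewise PMF above translates directly into code. This is a minimal sketch for the case a = 1, b = n; the function name discrete_uniform_pmf is our own, and Fraction keeps the results exact.

```python
from fractions import Fraction


def discrete_uniform_pmf(x, n):
    """P(X = x) for X uniform on {1, 2, ..., n}; 0 for any other x."""
    if isinstance(x, int) and 1 <= x <= n:
        return Fraction(1, n)  # every allowed value has probability 1/n
    return Fraction(0)         # "otherwise" branch of the definition


print(discrete_uniform_pmf(6, 6))  # 1/6 (a fair die shows six)
print(discrete_uniform_pmf(7, 6))  # 0   (a die cannot show seven)
```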
Discrete Uniform Distribution
Example: Rolling a fair six-sided die has the probability distribution function P(x) = 1/6 for x = 1, 2, …, 6, where x is the number of dots on the top face (so n = 6).

If we need the probability of getting a six, then

P(X = 6) = 1/6
Example:
• 10 ping pong balls are numbered 1 to 10 and placed in a bag. One ping pong ball is removed from the bag at random. Find the probability that the number on the drawn ping pong ball is between 7 and 10.

Since n = 10, p(x) = 1/10 for x = 1, …, 10. Reading "between" as strictly between (7 < X < 10):

P(7 < X < 10) = P(X = 8) + P(X = 9) = 1/10 + 1/10 = 2/10 = 0.20
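The same sum can be checked by adding PMF values over the qualifying outcomes. This sketch assumes the strict reading used above, and also shows the inclusive reading (7 ≤ X ≤ 10) for comparison, since the word "between" alone does not say whether the endpoints count.

```python
from fractions import Fraction

n = 10  # balls numbered 1..10, each drawn with probability 1/10
pmf = {x: Fraction(1, n) for x in range(1, n + 1)}

# "Strictly between 7 and 10" means X in {8, 9}.
p_strict = sum(pmf[x] for x in pmf if 7 < x < 10)
# If both endpoints were included (7 <= X <= 10), X in {7, 8, 9, 10}.
p_inclusive = sum(pmf[x] for x in pmf if 7 <= x <= 10)

print(p_strict, float(p_strict))        # 1/5 0.2
print(p_inclusive, float(p_inclusive))  # 2/5 0.4
```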


Binomial Distribution
• Gives the probability of obtaining a given number of SUCCESS outcomes when an experiment or survey with only two possible outcomes (SUCCESS or FAILURE) is repeated multiple times
• If X follows a Binomial Distribution, it can be denoted as X ~ Bi(p, n)

Possible binomial scenarios:

• A manufacturing plant labels items as either defective or acceptable
• A firm bidding for contracts will either get a contract or not
• A marketing research firm receives survey responses of ‘yes, I will buy’ or ‘no, I will not’
• New job applicants either accept the offer or reject it
Binomial Distribution
Requirements:
1. Fixed number of observations or trials
• e.g. 15 tosses of a coin; 10 light bulbs taken from a warehouse
2. There are only two mutually exclusive and collectively exhaustive
outcomes, generally called ‘success’ and ‘failure’
• e.g. head or tail in each toss of a coin; defective or not defective
light bulb
3. Constant probability of success for each observation
4. Observations are independent: the outcome of one observation does not affect the outcome of any other
Binomial Distribution
The Binomial Probability Distribution Function is given as
$$P(X) = \frac{n!}{X!\,(n-X)!}\,p^{X}(1-p)^{n-X}$$
where:
P(X) = probability of X successes in n trials, with the probability of
success p on each trial
X = number of ‘successes’ in sample, (X = 0, 1, 2, ..., n)
n= sample size (number of trials or observations)
p = probability of ‘success’
1-p = probability of failure
Binomial Distribution: Example
• Example: A customer has a 35% probability of making a purchase. Ten customers enter the shop. What is the probability that exactly three customers make a purchase?

Let X = number of customers who make a purchase

From the question, n = 10, p = 0.35, 1 − p = 1 − 0.35 = 0.65, X = 3
$$P(X = 3) = \frac{10!}{3!\,7!}(0.35)^{3}(0.65)^{7} = 120\,(0.042875)(0.049022) \approx 0.2522$$
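The binomial formula can be evaluated directly. This is a minimal Python sketch using math.comb (Python 3.8+) for the n!/(X!(n − X)!) term; the function name binomial_pmf is our own. It reproduces the purchase example and checks that the probabilities for X = 0, …, n sum to 1.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial X with n trials and success probability p."""
    if not 0 <= x <= n:
        return 0.0  # outside the support {0, 1, ..., n}
    # C(n, x) ways to place the x successes, times p^x (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Purchase example: n = 10 customers, p = 0.35, X = 3 purchases
print(round(binomial_pmf(3, 10, 0.35), 4))  # 0.2522
```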
Binomial Distribution: Example
Alternate Solution: Use the Distribution Table
Let X = number of customers who make a purchase
From the question, n = 10, p = 0.35, 1 − p = 1 − 0.35 = 0.65, X = 3
In the binomial table, the entry for n = 10, p = 0.35 and X = 3 is 0.2522.
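A printed binomial table is just the PMF evaluated for every x and rounded. As a sketch, assuming four-decimal rounding like most printed tables, the hypothetical helper binomial_table_column regenerates the n = 10, p = 0.35 column used above.

```python
from math import comb


def binomial_table_column(n, p, decimals=4):
    """Hypothetical helper: {x: P(X = x)} for x = 0..n, rounded like a printed table."""
    return {
        x: round(comb(n, x) * p**x * (1 - p) ** (n - x), decimals)
        for x in range(n + 1)
    }


column = binomial_table_column(10, 0.35)
for x, prob in column.items():
    print(f"{x:2d}  {prob:.4f}")  # the row for x = 3 reads 0.2522
```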
Continuous Probability Distribution
• A probability distribution in which the random variable X can take on any value in a continuum (X is continuous).
• Because there are infinitely many values that X could assume, the probability of X taking on any one specific value is zero.
• Therefore we speak of probabilities over ranges of values, e.g., P(X > 0) = 0.50.
• It is represented by a probability density function f(x) such that the probability of X lying between a and b equals the integral (area under the curve) between a and b: $P(a \le X \le b) = \int_a^b f(x)\,dx$. This probability is never negative.
Common Continuous Distributions
Normal Distribution
• A bell-shaped and symmetrical distribution
• Mean, median and mode are equal
• Central location is determined by the mean, μ
• Spread is determined by the standard deviation, σ
• The random variable X has an infinite theoretical range: −∞ to +∞
• The most widely used distribution in statistics, partly because it can approximate other distributions, including discrete ones such as the binomial
• If X follows a normal distribution, it can be denoted as X ~ N(μ, σ)
Normal Probability
To compute a probability, get the area under the curve over the range of interest.
Normal Probability
The total area under the curve is 1.0, and the curve is symmetric, so half of the area is above the mean and half is below.
Normal Distribution

• If a continuous variable X follows a normal distribution with mean μ and standard deviation σ, its probability density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
and we can compute the probability of X ≤ x (or of any other range of X) by integrating this function.
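The density and its integral can be written down in a few lines. This is a minimal sketch; the names normal_pdf and normal_cdf are our own. The integral from −∞ to x has no elementary closed form, so the sketch uses the standard identity P(X ≤ x) = ½[1 + erf((x − μ)/(σ√2))] with Python's math.erf.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    """P(X <= x): the integral of normal_pdf from -inf to x, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


print(normal_cdf(0.8, 0.8, 0.3))  # 0.5: half of the area lies below the mean
```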
Example
• Example: Determine the probability that a randomly selected
blue crab has a weight greater than 1 kg. Based on previous
research we assume that the distribution of weights (kg) of
adult blue crabs is normally distributed with a population
mean (μ) of 0.8 kg and a standard deviation (σ) of 0.3 kg.
Example - Solution
• Solution: We need to integrate the probability density function (PDF) of the normal distribution from 1 kg to positive infinity. Substituting μ = 0.8 and σ = 0.3 into the PDF:
$$P(X > 1) = \int_{1}^{\infty} \frac{1}{0.3\sqrt{2\pi}}\, e^{-\frac{(x-0.8)^2}{2(0.3)^2}}\,dx$$

• Alternate Solution: Transform into the Standard Normal Distribution
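The integral in the first solution can be approximated numerically. This sketch uses composite Simpson's rule, a numerical method of our choosing, and truncates "positive infinity" at μ + 10σ, assuming the area beyond that point is negligible. It compares the result with the closed form 1 − P(X ≤ 1) computed via math.erfc.

```python
import math

MU, SIGMA = 0.8, 0.3  # crab weights in kg, from the example


def normal_pdf(x, mu=MU, sigma=SIGMA):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)  # weights 4, 2, 4, ..., 4
    return total * h / 3


# Upper limit mu + 10 sigma stands in for +infinity.
p_numeric = simpson(normal_pdf, 1.0, MU + 10 * SIGMA)
p_closed = 0.5 * math.erfc((1.0 - MU) / (SIGMA * math.sqrt(2)))
print(round(p_numeric, 4), round(p_closed, 4))  # both about 0.2525
```

The exact area is about 0.2525; the table-based solution gives 0.2514 because it rounds z = 2/3 to 0.67.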
Standard Normal Distribution
• The standard normal distribution, also called the z-
distribution, is a special normal distribution where the mean
is 0 and the standard deviation is 1.
• Any normal distribution can be standardized by converting its values into z-scores, Z = (X − μ)/σ. Z-scores tell you how many standard deviations from the mean each value lies.
Standard Normal Distribution
• Note: the distribution is the same, only the scale has
changed. We can express the problem in original units (X) or
in standardized units (Z)
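Standardizing changes only the scale, not the probability. This minimal sketch uses Python's statistics.NormalDist (Python 3.8+) with the blue crab parameters: the upper-tail area beyond x = 1 kg under N(0.8, 0.3) equals the area beyond its z-score under N(0, 1).

```python
from statistics import NormalDist

crab = NormalDist(mu=0.8, sigma=0.3)  # X in original units (kg)
std = NormalDist()                    # Z ~ N(0, 1)

x = 1.0
z = (x - crab.mean) / crab.stdev      # z-score: 2/3 of a standard deviation above the mean

# The same area, measured on two scales.
p_x = 1 - crab.cdf(x)
p_z = 1 - std.cdf(z)
print(round(z, 4), round(p_x, 4), round(p_z, 4))  # 0.6667 0.2525 0.2525
```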
Standard Normal Distribution
• To get the area under the curve, we can now use the Z-Distribution Table.
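A Z table lists left-tail areas P(Z < z) on a grid of z values. As a sketch, the hypothetical helper z_table below rebuilds such a table with statistics.NormalDist().cdf, assuming a grid from −3.00 to 3.00 in steps of 0.01 and four-decimal rounding.

```python
from statistics import NormalDist


def z_table(z_min=-3.0, z_max=3.0, step=0.01):
    """Hypothetical helper: left-tail areas P(Z < z) on a grid, rounded to 4 decimals."""
    phi = NormalDist().cdf
    n = round((z_max - z_min) / step)
    table = {}
    for i in range(n + 1):
        z = round(z_min + i * step, 2)  # grid point, e.g. 0.67
        table[z] = round(phi(z), 4)     # left-tail area at that point
    return table


table = z_table()
print(table[0.67])  # 0.7486
print(table[-1.0])  # 0.1587
```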
Example – Alternative Solution
• Standardize x = 1 kg using μ = 0.8 kg and σ = 0.3 kg:
$$Z = \frac{X - \mu}{\sigma} = \frac{1 - 0.8}{0.3} = \frac{2}{3} \approx 0.67$$
Example – Alternative Solution
• Look up P(Z < 2/3) ≈ P(Z < 0.67) in the Z Distribution Table:

P(Z < 0.67) = 0.7486


Example – Alternative Solution
$$P\left(Z > \tfrac{2}{3}\right) \approx 1 - P(Z < 0.67) = 1 - 0.7486 = 0.2514 \approx 0.25$$
So about 25% of adult blue crabs weigh more than 1 kg.
Example:
• Suppose video download time follows a normal distribution with a mean of 7 seconds and a standard deviation of 2 seconds. What is the probability that the video download time for the OurCampus! website will be between 5 and 9 seconds, that is, P(5 < X < 9)?

SOLUTION:
1. Get the z-scores: if x = 5, z = (5 − 7)/2 = −1.00, and if x = 9, z = (9 − 7)/2 = 1.00
2. P(5 < X < 9) = P(−1.00 < Z < 1.00)
= P(Z < 1.00) − P(Z < −1.00)
= 0.8413 − 0.1587
= 0.6826
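The same probability can be computed without a table as a difference of two CDF values. A minimal sketch with statistics.NormalDist; the small difference from 0.6826 comes from the table rounding each area to four decimals.

```python
from statistics import NormalDist

download = NormalDist(mu=7, sigma=2)  # download time in seconds

# P(5 < X < 9) = P(X < 9) - P(X < 5)
p = download.cdf(9) - download.cdf(5)
print(round(p, 4))  # 0.6827
```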
Example
• Time spent on the home page by visitors of Website A is normally distributed with a mean of μ = 26.5 seconds and a standard deviation of σ = 2.5 seconds. Approximately what percentage of visitors stay more than 26 seconds?
$$Z = \frac{X - \mu}{\sigma} = \frac{26 - 26.5}{2.5} = -0.2$$
P(X > 26) = 1 − P(Z < −0.2) = 1 − 0.4207 = 0.5793

Thus, approximately 57.93% of the visitors of Website A stayed on the home page for more than 26 seconds.
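A quick check of the upper-tail calculation, as a minimal sketch with statistics.NormalDist: the complement rule P(X > 26) = 1 − P(X < 26) is applied directly in the original units, with no explicit z-score needed.

```python
from statistics import NormalDist

visit = NormalDist(mu=26.5, sigma=2.5)  # seconds on Website A's home page

p_more_than_26 = 1 - visit.cdf(26)      # complement of the left tail
print(f"{p_more_than_26:.2%}")          # 57.93%
```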
Getting X from Probability
• If X follows a normal distribution, we can also get the specific value of X for a given probability by:

1. Finding the z value associated with the normal probability.
2. Using the transformation X = Zσ + μ to find the value of x.
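The two steps map onto an inverse-CDF lookup followed by the transformation. This is a minimal sketch; the function name x_from_probability is our own, and NormalDist().inv_cdf stands in for reading the Z table backwards.

```python
from statistics import NormalDist


def x_from_probability(p, mu, sigma):
    """Return x such that P(X < x) = p for X ~ N(mu, sigma)."""
    z = NormalDist().inv_cdf(p)  # step 1: z value with left-tail area p
    return z * sigma + mu        # step 2: X = Z*sigma + mu


# Usage: the weight below which 95% of blue crabs fall, with mu = 0.8 kg and sigma = 0.3 kg
print(x_from_probability(0.95, 0.8, 0.3))
```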
Example:
• Using the Z table, compute the value of x for each of the following:

1. P(X < x) = 0.95
2. P(X < x) = 0.90
3. P(X > x) = 0.35
4. P(x1 < X < x2) = 0.6
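These exercises can be checked against exact quantiles. This sketch assumes X is standard normal, so each answer is a z value. For item 4, infinitely many intervals have area 0.6; the sketch assumes the interval symmetric about the mean, which leaves 0.2 in each tail.

```python
from statistics import NormalDist

Z = NormalDist()  # assumption: X is standard normal, so x is a z value

x1 = Z.inv_cdf(0.95)      # 1. P(X < x) = 0.95
x2 = Z.inv_cdf(0.90)      # 2. P(X < x) = 0.90
x3 = Z.inv_cdf(1 - 0.35)  # 3. P(X > x) = 0.35 is the same as P(X < x) = 0.65
# 4. Symmetric interval (assumption): 0.2 in each tail, so x2 = inv_cdf(0.8), x1 = -x2
upper = Z.inv_cdf(0.5 + 0.6 / 2)
lower = -upper

print(round(x1, 4), round(x2, 4), round(x3, 4), round(lower, 4), round(upper, 4))
# 1.6449 1.2816 0.3853 -0.8416 0.8416
```

A printed Z table gives the nearest two-decimal values (about 1.645, 1.28, 0.39 and ±0.84).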
Example:
• There are approximately one billion smartphone users in the world today. In the United States, the ages of smartphone users from 13 to 55+ follow a normal distribution with an approximate mean and standard deviation of 36.9 years and 13.9 years, respectively. 80% of the users in the age range 13 to 55+ are younger than what age?
Example:
• Solution:
• X ~ N(36.9, 13.9)

P(Z < z) = 0.80 → From the Z table, the z value that gives a probability of ≈ 0.80 is 0.84.
$$Z = \frac{X - \mu}{\sigma} \;\Rightarrow\; X = Z\sigma + \mu = 0.84(13.9) + 36.9 = 48.58$$

80% of the smartphone users in the age range 13 – 55+ are 48.58 years old or younger.
Example:
• There are approximately one billion smartphone users in the world today. In the United States, the ages of smartphone users from 13 to 55+ follow a normal distribution with an approximate mean and standard deviation of 36.9 years and 13.9 years, respectively. 45% of the users in the age range 13 to 55+ are younger than what age?
Example:
• Solution:
• X ~ N(36.9, 13.9)

P(Z < z) = 0.45 → From the Z table, the z value that gives a probability of ≈ 0.45 is −0.12.
$$Z = \frac{X - \mu}{\sigma} \;\Rightarrow\; X = Z\sigma + \mu = -0.12(13.9) + 36.9 = 35.23$$

45% of the smartphone users in the age range 13 – 55+ are 35.23 years old or younger.
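Both smartphone answers can be compared against exact quantiles. This minimal sketch uses NormalDist.inv_cdf for the exact z and repeats the slide's calculation with the rounded z values read from the table (0.84 and −0.12); the gap between the two columns comes only from that rounding.

```python
from statistics import NormalDist

ages = NormalDist(mu=36.9, sigma=13.9)

for p, z_rounded in [(0.80, 0.84), (0.45, -0.12)]:
    exact = ages.inv_cdf(p)                # uses the exact z for p
    from_table = z_rounded * 13.9 + 36.9   # X = Z*sigma + mu with the table's z
    print(f"{p:.0%}: exact {exact:.2f}, table {from_table:.2f}")
# 80%: exact 48.60, table 48.58
# 45%: exact 35.15, table 35.23
```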
Z score for Outlier Detection
• 68% of the data is within 1 standard
deviation (σ) of the mean (μ).
• 95% of the data is within 2 standard
deviations (σ) of the mean (μ).
• 99.7% of the data is within 3 standard
deviations (σ) of the mean (μ).
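The 68-95-99.7 percentages are rounded areas under the standard normal curve. This minimal sketch computes P(−k < Z < k) for k = 1, 2, 3 with statistics.NormalDist; the exact values are about 68.27%, 95.45% and 99.73%.

```python
from statistics import NormalDist

Z = NormalDist()
for k in (1, 2, 3):
    share = Z.cdf(k) - Z.cdf(-k)  # P(-k < Z < k)
    print(f"within {k} sd: {share:.1%}")
# within 1 sd: 68.3%
# within 2 sd: 95.4%
# within 3 sd: 99.7%
```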

Given an observation, if the absolute value of its z-score is greater than 3 (that is, the observation lies more than 3 standard deviations from the mean), then the observation is considered an outlier.
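The |z| > 3 rule is easy to apply to a dataset. This is a minimal sketch; the function name z_score_outliers and the sample data are hypothetical, and μ and σ are estimated from the data itself with statistics.mean and statistics.pstdev rather than known in advance.

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=3.0):
    """Return the values whose |z| exceeds threshold, with mu and sigma estimated from the data."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the data
    return [v for v in values if abs((v - mu) / sigma) > threshold]


data = [48, 49, 50, 51, 52] * 4 + [100]  # hypothetical data with one extreme value
print(z_score_outliers(data))             # [100]
```

Because the outlier itself inflates the estimated σ, this rule can miss extreme values in small samples.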
Central Limit Theorem
• If the population is NOT normal, we can apply the Central Limit Theorem (CLT).
• The CLT states that as the sample size (i.e., the number of values in each sample) gets large enough (generally n ≥ 30), the sampling distribution of the mean is approximately normally distributed. This is true regardless of the shape of the distribution of the individual values in the population.
• Given the CLT, we can use the normal distribution to estimate population proportions (for categorical data) and population means (for quantitative data), even when the population itself is not normal.
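A small simulation illustrates the CLT. This sketch assumes a fair die as the non-normal population (discrete uniform on 1 to 6, with mean 3.5 and standard deviation √(35/12)), takes 10,000 samples of size n = 30, and compares the sample means with the normal approximation, which has mean μ and standard deviation σ/√n. The seed and sample counts are arbitrary choices.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the run is reproducible
n, reps = 30, 10_000     # sample size and number of simulated samples

# Population: one fair die roll (uniform on 1..6), which is not normal.
sample_means = [sum(rng.randint(1, 6) for _ in range(n)) / n for _ in range(reps)]

pop_mean = 3.5
pop_sd = (35 / 12) ** 0.5  # standard deviation of one fair die roll

# The CLT predicts the sample means are roughly N(pop_mean, pop_sd / sqrt(n)).
print(round(mean(sample_means), 3), pop_mean)
print(round(pstdev(sample_means), 3), round(pop_sd / n**0.5, 3))
```

A histogram of sample_means would look bell-shaped, even though the population itself is flat.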
QUESTIONS?
