
Module II Probability Distribution

The document explains the concept of random variables, distinguishing between discrete and continuous types, and introduces probability distributions, expected value, and various probability functions. It includes examples of calculating probabilities for different scenarios, such as sales and binomial distributions. Additionally, it covers important distributions in epidemiology and provides practice problems to reinforce understanding.

Uploaded by sajiniossajini0
Copyright © All Rights Reserved

PROBABILITY DISTRIBUTIONS
Random Variable
•A random variable x takes on a defined set of
values with different probabilities.
•For example, if you roll a die, the outcome is
random (not fixed) and there are 6 possible
outcomes, each of which occurs with probability
one-sixth.
•Similarly, if you poll people about their voting
preferences, the percentage of the sample that
responds “Yes on Proposition 100” is also a
random variable (the percentage will be slightly
different every time you poll).
Random Variables
• A variable is random if it takes on different
values as a result of the outcomes of a random
experiment
• A random variable can be either discrete or
continuous
• If a random variable is allowed to take on only a
limited number of values, which can be listed, it
is a discrete random variable
• If it is allowed to assume any value within a
given range, it is a continuous random variable
• Identify the random variable.
• Assign probability to each value of the random variable.
● Subjective probability technique
● A priori probability assignment
● Empirical probability assignment
Expected Value
• Fundamental idea in the study of probability
distribution
• To get the Expected value: multiply each value that
the random variable can assume by the probability of
occurrence of that value and then sum these
products.
Example
A dealer in refrigerators estimates from his past experience that
the probabilities of his selling refrigerators in a day are:

No. of refrigerators sold   0     1     2     3     4     5     6
Probability                 0.03  0.20  0.23  0.25  0.12  0.10  0.07

Find the average number of refrigerators that will be sold in a day.
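The rule above can be sketched in a few lines of Python and applied to the dealer's table. The function name `expected_value` is our own and does not come from the slides. With these probabilities the average works out to 2.81 refrigerators per day.

```python
# Expected value of a discrete random variable: multiply each value by its
# probability and add up the products.
def expected_value(values, probs):
    """Return E(x) = sum of x * p(x) over all listed values."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))


# Refrigerator-sales table from the example
sold = [0, 1, 2, 3, 4, 5, 6]
prob = [0.03, 0.20, 0.23, 0.25, 0.12, 0.10, 0.07]

print(round(expected_value(sold, prob), 2))  # 2.81
```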
Expected Value of a Random Variable
Age Probability
22 0.2
23 0.3
24 0.2
25 0.3
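The slide lists only the age table. Applying the expected-value rule to it gives the following worked computation, which is our own arithmetic:

```latex
E(x) = \sum_x x\,p(x) = 22(0.2) + 23(0.3) + 24(0.2) + 25(0.3)
     = 4.4 + 6.9 + 4.8 + 7.5 = 23.6
```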
Probability Distribution
• It is a listing of all the probabilities of all the
possible outcomes that could result if the
experiment were done
• It can be based on theoretical considerations
or on a subjective assessment of the
likelihood of certain outcomes
• It can also be based on experience
Example: Cumulative Probability Distribution
Sales   Probability   Cumulative Probability
0       0.03          0.03
1       0.20          0.23
2       0.23          0.46
3       0.25          0.71
4       0.12          0.83
5       0.10          0.93
6       0.07          1.00
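The cumulative column is a running total of the probability column. Here is a minimal sketch using the standard library's `itertools.accumulate`, with values rounded to two decimals to hide floating-point noise:

```python
from itertools import accumulate

sales = [0, 1, 2, 3, 4, 5, 6]
probability = [0.03, 0.20, 0.23, 0.25, 0.12, 0.10, 0.07]

# Running total of the probabilities gives P(X <= x) for each sales level.
cumulative = [round(c, 2) for c in accumulate(probability)]

for x, p, c in zip(sales, probability, cumulative):
    print(f"{x}  {p:.2f}  {c:.2f}")
```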
Example
Sales Probability
0 0.03
1 0.20
2 0.23
3 0.25
4 0.12
5 0.10
6 0.07
Probability functions
• A probability function maps the possible
values of x against their respective
probabilities of occurrence, p(x)
• p(x) is a number from 0 to 1.0.
• The probabilities of all possible values sum to 1
(for a continuous variable, the area under the
probability function is 1).
Discrete example: roll of a die
(Figure: p(x) = 1/6 for each of x = 1, 2, 3, 4, 5, 6.)
Probability mass function (pmf)
x       p(x)
1       p(x=1) = 1/6
2       p(x=2) = 1/6
3       p(x=3) = 1/6
4       p(x=4) = 1/6
5       p(x=5) = 1/6
6       p(x=6) = 1/6
Total   1.0
Cumulative distribution function (CDF)
(Figure: step graph of P(x), rising by 1/6 at each of x = 1, ..., 6, from 1/6 up to 1.0.)
Cumulative distribution function
A   P(x≤A)
1   P(x≤1) = 1/6
2   P(x≤2) = 2/6
3   P(x≤3) = 3/6
4   P(x≤4) = 4/6
5   P(x≤5) = 5/6
6   P(x≤6) = 6/6
Examples
1. What’s the probability that you roll a 3 or less?
P(x≤3) = 1/2
2. What’s the probability that you roll a 5 or higher?
P(x≥5) = 1 – P(x≤4) = 1 – 2/3 = 1/3
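The die's pmf and CDF can be checked with exact fractions from Python's `fractions` module. This is a minimal sketch that assumes a fair die. The CDF is built by adding the pmf over all faces up to A, and the two example answers fall out directly:

```python
from fractions import Fraction

# pmf of a fair six-sided die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def cdf(a):
    """P(X <= a): add up the pmf over all faces not exceeding a."""
    return sum((p for x, p in pmf.items() if x <= a), Fraction(0))


print(cdf(3))      # 1/2
print(1 - cdf(4))  # 1/3, i.e. P(X >= 5)
```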
Practice Problem
Which of the following are probability functions?

a. f(x)=.25 for x=9,10,11,12

b. f(x)= (3-x)/2 for x=1,2,3,4

c. f(x)= (x²+x+1)/25 for x=0,1,2,3


Answer (a)
a. f(x)=.25 for x=9,10,11,12

x     f(x)
9     .25
10    .25
11    .25
12    .25
Sum   1.0

Yes, this is a probability function!
Answer (b)
b. f(x)= (3-x)/2 for x=1,2,3,4

x   f(x)
1   (3-1)/2 = 1.0
2   (3-2)/2 = .5
3   (3-3)/2 = 0
4   (3-4)/2 = -.5

Though this sums to 1, you can’t have a negative probability; therefore, it’s not a probability function.
Answer (c)
c. f(x)= (x²+x+1)/25 for x=0,1,2,3

x     f(x)
0     1/25
1     3/25
2     7/25
3     13/25
Sum   24/25

Doesn’t sum to 1. Thus, it’s not a probability function.
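The two conditions used in these answers can be written as a small checker: every value must lie between 0 and 1, and the values must sum to exactly 1. This is a sketch. The helper name `is_probability_function` and the names `f_a`, `f_b`, `f_c` are our own. Exact fractions are used so the "sums to 1" test is not fooled by rounding.

```python
from fractions import Fraction


def is_probability_function(f, xs):
    """True if every f(x) is between 0 and 1 and the values sum to exactly 1."""
    values = [f(x) for x in xs]
    return all(0 <= v <= 1 for v in values) and sum(values) == 1


# The three candidates from the practice problem, written with exact fractions
def f_a(x):
    return Fraction(1, 4)


def f_b(x):
    return Fraction(3 - x, 2)


def f_c(x):
    return Fraction(x**2 + x + 1, 25)


print(is_probability_function(f_a, [9, 10, 11, 12]))  # True
print(is_probability_function(f_b, [1, 2, 3, 4]))     # False: f(4) = -1/2
print(is_probability_function(f_c, [0, 1, 2, 3]))     # False: sum is 24/25
```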
Practice Problem:
• The number of ships to arrive at a harbor on any given
day is a random variable represented by x. The
probability distribution for x is:

x      10   11   12   13   14
P(x)   .4   .2   .2   .1   .1

Find the probability that on a given day:
a. Exactly 14 ships arrive: p(x=14) = .1
b. At least 12 ships arrive: p(x≥12) = (.2 + .1 + .1) = .4
c. At most 11 ships arrive: p(x≤11) = (.4 + .2) = .6
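Each answer is a sum of pmf values over the qualifying x. A short sketch with exact fractions reproduces all three:

```python
from fractions import Fraction as F

# Ship-arrival distribution from the practice problem
pmf = {10: F(4, 10), 11: F(2, 10), 12: F(2, 10), 13: F(1, 10), 14: F(1, 10)}

exactly_14 = pmf[14]
at_least_12 = sum(p for x, p in pmf.items() if x >= 12)
at_most_11 = sum(p for x, p in pmf.items() if x <= 11)

print(exactly_14, at_least_12, at_most_11)  # 1/10 2/5 3/5
```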
Practice Problem:
You are lecturing to a group of 1000 students. You ask
them to each randomly pick an integer between 1 and
10. Assuming their picks are truly random:
• What’s your best guess for how many students picked
the number 9?
Since p(x=9) = 1/10, we’d expect about 1/10th of the 1000 students to
pick 9, i.e. about 100 students.
• What percentage of the students would you expect
picked a number less than or equal to 6?
Since p(x≤6) = 1/10 + 1/10 + 1/10 + 1/10 + 1/10 + 1/10 = .6, we’d expect about 60%.
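The expected count is simply n × p. The sketch below computes both answers exactly and then runs a quick simulation as a sanity check. The simulation, the seed 42, and the use of `random.Random` are our own additions and not part of the original problem. Simulated counts will be close to, but usually not exactly, 100 and 0.6.

```python
import random
from fractions import Fraction

N_STUDENTS = 1000
p_each = Fraction(1, 10)            # each integer 1..10 is equally likely

expected_nines = N_STUDENTS * p_each        # about 100 students
share_at_most_6 = 6 * p_each                # p(x <= 6) = 6/10

# Optional simulation check; the seed 42 is arbitrary.
rng = random.Random(42)
picks = [rng.randint(1, 10) for _ in range(N_STUDENTS)]
simulated_nines = picks.count(9)
simulated_share = sum(1 for x in picks if x <= 6) / N_STUDENTS

print(expected_nines, float(share_at_most_6))  # 100 0.6
print(simulated_nines, simulated_share)
```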
Important discrete distributions
in epidemiology…
• Binomial
●Yes/no outcomes (dead/alive, treated/untreated,
smoker/non-smoker, sick/well, etc.)
• Poisson
●Counts (e.g., how many cases of disease in a
given area)
Continuous case
▪ The probability function that accompanies a
continuous random variable is a continuous
mathematical function that integrates to 1.
▪ The probabilities associated with continuous
functions are just areas under the curve (integrals!).
▪ Probabilities are given for a range of values, rather
than a particular value (e.g., the probability of getting
a math SAT score between 700 and 800 is 2%).
Continuous case
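▪ For example, recall the negative exponential
function (in probability, this is called an
“exponential distribution”):
$p(x) = e^{-x}$, for $x \ge 0$
▪ This function integrates to 1:
$\int_0^\infty e^{-x}\,dx = \left[-e^{-x}\right]_0^\infty = 0 - (-1) = 1$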
Continuous case: “probability density function” (pdf)
$p(x) = e^{-x}$
The probability that x is any exact particular value (such as 1.9976) is 0;
we can only assign probabilities to possible ranges of x.
For example, the probability of x falling within 1 to 2:
$P(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = e^{-1} - e^{-2} \approx 0.2325$
Cumulative distribution function
As in the discrete case, we can specify the “cumulative
distribution function” (CDF):

The CDF here = $P(x \le A) = \int_0^A e^{-x}\,dx = 1 - e^{-A}$, for $A \ge 0$
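To see that "probability = area under the curve" for the exponential density, we can approximate the integrals numerically and compare them with the closed-form CDF 1 − e^(−A). This is a sketch. The midpoint rule and the cut-off at x = 50 (where the remaining tail is about e^(−50)) are our own choices. Both the numerical integral over [1, 2] and cdf(2) − cdf(1) come out to roughly 0.2325.

```python
import math


def pdf(x):
    """Exponential density p(x) = e^(-x) for x >= 0 (0 otherwise)."""
    return math.exp(-x) if x >= 0 else 0.0


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def cdf(a):
    """Closed form P(x <= a) = 1 - e^(-a) for a >= 0."""
    return 1 - math.exp(-a) if a >= 0 else 0.0


total = integrate(pdf, 0, 50)      # area under the whole curve, close to 1
p_1_to_2 = integrate(pdf, 1, 2)    # area between x = 1 and x = 2
print(round(total, 6), round(p_1_to_2, 4), round(cdf(2) - cdf(1), 4))
```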


Example
(Figure: the density p(x) with the point x = 2 marked on the horizontal axis.)
Example 2: Uniform distribution
The uniform distribution: all values are equally likely.
$f(x) = 1$, for $0 \le x \le 1$
(Figure: p(x) is a flat line at height 1 over the interval from 0 to 1.)
We can see it’s a probability distribution because it integrates
to 1 (the area under the curve is 1):
$\int_0^1 1\,dx = 1$
Example: Uniform distribution
What’s the probability that x is between ¼ and ½?
(Figure: the uniform density p(x) with x = ¼ and x = ½ marked on the horizontal axis.)
P(¼ ≤ x ≤ ½) = ½ - ¼ = ¼
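For a uniform variable, the probability of an interval is its length divided by the length of the whole range. A minimal sketch follows. The function name `uniform_prob` and its default range [0, 1] are our own.

```python
from fractions import Fraction


def uniform_prob(lo, hi, a=0, b=1):
    """P(lo <= X <= hi) for X uniform on [a, b]: overlap length / (b - a)."""
    left, right = max(lo, a), min(hi, b)
    return Fraction(max(right - left, 0)) / (b - a)


print(uniform_prob(Fraction(1, 4), Fraction(1, 2)))  # 1/4
print(uniform_prob(0, 1))                            # 1 (whole range)
```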
Probability Distribution Table/
Probability Distribution Function
Two coins: the four equally likely outcomes are HH, HT, TH, TT.

No. of Heads   Probability
0              1/4
1              1/2
2              1/4
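The table can be derived by listing every outcome and counting heads. This sketch uses `itertools.product` and `collections.Counter`, and it assumes two fair coins so that all four outcomes are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two fair coins: HH, HT, TH, TT.
outcomes = ["".join(t) for t in product("HT", repeat=2)]

# Count heads in each outcome, then divide by the number of outcomes.
counts = Counter(o.count("H") for o in outcomes)
distribution = {k: Fraction(v, len(outcomes)) for k, v in sorted(counts.items())}

print(outcomes)      # ['HH', 'HT', 'TH', 'TT']
print(distribution)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```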
Types of Probability
Distribution
• Discrete Probability Distribution
● Can take only a limited number of values, which
can be listed
● Example: probability that you are born in a given
month (only 12 possible outcomes)
• Continuous Probability Distribution
● The variable under consideration can take any
value within a given range; we cannot list all the
possible values
● Continuous distributions are convenient ways to
represent discrete distributions that have many
possible outcomes, all very close to each other
Binomial Distribution
• One widely used probability distribution of a discrete
random variable
• Binomial distribution describes discrete, not
continuous, data, resulting from an experiment
known as a Bernoulli process
●Tossing a coin a fixed number of times is a
Bernoulli process
Bernoulli Process
• Example: Toss a fair coin a fixed number of times
1. Each trial has only two possible outcomes: Tail or Head,
Yes or No, Success or Failure- Dichotomy
2. The probability of the outcome of any trial remains fixed
over time- Stability
3. The trials are statistically independent- Independence
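Binomial Formula
$P(r) = {}^{n}C_{r}\, p^{r} q^{n-r} = \frac{n!}{r!\,(n-r)!}\, p^{r} q^{n-r}$
where n = number of trials, r = number of successes, p = probability of success, and q = 1 - p.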
Example
• There are a total of 12 bulbs. The probability of a
faulty bulb is 0.35. Find the probability of 4 faulty
bulbs.
P(4) = 12C4 (0.35)^4 (0.65)^8 = 0.2367
Probability Distribution Function
Faulty bulbs   Probability
0              0.0057
1              0.0368
2              0.1088
:              :
12
P(<3 Faulty Bulbs) = P(0) + P(1) + P(2)
= 0.1513
P(≥3 Faulty Bulbs) = 1 - P(<3)
= 1 - 0.1513 = 0.8487
• In binomial distribution
Expected Value = np
Variance = npq
Standard Deviation = √(npq)
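The binomial formula, the bulb example, and the mean and variance rules can be checked together in one short sketch. It uses `math.comb` (Python 3.8+) for nCr. The helper name `binom_pmf` is our own.

```python
from math import comb, sqrt


def binom_pmf(r, n, p):
    """P(r) = nCr * p^r * q^(n-r), with q = 1 - p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 12, 0.35                       # 12 bulbs, each faulty with probability 0.35
p4 = binom_pmf(4, n, p)               # about 0.2367
p_less_3 = sum(binom_pmf(r, n, p) for r in range(3))   # about 0.1513
p_at_least_3 = 1 - p_less_3                             # about 0.8487

mean = n * p                          # np
variance = n * p * (1 - p)            # npq
print(round(p4, 4), round(p_less_3, 4), round(p_at_least_3, 4))
print(round(mean, 4), round(variance, 4), round(sqrt(variance), 4))
```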


a) A coin is tossed 10 times. Find the probability of
getting 4 heads.
P(4) = ?

b) Each of 5 employees has a 0.4 probability of arriving
late to the office, and each employee arrives
independently. Draw the binomial distribution of the
probabilities that 0, 1, 2, 3, 4 or 5 workers are late.
What is the probability that on a particular day not more
than two workers are late?
What is the number of workers expected to be
late on a given day?
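Solution
a) P(4 heads) = 10C4 (0.5)^4 (0.5)^6
= 210/1024 = 0.2051
b) P(x) = nCr p^r q^(n-r), with n = 5, p = 0.4, q = 0.6
P(x=0) = 5C0 × 0.4^0 × 0.6^5 = 0.07776
P(x=1) = 5C1 × 0.4^1 × 0.6^4 = 0.2592
P(x=2) = 5C2 × 0.4^2 × 0.6^3 = 0.3456
P(x=3) = 5C3 × 0.4^3 × 0.6^2 = 0.2304
P(x=4) = 5C4 × 0.4^4 × 0.6^1 = 0.0768
P(x=5) = 5C5 × 0.4^5 × 0.6^0 = 0.01024
P(not more than 2 late) = P(0) + P(1) + P(2) = 0.07776 + 0.2592 + 0.3456 = 0.68256
Expected number late = np = 5 × 0.4 = 2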
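Both parts can be reproduced with the binomial formula. This is a sketch. For part (b) it assumes n = 5 employees, which is consistent with the 0 to 5 table in the solution.

```python
from math import comb


def binom_pmf(r, n, p):
    """Binomial probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# (a) 4 heads in 10 tosses of a fair coin: 10C4 / 2^10 = 210/1024
p_4_heads = binom_pmf(4, 10, 0.5)

# (b) 5 employees, each late with probability 0.4
table = {r: binom_pmf(r, 5, 0.4) for r in range(6)}
p_not_more_than_2 = sum(table[r] for r in range(3))
expected_late = 5 * 0.4              # np

print(round(p_4_heads, 4))
for r, prob in table.items():
    print(r, round(prob, 5))
print(round(p_not_more_than_2, 5), expected_late)
```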
Questions: Homework
1. In each of 4 races, the Democrats have a 60% chance of
winning. Assuming that the races are independent of each
other, what is the probability that:
a. the Democrats will win 0, 1, 2, 3, or all 4 races
b. the Democrats will win at least 1 race
c. the Democrats will win a majority of the races
2. In a family of 11 children, what is the probability that there
will be more boys than girls?
3. The average percentage of failure in a certain examination
is 40. What is the probability that out of a group of 6
candidates at least 4 passed in the examination?
Solution
1. a) P(x) = nCr p^r q^(n-r), with n = 4, p = 0.6, q = 0.4
P(x=0) = 4C0 × 0.6^0 × 0.4^4 = 0.0256
P(x=1) = 4C1 × 0.6^1 × 0.4^3 = 0.1536
P(x=2) = 4C2 × 0.6^2 × 0.4^2 = 0.3456
P(x=3) = 4C3 × 0.6^3 × 0.4^1 = 0.3456
P(x=4) = 4C4 × 0.6^4 × 0.4^0 = 0.1296
Solution
1. b) P(at least 1) = P(x≥1)
= 1 - P(x=0)
= 1 - 0.0256
= 0.9744
c) P( Democrats will win a majority) = P(x≥3)
= P(3) + P(4)
= 0.3456 + 0.1296
= 0.4752
Solution
2. n = 11
p = 0.5
P(more boys than girls)
= P(6) + P(7) + P(8) + P(9) + P(10) + P(11)
= 0.2256 + 0.1611 + 0.0806 + 0.0269 + 0.0054 + 0.0005
= 0.50
(With 11 children a tie is impossible, so by symmetry the answer is exactly 1/2.)
3. n=6
q = 0.4 and p = 0.6
P(x ≥ 4 )
= P(4) + P(5) + P(6)
= 0.3110 + 0.1866 + 0.0467
= 0.5443
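All three homework answers follow from summing binomial probabilities over a range of r. A sketch follows. The helper names `binom_pmf` and `binom_between` are our own. Question 2 assumes boys and girls are equally likely, as the solution does.

```python
from math import comb


def binom_pmf(r, n, p):
    """P(exactly r successes in n independent trials with success probability p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def binom_between(lo, hi, n, p):
    """P(lo <= X <= hi) for a binomial X with parameters n and p."""
    return sum(binom_pmf(r, n, p) for r in range(lo, hi + 1))


# Q1: 4 independent races, the Democrats win each with probability 0.6
q1_table = [round(binom_pmf(r, 4, 0.6), 4) for r in range(5)]
q1_at_least_one = binom_between(1, 4, 4, 0.6)    # about 0.9744
q1_majority = binom_between(3, 4, 4, 0.6)        # about 0.4752

# Q2: 11 children, boy or girl equally likely; "more boys" means 6 to 11 boys
q2_more_boys = binom_between(6, 11, 11, 0.5)     # 0.5

# Q3: failure rate 40%, so pass probability 0.6; at least 4 of 6 pass
q3_at_least_four = binom_between(4, 6, 6, 0.6)   # about 0.5443

print(q1_table)
print(round(q1_at_least_one, 4), round(q1_majority, 4))
print(round(q2_more_boys, 4), round(q3_at_least_four, 4))
```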
Poisson Distribution
• Another discrete probability distribution
• Named after Siméon Denis Poisson, a French
mathematician
• It is used to describe a number of processes, like the distribution
of telephone calls going through a switchboard system, the
demand for patient services, and the arrival of trucks and cars at a
tollbooth
• Poisson distribution deals with the number of occurrences in
a fixed period of time
Why did Poisson invent Poisson Distribution?
• To predict the # of events occurring in the future!
• More formally, to predict the probability of a given
number of events occurring in a fixed interval of
time.

Example
An event could be a customer purchasing something from you (the
moment of truth, not just browsing). It could also be how
many visitors you get on your website a day, how
many clicks your ads get over the next month, how
many phone calls you get during your shift, or even
how many people will die from a fatal disease next
year, etc.
Insta Likes!

On a commercial account, on average 17 people like my post every week.
I’d like to predict the # of people who will like my post next week,
because I get paid weekly by those numbers.
What is the probability that exactly 20 people (or 10, 30,
50, etc.) will like my post next week?
One way to solve this would be to start with the number of visitors. Each person who
visits has some probability that they will really like it.

This is a classic job for the binomial distribution, since we are calculating the
probability of the number of successful events (likes).

A binomial random variable is the number of successes x in n repeated trials, and we
assume the probability of success p is constant over each trial.

However, here we are given only one piece of information: 17 people/week, which is a
“rate” (the average # of successes per week, or the expected value of x). We don’t
know anything about the probability p, nor the number of visitors n.

Therefore, we need a little more information to tackle this problem. What more do
we need to frame this probability as a binomial problem? We need two things: the
probability of success (likes) p and the number of trials (visitors) n.
The shortcomings of the Binomial
Distribution
1. Each binomial trial is “BI-nary”: 0 or 1, so at most one
success can occur per trial.
2. In the Binomial distribution, the # of trials (n)
must be known beforehand.
Characteristics of Processes that Produce
Poisson Probability Distribution
Taking an example of the number of vehicles
passing through a single toll booth at a rush hour
1. The average number of vehicles that arrive per
rush hour can be estimated from the past
traffic data
2. If we divide the rush hour into periods of one
second each, we can see that
a) The probability that exactly one vehicle will
arrive at the single tollbooth per second is a very
small number and is constant for every one
second interval
b) The probability that two or more vehicles will
arrive within a one-second interval is so small
that we can assign it a zero value
c) The number of vehicles that arrive in a given
one second interval is independent of the
time at which that one second interval occurs
during the rush hour
d) The number of arrivals in any one-second
interval is not dependent on the number of
arrivals in any other one-second interval
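Poisson Formula
$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$, for x = 0, 1, 2, ..., where λ is the mean number of occurrences per interval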
• Past police records indicate a mean of five
accidents per month at a particular intersection.
Calculate the probability of
a) No accidents
b) Exactly one accident
c) Exactly two accidents
d) Exactly three accidents
e) 0, 1, or 2 accidents
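The slides leave this exercise unanswered. Here is a minimal sketch of the Poisson formula with λ = 5 that fills in the numbers. The helper name `poisson_pmf` is our own. With these inputs it gives approximately P(0) = 0.0067, P(1) = 0.0337, P(2) = 0.0842, P(3) = 0.1404, and P(0, 1 or 2) = 0.1247.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)


lam = 5  # mean accidents per month at the intersection
probs = {x: poisson_pmf(x, lam) for x in range(4)}
p_0_to_2 = sum(probs[x] for x in range(3))

for x, p in probs.items():
    print(f"P({x}) = {p:.4f}")
print(f"P(0, 1 or 2) = {p_0_to_2:.4f}")
```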

Example
A certain manufactured product has an average of 2 defects per unit
inspected. Calculate the probabilities of finding a product
1. without any defect,
2. with 3 defects, and
3. with 4 defects.
Solution
P(x) = (e^(-λ) λ^x)/x!
λ = 2 per unit
x = 0, 3, 4
P(0) = (e^(-2) 2^0)/0! = 0.1353
P(3) = (e^(-2) 2^3)/3! = 0.1804
P(4) = (e^(-2) 2^4)/4! = 0.0902
Even though the Poisson distribution models rare
events, the rate λ can be any number; it doesn’t
always have to be small.
The Poisson distribution is asymmetric: it is always
skewed toward the right, because it is bounded by the
zero-occurrence barrier on the left (there is no such thing as
“minus one” like) and is unlimited on the other side.
As λ becomes bigger, the graph looks more like a
normal distribution.
The Poisson distribution is applied when the number of
opportunities for an event is large but the probability of its
occurrence on each is quite low; for example, the number of
insurance claims per day at an insurance company.
Example
In a cafe, customers arrive at a mean rate of 2 per minute. Find
the probability of 5 customers arriving in 1 minute using the
Poisson distribution formula.
Solution
Given: λ = 2 and x = 5.
Using the Poisson distribution formula:
P(X = x) = (e^(-λ) λ^x)/x!
P(X = 5) = (e^(-2) 2^5)/5! = (0.1353 × 32)/120 ≈ 0.0361
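The same Poisson formula covers both the cafe and the defects examples. This is a sketch that recomputes them. The helper name `poisson_pmf` is our own.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)


# Cafe: lambda = 2 arrivals per minute, exactly 5 arrivals
p_cafe = poisson_pmf(5, 2)                               # about 0.0361

# Defects: lambda = 2 defects per unit, for 0, 3 and 4 defects
p_defects = {x: poisson_pmf(x, 2) for x in (0, 3, 4)}    # about 0.1353, 0.1804, 0.0902

print(round(p_cafe, 4), {x: round(p, 4) for x, p in p_defects.items()})
```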
BASIS FOR COMPARISON | BINOMIAL DISTRIBUTION | POISSON DISTRIBUTION
Meaning | Gives the probability of the number of successes in a fixed number of repeated trials. | Gives the count of independent events occurring randomly within a given period of time.
Nature | Biparametric | Uniparametric
Number of trials | Fixed | Infinite
Success | Constant probability | Infinitesimal chance of success
Outcomes | Only two possible outcomes, i.e. success or failure. | Unlimited number of possible outcomes.
Mean and Variance | Mean > Variance | Mean = Variance
Example | Coin tossing experiment. | Printing mistakes/page of a large book.

Key Differences Between Binomial and Poisson
Distribution
● The binomial distribution is one in which the probability of the number of
successes in a fixed number of repeated trials is studied. A probability
distribution that gives the count of independent events occurring randomly
within a given period is called the Poisson distribution.
● Binomial distribution is biparametric, i.e. it is characterised by two parameters
n and p, whereas Poisson distribution is uniparametric, i.e. characterised
by a single parameter m (the mean, written λ above).
● There are a fixed number of attempts in the binomial distribution. On the
other hand, there are an unlimited number of trials in a Poisson
distribution.
● The success probability is constant in binomial distribution, but in Poisson
distribution the chance of success in any single trial is extremely small.
• In a binomial distribution, there are only two possible
outcomes, i.e. success or failure. Conversely, there are
an unlimited number of possible outcomes in the case of
Poisson distribution.
• In binomial distribution Mean > Variance, while in
Poisson distribution Mean = Variance.
The binomial distribution can be approximated by
the Poisson distribution if the number of attempts
(n) tends to infinity and the success probability (p)
tends to 0 so that m = np stays fixed.
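To illustrate the approximation, we can compare binomial probabilities for a large n and small p against Poisson probabilities with m = np. The values n = 1000 and p = 0.002 (so m = 2) are our own illustrative choice, not from the slides. The sketch also shows the mean and variance contrast: the binomial variance npq sits just below its mean, while the Poisson mean and variance are equal.

```python
from math import comb, exp, factorial


def binom_pmf(r, n, p):
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def poisson_pmf(x, m):
    return m**x * exp(-m) / factorial(x)


# Illustrative values (our own choice): many trials, small p, m = np = 2
n, p = 1000, 0.002
m = n * p

for k in range(6):
    print(k, round(binom_pmf(k, n, p), 5), round(poisson_pmf(k, m), 5))

# Binomial: variance npq is slightly below the mean np; Poisson: both equal m
print("binomial mean/variance:", n * p, n * p * (1 - p))
print("poisson  mean/variance:", m, m)
```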
Normal Distribution
• Continuous Probability Distribution
• Also called the Gaussian Distribution
• Two reasons why it is prominent:
●Applicable to a great many situations in which it is
necessary to make inferences by taking samples
●Normal distribution comes close to fitting the
actual observed frequency distribution of many
phenomena including human characteristics,
outputs from physical processes etc.
Characteristics of Normal
Probability Distribution
1. The curve has a single peak; it is unimodal
2. It has the bell shape
3. The mean of a normally distributed population lies
at the center of its normal curve
4. It is symmetrical in nature
5. The median and the mode are also at the center i.e.,
mean, median and mode are the same value
6. The two tails of the normal probability distribution
extend indefinitely and never touch the
horizontal axis
• To define a particular normal probability
distribution only two parameters are needed: the
mean (μ) and the standard deviation (σ)
Areas under the Normal Curve
• The total area under the normal curve is 1
1. Approximately 68% of all the values in a
normally distributed population lie within ± 1
standard deviation from the mean.
2. Approximately 95.5% of all the values in a
normally distributed population lie within ± 2
standard deviations from the mean.
3. Approximately 99.7% of all the values in a
normally distributed population lie within ± 3
standard deviations from the mean.
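These three percentages can be recovered from the error function. For any normal variable, P(μ − kσ ≤ X ≤ μ + kσ) = erf(k/√2). This sketch uses only `math.erf` from the standard library and prints roughly 0.6827, 0.9545 and 0.9973.

```python
from math import erf, sqrt


def within_k_sigma(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X: erf(k / sqrt(2))."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k}σ: {within_k_sigma(k):.4f}")
```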
Measuring Area under Normal Curve
• Statistical Tables: standard normal probability
distribution (mean = 0 and standard
deviation = 1)
• 1σ, 2σ and 3σ rule to find area
To Calculate the Area under the Normal Curve

• A study of past participants in a training program
indicates that the mean length of time spent on
the program is 500 hours and that it is normally
distributed with a standard deviation of 100
hours. What is the probability that a participant
selected at random will require more than 500
hours to complete the program?
• 500 hours is the mean itself (z = (500 - 500)/100 = 0),
and half of the area under the curve lies on
each side of the mean.
Hence the probability will be 0.5
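The same reasoning can be written as a normal CDF computed from the standardized value z = (x − μ)/σ via `math.erf`, instead of a printed table. This is a sketch. The extra 600-hour query is our own illustration, not part of the original example. It corresponds to z = 1 and gives about 0.1587.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma), via the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, sigma = 500, 100   # hours, from the training-program example
p_more_than_500 = 1 - normal_cdf(500, mu, sigma)   # z = 0, so 0.5
p_more_than_600 = 1 - normal_cdf(600, mu, sigma)   # illustrative extra: about 0.1587
print(p_more_than_500, round(p_more_than_600, 4))
```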
