Discrete Random Variables
MT
Introduction
• A discrete random variable takes values from a finite
or countably infinite set.
• Examples…
Example 01
• There is a chance that the Senescyt building will have
normal energy service after the water level in the
hydroelectric system recovers to normal levels.
• Let X be the discrete random variable that gives the
number of hours without power after two weeks.
• P(X=0) = 0.6561
• P(X=1) = 0.2916
• P(X=2) = 0.0486
• P(X=3) = 0.0036
• P(X=4) = 0.0001
Probability Mass Function
• The probability distribution of a random variable X
describes the probabilities associated with the possible
values of X.
• For discrete random variables the distribution is
specified as a list of possible values along with the
probability of each.
Probability Mass Function
• Because $f(x_i) = P(X = x_i)$ is defined as a probability,
$f(x_i) \ge 0$. Also, the probabilities over all possible
values sum to 1: $\sum_i f(x_i) = 1$.
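The two conditions on a probability mass function can be checked mechanically. The following is a minimal Python sketch, using the probabilities from Example 01; the function name `is_valid_pmf` and the tolerance are illustrative choices, not part of the lecture.

```python
import math

# PMF of X from Example 01: value -> probability
pmf = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}

def is_valid_pmf(pmf, tol=1e-9):
    """Return True if every probability is non-negative and they sum to 1."""
    non_negative = all(p >= 0 for p in pmf.values())
    sums_to_one = math.isclose(math.fsum(pmf.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one

print(is_valid_pmf(pmf))
```

A table that fails either condition (for instance, probabilities that add up to 0.9) does not describe a probability distribution.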
In R…
Cumulative Distribution Function
• The cumulative distribution function (CDF),
$F(x) = P(X \le x)$, is the sum of the probabilities of all
values less than or equal to x.
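Because the CDF is a running sum of the PMF, it can be built with a cumulative sum. This is a small Python sketch on the Example 01 probabilities; the helper name `cdf_table` is hypothetical.

```python
from itertools import accumulate

# PMF of X from Example 01: value -> probability
pmf = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}

def cdf_table(pmf):
    """Map each possible value x to F(x) = P(X <= x)."""
    xs = sorted(pmf)                             # values in increasing order
    cumulative = accumulate(pmf[x] for x in xs)  # running sum of probabilities
    return dict(zip(xs, cumulative))

cdf = cdf_table(pmf)
for x, F in cdf.items():
    print(f"F({x}) = P(X <= {x}) = {F:.4f}")
```

The last entry of the table is always 1 (up to rounding), since it adds up every probability in the PMF.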
Example 1 (by hand)…
Example 1 (in R)
Mean and Variance
• The mean and variance of a discrete random variable
are defined as for continuous random variables, with
summation replacing integration.
Mean
In our example…
Mean
• Notice that X need not ever take the value of the mean.
• The mean indicates the center of mass of the
distribution.
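As an illustration, here is a minimal Python sketch that computes the mean of X for the Example 01 probabilities, assuming the usual definition $\mu = E(X) = \sum_x x\, f(x)$. The function name `mean` is an illustrative choice.

```python
# PMF of X from Example 01: value -> probability
pmf = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}

def mean(pmf):
    """Probability-weighted average of the possible values."""
    return sum(x * p for x, p in pmf.items())

mu = mean(pmf)
print(f"mean = {mu:.4f}")
print("mean is a possible value of X:", mu in pmf)
```

The mean comes out at about 0.4 hours, which is not one of the values 0, 1, 2, 3, 4 that X can take.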
Variance
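The variance measures how spread out the values are around the mean. The sketch below assumes the usual definition $\sigma^2 = V(X) = \sum_x (x - \mu)^2 f(x)$ and reuses the Example 01 probabilities; the function name `variance` is an illustrative choice.

```python
from math import sqrt

# PMF of X from Example 01: value -> probability
pmf = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}

def variance(pmf):
    """Probability-weighted average of squared deviations from the mean."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

var = variance(pmf)
print(f"variance = {var:.4f}")
print(f"standard deviation = {sqrt(var):.4f}")
```

The standard deviation, the square root of the variance, is in the same units as X (hours in this example).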
In R…
Important discrete distributions
• As with continuous random variables, discrete
random variables can be described by specific
distributions.
• Let's talk about some of those most frequently
encountered in experiments.
Binomial distribution
• Consider the following examples:
1) Flipping a coin 10 times.
X = the number of heads obtained
2) In Innopolis, students produce sausages. 2% of this
production is out-of-specification in terms of texture.
X = number of defective sausages in the next 25
produced.
3) Water quality samples contain high levels of organic
solids in 10% of the tests.
X = number of samples high in organic solids in the next
18 tested.
Binomial distribution
• Each example can be thought of as a series of
repeated random trials: 10 flips of a coin, 25 sausages
produced in Innopolis, 18 water samples.
• The random variable in each case is a count of the
number of trials that meet a specific criterion: heads,
out-of-specification texture, high organic solids.
Binomial distribution
• The outcome of each trial either meets the criterion or
does not.
• We label each outcome a success or a failure according
to whether it meets the criterion.
• Notice that “success” and “failure” are only labels.
• A trial with only two possible outcomes
(success/failure, 0/1, A/B) is called a Bernoulli trial.
• The trials are assumed to be independent of one
another.
• The probability of success is assumed to be the same
on every trial.
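These assumptions can be made concrete with a short simulation. The following Python sketch models the coin-flip example as 10 independent Bernoulli trials with a constant success probability of 0.5 (assuming a fair coin); the function names and the seed are illustrative choices.

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

def count_successes(n, p, rng):
    """Run n independent trials with the same p and count the successes."""
    return sum(bernoulli_trial(p, rng) for _ in range(n))

rng = random.Random(42)  # fixed seed so the run is reproducible
heads = count_successes(n=10, p=0.5, rng=rng)
print("number of heads in 10 flips:", heads)
```

Each call to `rng.random()` is drawn independently of the previous ones, and `p` never changes between trials, which mirrors the two assumptions listed above.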
Example 02
• Let’s assume that the probability of producing defective
sausages in Innopolis is 0.1. Let’s also assume that
sausages are manufactured in four working stations that
work independently from one another. Therefore, we
can assume that the events (finding a defective
sausage) are also independent.
• Let’s denote X = the number of defective sausages
manufactured in Innopolis from the four working
stations.
Binomial distribution
• Let’s find a general expression for this probability.
• First, let’s define an expression for the number of
outcomes that contain x successes in n trials:
$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$
where
n is the number of trials
x is the number of successes
Binomial distribution
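Combining the count of outcomes with the probability of any one such outcome gives the standard binomial PMF, $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, \dots, n$. The sketch below evaluates it in Python for the four working stations of Example 02 (n = 4, p = 0.1); the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.1  # Example 02: four stations, each defective with probability 0.1
for x in range(n + 1):
    print(f"P(X={x}) = {binomial_pmf(x, n, p):.4f}")
```

The five probabilities printed here coincide with the values listed in Example 01 (0.6561, 0.2916, 0.0486, 0.0036, 0.0001).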
Example 03
• Each sample of water has a 10% chance of containing
high levels of organic solids. Assume that the samples
are independent with regard to the presence of solids.
Determine the probability that in the next 18 samples,
exactly 2 contain high solids levels.
By hand…
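As a cross-check on the hand calculation, the same probability can be evaluated numerically. This Python sketch applies the standard binomial PMF with n = 18, p = 0.1 and x = 2; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Example 03: 18 independent samples, each with a 10% chance of high solids
prob = binomial_pmf(2, n=18, p=0.1)
print(f"ways to choose 2 of 18 samples: {comb(18, 2)}")
print(f"P(X = 2) = {prob:.4f}")
```

There are 153 ways to pick which 2 of the 18 samples are high in solids, and the resulting probability is roughly 0.28.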
Mean and variance
Example 04
• For the Innopolis example, find the mean and variance.
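For a binomial random variable the standard results are $\mu = np$ and $\sigma^2 = np(1-p)$. The Python sketch below applies them under the assumption that "the Innopolis example" refers to Example 02 (n = 4, p = 0.1); for a different n or p only the last lines change. The function names are illustrative.

```python
def binomial_mean(n, p):
    """Mean of a binomial random variable: n * p."""
    return n * p

def binomial_variance(n, p):
    """Variance of a binomial random variable: n * p * (1 - p)."""
    return n * p * (1 - p)

n, p = 4, 0.1  # assumption: the values from Example 02
print(f"mean = {binomial_mean(n, p):.2f}")
print(f"variance = {binomial_variance(n, p):.2f}")
```

Under this assumption the mean is 0.4 defective sausages and the variance is 0.36.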
Normal Approximation to the
Binomial Distribution
• Remember: a binomial random variable is the total
count of successes in repeated Bernoulli trials.
• Notice that for a very large number of trials, the
calculation of probabilities (by hand) becomes
cumbersome.
• If the number of trials is large enough, we can
approximate the binomial distribution by a normal
distribution.
• This approximation is based on the central limit
theorem (more on this later).
Normal Approximation to the
Binomial Distribution
• Look at assignment 05
Example 05
• Recall our Innopolis sausages example. Let's expand it to the
sausages that a regular industrial manufacturer produces.
• It is important for this company to know whether microbiological
testing shows the product to be within safety specifications.
• The quality control department knows that the probability that
any one sausage is defective is 0.01.
• The quality department decides to examine the entire production of
one week (n = 10,000).
• Let's assume that the tests are independent of one another.
• Then each quality test is a Bernoulli trial, and the consecutive
independent testing of the sausages can be modeled by a binomial
distribution.
Example 05
• In this situation, what is the probability that more than
110 sausages are out of safety specifications?
• Whiteboard…
Normal Approximation to the
Binomial Distribution
• Recall that:
Normal Approximation to the
Binomial Distribution
• And recall that to solve problems with normal
distributions we standardize to Z scores.
Normal Approximation to the
Binomial Distribution
• Then, to solve problems that involve a binomial distribution where
the number of trials is too large for hand calculation, we can
approximate it by a normal distribution with the same mean and
variance, standardize, and use the standard normal distribution.
• Whiteboard…
Example 05
• n = 10,000, p = 0.01.
• What is the probability that more than 110 sausages are
out of safety specifications?
• Whiteboard…
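One way to check the whiteboard result is the Python sketch below. It approximates X by a normal distribution with $\mu = np$ and $\sigma = \sqrt{np(1-p)}$, standardizes with $Z = (x - \mu)/\sigma$, and reads the tail from the standard normal CDF. Two things here are additions rather than part of the lecture: the continuity-corrected variant (evaluating at 110.5 instead of 110) and the exact binomial tail used for comparison. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 10_000, 0.01
mu = n * p                     # mean of the binomial: 100
sigma = sqrt(n * p * (1 - p))  # standard deviation: about 9.95

def normal_upper_tail(x, mu, sigma):
    """P(Y > x) for Y ~ Normal(mu, sigma), computed through the Z score."""
    z = (x - mu) / sigma
    return 1 - NormalDist().cdf(z)

def binomial_upper_tail(k, n, p):
    """Exact P(X > k) for X ~ Binomial(n, p), using the PMF recurrence."""
    term = (1 - p) ** n  # P(X = 0)
    cdf = term
    for x in range(k):
        # P(X = x + 1) obtained from P(X = x)
        term *= (n - x) / (x + 1) * p / (1 - p)
        cdf += term
    return 1 - cdf

approx = normal_upper_tail(110, mu, sigma)       # plain normal approximation
approx_cc = normal_upper_tail(110.5, mu, sigma)  # with continuity correction (an addition)
exact = binomial_upper_tail(110, n, p)           # exact binomial tail, for comparison

print(f"normal approximation:       {approx:.4f}")
print(f"with continuity correction: {approx_cc:.4f}")
print(f"exact binomial:             {exact:.4f}")
```

The plain approximation gives a probability of about 0.16 that more than 110 sausages are out of specification; the continuity-corrected value lies closer to the exact binomial tail.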