Ex No-4
If the support of the random variable is a finite or countably infinite number of values, then the
random variable is discrete. Discrete random variables have a probability mass function (pmf). This
pmf gives the probability that a random variable will take on each value in its support.
The cumulative distribution function (cdf) provides the probability the random variable is less than or
equal to a particular value. The quantile function is the inverse of the cumulative distribution function,
i.e. you provide a probability and the quantile function returns the value of the random variable such
that the cdf will return that probability. (This is not very useful for discrete distributions.)
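As a sketch of how these three functions relate, here is a quick check in R using a Bin(13, 0.7) distribution (the same one used below); the `d`, `p`, and `q` prefixes give the pmf, cdf, and quantile function respectively:

```r
n <- 13
p <- 0.7
dbinom(6, size = n, prob = p)    # pmf:  P(X = 6)
pbinom(6, size = n, prob = p)    # cdf:  P(X <= 6)
qbinom(0.5, size = n, prob = p)  # quantile: smallest x with P(X <= x) >= 0.5
```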
Binomial distribution
Whenever you repeat an experiment and you assume
1. the probability of success is the same each time, and
2. each trial of the experiment is independent of the rest (given the probability of
success),
then, if you record the number of successes out of the total number of trials, you have a binomial distribution.
For example, consider an experiment with probability of success of 0.7 and 13 trials,
i.e. X∼Bin(13,0.7)
We can use the pmf to calculate the probability of a particular outcome of the experiment. For
example, what is the probability of seeing 6 successes? We can use the dbinom function.
n <- 13
p <- 0.7
dbinom(6, size = n, prob = p)
## [1] 0.0441524
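The cdf answers cumulative questions in the same way. As a sketch, the probability of seeing 6 or fewer successes can be computed with `pbinom`, and it should agree with summing the pmf over 0 through 6:

```r
n <- 13
p <- 0.7
pbinom(6, size = n, prob = p)        # P(X <= 6) via the cdf
sum(dbinom(0:6, size = n, prob = p)) # same probability by summing the pmf
```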
x <- 0:n
plot(x, dbinom(x, size = n, prob = p),
     main = "Probability mass function for Bin(13,0.7)")
plot(x, pbinom(x, size = n, prob = p), type = "s",
     main = "Cumulative distribution function for Bin(13,0.7)")
Poisson distribution
While the binomial distribution has an upper limit (n), we sometimes run an experiment and are
counting successes without any technical upper limit. These experiments are usually run for some
amount of time or over some amount of space or both. For example,
the number of photons observed by a detector in a minute
the number of times an a/c unit comes on in an hour
When there is no technical upper limit (even if the probability of large counts is
extremely small), a Poisson distribution can be used. The Poisson distribution has a
single parameter, the rate, which describes how many events are expected to be observed
on average. We write X∼Po(λ) where λ is the rate parameter.
Suppose we record the number of network failures in a day and on average we see 2
failures per day. The number of network failures in a day has no upper limit, so we’ll use the
Poisson distribution.
rate <- 2
x <- 0:10 # with no upper limit we need to decide on an upper limit
plot(x, dpois(x, lambda = rate),
     main = "Probability mass function for Po(2)")
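As a sketch of the kinds of questions the pmf and cdf answer here, with the rate of 2 failures per day we can ask for the probability of exactly 3 failures, of at most 4, or of more than 4 (where the lack of an upper limit matters):

```r
rate <- 2
dpois(3, lambda = rate)      # P(X = 3)
ppois(4, lambda = rate)      # P(X <= 4)
1 - ppois(4, lambda = rate)  # P(X > 4); no upper limit needed
```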
Continuous distributions
In contrast to discrete random variables, continuous random variables can take on an
uncountably infinite number of values. The easiest way for this to happen is that the random
variable can take on any value between two specified values (either of which may be
infinite), i.e. an interval.
Continuous random variables have a probability density function (pdf) instead of a pmf.
When integrated from a to b, this pdf gives the probability the random variable will take on a
value between a and b. Continuous random variables still have a cdf, quantile function, and
random generator that all still have the same interpretation.
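As a sketch of the pdf-to-probability relationship, we can numerically integrate a density from a to b and check that it matches the corresponding difference of cdf values; the standard normal density `dnorm` is used here purely as a convenient example:

```r
# Integrating a pdf from a to b gives P(a < X < b); the cdf gives the same
# probability as F(b) - F(a). Illustrated with the standard normal.
a <- -1
b <- 1
integrate(dnorm, lower = a, upper = b)$value  # numerical integral of the pdf
pnorm(b) - pnorm(a)                           # same probability via the cdf
```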
Uniform distribution
The simplest continuous random variable is the uniform random variable. If Y is a random variable
and it is uniformly distributed between a and b, then we write Y∼Unif(a,b) and this means
that Y can take on any value between a and b with equal probability.
The probability density function for a uniform random variable is zero outside of a and b (indicating
that the random variable cannot take values below a or above b) and is constant at a value of 1/(b-a)
from a to b.
a <- 0
b <- 1
# The curve function expects a function of `x`; it (internally) creates a
# sequence of values from `from` to `to` and produces plots similar to what
# we had before, but using a line rather than points.
curve(dunif(x, min = a, max = b), from = -1, to = 2,
      xlab = 'y', ylab = 'f(y)',
      main = 'Probability density function for Unif(0,1)')
The cumulative distribution function is the integral from negative infinity up to y of the probability
density function. Thus it indicates the probability the random variable will take on a value less than
y, i.e. P(Y<y). Since the probability density function for the uniform is constant, the integral is
simply zero from negative infinity up to a, then a straight line from (a,0) up to (b,1), and constant
after that.
curve(punif(x, min = a, max = b), from = -1, to = 2,
      xlab = 'y', ylab = 'F(y)',
      main = 'Cumulative distribution function for Unif(0,1)')
The quantile function is the inverse of the cumulative distribution function. For a given probability p, it
finds the value such that the probability the random variable is below that value is p. That is, it finds
the value x in the expression P(X<x)=p.
curve(qunif(x, min = a, max = b), from = 0, to = 1,
xlab='p', ylab='F^{-1}(p)', main='Quantile function for Unif(0,1)')
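As a quick sketch of the inverse relationship, applying the cdf to the output of the quantile function should recover the original probability:

```r
a <- 0
b <- 1
p <- 0.25
x <- qunif(p, min = a, max = b)  # value with P(X < x) = p
punif(x, min = a, max = b)       # applying the cdf recovers p
```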
To draw random values from the distribution, use the r version of the function. For instance, here are
100 random Unif(a,b) draws represented as a histogram.
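A sketch of the call described above (the exact code is not shown in the source; the histogram title is illustrative):

```r
a <- 0
b <- 1
draws <- runif(100, min = a, max = b)  # 100 random Unif(0,1) draws
hist(draws, main = "Histogram of 100 Unif(0,1) draws")
```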
mu <- 0
sigma <- 1 # standard deviation