Probability Distributions in R
Lecture 9
Lecture Outline
• R - Normal Distribution
• R - Binomial Distribution
• R - Poisson Regression
Random Variable
[Figure: bar chart of the pmf of a fair die, p(x) = 1/6 for x = 1, 2, ..., 6]
$$\sum_{\text{all } x} p(x) = 1$$
Probability mass function (pmf)
x       p(x)
1       p(x=1) = 1/6
2       p(x=2) = 1/6
3       p(x=3) = 1/6
4       p(x=4) = 1/6
5       p(x=5) = 1/6
6       p(x=6) = 1/6
Total   1.0
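The pmf table above can be reproduced in R. This is a minimal sketch for a fair six-sided die; the variable names (`x`, `p`, `pmf`, `total`) are chosen here for illustration and are not from the lecture.

```r
# Fair six-sided die: every face has probability 1/6
x <- 1:6
p <- rep(1/6, 6)

# Tabulate the pmf, one row per outcome
pmf <- data.frame(x = x, p = p)
print(pmf)

# The probabilities of a valid pmf sum to 1
total <- sum(p)
print(total)
```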
Cumulative distribution function (CDF)
[Figure: step plot of the CDF of a fair die; y-axis P(x) rising from 1/6 to 1.0 in steps of 1/6, x-axis x = 1, 2, ..., 6]
Cumulative distribution function
A P(x≤A)
1 P(x≤1)=1/6
2 P(x≤2)=2/6
3 P(x≤3)=3/6
4 P(x≤4)=4/6
5 P(x≤5)=5/6
6 P(x≤6)=6/6
Examples
1. What’s the probability that you roll a 3 or less?
P(x≤3)=1/2
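The CDF is the running sum of the pmf, so the example above can be checked in R with `cumsum()`. This is a small sketch for the fair die; the variable names are illustrative.

```r
# pmf of a fair six-sided die
x <- 1:6
p <- rep(1/6, 6)

# The CDF at each outcome is the cumulative sum of the pmf
cdf <- cumsum(p)
names(cdf) <- x
print(cdf)

# P(x <= 3): probability of rolling a 3 or less
p_le_3 <- unname(cdf["3"])
print(p_le_3)
```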
[Figure: distribution curve marking the mean (μ) and one standard deviation from the mean (σ)]
Expected value, or mean
x 10 11 12 13 14
P(x) .4 .2 .2 .1 .1
Discrete case:
$$E(X) = \sum_{\text{all } x_i} x_i \, p(x_i)$$
Continuous case:
$$E(X) = \int_{\text{all } x} x \, p(x) \, dx$$
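For the discrete case, the expected value of the table above (x = 10, ..., 14) can be computed directly as a probability-weighted sum. A minimal sketch, with variable names chosen for illustration:

```r
# Values and probabilities from the expected-value table
x <- c(10, 11, 12, 13, 14)
p <- c(0.4, 0.2, 0.2, 0.1, 0.1)

# Discrete expected value: sum of each value times its probability
ev <- sum(x * p)
print(ev)  # 11.3
```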
Empirical Mean is a special case of Expected Value…
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \sum_{i=1}^{n} x_i \left(\frac{1}{n}\right)$$
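The identity above says the sample mean is an expected value in which every observation has weight 1/n. The sketch below checks this against R's built-in `mean()`; the sample values are hypothetical and only serve as an illustration.

```r
# Hypothetical sample (not from the lecture)
x <- c(2, 4, 4, 5, 7, 8)
n <- length(x)

# Expected-value form: each observation weighted by 1/n
m_weighted <- sum(x * (1 / n))

# Built-in sample mean
m_builtin <- mean(x)

print(c(weighted = m_weighted, builtin = m_builtin))
```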
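$$\sigma^2 = \mathrm{Var}(x) = E[(x - \mu)^2] = \sum_{\text{all } x_i} (x_i - \mu)^2 \, p(x_i)$$
Discrete case:
$$\mathrm{Var}(X) = \sigma^2 = \sum_{\text{all } x_i} (x_i - \mu)^2 \, p(x_i)$$
Continuous case:
$$\mathrm{Var}(X) = \sigma^2 = \int_{\text{all } x} (x - \mu)^2 \, p(x) \, dx$$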
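As a worked instance of the discrete formula, the sketch below computes the variance of the distribution from the expected-value table (x = 10, ..., 14). Variable names are illustrative.

```r
# Values and probabilities from the expected-value table
x <- c(10, 11, 12, 13, 14)
p <- c(0.4, 0.2, 0.2, 0.1, 0.1)

# Mean: probability-weighted sum of the values
mu <- sum(x * p)

# Variance: probability-weighted sum of squared deviations from the mean
sigma2 <- sum((x - mu)^2 * p)

# Standard deviation is the square root of the variance
sigma <- sqrt(sigma2)

print(c(mean = mu, variance = sigma2, sd = sigma))
```

Here the mean is 11.3 and the variance is 1.81, which also equals E(X²) − μ² = 129.5 − 127.69.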
Similarity to empirical variance
$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \left(\frac{1}{n - 1}\right)$$
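R's built-in `var()` uses the same n − 1 denominator, which can be checked against the formula written out by hand. The sample below is hypothetical.

```r
# Hypothetical sample (not from the lecture)
x <- c(2, 4, 4, 5, 7, 8)
n <- length(x)

# Empirical variance: squared deviations from the sample mean,
# each weighted by 1 / (n - 1)
v_manual <- sum((x - mean(x))^2 * (1 / (n - 1)))

# Built-in sample variance (also divides by n - 1)
v_builtin <- var(x)

print(c(manual = v_manual, builtin = v_builtin))
```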
# Probability of getting 26 or fewer heads in 51 tosses of a fair coin.
x <- pbinom(26, 51, 0.5)
print(x)
• When we execute the above code, it produces the following result:
[1] 0.610116
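One way to see what `pbinom()` computes is to sum the individual binomial probabilities with `dbinom()`, R's binomial pmf. The sketch below also uses the `lower.tail` argument of `pbinom()` to get the complementary probability; variable names are illustrative.

```r
# Cumulative probability of 26 or fewer heads in 51 fair tosses
cdf_direct <- pbinom(26, 51, 0.5)

# Same quantity as a sum of the pmf over 0, 1, ..., 26 heads
cdf_summed <- sum(dbinom(0:26, 51, 0.5))

# Probability of more than 26 heads (upper tail)
upper_tail <- pbinom(26, 51, 0.5, lower.tail = FALSE)

print(c(direct = cdf_direct, summed = cdf_summed, upper = upper_tail))
```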
qbinom()
• This function takes a probability value and returns the smallest number whose
cumulative probability is at least that value.
# Number of heads x such that the probability of getting x or fewer heads
# in 51 coin tosses is at least 0.25.
x <- qbinom(0.25, 51, 1/2)
print(x)
• When we execute the above code, it produces the following result:
[1] 23
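Because `qbinom()` is the inverse of `pbinom()`, the returned quantile can be checked by evaluating the CDF at that value and at the value just below it. A minimal sketch, with illustrative variable names:

```r
# 0.25 quantile of the number of heads in 51 fair tosses
q <- qbinom(0.25, 51, 0.5)

# CDF at the quantile: should be at least 0.25
at_q <- pbinom(q, 51, 0.5)

# CDF one step below the quantile: should be below 0.25
below_q <- pbinom(q - 1, 51, 0.5)

print(c(quantile = q, cdf_at_q = at_q, cdf_below_q = below_q))
```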
rbinom()
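• This function generates the required number of random values from a binomial
distribution with a given number of trials and a given success probability.
# Generate 8 random values, each the number of successes in 150 trials
# with success probability 0.4.
x <- rbinom(8, 150, 0.4)
print(x)
• When we execute the above code, it produces a result such as the following
(the values are random, so they differ from run to run):
[1] 58 61 59 66 55 60 61 67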
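Since `rbinom()` draws random values, fixing the seed with `set.seed()` makes a run reproducible. The sketch below also draws a larger sample to show that the average settles near the theoretical mean, size × prob = 150 × 0.4 = 60. The seed value and sample sizes are arbitrary choices for illustration.

```r
# Fix the random seed so the draws can be reproduced
set.seed(123)
draws <- rbinom(8, 150, 0.4)
print(draws)

# Resetting the same seed gives the same draws again
set.seed(123)
draws_again <- rbinom(8, 150, 0.4)

# A larger sample: its mean should be close to 150 * 0.4 = 60
big <- rbinom(10000, 150, 0.4)
print(mean(big))
```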
R - Poisson Regression
• Poisson Regression involves regression models in which the response variable is a count rather than
a fractional number, for example the number of births or the number of wins in a football match series.
The values of the response variable are assumed to follow a Poisson distribution.
• The general mathematical equation for Poisson regression is:
log(y) = a + b1x1 + b2x2 + ... + bnxn
• Following is the description of the parameters used:
y is the response variable (the model describes the log of its expected value).
a and b1, ..., bn are the numeric coefficients.
x1, ..., xn are the predictor variables.
• The function used to create the Poisson regression model is the glm() function.
• Syntax
• The basic syntax for the glm() function in Poisson regression is:
glm(formula, data, family)
• Following is the description of the parameters used in the above function:
formula is the symbolic description of the relationship between the variables.
data is the data set giving the values of these variables.
family is an R object that specifies the details of the model. Its value is poisson for Poisson regression.
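To illustrate the syntax and the log link in the equation above, the sketch below fits a Poisson regression to simulated counts. The data, the seed, and the "true" coefficients (0.5 and 1.2) are assumptions made up for this illustration; they are not part of the lecture.

```r
# Simulate counts whose log-mean is linear in one predictor:
# log(E[y]) = 0.5 + 1.2 * x1   (coefficients chosen for illustration)
set.seed(1)
n <- 200
x1 <- runif(n)
y <- rpois(n, lambda = exp(0.5 + 1.2 * x1))
sim <- data.frame(y = y, x1 = x1)

# Fit the Poisson regression with glm(formula, data, family)
fit <- glm(y ~ x1, data = sim, family = poisson)

# Estimated a and b1, on the log scale
print(coef(fit))

# The poisson family uses the log link by default
print(family(fit)$link)
```

The estimated coefficients should land near the values used in the simulation, though not exactly on them, because the counts are random.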
Example
• We have the built-in data set "warpbreaks", which
describes the effect of wool type (A or B) and tension
(low, medium or high) on the number of warp breaks
per loom. Let's consider "breaks" as the response
variable, which is a count of the number of breaks. The
wool type ("wool") and "tension" are taken as predictor
variables.
• Input Data
input <- warpbreaks
print(head(input))
When we execute the above code, it
produces the following result
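Beyond `head()`, it can help to inspect the structure of the data set before modelling. The sketch below uses `str()` and `table()` on the built-in `warpbreaks` data; the variable name `counts` is illustrative.

```r
# warpbreaks ships with R in the datasets package
str(warpbreaks)

# Number of looms for each combination of wool type and tension
counts <- table(warpbreaks$wool, warpbreaks$tension)
print(counts)
```

The design is balanced: each wool and tension combination has the same number of looms.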
Create Regression Model
output <- glm(formula = breaks ~ wool + tension, data = warpbreaks,
              family = poisson)
print(summary(output))
When we execute the above code, it
produces the following result
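Because the model is fitted on the log scale, exponentiating the coefficients gives multiplicative effects on the expected number of breaks. This is a sketch of one common way to read the fit, not a step from the lecture; the name `rate_ratios` is illustrative.

```r
# Refit the model so this block runs on its own
output <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Coefficients are on the log scale; exp() turns them into
# multiplicative effects relative to wool A and low tension
rate_ratios <- exp(coef(output))
print(round(rate_ratios, 3))
```

A value below 1 means fewer expected breaks than the baseline level (wool A, tension L).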
• In the summary, we look for a p-value below 0.05 in the
last column to conclude that a predictor variable has an
effect on the response variable. As seen, wool type B and
tension types M and H all have an effect on
the count of breaks.
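The p-value check described above can also be done programmatically by pulling the coefficient table out of the summary. A minimal sketch; the 0.05 threshold comes from the text, and the variable names are illustrative.

```r
# Refit the model so this block runs on its own
output <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Coefficient table: estimate, standard error, z value, p-value
coef_table <- coef(summary(output))

# The last column holds the p-values
p_values <- coef_table[, "Pr(>|z|)"]

# Flag the terms whose p-value is below 0.05
significant <- p_values < 0.05
print(significant)
```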