
Statistics #07 – Discrete Random Variables

Prof. Dr. Jonas Dovern


Learning Goals
• You are able to explain what a random variable is
• You can explain what distribution functions and probability mass functions are, and you are able to derive them for concrete examples
• You are able to calculate expected values and variances for discrete
random variables
• You are able to name commonly used discrete probability models and to explain in which situations they can be used:
 Discrete uniform distribution
 Binomial distribution
 Poisson distribution



Random Variables
• A random variable assigns a numeric value to each outcome in the
sample space generated by some random process
• We can define events in terms of random variables:
{𝑋 = 𝑥} = {𝑠 | 𝑠 ∈ 𝑆, 𝑋(𝑠) = 𝑥}
{𝑋 ≠ 𝑥} = {𝑠 | 𝑠 ∈ 𝑆, 𝑋(𝑠) ≠ 𝑥}
{𝑋 < 𝑥} = {𝑠 | 𝑠 ∈ 𝑆, 𝑋(𝑠) < 𝑥}
{𝑥1 < 𝑋 ≤ 𝑥2} = {𝑠 | 𝑠 ∈ 𝑆, 𝑥1 < 𝑋(𝑠) ≤ 𝑥2}
• Very important distinction:
 𝑋 is the random variable (function, ex ante, probability statements)!
 𝑥 is an outcome (just a number, ex post, no uncertainty)!



Random Variables
Example: Household Size
• When studying something for which the household size is important, very
large households are often merged into one category
• Assume a survey of private households (like the German SOEP) for which
we denote the event that a household has 𝑖 members as 𝐴𝑖
• We could now define a random variable 𝐻 as follows:
𝐻(𝐴1) = 1
𝐻(𝐴2) = 2
𝐻(𝐴3) = 3
𝐻(𝐴4) = 4
𝐻(𝐴𝑖) = 5 for 𝑖 ≥ 5



Discrete Random Variables
• A discrete random variable can take on no more than a countable
number of values
• What does countable mean?
• Examples:
 Number of defective items in a large shipment
 Number of students attending my lecture on Wednesday
 Number of letters in Twitter messages
 Number of unsuccessful attempts before rolling a 6 with a regular die

• What do we need to know to properly define such random variables?



Probability Mass Functions
• A probability mass function (PMF, also referred to as a probability distribution function) assigns a probability to each possible outcome
of a random variable
• It is defined for all real numbers:
𝑓(𝑥) = 𝑃(𝑋 = 𝑥) for each possible outcome 𝑥
𝑓(𝑥) = 0 else
• We call the set 𝒳 = {𝑥 | 𝑓(𝑥) > 0} the support of the distribution
• Two basic properties:
 0 ≤ 𝑓(𝑥) ≤ 1 for any value 𝑥
 Σ𝑥 𝑓(𝑥) = 1



Probability Mass Functions
Example: Tossing Two Dice
• Random variable 𝑋 takes on the sum of the revealed dots on both dice
• Each bar indicates 𝑓(𝑥)
• Stacking all bars yields a bar of length 1
• How to read the graph:
 Likelihood of tossing a 1 twice (i.e. 𝑋 = 2) is (1/6) ⋅ (1/6) ≈ 0.028
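
A quick sanity check of these numbers (a minimal R sketch, not part of the original slides) enumerates all 36 equally likely outcomes:

    sums <- outer(1:6, 1:6, "+")        # all 36 equally likely dice pairs, summed
    pmf <- table(sums) / length(sums)   # relative frequency of each sum = f(x)
    pmf["2"]                            # P(X = 2), i.e. tossing a 1 twice: 1/36, approx. 0.028
    sum(pmf)                            # the bars stack to 1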
Cumulative Distribution Functions
• The other representation of a distribution is the cumulative distribution function (CDF, also referred to as the cumulative probability distribution)
• Analogous to the empirical distribution function that we saw earlier
• The CDF states the probability that 𝑋 does not exceed a certain value:
𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥)
• Because of the calculus rules for events, this function provides sufficient information to determine the probability of any event
• Basic properties:
 0 ≤ 𝐹(𝑥) ≤ 1 for any value 𝑥
 If 𝑥0 < 𝑥1 are two numbers, then 𝐹(𝑥0) ≤ 𝐹(𝑥1)



Cumulative Distribution Functions
Example: Tossing Two Dice
• Random variable 𝑋 takes on the sum of the revealed dots on both dice
• The plot shows the CDF
• How to read the graph:
 Likelihood of obtaining a number smaller than 8 is 0.583
 Probability of a number not larger than 8 is 0.722
Cumulative Distribution Functions
• Important to remember that CDFs are always right-continuous
• Step function for discrete random variables:
 The function value at an input value where a step occurs belongs to the “upper step”
 This is due to the definition of the CDF: 𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥)

• CDF and PMF are, of course, related:
𝐹(𝑥0) = Σ𝑥≤𝑥0 𝑓(𝑥)
 We need to cumulate the values of the PMF from 𝑓(−∞) up to 𝑓(𝑥0)
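
For a discrete random variable this cumulation is just a running sum. A small R sketch, reusing the two-dice sum as an example (my own illustration):

    pmf <- table(outer(1:6, 1:6, "+")) / 36   # PMF of the sum of two dice
    cdf <- cumsum(pmf)                        # F(x0) = sum of f(x) over all x <= x0
    cdf["7"]                                  # P(X <= 7) = 0.583
    cdf["8"]                                  # P(X <= 8) = 0.722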



The Discrete Uniform Distribution
• One of the simplest distributions is the discrete uniform distribution
• Each integer from 𝐿 to 𝑈 is drawn with the same (uniform!)
probability
• Its PMF is given by
𝑓(𝑥) = 𝑓(𝑥; 𝐿, 𝑈) = 1 / (𝑈 − 𝐿 + 1) for 𝑥 = 𝐿, 𝐿 + 1, … , 𝑈
• We denote this distribution by 𝑋~𝒟𝒰(𝐿, 𝑈)

Example: Lottery
• Each participant has the same chance of being drawn as the winner.
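
A minimal R sketch of such a draw (the ticket count of 1000 is my own example, not from the slides):

    L <- 1; U <- 1000                 # tickets numbered L through U
    f <- 1 / (U - L + 1)              # each ticket wins with probability 1/1000
    winner <- sample(L:U, size = 1)   # one equally likely draw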
The Binomial Distribution
• How to model a random experiment with only two outcomes?
• Important insights due to Jakob Bernoulli (1654 - 1705)
• Examples:
 Flipping a coin
 Testing if a randomly selected item works or not
• 𝑃(𝐴) = 𝑝 and 𝑃(𝐴̄) = 1 − 𝑝
• More interesting case:
 Bernoulli process
 Repeated execution of identical Bernoulli experiment
 Important: executions are independent of each other



The Binomial Distribution
• Relevant question: If we perform a Bernoulli experiment 𝑛 times,
what is the probability of obtaining exactly 𝑥 successes?
• We need to derive …
 the probability of a certain sequence of 𝑥 successes and 𝑛 − 𝑥 failures; and
 how many such sequences exist (see the textbook and remember the stuff on combinations)
• The probability of each such sequence is 𝑝^𝑥 ⋅ (1 − 𝑝)^(𝑛−𝑥)
• The number of sequences with 𝑥 successes is (𝑛 choose 𝑥) = 𝑛! / (𝑥! (𝑛 − 𝑥)!)
• The PMF is given by 𝑓(𝑥; 𝑛, 𝑝) = (𝑛 choose 𝑥) ⋅ 𝑝^𝑥 ⋅ (1 − 𝑝)^(𝑛−𝑥) with support 𝒳 = {0, 1, … , 𝑛}
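
The formula can be evaluated directly or via R's built-in dbinom(); a sketch with illustrative values n = 10, p = 0.3 (my own choice):

    n <- 10; p <- 0.3; x <- 4
    choose(n, x) * p^x * (1 - p)^(n - x)   # PMF by the formula above, approx. 0.200
    dbinom(x, size = n, prob = p)          # built-in binomial PMF, same value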



The Binomial Distribution
Example: Election Survey
• In the 2018 state elections, the CSU got 37.2%
of the votes
• What is the distribution of the number of CSU
voters 𝑋 in a random sample of 100 voters?
• You guessed it already:
 𝑋~ℬ(100, 0.372)
 Because 100 ≪ #voters, the assumption of drawing “with replacement” (which implies independence and invariance of the Bernoulli trials) is justified



Properties of Discrete Random Variables
• Which outcome value should we expect “on average”?
• Expected value of a discrete random variable:
𝐸(𝑋) = 𝜇 = Σ𝑥∈𝒳 𝑥 ⋅ 𝑓(𝑥)
 It is basically a weighted average of all potential outcomes
 Weights are given by the values of the PMF

• Variance of a discrete random variable:
𝜎² = 𝐸[(𝑋 − 𝜇)²] = Σ𝑥∈𝒳 (𝑥 − 𝜇)² 𝑓(𝑥) = Σ𝑥∈𝒳 𝑥² 𝑓(𝑥) − 𝜇²
 Note that in general 𝐸[𝑔(𝑋)] = Σ𝑥 𝑔(𝑥) 𝑓(𝑥)
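
Both moments are plain weighted sums once the PMF is tabulated; a sketch for the two-dice sum (my own illustration):

    x <- 2:12               # support of the sum of two dice
    f <- c(1:6, 5:1) / 36   # its PMF
    mu <- sum(x * f)        # E(X) = 7
    sum((x - mu)^2 * f)     # Var(X) = 35/6, approx. 5.83
    sum(x^2 * f) - mu^2     # shortcut formula, same value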



Chebyshev’s Theorem
• Confidence intervals often helpful for decision making:
 Which range can we expect sales volume to fall into with 95 % certainty?
 With 99 % certainty, what is the maximum duration of unemployment a person
can expect?
• Chebyshev’s theorem offers a nice way to estimate such intervals without making any assumptions about the distribution:
𝑃(𝜇 − 𝑘 ⋅ 𝜎 ≤ 𝑋 ≤ 𝜇 + 𝑘 ⋅ 𝜎) ≥ 1 − 1/𝑘²
for any random variable 𝑋 with mean 𝜇 and standard dev. 𝜎 and any number 𝑘 > 0 (trivial for 0 < 𝑘 ≤ 1)
• More later during the course:
 Making assumptions about the distribution leads to much smaller intervals
The Binomial Distribution
Example: Election Survey
• Remember our election survey example with 𝑋~ℬ(100, 0.372)
• How can we determine the moments of this distribution and confidence
intervals?
• Work via properties of the underlying Bernoulli experiment!
• Expected value is 𝐸(𝑋) = 100 ⋅ 0.372 ≈ 37 and 𝜎𝑋 = √(100 ⋅ 0.372 ⋅ 0.628) ≈ 4.83
• According to Chebyshev’s theorem (with 𝑘 = 1.45), we know that
𝑃(37 − 1.45 ∙ 4.83 ≤ 𝑋 ≤ 37 + 1.45 ∙ 4.83) = 𝑃(30 ≤ 𝑋 ≤ 44) ≥ 1 − 1/1.45² ≈ 0.52
• With knowledge of the distribution we obtain: 𝑃(𝑋 ≤ 44) − 𝑃(𝑋 < 30) = 0.899
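
Both numbers can be reproduced in R (a sketch; pbinom() is the exact binomial CDF):

    n <- 100; p <- 0.372
    mu <- n * p; sigma <- sqrt(n * p * (1 - p))   # 37.2 and 4.83
    1 - 1 / 1.45^2                                # Chebyshev lower bound, approx. 0.52
    pbinom(44, n, p) - pbinom(29, n, p)           # exact P(30 <= X <= 44) = 0.899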
The Poisson Distribution
• Often we are interested in modelling the number of events that occur
in a certain period of time
• Examples?
 How many customers enter a shop in each hour?
 How many damage events does an insurance firm need to settle each year?
 How many job offers does an unemployed individual receive in one month?
• The Poisson distribution is commonly used for such cases
• Assumptions:
 Constant probability of occurrence in each subinterval
 No more than one occurrence at a time
 Occurrences are independent of each other
The Poisson Distribution
• The PMF of the Poisson distribution is given by
𝑓(𝑥) = 𝑒^(−𝜆) ⋅ 𝜆^𝑥 / 𝑥! for 𝑥 = 0, 1, 2, …
• The parameter 𝜆 > 0 determines the average number of events during
one time unit
• Properties:
 𝐸(𝑋) = 𝜆
 𝜎𝑋² = 𝜆
 If 𝑋~𝒫(𝜆1) and 𝑌~𝒫(𝜆2) are independent, then 𝑋 + 𝑌~𝒫(𝜆1 + 𝜆2)
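
In R, dpois() implements this PMF; a small sketch (the values λ = 3 and x = 2 are my own examples):

    lambda <- 3; x <- 2
    exp(-lambda) * lambda^x / factorial(x)   # PMF by the formula, approx. 0.224
    dpois(x, lambda)                         # built-in Poisson PMF, same value
    dpois(4, 2 + 5)                          # additivity: X~P(2), Y~P(5) independent => X + Y ~ P(7)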



The Poisson Distribution
Example: Credit Defaults
• Assume a bank holds many loans on its books and knows that, on
average, each month 400 borrowers default
• We can model this variable 𝐿 as Poisson distributed with 𝜆 = 400
• What is the median number of defaults in one month?
 𝑞𝐿;0.5 = 400

• What is the likelihood of a good month with fewer than 350 defaults?
 𝑃(𝐿 < 350) = 𝑃(𝐿 ≤ 349) = 𝐹𝐿(349) ≈ 0.005
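
Both answers drop out of R's Poisson quantile and distribution functions (a sketch):

    qpois(0.5, lambda = 400)   # median number of defaults: 400
    ppois(349, lambda = 400)   # P(L <= 349), approx. 0.005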



The Poisson Distribution
Example: Shape of the Poisson Distribution
• In the previous example the mean was equal to the median, which suggests a symmetric distribution
• In general, the skewness of a Poisson distribution is given by 𝜇̃3 = 1/√𝜆:
 It becomes symmetric (zero skewness) for large 𝜆
 For small 𝜆 it is right-skewed



Multivariate Random Variables
• Two dice (black and white):
 Random variable 𝑊 measures the number of revealed dots of the white die
 Random variable 𝐵 measures the number of revealed dots of the black die

• Can you derive the joint PMF and CDF of 𝑊 and 𝐵 that we denote by
𝑓𝑊,𝐵 (𝑤, 𝑏) and 𝐹𝑊,𝐵 (𝑤, 𝑏)?
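
If you want to check your answer: by independence, the joint PMF is the product of the marginal PMFs, i.e. 1/36 for every pair. A sketch (my own; the slides leave the derivation to you):

    f <- outer(rep(1/6, 6), rep(1/6, 6))   # f(w, b) = 1/36 for all 36 pairs
    sum(f)                                 # joint probabilities sum to 1
    F <- function(w, b) sum(f[1:w, 1:b])   # joint CDF for integer w, b in 1..6
    F(3, 2)                                # P(W <= 3, B <= 2) = 6/36 = 1/6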



Covariance of Multivariate Random Variables
• As we saw when describing datasets, the covariance of bivariate random variables measures the linear relationship between them:
𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸[(𝑋 − 𝜇𝑋)(𝑌 − 𝜇𝑌)] = Σ𝑥∈𝒳 Σ𝑦∈𝒴 (𝑥 − 𝜇𝑋)(𝑦 − 𝜇𝑌) 𝑓𝑋,𝑌(𝑥, 𝑦)

• Again, it is often more useful to look at correlations:
𝜌 = 𝐶𝑜𝑟𝑟(𝑋, 𝑌) = 𝐶𝑜𝑣(𝑋, 𝑌) / (𝜎𝑋 𝜎𝑌)
• Independence of both variables implies 𝜌 = 0. Is the converse true?
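
It is not: ρ = 0 does not imply independence. A standard counterexample (my own illustration, not from the slides):

    x <- c(-1, 0, 1); fx <- rep(1/3, 3)      # X uniform on {-1, 0, 1}
    y <- x^2                                 # Y = X^2 is fully determined by X
    mux <- sum(x * fx); muy <- sum(y * fx)   # means of X and Y
    sum((x - mux) * (y - muy) * fx)          # Cov(X, Y) = 0 despite the dependence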



Linear Combinations of Random Variables
• Sometimes, one is interested in a general linear combination of
random variables:
𝑊 = 𝑎𝑋 + 𝑏𝑌
 Mean of 𝑊 is given by 𝑎𝜇𝑋 + 𝑏𝜇𝑌
 Variance of 𝑊 is given by 𝑎²𝜎𝑋² + 𝑏²𝜎𝑌² + 2𝑎𝑏 ⋅ 𝐶𝑜𝑣(𝑋, 𝑌) (see the simulation check below)
• Covariance plays an important role for the variance of a linear combination:
 Positive covariance → amplifies volatility (if 𝑎𝑏 > 0)
 Negative covariance → dampens volatility (if 𝑎𝑏 > 0)

• Important for applications in finance / investment theory:


 Building minimum-variance portfolios
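
The variance formula is easy to verify numerically; a simulation sketch (the distributions and coefficients are my own choices):

    set.seed(1)
    x <- rbinom(1e6, 10, 0.5)   # some discrete random variable
    y <- x + rpois(1e6, 2)      # y co-moves with x => positive covariance
    a <- 2; b <- -1
    var(a * x + b * y)                                    # variance of the combination
    a^2 * var(x) + b^2 * var(y) + 2 * a * b * cov(x, y)   # formula gives the identical value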



Summary
• Random variables assign values to outcomes of random processes
• Random variables are described by their CDF
• Discrete random variables have a PMF (→ probability of each outcome)
• A number of discrete probability distributions is commonly used to
describe random processes:
 Discrete uniform distribution
 Bernoulli distribution / Binomial distribution
 Poisson distribution
• Extension of concepts to more than one variable possible
Homework
• Relevant literature:
 Sections 2.2, 4.1 – 4.5, 4.7 and the
Appendix of Chapter 2 of Newbold
et al. (2019)

• Make sure you work on the current problem set
• Work on the R problem set for
this week
 Go to the R tutorial if you have any
questions!!!

