Mstat Note7 Random Variable f23

Uploaded by

junmokim123
Random Variables

Math & Stat for Data Science


Graduate School of Data Science
Seoul National University
This note will cover
• Random variable
• Definition, CDF, PMF, PDF
• Discrete Random Variables
• Bernoulli, Binomial, Poisson, etc
• Continuous Random Variables
• Normal, chi-squared, Exponential, etc
• Multivariate RV
• Independence, conditional dist.
• Change of variables
Random Variables
• Sample space and events
• We can assign probabilities to events
• But we need to map outcomes to real numbers for analysis

• Ex. Two coin tosses

• Sample space
• {HH, HT, TH, TT}
• Need to convert each outcome to a number for the analysis
Random Variables

From this point, we will directly work with random variables.


Some examples
Random variables
• Probability?
• Calculate the probability by inverting random variables
Random variables

Here, the values of the random variable are not one-to-one with the outcomes: several outcomes can map to the same value
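A minimal sketch of "inverting" a random variable for the two-coin-toss example. It assumes a fair coin and takes X to be the number of heads; both are illustrative assumptions, since the slide formulas are not reproduced here.

```r
# Sample space of two tosses; a fair coin is assumed, so each outcome has probability 1/4.
outcomes <- c("HH", "HT", "TH", "TT")
prob <- rep(1/4, length(outcomes))

# Random variable X: number of heads in each outcome (assumed definition).
X <- c(HH = 2, HT = 1, TH = 1, TT = 0)

# P(X = x) is the total probability of the outcomes that X maps to x.
pmf_X <- tapply(prob, X, sum)
print(pmf_X)

# The value X = 1 comes from two different outcomes, HT and TH.
print(outcomes[X == 1])
```

The value 1 is reached from two outcomes, which is why the mapping is not one-to-one.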


Probability and Distribution

• The CDF contains all the information about the random variable
• The CDF is a non-decreasing, right-continuous function
CDF
[Figure: example CDF, annotated as right continuous and non-decreasing]
Probability mass function (PMF)

• Defined when X is discrete


• Can calculate CDF using PMF
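As a small illustration of computing a CDF from a PMF, the sketch below accumulates a PMF and compares the result with the built-in CDF. The Binomial(10, 0.3) distribution is only a convenient example choice.

```r
# Support and PMF of an example discrete distribution, Binomial(10, 0.3).
x <- 0:10
pmf <- dbinom(x, size = 10, prob = 0.3)

# For a discrete RV, F(x) is the sum of the PMF over all values <= x.
cdf <- cumsum(pmf)

# Compare with R's built-in CDF and confirm the CDF never decreases.
print(all.equal(cdf, pbinom(x, size = 10, prob = 0.3)))
print(all(diff(cdf) >= 0))
```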
PMF
Probability density function (PDF)
• Similarly, a PDF can be defined for a continuous random variable
Probability density function (PDF)
• PDF of uniform(0,1) distribution

• Corresponding CDF
Probability density function (PDF)
• A PDF is not a probability!!

• For a continuous random variable, P(X=x)=0 for every x

• A PDF can be larger than 1
• The PDF of Uniform(0,1/5) equals 5 for x in (0, 1/5)

• Mathematically, a PDF is a Radon-Nikodym derivative
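The Uniform(0, 1/5) example can be checked numerically. This sketch evaluates the density at one interior point (x = 0.1 is an arbitrary choice) and integrates it over the support.

```r
# Density of Uniform(0, 1/5) at an interior point: larger than 1.
dens <- dunif(0.1, min = 0, max = 1/5)
print(dens)

# The density still integrates to 1 over its support; min and max are passed on to dunif.
area <- integrate(dunif, lower = 0, upper = 1/5, min = 0, max = 1/5)$value
print(area)
```

Probabilities come from areas under the density, so a density value of 5 is not a contradiction.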
Properties
Quantile function

0.75 quantile (third quartile)?
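A short sketch of the quantile function as the inverse of the CDF, using R's built-in q* functions. The N(0, 1) and Binomial(10, 0.3) distributions are example choices, and the discrete case follows R's convention of returning the smallest x with F(x) >= q.

```r
# Continuous case: the 0.75 quantile of N(0, 1), about 0.674.
q75 <- qnorm(0.75)
print(q75)
print(pnorm(q75))   # applying the CDF recovers 0.75

# Discrete case: smallest x whose CDF value reaches 0.75.
q75_binom <- qbinom(0.75, size = 10, prob = 0.3)
print(q75_binom)
```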


Equal distribution
• Two random variables X and Y are equal in distribution if
• FX(x) = FY(x) for all x

• This does not mean that X = Y
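One way to see the difference between "equal in distribution" and "equal" is the following sketch. It assumes X ~ Bernoulli(1/2) and defines Y = 1 - X; this pair is an illustrative choice, not taken from the slides.

```r
# X takes the values 0 and 1 with probability 1/2 each; Y = 1 - X.
x_vals <- c(0, 1)
prob <- c(0.5, 0.5)
y_vals <- 1 - x_vals

# PMFs of X and Y: both put mass 1/2 on 0 and on 1.
pmf_X <- tapply(prob, x_vals, sum)
pmf_Y <- tapply(prob, y_vals, sum)
same_dist <- isTRUE(all.equal(as.numeric(pmf_X), as.numeric(pmf_Y)))
print(same_dist)

# P(X = Y): total probability of the cases where the two values coincide.
prob_equal <- sum(prob[x_vals == y_vals])
print(prob_equal)
```

X and Y have the same distribution, yet they are never equal.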


Well known Discrete RVs
Bernoulli Distribution
[Figure: PMF of Bernoulli, p=0.3; x-axis: 0, 1; y-axis: Probability]
Bernoulli Distribution
Examples
• Coin Toss
• 0: T
• 1: H
• p: probability to have H

• Disease Probability
• 0: Non disease
• 1: Disease
• p : probability to have the disease
Bernoulli Distribution
Examples
• Suppose there are 5 individuals, and each has probability p=0.2 of having the disease

Generate random sample?

# R-code
N=5               # number of individuals
p=0.2             # common disease probability
rbinom(N, 1, p)   # N Bernoulli(p) draws (Binomial with size 1)
Bernoulli Distribution
Examples
• Suppose there are 5 individuals, and their disease probabilities all differ:
p1=0.1, p2=0.2, p3=0.3, p4=0.4, p5=0.5

Generate random sample?

# R-code
N=5                            # number of individuals
p=c(0.1, 0.2, 0.3, 0.4, 0.5)   # one probability per individual
rbinom(N, 1, p)                # i-th draw uses the i-th probability
Binomial Distribution
[Figure: PMF of Binomial, n=10, p=0.3; x-axis: 0 to 10; y-axis: Probability]
Binomial Distribution
• Sum of n independent Bernoulli(p) random variables
follows Binomial(n, p)

• Sum of two independent Binomial random variables with the same p follows a Binomial distribution
• X1 ~ Binom(n1, p), X2 ~ Binom(n2, p)
• X1+X2 ~ Binom(n1 + n2, p)
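The additivity of independent Binomials with a common p can be verified numerically by convolving the two PMFs. The values n1 = 4, n2 = 6, p = 0.3 below are arbitrary example choices.

```r
n1 <- 4; n2 <- 6; p <- 0.3

# PMF of X1 + X2 by convolution: P(X1 + X2 = s) = sum_k P(X1 = k) P(X2 = s - k).
pmf_sum <- sapply(0:(n1 + n2), function(s) {
  k <- 0:s
  sum(dbinom(k, n1, p) * dbinom(s - k, n2, p))
})

# Compare with the Binomial(n1 + n2, p) PMF.
print(all.equal(pmf_sum, dbinom(0:(n1 + n2), n1 + n2, p)))
```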
Binomial distribution
Examples
• Coin Toss
• Suppose we toss a coin 10 times
• x: the number of heads
• p: probability of H

• Disease Probability
• Suppose we sample 50 individuals in SNU
• x: number of individuals with disease
• p: probability to have the disease
Binomial Distribution
[Figure: PMF of Binomial, n=1000, p=0.3; y-axis: Probability]
• Large n: the binomial distribution has a bell shape
=> Close to the Normal distribution

[Figure: PMF of Binomial, n=1000, p=0.001; x-axis: 0 to 20; y-axis: Probability]
• Very small p (rare event): the binomial distribution does not have the bell shape
=> Close to Poisson
Geometric Distribution
[Figure: PMF of Geometric, p=0.3; x-axis: 0 to 20; y-axis: Probability]

Ex. Number of trials needed until the first head in coin toss
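A brief sketch of the geometric PMF in R. Note that R's dgeom counts the number of failures before the first success, so "number of trials until the first head" corresponds to failures + 1; the range k = 1..5 is an arbitrary example.

```r
p <- 0.3
k <- 1:5   # number of trials needed until the first head

# P(first head on trial k) = (1 - p)^(k - 1) * p
by_formula <- (1 - p)^(k - 1) * p

# Same probabilities from dgeom, shifted by one because dgeom counts failures.
by_dgeom <- dgeom(k - 1, prob = p)

print(all.equal(by_formula, by_dgeom))
```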
Poisson Distribution
[Figure: PMF of Poisson, lambda=1; x-axis: 0 to 20; y-axis: Probability]
Poisson Distribution
Siméon Denis Poisson
[Figure: PMFs of Poisson, lambda=1 and Binomial, n=1000, p=0.001, side by side; x-axis: 0 to 20; y-axis: Probability]

• Derived to model the number of rare events

• Poisson derived it to model wrongful convictions

• Ex. Binomial(1000, 0.001) and Poisson(1) are essentially the same
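The closeness of Binomial(1000, 0.001) and Poisson(1) can be checked directly by comparing the two PMFs. The range 0..20 mirrors the plots; the comparison itself is only a numerical illustration.

```r
x <- 0:20
pmf_binom <- dbinom(x, size = 1000, prob = 0.001)
pmf_pois <- dpois(x, lambda = 1)   # lambda = n * p = 1

# Largest pointwise gap between the two PMFs.
max_diff <- max(abs(pmf_binom - pmf_pois))
print(max_diff)

# Side-by-side view of the first few probabilities.
print(round(cbind(x, pmf_binom, pmf_pois)[1:6, ], 4))
```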
Poisson Distribution - derivation
[Figure: PMFs of Poisson, lambda=1 and Binomial, n=1000, p=0.001, side by side; x-axis: 0 to 20; y-axis: Probability]
Poisson Distribution

• 𝜆: mean and variance of the distribution

• Sum of two independent Poisson RVs follows Poisson
• X1 ~ Poisson(𝜆1), X2 ~ Poisson(𝜆2)
• X1+X2 ~ Poisson(𝜆1 + 𝜆2)
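Both properties can be checked numerically. The rates 1 and 2.5 and the truncation points below are arbitrary example choices; the truncated tails are negligible at this scale.

```r
lambda1 <- 1; lambda2 <- 2.5

# Mean and variance of Poisson(lambda2), computed from the PMF on 0..100.
x <- 0:100
mean_val <- sum(x * dpois(x, lambda2))
var_val <- sum((x - mean_val)^2 * dpois(x, lambda2))
print(c(mean = mean_val, variance = var_val))

# PMF of X1 + X2 for independent Poissons, by convolution.
s <- 0:15
pmf_sum <- sapply(s, function(total) {
  k <- 0:total
  sum(dpois(k, lambda1) * dpois(total - k, lambda2))
})
print(all.equal(pmf_sum, dpois(s, lambda1 + lambda2)))
```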

Plot from Wikipedia


Poisson Distribution
Examples
• Event incidence
• Suppose we are interested in the incidence of car accidents
• x: number of accidents each day
• 𝜆: average number

• DNA data
• The number of mutations in a region
• x: number of mutations
• 𝜆: average number
Well known Continuous RVs
Normal Distribution
[Figure: density of Normal, mu=0, sigma=1; x-axis: x from -4 to 4; y-axis: Density]
Normal Distribution

Abraham de Moivre, Carl Friedrich Gauss

• One of the most important probability distributions!!

• Derived to approximate the limit of Binomial trials (De Moivre, 1721) and to model the error distribution in astronomy (Gauss, 1809)
Normal Distribution
• One feature of the normal distribution is that a linear transformation of a Normal RV also follows a Normal distribution.

• Ex. X ~ N(3, 5)
• Dist. of (X − 3)/√5?

• Calculate P(X > 1)?
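A sketch of this calculation in R, done two ways. It assumes the second parameter of N(3, 5) is the variance, consistent with the N(170, 𝜎²=25) notation used later in these notes.

```r
mu <- 3
sigma <- sqrt(5)   # assumption: N(3, 5) means variance 5, so sd = sqrt(5)

# Direct upper-tail probability P(X > 1).
p_direct <- pnorm(1, mean = mu, sd = sigma, lower.tail = FALSE)

# Same probability after standardizing: P(Z > (1 - mu) / sigma) with Z ~ N(0, 1).
p_standard <- 1 - pnorm((1 - mu) / sigma)

print(c(direct = p_direct, standardized = p_standard))   # both about 0.81
```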


Normal Distribution
Examples
• Widely used to model continuous measurements

• Any measurement
• Noise (error) in the observation
• Linear regression is a good example
𝜒² distribution
Exponential Distribution
[Figure: density of Exponential; x-axis: x from 0 to 10; y-axis: Density]
Exponential Distribution
• CDF
F(x) = 1 − exp(−x/𝛽)

• Memorylessness
P(X > t + s | X > t) = P(X > s)

• The remaining waiting time does not depend on the time already waited
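Memorylessness can be verified from the CDF, since P(X > t + s | X > t) = P(X > t + s) / P(X > t). In this sketch 𝛽 = 2, s = 1.5 and t = 3 are arbitrary example values, and R's rate parameter is taken as 1/𝛽.

```r
beta <- 2
s0 <- 1.5
t0 <- 3

# Survival function P(X > x) = exp(-x / beta); pexp is parameterized by rate = 1 / beta.
surv <- function(x) pexp(x, rate = 1 / beta, lower.tail = FALSE)

lhs <- surv(t0 + s0) / surv(t0)   # P(X > t + s | X > t)
rhs <- surv(s0)                   # P(X > s)
print(c(conditional = lhs, unconditional = rhs))
```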
Multivariate Distribution
Bivariate Distribution
• Given a pair of random variables, (X, Y), we can
describe a joint distribution
• Discrete: joint mass function

• Continuous: joint pdf


Bivariate continuous
Marginal Distribution

Find the univariate distribution of X from the joint distribution of (X,Y)!


Marginal Distribution-discrete
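For the discrete case, a marginal PMF is obtained by summing the joint PMF over the other variable. The 2x2 joint table below is a hypothetical example, not taken from the slides.

```r
# Hypothetical joint PMF of (X, Y); rows index X, columns index Y.
joint <- matrix(c(0.1, 0.2,
                  0.3, 0.4),
                nrow = 2, byrow = TRUE,
                dimnames = list(X = c("0", "1"), Y = c("0", "1")))

# Marginal of X: sum over y. Marginal of Y: sum over x.
f_X <- rowSums(joint)
f_Y <- colSums(joint)
print(f_X)
print(f_Y)
```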
Marginal Distribution-continuous
Independent Random Variables
Independent Random Variables
• To check the independence, we need to check the
equation (2.7). The following holds for continuous
Example

Independent?
Example

Joint distribution
of X and Y?
Independence
• The following theorem is very useful for identifying independence
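The theorem statement itself is not reproduced here, so the sketch below only illustrates the basic definition for discrete variables: X and Y are independent when the joint PMF equals the product of the marginals in every cell. The helper name is_independent and both tables are hypothetical.

```r
# Hypothetical helper: TRUE when the joint table equals the outer product of its marginals.
is_independent <- function(joint) {
  product <- outer(rowSums(joint), colSums(joint))
  isTRUE(all.equal(joint, product, check.attributes = FALSE))
}

# Every cell is 1/4 = (1/2) * (1/2): independent.
indep_table <- matrix(c(1/4, 1/4, 1/4, 1/4), nrow = 2)

# All mass on the diagonal, so X determines Y: not independent.
dep_table <- matrix(c(1/2, 0, 0, 1/2), nrow = 2)

print(is_independent(indep_table))
print(is_independent(dep_table))
```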
Independence

Independent?
Conditional Distribution

Discrete
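In the discrete case the conditional PMF is the joint PMF divided by the marginal of the conditioning variable, f(x | y) = f(x, y) / fY(y). The joint table in this sketch is a hypothetical example.

```r
# Hypothetical joint PMF of (X, Y); rows index X, columns index Y.
joint <- matrix(c(0.1, 0.2,
                  0.3, 0.4),
                nrow = 2, byrow = TRUE,
                dimnames = list(X = c("0", "1"), Y = c("0", "1")))

# Conditional PMF of X given Y = 1: take the Y = 1 column and renormalize it.
f_Y1 <- sum(joint[, "1"])
cond_X_given_Y1 <- joint[, "1"] / f_Y1
print(cond_X_given_Y1)
```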

Continuous
Example

Conditional Dist. of P(X < 1/4 | Y = 1/3) ?


Example

Marginal distribution of Y?
Multivariate Dist.
• For multivariate random variables, vector notation is more convenient
• X= (X1,…, Xn)
• Corresponding PDF is f(x1,…, xn)

• Independence of X1,…, Xn
• Can be confirmed using

• Or
IID sampling

Many observed data sets can be thought of as IID samples


Multinomial
• Multivariate version of binomial
• Suppose there are k groups, and in each trial exactly one group is selected
• Ex. Dice throw
• 6 possible outcomes
• Suppose we throw n times.
• X = (X1, X2, …, Xk): number of times each group is selected
• p = (p1, p2, …, pk): probability of selecting each group

• X ~ Multinomial (n, p)
Multinomial
• Each element Xj marginally follows Binomial(n, pj)
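The marginal Binomial property can be checked by enumerating a small multinomial. The values n = 4 and p = (0.2, 0.3, 0.5) are arbitrary example choices.

```r
n <- 4
p <- c(0.2, 0.3, 0.5)

# Enumerate all count vectors (x1, x2, x3) with x1 + x2 + x3 = n.
grid <- expand.grid(x1 = 0:n, x2 = 0:n)
grid <- grid[grid$x1 + grid$x2 <= n, ]
grid$x3 <- n - grid$x1 - grid$x2

# Joint multinomial probability of each count vector.
grid$prob <- apply(grid[, c("x1", "x2", "x3")], 1,
                   function(x) dmultinom(x, size = n, prob = p))

# Marginal PMF of X1: sum the joint probabilities over x2 and x3.
marg_X1 <- as.numeric(tapply(grid$prob, grid$x1, sum))
print(all.equal(marg_X1, dbinom(0:n, size = n, prob = p[1])))
```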

• Commonly used in survey data


• Satisfaction

• Preference
Multivariate Normal

• One of the most important MV distributions

• Two parameters
• Mean: 𝜇=(𝜇1, …, 𝜇k)
• Variance (k×k matrix): Σ
• Variance should be symmetric and positive definite!!
Multivariate Normal (Extra)
• If each Xj follows IID N(0, 1) (so Z value) and then
Multivariate Normal (Extra)

Linear transformation of MVN follows MVN !


Multivariate Normal
Example: correlated outcomes
• Suppose we want to generate height and weight
• Height ~ N(170, 𝜎²=25)
• Weight ~ N(72, 𝜎²=16)
• Covariance = 12
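One possible way to generate such correlated pairs in base R is to apply a linear transformation to IID N(0, 1) draws, using a Cholesky factor of the covariance matrix. This is a sketch under that approach, not necessarily the method used in the course; the seed and the sample size of 10000 are arbitrary.

```r
set.seed(1)   # arbitrary seed, only for reproducibility

mu <- c(170, 72)                    # means of height and weight
Sigma <- matrix(c(25, 12,
                  12, 16), nrow = 2)   # variances 25 and 16, covariance 12

# Lower-triangular A with A %*% t(A) = Sigma.
A <- t(chol(Sigma))

# Z holds IID N(0, 1) draws; mu + A Z is then multivariate normal with mean mu, variance Sigma.
n_samples <- 10000
Z <- matrix(rnorm(2 * n_samples), nrow = 2)
X <- t(mu + A %*% Z)   # each row is one (height, weight) pair

print(colMeans(X))   # should be close to (170, 72)
print(cov(X))        # should be close to Sigma
```

The implied correlation is 12 / (5 * 4) = 0.6.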
Multivariate Normal
Example: correlated outcomes
Multivariate Normal (Extra)
Transformation of RV
Transformation of RV
• In many situations we need to transform RVs
• Ex. X -> X² (for variance calculation)

• Suppose Y=r(X) is a transformation of X. PMF of Y is


Transformation of RV
• Ex. P(X=-1)=P(X=1)=1/4, P(X=0)=1/2. Let Y=X², then PMF of Y?
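For a discrete X, the PMF of Y = r(X) is found by adding up P(X = x) over all x that map to the same y. A minimal sketch for this example:

```r
# PMF of X from the example.
x <- c(-1, 0, 1)
f_X <- c(1/4, 1/2, 1/4)

# Y = X^2: both x = -1 and x = 1 map to y = 1, so their probabilities are added.
f_Y <- tapply(f_X, x^2, sum)
print(f_Y)   # P(Y = 0) = 1/2, P(Y = 1) = 1/2
```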
Transformation of RV
• Continuous case
Transformation of RV

Distribution of Y?
Transformation of multivariate RV
• Transform of several random variables
• Max(X, Y), Min(X, Y), X+Y, X/Y
• Ex. Minimum waiting time.
• Let Z=r(X,Y)
Transformation of multivariate RV
• Suppose X1 and X2 are independent RVs, each following the exp(1) distribution. Let Y = Min(X1, X2).
Distribution of Y?
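A sketch of one way to answer this. Since Y > y exactly when both X1 > y and X2 > y, independence gives P(Y > y) = P(X1 > y) P(X2 > y) = exp(-2y), which is the survival function of an exponential with rate 2 (mean 1/2). The evaluation points below are arbitrary.

```r
y <- c(0.1, 0.5, 1, 2)

# P(Min(X1, X2) > y) = P(X1 > y) * P(X2 > y) for independent exp(1) variables.
surv_min <- pexp(y, rate = 1, lower.tail = FALSE)^2

# Survival function of an exponential with rate 2.
surv_rate2 <- pexp(y, rate = 2, lower.tail = FALSE)

print(all.equal(surv_min, surv_rate2))
```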
Summary
• Random variable
• Map sample space to real number (or vector)
• We actually use random variables (not the sample space) for data analysis
• Discrete Random Variables
• Bernoulli, Binomial, Poisson, etc
• Continuous Random Variables
• Normal, chi-squared, Exponential, etc
• Multivariate RV
• Independence, conditional dist.
• Change of variables
