Multivariate Normal Distribution

The document discusses multivariate probability distributions, specifically the multivariate normal distribution. It provides examples of univariate and bivariate normal distributions with different means, variances, and correlations. It also covers functions in R for simulating and calculating densities of multivariate normal distributions, including rmvnorm(), dmvnorm(), and pmvnorm(). The cumulative distribution function (CDF) and inverse CDF are discussed in the context of calculating probabilities and quantiles.

Uploaded by

Marvi Harsi
Copyright
© All Rights Reserved

DataCamp Multivariate Probability Distributions in R

MULTIVARIATE PROBABILITY DISTRIBUTIONS IN R

Multivariate normal distribution

Surajit Ray
Senior Lecturer, University of Glasgow

Univariate normal distribution

Univariate normal with mean 2 and variance 1
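
A minimal sketch of how a curve like the one on this slide could be drawn in base R. The grid limits and the names x_grid and dens are illustrative choices, not taken from the slide. Note that dnorm() takes the standard deviation, so variance 1 corresponds to sd = 1.

```r
# Grid of x values around the mean (range chosen for illustration)
x_grid <- seq(-2, 6, length.out = 200)

# Density of a normal with mean 2 and variance 1 (sd = sqrt(variance))
dens <- dnorm(x_grid, mean = 2, sd = sqrt(1))

# Draw the density curve
plot(x_grid, dens, type = "l", xlab = "x", ylab = "density")
```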



Density shape of a bivariate normal



Bivariate normal density - 3D density plot

$\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix}$

Bivariate normal density - contour plot

$\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix}$

Bivariate normal density with a different mean

$\mu = \begin{pmatrix} -1 \\ -3 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix}$

Bivariate normal density with a different variance

$\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$

Bivariate normal density with strong correlation

$\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.95 \\ 0.95 & 1 \end{pmatrix}$
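
The off-diagonal entries of Σ are covariances, so the strength of the correlation also depends on the variances. A short base R sketch using cov2cor() to compare the first bivariate example (covariance 0.5, variances 1 and 2) with this one. The object names are illustrative.

```r
# Covariance matrices from the first bivariate example and from this slide
sigma_a <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)
sigma_b <- matrix(c(1, 0.95, 0.95, 1), nrow = 2)

# cov2cor() rescales a covariance matrix to a correlation matrix
cor_a <- cov2cor(sigma_a)[1, 2]   # 0.5 / sqrt(1 * 2)
cor_b <- cov2cor(sigma_b)[1, 2]   # 0.95 / sqrt(1 * 1)

round(c(cor_a, cor_b), 3)
```

The first matrix corresponds to a correlation of about 0.35, the second to 0.95.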

Functions for statistical distributions in R

The first letter denotes:

p for "probability"
q for "quantile"
d for "density"
r for "random"

Followed by the distribution name:

norm
mvnorm
t
mvt


The rmvnorm function


library(mvtnorm)
rmvnorm(n, mean, sigma)

Need to specify:

n: the number of samples
mean: the mean vector of the distribution
sigma: the variance-covariance matrix



Using rmvnorm to generate random samples

Generate 1000 samples from a 3-dimensional normal with

$\mu = \begin{pmatrix} 1 \\ 2 \\ -5 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 5 \end{pmatrix}$

mu1 <- c(1, 2, -5)
sigma1 <- matrix(c(1, 1, 0,
                   1, 2, 0,
                   0, 0, 5), 3, 3)
set.seed(34)
rmvnorm(n = 1000, mean = mu1, sigma = sigma1)
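
One way to sanity-check the simulated draws is to compare their sample mean and covariance with mu1 and sigma1. A sketch, assuming the mvtnorm package is installed. The name samples is introduced here for illustration.

```r
library(mvtnorm)

mu1 <- c(1, 2, -5)
sigma1 <- matrix(c(1, 1, 0,
                   1, 2, 0,
                   0, 0, 5), 3, 3)

set.seed(34)
samples <- rmvnorm(n = 1000, mean = mu1, sigma = sigma1)

dim(samples)                 # one row per draw, one column per dimension
round(colMeans(samples), 2)  # should be close to mu1
round(cov(samples), 2)       # should be close to sigma1
```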

Plot of generated samples


Let's practice simulating from a multivariate normal distribution!
Density of a multivariate normal distribution


Why calculate the density of a distribution?



Univariate normal functions dnorm()
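
dnorm() returns the height of the normal density curve at a point. A small sketch comparing it with the density formula f(x) = exp(-(x - μ)² / (2σ²)) / (σ √(2π)) written out by hand, using the mean 2 and variance 1 example from earlier. The evaluation point x0 = 3 and the variable names are arbitrary choices.

```r
x0 <- 3
mu <- 2
sigma <- 1

# Normal density formula written out explicitly
by_hand <- exp(-(x0 - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))

# The same quantity from the built-in function
built_in <- dnorm(x0, mean = mu, sd = sigma)

all.equal(by_hand, built_in)
```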



Probability density of a bivariate normal

Standard bivariate normal with $\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

Density heights calculated at several locations (xy coordinates)

Density using dmvnorm


library(mvtnorm)
dmvnorm(x, mean, sigma)

x can be a row vector or a matrix

mu1 <- c(1, 2)
sigma1 <- matrix(c(1, .5, .5, 2), 2)
dmvnorm(x = c(0, 0), mean = mu1, sigma = sigma1)

[1] 0.0384
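
For reference, the value above can be reproduced in base R from the multivariate normal density formula, f(x) = exp(-(x - μ)' Σ⁻¹ (x - μ) / 2) / sqrt((2π)^k det(Σ)). This is a sketch for checking the arithmetic, not a description of how mvtnorm computes it internally. The names x0, k, quad, and dens_by_hand are introduced here.

```r
mu1 <- c(1, 2)
sigma1 <- matrix(c(1, 0.5, 0.5, 2), 2)
x0 <- c(0, 0)
k <- length(mu1)

# Quadratic form (x - mu)' Sigma^{-1} (x - mu)
quad <- drop(t(x0 - mu1) %*% solve(sigma1) %*% (x0 - mu1))

# Density formula with normalising constant sqrt((2 * pi)^k * det(Sigma))
dens_by_hand <- exp(-quad / 2) / sqrt((2 * pi)^k * det(sigma1))

round(dens_by_hand, 4)
```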

Density at multiple points using dmvnorm


x <- rbind(c(0, 0), c(1, 1), c(0, 1)); x

     [,1] [,2]
[1,]    0    0
[2,]    1    1
[3,]    0    1

dmvnorm(x = x, mean = mu1, sigma = sigma1)

[1] 0.0384 0.0904 0.0679


Plotting bivariate densities with perspective plot

Steps:

Create grid of x and y coordinates
Calculate density on grid
Convert densities into a matrix
Create perspective plot using persp() function

Code for plotting bivariate densities


# Create grid
d <- expand.grid(seq(-3, 6, length.out = 50), seq(-3, 6, length.out = 50))

# Calculate density on grid
dens1 <- dmvnorm(as.matrix(d), mean = c(1, 2), sigma = matrix(c(1, .5, .5, 2), 2))

# Convert to matrix
dens1 <- matrix(dens1, nrow = 50)

# Use perspective plot
persp(dens1, theta = 80, phi = 30, expand = 0.6, shade = 0.2,
      col = "lightblue", xlab = "x", ylab = "y", zlab = "dens")
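
When persp() is given only the matrix, both horizontal axes run from 0 to 1 rather than over the grid values. A possible variation, assuming the mvtnorm package is installed, passes the grid coordinates explicitly and reuses the same matrix for a contour plot. The names x_seq, y_seq, and dens_mat are illustrative.

```r
library(mvtnorm)

# Grid coordinates kept as separate vectors so they can label the axes
x_seq <- seq(-3, 6, length.out = 50)
y_seq <- seq(-3, 6, length.out = 50)
d <- expand.grid(x = x_seq, y = y_seq)

# Density on the grid; expand.grid varies x fastest, so rows index x
dens1 <- dmvnorm(as.matrix(d), mean = c(1, 2),
                 sigma = matrix(c(1, 0.5, 0.5, 2), 2))
dens_mat <- matrix(dens1, nrow = length(x_seq))

# Perspective plot with axes on the original x and y scale
persp(x_seq, y_seq, dens_mat, theta = 80, phi = 30, expand = 0.6,
      shade = 0.2, col = "lightblue", xlab = "x", ylab = "y", zlab = "dens")

# Contour plot from the same matrix
contour(x_seq, y_seq, dens_mat, xlab = "x", ylab = "y")
```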

Changing viewing angle in perspective plot

persp() with theta = 30, phi = 30
persp() with theta = 80, phi = 10
Let's practice!
Cumulative Distribution and Inverse CDF


When do we need to calculate CDF and inverse CDF?


Normal density with μ = 210 and σ = 10


Area under the curve for x < 200



pnorm(200, mean = 210, sd = 10)
[1] 0.159
What is the x0 such that the cumulative probability at x0 is 0.95?

qnorm(p = 0.95, mean = 210, sd = 10)

[1] 226.45

⇒ 95% of the coffee jars will have less than 226.45 grams of coffee
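
pnorm() and qnorm() are inverses of each other: feeding the output of one into the other returns the starting value. A brief sketch with the coffee-jar numbers used above. The names p200 and x95 are illustrative.

```r
# CDF: probability that a jar holds less than 200 grams
p200 <- pnorm(200, mean = 210, sd = 10)

# Inverse CDF: weight below which 95% of jars fall
x95 <- qnorm(0.95, mean = 210, sd = 10)

# Round trips recover the inputs
pnorm(x95, mean = 210, sd = 10)    # back to 0.95
qnorm(p200, mean = 210, sd = 10)   # back to 200
```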

Cumulative distribution for a bivariate normal

Bivariate CDF at x = 2 and y = 4 for a normal with $\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix}$

Cumulative distribution using pmvnorm

Bivariate CDF at x = 2 and y = 4 for a normal with $\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix}$

mu1 <- c(1, 2)
sigma1 <- matrix(c(1, 0.5, 0.5, 2), 2)

pmvnorm(upper = c(2, 4), mean = mu1, sigma = sigma1)

[1] 0.79
attr(,"error")
[1] 1e-15
attr(,"msg")
[1] "Normal Completion"

Probability between two values using pmvnorm

Probability of 1 < x < 2 and 2 < y < 4

pmvnorm(lower = c(1, 2),
        upper = c(2, 4),
        mean = mu1,
        sigma = sigma1)

[1] 0.163
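
The pmvnorm results can be cross-checked by simulation: draw many points from the same distribution and count the share that falls in each region. The sketch below uses only base R, generating correlated draws through a Cholesky factor instead of rmvnorm(). The seed, the sample size, and all object names are arbitrary choices.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
n <- 1e5

mu1 <- c(1, 2)
sigma1 <- matrix(c(1, 0.5, 0.5, 2), 2)

# Independent standard normals, then transform so that cov(xy) = sigma1
z <- matrix(rnorm(n * 2), ncol = 2)
xy <- sweep(z %*% chol(sigma1), 2, mu1, "+")

# Share of draws with x < 2 and y < 4 (compare with the CDF value 0.79)
below <- xy[, 1] < 2 & xy[, 2] < 4
mean(below)

# Share of draws with 1 < x < 2 and 2 < y < 4 (compare with 0.163)
in_box <- xy[, 1] > 1 & xy[, 1] < 2 & xy[, 2] > 2 & xy[, 2] < 4
mean(in_box)
```

With this many draws the simulated shares should agree with the pmvnorm values to roughly two decimal places.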

Inverse CDF for bivariate normal

Dark red ellipse is the 0.95 quantile



Implementing qmvnorm to calculate quantiles


sigma1 <- diag(2)
sigma1
     [,1] [,2]
[1,]    1    0
[2,]    0    1

qmvnorm(p = 0.95, sigma = sigma1, tail = "both")

$quantile
[1] 2.24

$f.quantile
[1] -1.31e-06

attr(,"message")
[1] "Normal Completion"

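The square with -2.24 < x < 2.24 and -2.24 < y < 2.24 contains 0.95 of the probability (qmvnorm returns a single cutoff applied to every coordinate, so the region is a square rather than a circle)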
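
With a two-sided tail, qmvnorm() solves P(-q < X1 < q, -q < X2 < q) = p, so q is the half-width of a square region. For two independent standard normals this can be checked in base R, along with the radius of the circle that would hold the same probability. A sketch with illustrative names q_square and r_circle.

```r
p <- 0.95

# Square: P(|Z| < q)^2 = p, so pnorm(q) = (1 + sqrt(p)) / 2
q_square <- qnorm((1 + sqrt(p)) / 2)
(2 * pnorm(q_square) - 1)^2          # probability inside the square

# Circle: squared distance from the origin is chi-square with 2 df
r_circle <- sqrt(qchisq(p, df = 2))

round(c(q_square, r_circle), 2)
```

The square half-width matches the 2.24 reported by qmvnorm, while a circle needs a radius of about 2.45 to hold the same probability.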
Let's practice!
Checking normality of multivariate data


Why check normality?


Classical statistical techniques that assume univariate/multivariate normality:
Multivariate regression
Discriminant analysis
Model-based clustering
Principal component analysis (PCA)
Multivariate analysis of variance (MANOVA)

Review: univariate normality tests


qqnorm(iris_raw[, 1])
qqline(iris_raw[, 1])
If the values lie along the reference line, the distribution is close to normal.

Deviation from the line might indicate:

heavier tails
skewness
outliers
clustered data
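
A runnable version of the Q-Q check, assuming iris_raw has the same columns as R's built-in iris data set (the slides do not show how iris_raw is created). The correlation between theoretical and sample quantiles is added here as a rough numeric summary of how closely the points follow a line. It is not part of the slide.

```r
# First measurement column of the built-in iris data (Sepal.Length)
x <- iris[, 1]

# Q-Q plot with reference line; qqnorm() also returns the plotted coordinates
qq <- qqnorm(x, main = "Sepal.Length")
qqline(x)

# Values near 1 mean the points lie close to a straight line
cor(qq$x, qq$y)
```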

qqnorm of all variables


uniPlot(iris_raw[, 1:4])

MVN library multivariate normality test functions

Multivariate normality tests by

Mardia ✓
Henze-Zirkler ✓
Royston

Graphical approaches

chi-square Q-Q ✓
perspective
contour plots

Using mardiaTest to check multivariate normality


mardiaTest(iris_raw[, 1:4])

   Mardia's Multivariate Normality Test
---------------------------------------
   data : iris_raw[, 1:4]

   g1p            : 2.697
   chi.skew       : 67.43
   p.value.skew   : 4.758e-07

   g2p            : 23.74
   z.kurtosis     : -0.2301
   p.value.kurt   : 0.818

   chi.small.skew : 69.33
   p.value.small  : 2.342e-07

   Result         : Data are not multivariate normal.
---------------------------------------
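
The skewness and kurtosis entries in this output follow Mardia's definitions and can be recomputed by hand. A base R sketch, assuming iris_raw holds the same measurements as R's built-in iris data set (the slides do not show how it is loaded) and that the covariance matrix uses the divisor n. All object names are illustrative, and the results should only be expected to land near the printed values if those assumptions hold.

```r
x <- as.matrix(iris[, 1:4])
n <- nrow(x)
p <- ncol(x)

# Centre the data and form the covariance matrix with divisor n
xc <- scale(x, center = TRUE, scale = FALSE)
S <- crossprod(xc) / n

# Matrix of (x_i - xbar)' S^{-1} (x_j - xbar) for all pairs i, j
D <- xc %*% solve(S) %*% t(xc)

# Mardia's skewness and its chi-square test
g1p <- sum(D^3) / n^2
chi_skew <- n * g1p / 6
df_skew <- p * (p + 1) * (p + 2) / 6
p_value_skew <- pchisq(chi_skew, df = df_skew, lower.tail = FALSE)

# Mardia's kurtosis and its normal test
g2p <- sum(diag(D)^2) / n
z_kurt <- (g2p - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)
p_value_kurt <- 2 * pnorm(abs(z_kurt), lower.tail = FALSE)

c(g1p = g1p, chi.skew = chi_skew, g2p = g2p, z.kurtosis = z_kurt)
```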

Using qqplot from mardiaTest to check multivariate normality


mardiaTest(iris_raw[, 1:4], qqplot = TRUE)
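
A chi-square Q-Q plot of this kind can also be built directly: under multivariate normality the squared Mahalanobis distances approximately follow a chi-square distribution with p degrees of freedom. A base R sketch, again assuming iris_raw matches the built-in iris data. The plot produced by the MVN package may differ in its details.

```r
x <- as.matrix(iris[, 1:4])
n <- nrow(x)
p <- ncol(x)

# Squared Mahalanobis distance of each observation from the mean vector
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Theoretical chi-square quantiles with p degrees of freedom
theo <- qchisq(ppoints(n), df = p)

# Points close to the 45-degree line are consistent with normality
plot(theo, sort(d2),
     xlab = "Chi-square quantile",
     ylab = "Squared Mahalanobis distance")
abline(0, 1)
```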

Using hzTest to check multivariate normality


hzTest(iris_raw[, 1:4])

  Henze-Zirkler's Multivariate Normality Test
---------------------------------------------
  data : iris_raw[, 1:4]

  HZ      : 2.333269
  p-value : 0

  Result  : Data are not multivariate normal.
---------------------------------------------

Testing multivariate normality by species


mardiaTest(iris_raw[iris_raw$Species == "setosa", 1:4])

   Mardia's Multivariate Normality Test
--------------------------------------
   g1p            : 3.08
   chi.skew       : 25.7
   p.value.skew   : 0.177

   g2p            : 26.5
   z.kurtosis     : 1.29
   p.value.kurt   : 0.195

   chi.small.skew : 27.85973
   p.value.small  : 0.1127617

   Result         : Data are multivariate normal.
--------------------------------------
Let's make use of the tests for multivariate normality!
