How to Perform Multivariate Normality Tests in Python

Last Updated : 20 Feb, 2022

In this article, we will look at how to perform a multivariate normality test in Python.

A multivariate normality test is a statistical test that determines whether a group of variables jointly follows a multivariate normal distribution.

multivariate_normality() function

In this approach, we call the multivariate_normality() function from the pingouin library, which conducts the Henze-Zirkler multivariate normality test on the given data.

Command to install the pingouin library:

pip install pingouin

Syntax: multivariate_normality(X, alpha=0.05)

Parameters:

X: Data matrix of shape (n_samples, n_features).
alpha: Significance level.

Returns:

hz: The Henze-Zirkler test statistic.
pval: P-value.
normal: True if the test does not reject multivariate normality at the given alpha (p-value greater than alpha), otherwise False.
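The examples in this article print the result as HZResults(hz=..., pval=..., normal=...), which is how a named tuple is displayed, so the fields can be read by name or unpacked in order. The sketch below illustrates that access pattern with a stand-in namedtuple that has the same field names. The HZResults class defined here is an assumption for illustration, not pingouin's own object, and the values are copied from the output of Example 1.

```python
from collections import namedtuple

# Stand-in with the same field names as the result printed in this article.
# This is NOT pingouin's own class, only an illustration of the access pattern.
HZResults = namedtuple("HZResults", ["hz", "pval", "normal"])

# Values copied from the output shown in Example 1
result = HZResults(hz=0.7973450591569415, pval=0.8452549483161891, normal=True)

# Access a field by name
print(result.pval)

# Or unpack all fields in the order hz, pval, normal
hz, pval, normal = result
print(hz, pval, normal)
```

Reading fields by name tends to be less error-prone than relying on their position.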

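This is a hypothesis test, and the two hypotheses are as follows:

H0 (null): The variables follow a multivariate normal distribution.
Ha (alternative): The variables do not follow a multivariate normal distribution.

If the p-value is greater than alpha (for example, p > 0.05), we fail to reject H0. If the p-value is less than alpha, we reject H0 in favor of Ha.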
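The decision rule for these hypotheses can be sketched as a small helper. The function name interpret_normality_test is hypothetical and not part of pingouin. It only compares a p-value with alpha, and the two p-values used are the ones reported in the examples of this article.

```python
def interpret_normality_test(pval, alpha=0.05):
    """Return a plain-language decision for a multivariate normality test.

    Hypothetical helper for illustration; it is not part of pingouin.
    """
    if pval < alpha:
        # Small p-value: the data are unlikely under H0
        return "reject H0: evidence against multivariate normality"
    # Otherwise there is not enough evidence to reject H0
    return "fail to reject H0: no evidence against multivariate normality"


# p-values taken from the two examples in this article
print(interpret_normality_test(0.8452549483161891))
print(interpret_normality_test(0.00355552234721754))
```

Note that failing to reject H0 does not prove the data are multivariate normal; it only means the test found no evidence against it.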

Example 1: Multivariate Normality test on the multivariate normal distribution in Python

In this example, we use the multivariate_normality() function from the pingouin library to conduct a multivariate normality test on randomly generated data with 100 data points and 5 normally distributed variables. Because the data is random and unseeded, the exact output will differ from run to run.

Python3

from pingouin import multivariate_normality
import pandas as pd
import numpy as np

# generate 100 observations of 5 normally distributed variables
data = pd.DataFrame({'a': np.random.normal(size=100),
                     'b': np.random.normal(size=100),
                     'c': np.random.normal(size=100),
                     'd': np.random.normal(size=100),
                     'e': np.random.normal(size=100)})

# perform the Multivariate Normality Test
multivariate_normality(data, alpha=.05)

Output:

HZResults(hz=0.7973450591569415, pval=0.8452549483161891, normal=True)

Output Interpretation:

In the above example, the p-value is 0.84, which is greater than the significance level alpha (0.05), so we fail to reject the null hypothesis, i.e. we do not have sufficient evidence to say that the sample does not follow a multivariate normal distribution.
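Since the data in this example is unseeded, the statistic and p-value change on every run. A minimal sketch of a repeatable variant is shown below, assuming NumPy's default_rng generator. The helper name make_normal_sample and the seed value are arbitrary choices for illustration. The resulting array has the (n_samples, n_features) shape described under Parameters, so it should be usable as the X argument in place of the DataFrame.

```python
import numpy as np


def make_normal_sample(n_samples=100, n_features=5, seed=0):
    """Draw independent standard normal columns using a fixed seed.

    Hypothetical helper for illustration; the seed value is arbitrary.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_samples, n_features))


# Same seed, same data: the test result becomes repeatable across runs
X = make_normal_sample()
print(X.shape)  # (100, 5)
```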

Example 2: Multivariate Normality test on data that is not multivariate normal in Python

Python3

from pingouin import multivariate_normality
import pandas as pd
import numpy as np

# generate 100 observations of 5 Poisson distributed variables
data = pd.DataFrame({'a': np.random.poisson(size=100),
                     'b': np.random.poisson(size=100),
                     'c': np.random.poisson(size=100),
                     'd': np.random.poisson(size=100),
                     'e': np.random.poisson(size=100)})

# perform the Multivariate Normality Test
multivariate_normality(data, alpha=.05)

Output:

HZResults(hz=7.4701896678920745, pval=0.00355552234721754, normal=False)

Output Interpretation:

In the above example, the p-value is 0.003, which is less than the significance level alpha (0.05), so we reject the null hypothesis, i.e. we have sufficient evidence to say that the sample does not come from a multivariate normal distribution.