Correlation Coefficient in Power BI Using DAX - Ben's Blog
In this post, I will describe what the Pearson correlation coefficient is and how to implement it in Power BI using DAX.
What is the Correlation Coefficient
The correlation coefficient is a statistical measure of the relationship between two variables; the values range between
-1 and 1. A correlation of -1 shows a perfect negative correlation (https://www.investopedia.com/terms/n/negative-
correlation.asp), and a correlation of 1 shows a perfect positive correlation
(https://www.investopedia.com/terms/p/positive-correlation.asp). A correlation of 0.0 shows no linear relationship
between the movement of the two variables.
One very important thing to remember is that when two variables are correlated, it does not mean that one causes the
other. Correlation does not imply causation
(https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation).
The Formula
Unlike in Excel, there’s no built-in DAX correlation function in Power BI (at the time of writing this post).
In Excel, the built-in function is called Correl (https://support.microsoft.com/en-us/office/correl-function-995dcef7-0c0a-4bed-a3fb-239d7b68ca92?ui=en-us&rs=en-us&ad=us). This function takes two arrays as parameters (X and Y).
The Correl formula used in Excel is as follows:
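For reference, the equation given in the Excel documentation for Correl can be written as follows, where x̄ and ȳ are the sample means of the two arrays:

```latex
r = \frac{\sum (x - \bar{x})(y - \bar{y})}
         {\sqrt{\sum (x - \bar{x})^{2} \, \sum (y - \bar{y})^{2}}}
```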
There are actually several ways of writing the Pearson correlation coefficient formula, but to stay consistent with the
formula used in Excel I will stick with the above formula, which is one of the most common anyway.
The Σ (sigma) symbol denotes a sum of multiple terms (x1 + x2 + x3 ...), which is the equivalent of SUM or SUMX in DAX
x̄ (x bar) represents the mean of x
ȳ (y bar) represents the mean of y
√ is the square root; its DAX function is SQRT
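As a quick sanity check of how these symbols combine, here is a small worked example. The three data points are made up for illustration and are not taken from the dataset used in this post: x = (1, 2, 3) and y = (1, 3, 2).

```latex
\bar{x} = 2, \qquad \bar{y} = 2

\sum (x - \bar{x})(y - \bar{y}) = (-1)(-1) + (0)(1) + (1)(0) = 1

\sum (x - \bar{x})^{2} = 1 + 0 + 1 = 2, \qquad
\sum (y - \bar{y})^{2} = 1 + 1 + 0 = 2

r = \frac{1}{\sqrt{2 \cdot 2}} = 0.5
```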
Now let’s see the DAX code for the Pearson correlation formula:
Head_Size (variable x)
Brain_Weight (variable y)
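A minimal sketch of such a measure is shown below. It is one possible way to write the formula, under two assumptions: the table name 'Brain' is hypothetical (replace it with your own table), and neither column contains blank values.

```dax
coeff corr =
-- Assumption: 'Brain' is a hypothetical table holding the two numeric
-- columns Head_Size (x) and Brain_Weight (y), with no blank values.
VAR MeanX = AVERAGE ( Brain[Head_Size] )      -- x bar
VAR MeanY = AVERAGE ( Brain[Brain_Weight] )   -- y bar
-- Numerator: sum of the products of the deviations from the means
VAR SumXY =
    SUMX (
        Brain,
        ( Brain[Head_Size] - MeanX ) * ( Brain[Brain_Weight] - MeanY )
    )
-- Denominator terms: sums of the squared deviations
VAR SumX2 = SUMX ( Brain, ( Brain[Head_Size] - MeanX ) ^ 2 )
VAR SumY2 = SUMX ( Brain, ( Brain[Brain_Weight] - MeanY ) ^ 2 )
RETURN
    -- DIVIDE returns BLANK instead of an error when the denominator is 0
    DIVIDE ( SumXY, SQRT ( SumX2 * SumY2 ) )
```

The means are stored in variables before the SUMX iterations so that they are computed once over the current filter context rather than once per row.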
In Power BI, clicking on the Analytics icon lets us easily add a trend line to visualize the relationship between two
variables on a scatter plot.
However, to show the correlation coefficient on top of the trend line, we still need to create a DAX measure, which I have
called “coeff corr”.
And as the final touch let’s create another measure “coeff correl type” that will return the interpretation of the
correlation so we can display it on top of our visual.
coeff correl type =
SWITCH (
    TRUE (),
    [coeff corr] = -1, "Perfect negative correlation",
    [coeff corr] > -1 && [coeff corr] <= -0.8, "Very strong negative correlation",
    [coeff corr] > -0.8 && [coeff corr] <= -0.6, "Strong negative correlation",
    [coeff corr] > -0.6 && [coeff corr] <= -0.4, "Moderate negative correlation",
    [coeff corr] > -0.4 && [coeff corr] <= -0.2, "Weak negative correlation",
    [coeff corr] > -0.2 && [coeff corr] < 0, "Very weak negative correlation",
    [coeff corr] = 0, "No correlation",
    [coeff corr] > 0 && [coeff corr] < 0.2, "Very weak positive correlation",
    [coeff corr] >= 0.2 && [coeff corr] < 0.4, "Weak positive correlation",
    [coeff corr] >= 0.4 && [coeff corr] < 0.6, "Moderate positive correlation",
    [coeff corr] >= 0.6 && [coeff corr] < 0.8, "Strong positive correlation",
    [coeff corr] >= 0.8 && [coeff corr] < 1, "Very strong positive correlation",
    [coeff corr] = 1, "Perfect positive correlation"
)
And this is how things look when we concatenate our “coeff corr” with the “coeff correl type” measure and add
them on top of our scatter plot.
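The concatenation itself can be done with a third measure along these lines. This is only a sketch: the measure name “coeff title” is hypothetical, and the wording and the three-decimal format are assumptions based on the chart title.

```dax
coeff title =
-- Hypothetical measure name; combines the interpretation and the
-- coefficient into a single string for the visual title.
"Correlation between Head Size and Brain Weight "
    & [coeff correl type]
    & " r="
    & FORMAT ( [coeff corr], "0.000" )  -- show three decimal places
```

One way to use it is to bind the measure to the visual's title through conditional formatting, so the text updates whenever the filters change.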
[Figure: Power BI scatter plot with trend line, titled “Correlation between Head Size and Brain Weight Strong positive correlation r=0.800”; y-axis: Brain_Weight.]
Conclusion
This is just another post in the series on implementing statistical functions in DAX. You can read some other similar
posts on my blog, such as AB testing in Power BI (https://datakuity.com/2020/09/29/ab-testing-with-power-bi/) or
Poisson distribution in Power BI (https://datakuity.com/2021/09/15/poisson-distribution-in-power-bi-with-dax/).
Power BI is still lacking some advanced statistical functions compared to Excel but with DAX we can write almost any
existing Excel function!
Hi Oded,
For this example, there’s nothing in Details. I put only two variables in my chart (X –> Head size and Y –>
Brain Weight), and the dataset is very small (a few hundred rows).
However, I ran it on a bigger model with more than 100M rows, and it takes around 2.5 sec to compute on
my PC, which is quite old (with 8 CPUs), so on a good server it should be just fine.
Of course, as this measure is a bit complex, the more cores a machine has, the quicker it will be computed.
Mike says:
https://community.powerbi.com/t5/DAX-Commands-and-Tips/Counting-Columns-for-a-Dynamic-Measure/m-p/2341557#M59106
Bill says:
So the bigger the sample, the better. Here is a link that goes into more detail; as you will see, the formula
to evaluate the significance of “r” uses the size of the sample “n”:
https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Book%3A_Introductory_Statistics_(OpenStax)/1
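For context, the test statistic commonly used to judge whether “r” is significantly different from zero is shown below. I am assuming this is the formula the linked page refers to. It follows a t-distribution with n - 2 degrees of freedom, so for the same “r” a larger sample gives a larger t value.

```latex
t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^{2}}}, \qquad \text{df} = n - 2
```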