BEN'S BLOG

Azure – Data – Power BI – R – Statistics


(https://datakuity.com/)

CORRELATION COEFFICIENT IN POWER BI USING DAX

2021-10-29 (https://datakuity.com/2021/10/29/correlation-coefficient-in-power-bi-using-dax/), Ben (https://datakuity.com/author/admin/)

In this post, I will describe what the Pearson correlation coefficient is and how to implement it in Power BI using DAX.

What is the Correlation Coefficient
The correlation coefficient is a statistical measure of the strength of the linear relationship between two variables; its
values range between -1 and 1. A correlation of -1 shows a perfect negative correlation
(https://www.investopedia.com/terms/n/negative-correlation.asp), and a correlation of 1 shows a perfect positive correlation
(https://www.investopedia.com/terms/p/positive-correlation.asp). A correlation of 0.0 shows no linear relationship
between the movement of the two variables.

How to interpret the Correlation Coefficient

To go into a bit more detail, we can interpret the correlation coefficient as follows:

-1: Perfect negative correlation
Between >-1 and <=-0.8: Very strong negative correlation
Between >-0.8 and <=-0.6: Strong negative correlation
Between >-0.6 and <=-0.4: Moderate negative correlation
Between >-0.4 and <=-0.2: Weak negative correlation
Between >-0.2 and <0: Very weak negative correlation
0: No correlation
Between >0 and <0.2: Very weak positive correlation
Between >=0.2 and <0.4: Weak positive correlation
Between >=0.4 and <0.6: Moderate positive correlation
Between >=0.6 and <0.8: Strong positive correlation
Between >=0.8 and <1: Very strong positive correlation
1: Perfect positive correlation
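
To make the boundaries of these bands explicit, here is a minimal Python sketch of the same mapping. The function name correlation_type is my own choice for illustration, and the sketch relies on the bands being symmetric around 0, as they are in the list above.

```python
def correlation_type(r: float) -> str:
    """Return the interpretation label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "No correlation"
    sign = "positive" if r > 0 else "negative"
    strength = abs(r)  # the bands are symmetric, so only the magnitude matters
    if strength == 1:
        return f"Perfect {sign} correlation"
    if strength >= 0.8:
        return f"Very strong {sign} correlation"
    if strength >= 0.6:
        return f"Strong {sign} correlation"
    if strength >= 0.4:
        return f"Moderate {sign} correlation"
    if strength >= 0.2:
        return f"Weak {sign} correlation"
    return f"Very weak {sign} correlation"


print(correlation_type(-0.5))
```

Note that a value exactly on a boundary, such as 0.8 or -0.8, falls into the stronger band.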

One very important thing to remember is that when two variables are correlated, it does not mean that one causes the
other: correlation does not imply causation (https://en.wikipedia.org/wiki/Correlation_does_not_imply_causation).
The Formula
Unlike Excel, DAX has no built-in correlation function in Power BI (at the time of writing this post).
In Excel, the built-in function is called CORREL
(https://support.microsoft.com/en-us/office/correl-function-995dcef7-0c0a-4bed-a3fb-239d7b68ca92?ui=en-us&rs=en-us&ad=us).
It takes two arrays as parameters (X and Y).
The CORREL formula used in Excel is as follows:
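
In LaTeX notation, the formula reads as below. This is my transcription of the standard Pearson formula, and it matches the symbol breakdown and the DAX measure later in this post; x̄ and ȳ are the means of x and y.

```latex
r = \frac{\sum (x - \bar{x})(y - \bar{y})}
         {\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}
```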

There are actually several ways of writing the Pearson correlation coefficient formula, but to stay consistent with
Excel I will stick with the above formula, which is one of the most common anyway.

Calculate the Correlation Coefficient with DAX


Now that we have seen the formula, we need to translate it into DAX.
So let’s break down the formula:

The Σ (sigma) symbol denotes a sum of multiple terms (x1 + x2 + x3 + ...), which is the equivalent of SUM or SUMX
x̄ (x bar, named __muX in the code) represents the mean of x
ȳ (y bar, named __muY in the code) represents the mean of y
√ is the square root; its DAX function is SQRT

Now let’s see the DAX code for the Pearson correlation formula:

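coeff corr =
// x̄: mean of x
VAR __muX = CALCULATE ( AVERAGE ( YourTable[x] ) )
// ȳ: mean of y
VAR __muY = CALCULATE ( AVERAGE ( YourTable[y] ) )
// numerator: sum of the products of the deviations from the means
VAR __numerator =
    SUMX ( YourTable, ( YourTable[x] - __muX ) * ( YourTable[y] - __muY ) )
// denominator: square root of the product of the two sums of squared deviations
VAR __denominator =
    SQRT (
        SUMX ( YourTable, ( YourTable[x] - __muX ) ^ 2 )
            * SUMX ( YourTable, ( YourTable[y] - __muY ) ^ 2 )
    )
RETURN
    DIVIDE ( __numerator, __denominator )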
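
If you want to sanity-check the measure outside Power BI, the same steps can be reproduced in a few lines of Python. This is only a sketch: the function name pearson_r and the sample values are made up for illustration, and returning None when the denominator is zero is my stand-in for the BLANK that DIVIDE returns in that case.

```python
from math import sqrt
from typing import Optional, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson r, computed step by step like the DAX measure."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mu_x = sum(xs) / len(xs)  # __muX
    mu_y = sum(ys) / len(ys)  # __muY
    # __numerator: sum of the products of the deviations from the means
    numerator = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))
    # __denominator: square root of the product of the sums of squared deviations
    denominator = sqrt(
        sum((x - mu_x) ** 2 for x in xs) * sum((y - mu_y) ** 2 for y in ys)
    )
    # DAX DIVIDE returns BLANK on division by zero; None plays that role here.
    return numerator / denominator if denominator else None


# Made-up sample values, not the head size / brain weight data used in this post.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
print(pearson_r(xs, ys))
```

Feeding the function the same two columns as your Power BI table should give the same value as the measure, up to floating-point rounding.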

Let’s visualise it in Power BI


Let’s now build a small report that shows the correlation between the head size (x) and the brain weight (y).
The data are as follows:

Head_Size (variable x)
Brain_Weight (variable y)
In Power BI when clicking on the Analytics icon we can easily add a trend line to visualize the relationship between two
variables on a scatter plot.

However, to show the correlation coefficient on top of the trend line we still need to create a DAX measure that I have
called “coeff corr”.

And as the final touch let’s create another measure “coeff correl type” that will return the interpretation of the
correlation so we can display it on top of our visual.
coeff correl type =
SWITCH (
    TRUE (),
    [coeff corr] = -1, "Perfect negative correlation",
    [coeff corr] > -1 && [coeff corr] <= -0.8, "Very strong negative correlation",
    [coeff corr] > -0.8 && [coeff corr] <= -0.6, "Strong negative correlation",
    [coeff corr] > -0.6 && [coeff corr] <= -0.4, "Moderate negative correlation",
    [coeff corr] > -0.4 && [coeff corr] <= -0.2, "Weak negative correlation",
    [coeff corr] > -0.2 && [coeff corr] < 0, "Very weak negative correlation",
    [coeff corr] = 0, "No correlation",
    [coeff corr] > 0 && [coeff corr] < 0.2, "Very weak positive correlation",
    [coeff corr] >= 0.2 && [coeff corr] < 0.4, "Weak positive correlation",
    [coeff corr] >= 0.4 && [coeff corr] < 0.6, "Moderate positive correlation",
    [coeff corr] >= 0.6 && [coeff corr] < 0.8, "Strong positive correlation",
    [coeff corr] >= 0.8 && [coeff corr] < 1, "Very strong positive correlation",
    [coeff corr] = 1, "Perfect positive correlation"
)
And this is what it looks like when we concatenate our “coeff corr” measure with the “coeff correl type” measure and
add them on top of our scatter plot.
[Figure: Power BI report with Head Size and Brain Weight range selectors (2.720, 4.747, 955, 1.635) and a scatter plot of
Brain_Weight (y-axis) against Head_Size (x-axis) with a trend line, titled “Correlation between Head Size and Brain
Weight: Strong positive correlation r=0.800”.]

Conclusion
This is just another post in my series on implementing statistical functions in DAX. You can read similar posts on my
blog, such as AB testing in Power BI (https://datakuity.com/2020/09/29/ab-testing-with-power-bi/) or
Poisson distribution in Power BI (https://datakuity.com/2021/09/15/poisson-distribution-in-power-bi-with-dax/).

Power BI is still lacking some advanced statistical functions compared to Excel, but with DAX we can write almost any
existing Excel function!

Categories: DAX (https://datakuity.com/category/dax/), Power BI (https://datakuity.com/category/power-bi/), Statistics
(https://datakuity.com/category/statistics/)
Tags: Correlation (https://datakuity.com/tag/correlation/), Dax (https://datakuity.com/tag/dax/), Statistics
(https://datakuity.com/tag/statistics/)
13 thoughts on “Correlation Coefficient in Power BI using DAX”

Oded Dror says:

2021-10-30 at 4:53 pm (https://datakuity.com/2021/10/29/correlation-coefficient-in-power-bi-using-dax/#comment-1062)

Ben,
What did you put in the details of your chart?
What if we have a very large table? Can DAX not handle the calculation on a very large table?

Ben (https://datakuity.com) says:

2021-10-31 at 10:38 pm (https://datakuity.com/2021/10/29/correlation-coefficient-in-power-bi-using-dax/#comment-1063)

Hi Oded,
For this example, there’s nothing in details. I put only two variables in my chart, X -> Head size and Y ->
Brain Weight, and the dataset is very small (a few hundred rows).
However, I ran it on a bigger model with more than 100M rows and it takes around 2.5 sec to compute on
my PC, which is quite old (with 8 CPUs), so on a good server it should be just fine.
Of course, as this measure is a bit complex, the more cores a machine has, the quicker it will be computed.

Mike says:

2022-02-17 at 10:20 am (https://datakuity.com/2021/10/29/correlation-coefficient-in-power-bi-using-dax/#comment-1189)

Thanks for this. I adapted this somewhat to add to my own visuals. This also inspired me to add a dynamic
calculation of chi-square to a report with a bar graph and frequency table. Pretty cool! You can see what I
did by following the link below. Will be reading some of your other posts for more ideas.

https://community.powerbi.com/t5/DAX-Commands-and-Tips/Counting-Columns-for-a-Dynamic-Measure/m-p/2341557#M59106

Ben (https://datakuity.com) says:

2022-02-24 at 8:13 pm (https://datakuity.com/2021/10/29/correlation-coefficient-in-power-bi-using-dax/#comment-1200)

Hi Mike, excellent, thanks! I’ll definitely have a look at your post.
I should soon write a similar post for the chi-square (chi2).

Peter says:

2022-02-28 at 1:49 am (https://datakuity.com/2021/10/29/correlation-coefficient-in-power-bi-using-dax/#comment-1206)

This is great. Can you make the pbix file available for download?

Ben (https://datakuity.com) says:

2022-03-29 at 10:07 pm (https://datakuity.com/2021/10/29/correlation-coefficient-in-power-bi-using-dax/#comment-1238)

Hi Peter, sure, here is the link to download the pbix file:
https://github.com/f-benoit/PowerBI/raw/main/CoeffCorrelation/correl.pbix

Marius says:

2022-03-22 at 4:27 pm (https://datakuity.com/2021/10/29/correlation-coefficient-in-power-bi-using-dax/#comment-1232)

Ben,
I recently needed to find a correlation, and ended up with almost the same measure as you.
However, I’m currently not using it because I’m not sure of the results.
Maybe you can help!
For both averages, we use a simple “__muX =calculate(AVERAGE(YourTable[x]))”. However, in the numerator
VAR, we use a SUMX to calculate “[x]-__muX”.
Isn’t the result wrong? Using a SUMX, the data will be treated row by row.
CALCULATE(AVERAGE(YourTable[x])) would then be applied row by row. With this in mind, it would mean
__muX = [x], therefore [x] - __muX = 0.
I understand that this could very well mean that there is no correlation at all, but these calculations also
seem a bit weird, so I am wondering.
Does this make any sense to you at all?

Ben (https://datakuity.com) says:

2022-03-29 at 10:02 pm (https://datakuity.com/2021/10/29/correlation-coefficient-in-power-bi-using-dax/#comment-1237)

Hi Marius,
If you type “CALCULATE(AVERAGE(YourTable[x]))” inside the SUMX, you’re right: “it would mean __muX =
[x], therefore [x] - __muX = 0”, because the average would be calculated for each row in its row context.
However, in my DAX expression, I calculated the average inside a variable, so the average is only
evaluated in the context of the variable definition, not where it is used.
So as you go through each row inside the SUMX, the result of the variable __muX is never recalculated.
Hope that makes sense?

Ben
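
As a rough analogy only (Python has no row context or context transition, so this is not how DAX works internally), the difference between the two placements can be sketched like this. The values and the helper mean are made up for illustration.

```python
xs = [1.0, 2.0, 6.0]  # made-up values


def mean(values):
    return sum(values) / len(values)


# Like the VAR: the mean is computed once, over all rows, before iterating.
mu_x = mean(xs)
sum_sq_with_variable = sum((x - mu_x) ** 2 for x in xs)

# Like CALCULATE(AVERAGE(...)) written inside SUMX: the "average" only sees
# the current row, so it equals x and every deviation is zero.
sum_sq_per_row = sum((x - mean([x])) ** 2 for x in xs)

print(sum_sq_with_variable, sum_sq_per_row)
```

The first sum is non-zero because every row is compared with the overall mean; the second collapses to zero, which is the effect Marius was worried about.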

Charle Londoño says:

2022-05-19 at 10:31 pm (https://datakuity.com/2021/10/29/correlation-coefficient-in-power-bi-using-dax/#comment-1342)

Hello,
I have the following exercise: I need to calculate this statistic for several variables. To build the correlation
chart I use two different tables, so the correlation changes depending on the pair: X1 (table 1) with X2 (table 2),
or X1 (table 1) with X3 (table 2). How could I calculate the covariance when the variables are related
this way?

Ben (https://datakuity.com) says:

2022-06-05 at 7:55 pm (https://datakuity.com/2021/10/29/correlation-coefficient-in-power-bi-using-dax/#comment-1382)

Hi Charle,
I’m not sure I understand your question. Even if it is possible to calculate a covariance for more than
2 variables, it would be very hard to interpret, since we would not be able to distinguish how X2 and X3
contribute to the covariance result.
However, if you want to calculate the covariance for each combination (X1 with X2, X1 with X3, ...), I would
either create one measure for each combination or create a dynamic measure.
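
Outside Power BI, the “one result per combination” idea can be sketched in Python as below, here with the correlation coefficient rather than the covariance. The column names X1, X2, X3, their values, and both function names are hypothetical, and the sketch assumes the columns are already aligned row by row.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Optional, Tuple


def pearson_r(xs: List[float], ys: List[float]) -> Optional[float]:
    """Pearson r for two equally long columns; None if a column is constant."""
    mu_x = sum(xs) / len(xs)
    mu_y = sum(ys) / len(ys)
    numerator = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))
    denominator = sqrt(
        sum((x - mu_x) ** 2 for x in xs) * sum((y - mu_y) ** 2 for y in ys)
    )
    return numerator / denominator if denominator else None


def pairwise_correlations(
    columns: Dict[str, List[float]]
) -> Dict[Tuple[str, str], Optional[float]]:
    """One correlation coefficient per unordered pair of columns."""
    return {
        (a, b): pearson_r(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Hypothetical, already aligned columns.
data = {
    "X1": [1.0, 2.0, 3.0, 4.0],
    "X2": [2.0, 4.0, 6.0, 8.0],
    "X3": [4.0, 3.0, 2.0, 1.0],
}
for pair, r in pairwise_correlations(data).items():
    print(pair, r)
```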

Bill says:

2022-06-09 at 7:59 pm (https://datakuity.com/2021/10/29/correlation-coefficient-in-power-bi-using-dax/#comment-1396)

How come the sample size or population size (n) doesn’t come into play here? I’m beginning with statistics
and don’t understand why it is missing.

Ben (https://datakuity.com) says:

2022-06-21 at 9:27 pm (https://datakuity.com/2021/10/29/correlation-coefficient-in-power-bi-using-dax/#comment-1422)

Hi Bill,
The sample size (n) comes into play when we need to test the significance of the correlation
coefficient, which is not covered in my post.
Basically, even if the correlation coefficient “r” shows a strong relationship between two variables, let’s say
0.8, it does not mean that we can rely on it to make a prediction; we first have to test whether this result is
significant.

So the bigger the sample, the better. Here is a link that goes into more detail; as you will see, the formula
to evaluate the significance of “r” uses the sample size “n”:
https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Book%3A_Introductory_Statistics_(OpenStax)/1
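
To show where n enters, here is a minimal Python sketch of the usual t statistic for testing a Pearson correlation, t = r·√(n−2) / √(1−r²), with n−2 degrees of freedom. I am assuming this is the test the linked chapter describes; the function name and the sample size of 27 are made up for illustration.

```python
from math import sqrt


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for H0: no linear correlation, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("at least 3 observations are needed")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Hypothetical example: r = 0.8 observed on 27 rows.
print(correlation_t_statistic(0.8, 27))
```

The resulting value is then compared with the critical value of Student’s t distribution for n−2 degrees of freedom. For the same r, a larger n gives a larger t, which is why a bigger sample makes the result easier to trust.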

Pingback: Multiple Linear Regression in Power BI - Ben's Blog
(https://datakuity.com/2023/03/12/multiple-linear-regression-in-power-bi/)
