Composite Key Relationships – A better way using Power Query

Edit: If you are using DirectQuery on both tables, then you probably want to use the CombineValues DAX function for performance benefits. Click on the link below for details:

https://www.sqlbi.com/articles/using-combinevalues-to-optimize-directquery-performance/

The problem

In Power BI (relationships view) you create relationships by connecting a column in one table to a column in another table. Sometimes however, you will want to connect using multiple columns. Although Power BI does not natively support this, you can get around this quite easily by concatenating the columns into a new column in each table.

The simple and most common way of concatenating columns

In tables view, use simple DAX to create a calculated column. E.g. SalesIDCountry = SalesID&Country

A cleaner solution (skip to the bottom for the pros/cons)

Note that this solution only applies if the column names you are concatenating are identical in each table. For example: you need to connect “Zip” & “Country” in Sales table to “Zip” & “Country” in Region table. Personally, I like this practice of having identical column names across the tables and many consider it best practice. However, it is still debated and you might find it unsuitable for your data model.

Use the Power Query function that I created below:

(Columns as list, NewColName as text)=>
(Table as table, RemoveCols as logical)=>
let 
    AddCol = Table.AddColumn(Table
                            ,NewColName
                            ,each Text.Combine(List.Transform(Record.ToList(Record.SelectFields(_,Columns)),Text.From),";")
                            ,type nullable text)
   ,MaybeRemCols = if RemoveCols then Table.RemoveColumns(AddCol, Columns) else AddCol
in
    MaybeRemCols

How to turn above code into a function

To create this Power Query function you need to:

  • Create a blank query
  • Whilst that query is selected in the query pane on the left side, click on the advanced editor button in the ribbon to open a new window. Paste the above code in and then click “Done” to close the window. The query is now a function and you see this by “fx” symbol in the query pane that is on the left hand side.
  • Rename the function (not necessary, but definitely best practice). (Right click on the function in the query pane). Let’s call it “fn_AddCompositeKeyMain”.

How this function works

This is a function that creates a function which the two tables can then use. Don’t worry it’s super simple and I present a quick scenario below.

So let’s say you have 2 queries: Table A and Table B, both tables having columns “X” and “Y” which you wish to use for the creation of a relationship.

The first thing we do is create the function that can be applied to both Tables A & B.

We do this by supplying parameters (a list and a text) to “fn_AddCompositeKeyMain”. We can’t use the GUI (graphical user interface) unfortunately as the GUI assumes that the list parameter needs to be a column of a table. That’s not a problem though. So we create a blank query, go into the advanced editor and fill out the function with the required parameters.

All I need to write in the advanced editor is:

fn_AddCompositeKeyMain({"X","Y"},"XY")

We now name this function:  “fn_AddCompositeKeyXY”

So, “fn_AddCompositeKeyXY” takes two parameters: a table value and a binary value. It transforms the table by inserting a new column named “XY” that concatenates columns “X” and “Y” together.  Again, I do not invoke this function using the GUI; instead I add it as a last step the table/query.  So I click on Table A or B and make sure that the last step of that query is selected. Then I click on “fx” symbol in the formula bar to add a new last step.

I would then write something like “ = fn_AddCompositeKeyXY(PreviousStep, false) “ (without the quotes) but instead of having “PreviousStep” you would have the name of what was the last step.  You’ll also notice that there is a second parameter that can either be true or false. This allows you to delete the columns that you have used for concatenation. This can be quite useful for reducing the size of one or both of the tables.  So, if the second parameter is true then the columns are deleted but if the second parameter is false then no columns are deleted.

What are the advantages of doing it this way

  • You define the names of the columns on which you are concatenating once rather than twice. This makes it easier and less error prone to create and also later change.
  • You also define the name of the new concatenated column only once rather than twice.
  • And by having them the same name, not only do you enforce standardisation, a relationship will be automatically generated in the relationships view after hitting “Close & Apply”.
  • You can easily and efficiently (and in the same step!) delete the columns that were used in concatenation. I will often do this to the table on the many side of the relationship.  This will speed up, simplify and reduce the size of a data model.

What the disadvantages

  • The columns which you are concatenating (the composite key columns) need to have their column names the same in both tables
  • Although it leads to cleaner code, another person (less skilled in power query) might have a hard time understanding what you have done.
  • If the composite key is wholly numerical then it may be more efficient to generate a number instead of a text value which is what this function does. You could create a similar function for numbers but I leave this to the reader.
One Comment

Add a Comment

Your email address will not be published. Required fields are marked *

Follow this blog!

Get the latest posts delivered to your mailbox: