One of the main benefit of using Power BI as a Self-Service BI tool is the capability of Power BI Desktop to build data models.
What that means is instead of forcing every source of data into a single table, which will inevitably require over-simplifications, data loss or even errors, in Power BI it is possible to import multiple datasets and define relationships between them as easily as dragging and dropping the linking column from one to the other.
Here we have 5 different queries, connected to 5 different data sources, that are pulled into a single model. They are then linked by relationships which will have an effect on visualizations and filters.
When starting to use this technique, one of the first and most frustrating issue that may be encountered is the following:
Dragging the key column from one table to the other results in a message saying: “You can’t create a relationship between these two columns because one of the columns must have unique values.”
The cause is simple: Power BI can only create relationships between 2 tables if at least one of the two selected columns hosts a distinct, unique list of the key values. It is to be noted that this is not a limitation specific to Power BI, it is in fact one of the basic design principle of all relational database technologies (the concept of primary and foreign keys). We will see more about that later, but it makes sense that in order to be able to find something, it needs to have a unique address.
We can work around that in 2 ways:
- Edit one the query and make the column a distinct list of the key values by using "Remove Duplicate Rows" on it. The issue with this approach is that we will lose information, as usually the key is duplicated for a good reason (see example below)
- Add an intermediary table made of the list of distinct key values to the model, which will then be linked to both original columns
Let’s illustrate that through an example.
Our data is going to be coming from 2 Excel tables, one containing the employee expenses by month, the other their monthly budget allowance by category:
We can quickly load those datasets into Power BI Desktop via 2 queries, but when we try to build a chart comparing monthly expenses versus budget, pulling from the 2 resulting tables, the numbers just don’t make any sense:
To solve that we know that we need a relationship between those 2 tables, using their common attribute: the employee column.
But of course, at that point we will get an error message when we try to create that relationship:
Now we know why: in both tables the employees are duplicated, repeated either by months or by categories. Neither column contains a distinct list of employees, we need to build it ourselves. We have to take this approach here, as doing otherwise would result in an impacting loss of information (no more categories, or periods).
To do that, we first need to create a new query, appending values from both existing ones: in the query editor, Home tab, select Append Query as New.
We want data coming from both of our tables as we could be in a situation where an employee has an allocated budget but no expense, or the other way around. If we want the comprehensive list, we need to use both data sources to generate it.
We will append as new, and not just append, as we don’t want to lose one of our existing queries as the result of this operation. Just appending will override one of the existing ones.
We will then select the tables that will be used to generate the distinct list of employees, in our case Expenses and Budget:
We will then rename the query as “Employees”, remove all columns other than Employee, and Remove Duplicates to end up with the distinct list of employees coming from both our tables:
This method is not the most efficient one to build the unique list, but it is the simplest to illustrate here. A better approach would be to create reference queries (with disabled load) from our existing ones, make those distinct lists of employees, append them together and make them unique again. It would hold much better again a large volume of rows.
We can then close and apply, and get to the relationships pane to create them:
No error message should bother us anymore!
We usually recommend to “Hide in Report View” the employee columns in the original tables, so that users are not confused by the fact that they are multiple ones, and always use the common one we just created.
Finally, we can build a report with all the benefits of a complete data model with well-defined relationships:
This technique is part of a greater ensemble called dimensional modeling, and here we actually created the premise of an Employee dimension for our data model. Dimensional modeling is a fundamental design practice of Business Intelligence. It allows to build sound data models, both easily understandable by end users and efficient to use in terms of query performance. Some great resources to dig into this approach can be found in the literature section of the associated Wikipedia article.
That is another benefit of Power BI: we’re not only learning about the tool in itself, but also discovering more about data modeling and gaining new data related skills.