This post demonstrates how the order of steps added to a query can make a big performance difference and drastically effect the number of steps generated by the designer. I’ll demonstrate how to use the new query Diagnostics tools to compare and understand query performance.
The Power Query Editor for Power BI simplifies data transformation processing by generating query steps for each action you perform in the query designer. This whiteboard diagram shows the high-level flow of information through a Power BI solution. Every query has a source (“SRC” in the diagram) followed by a connection. The query consists of a series of transformations (“XForm”) prior to populating a table in the data model.
These steps are defined in “M” code which is executed when the data model is processed. In simple projects, all the query steps are automatically generated. The order with which you add these steps makes a difference. Not only does the order that you add steps to a query help organize and manage a query but it can have a significant impact on performance and the computer resources needed for a query to run. A little planning and iterate clean-up as you work through the design process can make a big difference.
The two queries shown here have exactly the same outcome and they were both created just by choosing transformations from the query designer menus. The only difference is the order that I chose the options.
Introducing Query Diagnostics
To understand how query steps are being processed an to compare two test queries, I use the new Query Diagnostics features on the Tool ribbon. In this simple test, this is really easy.
I select a query in the designer, start the diagnostics, perform a refresh and then stop the diagnostics. This generates two new queries with the diagnostics results.
I then choose the other query and repeat the same steps to get diagnostics for that query.
There is a boatload of useful information in the diagnostic results query but it’s way more than we need.
The most important information for this test is the Exclusive Duration column. For this test, I all need is to summarize this column. I did the same thing with both diagnostic queries and then compared the two results. Appending these two summarized diagnostic query results clearly shows the difference in performance:
This video demonstration is an exaggerated yet effective example of working through the process of importing a simple Excel worksheet and then transforming a series of columns. In the first example, I rename and change the data type of each column, one-at-a-time. In the second example, I consolidate the steps; renaming each column and then change the column data types. How does this simple change to my approach affect the generated query and execution performance?