[ Related posts in this series: SQL, M or DAX: When Does it Matter? SQL, M or Dax? – part 2 SQL or M? – SSAS Partitions Using SQL Server Table-Valued Functions (UDFs) SQL or M? – SSAS Partitions in Power Query/M ]
This is a post about a post about a post. Thanks to those of you who are entering comments in the original May 12 post titled SQL, M or DAX? This is a popular topic. And thanks to Adam Saxton for mentioning this post in his Guy in A Cube Weekly Roundup.
This is a HUUUUGE topic and I can tell that I’ve struck a chord with many BI practitioners by bringing it up. Please post your comments and share your ideas. I’m particularly interested in hearing your challenging questions and your thoughts about the pros-and-cons of some less-obvious choices about whether to implement transformations & calculations in SQL, M or DAX.
This week, I have had engaging conversations on this topic while working on a Power BI consulting project for a large municipal court system. As a consultant, I’ve had three weeks of experience with their data and business environment. The internal staff have spent decades negotiating the intricacies and layers upon layers of business process so of course, I want to learn from their experience but I also want to cautiously pursue opportunities to think outside the box. That’s why they hired me.
Tell me if this situation resonates with you… Working with a SQL Server database developer who is really good with T-SQL but fairly new to Power BI & tabular modeling, we’re building a data model and reports sourced from a line-of-business application’s SQL Server database. They’ve been writing reports using some pretty complicated SQL queries embedded in SSRS paginated reports. Every time a user wants a new report, a request is sent to the IT group. A developer picks up the request, writes some gnarly T-SQL query with pre-calculated columns and business rules. Complex reports might take days or weeks of development time. I needed to update a dimension table in the data model and needed a calculated column to differentiate case types. Turns out that it wasn’t a simple addition and his response was “I’ll just send you the SQL for that…you can just paste it”. The dilemma here is that all the complicated business rules had already been resolved using layers of T-SQL common table expressions (CTEs), nested subqueries and CASE statements. It was very well-written SQL and it would take considerable effort to re-engineer the logic into a dimensional tabular model to support general-use reporting. After beginning to nod-off while reading through the layers of SQL script, my initial reaction was to just paste the code and be done with it. After all, someone had already solved this problem, right?
The trade-off by using the existing T-SQL code is that the calculations and business rules are applied at a fixed level of granularity and within a certain business context. The query would need to be rewritten to answer different business questions. If we take the “black box” approach and paste the working and tested SQL script into the Power Query table definition, chances are that we won’t be able to explain the query logic in a few months, after we’ve moved on and forgotten this business problem. If you are trying to create a general-purpose data model to answer yet-to-be-defined questions, then you need to use design patterns that allow developers and users to navigate the model at different levels of grain across different dimension tables, and in different filtering contexts. This isn’t always the right answer but in this case, I am recommending that we do as little data merging, joining and manipulation as possible in the underlying source queries. But, the table mapping between source and data model are not one-to-one. In some cases, two or three source tables are combined using SQL joins, into a flattened and simplified lookup table – containing only the necessary, friendly-named columns and keys, and no unnecessary clutter like CreatedDateTime, ModifiedDateTime and CreatedByUser columns. Use custom columns in M/Power Query to transform the row-level calculated values and DAX measures to perform calculations in aggregate and within filter/slicing/grouping context.
I’d love to hear your thoughts and ideas on this topic.