[ Related posts in this series: SQL, M or DAX: When Does it Matter? SQL, M or Dax? – part 2 SQL or M? – SSAS Partitions Using SQL Server Table-Valued Functions (UDFs) SQL or M? – SSAS Partitions in Power Query/M ]
This is a post about a post about a post. Thanks to those of you who are entering comments in the original May 12 post titled SQL, M or DAX? This is a popular topic. And thanks to Adam Saxton for mentioning this post in his Guy in A Cube Weekly Roundup.
This is a HUUUUGE topic and I can tell that I’ve struck a chord with many BI practitioners by bringing it up. Please post your comments and share your ideas. I’m particularly interested in hearing your challenging questions and your thoughts about the pros-and-cons of some less-obvious choices about whether to implement transformations & calculations in SQL, M or DAX.
This week, I have had engaging conversations on this topic while working on a Power BI consulting project for a large municipal court system. As a consultant, I’ve had three weeks of experience with their data and business environment. The internal staff have spent decades negotiating the intricacies and layers upon layers of business process so of course, I want to learn from their experience but I also want to cautiously pursue opportunities to think outside the box. That’s why they hired me.
Tell me if this situation resonates with you… Working with a SQL Server database developer who is really good with T-SQL but fairly new to Power BI & tabular modeling, we’re building a data model and reports sourced from a line-of-business application’s SQL Server database. They’ve been writing reports using some pretty complicated SQL queries embedded in SSRS paginated reports. Every time a user wants a new report, a request is sent to the IT group. A developer picks up the request, writes some gnarly T-SQL query with pre-calculated columns and business rules. Complex reports might take days or weeks of development time. I needed to update a dimension table in the data model and needed a calculated column to differentiate case types. Turns out that it wasn’t a simple addition and his response was “I’ll just send you the SQL for that…you can just paste it”. The dilemma here is that all the complicated business rules had already been resolved using layers of T-SQL common table expressions (CTEs), nested subqueries and CASE statements. It was very well-written SQL and it would take considerable effort to re-engineer the logic into a dimensional tabular model to support general-use reporting. After beginning to nod-off while reading through the layers of SQL script, my initial reaction was to just paste the code and be done with it. After all, someone had already solved this problem, right?
The trade-off by using the existing T-SQL code is that the calculations and business rules are applied at a fixed level of granularity and within a certain business context. The query would need to be rewritten to answer different business questions. If we take the “black box” approach and paste the working and tested SQL script into the Power Query table definition, chances are that we won’t be able to explain the query logic in a few months, after we’ve moved on and forgotten this business problem. If you are trying to create a general-purpose data model to answer yet-to-be-defined questions, then you need to use design patterns that allow developers and users to navigate the model at different levels of grain across different dimension tables, and in different filtering contexts. This isn’t always the right answer but in this case, I am recommending that we do as little data merging, joining and manipulation as possible in the underlying source queries. But, the table mapping between source and data model are not one-to-one. In some cases, two or three source tables are combined using SQL joins, into a flattened and simplified lookup table – containing only the necessary, friendly-named columns and keys, and no unnecessary clutter like CreatedDateTime, ModifiedDateTime and CreatedByUser columns. Use custom columns in M/Power Query to transform the row-level calculated values and DAX measures to perform calculations in aggregate and within filter/slicing/grouping context.
I’d love to hear your thoughts and ideas on this topic.
10 thoughts on “SQL, M or Dax? – part 2”
Haha, definitely resonates – peeling layers of the corporate onion can make anybody cry! And yes, SQL Views before direct Tables are essential for reasons noted – never regretted indeed, and even better through a data warehouse (plus de-norm flattening).
Power Query (including SSAS with Visual Studio alongside SSIS) is excellent even for business logic rules – esp where team more focused there, but SQL Views, Stored Procedures, UDFs etc seem to be the first place if the skills available, as they’re more re-usable and often good (though not always) to keep query optimisation on those servers – SQL continues to serve us well and it’s all coming together now!
Call me traditional but my first thought is that the ideal solution is to have a data warehouse database sat between the source system and Power BI. All that tricky SQL coded logic can be stored once either as views or the results inserted in to tables (so it’s also only run once).
You then have a central source that can feed SSRS, Power BI and any other tool that’s needed and you know your business logic is consistent.
As an added bonus you now have a buffer between the source system and reports. This is particularly important if you’re using vendor supplied source system has the possibility of uncontrollable re-structuring that would break all of your queries.
Barney, I couldn’t agree with you more. Implementing a data warehouse is an ideal solution. In that scenario, it would make a lot of sense to populate fact and dimension tables using conventional techniques – with SQL and ETL packages. The reality is that many organizations aren’t going to down this path for a variety of reasons; whether small-scale projects or reporting needs outside of traditional IT circles. I don’t think that the old guard is dying off but there is definitely a new generation of BI & Analysts who are building BI solutions in very different ways, and with different tools.
Thanks for the response Paul. I was aware as I typed my first comment that potentially I’m out of the ordinary in terms of being in a position to follow something like the ‘ideal’ approach (being in a large organisation, with an SQL Server based source system, sat on enough already paid for hardware and software for there to be room for our work). I fully accept that many people with more restrictions or a smaller budget will skip the bits in the middle of the classical approach.
It will be interesting to see how things develop over time, the loading straight to Power BI reminds me of how I approached things when I was starting out over a decade ago as a report builder\analyst with a network drive full of Access databases. Bitter experience has taught me to put as many fire breaks in to the ETL process as possible – I told my team yesterday “I’ve never regretted adding a layer of views between two steps” and it stands as good advice to me.
I wonder to what extent a lot of people coming to Power BI are new to ETL\BI and so on because it’s a tool that opens up abilities who would previously never had access to them or could never have afforded them. These are people who have probably never experienced to give one memorable example what happens to a set of several hundred reports running a Select * query over a server link when the source system vendor adds an extra xml column to every table (for those not aware SQL server can’t use a table over a link if it has an xml column in it).
It will be interesting to watch things progress over the next few years and see what practices develop as lessons are learnt.
Great blog post I am really enjoying them.
I have to agree that I find you can very quickly build in business logic using Power Query. I always suggest to people to leverage Power Query for business logic, as it is very powerful and is easy to make changes as the business requires.
Not only that but with the new CDS-A the existing M language can be copied and pasted into CDS-A, as well as I can use my existing M knowledge to get data and easily modify it.