SQL, M or DAX: When Does it Matter?

Column-based calculations are part of every BI project. Some of the most common examples include building a street address column from individual fields; concatenating a person’s full name from First, Middle and Last Name fields; or creating a location string from City, State and Country fields. More complex examples might require a lookup or join operation to get a reference value used in a calculation that is then stored as a column on each entity record. Keep in mind that we are strictly talking about calculated column values that are stored for each row, not dynamic calculations that run in the context of filters and slicers. Those are measures, and that is a separate topic.
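To make the idea of a stored, row-level calculation concrete, here is a minimal sketch in Python rather than in SQL, M or DAX. The `full_name` helper, the sample rows and the `FullName` column name are all made up for illustration. The only point is that the value is computed once per row and saved with the row, rather than evaluated at query time the way a measure is.

```python
from typing import Optional

def full_name(first: str, middle: Optional[str], last: str) -> str:
    """Join the name parts with spaces, skipping any part that is missing or blank."""
    parts = [first, middle, last]
    return " ".join(p.strip() for p in parts if p and p.strip())

# Hypothetical sample rows standing in for a customer table.
customers = [
    {"First": "Ada", "Middle": None, "Last": "Lovelace"},
    {"First": "Grace", "Middle": "Brewster", "Last": "Hopper"},
]

# The calculated value is stored on each row, like a calculated column.
for row in customers:
    row["FullName"] = full_name(row["First"], row["Middle"], row["Last"])
```

Whichever tool does the work, the same small decisions come up, such as what to do with a missing middle name, which is one reason to keep this logic in a single, known place.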

It Depends, or It Matters (one or the other)

If the data source is a relational database that supports queries, should you perform these calculations in SQL, Power Query “M”, or in a DAX calculated column? The standard tongue-in-cheek answer from a consultant is usually “it depends”. That was the answer I grew up with, but apparently popular language has changed in the last generation. My kids, who are all now young adults, say “it matters”. Back in the day, if I said, “Hey, Dad, can we get ice cream on the way to the store?” he would say, “It depends on whether you get your chores done.” My kids would say, “It matters whether I get back from the beach on time.” So, it either depends or it matters, I guess.

Self-service BI is all about having the freedom to create reports that make an impact and bring important value to business users and leaders. When importing, shaping and modeling data, getting the simple and mundane tasks out of the way leaves time and energy to move on to more important things. If you can just get the core table structures in place, with unique keys, calculated columns, and numeric columns for summaries and aggregate measures, you can design the more impactful bits of the solution to support the report design.

Many BI projects start out the same way, with aspirations to import data from several different sources and to work out the complexities of cleansing and matching records in various tables, creating a nice uniform data model used to build all kinds of beautiful dashboards and interactive reports. Our optimism about making quick progress at the beginning of the project is often squelched when we realize that the data source for a lookup table isn’t reliable, and that the system of record is an application controlled by a different business group in some remote corner of the organization. The data is in a different format, access is restricted and the person in charge of managing it is on extended leave. We get caught up in the complications of just getting essential data into the model and then deliver far less than expected. I can’t tell you how common this scenario is – especially in larger projects.

For calculated columns that end up stored in a data model table, there is rarely a difference in performance, storage or report query speed based on the technique used to calculate the column value. In cases where there is a technical advantage, the decision should be clear – use the best method that is feasible. In the majority of cases where there is no strong technical argument for one method or the other, use the method that simplifies development and maintenance, and offers more control.

You should have a standard method for managing calculated columns, so you know how to maintain them down the road. This might seem trivial, so why does it matter so much?

The data model schema is the foundation for your reporting solution, and making changes after the rest of the solution is designed can be catastrophic if you don’t plan for and manage future changes. A semantic data model is like a house of cards. Deleting, renaming or changing the data type of a column could break every calculation and report visual that references that column. Whether you should create these calculated values in a source query (using hand-written SQL or a database object like a view or user-defined function), in Power Query, or as a calculated column using DAX will depend on who maintains the Power BI or SSAS model and who should manage the design in the future.
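As a rough illustration of why a rename is risky, the sketch below scans a handful of DAX-style expressions for `Table[Column]` references and lists everything that would break if one column changed. The `dependents` dictionary, the object names and the regular expression are all assumptions invented for this example. A real model would need its metadata read from the model itself, and this simple pattern only catches fully qualified column references.

```python
import re
from typing import List, Set

# Hypothetical model objects and the expressions they depend on.
dependents = {
    "Measure: Total Sales": "SUM(Sales[Amount])",
    "Column: Customer[FullName]": 'Customer[FirstName] & " " & Customer[LastName]',
    "Visual: Sales by Customer": "Customer[LastName], [Total Sales]",
}

def references(expression: str) -> Set[str]:
    """Return the fully qualified Table[Column] references found in an expression."""
    return set(re.findall(r"\b\w+\[[^\]]+\]", expression))

def impacted_by(column: str) -> List[str]:
    """List every object whose expression references the given Table[Column]."""
    return sorted(
        name for name, expr in dependents.items() if column in references(expr)
    )

# Everything that would need attention if Customer[LastName] were renamed.
at_risk = impacted_by("Customer[LastName]")
```

Even in a toy model, one column touches both a calculated column and a visual, which is the house-of-cards effect in miniature.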

SQL and Database Objects

As a general rule of thumb, in formal SSAS projects built on a relational data mart or data warehouse that is managed by the same project team as the BI data model, I typically recommend that every table in the model import data from a corresponding view or UDF stored and managed in the relational database. Keep in mind that this is the way we’ve been designing Microsoft BI projects for several years. Performing simple tasks like renaming columns in the 2012 or 2014 SSAS data model designer was slow and cumbersome, and doing this part of the data prep in T-SQL was much easier than in SSDT. With the recent advent of Power Query in SQL Server Data Tools 2017, there is a good argument to be made for managing those transformations in the model instead, but the tool is still new and frankly I’m still testing the water. Again, keep changes in one place for future maintenance.
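Here is a small sketch of the one-view-per-table pattern described above, using Python’s built-in `sqlite3` module so it runs without a server. The `DimCustomer` table, the `vCustomer` view and the column names are hypothetical. Note that SQLite concatenates strings with `||`, whereas a T-SQL view would typically use `+` or `CONCAT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical dimension table in the data mart.
    CREATE TABLE DimCustomer (
        CustomerKey INTEGER PRIMARY KEY,
        FirstName   TEXT,
        LastName    TEXT,
        City        TEXT,
        State       TEXT,
        PostalCode  TEXT
    );
    INSERT INTO DimCustomer VALUES
        (1, 'Ada',   'Lovelace', 'Portland',  'OR', '97201'),
        (2, 'Grace', 'Hopper',   'Vancouver', 'WA', '98660');

    -- The model imports from the view, so column calculations live here.
    CREATE VIEW vCustomer AS
    SELECT CustomerKey,
           FirstName || ' ' || LastName              AS CustomerName,
           City || ', ' || State || ' ' || PostalCode AS CustomerLocation
    FROM DimCustomer;
    """
)

# What the data model would see when it imports the customer table.
rows = conn.execute(
    "SELECT CustomerName, CustomerLocation FROM vCustomer ORDER BY CustomerKey"
).fetchall()
```

The model only ever sees the view’s columns, so the calculation can change in the database without anyone touching the model, provided the column names stay stable.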

Do your absolute best to avoid writing complex SQL query logic that cannot be traced back to the sources. Complicated queries can become a black box – and a Pandora’s box if they aren’t documented, annotated and easy to decipher.

Power Query/M

For less formal projects in Power BI data models, Power Query is king. We’ve never had a tool so flexible and easy to use. If I’m importing data from multiple sources into a single model, you bet I’m going to use Power Query instead of SQL queries because I’ll know where to find and manage all the query definitions.

DAX Calculated Columns

Why not use DAX calculated columns? There is a good argument for using DAX. It’s quick and easy, and sometimes more convenient. If I add a custom column to a multi-million row table defined in Power Query, I have to re-process the table to see the new column. If I use DAX, I don’t have to wait. If the calculation relies on a DAX calculation residing in the rest of the model, DAX is the clear winner. These cases are less common though. Once again, I’ll make the argument to manage calculations, as much as possible, in one place.

IT Process, Business Culture, Team Dynamics, Rules & Restrictions

Now that you have some clear criteria for always implementing column calculations in SQL, Power Query or DAX, let me inject some reality back into the “it depends or it matters” equation.

I do my best to put cynicism aside and focus on what it takes to get IT projects over the finish line. I’ve found that good BI project practitioners are positive, optimistic and tough-skinned, although there are many forces at work to change this disposition. If you know what I mean, no further explanation is needed. If you don’t, you will. Wherever you choose to work, just do what you can to maintain your perspective throughout your career. There is much insight to be had when you learn how projects work and don’t work in different organizations.

I’ll give one example that represents situations I’ve encountered on several larger, formal BI projects over the years. Defensively, I’ll say that this story is “inspired by true events” in my experience:

The BI project Architect, Database Administrator, Lead Developer and IT Director all agree that any schema dependencies on the data warehouse or data mart should be managed using database views, and this becomes a paramount rule in the solution architecture. The SSAS data model and Power BI data model developers should import tables from these views. The team is using a pure Agile methodology and will use JIRA to manage and assign tasks performed in two-week team sprints.

Based on high-level report requirements documented by the Business Systems Analyst, the Lead Architect creates views in the database build script. The ETL developer must stage the source data and then the data warehouse ETL developer populates the dimension table before the view can be created, which takes three sprints, or six weeks. The Power BI data model developer adds the table to the model in the fourth sprint. After a prototype report is created, the BSA gets feedback from a stakeholder user who tells us that customer names should be in a single column rather than separate first name and last name columns. A task is added in JIRA to modify the view, with another task to refresh the data model, so it takes two weeks to add the column.

After the data warehouse is in production, the Power BI report developer gets word that the customer city, state and zip code need to be concatenated into a single column and marked as a geographical location, so they can be used in a map visual. The data warehouse is in production and managed by an offshore DBA group. A support ticket is created to request that the view used to populate the customer table be altered and a CustomerLocation column be added. Three days later, the contracted help desk determines this is not in their area of responsibility and closes the ticket as “completed” while the model developer continues to wait for a call or email. The email goes to the IT Director, who happily dismisses it since it was marked as “completed” while the developer sits and waits. Two weeks later, the issue resurfaces and the Project Manager organizes a meeting with the IT Director, BI Lead Architect, BI Lead Developer, Database Developer, In-house DBA and Help Desk Contractor Liaison to resolve the issue. In the meantime, users have exported their report to an offline Excel file and are working around the issue using a copy of the data.

A month after the request, the Power BI Developer spends 2 minutes creating a DAX calculated column and then creates the map report.

6 thoughts on “SQL, M or DAX: When Does it Matter?”

Pingback: SQL, M or DAX: When Does it Matter? – Business Intelligence Article – Business Blog
Pingback: SQL, M or DAX? | Paul Turley's SQL Server BI Blog
Pingback: SQL, M or Dax? – part 2 | Paul Turley's SQL Server BI Blog
Pingback: SQL or M? – SSAS Partitions Using SQL Server Table-Valued Functions (UDFs) | Paul Turley's SQL Server BI Blog
datascientistbudapest.com (@datascientistb1) says:

July 2, 2018 at 11:56 am

For demo reports it is safer to use only the query editor instead of views; I agree with you. Views have some important advantages, such as the “with schemabinding” option and the view dependencies function in SSMS, and they also make managing access easier. When you use a DAX calculated column, you make it harder for the Vertipaq engine to effectively compress the column. If you don’t need it in a slicer but in a value, you can use a measure. My argument is to create calculations mainly in SQL views (one table, one view), then in the query editor, and last in DAX measures or calculated columns.

Pingback: When To Use SQL, DAX, Or M In Power BI Models – Curated SQL

Paul Turley
