SQL, M or DAX?

We live in a world of choices and we have many tools at our disposal.  In Microsoft Business Intelligence solutions using tools like Power BI and SQL Server Analysis Services, you have at least three different ways to perform data collection, transformations and calculations.  A question I get all the time is: “Which database or BI tool should be used to perform routine tasks?  Is it best to shape and transform data at the source, in Power Query using M script, or in the data model using DAX?”

In this series, I’ll demonstrate options for creating utility and dimension tables, columns and calculations using each option and discuss the advantages, disadvantages and recommended practice for each.

SNAGHTML176b0f5

I welcome your questions and ideas on these topics.  Please post comments to this post with your questions and challenges.  Let’s get started with one of the most common examples…

Creating a Date Dimension Table

A Date dimension table is an essential component in most any data warehouse or reporting database so techniques to generate these tables have been around for a long time.  The foundation of a Date dimension table is a table containing one row per contiguous date in a range that includes every possible transaction date or fact record.  To make reporting easier, it is common practice to have multiple date dimensions in the semantic model.  For example, if sales transaction facts have an Order Date and a Delivery Date, and both are used independently for reporting; there may be an Order Date dimension and a Delivery Date dimension in the model.

A common practice for building the dimension table is to just populate a single Date type column with the sequential date values.  After these rows are inserted, date part functions may be used to populate additional columns by referencing the Date value in an expression.  Most every language includes, for example, a MONTH() and YEAR() function to convert a date value into these date parts.

SQL

If you have a data warehouse or a relational database specifically suited to support your Power BI and reporting models, use that to define all of your tables using conventional techniques like T-SQL.  Examples for generating a date reference or dimension table are easy to find online, primarily because this is the oldest and most enduring technique, used for many years in conventional data warehouse design.  T-SQL is a flexible language but the SQL date part functions are pretty bare bones.  In the end, it really comes down to preference and language familiarity.

I think there is a good argument to be made for not only defining a date dimension using familiar SQL script but for persisting the table in the data warehouse along with other standardized dimension tables.  This approach is optimal when you are working with SQL Server or another relational database as your primary data source.

Reporting, BI and dashboard projects don’t always reply on a data warehouse.  Self-service BI solutions usually start with ad-hoc data mashups to support analytic reports rather than a holistic IT-driven solution.  If you aren’t using a relational database as the primary data source, you may be better off using a tool managed within Power BI or SSAS.

Example

There are several different techniques that include using a cursor or a WHILE loop to iterate through each date in a range, one row at a time.  One of the best techniques I’ve found is this example from Aaron Bertrand.  Adding special columns to keep track of holidays or special calendar periods (like Fiscal, 4-5-4, ISO, etc.) can require a lot of complex code.

— Date dimension script by Aaron Bertrand:

https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server


CREATE TABLE #dim
(
[date]       DATE PRIMARY KEY,
[day]        AS DATEPART(DAY,      [date]),
[month]      AS DATEPART(MONTH,    [date]),
FirstOfMonth AS CONVERT(DATE, DATEADD(MONTH, DATEDIFF(MONTH, 0, [date]), 0)),
[MonthName]  AS DATENAME(MONTH,    [date]),
[week]       AS DATEPART(WEEK,     [date]),
[ISOweek]    AS DATEPART(ISO_WEEK, [date]),
[DayOfWeek]  AS DATEPART(WEEKDAY,  [date]),
[quarter]    AS DATEPART(QUARTER,  [date]),
[year]     AS DATEPART(YEAR,     [date]),
FirstOfYear  AS CONVERT(DATE, DATEADD(YEAR,  DATEDIFF(YEAR,  0, [date]), 0)),
Style112     AS CONVERT(CHAR(8),   [date], 112),
Style101     AS CONVERT(CHAR(10),  [date], 101)
);

— use the catalog views to generate as many rows as we need

INSERT #dim([date])

SELECT d
FROM
(
SELECT d = DATEADD(DAY, rn – 1, @StartDate)
FROM
(
SELECT TOP (DATEDIFF(DAY, @StartDate, @CutoffDate))
rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
— on my system this would support > 5 million days
ORDER BY s1.[object_id]
) AS x
) AS y;

Power Query/M

In my opinion, Power Query is the best choice when other query transformations are also managed in Power Query.  For simplicity, you can keep all of your query and transformation logic in one place.  If you are just getting started with Power BI and aren’t inclined to use a different technique, use this one.

Example

Nearly every step in this process can be performed using menu selections and simple features in the Power Query user interface.  It just takes a little creativity to get started.  I’ve done this using a few different approaching until arriving at this one.  It’s easiest and most flexible.

  • Start by creating two parameters named “Dates From” and “Dates To”.  Assign them values to define the range of dates you need in the date dimension table; like January 1, 2010 and December 31, 2018.
  • Use the Get Data menu to create a Blank Query
  • The first two steps need to be entered manually.  Open the Advanced Editor and pastes these first two lines on a new line after the “let” command:

     DateCount = Duration.Days(Duration.From( #”Dates To” – #”Dates From” )),
Source = List.Dates(#”Dates From”, DateCount, #duration(1,0,0,0))

  • Switch back to the Transform ribbon tab and then click Convert > To Table
  • Change the name of the new date columna dn change the data type to Date
  • At this point, you can simply use the menus on the Add Columns ribbon to generate all of the date part columns you need in the date dimension table

The resulting M query can be viewed in the Advanced Editor:

let
DateCount = Duration.Days(Duration.From( #”Dates To” – #”Dates From” )),
Source = List.Dates(#”Dates From”, DateCount, #duration(1,0,0,0)),
TableFromList = Table.FromList(Source, Splitter.SplitByNothing()),
#”Renamed Columns” = Table.RenameColumns(TableFromList,{{“Column1”, “Date”}}),
#”Changed Type” = Table.TransformColumnTypes(#”Renamed Columns”,{{“Date”, type date}}),
#”Inserted Year” = Table.AddColumn(#”Changed Type”, “Year”, each Date.Year([Date]), Int64.Type),
#”Inserted Month” = Table.AddColumn(#”Inserted Year”, “Month”, each Date.Month([Date]), Int64.Type),
#”Inserted Month Name” = Table.AddColumn(#”Inserted Month”, “Month Name”, each Date.MonthName([Date]), type text),
#”Inserted Quarter” = Table.AddColumn(#”Inserted Month Name”, “Quarter”, each Date.QuarterOfYear([Date]), Int64.Type),
#”Inserted Week of Year” = Table.AddColumn(#”Inserted Quarter”, “Week of Year”, each Date.WeekOfYear([Date]), Int64.Type),
#”Inserted Week of Month” = Table.AddColumn(#”Inserted Week of Year”, “Week of Month”, each Date.WeekOfMonth([Date]), Int64.Type),
#”Inserted Day” = Table.AddColumn(#”Inserted Week of Month”, “Day”, each Date.Day([Date]), Int64.Type),
#”Inserted Day of Week” = Table.AddColumn(#”Inserted Day”, “Day of Week”, each Date.DayOfWeek([Date]), Int64.Type),
#”Inserted Day of Year” = Table.AddColumn(#”Inserted Day of Week”, “Day of Year”, each Date.DayOfYear([Date]), Int64.Type),
#”Inserted Day Name” = Table.AddColumn(#”Inserted Day of Year”, “Day Name”, each Date.DayOfWeekName([Date]), type text),
#”Renamed Columns1″ = Table.RenameColumns(#”Inserted Day Name”,{{“Month”, “Month Number”}, {“Quarter”, “Quarter of Year Number”}}),
#”Added Custom” = Table.AddColumn(#”Renamed Columns1″, “Quarter Name”, each “Q” & Number.ToText([Quarter of Year Number])),
#”Renamed Columns2″ = Table.RenameColumns(#”Added Custom”,{{“Day”, “Day of Month”}}),
#”Changed Type1″ = Table.TransformColumnTypes(#”Renamed Columns2″,{{“Quarter Name”, type text}}),
#”Reordered Columns” = Table.ReorderColumns(#”Changed Type1″,{“Date”, “Year”, “Month Number”, “Month Name”, “Quarter of Year Number”, “Quarter Name”, “Week of Year”, “Week of Month”, “Day of Month”, “Day of Week”, “Day of Year”, “Day Name”})

in
#”Reordered Columns”

Beyond ordinary Gregorian calendar date parts, specialized columns like Fiscal periods, holiday flags and components of a 4-5-4 calendar are a little easier to do in M because

the language includes advanced functions to support complex formulas.  I’ll share some of these advanced techniques in a later post.

DAX

Calculated tables were recently added to the tabular model designer in both Power BI Desktop and the tabular model project editor in SQL Server Data Tools (SSDT) for Visual Studio.  This feature uses a handful of new table-based DAX functions, which include CALENDAR and CALENDARAUTO, for easily defining date dimension tables directly in the model.

Getting started is simple:

  • Click the New Table button on the Modeling ribbon
  • Enter the following script into the formula bar:

       My Calendar = CALENDAR(date(2018,1,1), date(2018, 12, 31))

  • Now you can add new calculated columns and apply the appropriate DAX functions to create date part columns using the Custom Column editor.

This post on the Power BI Tips site demonstrates a few variations of DAX-generated calendar tables: https://powerbi.tips/2017/11/creating-a-dax-calendar/

Each column in the table will be a separate expression or calculated column.  With regard to performance or model optimization, there is no additional overhead or good argument not to use DAX to generate a date dimension table.  However, using Power Query & M to transform data and create some tables and then DAX to generate other tables in the model can be more messy than keeping everything in one place.

So, why are there two different ways to create tables in Power BI?

This is an excellent question and it is really just an artifact of the evolution of the product and its constituent technologies.  The modeling tools behind Power BI (DAX, VertiPaq & the SSAS Tabular model) were created first and became the Power Pivot add-in for Excel.  As the DAX language evolved, that development team gave us the ability to generate tables using DAX Script.  Not long after that, a separate product team created Power Query and the M data mashup language.  Power Query was also made available as an add-in for Excel.  Eventually both tools found their way into the Power BI Desktop product.

Final Recommendation

If you have a SQL Server data warehouse, you can use SQL to create date dimension tables.  It’s usually best to unify all of the reporting data in the data warehouse or data mart to create a single version of the truth for reporting.  You can also use tools like SSIS or Azure Data Factory to build and manage these objects before the data is imported into Power BI or the Analysis Services data model.

If using Power BI only, use the Get Data tools to build all the tables, including the date dimension(s).  There is nothing wrong with using the DAX techniques but that is my second choice in the Power BI toolbox, for this particular need.

11 thoughts on “SQL, M or DAX?

  1. Pingback: Power BI updates, DAX, Dashboards and more… (May 17, 2018) | Online Coding

  2. Pingback: SQL, M or Dax? – part 2 | Paul Turley's SQL Server BI Blog

  3. James, funny that you should mention that. I had the same issue in a Power BI model this week. It was a table to contain measures I had created in the model designer using “Enter Data”. I only renamed the table and column but did not enter a value. This generated a new JSON query from an auto-generated binary source. I got an ODBC error when refreshing the model. I discovered that when I entered a value in the column, the table would refresh. I’m interested to see if you have the same experience.

    Like

      • Not yet! I’ve implemented a temporary workaround to get things going but I haven’t yet sorted out the credential errors. I’ll make sure to let you know when I’ve resolved it. I’ll probably have to send an email and demo off to the ssas prev team.
        If you’re curious I’ve also found some other differences/errors between on-prem and AAS.
        1. There are issues with using the ACE OLEDB drive connecting through the on-prem gateway.
        2. Combining queries from different data sources seems to require Fast Combine to be enabled, even though it’s not needed on-prem.
        3. If you’re given access to the database in AAS but are not a server admin, you run into a few different funky credential errors where you are not able to edit datasource credentials (if deploying through SSDT). In turn, this blocks deployments of metadata from SSDT completely. Luckily, however, Chris Wade’s BISM Normalizer allows you to correctly update the credentials as well as do partial deployments to the metadata.

        Like

  4. Pingback: Power BI updates, DAX, Dashboards and more... (May 17, 2018) | Guy in a Cube

  5. Pingback: Choosing Between SQL, M, And DAX – Curated SQL

  6. I think that one case where DAX calculated tables are useful is when the performance would be an issue in M. Consider two or more tables coming from two or more different data sources. The merge/join between these tables could be expensive and slow in M, whereas it is very quick in DAX. However, it is not a good idea to import data in DAX just to create a calculated table, because you pay the cost for keeping them in memory even when they are no longer needed.
    I usually create the tables that I need in M, and I keep the DAX option open for solving performance issues or for rapid prototyping.

    Like

    • Agreed, Marco. Merge operations in particular are slow in Power Query and performance is a big trade off. There are many use cases where it is adventageous to have separate tables managed in the data model using relationships rather than merging them in an M query.

      Like

      • Usually these performance issues can be solved, but it does require advanced usage of M syntax (besides buffering, you can set a table key which makes things go WAY faster).
        The usual reason why I need to resort to doing Calc Tables relies on doing a UNION. Specifically when I’m fighting against a formula firewall (due to privacy levels).

        Like

    • I’m also a big fan of creating date tables using M, however, when migrating an on-premises model to Azure Analysis Services I ran into credential errors when processing tables generated “from scratch” in M (only on the Azure model).

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s