Part 1: Using SQL Server Table-Valued Functions (UDFs)
In SQL Server Analysis Services projects, as of SQL Server Data Tools 2017, you can define table partitions using Power Query. Of course, we still have the option to use SQL Server database objects like views or user-defined functions. So, which of these two option makes most sense? The same concepts and decision points apply to Power BI data models although the design experience is quite a bit different.
The following steps will bring us to a question: Using the new SSDT partition design method for SSAS 2017, should I define partition filtering logic in SQL or in Power Query/M?
The objective is to define three partitions in the data model for the Sales fact table in the ContosoDW database:
- New transactions added in the current week
- Adjusting entries for the current month
- Historic records prior to the current month
New sales transactions in the source database needs to be refreshed in the data model every hour for reporting. Reprocessing only the records since the beginning of the current week takes seconds to minutes. If we schedule that partition to refresh every hour, users can have up-to-date reports throughout the day. In addition to new transactions, adjusting records are made weekly but only to records in the current month before the end-of-month closing of the books. Records in the current month that are older than the current week might be updated on occasion but changes don’t need to be available until the weekend. Records older than a month rarely change and don’t need to be refreshed but once a month. By scheduling only the first or second partition to process, data can be updated without requiring tens of millions of historical records to be reloaded.
Partitioning with a SQL User-Defined Function
I’ll step through the more conventional method we’ve been using for many years. I’ve written the following T-SQL table-valued User-Defined Function named fnSalesPartitionForPeriod. Three possible input parameter values allow the function to return rows for the past week, for the past month (up to the past week) or for all dates previous to the current month.
Here is the T-SQL script for a table-valued user-defined function created in SQL Server. Passing in one of three parameter values will cause it to return the desired records.
User-defined function used to partition Sales fact table in SSAS tabular model
create function dbo.fnSalesPartitionForPeriod
( @Period varchar(100) )
select * from [ContosoDW].[dbo].[FactSalesCompleteDates]
(@Period = ‘BeforeThisMonth’
[DateKey] < dateadd(month, datediff(month, 0, getdate()), 0)
(@Period = ‘ThisWeek’
[DateKey] >= dateadd(week, datediff(week, 0, getdate()), 0)
(@Period = ‘ThisMonthBeforeThisWeek’
[DateKey] >= dateadd(month, datediff(month, 0, getdate()), 0)
[DateKey] < dateadd(week, datediff(week, 0, getdate()), 0)
To create the three Sales table partitions using this UDF, I start by importing one table. Here’s the Import Table dialog for the new Sales table in the data model. I’ve selected the new UDF and entered the parameter value ‘BeforeThisMonth’ to define the first partition.
This part gets tricky and quite honestly, I rarely get the steps right the first time through. I haven’t quite decided yet if my routine struggle with the SSDT Power Query editor are because I expect it work like it does in Power BI Desktop or if it truly has some quirks that catch me off guard. Regardless, I’m cautious to save copies of my work and if something doesn’t work, I delete the query and repeat the steps.
The query editor was smart enough to create an M function from the UDF query and this function needs to be invoked to generate the new Sales table. Enter the parameter value once again and click the Invoke button.
Change the name of the new query to “Sales” and make sure that the query is set to “Create New Table”, then click the Import button on the toolbar.
After the table is imported, click the Partitions button on the SSDT toolbar. As you can see, the Power Query “M” script for the Sales table calls the function and passes the parameter value I had set. This default partition should be renamed and the other two partitions should be added using different parameter values.
Updating and adding the partitions is fairly simple, using these steps:
- Copy the original partition
- Rename the new partition
- Change the function parameter value
Rename the current partition with a friendly name. Clicking the Copy button twice gives me two copies of the parameter. You can see that I’ve commented the code with the valid UDF parameter values.
Now the table can be refreshed incrementally and only new transaction records for the current week or month can be updated during schedule refresh cycles.
Partitioning with Power Query
No matter what the data source is; whether you use table-valued UDFs, views or in-line SQL, you are still using Power Query to define tables – so why not just use Power Query without creating database objects?
In another post, I’ll repeat the exercise using only Power Query to define the same partitions. Stay tuned.