Web API Data Sources with Power Query and Scheduling Data Refresh in the Power BI Service

Using a Web API is a convenient way to expose and consume data over an Internet connection.  Exercising some essential design patterns, understanding and working with the Power Query Formula Firewall is essential if you need to consume Web API data with Power Query and schedule data refresh with the Power BI Service.

Having recently worked-through numerous issues with API data feeds and deployed report configurations, I’ve learned a few important best practices and caveats – at least for some common use cases.  In one example, we have a client who expose their software-as-a-service (SaaS) customer data through several web API endpoints.  Each SaaS customer has a unique security key which they can use with Power BI, Power Query or Excel and other tools to create reporting solutions.  If we need a list of available products, it is a simple matter to create a long URL string consisting of the web address for the  endpoint, security key and other parameters; an then just pass this to Power Query as a web data source.  However, it’s not quite that easy for non-trivial reporting scenarios. 

Thanks to Jamie Mikami from CSG Pro for helping me with the Azure function code for demonstrating this with demo data.  Thanks also to Chris Webb who has meticulously covered several facets of API data sources in great detail on his blog, making this process much easier.

Some web APIs have a database query or other logic hard-wired to each individual endpoint.  The endpoint I am demonstrating allows a stored procedure name and filter to be passed as parameters, which allows one endpoint to run any query in the database that is allowed by the developer or admin.  The following information was setup for this demo:

  • Here is the endpoint exposed through Azure Function App services:  https://intellibizwebapi.azurewebsites.net/api/ExecuteSQL
  • The security API key is required to authorize execution (*this key has since been retired):
    5FH6AcKwEDTChbOidYEpkveqlncgv/TOvqCAJ6jP2jX0fJUSmCVN0A==
  • The name of the stored procedure is passed in the “name” parameter like this: uspOnlineSalesByYear
  • An optional filter parameter may be passed to the stored procedure to filter the results.

The typical approach is to concatenating all the elements into a single string.  If we were to enter this address into a browser window:

https://intellibizwebapi.azurewebsites.net/api/ExecuteSQL?code=5FH6AcKwEDTChbOidYEpkveqlncgv/TOvqCAJ6jP2jX0fJUSmCVN0A==&name=uspOnlineSalesByYear&filter=2008

…a JSON document is returned beginning with the following data:

[{“SalesKey”:4,”DateKey”:”2008-01-13T00:00:00″,”channelKey”:2,”StoreKey”:306,”ProductKey”:1429,”PromotionKey”:10,”CurrencyKey”:1,”UnitCost”:132.9000,”UnitPrice”:289.0000,”SalesQuantity”:8,”ReturnQuantity”:0,”ReturnAmount”:0.0000,”DiscountQuantity”:1,”DiscountAmount”:57.8000,”TotalCost”:1063.2000,”SalesAmount”:2254.2000,”ETLLoadID”:1,”LoadDate”:”2010-01-01T00:00:00″,”UpdateDate”:”2010-01-01T00:00:00″},{“SalesKey”:5,

…and if we paste the same URL into the address dialog for a new web data source in Power Query, a complete table is returned.

image

Good so far, right?  But, here’s here’s the problem.  If we were to use a single API call in this manner, the Power BI service may, under certain conditions, allow the data source to be refreshed but you cannot modify the query string parameters in this way for the service to trust the web API connection so it can be refreshed.  Now for a more sophisticated and more realistic example.

To minimize the data volume per call and load data incrementally, web API data is often paged or filtered using a date range or category of some kind.  In this example, one call returns a list of years for which there are orders.  With that, orders can be loaded for each year.  An actual production scenario may be more sophisticated but this demonstrates the design pattern.

The first query – the outer query – returns one row per year.  Then we create another query that executes a stored procedure requiring a Year parameter, that returns all the Order records for that year. 

SNAGHTML1e42f4b4

That query, shown here, is converted into a function by adding the Year parameter.

image

In the typical design pattern, a custom column is added which invokes the custom function, passing the YEAR column value.

I deploy a copy of this report file to my Power BI tenant and then try to schedule data refresh.  Here’s the result:

image

The error reads: “You can’t schedule refresh for this dataset because the following data sources currently don’t support refresh…”

The Power Query Formula Firewall prevents the queries from running because they don’t meet requirements to be trusted according to the rules of the the formula firewall and the “fast combine” feature.  Each query must be granted permission to run.  The default permission set for web sources is “Anonymous” which simply means that no credentials are needed.  The developer simply needs to agree to let the query run.

Each query being combined must share a common address or connection string, with a compatible set of privacy level settings.

image

The formula firewall has a problem with us concatenating the endpoint, code and parameters into one humungous web address string.  The reasons and details are less important but separating the address and other elements and letting Power Query manage them as separate objects can resolve this issue.

Here’s the refactored M code for the outer query/function.  This is the patterned I’m using for all web API queries.  Note that the “BaseUri” is a scalar text type variable, and the other web query parameters are elements of a record stored in a variable named “Query”.  These two variables are passed as arguments to the Web.Contents method:

(Year as number) =>
let
     BaseUri     = “https://intellibizwebapi.azurewebsites.net/api/ExecuteSQL”,
     QueryRecord =
         [
             Query=
             [
                 code   = “abFbalkeuCiozdne7PeMG0bZWAZGj65uJ3zLsYoB8zLfisrJo6gv2/Fvw==”,
                 name   = “uspOnlineSalesByYear”,
                 filter = Number.ToText( Year )
             ]
         ],
     Source = Json.Document(
         Web.Contents( BaseUri, QueryRecord )
     ),

By letting the Web.Contents method work it’s magic and by conforming to the other requirements I mentioned, the Power BI service formula firewall will trust the source and allow this report to be scheduled for data refresh.

As I mentioned earlier, Chris Webb has covered a number of nuances related to this method on his blog.  To understand it deeply, I suggest reading about them here.

Mastering Power Query in a Day – Full-day training in Seattle

Following the Power BI World Tour, Seattle event on Oct 30, please join me for a full-day of deep learning.  That’s right… it’s on Oct 31st so put on your Wonder Woman or Captain America costume and get ready to exercise your super powers with Power Query and Power BI!  You will learn to master Power Query extensively from Beginner to Advanced.  The other session taught at the same time by Brian Grant is “Power BI: Enhance Your Data Model with DAX” but ya gotta pick one.  You can learn more about the Power BI World Tour and the Academy by following these events on Twitter and LinkedIn using the links at the bottom of this post, or search these hashtags:
#PowerBIUG | #PowerBI | @pbiusergroup | #PowerBIUGAcademy | #PBIWorldTour

Registration and course description:

Academy – Mastering Power Query in a Day (Seattle, WA)

The foundations of a Business Intelligence solution are data transformations, data wrangling, data cleansing and ETL. A well-crafted Power BI project rests on Power Query and the queries that define the data model, calculations and report visuals. This full-day session will teach you how to lay the foundation for a Power BI solution with simple and advanced Power Query techniques.

Learn from Paul Turley, ten-year Microsoft Data Platform MVP and veteran BI Solution Architect. You will learn best practice design patterns, tricks, shortcuts and proven techniques to improve your skills and add immediate value to your projects. Power Query is everywhere – and growing.

The skills and techniques taught in this workshop apply to Power BI Desktop, the “Get Data” feature in Excel 2016+, SQL Server Analysis Services 2017+ (SSAS), Azure Analysis Services (AAS) and Data Flows in the Power BI Common Data Service (CDS). You will learn through exercises and instructor-led hands-on demos. Bring your laptop with the latest version of Power BI Desktop installed. The rest will be provided. We will cover material from basics through advanced. Each exercise is separate so you can absorb only what you need to learn, based on your prior experience, needs and skill level.

Power Query Basics
Quick tour of the Power Query interface & essentials
Creating and managing queries
Adding and editing steps
Recovery and project management
Essential best practices
Managing data sources
Working with folder paths, web URIs & database connections
Referencing & Duplicating queries
Consolidating queries, building base queries & dependency chains
Loading queries into data model tables
Basic error handling & debugging

Data Sources & Structures
Flat CSV files
Irregular text files (headings & totals)
JSON (simple)
JSON (complex, with nested & ragged hierarchies)
SQL server
Excel (single sheet/table, multiple sheets/tables)
Folders & file collections
Web pages a page tables
Web APIs & web service endpoints

Essential Query Techniques
Managing data types
Applying correct naming conventions
Working with Date & Time values
Splitting & formatting columns
De-duplicating & grouping
Pivot, Unpivot & Transpose
Custom columns & expression basics
Extracting tables from a data sources to supporting essential modeling for Power BI report design:

  • Fact tables
  • Dimension tables
  • Bridge tables
  • Slicer & calculation-driver tables
  • Role-playing dimensions

Advanced Power Query Techniques
Working with M: The Data Mashup language
M function essentials
Prioritized learning (what’s most important)
Using & managing parameters
Using the #shared object for internal documentation, examples & code syntax
Understanding M objects (values, tables, lists & records)
Number, Date, Time & Text manipulation M functions
Create a Date lookup/dimension table using M & Power Query
Create a Time series lookup/dimension table using M & Power Query

  • Why do I need a Date dimension in Power BI?
  • Standard date parts & hierarchies
  • Columns to support time-intelligence calculations
  • Working with fiscal & special-purpose calendars (e.g. 4-4-5, ISO)
  • Working with query functions
  • Parameterized queries, API endpoints & user-defined functions

Putting it Together
Examples:

  • Queries to support data model construction
  • Queries used to support report visuals
  • Deploy a report, configure the on-premises gateway
  • Use query parameters to schedule refresh in a deployed Power BI solution

Follow the events at:

https://www.linkedin.com/groups/13580234

https://twitter.com/pbiusergroup?lang=en

How to Configure the Power BI Gateway to use Dataset Connection Parameters

A service provider or vendor might want to publish multiple copies of a report that should connect to different database servers or databases.  In a true multitenant service solution, we would have a singe database with row-level user mapping tables that filter data by the logged in user.  True multitenant solutions require quite a lot of planning and development work to implement.  In smaller-scale or interim solutions, copies of a report can be deployed to different workspaces and then queries can be parameterized to use different database connections.

In this post, I’ll demonstrate deploying and configuring such a solution where the server name and database name have been parameterized and setup to use the on-premises gateway to connect and refresh data.  I’ll also setup scheduled refresh.  The full video walk-through is below but I’ll do a quick review to set the stage.

This is the Power Query Editor in Power BI Desktop.  I have two parameters that are used to specify the ServerName and DatabaseName for each SQL Server query:

Power BI Gateway with Parameters (Time 0_00_57;24)

Once deployed to a workspace in the service, the gateway must be configured with a data source for every possible server name and database combination.  In my example, I can connect to my local server using the NetBIOS name, IP address or LocalHost.  These all are acceptable methods but a data source must be added to the gateway configuration for each so the that the connection strings match exactly.  Remember that the connection is from the on-prem gateway to the database server so names like LocalHost or an internal IP address will work just fine.  In my example, I’m using the IP address of my local loopback adaptor on the local machine to connect to a local instance of SQL Server over the TCP connection.

Power BI Gateway with Parameters (Time 0_04_31;15)

In the workspace, the dataset is bound to the gateway.  Click the ellipsis and choose Settings.

Power BI Gateway with Parameters (Time 0_02_44;22)

To bind the gateway to the dataset, click to select the radio button next to the gateway.  This flips the switch titled “Use a data gateway”.  Apply the setting and then you can refresh the data or schedule refresh.

Power BI Gateway with Parameters (Time 0_05_47;00)Power BI Gateway with Parameters (Time 0_06_09;27)Finally, the parameters can be updated right here in the dataset settings.

Power BI Gateway with Parameters (Time 0_06_46;21)

 

Video Demonstration

 

Pareto, Burn-down & Accumulating Trend Charts in Power BI

I’m managing an Agile team project using Microsoft Teams – the new project management platform integrated with Office 365.  Teams is a simple and useful project management tool but it’s new and light on features.  Using Power BI, we want to show the hourly task burn-down for each two-week sprint.  In JIRA and some other more mature project management platforms, the burn-down chart is a standard feature in the tool that shows the number of hours or story points remaining, compared to the estimated number for the sprint.  Just as I began working on that, a client asked for some help creating a Pareto chart and it occurred to me that burn-down and Pareto charts are very similar variations of the same type of chart presentation.  These are not so much chart types as they are a set of calculations and techniques for displaying a desired result.

Project Hours Burn-Down Chart

Here’s the Burn-down chart showing days of the sprint on the X-axis and the hours for all resources as columns.  The burn-down line represents the number of estimated hours remaining for the duration of the sprint as a percentage of the total estimated hours for all resources.

imageHere’s another variation of the burn-down chart using stacked columns for each project resource, in my example; developers named Marta, Rob and Vivek.  Again, the burn-down line shows the daily percentage of estimated hours remaining compared to the total hours that have been “burned” for all days up to and including the current day.

image

The columns in the chart are simply the sum of hours reported.  I didn’t even create a measure for this value.  It’s just a summable column in the ‘Project Hours’ table named “Hours”.  Here’s a look at the data, including the measures used in the chart:

image

…and finally, here is the DAX measure code used for the line part of the column/line combination chart:

Project Hours Remaining % =
VAR DayAccumHours =
CALCULATE( SUM( ‘Project Hours'[Hours] ),
FILTER( ALLSELECTED( ‘Day’ ),
‘Day'[Day Number] <= MIN( ‘Day'[Day Number] )
)
)
VAR ProjectHoursRemaining = SUM( ‘Sprint Estimates'[Est Hours] ) – DayAccumHours

RETURN
DIVIDE( ProjectHoursRemaining, SUM( ‘Sprint Estimates'[Est Hours] ) )

Pareto Chart

This style of chart is very similar but the key differences are that the columns are ordered by the contribution value in descending order.   You can see that my sample dataset just uses numbers for the categories (e.g. “8”, “2”, “4”, etc.) but these could just as easily be names of resources, sales people, customers or products.  The columns are in descending order of aggregate value rather a time series of the axis field value.  To order by measure or value (my aggregate field is literally called “Value”), set the Type property for the X-axis to “Categorical”.

SNAGHTMLc49dd9d

Here’s the Pareto chart.  The contribution line shows the percentage of the total contribution that a category item and all of it’s predecessors in ranked descending order have made to that point.  For example, the first four top-ranking categories (8, 2, 4 and 5) account for about 50% of the total.

image

The DAX measure code for the accumulating percentage value used for the line, is as follows:

Accum Value % All =
VAR AccumValue =
CALCULATE( SUM(‘Pareto Series'[Value] ),
FILTER( ALLSELECTED( ‘Pareto Series’ ), ‘Pareto Series'[Value] >= MIN( ‘Pareto Series'[Value] ) )
)
VAR AllSeriesTotal = CALCULATE( SUM( ‘Pareto Series'[Value] ), ALLSELECTED( ‘Pareto Series’ ) )

RETURN
DIVIDE( AccumValue, AllSeriesTotal )

Accumulating Trend Series or Trend Chart

For good measure, I have one more example.  I actually started experimenting with calculation techniques using this chart.  It is essentially the burn-down chart example with the line presenting the opposite of the burn-down, so maybe this is a “burn-up chart” – I don’t know.  This may be useful when you want to visualize a time or other continuous series on the X-axis and the accumulating contribution as a line along with the actual contribution values as columns.

image

…and the measure DAX code used for the trend line:

Accum Series Value % All =
VAR AccumSeriesValue =
CALCULATE( SUM(‘Pareto Series'[Value] ),
FILTER( ALLSELECTED( ‘Pareto Series'[Period] ), ‘Pareto Series'[Period] <= MIN( ‘Pareto Series'[Period] ) )
)
VAR AllSeriesTotal = CALCULATE( SUM(‘Pareto Series'[Value] ), ALLSELECTED(‘Pareto Series’ ) )

RETURN
DIVIDE( AccumSeriesValue, AllSeriesTotal )

I’m hopeful that this will be useful and save you some time and energy.  As always, please leave comments and reach-out if you need some help.

You can download the example Power BI Desktop file here: Simple Pareto & Burn-down Chart Example.zip

Power BI for Grownups

The message from the Microsoft Business Applications Summit this week was very clear: Power BI is growing up.

We have known for a while that Power BI is a great front-end tool for enterprise-scale tabular and multidimensional models.  Power BI Premium will now allow almost limitless data scale using extremely large datasets by leveraging composite models that employ both DirectQuery and VertiPaq cached mode tables.  This feature shatters the previous barriers where data models could only support either DirectQuery or cached mode (imported data).  Additionally, Microsoft publicly unveiled plans to add IT developer centric features to Power BI Desktop for managing large models, version control and application development lifecycle.

These announcements leave many open questions about how Microsoft will continue to support self-service and enterprise customers with one tool and about the long-term future of SSAS and Visual Studio-based SSDT. At this point, none of these tools are on the chopping block, but it is clear that James Phillips and his team have significant momentum and are not slowing down to wait for other products to catch-up.

In this interview with Christian Wade, he describes the benefits of this new capability.  Christian is Group Program Manager on the Power BI and Analysis Services product team with focus on data model development and scalability features.

Composite Models & Aggregations

In the opening keynote and again in his sessions, Christian demonstrated Power BI reports on the taxi driver activity database with over a trillion rows of raw data.  The larger dataset was in a Spark cluster, accessed using DirectQuery.  Aggregated tables were stored in the in-memory model using the new composite model feature.  As the data was explored in report visuals, the Power BI engine would seamlessly switch from tabular in-memory aggregate tables to DirectQuery source data in order to return low-level details.  Composite models will allow mashing-up imported database and file-based data with an DirectQuery.

There are limits and complexities with these new features.  You cannot mashup imported tables in a Power BI model based in a direct connection to SSAS, but enterprise-scale features in Power BI arguably may not steer a solution architect to select SSAS over Power BI for serious data modeling.  With incremental data refresh, large model support, row-level security and many other “big kid” features, Power BI might be a preferable choice.  I’m not ready to rule-out Analysis Services as the better option for most enterprise solutions – at least not in the near future, but Power BI is definitely heading in that direction.

Power BI Desktop Features for Developers

News Flash: Power BI Desktop will be the primary tool for serious, enterprise data model development.  A bunch of features are in the works and will arrive soon for preview.  The obvious question is what will happen to SSDT for Visual Studio and the answer is that it’s not going away but the product team is putting their energy into enhancing Power BI Desktop for now.  It will likely have some kind of “Beginner/Advanced” switch to simplify things for the novice user and isolate advanced or complicated, developer-centric features.

Here are a few screen captures of the forthcoming features:

Schema Compare & Model Integration

Like the BISM Normalizer tool created by Christian Wade, there will be a fully-supported compare and merge tool baked into Desktop.  The early build is branded with components from MAQ Software.

image

Relationship Diagram Designer, Properties Pane and Application Lifecycle Management Features

A new-and-improved diagram designer is imminent.  It will allow large models to be more effectively managed and displayed.  You will have the ability to save individual views of the diagram with selected, subject-area tables.

image

Like in Visual Studio, properties can be set and managed in a Properties sheet window.  Support for standard SSAS features like display folders is coming and can be managed here.

image

Business Applications Release Notes

With the velocity of features and updates, a comprehensive set of release notes are available here.  This document will be updated frequently with the revised roadmap, feature history and announcements.

APIS and Powershell CmdLets for Power BI Administration

At the MVP Summit in March Kay Unkroth promised API endpoints and Powershell CmdLets for managing all aspects of Power BI.  He even did a dance and sang a song to make the announcement memorable.  This week, the product team delivered what they promised, with the announcement official here.

The following features and capabilities are on the delivery short list coming in the next 2-3 months:

Power BI Service

The new, “modern workspace” eventually replacing current workspaces integrated with Office 365 groups.  The new app workspace will no longer be bound to an Office 365 Group. The feature will be optional at first, but will replace current workspaces soon.

Now, users can be: Member, Admin, Contributor

Can add groups to workspace

Premium capacity tenants can have nodes in different regions

Admins will have visibility and access to all objects and resources

Microsoft Business Applications Summit

Keynotes and Featured Sessions from the Microsoft Business Applications Summit are here.

SQL or M? – SSAS Partitions in Power Query/M

This is a continuation of this post

In the data platform industry, we have been working with SQL for decades.  It’s a powerful language and over many years, we’ve learned to work with it’s strengths and to understand and work around it’s idiosyncrasies.  M is a considerably more modern and flexible query language.  Some best practices have evolved but many are still learning the basic patterns of effective query design.  Reaching that stage with a technology often takes years of trial-and-error design and a community willing to share their learnings.  I will continue to share mine and appreciate so many in the community who share theirs.

Why Use M Instead of SQL?

For database professionals using SQL Server as the sole source of data for an SSAS or Power BI data model, there is a solid argument to be made in favor of encapsulating the query logic in database objects.  DBAs need to manage access to important databases.  A comment posted in an earlier post on this topic mentioned that SQL Server views can implement schema binding – which doesn’t allow a table or any other dependent object to be altered in such a way that it would break the view.  This is a good design pattern that should be followed if you are the database owner, have the necessary permission and flexibility to manage database objects as part of your BI solution design.  Ultimately, this is an organizational decision.  If the BI solution developer is not the DBA, you may have limited options.  If you don’t have control over the source database objects, if you are not using SQL Server or otherwise prefer to manage everything in the SSAS or Power BI project, Power Query is probably the right place to manage all the query logic.

In my earlier post, I used a table-valued user-defined function to manage the partition filtering logic in SQL Server.  Rather than using SQL and database objects, we’ll use Power Query alone.  The working M script is shown below.

For brevity, I’m starting by showing the solution but I will show you the steps we went through to get there a bit later.

Updating the Partition Definitions

The three partitions defined in the earlier example are replaced using the following M script, which returns exactly the same columns and rows as before.

image

Here is the M script for each partition:

“This week” partition:

let
Source = #”SQL/localhost;ContosoDW”,
SalesData = Source{[Schema=”dbo”,Item=”FactSalesCompleteDates”]}[Data],
#”Filtered Rows” =
Table.SelectRows( SalesData, each
[DateKey] >=
Date.StartOfWeek( DateTime.FixedLocalNow() )
)

in
#”Filtered Rows”

“This month before this week” partition:

let
Source = #”SQL/localhost;ContosoDW”,
SalesData = Source{[Schema=”dbo”,Item=”FactSalesCompleteDates”]}[Data],
#”Filtered Rows” =
Table.SelectRows( SalesData, each
[DateKey] >=
Date.StartOfMonth( DateTime.FixedLocalNow() )
and
[DateKey] <
Date.StartOfWeek( DateTime.FixedLocalNow() )
)

in
#”Filtered Rows”

“Before this month” partition:

let
Source = #”SQL/localhost;ContosoDW”,
SalesData = Source{[Schema=”dbo”,Item=”FactSalesCompleteDates”]}[Data],
#”Filtered Rows” =
Table.SelectRows( SalesData, each
[DateKey] <
Date.StartOfMonth( DateTime.FixedLocalNow() )
)

in
#”Filtered Rows”

The Power Query Litmus Test: Query Folding

When connected to an enterprise data source like SQL Server, Power Query should be able to pass an important test.  Use the Design… button to view the Power Query Editor.  Select the last query step and then right-click to show the menu.

image

If the View Native Query menu option is enabled, you are good.  This means the the query is being folded – and that’s a good thing.  Query folding converts the query steps into a native query for the database engine to execute.  This is the resulting T-SQL query script generated by Power Query:

select [_].[SalesKey],
[_].[DateKey],
[_].[channelKey],
[_].[StoreKey],
[_].[ProductKey],
[_].[PromotionKey],
[_].[CurrencyKey],
[_].[UnitCost],
[_].[UnitPrice],
[_].[SalesQuantity],
[_].[ReturnQuantity],
[_].[ReturnAmount],
[_].[DiscountQuantity],
[_].[DiscountAmount],
[_].[TotalCost],
[_].[SalesAmount],
[_].[ETLLoadID],
[_].[LoadDate],
[_].[UpdateDate]

from [dbo].[FactSalesCompleteDates] as [_]

where [_].[DateKey] >= convert(datetime2, ‘2018-07-01 00:00:00’) and [_].[DateKey] < convert(datetime2, ‘2018-07-15 00:00:00’)

You don’t need to do anything with this information.  It’s just good to know.  End of story.

And Now… The Rest of The Story

Power Query is an awesome tool that does some amazingly smart things with the simple data transformation steps you create in the designer.  However, it is important to make sure Power Query produces efficient queries.  During the development of this solution, I created an early prototype that didn’t produce a query that would fold into T-SQL.  Thanks to Brian Grant, who is an absolute genius with Power Query and M, for figuring this out (BTW, you can visit Brian’s YouTube tutorial collection here).

In my original design which I prototyped in Power BI Desktop, I thought it would make sense to create custom columns for each of the date parts needed to filter the partitions.  Here’s the prototype query for the query I originally named “This Month Thru Last Week”:

let
Source = FactSales,

#”Add DateTimeNow” = Table.AddColumn(Source, “DateTimeNow”, each DateTime.LocalNow()),
#”Change Type DateTime” = Table.TransformColumnTypes(#”Add DateTimeNow”,{{“DateTimeNow”, type datetime}}),
#”Add StartOfThisWeek” = Table.AddColumn(#”Change Type DateTime”, “StartOfThisWeek”, each Date.StartOfWeek([DateTimeNow]), type date),
#”Add StartOfThisMonth” = Table.AddColumn(#”Add StartOfThisWeek”, “StartOfThisMonth”, each Date.StartOfMonth([DateTimeNow]), type date),
#”Add StartOfPreviousMonth” = Table.AddColumn(#”Add StartOfThisMonth”, “StartOfPreviousMonth”, each Date.StartOfMonth(Date.AddMonths([DateTimeNow], -1)), type date),
#”Partition Filter” = Table.SelectRows(#”Add StartOfPreviousMonth”, each ([DateKey] >= [StartOfThisMonth] and [DateKey] < [StartOfThisWeek]) ),
#”Removed Columns” = Table.RemoveColumns(#”Partition Filter”,{“ETLLoadID”, “LoadDate”, “UpdateDate”, “DateTimeNow”, “StartOfThisWeek”, “StartOfThisMonth”, “StartOfPreviousMonth”})

in
#”Removed Columns”

As you can see, I created separate columns using Transform menu options based on the current date and time, stored in a custom column named “DateTimeNow”:

  • StartOfThisWeek
  • StartOfThisMonth
  • StartOfPreviousMonth

The rest was simple, I just added filters using these columns and then removed the custom columns from the query in the last step.  All good with one small exception… it didn’t work.  We learned that Power Query can’t use custom column values to build a foldable filter expression.  The filters just won’t translate into a T-SQL WHERE clause.

Checking the last query step with a right-click shows that the “View Native Query” menu option is grayed-out so No Folding For You!

image

Simple lesson: When query folding doesn’t work, do something else.  In this case, we just had to put the date comparison logic in the filter steps and not in custom columns.

SQL or M? – SSAS Partitions Using SQL Server Table-Valued Functions (UDFs)

[ Related posts in this series: SQL, M or DAX: When Does it Matter? SQL, M or Dax? – part 2 SQL or M? – SSAS Partitions Using SQL Server Table-Valued Functions (UDFs) SQL or M? – SSAS Partitions in Power Query/M ]

In SQL Server Analysis Services tabular projects, as of SQL Server Data Tools 2017, you can define table partitions using Power Query.  Of course, we still have the option to use SQL Server database objects like views or user-defined functions.  So, which of these two option makes most sense?  The same concepts and decision points apply to Power BI data models although the design experience is quite a bit different.

The following steps will bring us to a question: Using the new SSDT partition design method for SSAS 2017, should I define partition filtering logic in SQL or in Power Query/M?

The objective is to define three partitions in the data model for the Sales fact table in the ContosoDW database:

  • New transactions added in the current week
  • Adjusting entries for the current month
  • Historic records prior to the current month

New sales transactions in the source database needs to be refreshed in the data model every hour for reporting.  Reprocessing only the records since the beginning of the current week takes seconds to minutes.  If we schedule that partition to refresh every hour, users can have up-to-date reports throughout the day.  In addition to new transactions, adjusting records are made weekly but only to records in the current month before the end-of-month closing of the books.  Records in the current month that are older than the current week might be updated on occasion but changes don’t need to be available until the weekend.  Records older than a month rarely change and don’t need to be refreshed but once a month.  By scheduling only the first or second partition to process, data can be updated without requiring tens of millions of historical records to be reloaded.

Partitioning with a SQL User-Defined Function

I’ll step through the more conventional method we’ve been using for many years.  I’ve written the following T-SQL table-valued User-Defined Function named fnSalesPartitionForPeriod.  Three possible input parameter values allow the function to return rows for the past week, for the past month (up to the past week) or for all dates previous to the current month.

Here is the T-SQL script for a table-valued user-defined function created in SQL Server.  Passing in one of three parameter values will cause it to return the desired records.

/******************************
User-defined function used to partition Sales fact table in SSAS tabular model
@Period values:
PriorToThisMonth
ThisMonthPriorToThisWeek
ThisWeek

*******************************/

create function dbo.fnSalesPartitionForPeriod
( @Period varchar(100) )

returns table

return
select * from [ContosoDW].[dbo].[FactSalesCompleteDates]
where
(@Period = ‘BeforeThisMonth’
and
[DateKey] < dateadd(month, datediff(month, 0, getdate()), 0) ) or (@Period = ‘ThisWeek’ and [DateKey] >= dateadd(week, datediff(week, 0, getdate()), 0)
)
or
(@Period = ‘ThisMonthBeforeThisWeek’
and
[DateKey] >= dateadd(month, datediff(month, 0, getdate()), 0)
and
[DateKey] < dateadd(week, datediff(week, 0, getdate()), 0)
)

;

go

To create the three Sales table partitions using this UDF, I start by importing one table.  Here’s the Import Table dialog for the new Sales table in the data model.  I’ve selected the new UDF and entered the parameter value ‘BeforeThisMonth’ to define the first partition.

image

This part gets tricky and quite honestly, I rarely get the steps right the first time through.  I haven’t quite decided yet if my routine struggle with the SSDT Power Query editor is because I expect it work like it does in Power BI Desktop or if it truly has some quirks that catch me off guard.  Some of each, I think.  Regardless, I’m cautious to save copies of my work and if something doesn’t work, I delete the query and repeat the steps.

The query editor was smart enough to create an M function from the UDF query and this function needs to be invoked to generate the new Sales table.  Enter the parameter value once again and click the Invoke button.

image

Change the name of the new query to “Sales” and make sure that the query is set to “Create New Table”, then click the Import button on the toolbar.

image

After the table is imported, click the Partitions button on the SSDT toolbar.  As you can see, the Power Query “M” script for the Sales table calls the function and passes the parameter value I had set.  This default partition should be renamed and the other two partitions should be added using different parameter values.

image

Updating and adding the partitions is fairly simple, using these steps:

  1. Copy the original partition
  2. Rename the new partition
  3. Change the function parameter value

Rename the current partition with a friendly name.   Clicking the Copy button twice gives me two copies of the parameter.  You can see that I’ve commented the code with the valid UDF parameter values.

SNAGHTML3791b15

Now the table can be refreshed incrementally and only new transaction records for the current week or month can be updated during schedule refresh cycles.  Seeing green on this dialog is always a welcome sight:

SSAS Partition Process 2018-07-15_18-31-30

Partitioning with Power Query

No matter what the data source is; whether you use table-valued UDFs, views or in-line SQL, you are still using Power Query to define tables – so why not just use Power Query without creating database objects?

In another post, I’ll repeat the exercise using only Power Query to define the same partitions.

SQL or M? – SSAS Partitions in Power Query/M