Web API Data Sources with Power Query and Scheduling Data Refresh in the Power BI Service

Using a Web API is a convenient way to expose and consume data over an Internet connection.  Exercising some essential design patterns, understanding and working with the Power Query Formula Firewall is essential if you need to consume Web API data with Power Query and schedule data refresh with the Power BI Service.

Having recently worked-through numerous issues with API data feeds and deployed report configurations, I’ve learned a few important best practices and caveats – at least for some common use cases.  In one example, we have a client who expose their software-as-a-service (SaaS) customer data through several web API endpoints.  Each SaaS customer has a unique security key which they can use with Power BI, Power Query or Excel and other tools to create reporting solutions.  If we need a list of available products, it is a simple matter to create a long URL string consisting of the web address for the  endpoint, security key and other parameters; an then just pass this to Power Query as a web data source.  However, it’s not quite that easy for non-trivial reporting scenarios. 

Thanks to Jamie Mikami from CSG Pro for helping me with the Azure function code for demonstrating this with demo data.  Thanks also to Chris Webb who has meticulously covered several facets of API data sources in great detail on his blog, making this process much easier.

Some web APIs have a database query or other logic hard-wired to each individual endpoint.  The endpoint I am demonstrating allows a stored procedure name and filter to be passed as parameters, which allows one endpoint to run any query in the database that is allowed by the developer or admin.  The following information was setup for this demo:

  • Here is the endpoint exposed through Azure Function App services:  https://intellibizwebapi.azurewebsites.net/api/ExecuteSQL
  • The security API key is required to authorize execution (*this key has since been retired):
    5FH6AcKwEDTChbOidYEpkveqlncgv/TOvqCAJ6jP2jX0fJUSmCVN0A==
  • The name of the stored procedure is passed in the “name” parameter like this: uspOnlineSalesByYear
  • An optional filter parameter may be passed to the stored procedure to filter the results.

The typical approach is to concatenating all the elements into a single string.  If we were to enter this address into a browser window:

https://intellibizwebapi.azurewebsites.net/api/ExecuteSQL?code=5FH6AcKwEDTChbOidYEpkveqlncgv/TOvqCAJ6jP2jX0fJUSmCVN0A==&name=uspOnlineSalesByYear&filter=2008

…a JSON document is returned beginning with the following data:

[{“SalesKey”:4,”DateKey”:”2008-01-13T00:00:00″,”channelKey”:2,”StoreKey”:306,”ProductKey”:1429,”PromotionKey”:10,”CurrencyKey”:1,”UnitCost”:132.9000,”UnitPrice”:289.0000,”SalesQuantity”:8,”ReturnQuantity”:0,”ReturnAmount”:0.0000,”DiscountQuantity”:1,”DiscountAmount”:57.8000,”TotalCost”:1063.2000,”SalesAmount”:2254.2000,”ETLLoadID”:1,”LoadDate”:”2010-01-01T00:00:00″,”UpdateDate”:”2010-01-01T00:00:00″},{“SalesKey”:5,

…and if we paste the same URL into the address dialog for a new web data source in Power Query, a complete table is returned.

image

Good so far, right?  But, here’s here’s the problem.  If we were to use a single API call in this manner, the Power BI service may, under certain conditions, allow the data source to be refreshed but you cannot modify the query string parameters in this way for the service to trust the web API connection so it can be refreshed.  Now for a more sophisticated and more realistic example.

To minimize the data volume per call and load data incrementally, web API data is often paged or filtered using a date range or category of some kind.  In this example, one call returns a list of years for which there are orders.  With that, orders can be loaded for each year.  An actual production scenario may be more sophisticated but this demonstrates the design pattern.

The first query – the outer query – returns one row per year.  Then we create another query that executes a stored procedure requiring a Year parameter, that returns all the Order records for that year. 

SNAGHTML1e42f4b4

That query, shown here, is converted into a function by adding the Year parameter.

image

In the typical design pattern, a custom column is added which invokes the custom function, passing the YEAR column value.

I deploy a copy of this report file to my Power BI tenant and then try to schedule data refresh.  Here’s the result:

image

The error reads: “You can’t schedule refresh for this dataset because the following data sources currently don’t support refresh…”

The Power Query Formula Firewall prevents the queries from running because they don’t meet requirements to be trusted according to the rules of the the formula firewall and the “fast combine” feature.  Each query must be granted permission to run.  The default permission set for web sources is “Anonymous” which simply means that no credentials are needed.  The developer simply needs to agree to let the query run.

Each query being combined must share a common address or connection string, with a compatible set of privacy level settings.

image

The formula firewall has a problem with us concatenating the endpoint, code and parameters into one humungous web address string.  The reasons and details are less important but separating the address and other elements and letting Power Query manage them as separate objects can resolve this issue.

Here’s the refactored M code for the outer query/function.  This is the patterned I’m using for all web API queries.  Note that the “BaseUri” is a scalar text type variable, and the other web query parameters are elements of a record stored in a variable named “Query”.  These two variables are passed as arguments to the Web.Contents method:

(Year as number) =>
let
     BaseUri     = “https://intellibizwebapi.azurewebsites.net/api/ExecuteSQL”,
     QueryRecord =
         [
             Query=
             [
                 code   = “abFbalkeuCiozdne7PeMG0bZWAZGj65uJ3zLsYoB8zLfisrJo6gv2/Fvw==”,
                 name   = “uspOnlineSalesByYear”,
                 filter = Number.ToText( Year )
             ]
         ],
     Source = Json.Document(
         Web.Contents( BaseUri, QueryRecord )
     ),

By letting the Web.Contents method work it’s magic and by conforming to the other requirements I mentioned, the Power BI service formula firewall will trust the source and allow this report to be scheduled for data refresh.

As I mentioned earlier, Chris Webb has covered a number of nuances related to this method on his blog.  To understand it deeply, I suggest reading about them here.