This is an introduction to a series of posts and pages that will provide a comprehensive set of best practices for successful Power BI solutions. In previous posts, I have asked readers to suggest topics for future posts, and based on that feedback, I will be addressing those questions and suggested topics. The topics list at the end of this post is a brainstorm list, and I ask that you help me make sure it is complete. My goal is to provide a set of guidelines and practices that give you the best chance of success as you navigate the many decisions about how to stage and transform source data, where to perform data shaping and calculations, how to model data, and how best to visualize the results. The biggest question of all may be how to make the right decisions today so that the small project you design now will still work when you add more data and more users, and transition it into a formal, managed solution.
There are many fantastic resources for learning about Power BI and the rest of the Microsoft BI and reporting platform, but learning Power BI while choosing among its many design options can be like drinking from multiple firehoses at full pressure. I will be the first to admit that my “best practices” are my opinions. In many cases they work consistently for me, but many points are debatable and open for discussion. I’ll tell you when I have a very strong opinion about something being done a certain way, and when I have simply found a pattern that works for me and offer it for your consideration. I’m not always right… just ask my wife :-). Please comment, ask questions and offer alternative points of view.
Rather than offering another training course or duplicating the contributions that so many others in the industry make through their blogs, courses, books and articles, this will be a condensed set of guidelines about the many choices you must make when designing a solution. In the posts to follow, I will reference other resources and discussions on various topics.
Best practice guidelines topics
The following topic list will serve as a link menu for future posts. Expect this list to be updated and completed:
Multi-developer and lifecycle management for Power BI
Certified reports, certified datasets & the self-service mindset
Just tell me what to do
Any attempt to apply universal best practices to Power BI solution design is a slippery slope. The tools are so flexible and powerful, and the requirements of each project so varied, that it is challenging to establish a set of steps or rules that, if followed, will always yield the absolute best design for a given scenario. With that out of the way, I’ll say this: in my job, I see a lot of poorly-designed Power BI projects. I’ve worked with dozens or scores (maybe even hundreds?) of consulting clients who bring us projects – some partially completed, some finished, and many that are just broken – to be fixed or completed. My reactions range from “that’s just downright wrong” to “hmmm… I wouldn’t have done it that way but I guess it will work for the time being”. I try not to cast stones and do, on occasion, realize that others have found a better way to solve a problem. I don’t have all the answers, but I do have a lot of experience with Microsoft Business Intelligence solution design, and I have learned many good practices and design patterns from other community leaders and many successful projects over the past twenty or so years.
A little less conversation and a little more action
Let’s start with a simplified flowchart and condensed decision tree. This first whiteboard drawing is the first half of the Power BI design process, ending with the data model, before measures, visualization and solution delivery. There is a lot more but I think this is a good starting point.
Using a Web API is a convenient way to expose and consume data over an Internet connection. If you need to consume Web API data with Power Query and schedule data refresh with the Power BI service, it is essential to understand and work with the Power Query Formula Firewall and to apply a few key design patterns.
Having recently worked through numerous issues with API data feeds and deployed report configurations, I’ve learned a few important best practices and caveats – at least for some common use cases. In one example, we have a client who exposes their software-as-a-service (SaaS) customer data through several web API endpoints. Each SaaS customer has a unique security key, which they can use with Power BI, Power Query, Excel and other tools to create reporting solutions. If we need a list of available products, it is a simple matter to create a long URL string consisting of the web address for the endpoint, the security key and other parameters, and then just pass this to Power Query as a web data source. However, it’s not quite that easy for non-trivial reporting scenarios.
Thanks to Jamie Mikami from CSG Pro for helping me with the Azure function code for demonstrating this with demo data. Thanks also to Chris Webb who has meticulously covered several facets of API data sources in great detail on his blog, making this process much easier.
Some web APIs have a database query or other logic hard-wired to each individual endpoint. The endpoint I am demonstrating allows a stored procedure name and filter to be passed as parameters, which allows one endpoint to run any query in the database that the developer or admin allows. The following information was set up for this demo:
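To make this concrete, the assembled URL for the demo endpoint (the same endpoint and function key that appear in the refactored code later in this post; the filter value 2017 is illustrative) looks something like this:

https://intellibizwebapi.azurewebsites.net/api/ExecuteSQL?code=abFbalkeuCiozdne7PeMG0bZWAZGj65uJ3zLsYoB8zLfisrJo6gv2/Fvw==&name=uspOnlineSalesByYear&filter=2017

Pasted into the web data source dialog, Power Query generates a single-step query roughly like this sketch:

let
    Source = Json.Document(
        Web.Contents( "https://intellibizwebapi.azurewebsites.net/api/ExecuteSQL?code=abFbalkeuCiozdne7PeMG0bZWAZGj65uJ3zLsYoB8zLfisrJo6gv2/Fvw==&name=uspOnlineSalesByYear&filter=2017" )
    )
in
    Source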
…and if we paste the same URL into the address dialog for a new web data source in Power Query, a complete table is returned.
Good so far, right? But here’s the problem. If we were to use a single API call in this manner, the Power BI service may, under certain conditions, allow the data source to be refreshed – but when you concatenate the query string parameters this way, the service cannot trust the web API connection, and refresh will fail. Now for a more sophisticated and more realistic example.
To minimize the data volume per call and load data incrementally, web API data is often paged or filtered using a date range or category of some kind. In this example, one call returns a list of years for which there are orders. With that, orders can be loaded for each year. An actual production scenario may be more sophisticated but this demonstrates the design pattern.
The first query – the outer query – returns one row per year. Then we create another query that executes a stored procedure requiring a Year parameter and returns all the Order records for that year.
That query, shown here, is converted into a function by adding the Year parameter.
In the typical design pattern, a custom column is added which invokes the custom function, passing the YEAR column value.
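Condensed to code, the naive pattern looks something like this – the function and step names are illustrative, and this is the version that works in Power BI Desktop but fails the refresh test described next:

// fnGetOrdersByYear: the inner query converted to a function
(Year as number) =>
let
    Source = Json.Document(
        Web.Contents(
            "https://intellibizwebapi.azurewebsites.net/api/ExecuteSQL?code=abFbalkeuCiozdne7PeMG0bZWAZGj65uJ3zLsYoB8zLfisrJo6gv2/Fvw=="
                & "&name=uspOnlineSalesByYear&filter=" & Number.ToText( Year )
        )
    ),
    Orders = Table.FromRecords( Source )
in
    Orders

// in the outer query, the custom column invokes the function for each year row
AddedOrders = Table.AddColumn( Years, "Orders", each fnGetOrdersByYear( [Year] ) )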
I deploy a copy of this report file to my Power BI tenant and then try to schedule data refresh. Here’s the result:
The error reads: “You can’t schedule refresh for this dataset because the following data sources currently don’t support refresh…”
The Power Query Formula Firewall prevents the queries from running because they don’t meet the requirements to be trusted according to the rules of the formula firewall and the “fast combine” feature. Each query must be granted permission to run. The default permission set for web sources is “Anonymous”, which simply means that no credentials are needed; the developer just needs to agree to let the query run.
Each query being combined must share a common address or connection string, with a compatible set of privacy level settings.
The formula firewall has a problem with us concatenating the endpoint, code and parameters into one humungous web address string. The reasons and details are less important, but separating the address from the other elements, and letting Power Query manage them as separate objects, can resolve this issue.
Here’s the refactored M code for the outer query/function. This is the pattern I’m using for all web API queries. Note that “BaseUri” is a scalar text value, and the other web query parameters are elements of a record named “Query”, nested inside the options record (“QueryRecord” below). These two variables are passed as arguments to the Web.Contents function:
(Year as number) =>
let
    BaseUri = "https://intellibizwebapi.azurewebsites.net/api/ExecuteSQL",
    QueryRecord =
        [
            Query =
                [
                    code = "abFbalkeuCiozdne7PeMG0bZWAZGj65uJ3zLsYoB8zLfisrJo6gv2/Fvw==",
                    name = "uspOnlineSalesByYear",
                    filter = Number.ToText( Year )
                ]
        ],
    Source = Json.Document( Web.Contents( BaseUri, QueryRecord ) )
in
    Source
By letting the Web.Contents function work its magic, and by conforming to the other requirements I mentioned, the Power BI service formula firewall will trust the source and allow this report to be scheduled for data refresh.
As I mentioned earlier, Chris Webb has covered a number of nuances related to this method on his blog. To understand it deeply, I suggest reading about them here.
Following the Power BI World Tour Seattle event on Oct 30, please join me for a full day of deep learning. That’s right… it’s on Oct 31st, so put on your Wonder Woman or Captain America costume and get ready to exercise your super powers with Power Query and Power BI! You will learn Power Query from beginner to advanced. The other session, taught at the same time by Brian Grant, is “Power BI: Enhance Your Data Model with DAX” – but ya gotta pick one. You can learn more about the Power BI World Tour and the Academy by following these events on Twitter and LinkedIn using the links at the bottom of this post, or search these hashtags:
#PowerBIUG | #PowerBI | @pbiusergroup | #PowerBIUGAcademy | #PBIWorldTour
The foundations of a Business Intelligence solution are data transformations, data wrangling, data cleansing and ETL. A well-crafted Power BI project rests on Power Query and the queries that define the data model, calculations and report visuals. This full-day session will teach you how to lay the foundation for a Power BI solution with simple and advanced Power Query techniques.
Learn from Paul Turley, ten-year Microsoft Data Platform MVP and veteran BI Solution Architect. You will learn best practice design patterns, tricks, shortcuts and proven techniques to improve your skills and add immediate value to your projects. Power Query is everywhere – and growing.
The skills and techniques taught in this workshop apply to Power BI Desktop, the “Get Data” feature in Excel 2016+, SQL Server Analysis Services 2017+ (SSAS), Azure Analysis Services (AAS) and Data Flows in the Power BI Common Data Service (CDS). You will learn through exercises and instructor-led hands-on demos. Bring your laptop with the latest version of Power BI Desktop installed. The rest will be provided. We will cover material from basics through advanced. Each exercise is separate so you can absorb only what you need to learn, based on your prior experience, needs and skill level.
Power Query Basics
Quick tour of the Power Query interface & essentials
Creating and managing queries
Adding and editing steps
Recovery and project management
Essential best practices
Managing data sources
Working with folder paths, web URIs & database connections
Referencing & Duplicating queries
Consolidating queries, building base queries & dependency chains
Loading queries into data model tables
Basic error handling & debugging
Data Sources & Structures
Flat CSV files
Irregular text files (headings & totals)
JSON (complex, with nested & ragged hierarchies)
Excel (single sheet/table, multiple sheets/tables)
Folders & file collections
Web pages & page tables
Web APIs & web service endpoints
Essential Query Techniques
Managing data types
Applying correct naming conventions
Working with Date & Time values
Splitting & formatting columns
De-duplicating & grouping
Pivot, Unpivot & Transpose
Custom columns & expression basics
Extracting tables from data sources to support essential modeling for Power BI report design:
Slicer & calculation-driver tables
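As a quick taste of that last item, a calculation-driver table for a slicer can be generated entirely in M – a minimal sketch, with a hypothetical column name and range:

let
    // 0% to 20% in 1% steps
    Percentages = List.Transform( {0..20}, each _ / 100 ),
    AsTable = Table.FromList( Percentages, Splitter.SplitByNothing(), {"Discount Pct"} ),
    ChangedType = Table.TransformColumnTypes( AsTable, {{"Discount Pct", type number}} )
in
    ChangedType

A slicer bound to this column lets report users pick a value that measures can then read and apply in calculations.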
Advanced Power Query Techniques
Working with M: The Data Mashup language
M function essentials
Prioritized learning (what’s most important)
Using & managing parameters
Using the #shared object for internal documentation, examples & code syntax
Understanding M objects (values, tables, lists & records)
Number, Date, Time & Text manipulation M functions
Create a Date lookup/dimension table using M & Power Query (see the sketch after this list)
Create a Time series lookup/dimension table using M & Power Query
Why do I need a Date dimension in Power BI?
Standard date parts & hierarchies
Columns to support time-intelligence calculations
Working with fiscal & special-purpose calendars (e.g. 4-4-5, ISO)
Working with query functions
Parameterized queries, API endpoints & user-defined functions
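Here is the minimal sketch referenced above: a basic Date table generated entirely in M. The date range and column set are illustrative, and would be extended with fiscal and time-intelligence columns as needed:

let
    StartDate = #date( 2015, 1, 1 ),
    EndDate = #date( 2020, 12, 31 ),
    DayCount = Duration.Days( EndDate - StartDate ) + 1,
    // one row per day across the range
    DateList = List.Dates( StartDate, DayCount, #duration( 1, 0, 0, 0 ) ),
    AsTable = Table.FromList( DateList, Splitter.SplitByNothing(), {"Date"} ),
    ChangedType = Table.TransformColumnTypes( AsTable, {{"Date", type date}} ),
    AddYear = Table.AddColumn( ChangedType, "Year", each Date.Year( [Date] ), Int64.Type ),
    AddMonthNum = Table.AddColumn( AddYear, "Month Number", each Date.Month( [Date] ), Int64.Type ),
    AddMonthName = Table.AddColumn( AddMonthNum, "Month Name", each Date.MonthName( [Date] ), type text ),
    AddQuarter = Table.AddColumn( AddMonthName, "Quarter", each "Q" & Number.ToText( Date.QuarterOfYear( [Date] ) ), type text )
in
    AddQuarter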
Putting it Together
Queries to support data model construction
Queries used to support report visuals
Deploy a report, configure the on-premises gateway
Use query parameters to schedule refresh in a deployed Power BI solution
A service provider or vendor might want to publish multiple copies of a report that should connect to different database servers or databases. In a true multitenant service solution, we would have a single database with row-level user mapping tables that filter data by the logged-in user. True multitenant solutions require quite a lot of planning and development work to implement. In smaller-scale or interim solutions, copies of a report can be deployed to different workspaces, and queries can then be parameterized to use different database connections.
In this post, I’ll demonstrate deploying and configuring such a solution, where the server name and database name have been parameterized and set up to use the on-premises gateway to connect and refresh data. I’ll also set up scheduled refresh. The full video walk-through is below, but I’ll do a quick review to set the stage.
This is the Power Query Editor in Power BI Desktop. I have two parameters that are used to specify the ServerName and DatabaseName for each SQL Server query:
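A typical query then references those parameters instead of hard-coded names – a minimal sketch, where ServerName and DatabaseName are Power Query text parameters and the table name is hypothetical:

let
    Source = Sql.Database( ServerName, DatabaseName ),
    // navigate to a specific table in the source database
    FactSales = Source{[ Schema = "dbo", Item = "FactSales" ]}[Data]
in
    FactSales

When the parameter values change, every query that references them connects to the new server and database without any other edits.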
Once deployed to a workspace in the service, the gateway must be configured with a data source for every possible server name and database combination. In my example, I can connect to my local server using the NetBIOS name, IP address or LocalHost. These are all acceptable methods, but a data source must be added to the gateway configuration for each one so that the connection strings match exactly. Remember that the connection is from the on-prem gateway to the database server, so names like LocalHost or an internal IP address will work just fine. In my example, I’m using the IP address of the loopback adaptor on the local machine to connect to a local instance of SQL Server over TCP.
In the workspace, the dataset is bound to the gateway. Click the ellipsis and choose Settings.
To bind the gateway to the dataset, click to select the radio button next to the gateway. This flips the switch titled “Use a data gateway”. Apply the setting and then you can refresh the data or schedule refresh.
Finally, the parameters can be updated right here in the dataset settings.
This is a post about a post about a post. Thanks to those of you who are entering comments in the original May 12 post titled SQL, M or DAX? This is a popular topic. And thanks to Adam Saxton for mentioning this post in his Guy in A Cube Weekly Roundup.
This is a HUUUUGE topic and I can tell that I’ve struck a chord with many BI practitioners by bringing it up. Please post your comments and share your ideas. I’m particularly interested in hearing your challenging questions and your thoughts about the pros-and-cons of some less-obvious choices about whether to implement transformations & calculations in SQL, M or DAX.
This week, I have had engaging conversations on this topic while working on a Power BI consulting project for a large municipal court system. As a consultant, I’ve had three weeks of experience with their data and business environment. The internal staff have spent decades negotiating the intricacies and layers upon layers of business process, so of course I want to learn from their experience, but I also want to cautiously pursue opportunities to think outside the box. That’s why they hired me.
Tell me if this situation resonates with you… Working with a SQL Server database developer who is really good with T-SQL but fairly new to Power BI & tabular modeling, we’re building a data model and reports sourced from a line-of-business application’s SQL Server database. They’ve been writing reports using some pretty complicated SQL queries embedded in SSRS paginated reports. Every time a user wants a new report, a request is sent to the IT group; a developer picks up the request and writes some gnarly T-SQL query with pre-calculated columns and business rules. Complex reports might take days or weeks of development time. I needed to update a dimension table in the data model and add a calculated column to differentiate case types. It turned out not to be a simple addition, and his response was “I’ll just send you the SQL for that… you can just paste it”. The dilemma here is that all the complicated business rules had already been resolved using layers of T-SQL common table expressions (CTEs), nested subqueries and CASE statements. It was very well-written SQL, and it would take considerable effort to re-engineer the logic into a dimensional tabular model to support general-use reporting. After beginning to nod off while reading through the layers of SQL script, my initial reaction was to just paste the code and be done with it. After all, someone had already solved this problem, right?
The trade-off of using the existing T-SQL code is that the calculations and business rules are applied at a fixed level of granularity and within a certain business context; the query would need to be rewritten to answer different business questions. If we take the “black box” approach and paste the working, tested SQL script into the Power Query table definition, chances are that we won’t be able to explain the query logic in a few months, after we’ve moved on and forgotten this business problem. If you are trying to create a general-purpose data model to answer yet-to-be-defined questions, you need design patterns that allow developers and users to navigate the model at different levels of grain, across different dimension tables and in different filtering contexts. This isn’t always the right answer, but in this case I am recommending that we do as little data merging, joining and manipulation as possible in the underlying source queries. The table mapping between source and data model is not one-to-one, though. In some cases, two or three source tables are combined using SQL joins into a flattened and simplified lookup table – containing only the necessary, friendly-named columns and keys, and no unnecessary clutter like CreatedDateTime, ModifiedDateTime and CreatedByUser columns. Beyond that, use custom columns in M/Power Query for row-level calculated values, and use DAX measures for calculations performed in aggregate and within filter/slicing/grouping context.
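As a sketch of that division of labor – with entirely hypothetical server, table and column names – a row-level rule like classifying case types belongs in a Power Query custom column:

let
    Source = Sql.Database( "CourtServer", "CourtDB" ),
    Cases = Source{[ Schema = "dbo", Item = "CaseDimension" ]}[Data],
    // row-level business rule applied once, at the grain of the dimension
    AddCaseTypeGroup = Table.AddColumn(
        Cases,
        "Case Type Group",
        each if [CaseTypeCode] = "CR" then "Criminal"
             else if [CaseTypeCode] = "CV" then "Civil"
             else "Other",
        type text
    )
in
    AddCaseTypeGroup

Counts and ratios over those groups are then written as DAX measures, so they aggregate correctly at whatever grain and filter context each report applies.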
I’d love to hear your thoughts and ideas on this topic.
Yesterday a friend asked for a little help getting started with Power BI. He’s a DBA and system administrator and wanted to cut his teeth on Power BI with a really simple dashboard-style scorecard report. Using a list of database servers with license expiration dates, he thought it would be a simple matter to calculate and show the expiration status for each server using a simple traffic light indicator. The envisioned server list might look something like this:
Makes perfect sense, right? This is a basic use case and a good application for simple KPIs; with the one minor caveat that POWER BI DOESN’T SUPPORT THIS!
This topic has become a bit of a soapbox topic for me because it’s a capability that, in my opinion, is a very obvious gap in the Power BI feature set. After unleashing my rant, I’ll demonstrate a solution in this post.
The most interesting thing about this missing feature is that it has existed for many years in the products that evolved into the current Power BI product. Key Performance Indicators (KPIs) are defined as scriptable objects in SQL Server Analysis Services (SSAS) with tremendous flexibility. KPIs are simple… the STATE element of a KPI (often considered “Bad”, “OK” or “Good” status) is translated into a visual indicator, usually an icon (commonly “Red”, “Yellow” or “Green”, respectively). There are variations on this theme, but it’s a very simple concept and a good solution has existed for many years. In SSAS Tabular, the state logic was dumbed down to a slider control that eliminated some of the flexibility we had in the earlier multidimensional project designer, but it still works. The slider UX expects the “yellow” and “green” states to apply when a value is equal to or greater than the respective threshold, and the “red” state when it is less than the threshold value. Queries returned from SSAS include metadata that tells Excel, Power BI visuals or a variety of other client tools: “The KPI state is 1 (meaning ‘good’), so display a square green icon for this item”. If you have the luxury of building your data model in Analysis Services using the SQL Server Data Tools (SSDT) designer for tabular models – or in Power Pivot for Excel – you would define a KPI using this dialog:
The actual return value for a KPI designed this way is really just -1, 0 or 1, which typically represent the “Bad”, “OK” and “Good” states, respectively. As I said, you have other options, like switching the red/green positions or using 5 states rather than 3. The multidimensional KPI designer gives you even more flexibility by allowing you to write a formula that returns the VALUE, STATE and TREND element values for a KPI separately. It would be wonderful to have the same capability in Power BI. It would be marvelous if we could use the slider UI like this, with an Advanced button to override the default logic and define more complex rules in DAX! The SSAS architecture already supports this capability, so it just needs to be added to the UI.
If you design your data model using SSAS multidimensional or tabular, or using Power Pivot for Excel (which was the first iteration of Power BI), KPIs are just magically rendered in native Power BI visuals like a Table or Matrix. But alas, Power BI Desktop does not have this well-established feature, which could easily be ported from Power Pivot or the SSAS Tabular model designer.
</ END RANT>
…back to my friend’s simple scorecard report.
Using out-of the box features, the best we could do was this…
Create a calculated column in the table that returns -1 when the expiration date has passed, 0 if it is today and 1 if the expiration date is in the future. Here’s the DAX script for the column definition:
Expiration Status Val =
IF ( [EndofLifeDate] < TODAY(), -1,
    IF ( [EndofLifeDate] > TODAY(), 1, 0 )
)
Next, add some fields and the new column to a table visual and use the Conditional Formatting setting in the table properties to set rules for the Back Color property of the calculated column, like this:
Here’s the table with the conditionally-formatted column:
Why Not Use the KPI Visuals?
The standard KPI visual in Power BI is designed to visualize only one value, rather than one for each row in a table. Like an Excel pivot table, a Power BI Table will simply render KPIs that were defined in a Power Pivot or SSAS cube or model, but the Power BI model designer doesn’t yet offer the ability to create KPI objects.
Several community developers have tried to fill the feature gap with custom visuals, but every one of them seems to address a different and specific use case, such as time-series trending or comparing multiple measure values. I have yet to find an available KPI visual that simply allows you to visualize the KPI status for each row in a table, without having to customize or shape the data in unique and complicated ways.
How to Design Status KPIs With Indicators
Here’s the fun part: using the Expiration Status column values (-1, 0 or 1), we can dynamically switch out the image information in another calculated column. Power BI has no provision for embedding images into a report in a way that they can be used dynamically. You can add an image, like a logo, to a report page, and you can reference image files using a URL, but you cannot embed them into a table or use them in conditional expressions.
Using this trick, you can conditionally associate images with each row of a table. This is a technique I learned from Jason Thomas, whose blog link is below. Using a Base64 encoder, I encoded three state KPI indicator images as text, which I then copied and pasted into the following calculated column DAX script:
The encoded binary strings correspond to these three images, in this order:
To reuse this, you should be able to simply copy and paste this code from here into a new calculated column. You no longer need the image files because that binary content is now stored in the table column. It really doesn’t matter what labels you use for the status key values as long as they correspond to the keys used in the preceding code. I’m using the conventional -1, 0 and 1 because that’s the way SSAS KPIs work.
On the Modeling ribbon, set the Data Category for the new column to “Image URL”:
That’s it! Just add any of these columns to a Table visual and WHAM, KPI indicators!
*Incidentally, since adopting Jason’s technique, Gerhard Brueckl has come up with a method utilizing Power Query to manage and import image files that I will use in the future. Prior to that, I used the site Jason recommended in his post. My thought is that if a separate table stored only three rows (one for each KPI status), the status key value could be used to relate the tables. It would be interesting to see whether using a related table reduces the PBIX file size, or whether VertiPaq can effectively compress the repeating values of the image column. That may be a good topic for a later post.
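For the Power Query side of that idea, here is a minimal sketch – with a hypothetical file path – of encoding an image file into a base64 data-URI string in M, in the spirit of Gerhard’s approach:

let
    // read the icon file as binary and encode it as base64 text
    ImageBinary = File.Contents( "C:\Demo\KPI_Red.png" ),
    Base64Text = Binary.ToText( ImageBinary, BinaryEncoding.Base64 ),
    DataUri = "data:image/png;base64," & Base64Text
in
    DataUri

The resulting column would be categorized as Image URL, just like the pasted strings above.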
In a series of interviews during PASS Summit 2017 in Seattle, the week of October 30 through November 3, I caught up with Chris Webb, an international BI community thought-leader and product team advisor for Microsoft. We chatted about his love affair with Power Query, lessons learned from many years of expert Business Intelligence project work, and his thoughts about the future of Power Query and other Microsoft tools. Chris had finished an all-day preconference presentation the day before our interview, so the experience was fresh in mind. He has a long history of deep expertise with SQL Server, Analysis Services multidimensional and tabular technologies, MDX and DAX. His focus lately has been teaching, writing and using Power Query in practice.
He was so interesting and insightful that I honestly couldn’t edit anything out of the final recording, so I split it into three parts.
Chris Webb Interview Part 1 of 3
In the first installment, Chris talks about how he got started with Power Query and why he thought it was worth his investment of time and energy. He hadn’t planned to bet his career on Power Query. He said “this tool was cool! I liked playing with it…” He carried on playing with Power Query until he fell in love with the product and its awesome ability to transform data in ways that weren’t possible before with the same ease and elegance. Maybe the long-term success and pervasive integration of Power Query into more products will be bolstered by the enthusiasm of Chris and others in the community who just love to use it.
Chris Webb Interview Part 2 of 3
We began with the question: “Do you ever foresee Power Query being used as an enterprise ETL tool in lieu of something like SSIS?” Chris shares his thoughts about Power Query as a self-service, desktop data transformation tool and its juxtaposition with older, more complicated data tools. He calls it “a universal query generator”, capable of performing “query folding” (translating queries and pushing them back to the data source engine for each capable data provider). We discussed how Microsoft is committed to supporting and enhancing Power Query, not only for desktop analysis but also for tasks like partitioning in Analysis Services tabular models, among many other possible scenarios. His passion for this tool is apparent throughout our discussion.
Chris Webb Interview Part 3 of 3
How can you get started with Power Query and where is the best place to go for expert advice? Chris wrote one of the best books on the topic that, in my opinion, is very relevant today. I gave him the bait and he wouldn’t take it when I asked: “can you recommend any good books on Power Query?” In his usual, humble, fashion he tells me how difficult it is to write a good book about a product that changes so much; and goes on to recommend several online learning resources.
We conclude with Chris’ advice about how to get started and where to go for best practices, to develop expertise and advanced knowledge.
You can follow Chris Webb’s frequent posts on his blog, at: blog.crossjoin.co.uk
Thank you, Chris, for your time and willingness to share your thoughts and expertise.