Power BI Live Stream this Saturday

Only the most dedicated data professionals are willing to invest their personal time and spend any part of a weekend improving their skills. Having organized many SQL Saturday events, I know that there is a big difference between those who punch out on the proverbial time clock on Friday and those who love working with data so much that they will invest their personal time on a Saturday. One of the greatest examples of this kind of dedication is the group who participate in the weekly Power BI Live Stream from Guy In A Cube.

When Adam invited me to participate in the weekly Guy In A Cube Power BI livestream on Oct 23, I was elated and honored to join. If you are not already familiar with the Saturday morning live stream, you can join by subscribing to the Guy In A Cube YouTube channel. The channel is member-supported: Patrick and Adam do all of this outside of their day jobs at Microsoft, and member donations help cover their costs. I encourage you to support the channel with a donation, but it is not a requirement. Even without supporting the channel, you can attend the hour-long livestream. Everyone can chat and submit questions during the first half, and the second half of the session is open for members to submit their questions.

So, why am I going to be “in the cube” rather than joining remotely? Well, Houston is one of the stops in our DataOnTheRoad journey in the Data Bus! I’m interviewing Adam and other data professionals as we tour the country, working on the road, on the way to the Orlando Live! conference in November.

A Developer’s Guide to Creating Bad Power BI Projects – Part 1

It occurred to me that we have put so much effort into promoting best practices and proper design that there is far less information about how to create bad reports and data models. In that light, the purpose of this article is to talk about what to do if you want things to go poorly and make sure your projects fail – if not immediately, then sometime in the future – and if not for you then for whoever inherits the work that you have done.

I welcome your comments and thoughts about the worst practices you have seen in Power BI. Post your ideas in the comments below. What examples have you encountered of sure-fire ways to make projects go badly?

Keep in mind that the rules of “good” and “bad” design might be different for small, informal projects vs larger-scale, formal projects where longevity and scale are more important. Arguably, bad practices for enterprise-scale, multi-user report projects might be just peachy keen for small, informal desktop reports for one user with limited data. This article is written from the perspective of formal, larger-scale projects.

So, you’ve taught yourself how to use Power BI! You’ve Googled some questions and let YouTube show you how to cobble together a little data from different places, wire up a few tables and drag-and-drop fields to make some attractive visual reports. It wasn’t all that hard to do, right? You’ve demonstrated your new Power BI design chops and now all the managers in the company want more cool charts and reports. Since the first round of reports was so well received, the company bought licenses so everyone in the organization can use Power BI to run their business. Welcome to the big leagues, folks! Just import data from all the corporate systems, desktop Excel files and whatever web feeds you can find. If it’s on the Internet it MUST be right! Munge all this data together and create dashboards to help the executives run the business!

This guide demonstrates common ways that enterprise Power BI projects are bound to fail when self-taught, undisciplined techniques are used to build larger, enterprise-scale solutions. Most, if not all, of the examples in this post and the posts that follow are ones that my colleagues and I encounter when we go in to fix failed Power BI projects.

Bad practice #1: Keep PBIX files on your local computer rather than shared storage

Many Power BI projects evolve through stages of prototypes, so they tend to spawn multiple PBIX files. After some trial-and-error, solution developers often save off multiple files with specially encoded names that include things like the date, their initials and other information, each one with different features and techniques. One example might be FinancialAccountingReport_20210615_BobR_TestDisconnectedSlicer and another might be FinancialAccountingReport_20210618_BobR_FixedForecastMeasureButNowItRunsSuperSlow. It reminds me of when I was eight and we would write notes with ciphers that only the writer and recipient would understand. There is nothing wrong with making a few backup copies of your work in a convenient place where you will find them, but someone has to keep track of it all and recover the work when there is a staff change.

Without a deliberate effort to exercise discipline, delete or archive the old files, and store the most current copies in a central location that gets backed up and synced to the cloud, you are sure to have a mess on your hands. Worse than that, if the project ever needs to be turned over to someone else, they wouldn’t have a clue about how to continue the work. I always ask myself what would happen if I had to hand over my work to another developer tomorrow. Would they have what they need to continue, and would they know where to pick things up? Some transition documentation is good to have in this case, but make sure the master files are in a safe location. Better yet, keep them in a source code repository like Azure DevOps or GitHub, or in OneDrive or SharePoint with automatic file versioning.

Oh, wait… this article is about creating bad projects! Bad practice is to just leave files on your local drive. If you want things to go badly, that’s what you should do. When you leave this mess for someone else, they will have no idea about what you did or how to continue the work.

Bad practice #2: Keep the only copy of a report and dataset published in the online workspace to download when you need it

Today, we’re all about using the cloud to safely store files. This is a good practice. You can store all your PBIX and other important files in a OneDrive folder where they are backed up to the cloud, and the files are even versioned so you can revert to an earlier version if you do something silly and need to recover. However, relying on the option to store your one-and-only copy of a report or data model file in a Power BI workspace is optimistic at best.

There are several conditions that will disable the ability to download a PBIX file. The thing is that enterprise-scale data models need to take advantage of features that are not supported in Power BI Desktop (and probably never will be). This is by design. Desktop is primarily a self-service tool, and most of the features we use in enterprise-scale projects require the model to be saved either to a JSON model definition file (e.g. “Model.BIM”) or to a PBIX file with an internal model version that Power BI Desktop cannot open. I’ve heard from numerous novice developers who were accustomed to publishing their latest PBIX file to the service, only to discover that after enabling some advanced feature, they could no longer download the file.

So, if you want to shoot yourself in the foot, do not keep a master copy of your model and report files (which you are managing separately, right?) in OneDrive; just roll the dice that you will be able to download a copy whenever you need to edit the model or report in Desktop.

Bad practice #3: Import source data files from your local file system and then publish the report to the service

If your source data files are on your local computer, then after you publish the data model to the service as a dataset, you will not be able to refresh the dataset from the service unless you take a few more steps. There are a few options to remedy this situation, such as:

  1. Move the source file folder to a shared OneDrive folder or SharePoint library, and then use the corresponding connector to refresh the dataset from the service.
  2. Move the source file folder to a network share and then set up the enterprise gateway to import and refresh the published dataset from the service.
  3. Install and set up a personal gateway to refresh files from your local computer.

Option #3 is a sure-fire way to run into problems down the road. It might work for a small project managed by a single developer, but if the source files are on your laptop and the laptop happens to be asleep or shut off during a scheduled refresh event, that’s not going to work and the refresh will fail.

Another common gotcha: if you import source files that are stored in a OneDrive folder and use the locally synced folder as the source, the connection points to the local file system (like “C:\Users\Paul\OneDrive\…”) and not to the OneDrive service in the cloud. In this case, the data refresh will try to read from the local folder, which is inaccessible from the Power BI service.
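
To make the difference concrete, here is a minimal Power Query (M) sketch. The folder path, site URL and file name are hypothetical placeholders, not references to a real environment. The first source points at the locally synced folder, which the service cannot reach without a gateway; the second points at the OneDrive for Business (SharePoint) site in the cloud, which the service can refresh directly.

```
// Minimal sketch - hypothetical paths, site URL and file name
let
    // Local sync path: refresh from the service fails without a gateway
    LocalFiles = Folder.Files("C:\Users\Paul\OneDrive\SalesData"),

    // Cloud path: the service can refresh this source directly
    CloudFiles = SharePoint.Files(
        "https://contoso-my.sharepoint.com/personal/paul_contoso_com",
        [ApiVersion = 15]
    ),

    SalesCsv = CloudFiles{[Name = "Sales.csv"]}[Content],
    Sales    = Csv.Document(SalesCsv, [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Sales, [PromoteAllScalars = true])
in
    Promoted
```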

Bad practice #4: Use hand-written SQL queries to import large database tables

SQL is the tool of choice for most database professionals, so it would stand to reason that you should write a SQL query when connecting to a relational database (like SQL Server or Oracle), right? Sure, if you want things to be slow and inefficient. The best practice is to point your Power BI queries at a database table or view rather than a SQL statement, so that Power Query can employ a method called “query folding”. This means that the Power Query processing steps, where possible, are translated into SQL or the native query language of the data provider. There are advanced techniques to promote query folding over a SQL statement, but they are the exception to the rule. If you need SQL logic to join, filter, group or sort data from multiple tables before it hits Power Query, write a view and store it in the source database.
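
Here is a minimal Power Query (M) sketch of the difference, using hypothetical server, database and view names. The first query starts from a view, so the filter step can fold back to the server; the commented lines show the native-query pattern that prevents folding.

```
// Minimal sketch - hypothetical server, database and view names
let
    Source   = Sql.Database("sqlserver01", "ContosoDW"),
    Sales    = Source{[Schema = "dbo", Item = "vw_FactOnlineSales"]}[Data],
    // This filter folds into the native SQL sent to the server
    // (right-click the step and choose "View Native Query" to confirm)
    Filtered = Table.SelectRows(Sales, each [OrderDate] >= #date(2021, 1, 1))
in
    Filtered

// Anti-pattern: a hand-written statement, which blocks folding for later steps
// Source = Value.NativeQuery(Sql.Database("sqlserver01", "ContosoDW"),
//     "SELECT * FROM dbo.FactOnlineSales")
```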

Simple answer: If you want things to go badly, write SQL queries in your Power Query queries using Power BI Desktop.

Bad practice #5: Use DirectQuery for “real-time” analytics

Power BI includes an awesome capability called “DirectQuery” that lets certain data source engines handle query operations as the user interacts with report visuals, rather than using the VertiPaq in-memory analytic engine to process queries. DirectQuery doesn’t use the VertiPaq storage engine, which is really the heart and soul of Power BI and SSAS Tabular data models. With DQ, records are read live from the source. The downside to DQ is huge: most interactive report visuals can take several seconds to load, and some DAX functions simply can’t be used due to resource overhead and performance cost.

Database professionals often consider DQ to be a cure-all solution because it enables familiar database technologies. Everything is managed centrally, there is no record duplication in the data model and users see the current state of a record when they interact with reports. That’s all fine and dandy but compared to Import mode, DirectQuery performance can be several orders of magnitude slower.

Import mode is fast because data sits in memory, compressed by column. DQ is slow because queries must be translated and pushed down to the data store, and then the results must be sent back. DirectQuery is a powerful feature when used strategically to address a few specific problems. It is a good solution for massive fact tables that would otherwise use a lot of memory, as long as you filter those large tables and don’t do a lot of grouping and aggregation over a large set of records.

In a composite data model, with most tables using Import mode and one or a few detail tables using DirectQuery, you can achieve a “best-of-both-worlds” solution by allowing users to drill to details against a filtered set of records.

If you want your visual, analytic reports to be really slow, use DirectQuery for all the tables.

Bad practice #6: Combine source data and load one big table into the Power BI data model

People who approach data problems in Power BI the same way they do in Excel tend to combine source tables into one large, spreadsheet-like table. That approach creates a seemingly easy-to-navigate grid of data with a lot of columns, but you can lose a lot of functionality and create a very inefficient model this way. The VertiPaq tabular modeling engine was architected around the premise of “star schema” dimensional models.

It’s not that Power BI won’t work with flattened data, or that you “have to” build a star schema with separate dimension tables for every attribute. The natural rules of data behavior just cause Power BI to work better when data is modeled as related fact and dimension tables. Several DAX functions expect records to be filtered through a relationship. One of the most common examples is a date dimension table, used to filter a fact table to perform time-series calculations. The same behavior applies to Products, Customers, Geography and other filtering or grouping attributes that can be used to filter a related table of numeric business facts.

If you want to struggle with DAX calculations and limit users’ options to interact with your data, pound your data into one large, flat table rather than individual dimension or lookup tables.

How to Name Worksheets in a Paginated Report Exported to Excel

This question comes up every few years in SQL Server Reporting Services. Of course, in Power BI Paginated Reports, we have the same features. A couple of days ago, Karthik posted this question as a comment to my post titled Chapter 7 – Advanced Report Design:

I am working on a SSRS report where the grouping is done to group the records in to multiple tabs/worksheets. When the report is exported to excel, the worksheets has the default name (Sheet1, Sheet2, Sheet3,…). Here I want to override the default worksheet name with (Tab1, Tab2, Tab3, …). The number of tabs/worksheets varies each time depending on the parameter selection by the user. How to address this? any suggestions please.

There are a couple of requirements to set this up. First, you must drive report pagination using a Group and then set a Page Break on the Group. This figure shows that I am grouping my table data region on the Brand field and then setting a page break on that group. You can get to the Page Break settings by using the little arrow next to the group item in the Row Groups pane or you can use the properties window like I’m doing here.

Next, you want to give the page a name under the PageBreak properties. Expand PageBreak and then use the Expression builder in the PageName property to use the grouped field name in an expression. In this case, I am referencing the Brand field.

The result is shown here when I open the file produced by exporting this report to Excel. You can see that the worksheet tabs are titled with each Brand value, and each sheet contains the product information grouped by the corresponding brand.

New Blog: Data On The Road

That’s right! I’m starting a new blog site today. Just to be clear, it will not replace this one. I will continue to post here at SqlServerBi.blog about everything related to Business Intelligence best practices, Power BI, Paginated Reports, SQL and the Azure Data Platform.

The new blog is called Data On The Road and it is dedicated to connecting with the professional community while on the road as a working technology professional. In May of 2021, my wife and I decided we needed an adventure so we transitioned our living space into a new Tiffin Class A motorhome. We had it outfitted with a private office and tech so we could stay connected while working fulltime from the road. We call it the “Data Bus”.

We’re just getting started and taking short trips to practice and work out the kinks, but it has been great so far. We hit the road for good in the Fall to experience the US from the Southwest, the South, the East coast, and who knows where else?

We’re still assembling the tech and working out the logistics to visit and interview friends and colleagues along the way. In the database and BI community, we’re a dedicated and social bunch so let’s talk about work and about fun! We might chat a little tech but the purpose of this blog – and this adventure – is to talk about work/life balance. In our jobs, we work hard and play hard – so we want to know what our community friends and leaders do when they need to get away. How do they balance their careers with other activities?

Please follow both of my blogs but head on over to DataOnTheRoad.blog and follow us on this new adventure!

Doing Power BI the Right Way: 10. Designing and Managing Large Datasets

Part of the series: Doing Power BI the Right Way

Later in this post, you will find a 27-minute video with a series of interviews and discussions with a few of my colleagues. They provide perspectives and recent experiences related to this topic. The best approach to design and maintain large data models depends on several factors, but many years of experience have proven that there are best practices to embrace and common mistakes to avoid.

I was just talking to a consulting client about the best approach to build a data model and he told me something very interesting about the way they were loading data into Power BI. He said, “We don’t use facts and dimensions, we load all of our data into one huge table.” He said that their data model performs well and that it meets their reporting needs. It is a difficult point to argue when something is working at the time, even though the design might not follow the accepted rules. Life is like that, and there are plenty of analogies to make the point that a practice, even a really bad practice, might solve a problem for a period of time and under certain conditions. You can drive a car at excessive speed to get to your destination faster. You might not get caught by the police on that day and you might not crash, but if you make it a habit, the practice will eventually catch up to you. Data is like that. If you don’t play by the rules, you limit your options. Bending the rules lets you move faster and sometimes with less hassle. But as the project scope expands, and after enough data or other complexities are added to the solution, it will not endure. The data model won’t perform well, won’t load the correct data, or simply won’t be reliable.

This post will explore the realities of best-practice design for large data models, including some important considerations and trade-off decisions when working with both “big data” and “large data”.

One of the great challenges with Power BI is that it is so easy to build simple data models and reports quickly. But often, when those projects need to grow up, they aren’t ready for enterprise use. This isn’t a problem with the tool; it is a problem with the way the tool is often used. If someone says, “Hey, Paul, I need you to create a report quickly. Can you import data from these three sources into Power BI and create a quick dashboard report?” I might say, “Sure, let’s just do this:”

  • Merge records from multiple files and queries into a single table in the model.
  • Use in-line SQL queries in Power Query to join source tables together and apply transformation logic.
  • In reports, summarize numeric columns from a table in the data model, rather than defining DAX measures.
  • Reload the entire model when source data changes.

There is nothing inherently wrong or bad about using these quick-and-easy techniques in a smallish project, but if it needs to grow, scale and be future-proof, you may need to follow a longer path to achieve a more durable result.

The answer to a question we ask at the beginning of every BI project, “What information do we need to include in the data model?”, is often the same. I think Freddie Mercury said it best: “I want it all and I want it now!” More specifically, stakeholders might say that they want:

  1. A lot of rows of detail or historical fact records (let’s say tens of millions or billions of rows).
  2. Many columns from the source table(s) so they can slice, filter or group values on anything imaginable.
  3. Reports must perform well and return results quickly.

…to these requests, we might say “pick any two but you can’t have all three”. Maybe you can have all three to a certain point but not to an extreme.

Tall and Wide tables

A “tall” table contains many rows or records. A million rows might be a substantial fact table in a desktop Power BI data model, and row counts in the billions are not out of reach in an enterprise-scale solution built in either Power BI or Analysis Services. The VertiPaq analytic data processing engine (the data modeling engine behind SSAS, AAS, Power Pivot and Power BI) is very good at handling very tall tables when the columns are conservative, numeric data types, especially when they contain repeating, non-unique values that aid columnar compression. In rare cases, Power BI data models can even read and process petabytes of data! The point is that tall tables are very achievable.

Tables with many columns can be managed by the modeling engine, but tall (many rows) and wide (many columns) are often not a good combination. The fact is that analytic reporting solutions shouldn’t include the whole kitchen sink. The problem often stems from developers not building data models based on business requirements, but simply exposing all the available data just in case someone might find it valuable. This is just not the way BI works. If you don’t need certain columns to meet the current report requirements, leave them out. Columns can usually be added later when they are needed, and additional tables can be added until the model exceeds its design goals. Herein lies the need for balance.

The thing is that simple data models can be relatively easy to build and often don’t take up a lot of space. It may be more cost-effective in the long run to create another data model with some duplicate design elements, than to spend all of your time and energy trying to craft an all-encompassing data model.

Large Data and Big Data

By “Large Data”, we generally mean that tables contain a high number of rows. Conversely, “Big Data” refers to a set of technologies that were engineered to manage large volumes of data records using distributed storage and processing.

Big Data

In the first wave of cloud computing, we saw the introduction of “Big Data”, which was a new concept for data professionals. Technologies like Hadoop and Apache Spark use distributed computing to process large sets of data in parallel, using a highly scaled-out architecture. The term “Big Data” doesn’t just mean that there is a lot of data; it really means “distributed data processing”, which can potentially be used to handle very large data sets. Like any architecture, there are strengths and weaknesses. Big Data implementations like Spark and Databricks can transform and process an enormous amount of information very quickly, with portions of the data processed on distributed nodes.

Conversely, the VertiPaq engine at the center of a Power BI or SSAS Tabular data model is not a distributed architecture. Because VertiPaq keeps all of the data in memory on a single computing node, it can perform calculations extremely fast. For this to work, data must reside in contiguous memory on a single computer. So, Power BI is not Big Data, but it can be used in conjunction with Big Data sources when you use these respective technologies the way they were designed and architected to work.

Complex Models

Unique problems require unique data model designs. Sometimes a Power BI report and its accompanying data model just need to address a few business reporting objectives. As time goes by, additional requirements are introduced and the model grows. Along the way, certain data challenges might require additional data massaging or new bridging tables to address special needs. Over time, a model can get complicated. This often raises the question: should we create one big, complex data model to address many requirements, or multiple, less complicated data models? There are trade-off decisions either way. It is often possible to engineer larger, high-value models. However, you have to weigh the cost of maintaining a complex model against the relative simplicity and cost savings of less complicated models. Sometimes starting over is cheaper than layering more messiness onto a data model with a shaky foundation.

Partitioning

Once considered an essential activity in the design of any data model, partitioning breaks a table into multiple storage units that can be managed and loaded separately. Partitions containing unchanged records don’t need to be reprocessed, and the smaller partitions containing new records or those that have changed, can be processed quickly and in parallel, so they don’t block other operations.

Video Interviews

This video runs about 27 minutes and includes a series of interviews and discussions with some of my colleagues, with their thoughts and experiences on this topic.

Power BI Data Modeling Sessions

This is going to be a very busy week for presentations. I am presenting five sessions on the topic of data modeling!

Data modeling is the heart and foundation of a useful Power BI solution. If you can get the data model right, most everything else falls into place. Correct data modeling is relatively simple on the surface but can be challenging in practice. Evidence of the interest in and importance of this topic is that I submitted abstracts on a few different topics for these events, and this is the one that keeps getting selected.

Silicon Valley Microsoft Data Platform Meetup
Tuesday, April 20 at 6:00 PM Pacific Time

Power BI Summit
April 19-23 – multiple sessions and times

The session will be repeated three times for attendees in different time zones.

https://globalpowerbisummit.com/

BI & Analytics community of the Norwegian Computer Association
April 21 at 5:00 PM Central European Time

Supporting materials are available on the Presentations page on my blog here: Presentations | Paul Turley’s SQL Server BI Blog. In addition to the material and demonstrations I present in these sessions, I wanted to share some additional thoughts, which I have recorded in this short video:

Drill-through from Power BI to Paginated Report – Report Recipe #4

Navigation between reports is the hallmark of an interactive reporting solution, enabling users to drill through and see relevant details and contextually filtered information in a target report. Power BI and Paginated Reports both offer navigation capabilities suited for different purposes. Having the ability to navigate between reports can provide a tremendous amount of power and flexibility.

This is simple math: with two different report types, there are four different possible report navigation combinations. They are:

  • Power BI report to Power BI report
  • Power BI report to Paginated report
  • Paginated report to Power BI report
  • Paginated report to paginated report

Continuing this series of blog posts to highlight our recently published eBook about Paginated Report recipes, this post summarizes the technique detailed in recipe #4, “Drill-through from Power BI to Paginated Report On-premises”.

If you have worked with native SSRS drill-through using report actions, you should know that this capability is not yet baked into the product. It probably will be some day, but likely not soon. That’s OK, because this technique, using URL navigation, works quite well. There is one other twist: the URL parameters are a little different for on-premises Paginated (SSRS) reports than for Paginated reports in the Power BI service. The technique in this recipe uses an on-prem target report. You can apply the same pattern in the cloud with slight modifications, and I will post an example in the near future.

This image depicts the source Power BI report with links that navigate to the target paginated report, showing filtered details:

The mechanism that enables navigation is a report URL produced with a DAX measure. The code emits a URL with parameters that preserve the row and filter context of the selected item or record. This URL measure is exposed as a link in a table or matrix visual in the Power BI report, optionally with a URL “link” icon:

To see the instructions in detail, follow this link to the recipe: 04-Drill-through from Power BI to Paginated Report On-premises | Paul Turley’s SQL Server BI Blog

Paginated Reports Recipe eBook – first two recipes

First and most importantly, I want to acknowledge and thank these contributing authors who wrote report recipes and working samples included in this eBook:

  • Robert Bruckner
  • Kirill Perian
  • Kathi Kellenberger
  • Stacia Varga
  • Tomaž Kaštrun
  • Tom Martens

As I work on posting the initial report recipes, I wanted to share some comments about the first two. These recipes are step-by-step instructions that lead through the start-to-finish process using the provided sample database, and they can be used with SQL Server Reporting Services (SSRS), Power BI Report Server or Paginated Reports published to a Power BI Premium capacity workspace. Reports may be authored with Power BI Report Builder, Report Builder or SQL Server Data Tools/Visual Studio. Additional recipes will follow.

The eBook with all of the recipes will be accessible from the main menu here on the blog. Each recipe is published as a page rather than a post, so you don’t have to search and try to figure out when each one was originally posted. As of tonight, I have published eight of twelve recipes that have been completed thus far. I’ll get others posted in the near future. Please enjoy these and watch for more paginated report recipes here: Paginated Report Recipes eBook | Paul Turley’s SQL Server BI Blog

The first two recipes in the Paginated Report Recipe eBook are about applying alternate row and group shading. Dating back to the early days of printed reports, this style of report was called a “greenbar report” because it was literally printed on continuous, pin-fed paper with alternating shaded green and white rows to make it easier for someone reading the report to follow along.

As the industry evolved from scrolling paper reports to laser-printed pages and then to on-screen reports, the greenbar effect remained useful. Recipe #1 introduces expression-based styling by changing the background color using the modulus (division remainder) of the row number in a table data region. If the row number is an odd number, set the background color to a shading color; otherwise, set it to white or transparent.

01: Alternate Row Table “Green Bar” Report | Paul Turley’s SQL Server BI Blog

In recipe #2, contributing author Kirill Perian takes this technique to the next level by maintaining the alternate shading across column and row groups in a matrix data region. The method is a little more involved than just using the row number. He uses a hidden column to track the column group and then an expression to reference that value to manage the background shading for each row.

02: Alternate Row Shading in Matrix (with Column Groups) | Paul Turley’s SQL Server BI Blog

The end result looks similar to the simple table report used in recipe #1, but the problem is a little more complex in a matrix report because of the column groups. The advantage of this technique is that it will work across row and column groups at multiple levels.

Paginated Reports eBook Finally Released

At last, the second edition of “SQL Server Report Recipes”, written by Paul Turley, Robert Bruckner and a host of contributors, is being released a few recipes at a time. This time around, it will be a free book published through my blog and perhaps other sources. I’ve posted the introductory chapter and six report design recipes. You may also download the sample SQL Server database used in all the recipe reports. I sincerely hope that you and others will use this as a free resource that serves the common interest of the greater community. Please spread the word and send those who might learn and benefit to this page.

The work on this book really started about eighteen years ago…

Late December of 2003 was a very special time for me. SQL Server Reporting Services was due to be released to the market, and I had been using the pre-release version to integrate reports into a custom web application. After using a variety of products like FoxPro, Access and Crystal Reports, and after mastering essential T-SQL queries and .NET programming, using SSRS was a dream. It was simple and elegant, programmable and powerful. I was so excited that I wanted to tell the world, so I recruited authors to write our first book about Reporting Services. Three editions later, I teamed up with Robert Bruckner, one of the lead developers on the SSRS product team at Microsoft, to write a different kind of book. Robert and I had both blogged quite a lot about report design techniques. Robert had done some crazy things in SSRS that no one knew were possible. He found very creative ways to develop games in SSRS, such as tic-tac-toe, hangman and Battleship. Honestly, there are easier ways to do game programming, but he proved that almost anything was possible if you were willing to think outside the box and maybe even draw outside the lines a bit.

Several contributing authors have worked tirelessly over the past difficult year to make this eBook a reality. Others, through no fault of their own, signed up to contribute to the book, but “the year that shall not be named” had other plans. At first we paused, then we stopped, and then we regrouped and decided to publish a shorter but comprehensive, modern kind of book.

Please visit the new Welcome and Introductory page with the Table of Contents to the current and future recipes: Paginated Report Recipes: 2020-2021 | Paul Turley’s SQL Server BI Blog

Doing Power BI the Right Way: 4. Power Query design best practices

Part of the series: Doing Power BI the Right Way (link)

Although my professional focus is building enterprise-scale BI solutions, I’ve created my share of informal Power BI reports that were put together quickly, with the goal of creating something “good enough” rather than achieving perfection. This guide is about designing proper, formal solutions, but these practices apply to any Power BI project that needs to survive future maintenance.

When you need to load data into a data model, you have a few options, and the right choice depends on a few factors. The decision is usually a balance between quickly and conveniently generating a table and taking a disciplined approach to get reliable data from a sustainable source of record. The following image shows that data can be transformed in the source (or before data is loaded into the source) or within Power Query using Power BI Desktop.

Convenience or Scale?

Life is full of choices and trade-off decisions. Let’s say that you need to create a lookup table containing sales regions and this information doesn’t exist in the source database. You could easily create a new table in Power Query using the “Enter Data” feature and just manually enter the sales regions. This would solve the immediate problem with very little effort, but how will the table be maintained if sales regions are added or change in the future? We could keep the list of values in an Excel file stored in SharePoint for a business owner to maintain when information changes. You could also go all the way back to the beginning of the process and load a dimension table in the source database. IT professionals might take a hard line and say that all data must flow through the data warehouse regardless of the effort and cost. Which of these is the right choice? It depends on the scope of the reporting project and the long-term ownership of the solution. Sometimes quick and convenient is OK, but particularly for tables that need to scale and handle larger data volumes in the future, the following guidelines are critical.
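
As an illustration of the quick-and-convenient end of that spectrum, here is a minimal Power Query (M) sketch of a hand-entered lookup table. The region codes and names are made-up values, and the “Enter Data” button generates equivalent (though compressed) M behind the scenes.

```
// Minimal sketch - hypothetical region codes and names
let
    SalesRegions = #table(
        type table [#"Region Code" = text, #"Sales Region" = text],
        {
            {"NW", "Northwest"},
            {"SW", "Southwest"},
            {"NE", "Northeast"},
            {"SE", "Southeast"}
        }
    )
in
    SalesRegions
```

Convenient, yes, but every change to the region list now requires editing the query, which is exactly the maintenance question raised above.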

I’ll get to the point and then explore the details afterward. Start with these fundamentals:

  • A proper data model is the heart of a Power BI report solution.
  • The purpose of Power Query is to shape and prepare each of the tables loaded into the data model.
  • The tables, fields and measures in the data model should be uncomplicated and user-friendly; intuitive and easy to navigate.

Recommended Practices

Following are the general best practices I religiously apply when creating queries. After briefly describing each recommended practice in the list below, I’ll break some of them down and review them in detail.

  • Use parameters for query paths & connections: Parameters are used to make the solution portable. Any connection information, like a file path or database server name, should be stored in a parameter so it can be changed without modifying query code.
  • For large SQL tables, reference either tables or views: Power Query has optimizations built in to work with different data connectors. Several connectors support query folding, where Power Query translates query steps into the native query language. Starting with a SQL query, rather than selecting a table or view from the list of database objects, ensures that query folding will not work. When possible, start with a table, and if you need SQL to prepare data before loading it with Power Query, create a view.
  • Limit large table volume using date range parameters, compatible with Incremental Refresh: Incremental Refresh enables the Power BI service to partition large tables and load only the data that changes, rather than the entire table, when the dataset is refreshed. This was once a Premium-only feature but now works with shared capacity licensing for datasets up to 1 GB in size. Even if you don’t intend to use the Incremental Refresh feature, a pair of date range parameters allows you to filter large tables and keep the PBIX file size small. After publishing the file to the service, you can update the parameters and load more records. Create two Date/Time type parameters named RangeStart and RangeEnd, and then add a date range filter according to these instructions (see the sketch after this list).
  • Remove all unneeded columns: Resist the urge to keep columns that you are not sure you need for reporting. In each query, remove all unneeded columns early in the sequence of applied steps. The easiest way to do this is to use the Choose Columns button on the Home ribbon and deselect columns. To change the selection later, click the gear icon next to the Removed Other Columns step.
  • Rename, using title case, all table and column names that will be visible in the data model: Although it may seem trivial, it is absolutely necessary to apply friendly naming conventions to all tables and fields. Chris Webb wrote an excellent post about object naming conventions. As a rule, rename all columns that will not be hidden in the data model, using friendly title-case names (with spaces and mixed case). There is no need to rename primary key, foreign key and other utility fields. After tables are added to the data model, hide those fields to remove clutter and confusion for report developers (even if you are the report developer).
  • Explicitly set the data type for all columns: Column renaming and changing data types can be time-consuming work, but be meticulous about checking every column returned by the query.
  • Consolidate multiple steps of the same type (such as change type and rename columns): As query design evolves, you will inevitably create inefficient queries with redundant steps. There will always be opportunities to improve the design, often by consolidating and reorganizing steps.
  • Rename key steps to document their purpose and aid future development: Renaming query steps allows you to understand the function and purpose of each step in the Applied Steps list. This creates a self-documented series of operations that will be easier to maintain down the road.
  • Add code comments in the M script and/or step descriptions to document queries: You can further document query steps by changing the step Description in the Properties window or by adding code comments in the Advanced Editor.
  • Use steps that promote query folding: Many query steps allow records to flow through without blocking subsequent steps, while certain other transformation steps must read all the records into memory to apply the transformation.
  • Move steps that break query folding as late as possible: Steps that support query folding, meaning they can be translated into the native query language of the data provider, should be performed first. If non-foldable steps can’t be avoided, they should be performed as late as possible in the sequence of query steps.
  • Perform Pivot, Unpivot and Transpose transformations only on small result sets: These transformations must load all records into memory. They are powerful but have volume and performance limitations, so test them with production-scale source data.
  • Create row-level calculations in Power Query instead of DAX: Row-level calculations and derived column values can be performed either in Power Query or in DAX, but M is a far more capable language for data transformation. Effective data preparation using Power Query will simplify and ease the burden of data model design. Regardless of relative efficiency, performing all data preparation and transformation in one place simplifies future maintenance. There are less-common cases where DAX is the best choice, such as calculated tables and calculated columns that work outside the scope of a single row.
  • Avoid DirectQuery unless warranted by latency and data volume requirements: DirectQuery has its place in data model design, but it should be an exception to the normal pattern of importing data into an in-memory data model. DirectQuery tables can work with simple aggregation calculations but don’t perform well with many other DAX functions. Use DirectQuery to aggregate values over very large tables that won’t otherwise fit into memory, or to support drill-through to non-aggregated detail records. These are advanced design patterns that should be treated as rare and exceptional.
  • Avoid native SQL queries, especially with large source tables: Native SQL queries don’t support query folding, so they should be avoided when possible. They may be an acceptable exception for low-volume query results, but generally use a database table or view as the query data source.
  • Avoid Date/Time columns when only the date is needed (use the Date or Time type): Load column data only at the necessary level of grain so values in the data model will compress. In particular, reduce date/time values to date only. For time-level resolution, store date and time values in separate columns.
  • Import or create date tables in Power Query rather than using auto-generated date hierarchies: Auto-generated calendar tables in Power BI Desktop are fine for self-service projects, but purpose-built date tables afford more flexibility. If available, use a date dimension table in your data warehouse or source database; otherwise, generate a date table using Power Query. Date tables can also be created with DAX functions, but if all tables are loaded with Power Query, you get the convenience of managing all tables centrally.
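
Here is a minimal Power Query (M) sketch of that date range filter, using hypothetical server, table and column names; RangeStart and RangeEnd are assumed to be defined as Date/Time parameters as described above.

```
// Minimal sketch - hypothetical server, table and column names
let
    Source = Sql.Database("sqlserver01", "ContosoDW"),
    Sales  = Source{[Schema = "dbo", Item = "FactOnlineSales"]}[Data],
    // Filter on the RangeStart/RangeEnd parameters; use >= on one boundary and < on
    // the other so records are neither duplicated nor skipped between partitions
    Filtered = Table.SelectRows(
        Sales,
        each [OrderDateTime] >= RangeStart and [OrderDateTime] < RangeEnd
    )
in
    Filtered
```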

Breaking It Down

Let’s take a deeper look at some of these recommendations.

Use parameters for query paths & connections

For every file path, web address or server name in a source connection, use a parameter. It is not hard to go back and edit source connection information using the Advanced Editor, but the easiest way to build parameterized connections is to build them as you go.

Start by enabling the option “Always allow parameterization in data source and transformation dialogs” on the Power Query Editor page of the Options dialog.

As you build each query connection, for most connection types, you will be prompted to select or create a new parameter.

Here is an example of the parameters in a demonstration project. Without modifying any code or editing a query, any of these values can be changed easily.

Here are two examples of parameterized connections. For the SQL Server connection, the server/instance name is passed as the first argument to the Sql.Database function. The second example concatenates the folder path (stored in the SourceFolderPath parameter) with the file name to create the full folder and file path.
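
The two examples look something like the following sketch. SourceFolderPath comes from the text above, while ServerName, DatabaseName and the file name are hypothetical names and values used here only for illustration.

```
// Minimal sketch - ServerName, DatabaseName and the file name are hypothetical;
// all three parameters are defined in the Manage Parameters dialog
let
    // SQL Server connection: the server/instance name is the first argument
    DbSource = Sql.Database(ServerName, DatabaseName),

    // File connection: the folder path parameter (assumed to end with a backslash)
    // is concatenated with the file name
    FileSource = Csv.Document(
        File.Contents(SourceFolderPath & "SalesExport.csv"),
        [Delimiter = ",", Encoding = 65001]
    )
in
    FileSource
```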

For large SQL tables, reference either tables or views

The best way to guarantee poor query performance with a relational data source is to start with a hand-written native query (like SQL) and then perform transformations on the results.

If a query is based on a relational table or view, Power Query can generate the native SQL (and a few other supported query languages) for many of the downstream transformation steps; starting from a hand-written SQL statement rather than a table or view takes that option away.

Use Query Diagnostics to Diagnose and Tune Performance

The following image shows the results of Power Query Diagnostics, which I describe here: Power BI Query Performance & Query Diagnostics | Paul Turley’s SQL Server BI Blog. That post demonstrates how to capture timings for each step and for entire queries.

Doing Power BI the Right Way: 4. Power Query in Dataflows or Power BI Desktop

Part of the series: Doing Power BI the Right Way (link)

Power BI, more than any other Microsoft product in my recollection, offers options and choices to architect and deliver a solution. Without compromise, Power BI can effectively be used for anything from creating a simple chart from an Excel spreadsheet to delivering enterprise reporting and analytics on a Fortune 100 company’s massive data warehouse. At the end of this post, I will share a comprehensive list of resources and insights from Matthew Roche, a Program Manager on the Power BI Customer Advisory Team (CAT). To tease that series, I’ll start with this quote from Matthew’s blog:

Succeeding with a tool like Power BI is easy – self-service BI tools let more users do more things with data more easily, and can help reduce the reporting burden on IT teams.

Succeeding at scale with a tool like Power BI is not easy. It’s very difficult, not because of the technology, but because of the context in which the technology is used. Organizations adopt self-service BI tools because their existing approaches to working with data are no longer successful – and because the cost and pain of change has become outweighed by the cost and pain of maintaining course.

Matthew Roche, Building a data culture – BI Polar (ssbipolar.com)

When should you use dataflows vs regular Power Query? I didn’t jump on the dataflows bandwagon and struggled to adopt them at first. Frankly, Power Query is easier to use. The browser-based dataflows designer is quite impressive but it is not as responsive and convenient as the desktop app, so this is a bit of a trade-off. The power and value of dataflows becomes apparent when the business reaches a certain stage of data culture maturity.

Before we can address the question of whether to use Power BI Dataflows, conventional Power BI queries, or any other approach to source and transform data; we need to briefly review different options for orchestrating a Business Intelligence solution in the Microsoft cloud ecosystem.

Solution Architecture Choices

On a scale of one to ten, with ten being the most formalized, strictly governed and complex corporate reporting platform, the self-service Power BI option might range from one to four.

For the self-service data analyst, working entirely in Power BI Desktop, data can be imported and transformed using Power Query. Tables are modeled, calculations are defined and data is visualized. This mode is simple and works well for small to moderate-scale solutions with less emphasis on data governance and centralized control.

Even using this simple approach, data models can be developed separately from reports, certified and shared with multiple report developers and self-service report authors. So, to a point, business data can be managed and governed – but the queries in the Power BI solution read directly from source systems or files that are not curated for analytic reporting.

Data Warehouses and Data Marts

The “single version of the truth” or “golden record” repository, a data warehouse (or smaller-scale “data mart”), is the ideal solution to store and manage reliable corporate information. The challenge with creating a central data warehouse to manage centrally governed organizational data is that it is costly and time-consuming; the trade-off is that self-service data models can be inaccurate and out of date. When business leaders need answers quickly, it is not always feasible to add more data sources to a data warehouse quickly.

On the complexity scale of one to ten, versions of this option might be from seven to ten.

A conventional DW/BI solution typically uses on-prem data transformation tools like SSIS to stage and transform source data into a central data warehouse built using a relational database product like SQL Server. Although viable for on-prem systems, this old-school architecture model doesn’t embrace scalable and cost-effective cloud technologies.

The first generation of the Microsoft cloud-based modern data warehouse can utilize several different Azure services. The components in the following example are easily equated to the conventional data warehouse solution in the previous example. Azure Data Lake serves as the staging environment, typically using text files and structured file storage as an inexpensive landing area for ingested source data. Azure Data Factory is used to orchestrate and transform files and data streams into and out of the data lake and the data warehouse. Depending on the need for scale and size, Azure SQL Database or Azure SQL Data Warehouse (now called Azure Synapse) may be used for data warehouse storage.

If your organization has a comprehensive data warehouse to serve-up all or most of the data needed for analytic reporting, this is probably the best fit for a Power BI solution in your business environment.

Constructing an enterprise data warehouse solution is not a trivial endeavor, often involving as much effort to negotiate business process challenges as the technology development to implement the solution.

The newer generation of the Azure modern data warehouse is a best-of-breed collection of tightly integrated cloud services called Azure Synapse Analytics. Compared to the previous set of independent Azure services, Synapse Analytics provides a unified development and management interface. Apache Spark and other industry-standard technologies designed for data science and platform-agnostic analytics provide the open-source data prep engine. Azure Synapse is the evolution of Azure SQL Data Warehouse, Microsoft’s read-optimized, scalable, massively parallel-processing (MPP) SQL-based database engine.

Power BI Dataflows

Dataflows can fill an important gap between purely self-service data prep and formal data warehouse solutions. If you don’t have a comprehensive data warehouse to meet your analytic reporting requirements but need to provide more data quality control over standardized entities, incorporating dataflows might be the ticket.

In their simplest form, dataflows provide reusable transformation logic (queries) that can be shared by multiple Power BI data models. Using dataflows deployed to a workspace can save data model developers from repeating the same transformation steps in multiple datasets. But these are more than just Power Query scripts stored in the cloud.

A long list of capabilities is enabled by using dataflows. They can provide integrity and standard entity definitions stored in Dataverse (previously known as the Common Data Service) to enforce standard naming, data types and schema compliance, among other features.

In Premium Capacity, dataflow results may be persisted in Azure Data Lake Gen2 storage. This essentially allows you to use dataflows to create a moderate-scale data warehouse without a big investment. Entities may be linked to related entities which creates virtual joins and referential constraints. Other Premium features include DirectQuery, Computed entities and Incremental refresh – all managed in the dataflow rather than for each dataset. Integrations with Azure AI, Machine Learning and Cognitive Services allow you to utilize AI features without writing code. For example, in a recent project, we used AutoML on a dataflow containing high school student data to predict graduation outcomes.


Dataflows Start with M Queries

Dataflows begin with an M query, just like the queries in Power BI Desktop, before adding the additional capabilities mentioned earlier. Queries are authored entirely in the browser, but migrating from Power Query in Power BI Desktop is fairly easy. Start with a Power BI solution (PBIX file) in Desktop and open a query in the Advanced Editor. You can create a new dataflow in the browser and then copy and paste the existing M query code from Desktop into the dataflow designer. You do have to copy each query one at a time, and there are just a few compatibility differences, but for the most part it should be a one-to-one transfer.
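
For reference, a query copied from the Advanced Editor looks something like this minimal sketch (hypothetical server, database, table and column names); this is the kind of M you would paste into a new dataflow entity.

```
// Minimal sketch - hypothetical server, database, table and column names
let
    Source    = Sql.Database("sqlserver01", "ContosoDW"),
    Customers = Source{[Schema = "dbo", Item = "DimCustomer"]}[Data],
    Selected  = Table.SelectColumns(Customers, {"CustomerKey", "FirstName", "LastName", "City"}),
    Renamed   = Table.RenameColumns(Selected, {{"FirstName", "First Name"}, {"LastName", "Last Name"}}),
    Typed     = Table.TransformColumnTypes(Renamed, {{"CustomerKey", Int64.Type}, {"City", type text}})
in
    Typed
```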

Building a Data Culture

Matthew Roche from the Power BI Customer Advisory Team has an excellent 17-part blog series about building a data culture. Dataflows sit at the crossroads between business process, data stewardship and technology. The industry has been throwing technology and software at data governance and quality problems for decades, with marginal success. It is much easier for data practitioners to acknowledge that these are multi-faceted business culture challenges than it is to formulate a plan to be successful. If anyone can effectively carry and deliver this message, it is Matthew. In this video series, he offers prescriptive guidance on enlisting an executive sponsor, working with business stakeholders, and navigating the landmines of a business landscape on the way to a successful data culture transition.

Honestly, I had only followed this series in fragments over the past year, but now that I’ve caught the vision, I plan to watch the entire series start to finish. It is that good. Think of it as Game of Thrones with data.

Resources

Matthew also provides a comprehensive list of Power BI Dataflows resources here. Matthew recently presented to our 3Cloud Power BI and Analytics development team about using dataflows to promote a data culture. The presentation was an epiphany for me that helped me better understand how dataflows fit into the BI solution puzzle; that’s when the gauge metaphor popped into my head. I encourage you to watch, and perhaps you will have a similar moment of reckoning.

The Power BI Adoption Framework is a set of presentations from Microsoft that can serve as a checklist of important tasks and areas that should be covered in any Power BI implementation, large or small. These decks are also a great tool for adopting and sharing your organization’s BI and Analytics strategy with business leaders and stakeholders. You can use them as a picklist to assemble your own presentations.

A First Look at Gen 2 Composite Models with Live Power BI Datasets

About three years ago when the data model development engineers from the Power BI product team told me they were working on the ability for tabular data models to share other published data models, that sounded amazing and almost too good to be true. They said it would be a lot of work to make this a reality. The Utopian dream of distributed datasets was publicly announced on the Wave 2 Roadmap last year and then the feature went to Preview in December with this announcement: Using DirectQuery for datasets and Azure Analysis Services (preview) – Power BI | Microsoft Docs. We’ve seen the new feature demonstrated and explained several times but now I’m exploring this exciting new feature for myself.

For this test, I have two different data models in separate PBIX files. One published dataset contains Internet Sales information from the ContosoDW Azure SQL database and the other contains Store Sales information. These models contain many of the same tables and a few unique tables. Notably, each data model contains a different fact table. Here are the two published data models in the Contoso Sales workspace:

After making sure that I had upgraded to the December 2020 version of Power BI Desktop, I open the Options dialog and enable the preview feature: DirectQuery for Power BI datasets and Analysis Services.

After restarting Power BI Desktop, I use Get Data and choose Power BI datasets. After connecting to the workspace and selecting the ContosoDW Internet Sales published dataset, the local model is now connected using a live connection.

After connecting to the remote model, all the tables are listed in the Fields list. At this point, I'm using a capability that has been available for a long time: one Power BI report file connected to a separate, published dataset. Previously, I would not have been able to add additional dataset connections or import from other sources into a live-connected data model.

The status bar shows a new message with a link to enable changes to the model. On the Model ribbon, a new button is also displayed. Using either of these options will modify the local model metadata to allow multiple live dataset connections.

When I click the link Make changes to this model, a dialog box allows me to confirm that I want to switch the live connection to a DirectQuery connection and prepare the local model to allow additional DirectQuery connections.

What happens if I start with an existing local model?

The first important lesson I learned was this: if I create a new PBIX report file containing existing tables and then connect to a published dataset, this does not enable the ability to make changes to the local model.

With an imported table in the data model before using Get Data to select and connect to a published dataset, the status bar shows this message:

I found this to be unintuitive: the text "(click to change)" is not a clickable link. At this point there is no way to update the local model with the necessary changes. I had to start with an empty PBIX to add the first live connection. After that, I could use Get Data to import additional tables from other sources, but not the other way around.

Adding More Dataset Connections

Next, I use Get Data to connect to the ContosoDW Store Sales published dataset. After the local model has been converted, there is nothing particularly unique about the experience. I can simply connect to an additional dataset, and all of the tables show up in the data model diagram view. The tables from each model are displayed with a different colored heading; in this case they are blue and red.

I had several duplicate table names after connecting the second data model. The second set of tables were renamed with the number 2 appended. I renamed the tables that I wanted to keep visible. The redundant tables that I don't need are displayed on the right side of this diagram layout. Tables in a remote model cannot be deleted from the local model, so I used the Properties pane to set the unused tables to be hidden.

If I were connecting to a large remote model with a lot of tables just for the sake of bringing one table into the local model, it would be cumbersome to hide all of the unneeded tables – so I hope there will be an easier way to manage this in the future.

I opened Tabular Editor from the External Tools ribbon and was able to use it just as if I were designing a normal import data model, to rename tables and set properties with multiple objects selected.

Adding relationships between tables from each remote model was pretty much the same process as it would be in a local model. I can use the Manage Relationships dialog or the drag-and-drop method in the diagram view. The latter method opens the Manage Relationships dialog rather than simply creating the relationship.

Finally, I added a new Combined Sales Qty measure to one of the remote tables in my local model that references an existing measure from each of the remote models. For demonstration, the table visual shown below includes the Order Date field which is related to each of the two fact tables (Online Sales and Store Sales), the Online Sales Qty measure from the Internet Sales remote model, the Store Sales Qty measure from the Store Sales remote model, and the new Combined Sales Qty measure which I added to the local model:
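As a minimal sketch (assuming the remote measure names shown above), the new local measure simply adds the two remote measures together:

Combined Sales Qty = [Online Sales Qty] + [Store Sales Qty]

Because this measure lives in the local composite model, it can reference measures from both remote datasets even though neither remote model is aware of the other.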

After publishing the new composite data model and report, the Lineage View shows that the Combined Sales dataset has connections and dependencies on the two remote datasets that show up as Analysis Services DirectQuery connections, apparently using the XMLA endpoint in the Power BI service.

Once I worked through the design process a couple of times, the data model behavior seems to be exactly the same as always, and designing reports is no different at all. I'm just scratching the surface and I'm eager to dig deeper and compare notes with others in the community. I have no doubt that behaviors, limits and issues will arise with this brand new and very compelling capability, but I'm very encouraged so far.

Power BI: The Week in Graphics

If you're like me right now, you need a distraction. I had started a new blog post and tweet about project management best practices and, after yesterday's events, just had to put it on a shelf. So, a couple of quick things…

Power BI Top Features

Melissa Coates, long-time BI community rock star, published this amazing infographic depicting all of the features and services in the Power BI ecosystem. Next time you hear someone incorrectly describe Power BI as “just a visualization tool”, show them this:

You can download a hi-res version and get more background here: Diagrams — Coates Data Strategies

BI-Survey Searchable Maps

I recently heard from Donald Farmer, one of the original architects of the Microsoft Business Intelligence platform. Donald went on to work for QlikTech and now runs an advisory practice called Treehive Strategy Services. Donald has been working with the BI-Survey, one of the leading analyst firms in Europe, to enable an interactive map of the Azure analytics ecosystem. Similar to Gartner and Forrester research, BI-Survey compares industry platforms and products, but does so using survey data. I've been participating in the BI-Survey for about 12 years and find it to be a comprehensive and insightful resource for comparing BI tools, among other products. They just released these interactive guide maps for Power BI and Azure Services, Partners and Success Stories. These are just a couple of examples:

Each map has a number of search and filtering options that allow you to visualize the heatmap based on pricing, features and a variety of categories. These are really useful tools.

Doing Power BI the Right Way: 5. Data Modeling Essentials & Best Practices (1 of 2)

Part of the series: Doing Power BI the Right Way (link)

Data Modeling 101: part 1 (more advanced techniques in part 2)

A data model is the foundation of analytic reporting. It provides structure and order over information that might otherwise be chaotic and untrustworthy. Loading data into a properly designed model ensures that it conforms to some essential rules that provide better performance, reliability and accuracy.

Different Schemas for Different Needs

One of the most important lessons I have learned about data modeling over the past 20+ years is that there isn’t one model that fits all business needs. However, there are simple patterns we can follow to model data according to different business needs. Each pattern or schema has a different set of rules. At a very high level, we can think of these schemas in three different categories.

When reporting requirements are simple and unsophisticated, sometimes the easiest way to model a set of data is to transform it into a single, flat table. There are several trade-offs and limitations to this approach, though. A flat table is typically the goal when bringing information into a spreadsheet so you can simply add up a column of values, perhaps filtered by other columns. It might seem natural to use the same approach when making the transition from Excel to Power BI. Here is an example of retail order data organized into a flat table:

Look over this sheet before moving to the next figure. If this data were stored in an Excel sheet, how could we use it for reporting? You could filter by the Order Date column and add up the Quantity, Unit Cost and Unit Price. You could also apply filters to the Company Name, Category, Subcategory or Product Name columns. For simple reporting, this could meet our needs.

The master/detail pattern is typically used with transactional data. Master records, like orders or invoice headers, are related to detail records such as the order details or invoice line items. The master/detail pattern can be repeated with multiple tables to create a normalized database. Use this pattern sparingly to support drill-to-detail reports. Including unnecessary detail in an analytic data model uses a lot of memory and storage, and can impact report performance.

When relational database professionals are introduced to Power BI, they often see DirectQuery as a simple means to leverage their familiar, existing relational database or data warehouse, only to be disappointed by poor performance and other limitations. However, master/detail schemas are where DirectQuery can provide a best-of-both-worlds solution, so long as the analytic needs over the detail data are light. If we need to drill down to detail records from an interactive report without performing calculations over the entire detail record set, and want to avoid pulling billions of records into memory, DirectQuery might be the ticket.

Dimensional Modeling 101

The rules of dimensional modeling are a lot like the laws of physics: they were discovered rather than invented. By applying dimensional design patterns, data will naturally behave and reports will perform well. Dimensional rules describe the way data behaves when stored in the most optimal form for reporting and analysis.

The Kimball Group is credited as the pioneer of dimensional modeling, and although the books and papers that Ralph Kimball and his colleagues authored are 20-30 years old, they are as relevant today as ever. I'll cover some of the basics, but you can find a comprehensive reference guide to dimensional modeling patterns and techniques on the Kimball Group site here: Dimensional Modeling Techniques – Kimball Group. There have been a few more recent publications with modern examples, such as Star Schema – The Complete Reference and Agile Data Warehouse Design. Both are excellent books, but unless you are undertaking a huge data warehouse project with complicated data problems, understanding the essential concepts is sufficient.

By organizing information into fact and dimension tables, we store as little data as possible while maintaining consistency and data integrity. In data design, we often refer to “entities” and “attributes”. An entity is something we keep track of (like a Customer or Product). Entities have attributes which describe specific properties. In the analytic data world, entities are usually materialized as dimension tables and each attribute is a column or field.

A fact table contains the numeric columns used to summarize and aggregate measure values, along with the key columns that are related to the dimension tables. Dimensions contain the attributes used to group and filter the business facts. Fact records share the same level of grain across all dimensions. For example, if domestic sales orders and international sales orders each have the same level of detail for dimensions like Customers, Products and Order Date, these records could be stored in the same fact table. But, if Sales Targets are applied at the month level rather than the date, they must be stored in a separate fact table.

The essence of a dimensional model is the star schema, simplified here to show one fact table related to a small number of dimensions.

A real application of the star schema design is shown in the following image. The Online Sales fact table contains key columns that are used to relate this table to each of the dimensions. The fact table also contains numeric type columns used to define measures which aggregate and total numeric values such as Net Price, Quantity, Unit Cost, Unit Discount and Unit Price. You’ll notice that the relationship from each of the dimension tables to the fact table is one-to-many and filters records in one direction, depicted by the arrow on the relationship line. For example, the relationship from the Customer table to Online Sales is based on the CustomerKey column in both tables. The relationship filter direction indicates that if the Customer table were filtered on a report, Online Sales would be filtered accordingly. However, if the Online Sales table were filtered in a report, this would not cause the Customer table to be filtered.

According to the Kimball Method, there are four stages to define fact and dimension tables:

  1. Select the business process
  2. Declare the grain
  3. Identify the dimensions
  4. Identify the facts

Seems pretty simple, right? It can be, at a macro level, until you need to get into the weeds. The trick is to maintain a high-level business perspective as you work through the details to solve data problems. I find this to be a useful whiteboard exercise when starting a new project. Using only business tools and business language, diagram data entities at a high level using this approach. Next, break the process down using these steps:

  1. Load detailed atomic data into dimensional structures
  2. Structure dimensional models around business processes
  3. Ensure that every fact table has an associated Date dimension table
  4. Ensure that all facts in a single fact table are at the same grain or level of detail
  5. Resolve many-to-many relationships in fact tables
  6. Resolve many-to-one relationships in dimension tables
  7. Store report labels and filter domain values in dimension tables
  8. Make certain that dimension tables use a surrogate key
  9. Create conformed dimensions to integrate data across the enterprise
  10. Continuously balance requirements and realities to deliver a DW/BI solution that's accepted by business users and that supports their decision-making

Something to keep in mind is that these rules and steps apply equally to formal, IT-based model design and to self-service BI projects. However, the tools have been modernized over the past decade or so, along with the degree of strict adherence to the rules. Simply put, in the new age of self-service BI, everything is negotiable. But something else to keep in mind is that bending or breaking the rules nearly always comes at a cost. For example, step 8 prescribes creating surrogate keys to identify dimension records. You can use the original business keys throughout the model, but you run the risk of encountering key collisions or other data quality issues.

In some cases, the lines that separate facts and dimensions can become blurred, which may need to be considered in your model design. For example, I recently developed a BI solution where we counted employee hiring events, and a row in the Hires fact table was simply a record of the event when an employee was hired, along with information like the hiring date and position. There were no numeric columns to sum up, so we simply needed to count these records. This is often referred to as a "factless fact". Another exception to the typical star pattern is the case where a fact table contains a value we could use for grouping or filtering, and it might be pointless to extract it into a separate dimension table. This is known as a "degenerate dimension". These are examples of cases where you could arguably apply the dimensional design rules strictly out of principle or err on the side of simplicity. The right answer for each case should be driven by the business reporting requirements while considering the reliability and maintainability of the overall solution.

Let’s apply the process to the flat table of sales orders. Here it is again with some of the dimensions and facts identified:

In this example, there are three separate date columns that record the date an order was placed, the date it was due and the date that the order was delivered. Customer information includes the Company Name and a unique CustomerKey which could be used to lookup detailed customer information. There are three numeric columns that can be summarized or aggregated. These columns could also be used to perform more complex calculations like ratios, running totals and time-series comparisons.
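To illustrate where this is headed, here is a rough sketch of the kinds of measures those numeric columns could support once the data is loaded into a model. The table and column names (Sales, Quantity, Unit Price, Unit Cost and an Order Date dimension with a Date column) are assumptions based on the example above:

Sales Amount = SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )

Total Cost = SUMX ( Sales, Sales[Quantity] * Sales[Unit Cost] )

Margin % = DIVIDE ( [Sales Amount] - [Total Cost], [Sales Amount] )

Sales Amount YTD = TOTALYTD ( [Sales Amount], 'Order Date'[Date] )

The last measure is exactly the kind of time-series calculation that depends on a proper date dimension, which is covered in the next section.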

Dimensional Concepts

Now, let’s review some common practical dimensional modeling applications.

Role Playing Dimensions

Dimensional entities can play more than one role in reporting. In the flat table, the three date columns have completely different purposes, but all store the same type of value: a date. Dates can be used to group and aggregate data in date-part groups like month, quarter, year or fiscal periods. They can be used to perform time-series calculations like month-to-date or the same period in the previous year. To do this, you absolutely must have a date dimension* table containing a contiguous set of date records over a period of time. For our purposes, we might need all dates for the past five years through the end of the current year. Date dimensions are the most common example of a role-playing dimension, but let's look at another scenario before looking at more date examples.

*Power BI Desktop includes features that will automatically generate a date dimension table, and there are DAX functions you can use to generate date tables in code. Although useful in certain cases, I prefer to manage date tables the old-fashioned way so I have one consistent method for importing and managing all tables. Compared to other data challenges, building a date dimension table is a relatively simple task.
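For reference, here is a minimal sketch of a DAX-generated date table using CALENDAR and ADDCOLUMNS; the date range and column names are arbitrary and should be adjusted to your own requirements:

Date =
ADDCOLUMNS (
    CALENDAR ( DATE ( 2017, 1, 1 ), DATE ( 2021, 12, 31 ) ),  // contiguous range of dates
    "Year", YEAR ( [Date] ),
    "Month Number", MONTH ( [Date] ),
    "Month Name", FORMAT ( [Date], "MMMM" ),
    "Year Month", FORMAT ( [Date], "YYYY-MM" )
)

However you build it, remember to mark the table as a date table so time-intelligence calculations behave correctly.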

The following image shows a fact table named Flights, with two role-playing Airport dimensions. An airport is, well it’s just an airport… it has a three-letter code and some information about the geographical location. But with respect to any given flight, an airport can serve two functions: to depart from or to arrive to. In this example, it is necessary to materialize the Airport table twice: the Departure Airport and the Arrival Airport.

With two role-playing Airport dimensions, a report user could ask for all the flights from Japan to Australia on a given date.

Back to the retail sales example: let's say that our Sales fact table has two date keys, OrderDate and ShipDate. A single Date dimension can have only one active relationship. The following image shows the result of the autodetected relationships between Sales and Date, where one of the relationships is inactive and effectively useless for most practical purposes.
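To be fair, an inactive relationship can still be used by explicitly activating it inside a calculation with USERELATIONSHIP. Here is a minimal sketch, assuming a base [Sales Qty] measure and a Date column in the Date table:

Sales Qty by Ship Date =
CALCULATE (
    [Sales Qty],
    USERELATIONSHIP ( Sales[ShipDate], 'Date'[Date] )
)

That works measure-by-measure, but it doesn't give report users the freedom to slice and group by both dates independently, which is why the role-playing approach shown next is usually more flexible.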

This image shows three separate instances of the Date dimension table imported using referenced queries in Power Query. The three tables: Order Date, Due Date and Delivery Date use up a tiny bit more memory than a single instance of the table but give report users more flexibility to answer business questions.

A Table for Every Date Column?

An attendee in a data modeling session recently asked this question. She said: “I have 14 date columns in my fact table used to track a variety of different events. Do I need to create 14 different Date dimensions?” My answer was “certainly not.” This is an extreme example but a very good question. You only need separate date dimension tables when you need to have the flexibility to use time-series functions in DAX or to perform comparisons using date part fields (like Year, Quarter or Month). Operations like calculating the difference between two date type columns or counting rows with future dates can be performed using fields in the fact table. I’ll show you an example in part 2 of this blog series. This is one of those “it depends” situations where you must make a judgement call depending on your business reporting requirements, balancing simplicity and maintainability with complexity and flexibility.
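As a small illustration of that point, calculations like these can work directly against the date columns in the fact table without any additional date dimensions; the Sales table and its OrderDate, ShipDate and DueDate columns are assumed names for this sketch:

// calculated column: elapsed days between two date columns
Days to Ship = DATEDIFF ( Sales[OrderDate], Sales[ShipDate], DAY )

// measure: count of rows with a due date in the future
Future Due Orders = COUNTROWS ( FILTER ( Sales, Sales[DueDate] > TODAY () ) )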

Many-to-Many relationships and Bi-directional Filters

File this under “don’t do this unless you have to.” Many data modeling decisions are a trade-off between performance and functionality; and using iterative design, you will often find a better way to solve problems as you go. There are a few different ways to design for many-to-many relationships. The conventional approach is to use a bridging table that contains all of the combinations of keys that relate two tables together. In the following example, the Customer and Product dimension tables normally would have a single-direction filter from the “one-side” to the “many-side” of the relationship. If reporting requirements call for filtering products based on selected customers who purchased them, we can use the Sales fact table as a bridge table and change the relationship between Product and Sales to use a bi-directional filter.

Depending on the cardinality of the relationship, using a bi-directional filter could introduce a performance penalty. If we only have 100 products and less than a million Sales records, this is probably not a big deal. If we have 10,000 products and 100 million Sales records, this relationship will likely slow things down considerably.
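One way to limit that risk, offered here only as a hedged alternative, is to leave the model relationship single-direction and enable bi-directional filtering inside only the specific measures that need it, using CROSSFILTER. The key column names below are assumptions:

Purchased Product Count =
CALCULATE (
    COUNTROWS ( 'Product' ),
    CROSSFILTER ( Sales[ProductKey], 'Product'[ProductKey], BOTH )
)

Scoped this way, the more expensive filter propagation only happens when this measure is evaluated, rather than on every interaction with the model.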

Here is another example: given the requirement to show accounts and transactions for a selected customer, the following model doesn't work with out-of-the-box relationships. To understand why, follow the flow of filtered records through the relationships. From Customer to AccountCustomer, the arrow on the relationship line indicates that the filter flows in the right direction. Once the AccountCustomer table has been filtered, the Account table will not be filtered because we are trying to swim upstream: that relationship doesn't naturally flow from the many-side to the one-side.

Changing the Account-AccountCustomer relationship filter to bi-directional fixes the problem and meets the reporting requirement.

As I mentioned, the bi-directional filter introduces a potential performance penalty. However, if the cardinality is low and the memory footprint of the data model is lean (using efficient key values and removing unnecessary columns), this design may perform just fine and you are good to go. I'll demonstrate more creative techniques in part two of this two-part topic, as we explore more advanced data modeling design techniques.

Continued in Part 2

Composite Models Gen 2 and DirectQuery over Power BI Datasets

My wife bought me a nice telescope for Christmas. I've wanted one since I was a kid. We opened gifts early because we're going to take the RV out to the desert to watch the great conjunction of Jupiter and Saturn next week. But what am I doing now, after opening my gift? …updating Power BI Desktop to check out the new Composite Model feature! This new feature of Power BI has been in the works for several years and we have been anxiously waiting for this announcement. It is truly the Utopian dream; the Holy Grail!

It is hard to articulate the impact that this capability can have for businesses. The new composite model gen 2 feature is a monumental breakthrough that will significantly raise the bar for Power BI model development and could change the way we think about designing enterprise BI solutions – but customers will need guidance to use it correctly. If you haven't been sitting on the edge of your seat waiting for this, read the announcement and understand how significant it is.

Power BI December 2020 Feature Summary | Microsoft Power BI Blog | Microsoft Power BI

There have been many attempts by Microsoft and other vendors to create a data modeling architecture that provides fast access to cached data, direct access to live data, and scaled-out connections to established data models. Remember ROLAP and HOLAP storage in multidimensional cubes? Those were great concepts with significant trade-off limitations. No other vendor has anything like this. Way back in the day, Microsoft jumped on the Ralph Kimball bandwagon to promote the idea that a company should have "one version of the truth", exposed through its data warehouse and cubes or semantic data models. Then they met customer demand and gave us a BI tool that, in order to bring data together from multiple sources, makes it easy to create a lot of data silos. Arguably, there are design patterns to minimize data duplication, but to use governed datasets, self-service report designers have been limited to connecting to large, central models that might only be authored and managed by IT. This new feature can restore balance to the force and bring us back to "one version of the truth" again.

To put this into perspective, here’s a quote from Alberto Ferrari’s recent article: “It is not easy to find a name that accurately describes the massive step forward that is this feature in the evolution of Business Intelligence solutions. It would fully deserve a hyperbolic name, but the marketing departments already overused so many terms over the years: “universe” was already used in the past century; you can imagine what happened after that. We like “composite models” because it is a name that concisely describes the purpose of this new feature.”

Alberto has a great video emphasizing the magnitude of this release: New composite models in Power BI: A milestone in Business Intelligence – SQLBI

Those of us who live and breathe BI will continue to have thoughts and opinions, and eventually will have war stories and prescriptive guidance to offer about using this new capability most effectively.

Upon initial review, it seems to work exactly as expected. In future posts, I will demonstrate how to design and implement a next-gen composite model and share my experience and thoughts as we move down this new road.

Power BI Expert Resources

Where to go for best practice advice for Power BI: The Data Gods

One of the most common questions I hear from customers and students is "what are your recommended resources for learning and staying current with Power BI?" I find some people get stuck learning Power BI because they are simply overwhelmed with information and choices. A Google search will often raise more questions than answers because there is a lot of noise on the Internet. Expert advice can be found only after dismissing the myriad of posts asking questions and sharing opinions, and sifting through outdated and confusing information. Much of the information found in discussion forums is outdated because the product changes so often. So, what are the most reliable sources for Power BI expert advice and best practices?

I follow several industry experts and I own many great books. Chris Wagner (KratosBi.com | @KratosBi), Microsoft MVP, has compiled the most comprehensive go-to list of Microsoft data experts, called the Data Gods (you can see the complete list here). When I was added to the collective of featured "Data Gods" last week, I was hesitant to brag about it. I mean, there's no modest way to post this picture. The title of "Data God" is a lot to live up to – but it is truly an honor and I want to thank Chris and others for this recognition.

More importantly, this is a concise list of reliable industry experts – which I have pared down to those who blog and post regularly about their experiences and advice. I know and follow nearly everyone on this list, but there are a few folks with whom I have not yet had the pleasure to meet. There are also others featured in the Data Gods collection who make significant contributions in other ways, aside from frequent blogs and articles. I've also added a few additional notable community contributors.

Free Best Practice Resources

Following is my selection of members of the Data Gods council, with links to their free resources. Most of these are blog sites or web sites that include blogs and articles available for free, without requiring a sign-in or subscription:

Alberto Ferrari & Marco Russo – SQLBI.com | @marcorus | @FerrariAlberto

For most of us who work day-to-day with Power BI and other Microsoft BI tools, SQLBI.com is the first stop for expert advice. Alberto and Marco have set the gold standard for Microsoft BI best practices and design patterns. The breadth and depth of their expertise is expansive, and they are known for going deep into the nuts and bolts that make Power BI work. These guys are effectively an extension of the Microsoft product development teams, guiding Microsoft leadership and evangelizing the platform. Their books on DAX and Tabular model design are considered by the most serious Power BI developers to be the most comprehensive reference books available.

Patrick Leblanc & Adam Saxton – GuyInACube.com | @GuyInACube | @awsaxton |
@patrickdba

You either know these guys or you’ve never searched the web for Power BI help. Adam started the Guy In A Cube YouTube channel several years ago as a side project while doing BI technical support work for Microsoft. Patrick came from a BI and SQL Server consulting background before joining Microsoft and teaming with Adam. Today they share design techniques and feature highlights in weekly, high-quality short videos and a Saturday morning live streamed Q&A session. Adam and Patrick are members of the Customer Advisory Team (CAT) focused on supporting Power BI enterprise customers. Their weekly Power BI community Roundup video features the most current Power BI blog posts and release notes.

Chris Webb – blog.crossjoin.co.uk | @cwebb_bi

Chris is one of the most prolific, enduring and consistent bloggers in the Microsoft BI industry. His posts and articles go deeper than most, prepared with thought, effort and insight. Before joining the Microsoft Customer Advisory Team (CAT), he ran a successful training and consulting firm specializing in the Microsoft BI platform. Over the years he was, and continues to be, the go-to expert on SQL Server Analysis Services, MDX, Power Query and DAX, and he continues to share his knowledge and advice through his blog.

Matt Allington – Excelerator Blog | @ExceleratorBI

Matt's main focus is business-centered self-service BI and financial analysis. He is a tried-and-true expert in the Excel and Power BI Desktop space. His training, presentation and writing style is clear, authoritative and to-the-point. He has the best collection of intro and intermediate level books on Power BI and DAX; each is short, concise and easy to understand. His books, Supercharge Power BI and Learn to Write DAX, are an excellent way to get started.

Seth Bauer, Mike Carlo & Steve Campbell – PowerBI.Tips | @PowerBITips | @Seth_C_Bauer | @Mike_R_Carlo | @PowerBISteve

PowerBI.tips is an expansive collection of free resources and utilities for Power BI with focus on visual themes, layout and color selection. This dynamic trio covers the important bases: Seth on visual design, Mike on enterprise solutions and Steve focusing on application development. They have an array of external tools for things like connection management and project documentation.

Melissa Coates – CoatesDataStrategies.com | @SqlChick

Melissa is one of the deep thinkers in the industry, always asking the right questions and challenging the status quo. She’s also one of the most pleasant people I’ve ever met. Her blog posts are thorough and insightful. Melissa has co-authored multiple whitepapers for Microsoft on data governance, enterprise deployment and migration.

Leila Etaati & Reza Rad – Radacad.com | @leila_etaati | @Rad_Reza

Leila and Reza are known as one of the power couples in the industry. With nearly limitless energy, they have traveled from New Zealand over the years to present at more conference events than I can count. Leila is one of the BI industry’s thought leaders in the machine learning and AI space, with education and credentials behind her experience. Reza’s expertise with Power Query and Power BI are impressive. They freely share their knowledge through their blogs, articles, training classes and have published numerous books.

Kasper de Jonge – KasperOnBI.com | @Kjonge

Kasper is a Principal Program Manager on the Power BI product team and based in The Netherlands. A regular conference session headliner, Kasper is very tuned into customers’ real-world enterprise BI reporting problems. He has been blogging for years and has recently launched a YouTube channel.

Alice Drummond – DiscoverEI.com | @AliceDiscoverEI

With an emphasis on environmental data analysis, one of the things I appreciate about Alice's active blog series is that with each post, she shares a comprehensive collection of blog sites and training resources to help readers find supporting information and expand their knowledge.

David Eldersveld – DataVeld.com | @DataVeld

If David can't figure it out, it probably can't be done. Over the years, he has pushed the limits and discovered and developed some very unique and interesting techniques to integrate things like graphics and geospatial data with Power BI. David is very active in the community and shares different ways to take Power BI beyond the out-of-the-box features to do things that many of us had no idea were even possible.

Brian Grant – CSGPro.com | @BrianGrantBI

Brian has a unique intellect and a passion for problem-solving and sharing knowledge, particularly related to Power Query and DAX. I’ve had the pleasure of collaborating with Brian on multiple occasions in conference presentations and training events. He and his colleagues have developed an interesting method to understand the mechanics of the DAX language that he shares in a video series called the Elements of DAX.

Reid Havens – HavensConsulting.net | @HavensBI

In a few short years, Reid has risen to all-star status by tirelessly designing and sharing data visualization techniques. He is a master of spit-and-polish presentation and report design.

Meagan Longoria – DataSavvy.me | @MMarie

Meagan has broad expertise but her focus is designing reports for accessibility and accurate data presentation. Her posts and conference presentations are very insightful. She challenges the status quo and shares specific techniques to provide accessible report designs.

Matthew Roche – BIPolar.com | @SQLAllFather

Matthew is a member of the Microsoft Power BI Customer Advisory Team (CAT). He has considerable experience with several Microsoft development initiatives that have culminated in leading products and Azure services that are used to create enterprise data solutions. He is an industry thought leader who is very active in the online community. He shares his insights, leadership and expertise through his blog and industry conference events, most notably related to modern data governance and Power BI dataflows.

Gil Raviv | DataChant.com | @GilRa

Author of Collect, Combine & Transform Data in Excel using Power Query and Power BI, an excellent reference book. Gil maintains an active blog.

Paul Turley – SqlServerBi.com | @Paul_Turley

I add my own blog to this resources list where I share my thoughts about best practice design and development approaches in this series: Doing Power BI The Right Way.

Others

Gilbert “Q” Quevauvilliers – FourMoo.com | @GilbertQue

From his vast experience consulting with IT and business customers, Gilbert shares relevant examples and lessons learned. He is an active blogger and community supporter.

Imke Feldmann – TheBiAccountant.com | @TheBIccountant

Imke shares deep, business-focused, creative problem-solving techniques using Power Query/M in Power BI and Excel.

Ruth Pozuelo Martinez – Curbal.com | @CurbalEN

Ruth puts tremendous energy and creativity into her video presentations which she posts frequently. She covers everything from intro to moderately advanced level Power BI design with complete and thorough examples, all presented in a stunning greenscreen format.

Parker Stevens – BiElite.com | @PowerBIElite

Parker’s blog site is an extension of his “BI Elite” YouTube channel, with numerous short tutorials and excellent “how-to” examples.

James Serra | JamesSerra.com

Thorough and consistent, James shares announcements and guidance about the Microsoft data platform and data warehouse architecture.

Books

Beginner
Intermediate/Advanced

Doing Power BI the Right Way: 7. Planning for separation – data models and reports

Part of the series: Doing Power BI the Right Way (link)

Back in the day before Power BI came along, when we created BI solutions, reports and data models were separate. If you created a cube or Tabular model with Analysis Services, it was developed with Visual Studio and deployed to a server. Reports were authored and deployed to the report server separately. Now, with Power BI Desktop, you author your data model and report in the same development space and then deploy the whole kit and caboodle to the service at once.

The Thick and Thin of Reports

Power BI reports generally come in two varieties. If you open Power BI Desktop, choose "Get Data", and build a model and a report, all of that stuff ends up in one self-contained PBIX file. Conversely, a report using a live connection to a published dataset is often called a "thin report" because it contains no data or data model. Thin reports offer several advantages: not only separation of development effort, but also centralized control and the ability to share published data models with multiple report authors.

Even when a single PBIX file contains everything, Power BI actually separates the report from the data model (called a dataset in the service) and gives them both the same name. This is very convenient for self-service projects because it is quick and easy to make changes to the data model, queries and measures if you need to make a report enhancement. This is all well and good for small, one-developer projects, but what about those larger-scale solutions where the data model and reports are developed by different folks, or when multiple reports are connected to the same data model?

At what point does it make sense to separate the data model and reports into separate files?

I believe that managing reports and the dataset separately is a best practice in enterprise solutions. However, like a lot of "proper" practices in the IT data world, it is far less convenient to "do it right". In each project, we must exercise an appropriate degree of discipline. I've worked on projects that were so laissez-faire and lacking in management structure that they produced a tangled web of garbage. I've also worked on projects managed with strict adherence to rules and restrictive application development methodologies, where report and BI developers had no freedom to be creative, and thus produced bland and useless reports. It is hard to prescribe a set of rules that always works for every project with the right dose of balance. A certain degree of freedom is necessary to learn as we go and produce masterful analytic reports. At the same time, the process requires a well-defined plan, time-boxing and prioritizing of features.

When I started writing this post, I was working on a consulting project that started out with the dataset and report in two separate files. Under that plan, one developer could work on the data model and publish it to the DEV/QA workspace, and then the other developer could design reports as new modeling features became available. In theory, this is typically the right approach. The requirements for this project expanded and we found ourselves in a rapid iteration cycle, needing to make model, measure and report changes. As a result, I brought the report pages back into the dataset PBIX file. There is no elegant way to do this, but it wasn't difficult to create new pages and then copy and paste the visuals from the separate report file to the combined dataset/report file. Bookmarks, drill-through and tooltip pages had to be rewired, but it wasn't a difficult task. Kasper de Jonge walks through the process in this blog post from May, 2020.

In cases where you have crisp requirements and a clear division of labor, you can start with two files to delineate the data model and the report. In cases where you are actively prototyping and working alone, developing a single PBIX file containing all the parts will allow you to move quickly. When it becomes more important to lock things down and manage releases than to keep iterating, move report pages to a separate PBIX file that is connected to the published dataset in the service.

Hot Swap Live Connections

I've started using a new external tool that promises to make this process much easier. The Hot Swap tool is a utility developed by Steve Campbell and available from PowerBI.tips as part of the Business Ops external tools package.

This simple tool allows you to start a project with separate data model and report files and then swap the live connection between copies of the model: either a dataset published to the Power BI service or a PBIX file on the local desktop using a local TCP port. Hot Swap is a community-developed external tool that is not officially supported by Microsoft, but my experience in testing the tool thus far has been very good. I used an earlier solution from Steve in a project about a year ago that worked out quite well. It was a set of PowerShell scripts that rewired connections in much the same way. Having this capability in a simple pop-up dialog is an even more convenient option. You can check out this demo of the Hot Swap tool on the PowerBI.tips YouTube channel.

One of the things I like about starting with two files is that we don't run the risk of design issues left over after moving report pages and visuals to another file.

The new approach is simple: begin your project with separate data model and report files. You can publish an early version of the data model file to a development workspace in the service and connect to establish a live-connected report. Then, run the Hot Swap tool and switch the connection to a local copy of your data model to continue iterative development. Changes to the local data model PBIX file/Power BI Desktop instance are immediately available in the report PBIX. Publish changes whenever you need to, and then you can use Hot Swap to remove the local connection. You don't actually use the tool to perform the reconnect because Power BI Desktop allows you to use the normal Get Data options after a live connection has been removed from the report file. Regardless, it is a simple process.

Recommendations

For small, informal projects developed by one person (or one developer at a time), keeping the data model and report in one file is convenient and makes it easier to make changes as you go. If you later decide to share the dataset for others to create additional reports, or to promote or certify the dataset, you should separate the dataset file and create a "thin" report using a live connection to the published dataset.

For moderate-scale and formalized Power BI projects, begin with two separate PBIX files or make it a point to separate the data model from the report file before you deploy to the Power BI service. Consider using the Hot Swap tool to help manage local mode development.

Doing Power BI the Right Way: 6. Validating data model results – Part 2

Part of the series: Doing Power BI the Right Way (link)

Part 1 is here

Validation 301: Creating a Validation Report Solution, Part 2

We often have users of a business intelligence solution tell us that they have found a discrepancy between the numbers in a Power BI report and a report produced by their line-of-business (LOB) system, which they believe to be the correct information.

Using the LOB reports as a data source for Power BI is usually not ideal because at best, we would only reproduce the same results in a different report. We typically connect to raw data sources and transform that detail data, along with other data sources with historical information to analyze trends, comparisons and ratios to produce more insightful reports.

However, if the LOB reports are really the north star for data validation, these can provide an effective means to certify that a BI semantic model and analytic reports are correct and reliable.

This is a pattern that I have applied for consulting customers. My current client refers to the main Power BI report and dataset as their "dashboard" solution. The data model is large and complex, with sophisticated reports used monthly by hundreds of users. The dataset is set up with a scheduled refresh. The "dashboard" solution has been thoroughly tested and we are careful to only apply updates and enhancements infrequently, under a rigorous application lifecycle management process.

At the end of each month, a user exports their LOB report to an Excel file and drops it into a SharePoint library. I have one consulting client who uses an automation tool to essentially open the LOB application, punch in the right keystrokes and export the report to a file at the end of every month.

The format of these exported reports has proven to be fragile, and we don't want to risk destabilizing the main Power BI "dashboard" solution. So, rather than importing the validation files directly into the main dataset, I have created a separate Power BI project with a dataset and report that has two sources. The first data source for the validation report connects to the main "dashboard" Power BI dataset using the Analysis Services connector and the XMLA endpoint. A DAX query is issued to return values to compare with the LOB report totals. The second data source for the validation report imports the validation files that were exported from the LOB report. If that import process were to fail for any reason, it doesn't affect the main dashboard solution.

This diagram depicts the validation solution at a high level. The main Power BI "dashboard" solution gets its data from the corporate EDW. That data is processed through rigorous ETL pipelines and contains data extracted from multiple systems used by the corporation.

A lightweight DAX query imports summarized results from the main dataset into the validation dataset.

Before we go any further, I just want to make the point that in most normal Power BI data models, I would never import data from another published Power BI or SSAS data model. That’s just a bad idea and can make it very difficult to trace data lineage. Do not use DAX queries as a data source for your regular report models. This validation report is a rare and appropriate exception to the rule.

The second data source for the validation solution is a set of files exported from reports produced by the LOB system. These reports only contain summary information but can be compared to totals from our Power BI solution. Values from the two sources are compared and the validation report shows whether they match or fail validation.

Just a note about the decision to create a separate dataset. There were two reasons for this:
1. We could not control the format of the validation files. If the file format were to change, it would cause an error and stall our scheduled refresh cycle for a large mission-critical report used by several hundred high-level employees (whose job is it to tell the CEO why his dashboard doesn’t work?)
2. Today we do not have the ability to use a live connection to a Power BI dataset in a composite model along with other sources, but we will in the near future. This capability doesn’t address problem #1 but will certainly change the way I think about architecting large, multi-source models in the future.

Writing the DAX query to read the summary values from the main Power BI “dashboard” dataset was a great learning experience. Although I am now pretty good at writing DAX queries by hand after a couple of these projects, the easier way to get started is to allow Power BI Desktop to generate the query for you. Here are the steps using the ContosoDW dataset as an example:

  1. Add a table visual to the report canvas. Add the same field and measure used in the LOB report to be used for comparison.
  2. Turn off totals for the table visual.
  3. Add and set slicers or filters to match the filters applied in the LOB report.
  4. On the View menu in Power BI Desktop, enable the Performance Analyzer.
  5. In the Performance Analyzer pane, displayed to the right of the report canvas, click the Refresh visuals link.
  6. Click the plus sign next to the table visual to expand the performance counters for that visual.
  7. Click the Copy query link.

The DAX query is now in the Windows clipboard and can be used to run the same query that the designer produces to populate the table visual. To test and modify the query, do the following.

  1. Make sure you have the latest version of DAX Studio installed. If not, install it from daxstudio.org. Save your work, close and reopen Power BI Desktop after installing DAX Studio.
  2. From the External Tools menu in Power BI Desktop, click DAX Studio. This opens DAX Studio with a query session connected to the local Power BI dataset.
  3. Paste the clipboard contents to the query window and run the query.

Here’s the query that was generated by Power BI:

DEFINE
  VAR __DS0FilterTable = 
    TREATAS({2009}, 'Order Date'[Year])

  VAR __DS0FilterTable2 = 
    TREATAS({"October"}, 'Order Date'[Month Name])

  VAR __DS0Core = 
    SUMMARIZECOLUMNS(
      'Product'[Category Name],
      __DS0FilterTable,
      __DS0FilterTable2,
      "Online_Sales_Qty", '_Measures'[Online Sales Qty]
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(501, __DS0Core, 'Product'[Category Name], 1)

EVALUATE
  __DS0PrimaryWindowed

ORDER BY
  'Product'[Category Name]

This code can be simplified for our needs. The essential code is the expression using the SUMMARIZECOLUMNS function. You can disregard everything after that. Make the following changes to simplify the DAX query code:

  1. Delete everything after the closing parenthesis for the SUMMARIZECOLUMNS function.
  2. Remove the variable line just before the SUMMARIZECOLUMNS function and replace it with “EVALUATE”

The resulting query code should look like this:

DEFINE
  VAR __DS0FilterTable = 
    TREATAS({2009}, 'Order Date'[Year])

  VAR __DS0FilterTable2 = 
    TREATAS({"October"}, 'Order Date'[Month Name])
EVALUATE
    SUMMARIZECOLUMNS(
      'Product'[Category Name],
      __DS0FilterTable,
      __DS0FilterTable2,
      "Online_Sales_Qty", '_Measures'[Online Sales Qty]
    )

This is the query I would use to populate a new table in my validation dataset. Just use Get Data to open the Power Query editor, create a new SQL Server Analysis Services connection, paste the workspace path for a Premium-enabled workspace and connect to the published dataset for my main Power BI solution. Then, finally, paste this query code into the query window.

The next step is to import the validation file(s) that were exported from the LOB report. In our validation solution, I used the SharePoint connector and imported all files with a predefined file name prefix. After the validation files are imported, make sure that the column data types are compatible, load the table and wire up a relationship between the two tables. In our last validation solution, I created a new measure that compares a measure from the summary table (imported from the main "dashboard" solution dataset) with a comparable measure from the validation file source table. Measures calculate the difference and the % difference between the two. I then used a KPI indicator icon in the table visual, applied with conditional formatting, to show whether the differences were within an acceptable range.
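As a rough sketch of what those comparison measures might look like – the measure names [Online Sales Qty] (from the dashboard dataset) and [LOB Sales Qty] (from the imported validation files), as well as the tolerance threshold, are assumptions for illustration:

Qty Difference = [Online Sales Qty] - [LOB Sales Qty]

Qty % Difference = DIVIDE ( [Qty Difference], [LOB Sales Qty] )

Qty Validation Flag = IF ( ABS ( [Qty % Difference] ) <= 0.001, "Pass", "Fail" )

The flag (or a numeric equivalent) is what drives the conditional formatting icon in the table visual.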

This screen capture from the validation page has the real data blurred out but you can see the grouping and comparisons resulting in either green or red indicators.

Each month, I receive an email from our user who is responsible for the report validation. He tells me that at a quick glance he is able to verify that the totals from their line-of-business application and Power BI solution match, and that the executive leaders and financial analysts who use these reports can be certain that they are accurate and reliable. When an item doesn’t match, they can pinpoint the account in question and usually are able to track it down to a data refresh timing issue or a mistake made in their source system that hasn’t yet been reconciled.

We are now setting up alerts using KPI tiles on a dashboard that will message key users when validation either succeeds or fails in key areas. As a result, the confidence in this solution has increased significantly and business leaders trust the information they get from these reports.

Doing Power BI the Right Way: 6. Validating data model results – Part 1

Part of the series: Doing Power BI the Right Way (link)

Moving important business data into a data model for analytic reporting can often be a two-edged sword. Data retrieval is fast and can support all kinds of analytic trending and comparisons. But data in the model may be one or two layers away from the original source data, making it more challenging to compare with familiar user reports. Often the first validation effort, after transforming and loading data into the model and visualizing the initial results, is having a business user say "yeah, that looks about right." Then, sometime later, after more development and extensive user testing, the feedback might be "hmm, that seems a bit off." …not exactly scientific.

I have been doing a lot of data validation work lately – both formally and informally. Informally: Validating calculation results from a Power BI data model is just a normal part of the development process. Formally: After developing and delivering an entire business dashboard solution, a formal validation process is used to validate the ongoing results after future data refresh cycles and to certify reports so that business leaders know they can trust them.

Particularly for database developers, making the leap to using reports built on a tabular data model instead of the source database is a mental transition. The developer has to stop thinking in SQL and start thinking in model mechanics. This can pose a challenge when needing to validate numbers in a report, which should be done using the data model and not only by running a SQL query.

Validation 101: Create & Use COUNT Measures

When designing a new data model, this is typically the first thing I do… For every fact table and for each large dimension table, I create a measure that returns the record count for that table. Users normally think about their data in business terms (like sums, ratios and averages) and not about how many records there are in a table. Record counts are a convenient sanity check for record completeness and distribution across groups; and may also be a good indicator for model size and performance.

Let’s say I have a data model containing the tables: Sales, Orders, Customers, Products and Date. At minimum, I will create the following measures with code similar to this:

Sales Row Count = COUNTROWS( 'Sales' )

Orders Row Count = COUNTROWS( 'Orders' )

Customer Count = COUNTROWS( 'Customers' )

These measures allow you to quickly troubleshoot problems that might otherwise go unnoticed – with model relationships, key matching, orphaned records and missing records. Just drag and drop the measure and a field to group the values. For example, if I drag the Sales Row Count measure and Year from the Date table onto the report canvas, I can immediately see sales records across years. If I do the same thing with the customer region field, I might be able to spot a gap in the region distribution of records over time. If every column in a column chart (created by default when dragging a numeric field) shows 100%, then I know that I either forgot to create a relationship, have a mismatched data type or forgot the "Set as Date Table" setting for the Date table. Easy peasy.

Can't I just count any field value rather than creating explicit measures? – Yes, but don't. Coming from an SSAS multidimensional background, I learned to define measures for every useful numeric metric. If we needed to show the SUM or the COUNT of anything, we needed to define a measure for it. In Power BI and SSAS Tabular, you can simply use implicit measures – meaning you can drag and drop any field and then indicate that you want to count each occurrence or sum up the numeric value of the field. For quick-and-dirty Power BI reports, this is a convenient way to get simple aggregate results, but it is not a best practice in my opinion because of inconsistent behavior and other limitations. There is a growing consensus in the Power BI community that using explicit measures should be the norm and not the exception. If you always use explicit measures rather than aggregating numeric columns, users and report developers will always have a consistent experience.

Validation 201: Measure Dependency Chains

This is a fairly normal pattern (a brief DAX sketch follows the list):

  1. Create a simple base measure using a simple expression, let’s say:
    [Employee Count]
  2. Create additional measures that reference and then perform some logic on the first measure. For example:
    [Employee Count Females] and [Employee Count Males]
  3. Layer another new measure on the second layer of measures. Like:
    [Female to Male Employees Ratio]
  4. Create another measure to apply special business logic, let’s say to conditionally apply row-level security and non-visual totals:
    [Female to Male Employees Ratio (RLS Adjusted)]
  5. ..and so on
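
Here is a minimal DAX sketch of such a chain. The Employees table and its Gender column are assumptions for illustration, and the RLS-adjusted variation is omitted because it depends on specific security requirements:

Employee Count = COUNTROWS ( 'Employees' )

Employee Count Females = CALCULATE ( [Employee Count], 'Employees'[Gender] = "Female" )

Employee Count Males = CALCULATE ( [Employee Count], 'Employees'[Gender] = "Male" )

Female to Male Employees Ratio = DIVIDE ( [Employee Count Females], [Employee Count Males] )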

This is a simple example but it is not unusual. Reusing measures helps to separate out reusable calculation logic, but it can also create deep dependency chains. At some point, you will inevitably discover a calculation error that you will need to track down and debug. I can’t remember my own name some mornings, so I certainly cannot work through the mental gymnastics needed to keep track of every dependent measure.

When I have a visual that is not displaying the expected results, I usually do the following:

  1. Duplicate the report page and prefix the page name with TEST: (something meaningful)
  2. Delete visuals not needed for testing
  3. Convert the visual to a table so I can just see the numbers grouped by a field
  4. Drag the referenced measures into the table to create multiple columns. Validate each dependent measure.

This helps to break the problem down into manageable steps and components. If you know that the first measure is always behaving as expected, move on to the next measure that depends on it. If the problem isn’t immediately apparent, recreate the same conditions with slicers. By isolating the problem, I can find and fix the issue every time.

How to Track Object Dependencies

Keeping track of dependencies between layers of measures can be challenging. I keep a scratchpad on my desk to make lists of things I need to remember. Sometimes I don’t even save my notes but it just helps me organize my thoughts. I use a whiteboard for a similar purpose, diagramming dependencies between measures and other objects with boxes and lines. Just standing and diagramming on the whiteboard is a very effective problem-solving technique for me. Electronic tools like OneNote and Microsoft Whiteboard can be just as effective if you are in the habit of using them.

The Object Dependencies feature in Tabular Editor is pure magic and a tool I am using more frequently to test, debug and to document my data models.

Tabular Editor Object Dependencies dialog

Reusing Measures

On a different but related topic… Create separate measures for calculations that will be used in visuals and reused by other measures, but recognize that having too many measures can be difficult to manage. Strike a balance between code reuse and a more succinct, easier-to-manage portfolio of measures. If you have to jump through seven hoops (e.g. trace multiple dependent layers of measures) every time you need to debug or validate a calculated value, you probably have too many dependent measures.

Organize measures but don’t over-organize them… I prefer to put all my measures in a central table or at least assign them to just a few subject-area fact tables. Don’t leave measures in different tables, scattered throughout the data model. Although I’m a fan of using display folders, it can be more work to assign measures to several different folders than it is just to use meaningful naming conventions. The implementation of display folders in Power BI Desktop is, in my opinion, an unfinished feature. If you want to use display folders to organize measures within container tables, use Tabular Editor. Name measures so they’re predictable and easy to find in a sorted list. For example, if you have a series of measures based on “Sales Amount”, don’t name the variations “MTD Sales” or “Ratio of Sales to Prior Year”. Instead, name them with the same prefix, like “Sales Amount”, “Sales Amount MTD”, “Sales Amount YTD”, “Sales Amount Ratio to Prior Year” and “Sales Amount Region % of All Regions”. This way, they all get sorted and grouped together in the field list. The theme here is finding the right balance: organize your objects while keeping things simple.

Validation 301: Creating a Validation Report Solution, Part 1

This is a big topic and I have learned some great techniques, which will be covered in a follow-up post. So, let’s call this a teaser.

How do users and business leaders know with certainty that they can trust the information in a Power BI report? A variety of factors can affect data reliability. Even if the data source is known to be correct, there could still be issues in the extraction, staging, loading, transformation or the calculations used to present the information in an analytical report. Testing the report with data at a point in time is a good place to start, but it doesn’t absolutely guarantee accuracy down the road. A report that shows a simple set of aggregate values read directly from the business source of truth alongside calculated values from the refreshed data model provides a litmus test that can instill ongoing confidence. When (I said “when”, not “if”) a user tells us that report information isn’t right, we can confidently and easily prove that the analytic report information from the data model is consistent with the source. This is what a validation report is all about.

To set the stage for a longer post on this topic, think about this typical scenario:
Every month we load fresh manufacturing production volume and forecast records into our data mart or data lake, and then into the Power BI data model. The results of the data model can be used as a data source for the validation report. Using the XMLA endpoint, you can run a DAX query against the published data model to get the values shown in report visuals, complete with filters and slicers. For comparison, you can take the values from a report produced by the business system, load them into the validation data model and compare the monthly totals. We can then flag the data that does and doesn’t match, and calculate the differences.
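
As a sketch of the comparison query, something like the following could be run against the published data model over the XMLA endpoint (or in DAX Studio). The table, column and measure names here are assumptions for illustration:

EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Year Month],
    "Production Volume", [Production Volume],
    "Forecast Volume", [Forecast Volume]
)
ORDER BY 'Date'[Year Month]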

More on this later.

Creating a Validation Report Solution – Part 2

Doing Power BI the Right Way: 9. Choosing the right report type: analytic or paginated

Part of the series:  Doing Power BI the Right Way (link)

It has been a few weeks since I last posted, and a while since I started the promised blog series “Doing Power BI the Right Way”. I’ve received a lot of positive comments on the posts in the series thus far, and people are wondering why it’s taking so long. Well, I am busy working on a book about Paginated Reports and SQL Server Reporting Services. This will be the second edition of a book I co-authored ten years ago, and this edition is also a collaborative effort with multiple contributors. We have quite a lot of excellent material that is just starting to go through the process of editorial and technical review. Writing a technical book always takes more time and effort than imagined, and working on this one in my “spare time” is a laughable notion, but this is going to be a great book.

Writing this blog series while juggling other commitments has brought me to two realizations: 1. Posting articles in the series will be sporadic, and 2. The topics will be in fairly random order based on the projects I’m working on and the topics that are most top-of-mind at the time.

This brings me to the subject of this post: Paginated and Analytic reports.

Before we had Power BI, we had Reporting Services. When the focus of my career and consulting practice became Business Intelligence, starting about fifteen years ago, most of the “BI reports” I created were in SQL Server Reporting Services (SSRS). We could create a variety of charts and reports that allowed users to drill-down from summary to details. We could create reports that allowed a user to drill-through, from one record or report item to another report that presented more details. With a little creativity, we could create reports that allowed users to click items that would filter other items on the same report. It took some work and time to create these interactive “drill-through-to-self” reports. Today, after transforming data into the right format for a tabular data model, this type of interactive functionality just magically happens in Power BI with very little report design effort. But Power BI is primarily a tool for analyst users to create their own reports and to get answers to business questions, rather than for IT developers to create reports that print or export perfectly to PDF or Excel. Although it is possible to create fairly advanced paginated reports, most bread-and-butter operational reports are really just what I call “lists of stuff”… the results of SQL queries presented as a table or matrix, perhaps with a few groupings and subtotals. Simple business processes and tasks often require simple reports to help manage them.

Paginated reports, by definition, are reports that allow a lot of data to be presented on multiple pages. Paginated reports can be viewed online, printed or exported to a variety of file formats. Because there is so much overlap in the capabilities of SSRS/Paginated Reports and Power BI, the choice between these two report tools is not always clear. I can cite many cases when using Power BI Desktop is clearly the right choice to create an analytic report, and cases where it would make more sense to create an operational report using Paginated Report Builder. I think these are the easier decision points, but I would like to explore those cases where we could argue either way. What are the pros and cons of using Power BI to create traditional reports? When would it make more sense to replace a static, paginated report with an interactive report created with Power BI?

The integration of Paginated Reports into the Power BI cloud service continues to march forward at a rapid pace. Just yesterday, Chris Finlan teased a blog post announcing new Power Automate integrations that will allow reports to be distributed and exported using flows. Listen up for some big announcements at Microsoft Ignite about Paginated Reports in the Power BI service. Whether you host operational reports on your own report server or in the cloud, Paginated Reports are an important part of most business reporting portfolios.

When do you choose to use Paginated Reports or SSRS rather than Power BI? Do you design reports differently than you would have 5 years ago? Do your users prefer interactive, self-service reports now or do they still want made-to-order static reports?

What are the deciding factors between creating a paginated report or an interactive report?

Do you create paginated reports with a Power BI dataset or SSAS model as the data source or do you only use SQL queries for those reports?

I’m interested in your ideas and questions. Please post them in the comments below this post and then I will extend this post with your input and my thoughts.

Doing Power BI the Right Way: 2. Preparing, shaping & transforming source data

Part of the series:  Doing Power BI the Right Way (link)

In a business intelligence solution, data must be shaped and transformed. Your data is rarely, if ever, going to be in the right format for analytic reporting. It may need to be consolidated into related fact and dimension tables, summarized, grouped or just cleaned up before tables can be imported into a data model for reporting. If source data is in a relational database, chances are that it is optimized for transactional processing or application development. The column names, data types and formats are likely neither suitable nor user-friendly for simple reporting. If source data exists in files or spreadsheets extracted from business reports, without data preparation you are likely just to reproduce the existing reports without adding business value.

This important topic raises one of the most critical questions to consider when designing a Power BI project:

Where should I shape and transform data?

Data can be shaped and transformed in at least three different layers in the business intelligence process:

  1. At the source
  2. In Power Query
  3. In the BI data model

Your options for transforming data at the source depend on the capabilities of the data source. In general, a query running closer to a data source will be more efficient than one that must transfer a set of data, cache and manipulate it downstream. If you are importing text files or Excel files, then your only choice is to import an entire file and then potentially throw away what you don’t need – which is fine for small-ish data sets.

The right side of the previous diagram depicts data transformed in the data model. Some data morphing can be performed in DAX using calculated columns or calculated tables. In rare cases, there may be good reasons to do this but if you can perform row-level calculations in a database view or in Power Query, you will be better off. It is also easier to maintain the solution if you can keep all that query logic in the same layer so you are only using one design tool.

Data shaping at the Source

If we break the first choice down a little further, we could prepare data in an Extract, Transform & Load (ETL) process and populate a new table in the source database with correctly-shaped and formatted data for BI reporting. This is the approach used in traditional data warehouse solutions, typically under the control of an IT-managed data warehouse development group.

Data Warehouse, Data Mart or Data Lake

Back in the day, the conventional approach was to move data from sources into a staging database, and then to move it again into a dimensional database that was structured to support analytic reporting. I say this in the past tense because that was the way we always did it 10-20 years ago. Now, in the age of self-service analytics, it has become more common to build reporting projects on-the-fly and to transform data without moving all of it from one location to another before it arrives in the BI data model. However, there are limits to just how much data we can transform and how much complex transformation can be done in an ad hoc fashion. The waters here are muddy and it is difficult to say that one approach is always preferable to another, but one thing is very clear: a data warehouse is very necessary in cases where data governance, high volume and centralized management are required.

If requirements are well defined and unlikely to change, building a data warehouse or dimensional data mart might be the ideal approach. However, data warehouse projects can get stuck in long cycles of over-planning and requirement gathering, which has led many organizations to think about BI as an extension of business processes rather than IT processes. Most organizations have come to grips with the reality that the vast majority of BI projects are conducted after databases and data warehouses have been completed. Over decades, we have seen hundreds of companies chasing the Utopian dream of the “perfect data warehouse” along with Master Data Management, data cataloging, cleansing, governance and stewardship. Each of these goals can be achievable, but each is a potential rabbit hole. Experienced data practitioners have learned to apply their experience to time-boxed, resource-constrained projects with the goal of creating “good enough” solutions that don’t completely drain IT budgets and resources.

The data lake has become a viable option for staging and storing data before moving it into a structured model because relational storage can be expensive. Data lakes provide a low-cost means to store a lot of data in the cloud. We usually think of the “modern data warehouse” platform consisting of a data lake and dedicated data warehouse database. Flavors of the lake/warehouse approach could include using a lake AS a warehouse, using a standard relational database to store a small/medium scale warehouse or using a dedicated parallel processing DB platform (e.g. Synapse, Snowflake or Cosmos DB) to store a large-scale data warehouse.

Power BI and the Data Warehouse

Bringing the discussion back to Power BI queries, data models and reports, let me put a bottom line on this multifaceted topic. I believe that there are three data transformation scenarios related to Power BI design:

  • Business-Centric, Informal and Small Model/Report Solutions
    Power BI makes it easier to transform data as you go. In small projects, using this approach saves time and money, allowing business professionals to meet their own reporting needs. End of story… win-win.
  • Interim POC Solutions
    Without a formal data warehouse, Power BI can be used to munge-together data from multiple sources and discover requirements that a data warehouse solution would provide. Use Power BI to build a prototype or proof-of-concept solution that can eventually evolve into a formal solution. Accept the follies and trade-offs of ad hoc data models and use them to evolve toward the right solution using iterative design.
    Use Power Query to perform all necessary transformations and cleansing operations.
  • Formal Data Model & Report Solutions
    Once properly governed and secured data warehouses or data marts exist, rebuild the Power BI data models and reports. The “rebuild” effort could encompass anything from switching data sources in an existing report project to starting from scratch. Either way, you save time by applying earlier learnings and design elements. If additional query logic is needed, use database views to promote query folding in Power Query.

The simple fact is that making such a transition is never “done”. It is a journey that requires iterative thinking and management. Accepting and embracing this reality will save everyone a lot of angst. When data cleansing and transformation are applied as data is ingested into the data warehouse using tools like Azure Data Factory (which may utilize Power Query or SSIS), SSIS or Databricks, the data sources for the Power BI project may simply be tables or views.

Where and what is Power Query?

Power Query is the data transformation technology built into Power BI. It is an extremely flexible and extensible data shaping platform that evolved from tools like SSIS, DTS, Excel, Microsoft Query and Access. When you choose Get Data or Transform Data from Power BI Desktop, this opens the Power Query editor. The Power BI service includes an online implementation of Power Query called dataflows, which allows queries and entities to be shared between analysts creating different data models. Power Query is also now integrated with Excel (when you select Get Data), the Visual Studio SSAS Tabular model designer and Azure Data Factory.

Learn to leverage Power Query to perform all row-level transformations and calculations efficiently. In general, column naming and data type changes should be performed in Power Query rather than in embedded SQL or DAX.
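
As a simple illustration, this Power Query (M) sketch renames cryptic source columns and sets explicit data types, each in a single consolidated step. It uses an inline sample table so it is self-contained; in a real query the source would be your database, file or feed:

let
    // Inline sample standing in for a real source table
    Source = #table(
        { "CUST_NBR", "SLS_AMT" },
        { { "C-1001", 250.00 }, { "C-1002", 125.50 } }
    ),
    // Rename technical column names to friendly, report-ready names in one step
    RenamedColumns = Table.RenameColumns(
        Source, { { "CUST_NBR", "Customer Number" }, { "SLS_AMT", "Sales Amount" } }
    ),
    // Set every column to an explicit data type in one step
    ChangedTypes = Table.TransformColumnTypes(
        RenamedColumns, { { "Customer Number", type text }, { "Sales Amount", type number } }
    )
in
    ChangedTypes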

Understand Query Folding

This is a critical concept to understand about Power Query. For supported data sources and whenever possible, the queries you create in the Power BI query editor will be converted to native queries that execute at the source. This means that if you are using SQL Server, rather than transferring all the data from the database engine to Power BI before performing transformations, the query steps may be converted to a single optimized T-SQL query. This is an ideal scenario for high-volume fact tables.

Query folding only works when Power Query can select records from a single database object like a database table, view or table-valued user-defined function.
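
Here is a hedged sketch of a foldable query: the source is a single database view and the filter step can be translated into the native query (verify by right-clicking the step and checking for View Native Query). The server, database and view names are placeholders:

let
    Source = Sql.Database("MyServer", "SalesDB"),
    // Selecting a single view keeps query folding possible
    FactSales = Source{[Schema = "dbo", Item = "vw_FactSales"]}[Data],
    // This filter can fold into the native query as a WHERE clause
    FilteredRows = Table.SelectRows(FactSales, each [OrderDate] >= #date(2020, 1, 1))
in
    FilteredRows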

Hard pills to swallow

For relational database professionals, there are a few realities about Power BI that can be a shock to their pattern for solving data problems. Without getting into all the details about why things are the way they are, it is very important to understand and embrace these points:

SQL queries do not fold – If you hand-write a SQL query or copy and paste SQL script into a new query within Power BI, Power Query cannot fold – and therefore cannot optimize – that query. This doesn’t mean that this is wrong or necessarily a bad practice. If you need to populate a relatively small table in the data model that doesn’t need to be filtered, grouped, sorted or otherwise transformed in ways that could otherwise be optimized, this is probably fine.

Selectable database objects can be folded – If you can use a single database table or create a view in the database, particularly for high-volume fact tables, these queries will be faster and will consume fewer server resources.

Stored procedures – Power Query doesn’t include specific optimizations or support for stored procedures. I have successfully used stored procs as data sources, at the insistence of DBAs, but this is not a recommended design practice.

Parameterized filters

Queries for large tables should be designed with filters so you can easily control the records imported – both at design time and when you need to populate the data model with a full set of records. Even if you don’t need to manage millions of records or load a table incrementally, this is just a good idea and will save a lot of redesign later on.

I use parameters to manage anything that might change, like folder paths and server names.

The example shown here conforms to the requirements needed to implement the incremental refresh feature of Power BI:

The RangeStart and RangeEnd parameters are then used in a Filter Rows step in the query to limit the imported records within the selected date range.
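
A minimal sketch of that pattern, assuming RangeStart and RangeEnd are defined as Date/Time parameters and the fact table has an OrderDateTime column (all object names are placeholders). Note the >= and < combination so that records are not loaded twice across partition boundaries:

let
    Source = Sql.Database("MyServer", "SalesDB"),
    Sales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // RangeStart and RangeEnd are Power Query parameters of type Date/Time
    FilteredRows = Table.SelectRows(
        Sales,
        each [OrderDateTime] >= RangeStart and [OrderDateTime] < RangeEnd
    )
in
    FilteredRows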

Incremental refresh

One of the perks of using a date range filter is that it enables incremental refresh. This is one of the coolest features to come along in a long while, and it now works with both Premium and shared capacity licensing. If queries support query folding and use a set of date range filters to limit the records loaded into a table in the data model, Power BI will automatically generate table partitions. For example, if I were to partition a table by month, each partition includes records for only one month, and data in each monthly partition is only loaded when new records are available or when data has changed.

I design my fact tables to support incremental refresh as a standard design pattern whether we have a requirement to implement it or not. This gives me the flexibility to only load a small set of data I need for development. Publishing a smaller PBIX file is faster and then more data can be loaded after the dataset has been published to the service.

If you don’t need it, throw it away

The presence of every single column in your data model comes at a cost. It uses memory and storage space, and can potentially impact performance. If you aren’t absolutely certain that you need a column, just omit it from the table in Power Query. If you or your users change your minds, columns can easily be added later.

Before you ask, this is not a picture of my garage but according to my wife, it could be. Keeping all the columns in your data that you “might need someday” is a lot like all that crap I’ve been collecting in my garage and on my workbench that I tell myself might be useful some day.

Err on the side of omitting columns that you don’t need to meet immediate reporting requirements. The difference between the cluttered garage analogy and the reality of Power BI design is that you benefit immediately by getting rid of unneeded data right now, and it costs little or nothing to go back and add one of those columns later, after your business stakeholders decide they need it.

The easiest way to manage this is to use the Choose Columns button on the Home ribbon, and then just deselect the columns you don’t need. In the future, use the gear icon next to the generated step to make changes to the column selection.
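
The Choose Columns button typically generates a Table.SelectColumns step similar to this sketch (the column names are examples); columns you deselect never make it into the data model:

let
    Source = #table(
        { "OrderDate", "CustomerKey", "SalesAmount", "LegacyCode", "AuditTimestamp" },
        { { #date(2021, 3, 1), 101, 250.00, "X9", #datetime(2021, 3, 1, 8, 0, 0) } }
    ),
    // Keep only the columns needed for current reporting requirements
    RemovedOtherColumns = Table.SelectColumns(
        Source, { "OrderDate", "CustomerKey", "SalesAmount" }
    )
in
    RemovedOtherColumns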

Column naming

Please read this aloud and commit it to memory: Rename all columns that will be visible in the data model using concise, self-describing, meaningful and user-friendly names. I can’t emphasize enough how important this is. Not only is this good practice in BI solutions, but there is a psychology to the way business users perceive what they see in reports. When they see cryptic labels and field names, they become uncomfortable and may distrust the information.

Your data model should be designed for the users who will design reports and consume the results, not just for developers. Even if technical professionals are designing reports, field names are going to show up as labels and titles. Consider the following examples:

Friendly Column Name     Technical Name
Customer Number          Customer Nbr, Customer #, customerNumber, CustomerNbr, CUST_NBR
Product Brand Name       productBrandName, ProdBrandNm, PRD_BRAND_NM

Technical naming conventions are deeply engrained into the culture of application and database design, and for good reason. For developers and database professionals, certain naming patterns like Camel Case (camelCase) and capitalized word abbreviations separated by underscore characters are almost a rite of passage; but object names in this format should not exist in a data model for analytic reporting. Spaces in column and measure names are welcome, as are punctuation and other special characters like #, $, %, etc.

Tables and fields should be named so they are familiar and easy to understand. When you or a user designing a self-service report drags and drops a field or measure into a report visual, these names are used as labels and titles that should also be meaningful and self-explanatory.

Field names should also completely describe the object. For example, if the Product and Customer tables both have a “Name” column and you drag the Name field from the Customer table into one visual and the Name field from the Product table into another, each would only be labeled “Name”. It would be more appropriate to name these fields “Customer Name” and “Product Name” in their respective tables.

Naming & annotating steps

Rename the steps in Power Query so that they are self-describing (like the following example). Your future self will thank you when you revisit the project in six months. This is done by right-clicking the step and choosing Rename from the menu.

A new feature in Power Query displays a tooltip with annotations you add to the M script or the text you enter in the Description property of the step.

The names of each step are applied as variable names in the Power Query “M” language script. Variables that contain spaces or special characters are preceded by # and encapsulated in double quotes. The comments in the code I added with // prefixes result in the tooltip annotations I mentioned in the previous paragraph.

A recent suggestion from one of the blog readers is to use underscores rather than spaces in these step names. This allows you to double-click the variable name and copy it to the clipboard.
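
Putting those suggestions together, a renamed and annotated query might look like this sketch (server, database and view names are placeholders). The // comments surface as step tooltips, and the underscores make each step name easy to double-click and copy:

let
    // Connect to the sales database
    Source_SalesDB = Sql.Database("MyServer", "SalesDB"),
    // Reference the customer view
    Customer_View = Source_SalesDB{[Schema = "dbo", Item = "vw_DimCustomer"]}[Data],
    // Keep only active customer records
    Filtered_Active_Customers = Table.SelectRows(Customer_View, each [IsActive] = true)
in
    Filtered_Active_Customers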

Consolidating steps

Building queries using linear steps helps break down complicated logic, but it can become inefficient as the project evolves. In the course of query design, you are likely to repeat steps, like those in the second example below where I removed three different columns after using them in other steps. I also changed the data type each time I created a custom or duplicated column.

If you select an existing step and then perform the same action on a different column or columns, Power Query will add the new transformation to the selected step. In the query on the left, I consolidated all these steps, removing columns and changing data types only once.

The optimized query on the left runs in about one-third the time of the original query on the right.

Power Query Best Practice Guidelines

Here is a summary of best practice recommendations:

  • Set EVERY column to the right data type. Watch for untyped columns with “ABC123” in the column heading.
  • Rename all columns that won’t be hidden in the data model (keys and sort columns are OK).
  • Consolidate redundant steps.
  • Rename steps to document transformations.
  • Perform row-level calculations in Power Query instead of DAX calculated columns.
  • Use database views & avoid SQL statements so Power Query can implement query folding.
  • Use parameters & range filters to reduce working set for development, reduce PBIX file size, speed up desktop refresh and reduce time to publish dataset.
  • Use date range parameters in service to expand dataset size during data refresh & optionally implement incremental refresh.

Doing Power BI the Right Way: 1. Futureproofing Power BI solutions

Part of the series: Doing Power BI the Right Way (link)

When starting a Power BI project, you have many choices to make. Decisions like how to source your data, where and how to create queries to cleanse, transform and reshape the data; where and how to create calculations and the nuances of modeling are just the tip of the iceberg. Most Power BI solutions progress through a few stages in their lifespan, which include:

  1. Creating a simple prototype to explore ideas and to get feedback from users and business leaders
  2. Building a working proof-of-concept to frame the solution
  3. Designing a foundational data model to build upon
  4. Completing a production-scale solution
  5. Iterative enhancements & releases

The purpose of this post is to provide some guidance to help you design Power BI solutions that can survive the journey through these stages with as little “throw-away” design as possible. There will always be prototyping and redesign in any project but if you follow proven design patterns and lessons learned from prior experience, you’ll have a much better chance of building a reporting solution that will endure.

Oh, I should mention this… the difference between item #1 (the simple prototype) and #2 (working proof-of-concept) is that you need to throw away the first one – and anything else that doesn’t lay a foundation that you can build on top of. This is a hard lesson but one that will pay off by helping our teams, sponsors and business stakeholders understand that we can proceed after laying down solid bricks (based on well-defined functional and technical requirements) to build the rest of the solution.

Reality Check

If you are using Power BI to create reporting solutions for business users, your circumstances are likely similar to one of these scenarios:

  1. You are a self-service report designer creating a small report solution that will grow and evolve over time.
  2. You work in a software development group in an IT organization with a formal application lifecycle management discipline.
  3. You work in a business unit or small company with an informal software process – and they need your guidance to be successful.

If you are a self-service report designer creating a small report solution that will grow and evolve over time, you likely have a data source that wasn’t specifically curated for reporting and analysis. The data will need to be massaged and shaped before you can design insightful, interactive visuals. That’s cool because we have great tools built into Power BI Desktop to do that sort of thing. You might have some ideas about how to present the data but you’ll probably try different things and see where the journey takes you. These projects tend to have a lot of business impact right out of the gate. Business report users will be delighted with what you can do in a short time with Power BI and they will want more. You’ll get excited and expand the project to include more data or more dimensions to the data you have. It is almost inevitable that you will experience two common challenges:

  • Scope creep
  • Data quality issues

These are not necessarily “problems” per se but realities to acknowledge. Building a solid, foundational data model is challenging under these circumstances. The good news is that Power BI lets you explore options quickly and allows you to try different approaches while you sort out the formal business and technical requirements. The typical outcome is to start over with a new project where you will be well-prepared to use a cleansed data source and design to a well-defined set of requirements within the new project scope. An experienced BI practitioner will set this expectation with the business and either carefully evolve certain elements into the final project and toss the rest, or use the first attempt as a throw-away prototype and start over. Just know that this is a natural part of the process.

If you work in a software development group in an IT organization with a formal application lifecycle management discipline, you are probably accustomed to using team development, a code repository, version control and differencing tools to manage a shared code base. The main components of Power BI came from a long heritage of IT-centric development, so at its core, objects (like tables, fields and measures) have properties and metadata descriptors that can be versioned, differenced, merged and scripted. But what has made Power BI so successful is that it is primarily designed for self-service reporting. Power BI Desktop is a beautifully streamlined product that packages data transformation queries, model tables, data and reports into a tidy package, but it is not an IT development tool and doesn’t have these capabilities built in – nor will it. That’s not what Power BI Desktop was designed for. The good news is that the community has rallied to develop tools to meet the needs of IT developers who work in teams to develop, test and deploy formal, enterprise-scale BI reporting solutions. Power BI – and the rest of the Microsoft BI platform – can be scripted and automated to create custom deployments.

Right-sizing solutions

Before creating Power BI, earlier versions of Microsoft’s Business Intelligence components existed as separate products. BI reporting projects took many months to design, develop and deploy. By contrast, today you can open up Power BI Desktop and create a quick “report” (PBIX file). In this form, the file contains data transformation queries, an in-memory data model with measures and other calculations, and report pages that include the interactive visuals. This design typically meets the needs of small and intermediate sized projects.

One of the first questions to consider is whether the solution you intend to create with Power BI is a quick informal project or a formal project – or may become a formal project in the future. Informal projects don’t necessarily have to conform to all the design best practices. They are limited in scope and typically meet a very specific and short-term need. Formal projects are built on best practice design standards to endure a certain degree of expanding scope and are designed for longevity. They have a business sponsor or champion who drives the business requirements and at least one developer who is responsible for designing a solution that meets those requirements. Formal projects are typically tested, signed off and then deployed in accordance with an established development discipline or application lifecycle process so they can be supported by the business. If your organization has a process for delivering and supporting company-sanctioned applications and reports, then you will need a roadmap to transition from an informal project to a solution that the business is prepared to support. A few common questions that come up in these discussions are:

  • How is the source data procured and maintained?
  • Who is responsible for maintaining that data?
  • Who from the business defines the reporting requirements and signs-off on Power BI reports meeting those requirements?
  • Who owns the development of the Power BI solution (data model and reports)?
  • Who is the developer’s backup if they become unavailable?
  • Are the requirements and project assets sufficiently documented to support such a transition?
  • What are the security requirements?
  • Who will be given access to reports or dataset(s) for ad hoc analysis, and how will that access be granted?
  • Are users assigned to report and dataset permissions directly or through group membership?
  • Should users have conditional or restricted access to data within the Power BI dataset?
  • What is the process for requesting new reporting features, adding features to the existing solution, testing and deploying updates?

Personal BI projects, where a data analyst imports data, creates reports and then publishes a report for their own use, serve an important need in business. These reports can easily be promoted to a departmental or business area solution by sharing the report with others. As the scope of the project continues to expand, there are several important considerations that define the threshold between an informal project, designed for quick development and convenience, and a formal project, intended for longevity and long-term business support. For each project component, I start with a typical informal project scenario and then provide recommended practices to plan for futureproofing.

Planning for separation

Designing transformation queries, data model and reports in one PBIX file is convenient and offers quick troubleshooting, debugging and redesign. However, it also limits development to one person at a time and promotes a monolithic solution. Separating the data model from the report files helps to promote self-service and focused report design. The certified or promoted dataset is deployed, secured and managed as a separate object. Reports are connected to a Power BI dataset live connection and any number of reports can connect to a published dataset.

This pattern supports both fully IT-managed projects with multiple developers and it supports self-service report design using certified and promoted datasets. After the data model design is stabilized and locked-down, report designers simply focus on report creation and data analysis by connecting to the secured dataset. Any tools and products capable of connecting to Power BI or SQL Server Analysis Services (SSAS/AAS) can be used for reporting and ad hoc analysis.

Source data

Spreadsheets and files extracted from reports and manually manipulated using Excel or other line-of-business applications have limited utility as reliable data sources. Manually preparing spreadsheets and source data files is only feasible for small sets of data and one-time or manual data loads. An analyst can easily remove, rename or add a column or formula to a sheet that will break or negate the import process.

Futureproofing: Data from source systems can be loaded into a managed database or data lake before using Power Query to load the data model. Data from files (if carefully protected and maintained) may be transformed, cleansed and validated before it is ready to be loaded. Using “brute force” transformation steps to replace values and cleanse records one step at a time can be slow and difficult to manage as the complexity and volume increase. Reduce the number and complexity of query steps and apply the following recommendations for Power Query design optimization.

Use parameters in Power Query to manage any variable information such as database servers, database names (e.g. DEV database, QA database & PROD database) and file paths.
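
For example, with ServerName and DatabaseName defined as text parameters in Power Query, a query can be repointed from DEV to QA to PROD without editing any steps (the table name below is an assumption):

let
    // ServerName and DatabaseName are Power Query text parameters
    Source = Sql.Database(ServerName, DatabaseName),
    Orders = Source{[Schema = "dbo", Item = "FactOrders"]}[Data]
in
    Orders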

Planning for data volume

In smaller, informal projects we typically design for a limited set of data and a fixed time frame. If manageable, all production data can be loaded into the working PBIX file on a developer’s desktop.

Futureproofing: When working with large data volumes, a subset of data should be loaded into the desktop dataset for development. Query parameters are used to expand the number of records (usually using a date range) loaded after the dataset is deployed for production use. Groups and ranges of data records, often partitioned by months or years, are defined and stored separately so that large volumes of records don’t need to be reloaded along with new records.

Power Query design practices

Using the robust Power Query design user interface (ribbon and menu options), you can build sophisticated and complex transformations that work well with moderate data volumes. With a small number of query steps, default names like “Rename Columns” and “Change Data Type” are fine, but it can be difficult to trace the steps and remember where and why you performed each step.

Futureproofing: Tables and fields should be meticulously renamed to use standard friendly names, categorized and optimized with appropriate data types. Unused columns are removed to save space and improve performance. Spend extra time to get the field names right before creating dependent steps. (see: Chris Webb: Names, Tables, Columns & Measures In Power BI)

  • Consolidate redundant query steps – like those used to rename columns, remove columns and change data types – into a single step in the query.
  • Attribute fields needed for grouping and filtering should be moved to separate dimension tables, with dimension unique keys defined alongside corresponding fact table keys.
  • All numeric columns used for aggregation (which share the same level of grain) should be extracted to fact tables, with corresponding dimension keys.
  • Row-level calculated columns should be defined either in database views or in Power Query tables rather than as DAX calculated columns.

Promote query folding by using database tables, views or table-valued user-defined functions in the database rather than writing SQL statements in Power Query. Test and optimize queries to support query folding by checking to see if “View Native Query” is an available right-click menu option on each query step. Power Query also doesn’t work well with stored procedures.

Data loading & refresh management

For large fact tables, design them to be “tall” rather than “wide” by removing unneeded columns or moving them into dimension tables. Tall tables should only include columns that compress efficiently, such as numeric values and short, non-distinct text values. Do not store a single unique key column in a fact table.

If large tables take too long to load or cause the PBIX file to exceed a reasonable size for development (no more than 500 MB), use parameters to limit the range of records. To manage the table size once deployed to the service, do one of the following:

  1. Use start and end date range parameters to filter a Date or Date/Time type column in the table. Use this to limit the working set on the development desktop and then to expand the range of records in the service.
  2. Similar to step one, use a pair of Date/Time parameters with a range filter on a Date/Time type column and follow the product documentation to implement incremental refresh. This will automatically partition the table records on specified calendar date parts (e.g. weeks, months, years) and then only load changes.
  3. Devise a table partitioning strategy and implement partitions using Tabular Editor. Use PowerShell or the XMLA read/write endpoint to schedule and execute TMSL or XMLA script to manage partition processing.

Data modelling

Many modelling techniques that perform well when tested on small data sets will slow down with large tables. These include relationships on bridging tables, many-to-many relationships, bi-directional relationship filters and multi-measure layered calculations.

Avoid using SWITCH statements over already complex DAX expressions or referenced measures. Avoid using iterator functions (like SUMX) with complex row-by-row predicate logic. Splitting calculation logic into separate measures is a useful design pattern to segregate business logic, but it can create performance bottlenecks. Identify slow measures and consider building streamlined measures using alternate patterns when bottlenecks are discovered.
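
As a hedged illustration of that last point (the table and column names are assumptions), a compound row-by-row predicate inside an iterator can often be rewritten so the storage engine applies simple column filters instead:

-- Slower pattern: iterator evaluating a compound predicate for every row
Discounted Sales (Iterator) =
SUMX (
    FILTER ( 'Sales', 'Sales'[Discount Pct] > 0 && 'Sales'[Order Quantity] >= 10 ),
    'Sales'[Sales Amount]
)

-- Usually faster: simple column filters pushed to the storage engine
Discounted Sales (Filtered) =
CALCULATE (
    SUM ( 'Sales'[Sales Amount] ),
    'Sales'[Discount Pct] > 0,
    'Sales'[Order Quantity] >= 10
)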

Import mode tables are generally faster and more flexible than DirectQuery mode tables. However, DirectQuery is a good option for very tall transactional tables where simple aggregations are used rather than complex DAX measures. If Import and DirectQuery mode tables are both used in a composite data model, consider creating a summary table using Import mode and a detail table using DirectQuery, and then define aggregations to cache certain query results in memory (to compensate for the normal performance degradation resulting from query translation in DirectQuery).

There is so much more to discuss at this point but I’ll need to address additional topics in later posts in this series titled “Doing Power BI the Right Way“. Topics include:

  • Service vs on-premises
  • Interactive vs operational reports
  • Spreadsheet reporting
  • Iteration vs redesign: When to start over
  • Designing for scale

Power Query or Power BI Dataflows

Multi-tier and advanced self-service solutions can be architected by further separating the transformation queries from the data model using Power BI dataflows. These are server-hosted Power Query/M queries that are both designed and managed in the Power BI service.

Dataflows expose several integrations with the Microsoft Azure cloud services, such as the Common Data Service and Azure Data Lake Gen2 Storage. Dataflows also have integrations with AutoML machine learning models.

The advantages and disadvantages of dataflows should be carefully considered before jumping in head-first. Power Query in Power BI Desktop is a fast, rich query development environment that provides nearly immediate feedback. By contrast, the browser-based dataflows designer is similar in function but slower and has a less robust query design feature set. Some developers choose to develop Power Query in Desktop and then port their M query code to dataflows as needed.

Stick around. There is a lot more coming in this series in the months ahead:

Doing Power BI the Right Way

  1. Futureproofing Power BI solutions
  2. Preparing, shaping & transforming source data
  3. Power Query design best practices
  4. Power Query in dataflows or Power BI Desktop
  5. Data modeling essentials and best practices in Power BI and AS tabular
  6. Validating data model results
  7. Planning for separation – data models and reports
  8. Power BI delivery options
  9. Choosing the right report type: analytic or paginated
  10. Designing and managing large datasets in Power BI
  11. Multi-developer and lifecycle management for Power BI
  12. Certified reports, certified datasets & the self-service mindset

To wrap up this edition of the series, I will leave the rest of my outline points as teasers for subsequent posts. Please freely comment with your thoughts about what to include in these later editions.

The rest of these are random thoughts left from the outline for the series. Feel free to comment.

Future topics:

  • Define the technical audience
  • Establish solution ownership
  • Continuation plan (if they were to leave?)
  • Who will maintain & manage the solution after deployment?
  • User access
  • Version updates
  • Support

Know and understand the business audience

  • Report/Dashboard Consumer
  • Skilled Data Modeler
  • Analyst, Data Scientist
  • IT BI Developer
  • Planning for capacity
  • Certifying reports and datasets: Gaining business trust
  • Validating results
  • Team collaboration
  • Project file sharing
  • Promote continuous improvement
  • Version management
  • Managing deployment
  • Power Query and Dataflows
  • Promote and support repeatable design
  • Templates and layouts
  • Planning for security
  • Access to reports & dashboards
  • Publish to web
  • Embedding to a business portal
  • Embedding for external access
  • Access to dataset(s) for ad hoc analysis and future report development
  • Conditional data access / role-based access / row-level security

Power BI External Tools: Reading the Tea Leaves

I’m in the midst of writing another post in the series “Doing Power BI the Right Way” that I started last week. However, this topic is super important, so I wanted to pause and spend just a few minutes to share some thoughts. Why is Microsoft (particularly, the Power BI product team) promoting the use of tools that they neither develop nor officially support?

This week, Christian Wade from Microsoft announced on the Microsoft Power BI blog that a new “External Tools” ribbon was added to the preview features of Power BI Desktop in the July update. In earlier posts, Christian has promoted community-developed tools such as Tabular Editor, DAX Studio and the ALM Toolkit that can be used as replacements for or extensions to the Power BI development experience. These three tools are automatically added to the ribbon by default if they are installed, but you can add your own applications by following the instructions in this Microsoft document titled Using external tools in Power BI.

What are these three “promoted external tools”?

Tabular Editor is an open source application developed primarily by Daniel Otykier. The source code resides in GitHub where multiple community members have contributed to the project. Likewise, DAX Studio is an open source project headed up by Darren Gosbell. The ALM Toolkit for Power BI is an evolution of an earlier Visual Studio extension project from Christian Wade called the BISM Normalizer, used primarily to compare and merge changes between SSAS/AAS Tabular projects and databases. Working with MAQ Software, he evolved the code into a stand-alone application which is optimized for Power BI data models. Each of these tools is free and supported by the community, but not officially supported by Microsoft.

There is an important message here

I use all three of these applications regularly. Before the new ribbon, I would open my PBIX file in Power BI Desktop so that the data model was loaded into memory. Then, I would open the needed utility application from the Windows program menu, and then connect to the Power BI data model using the local port address. The new ribbon conveniently allows me to open the installed application and automatically connect to the data model I have loaded. Honestly, it only saves a few steps but the presence of the new ribbon sends an important message.

What does that mean, exactly? Unlike many other vendors, Microsoft has a strong and thriving community who promote and support their products. Much of this is simply volunteerism backed by Microsoft’s MVP, Partner, user group and Certification programs. I typically don’t contact Microsoft product support unless I have a bona fide show-stopping error but I’m certain that if you were to open a support case and tell them that DAX Studio is acting up, they would send you to the GitHub repo and suggest that you chat with other community members.

I worked with a consulting client who had a large investment in database and BI tools from Oracle, IBM and MicroStrategy. As we got started on a Microsoft BI project, I informed IT operations that all of the developers would need to install and frequently update several software tools. Any software needed to be added to the list of corporate-approved software, which was referenced by the outsourced support helpdesk to remotely install software for an employee. For each application, we needed to provide the specific product version, licensing agreement and cost, service level agreement and support agreement with the software vendor. Software could only be installed from a network file share and not over the Internet. There was a form and a formal approval process that didn’t accommodate my suggestion that open source tools be installed, nor that the free, officially-supported applications needed to be updated every month. Altering the approval process required multiple meetings and escalations to top-level leadership who had never heard of a vendor promoting “community-supported” development tools; entertaining such a strange notion was a big paradigm shift – but it is our reality and one of the reasons that Power BI is a successful platform.

External Tools: What Does This Mean?

Microsoft has several forums for product users and partners to make suggestions and to ask for new features. If you search Ideas.PowerBI.com, where any user can ask for new features, you will see many, many requests to add IT/enterprise features to Power BI Desktop such as version control, multi-developer support, partitioning and other capabilities that already exist in Analysis Services projects. Adding all these things to Power BI Desktop would clutter the product and defeat the goal of keeping self-service model and report development simple. For the IT BI developer, we have Visual Studio extensions for data model and paginated report design, but the process and overhead required to maintain those project templates and extensions for different versions of Visual Studio is complicated, to say the least (see my previous post “I am done using Visual Studio for BI data model development“).

Bottom line

Community-supported development tools are an essential part of the BI development experience. These are necessary utilities to have in your toolbox if you are a serious BI developer. The Microsoft product team not only tolerates these tools but relies on them, and promotes them through features like the new External Tools ribbon so that we have best-of-breed tools to fill gaps as the platform continues to expand.

Doing Power BI the Right Way

This is an introduction to a series of posts and pages that will provide a comprehensive set of best practices for successful Power BI solutions. In previous posts, I have asked readers to suggest topics for future posts. Based on that and other feedback, I will be addressing questions and suggested topics. The topics list at the end of this post is a brainstorm list and I ask that you help me make sure it is complete. My goal is to provide a set of guidelines and practices that provide the best chance of success as you navigate many decisions about how to stage and transform source data, how and where to perform data shaping and calculations, how to model data and the best way to visualize the results. The biggest question of all may be how to make the right decisions so that the small project you design today will work when you add more data, more users and transition into a formal, managed solution.

There are many fantastic resources to learn about Power BI and the rest of the Microsoft BI and reporting platform, but learning about Power BI and choosing among design options can be like drinking from multiple firehoses at full pressure at the same time. I will be the first to admit that my “best practices” are “my opinions”. In many cases they work consistently for me, but some points are debatable and open for discussion. I’ll tell you when I have a very strong opinion about something being done a certain way, or when I have found a pattern that works for me and that I offer for your consideration. I’m not always right… just ask my wife :-). Please comment, ask and offer alternative points of view.

Rather than offering another training course or duplicating the contributions that so many others in the industry make through their blogs, courses, books and articles; this will be a condensed set of guidelines about the many choices you must make when designing a solution. In the posts to follow, I will reference other resources and discussions on various topics.

Best practice guidelines topics

The following topic list will serve as a link menu for future posts. Expect this list to be updated and completed:

  1. Futureproofing Power BI solutions
  2. Preparing, shaping & transforming source data
  3. Power Query design best practices
  4. Power Query in dataflows or Power BI Desktop
  5. Data modeling essentials and best practices in Power BI and AS tabular
  6. Validating data model results
  7. Planning for separation – data models and reports
  8. Power BI delivery options
  9. Choosing the right report type: analytic or paginated
  10. Designing and managing large datasets in Power BI
  11. Multi-developer and lifecycle management for Power BI
  12. Certified reports, certified datasets & the self-service mindset

Just tell me what to do

Any attempt to apply universal best practices to Power BI solution design is a slippery slope. The tools are so flexible and powerful, and the requirements of each project are so varied, that it is challenging to establish a set of steps or rules that, if followed, will always yield the absolute best design for a given scenario. With that out of the way, I’ll say this: In my job, I see a lot of poorly-designed Power BI projects. I’ve worked with dozens or scores (maybe even hundreds?) of consulting clients who bring us projects – some partially completed, some finished, and many that are just broken – to be fixed or completed. My reactions range from “that’s just downright wrong” to “hmmm… I wouldn’t have done it that way but I guess it will work for the time being”. I try not to cast stones and do, on occasion, realize that others have found a better way to solve a problem. I don’t have all the answers, but I do have a lot of experience with Microsoft Business Intelligence solution design, and I have learned many good practices and design patterns from other community leaders and many successful projects over the past twenty or so years.

A little less conversation and a little more action

Let’s start with a simplified flowchart and condensed decision tree. This first whiteboard drawing is the first half of the Power BI design process, ending with the data model, before measures, visualization and solution delivery. There is a lot more but I think this is a good starting point.

I Am Done Using Visual Studio …for BI data model development

For several years, Visual Studio has been my go-to tool for designing semantic data models used for Business Intelligence reporting. Back in 2005, I used the Business Intelligence Development Studio (BIDS) Visual Studio add-in for SSIS, SSRS and SSAS projects to develop BI solutions with multidimensional cubes. In 2012, when Microsoft began the transition from on-disk cubes to in-memory SSAS Tabular models, I used SQL Server Data Tools (SSDT) to create tabular models. It was a rocky road at first. The Tabular designer was fragile, to put it mildly.

Enter Power BI… Initially intended for self-service data model and report design, Power BI Desktop has quickly grown into a robust and full-featured BI design tool. Not only does Power BI Desktop include a lot of great features, it is stable and streamlined. It’s a joy to use compared with my early experiences using SSDT for tabular model design. I prefer to use Desktop to perform model design. It’s faster, more convenient and just easier than SSDT. However, at some point in the life of a project it just makes more sense to transition the data model to an enterprise-scale effort.

Now, before anyone has a chance to comment and say “Paul, what the #$@! are you thinking? Visual Studio is an essential tool and there are certain things that you can’t do without it!”, I agree and will continue to use SSDT for a few key features. So, yes – I am not absolutely done using Visual Studio; I will still use it for managing projects other than SSAS, and perhaps for code check-in. I’ll finish this part of the story in a bit.

I want to be clear – I love Visual Studio. It’s a great product for developing software and a variety of business and data solutions. However, history has demonstrated that the notion of stitching together several different products and expecting them all to just work together seamlessly is untenable. Without getting into all the reasons that it has been difficult for Microsoft to develop and maintain a rock-solid tabular model design add-in for Visual Studio, contrast that effort with the evolution of the Power BI product. The Power BI product team is entirely focused on one product, under unified leadership, with a focused set of objectives. Negotiating co-development of any product by several different teams is difficult within any organization, particularly one as large as Microsoft. The reason that new features can be added weekly to the Power BI service and monthly to Power BI Desktop is that one product team manages all those features.

Some of you will remember the time when the Business Intelligence message from Microsoft was that we were supposed to create solutions relying on coordinated components of many products like SQL Server (relational, SSIS, SSAS and SSRS), Windows Server, SharePoint and Office – all orchestrated to work together seamlessly. It was a good idea – and still is in moderation – but this approach produced a delicate and complicated beast that was difficult to manage and had many potential points of failure.

One of the reasons Power BI Desktop is such a wonderfully streamlined product is that the feature set is optimized for data analysts and not for IT developers. To maintain a streamlined product, we are not at all likely to see enterprise capabilities (like version control, multi-developer code merging and scriptable objects) added to this product. These capabilities do exist, however, for Analysis Services projects and in community-supported tools like Tabular Editor and DAX Studio. But now (drum-roll, please) Power BI datasets can be developed and deployed to a workspace using enterprise tools through the magic of the XMLA endpoint.

The Visual Studio/SSDT Quandary

<RANT>

Call it a learning disability, but I have tried time and time again to use the Visual Studio tabular designer to manage SSAS projects, with the same outcome. Smallish demo and POC projects go well, but not so much when tackling the complexities of production-scale design. I guess it is just my natural optimism to hope things work out better than they did last time, but the laws of the universe dictate that if you do the same thing, history will be repeated.

Here’s how it goes… I start developing a data model in SSDT by importing some tables and queries, and adding relationships and measures. All good, right? At this point in the timeline, I often convince myself that the development environment is stable and that everything will work out, so I forge ahead, believing that all will be good. I then add some more tables and a whole bunch of new DAX calculations – and soon everything goes to hell. The model designer stops responding or behaves sporadically, Visual Studio crashes, the model definition file gets corrupted and then I remember that I’ve been down this dark road before.

Recounting the painful past, it is frustrating to open a support ticket and explain to the engineer that “sometimes when I do that, this happens but not always” and “in all the confusion, I really can’t remember exactly how I got to this state.”

Pondering these memories, I draft a blog post titled “I Am Done Using Visual Studio”.

</RANT>

I sincerely appreciate the efforts of Kasper DeJonge from the SSAS product team back in 2012 as we spent hours in remote meetings trying to reproduce various odd behaviors in the tabular designer with a large data model. The fundamental problem was that the Model.bim file, which defines all the objects in the data model, was an enormous XML document (ours was approaching 100,000 lines). Every change in the designer required the entire file to be re-written to disk and then loaded back into memory. Things improved significantly in 2016 and 2017 when the model definition was streamlined using JSON rather than XML, and the file structure was simplified to reduce the file size. Similar meetings with several other product leaders have proven that the product team is seriously dedicated to optimizing the enterprise tabular model experience.

I’m all about solutions and not just ranting about problems. So what’s the answer? How should we manage enterprise BI data model and Power BI solutions from now on? Using Tabular Editor alongside Visual Studio really is a best-of-both-worlds experience: you can open the Model.bim file stored in the Visual Studio SSAS project folder directly in Tabular Editor.

Tabular Editor

Tabular Editor is a superb tool for developing and managing tabular data models for SQL Server Analysis Services (SSAS), Azure Analysis Services (AAS) and Power BI. It is a community-supported tool created by Daniel Otykier, a Microsoft MVP and Senior Business Intelligence Architect with Kapacity.dk in Denmark. The most comprehensive resource for finding this and other community-supported BI tools for the Microsoft platform is on the Italians’ site at SqlBi.com/Tools.

If the project is under source code control, changes made with Tabular Editor will be detected and can be synchronized with the remote source repository from Team Explorer in Visual Studio.

Here are some quick facts and recommendations:

Power BI and Version Control
Power BI Desktop files (.PBIX) do not support version control and code merging.
Recommendation:
Don’t try to do this – it will turn out badly. Starting model design in Power BI Desktop will save you time, but once you transition to the Model.bim file format, use Tabular Editor.

Separating Reports and Dataset
A monolithic PBIX file created with Power BI Desktop – containing reports, the data model and queries – is simple and easy to manage, until you need to move beyond the limitations that this approach imposes.
Recommendation:
Power BI reports and datasets (data models) should be managed separately in all serious projects. Period. …whether you need to transition the data model to Model.bim or not.
Separating Power BI reports from the data model/dataset has many advantages, which include allowing report and data model development to be performed in parallel and by different team members. This is an absolute must for creating a certified dataset that users can connect to for their own reporting and analysis.

Visual Studio and Version Control
Visual Studio SSAS projects allow you to manage version control.
Recommendation:
This is a good thing. Keep doing this but use Tabular Editor as your primary model design tool.

A data model stored as a Model.bim file can have changes compared, split and merged between data model version files, deployed AS databases or Power BI datasets. Manage integrated source control with Azure DevOps or GitHub. Check in changes, branch, merge, push and pull changes made by other developers, but don’t use the Visual Studio Compare tool. Dependencies within the model definition can easily get you into trouble.

Tabular Editor
Tabular Editor offers a far superior design experience to Visual Studio. It is streamlined, easy to use and it won’t blow up when writing measure calculations. You can switch back and forth between tools since each tool has features that the other lacks. Just be sure to save and close the model file with one tool before opening it in the other …AND MAKE BACKUPS! The more that I do this, the more I prefer using Tabular Editor.

Tabular Editor doesn’t have a graphical model designer so I prefer to use Visual Studio to model tables and relationships. Set table and column properties, create calculated columns and measures, manage partitions and other tabular model features in Tabular Editor.

Start in Power BI Desktop and Migrate to an Enterprise Model
From Power BI Desktop, save the file as a .PBIT (template), which can then be opened in Tabular Editor. Once you save the file to the .BIM format, this is a one-way trip since an enterprise model cannot be saved back to a PBIT or PBIX file. Of course, if you start designing the data model in Visual Studio, there is no need to resave the model; you can just use Tabular Editor to make new model enhancements.

Power BI Premium, AAS & SSAS
Power BI Premium capacity is required to deploy a Model.bim file as a dataset in a Power BI workspace.

Power BI Premium capacity is the E-ticket and is the best way to access all Power BI enterprise capabilities. With the XMLA endpoint, it will make more sense for most customers to deploy enterprise data models as datasets to a workspace rather than as Azure Analysis Services (AAS) models.

Future-Proofing Data Models

If your organization chooses not to use Premium, which might be a more economical choice until you have enough licensed users to justify the cost, you can use AAS data models or on-prem SSAS data models to exceed the capacity limits that Power BI datasets have under Pro licensing alone.

If industry certification or compliance restrictions prevent your organization from using Power BI in the cloud, using enterprise models is the norm. Use a local instance of SQL Server Analysis Services Tabular. If you move to the cloud in the future, the transition should be relatively simple.

Keep in mind that Premium capacity, or perhaps a more attractive licensing alternative, may be in your future, so architect solutions in such a way that you can easily make that transition.

Demystifying the Power BI XMLA Endpoint

When the Power BI product team began promoting the new XMLA endpoint connectivity for Power BI, I thought that this could be a game changer for IT-driven, enterprise-class BI solutions. Now that I have used it in a few real solutions and am actively working with clients, I’m sharing my experience about how it works and what you can do with it. The read-only endpoint is generally available for Premium capacities and the read/write endpoint is currently in preview.

Before my long-winded introduction, I’ll get to the point:
Using the XMLA endpoint, the Power BI Premium service now includes the capabilities of SQL Server Analysis Services and Azure Analysis Services combined with the newer data modeling capabilities of Power BI. Data models published to the Power BI service now support version control, scripted builds and team application life cycle management, enterprise IT tooling and scripted object management.

…wait, what? How?

Let’s just start by reviewing some of the challenges that have existed in the Power BI platform prior to the availability of this capability:

  • SSAS/AAS enterprise features
    Buried deep within the Power BI cloud service is the SQL Server Analysis Services (SSAS) Tabular model “Vertipaq” in-memory analytics database engine. The SSAS engine and on-prem product itself has many useful features not exposed in the Power BI implementation of data models and datasets. These “enterprise class” features are numerous, including object-level scripting for deployments, source control, application life cycle, continuous integration and build management, data partitioning, perspectives, translations, currency conversion, KPI definitions, measure groups and calculation groups.
  • SSAS/AAS redundant services & costs
    At one point, Azure Analysis Services was a superset of data modeling features, but now many exclusive features are available in Power BI datasets in a Premium capacity workspace, so this choice isn’t so clear. Power BI and AAS also have separate costs. As the Power BI service continues to evolve, many new and compelling features are available only in Power BI datasets and not in Analysis Services, such as hybrid mixed-mode data models, aggregations, incremental refresh policies, dataflows and AutoML.
  • Source control, Application life cycle, builds & Team data model development
    Power BI Desktop is a convenient tool for creating an entire Power BI solution from soup to nuts, but it isn’t optimized for IT-scale projects. A single PBIX file contains connections, queries, data model objects, measures and report pages. A PBIX file structure cannot be easily parsed, compared, branched or merged by established source control and build management tools. Analysis Services projects, on the other hand, support Microsoft development tools like Visual Studio, Azure DevOps and SQL Server Management Studio. Several mature third-party development tools like Tabular Editor, DAX Studio and the ALM Toolkit enhance or exceed the features of Microsoft’s development suite.

The XMLA endpoint in Power BI Premium bridges this divide by exposing the underlying Analysis Services database instances in the Power BI service. It really is as simple as that. Each workspace is in fact an SSAS instance and each Power BI dataset is an SSAS database. What you can do with traditional SSAS through management and development tools, TMSL or XMLA script, PowerShell or API automation, you can now do with Power BI.

Use a Premium capacity workspace

If you have Premium capacity set up in your tenant, you are good to go. Several of our consulting clients do, but to experiment and test new features, I use my own private tenant. In lieu of paying for Premium capacity, I can set up an embedded capacity in the Azure portal. An A4 embedded capacity is equivalent to a P1 Premium capacity, but it can be paused when I don’t need to use it. The cost is about $8.50 per hour, so MAKE SURE TO PAUSE WHEN DONE.

Here’s the Power BI Embedded capacity I created in my Azure tenant, sized to A4 and currently paused. It takes about a minute to start or pause the embedded service.

After starting the capacity, I jump over to the Power BI Admin Portal and click the capacity name to change settings. Note that because my capacity was added as an embedded service, it shows up in the Power BI Embedded page but the settings are the same for a Premium capacity.

Enable the Read/Write endpoint

By default, the XMLA endpoint will be set to Read Only. Under the Workloads group on the Capacity settings page, switch XMLA endpoint to Read Write and then save the changes.

Now any workspace set to use Premium capacity can be accessed with an XMLA endpoint address. I’ve created a test workspace in my tenant to test migrating an SSAS tabular database to a Power BI dataset. I select the workspace in the Power BI Portal and, in the workspace settings, make sure that Dedicated capacity is switched on. I know that it is because of the diamond icon next to the workspace name. The Workspace Connection address is below. I click the Copy button to copy the address to my clipboard.

Now, I can use that address in any tool that knows how to connect to SSAS tabular in compatibility level 1450 or above. Let’s try to connect using SQL Server Management Studio. I need a newer version of SSMS, 18.4 or higher.

Connect to SQL Server Analysis Services and paste the XMLA endpoint address for the Server name. You need to use Azure Active Directory authentication. If your organization uses MFA, you can use that option but I will choose regular AAD authentication.

…and voilà! Connected.

I can run queries and scripts here in DAX, MDX, XMLA or TMSL. Most, but currently not all, capabilities are supported in the preview. In particular, if you have RLS roles, the members must be dropped and then added back in the Power BI Portal.

So far, I have scripted existing Power BI datasets and migrated them to Analysis Services projects in Visual Studio, and then deployed to a new dataset from Visual Studio. The lesson here is that migration is a one-way street from Desktop to Visual Studio. Whether development starts in Power BI Desktop or Visual Studio, there is no going back to Desktop. Ongoing development must be in Visual Studio.

Definitions: “Report”, “Database” and “Dataset”

In the self-service Power BI world, the term “Report” has been used to mean at least two different things. With the queries and data model managed separately and deployed as a Power BI dataset, the term “Report” in this context means only report pages with visuals (don’t get me started talking about “dashboards”). A data model originating from a Power BI Desktop file is published as a dataset. Now that we see these objects through the lens of Analysis Services, a Power BI dataset is a Database.

In migrated projects, continue to author and manage reports in Desktop connected to the deployed dataset as an Analysis Services connection. You can actually switch the connection between an SSAS instance, an AAS instance or a published dataset using the XMLA endpoint address. I usually use query parameters for the Server and Database to easily change these values.

There’s a lot more that I will need to cover in later posts, but I’ll mention a few things briefly.

Best practices: After working with the XMLA endpoint, the choices now seem pretty clear to me but there was some confusion until I got to that point. Best practices will continue to emerge. In light of this and other recent announcements, I can say that I have a much clearer vision for how to plan and manage solutions (and to delineate between self-service and enterprise BI projects) than I did a year ago. If you have questions, please post them in the comments and I’ll do my best to address them in future posts.

Combining & querying datasets: The ability to use the endpoint to query one data model from another model enables some very compelling composite model scenarios – but planning these solutions is important.
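One simple way to experiment with this from Power Query (as an import, not a true composite model) is to point the Analysis Services connector at the workspace’s XMLA address and run a DAX query against a published dataset. This is only a rough sketch; the workspace name, dataset name, table and measure below are hypothetical placeholders:

let
    // The server address is the Workspace Connection (XMLA endpoint) copied from the workspace settings
    Source = AnalysisServices.Database(
        "powerbi://api.powerbi.com/v1.0/myorg/Sales Analytics",
        "Sales Model",
        [Query = "EVALUATE SUMMARIZECOLUMNS('Date'[Year], ""Total Sales"", [Sales Amount])"]
    )
in
    Source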

Large models & storage: The size and scale limitations are similar to those in AAS and generally only limited by your chosen Premium capacity. Since models are compressed and typically only include column data needed for analytic reporting, it is unusual to see datasets larger than a few gigabytes but Premium will support model sizes up to 400 GB.

By default, datasets deployed using the endpoint are stored in the Azure data centers using single-file storage. This is fine for small models, but for larger partitioned models, using large-file storage will improve development and deployment performance by managing objects in multiple files on the service back end. There will eventually be a UI for this setting, but it is currently available only through the management API or via PowerShell.

Countries with Improved COVID-19 Recovery Rates

Is there any good news in the COVID-19 Coronavirus data? Everyone is familiar with the tragically growing number of cases and fatalities globally and in most countries with large populations. However, several countries have increasing recovery rates. You can click here to view the new Power BI report shown in the image below.

The experts keep talking about looking for “the peak”, when the number of people recovering from infection increases faster than new cases. We’re not there yet, but where is the recovery rate increasing? The “Recovery Rate (3 Day Change)” measure compares the Recovery Rate (i.e. the number of recovered cases divided by the number of confirmed cases) with the rate from three days prior to the most recent report date. This report ranks countries by this measure.

On the left side are the countries ranked highest in improved recovery rates over the past three days (pictured as of April 10). The line chart on the right shows the top 20 countries with their trend of Recovery Rates over the past 30 days.

Among these are countries with small populations or those with a small number of confirmed COVID cases, so even a small number of recovered cases yields a high ratio. To remove some of these outliers, the report page only includes countries with 100 or more cases. Some could have had a high infection rate that has since slowed, or there could be a lag in the available recovery data. This report uses the dataset from the main COVID-19 Daily Updates report but focuses just on recovery rates.

By the time you read this, the top-ranked countries by improved recovery rates may change. Right now, these countries have the highest increase in recovery rates. Why is this and what can we learn from these results?

  • Djibouti
  • Algeria
  • South Africa
  • Guatemala
  • El Salvador
  • Austria
  • Australia
  • Germany
  • Cambodia
  • Azerbaijan
  • Brunei
  • Iraq
  • Venezuela
  • Iceland

I Need Your Questions about Power BI and Paginated Reports

For a new series of posts to include tutorials and “how to” demonstrations for Power BI, SQL Server Reporting Services (SSRS) and Paginated Reports, I need your help to know what questions are most compelling and top of mind.

Please scroll to the Comments section at the bottom of this post and help me create a list with your questions and suggested topics.

What questions do you have and what do you need to learn how to do?

Make suggestions and cast your votes on others’ ideas for video posts, demos and tutorials you would like to see. I will add these to a list and work through as many as I can. The following are some rough topic areas and ideas to get started:

Power BI Solution Questions:

  • Power Query techniques
  • Data modeling techniques
  • DAX calculation techniques
  • Visualization best practices
  • How to design a specific report feature
  • Of the many choices or approaches to address a business need, which is a recommended practice?

Paginated Reports & SSRS Questions:

  • How do Paginated Reports work in the Power BI cloud service?
  • How are Paginated Reports and SSRS licensed from Microsoft?
  • When should I use Power BI vs SSRS or Paginated Reports?
  • Navigating between different report types

New Coronavirus (COVID-19) Daily Updates Report

Updated again on April 7, 2020. See the video tutorial below.

After publishing the original daily COVID-19 cases report on March 14, Johns Hopkins University changed the file format. This required us to publish an updated dataset with a new report, which you can access here.

Since the original solution was posted, I have received a tremendous amount of feedback, suggestions for enhancements and corrections. I introduced the tool in the earlier post. Members of the Microsoft MVP program have been a lot of help and several of my colleagues from Pragmatic Works have jumped in to add their insights and design support. We’re working on a Power BI app template that can be installed into a Power BI tenant. In the interim, you can access the publicly accessible report through this link.

Using the report, we can track the daily progression of confirmed cases, recovered cases and deaths by country. Where available, these metrics are also broken down by state or province and by US county.

The current project and future updates can be accessed using this GitHub repo.

There are now three different ways to access the COVID-19 Daily Updates Power BI dataset & report:

COVID-19 Coronavirus Daily Updates in Power BI

March 24 update – This post will be updated frequently for the next few days. Please watch for updates.

This is a Power BI report (link) I have developed and published with public access to the web, to be shared with anyone who needs access to this information. It contains daily updates from the Centers for Disease Control and Prevention (CDC) using data curated by the Johns Hopkins University Center for Systems Science & Engineering. To the best of our collective ability, the data is accurate, but I cannot make any guarantees. Please validate with other sources before making any decisions with this information.

Additional enhancements and contributions are being made by Microsoft MVPs and community members:

After the initial version, my colleague Robin Abramson spent late evenings and a weekend helping to work through design details. I appreciate members of the Microsoft MVP community, Reza Rad and Miguel Escobar, stepping in to help with query updates to get scheduled data refresh working.

I’m very hopeful that this report will be a valuable resource. It’s been a labor of love and considerably more work than I envisioned, but I will continue to work on enhancements and corrections as I am able – based on feedback. I started working on this project to help a consulting client try to understand how the virus outbreak is affecting their customer order shipment delays and materials supply chain. That grew into an off-the-clock side project, demanding nights and weekends to get this far. Now, I hope we can use this information to proactively respond to this threat.

Please post comments here or contact me through Twitter if you have feedback, comments or questions.

link to published report

The CDC and WHO began collecting COVID-19 case information from various sources on January 22, with the latest count of confirmed cases, recovered cases and deaths recorded by country, state or province. Johns Hopkins University collects this data every day and stores the files in a publicly accessible GitHub repository. On March 1st, they began geocoding the location for each case, where available, with the latitude and longitude. Location information is sparse but available frequently enough to observe trending.

Pete Gil at Pragmatic Works initially discovered this data source from another report published at worldometers.info. He scraped their web page and created an attractive Power BI report with the latest daily numbers. Springboarding from that project, I went back to the source files and created this new data model with daily snapshots and cumulative updates.

Watch for updates (where I’ll explore the design and more details), but this is a quick tour of the initial set of report pages based on my published data model:

The first page provides some background information about data sources, credits and a report page menu:

Use the bookmark buttons to navigate to each page. You can also use the page number navigation buttons below the report.

The three measures displayed at the top of this and other pages show the latest counts, as of the latest date in the selected date range. Use the range slicer to limit the time-series charts and to set the “as of” date for the latest measures (Confirmed, Recovered and Deaths).

Right-click the line/area or stacked column charts to drill-through to details for a specific date.

The Global Cases page displays the aggregate case counts by country and for known locations. You can switch between the three measures using the radio button slicer. This changes every value on this page to use the selected measure.

On every page, you can narrow the view of countries using the Country Region drop-down list slicer. Use this slicer to deselect countries that have a high number of cases so you can view and compare other countries. Hold Ctrl to select and deselect multiple items from the slicer item list.

The Country shape map definition was created by David Eversvelt. I’ve made some modifications to accommodate country names provided by the CDC.

I have created three separate pages with Country/State & Province maps. Only a limited number of shape map files are available in Power BI so I have selected the US, Canada and Australia for now.

Either use drillthrough or navigate to the Detail Matrix page. The matrix shows the progression of the selected measure over time within a region. Expand the geography hierarchy to view details by states or provinces for a country. The date range slicer in the top-right can be used to control the range of dates displayed as columns. Within the scope of the displayed data, the largest values are displayed with graduating shades of red.

To narrow the comparison, use the Country Region slicer to filter by country and change the scope of the conditionally colored cells. This allows you to remove irrelevant regions and focus on those of interest.

The Novel COVID-19 Coronavirus outbreak is a serious matter that is affecting our world in ways that we are only beginning to understand. If we can use this data to better understand what is happening, maybe we can use this information to mitigate the effects of this global event.

What questions do you need to answer and how do you need to use this information?

How can we look at it differently to provide better insight?

How do you need to correlate the state of cases with other data to make decisions and forecast outcomes?

Correlating Corona Virus Outbreaks with Business Data using the ESRI Map in Power BI

At the time of this post, the world is dealing with a serious health and economic crisis. The COVID-19 Corona Virus is impacting the lives of people around the world and, in turn, it is affecting world markets and industries in many different ways. For example, I am working with a consulting client whose material shipping and supply chain are being impacted by the outbreak, and they need to quickly respond by making order changes and logistics choices. Forecasting and planning analysts must make adjustments to help the company prepare for these impactful changes.

This demonstration shows you how to create a multi-layer map to correlate current outbreak case locations with your own data, using Power BI and the ESRI map visual. I’m using sample data for demonstration, but this is the same technique I am using for our client. In the real data set, the correlation is clear where shipping orders are being delayed and cancelled in the areas most affected. For more, visit my blog at SqlServerBiBlog.com.

The technique used in this map report is relatively easy to implement because both data sources are separate feeds to the map service. There are different ways to correlate map data from two different sources. In our solution, we are also integrating the CDC data into the data model, which will allow us to perform comparison calculations. Using AI and machine learning, we may be able to perform predictions and associations.

Power BI Query Performance & Query Diagnostics

This post demonstrates how the order of steps added to a query can make a big performance difference and drastically affect the number of steps generated by the designer. I’ll demonstrate how to use the new Query Diagnostics tools to compare and understand query performance.

The Power Query Editor for Power BI simplifies data transformation processing by generating query steps for each action you perform in the query designer. This whiteboard diagram shows the high-level flow of information through a Power BI solution. Every query has a source (“SRC” in the diagram) followed by a connection. The query consists of a series of transformations (“XForm”) prior to populating a table in the data model.

These steps are defined in “M” code which is executed when the data model is processed. In simple projects, all the query steps are automatically generated. The order in which you add these steps makes a difference. Not only does the order in which you add steps help organize and manage a query, it can have a significant impact on performance and on the computer resources needed for the query to run. A little planning and iterative clean-up as you work through the design process can make a big difference.

The two queries shown here have exactly the same outcome and they were both created just by choosing transformations from the query designer menus. The only difference is the order in which I chose the options.

Introducing Query Diagnostics

To understand how query steps are being processed and to compare two test queries, I use the new Query Diagnostics features on the Tools ribbon. In this simple test, this is really easy.

I select a query in the designer, start the diagnostics, perform a refresh and then stop the diagnostics. This generates two new queries with the diagnostics results.

I then choose the other query and repeat the same steps to get diagnostics for that query.

There is a boatload of useful information in the diagnostic results query but it’s way more than we need.

The most important information for this test is the Exclusive Duration column. For this test, all I need to do is summarize this column. I did the same thing with both diagnostic queries and then compared the two results. Appending the two summarized diagnostic query results clearly shows the difference in performance:
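If you prefer to compute the summary in M rather than through the designer menus, a minimal sketch looks like this; the diagnostics query name is a placeholder for whatever name Query Diagnostics generated in your file:

let
    // Reference the detailed diagnostics query generated by Query Diagnostics
    Source = #"Query1 Diagnostics Detailed",
    // Extract the Exclusive Duration column as a list and sum it
    TotalExclusiveDuration = List.Sum(Source[Exclusive Duration])
in
    TotalExclusiveDuration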

Video Demonstration

This video demonstration is an exaggerated yet effective example of working through the process of importing a simple Excel worksheet and then transforming a series of columns. In the first example, I rename and change the data type of each column, one at a time. In the second example, I consolidate the steps: renaming all of the columns in one step and then changing the column data types in another. How does this simple change in approach affect the generated query and execution performance?
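For reference, the consolidated version of the second example looks roughly like this; the workbook path, sheet name and column names are made-up placeholders:

let
    // Hypothetical workbook path and sheet name
    Source = Excel.Workbook(File.Contents("C:\Data\SalesExport.xlsx"), true),
    SalesSheet = Source{[Item = "Sales", Kind = "Sheet"]}[Data],
    // One rename step for all columns instead of one step per column
    RenamedColumns = Table.RenameColumns(
        SalesSheet,
        {{"OrderDt", "Order Date"}, {"SalesAmt", "Sales Amount"}, {"Qty", "Quantity"}}
    ),
    // One type-change step for all columns
    ChangedTypes = Table.TransformColumnTypes(
        RenamedColumns,
        {{"Order Date", type date}, {"Sales Amount", type number}, {"Quantity", Int64.Type}}
    )
in
    ChangedTypes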

Getting Started with the New Power BI Activity Log API

When the new Power BI service activity logging API was announced over the holidays, I was intrigued and anxious to start working with it. I’ve had some experience with report usage monitoring using the existing Office logs and usage metric reports, which do provide some useful information but can be a chore to use. Activity monitoring and troubleshooting with the new logging API is focused entirely on Power BI tenant events like dashboard, interactive report and paginated report views, deployments, errors and data refresh events. This should be easier than before, enabling admins to be more proactive by tracking usage patterns. In a short series of blog posts, I’ll demonstrate how to build a complete activity logging and reporting solution for your entire Power BI tenant. In this first post of the series, we’ll just get started with the basics by capturing a few log events for a brief window of activity and surfacing them in a simple report.

Before we get started,

a brief history lesson is in order:


When the Power BI cloud-based service was initially offered back in 2013 as a feature extension to Office 365, it used SharePoint Online as the storage architecture. Some additional logging events were added to the already numerous Office 365 event catalog. It has always been cumbersome to find relevant reporting information amid all of the other noise in these logs. Components of the Power BI back-end services have since been migrated into a more specialized service infrastructure, but the activity logging remained in Office 365 until December 2019. The Office logs required special privileges within the Office tenant and produced volumes of event data related only to folders, files and Office documents.

The new Power BI Activity Log API is specially suited and optimized just for Power BI. By contrast, it will be much easier to identify and track relevant service and user activity for workspaces, apps, dashboards, interactive Power BI reports and paginated reports in the Power BI service.

I envision that my production-scale logging solution will use an orchestration tool like Azure Data Factory to iterate over historical activity logs, store log files to Azure Data Lake Storage and then incrementally update a Power BI data model for reporting and analysis. This first example will use PowerShell script manually executed from my desktop.

PowerShell Me, Baby

The new Get-PowerBIActivityEvent cmdlet is added to the Microsoft Power BI Management library. Install the latest version to gain access to the activity logs.

In my project, the first step is to open and run the PowerShell ISE as a local administrator. To install the latest Power BI Management library locally, I execute this code:

Install-Module -Name MicrosoftPowerBIMgmt.Admin

I need to sign-in to my Power BI tenant with a service or user account that has tenant admin privileges. This code opens a standard login dialog and prompts for a user name and password, populating a Credential type object variable used to open a connection to the Power BI service:

$Cred = Get-Credential

Connect-PowerBIServiceAccount -Credential $Cred

In the Power BI product team’s Developer blog, Senior Program Manager Kay Unkroth explains that the Get-PowerBIActivityEvent cmdlet can be called with date/time parameters to include only one day at a time. This line requests all activity on January 5, 2020, caching the activity log information as a JSON structure:

$activities = Get-PowerBIActivityEvent -StartDateTime '2020-01-05T00:00:00.000' -EndDateTime '2020-01-05T23:59:59.999'

Finally, I write the log contents to a local file using this code:

$activities | Out-File -FilePath C:\Temp\RSActivityLog.json

Importing this file into Power BI Desktop produces the following simple table:

A couple of important things to point out

The API is optimized to handle large numbers of events. As such, it is limited to returning records for a time range of up to one full day using the StartDateTime and EndDateTime parameters. The web service returns a continuation token to let you know when there is more data beyond a fixed frame size, which typically returns about 5,000 to 10,000 records.

Incidentally, I’ve played with a few different file formats. JSON is by far the most flexible format, but you may not get all the key/values you want just by consuming the results right out of the box and without parsing all the nested levels. In Kay’s article, he uses the ConvertTo-Json cmdlet to flatten the output into a more conventional colon-delimited flat log file with named key/value pairs. Using this method, I was able to get field information that apparently slipped through the cracks in the JSON document, although I had to massage the output a bit and then transform the rows into columns using some fancy pivot dancing in Power Query.
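For what it’s worth, if the file contains a JSON array of events (for example, if the PowerShell step pipes the results through ConvertTo-Json before Out-File), a Power Query pattern along these lines will shred it into a table. The path and the expanded field names below are just examples:

let
    // Hypothetical path to a file containing a JSON array of activity events
    RawJson = Json.Document(File.Contents("C:\Temp\RSActivityLog.json")),
    // Each list item is one activity event record
    AsTable = Table.FromList(RawJson, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    // Expand a few example fields; substitute whatever fields your events contain
    Expanded = Table.ExpandRecordColumn(
        AsTable, "Column1",
        {"CreationTime", "UserId", "Activity", "ItemName", "WorkSpaceName"}
    )
in
    Expanded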

This simple report

is a first effort showing records for a short time frame. I’ve produced a single table with some recognizable grouping and slicing attributes, but we can do considerably more. Using these fields, I can analyze activities for only dashboard, Power BI report or paginated report views. We can filter by user, object type, operation type, the web browser or devices used to view content, type of workload or the containing workspace.

In a fully-fleshed-out data model, some of the attributes might exist as separate dimension/lookup tables. But, this is enough for now.

Post Conclusion

Please share your questions and your own experience with the activity logging API, and watch for subsequent posts about tenant monitoring and activity management solutions.

Curator’s Corner: December 2019

As I read the many posts from those in the community who I follow, I am reminded that the community brain trust is much greater than any individual. As a writer and blogger, I’m occasionally compelled to express an original thought or opinion that I think is uniquely my own. However, we work in a world where everything comes from somewhere and there are many contributors who I trust and rely upon for advice and cutting-edge information. This “corner” of my blog is to highlight these community contributions that I find informative.

James Serra, Microsoft Solution Architect and former Data Platform MVP, continues a deep exposé of Azure Synapse Analytics with the sixth post in the series. This new Azure service, which headlined at both Ignite and PASS Summit and is currently in preview from Microsoft, is the evolution of the modern data warehouse. Azure Synapse Analytics is an orchestration of services including Azure SQL Data Warehouse, Databricks and Azure Data Lake Storage Gen2. It will be an important consideration for serious cloud-based BI, analytics and data warehouse solutions at enterprise scale.

  • Azure Synapse Analytics & Power BI performance
  • Azure Synapse Analytics new features
  • Azure SQL Data Warehouse Gen2 announced
  • Azure SQL Database vs SQL Data Warehouse
  • What is Microsoft Azure Stream Analytics?
  • Azure Synapse Analytics & Power BI concurrency

Marco Russo, a name synonymous with DAX and BI expertise, captures what happened in the DAX world in 2019 in an aptly-named blog post: “What has happened in the DAX world in 2019” :-). He also writes “I’ve seen the future, and it’s bright – but unfortunately, it’s under NDA!” and actually goes on to describe some of the announcements expected in the next year and major conference events.

David Eversveld, Data Platform MVP, writes about improvements to the Power BI theme capabilities. In addition to the new design ribbon in Power BI Desktop, themes can be exported from the designer. Adding to his collection of reference material that I have found valuable in my Power BI toolbelt, David posted this data color reference to assist with color selection for reports.

The new Power BI Activity Log was announced this month. This will make it easier to capture and monitor user and report activity. It also simplifies Power BI tenant administration by isolating report activity from Office 365 events and other log events. Power BI started out as an extension of Office 365 and SharePoint Online services but not all organizations use or manage Office 365 and Power BI under the same administration role. Microsoft continues to deliver on the promise to provide comprehensive automation APIs and tooling for administration.

The consistent contributions of Guy In A Cube’s Adam Saxton and Patrick LeBlanc are too numerous to mention. Notably, they were awarded the “Most Helpful Data Video Channel” by Data Literacy Awards. Data Literacy LLC is a Seattle-based training and education company founded by Ben Jones.

A Conversation with Ásgeir Gunnarsson about Power BI in the Enterprise

As I continue to explore and develop best practices for managing serious business-scale Power BI solutions, I’m having conversations with recognized community leaders. Last month I chatted with Ásgeir Gunnarsson on the SQL Train ride from Portland to PASS Summit in Seattle. Ásgeir is a data platform MVP and seasoned Business Intelligence expert from Reykjavik, Iceland who works as Chief Consultant for Datheos, a Microsoft-focused BI and Analytics consultancy in Copenhagen. He leads the Icelandic Power BI User Group and PASS Chapter.

He gave an inspiring presentation at Oregon SQL Saturday about Enterprise Power BI Development. You can view his presentation deck from the Schedule page here.

Ásgeir talked primarily about the development life cycle for projects centered around Power BI, data and object governance. As I’ve mentioned in my earlier posts on this topic, the development experience for BI projects in general is different from application development and database projects, and you cannot use the same management tools – at least not in the same way. He promoted using OneDrive for Business to manage version control.

He shared several excellent resources, many of which I either use or have evaluated, to help manage Power BI projects. The ALM Toolkit is a useful tool for comparing objects in great detail between two PBIX files. Ásgeir also showed some efforts from community contributors to automate change-tracking, file-level source control (which really made the point that it’s a difficult thing to do with Power BI). We know that Microsoft is working on an integrated release management solution for the Power BI service, which may amend or replace the need for existing tools.

Guidance for publishing and managing the life cycle of Power BI solutions included deployment automation using OneDrive and PowerShell. Using multiple workspaces for development, testing and pre-production, deployment can be managed with the Power BI REST APIs and PowerShell script, which is detailed in this tutorial. These PowerShell examples demonstrate how to clone workspace content, publish files and rebind data sources.

Regarding governance and security, he made reference to the extensive Microsoft whitepaper: Planning a Power BI Enterprise Deployment. He steps through diagrams that help simplify each of the important processes and tasks for developing, deploying and managing Power BI solutions.

If you need to manage Power BI solutions, I encourage you to review his presentation and you can connect with Ásgeir on LinkedIn and Twitter.

Setting up Power BI project Team Collaboration & Version Control

Team file sharing and version control may be managed through Power BI workspace integration with OneDrive and SharePoint team sites. In this post, I will show you how to get started and setup a new project.

In this demonstration, I’m going to show you how to set up a workspace and a team collaboration site so that you can share Power BI files among members of your team and manage version control. In another post, we’ll discuss the nuances of version control, build processes and DevOps, comparing a Business Intelligence project with an application development project – and how those are different experiences. An important lesson is to learn how to work with this tool the way that it was designed to be used, rather than to try to force Power BI to work with incompatible build management and version control tools. Power BI, along with companion technologies like Analysis Services and Reporting Services, can participate in an application life cycle management (ALM) and continuous integration (CI) framework, but sometimes not in the way we might imagine. This post addresses a piece of that larger puzzle.

Get Started in the Office Portal

If your organization uses Office 365, you have all the tools that you need. The first thing to do is go to the Office Portal at office.com. Click on the App menu and choose “SharePoint”. Here, we’re going to create a new site. I’m going to choose the Team Site template and this is going to be for my Contoso Sales project. We’ll give the site the name “Contoso Sales Project” and click “Next”. The only other thing I really need to do here is add some users as owners to this site. You can add an Office group or you can add users later on. I’ll go ahead and add a couple of users and finish, which creates the site in SharePoint Online. Here, you see my new site. What happens is that a new group is created for this site in Azure Active Directory and we don’t need to do anything extra. You’re going to see that here in the Power BI portal.

Create a New Workspace and Assign a OneDrive Group Alias

I’m going to create a new workspace for development. Depending on the formality of your project, you could have a DEV, a TEST and a Production workspace. You can create one or perhaps two workspaces for a project. I’ll postfix the name to indicate that this is going to be my DEV workspace. In the Workspace Settings, I’m going to choose the Contoso Sales Project group that was automatically created for my site as my workspace OneDrive group. That gives every member of that group access to this shared space and gives me the ability to sync files with this workspace.

Add a Folder and Setup Sync

Let’s go back to SharePoint Online and view the team site. Since the site was just set up, you’ll see that it shows up at the top of the most recent list. The next thing that I’m going to do is add a folder that I can sync with my desktop so I can share files with other members of my team. We’ll go to the Documents library within the new site and here I’m going to add a new folder. I’ll click New, then Folder, and we’ll give the folder a name; this is going to be my Contoso Sales Project Files folder. We’ll create that folder and then the next thing I’m going to do is configure that folder to sync with my desktop. Each member of your team can do the same thing, which will enable them to put copies of the files within this synced folder onto their desktops. You always want to use Power BI Desktop to work in a local file folder. So, there’s my synced folder on my desktop. It’s probably a good idea to add that to my Windows Quick Launch bar or add it to my Favorites so that I can get to it quickly.

I’ll put a PBIX file into this folder and, in the future, this is where I’ll do all of my Power BI development work. I have a file from an existing project that I’ll use for simplicity’s sake. I’m just going to add a .PBIX file and designate it as my dataset file. In another post, I’ll talk about separating datasets from reports – which is something that I routinely do, especially in larger, more formal projects. I’ll go ahead and copy and paste a .PBIX file and then rename it; eventually I’ll have one dataset .PBIX file that will contain all of my data, and that’s what this file is for. It currently contains a report with pages in it. When I make a change to that file, it immediately syncs – which you can see in the OneDrive app. If you click on the OneDrive icon, you’ll see that it syncs up. This is a fairly small file so it goes pretty quickly.

Publish a File from the Synchronized Folder

Back to the workspace: I’m on the Datasets page and I’m going to click “Get Data”, which takes us to a page where we can see the Files tile. I’m going to click “Get” on that tile and that’s going to show me that there’s a new OneDrive tile with the name of the new team site. I’ll click that and it takes me to the team site where I’ll see the folder that I had created. I click that and now I see the .PBIX file.

When I choose “Connect”, that actually imports the file as a new dataset in the service. Using this method to publish the dataset will keep that file synced up and any changes I make will automatically be deployed. There is no more need to deploy updated dataset or report files from Power BI Desktop. This file will remain synced up all of the time. I can set up a gateway, schedule refresh and all of the other things that I would normally do with a Power BI file as if I had deployed it from Desktop. Using this method, it’s going to remain synced up through this SharePoint Online folder, which is managed through the OneDrive for Business application.

Testing and Demonstrating Synchronization

I’m going to open this file in Power BI Desktop and just make a simple change and save the file. We should see that it automatically gets synched-up. I’ll do that a couple of times and then we can go take a look at the automatic version control that takes place through SharePoint Online. OneDrive For Business is actually SharePoint Online under the hood. When you create a team site, it essentially creates a OneDrive For Business folder. That’s how all of that magic happens.

Incidentally, while synching the file – as you can see, I just got an alert and an invitation from SharePoint that welcomes me to the site. Each member of your group will automatically get an invitation like this with a link to the folder that I’ve shared with them. I’ve changed the folder view to large icons and you can see the little green check box icon that shows that that file is synced-up with my desktop.

Make sure to always open the .PBIX file from the synced folder on your computer rather than from the SharePoint site through your browser. If you make a change to the file in Power BI Desktop, you’ll see the changes in seconds to minutes, depending on the file size. It doesn’t really matter what kind of change I make… I’ll just go to the data model diagram view and make a change… I moved a table around (imagine that I’m adding a measure, adding a relationship, or making any other change to the model)… this flags the file and causes me to be prompted to save it. Now we go back to File Explorer and notice the little sync icon, which I’m going to see for a few seconds in this case. Because it’s a small file, it’s now synced up again.

Edit Local Files in the Synchronized Folder

After making changes, always close Power BI Desktop so the file isn’t locked by the application. That releases the file so that it can be locked momentarily by the OneDrive process and synched-up with the online service. You can watch the progress here in the OneDrive app. If this were a large file, you’d see a little progress bar and it might take a couple of minutes to synchronize. All members of your team will see the new file here in the file system afterward.

Using Version Control and Team Collaboration Features

From File Explorer, choose the “View Online” menu option to go to the SharePoint library in your browser. You can use the ellipsis menu next to the file name to check-out the file. This prevents changes by other team members while you have it locked. You can use the “Version History” menu option like I’m doing here. You can see that every single change generates a new version of the file. If I have admin access – as I do here – I could choose a version and restore or revert back to that version. You have complete control here so this gives us a really good version control story for Power BI and it gives us the ability to collaborate as members of a team.

Next Level Power BI – M Language Basics

This is a series of posts and exercises from a recent full-day workshop called Next Level Power BI. Check back for additional lessons and freely post your questions and feedback in the comments section after this post.

Power Query is a modern data acquisition and transformation technology and it is the “Get Data” component of Power BI and many other Microsoft tools. The query script language of Power Query is known as “M”, or the Microsoft Data Mashup language. The menu options and selections in the Power Query Editor interface build this “M” script for you, but to use Power Query at an expert level, you must understand fundamental concepts of the language. This post will step you through some essential mechanics of the language to build the necessary fundamental skills.

What you learn here can be applied to the Power Query implementation of other Microsoft products, which currently include:

  • Power BI Desktop Query Editor
  • Power BI Data Flows
  • Excel “Get & Transform” on the Data ribbon
  • SQL Server Analysis Services (SSAS) Tabular projects in SSDT/Visual Studio, or Azure Analysis Services.
  • Azure Data Factory: Power Query task

By the time you read this, there are likely to be more implementations of Power Query.

Basic Syntax

A quick review of the characters that are used to define or encapsulate groups of objects will help to better understand and recognize what you are seeing in the following code examples.

Parentheses, square brackets and set braces all have different meanings. For reference, here they are in the same order:

( ) – Parentheses are used to pass parameters to a function and to control mathematical order of operations.
[ ] – Square brackets encapsulate a record (a set of field name and value pairs).
{ } – Set braces encapsulate a list of values.

Most functions in M are part of an object class (such as Date, Table or Text) and the syntax is something like this:

  • ObjectClass.Function( )
  • ObjectClass.Function( Parameter )
  • ObjectClass.Function( Parameter1, Parameter2 )

Examples:

  • Date.AddYears ( <date value>, <number of years> ) – adds the specified number of years to a date
  • Table.FromList ( <list object> ) – converts a list into a table
  • Text.End ( <text string>, 2 ) – returns the last two characters of the string

A few older functions that were part of the initial release of Power Query are not prefixed with an object class name. For example, the #date( ) function takes three parameters: the year number, month number and day number.
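To tie these pieces together, here is a minimal hand-written sketch; the step names and sample values are my own and purely illustrative:

//-------------------------------------------------------
// Sketch: object-class functions vs. the older #date( ) function
//-------------------------------------------------------
let
    // #date(year, month, day) builds a date value without an object class prefix
    SomeDate = #date(2021, 10, 23),
    // Date.AddYears is a typical ObjectClass.Function( Parameter1, Parameter2 ) call
    NextYear = Date.AddYears(SomeDate, 1),
    // Text.End returns the last n characters of a text value ("ry" in this case)
    LastTwo = Text.End("Power Query", 2)
in
    [AddedYear = NextYear, LastTwoChars = LastTwo]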

Any word may be used as a variable name; query step names like Source are the most common example. Variable names are often multiple words with no spaces, such as ServerName (a name that does contain spaces must be written with the quoted syntax, such as #"Server Name"). You are strongly encouraged to rename steps with short but meaningful phrases so that queries are self-documenting and easy to understand.

Are you with me so far?

The Mighty #shared Variable

This little variable is literally the key to the entire kingdom. It returns every Power Query object defined in your file, along with every available M object class and function in the entire language, complete with documentation and code examples. It is the best kept secret in Power Query.

Add a new blank query, type the following into the formula bar, and watch while magic things happen:

= #shared

The result is returned as a record of objects and functions. Convert that record to a table and you can browse the definitions of several hundred objects and functions. The M language contains nearly 900 functions, and more are frequently added in new product versions.

Click the cell next to the link for any function to peek at the definition or click the link to explore related details.
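If you prefer to do the conversion in code rather than through the ribbon, a small sketch along these lines should work (the step names are mine, and the final filter is optional):

//-------------------------------------------------------
// Sketch: turn the #shared record into a browsable table
//-------------------------------------------------------
let
    // #shared is an intrinsic record of every query in this file plus every library function
    Source = #shared,
    // Record.ToTable converts the record into a table with Name and Value columns
    SharedTable = Record.ToTable(Source),
    // Optional: keep only entries that look like library functions (names containing a dot)
    LibraryOnly = Table.SelectRows(SharedTable, each Text.Contains([Name], "."))
in
    LibraryOnly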

Getting Started

The following examples are hand-written queries entered into the Advanced Editor in Power Query. None of these queries have an actual data source. Values are generated within the code itself but all of these examples can be applied to real queries based on any data source.

Each of the following queries adds a little more to the previous query to explore progressively more advanced objects and code techniques in the M language.

If you create a new blank query in Power Query, it will generate code like this:

let
    Source = ""
in
    Source

This is the fundamental pattern to get you started.

Values

Let’s modify the script to give the Source variable a text value of “This is a value” (creative, I know).

Use the Advanced Editor to view the generated M script.

Here’s just the script outside of the Advanced Editor:

//-------------------------------------------------------
//#1: Object Types: Value
//-------------------------------------------------------
let
    Source = "This is a value"
in
    Source

Use the Done button to close the Advanced Editor to see the result.

Unless a variable is explicitly data-typed, its object and data types are assigned automatically, as you will see in the following examples.

The Source variable represents a value type object that is implicitly typed as text. You can see the resulting value in the query designer:
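If you want to confirm how a value was typed, a small sketch using the Value.Type function (the SourceType step name is my own) shows the implicit typing at work:

//-------------------------------------------------------
// Sketch: inspecting the implicit type of a value
//-------------------------------------------------------
let
    Source = "This is a value",
    // Value.Type returns the type of the value; for this literal it is text
    SourceType = Value.Type(Source)
in
    SourceType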

Records

In M, a record is encapsulated in square brackets. This script generates one record with two fields.

//-------------------------------------------------------
//#2: Object Types: Record
//-------------------------------------------------------
let
    Source = [Record1 = "Value 1", Record2 = "Value 2"]
in
    Source

The result is a single record with two fields, displayed as two rows in the record viewer:

Lists

The List object in M is denoted using Set braces, sometimes called “squiggly brackets”. The following script creates a list of four text-type values:

//-------------------------------------------------------
//#3: Object Types: List of Values
//-------------------------------------------------------
let
    Source =
        { "Value 1", "Value 2", "Value 3", "Value 4" }
in
    Source

The list is shown below. Note the column header in the preview grid reminds us that this is a List object. Also note that the designer shows the List Tools ribbon with options to convert the List to a Table, because that’s what we normally do in Power Query to be able to consume List data.

Now, we can start to combine these concepts and techniques. The following query script creates a list of two records, each containing two fields (which will become columns) with corresponding values:

//-------------------------------------------------------
//#4: Object Types: List of Records
//-------------------------------------------------------
let
    Source =
        {
            [Column1 = "Value 1", Column2 = "Value 2"],
            [Column1 = "Value 3", Column2 = "Value 4"]
        }
in
    Source

The designer shows the two records as links, because records are not directly consumable in the preview grid. To use them, you can either convert the list to a table or expand one record by clicking its link to view the values.

To peek inside a record without adding a new query step, you can click in a cell but not on the link and the value will be displayed in the preview area at the bottom of the screen.

The first time you see a link presented in this way, you are likely to click on it to see what happens. No damage done, but this will generate a step that produces a single record and you will lose sight of the entire list. To go back, just delete the generated step in the Applied Steps list.

Tables

If you use the Convert to Table button on the toolbar, that will add the steps you see in this query.

Table.FromList is a good example of the many conversion functions in M. As you look through the functions in the different object classes, you will notice several “From” and “To” functions that convert an object from one type to another. They are easy to use, and their purpose is easy to identify: the FromList function in the Table class converts a List object to a Table. Another example, ExpandRecordColumn in the Table class, expands a column of record values into separate columns.

//-------------------------------------------------------
//#5: Object Types: Table
//-------------------------------------------------------
let
    Source =
        Table.FromList(
            {
                [Column1 = "Value 1", Column2 = "Value 2"],
                [Column1 = "Value 3", Column2 = "Value 4"]
            },
            Splitter.SplitByNothing(), null, null, ExtraValues.Error
        ),
    #"Expanded Column1" = Table.ExpandRecordColumn(Source, "Column1", {"Column1", "Column2"}, {"Column1", "Column2"})
in
    #"Expanded Column1"

These steps produce the results you see here. We began with a list of two records, each having two fields, and after the conversion steps we have a Table object with two rows and two columns, complete with values and headers.

Transforming Each Row

When you use the Transform ribbon or menu options to perform almost any transformation on a column, you are in essence saying “I want to perform this action on the specified column in each row of the table.” The key word in this statement is “each”, and this is an important distinction in the following M code, which was generated by extracting the last character of the column named Column2.

An example of this generated query is shown in Query #6 below.

//-------------------------------------------------------
//#6: Object Types: Transform each row
//-------------------------------------------------------
let
    Source =
        Table.FromList(
            {
                [Column1 = "Value 1", Column2 = "Value 2"],
                [Column1 = "Value 3", Column2 = "Value 4"]
            },
            Splitter.SplitByNothing(), null, null, ExtraValues.Error
        ),
    #"Expanded Column1" = Table.ExpandRecordColumn(Source, "Column1", {"Column1", "Column2"}, {"Column1", "Column2"}),
    #"Extracted Last Characters" = Table.AddColumn(#"Expanded Column1", "Right Char", each Text.End([Column2], 1), type text)
in
    #"Extracted Last Characters"

Let’s isolate just that step and examine the M code in the last line of the query:

#"Extracted Last Characters" = Table.AddColumn(#"Expanded Column1", "Right Char", each Text.End([Column2], 1), type text)

The #"Extracted Last Characters" variable is assigned the output of the AddColumn function. We know that the returned object is a table because Table.AddColumn returns a Table type object. The AddColumn function takes three required parameters, plus an optional fourth for the new column’s data type (type text in this example):

  • A table type object (usually the name of the previous step variable)
  • The name of the new column, as text
  • An expression that returns a value for the new column

The expression argument in this case is the output of the Text.End function, so you might guess that the new column will hold a text value. The End function, similar to the RIGHT function in Visual Basic, returns a specified number of characters from the end of a text string. But since Column2 exists in every record of the table returned by the #"Expanded Column1" variable, how do we specify that we want to do this for each record, one at a time? …you got it. We use “each”.

Note the Right Char column in the following results; it contains the right-most character from Column2:

Transforming Columns In-place

Rather than creating a new column containing a value extracted from a previously created column, you can also modify an existing column. This is done using the TransformColumns function of the Table class. Again, we’re looking at the last step of the following query that was generated by using the Extract menu on the Transform ribbon or menu:

//-------------------------------------------------------
//#7: Object Types: Transform each row col in place
//-------------------------------------------------------
let
    Source =
        Table.FromList(
            {
                [Column1 = "Value 1", Column2 = "Value 2"],
                [Column1 = "Value 3", Column2 = "Value 4"]
            },
            Splitter.SplitByNothing(), null, null, ExtraValues.Error
        ),
    #"Expanded Column1" = Table.ExpandRecordColumn(Source, "Column1", {"Column1", "Column2"}, {"Column1", "Column2"}),
    #"Extracted Last Characters" = Table.TransformColumns(#"Expanded Column1", {{"Column2", each Text.End(_, 1), type text}})
in
    #"Extracted Last Characters"

Here is the last step isolated:

#"Extracted Last Characters" = Table.TransformColumns(#"Expanded Column1", {{"Column2", each Text.End(_, 1), type text}})

Instead of the column name being passed to the Text.End function, we see an underscore. This is just a shorthand reference that means “use the column currently being transformed”.
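To make the shorthand concrete, here is a small hand-written sketch, with my own step names, showing that each and the underscore expand to an ordinary one-parameter function:

//-------------------------------------------------------
// Sketch: "each" and "_" expand to a one-parameter function
//-------------------------------------------------------
let
    // The shorthand form, exactly as the designer generates it
    Shorthand = each Text.End(_, 1),
    // The equivalent longhand form, written as an explicit function
    Longhand = (value) => Text.End(value, 1),
    // Both return "2" when applied to the same text
    Result = Longhand("Value 2")
in
    Result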

Creating a Table from Records

Now, back to completely hand-written code. This one’s easy: a table can be built directly from a list of records. By default, however, new columns are data-typed as “Any”, which is essentially an unclassified data type, and that is not typically a best practice. Columns should be strongly typed so that grouping, aggregation and DAX functions work correctly in the data model.

In the following query, we apply the TransformColumnTypes function from the Table class. The function requires a table and at least one column name and data type reference. You can pass any number of column/type pairs to this function, each pair encapsulated in set braces, with another set of braces around the whole list.
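As a sketch of the idea, borrowing the product records from the final example below, the query could look something like this (the column types chosen are my own illustration):

//-------------------------------------------------------
// Sketch: typing columns with Table.TransformColumnTypes
//-------------------------------------------------------
let
    Source = Table.FromRecords(
        {
            [Product ID = 123, Product Name = "Oranges"],
            [Product ID = 124, Product Name = "Lemons"],
            [Product ID = 125, Product Name = "Apples"]
        }
    ),
    // Each inner pair of braces holds a column name and its type;
    // the outer braces wrap the whole list of column/type pairs
    Typed = Table.TransformColumnTypes(Source, {{"Product ID", Int64.Type}, {"Product Name", type text}})
in
    Typed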

Here, you can see the results: one table with headers, two columns and three records. This all looks good.

Creating a Table from a Set of Records

Now, we’re going to start putting things together by using object type variables. I think you’ll agree that the code is much easier to read.
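As a sketch of that approach, matching the final example below minus the last extraction step, the hand-written query might look like this:

//-------------------------------------------------------
// Sketch: building a table from named record variables
//-------------------------------------------------------
let
    Rec1 = [Product ID = 123, Product Name = "Oranges"],
    Rec2 = [Product ID = 124, Product Name = "Lemons"],
    Rec3 = [Product ID = 125, Product Name = "Apples"],

    // Collect the record variables into a list, then convert the list to a table
    MySet = { Rec1, Rec2, Rec3 },
    MyTable = Table.FromRecords( MySet )
in
    MyTable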

And we see similar results as before but with a much cleaner approach and readable code.

Extracting a Record from a Table

As a final assignment, I need to get just one record, so I essentially need to unwind the objects that were used to assemble this table. I want lemons, so I need to figure out the ordinal record number within the table’s record collection.

Since there are three records, an ordinal (index) number is automatically assigned to each. If you don’t already know whether the first record is numbered zero or one, sometimes you just have to guess. It turns out that all of these collections are zero-based, so to identify the second record (for Lemons) in the table, the index is 1.

Here’s the syntax:

let
    Rec1 = [Product ID = 123, Product Name = "Oranges"],
    Rec2 = [Product ID = 124, Product Name = "Lemons"],
    Rec3 = [Product ID = 125, Product Name = "Apples"],

    MySet = { Rec1, Rec2, Rec3 },

    MyTable = Table.FromRecords( MySet ),
    MyLemons = MyTable{1}
in
    MyLemons

So, what’s all the fuss… what does this actually do?

I’m glad you asked. Let’s take a look.

Isn’t she a beauty?

It’s the best looking lemon I’ve seen all day.

Now you’ve seen the mechanics of how M language functions can be used to perform simple value manipulations and transformations. More importantly, you have learned about the relationships between important objects: Values, Records, Lists of Values and Lists of Records, oh – and let’s not forget – Tables. That’s the one that actually does the work in a data model.

I hope this has helped you broaden your horizons with Power Query and the M language. Please comment with your feedback.

Chat with Donald Farmer at PASS Summit 2019

Enabling the modern data analyst by becoming shopkeepers rather than gatekeepers of information

I appreciated the opportunity to sit down and talk with one of my heroes in the industry. Donald Farmer has been a recognized leader in the Business Intelligence and Analytics community for the past two decades, and his thought leadership is even more relevant today. He played a key role in shaping the Microsoft BI toolset as a Principal Program Manager beginning in 2001, when the products we use today were in their infancy. He has also worked outside of Microsoft circles, which gives him a broad and unique industry perspective. As the Vice President of Innovation and Design at Qlik Technologies, he brought Qlik Sense to market. He remains a friend and advocate for the community. Donald holds multiple patents and has published multiple books. Today he is doing his own thing as the Principal of TreeHive Strategy.

The chat runs 9:41 and is worth watching every second. I have added captions to highlight a few key points, which are also listed below. Please enjoy.

Key Points

Route Knowledge is a view of information at the micro level, like turn-by-turn directions. This might be all you need when you only have to answer a single question without a broad understanding of the entire landscape.
By contrast, Survey Knowledge is a view of information in the context of the bigger picture. Someone with Survey Knowledge knows “how the whole thing hangs together“.

Serving the needs of the modern Business Data Analyst requires a change in attitude. We should become provisioners of information rather than protectors of information. In other words, become shopkeepers rather than gatekeepers.

Actionable data is not always detail data; consuming a summary can be not only easier but more useful than getting lost in the details or the precision of a number. Drilling to detail in context is a natural part of the exploration process. He described a pattern in the process of providing actionable business information, which includes these stages:

  1. Orientation
  2. Glimpsing
  3. Examination
  4. Acquisition
  5. Action

I appreciate Donald taking the time both to speak at PASS Summit and to chat with me for this blog series. Please watch my blog for additional interviews and insights from PASS Summit 2019.

Next Level Power BI – SVG Databars

Power BI table and matrix visuals can render SVG object elements in a column or measure to display in-line web content. You can build dynamic content like KPI indicators, databars and sparklines by altering SVG element content using DAX expressions. In this post, I will show you how to create databars that show negative and positive values with different color fills based on category values. Of course, Power BI has built-in databars in the conditional formatting options for numeric columns but you cannot alter the color and other attributes dynamically – at least not yet. For the time being, this technique provides more flexibility than the built-in databar feature.

Read This First

In this post from earlier this week, I introduced a simple technique to produce symbols and graphics using columns or a measure to render SVG images in a table visual. I recommend that you read that post to understand the simple implementation of this technique.

Dynamic Databars

Instead of a graphic symbol, the databar really just consists of a rectangle that has a variable start and end position. I want to change the fill color layer based on whether the value is negative or positive, and based on a category value in my data. You could apply any imaginable business logic to change the color or other attributes. I also want to layer formatted text over the top of the databar rectangle. Check out the example below showing a series of records. My simplified demo dataset causes the same value to repeat across the four Flintstones characters but this Databar measure could be applied to any set of data. Notice that negative values are red and the bar “starts” at the center and is proportionally longer as it extends to the left of center for larger negative values. The same is true for positive values that start in the center of the cell and move to the right.

Before we examine the code to generate the dynamic databar, let’s look at the rendered SVG element for a couple of these cells. If I hover over the cell for ID 1 and “Barney”, the SVG definition is displayed in the tooltip.

One fun thing about writing this blog post and showing you fragments of HTML script is that it is a little challenging to copy and paste the script into this post without it being interpreted and rendered as a graphic. This is one reason I’ve elected to use screen capture images.

Examine the above script and you will see the “<rect” element opening tag followed by attributes that specify the x axis, y axis and height values. These attributes draw the rectangle starting at 79 pixels from the left-most edge of the viewBox (left side of the cell), 80 pixels tall and 181 pixels wide. Keep in mind that the graphic does get rescaled with the table so these are only relative measurements. The fill property of the style attribute fills the rectangle with the color Red.

There is also a “<text” element with similar attributes that place centered (anchored to the middle of the viewBox) text at the mid-point of the viewBox width of 300 pixels. For simplicity, there are a few attributes I’m not mentioning but it’s fairly simple to recognize them and figure out what’s going on.

Hovering over the cell for ID 9 and “Barney” with the value 1495, the x and y axis values cause a DodgerBlue filled rectangle to start at 260 pixels and to be 236 pixels wide.

Show Me the DAX

The DAX code for the DataBar measure is below. The actual expression code is fairly simple, but it took a little trial-and-error to work out the measurements and scaling logic. Rather than building the expression in-line, I’ve invested work up-front to put all the values into variables. First, we get the minimum value for all the selected ID rows (using the MINX iterator function with the ALLSELECTED filtering function) and store that in the MinValue variable. The MaxValue variable is similar, with an added condition to handle negative or positive values. This may all seem a bit verbose, but it’s really just math.

There is a disconnected table in my sample data model named Flintstone Characters, with character names that I place onto the Columns axis of the matrix visual. In a real-world solution, this could be a field like product category, geographic region or anything else. The BarFill variable is set to a color value (either a web color name or a hexadecimal value) based on the selected character name. When a visual is grouped on a value, the SELECTEDVALUE function returns that single value. After resolving all the variable values, the measure emits SVG markup similar to the examples shown previously.

DataBar =
VAR MinValue = MINX( ALLSELECTED( 'Values'[ID] ), CALCULATE( SUM( 'Values'[Value] ) ) )
VAR MaxValue = MAXX( ALLSELECTED( 'Values'[ID] ), CALCULATE( SUM( 'Values'[Value] ) ) ) + IF( MinValue < 0, ABS( MinValue ), 0 )
VAR BarValue = SUM( 'Values'[Value] )
VAR FormattedValue = FORMAT( BarValue, "#,##0" )  -- "#,##0.00"  -- "#,##0.0 %"
VAR svgWidth = 600
VAR svgHeight = 200
VAR svgFontStyle = "normal 70px sans-serif"
VAR BarHeight = 80
VAR BarRange = ABS( MinValue ) + MaxValue
VAR BarValueFactor = DIVIDE( svgWidth, BarRange )
VAR BarWidth = INT( ABS( BarValue ) * BarValueFactor )
VAR BarZero