Composite Models Gen 2 and DirectQuery over Power BI Datasets

My wife bought me a nice telescope for Christmas. I’ve wanted one since I was a kid. We opened gifts early because we’re going to take the RV out in the desert to watch the great convergence of Jupiter and Saturn next week. But, what am I doing now, after opening my gift? …updating Power BI Desktop to check out the new Composite Model feature! This new feature of Power BI has been in the works for several years and we have been anxiously waiting for this announcement. It is truly the Utopian dream; the Holy Grail!

It is hard to articulate the impact that this capability can have for businesses. The new composite model gen 2 feature is a monumental breakthrough that will significantly raise the bar for Power BI model development and could change the way we think about designing enterprise BI solutions – but customers will need guidance to use it correctly. If you haven’t been sitting on the edge of your seat waiting for this announcement, read it and understand how significant this is.

Power BI December 2020 Feature Summary | Microsoft Power BI Blog | Microsoft Power BI

There have been many attempts by Microsoft and other vendors to create a data modeling architecture that provides fast access to cached data, direct access to live data and scaled-out connections to established data models. Remember ROLAP and HOLAP storage in multidimensional cubes? These were great concepts with significant trade-off limitations. No other vendor has anything like this. Way back in the day, Microsoft jumped on the Ralph Kimball bandwagon to promote the idea that a company should have “one version of the truth” exposed through its data warehouse and cubes or semantic data models. Then they met customer demand and gave us a BI tool that, in bringing data together from multiple sources, also makes it easy to create a lot of data silos. Arguably, there are design patterns to minimize data duplication, but to use governed datasets, self-service report designers are limited to connecting to large, central models that might only be authored and managed by IT. This new feature can restore balance to the force and bring us back to “one version of the truth” again.

To put this into perspective, here’s a quote from Alberto Ferrari’s recent article: “It is not easy to find a name that accurately describes the massive step forward that is this feature in the evolution of Business Intelligence solutions. It would fully deserve a hyperbolic name, but the marketing departments already overused so many terms over the years: “universe” was already used in the past century; you can imagine what happened after that. We like “composite models” because it is a name that concisely describes the purpose of this new feature.”

Alberto has a great video emphasizing the magnitude of this release: New composite models in Power BI: A milestone in Business Intelligence – SQLBI

Those of us who live and breathe BI will continue to have thoughts and opinions, and eventually will have war stories and prescriptive guidance to offer about using this new capability most effectively.

Upon initial review, it seems to work exactly as expected. In future posts, I will demonstrate how to design and implement a next-gen composite model and share my experience and thoughts as we move down this new road.

Power BI Expert Resources

Where to go for best practice advice for Power BI: The Data Gods

One of the most common questions I hear from customers and students is “what are your recommended resources for learning and staying current with Power BI?”. I find some people get stuck learning Power BI because they are simply overwhelmed with information and choices. Google search will often raise more questions than answers because there is a lot of noise on the Internet. Expert advice can be found only after dismissing the myriad of posts asking questions, sharing opinions, and sifting through outdated and confusing information. Most information found in discussion forums is outdated because the product changes so often. So, what are the most reliable sources for Power BI expert advice and best practices?

I follow several industry experts and I own many great books. Chris Wagner (KratosBi.com | @KratosBi), Microsoft MVP, has compiled the most comprehensive go-to list of Microsoft data experts called the Data Gods (you can see the complete list here). When I was added to the collective of featured “Data Gods” last week, I was hesitant to brag about it. I mean, there’s no modest way to post this picture. The title of “Data God” is a lot to live up to – but it is truly an honor and I want to thank Chris and others for this recognition.

More importantly, this is a concise list of reliable industry experts – which I have pared down to those who blog and post regularly about their experiences and advice. I know and follow nearly everyone on this list but there are a few folks with whom I have not yet had the pleasure to meet. There are also others featured in the Data Gods collection who make significant contributions in other ways, aside from frequent blogs and articles. I’ve also added a few additional notable community contributors.

Free Best Practice Resources

Following is my selection of members of the Data Gods council with links to their free resources. Most of these are blog sites or web sites that include blogs and articles available for free without requiring a sign-in or subscription:

Alberto Ferrari & Marco Russo – SQLBI.com | @marcorus | @FerrariAlberto

For most of us who work day-to-day with Power BI and other Microsoft BI tools, SQLBI.com is the first stop for expert advice. Alberto and Marco have set the gold standard for Microsoft BI best practices and design patterns. The breadth and depth of their expertise is expansive and they are known for going deep into the nuts and bolts that make Power BI work. These guys are effectively an extension of the Microsoft product development teams, guiding Microsoft leadership and evangelizing the platform. Their books on DAX and Tabular model design are considered by the most serious Power BI developers to be the most comprehensive reference books available.

Patrick Leblanc & Adam Saxton – GuyInACube.com | @GuyInACube | @awsaxton |
@patrickdba

You either know these guys or you’ve never searched the web for Power BI help. Adam started the Guy In A Cube YouTube channel several years ago as a side project while doing BI technical support work for Microsoft. Patrick came from a BI and SQL Server consulting background before joining Microsoft and teaming with Adam. Today they share design techniques and feature highlights in weekly, high-quality short videos and a Saturday morning live streamed Q&A session. Adam and Patrick are members of the Customer Advisory Team (CAT) focused on supporting Power BI enterprise customers. Their weekly Power BI community Roundup video features the most current Power BI blog posts and release notes.

Chris Webb – blog.crossjoin.co.uk | @cwebb_bi

Chris is one of the most prolific, enduring and consistent bloggers in the Microsoft BI industry. His posts and articles go deeper than most, prepared with thought, effort and insight. Before joining the Microsoft Customer Advisory Team (CAT), he ran a successful training and consulting firm specializing in the Microsoft BI platform. Over the years he was, and continues to be, the go-to expert on SQL Server Analysis Services, MDX, Power Query and DAX; and he continues to share his knowledge and advice through his blog.

Matt Allington – Excelerator Blog | @ExceleratorBI

Matt’s main focus is business-centered self-service BI and financial analysis. He is a tried-and-true expert in the Excel and Power BI Desktop space. His training, presentation and writing style is clear, authoritative and to-the-point. He has the best collection of intro and intermediate level books on Power BI and DAX; each is short, concise and easy to understand. His books, Supercharge Power BI and Learn to Write DAX, are an excellent way to get started.

Seth Bauer, Mike Carlo & Steve Campbell – PowerBI.Tips | @PowerBITips | @Seth_C_Bauer | @Mike_R_Carlo | @PowerBISteve

PowerBI.tips is an expansive collection of free resources and utilities for Power BI with focus on visual themes, layout and color selection. This dynamic trio covers the important bases: Seth on visual design, Mike on enterprise solutions and Steve focusing on application development. They have an array of external tools for things like connection management and project documentation.

Melissa Coates – CoatesDataStrategies.com | @SqlChick

Melissa is one of the deep thinkers in the industry, always asking the right questions and challenging the status quo. She’s also one of the most pleasant people I’ve ever met. Her blog posts are thorough and insightful. Melissa has co-authored multiple whitepapers for Microsoft on data governance, enterprise deployment and migration.

Leila Etaati & Reza Rad – Radacad.com | @leila_etaati | @Rad_Reza

Leila and Reza are known as one of the power couples in the industry. With nearly limitless energy, they have traveled from New Zealand over the years to present at more conference events than I can count. Leila is one of the BI industry’s thought leaders in the machine learning and AI space, with education and credentials behind her experience. Reza’s expertise with Power Query and Power BI are impressive. They freely share their knowledge through their blogs, articles, training classes and have published numerous books.

Kasper de Jonge – KasperOnBI.com | @Kjonge

Kasper is a Principal Program Manager on the Power BI product team and based in The Netherlands. A regular conference session headliner, Kasper is very tuned into customers’ real-world enterprise BI reporting problems. He has been blogging for years and has recently launched a YouTube channel.

Alice Drummond – DiscoverEI.com | @AliceDiscoverEI

With an emphasis on environmental data analysis, one of the things I appreciate about Alice’s active blog series is that with each post, she shares a comprehensive collection of blog sites and training resources to help readers find supportive information and expand their knowledge.

David Eldersveld – DataVeld.com | @DataVeld

If David can’t figure it out, it probably can’t be done. Over the years, he has pushed the limits, discovering and developing some unique and interesting techniques to integrate things like graphics and geospatial data with Power BI. David is very active in the community and shares different ways to take Power BI beyond the out-of-the-box features to do things that many of us had no idea were even possible.

Brian Grant – CSGPro.com | @BrianGrantBI

Brian has a unique intellect and a passion for problem-solving and sharing knowledge, particularly related to Power Query and DAX. I’ve had the pleasure of collaborating with Brian on multiple occasions in conference presentations and training events. He and his colleagues have developed an interesting method to understand the mechanics of the DAX language that he shares in a video series called the Elements of DAX.

Reid Havens – HavensConsulting.net | @HavensBI

In a few short years, Reid has risen to all-star status by tirelessly designing and sharing data visualization techniques. He is a master of spit-and-polish presentation and report design.

Meagan Longoria – DataSavvy.me | @MMarie

Meagan has broad expertise but her focus is designing reports for accessibility and accurate data presentation. Her posts and conference presentations are very insightful. She challenges the status quo and shares specific techniques to provide accessible report designs.

Matthew Roche – BIPolar.com | @SQLAllFather

Matthew is a member of the Microsoft Power BI Customer Advisory Team (CAT). He has considerable experience with several Microsoft development initiatives that have culminated in leading products and Azure services that are used to create enterprise data solutions. He is an industry thought leader who is very active in the online community. He shares his insights, leadership and expertise through his blog and industry conference events, most notably related to modern data governance and Power BI dataflows.

Gil Raviv – DataChant.com | @GilRa

Gil is the author of Collect, Combine & Transform Data in Excel using Power Query and Power BI, an excellent reference book, and he maintains an active blog.

Paul Turley – SqlServerBi.com | @Paul_Turley

I add my own blog to this resources list where I share my thoughts about best practice design and development approaches in this series: Doing Power BI The Right Way.

Others

Gilbert “Q” Quevauvilliers – FourMoo.com | @GilbertQue

From his vast experience consulting with IT and business customers, Gilbert shares relevant examples and lessons learned. He is an active blogger and community supporter.

Imke Feldmann – TheBiAccountant.com | @TheBIccountant

Deep, business-focused, creative problem-solving techniques using Power Query/M in Power BI and Excel.

Ruth Pozuelo Martinez – Curbal.com | @CurbalEN

Ruth puts tremendous energy and creativity into her video presentations which she posts frequently. She covers everything from intro to moderately advanced level Power BI design with complete and thorough examples, all presented in a stunning greenscreen format.

Parker Stevens – BiElite.com | @PowerBIElite

Parker’s blog site is an extension of his “BI Elite” YouTube channel, with numerous short tutorials and excellent “how-to” examples.

James Serra – JamesSerra.com

Thorough and consistent, James shares announcements and guidance about the Microsoft data platform and data warehouse architecture.

Books

Beginner and intermediate/advanced book recommendations (shown as cover images in the original post).

Doing Power BI the Right Way: 7. Planning for separation – data models and reports

Part of the series: Doing Power BI the Right Way (link)

Back in the day before Power BI came along, when we created BI solutions, reports and data models were separate. If you created a cube or Tabular model with Analysis Services, it was developed with Visual Studio and deployed to a server. Reports could be authored and deployed to the report server, separately. Now, with Power BI Desktop, you author your data model and report in the same development space and then deploy the whole kit and kaboodle to the service at once.

The Thick and Thin of Reports

Power BI reports generally come in two varieties. If you open up Power BI Desktop, choose “Get Data”, build a model and a report; all of that stuff ends up in one self-contained PBIX file. Conversely, a report using a live connection to a published dataset is often called a “thin report” because it contains no data or data model. Thin reports offer several advantages: not only separation of development effort but also centralized control and the ability to share published data models with multiple report authors.

Even when a single PBIX file contains everything, Power BI actually separates the report from the data model (called a dataset in the service) and gives them both the same name. This is very convenient for self-service projects because it is quick and easy to make changes to the data model, queries and measures if you need to make a report enhancement. This is all well and good for small, one-developer projects, but what about those larger-scale solutions where the data model and reports are developed by different folks, or when multiple reports are connected to the same data model?

At what point does it make sense to separate the data model and reports into separate files?

I believe that managing reports and the dataset separately is a best practice in enterprise solutions. However, like a lot of “proper” practices in the IT data world, it is far less convenient to “do it right”. In each project, we must exercise an appropriate degree of discipline. I’ve worked on projects that were so laissez-faire and lacking in management structure that they produced a tangled web of garbage. I’ve also worked on projects managed with strict adherence to rules and restrictive application development methodologies, where report and BI developers had no freedom to be creative, producing bland and useless reports. It is hard to prescribe a set of rules that always works for every project with the right dose of balance. A certain degree of freedom is necessary to learn as we go and produce masterful analytic reports. At the same time, the process requires a well-defined plan, time-boxing and prioritizing of features.

When I started writing this post, I was working on a consulting project that started out with the dataset and report in two separate files. Under that plan, one developer could work on the data model and publish it to the DEV/QA workspace, and then the other developer could design reports as new modeling features became available. In theory, this is typically the right approach. The requirements for this project expanded and we found ourselves in a rapid iteration cycle, needing to make model, measure and report changes. As a result, I brought the report pages back into the dataset PBIX file. There is no elegant way to do this but it wasn’t difficult to create new pages and then copy and paste the visuals from the separate report file to the combined dataset/report file. Bookmarks, drill-through and tooltip pages had to be rewired but it wasn’t a difficult task. Kasper de Jonge walks through the process in this blog post from May 2020.

In cases where you have crisp requirements and a clear division of labor, you can start with two files to delineate the data model and the report. In cases where you are actively prototyping and working alone, developing a single PBIX file containing all the parts will allow you to move quickly. When it becomes more important to lock things down and manage releases than to keep iterating, move report pages to a separate PBIX file that is connected to the published dataset in the service.

Hot Swap Live Connections

I’ve started using a new external tool that promises to make this process much easier. The Hot Swap tool is a utility developed by Steve Campbell and available from PowerBI.tips as part of the Business Ops external tools package.

This simple tool allows you to start a project with separate data model and report files and then to swap the live connection between copies of the model: either a dataset published to the Power BI Service or a PBIX file on the local desktop using a local TCP port. Hot Swap is a community-developed external tool that is not officially supported by Microsoft, but my experience in testing the tool thus far has been very good. I used an earlier solution from Steve in a project about a year ago that worked out quite well. It was a set of PowerShell scripts that rewired connections in much the same way. Having this capability in a simple pop-up dialog is an even more convenient option. You can check out this demo of the Hot Swap tool in the PowerBI.tips YouTube channel.

One of the things I like about starting with two files is that we don’t run the risk of design issues left over after moving report pages and visuals to another file.

The new approach is simple: begin your project with separate data model and report files. You can publish an early version of the data model file to a development workspace in the service and connect to establish a live-connected report. Then, run the Hot Swap tool and switch the connection to a local copy of your data model to continue iterative development. Changes to the local data model PBIX file/Power BI Desktop instance are immediately available in the report PBIX. Publish changes whenever you need to and then you can use Hot Swap to remove the local connection. You don’t actually use the tool to perform the reconnect because Power BI Desktop allows you to use the normal Get Data options after a live connection has been removed from the report file. Regardless, it is a simple process.

Recommendations

For small, informal projects developed by one person (or one developer at a time), keeping the data model and report in one file is convenient and makes it easier to apply changes as you go. If you later decide to share the dataset for others to create additional reports, or to promote or certify the dataset, you should separate the dataset file and create a “thin” report using a live connection to the published dataset.

For moderate-scale and formalized Power BI projects, begin with two separate PBIX files or make it a point to separate the data model from the report file before you deploy to the Power BI service. Consider using the Hot Swap tool to help manage local mode development.

Doing Power BI the Right Way: 6. Validating data model results – Part 2

Part of the series: Doing Power BI the Right Way (link)

Part 1 is here

Validation 301: Creating a Validation Report Solution, Part 2

We often have users of a business intelligence solution tell us that they have found a discrepancy between the numbers in a Power BI report and a report produced by their line-of-business (LOB) system, which they believe to be the correct information.

Using the LOB reports as a data source for Power BI is usually not ideal because at best, we would only reproduce the same results in a different report. We typically connect to raw data sources and transform that detail data, combining it with other sources and historical information, to analyze trends, comparisons and ratios and produce more insightful reports.

However, if the LOB reports are really the north star for data validation, these can provide an effective means to certify that a BI semantic model and analytic reports are correct and reliable.

This is a pattern that I have applied for consulting customers. My current client refers to the main Power BI report and dataset as their “dashboard” solution. The data model is large and complex with sophisticated reports used monthly by hundreds of users. The dataset is set up with a scheduled refresh. The “dashboard” solution has been thoroughly tested and we are careful to only apply updates and enhancements infrequently under a rigorous application lifecycle management process.

At the end of each month, a user exports their LOB report to an Excel file and drops it into a SharePoint library. I have one consulting client who uses an automation tool to essentially open the LOB application, punch in the right keystrokes and export the report to a file at the end of every month.

The format of these exported reports has proven to be fragile and we don’t want to risk destabilizing the main Power BI “dashboard” solution. So, rather than importing the validation files directly into the main dataset, I have created a separate Power BI project with a dataset & report that has two sources. The first data source for the validation report connects to the main “dashboard” Power BI dataset using the Analysis Services connector and the XMLA endpoint. A DAX query is issued to return values to compare with the LOB report totals. The second data source for the validation report imports the validation files that were exported from the LOB report. If that import process were to fail for any reason, it doesn’t affect the main dashboard solution.

This diagram depicts the validation solution at a high level. The main Power BI “dashboard” solution gets its data from the corporate EDW. That data is processed through rigorous ETL pipelines and contains data extracted from multiple systems used by the corporation.

A lightweight DAX query imports summarized results from the main dataset into the validation dataset.

Before we go any further, I just want to make the point that in most normal Power BI data models, I would never import data from another published Power BI or SSAS data model. That’s just a bad idea and can make it very difficult to trace data lineage. Do not use DAX queries as a data source for your regular report models. This validation report is a rare and appropriate exception to the rule.

The second data source for the validation solution is a set of files exported from reports produced by the LOB system. These reports only contain summary information but can be compared to totals from our Power BI solution. Values from the two sources are compared and the validation report shows whether they match or fail validation.

Just a note about the decision to create a separate dataset. There were two reasons for this:
1. We could not control the format of the validation files. If the file format were to change, it would cause an error and stall our scheduled refresh cycle for a large mission-critical report used by several hundred high-level employees (whose job is it to tell the CEO why his dashboard doesn’t work?)
2. Today we do not have the ability to use a live connection to a Power BI dataset in a composite model along with other sources, but we will in the near future. This capability doesn’t address problem #1 but will certainly change the way I think about architecting large, multi-source models in the future.

Writing the DAX query to read the summary values from the main Power BI “dashboard” dataset was a great learning experience. Although I am now pretty good at writing DAX queries by hand after a couple of these projects, the easier way to get started is to allow Power BI Desktop to generate the query for you. Here are the steps using the ContosoDW dataset as an example:

  1. Add a table visual to the report canvas. Add the same fields and measures used in the LOB report for comparison.
  2. Turn off totals for the table visual.
  3. Add and set slicers or filters to match the LOB report results.
  4. On the View menu in Power BI Desktop enable the Performance Analyzer.
  5. In the Performance Analyzer pane displayed to the right of the report canvas, click the Refresh visuals link.
  6. Click the plus sign next to the table visual to expand the performance counters for that visual.
  7. Click the Copy query link.

The DAX query is now in the Windows clipboard and can be used to run the same query that the designer produces to populate the table visual. To test and modify the query, do the following.

  1. Make sure you have the latest version of DAX Studio installed. If not, install it from daxstudio.org. Save your work, close and reopen Power BI Desktop after installing DAX Studio.
  2. From the External Tools menu in Power BI Desktop, click DAX Studio. This opens DAX Studio with a query session connected to the local Power BI dataset.
  3. Paste the clipboard contents to the query window and run the query.

Here’s the query that was generated by Power BI:

DEFINE
  VAR __DS0FilterTable = 
    TREATAS({2009}, 'Order Date'[Year])

  VAR __DS0FilterTable2 = 
    TREATAS({"October"}, 'Order Date'[Month Name])

  VAR __DS0Core = 
    SUMMARIZECOLUMNS(
      'Product'[Category Name],
      __DS0FilterTable,
      __DS0FilterTable2,
      "Online_Sales_Qty", '_Measures'[Online Sales Qty]
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(501, __DS0Core, 'Product'[Category Name], 1)

EVALUATE
  __DS0PrimaryWindowed

ORDER BY
  'Product'[Category Name]

This code can be simplified for our needs. The essential code is the expression using the SUMMARIZECOLUMNS function. You can disregard everything after that. Make the following changes to simplify the DAX query code:

  1. Delete everything after the closing parenthesis for the SUMMARIZECOLUMNS function.
  2. Remove the variable line just before the SUMMARIZECOLUMNS function and replace it with “EVALUATE”

The resulting query code should look like this:

DEFINE
  VAR __DS0FilterTable = 
    TREATAS({2009}, 'Order Date'[Year])

  VAR __DS0FilterTable2 = 
    TREATAS({"October"}, 'Order Date'[Month Name])
EVALUATE
    SUMMARIZECOLUMNS(
      'Product'[Category Name],
      __DS0FilterTable,
      __DS0FilterTable2,
      "Online_Sales_Qty", '_Measures'[Online Sales Qty]
    )

This is the query I would use to populate a new table in the validation dataset. Just use Get Data to open the Power Query editor, create a new SQL Server Analysis Services connection, paste the workspace path for a Premium-enabled workspace and connect to the published dataset for the main Power BI solution. Then, finally, paste this query code into the query window.
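If it helps to see those steps expressed as a query, here is a minimal Power Query sketch of that connection. The workspace URL, dataset name and DAX text are placeholders to replace with your own, and any quotes inside the M string must be doubled.

let
    // XMLA endpoint for a Premium-enabled workspace (placeholder workspace and dataset names)
    Source = AnalysisServices.Database(
        "powerbi://api.powerbi.com/v1.0/myorg/Dashboard Workspace",
        "Main Dashboard Dataset",
        [
            // Paste the full DAX query from above here, doubling any embedded quotes
            Query = "EVALUATE SUMMARIZECOLUMNS ( 'Product'[Category Name], ""Online_Sales_Qty"", '_Measures'[Online Sales Qty] )",
            Implementation = "2.0"
        ]
    )
in
    Source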

Next step is to import the validation file(s) that were exported from the LOB report. In our validation solution, I used the SharePoint connector and imported all files with a predefined file name prefix. After the validation files are imported, make sure that the column data types are compatible, load the table and wire up a relationship between the two tables. In our last validation solution, I created measures that compare a measure from the summary table imported from the main “dashboard” solution dataset with a comparable measure from the validation file source table, and that calculate the difference and the % difference between the two. I then used a KPI indicator icon in the table visual, using conditional formatting, to show if the differences were within an acceptable range.
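As a rough sketch of that pattern (the table names, column names and tolerance here are hypothetical, not the actual client model), the comparison measures look something like this:

Dashboard Online Sales Qty = SUM ( 'Dashboard Summary'[Online_Sales_Qty] )

LOB Online Sales Qty = SUM ( 'LOB Validation File'[Online Sales Qty] )

Validation Difference = [Dashboard Online Sales Qty] - [LOB Online Sales Qty]

Validation % Difference = DIVIDE ( [Validation Difference], [LOB Online Sales Qty] )

-- Returns 1 when the sources agree within a small tolerance; drives the KPI icon via conditional formatting
Validation Status = IF ( ABS ( [Validation % Difference] ) <= 0.001, 1, -1 )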

This screen capture from the validation page has the real data blurred out but you can see the grouping and comparisons resulting in either green or red indicators.

Each month, I receive an email from our user who is responsible for the report validation. He tells me that at a quick glance he is able to verify that the totals from their line-of-business application and Power BI solution match, and that the executive leaders and financial analysts who use these reports can be certain that they are accurate and reliable. When an item doesn’t match, they can pinpoint the account in question and usually are able to track it down to a data refresh timing issue or a mistake made in their source system that hasn’t yet been reconciled.

We are now setting up alerts using KPI tiles on a dashboard that will message key users when validation either succeeds or fails in key areas. As a result, the confidence in this solution has increased significantly and business leaders trust the information they get from these reports.

Doing Power BI the Right Way: 6. Validating data model results – Part 1

Part of the series: Doing Power BI the Right Way (link)

Moving important business data into a data model for analytic reporting can often be a two-edged sword. Data retrieval is fast and can support all kinds of analytic trending and comparisons. But, data in the model may be one or two layers away from the original source data, making it more challenging to compare with familiar user reports. Often the first validation effort after transforming and loading data into the model and then visualizing the initial results is having a business user say “yeah, that looks about right.” Then, sometime later after more development and extensive user testing, the feedback might be “hmm, that seems a bit off.” …not exactly scientific.

I have been doing a lot of data validation work lately – both formally and informally. Informally: Validating calculation results from a Power BI data model is just a normal part of the development process. Formally: After developing and delivering an entire business dashboard solution, a formal validation process is used to validate the ongoing results after future data refresh cycles and to certify reports so that business leaders know they can trust them.

Particularly for database developers, making the leap to using reports built on a tabular data model instead of the source database is a mental transition. The developer has to stop thinking in SQL and start thinking in model mechanics. This can pose a challenge when needing to validate numbers in a report, which should be done using the data model and not only by running a SQL query.

Validation 101: Create & Use COUNT Measures

When designing a new data model, this is typically the first thing I do… For every fact table and for each large dimension table, I create a measure that returns the record count for that table. Users normally think about their data in business terms (like sums, ratios and averages) and not about how many records there are in a table. Record counts are a convenient sanity check for record completeness and distribution across groups; and may also be a good indicator for model size and performance.

Let’s say I have a data model containing the tables: Sales, Orders, Customers, Products and Date. At minimum, I will create the following measures with code similar to this:

Sales Row Count = COUNTROWS ( 'Sales' )

Orders Row Count = COUNTROWS ( 'Orders' )

Customer Count = COUNTROWS ( 'Customers' )

These measures allow you to quickly troubleshoot problems that might otherwise go unnoticed – with model relationships, key matching, orphaned records and missing records. Just drag and drop the measure and a field to group the values. For example, if I drag the Sales Row Count measure and Year from the Date table onto the report canvas, I can immediately see sales records across years. If I do the same thing with the customer region field, I might be able to spot a gap in the region distribution of records over time. If every column in a column chart (created by default when dragging a numeric field) shows 100%, then I know that I either forgot to create a relationship, have a mismatched data type or forgot the “Set as Date Table” setting for the Date table. Easy peasy.

Can’t I just count any field value rather than creating explicit measures? – Yes, but don’t. Coming from an SSAS multidimensional background, I learned to define measures for every useful numeric metric. If we needed to show the SUM or the COUNT of anything, we needed to define a measure for that. In Power BI and SSAS Tabular, you can simply use implicit measures – meaning you can just drag and drop any field and then indicate that you want to count each occurrence or sum up the numeric value of the field. For quick-and-dirty Power BI reports, this is a convenient way to get simple aggregate results, but it is not a best practice in my opinion because of inconsistent behavior and other limitations. There is a growing consensus in the Power BI community that using explicit measures should be the norm and not the exception. If you always use explicit measures rather than aggregating numeric columns, users and report developers will always have a consistent experience.
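For example, rather than dragging a numeric column into a visual and letting Power BI sum it implicitly, define a simple explicit measure and use it everywhere (the table and column names here are only illustrative):

Sales Amount = SUM ( 'Sales'[SalesAmount] )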

Validation 201: Measure Dependency Chains

This is a fairly normal pattern (a short DAX sketch follows the list):

  1. Create a base measure using a simple expression, let’s say:
    [Employee Count]
  2. Create additional measures that reference and then perform some logic on the first measure. For example:
    [Employee Count Females] and [Employee Count Males]
  3. Layer another new measure on the second layer of measures. Like:
    [Female to Male Employees Ratio]
  4. Create another measure to apply special business logic, let’s say to conditionally apply row-level security and non-visual totals:
    [Female to Male Employees Ratio (RLS Adjusted)]
  5. …and so on
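Here is a minimal DAX sketch of that dependency chain, assuming a hypothetical Employee table with a Gender column (the RLS-adjusted variation is omitted because that logic is specific to each model):

Employee Count = COUNTROWS ( 'Employee' )

Employee Count Females = CALCULATE ( [Employee Count], 'Employee'[Gender] = "Female" )

Employee Count Males = CALCULATE ( [Employee Count], 'Employee'[Gender] = "Male" )

Female to Male Employees Ratio = DIVIDE ( [Employee Count Females], [Employee Count Males] )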

This is a simple example but it is not unusual. Reusing measures helps to isolate reusable calculation logic, but it can also create deep dependency chains. At some point, you will inevitably discover a calculation error that you will need to track down and debug. I can’t remember my own name some mornings so I certainly cannot work through the mental gymnastics needed to keep track of every dependent measure.

When I have a visual that is not displaying the expected results, I usually do the following:

  1. Duplicate the report page and prefix the page name with TEST: (something meaningful)
  2. Delete visuals not needed for testing
  3. Convert the visual to a table so I can just see the numbers grouped by a field
  4. Drag the referenced measures into the table to create multiple columns. Validate each dependent measure.

This helps to break the problem down into manageable steps and components. If you know that the first measure is always behaving as expected, move on to the next measure that depends on it. If the problem isn’t immediately apparent, recreate the same conditions with slicers. By isolating the problem, I can find and fix the issue every time.

How to Track Object Dependencies

Keeping track of dependencies between layers of measures can be challenging. I keep a scratchpad on my desk to make lists of things I need to remember. Sometimes I don’t even save my notes but it just helps me organize my thoughts. I use a whiteboard for a similar purpose, diagramming dependencies between measures and other objects with boxes and lines. Just standing and diagramming on the whiteboard is a very effective problem-solving technique for me. Electronic tools like OneNote and Microsoft Whiteboard can be just as effective if you are in the habit of using them.

The Object Dependencies feature in Tabular Editor is pure magic and a tool I am using more frequently to test, debug and to document my data models.

Tabular Editor Object Dependencies dialog

Reusing Measures

On a different but related topic… Create separate measures for calculations that will be used in visuals and reused by other measures, but keep in mind that having too many measures can be difficult to manage. Strike a balance between code reuse and a more succinct, easier-to-manage portfolio of measures. If you have to jump through seven hoops (e.g. trace multiple dependent layers of measures) every time you need to debug or validate a calculated value, you probably have too many dependent measures.

Organize measures but don’t over-organize them… I prefer to put all my measures in a central table or at least assign them to just a few subject area fact tables. Don’t leave measures in different tables, scattered throughout the data model. Although I’m a fan of using display folders, it can be more work to assign measures to several different folders than it is just to use meaningful naming conventions. The implementation of display folders in Power BI Desktop is, in my opinion, an unfinished feature. If you want to use display folders to organize measures within container tables, use Tabular Editor. Name measures so they’re predictable and easy to find in a sorted list. For example, if you will have a series of measures based on “Sales Amount”, don’t name the variations “MTD Sales” or “Ratio of Sales to Prior Year”. Instead name them with the same prefix, like “Sales Amount”, “Sales Amount MTD”, “Sales Amount YTD”, “Sales Amount Ratio to Prior Year” and “Sales Amount Region % of All Regions”. This way, they all get sorted and grouped together in the field list. The theme here should be finding the right balance to organize your objects while keeping things simple.
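As a quick sketch of that naming pattern, assuming a Sales table and a standard Date table marked as the date table (names are illustrative):

Sales Amount = SUM ( 'Sales'[SalesAmount] )

Sales Amount MTD = TOTALMTD ( [Sales Amount], 'Date'[Date] )

Sales Amount YTD = TOTALYTD ( [Sales Amount], 'Date'[Date] )

Sales Amount Ratio to Prior Year = DIVIDE ( [Sales Amount], CALCULATE ( [Sales Amount], SAMEPERIODLASTYEAR ( 'Date'[Date] ) ) )

Sorted alphabetically in the field list, these all group together under the “Sales Amount” prefix.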

Validation 301: Creating a Validation Report Solution, Part 1

This is a big topic and I have learned some great techniques which will be covered in a follow-up post. So, let’s call this a teaser.

How do users and business leaders know with certainty that they can trust the information in a Power BI report? There are a variety of factors that can affect data reliability. Even if the data source is known to be correct, there could be issues in the extraction, staging, loading, transformation or the calculations used to present the information in an analytical report. Testing the report with data at a point in time is a good place to start but it doesn’t absolutely guarantee accuracy down the road. A report with a simple set of aggregate values read directly from the business source of truth, shown alongside calculated values from the data model after it has been refreshed, provides a litmus test to check and instill ongoing confidence. When (I said “when” not “if”) a user tells us that report information isn’t right, we can confidently and easily prove that analytic report information from the data model is consistent with the source. This is what a validation report is all about.

To set the stage for a longer post on this topic, think about this typical scenario:
Every month we load fresh manufacturing production volume and forecast records into our data mart or data lake, and then into the Power BI data model. The results of the data model can be used as a data source for the validation report. Using the XMLA endpoint, you can run a DAX query against the published data model to get the values shown in report visuals, complete with filters and slicers. For comparison, you can load data from a report produced by the business system into the validation data model and compare the monthly totals. We can then flag the data that does and doesn’t match, and calculate the differences.

More on this later.

Creating a Validation Report Solution – Part 2

Doing Power BI the Right Way: 9. Choosing the right report type: analytic or paginated

Part of the series: Doing Power BI the Right Way (link)

It has been a few weeks since I last posted, and a while since I started the promised blog series “Doing Power BI the Right Way”. I’ve received a lot of positive comments on the posts in the series thus far, and people wondering why it’s taking so long. Well, I am busy working on a book about Paginated Reports and SQL Server Reporting Services. This will be the second edition of a book I co-authored ten years ago and this edition is also a collaborative effort with multiple contributors. We have quite a lot of excellent material that is just starting to go through the process of editorial and technical review. Writing a technical book always takes more time and effort than imagined, and working on this one in my “spare time” is a laughable notion, but this is going to be a great book.

Writing this blog series while juggling other commitments has brought me to two realizations: 1. Posting articles in the series will be sporadic, and 2. The topics will be in fairly random order based on the projects I’m working on and the topics that are most top-of-mind at the time.

This brings me to the subject of this post: Paginated and Analytic reports.

Before we had Power BI, we had Reporting Services. When the focus of my career and consulting practice became Business Intelligence, starting about fifteen years ago, most of the “BI reports” I created were in SQL Server Reporting Services (SSRS). We could create a variety of charts and reports that allowed users to drill down from summary to details. We could create reports that allowed a user to drill through, from one record or report item to another report that presented more details. With a little creativity, we could create reports that allowed users to click items that would filter other items on the same report. It took some work and time to create these interactive “drill-through-to-self” reports. Today, after transforming data into the right format for a tabular data model, this type of interactive functionality just magically happens in Power BI with very little report design effort. But Power BI is primarily a tool for analyst users to create their own reports and to get answers to business questions, rather than for IT developers to create reports that print or export perfectly to PDF or Excel. Although it is possible to create fairly advanced paginated reports, most bread-and-butter operational reports are really just what I call “lists of stuff”… the results of SQL queries presented as a table or matrix, perhaps with a few groupings and subtotals. Simple business processes and tasks often require simple reports to help manage them.

Paginated reports, by definition, are reports that allow a lot of data to be presented on multiple pages. Paginated reports can be viewed online, printed or exported to a variety of file formats. Because there is so much overlap in the capabilities of SSRS/Paginated Reports and Power BI, the choice between these two report tools is not always clear. I can cite many cases when using Power BI Desktop is clearly the right choice to create an analytic report, and cases where it would make more sense to create an operational report using Paginated Report Builder. I think these are the easier decision points but I would like to explore those cases where we could argue either way. What are the pros and cons of using Power BI to create traditional reports? When would it make more sense to replace a static, paginated report with an interactive report created with Power BI?

The integration of Paginated Reports into the Power BI cloud service continues to march forward at a rapid pace. Just yesterday, Chris Finlan teased a blog post announcing new Power Automate integrations that will allow reports to be distributed and exported using flows. Listen for some big announcements at Microsoft Ignite about Paginated Reports in the Power BI service. Whether you host operational reports on your own report server or in the cloud, Paginated Reports are an important part of most business reporting portfolios.

When do you choose to use Paginated Reports or SSRS rather than Power BI? Do you design reports differently than you would have 5 years ago? Do your users prefer interactive, self-service reports now or do they still want made-to-order static reports?

What are the deciding factors between creating a paginated report or an interactive report?

Do you create paginated reports with a Power BI dataset or SSAS model as the data source or do you only use SQL queries for those reports?

I’m interested in your ideas and questions. Please post them in the comments below this post and then I will extend this post with your input and my thoughts.

Doing Power BI the Right Way: 2. Preparing, shaping & transforming source data

Part of the series: Doing Power BI the Right Way (link)

In a business intelligence solution, data must be shaped and transformed. Your data is rarely, if ever, going to be in the right format for analytic reporting. It may need to be consolidated into related fact and dimension tables, summarized, grouped or just cleaned up before tables can be imported into a data model for reporting. If source data is in a relational database, chances are that it is optimized for transactional processing or application development. The column names, data types and formats are likely neither suitable nor user-friendly for simple reporting. If source data exists in files or spreadsheets extracted from business reports, without data preparation you are likely just to reproduce the existing reports without adding business value.

This important topic raises one of the most critical questions to consider when designing a Power BI project:

Where should I shape and transform data?

Data can be shaped and transformed in at least three different layers in the business intelligence process:

  1. At the source
  2. In Power Query
  3. In the BI data model

Your options for transforming data at the source depend on the capabilities of the data source. In general, a query running closer to a data source will be more efficient than one that must transfer a set of data, cache and manipulate it downstream. If you are importing text files or Excel files, then your only choice is to import an entire file and then potentially throw away what you don’t need – which is fine for small-ish data sets.

The right side of the previous diagram depicts data transformed in the data model. Some data morphing can be performed in DAX using calculated columns or calculated tables. In rare cases, there may be good reasons to do this but if you can perform row-level calculations in a database view or in Power Query, you will be better off. It is also easier to maintain the solution if you can keep all that query logic in the same layer so you are only using one design tool.

Data shaping at the Source

If we break the first choice down a little further, we could prepare data in an Extract, Transform & Load (ETL) process and populate a new table in the source database with correctly-shaped and formatted data for BI reporting. This is the approach used in traditional data warehouse solutions, typically under the control of an IT-managed data warehouse development group.

Data Warehouse, Data Mart or Data Lake

Back in the day, the conventional approach was to move data from sources into a staging database, and then to move it again into a dimensional database that was structured to support analytic reporting. I say this in the past tense because that was the way we always did it 10-20 years ago. Now, in the age of self-service analytics, it has become more common to build reporting projects on the fly and to transform data without moving it all from one location to another before it arrives in the BI data model. However, there are limits to just how much data we can transform and how much complex transformation can be done in an ad-hoc fashion. The water here is muddy and it is difficult to say that one approach is always preferable to another, but one thing is very clear: a data warehouse is very necessary in cases where data governance, high volume and centralized management are required.

If requirements are well defined and unlikely to change, building a data warehouse or dimensional data mart might be the ideal approach. However, data warehouse projects can get stuck in long cycles of over-planning and requirement gathering, which has led many organizations to think about BI as an extension of business rather than IT processes. Most organizations have come to grips with the reality that the vast majority of BI projects are conducted after databases and data warehouses have been completed. Over decades, we have seen hundreds of companies chasing the Utopian dream of the “perfect data warehouse” along with Master Data Management, data cataloging, cleansing, governance and stewardship. Each of these goals can be achievable but they are potential rabbit holes. Experienced data practitioners have learned to apply their skills to time-boxed and resource-constrained projects with the goal of creating “good enough” solutions that don’t completely drain IT budgets and resources.

The data lake has become a viable option for staging and storing data before moving it into a structured model because relational storage can be expensive. Data lakes provide a low-cost means to store a lot of data in the cloud. We usually think of the “modern data warehouse” platform consisting of a data lake and dedicated data warehouse database. Flavors of the lake/warehouse approach could include using a lake AS a warehouse, using a standard relational database to store a small/medium scale warehouse or using a dedicated parallel processing DB platform (e.g. Synapse, Snowflake or Cosmos DB) to store a large-scale data warehouse.

Power BI and the Data Warehouse

Bringing the discussion back to focus on Power BI queries, data models and reports, let me put a bottom line on this multifaceted topic. I believe that there are three data transformation scenarios related to Power BI design:

  • Business-Centric, Informal and Small Model/Report Solutions
    Power BI makes it easier to transform data as you go. In small projects, using this approach saves time and money, allowing business professionals to meet their own reporting needs. End of story… win-win.
  • Interim POC Solutions
    Without a formal data warehouse, Power BI can be used to munge together data from multiple sources and discover requirements that a data warehouse solution would provide. Use Power BI to build a prototype or proof-of-concept solution that can eventually evolve into a formal solution. Accept the follies and trade-offs of ad hoc data models and use them to evolve toward the right solution using iterative design.
    Use Power Query to perform all necessary transformations and cleansing operations.
  • Formal Data Model & Report Solutions
    Once properly governed and secured data warehouses/marts exist, rebuild the Power BI data models and reports. The “rebuild” effort could encompass anything from switching data sources in an existing report project to starting from scratch. Either way, you are saving time by applying learnings and design elements. If additional query logic is needed, use database views to promote query folding in Power Query.

The simple fact is that making such a transition is never “done”. It is a journey that requires iterative thinking and management. Accepting and embracing this reality will save everyone a lot of angst. With data cleansing and transformation applied to ingested data warehouse data using tools like Azure Data Factory (which may utilize Power Query or SSIS), SSIS or Databricks, data sources for the Power BI project may simply be tables or views.

Where and what is Power Query?

Power Query is the data transformation technology built into Power BI. It is an extremely flexible and extensible data shaping platform that evolved from tools like SSIS, DTS, Excel, Microsoft Query and Access. When you choose Get Data or Transform Data from Power BI Desktop, this opens the Power Query editor. The Power BI service includes an online implementation of Power Query called Dataflows, which allows queries and entities to be shared between analysts creating different data models. Power Query is also now integrated with Excel (when you select Get Data), the Visual Studio SSAS Tabular model designer and Azure Data Factory.

Learn to leverage Power Query to perform all row-level transformations and calculations efficiently. In general, column naming and data type changes should be performed in Power Query rather than in embedded SQL or DAX.

Understand Query Folding

This is a critical concept to understand about Power Query. For supported data sources and whenever possible, queries you create in the Power BI query editor will be converted to native queries to be executed at the source. This means that if you are using SQL Server, rather than transferring all the data from the database engine to Power Query before performing transformations, steps may be converted to an optimized T-SQL query. This is an ideal scenario for high-volume fact tables.

Query folding only works when Power Query can select records from a single database object like a database table, view or table-valued user-defined function.
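As a simple illustration (server, database, table and column names here are placeholders), each step in this query can fold back into a single native T-SQL statement because the source is one selectable object:

let
    // Connect to the source database
    Source = Sql.Database("MyServer", "SalesDW"),
    // Reference a single table or view so subsequent steps can fold
    OnlineSales = Source{[Schema = "dbo", Item = "FactOnlineSales"]}[Data],
    // Folds into a WHERE clause in the generated native query
    FilteredRows = Table.SelectRows(OnlineSales, each [SalesAmount] > 0),
    // Folds into the SELECT column list
    RemovedColumns = Table.RemoveColumns(FilteredRows, {"ETLLoadID"})
in
    RemovedColumns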

Hard pills to swallow

For relational database professionals, there are a few realities about Power BI that can be a shock to their pattern for solving data problems. Without getting into all the details about why things are the way they are, it is very important to understand and embrace these points:

SQL queries do not fold – If you hand-write a SQL query or copy and paste SQL script into a new query within Power BI, Power Query cannot fold – and therefore cannot optimize – that query. This doesn’t mean that it is wrong or necessarily a bad practice. If you need to populate a relatively small table in the data model that doesn’t need to be filtered, grouped, sorted or otherwise transformed in ways that could otherwise be optimized, this is probably fine.

Selectable database objects can be folded – If you can use a single database table or create a view in the database, particularly for high-volume fact tables, these queries will be faster and will consume fewer server resources.

Stored procedures – Power Query doesn’t include specific optimizations or support for stored procedures. I have successfully used stored procs as data sources, at the insistence of DBAs, but this is not a recommended design practice.

Parameterized filters

Queries for large tables should be designed with filters so you can easily control the records imported – both at design time and when you need to populate the data model with a full set of records. Even if you don’t need to manage millions of records or load a table incrementally, this is just a good idea and will save a lot of redesign later on.

I use parameters to manage anything that might change, like folder paths and server names.

The example shown here conforms to the requirements needed to implement the incremental refresh feature of Power BI:

The RangeStart and RangeEnd parameters are then used in a Filter Rows step in the query to limit the imported records within the selected date range.
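A sketch of that filter pattern follows, with placeholder table and column names; RangeStart and RangeEnd must be defined as Date/Time parameters for the incremental refresh feature to recognize them:

let
    Source = Sql.Database("MyServer", "SalesDW"),
    Sales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Use >= on one boundary and < on the other so boundary records are not loaded twice
    FilteredRows = Table.SelectRows(Sales, each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
in
    FilteredRows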

Incremental refresh

One of the perks of using a date range filter is that it enables incremental refresh. This is one of the coolest features to come along in a long while and now it works with both Premium and shared capacity licensing. If queries support query folding and use a set of date range filters to limit the records loaded into a table in the data model, Power BI will automatically generate table partitions. For example, if I were to partition a table by month, each partition includes records for only one month and data in each monthly partition is only loaded when new records are available or when data has changed.

I design my fact tables to support incremental refresh as a standard design pattern whether we have a requirement to implement it or not. This gives me the flexibility to only load a small set of data I need for development. Publishing a smaller PBIX file is faster and then more data can be loaded after the dataset has been published to the service.

If you don’t need it, throw it away

The presence of every single column in your data model comes at a cost. It uses memory, storage space and can potentially impact performance. If you aren’t absolutely certain that you need a column, just omit it from the table in Power Query. If you or your users change your minds, columns can easily be added.

Before you ask, this is not a picture of my garage but according to my wife, it could be. Keeping all the columns in your data that you “might need someday” is a lot like all that crap I’ve been collecting in my garage and on my workbench that I tell myself might be useful some day.

Err on the side of omitting columns that you don’t need to meet immediate reporting requirements. The difference between the cluttered garage analogy and the reality of Power BI design is that you benefit immediately by getting rid of unneeded data right now, and it costs you little or nothing to go back and add one of those columns later, after your business stakeholders decide they need it.

The easiest way to manage this is to use the Choose Columns button on the Home ribbon, and then just deselect the columns you don’t need. In the future, use the gear icon next to the generated step to make changes to the column selection.
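Behind the scenes, the Choose Columns step is just a Table.SelectColumns call like this hypothetical example (table and column names are placeholders); editing the list through the gear icon rewrites the same step:

let
    Source = Sql.Database("MyServer", "SalesDB"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],

    // Keep only the columns needed for current reporting requirements
    SelectedColumns = Table.SelectColumns(
        FactSales,
        {"CustomerKey", "OrderDateKey", "SalesAmount"}
    )
in
    SelectedColumns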

Column naming

Please read this aloud and commit it to memory: rename all columns that will be visible in the data model using concise, self-describing, meaningful and user-friendly names. I can’t emphasize enough how important this is. Not only is this good practice in BI solutions, but there is a psychology to the way business users perceive what they see in reports. When they see cryptic labels and field names, they become uncomfortable and may distrust the information.

Your data model should be designed for the business users who will design and consume reports, not for developers. Even if technical professionals are designing reports, field names are going to show up as labels and titles. Consider the following examples:

Friendly Column Name    Technical Name
Customer Number         Customer Nbr, Customer #, customerNumber, CustomerNbr, CUST_NBR
Product Brand Name      productBrandName, ProdBrandNm, PRD_BRAND_NM

Technical naming conventions are deeply engrained into the culture of application and database design, and for good reason. For developers and database professionals, certain naming patterns like Camel Case (camelCase) and capitalized word abbreviations separated by underscore characters are almost a rite of passage; but object names in this format should not exist in a data model for analytic reporting. Spaces in column and measure names are welcome, as are punctuation and other special characters like #, $, %, etc.

Tables and fields should be named so they are familiar and easy to understand. When you or a user designing a self-service report drags and drops a field or measure into a report visual, these names are used as labels and titles, so they should be meaningful and self-explanatory.

Field names should also completely describe the object. Consider that if the Product and Customer tables both have “Name” columns and you were to drag the Name field from the Customer table into one visual and the Name field from the Product table into another visual, each would only be labeled “Name”. It would be more appropriate to name these fields “Customer Name” and “Product Name” in their respective tables.

Naming & annotating steps

Rename the steps in Power Query so that they are self-describing (like the following example). Your future self will thank you when you revisit the project in six months. This is done by right-clicking the step and choosing Rename from the menu.

A new feature in Power Query displays a tooltip with annotations you add to the M script or the text you enter in the Description property of the step.

The names of each step are applied as variable names in the Power Query “M” language script. Variable names that contain spaces or special characters are preceded with # and enclosed in double quotes. The comments I added to the code with // prefixes produce the tooltip annotations mentioned in the previous paragraph.

A recent suggestion from one of the blog readers is to use underscores rather than spaces in these step names. This allows you to double-click the variable name and copy it to the clipboard.
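Here is a short sketch (with placeholder names) showing both conventions: underscored step names that can be double-clicked and copied, quoted step names containing spaces, and the // comments that surface as step tooltips:

let
    // Underscored step names can be double-clicked in the editor and copied without the #"..." wrapper
    Source_Database = Sql.Database("MyServer", "SalesDB"),
    Customer_Table = Source_Database{[Schema = "dbo", Item = "DimCustomer"]}[Data],

    // Step names that contain spaces must be written as #"quoted identifiers" in M
    #"Renamed Customer Columns" = Table.RenameColumns(
        Customer_Table,
        {{"CUST_NBR", "Customer Number"}, {"CUST_NM", "Customer Name"}}
    )
in
    #"Renamed Customer Columns"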

Consolidating steps

Building queries using linear steps helps break down complicated logic, but it can become inefficient as the project evolves. In the course of query design, you are likely to repeat steps like those in the second example below, where I removed three different columns after using them in other steps. I also changed the data type each time I created a custom or duplicated column.

If you select an existing step and then perform the same action on a different column or columns, Power Query will add the new transformation to the selected step. In the query on the left, I consolidated all these steps, removing columns and changing data types only once.

The optimized query on the left runs in about one-third the time of the original query on the right.
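A consolidated version of such a query might look like this sketch (placeholder names again), with one step that removes every unneeded column and one step that sets all the data types:

let
    Source = Sql.Database("MyServer", "SalesDB"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    AddedMargin = Table.AddColumn(
        FactSales, "Margin", each [SalesAmount] - [TotalProductCost], type number
    ),

    // One step removes all columns that are no longer needed
    RemovedColumns = Table.RemoveColumns(
        AddedMargin, {"TotalProductCost", "ETL_BatchID", "ModifiedDate"}
    ),

    // One step sets the data types for all remaining columns
    ChangedTypes = Table.TransformColumnTypes(
        RemovedColumns,
        {{"SalesAmount", type number}, {"Margin", type number}, {"OrderDate", type date}}
    )
in
    ChangedTypes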

Power Query Best Practice Guidelines

Here is a summary of best practice recommendations:

  • Set EVERY column to the right data type. Watch for untyped columns with “ABC123” in the column heading.
  • Rename all columns that won’t be hidden in the data model (keys and sort columns are OK).
  • Consolidate redundant steps.
  • Rename steps to document transformations.
  • Perform row-level calculations in Power Query instead of DAX calculated columns.
  • Use database views & avoid SQL statements so Power Query can implement query folding.
  • Use parameters & range filters to reduce working set for development, reduce PBIX file size, speed up desktop refresh and reduce time to publish dataset.
  • Use date range parameters in service to expand dataset size during data refresh & optionally implement incremental refresh.

Doing Power BI the Right Way: 1. Futureproofing Power BI solutions

Part of the series: Doing Power BI the Right Way (link)

When starting a Power BI project, you have many choices to make. Decisions like how to source your data, where and how to create queries to cleanse, transform and reshape the data; where and how to create calculations and the nuances of modeling are just the tip of the iceberg. Most Power BI solutions progress through a few stages in their lifespan, which include:

  1. Creating a simple prototype to explore ideas and to get feedback from users and business leaders
  2. Building a working proof-of-concept to frame the solution
  3. Designing a foundational data model to build upon
  4. Completing a production-scale solution
  5. Iterative enhancements & releases

The purpose of this post is to provide some guidance to help you design Power BI solutions that can survive the journey through these stages with as little “throw-away” design as possible. There will always be prototyping and redesign in any project but if you follow proven design patterns and lessons learned from prior experience, you’ll have a much better chance of building a reporting solution that will endure.

Oh, I should mention this… the difference between item #1 (the simple prototype) and #2 (working proof-of-concept) is that you need to throw away the first one – and anything else that doesn’t lay a foundation that you can build on top of. This is a hard lesson but one that will pay off by helping our teams, sponsors and business stakeholders understand that we can proceed after laying down solid bricks (based on well-defined functional and technical requirements) to build the rest of the solution.

Reality Check

If you are using Power BI to create reporting solutions for business users, your circumstances are likely similar to one of these scenarios:

  1. You are a self-service report designer creating a small report solution that will grow and evolve over time.
  2. You work in a software development group in an IT organization with a formal application lifecycle management discipline.
  3. You work in a business unit or small company with an informal software process – and they need your guidance to be successful.

If you are a self-service report designer creating a small report solution that will grow and evolve over time, you likely have a data source that wasn’t specifically curated for reporting and analysis. The data will need to be massaged and shaped before you can design insightful, interactive visuals. That’s cool because we have great tools built into Power BI Desktop to do that sort of thing. You might have some ideas about how to present the data but you’ll probably try different things and see where the journey takes you. These projects tend to have a lot of business impact right out of the gate. Business report users will be delighted with what you can do in a short time with Power BI and they will want more. You’ll get excited and expand the project to include more data or add more dimensions to the data you have. It is almost inevitable that you will experience two common challenges:

  • Scope creep
  • Data quality issues

These are not necessarily “problems” per se but realities to acknowledge. Building a solid, foundational data model is challenging under these circumstances. The good news is that Power BI lets you explore options quickly and allows you to try different approaches while you sort out the formal business and technical requirements. The typical outcome is to start over with a new project where you will be well-prepared to use a cleansed data source and design to a well-defined set of requirements within the new project scope. An experienced BI practitioner will set this expectation with the business and either carefully evolve certain elements into the final project and toss the rest, or use the first attempt as a throw-away prototype and start over. Just know that this is a natural part of the process.

If you work in a software development group in an IT organization with a formal application lifecycle management discipline, you are probably accustomed to using team development, a code repository, version control and differencing tools to manage a shared code base. The main components of Power BI came from a long heritage of IT-centric development, so at its core, objects (like tables, fields and measures) have properties and metadata descriptors that can be versioned, differenced, merged and scripted. But what has made Power BI so successful is that it is primarily designed for self-service reporting. Power BI Desktop is a beautifully streamlined product that packages data transformation queries, model tables, data and reports into a tidy package, but it is not an IT development tool and doesn’t have these capabilities built in – nor will it. That’s not what Power BI Desktop was designed for. The good news is that the community has rallied to develop tools to meet the needs of IT developers who work in teams to develop, test and deploy formal, enterprise-scale BI reporting solutions. Power BI – and the rest of the Microsoft BI platform – can be scripted and automated to create custom deployments.

Right-sizing solutions

Before creating Power BI, earlier versions of Microsoft’s Business Intelligence components existed as separate products. BI reporting projects took many months to design, develop and deploy. By contrast, today you can open up Power BI Desktop and create a quick “report” (PBIX file). In this form, the file contains data transformation queries, an in-memory data model with measures and other calculations, and report pages that include the interactive visuals. This design typically meets the needs of small and intermediate sized projects.

One of the first questions to consider is whether the solution you intend to create with Power BI is a quick informal project or a formal project – or may become a formal project in the future. Informal projects don’t necessarily have to conform to all the design best practices. They are limited in scope and typically meet a very specific and short-term need. Formal projects are built on best practice design standards to endure a certain degree of expanding scope and are designed for longevity. They have a business sponsor or champion who drives the business requirements and at least one developer who is responsible for designing a solution that meets those requirements. Formal projects are typically tested, signed-off and then deployed in accordance with an established development discipline or application lifecycle process so they can be supported by the business. If your organization has a process for delivering and supporting company-sanctioned applications and reports, then you will need a roadmap to transition from an informal project to a solution that the business is prepared to support. A few common questions that come up in these discussions are:

  • How is the source data procured and maintained?
  • Who is responsible for maintaining that data?
  • Who from the business defines the reporting requirements and signs-off on Power BI reports meeting those requirements?
  • Who owns the development of the Power BI solution (data model and reports)?
  • Who is the developer’s backup if they become unavailable?
  • Are the requirements and project assets sufficiently documented to support such a transition?
  • What are the security requirements?
  • How, and by whom, will users be given access to reports or dataset(s) for ad hoc analysis?
  • Are users assigned to report and dataset permissions directly or through group membership?
  • Should users have conditional or restricted access to data within the Power BI dataset?
  • What is the process for requesting new reporting features, adding features to the existing solution, testing and deploying updates?

Personal BI projects, where a data analyst imports data, creates reports and then publishes a report for their own use, serve an important need in business. These reports can easily be promoted to a departmental or business-area solution by sharing the report with others. As the scope of the project continues to expand, there are several important considerations that define the threshold between an informal project, designed for quick development and convenience, and a formal project, intended for longevity and long-term business support. For each project component, I start with a typical informal project scenario and then provide recommended practices to plan for futureproofing.

Planning for separation

Designing transformation queries, the data model and reports in one PBIX file is convenient and offers quick troubleshooting, debugging and redesign. However, it also limits development to one person at a time and promotes a monolithic solution. Separating the data model from the report files helps to promote self-service and focused report design. The certified or promoted dataset is deployed, secured and managed as a separate object. Reports connect to the published Power BI dataset using a live connection, and any number of reports can connect to a published dataset.

This pattern supports both fully IT-managed projects with multiple developers and self-service report design using certified and promoted datasets. After the data model design is stabilized and locked down, report designers simply focus on report creation and data analysis by connecting to the secured dataset. Any tools and products capable of connecting to Power BI or SQL Server Analysis Services (SSAS/AAS) can be used for reporting and ad hoc analysis.

Source data

Spreadsheets and files extracted from reports, manually manipulated using Excel or other line-of-business applications have limited utility as reliable data sources. Manually preparing spreadsheets and source data files is only feasible for small sets of data and one-time or manual data loads. An analyst user can easily remove, rename or add a column or formula to a sheet that will break or negate the import process.

Futureproofing: Data from source systems can be loaded into a managed database or data lake before using Power Query to load the data model. Data from files (if carefully protected and maintained) may be transformed, cleansed and validated before it is ready to be loaded. Using “brute force” transformation steps to replace values and cleanse records one step at a time can be slow and difficult to manage as complexity and volume increase. Reduce the number and complexity of query steps and apply the following recommendations for Power Query design optimization.

Use parameters in Power Query to manage any variable information such as database servers, database names (e.g. DEV database, QA database & PROD database) and file paths.
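As a simple sketch, assuming two text parameters named SourceServer and SourceDatabase created through Manage Parameters, the connection step references the parameters instead of hard-coded values so the same queries can point to DEV, QA or PROD:

let
    // SourceServer and SourceDatabase are text parameters defined in Manage Parameters
    Source = Sql.Database(SourceServer, SourceDatabase),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data]
in
    FactSales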

Planning for data volume

In smaller, informal projects, we typically design for a limited set of data and a fixed time-frame. If manageable, all production data can be loaded into the working PBIX file on a developer’s desktop.

Futureproofing: When working with large data volumes, a subset of data should be loaded into the desktop dataset for development. Query parameters are used to expand the number of records (usually using a date range) loaded after the dataset is deployed for production use. Groups and ranges of data records, often partitioned by months or years, are defined and stored separately so a large volume of records doesn’t need to be reloaded along with new records.

Power Query design practices

Using the robust Power Query design user interface (ribbon and menu options), you can build sophisticated and complex transformations that work well with moderate data volumes. With a small number of query steps, default names like “Rename Columns” and “Change Data Type” are fine, but it can be difficult to trace the steps and remember where and why you performed each step.

Futureproofing: Tables and fields should be meticulously renamed to use standard friendly names, categorized and optimized with appropriate data types. Unused columns are removed to save space and improve performance. Spend extra time to get the field names right before creating dependent steps. (see Chris Webb: Names, Tables, Columns & Measures In Power BI)

  • Consolidate redundant query steps – like those used to rename columns, remove columns and change data types – into a single step in the query.
  • Attribute fields needed for grouping and filtering should be moved to separate dimension tables. Dimension unique keys are defined with corresponding fact table keys.
  • All numeric columns used for aggregation (which share the same level of grain) should be extracted to fact tables, with corresponding dimension keys.
  • Row-level calculated columns should be defined either in database views or in Power Query tables rather than as DAX calculated columns (see the sketch after this list).
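For the last point, a row-level calculation in Power Query is just an added column computed during load, as in this sketch (the table and column names are placeholders, and SourceServer/SourceDatabase are the hypothetical parameters shown earlier):

let
    Source = Sql.Database(SourceServer, SourceDatabase),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],

    // Row-level calculation performed during load, instead of a DAX calculated column
    AddedExtendedPrice = Table.AddColumn(
        FactSales,
        "Extended Price",
        each [OrderQuantity] * [UnitPrice],
        type number
    )
in
    AddedExtendedPrice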

Promote query folding by using database tables, views or table-valued user-defined functions in the database rather than writing SQL statements in Power Query. Test and optimize queries to support query folding by checking to see if “View Native Query” is an available right-click menu option on each query step. Power Query also doesn’t work well with stored procedures.

Data loading & refresh management

For large fact tables, design them to be “tall” rather than “wide” by removing unneeded columns or moving them into dimension tables. Tall tables should only include columns that compress efficiently, such as numeric values and short, non-distinct text values. Do not store a single unique key column in a fact table.

If large tables take too long to load or cause the PBIX file to exceed a reasonable size for development (no more than 500 MB), use parameters to limit the range of records. To manage the table size once deployed to the service, do one of the following:

  1. Use start and end date range parameters to filter a Date or Date/Time type column in the table. Use this to limit the working set on the development desktop and then to expand the range of records in the service.
  2. Similar to step one, use a pair of Date/Time parameters with a range filter on a Date/Time type column and follow the product documentation to implement incremental refresh. This will automatically partition the table records on specified calendar date parts (e.g. weeks, months, years) and then only load changes.
  3. Devise a table partitioning strategy and implement partitions using Tabular Editor. Use PowerShell or the XMLA read/write endpoint to schedule and execute TMSL or XMLA script to manage partition processing.

Data modelling

Many modelling techniques that perform well when tested on small data sets will slow down with large tables. These include relationships on bridging tables, many-to-many relationships, bi-directional relationship filters and multi-measure layered calculations.

Avoid using SWITCH statements over already complex DAX expressions or referenced measures. Avoid using iterator functions (like SUMX) with complex row-by-row predicate logic. Separating calculation logic into separate measures is a useful design pattern to segregate business logic but it can create performance bottlenecks. Identify slow measures and consider building streamlined measures using alternate patterns when bottlenecks are discovered.

Import mode tables are generally faster and more flexible than DirectQuery mode tables. However, DirectQuery is a good option for very tall transactional tables where simple aggregations are used rather than complex DAX measures. If Import and DirectQuery mode tables are both used in a composite data model, consider creating a summary table using Import mode and a detail table using DirectQuery, and then define aggregations to cache certain query results in memory (to compensate for the normal performance degradation resulting from query translation in DirectQuery).

There is so much more to discuss at this point but I’ll need to address additional topics in later posts in this series titled “Doing Power BI the Right Way”. Topics include:

  • Service vs on-premises
  • Interactive vs operational reports
  • Spreadsheet reporting
  • Iteration vs redesign: When to start over
  • Designing for scale

Power Query or Power BI Dataflows

Multi-tier and advanced self-service solutions can be architected by further separating the transformation queries from the data model using Power BI dataflows. These are server-hosted Power Query/M queries that are both designed and managed in the Power BI service.

Dataflows expose several integrations with the Microsoft Azure cloud services, such as the Common Data Service and Azure Data Lake Gen2 Storage. Dataflows also have integrations with AutoML machine learning models.

The advantages and disadvantages of dataflows should be carefully considered before jumping in head-first. Power Query in Power BI Desktop is a fast, rich query development environment that provides nearly immediate feedback. By contrast, the browser-based dataflows designer is similar in function but slower and has a less robust query design feature set. Some developers choose to develop Power Query in Desktop and then port their M query code to dataflows as needed.

Stick around. There is a lot more coming in this series in the months ahead:

Doing Power BI the Right Way

  1. Futureproofing Power BI solutions
  2. Preparing, shaping & transforming source data
  3. Power Query design best practices
  4. Power Query in dataflows or Power BI Desktop
  5. Data modeling essentials and best practices in Power BI and AS tabular
  6. Validating data model results
  7. Planning for separation – data models and reports
  8. Power BI delivery options
  9. Choosing the right report type: analytic or paginated
  10. Designing and managing large datasets in Power BI
  11. Multi-developer and lifecycle management for Power BI
  12. Certified reports, certified datasets & the self-service mindset

To wrap up this edition of the series, I will leave the rest of my outline points as teasers for subsequent posts. Please freely comment with your thoughts about what to include in these later editions.

The rest of these are random thoughts left from the outline for the series. Feel free to comment.

Future topics:

  • Define the technical audience
  • Establish solution ownership
  • Continuation plan (if they were to leave?)
  • Who will maintain & manage the solution after deployment?
  • User access
  • Version updates
  • Support

Know and understand the business audience

  • Report/Dashboard Consumer
  • Skilled Data Modeler
  • Analyst, Data Scientist
  • IT BI Developer
  • Planning for capacity
  • Certifying reports and datasets: Gaining business trust
  • Validating results
  • Team collaboration
  • Project file sharing
  • Promote continuous improvement
  • Version management
  • Managing deployment
  • Power Query and Dataflows
  • Promote and support repeatable design
  • Templates and layouts
  • Planning for security
  • Access to reports & dashboards
  • Publish to web
  • Embedding to a business portal
  • Embedding for external access
  • Access to dataset(s) for ad hoc analysis and future report development
  • Conditional data access / role-based access / row-level security

Power BI External Tools: Reading the Tea Leaves

I’m in the midst of writing another post in the series “Doing Power BI the Right Way” that I started last week. However, this topic is super important so I wanted to pause and spend just a few minutes to share some thoughts. Why is Microsoft (particularly, the Power BI product team) promoting the use of tools that they neither develop nor officially support?

This week, Christian Wade from Microsoft announced on the Microsoft Power BI blog that a new “External Tools” ribbon was added to the preview features of Power BI Desktop in the July update. In earlier posts, Christian has promoted community-developed tools such as Tabular Editor, DAX Studio and the ALM Toolkit that can be used as replacements or extensions to the Power BI development experience. These three tools are automatically added to the ribbon by default if they are installed, but you can add your own applications by following the instructions in this Microsoft document titled Using external tools in Power BI.

What are these three “promoted external tools”?

Tabular Editor is an open source application developed primarily by Daniel Otykier. The source code resides in GitHub where multiple community members have contributed to the project. Likewise, DAX Studio is an open source project headed up by Darren Gosbell. The ALM Toolkit for Power BI is an evolution of an earlier Visual Studio extension project from Christian Wade called the BISM Normalizer, used primarily to compare and merge changes between SSAS/AAS Tabular projects and databases. Working with MAQ Software, he evolved the code into a stand-alone application which is optimized for Power BI data models. Each of these tools is free and supported by the community, but not officially supported by Microsoft.

There is an important message here

I use all three of these applications regularly. Before the new ribbon, I would open my PBIX file in Power BI Desktop so that the data model was loaded into memory. Then, I would open the needed utility application from the Windows program menu, and then connect to the Power BI data model using the local port address. The new ribbon conveniently allows me to open the installed application and automatically connect to the data model I have loaded. Honestly, it only saves a few steps but the presence of the new ribbon sends an important message.

What does that mean, exactly? Unlike many other vendors, Microsoft has a strong and thriving community who promote and support their products. Much of this is simply volunteerism backed by Microsoft’s MVP, Partner, user group and Certification programs. I typically don’t contact Microsoft product support unless I have a bona fide show-stopping error but I’m certain that if you were to open a support case and tell them that DAX Studio is acting up, they would send you to the GitHub repo and suggest that you chat with other community members.

I worked with a consulting client who had a large investment in database and BI tools from Oracle, IBM and MicroStrategy. As we got started on a Microsoft BI project, I informed IT operations that all of the developers would need to install and frequently update several software tools. Any software needed to be added to the list of corporate-approved software, which is referenced by the outsourced support helpdesk to remotely install software for an employee. For each application, we needed to provide the specific product version, licensing agreement and cost, service level agreement and support agreement with the software vendor. Software could only be installed from a network file share and not over the Internet. There was a form and a formal approval process that didn’t support my suggestion that open source tools be installed and that the free, officially-supported applications needed to be updated every month. Altering the approval process required multiple meetings and escalations to top-level leadership who had never heard of a vendor promoting “community-supported” development tools; entertaining such a strange notion was a big paradigm shift – but it is our reality and one of the reasons that Power BI is a successful platform.

External Tools: What Does This Mean?

Microsoft has several forums for product users and partners to make suggestions and to ask for new features. If you search Ideas.PowerBI.com, where any user can ask for new features, you will see many, many requests to add IT/enterprise features to Power BI Desktop such as version control, multi-developer support, partitioning and other capabilities that already exist in Analysis Services projects. Adding all these things to Power BI Desktop would clutter the product and defeat the goal of keeping self-service model/report development simple. For the IT BI developer, we have Visual Studio extensions for data model and paginated report design, but the process and overhead required to maintain those project templates and extensions for different versions of Visual Studio is complicated to say the least (see my previous post “I am done using Visual Studio for BI data model development“).

Bottom line

Community-supported development tools are an essential part of the BI development experience. These are necessary utilities to have in your toolbox if you are a serious BI developer. The Microsoft product team not only tolerates these tools but relies on them, and promotes them in features like the new External Tools ribbon so that we have best-of-breed tools to fill gaps as the platform continues to expand.

Doing Power BI the Right Way

This is an introduction to a series of posts and pages that will provide a comprehensive set of best practices for successful Power BI solutions. In previous posts, I have asked readers to suggest topics for future posts. Based on that and other feedback, I will be addressing questions and suggested topics. The topics list at the end of this post is a brainstorm list and I ask that you help me make sure it is complete. My goal is to provide a set of guidelines and practices that provide the best chance of success as you navigate many decisions about how to stage and transform source data, how and where to perform data shaping and calculations, how to model data and the best way to visualize the results. The biggest question of all may be how to make the right decisions so that the small project you design today will work when you add more data, more users and transition into a formal, managed solution.

There are many fantastic resources to learn about Power BI and the rest of the Microsoft BI and reporting platform; but learning about Power BI and choosing among design options can be like drinking from multiple firehoses at full pressure at the same time. I will be the first to admit that my “best practices” are “my opinions”. In many cases they work consistently for me but the points are debatable and open for discussion. I’ll tell you when I have a very strong opinion about something being done a certain way, or when I have found a pattern that works for me and that I offer for your consideration. I’m not always right… just ask my wife :-). Please comment, ask and offer alternative points of view.

Rather than offering another training course or duplicating the contributions that so many others in the industry make through their blogs, courses, books and articles; this will be a condensed set of guidelines about the many choices you must make when designing a solution. In the posts to follow, I will reference other resources and discussions on various topics.

Best practice guidelines topics

The following topic list will serve as a link menu for future posts. Expect this list to be updated and completed:

  1. Futureproofing Power BI solutions
  2. Preparing, shaping & transforming source data
  3. Power Query design best practices
  4. Power Query in dataflows or Power BI Desktop
  5. Data modeling essentials and best practices in Power BI and AS tabular
  6. Validating data model results
  7. Planning for separation – data models and reports
  8. Power BI delivery options
  9. Choosing the right report type: analytic or paginated
  10. Designing and managing large datasets in Power BI
  11. Multi-developer and lifecycle management for Power BI
  12. Certified reports, certified datasets & the self-service mindset

Just tell me what to do

Any attempt to apply universal best practices to Power BI solution design is a slippery slope. The tools are so flexible and powerful, and the requirements of each project are so varied, that it is challenging to establish a set of steps or rules that, if followed, will always yield the absolute best design for a given scenario. With that out of the way, I’ll say this: In my job, I see a lot of poorly-designed Power BI projects. I’ve worked with dozens or scores (maybe even hundreds?) of consulting clients who bring us projects – some partially completed, some finished, and many that are just broken – to be fixed or completed. My reactions range from “that’s just downright wrong” to “hmmm… I wouldn’t have done it that way but I guess it will work for the time being”. I try not to cast stones and do, on occasion, realize that others have found a better way to solve a problem. I don’t have all the answers but I do have a lot of experience with Microsoft Business Intelligence solution design, and have learned many good practices and design patterns from other community leaders and many successful projects over the past twenty or so years.

A little less conversation and a little more action

Let’s start with a simplified flowchart and condensed decision tree. This first whiteboard drawing is the first half of the Power BI design process, ending with the data model, before measures, visualization and solution delivery. There is a lot more but I think this is a good starting point.

I Am Done Using Visual Studio …for BI data model development

For several years, Visual Studio has been my go-to tool for designing semantic data models used for Business Intelligence reporting. Back in 2005, I used the Business Intelligence Development Studio (BIDS) Visual Studio add-in for SSIS, SSRS and SSAS projects to develop BI solutions with multidimensional cubes. In 2012 when Microsoft began the transition from on-disk cubes to in-memory SSAS Tabular models, I used SQL Server Data Tools (SSDT) to create tabular models. It was a rocky road at first. The Tabular designer was fragile, to put it mildly.

Enter Power BI… Initially intended for self-service data model and report design, Power BI Desktop has quickly grown into a robust and full-featured BI design tool. Not only does Power BI Desktop include a lot of great features, it is stable and streamlined. It’s a joy to use compared with my early experiences using SSDT for tabular model design. I prefer to use Desktop to perform model design. It’s faster, more convenient and just easier than SSDT. However, at some point in the life of a project, it just makes more sense to transition the data model to an enterprise-scale effort.

Now, before anyone has a chance to comment and say “Paul, what the #$@! are you thinking? Visual Studio is an essential tool and there are certain things that you can’t do without it!”, I agree and will continue to use SSDT for a few key features. So, yes, I am not absolutely done using Visual Studio: I will still use it for managing projects other than SSAS, and perhaps for code check-in …I’ll finish this part of the story in a bit.

I want to be clear – I love Visual Studio.  It’s a great product for developing software and a variety of business and data solutions. However, history has demonstrated that the notion of stitching together several different products and expecting them all to just work together seamlessly is just untenable. Without getting into all the reasons that it has been difficult for Microsoft to develop and maintain a rock-solid tabular model design add-in for Visual Studio, contrast that effort with the evolution of the Power BI product.  The Power BI product team is entirely focused on developing one product by a development team under unified leadership, with a focused set of objectives. Negotiating co-development of any product by several different teams is difficult within any organization, particularly one as large as Microsoft. The reason that new features can be added weekly to the Power BI service and monthly to Power BI Desktop is that one product team manages all those features.

Some of you will remember the time when the Business Intelligence message from Microsoft was that we were supposed to create solutions relying on coordinated components of many products like SQL Server (relational, SSIS, SSAS and SSRS), Windows Server, SharePoint and Office – all orchestrated to work together seamlessly. It was a good idea – and still is in moderation – but this approach produced a delicate and complicated beast that was difficult to manage and had many potential points of failure.

One of the reasons Power BI Desktop is such a wonderfully streamlined product is that the feature set is optimized for data analysts and not for IT developers. To maintain a streamlined product, we are not at all likely to see enterprise capabilities (like version control, multi-developer code merging and scriptable objects) added to this product. These capabilities do exist, however, for Analysis Services projects and in community-supported tools like Tabular Editor and DAX Studio. But now (drum-roll, please) Power BI datasets can be developed and deployed to a workspace using enterprise tools through the magic of the XMLA endpoint.

The Visual Studio/SSDT Quandary

<RANT>

Call it a learning disability, but I have tried time and time again to use the Visual Studio tabular designer to manage SSAS projects, with the same outcome. Smallish demo and POC projects go well, but not so much when tackling the complexities of production-scale design. I guess it is just my natural optimism to hope things work out better than they did last time, but the laws of the universe dictate that if you do the same thing, history will be repeated.

Here’s how it goes… I start developing a data model in SSDT by importing some tables and queries, and adding relationships and measures. All good, right? At this point in the timeline, I often convince myself that the development environment is stable and that everything will work out, so I forge ahead, believing that all will be good. I then add some more tables and a whole bunch of new DAX calculations – and soon everything goes to hell. The model designer stops responding or behaves sporadically, Visual Studio crashes, the model definition file gets corrupted and then I remember that I’ve been down this dark road before.

Recounting the painful past, it is frustrating to open a support ticket and explain to the engineer that “sometimes when I do that, this happens but not always” and “in all the confusion, I really can’t remember exactly how I got to this state.”

Pondering these memories, I draft a blog post titled “I Am Done Using Visual Studio”.

</RANT>

I sincerely appreciate the efforts of Kasper DeJonge from the SSAS product team back in 2012 as we spent hours in remote meetings trying to reproduce various odd behaviors in the tabular designer with a large data model. The fundamental problem was that the Model.bim file, which defines all the objects in the data model, was an enormous XML document (ours was approaching 100,000 lines.) Every change in the designer required the entire file to be re-written to disk and then loaded back into memory. Things improved significantly in 2016 and 2017 when the model definition was streamlined using JSON rather than XML, and the file structure was simplified to reduce the file size. Similar meetings with several other product leaders have proven that the product team is seriously dedicated to optimizing the enterprise tabular model experience.

I’m all about solutions and not just ranting about problems. So what’s the answer? How should we manage enterprise BI data models and Power BI solutions from now on? Using Tabular Editor alongside Visual Studio is really a best-of-both-worlds experience. You can open the Model.bim file stored in the Visual Studio SSAS project folder.

Tabular Editor

Tabular Editor is a superb tool for developing and managing tabular data models for SQL Server Analysis Services (SSAS), Azure Analysis Services (AAS) and Power BI. It is a community supported tool created by Daniel Otykier, a Microsoft MVP and Senior Business Intelligence Architect with Kapacity.dk in Denmark. The most comprehensive resource to find this and other community supported BI tools for the Microsoft platform is on the Italians’ site at SqlBi.com/Tools

If the project is under source code control, changes made with Tabular Editor will be detected and can be synchronized with the remote source repository from Team Explorer in Visual Studio.

Here are some quick facts and recommendations:

Power BI and Version Control
Power BI Desktop files (.PBIX) do not support version control and code merging.
Recommendation:
Don’t try to do this – It will turn out badly. Starting model design in Power BI Desktop will save you time but once you transition to the Model.bim file format, use Tabular Editor.

Separating Reports and Dataset
A monolithic PBIX file created with Power BI Desktop, containing reports, data model and queries, is simple and easy to manage until you need to move beyond several limitations that this imposes.
Recommendation:
Power BI reports and datasets (data models) should be managed separately in all serious projects. Period. …whether you need to transition the data model to Model.bim or not.
Separating Power BI reports from the data model/dataset has many advantages which include allowing report and data model development to be performed in parallel and by different team members. This is an absolute must to create a certified dataset for users to connect and do their own reporting and analysis.

Visual Studio and Version Control
Visual Studio SSAS projects allow you to manage version control.
Recommendation:
This is a good thing. Keep doing this but use Tabular Editor as your primary model design tool.

A data model stored as a Model.bim file can have changes compared, split and merged between data model version files, deployed AS databases or Power BI datasets. Manage integrated source control with Azure DevOps or GitHub. Check in changes, branch, merge, push and pull changes made by other developers, but don’t use the Visual Studio Compare tool. Dependencies within the model definition can easily get you into trouble.

Tabular Editor
Tabular Editor is a far superior design experience than Visual Studio. It is streamlined, easy to use and it won’t blow up when writing measure calculations. You can switch back and forth between tools since each tool has features that the other lacks. Just be sure to save and close the model file with one tool before opening it in the other …AND MAKE BACKUPS! The more that I do this, the more I prefer using Tabular Editor.

Tabular Editor doesn’t have a graphical model designer so I prefer to use Visual Studio to model tables and relationships. Set table and column properties, create calculated columns and measures, manage partitions and other tabular model features in Tabular Editor.

Start in Power BI Desktop and Migrate to an Enterprise Model
From Power BI Desktop, save the file as a .PBIT (template), which can then be opened in Tabular Editor. Once you save the file to the .BIM format, this is a one-way trip since an enterprise model cannot be saved back to a PBIT or PBIX file. Of course, if you start designing the data model in Visual Studio, there is no need to resave the model. You can just use Tabular Editor to make new model enhancements.

Power BI Premium, AAS & SSAS
Power BI Premium capacity is required to deploy a Model.bim file as a dataset in a Power BI workspace.

Power BI Premium capacity is the E-ticket and is the best way to access all Power BI enterprise capabilities. With the XMLA endpoint, it will make more sense for most customers to deploy enterprise data models as datasets to a workspace rather than Azure Analysis Services (AAS) models.

Future-Proofing Data Models

If your organization chooses not to use Premium, which might be a more economical choice until you have enough licensed users to justify the cost, you can use AAS data models or on-prem SSAS data models to exceed the capacity limits of Power BI datasets under only Pro licensing.

If industry certification or compliance restrictions prevent your organization from using Power BI in the cloud, using enterprise models is the norm. Use a local instance of SQL Server Analysis Services Tabular. If you move to the cloud in the future, the transition should be relatively simple.

Keep in mind that Premium Capacity, or perhaps a more attractive licensing alternative, may be in your future so architect solutions in such as way that you can easily make that transition.

Demystifying the Power BI XMLA Endpoint

When the Power BI product team began promoting the new XMLA endpoint connectivity for Power BI, I thought that this could be a game changer for IT-driven, enterprise-class BI solutions. Now that I have used it in a few real solutions and am actively working with clients, I’m sharing my experience about how it works and what you can do with it. The read-only endpoint has been in GA for Premium capacities and the read/write endpoint is currently in preview.

Before my long-winded introduction, I’ll get to the point:
Using the XMLA endpoint, the Power BI Premium service now includes the capabilities of SQL Server Analysis Services and Azure Analysis Services combined with the newer data modeling capabilities of Power BI. Data models published to the Power BI service now support version control, scripted builds and team application life cycle management, enterprise IT tooling and scripted object management.

…wait, what? How?

Let’s just start by reviewing some of the challenges that have existed in the Power BI platform prior to the availability of this capability:

  • SSAS/AAS enterprise features
    Buried deep within the Power BI cloud service is the SQL Server Analysis Services (SSAS) Tabular model “Vertipaq” in-memory analytics database engine. The SSAS engine & on-prem product itself has many useful features not exposed in the Power BI implementation of data models and datasets. These “enterprise class” features are numerous, including object-level scripting for deployments, source control, application life cycle, continuous integration and build management, data partitioning, perspectives, translations, currency conversion, KPI definitions, measure groups and calculation groups.
  • SSAS/AAS redundant services & costs
    At one point, Azure Analysis Services was a superset of data modeling features, but now many exclusive features are available in Power BI datasets in a Premium capacity workspace, so this choice isn’t so clear. Power BI and AAS also have separate costs. As the Power BI service continues to evolve, many new and compelling features are available only in Power BI datasets and not in Analysis Services; such as hybrid mixed-mode data models, aggregations, incremental refresh policies, dataflows and AutoML.
  • Source control, Application life cycle, builds & Team data model development
    Power BI Desktop is a convenient tool for creating an entire Power BI solution from soup to nuts but isn’t optimized for IT scale projects. A single PBIX file contains connections, queries, data model objects, measures and report pages. A PBIX file structure cannot be easily parsed, compared, branched or merged by established source control and build management tools. Analysis Services projects on the other hand support Microsoft development tools like Visual Studio, Azure DevOps and SQL Server Management Studio. Several mature third-party development tools like Tabular Editor, DAX Studio and the ALM Toolkit enhance or exceed the features of Microsoft’s development suite.

The XMLA endpoint in Power BI Premium bridges this divide by exposing the underlying Analysis Services database instances in the Power BI service. It really is as simple as that. Each workspace is in fact an SSAS instance and each Power BI dataset is an SSAS database. What you can do with traditional SSAS through management and development tools, TMSL or XMLA script, PowerShell or API automation; you can do with Power BI.

Use a Premium capacity workspace

If you have Premium capacity set up in your tenant, you are good to go. Several of our consulting clients do, but to experiment and test new features, I use my own private tenant. In lieu of paying for Premium capacity, I can set up an embedded capacity in the Azure portal. Sizing the embedded capacity to A4 is equivalent to a P1 Premium capacity, but it can be paused when I don’t need to use it. The cost is about $8.50 per hour so MAKE SURE TO PAUSE WHEN DONE.

Here’s the Power BI Embedded capacity I created in my Azure tenant, sized to A4 and currently paused. It takes about a minute to start or pause the embedded service.

After starting the capacity, I jump over to the Power BI Admin Portal and click the capacity name to change settings. Note that because my capacity was added as an embedded service, it shows up in the Power BI Embedded page but the settings are the same for a Premium capacity.

Enable the Read/Write endpoint

By default, the XMLA endpoint will be set to Read Only. Under the Workloads group on the Capacity settings page, switch XMLA endpoint to Read Write and then save the changes.

Now any workspace set to use Premium capacity can be accessed with an XMLA endpoint address. I’ve created a test workspace in my tenant to test migrating an SSAS tabular database to a Power BI dataset. I select the workspace in the Power BI Portal and, in the workspace settings, make sure that Dedicated capacity is switched on. I know that it is because of the diamond icon next to the workspace name. The Workspace Connection address is shown below. Click the Copy button to copy the address to the clipboard.

Now, I can use that address in any tool that knows how to connect to SSAS tabular in compatibility level 1450 or above. Let’s try to connect using SQL Server Management Studio. I need a newer version of SSMS, 18.4 or higher.

Connect to SQL Server Analysis Services and paste the XMLA endpoint address for the Server name. You need to use Azure Active Directory authentication. If your organization uses MFA, you can use that option but I will choose regular AAD authentication.

…and voilà! Connected.

I can run queries and scripts here in DAX, MDX, XMLA or TMSL. Most but currently not all capabilities are supported in the preview. In particular, if you have RLS roles, the members must be dropped and then added back in the Power BI Portal.

So far, I have scripted existing Power BI datasets and migrated them to Analysis Services projects in Visual Studio, and then deployed to a new dataset from Visual Studio. The learning here is that migration is a one-way street from Desktop to Visual Studio. Whether development starts in Power BI Desktop or Visual Studio, there is no going back to Desktop. Ongoing development must be in Visual Studio.

Definitions: “Report”, “Database” and “Dataset”

In the self-service Power BI world, the term “Report” has been used to mean at least two different things. With the queries and data model managed separately and deployed as a Power BI dataset, the term “Report” in this context means only report pages with visuals (don’t get me started talking about “dashboards”). A data model originating from a Power BI Desktop file is published as a dataset. Now that we see these objects through the lens of Analysis Services, a Power BI dataset is a Database.

In migrated projects, continue to author and manage reports in Desktop connected to the deployed dataset as an Analysis Services connection. You can actually switch the connection between an SSAS instance, an AAS instance or a published dataset using the XMLA endpoint address. I usually use query parameters for the Server and Database to easily change these values.
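In M terms, the parameterized connection pattern looks like the following sketch (this assumes two hypothetical text parameters named AnalysisServer and AnalysisDatabase; a live connection manages the same values through the connection dialog rather than an editable query):

let
    // The server value can be an SSAS instance name, an AAS URI
    // or a Premium workspace XMLA endpoint address
    Source = AnalysisServices.Database(AnalysisServer, AnalysisDatabase)
in
    Source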

There’s a lot more that I will need to cover in later posts, but I’ll mention a few things briefly.

Best practices: After working with the XMLA endpoint, the choices now seem pretty clear to me but there was some confusion until I got to that point. Best practices will continue to emerge. In light of this and other recent announcements, I can say that I have a much clearer vision for how to plan and manage solutions (and to delineate between self-service and enterprise BI projects) than I did a year ago. If you have questions, please post them in the comments and I’ll do my best to address them in future posts.

Combining & querying datasets: The ability to use the endpoint to query one data model from another model enables some very compelling composite model scenarios – but planning these solutions is important.

Large models & storage: The size and scale limitations are similar to those in AAS and generally only limited by your chosen Premium capacity. Since models are compressed and typically only include column data needed for analytic reporting, it is unusual to see datasets larger than a few gigabytes but Premium will support model sizes up to 400 GB.

By default, datasets deployed using the endpoint are stored in the Azure data centers using single file storage. This is fine for small models but after creating larger partitioned models, using large file storage will improve development and deployment performance by managing objects in multiple files on the instance backend. There will eventually be a UI for this setting but it is currently available only through the management API or via PowerShell.

COVID-19 Three Day Change Report

In my ongoing quest to present the Coronavirus data from the CDC and WHO in useful ways, I have created another set of report pages. There are two primary pages that show all countries with the percent of change in non-recovered cases over the past three days. Our previous reports provided a lot of information for deep analysis but it wasn’t so easy to make quick comparisons and to see how a state or country was progressing. This latest report connects to the same dataset but will let you see how the numbers have changed in the past three days across countries and states.

Click the following screen image to view the publish-to-web report. Just right-click a country to drill-through and view state/province three-day changes. The CDC and WHO provide state & province case data only for the United States, Canada and Australia.

I’ll update the GitHub repository and the template app with these additions as well.

Non-recovered cases are the number of confirmed cases less the number of recovered cases. The “Confirmed Cases” metric used in reports we see in most news programs and websites is the all-time accumulated total for a country or state. Attempts to report only the number of active cases has been inconsistent across geographies. This method to calculate the recovery rate seems most reliable at the country level but we don’t yet have consistent recovery numbers at the state and county level in the unified data set.

Roadtrip, Climbing Case Numbers and Public Perception

Having this information at our fingertips makes a huge difference. My wife and I are currently on an unplanned road trip attending to some urgent family business. In light of the outbreak, we are driving about 3,500 miles rather than flying. Driving through Oregon, Idaho, Utah, Arizona and New Mexico, it is apparent that the situation is perceived very differently in different areas. For example, the Navajo reservation is observing a curfew and no one is allowed outside their homes under penalty of law. In New Mexico, absolutely everyone must wear a mask to go into a store. But in other places we have traveled through in the past two days, it is life as usual with few or no restrictions – even though the number of confirmed cases has risen by about fifteen percent in the past three days. Here in New Mexico, cases have increased by about 28%, so we will be wearing our masks and cautiously practicing social distancing.

Countries with Improved COVID-19 Recovery Rates

Is there any good news in the COVID-19 Coronavirus data? Everyone is familiar with the tragic, growing number of cases and fatalities globally and in most countries with large populations. However, several countries have increasing recovery rates. You can click here to view the new Power BI report shown in the image below.

The experts keep talking about looking for “the peak”, when the number of people recovering from infection increases faster than new cases. We’re not there yet, but where is the recovery rate increasing? The “Recovery Rate (3 Day Change)” measure compares the Recovery Rate (i.e., the number of recovered cases divided by the number of confirmed cases) on the most recent report date with the rate from three days prior. This report ranks countries by this measure.
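As a quick sketch with made-up numbers (again, in the actual report this is presumably a DAX measure), the calculation works like this:

  # Hypothetical numbers for one country - not real data
  $confirmedToday    = 20000;  $recoveredToday    = 6000
  $confirmed3DaysAgo = 18000;  $recovered3DaysAgo = 4500

  # Recovery Rate = recovered cases divided by confirmed cases
  $rateToday    = $recoveredToday / $confirmedToday                # 0.30
  $rate3DaysAgo = $recovered3DaysAgo / $confirmed3DaysAgo          # 0.25

  # Recovery Rate (3 Day Change) = today's rate minus the rate from three days prior
  $rateToday - $rate3DaysAgo                                       # 0.05, a five-point improvement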

On the left side are the countries ranked highest in improved recovery rates over the past three days (pictured as of April 10). The line chart on the right shows the top 20 countries with their Recovery Rate trends over the past 30 days.

Among these are countries with small populations or a small number of confirmed COVID cases, so even a small number of recovered cases yields a high ratio. To remove some of these outliers, the report page only includes countries with 100 or more cases. Some may have had a high infection rate that has since slowed, or there may be a lag in the available recovery data. This report uses the dataset from the main COVID-19 Daily Updates report but focuses just on recovery rates.

By the time you read this, the top countries ranked by improved recovery rate may have changed. Right now, these countries have the highest increase in recovery rates. Why is this, and what can we learn from these results?

  • Djibouti
  • Algeria
  • South Africa
  • Guatemala
  • El Salvador
  • Austria
  • Australia
  • Germany
  • Cambodia
  • Azerbaijan
  • Brunei
  • Iraq
  • Venezuela
  • Iceland

I Need Your Questions about Power BI and Paginated Reports

For a new series of posts that will include tutorials and “how to” demonstrations for Power BI, SQL Server Reporting Services (SSRS) and Paginated Reports, I need your help to know which questions are most compelling and top of mind.

Please scroll to the Comments section at the bottom of this post and help me create a list with your questions and suggested topics.

What questions do you have and what do you need to learn how to do?

Make suggestions and cast your votes on others’ ideas for video posts, demos and tutorials you would like to see. I will add these to a list and work through as many as I can. The following are some rough topic areas and ideas to get started:

Power BI Solution Questions:

  • Power Query techniques
  • Data modeling techniques
  • DAX calculation techniques
  • Visualization best practices
  • How to design a specific report feature
  • Of the many choices or approaches to address a business need, which is a recommended practice?

Paginated Reports & SSRS Questions:

  • How do Paginated Reports work in the Power BI cloud service?
  • How are Paginated Reports and SSRS licensed from Microsoft?
  • When should I use Power BI vs SSRS or Paginated Reports?
  • Navigating between different report types