Power BI Data Modeling Sessions

This is going to be a very busy week for presentations: I have five sessions scheduled on the topic of data modeling!

Data modeling is the heart and foundation of a useful Power BI solution. If you can get the data model right, most everything else falls into place. Correct data modeling is relatively simple on the surface but can be challenging in practice. As evidence of the interest in and importance of this topic: I submitted abstracts on a few different topics for these events, and this is the one that keeps getting selected.

Silicon Valley Microsoft Data Platform Meetup
Tuesday, April 20 at 6:00 PM Pacific Time

Power BI Summit
April 19-23 – multiple sessions and times

The session will be repeated three times for attendees in different time zones.

https://globalpowerbisummit.com/

BI & Analytics community of the Norwegian Computer Association
April 21 at 5:00 PM Central European Time

Supporting materials are available on the Presentations page on my blog here: Presentations | Paul Turley’s SQL Server BI Blog. In addition to the material and demonstrations I present in these sessions, I wanted to share some additional thoughts, which I have recorded in this short video:

Drill-through from Power BI to Paginated Report – Report Recipe #4

Navigation between reports is the hallmark of an interactive reporting solution, enabling users to drill through and see relevant details and contextually filtered information in a target report. Power BI and Paginated Reports both offer navigation capabilities suited to different purposes, and the ability to navigate between reports provides a tremendous amount of power and flexibility.

This is simple math: with two different report types, there are four different possible report navigation combinations. They are:

  • Power BI report to Power BI report
  • Power BI report to Paginated report
  • Paginated report to Power BI report
  • Paginated report to Paginated report

Continuing this series of blog posts to highlight our recently published eBook about Paginated Report recipes, this post summarizes the technique detailed in recipe #4, “Drill-through from Power BI to Paginated Report On-premises”.

If you have worked with native SSRS report drill-through using report actions, be aware that this capability is not yet baked into the Power BI service. It probably will be some day, but likely not soon. That’s OK, because this technique, using URL navigation, works quite well. There is one other twist: the URL parameters are a little different for on-premises Paginated (SSRS) reports compared to Paginated reports in the Power BI service. The technique in this recipe uses an on-prem target report. You can apply the same pattern in the cloud with slight modifications, and I will post an example in the near future.

This image depicts the source Power BI report with links that navigate to the target paginated report, showing filtered details:

The mechanism that enables navigation is a report URL produced with a DAX measure. The code emits a URL with parameters that preserve the row and filter context of the selected item or record. This URL measure is exposed as a link in a table or matrix visual in the Power BI report, optionally with a URL “link” icon:
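To illustrate the general shape of such a measure, here is a minimal hedged sketch, not the exact code from the recipe. The server, folder, report, table, column and parameter names are all hypothetical; the recipe linked below has the authoritative version. The measure concatenates the on-premises report server URL with a parameter name/value pair taken from the current filter context:

```dax
Order Detail URL =
VAR BaseUrl =
    // Hypothetical on-prem report server path to the target paginated report
    "https://myreportserver/ReportServer?%2fSales%2fSales+Order+Detail"
VAR OrderNumber =
    // Current row/filter context: the selected sales order
    SELECTEDVALUE ( 'Sales Order'[Sales Order Number] )
RETURN
    BaseUrl
        & "&rs:Command=Render"
        & "&SalesOrderNumber=" & OrderNumber
```

Setting the measure’s Data category to Web URL lets the table or matrix visual render it as a clickable link or link icon.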

To see the instructions in detail, follow this link to the recipe: 04-Drill-through from Power BI to Paginated Report On-premises | Paul Turley’s SQL Server BI Blog

Creating a Paginated Report Template – Report Recipe #3

This series of posts introduces the recipes in the Paginated Report Recipes eBook. In recipe #3, “Reusable Report Template”, Kathi Kellenberger demonstrates how to create a universal report template that you can use to create standardized reports that have consistent styling, branding and look-and-feel. This is an effective best practice to ensure that all reports meet organizational standards and conform to a standard style and format. Kathi’s approach provides tremendous flexibility by using parameters as variables to change several report styling elements. Thank you, Kathi, for contributing this recipe to the collection.

The Reusable Report Template recipe allows you to build Paginated reports and SQL Server Reporting Services reports in Visual Studio, starting with a report template that lets you easily switch out a logo image; background, border and line colors; and the report title in the header.

Back when I was doing a lot of SSRS project work, I used techniques similar to those described in this recipe to create “starter” report templates. At the beginning of a new project, I would create a standard template report and then create new reports from the template. I also used parameters and expressions to define dynamic styling properties. However, Kathi has taken all of these tricks to the next level by building a comprehensive report template with all of these techniques built-in. At the center of the template is a custom parameter handler function that helps manage several styling properties in a single parameter and then maps them to various report settings.

Use the detailed instructions in this recipe to define your own standard report template with your corporate branding and style, and then use the advanced coding technique to swap-out styling property values. The report recipe is available here. The sample files and other resources related to this recipe are available here.

Paginated Reports Recipe eBook – first two recipes

First and most importantly, I want to acknowledge and thank these contributing authors who wrote report recipes and working samples included in this eBook:

  • Robert Bruckner
  • Kirill Perian
  • Kathi Kellenberger
  • Stacia Varga
  • Tomaž Kaštrun
  • Tom Martens

As I am working on posting the initial report recipes, I wanted to share some comments about the first two. These recipes are step-by-step instructions that lead you through the start-to-finish process using the provided sample database, and they can be used with SQL Server Reporting Services (SSRS), Paginated Report Server or Paginated Reports published to a Power BI Premium capacity workspace. Reports may be authored with Power BI Report Builder, Report Builder or SQL Server Data Tools/Visual Studio. Additional recipes will follow.

The eBook with all of the recipes will be accessible from the main menu here on the blog. Each recipe is published as a page rather than a post, so you don’t have to search and try to figure out when each one was originally posted. As of tonight, I have published eight of twelve recipes that have been completed thus far. I’ll get others posted in the near future. Please enjoy these and watch for more paginated report recipes here: Paginated Report Recipes eBook | Paul Turley’s SQL Server BI Blog

The first two recipes in the Paginated Report Recipe eBook are about applying alternate row and group shading. Dating back to the early days of printed reports, this style of report was called a “greenbar report” because they were literally printed on continuous, pin-fed paper with alternating shaded green and white rows to make it easier for someone reading the report to follow along.

As the industry evolved from scrolling paper reports to laser-printed pages and then to on-screen reports, the greenbar effect remained useful. Recipe #1 introduces expression-based styling by changing the background color using the modulus (division remainder) of the row number in a table data region. If the row number is an odd number, set the background color to a shading color; otherwise, set it to white or transparent.
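The odd/even logic described above is a classic SSRS one-liner. As a minimal sketch (the shading color here is an arbitrary choice; the recipe may use a different one), set the BackgroundColor property of the detail row’s text boxes to an expression like this:

```vb
=IIf(RowNumber(Nothing) Mod 2 = 1, "PaleGreen", "Transparent")
```

RowNumber(Nothing) returns the running row count over the whole dataset, so odd-numbered rows get the shading color and even-numbered rows stay transparent.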

01: Alternate Row Table “Green Bar” Report | Paul Turley’s SQL Server BI Blog

In recipe #2, contributing author Kirill Perian takes this technique to the next level by maintaining the alternate shading across column and row groups in a matrix data region. The method is a little more involved than just using the row number. He uses a hidden column to track the column group and then an expression to reference that value to manage the background shading for each row.

02: Alternate Row Shading in Matrix (with Column Groups) | Paul Turley’s SQL Server BI Blog

The end result looks similar to the simple table report used in recipe #1, but the problem is a little more complex in a matrix report because of the column groups. The advantage of this technique is that it works across row and column groups at multiple levels.

Paginated Reports eBook Finally Released

At last, the second edition of “SQL Server Report Recipes”, written by Paul Turley, Robert Bruckner and a host of contributors, is being released, a few recipes at a time. This time around, it will be a free book published through my blog and perhaps other sources. I’ve posted the introductory chapter and six report design recipes. You may also download the sample SQL Server database used in all the recipe reports. I sincerely hope that you and others will use this as a free resource to serve the common interest of the greater community. Please spread the word and send those who might learn and benefit to this page.

The work on this book really started about eighteen years ago…

Late December, back in 2003: what a very special time for me! SQL Server Reporting Services was due to be released to the market, and I had been using the pre-release version to integrate reports into a custom web application. After using a variety of products like FoxPro, Access and Crystal Reports, and after mastering essential T-SQL queries and .NET programming, using SSRS was a dream. It was simple and elegant, programmable and powerful. I was so excited that I wanted to tell the world, so I recruited authors to write our first book about Reporting Services. Three editions later, I teamed up with Robert Bruckner, one of the lead developers on the SSRS product team at Microsoft, to write a different kind of book. Robert and I had both blogged quite a lot about report design techniques. Robert had done some crazy things in SSRS that no one knew were possible. He found very creative ways to develop games using SSRS, such as tic-tac-toe, hangman and Battleship. Honestly, there are easier ways to do game programming, but he proved that almost anything was possible if you were willing to think outside the box and maybe even draw outside the lines a bit.

Several contributing authors have worked tirelessly over the past difficult year to make this eBook a reality. Others, through no fault of their own, signed up to contribute to the book, but “the year that shall not be named” had other plans. At first we paused, then we stopped, and then we regrouped and decided to publish a shorter but comprehensive, modern kind of book.

Please visit the new Welcome and Introductory page with the Table of Contents to the current and future recipes: Paginated Report Recipes: 2020-2021 | Paul Turley’s SQL Server BI Blog

Doing Power BI the Right Way: 4. Power Query design best practices

Part of the series: Doing Power BI the Right Way (link)

Although my professional focus is building enterprise-scale BI solutions, I’ve created my share of informal Power BI reports that were put together quickly, with the goal of creating something “good enough” rather than achieving perfection. This guide is about designing proper and formal solutions, but these practices apply to any Power BI project that needs to survive future maintenance.

When you need to load data into a data model, you have a few options, and the right choice depends on a few factors. The choice is usually a balance between quickly and conveniently generating a table and using a disciplined approach to get reliable data from a sustainable source of record. The following image shows that data can be transformed in the source (or before data is loaded into the source) or within Power Query using Power BI Desktop.

Convenience or Scale?

Life is full of choices and trade-off decisions. Let’s say that you need to create a lookup table containing sales regions and this information doesn’t exist in the source database. You could easily create a new table in Power Query using the “Enter Data” feature, and just manually enter the sales regions. This would solve the immediate problem with very little effort, but how will the table be maintained if sales regions are added or change in the future? We could keep the list of values in an Excel file stored in SharePoint for a business owner to maintain when information changes. You could also go all the way back to the beginning of the process and load a dimension table in the source database. IT professionals might take the hard line and say that all data must flow through the data warehouse regardless of the effort and cost. Which of these is the right choice? …it depends on the scope of the reporting project and the long-term ownership of the solution. Sometimes quick and convenient is OK, but particularly for tables that need to scale and handle larger data volumes in the future, the following guidelines are critical.

I’ll get to the point and then explore the details afterward. Start with these fundamentals:

  • A proper data model is the heart of a Power BI report solution.
  • The purpose of Power Query is to shape and prepare each of the tables loaded into the data model.
  • The tables, fields and measures in the data model should be uncomplicated and user-friendly; intuitive and easy to navigate.

Recommended Practices

Following are the general best-practices I religiously apply when creating queries. After briefly describing each recommended practice, I’ll break it down and review some in detail.

  • Use parameters for query paths & connections – Parameters make the solution portable. Any connection information, like a file path or database server name, should be stored in a parameter so it can be changed without modifying query code.
  • For large SQL tables, reference either tables or views – Power Query has optimizations built in to work with different data connectors. Several connectors support query folding, where Power Query translates query steps into the native query language. Starting with a SQL query, rather than selecting a table or view from the list of database objects, will ensure that query folding does not work. When possible, start with a table, and if you need to use SQL to prepare data before loading it with Power Query, create a view.
  • Limit large table volume using date range parameters, compatible with Incremental Refresh – Incremental Refresh enables the Power BI service to partition large tables and load only the data that changes, rather than the entire table, when the dataset is refreshed. This was once a Premium-only feature that now works with shared capacity licensing for datasets up to 1 GB in size. Even if you don’t intend to use the Incremental Refresh feature, using a pair of date range parameters allows you to filter large tables and keep the PBIX file size small. After publishing the file to the service, you can update the parameters and load more records. Create two date/time type parameters named RangeStart and RangeEnd, and then add a date range filter according to these instructions.
  • Remove all unneeded columns – Resist the urge to leave columns that you are not sure you need for reporting. In each query, remove all unneeded columns early in the sequence of applied steps. The easiest way to do this is to use the Choose Columns button on the Home ribbon and deselect columns. To change the selection later, click the gear icon next to the Removed Other Columns step.
  • Rename using title case for all table and column names that will be visible in the data model – Although it may seem trivial, it is absolutely necessary to apply friendly naming conventions to all tables and fields. Chris Webb wrote an excellent post about object naming conventions. As a rule, rename all columns that will not be hidden in the data model, using friendly title-case names (with spaces and mixed case). There is no need to rename primary key, foreign key and other utility fields. After tables are added to the data model, hide those fields to remove clutter and confusion for report developers (even if you are the report developer).
  • Explicitly set the data type for all columns – Column renaming and changing data types can be time-consuming work, but be meticulous about checking every column returned by the query.
  • Consolidate multiple steps of the same type (such as change type and rename columns) – As a query design evolves, you will inevitably create inefficient queries with redundant steps. There will always be opportunities to improve the design, often by consolidating and reorganizing steps.
  • Rename key steps to document their purpose and aid future development – Renaming query steps allows you to understand the function and purpose of each step in the Applied Steps list. This creates a self-documented series of operations that will be easier to maintain down the road.
  • Add code comments in the M script and/or step descriptions to document queries – You can further document query steps by changing the step Description in the Properties window or by adding code comments in the Advanced Editor.
  • Use steps that promote query folding – Many query steps allow records to flow through without blocking subsequent steps. There are also certain transformation steps that must read all the records into memory to apply the transformation.
  • Move steps that break query folding as late as possible – Steps that support query folding, which can be translated into the native query language of the data provider, should be performed first. If non-foldable steps can’t be avoided, they should be performed as late as possible in the sequence of query steps.
  • Perform Pivot, Unpivot and Transpose transformations only on small result sets – These transformations must load all records into memory. They are powerful but have volume and performance limitations. Test them with production-scale source data.
  • Create row-level calculations in Power Query instead of DAX – Row-level calculations and derived column values can be performed with either Power Query or DAX, but M is a far more capable language for data transformation. Effective data preparation using Power Query will simplify and ease the burden of data model design. Regardless of relative efficiency, performing all data preparation and transformation in one place simplifies future maintenance. There are less-common cases where DAX is the best choice, such as creating calculated tables and calculated columns whose scope extends beyond a single row.
  • Avoid DirectQuery unless warranted by latency and data volume requirements – DirectQuery has its place in data model design, but it should be an exception to the normal pattern of importing data into an in-memory data model. DirectQuery tables can work with simple aggregation calculations but don’t perform well with many other DAX functions. Use DirectQuery to aggregate values over very large tables that won’t otherwise fit into memory, or to support drill-through into non-aggregated detail records. These are advanced design patterns that should be treated as rare and exceptional.
  • Avoid native SQL queries, especially with large source tables – Native SQL queries don’t support query folding, so they should be avoided when possible. A native query may be an acceptable exception for loading low-volume query results, but generally use a database table or view for query data sources.
  • Avoid Date/Time values when a Date or Time will do – Load column data only at the necessary level of grain so values in the data model will compress. In particular, reduce date/time values to date only. For time-level resolution, store date and time values in separate columns.
  • Import or create date tables in Power Query rather than using auto-generated date hierarchies – Auto-generated calendar tables in Power BI Desktop are fine for self-service projects, but purpose-built date tables afford more flexibility. If available, use a date dimension table in your data warehouse or source database. Secondarily, generate a date table using Power Query. Date tables can also be created with DAX functions, but if all tables are loaded with Power Query, this gives you the convenience of managing all tables centrally.
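The date range filter pattern above can be sketched in M. This is a minimal illustration, not the exact recipe code; the server, database, table and column names are hypothetical. RangeStart and RangeEnd are defined as Date/Time parameters in the query editor, and the filter uses the half-open range (inclusive start, exclusive end) that Incremental Refresh expects, so no row falls into two partitions:

```m
// RangeStart and RangeEnd must be Date/Time type parameters, e.g.
// RangeStart = #datetime(2020, 1, 1, 0, 0, 0)
// RangeEnd   = #datetime(2021, 1, 1, 0, 0, 0)
let
    Source = Sql.Database("MyServer", "ContosoDW"),   // hypothetical server and database
    Sales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Half-open range filter: >= RangeStart and < RangeEnd
    FilteredRows = Table.SelectRows(
        Sales,
        each [OrderDateTime] >= RangeStart and [OrderDateTime] < RangeEnd
    )
in
    FilteredRows
```

Because the filter is applied directly to a table reference, it folds to the source as a WHERE clause, and the same pair of parameters is what the Incremental Refresh feature looks for when partitioning the table.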

Breaking It Down

Let’s take a deeper look at some of these recommendations.

Use parameters for query paths & connections

For every file path, web address or server name in source connections, use a parameter. It is not hard to go back and edit source connection information using the Advanced Editor, but the easiest way to build parameterized connections is to build them as you go.

Start by enabling the feature “Always allow parameterization in data source and transformation dialogs” on the Power Query Editor page in the Options dialog.

As you build each query connection, for most connection types, you will be prompted to select or create a new parameter.

Here is an example of the parameters in a demonstration project. Without modifying any code or editing a query, any of these values can be changed easily.

Here are two examples of parameterized connections. For the SQL Server connection, the server/instance name is passed as the first argument to the Sql.Database function. The second example concatenates the folder path (stored in the SourceFolderPath parameter) with the file name to create the full folder and file path.
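As a hedged sketch of those two patterns (the parameter, database and file names below are hypothetical placeholders, not the actual demo project values), the source steps of the two queries might look like this:

```m
// Query 1 – SQL Server connection: the server/instance name comes from
// a text parameter named ServerName
let
    Source = Sql.Database(ServerName, "ContosoDW")
in
    Source

// Query 2 – file connection: the folder path parameter SourceFolderPath
// is concatenated with the file name to build the full path
let
    Source = Excel.Workbook(
        File.Contents(SourceFolderPath & "SalesRegions.xlsx"),
        null,
        true
    )
in
    Source
```

Changing the ServerName or SourceFolderPath parameter values then repoints every dependent query without editing any M code.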

For large SQL tables, reference either tables or views

The best way to guarantee poor query performance with a relational data source is to start with a hand-written native query (like SQL) and then perform transformations on the results.

If a query is instead based on a relational table or view, Power Query can generate the native SQL (or one of a few other supported query languages) for you as transformation steps are applied, folding the work back to the data source rather than processing it locally.
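A hedged M sketch of the recommended pattern: reference a view (which can encapsulate whatever hand-written SQL you need) rather than pasting a native query, so that subsequent steps can fold. The server, database and object names here are hypothetical:

```m
let
    Source = Sql.Database("MyServer", "ContosoDW"),
    // Reference a view instead of a native SQL query so folding stays enabled
    Sales = Source{[Schema = "dbo", Item = "vw_InternetSales"]}[Data],
    // These steps fold back to the source as a WHERE clause and SELECT list
    Filtered = Table.SelectRows(Sales, each [OrderDate] >= #date(2020, 1, 1)),
    Trimmed = Table.SelectColumns(
        Filtered,
        {"OrderDate", "SalesOrderNumber", "SalesAmount"}
    )
in
    Trimmed
```

Right-clicking a step and checking whether “View Native Query” is enabled is a quick way to confirm that folding is still happening.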

Use Query Diagnostics to Diagnose and Performance Tune

The following image shows the results of Power Query Diagnostics which I describe here: Power BI Query Performance & Query Diagnostics | Paul Turley’s SQL Server BI Blog. This post demonstrates how to capture timings for each step and overall queries.

Doing Power BI the Right Way: 4. Power Query in Dataflows or Power BI Desktop

Part of the series: Doing Power BI the Right Way (link)

Power BI, more than any other Microsoft product in my recollection, offers more options and choices for architecting and delivering a solution. Without compromise, Power BI can effectively be used for anything from creating a simple chart from an Excel spreadsheet to enterprise reporting and analytics on a Fortune 100 company’s massive data warehouse. At the end of this post, I will share a comprehensive list of resources and insights from Matthew Roche, a Program Manager on the Power BI Customer Advisory Team (CAT). To tease that series, I’ll start with this quote from Matthew’s blog:

Succeeding with a tool like Power BI is easy – self-service BI tools let more users do more things with data more easily, and can help reduce the reporting burden on IT teams.

Succeeding at scale with a tool like Power BI is not easy. It’s very difficult, not because of the technology, but because of the context in which the technology is used. Organizations adopt self-service BI tools because their existing approaches to working with data are no longer successful – and because the cost and pain of change has become outweighed by the cost and pain of maintaining course.

Matthew Roche, Building a data culture – BI Polar (ssbipolar.com)

When should you use dataflows vs regular Power Query? I didn’t jump on the dataflows bandwagon and struggled to adopt them at first. Frankly, Power Query is easier to use. The browser-based dataflows designer is quite impressive but it is not as responsive and convenient as the desktop app, so this is a bit of a trade-off. The power and value of dataflows becomes apparent when the business reaches a certain stage of data culture maturity.

Before we can address the question of whether to use Power BI Dataflows, conventional Power BI queries, or any other approach to source and transform data; we need to briefly review different options for orchestrating a Business Intelligence solution in the Microsoft cloud ecosystem.

Solution Architecture Choices

On a scale of one to ten, with ten being the most formalized, strictly-governed and complex corporate reporting platform; the self-service Power BI option might range from one to four.

For the self-service data analyst, working entirely in Power BI Desktop, data can be imported and transformed using Power Query. Tables are modeled, calculations are defined and data is visualized. This mode is simple and works well for small to moderate-scale solutions with less emphasis on data governance and centralized control.

Even using this simple approach, data models can be developed separately from reports, certified and shared with multiple report developers and self-service report authors. So, to a point, business data can be managed and governed – but the queries in the Power BI solution read directly from source systems or files that are not curated for analytic reporting.

Data Warehouses and Data Marts

The “single version of the truth” or “golden record” repository, a data warehouse (or smaller-scale “data mart”), is the ideal solution to store and manage reliable corporate information. The challenge with creating a central data warehouse to manage centrally-governed organizational data is that it is costly and time-consuming; the trade-off is that self-service data models can be inaccurate and out of date. When business leaders need answers quickly, it is not always feasible to add more data sources to a data warehouse on short notice.

On the complexity scale of one to ten, versions of this option might be from seven to ten.

A conventional DW/BI solution typically uses on-prem data transformation tools like SSIS to stage and transform source data into a central data warehouse built using a relational database product like SQL Server. Although viable for on-prem systems, this old-school architecture model doesn’t embrace scalable and cost-effective cloud technologies.

The first generation of the Microsoft cloud-based modern data warehouse can utilize several different Azure services. The components in the following example map easily to the conventional data warehouse solution in the previous example. Azure Data Lake serves as the staging environment, typically using text files and structured file storage as an inexpensive landing area for ingested source data. Azure Data Factory is used to orchestrate and transform files and data streams into and out of the data lake – and the data warehouse. Depending on the need for scale and size, Azure SQL Database or Azure SQL Data Warehouse (now called Azure Synapse) may be used for data warehouse storage.

If your organization has a comprehensive data warehouse to serve-up all or most of the data needed for analytic reporting, this is probably the best fit for a Power BI solution in your business environment.

Constructing an enterprise data warehouse solution is not a trivial endeavor, often involving as much effort to negotiate business process challenges as the technology development to implement the solution.

The newer generation of the Azure modern data warehouse is a best-of-breed collection of tightly-integrated cloud services called Azure Synapse Analytics. Compared to the previous set of independent Azure services, Synapse Analytics provides a unified development and management interface. Apache Spark and other industry-standard technologies designed for data science and platform-agnostic analytics provide the open-source data prep engine. Azure Synapse is the evolution of Azure SQL Data Warehouse, Microsoft’s read-optimized, scalable, massively parallel-processing (MPP) SQL-based database engine.

Power BI Dataflows

Dataflows can fill an important gap between purely self-service data prep and formal data warehouse solutions. If you don’t have a comprehensive data warehouse to meet your analytic reporting requirements but need to provide more data quality control over standardized entities, incorporating dataflows might be the ticket.

In its simplest form, dataflows provide reusable transformation logic (queries) that can be shared by multiple Power BI data models. Using dataflows deployed to a workspace can save data model developers from repeating the same transformation steps in multiple datasets. But these are more than just Power Query scripts stored in the cloud.

Dataflows enable a long list of capabilities. They can provide integrity and standard entity definitions stored in Dataverse (previously known as the Common Data Service) to enforce standard naming, data types and schema compliance, among other features.

In Premium Capacity, dataflow results may be persisted in Azure Data Lake Gen2 storage. This essentially allows you to use dataflows to create a moderate-scale data warehouse without a big investment. Entities may be linked to related entities which creates virtual joins and referential constraints. Other Premium features include DirectQuery, Computed entities and Incremental refresh – all managed in the dataflow rather than for each dataset. Integrations with Azure AI, Machine Learning and Cognitive Services allow you to utilize AI features without writing code. For example, in a recent project, we used AutoML on a dataflow containing high school student data to predict graduation outcomes.

flow of data

Dataflows Start with M Queries

Dataflows begin with an M query, just like the queries in Power BI Desktop, before adding the additional capabilities mentioned earlier. Queries are authored entirely in the browser, but migrating from Power Query in Power BI Desktop is fairly easy. Start with a Power BI solution (PBIX file) in Desktop and open a query in the Advanced Editor. You can create a new dataflow in the browser and then copy and paste the existing M code from Desktop into the dataflow designer. You do have to copy each query one at a time, and there are just a few compatibility differences, but for the most part it should be a one-to-one transfer.

Building a Data Culture

Matthew Roche from the Power BI Customer Advisory Team has an excellent 17-part blog series about Building a data culture. Dataflows sit at the crossroads between business process, data stewardship and technology. The industry has been throwing technology and software at data governance and quality problems for decades, with marginal success. It is much easier for data practitioners to acknowledge that these are multi-faceted business culture challenges than it is to formulate a plan to be successful. If anyone can effectively carry and deliver this message, it is Matthew. In this series, he offers prescriptive guidance on enlisting an executive sponsor, working with business stakeholders, and navigating the landmines of a business landscape toward a successful data culture transition.

Honestly, I’ve only pursued this series in fragments over the past year and now that I’ve caught the vision, I plan to watch the entire series start-to-finish. It is that good. Think of it as Game Of Thrones with data.

Resources

Matthew also provides a comprehensive list of Power BI Dataflows resources here. Matthew recently presented to our 3Cloud Power BI and Analytics development team about using dataflows to promote a data culture. This presentation was an epiphany for me that helped me better understand how dataflows fit into the BI solution puzzle – that’s when the gauge metaphor popped into my head. I encourage you to watch, and perhaps you will have a similar moment of reckoning.

The Power BI Adoption Framework is a set of presentations from Microsoft that can serve as a checklist of important tasks and areas that should be covered in any Power BI implementation, large or small. These decks are also a great tool for adopting and sharing your organization’s BI and Analytics strategy with business leaders and stakeholders. You can use them as a picklist to assemble your own presentations.

A First Look at Gen 2 Composite Models with Live Power BI Datasets

About three years ago when the data model development engineers from the Power BI product team told me they were working on the ability for tabular data models to share other published data models, that sounded amazing and almost too good to be true. They said it would be a lot of work to make this a reality. The Utopian dream of distributed datasets was publicly announced on the Wave 2 Roadmap last year and then the feature went to Preview in December with this announcement: Using DirectQuery for datasets and Azure Analysis Services (preview) – Power BI | Microsoft Docs. We’ve seen the new feature demonstrated and explained several times but now I’m exploring this exciting new feature for myself.

For this test, I have two different data models in separate PBIX files. One published dataset contains Internet Sales information from the ContosoDW Azure SQL database and the other contains Store Sales information. These models contain many of the same tables and a few unique tables. Notably, each data model contains a different fact table. Here are the two published data models in the Contoso Sales workspace:

After making sure that I had upgraded to the December 2020 version of Power BI Desktop, I open the Options dialog and enable the preview feature: DirectQuery for Power BI datasets and Analysis Services.

After restarting Power BI Desktop, I use Get Data and choose Power BI datasets. After connecting to the workspace and selecting the ContosoDW Internet Sales published dataset, the local model is now connected using a live connection.

After connecting to the remote model, all the tables are listed in the Fields list. At this point, I’m using a capability that has been available for a long time: one Power BI report file connected to a separate, published dataset. Previously I would not be able to add additional dataset connections or import from other sources into a live connected data model.

The status bar shows a new message with a link to enable changes to the model. On the Model ribbon, a new button is also displayed. Using either of these options will modify the local model metadata to allow multiple live dataset connections.

When I click the link Make changes to this model, a dialog box allows me to confirm that I want to switch the live connection to a DirectQuery connection and prepare the local model to allow additional DirectQuery connections.

What happens if I start with an existing local model?

The first important lesson I learned was this: if I create a new PBIX report file containing existing tables and then connect to a published dataset, this does not enable the ability to make changes to the local model.

With an imported table in the data model before using Get Data to select and connect to a published dataset, the status bar shows this message:

I found this to be unintuitive: The text “(click to change)” is not a clickable link. At this point there is no way to update the local model with the necessary changes. I had to start with an empty PBIX to add the first live connection. After that, I could use Get Data to import additional tables from other sources, but not the other way around.

Adding More Dataset Connections

Next, I used Get Data to connect to the ContosoDW Store Sales published dataset. After the local model had been converted, there was nothing particularly unique about the experience. I could simply connect to an additional dataset, and all of the tables show up in the data model diagram view. The tables from each model are displayed with a different colored heading; in this case, blue and red.

I had several duplicate table names after connecting the second data model. The second set of tables was renamed with the number 2 appended. I renamed the tables that I wanted to keep visible. The redundant tables that I don’t need are displayed on the right side of this diagram layout. Tables in a remote model cannot be deleted from the local model, so I used the Properties pane to hide the unused tables.

If I were connecting to a large remote model with a lot of tables just for the sake of bringing one table into the local model, it would be cumbersome to hide all of the unneeded tables – so I hope there will be an easier way to manage this in the future.

I opened Tabular Editor from the External Tools ribbon and was able to use it just as if I were designing a normal import model data model, to rename tables and set properties with multiple objects selected.

Adding relationships between tables from each remote model was much the same process as in a local model. I can use the Manage Relationships dialog or the drag-and-drop method in the diagram view. The latter method opens the Manage Relationships dialog rather than simply creating the relationship.

Finally, I added a new Combined Sales Qty measure to one of the remote tables in my local model that references an existing measure from each of the remote models. For demonstration, the table visual shown below includes the Order Date field which is related to each of the two fact tables (Online Sales and Store Sales), the Online Sales Qty measure from the Internet Sales remote model, the Store Sales Qty measure from the Store Sales remote model, and the new Combined Sales Qty measure which I added to the local model:

After publishing the new composite data model and report, the Lineage View shows that the Combined Sales dataset has connections and dependencies on the two remote datasets that show up as Analysis Services DirectQuery connections, apparently using the XMLA endpoint in the Power BI service.

Once I had worked through the design process a couple of times, the data model behavior seems to be exactly the same as always, and designing reports is no different at all. I’m just scratching the surface and I’m anxious to dig deeper and compare notes with others in the community. I have no doubt that behaviors, limits and issues will arise with this brand-new and very compelling capability, but I’m very encouraged so far.

Power BI: The Week in Graphics

If you’re like me right now, you need a distraction. I got started on a new blog and tweet about project management best practices and after yesterday’s events, just had to put it on a shelf. So, a couple quick things…

Power BI Top Features

Melissa Coates, long-time BI community rock star, published this amazing infographic depicting all of the features and services in the Power BI ecosystem. Next time you hear someone incorrectly describe Power BI as “just a visualization tool”, show them this:

You can download a hi-res version and get more background here: Diagrams — Coates Data Strategies

BI-Survey Searchable Maps

I recently heard from Donald Farmer, one of the original architects of the Microsoft Business Intelligence platform. Donald went on to work for QlikTech and now runs an advisory practice called Treehive Strategy Services. Donald has been working with the BI-Survey, one of the leading analyst firms in Europe, to enable an interactive map of the Azure analytics ecosystem. Similar to Gartner and Forrester research, BI-Survey compares industry platforms and products, but does so using survey data. I’ve been participating in the BI-Survey for about 12 years and find it to be a comprehensive and insightful resource to compare BI tools, among other products. They just released these interactive guide maps for Power BI and Azure Services, Partners and Success Stories. These are just a couple of examples:

Each map has a variety of search and filtering options that let you visualize the heatmap by pricing, features and other categories. These are really useful tools.

Doing Power BI the Right Way: 5. Data Modeling Essentials & Best Practices (2 of 2)

Part of the series:  Doing Power BI the Right Way (link)

Data modeling 201-301 – continued from part 1

So far, you’ve seen that the essential components of a data model are tables related to each other. Filtering records in one table propagates the filter to the related table(s), causing records in those tables to be filtered. This dimensional model schema is the foundation of the reporting solution. Once the fact and dimension tables are in place, you can create more advanced solutions by working outside of this standard relationship pattern.

Using Disconnected Tables

Disconnected tables are useful in a variety of ways, allowing a user to select a value that drives the behavior of a dynamic measure, or by grouping values and performing some kind of dynamic calculation. Let’s focus on one of the disconnected tables shown in this figure, the Selected Measure Period table:

This table was created using the Enter Data feature and contains only one column named Period, with the following values:

Current
MTD
YTD
Same Per LY
PY

The exact text for these values isn’t really important because I have written specific DAX code to work with them.

The purpose of this post is to demonstrate effective data modeling rather than DAX fundamentals, but one cannot exist without the other. My model in this example contains five measures that calculate the Store Sales Qty with different time-series variations. The Store Sales Qty measure simply returns the sum of the Quantity field, and Store Sales Qty MTD calculates the month-to-date value for a given date. I’ve created another measure named Store Sales Qty Selected Period. When a value in the Selected Measure Period table is selected, this measure returns the result of one of the other measures.
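Such a measure can be sketched with SWITCH over the selected Period value. This is a hedged reconstruction; the exact names of the time-series measures are assumptions:

```dax
-- Sketch only: assumes five existing time-series measures with these names.
Store Sales Qty Selected Period =
SWITCH (
    SELECTEDVALUE ( 'Selected Measure Period'[Period], "Current" ),
    "Current", [Store Sales Qty],
    "MTD", [Store Sales Qty MTD],
    "YTD", [Store Sales Qty YTD],
    "Same Per LY", [Store Sales Qty Same Period LY],
    "PY", [Store Sales Qty PY]
)
```

SELECTEDVALUE returns the single Period value in the current filter context, falling back to “Current” when no single value is selected.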

The following three images show cases where the Store Sales Qty Selected Period measure is used as the Value property of a matrix visual. Each row represents a date from the Order Date table. The slicer above the matrix allows users to select one or more Period values from the Selected Measure Period table. By selecting only the value Current, the measure returns the Store Sales Qty measure value. By selecting YTD, the Store Sales Qty YTD measure value is returned.

The third example adds the Period field to the matrix Column group. For all values chosen in the slicer, a Period value is displayed in the column heading and the corresponding measure value is displayed in that column.

Here’s another example used to calculate the daily Orders Pending Delivery. The column chart below shows the daily number of orders where an order has been placed but not yet delivered. The dates on the X axis of the chart are from a disconnected table named Select Effective Date.

The logic for the Online Orders Pending Delivery measure depicted by each chart column is the number of orders where the Order Date is less than or equal to the selected date and the Delivery Date is greater than the selected date, or a delivery date doesn’t yet exist. Here is the measure definition:
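Reconstructed here as a DAX sketch; the table and column names are assumptions based on the description above:

```dax
-- Sketch: count orders placed on or before the selected date that were
-- delivered after it, or not yet delivered at all.
Online Orders Pending Delivery =
VAR SelectedDate = SELECTEDVALUE ( 'Select Effective Date'[Date] )
RETURN
    CALCULATE (
        COUNTROWS ( 'Online Sales' ),
        FILTER (
            'Online Sales',
            'Online Sales'[Order Date] <= SelectedDate
                && ( 'Online Sales'[Delivery Date] > SelectedDate
                    || ISBLANK ( 'Online Sales'[Delivery Date] ) )
        )
    )
```

Because the Select Effective Date table is disconnected, selecting a date on the X axis changes SelectedDate without filtering the fact table through a relationship.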

Once again, the disconnected table helps us work around the normal constraints of related tables. The theme here is to use relationships to build the foundation and primary reporting functionality. Then, use disconnected tables to take the solution to the next level with dynamic calculations and creative problem-solving.

The Answer to Life, the Universe and Everything: 42

Every data model should be designed to fulfill specific reporting requirements. This can be challenging when creating larger and more complex models. This isn’t to say that we can’t design utility models to answer a lot of different business questions, but it can be difficult to create the utopian data model that can answer every question. In years past, I’ve seen attempts to create something like a “master cube” that contains nearly every table and field in the data warehouse, in an attempt to let users get insight about anything they could possibly imagine. There is a balance here, which you will see in the following section.

Relationships Don’t Always Work Out

The filter propagation from any one table in the data model to any other table can only use one set of relationships. The problem can be illustrated in the question: How do we determine the geographic location for Online Sales orders? Take a look at the following model diagram: Customers have a geographic location as evidenced by the GeographyKey in the Customer table, which is related to the Geography table using a column with the same name. Easy answer.

What about Store Sales? The Store table has a GeographyKey but there is currently no relationship from Geography to Store. That should be easy to resolve, right? Let’s just create a relationship from Geography to Store by dragging the GeographyKey column…

What the crap, Man! Why can’t we create the relationship? The warning message says “because this would create ambiguity between tables Geography and Online Sales”. Yep, sure enough, if that relationship were active, the data model would have two different paths between Geography and Online Sales and wouldn’t be able to decide whether to follow the filter through the Customer table or the Store table. This is a little bit like Google Maps selecting one of multiple routes between two locations. It might tell you that you can take an alternate route and add 15 minutes to your drive but you can only take one route or the other.

So, how do we deal with this problem? …through the magic of DAX, of course. We don’t need no stinking relationships! Rather than relying on natural filter propagation through a relationship that we can’t implement, we create a measure that implements a pseudo relationship using the DAX TREATAS function. Here, the Store Sales Qty measure is explicitly taking all the selected GeographyKey values in the Geography table and creating an in-memory table using the VALUES function. The TREATAS function is saying “take this table of values and treat it as if it were the GeographyKey in the Store table”, effectively filtering Store records using the selected Geography records.
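A hedged sketch of this pattern looks like the following; the key column names are assumptions:

```dax
-- Sketch of the TREATAS pseudo-relationship: filter Store records by the
-- GeographyKey values currently selected in the Geography table.
Store Sales Qty =
CALCULATE (
    SUM ( 'Store Sales'[Quantity] ),
    TREATAS ( VALUES ( Geography[GeographyKey] ), Store[GeographyKey] )
)
```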

The result is simply that Store Sales Qty can be grouped and filtered by any field in the Geography table.

I have a unique appreciation for the TREATAS function, which was introduced late, after we began to embrace DAX in Power BI, SSAS Tabular and Power Pivot. I was in the Community Zone at the PASS Summit in Seattle some five years ago. Someone asked for help with a similar problem, which was a challenging but interesting puzzle. Chris Webb happened to be passing by, so I dragged him into the conversation. We spent well over an hour hammering out DAX code on a laptop before he presented a solution consisting of CROSSJOIN, FILTER and SUMMARIZE functions. I don’t remember all the details but it was complicated. Chris has avoided me ever since (I’m kidding… Chris is a super nice guy, enjoys a challenge and is always willing to help).

The underpinnings of many DAX functions are a sea of internal logic and code meant to solve specific business problems. Spending time with experts like Chris, Marco Russo and Jeffrey Wang (who architected the DAX language) has helped me appreciate the immense power yet simplicity that this language offers.

Who Needs Relationships Anyway?

One more example. This one is just for fun and the point is to underscore that although a data model with properly-formed relationships is very important, this should not be the limiting factor in a project design. Take a quick look at the following Power BI report page. What do you see?

There are several report visuals that group the Online Sales Qty measure and time-series variations of this measure by different dimension attributes. You would expect there to be a rich data model with several related tables to support all this functionality, right?

Here is the diagram view of this data model. It contains only one relationship and even that could have been eliminated to further make the point.

How is this possible? Isn’t the whole point of a data model to make this type of data visualization and analysis possible? What’s going on here? The Online Sales Qty measure applies the same DAX code pattern that I showed you before, using the TREATAS function to build virtual relationships on the fly:
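As a sketch, the same pattern can be stacked for several dimensions in a single measure; the key column names here are hypothetical:

```dax
-- Sketch only: virtual relationships to three dimension tables at once,
-- with no physical relationships in the model.
Online Sales Qty =
CALCULATE (
    SUM ( 'Online Sales'[Quantity] ),
    TREATAS ( VALUES ( Customer[CustomerKey] ), 'Online Sales'[CustomerKey] ),
    TREATAS ( VALUES ( Product[ProductKey] ), 'Online Sales'[ProductKey] ),
    TREATAS ( VALUES ( 'Order Date'[DateKey] ), 'Online Sales'[OrderDateKey] )
)
```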

How does this measure perform compared with a conventional measure in a “proper” data model? With a few hundred thousand fact records, it performs just fine but in a more complex data model with a larger volume of data, this might not be the best approach. The primary reason that performance could degrade is that the measure code causes queries to use the tabular formula engine rather than the storage engine, which is typically faster. My point with this demonstration is that you can use similar techniques to do just about anything you can imagine.

Conclusion

Start with a well-crafted dimensional data model built using best practice design patterns and rules. Use exceptions when needed to address specific requirements – in the form of flattened tables and master/detail relationships. Then you can graduate to special cases, like no relationships at all, to address specific use cases and reporting needs. Start with the basics, adhere to the rules and then make exceptions as needed.

As usual, I had a few more things in mind to cover when I started this post; they will be deferred to later posts and covered in greater depth. These include:

  • The current state and future of Power BI data model design
  • DirectQuery and Gen 1 composite models
  • DirectQuery over Power BI datasets & SSAS models (Gen 2 composite models)
  • Using External Tools for advanced data model design
  • When and how to use calculation groups to standardize measure variations

Supporting Resources

Star Schemas
https://www.sqlbi.com/articles/the-importance-of-star-schemas-in-power-bi/
https://docs.microsoft.com/en-us/power-bi/guidance/star-schema

Many to Many Relationships
https://docs.microsoft.com/en-us/power-bi/guidance/relationships-many-to-many

Model Optimization
https://docs.microsoft.com/en-us/power-bi/guidance/import-modeling-data-reduction

DirectQuery Guidance
https://docs.microsoft.com/en-us/power-bi/guidance/directquery-model-guidance

Doing Power BI the Right Way: 5. Data Modeling Essentials & Best Practices (1 of 2)

Part of the series:  Doing Power BI the Right Way (link)

Data Modeling 101: part 1 (more advanced techniques in part 2)

A data model is the foundation of analytic reporting. It provides structure and order over information that might otherwise be chaotic and untrustworthy. Loading data into a properly designed model ensures that it conforms to some essential rules that provide better performance, reliability and accuracy.

Different Schemas for Different Needs

One of the most important lessons I have learned about data modeling over the past 20+ years is that there isn’t one model that fits all business needs. However, there are simple patterns we can follow to model data according to different business needs. Each pattern or schema has a different set of rules. At a very high level, we can think of these schemas in three different categories.

When reporting requirements are simple and unsophisticated, sometimes the easiest way to model a set of data is to transform it into a single, flat table. There are several trade-offs and limitations when using this approach, though. This is typically the goal when bringing information into a spreadsheet so you can simply add up a column of values, perhaps filtered by other columns. It might seem natural to use the same approach when making the transition from Excel to Power BI. Here is an example of retail order data organized into a flat table:

Look over this sheet before moving to the next figure. If this data were stored in an Excel sheet, how could we use it for reporting? You could filter by the Order Date column and add up the Quantity, Unit Cost and Unit Price. You could also apply filters to the Company Name, Category, Subcategory or Product Name columns. For simple reporting, this could meet our needs.

The master/detail pattern is typically used with transactional data. Master records, like orders or invoice headers, are related to detail records such as the order details or invoice line items. The master/detail pattern can be repeated with multiple tables to create a normalized database. Use this pattern sparingly to support drill-to-detail reports. Including unnecessary detail in an analytic data model uses a lot of memory and storage, and can impact report performance.

When relational database professionals are introduced to Power BI, they will often see DirectQuery as a simple means to leverage their familiar, existing relational database or data warehouse, only to be disappointed by poor performance and other limitations. However, master/detail schemas are where DirectQuery can provide a best-of-both-worlds solution, so long as the analytic needs over the detail data are light. If we need to drill down to detail records from an interactive report without performing calculations over the entire detail record set, and want to avoid sucking billions of records into memory, DirectQuery might be the ticket.

Dimensional Modeling 101

The rules of dimensional modeling are a lot like the laws of physics: they were discovered rather than invented. By applying dimensional design patterns, data will naturally behave and reports will perform well. Dimensional rules describe the way data behaves when stored in the most optimal form for reporting and analysis.

The Kimball Group is credited as the pioneer of dimensional modeling, and although the books and papers that Ralph Kimball and his colleagues authored are 20-30 years old, they are as relevant today as ever. I’ll cover some of the basics, but you can find a comprehensive reference guide to dimensional modeling patterns and techniques on the Kimball Group site here: Dimensional Modeling Techniques – Kimball Group. There have been a few recent publications employing modern examples, such as Star Schema – The Complete Reference and Agile Data Warehouse Design. Both are excellent books, but unless you are undertaking a huge data warehouse project with complicated data problems, understanding the essential concepts is sufficient.

By organizing information into fact and dimension tables, we store as little data as possible while maintaining consistency and data integrity. In data design, we often refer to “entities” and “attributes”. An entity is something we keep track of (like a Customer or Product). Entities have attributes which describe specific properties. In the analytic data world, entities are usually materialized as dimension tables and each attribute is a column or field.

A fact table contains the numeric columns used to summarize and aggregate measure values, along with the key columns that are related to the dimension tables. Dimensions contain the attributes used to group and filter the business facts. Fact records share the same level of grain across all dimensions. For example, if domestic sales orders and international sales orders each have the same level of detail for dimensions like Customers, Products and Order Date, these records could be stored in the same fact table. But, if Sales Targets are applied at the month level rather than the date, they must be stored in a separate fact table.

The essence of a dimensional model is the star schema, simplified here to show one fact table related to a small number of dimensions.

A real application of the star schema design is shown in the following image. The Online Sales fact table contains key columns that are used to relate this table to each of the dimensions. The fact table also contains numeric type columns used to define measures which aggregate and total numeric values such as Net Price, Quantity, Unit Cost, Unit Discount and Unit Price. You’ll notice that the relationship from each of the dimension tables to the fact table is one-to-many and filters records in one direction, depicted by the arrow on the relationship line. For example, the relationship from the Customer table to Online Sales is based on the CustomerKey column in both tables. The relationship filter direction indicates that if the Customer table were filtered on a report, Online Sales would be filtered accordingly. However, if the Online Sales table were filtered in a report, this would not cause the Customer table to be filtered.

According to the Kimball Method, there are four stages to define fact and dimension tables:

  1. Select the business process
  2. Declare the grain
  3. Identify the dimensions
  4. Identify the facts

Seems pretty simple, right? It can be at a macro level, until you need to get into the weeds. The trick is to maintain a high-level business perspective as you work through the details to solve data problems. I find this to be a useful whiteboard exercise when starting a new project. Using only business tools and business language, diagram data entities at a high level using this approach. Next, break the process down using these steps:

  1. Load detailed atomic data into dimensional structures
  2. Structure dimensional models around business processes
  3. Ensure that every fact table has an associated Date dimension table
  4. Ensure that all facts in a single fact table are at the same grain or level of detail
  5. Resolve many-to-many relationships in fact tables
  6. Resolve many-to-one relationships in dimension tables
  7. Store report labels and filter domain values in dimension tables
  8. Make certain that dimension tables use a surrogate key
  9. Create conformed dimensions to integrate data across the enterprise
  10. Continuously balance requirements and realities to deliver a DW/BI solution that’s accepted by business users and that supports their decision-making

Something to keep in mind is that these rules and steps apply equally to formal, IT-based model design and to self-service BI projects. However, the tools have been modernized over the past decade or so, along with the degree of strict adherence to the rules. Simply put, in the new age of self-service BI, everything is negotiable. But something else to keep in mind is that bending or breaking the rules nearly always comes at a cost. For example, step 8 prescribes creating surrogate keys to identify dimension records. You can use the original business keys throughout the model, but you run the risk of encountering key collisions or other data quality issues.

In some cases, the lines that separate facts and dimensions can become blurred, which may need to be considered in your model design. For example, I recently developed a BI solution where we counted employee hiring events, and a row in the Hires fact table was simply a record of the event when an employee was hired, along with information like the hiring date and position. There were no numeric columns to sum up, so we simply needed to count these records. This is often referred to as a “factless fact”. Another exception to the typical star pattern arises when a fact table contains a value we could use for grouping or filtering, and it might be pointless to extract it into a separate dimension table. This is known as a “degenerate dimension”. These are examples of cases where you could arguably apply the dimensional design rules strictly out of principle, or err on the side of simplicity. The right answer for each case should be driven by the business reporting requirements while considering the reliability and maintainability of the overall solution.

Let’s apply the process to the flat table of sales orders. Here it is again with some of the dimensions and facts identified:

In this example, there are three separate date columns that record the date an order was placed, the date it was due and the date that the order was delivered. Customer information includes the Company Name and a unique CustomerKey which could be used to lookup detailed customer information. There are three numeric columns that can be summarized or aggregated. These columns could also be used to perform more complex calculations like ratios, running totals and time-series comparisons.

Dimensional Concepts

Now, let’s review some common practical dimensional modeling applications.

Role Playing Dimensions

Dimensional entities can play more than one role in reporting. In the flat table, the three date columns have completely different purposes but all store the same type of value: a date. But dates can be used to group and aggregate data in date-part groups like month, quarter, year or fiscal periods. They can be used to perform time-series calculations like month-to-date or the same period in the previous year. To do this, you absolutely must have a date dimension* table containing a contiguous set of date records over a period of time. For our purposes, we might need all dates for the past five years through the end of the current year. Date dimensions are the most common example of role-playing dimensions, but let’s look at another scenario before looking at more date examples.

*Power BI Desktop includes features that will automatically generate a date dimension table, and there are DAX functions you can use to generate date tables in code. Although useful in certain cases, I prefer to manage date tables the old-fashioned way so I have one consistent method for importing and managing all tables. Compared to other data challenges, building date dimension tables is a relatively simple task.
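For reference, a calculated date table in DAX might look something like this sketch; the date range and column choices are assumptions:

```dax
-- Sketch: a contiguous date table generated in DAX rather than imported.
Order Date =
ADDCOLUMNS (
    CALENDAR ( DATE ( 2017, 1, 1 ), DATE ( 2021, 12, 31 ) ),
    "Year", YEAR ( [Date] ),
    "Quarter", "Q" & ROUNDUP ( MONTH ( [Date] ) / 3, 0 ),
    "Month", FORMAT ( [Date], "MMM YYYY" ),
    "MonthSort", YEAR ( [Date] ) * 100 + MONTH ( [Date] )
)
```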

The following image shows a fact table named Flights, with two role-playing Airport dimensions. An airport is, well, it’s just an airport… it has a three-letter code and some information about the geographical location. But with respect to any given flight, an airport can serve two functions: to depart from or to arrive at. In this example, it is necessary to materialize the Airport table twice: as the Departure Airport and the Arrival Airport.

With two role-playing Airport dimensions, a report user could ask for all the flights from Japan to Australia on a given date.

Back to the retail sales example: let’s say that our Sales fact table has two date keys, OrderDate and ShipDate. A single Date dimension can only have one active relationship. The following image shows the result of the autodetected relationships between Sales and Date, where one of the relationships is inactive and effectively useless for most practical purposes.
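As an aside, an inactive relationship isn’t entirely useless: it can be activated inside a single measure with the USERELATIONSHIP function. A hedged sketch, assuming these column names:

```dax
-- Sketch: activate the inactive ShipDate relationship for just this measure.
Sales Qty by Ship Date =
CALCULATE (
    SUM ( Sales[Quantity] ),
    USERELATIONSHIP ( Sales[ShipDate], 'Date'[Date] )
)
```

This works for one-off measures, but it doesn’t give report users the ad hoc flexibility that separate role-playing date tables provide.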

This image shows three separate instances of the Date dimension table imported using referenced queries in Power Query. The three tables: Order Date, Due Date and Delivery Date use up a tiny bit more memory than a single instance of the table but give report users more flexibility to answer business questions.

A Table for Every Date Column?

An attendee in a data modeling session recently asked this question. She said: “I have 14 date columns in my fact table used to track a variety of different events. Do I need to create 14 different Date dimensions?” My answer was “certainly not.” This is an extreme example but a very good question. You only need separate date dimension tables when you need to have the flexibility to use time-series functions in DAX or to perform comparisons using date part fields (like Year, Quarter or Month). Operations like calculating the difference between two date type columns or counting rows with future dates can be performed using fields in the fact table. I’ll show you an example in part 2 of this blog series. This is one of those “it depends” situations where you must make a judgement call depending on your business reporting requirements, balancing simplicity and maintainability with complexity and flexibility.

Many-to-Many Relationships and Bi-directional Filters

File this under “don’t do this unless you have to.” Many data modeling decisions are a trade-off between performance and functionality; and using iterative design, you will often find a better way to solve problems as you go. There are a few different ways to design for many-to-many relationships. The conventional approach is to use a bridging table that contains all of the combinations of keys that relate two tables together. In the following example, the Customer and Product dimension tables normally would have a single-direction filter from the “one-side” to the “many-side” of the relationship. If reporting requirements call for filtering products based on selected customers who purchased them, we can use the Sales fact table as a bridge table and change the relationship between Product and Sales to use a bi-directional filter.

Depending on the cardinality of the relationship, using a bi-directional filter could introduce a performance penalty. If we only have 100 products and fewer than a million Sales records, this is probably not a big deal. If we have 10,000 products and 100 million Sales records, this relationship will likely slow things down considerably.
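If only a few calculations need the reversed filter direction, an alternative worth considering is to keep the model relationship single-directional and enable bi-directional filtering per measure with the DAX CROSSFILTER function. This is a sketch, assuming ProductKey columns relate the example Sales and Product tables:

```dax
-- Counts products purchased by the customers in the current filter
-- context, without a bi-directional relationship in the model:
Purchased Product Count =
CALCULATE (
    COUNTROWS ( 'Product' ),
    CROSSFILTER ( Sales[ProductKey], 'Product'[ProductKey], BOTH )
)
```

Because the filter direction is widened only while this measure evaluates, the rest of the model keeps the cheaper single-direction behavior.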

Here is another example: Given the requirement to show accounts and transactions for a selected customer, the following model doesn’t work with out-of-the-box relationships. To understand why, follow the flow of filtered records through the relationships. From Customer to AccountCustomer, the arrow on the relationship line indicates that the filter flows in the right direction. Once the AccountCustomer table has been filtered, however, the Account table will not be filtered; we are trying to swim upstream, and that relationship doesn’t naturally flow from the many-side to the one-side.

Changing the Account-AccountCustomer relationship filter to bi-directional fixes the problem and meets the reporting requirement.

As I mentioned, the bi-directional filter introduces a potential performance penalty. However, if the cardinality is low and the data model has a lean memory footprint (using efficient key values and removing unnecessary columns), this design may perform just fine and you are good to go. I’ll demonstrate more creative techniques in part two of this two-part topic, as we explore more advanced data modeling design techniques.

Continued in Part 2

Composite Models Gen 2 and DirectQuery over Power BI Datasets

My wife bought me a nice telescope for Christmas. I’ve wanted one since I was a kid. We opened gifts early because we’re going to take the RV out in the desert to watch the great convergence of Jupiter and Saturn next week. But, what am I doing now, after opening my gift? …updating Power BI Desktop to check out the new Composite Model feature! This new feature of Power BI has been in the works for several years and we have been anxiously waiting for this announcement. It is truly the Utopian dream; the Holy Grail!

It is hard to articulate the impact that this capability can have for businesses. The new composite model gen 2 feature is a monumental breakthrough that will significantly raise the bar for Power BI model development and could change the way we think about designing enterprise BI solutions – but customers will need guidance to use it correctly. If you haven’t been sitting on the edge of your seat waiting for this release, read the announcement to understand how significant it is.

Power BI December 2020 Feature Summary | Microsoft Power BI Blog | Microsoft Power BI

There have been many attempts by Microsoft and other vendors to create a data modeling architecture that provides fast access to cached data, direct access to live data and scaled-out connections to established data models. Remember ROLAP and HOLAP storage in multidimensional cubes? These were great concepts with significant trade-off limitations. No other vendor has anything like this. Way back in the day, Microsoft jumped on the Ralph Kimball bandwagon to promote the idea that a company should have “one version of the truth” exposed through its data warehouse and cubes or semantic data models. Then, to meet customer demand, they gave us a BI tool that, in bringing data together from multiple sources, makes it easy to create a lot of data silos. Arguably, there are design patterns to minimize data duplication, but to use governed datasets, self-service report designers are limited to connecting to large, central models that might only be authored and managed by IT. This new feature can restore balance to the force and bring us back to “one version of the truth” again.

To put this into perspective, here’s a quote from Alberto Ferrari’s recent article: “It is not easy to find a name that accurately describes the massive step forward that is this feature in the evolution of Business Intelligence solutions. It would fully deserve a hyperbolic name, but the marketing departments already overused so many terms over the years: “universe” was already used in the past century; you can imagine what happened after that. We like “composite models” because it is a name that concisely describes the purpose of this new feature.”

Alberto has a great video emphasizing the magnitude of this release: New composite models in Power BI: A milestone in Business Intelligence – SQLBI

Those of us who live and breathe BI will continue to have thoughts and opinions, and will eventually have war stories and prescriptive guidance to offer about using this new capability most effectively.

Upon initial review, it seems to work exactly as expected. In future posts, I will demonstrate how to design and implement a next-gen composite model and share my experience and thoughts as we move down this new road.

Power BI Expert Resources

Where to go for best practice advice for Power BI: The Data Gods

One of the most common questions I hear from customers and students is “what are your recommended resources for learning and staying current with Power BI?”. I find some people get stuck learning Power BI because they are simply overwhelmed with information and choices. Google search will often raise more questions than answers because there is a lot of noise on the Internet. Expert advice can be found only after dismissing the myriad of posts asking questions, sharing opinions, and sifting through outdated and confusing information. Most information found in discussion forums is outdated because the product changes so often. So, what are the most reliable sources for Power BI expert advice and best practices?

I follow several industry experts and I own many great books. Chris Wagner (KratosBi.com | @KratosBi), Microsoft MVP, has compiled the most comprehensive go-to list of Microsoft data experts, called the Data Gods (you can see the complete list here). When I was added to the collective of featured “Data Gods” last week, I was hesitant to brag about it. I mean, there’s no modest way to post this picture. The title of “Data God” is a lot to live up to – but it is truly an honor and I want to thank Chris and others for this recognition.

More importantly, this is a concise list of reliable industry experts – which I have pared down to those who blog and post regularly about their experiences and advice. I know and follow nearly everyone on this list, but there are a few folks with whom I have not yet had the pleasure to meet. There are also others featured in the Data Gods collection who make significant contributions in other ways, aside from frequent blogs and articles. I’ve also added a few additional notable community contributors.

Free Best Practice Resources

Following is my selection of members of the Data Gods council, with links to their free resources. Most of these are blog sites or web sites that include blogs and articles available for free without requiring a sign-in or subscription:

Alberto Ferrari & Marco Russo – SQLBI.com | @marcorus | @FerrariAlberto

For most of us who work day-to-day with Power BI and other Microsoft BI tools, SQLBI.com is the first stop for expert advice. Alberto and Marco have set the gold standard for Microsoft BI best practices and design patterns. The breadth and depth of their expertise is expansive and they are known for going deep into the nuts and bolts that make Power BI work. These guys are effectively an extension of the Microsoft product development teams, guiding Microsoft leadership and evangelizing the platform. Their books on DAX and Tabular model design are considered by the most serious Power BI developers to be the most comprehensive reference books available.

Patrick Leblanc & Adam Saxton – GuyInACube.com | @GuyInACube | @awsaxton |
@patrickdba

You either know these guys or you’ve never searched the web for Power BI help. Adam started the Guy In A Cube YouTube channel several years ago as a side project while doing BI technical support work for Microsoft. Patrick came from a BI and SQL Server consulting background before joining Microsoft and teaming with Adam. Today they share design techniques and feature highlights in weekly, high-quality short videos and a Saturday morning live streamed Q&A session. Adam and Patrick are members of the Customer Advisory Team (CAT) focused on supporting Power BI enterprise customers. Their weekly Power BI community Roundup video features the most current Power BI blog posts and release notes.

Chris Webb – blog.crossjoin.co.uk | @cwebb_bi

Chris is one of the most prolific, enduring and consistent bloggers in the Microsoft BI industry. His posts and articles go deeper than most, prepared with thought, effort and insight. Before joining the Microsoft Customer Advisory Team (CAT), he ran a successful training and consulting firm specializing in the Microsoft BI platform. Over the years he was, and continues to be, the go-to expert on SQL Server Analysis Services, MDX, Power Query and DAX; and he continues to share his knowledge and advice through his blog.

Matt Allington – Excelerator Blog | @ExceleratorBI

Matt’s main focus is business-centered self-service BI and financial analysis. He is a tried-and-true expert in the Excel and Power BI Desktop space. His training, presentation and writing style is clear, authoritative and to-the-point. He has the best collection of intro and intermediate level books on Power BI and DAX; each is short, concise and easy to understand. His books, Supercharge Power BI and Learn to Write DAX, are an excellent way to get started.

Seth Bauer, Mike Carlo & Steve Campbell – PowerBI.Tips | @PowerBITips | @Seth_C_Bauer | @Mike_R_Carlo | @PowerBISteve

PowerBI.tips is an expansive collection of free resources and utilities for Power BI, with a focus on visual themes, layout and color selection. This dynamic trio covers the important bases: Seth on visual design, Mike on enterprise solutions and Steve on application development. They have an array of external tools for things like connection management and project documentation.

Melissa Coates – CoatesDataStrategies.com | @SqlChick

Melissa is one of the deep thinkers in the industry, always asking the right questions and challenging the status quo. She’s also one of the most pleasant people I’ve ever met. Her blog posts are thorough and insightful. Melissa has co-authored multiple whitepapers for Microsoft on data governance, enterprise deployment and migration.

Leila Etaati & Reza Rad – Radacad.com | @leila_etaati | @Rad_Reza

Leila and Reza are known as one of the power couples in the industry. With nearly limitless energy, they have traveled from New Zealand over the years to present at more conference events than I can count. Leila is one of the BI industry’s thought leaders in the machine learning and AI space, with education and credentials behind her experience. Reza’s expertise with Power Query and Power BI are impressive. They freely share their knowledge through their blogs, articles, training classes and have published numerous books.

Kasper de Jonge – KasperOnBI.com | @Kjonge

Kasper is a Principal Program Manager on the Power BI product team and based in The Netherlands. A regular conference session headliner, Kasper is very tuned into customers’ real-world enterprise BI reporting problems. He has been blogging for years and has recently launched a YouTube channel.

Alice Drummond – DiscoverEI.com | @AliceDiscoverEI

With an emphasis on environmental data analysis, one of the things I appreciate about Alice’s active blog is that with each post, she shares a comprehensive collection of blog sites and training resources to help readers find supporting information and expand their knowledge.

David Eldersveld – DataVeld.com | @DataVeld

If David can’t figure it out, it probably can’t be done. Over the years, he has pushed the limits and has discovered and developed some unique and interesting techniques to integrate things like graphics and geospatial data with Power BI. David is very active in the community and shares different ways to take Power BI beyond the out-of-the-box features to do things that many of us had no idea were even possible.

Brian Grant – CSGPro.com | @BrianGrantBI

Brian has a unique intellect and a passion for problem-solving and sharing knowledge, particularly related to Power Query and DAX. I’ve had the pleasure of collaborating with Brian on multiple occasions in conference presentations and training events. He and his colleagues have developed an interesting method to understand the mechanics of the DAX language that he shares in a video series called the Elements of DAX.

Reid Havens – HavensConsulting.net | @HavensBI

In a few short years, Reid has risen to all-star status by tirelessly designing and sharing data visualization techniques. He is a master of spit-and-polish presentation and report design.

Meagan Longoria – DataSavvy.me | @MMarie

Meagan has broad expertise but her focus is designing reports for accessibility and accurate data presentation. Her posts and conference presentations are very insightful. She challenges the status quo and shares specific techniques to provide accessible report designs.

Matthew Roche – BIPolar.com | @SQLAllFather

Matthew is a member of the Microsoft Power BI Customer Advisory Team (CAT). He has considerable experience with several Microsoft development initiatives that have culminated in leading products and Azure services that are used to create enterprise data solutions. He is an industry thought leader who is very active in the online community. He shares his insights, leadership and expertise through his blog and industry conference events, most notably related to modern data governance and Power BI dataflows.

Gil Raviv | DataChant.com | @GilRa

Author of Collect, Combine & Transform Data in Excel using Power Query and Power BI, an excellent reference book. Gil maintains an active blog.

Paul Turley – SqlServerBi.com | @Paul_Turley

I add my own blog to this resources list where I share my thoughts about best practice design and development approaches in this series: Doing Power BI The Right Way.

Others

Gilbert “Q” Quevauvilliers – FourMoo.com | @GilbertQue

From his vast experience consulting with IT and business customers, Gilbert shares relevant examples and lessons learned. He is an active blogger and community supporter.

Imke Feldmann – TheBiAccountant.com | @TheBIccountant

Deep, business-focused, creative problem-solving techniques using Power Query/M in Power BI and Excel.

Ruth Pozuelo Martinez – Curbal.com | @CurbalEN

Ruth puts tremendous energy and creativity into her video presentations which she posts frequently. She covers everything from intro to moderately advanced level Power BI design with complete and thorough examples, all presented in a stunning greenscreen format.

Parker Stevens – BiElite.com | @PowerBIElite

Parker’s blog site is an extension of his “BI Elite” YouTube channel, with numerous short tutorials and excellent “how-to” examples.

James Serra | JamesSerra.com

Thorough and consistent, James shares announcements and guidance about the Microsoft data platform and data warehouse architecture.

Books

Beginner
Intermediate/Advanced

Doing Power BI the Right Way: 7. Planning for separation – data models and reports

Part of the series: Doing Power BI the Right Way (link)

Back in the day, before Power BI came along, when we created BI solutions, reports and data models were separate. If you created a cube or Tabular model with Analysis Services, it was developed with Visual Studio and deployed to a server. Reports could be authored and deployed to the report server separately. Now, with Power BI Desktop, you author your data model and report in the same development space and then deploy the whole kit and caboodle to the service at once.

The Thick and Thin of Reports

Power BI reports generally come in two varieties. If you open Power BI Desktop, choose “Get Data”, and build a model and a report, all of that ends up in one self-contained PBIX file. Conversely, a report using a live connection to a published dataset is often called a “thin report” because it contains no data or data model. Thin reports offer several advantages: not only separation of development effort but also centralized control and the ability to share published data models with multiple report authors.

Even when a single PBIX file contains everything, Power BI actually separates the report from the data model (called a dataset in the service) and gives them both the same name. This is very convenient for self-service projects because it is quick and easy to make changes to the data model, queries and measures if you need to make a report enhancement. This is all well and good for small, one-developer projects, but what about those larger-scale solutions where the data model and reports are developed by different folks, or where multiple reports are connected to the same data model?

At what point does it make sense to separate the data model and reports into separate files?

I believe that managing reports and the dataset separately is a best practice in enterprise solutions. However, like a lot of “proper” practices in the IT data world, it is far less convenient to “do it right”. In each project, we must exercise an appropriate degree of discipline. I’ve worked on projects that were so laissez-faire and lacking in management structure that they produced a tangled web of garbage. I’ve also worked on projects managed with strict adherence to rules and restrictive application development methodologies, where report and BI developers had no freedom to be creative, and which thus produced bland and useless reports. It is hard to prescribe a set of rules that always works for every project with the right dose of balance. A certain degree of freedom is necessary to learn as we go and produce masterful analytic reports. At the same time, the process requires a well-defined plan, time-boxing and prioritizing of features.

When I started writing this post, I was working on a consulting project that started out with the dataset and report in two separate files. Under that plan, one developer could work on the data model and publish it to the DEV/QA workspace, and the other developer could design reports as new modeling features became available. In theory, this is typically the right approach. The requirements for this project expanded and we found ourselves in a rapid iteration cycle, needing to make model, measure and report changes. As a result, I brought the report pages back into the dataset PBIX file. There is no elegant way to do this, but it wasn’t difficult to create new pages and then copy and paste the visuals from the separate report file to the combined dataset/report file. Bookmarks, drill-through and tooltip pages had to be rewired, but it wasn’t a difficult task. Kasper de Jonge walks through the process in this blog post from May, 2020.

In cases where you have crisp requirements and a clear division of labor; you can start with two files to delineate the data model and the report. In cases where you are actively prototyping and working alone, developing a single PBIX file containing all the parts will allow you to move quickly. When it becomes more important to lock things down and manage releases than to keep iterating, move report pages to a separate PBIX file that is connected to the published dataset in the service.

Hot Swap Live Connections

I’ve started using a new external tool that promises to make this process much easier. The Hot Swap tool is a utility developed by Steve Campbell and available from PowerBI.tips as part of the Business Ops external tools package.

This simple tool allows you to start a project with separate data model and report files and then swap the live connection between copies of the model: either a dataset published to the Power BI Service or a PBIX file on the local desktop using a local TCP port. Hot Swap is a community-developed external tool that is not officially supported by Microsoft, but my experience in testing the tool thus far has been very good. I used an earlier solution from Steve in a project about a year ago that worked out quite well. It was a set of PowerShell scripts that rewired connections in much the same way. Having this capability in a simple pop-up dialog is an even more convenient option. You can check out this demo of the Hot Swap tool in the PowerBI.tips YouTube channel.

One of the things I like about starting with two files is that we don’t run the risk of design issues left over after moving report pages and visuals to another file.

The new approach is simple: begin your project with separate data model and report files. You can publish an early version of the data model file to a development workspace in the service and connect to establish a live-connected report. Then, run the Hot Swap tool and switch the connection to a local copy of your data model to continue iterative development. Changes to the local data model PBIX file/Power BI Desktop instance are immediately available in the report PBIX. Publish changes whenever you need to, and then you can use Hot Swap to remove the local connection. You don’t actually use the tool to perform the reconnect, because Power BI Desktop allows you to use the normal Get Data options after a live connection has been removed from the report file. Regardless, it is a simple process.

Recommendations

For small, informal projects developed by one person (or one developer at a time), keeping the data model and report in one file is convenient and makes it easier to make changes as you go. If you later decide to share the dataset for others to create additional reports, or to promote or certify the dataset, you should separate the dataset file and create a “thin” report using a live connection to the published dataset.

For moderate-scale and formalized Power BI projects, begin with two separate PBIX files or make it a point to separate the data model from the report file before you deploy to the Power BI service. Consider using the Hot Swap tool to help manage local mode development.

Doing Power BI the Right Way: 6. Validating data model results – Part 2

Part of the series: Doing Power BI the Right Way (link)

Part 1 is here

Validation 301: Creating a Validation Report Solution, Part 2

We often have users of a business intelligence solution tell us that they have found a discrepancy between the numbers in a Power BI report and a report produced by their line-of-business (LOB) system, which they believe to be the correct information.

Using the LOB reports as a data source for Power BI is usually not ideal because at best, we would only reproduce the same results in a different report. We typically connect to raw data sources and transform that detail data, along with other data sources with historical information to analyze trends, comparisons and ratios to produce more insightful reports.

However, if the LOB reports are really the north star for data validation, these can provide an effective means to certify that a BI semantic model and analytic reports are correct and reliable.

This is a pattern that I have applied for consulting customers. My current client refers to the main Power BI report and dataset as their “dashboard” solution. The data model is large and complex, with sophisticated reports used monthly by hundreds of users. The dataset is set up with a scheduled refresh. The “dashboard” solution has been thoroughly tested and we are careful to apply updates and enhancements only infrequently, under a rigorous application lifecycle management process.

At the end of each month, a user exports their LOB report to an Excel file and drops it into a SharePoint library. I have one consulting client who uses an automation tool to essentially open the LOB application, punch in the right keystrokes and export the report to a file at the end of every month.

The format of these exported reports has proven to be fragile, and we don’t want to risk destabilizing the main Power BI “dashboard” solution. So, rather than importing the validation files directly into the main dataset, I have created a separate Power BI project with a dataset & report that has two sources. The first data source for the validation report connects to the main “dashboard” Power BI dataset using the Analysis Services connector and the XMLA endpoint. A DAX query is issued to return values to compare with the LOB report totals. The second data source for the validation report imports the validation files that were exported from the LOB report. If that import process were to fail for any reason, it doesn’t affect the main dashboard solution.

This diagram depicts the validation solution at a high level. The main Power BI “dashboard” solution gets its data from the corporate EDW. That data is processed through rigorous ETL pipelines and contains data extracted from multiple systems used by the corporation.

A lightweight DAX query imports summarized results from the main dataset into the validation dataset.

Before we go any further, I just want to make the point that in most normal Power BI data models, I would never import data from another published Power BI or SSAS data model. That’s just a bad idea and can make it very difficult to trace data lineage. Do not use DAX queries as a data source for your regular report models. This validation report is a rare and appropriate exception to the rule.

The second data source for the validation solution is a set of files exported from reports produced by the LOB system. These reports only contain summary information but can be compared to totals from our Power BI solution. Values from the two sources are compared and the validation report shows whether they match or fail validation.

Just a note about the decision to create a separate dataset. There were two reasons for this:
1. We could not control the format of the validation files. If the file format were to change, it would cause an error and stall our scheduled refresh cycle for a large mission-critical report used by several hundred high-level employees (whose job is it to tell the CEO why his dashboard doesn’t work?)
2. Today we do not have the ability to use a live connection to a Power BI dataset in a composite model along with other sources, but we will in the near future. This capability doesn’t address problem #1 but will certainly change the way I think about architecting large, multi-source models in the future.

Writing the DAX query to read the summary values from the main Power BI “dashboard” dataset was a great learning experience. Although I am now pretty good at writing DAX queries by hand after a couple of these projects, the easier way to get started is to allow Power BI Desktop to generate the query for you. Here are the steps using the ContosoDW dataset as an example:

  1. Add a table visual to the report canvas. Add the same field and measure used in the LOB report to be used for comparison.
  2. Turn off totals for the table visual.
  3. Add and set slicers or filters to equate the LOB report results.
  4. On the View menu in Power BI Desktop enable the Performance Analyzer.
  5. In the Performance Analyzer pane displayed to the right side of the report canvas, click the Refresh visuals link.
  6. Click the plus sign next to the table visuals to expand the performance counters for that visual.
  7. Click the Copy query link.

The DAX query is now in the Windows clipboard and can be used to run the same query that the designer produces to populate the table visual. To test and modify the query, do the following.

  1. Make sure you have the latest version of DAX Studio installed. If not, install it from daxstudio.org. Save your work, close and reopen Power BI Desktop after installing DAX Studio.
  2. From the External Tools menu in Power BI Desktop, click DAX Studio. This opens DAX Studio with a query session connected to the local Power BI dataset.
  3. Paste the clipboard contents to the query window and run the query.

Here’s the query that was generated by Power BI:

DEFINE
  VAR __DS0FilterTable = 
    TREATAS({2009}, 'Order Date'[Year])

  VAR __DS0FilterTable2 = 
    TREATAS({"October"}, 'Order Date'[Month Name])

  VAR __DS0Core = 
    SUMMARIZECOLUMNS(
      'Product'[Category Name],
      __DS0FilterTable,
      __DS0FilterTable2,
      "Online_Sales_Qty", '_Measures'[Online Sales Qty]
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(501, __DS0Core, 'Product'[Category Name], 1)

EVALUATE
  __DS0PrimaryWindowed

ORDER BY
  'Product'[Category Name]

This code can be simplified for our needs. The essential code is the expression using the SUMMARIZECOLUMNS function. You can disregard everything after that. Make the following changes to simplify the DAX query code:

  1. Delete everything after the closing parenthesis for the SUMMARIZECOLUMNS function.
  2. Remove the variable line just before the SUMMARIZECOLUMNS function and replace it with “EVALUATE”

The resulting query code should look like this:

DEFINE
  VAR __DS0FilterTable = 
    TREATAS({2009}, 'Order Date'[Year])

  VAR __DS0FilterTable2 = 
    TREATAS({"October"}, 'Order Date'[Month Name])
EVALUATE
    SUMMARIZECOLUMNS(
      'Product'[Category Name],
      __DS0FilterTable,
      __DS0FilterTable2,
      "Online_Sales_Qty", '_Measures'[Online Sales Qty]
    )

This is the query I would use to populate a new table in my validation dataset. Just use Get Data to open the Power Query editor, create a new SQL Server Analysis Services connection, paste the workspace path for a Premium-enabled workspace and connect to the published dataset for the main Power BI solution. Then, finally, paste this query code into the query window.

The next step is to import the validation file(s) that were exported from the LOB report. In our validation solution, I used the SharePoint connector and imported all files with a predefined file name prefix. After the validation files are imported, make sure that the column data types are compatible, load the table and wire up a relationship between the two tables. In our last validation solution, I created a new measure that compares a measure from the summary table imported from the main “dashboard” solution dataset with a comparable measure from the validation file source table. Measures calculate the difference and the % difference between the two. I then used a KPI indicator icon in the table visual, using conditional formatting, to show whether the differences were within an acceptable range.
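As a sketch of that comparison pattern (the base measure names here are hypothetical; the real model uses its own naming), the measures look something like this:

```dax
-- [Dashboard Sales Qty] comes from the summary table imported from the
-- main dataset; [LOB Sales Qty] comes from the imported validation files.
Qty Difference =
[Dashboard Sales Qty] - [LOB Sales Qty]

Qty % Difference =
DIVIDE ( [Qty Difference], [LOB Sales Qty] )

-- Returns 1 (pass) when the variance is within 0.1%, otherwise 0 (fail);
-- a measure like this can drive the conditional-formatting KPI icon:
Validation Passed =
IF ( ABS ( [Qty % Difference] ) <= 0.001, 1, 0 )
```

The 0.1% threshold is an illustrative assumption; set the tolerance to whatever your business considers an acceptable variance, or use 0 to require an exact match.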

This screen capture from the validation page has the real data blurred out but you can see the grouping and comparisons resulting in either green or red indicators.

Each month, I receive an email from our user who is responsible for the report validation. He tells me that at a quick glance he is able to verify that the totals from their line-of-business application and Power BI solution match, and that the executive leaders and financial analysts who use these reports can be certain that they are accurate and reliable. When an item doesn’t match, they can pinpoint the account in question and usually are able to track it down to a data refresh timing issue or a mistake made in their source system that hasn’t yet been reconciled.

We are now setting up alerts using KPI tiles on a dashboard that will message key users when validation either succeeds or fails in key areas. As a result, the confidence in this solution has increased significantly and business leaders trust the information they get from these reports.