Doing Power BI the Right Way: 5. Data Modeling Essentials & Best Practices (2 of 2)

December 31, 2020March 22, 2021

Part of the the series: Doing Power BI the Right Way (link)

Data modeling 201-301 – continued from part 1

So far, you’ve seen that the essential components of a data model include tables related to each other. Filtering records in one table prorogates the filter to related tables(s), causing records in the related table to be filtered. This dimensional model schema is the foundation of the reporting solution. Once the fact and dimension tables are in-place, you can create more advanced solutions by working outside of this standard relationship pattern.

Using Disconnected Tables

Disconnected tables are useful in a variety of ways, allowing a user to select a value that drives the behavior of a dynamic measure, or by grouping values and performing some kind of dynamic calculation. Let’s focus on one of the disconnected tables shown in this figure, the Selected Measure Period table:

This table was created using the Enter Data feature and includes only one column named Period, which includes the following values:

Current MTD YTD Same Per LY PY

The exact text for these values isn’t really important because I have written specific DAX code to work with them.

The purpose of this post is to demonstrate effective data modeling and not DAX fundamentals but one cannot exist without the other. For our purposes, my model in this example contains five measures that calculate the Store Sales Qty with different time series variations. The Store Sales Qty measure simply returns the Sum of the Quantity field and the Store Sales Qty MTD calculates the month-to-date value for a given date. I’ve created another measure named Store Sales Qty Selected Period which is shown below. When a value in this table is selected, this measure returns the results of one of the other measures.

The following three images show cases where the Store Sales Qty Selected Period measure is used as the Value property of a matrix visual. Each row represents a date from the Order Date table. The slicer above the matrix allows users to select from one or more Period values from the Selected Measure Period table. By selecting only the value Current, the measure returns the Stores Sales Qty measure value. By selecting YTD the Store Sales Qty YTD measure value is returned.

The third example adds the Period field to the matrix Column group. For all values chosen in the slicer, a Period value is displayed in the column heading and the corresponding measure value is displayed in that column.

Here’s another example used to calculate the daily Orders Pending Delivery. The column chart below shows the daily number of orders where an order has been placed but not yet delivered. The dates on the X axis of the chart are from a disconnected table named Select Effective Date.

The logic for the Online Orders Pending Delivery measure depicted by each chart column is the number of orders where the Order Date is less then or equal to the selected date and the Delivery Date is greater than the selected date, or a delivery dat doesn’t yet exist. Here is the measure definition:

Once again, the disconnected table helps us work-around the normal constraints of related tables. The theme here is to use relationships to build the foundation and use them for primary reporting functionality. Then, use disconnected tables to take the solution to the next level with dynamic calculations and creative problem-solving.

The Answer to Life, the Universe and Everything: 42

Every data model should be designed to fulfill specific reporting requirements. This can be challenging when creating larger and more complex models. This isn’t to say that we can’t design utility models to answer a lot of different business questions but it can be difficult to create the utopian data model than can answer every question. In years past, I’ve seen attempts to create something like a “master cube” that contains nearly every table and field in the data warehouse, in an attempt to let users get insight about anything they could possibly imagine. There is a balance here, which you will see in the following section.

Relationships Don’t Always Work Out

The filter propagation from any one table in the data model to any other table can only use one set of relationships. The problem can be illustrated in the question: How do we determine the geographic location for Online Sales orders? Take a look at the following model diagram: Customers have a geographic location as evidenced by the GeographyKey in the Customer table, which is related to the Geography table using a column with the same name. Easy answer.

What about Store Sales? The Store table has a GeographyKey but there is currently no relationship from Geography to Store. That should be easy to resolve, right? Let’s just create a relationship from Geography to Store by dragging the GeographyKey column…

What the crap, Man! Why can’t we create the relationship? The warning message says “because this would create ambiguity between tables Geography and Online Sales”. Yep, sure enough, if that relationship were active, the data model would have two different paths between Geography and Online Sales and wouldn’t be able to decide whether to follow the filter through the Customer table or the Store table. This is a little bit like Google Maps selecting one of multiple routes between two locations. It might tell you that you can take an alternate route and add 15 minutes to your drive but you can only take one route or the other.

So, how do we deal with this problem? …through the magic of DAX, of course. We don’t need no stinking relationships! Rather than relying on natural filter propagation through a relationship that we can’t implement, we create a measure that implements a pseudo relationship using the DAX TREATAS function. Here, the Store Sales Qty measure is explicitly taking all the selected GeographyKey values in the Geography table and creating an in-memory table using the VALUES function. The TREATAS function is saying “take this table of values and treat it as if it were the GeographyKey in the Store table”, effectively filtering Store records using the selected Geography records.

The result is simply that Store Sales Qty can be grouped and filtered by any field in the Geography table.

I have a unique appreciation for the TREATAS function which was introduced late, after we began to embrace DAX in Power BI, SSAS Tabular and Power Pivot. I was in the Community Zone at the PASS Summit in Seattle some five years ago. Someone asked for help with a similar problem which was a challenging but interesting puzzle. Chris Webb happened to be passing by so I dragged him into the conversation. We spent well over an hour hammering out DAX code on a laptop before he presented a solution consisting of CROSSJOIN, FILTERTABLE and SUMMARIZE functions. I don’t remember all the details but it was complicated. Chris has avoided me ever since (I’m kidding… Chris is a super nice guy, enjoys a challenge and always willing to help).

The underpinnings of many DAX functions are a sea of internal logic and code meant to solve specific business problems. Spending time with experts like Chris, Marco Russo and Jeffery Wang (who architected the DAX language) have helped to to appreciate the immense power yet simplicity that this language offers.

Who Needs Relationships Anyway?

One more example. This one is just for fun and the point is to underscore that although a data model with properly-formed relationships is very important, this should not be the limiting factor in a project design. Take a quick look at the following Power BI report page. What do you see?

There are several report visuals that group the Online Sales Qty measure and time-series variations of this measure by different dimension attributes. You would expect there to be a rich data model with several related tables to support all this functionality, right?

Here is the diagram view of this data model. It contains only one relationship and even that could have been eliminated to further make the point.

How is this possible? Isn’t the whole point of a data model to make this type of data visualization and analysis possible? What’s going on here? The Online Sales Qty measure apply the same DAX code pattern that I showed you before, using the TREATAS function to build virtual relationships on the fly:

How does this measure perform compared with a conventional measure in a “proper” data model? With a few hundred thousand fact records, it performs just fine but in a more complex data model with a larger volume of data, this might not be the best approach. The primary reason that performance could degrade is that the measure code causes queries to use the tabular formula engine rather than the storage engine, which is typically faster. My point with this demonstration is that you can use similar techniques to do just about anything you can imagine.

Conclusion

Start with a well-crafted dimensional data model built using best practice design patterns and rules. Use exceptions when needed to address specific requirements – in the form of flattened tables and master-child relationships. Then you can graduate to using special cases like no relationships at all, to address specific use cases and reporting needs. Start with the basics, adhere to the rules and then make exceptions as needed.

As usual, I had a few more things in-mind to cover when I started this post that will be deferred to later posts, to be covered in greater depth. These include:

The current state and future of Power BI data model design
DirectQuery and Gen 1 composite models
DirectQuery over Power BI datasets & SSAS models (Gen 2 composite models)
Using External Tools for advanced data model design
When and how to use calculation groups to standardize measure variations

Supporting Resources

Star Schemas
https://www.sqlbi.com/articles/the-importance-of-star-schemas-in-power-bi/https://docs.microsoft.com/en-us/power-bi/guidance/star-schema

Many to Many Relationships
https://docs.microsoft.com/en-us/power-bi/guidance/relationships-many-to-many

Model Optimization
https://docs.microsoft.com/en-us/power-bi/guidance/import-modeling-data-reduction

DirectQuery Guidance
https://docs.microsoft.com/en-us/power-bi/guidance/directquery-model-guidance

Paul Turley

Microsoft Data Platform MVP, Principal Consultant for 3Cloud Solutions Specializing in Business Intelligence, SQL Server solutions, Power BI, Analysis Services & Reporting Services.

Power BI Direct Lake and DirectQuery in the Age of Fabric

I just returned from the Microsoft Fabric Community Conference in Las Vegas. Over 4,000 attendees saw a lot of demos showing how to effortlessly build a modern data platform with petabytes of data in One Lake, and then ask CoPilot to generate beautiful Power BI reports from semantic models that magically appear from data in a Fabric Lakehouse. Is Direct Lake the silver bullet solution that will finally deliver incredibly fast analytic reporting over huge volumes of data in any form, in real time? Will Direct Lake models replace Import model and solve the dreaded DirectQuery mode performance problems of the past? The answer is No, but Direct Lake can break some barriers. This post is a continuation of my previous post titled “Moving from Power BI to Microsoft fabric”.

Direct Lake is a new semantic model storage mode introduced in Microsoft Fabric, available to enterprise customers using Power BI Premium and Fabric capacities. It is an extension of the Analysis Services Vertipaq in-memory analytic engine that reads data directly from the Delta-parquet structured storage files in a Fabric lakehouse or warehouse.

Moving from Power BI to Microsoft Fabric

Fabric is here but what does that mean if you are using Power BI? What do you need to know and what, if anything will you need to change if you are a Power BI report designer, developer or BI solution architect? What parts of Fabric should you use now and how do you plan for the near-term future? As I write this in March of 2024, I’m at the Microsoft MVP Summit at the Microsoft campus in Redmond, Washington this week learning about what the product teams will be working on over the next year or so. Fabric is center stage in every conversation and session. To say that Fabric has moved my cheese would be a gross understatement. I’ve been working with data and reporting solutions for about 30 years and have seen many products come and go. Everything I knew about working with databases, data warehouses, transforming and reporting on data has changed recently BUT it doesn’t mean that everyone using Power BI must stop what they are doing and adapt to these changes. The core product is unchanged. Power BI still works as it always has.

The introduction of Microsoft Fabric in various preview releases over the past two years have immersed me into the world of Spark, Python, parquet-Delta storage, lakehouses and medallion data warehouse architectures. These technologies, significantly different from the SQL Server suite of products I’ve known and loved for the past twenty years, represent a major shift in direction, forming the backbone of OneLake; Microsoft’s universal integrated data platform that hosts all the components comprising Fabric. They built all of Fabric on top of the existing Power BI service, so all of the data workloads live inside familiar workspaces, accessible through the Power BI web-based portal (now called the Fabric portal).

CI/CD & DevOps for Power BI… Are We There Yet?

In my view, projects and teams of different sizes have different needs. I described DevOps maturity as a pyramid, where most projects don’t require a sophisticated DevOps implementation, and the most complex solutions do. The DevOps maturity is a progression, but only for projects of a certain scale. One of the following options might simply be the best fit for a particular project.
Unless you are throwing together a simple Power BI report that you don’t plan to maintain and add features to, the first and most basic managed project should start with a PBIX file or Power BI Project folder stored in a shared and cloud-backed storage location.
DevOps isn’t a requirement for all projects, but version control and shared file storage definitely is.

5 thoughts on “Doing Power BI the Right Way: 5. Data Modeling Essentials & Best Practices (2 of 2)”

Rick Ryan says:

July 14, 2022 at 5:08 pm

Thanks for your informative article – nice job!

Loading...

Reply
Pingback: Non-Kimball Relationships in Power BI – Curated SQL
Lars Nordin says:

January 1, 2021 at 1:23 pm

At my company we created our version of a master cube using a variant of a “link table” model, meaning all relationships are pre-defined as sets of keys in a link table. In Power BI all tables are then just related to the link table. Sure, the size of the model becomes about twice that of a star schema model (due to cardinality in link table) but the link table design saves us from all the issues you normally get using a star schema (ambiguity etc). Our model has multiple facts and dimensions and a couple of millions of rows, and performs good on a Pro-license. So, what’s my point? I’m looking for someone to challenge our approach. Everywhere I read about star schemas being the holy grail for BI, but a see so many drawbacks with star schemas I’m surprised about its hype. But on the other hand no one writes anything about link table models (expect for Qlik), so I’m guessing there must be some major drawback I haven’t figured out yet 🙂

Loading...

Reply
Brian Custer says:

December 31, 2020 at 12:31 pm

I agree with your take on using TreatAs when necessary. A good data model is the place to start but then when needed peppering in a few good DAX calcs can make all the difference.

Loading...

Reply
Pingback: Doing Power BI the Right Way: 6. Data Modeling Essentials & Best Practices (1 of 2) | Paul Turley's SQL Server BI Blog

Doing Power BI the Right Way: 5. Data Modeling Essentials & Best Practices (2 of 2)

Using Disconnected Tables

The Answer to Life, the Universe and Everything: 42

Relationships Don’t Always Work Out

Who Needs Relationships Anyway?

Conclusion

Supporting Resources

Like this:

Paul Turley

5 thoughts on “Doing Power BI the Right Way: 5. Data Modeling Essentials & Best Practices (2 of 2)”

Leave a ReplyCancel reply

Using Disconnected Tables

The Answer to Life, the Universe and Everything: 42

Relationships Don’t Always Work Out

Who Needs Relationships Anyway?

Conclusion

Supporting Resources

Share this:

Like this:

Paul Turley

Related Posts

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

5 thoughts on “Doing Power BI the Right Way: 5. Data Modeling Essentials & Best Practices (2 of 2)”

Leave a ReplyCancel reply

Discover more from Paul Turley's SQL Server BI Blog