Power BI and the Dimensional Dilemma

February 5, 2023February 5, 2023

There is no secret about this. If you do any legitimate research about Power BI (reading blogs, books or training from reliable sources), you will quickly learn that a lot of basic functionality requires a dimensional model, aka “Star Schema”. This is a hard fact that every expert promotes, and self-taught data analysts either have learned or will learn through experience. So, if everyone agrees on this point, why do so many resist this advice?

Perspective is everything. I didn’t understand why getting to the star schema was so out of reach so often until I was able to see it from another perspective. There are a few common scenarios that draw source data into different directions than an ideal dimensional model.

A common scenario I have seen a number of times is when a new Power BI developer or consultant, attempting to follow best practices, comes to the conclusion that shaping data before it lands in Power BI would be the best bet; they ask their client or Database Administrator or Data Engineer to take on the responsibility of shaping the data for them. This creates a dependency to define fact and dimension tables or views in the database. If they had the time, BI pattern skills and inclination to create a dimensional model, the customer/project sponsor probably wouldn’t be asking for expert help. The Data Engineer writes a big query and the BI developer shoe-horns a flat table into Power BI, citing that the customer made the decision not to follow best practice design. I hear new consultants say this all the time: “the customer made me do it that way”, but that’s not a good answer.

Sure, data should be prepared and transformed upstream as far as possible, but when this isn’t possible (at least not in the first project iteration), move on to plan B and transform the data in another layer downstream. If the data source isn’t already organized into dimension and fact tables, you can reshape rows and columns into a star schema using Power Query using a series of transformation steps. This may not be the most ideal solution if you need to import a large volume of data in a scalable solution, but it will help you get to a correct star schema data model and work-around the barrier.

Other intermediate design choices might include using Power BI dataflows or datamarts – but the fundamental choice still remains: to shape the data before it lands in the Power BI solution or after. The essential design patterns are still the same.

Another common argument for not following the dimensional rules is that the initial report requirements may not call for having the ability to slice or group by a certain column value that would normally be contained in a conformed dimension table. Maybe the old report just showed all of these columns in a long list of records displayed in the table visual. The old report used a single SQL query, which might work just fine if you want to reproduce the old static report in Power BI. But, if you evolve the new report to include time-series measures and modern visualizations like charts, slicers and drillthrough actions; you will need (say it with me) – a dimensional model. in my experience, nearly every “fit for purpose” report will eventually outgrow a flat query model when new requirements are added. So, you might as well do it right the first time.

Project evolution is often a progression from Good design to Better design and then, in cases where we have the luxury to do things right… on to Best design. This is the classic balance of Time, Features and Money. With all the time and money in the world, you can have all the features you want; but constraining one, also constrains another. Having the perfect solution is great but having a working solution is better than having no solution at all.

Bottom line: creating a dimensional model should be your first order of priority. If you can’t manage that in the source database, data warehouse or data mart; craft the fact and dimension tables in Power Query – and follow other best practices that will support extensibility (see this series of posts). If the Power BI solution is mission critical and needs to be futureproof, plan to optimize the design in a later project phase and move to a more scalable approach as needed.

The Dimensional Dilemma

Let’s try a quick test (I didn’t practice this, by the way). I’ll do a quick Google search:

The resulting top video results are:

A two & half hour course from Pragmatic Works on dimensional modeling, Two YouTube videos about star schema model design, A Guy In A Cube video from Patrick LeBlanc titled “Data modeling best practices – Part 1” where he introduces dimensional concepts

The page link results are similar and include an EnterpriseDNA course, a Udemy course and a link to my blog post titled “Doing Power BI the Right Way: 5. Data Modeling Essentials”

Point made: Power BI needs a star schema.

Paul Turley

Microsoft Data Platform MVP, Principal Consultant for 3Cloud Solutions Specializing in Business Intelligence, SQL Server solutions, Power BI, Analysis Services & Reporting Services.

Power BI Direct Lake and DirectQuery in the Age of Fabric

I just returned from the Microsoft Fabric Community Conference in Las Vegas. Over 4,000 attendees saw a lot of demos showing how to effortlessly build a modern data platform with petabytes of data in One Lake, and then ask CoPilot to generate beautiful Power BI reports from semantic models that magically appear from data in a Fabric Lakehouse. Is Direct Lake the silver bullet solution that will finally deliver incredibly fast analytic reporting over huge volumes of data in any form, in real time? Will Direct Lake models replace Import model and solve the dreaded DirectQuery mode performance problems of the past? The answer is No, but Direct Lake can break some barriers. This post is a continuation of my previous post titled “Moving from Power BI to Microsoft fabric”.

Direct Lake is a new semantic model storage mode introduced in Microsoft Fabric, available to enterprise customers using Power BI Premium and Fabric capacities. It is an extension of the Analysis Services Vertipaq in-memory analytic engine that reads data directly from the Delta-parquet structured storage files in a Fabric lakehouse or warehouse.

Moving from Power BI to Microsoft Fabric

Fabric is here but what does that mean if you are using Power BI? What do you need to know and what, if anything will you need to change if you are a Power BI report designer, developer or BI solution architect? What parts of Fabric should you use now and how do you plan for the near-term future? As I write this in March of 2024, I’m at the Microsoft MVP Summit at the Microsoft campus in Redmond, Washington this week learning about what the product teams will be working on over the next year or so. Fabric is center stage in every conversation and session. To say that Fabric has moved my cheese would be a gross understatement. I’ve been working with data and reporting solutions for about 30 years and have seen many products come and go. Everything I knew about working with databases, data warehouses, transforming and reporting on data has changed recently BUT it doesn’t mean that everyone using Power BI must stop what they are doing and adapt to these changes. The core product is unchanged. Power BI still works as it always has.

The introduction of Microsoft Fabric in various preview releases over the past two years have immersed me into the world of Spark, Python, parquet-Delta storage, lakehouses and medallion data warehouse architectures. These technologies, significantly different from the SQL Server suite of products I’ve known and loved for the past twenty years, represent a major shift in direction, forming the backbone of OneLake; Microsoft’s universal integrated data platform that hosts all the components comprising Fabric. They built all of Fabric on top of the existing Power BI service, so all of the data workloads live inside familiar workspaces, accessible through the Power BI web-based portal (now called the Fabric portal).

CI/CD & DevOps for Power BI… Are We There Yet?

In my view, projects and teams of different sizes have different needs. I described DevOps maturity as a pyramid, where most projects don’t require a sophisticated DevOps implementation, and the most complex solutions do. The DevOps maturity is a progression, but only for projects of a certain scale. One of the following options might simply be the best fit for a particular project.
Unless you are throwing together a simple Power BI report that you don’t plan to maintain and add features to, the first and most basic managed project should start with a PBIX file or Power BI Project folder stored in a shared and cloud-backed storage location.
DevOps isn’t a requirement for all projects, but version control and shared file storage definitely is.

5 thoughts on “Power BI and the Dimensional Dilemma”

Priyanka says:

February 24, 2023 at 5:41 am

Very nice blog

Enhance and explore your skills, also prepare for the Power BI interview questions and answers that will help to give a confidant to face the interview.

Loading...

Reply
Lee says:

February 8, 2023 at 12:36 am

What I’d like to see is Power BI bloggers advocating and demonstrating the use of the star schema approach without relying on Adventure Works or similarly “ordinary” data to do so. Organisations with the common and basic data model that involves Customers, Products, Sales, Invoices leading to the usual measures of “Revenue”, “Profit”, “Profit %”, “CoGs” etc are ‘easy’ to star schema. I want to see less ordinary organisations’ data being manipulated and analysed. I want to learn but I need other examples to be led by!

Loading...

Reply
Brett Moore says:

February 7, 2023 at 5:47 am

I believe in this advice if row level security isn’t a consideration and have applied it at previous jobs. We use SAP HANA in directquery mode since HANA is where we house all our RLS rules. I get that PBI has RLS as well, but it is cumbersome to maintain in my experience, and we don’t want to maintain it in two places. We do have some date, material, vendor, region etc dim tables we bring in when needed, but most the time we make a decent size fact table that the report authors use to build their reports.

Loading...

Reply
Pingback: The Importance of Star Schemas in Power BI – Curated SQL
Lars says:

February 6, 2023 at 3:02 am

I agree that compared to flat tables star schemas are a better option in 99% of the cases. But in many of the models me and my teams put to production throughout the years we’ve chosen a key table model rather than a traditional star schema. Downsides with key table model is performance (you get ~2x cardinality compared to star schema model) but the advantages are huge (my favorite is that you can be flexible on what’s dimensions and facts which is a great feature in e.g. a self-service model where users sometimes are interested in “avg purchase amount for customer age >30” but sometimes instead interested in “avg customer age for purchase amounts >1000”).

Sisense has a fair article on this, but it doesn’t cover everything since there are many interesting variants you can do in a key table model. Would love to hear our experience on this!

https://support.sisense.com/kb/en/article/key-table-store-data-relationships-between-primary-key-values

Loading...

Reply

Power BI and the Dimensional Dilemma

The Dimensional Dilemma

Like this:

Paul Turley

5 thoughts on “Power BI and the Dimensional Dilemma”

Leave a ReplyCancel reply

The Dimensional Dilemma

Share this:

Like this:

Paul Turley

Related Posts

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

5 thoughts on “Power BI and the Dimensional Dilemma”

Leave a ReplyCancel reply

Discover more from Paul Turley's SQL Server BI Blog