There is no secret about this. If you do any legitimate research about Power BI (reading blogs, books or training from reliable sources), you will quickly learn that a lot of basic functionality requires a dimensional model, aka “Star Schema”. This is a hard fact that every expert promotes, and self-taught data analysts either have learned or will learn through experience. So, if everyone agrees on this point, why do so many resist this advice?
Perspective is everything. I didn’t understand why getting to the star schema was so often out of reach until I was able to see it from another perspective. There are a few common scenarios that pull source data in directions other than an ideal dimensional model.
A common scenario I have seen a number of times: a new Power BI developer or consultant, attempting to follow best practices, concludes that shaping the data before it lands in Power BI is the best bet, so they ask their client’s Database Administrator or Data Engineer to take on the responsibility of shaping the data for them. This creates a dependency on someone else to define fact and dimension tables or views in the database. But if that person had the time, the BI pattern skills, and the inclination to create a dimensional model, the customer or project sponsor probably wouldn’t be asking for expert help in the first place. So the Data Engineer writes one big query, and the BI developer shoehorns the resulting flat table into Power BI, citing the customer’s decision not to follow best-practice design. I hear new consultants say this all the time: “the customer made me do it that way”, but that’s not a good answer.
Sure, data should be prepared and transformed as far upstream as possible, but when this isn’t possible (at least not in the first project iteration), move on to plan B and transform the data in a layer further downstream. If the data source isn’t already organized into dimension and fact tables, you can reshape rows and columns into a star schema with a series of Power Query transformation steps. This may not be the ideal solution if you need to import a large volume of data in a scalable solution, but it will get you to a correct star schema data model and work around the barrier.
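The reshaping pattern itself is simple: deduplicate the descriptive columns into a dimension table with surrogate keys, then replace those columns in the fact table with the key. In Power Query you would do this with reference queries, Remove Duplicates, an index column, and a Merge; the minimal Python sketch below just illustrates the same idea. All table and column names here are hypothetical.

```python
# Sketch: splitting a flat, denormalized table into a star schema.
# Mirrors the Power Query pattern: reference the source, keep the dimension
# columns, remove duplicates, assign keys, then merge keys back into the fact.

flat_rows = [
    {"OrderID": 1, "Product": "Bike",   "Category": "Outdoor", "Amount": 250.0},
    {"OrderID": 2, "Product": "Helmet", "Category": "Safety",  "Amount": 40.0},
    {"OrderID": 3, "Product": "Bike",   "Category": "Outdoor", "Amount": 250.0},
]

def build_star(rows, dim_cols, fact_cols, key_name):
    """Split flat rows into one dimension table and a fact table."""
    dim, key_of = [], {}
    for row in rows:
        attrs = tuple(row[c] for c in dim_cols)
        if attrs not in key_of:              # deduplicate dimension members
            key_of[attrs] = len(dim) + 1     # assign a surrogate key
            dim.append({key_name: key_of[attrs], **dict(zip(dim_cols, attrs))})
    fact = [{key_name: key_of[tuple(r[c] for c in dim_cols)],
             **{c: r[c] for c in fact_cols}} for r in rows]
    return dim, fact

dim_product, fact_sales = build_star(
    flat_rows, dim_cols=["Product", "Category"],
    fact_cols=["OrderID", "Amount"], key_name="ProductKey")

print(dim_product)  # two distinct products, each with a surrogate key
print(fact_sales)   # one row per order, carrying only keys and measures
```

The fact table ends up narrow (keys and measures), and the dimension carries the descriptive attributes once per member, which is exactly the shape the Power BI engine compresses and relates best.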
Other intermediate design choices might include using Power BI dataflows or datamarts – but the fundamental choice still remains: to shape the data before it lands in the Power BI solution or after. The essential design patterns are still the same.
Another common argument for not following the dimensional rules is that the initial report requirements may not call for the ability to slice or group by a column value that would normally live in a conformed dimension table. Maybe the old report just showed all of these columns as a long list of records in a table visual. The old report used a single SQL query, which might work just fine if all you want is to reproduce the old static report in Power BI. But if you evolve the new report to include time-series measures and modern visualizations like charts, slicers and drillthrough actions, you will need (say it with me) – a dimensional model. In my experience, nearly every “fit for purpose” report will eventually outgrow a flat query model when new requirements are added. So, you might as well do it right the first time.
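The first dimension a flat report usually needs when time-series measures arrive is a date dimension. In Power BI you would typically generate one with DAX (CALENDAR) or Power Query; the Python sketch below only illustrates the shape of the table, and the column names are illustrative, not prescribed.

```python
# Sketch: generating a contiguous date dimension table, the table that
# time-intelligence measures and date slicers hang off of.
from datetime import date, timedelta

def date_dimension(start, end):
    """One row per day from start to end inclusive, with common attributes."""
    rows, d = [], start
    while d <= end:
        rows.append({
            "Date": d,
            "Year": d.year,
            "MonthNumber": d.month,
            "MonthName": d.strftime("%B"),
            "Quarter": (d.month - 1) // 3 + 1,
            "DayOfWeek": d.strftime("%A"),
        })
        d += timedelta(days=1)
    return rows

dim_date = date_dimension(date(2024, 1, 1), date(2024, 1, 31))
print(len(dim_date))  # 31 rows, one per day, no gaps
```

The key property is contiguity: one row per day with no gaps, so that time-intelligence calculations and date slicers behave correctly regardless of which days appear in the fact table.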
Project evolution is often a progression from Good design to Better design and then, in cases where we have the luxury of doing things right, on to Best design. This is the classic balance of time, features and money: with all the time and money in the world, you can have all the features you want, but constraining one constrains the others. A perfect solution is great, but a working solution is better than no solution at all.
Bottom line: creating a dimensional model should be your first order of priority. If you can’t manage that in the source database, data warehouse or data mart, craft the fact and dimension tables in Power Query – and follow other best practices that will support extensibility (see this series of posts). If the Power BI solution is mission-critical and needs to be future-proof, plan to optimize the design in a later project phase and move to a more scalable approach as needed.
The Dimensional Dilemma
Let’s try a quick test (unrehearsed, by the way). I’ll do a quick Google search:
The resulting top video results are:
– A two-and-a-half-hour course from Pragmatic Works on dimensional modeling
– Two YouTube videos about star schema model design
– A Guy In A Cube video from Patrick LeBlanc titled “Data modeling best practices – Part 1”, where he introduces dimensional concepts
The page link results are similar and include an EnterpriseDNA course, a Udemy course and a link to my blog post titled “Doing Power BI the Right Way: 5. Data Modeling Essentials”.
Point made: Power BI needs a star schema.
5 thoughts on “Power BI and the Dimensional Dilemma”
What I’d like to see is Power BI bloggers advocating and demonstrating the use of the star schema approach without relying on Adventure Works or similarly “ordinary” data to do so. Organisations with the common and basic data model that involves Customers, Products, Sales, Invoices leading to the usual measures of “Revenue”, “Profit”, “Profit %”, “CoGs” etc are ‘easy’ to star schema. I want to see less ordinary organisations’ data being manipulated and analysed. I want to learn but I need other examples to be led by!
I believe in this advice when row-level security isn’t a consideration, and I have applied it at previous jobs. We use SAP HANA in DirectQuery mode since HANA is where we house all our RLS rules. I get that Power BI has RLS as well, but it is cumbersome to maintain in my experience, and we don’t want to maintain it in two places. We do have some date, material, vendor, region etc. dimension tables we bring in when needed, but most of the time we make a decent-size fact table that the report authors use to build their reports.
I agree that, compared to flat tables, star schemas are the better option in 99% of cases. But in many of the models my teams and I have put into production over the years, we’ve chosen a key table model rather than a traditional star schema. The downside of a key table model is performance (roughly 2x the cardinality of a star schema model), but the advantages are huge. My favorite is the flexibility in what counts as a dimension versus a fact, which is a great feature in, for example, a self-service model where users are sometimes interested in “avg purchase amount for customer age >30” and other times in “avg customer age for purchase amounts >1000”.
Sisense has a fair article on this, but it doesn’t cover everything, since there are many interesting variants you can do in a key table model. Would love to hear your experience on this!