A Developer’s Guide to Creating Bad Power BI Projects – Part 1

It occurred to me that we have put so much effort into promoting best practices and proper design that there is far less information about how to create bad reports and data models. In that light, the purpose of this article is to talk about what to do if you want things to go poorly and make sure your projects fail – if not immediately, then sometime in the future – and if not for you then for whoever inherits the work that you have done.

I welcome your comments and thoughts about the worst practices you have seen in Power BI. Post your ideas in the comments below. What examples have you encountered of sure-fire ways to make projects go badly?

Keep in mind that the rules of “good” and “bad” design might be different for small, informal projects vs larger-scale, formal projects where longevity and scale are more important. Arguably, bad practices for enterprise-scale, multi-user report projects might be just peachy keen for small, informal desktop reports for one user with limited data. This article is written from the perspective of formal, larger-scale projects.

So, you’ve taught yourself how to use Power BI! You’ve Googled some questions and let YouTube show you how to cobble together a little data from different places, wire-up a few tables and drag-and-drop fields to make some attractive visual reports. It wasn’t all that hard to do, right? You’ve demonstrated your new Power BI design chops and now all the managers in the company want more cool charts and reports. Since the first round of reports were so well received, the company bought licenses so everyone in the organization can use Power BI to run their business. Welcome to the big leagues, folks! Just import data from all the corporate systems, desktop Excel files and whatever web feeds you can find. If it’s on the Internet it MUST be right! Munge all this data together and create dashboards to help the executives run the business!

This guide demonstrates common examples of how enterprise Power BI projects are bound to fail when self-taught, undisciplined techniques are used to build larger, enterprise solutions. Most, if not all, of the examples in this post and the posts that follow are ones that my colleagues and I encounter when we go in to fix failed Power BI projects.

Bad practice #1: Keep PBIX files on your local computer rather than shared storage

Many Power BI projects evolve through stages of prototypes, so they might create multiple PBIX files. After some trial-and-error, solution developers often save off multiple files with specially encoded names including things like the date, their initials and other information, each one with different features and techniques. One example might be FinancialAccountingReport_20210615_BobR_TestDisconnectedSlicer and another might be FinancialAccountingReport_20210618_BobR_FixedForecastMeasureButNowItRunsSuperSlow. Reminds me of when I was eight. We would write notes with cyphers that only the writer and recipient would understand. There is nothing wrong with making a few backup copies of your work in a convenient place where you will find them, but someone has to keep track of it all and recover work when there is a staff change.

Without a deliberate effort to exercise discipline, deleting or archiving the old files and then storing the most current copies in a central storage location that gets backed up and synced to the cloud, you are sure to have a mess on your hands. Worse than that, if the project ever needs to be turned over to someone else, they would not have a clue about how to continue the work. I always ask myself: what would happen if I had to hand over my work to another developer tomorrow? Would they have what they need to continue, and would they know where to pick things up? Some transition documentation is good to have in this case, but make sure the master files are in a safe location. Better yet, keep them in a source code repository like Azure DevOps or GitHub, or in OneDrive or SharePoint with automatic file versioning.

Oh, wait… this article is about creating bad projects! Bad practice is to just leave files on your local drive. If you want things to go badly, that’s what you should do. When you leave this mess for someone else, they will have no idea about what you did or how to continue the work.

Bad practice #2: Keep the only copy of a report and dataset published in the online workspace to download when you need it

Today, we’re all about using the cloud to safely store files. This is a good practice. You can store all your PBIX and other important files in a OneDrive folder where they are backed up to the cloud and the files are even versioned so you can revert back to an earlier version if you do something silly and need to recover. However, relying on the option to store your one-and-only copy of a report or data model file in a Power BI workspace is optimistic at best.

There are several conditions that will disable the ability to download a PBIX file. The thing is that enterprise-scale data models need to take advantage of features that are not supported in Power BI Desktop (and probably never will be). This is by design. Desktop is primarily a self-service tool, and most of the features we use in enterprise-scale projects require the model definition to be saved either to a JSON model definition file (e.g. “Model.BIM”) or to a PBIX file with an internal model version number unusable by Power BI Desktop. I’ve heard from numerous novice developers who were accustomed to publishing their latest PBIX file to the service, only to discover that after enabling some advanced feature, they could no longer download the file.

If you want to shoot yourself in the foot, do not keep a master copy of your model and report files (which you are managing separately, right?) in OneDrive and roll the dice that you will be able to download a copy when you need to edit the model or report in Desktop.

Bad practice #3: Import source data files from your local file system and then publish the report to the service

If your source data files are on your local computer, after you publish the data model to the service as a dataset, you will not be able to refresh the dataset from the service unless you take a few more steps. There are a few options to remedy this situation, such as:

  1. Move the source file folder to a shared OneDrive folder or SharePoint library, and then use the corresponding connector to refresh the dataset from the service.
  2. Move the source file folder to a network share and then set up the enterprise gateway to import and refresh the published dataset from the service.
  3. Install and set up a personal gateway to refresh files from your local computer.

Option #3 is a sure-fire way to run into problems down the road. This might work for a small project managed by a single developer. But if you have source files on your laptop, which happens to be asleep or shut off during a scheduled refresh event, that’s not gonna work, and the refresh will fail.

Another common gotcha: if you import source files that are stored in a OneDrive folder and use a locally synced folder as the source, the connection is to the local file system (like “C:\Users\Paul\OneDrive\…”) and not to the OneDrive service in the cloud. In this case, the data refresh will try to read from the local folder, which is inaccessible from the Power BI service.
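To make the difference concrete, here is a minimal Power Query M sketch (the file path and site URL are hypothetical) contrasting a connection to the locally synced folder with a connection to the cloud service itself:

```
let
    // Fragile: reads the locally synced OneDrive folder. After the report
    // is published, the service cannot reach this path and refresh fails.
    LocalSource = Csv.Document(File.Contents("C:\Users\Paul\OneDrive\Sales\Orders.csv")),

    // More robust: connect to OneDrive for Business through the SharePoint
    // connector, so the service refreshes from the cloud without a gateway.
    CloudFiles = SharePoint.Files("https://contoso-my.sharepoint.com/personal/paul_contoso_com")
in
    CloudFiles
```

The tell-tale sign of the fragile pattern is a drive-letter path in the query source; the robust pattern always starts from an https URL.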

Bad practice #4: Use hand-written SQL queries to import large database tables

SQL is the tool of choice for most database professionals, so it would stand to reason that you should write a SQL query when connecting to a relational database (like SQL Server or Oracle). Right? Sure, if you want things to be slow and inefficient. Best practice is to connect your queries in Power BI to a database table or view, and not to use SQL statements, so that Power Query can employ a method called “query folding”. This means that the Power Query processing steps, where possible, will be translated into SQL or the native query language for the data provider. There are advanced techniques to promote query folding over a SQL statement, but they are the exception to the rule. If you need SQL logic to join, filter, group or sort data from multiple tables before it hits Power Query, write a view and store it in the source database.
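The two approaches can be sketched in Power Query M (server, database and table names here are hypothetical):

```
let
    Source = Sql.Database("myserver", "AdventureWorks"),

    // Folds: Power Query starts from the table reference and translates
    // later steps (filters, removed columns, grouping) into the SQL it sends.
    Orders   = Source{[Schema = "Sales", Item = "SalesOrderHeader"]}[Data],
    Filtered = Table.SelectRows(Orders, each [OrderDate] >= #date(2021, 1, 1)),

    // Usually breaks folding: a hand-written statement is sent as-is, and
    // steps added after it are evaluated locally on the full result set.
    HandWritten = Value.NativeQuery(Source, "SELECT * FROM Sales.SalesOrderHeader")
in
    Filtered
```

In the folding version, the date filter becomes a WHERE clause executed by SQL Server; in the hand-written version, every row comes across the wire first.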

Simple answer: If you want things to go badly, write SQL queries in your Power Query queries using Power BI Desktop.

Bad practice #5: Use DirectQuery for “real-time” analytics

Power BI includes an awesome capability called “DirectQuery” that enables certain data source engines to handle query operations as users interact with report visuals, rather than using the VertiPaq in-memory analytic engine to process queries. DirectQuery doesn’t use the VertiPaq storage engine, which is really the heart and soul of Power BI and SSAS Tabular data models. With DQ, records are read live from the source. The downside to DQ is huge. Most interactive report visuals can take several seconds to load, and some DAX functions simply can’t be used due to resource overhead and performance cost.

Database professionals often consider DQ to be a cure-all solution because it builds on familiar database technologies. Everything is managed centrally, there is no record duplication in the data model, and users see the current state of a record when they interact with reports. That’s all fine and dandy, but compared to Import mode, DirectQuery performance can be several orders of magnitude slower.

Import mode is fast because data sits in memory, compressed by column. DQ is slow because queries must be translated and pushed down to the data store, and then the results must be sent back. DirectQuery is a powerful feature when used strategically to address a few specific problems. It is a good solution for massive fact tables that would otherwise use a lot of memory. Just filter large tables and don’t do a lot of grouping and aggregation over a large set of records.

In a composite data model, with most tables using Import mode and one or a few detail tables using DirectQuery, you can achieve a “best-of-both-worlds” solution by allowing users to drill to details against a filtered set of records.

If you want your visual, analytic reports to be really slow, use DirectQuery for all the tables.

Bad practice #6: Combine source data and load one big table into the Power BI data model

People who approach data problems in Power BI the same way they do in Excel tend to combine source tables into a large, spreadsheet-like table. That approach creates a seemingly easy-to-navigate grid of data with a lot of columns, but you can lose a lot of functionality and create a very inefficient model this way. The VertiPaq Tabular modeling engine was architected around the premise of “star schema” dimensional models.

It’s not that Power BI won’t work with flattened data, or that you “have to” build a star schema with separate dimension tables for every attribute! The natural rules of data behavior just cause Power BI to work better when data is modeled in related fact and dimension tables. There are several DAX functions that expect records to be filtered through a relationship. One of the most common examples is a date dimension table, used to filter a fact table to perform time-series calculations. But that same behavior can exist for Products, Customers, Geography and other filtering or grouping attributes that can be used to filter a related table of numeric business facts.
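A date dimension is also one of the cheapest tables to build. As a sketch, a simple date table can be generated entirely in Power Query M (the date range here is hypothetical; in practice, derive it from the earliest and latest dates in your fact table):

```
let
    StartDate = #date(2019, 1, 1),
    EndDate   = #date(2021, 12, 31),

    // One row per calendar day across the range.
    DayCount  = Duration.Days(EndDate - StartDate) + 1,
    Dates     = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    AsTable   = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed     = Table.TransformColumnTypes(AsTable, {{"Date", type date}}),

    // Grouping attributes used by slicers and time-series calculations.
    AddYear      = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    AddMonth     = Table.AddColumn(AddYear, "Month", each Date.Month([Date]), Int64.Type),
    AddMonthName = Table.AddColumn(AddMonth, "Month Name", each Date.MonthName([Date]), type text)
in
    AddMonthName
```

Relate the Date column to the fact table’s date key, mark it as a date table in the model, and the DAX time-intelligence functions have the relationship they expect.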

If you want to struggle with DAX calculations and limit users’ options to interact with your data, pound your data into one large, flat table rather than individual dimension or look-up tables.

9 thoughts on “A Developer’s Guide to Creating Bad Power BI Projects – Part 1”

  1. On the subject of DirectQuery mode – you make it sound like a terrible, poor-practice choice (“if you want your…reports to be really slow..choose direct query mode”) – it’s the ONLY choice when you are dealing with very large-scale data. We have AAS cubes up to 20-30 GB in size – we simply cannot import all that data.

    • I agree that DirectQuery has its place and is a good, and perhaps necessary, choice for very large fact tables. This point may not have been clear in the post. It sounds like you understand the trade-off choice and use cases for DQ. Just a point of clarity, I said “If you want your visual, analytic reports to be really slow, use DirectQuery for ALL THE TABLES.” The point is that DQ should be the exception and not the rule, if you want reports and complex calculations to perform well.

      • So how would a bank, retail giant, manufacturer, etc., looking at top-down summary to near-transactional data, hundreds of millions of rows (star or snowflake in the tabular model), plan for good performance if DirectQuery mode should be the exception?

    • I am curious… when you say that you have “AAS cubes” that are 20-30 GB in size, this suggests that you are in fact importing data into those data models rather than using DirectQuery. If you were using DQ only, the model would be very small because it would contain only metadata. If you are in fact importing large tables into a tabular model deployed to AAS and then using a live connection to that published model to create Power BI reports, that is an excellent architecture choice. Another option would be to deploy the tabular model to a Premium workspace and manage it through the Power BI service. Depending on your needs, one option may be more cost-effective than the other – it depends. Both AAS and Power BI Premium support models that are hundreds of GB in size.
      I’m not meaning to be critical of your use of the word “cube” but just trying to understand… Technically, AAS only supports tabular models and not multidimensional cubes.
      Under some circumstances, when DQ is the better choice, you can create a composite model with details in DQ fact tables and summary values and dimensions in Import model tables. This can provide the best of both worlds.

  2. Thanks Paul. These are really good insights. I think your first point suggesting the use of CI/CD tools like Azure DevOps needs to be approached with caution. I have seen IT push for DevOps standards at the cost of the Power BI delivery. While I agree there needs to be a balance, I find this a potential driver for bad Power BI reports. The problems created using CI/CD distract time and resources away from understanding the data model and DAX. Personally, I like the value that external tools and 3rd-party charts add to Power BI. However, I find that DevOps engineers and DBAs need to learn more about these tools to deliver user requirements and value.

    • These are good points, Daryl. Unfortunately, DevOps engineers and solution architects are often not BI experts and don’t understand why BI assets don’t easily work with conventional source code and build tools, and therein lies the challenge. I agree with what you are saying.

  3. Pingback: Power BI Anti-Patterns – Curated SQL
