Part of the series: Doing Power BI the Right Way (link)
When starting a Power BI project, you have many choices to make. Decisions like how to source your data; where and how to create queries to cleanse, transform and reshape the data; where and how to create calculations; and the nuances of modeling are just the tip of the iceberg. Most Power BI solutions progress through a few stages in their lifespan, which include:
- Creating a simple prototype to explore ideas and to get feedback from users and business leaders
- Building a working proof-of-concept to frame the solution
- Designing a foundational data model to build upon
- Completing a production-scale solution
- Iterative enhancements & releases
The purpose of this post is to provide some guidance to help you design Power BI solutions that can survive the journey through these stages with as little “throw-away” design as possible. There will always be prototyping and redesign in any project but if you follow proven design patterns and lessons learned from prior experience, you’ll have a much better chance of building a reporting solution that will endure.
Oh, I should mention this… the difference between item #1 (the simple prototype) and #2 (working proof-of-concept) is that you need to throw away the first one – and anything else that doesn’t lay a foundation that you can build on top of. This is a hard lesson but one that will pay off by helping our teams, sponsors and business stakeholders understand that we can proceed after laying down solid bricks (based on well-defined functional and technical requirements) to build the rest of the solution.
If you are using Power BI to create reporting solutions for business users, your circumstances are likely similar to one of these scenarios:
- You are a self-service report designer creating a small report solution that will grow and evolve over time.
- You work in a software development group in an IT organization with a formal application lifecycle management discipline.
- You work in a business unit or small company with an informal software process – and they need your guidance to be successful.
If you are a self-service report designer creating a small report solution that will grow and evolve over time, you likely have a data source that wasn’t specifically curated for reporting and analysis. The data will need to be massaged and shaped before you can design insightful, interactive visuals. That’s cool because we have great tools built into Power BI Desktop to do that sort of thing. You might have some ideas about how to present the data but you’ll probably try different things and see where the journey takes you. These projects tend to have a lot of business impact right out of the gate. Business report users will be delighted with what you can do in a short time with Power BI and they will want more. You’ll get excited and expand the project to include more data, or add more dimensions to the data you have. It is almost inevitable that you will experience two common challenges:
- Scope creep
- Data quality issues
These are not necessarily “problems” per se but realities to acknowledge. Building a solid, foundational data model is challenging under these circumstances. The good news is that Power BI lets you explore options quickly and try different approaches while you sort out the formal business and technical requirements. The typical outcome is to start over with a new project where you will be well-prepared to use a cleansed data source and design to a well-defined set of requirements within the new project scope. An experienced BI practitioner will set this expectation with the business and either carefully evolve certain elements into the final project and toss the rest, or use the first attempt as a throw-away prototype and start over. Just know that this is a natural part of the process.
If you work in a software development group in an IT organization with a formal application lifecycle management discipline, you are probably accustomed to team development with a code repository, version control and differencing tools to manage a shared code base. The main components of Power BI came from a long heritage of IT-centric development, so at its core, objects (like tables, fields and measures) have properties and metadata descriptors that can be versioned, differenced, merged and scripted. But what has made Power BI so successful is that it is primarily designed for self-service reporting. Power BI Desktop is a beautifully streamlined product that packages data transformation queries, model tables, data and reports into a tidy package, but it is not an IT development tool and doesn’t have these capabilities built in, nor will it. That’s not what Power BI Desktop was designed for. The good news is that the community has rallied to develop tools to meet the needs of IT developers who work in teams to develop, test and deploy formal, enterprise-scale BI reporting solutions. Power BI, and the rest of the Microsoft BI platform, can be scripted and automated to create custom deployments.
Before Power BI, Microsoft’s Business Intelligence components existed as separate products, and BI reporting projects took many months to design, develop and deploy. By contrast, today you can open up Power BI Desktop and create a quick “report” (PBIX file). In this form, the file contains data transformation queries, an in-memory data model with measures and other calculations, and report pages that include the interactive visuals. This design typically meets the needs of small and intermediate-sized projects.
One of the first questions to consider is whether the solution you intend to create with Power BI is a quick informal project or a formal project – or may become a formal project in the future. Informal projects don’t necessarily have to conform to all the design best practices. They are limited in scope and typically meet a very specific and short-term need. Formal projects are built on best practice design standards to endure a certain degree of expanding scope and are designed for longevity. They have a business sponsor or champion who drives the business requirements and at least one developer who is responsible for designing a solution that meets those requirements. Formal projects are typically tested, signed off and then deployed in accordance with an established development discipline or application lifecycle process so they can be supported by the business. If your organization has a process for delivering and supporting company-sanctioned applications and reports, then you will need a roadmap to transition from an informal project to a solution that the business is prepared to support. A few common questions that come up in these discussions are:
- How is the source data procured and maintained?
- Who is responsible for maintaining that data?
- Who from the business defines the reporting requirements and signs-off on Power BI reports meeting those requirements?
- Who owns the development of the Power BI solution (data model and reports)?
- Who is the developer’s backup if they become unavailable?
- Are the requirements and project assets sufficiently documented to support such a transition?
- What are the security requirements?
- How will users be given access to reports or dataset(s) for ad hoc analysis, and who will grant that access?
- Are users assigned to report and dataset permissions directly or through group membership?
- Should users have conditional or restricted access to data within the Power BI dataset?
- What is the process for requesting new reporting features, adding features to the existing solution, testing and deploying updates?
Personal BI projects, where a data analyst imports data, creates reports and then publishes a report for their own use, serve an important need in business. These reports can easily be promoted to a departmental or business area solution by sharing the report with others. As the scope of the project continues to expand, there are several important considerations that define the threshold between an informal project, designed for quick development and convenience, and a formal project, intended for longevity and long-term business support. For each project component, I start with a typical informal project scenario and then provide recommended practices to plan for futureproofing.
Planning for separation
Designing the transformation queries, data model and reports in one PBIX file is convenient and offers quick troubleshooting, debugging and redesign. However, it also limits development to one person at a time and promotes a monolithic solution. Separating the data model from the report files helps to promote self-service and focused report design. The certified or promoted dataset is deployed, secured and managed as a separate object. Reports connect to the published Power BI dataset over a live connection, and any number of reports can connect to a published dataset.
This pattern supports both fully IT-managed projects with multiple developers and self-service report design using certified and promoted datasets. After the data model design is stabilized and locked down, report designers simply focus on report creation and data analysis by connecting to the secured dataset. Any tool or product capable of connecting to Power BI or SQL Server Analysis Services (SSAS/AAS) can be used for reporting and ad hoc analysis.
Spreadsheets and files extracted from reports and manually manipulated in Excel or other line-of-business applications have limited utility as reliable data sources. Manually preparing spreadsheets and source data files is only feasible for small sets of data and one-time or manual data loads. An analyst can easily remove, rename or add a column or formula in a sheet and break or negate the import process.
Futureproofing: Data from source systems can be loaded into a managed database or data lake before using Power Query to load the data model. Data from files (if carefully protected and maintained) may be transformed, cleansed and validated before it is ready to be loaded. Using “brute force” transformation steps to replace values and cleanse records one step at a time can be slow and difficult to manage as complexity and volume increase. Reduce the number and complexity of query steps and apply the following recommendations for Power Query design optimization.
Use parameters in Power Query to manage any variable information such as database servers, database names (e.g. DEV database, QA database & PROD database) and file paths.
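As a minimal sketch of this pattern (the server, database, table and parameter names here are hypothetical), a query can reference text parameters instead of hard-coded connection values, so that moving between DEV, QA and PROD only requires changing the parameter values:

```m
// "ServerName" and "DatabaseName" are text parameters created under
// Home > Manage Parameters, e.g. "dev-sql01" / "SalesDW_DEV" for development.
let
    Source = Sql.Database(ServerName, DatabaseName),
    // Navigate to a table using the parameterized connection
    Customers = Source{[Schema = "dbo", Item = "DimCustomer"]}[Data]
in
    Customers
```

With this in place, promoting the dataset to another environment means editing two parameter values (or overriding them in a deployment pipeline) rather than touching every query.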
Planning for data volume
In smaller, informal projects we typically design for a limited set of data and a fixed time frame. If manageable, all production data can be loaded into the working PBIX file on a developer’s desktop.
Futureproofing: When working with large data volumes, a subset of data should be loaded into the desktop dataset for development. Query parameters are used to expand the number of records loaded (usually using a date range) after the dataset is deployed for production use. Groups and ranges of data records, often partitioned by months or years, are defined and stored separately so a large volume of existing records doesn’t need to be reloaded along with new records.
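One way to sketch this technique (table and column names are hypothetical): filter the fact table query on a pair of Date/Time parameters, kept to a narrow window on the desktop and widened after deployment. If the parameters are named RangeStart and RangeEnd, the same filter also meets the requirement for the incremental refresh feature:

```m
// RangeStart and RangeEnd are Date/Time type parameters.
// On the desktop they might span one month of data; in the service,
// a wider range (or incremental refresh) takes over.
let
    Source = Sql.Database(ServerName, DatabaseName),
    Sales  = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // >= on one boundary and < on the other prevents records on a
    // boundary from being loaded into two adjacent partitions
    InRange = Table.SelectRows(Sales,
        each [OrderDateTime] >= RangeStart and [OrderDateTime] < RangeEnd)
in
    InRange
```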
Power Query design practices
Using the robust Power Query design user interface (ribbon and menu options), you can build sophisticated and complex transformations that work well with moderate data volumes. With a small number of query steps, default names like “Renamed Columns” and “Changed Type” are fine, but it can be difficult to trace the steps and remember where and why you performed each step.
Futureproofing: Tables and fields should be meticulously renamed to use standard friendly names, categorized and optimized with appropriate data types. Unused columns are removed to save space and improve performance. Spend extra time to get the field names right before creating dependent steps. (See Chris Webb: Names, Tables, Columns & Measures In Power BI.)
Consolidate redundant query steps – like those used to rename columns, remove columns and change data types – into a single step in the query.
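For example, inside a query’s let block, all of the renames can be collapsed into one step and all of the type changes into another, rather than one step per column (the step and column names below are illustrative):

```m
// One step renames every column...
RenamedColumns = Table.RenameColumns(Source, {
    {"cust_nm", "Customer Name"},
    {"ord_dt",  "Order Date"},
    {"sls_amt", "Sales Amount"}}),
// ...and one step sets every data type
ChangedTypes = Table.TransformColumnTypes(RenamedColumns, {
    {"Order Date",   type date},
    {"Sales Amount", Currency.Type}})
```

Fewer, consolidated steps are easier to trace, and each list can be edited in one place as the query grows.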
Attribute fields needed for grouping and filtering should be moved to separate dimension tables. Each dimension table should have a unique key column with a corresponding key column in the fact table.
All numeric columns used for aggregation (which share the same level of grain) should be extracted to fact tables, with corresponding dimension keys.
Row-level calculated columns should be defined either in database views or in Power Query rather than as DAX calculated columns.
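A hedged sketch of the Power Query version, assuming a prior step named ChangedTypes and hypothetical Sales Amount and Cost Amount columns; a column computed here is evaluated at refresh time and compresses like any other imported column:

```m
// Computed during data load, stored as a regular imported column,
// instead of a DAX calculated column evaluated after load
AddedMargin = Table.AddColumn(ChangedTypes, "Gross Margin",
    each [Sales Amount] - [Cost Amount], Currency.Type)
```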
Promote query folding by using database tables, views or table-valued user-defined functions in the database rather than writing SQL statements in Power Query. Test and optimize queries to support query folding by checking to see if “View Native Query” is an available right-click menu option on each query step. Power Query also doesn’t work well with stored procedures.
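A short sketch of the foldable pattern, with hypothetical view and column names. Because the query navigates to a database view rather than passing a hand-written SQL statement to Sql.Database, subsequent steps like the row filter can fold back to the server as native SQL:

```m
// Navigating to a view keeps folding intact; supplying a SQL statement
// in Sql.Database would typically prevent later steps from folding.
let
    Source    = Sql.Database(ServerName, DatabaseName),
    SalesView = Source{[Schema = "dbo", Item = "vwFactSales"]}[Data],
    // This filter can be translated to a WHERE clause on the server
    Recent    = Table.SelectRows(SalesView, each [Order Date] >= #date(2020, 1, 1))
in
    Recent
```

Right-clicking the Recent step and checking for “View Native Query” confirms whether folding survived.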
Data loading & refresh management
For large fact tables, design them to be “tall” rather than “wide” by removing unneeded columns or moving them into dimension tables. Tall tables should only include columns that compress efficiently, such as numeric values and short, non-distinct text values. Do not store a single unique key column in a fact table.
If large tables take too long to load or cause the PBIX file to exceed a reasonable size for development (no more than 500 MB), use parameters to limit the range of records. To manage the table size once deployed to the service, do one of the following:
- Use start and end date range parameters to filter a Date or Date/Time type column in the table. Use this to limit the working set on the development desktop and then to expand the range of records in the service.
- Similar to the first option, use a pair of Date/Time parameters with a range filter on a Date/Time type column and follow the product documentation to implement incremental refresh. This will automatically partition the table records on specified calendar date parts (e.g. weeks, months, years) and then only load changes.
- Devise a table partitioning strategy and implement partitions using Tabular Editor. Use PowerShell or the XMLA read/write endpoint to schedule and execute TMSL or XMLA script to manage partition processing.
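As a sketch of the scripted third option (the database, table and partition names below are hypothetical), a TMSL refresh command executed over the XMLA endpoint, for example with Invoke-ASCmd from the SqlServer PowerShell module, can process a single partition rather than the whole table:

```json
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "SalesModel",
        "table": "FactSales",
        "partition": "FactSales 2020"
      }
    ]
  }
}
```

A scheduler such as Azure Automation or a SQL Agent job can then run this script nightly against only the current partition, leaving historical partitions untouched.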
Many modelling techniques that perform well when tested on small data sets will slow down with large tables. These include relationships on bridging tables, many-to-many relationships, bi-directional relationship filters and multi-measure layered calculations.
Avoid using SWITCH statements over already complex DAX expressions or referenced measures. Avoid using iterator functions (like SUMX) with complex row-by-row predicate logic. Separating calculation logic into separate measures is a useful design pattern to segregate business logic but it can create performance bottlenecks. Identify slow measures and consider building streamlined measures using alternate patterns when bottlenecks are discovered.
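As one illustrative (hypothetical) pattern for a streamlined rewrite: when layered measures repeat the same underlying aggregations, a consolidated measure can compute each piece once in a variable and reuse it, which is often easier to profile and tune:

```dax
-- Layered: each referenced measure evaluates its own logic
Margin % (layered) = DIVIDE ( [Gross Margin], [Sales Amount] )

-- Streamlined: base aggregations computed once in variables,
-- then combined in a single RETURN expression
Margin % (streamlined) =
VAR Sales = SUM ( FactSales[Sales Amount] )
VAR Cost  = SUM ( FactSales[Cost Amount] )
RETURN
    DIVIDE ( Sales - Cost, Sales )
```

Whether the streamlined form is actually faster depends on the model; the point is to isolate the slow piece with a profiling tool first and then test alternates like this.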
Import mode tables are generally faster and more flexible than DirectQuery mode tables. However, DirectQuery is a good option for very tall transactional tables where simple aggregations are used, rather than complex DAX measures. If Import and DirectQuery mode tables are both used in a composite data model, consider creating a summary table using Import mode and a detail table using DirectQuery, and then define aggregations to cache certain query results in memory (to compensate for the normal performance degradation resulting from query translation in DirectQuery).
There is so much more to discuss at this point but I’ll need to address additional topics in later posts in this series titled “Doing Power BI the Right Way”. Topics include:
- Service vs on-premises
- Interactive vs operational reports
- Spreadsheet reporting
- Iteration vs redesign: When to start over
- Designing for scale
Power Query or Power BI Dataflows
Multi-tier and advanced self-service solutions can be architected by further separating the transformation queries from the data model using Power BI dataflows. These are server-hosted Power Query/M queries that are both designed and managed in the Power BI service.
Dataflows expose several integrations with the Microsoft Azure cloud services, such as the Common Data Service and Azure Data Lake Gen2 Storage. Dataflows also have integrations with AutoML machine learning models.
The advantages and disadvantages of dataflows should be carefully considered before jumping in head-first. Power Query in Power BI Desktop is a fast, rich query development environment that provides nearly immediate feedback. By contrast, the browser-based dataflows designer is similar in function but slower and has a less-robust query design feature set. Some developers choose to develop Power Query in Desktop and then port their M query code to dataflows as needed.
Stick around. There is a lot more coming in this series in the months ahead:
- Futureproofing Power BI solutions
- Preparing, shaping & transforming source data
- Power Query design best practices
- Power Query in dataflows or Power BI Desktop
- Data modeling essentials and best practices in Power BI and AS tabular
- Validating data model results
- Planning for separation – data models and reports
- Power BI delivery options
- Choosing the right report type: analytic or paginated
- Designing and managing large datasets in Power BI
- Multi-developer and lifecycle management for Power BI
- Certified reports, certified datasets & the self-service mindset
To wrap up this edition of the series, I will leave the rest of my outline points as teasers for subsequent posts. Please freely comment with your thoughts about what to include in these later editions.
The rest of these are random thoughts left from the outline for the series. Feel free to comment.
- Define the technical audience
- Establish solution ownership
- Continuation plan (if they were to leave?)
- Who will maintain & manage the solution after deployment?
- User access
- Version updates
Know and understand the business audience
- Report/Dashboard Consumer
- Skilled Data Modeler
- Analyst, Data Scientist
- IT BI Developer
- Planning for capacity
- Certifying reports and datasets: Gaining business trust
- Validating results
- Team collaboration
- Project file sharing
- Promote continuous improvement
- Version management
- Managing deployment
- Power Query and Dataflows
- Promote and support repeatable design
- Templates and layouts
- Planning for security
- Access to reports & dashboards
- Publish to web
- Embedding to a business portal
- Embedding for external access
- Access to dataset(s) for ad hoc analysis and future report development
- Conditional data access / role-based access / row-level security
25 thoughts on “Doing Power BI the Right Way: 1. Futureproofing Power BI solutions”
Great post Paul!
Really appreciate your efforts.
Could you please explain “Do not store a single unique key column in a fact table.” How to handle such situation.
Thank you !
Sure, because VertiPaq uses compression to store tables efficiently, we don’t want to store unique values in a column when they can be avoided. In a dimension table, the key column should be unique of course, to be able to identify and lookup dimension records. But, fact tables should contain either foreign key columns or numeric columns used to create measures. A fact table doesn’t need to have its own unique key. Such a column will not compress and is unnecessary.
Thanks a lot Paul for the detailed explanation !
Congrats! Nice article and very useful.
Great article as ever, but on the data modeling subpart could you explain in more detail what “streamlined measures” concretely implies in technical terms, in the following sentence: “Identify slow measures and consider building streamlined measures using alternate patterns when bottlenecks are discovered.”
Many thanks for your feedback.
Hi Stephane, the point I was trying to make is that it is not always easy to identify the root cause of performance issues. However, once you decompose the logic and identify which parts are performing poorly, you can then look for different ways to write your code or make improvements to the data model. Sometimes the solution is discovered entirely through trial-and-error but only after you have identified the problem.
Great article with lots of ideas to think over.
I’m running a bootcamp soon, this is going on the pre-reading list. I’m hoping it will spark conversations and get people thinking.
Great post Paul!
The summarization of the most important topics regarding a nice deployment of Power BI (in many different scenarios) is great!
One important topic that I’m facing and still trying to fully understand is deploying the model in PowerBI or in Analysis Services (On-Premises). Lot to think about!
Thanks for this important post!
Hi J.M. The choice between using Power BI datasets and SSAS has a lot of dependencies. If you don’t have the option to move to the cloud, and since you can’t share datasets on-prem, I would generally lean toward using a central SSAS tabular model.
Thanks a Lot Paul!
And what about SSAS on-premises or on the Cloud?
It seems to me that, depending on the location of the user versus the Power BI hosted location (the Power BI service), an on-premises model can add a delay processing the Power BI queries over the gateway, whereas AAS will sit side by side with the service on Microsoft’s servers.
What do you think about it?
Excellent article. Very anxious to read your next topic.
All the best,
Really great post thank you Paul!
Very insightful thank you Paul!
Thank you for the solid content you provide. I think I see a book in the making, and we have the opportunity to get a pre-publication version. Looking forward to the other posts you outlined.
Thanks for the great introduction to the world of Power BI and planning for it!
This is exactly the sort of informed knowledge sharing that makes a blog!
I would be interested in how individuals permission for the data use is managed from gathering to final visualization.
Thanks for the article, really appreciate your efforts
Great Post. Useful and concise from start to finish.
My comment is that PowerBI “report publishing” requires some workarounds for the small business that are very confusing, time consuming and lacking in published best practice to explain current deficiencies.
1. what are the minimum MS licensing required to support SMB deployment (i.e. 2 BI developers and 10 consumers)?
2. what is the best practice for BI developers to publish dashboard reports across Team Sites with typical permissions (Executive, Sales, HR, Mktng) – and what is the required MS licensing?
3. what is the best practice for integrating automated periodic publication to Team Sites/Sharepoint/Email of multi-page drill down excel (PowerPivot) sheets (think finance) – and what is the required MS licensing?
For a small business, I really think that “report publishing” (Team Sites/email/Sharepoint) needs to be well understood before going down the Power BI adoption path!
Totally agree. I think a lot of businesses (large and SMB) start off thinking they are just going to send PBIX files around. When I start training them, I tell them that they HAVE to plan for everyone who is going to even LOOK at a report to have a Pro license. This always seems to come as a surprise to them.
Also, the whole idea of being able to publish PB reports to Teams is something that has to be carefully thought through. The memberships must either be 100% identical OR some people will need to be prepared for a “content not available” error in Teams.
I always compare this more to a publishing solution than just a reporting tool.
Terrific post with LOTS to think about. Here’s one thing I have encountered to add to the mix:
What happens when the project sponsor and report consumers are only getting full access to the data when you, the developer, start bringing it in? In my experience, that can blow the initial scope right out of the water. I would add to the initial phase a step wherein the data is reviewed AFTER the developer gets access to it. Then the sponsor and consumers can say:
We want this, this and this, based on what is available. Otherwise it is continuous scope creep, or scope adjacency as I like to call it. And if the developer is service oriented, it is very hard to say no if you know that adding a particular field to the analysis is relatively easy (BY ITSELF). It’s when there are 20-30 “can you just add this” requests that it can blow the best plans out of the water.
The other thing I would add is that planning for the Workspace/App where the report will “live” should be done early in the process. I find that most folks are used to the Excel mode, wherein you pass a file around. The closest they have come to a workspace is (maybe) a SharePoint site or a shared network drive. Neither is an exact analogue.
I can’t wait to see other comments on this series.