If you are designing solutions with Power BI, or using Power BI along with other data platform components, to be used in a large organization; how should you get started and what are the best practices to ensure sucess? I began a series of 12 blog posts on this topic back in July of 2020. If you can check all the boxes covered in these 12 topics, you are on your way to a successful Power BI solution. Nearly all of the planned topics were completed, but most of these posts are in the abyss of two years of blogging history. I’ve linked to the original posts and provided a short summary to make each item as actionable as possible. These are topics for which I have opinions and some passion for “doing it right”.
Doing Power BI The Right Way (Introduction)
Before jumping in, understand the purpose, long-term and short-term goals for the solution you are building. Power BI can enable you to create reports, source and transform data very quickly. But, often at the cost of data governance, data quality and long-term maintainability. Will your first project be a proof-of-concept or a full-scale production-ready solution? Here are some questions to consider as you head down this path:
- Are you a Data Platform Developer, creating a solution for others, or are you a Business Data Analyst analyzing data for yourself or your business group?
- Who is the audience? Is the audience likely to change or expand in the future? Do they need to run standardized reports, or to explore and analyze data on their own?
- Is it critical that data results be perfect (e.g. clean, complete, up-to-date)? Reporting from a single version of truth can be challenging and costly, compared to importing a one-time extract from a couple of data sources. How prepared are business stakeholders and users to accept the trade-offs between quick-and-easy reports created from an Excel file and a fully-governed solution?
- What is the volume of data? Will the data volume increase in the future.
From the perspective of a data user, consider the following options and decision points. This flowchart calls out three different possible use cases for a report user/analyst:
- As a report user, I would like to run a report that provides answers to business questions. All requirements are well-defined and all the data model and report design work will be performed by a developer, so users just run the report.
- Using an existing certified dataset, I want to browse data to answer my business questions. I want to create my own reports so I can explore and visual information in different ways.
- I want the flexibility to import, wrangle, model and visualize data on my own. The results may not be certified for others to make business decisions, but I want the freedom to do deep and specialized analysis over my own data, for my own use. Self-service models can be used as a POC to guide developers to extend and enhance governed data models.
This guide mainly addresses the first two use cases, to create data models that support enterprise reports; or data models that can be used by savvy business users to browse, explore, and create their own reports.
Power BI is an incredible self-service reporting tool that can be used to quickly procure and analyze a set of data. Durable, enterprise-scale solutions require a scalable mindset. For convenience, Power BI Desktop allows one person to get from source data to presentation quickly, but durable solutions consist of three layers: Data Transformation, Data Modeling & Data Presentation. Queries & transformation processes, data models and report development can be performed and managed by three different people in these specialized roles.
- Perform data transformations in the appropriate solution layer so that they can be maintained
- Use transformation design patterns that will scale to work efficiently with the largest required data volume
- Split data model/datasets from reports so they can be developed and managed separately
Consider the following questions:
- Who is responsible for maintaining source data?
- Who from the business defines the reporting requirements and signs-off on Power BI reports meeting those requirements?
- Who owns the development of the Power BI solution (data model and reports)?
- Who is the developer’s backup if they become unavailable?
- Are the requirements and project assets sufficiently documented to support such a transition?
- What are the security requirements?
- Who and how will users be given access to reports or dataset(s) for ad hoc analysis?
- Are users assigned to report and dataset permissions directly or through group membership?
- Should users have conditional or restricted access to data within the Power BI dataset?
- What is the process for requesting new reporting features, adding features to the existing solution, testing and deploying updates?
Power Query is an amazing data transformation tool that, when used efficiently, provide power and flexibility. Some transformation steps that work with small sets of data do not work very well with high volume data sources. Understand the strengths and weaknesses, and learn to work with them. In particular, learn to use parameters, range filters and how to enable query folding.
ETL work has always been a discipline of routine and process. Power Query makes this easier but well-designed and manageable transformation queries are those that follow well-defined patterns. After choosing transformations that work efficiently with your source data at the right volume, applies these guidelines:
- Rename steps with intuitive, descriptive names
- Declutter & consolidate steps
- Filter records
- Use parameters to manage variable information
- Remove unnecessary columns and fields
- Annotate and document code & steps
Dataflows are the online implementation of Power Query, in the Power BI service. Rather than authoring queries in the Power BI Desktop and storing them in a PBIX file, queries may be designed in a web browser, and then shared across multiple datasets. There are many good reasons to use Dataflows but they are not a silver-bullet solution for every environment.
A common use case for Dataflows is to provide a standardized set of transformations and data entities, when they have not already been defined in a central data warehouse. If you have an existing data warehouse, some Dataflows functionality may be redundant; however they also enable some interesting features like streaming datasets and AutoML. Start with the basics and learn to use PQ in Desktop, then consider using Dataflows when needed.
Data modeling is at the heart of Power BI and analytic reporting throughout the data platform. If you can get the data model right, many other things – like measure calculations and report functionality – will just fall into place. Dimensioanl thinking is a paradigm shift that requires rethinking the way you transform and manage data. Before you convince yourself that a star schema is unnecessary for your reporting needs, learn to apply dimensional model design patterns, and to make rare exceptions. Star schema design will will address 95% of analytic reporting requirements and 90% of reporting needs.
Put any new report in front of a serious business user and they will ask “how do I know this is right”? As data flows through multiple stages of a BI solution, you need to be confident that the results are valid and accurate. Design your transformations, data model, measures and reports so that you can trace the results and validate each steps that handles and manipulates data records and values. You can start validating data in each table with simple record counts. Then, build measures in layers that allow you or someone checking and testing results to see each components of the calculation. In large projects, you can create a testing data model, to compare and validate report measures with the raw source values.
Having queries, the data model, measures and reports all rolled-up into a single PBIX file makes development quick and easy. But, it also prevents more than one developer from working on the solution in tandem. Separating data model design from report design, by moving them into separate definition files, not only enables separation of work but it also gives you the freedom to create multiple reports that share a central data model. Although there some work-effort trade-offs when separating models from reports, the benefits are immediately realized. Community-supported tools make this task fairly simple and easy to manage.
Power BI is an online service, hosted in the Microsoft Azure cloud. The most comprehensive option for publishing and sharing all the goodness of Power BI (reports, interactive visuals, dashboards, Paginated reports, shared and certified datasets for self-service reporting) is to use Power BI Premium capacity. Once reports and dataset are bundled into a workspace app, they can be viewed and used by any number of users in your organization. Premium supports large datasets, auto-scaling and a number of other features that enterprise-scale reporting and analytics. The monthly cost for Premium is a substantial investment for serious customers and may not be attractive at first for smaller organizations. Less expensive options may be a better choice for smaller shops or those needing to test the water before scaling up the solution or their user audience.