Power & Data Platform MVP, Principal Consultant, Pragmatic Works
Preface: This post is set set of guidelines that address many aspects of new projects. Although I feel strongly that many of my recommendations are correct and applicable in most uses cases, other experienced practitioners have valuable ideas and I welcome their input. Please post your comments, links to other resources and ideas that may be different than mine. This will be a living document with ongoing revisions with the goal to provide a comprehensive, best practices guide as the platform continues to mature and as experts continue to use it.
Revision: 8/25/2019 – cleaned-up & added checklist
Revision: 9/7/2019 – added Security section & corrected Last Refresh Date/Time
I find there there are so many things to remember when starting a project that a checklist is handy. I’ve been collecting the following as notes for some time. Bare with me as I work on consolidating this article into a concise checklist.
The purpose of this article is to outline a set of guidelines and recommended practices for managing Microsoft Power BI projects. This guide is primarily focused on the work performed by the IT-managed BI Solution Developers for formally-managed BI projects.
Why “Good” and “Best”?
Power BI is a tool and platform that serves two masters: IT developers and business data users. As such, some practices are absolutely best-practice and should be followed with rare exception. Other recommendations are conditional guidelines, with possible exceptions. The flexibility of Power BI makes it possible for industry practitioners to developed their own solutions to apply design patterns in the absence of a specific product feature (like source control, for example).
This guide is largely based on my experience, discussions with other members of the MVP community, clients and fellow consultants. Most of the information in this recommendation guide are based on the consensus of parties with the greatest experience with the Power BI platform. These parties include:
- Members of the Microsoft MVP Program (Data platform and Business Applications focus) who participate in daily interactions with the Power BI product development and leadership teams at Microsoft.
- Microsoft certified solution partners who offer consulting, training and value-added services to Microsoft product customers.
Power BI has come a long way in a short time; since the Power BI service and Power BI Desktop were introduced in 2015. At that time, the emphasis was to enable self-service business users to analyze and visualize data in small-scale ad hoc reports. As Power BI grew into the enterprise space, it has been adopted by IT-managed data governance and solution development teams. Power BI is displacing – and being used in concert with – conventional IT-centric Business Intelligence product technologies that have been evolving for decades (such as SQL Server Analysis Services, Reporting Services and Integration Services). With this history and the rapid adoption of the platform, standard design practices will continue to evolve.
Two discreet audiences and roles should be considered for Power BI:
- Business Data Analyst
- IT-managed BI Solution Developers
Consider these roles, with respect to both the solution developer(s) and users, as you begin to plan a new project.
Business Data Analyst
Members of this role are part of a rapidly growing segment of the industry. They understand how to use software tools and understand data manipulation and reporting within the confines of area of business focus. Traditionally, data analysts were data consumers who use financial spreadsheets but now they use advanced data preparation tools to perform reporting and deeper data analytics. Business Data Analysts generally fall into two subcategories:
- Advanced Data Consumers
These users explore data using spreadsheets, pivot tables and chart-style reporting tools that are served by existing multidimensional cubes and data models. They do not write code and work with IT/BI Developers, Data Modelers or Advanced Data Analysts to collect and model the data they use to create their reports.
- Data Modelers, Advanced Data Analysts and Data Scientists
These advanced users perform their own data mashups, produce data models and write limited code to create solutions for their department or business area (using Power Query, DAX, R, Python, etc.) They do no architect and develop governed and certified datasets for enterprise-level business use.
IT-managed BI Solution Developers
Enterprise-scale BI solutions are typically created by a team of specialists who may include:
- Business Systems Analysts
- BI Solution Architects
- Project Managers
- Data Transformation (ETL) Developers
- Enterprise Data Modelers
- Measure code experts (using DAX)
- Report Developers
- Operations & Deployment Engineers
Power BI may be used alone or with other products to create BI data transformation queries, data models, reports and dashboards. IT-BI managed scenarios:
- Complete BI solution design, development and deployment, consumed by end users.
- Data models created and deployed to support user ad hoc report design.
- Support Advanced Data Analyst data transformation, data modeling & report design
- Migrate advanced user created data models to governed/certified datasets, transferring ownership to IT & operations
Project Types – simplified
For simplicity, this guide refers to Power BI projects as either “Formal” or “Informal” using the following definitions:
- Formal projects are scoped, funded, staffed and executed with the collaboration of a business champion and stakeholders; and IT Business Intelligence developers and data managers. These projects promote business and IT-governed datasets and certified reports.
- Informal projects are executed by business users and are considered ad hoc in nature. Datasets are generally not IT governed, and reports are typically not certified.
- Hybrid projects can be anything in-between. They might be a user-authored report using published, certified dataset used for self-service reporting. Informal, self-service datasets can be migrated to governed datasets in collaborative IT/business projects.
Again, this guide is primarily focused on the work performed by the IT-managed BI Solution Developers for formally-managed BI projects. Topics included in this guide:
- Solution Architecture
- Managing Power BI Desktop Files
- Datasets and Reports
- Version Control & Lifecycle Management
- Workspace and App Management
- Data Model and Power Query Design Guidelines
- Dimensional Design
- Query Optimization
- Managing Dataset Size with Parameters
- Implicit and Explicit Measures
- If Users Need Excel, Give them Excel
- Certified & Shared Datasets
- Enterprise Scale Options
All Business Intelligence projects involve the same essential components including:
- Source Queries
- Data transformation steps
- Semantic data model
- Calculations (typically measures)
- Data visualizations that reside on:
- report pages
These components can be built using different products and development tools. Lengthy discussions can be had to debate the nuances and merits of different options; but a short summary of recommendations follows. To summarize these options in simple form, first consider the following options:
- Single PBIX File
Build all queries, transformations, the data model, measures, reports and pages
with Power BI Desktop in a single PBIX file and deploy to the Power BI cloud
service; to be shared with users.
This option is most suitable for Business
Data Analysts, erring on the side of simple design, typically optimal for
small, informal self-service projects where a desktop business user needs to perform their own data mashups and quick, ad hoc analysis. Risks include unknown data integrity and the use of uncertified dataset results. Informal projects can effectively serve as prototypes for formally developed and managed dataset development. Single file projects can be migrated to dataset/report two file projects with some rework required for the reports, depending on the report complexity.
Separate Dataset & Report PBIX
In formal projects, use Power BI Desktop to author a separate PBIX “dataset” file containing the following: queries, transformations, the data model and measures. These are in turn deployed to the Power BI cloud service. Develop one or more separate PBIX “report” files that connects to the published dataset file in the service. All report pages and visuals reside in the “report” file. Some additional measures can be created in the report
This pattern is recommended for most projects where data governance is important and where new report features are likely to be requested in the future. By separating the dataset from report development; the data model design can be locked-down and managed by a data steward. This approach provides report designers freedom to add visuals and make report enhancements without risking the stability and integrity of the data model that may be serving data to other reports.
Development and management of a two file project is a little more cumbersome because the dataset must be redeployed before changes are available in reports.
Self-service report users who need to explore and visualize data, can use a governed and certified data model. Changes and enhancement should be coordinated between the report designer and the data steward or IT resource managing the data model.
SSAS with Power BI
Taking the “traditional route” in formal projects to build the data model in SSAS, hosted either on-prem or in Azure Analysis Services, and then creating reports and visuals with Power BI was a necessity under enterprise-level feature were added to the Power BI platform in 2017-2018. Although there are still some good reasons to use SSAS rather than Power BI to build the data model (called a “dataset” in Power BI), Power BI can be the more cost-effective choice today. Power BI and SSAS both offer unique capabilities which doesn’t clearly make one option always better than the other.
You can develop the source queries, transformations, data model and measures in an Analysis Services (SSAS) project using Visual Studio (also called SQL Server Data Tools or SSDT) and in turn deploy the SSAS database/semantic model to an on-premises SSAS instance or Azure Analysis Services (AAS) cloud service. Connect new Power BI reports to the deployed SSAS model using a direct connection. SSAS/AAS can handle data models with more than the 1 GB, 3 GB & 12 GB compressed data size limits of Power BI. The SSAS development environment can be more difficult to navigate and the model designer can be unstable. When the VS/SSDT designer crashes, developers can lose work even if they are routinely saving their work.
Favor using Power BI. It typically takes considerably longer to develop data models in Visual Studio/SSDT than using Power BI Desktop. This project architecture is no longer the primary recommendation for BI solution development. Microsoft is directing most of their development and support resources to Power BI rather than SSAS. Although SSAS still has some advanced capabilities not yet present in Power BI; Power BI can handle larger data volumes in certain cases. Microsoft and Microsoft partners recommend using Power BI is the tool for developing most new BI solutions.
- DirectQuery vs Import/In-memory
Power BI desktop using DirectQuery mode, rather than import mode to reference tables in an underlying data mart or data warehouse relational database. No data is stored in the Power BI or SSAS data model; live SQL queries are generated when navigating report visuals.Recommendation/Cost/Benefit:
- Avoid this option, unless needed to satisfy unique requirements. Power BI and SSAS both are favored and are optimized to use: import mode, in-memory tabular data models for all but rare and exceptional scenarios. DirectQuery significantly limits the DAX Calculations supported in the data model. DirectQuery speed is typically many times slower than import, in-memory mode. The advantage of DirectQuery mode is that queries are executed directly against source tables in real-time. Consider building composite models with both import and DirectQuery mode to get good performance and drill-down to live transactional details. Composite models are typically complex and more time-consuming to develop, test and deploy.
Managing Power BI Desktop Files
Power BI Desktop files (PBIX and PBIT) should be stored in a centrally managed network-assessable folder, where they can be accessed by all
members of a BI report development team. The storage folder should support automatic backup and recovery in the case of storage loss. Options depend on currently supported services and organization preferences within the following guidelines:
File Size and Folder Synchronization
- Report and dataset developers must open files from the Windows file system and not from a web browser interface. Files must either reside in or be synchronized with the Windows file system.
- Files containing imported data typically range in size from 100 to 600 MB. Any shared folder synchronization or disaster recovery system should be designed to effectively handle multiple files of this size.
- OneDrive For Business (shared by team, with folder synchronization).
- SharePoint or SharePoint Online (with folder synchronization).
- GitHub and/or VSTS with local repository & folder synchronization. If used, Git must be configured for large file storage (LFS) if PBIX files are to be stored in the repository.
Give files an intuitive and friendly name, that will result in them not being confused with similar report projects. In default view, only the first 20 or so characters are visible for each report. Use self-evident names that can be used to identify datasets and reports, using the beginning of long names. Workspace and report names have limited space in the Power BI portal navigation bar. Use short and concise names when possible; but try to use names that differentiate like-named items in the first 20 characters. The full name is visible in the hover-over tooltip or by resizing the panel.
Datasets and Reports
Formal report projects with certified datasets should be developed with reports and datasets divided into separate files. By default, all Power BI objects, including reports and datasets, are stored in a single file. When the PBIX file is published from Power BI Desktop to a workspace; the PBIX file is separated into separate dataset and report objects. This default behavior is convenient for informal, small-scale, self-service analysis; but not optimal for managing formal solutions with governed datasets, managed by teams of developers or report authors.
Separating the dataset and report into different PBIX files; allows them to be managed and versioned separately. This approach also promotes a division of labor and ownership, particularly for certified datasets that must undergo scrupulous validation.
Like the way SSAS projects are used to develop multidimensional cubes and tabular models; deploying a separate Power BI dataset provides an uncomplicated black box solution for ad hoc report design and data exploration. Power BI can now be used to develop large data models containing 10s of gigabytes of compressed data. Using these development tools take far less time and possibly without additional licensing fees.
Version Control & Lifecycle Management
At the present time, Power BI has no integrated version control mechanism. In addition, Power BI is incompatible with most version control systems that perform branching and differencing operations on code and script files that are common in application development projects. Power BI (PBIX and PBIT) report and dataset files are stored as binary archive files with internal checksum metadata. Change and differencing operations performed by most code and version management systems will corrupt these files and render them inoperable.
Reality: Integrating changes made to multiple PBIX files is difficult to manage and challenging to perform technically. Even with differencing tools and effective version-control, just avoid multi-developer work on Power BI dataset files.
The Power BI ALM Toolkit (based on the older BISM Normalizer tool for SSAS Tabular) will make it easier to compare differences. Until (or if) specialized tooling becomes available, you should avoid replying in this method to manage project work.
Power BI Desktop PBIX files that contain imported data can be large; typically, in the range of 100-600 MB. However, they can be checked into a version control repository for the purpose of backing up and restoring files.
PBIX files can be reduced to small files by saving them to a Power BI Template (PBIT) file. This removes all data and may be an effective to make occasional backups for version archive purposes and disaster recovery. The template can then be used to generate a new PBIX file; much like using a template to generate a new Word, Excel or PowerPoint document.
Saving a PBIX to a template for backup purposes takes minimal effort; but rehydrating a PBIX from a template is more labor-intensive and time-consuming. The reason for this is that a new file must be created and then queries must reprocessed to populate tables with new data.
Store the dataset and report PBIX files separately using a version number postfix for the file name in the following the format: Major.Minor.Revision. For example:
Manufacturing Cost Analysis Dataset v1.5.3.PBIX
Keep only the previous file versions that are useful for historical reference and recovery. Report files are small and don’t take up much storage; but, dataset files can be large. If you need to archive older dataset files, resave them as Power BI Template (PBIT) files to reduce the file size. However, a developer must continue to work with the current PBIX file. Recovering a dataset from a template is time-consuming so developers cannot actively develop using only template files.
Remove the version number from file names prior to production deployment.
Dataset Version History
By keeping version history information in an internally-generated table in the data mode; durable revision history notes and version information is self-contained and not dependent on external data sources.
Add a Version History table to the data model by using the “Enter Data” feature in Power Query to create a table containing these columns:
- Version Number
- Revision Date/Time
- Developer Name
- Revision Notes
Add a custom column named Refresh Date/Time that will be used to capture the date & time every time the queries & model is refreshed. This custom column could be added to any table but it is best to use a table with few rows because the value will be the same in each row.
Add the following measures to the model:
|Last Refresh Date/Time||= MAX( ‘Version History'[Refresh Date/Time] )|
|Current Version||= MAX( ‘Version History'[Version Number] )|
Create a Dataset Information page in the dataset PBIX file and in any report that references the dataset.
Every time a revision is made to the data model, edit the Enter Data
step and add a row to the table with a new version number. Use the following guidelines to increment components of the version number:
|Major release||Increment only when a newly scoped solution is released to PROD.|
Major releases have a specific set of planning features. A new major release is created only for a new set of planned features or significant enhancements as part of a new scope of work.
|Minor Version Number||As a team, decide when changes are significant enough to warrant an incremental minor version release; typically, when changes are made to QA or PROD releases within a major release. Generally, for minor feature|
enhancements: For a dataset, minor version changes may break or affect a report’s design.
For a report, minor version changes add or change functionality aside from fixes.
|Fix or Revision Number||Increment for every bug fix or design change, that will be deployed to the DEV environment or visible to other developers. Generally, for bug|
fixes but not for adding features. May be incremented by the developer. Reset
to zero with a new minor version number.
For example; the following version number represents Major Release Version 1, Minor Version 5 and Fix/Revision Number 3:
Datasets and reports should be versioned separately. The major version number for reports and datasets may be synchronized for major workspace app releases.
Be mindful, that the purpose for version numbering is to provide at simple standard to keep track of changes. The process can easily be over-thought and over-engineered. Keep it simple, discuss changes as a team and make process adjustments to focus on priority tasks.
Release Management, DevOps and Automation
(topic draft – expect updates – please comment)
This is an area where the Power BI platform currently lacks a formal process or mechanism. We know that release management tools are being developers but in the early stages. We may or may not see a release management too from Microsoft that blends with other tools that you might have in use, such as VSTS and AzureDevops. In my experience, automated Power BI release management isn’t really being practiced in the industry yet. That said, there are techniques available. PowerShell script may be used to publish datasets and reports, and to change dataset bindings. It is possible to either publish to a production workspace or to effectively move assets from one workspace to another. This approach is discussed briefly in the Power BI Enterprise Deployment Guide. Other approaches are discussed here:
PLEASE ADD REFERENCES
Workspace and App Management
To share reports to users who do not have a Power BI Pro license (each license has an additional cost) must be in a Premium capacity workspace and the workspace must be published as an app.
Create a separate workspace to share with Free licensed users
Create a separate workspace for formal projects that have a report, collection of reports and/or dashboard that should be distributed to multiple view-only users. A workspace must be converted to an app to be shared with users who do not have Power BI Pro licenses assigned to them. Once the workspace is published as an app; it can be shared with any group of “free” users or all users in the organization. The visible text for workspace names is limited; name workspaces so they be uniquely identified in about 20 characters or less.
For a formal project, create the following workspaces:
Only development team members need Contributor access to this workspace. This workspace does not need to have Premium capacity; unless, developers need to unit test incremental refresh or other Premium features.
All testers must have View access for testing and Contributor access for report authoring. Should be in Premium capacity to test incremental refresh.
Omit the “PROD” designation in the name. This workspace will be the
name of the published app that users will see in their Apps, Home and Favorite pages so use a name that is simple and sensible. Must have Premium capacity to share the app with non-Pro licensed users.
Use a common workspace for informal reports not shared with Free
Multiple self-service and informal report files can share a common workspace without converting the workspace to an app. All users accessing any of those reports, must have a Power BI Pro license assigned to them. Informal or self-service Power BI projects typically don’t require
separate Development or QA workspaces.
Data Model and Power Query Design Guidelines
The topics in this section are not an exhaustive guide to Power BI query and data model design. Following are common issues and recommendations often encountered in projects.
Always design data models for optimal performance, even when it “doesn’t matter” in simple projects. As data volumes grow and as data models are enhanced to address new requirements, increased complexity leads to performance challenges. There are many books on this subject dealing with the intricacies of best practice data model design. In brief:
- Build star schemas – wherever possible, reshape data into fact a dimension tables with single key, one-to-many relationships from dimensions to fact.
- Enforce dimension key uniqueness – Just because a key value “should” be unique, there is no guarantee that it will be unless enforced at the data source. Perform grouping and duplicate reduction in the data source views or Power Query queries to guarantee uniqueness. Duplicate record count checks and other mechanisms can be applied to audit source data for integrity but do not allow the data model to violate these rules.
- Avoid bi-directional filters and unnecessary bridging tables – These data modelling patterns adversely affect performance.
- Calculated columns – Should be applied in Power Query and not in DAX calculated columns wherever possible. This maintains a consistent design pattern for maintainability.
- Annotate code – Use in-line comments and annotations in all code including SQL, M and DAX; to explain calculation logic and provide author and revision information.
- Hide all fields not used directly by users. These include: primary and foreign key columns, numeric columns used to create measures, and columns used to specify the sort order of other fields.
- Set to Do Not Summarize – Any non-hidden numeric columns that are not intended to roll-up or summarize values should be set to “Do Not Summarize” in the Modeling ribbon in Power BI Desktop. Columns set to summarize are indicated with a Sigma icon.
Support Query Folding with Views
Data source queries in Power BI are authored using the Power Query editor; which generates queries in the Microsoft Data Mashup “M” universal query language. A critical optimization technique employed by Power Query is called “Query Folding”; where the “M” query definition is translated into a native query that can be executed by the source database server. If query folding works, SQL Server executes queries and returns results to the Power BI service through the on-premises data gateway. If query folding can’t be used, all raw data is streamed through the gateway, held in memory in the service and then processed inefficiently, row-by-row.
Query folding works reliably when queries authored in Power Query reference relational database tables or views. Avoid using SQL statements in Power Query and avoid using stored procedures as query data source objects. This can be a difficult transition for database professionals who are accustomed to writing SQL queries but Power Query can optimize its queries only when referencing tables or views.
In rare cases to avoid extremely long-running queries that might time-out during data refresh, it may be necessary to materialize source table results in a table stored in the EDW or data mart database. Using parameters with an incremental refresh policy can also reduce or eliminate query time-out issues.
Managing Dataset Size with Parameters
Use Power Query parameters to filter large tables (typically only fact tables). The incremental refresh feature of Power BI Premium capacity will automatically generate partitions and only process new or partial data to avoid reloading large tables.
Even if you don’t intend to implement incremental refresh policies; data refresh parameters allow the working dataset size to be reduced in the development environment and to deploy a smaller dataset file that can then be refreshed with a full load of data in the service.
Design large tables with a parameterized range filter that conforms to the requirements supporting incremental refresh policies.
When a dataset PBIX file approaches 400 MB in size, parameters should be used to filter large tables and reduce the working set of records for development.
Dataset approaching 1 GB in size should have fact tables configured with incremental refresh policies and must be in a Premium capacity workspace.
First, create two Date/Time type parameters named RangeStart and RangeEnd. Next create a date range filter in the early steps of each fact table on a Date/Time type column in the table. Incremental refresh policy requirements are detailed in this article: https://docs.microsoft.com/en-us/power-bi/service-premium-incremental-refresh
Measures & Numeric Columns
Format all DAX code for easy readability. not only does it look better but will help avoid mistakes and support debugging and maintenance. As a guide, use DAXFormatter.com to “beautify” measures and follow the same code formatting pattern going forward.
All measures should return the correct numeric type values and every measure should be correctly formatted. This seemingly trivial subject is important and an issue in most informal and ill-managed projects. Requirements should be defined for data types, precision and formatting. Common items include:
- Measures are designed to summarize and total over any filter, slicer or grouping condition. Apply the appropriate roll-up or conditional level logic if it is not
- Percent and ratio measures should be fractional decimals. For example, “45 percent” is represented as a decimal value .45. When formatted, it appears as 45%, or 45.00% with two decimals for better precision.
- All whole or large number measures should include a thousand separator.
- Use the DAX DIVIDE function instead of the forward slash operator to handle blank and zero division errors.
- Do not conditionally return text values to handle exceptions (like “not applicable”) in a measure that should return a numeric value.
Create a separate text-type measure to support titles & labels, if needed.
Implicit and Explicit Measures
For simple and informal projects, Power BI supports implicit measures. Numeric columns in each table are automatically set to summarize data for ad hoc reporting. This behavior is both convenient and confusing for ad hoc report users. In formal projects, avoid using implicit measures for the following reasons. Instead, create a measure for any numeric column values that should be aggregated for reporting, and then hide all redundant numeric columns.
- Implicit measures don’t work in Excel pivot table and expressions
Like when working with cubes and connected SSAS models, only explicit measures are available in Excel report.
- Two choices confuse users
Since all data models will include explicit measures, written using DAX. It is confusing for users to understand when to use measures and when to use numeric columns. In conventional Excel pivot tables (which connects to Power BIas i f it were a multidimensional cube); measures and calculation objects are contained in a measure groups and all columns (whether numeric or any other data type) are dimension attributes. The introduction of PowerPivot, SSAS tabular and Power BI confuses most users with inconsistencies.
- Measures are Flexible
Inevitably, some implicit measure will eventually be replaced with explicit measures when they are required to support some conditional logic or alternate aggregation behavior. Starting with explicit measures, avoids retraining users and introducing inconsistent behavior.
Promote Self-service Reporting
The true value of the Power BI platform is when business users can use the interactive reporting features of Power BI to get answers to business questions. This can be as simple as starting with an existing report and changing a slicer or filter; or adding a visual to a report page to see a selected set of data. Novice report author/users can easily be taught to drag-and-drop fields into a new report page to group and slice data from an IT & certified dataset. Starting with IT/business certified datasets published to the Power BI service, users with a range of interests and skills can modify or author reports using their web browser without altering governed data or protected reports.
For user training and support purposes, consider educating users with skills in the following roles:
Novice Report Users
- Navigate to an existing published app or report
- Understand how to dashboard tiles, report pages and visuals
- Use filters and slicers
- Use visual interactions to select and slice data
- Use report visual drill-through navigation
- Use Q&A to ask and answer data questions
Casual Data Analysts
- Edit a report
- Change a report visual to a different type of visual
- Add a report page
- Add visuals and fields to a report page
- Add and use slicers to filter and select data
- Use Analyze in Excel to create a connected pivot table
- Use Excel to add fields, groups, totals, filters & slicers to a pivot table
- Set alerts & create a report subscription
- User bookmarks to create custom filter sets and navigation
Intermediate Report Authors/Analysts
- Using a published certified dataset, create a new report
- Add pages, visuals, slicers and filters
- Add custom measures to a report using DAX
- Create drill-through actions & navigation
- Create a custom dashboard by pinning tiles
Use SSRS/Paginated Reports for Operational Reporting
Power BI is not a replacement for paginated, operational reporting. For static, multi-page, printable reports; use SQL Server Reporting Services (SSRS) instead of Power BI. SSRS (now called “Paginated Reports”) is integrated into the Power BI service with Premium capacity licensing, and can be integrated with interactive Power BI reports and Power BI data datasets.
To a limited degree, some operational reports can be reproduced using Power BI reports and SSRS can be used, some a limited degree, to create interactive reports.
Use the right tool for the job and consider the strengths and limitations of Power BI, SSRS/Paginated Reports and Excel to author and deliver reports for different audiences and use cases.
If Users Need Excel, Give them Excel
Financial users often misunderstand that they must adapt to visual dashboard reporting when they really need spreadsheet reports with reliable, current data. Power BI was borne from Excel and from SQL Server Analysis Services (traditionally “cubes”). Therefore, it can meet the needs of both self-service users and IT-managed enterprise BI solutions. In many ways, Excel are a perfect pair. If users need spreadsheets, don’t give them reports that look like spreadsheets. Give them connected spreadsheets; then train them to use Excel from a Power BI report and certified dataset.
Analyze in Excel is a gem that is often not promoted by IT-based report authors; typically, because report requirements are usually specified before users lean about this capability. It really is a best-of-both-worlds solution; because it keeps governed data in a published dataset which allows users to access it via a secure live connection from Excel.
When selecting Analyze in Excel from a published report or dataset in the service, Excel opens with a new connection (using an OCD file) directed at the published dataset. The Analyze in Excel ribbon (installed to Excel when downloaded) can also be used to create a new connection from Excel.
By default, a pivot table is generated in a new worksheet; but advanced Excel users can use cube functions to place live Power BI dataset data anywhere in a workbook.
Power BI and Excel–the Dynamic Duo
Rather than exporting data from Power BI (and creating off-line copies of data), promote the use of live, connected reporting with Excel.
Enable Analyze in Excel in selected reports and datasets; then train a limited audience of advanced users to promote its use. Establish best practices for each business use case. Generally information about this feature is available here: https://docs.microsoft.com/en-us/power-bi/service-analyze-in-excel
There are many good references and tutorials about data and report security that I won’t cover here but will summarize the recommended approaches. Data level security and report/content level security are separate concepts that should be coordinated.
There are multiple options for sharing report content with users. Users cannot view a dashboard or report unless they have been granted access to a workspace or have been invited to share a report.
If you have a Premium capacity Power BI tenant, you can package the content of a workspace [dashboard(s) and report(s)] as an app which can be shared with any users in your organization who do not need individual Power BI Pro licenses to read and view content. These users will not be able to edit or author additional reports.
Users with Pro licenses assigned through the Office 365 Portal can have a workspace or report shared with them, which they can (unless otherwise restricted) edit, save, copy or share with others.
Once a user can access a report, they will see the results of all data in the underlying dataset for that report. Users with report level access can read all dataset data unless role-based and/or row-level security has been implemented. Role/RLS security work essentially the same way in SSAS and with data models developed with Power BI Desktop.
Role-based Security is active when at least one role is defined for the dataset, filter expressions are added to the role and them user or group members are added to the role. Conditional filtering is performed for each table in the data model using a DAX expression.
RLS is an extension of role-based security. In addition to creating one or more roles with filter expressions and adding role members, filtering may be conditional based on the authenticated user. The typical approach is to populate and maintain a mapping table of users and key values for each table in the data model that should have restricted access. All related data will be filtered. For example, if a user can only see data for a particular sales territory, they should only see sales totals for the territory.
The user/mapping tables must be maintained either manually or through automation if the user/object permission information is available in an existing data source. Don’t over-engineer this. The user/mapping information can be managed in a spreadsheet and securely stored in SharePoint or OneDrive, or driven by a PowerApp or custom form. *This requirement is often missed in large projects.
Add users to Office or Azure Active Directory groups and then use groups to share report content. If you are using role-based or row-level security, use groups to manage role membership.
If using Premium, use workspace apps to share and distribute reports with read-only users and reduce Pro license requirements.
Create a strategy for managing RLS user/entity mapping tables. Either manage manually or automate if the data is available.
Certified & Shared Datasets
To promote reuse and data governance, datasets can be used in reports residing in different workspaces (including a personal workspace). Datasets can be endorsed to designate those that are certified and promoted by data owners and the organization. The Power BI service includes a platform and workflow to manage promoted and certified datasets. When enabled, a dataset author can enable a dataset to be marked for promotion and certification. Designated users can test and optionally certify specific datasets that are known to be reliable and conform to organizational
Dataset promotional and certification markings require the coordinated effort of the solution developer, Power BI tenant admin, data steward or business stakeholders. The actual process is simple; but the certification process should be defined and managed by solution business owners and service administrators.
The following image illustrates the
Information about this capability: https://docs.microsoft.com/en-us/power-bi/service-datasets-certify
Enterprise Scale Options
In many ways, Power BI has now surpassed the capabilities of SQL Server Analysis Services. Microsoft are investing in the enterprise capabilities of the Power BI platform, by enhancing Power BI Premium Capacity, adding Paginated Report and features to support massive scale specialized use cases. Consider the present and planned capabilities of the Power BI platform; before, choosing another data modeling tool such as SSAS.
Data Model Options for Power BI Solutions
Project Preparation Checklist
[_] Categorize the solution by identifying the author & user roles related to the project:
[_] Author role: Business Data Analyst
[_] Author role: Skilled Data Modeler, Analyst, Data Scientist
[_] Author role: IT BI Developer
[_] Users’ role: Report/Dashboard Consumer
[_] Users’ role: Self-service Report Author
[_] Users’ role: Advanced Data Analyst
[_] Develop & Document Support & training plan for users
[_] Identify the Solution Type for the project. This will guide other project management designs:
[_] Design single PBIX file for small geoup, departmental project authored by one developer for a limited group of users
[_] Design & deploy a separate dataset PBIX file – from report file(s) – when the dataset should be branded as a Certified dataset
[_] Design separate dataset and report PBIX fiels for formal projects with more than one dataset & report developer, to coordinate work
[_] Use SSAS/AAS as a data modeling option when those databases exist or where IT operations insist o management development and maintenance through integrated source control (e.g. Visual Studio Team Services & Azure DevOps)
[_] Identify the Project Type & related Soluton Architecture:
[_] Project type: Formal project
[_] Project type: Informal project
[_] Project type: Hybrid project
[_] Architectural approach: Single PBIX
[_] Architectural approach: Separate dataset and report PBIX
[_] Architectural approach: Report PBIX connected to SSAS or AAS
[_] Understand DirectQuery model trade-offs and special use cases. Avoid if possible.
[_] Create storage locations and folder structure for Development file management:
[_] Development file storage
[_] Team member collaboration environment & processes
[_] Folder synchronization
[_] Define File naming standards
[_] Decide on dataset and report names
[_] Define the Version Control & Lifecycle Management
[_] Postfix files with 3-part version number
[_] Remove ver number from published files in QA and PROD
[_] Create Version History table in Power Query
[_] Increment version numbers in data model
[_] Backup PBIT files for archive
[_] Create measures: Last Refresh Date/Time
[_] Create measure: Current Version
[_] Add data model info page to report
[_] Define your Release Management, DevOps & Automation strategy (if any – Might be OK to deploy files manually) (to automate or not to automate)
[_] Decide on Workspace and App Management, workspace & app name, etc.
[_] Create PROD workspace (omit PRD from name), assign dedicated capacity
[_] Create QA workspace (post-fix name with QA), assign dedicated capacity
[_] Create DEV workspace (post-fixo name with DEV), dedicated capacity not required
[_] Assign Pro licenses to all developers, admins and report author users (QA?)
[_] Assign membership and access to workspaces
[_] Create fact date range filter parameters: RangeStart & RangeEnd
[_] Filter large fact tables with range filters, consider incremental refresh policies if slow and/or over 800 MB compressed.
[_] Design source queries (T-SQL?) to reshape source data into conformed dimension & fact tables
[_] Create views in database for each dimension and fact
[_] Enforce key uniqueness to remove all duplicate keys from all dimension tables
[_] Query Date dim/lookup table at source if it exists
[_] If not available, generate Date dim/lookup table in Power Query
[_] Avoid bi-directional relationship wherever possible
[_] Include code annotations & comment blocks in: views, M queries,measures
[_] Hide all key columns and other columns not used directly by users
[_] Use parameters to filter and reduce dataset PBIX size to ~400 MB or less
[_] Create incremental refresh policies for dataset larger than ~500 MB
[_] Use Premium capacity workspaces for datasets approaching 900 MB
[_] Use Premium capacity workspace when you need to share reports with non-Pro licensed user
[_] For non-Pro licensed user, convert QA (and then PRD) workspace to app and distribute the app to users via AD group membership
[_] Create explicit measures for all aggregate-able column values
[_] Hide all measure base numeric columns
[_] Set all non-aggregate-able numeric column default summarization to “Do Not Summarize”
[_] Format add whole numbers with thousand separators
[_] Format all currency & decimal measures to defined standard (perhaps 2 decimal, thousand separator)
[_] Format all ratios and percentage with appropriate decimal positions
[_] Use the DIVIDE function for all division operations to avoid div by zero
[_] Name all measures and user-visible fields with mixed-case, short,
[_] friendly sentenc-like terms
[_] Assign measures to fact table where they logically belong – or to an empty measure group table.
[_] Measures that don’t have an obvious home (fact) table should be assigned a measure group table.
[_] Precede measure group table names with “_” so they sort to the top of the list. (display folders are not ready for use in Power BI Desktop)
As a team decide:
[_] When to increment the revision/fix number
[_] When to increment the minor version number
[_] Who owns the migration to QA process? How is it implemented?
[_] Who own migration from QA to PROD? How is it implemented?
[_] How is user/stakeholder sign-off obtained?
[_] How are data source requirements defined, presented & documented?
[_] Who owns & maintains the requirements log?
[_] How are measure & KPI calculation requirements documented & maintained?
[_] How are the report design, layout & visualization requirements documented & maintained?
[_] What is the organization & project styling & branding standard?
[_] Does the organization have a standard brand image? Where is it documented? How owns that standard & can answer questions?
[_] Should report be designed for mobile consumption (with mobile layouts)?
27 thoughts on “Power BI Project Good and Best Practices”
Paul, thanks for the comprehensive guide. This article https://skyvia.com/blog/etl-in-power-bi#skyvia with Power BI ETL best practices talks about the various methods for extracting, transforming, and loading data into storage using Power BI ETL. I think it complements your guide. The article also considers five easy ways to do ETL with Power BI Dataflows, which will take your ETL process to the next level.
Thanks Paul for such wonderful article ,really helped & learned a lot
Excellent article Paul, it is superb guide for a base standards document. Thank you so much,
“Files containing imported data typically range in size from 100 to 600 MB. Any shared folder synchronization or disaster recovery system should be designed to effectively handle multiple files of this size.” I think that in this case programs like Deltacopy (use of rsync) is more effective, because of Byte Level Replication (Delta Change in a File ). Could Power BI implement this feature natively.
This is great stuff, thank you for doing this!
Thanks for compiling this article Paul!
It is fabulous!
Fantastic article, thanks for compiling this!
A question regarding self-service reporting.
I think your idea with having a separate dataset PBIX and a report PBIX makes sense. I have a case where clients want to create their own report from a prepared dataset and there are 2 options for this. Either build the report from the service or from the desktop (connecting to the datasource in the service). I am reluctant to suggest that they create the report in the service as there is nothing preventing other users from deleting or altering the reports that other users have created in the service and there is no backup of reports either. You can now download the PBIX from the service but re-uploading it that way will create duplicate datasets in the service which is not ideal. Having a local PBIX file that multiple users can edit is also risky (what if 2 people decide to edit the file at the same time, then whoever saves last will overwrite the other users changes). What’s your thought on this?
Great article. FYI Free Users can now access premium capacity workspaces via the viewer role. https://powerbi.microsoft.com/en-us/blog/announcing-the-new-viewer-role-for-power-bi-workspaces/
I’d like to comment that Tabular data models are superior, not inferior, and are more widely used and effective, and will be supported and advanced by MS for many years to come, and will continue to be the more dominate modeling platform -at the enterprise level-, especially because of SSIS and SSAS, than a PowerBI data model. From the reporting side, PowerBI still has a mountain to climb to reach the advanced abilities and wide adoption of Tableau, and incredibly lags behind excel in some charting aspects. PowerBI only fairly recently became an OnPremise product, and has not had the adoption it could have at the enterprise level for not having a superior offering to existing platforms. That all said, where PowerBi can be used and makes sense, it continues to gain ground, albeit slowly but confidently. I don’t see it advancing sufficiently enough to overcome the competition of existing MS and NonMS platforms at the enterprise level for years to come.
I agree 100% with Parker.
In my experience, Tabular models are not slower than import/in-memory data. I’m not sure you could say they’re faster, but they certainly aren’t slower. With appropriate solution architecture there should be no performance issues with a Tabular data model.
Hi, I would also add part about best practices of row level security, since it is inevitable in large scale projects and input/parameters/master data tables. Should they be maintained as an Excel Table on Sharepoint (or create custom application in powerapps).
Excellent idea. I’ve added an Implementing Security section.
You’ve done a beautiful thing – thank you!
Great Article! One question surrounding the Revision Date/Time…it will update to the most recent time the dataset has been refreshed/accessed no?
Thank you. I confused this with the Last Refresh Date/Time column. Good catch. I’ve revised.
What do you mean ” Avoid using in-line SQL statements in all queries in Power Query.” You mean don’t put a select query in the “Source” step?
Thank you very much for the effort and time you put into this compilation.
Great tips! Won’t adding a custom column with date cause the dates to be re-calculated anytime the query is refreshed?
Excellent article and wait for the coming one!
One thing I dont like when using Live Connection with SSAS and DirectQuery is that everytime you lose connection (in my case when sales pple are on the field, vpn sometime is needed sometime etc.) you wont have access to the reports. That’s pretty annoying especially when you put a normal report (not in realtime) in production.
Below I would like to share my exp and it would be great if I can have some feedbacks/advices :
I’m using different dataset that are built by using pbi dataflow.
I build it mostly by copying M queries generated in Power BI Desktop. So far, the performance and governance are doing well.
1/ when a dataflow fails to refresh, you can track down the reason why by downloading the refresh history file, pbi team does a very good job of showing the error.
2/ depends where resides the data source, the refresh happens really fast compare to pbi data set.
Cons: with pbi pro, I dont have the capability to do incremental refresh, hoping it will be released soon.
2/ For Analyze in Excel:
I have been struggled with the Financial pivot table request too.
What we did is displaying the financial measures in a separate folder, it does help them a lot.
1/Measures Sales Invoice:
1-1/ Common (very intuitive measures, ease to use)
1-1-1/ Actual net Sales
1-2/ Specific (need a specific training to use these measures)
1-2-1/ Actual net Sales
P/s: I love the versioning and pbi life cycle management, thank you for sharing!
Excellent resource – like mentioned above, I’ve been meaning to write one up!
Recommendation: clean up the wordage and post as a Medium article.
What a great document to start up !
Excellent article, all the things we’ve had to discover the hard way! Nice to see we came to the same conclusions as best practice.
Great tips ! Thank you.