Doing Power BI the Right Way: 6. Validating data model results – Part 1

Part of the the series: Doing Power BI the Right Way (link)

Moving important business data into a data model for analytic reporting can often be a two-edge sword. Data retrieval is fast and can support all kinds of analytic trending and comparisons. But, data in the model may be one or two layers away from the original source data, making it more challenging to compare with familiar user reports. Often the first validation effort after transforming and loading data into the model and then visualizing the initial results is having a business user say “yeah, that looks about right.” Then, sometime later after more development and extensive user testing, the feedback might be “hmm, that seems a bit off.” …not exactly scientific.

I have been doing a lot of data validation work lately – both formally and informally. Informally: Validating calculation results from a Power BI data model is just a normal part of the development process. Formally: After developing and delivering an entire business dashboard solution, a formal validation process is used to validate the ongoing results after future data refresh cycles and to certify reports so that business leaders know they can trust them.

Particularly for database developers, making the leap to using reports built on a tabular data model instead of the source database is a mental transition. The developer has to stop thinking in SQL and start thinking in model mechanics. This can pose a challenge when needing to validate numbers in a report, which should be done using the data model and not only by running by a SQL query.

Validation 101: Create & Use COUNT Measures

When designing a new data model, this is typically the first thing I do… For every fact table and for each large dimension table, I create a measure that returns the record count for that table. Users normally think about their data in business terms (like sums, ratios and averages) and not about how many records there are in a table. Record counts are a convenient sanity check for record completeness and distribution across groups; and may also be a good indicator for model size and performance.

Let’s say I have a data model containing the tables: Sales, Orders, Customers, Products and Date. At minimum, I will create the following measures with code similar to this:

Sales Row Count = COUNTROWS( ‘Sales’ )

Orders Row Count = COUNTROWS( ‘Orders’ )

Customer Count = COUNTROWS( ‘Customers’ )

These measures allow you to quickly troubleshoot problems that might otherwise go unnoticed – with model relationships, key matching, orphaned records and missing records. Just drag-and-drop the measure and a field to group the values. For example, if I drag the Sales Row Count measure and Year from the Date table onto the report canvas, I can immediately see sales records across years. If I do the same thing with the customer region field, I might be able to spot a gap in the region distribution of records over time. If every column in a column chart (created by default when dragging a numeric field) shows 100%, then I know that I either forget to create a relationship, have a mismatched data type or forgot the “Set as Date Table” setting for the Date table. Easy peasy.

Can’t I just count any field value rather that creating explicit measures? – Yes but don’t. Coming from an SSAS multidimensional background, I learned to define measures for every useful numeric metric. If we needed to show the SUM or the COUNT of anything, we needed to define a measure for that. In Power BI and SSAS Tabular, you can simply use implicit measures – meaning you can just drag-and-drop any field and then indicate that you want to count each occurrence or sum up the numeric value of the field. For quick-and-dirty Power BI reports, this is a convenient way to get simple aggregate results but not a best practice in my opinion because of inconsistent behavior and other limitations. There is a growing consensus in the Power BI community that using explicit measures should be the norm and not the exception. If you always use explicit measures rather than aggregating numeric columns, users and report developers will always have a consistent experience.

Validation 201: Measure Dependency Chains

This is a fairly normal pattern:

Create a simple base measure using a simple expression, let’s say:
[Employee Count]
Create additional measures that reference and then perform some logic on the first measure. For example:
[Employee Count Females] and [Employee Count Males]
Layer another new measure on the second layer of measures. Like:
[Female to Male Employees Ratio]
Create another measure to apply special business logic, let’s say to conditionally apply row-level security and non-visual totals:
[Female to Male Employees Ratio (RLS Adjusted)]
..and so on

This is a simple example but it is not unusual. Reusing measures helps to separate reusable calculation logic but it can also create deep dependency chains. At some point, you will inevitably discover a calculation error that you will need to track-down and debug. I can’t remember my own name some mornings so I certainly cannot work through the mental gymnastics needed to keep track of every dependent measure.

When I have a visual that is not displaying the expected results, I usually do the following:

Duplicate the report page and prefix the page name with TEST: (something meaningful)
Delete visuals not needed for testing
Convert the visual to a table so I can just see the numbers grouped by a field
Drag the referenced measures into the table to create multiple columns. Validate each dependent measure.

This helps to break the problem down into manageable steps and components. If you know that the first measure is always behaving as expected, move on to the next measure that depends on it. If the problem isn’t immediately apparent, recreate the same conditions with slicers. By isolating the problem, I can find and fix the issue every time.

How to Track Object Dependencies

Keeping track of dependencies between layers of measures can be challenging. I keep a scratchpad on my desk to make lists of things I need to remember. Sometimes I don’t even save my notes but it just helps me organize my thoughts. I use a whiteboard for a similar purpose, diagramming dependencies between measures and other objects with boxes and lines. Just standing and diagramming on the whiteboard is a very effective problem-solving technique for me. Electronic tools like OneNote and Microsoft Whiteboard can be just as effective if you are in the habit of using them.

The Object Dependencies feature in Tabular Editor is pure magic and a tool I am using more frequently to test, debug and to document my data models.

Tabular Editor Object Dependencies dialog

Reusing Measures

On a different but related topic… Create separate measures for calculations that will be used in visuals and reused by other measures but having too many measures can be difficult to manage. Strike a balance between code reuse and a more succinct, easier-to-manage portfolio of measures. If you have to jump through seven hoops (e.g. trace multiple dependent layers of measures) every time you need to debug or validate a calculated value, you probably have too many dependent measures.

Organize measures but don’t over-organize them… I prefer to put all my measures in a central table or at least assign them to just a few subject area fact tables. Don’t leave measures in different tables, scattered throughput the data model. Although I’m a fan of using display folders, it can be more work to assign measures to several different folders than it is just to use meaningful naming conventions. The implementation of display folders in Power BI Desktop in my opinion is an unfinished feature. If you want to use display folders to organize measures within container tables, use Tabular Editor. Name measures so they’re predictable and easy to find in a sorted list. For example, if you will have a series of measures based on “Sales Amount“, don’t name the variations “MTD Sales” or “Ratio of Sales to Prior Year“. Instead name them with the same prefix, like “Sales Amount“, “Sales Amount MTD“, “Sales Amount YTD“, “Sales Amount Ratio to Prior Year” and “Sales Amount Region % of All Regions“. This way, they all get sorted and grouped together in the field list. The theme here should be finding the right balance to organize your objects while keeping things simple.

Validation 301: Creating a Validation Report Solution, Part 1

This is a big topic and I have a learned some great techniques which will be covered in a follow-up post. So, let’s call this teaser.

How do users and business leaders know with certainty that they can trust the information in a Power BI report? There are a variety of factors that can affect data reliability. If the data source is known to be correct, there could be issues in the extraction, staging, loading, transformation or the calculations used to present the information in an analytical report. Testing the report with data at a point in time is a good place to start but it doesn’t absolutely guarantee accuracy down the road. If we had a report with a simple set of aggregate values read directly from the business source of truth alongside calculated values from the data model after it has been refreshed, this can provide a litmus test to check and instill congoing confidence. When (I said “when” not “if”) a user tells us that report information isn’t right, we can confidentially and easily prove that analytic report information from the data model is consistent with the source. This is what a validation report is all about.

To set the stage for a longer post on this topic, think about this typical scenario:
Every month we load fresh manufacturing production volume and forecast records into our data mart or data lake, and then into the Power BI data model. The results of the data model can be used as a data source for the validation report. Using the XMLA endpoint, you can run a DAX query against the published data model to get the values shown in a report visuals complete with filters and slicers. For comparison, you can load data from a report produced by the business system, load those values into a the validation data model and compare the monthly totals. We can ne flag the data that does and doesn’t match, and calculate the differences.

How Lakehouse Architecture is Revolutionizing Business Intelligence

The Lakehouse is the evolution of the earlier cloud data platform in many pieces that came with “some assembly required”. All of the components are modern, mature and capable but complicated and require specialized skills. Imagine that the new version is easier to assemble with instructions that are only a few pages with stick figures, and it comes with an Allen wrench.

We’re seeing consulting customers putting Lakehouse and BI solutions on-line in just a few weeks. Then they iterate to scale-up their modern data warehouse/BI platform as they train their Center of Excellence champions and Data Governance organizations as they progress.

DevOps & CI/CD for Power BI: Real-World Example

In a recent blog post, Paul discussed that DevOps, and specifically CI/CD, principles are essential for modern software development, and Power BI is no exception. Power BI CI/CD for enterprise class projects can be achieved using Azure DevOps, with steps including source control, automated testing, and build and release pipelines.

DevOps & CI/CD for Power BI

DevOps isn’t difficult to implement for small and medium-scale projects, and simple things like managing version control in a code repository can save hours of lost time. Organization who are accustomed to managing large application development initiatives might expect to have a fully automated build and deployment process in concert with an Agile delivery process, managed with specialized tools like Jira, GitHub and Azure DevOps.

17 thoughts on “Doing Power BI the Right Way: 6. Validating data model results – Part 1”

Ensuring data accuracy in analytic reporting involves meticulous validation processes, from creating count measures for sanity checks to tracking measure dependencies for debugging. It’s a complex yet essential aspect of maintaining trust in reports and making informed business decisions.

I appreciate your blog post about data validation and the valuable solution you provided. It has been incredibly helpful for my current project, and I am grateful for the insights you have shared. Thank you for taking the time to write this informative and insightful article.

Thank you for write this blog about the data validation. Also thank you for mentioning the solution it helps us in my current project.
https://www.technobridge.in/clinical-research-course.html

Thank you for write this blog about the data validation. Also thank you for mentioning the solution it helps us in my current project
https://www.technobridge.in/clinical-research-course.html

Thank you for posting a informative and valuable post for us.
please keep posting a good content
https://marketresearchcommunity.com/digital-therapeutics-market/

Thank you for your informative and valuable content it helps me to gain my knowledge
https://marketresearchcommunity.com/non-dairy-yoghurt-market/

Thank you for write this blog about the data validation. Also thank you for mentioning the solution it helps us in my current project
https://marketresearchcommunity.com/betaine-market/

Hey thanx for sharing this blog over here. It seems useful to start career in clinical research. We will look forward for more updates.
https://www.technobridge.in/clinical-research-course.html

Hi All,
From time to time I write some blog posts about how to do get insights on PBI data models.

Especially which tools are at your disposal.

https://audministrator.wordpress.com/2020/10/14/power-bi-get-cube-vertipaq-metrix-in-powershell/

Hope this is helpful,

Pingback: Validating Data Model Results in Power BI – Curated SQL

Excellent article, thank you.

Could you write an article about how you document and communicate your data models? To whom do you communicate? How do you keep a model up-to-date if you have multiple team members developing it? Etc,

I’ve been looking for this answer to validating for a long time. Great post!

Very informative article. Thanks for sharing with us.

Great stuff Paul! I am very much looking forward to the next article on validation report.

I find good insight please continue

Very good article on data validation. I find that testing all the way through creating a data model is a good approach as well. Sort of like test driven development.

Pingback: Doing Power BI the Right Way | Paul Turley's SQL Server BI Blog

Dental Advisors says:

February 2, 2024 at 5:31 am

Ensuring data accuracy in analytic reporting involves meticulous validation processes, from creating count measures for sanity checks to tracking measure dependencies for debugging. It’s a complex yet essential aspect of maintaining trust in reports and making informed business decisions.

Loading...

Aditi Rathore says:

March 9, 2023 at 12:02 am

I appreciate your blog post about data validation and the valuable solution you provided. It has been incredibly helpful for my current project, and I am grateful for the insights you have shared. Thank you for taking the time to write this informative and insightful article.

Loading...

Kirti Sharma says:

January 19, 2023 at 9:59 pm

Thank you for write this blog about the data validation. Also thank you for mentioning the solution it helps us in my current project.
https://www.technobridge.in/clinical-research-course.html

Loading...

Kirti Sharma says:

January 19, 2023 at 9:56 pm

Thank you for write this blog about the data validation. Also thank you for mentioning the solution it helps us in my current project
https://www.technobridge.in/clinical-research-course.html

Loading...

hareshreport says:

September 27, 2022 at 6:49 am

Thank you for posting a informative and valuable post for us.
please keep posting a good content
https://marketresearchcommunity.com/digital-therapeutics-market/

Loading...

ankeetahatwar says:

June 27, 2022 at 11:42 pm

Thank you for your informative and valuable content it helps me to gain my knowledge
https://marketresearchcommunity.com/non-dairy-yoghurt-market/

Loading...

mohitmrc says:

June 27, 2022 at 11:18 pm

Thank you for write this blog about the data validation. Also thank you for mentioning the solution it helps us in my current project
https://marketresearchcommunity.com/betaine-market/

Loading...

raveenaaher says:

June 29, 2021 at 3:15 am

Hey thanx for sharing this blog over here. It seems useful to start career in clinical research. We will look forward for more updates.
https://www.technobridge.in/clinical-research-course.html

Loading...

audministrator says:

October 14, 2020 at 8:12 am

Hi All,
From time to time I write some blog posts about how to do get insights on PBI data models.

Especially which tools are at your disposal.

https://audministrator.wordpress.com/2020/10/14/power-bi-get-cube-vertipaq-metrix-in-powershell/

Hope this is helpful,

Loading...

Pingback: Validating Data Model Results in Power BI – Curated SQL
Marcus says:

October 11, 2020 at 11:14 pm

Excellent article, thank you.

Could you write an article about how you document and communicate your data models? To whom do you communicate? How do you keep a model up-to-date if you have multiple team members developing it? Etc,

Loading...

Jeff says:

October 11, 2020 at 1:31 pm

I’ve been looking for this answer to validating for a long time. Great post!

Loading...

Haritha says:

October 11, 2020 at 5:34 am

Very informative article. Thanks for sharing with us.

Loading...

Fredrik Vestin says:

October 11, 2020 at 5:08 am

Great stuff Paul! I am very much looking forward to the next article on validation report.

Loading...

Gokul says:

October 11, 2020 at 3:27 am

I find good insight please continue

Loading...

Brian Custer says:

October 10, 2020 at 10:59 am

Very good article on data validation. I find that testing all the way through creating a data model is a good approach as well. Sort of like test driven development.

Loading...

Pingback: Doing Power BI the Right Way | Paul Turley's SQL Server BI Blog

Doing Power BI the Right Way: 6. Validating data model results – Part 1

Validation 101: Create & Use COUNT Measures

Validation 201: Measure Dependency Chains

How to Track Object Dependencies

Reusing Measures

Validation 301: Creating a Validation Report Solution, Part 1

Like this:

Paul Turley

17 thoughts on “Doing Power BI the Right Way: 6. Validating data model results – Part 1”

Leave a ReplyCancel reply

Validation 101: Create & Use COUNT Measures

Validation 201: Measure Dependency Chains

How to Track Object Dependencies

Reusing Measures

Validation 301: Creating a Validation Report Solution, Part 1

Share this:

Like this:

Paul Turley

Related Posts

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

17 thoughts on “Doing Power BI the Right Way: 6. Validating data model results – Part 1”

Leave a ReplyCancel reply

Discover more from Paul Turley's SQL Server BI Blog