Doing Power BI the Right Way: 7. Validating data model results – Part 2

Part of the series: Doing Power BI the Right Way

Part 1 is here

Validation 301: Creating a Validation Report Solution, Part 2

We often have users of a business intelligence solution tell us that they have found a discrepancy between the numbers in a Power BI report and a report produced by their line-of-business (LOB) system, which they believe to be the correct information.

Using the LOB reports as a data source for Power BI is usually not ideal because, at best, we would only reproduce the same results in a different report. We typically connect to raw data sources and transform that detail data, along with other data sources containing historical information, to analyze trends, comparisons and ratios and to produce more insightful reports.

However, if the LOB reports are really the north star for data validation, these can provide an effective means to certify that a BI semantic model and analytic reports are correct and reliable.

This is a pattern that I have applied for consulting customers. My current client refers to the main Power BI report and dataset as their “dashboard” solution. The data model is large and complex with sophisticated reports used monthly by hundreds of users. The dataset is set up with a scheduled refresh. The “dashboard” solution has been thoroughly tested and we are careful to only apply updates and enhancements infrequently under a rigorous application lifecycle management process.

At the end of each month, a user exports their LOB report to an Excel file and drops it into a SharePoint library. I have one consulting client who uses an automation tool to essentially open the LOB application, punch in the right keystrokes and export the report to a file at the end of every month.

The format of these exported reports has proven to be fragile and we don’t want to risk destabilizing the main Power BI “dashboard” solution. So, rather than importing the validation files directly into the main dataset, I have created a separate Power BI project with a dataset & report that has two sources. The first data source for the validation report connects to the main “dashboard” Power BI dataset using the Analysis Services connector and the XMLA endpoint. A DAX query is issued to return values to compare with the LOB report totals. The second data source for the validation report imports the validation files that were exported from the LOB report. If that import process were to fail for any reason, it doesn’t affect the main dashboard solution.

This diagram depicts the validation solution at a high level. The main Power BI “dashboard” solution gets its data from the corporate EDW. That data is processed through rigorous ETL pipelines and contains data extracted from multiple systems used by the corporation.

A lightweight DAX query imports summarized results from the main dataset into the validation dataset.

The second data source for the validation solution is a set of files exported from reports produced by the LOB system. These reports only contain summary information but can be compared to totals from our Power BI solution. Values from the two sources are compared and the validation report shows whether they match or fail validation.

Just a note about the decision to create a separate dataset. There were two reasons for this:
1. We could not control the format of the validation files. If the file format were to change, it would cause an error and stall our scheduled refresh cycle for a large mission-critical report used by several hundred high-level employees (whose job is it to tell the CEO why his dashboard doesn’t work?)
2. Today we do not have the ability to use a live connection to a Power BI dataset in a composite model along with other sources, but we will in the near future. This capability doesn’t address problem #1 but will certainly change the way I think about architecting large, multi-source models in the future.

Writing the DAX query to read the summary values from the main Power BI “dashboard” dataset was a great learning experience. Although I am now pretty good at writing DAX queries by hand after a couple of these projects, the easier way to get started is to allow Power BI Desktop to generate the query for you. Here are the steps using the ContosoDW dataset as an example:

  1. Add a table visual to the report canvas. Add the same field and measure used in the LOB report for comparison.
  2. Turn off totals for the table visual.
  3. Add and set slicers or filters to match the LOB report results.
  4. On the View menu in Power BI Desktop, enable the Performance Analyzer.
  5. In the Performance Analyzer pane displayed to the right of the report canvas, click the Refresh visuals link.
  6. Click the plus sign next to the table visual to expand the performance counters for that visual.
  7. Click the Copy query link.

The DAX query is now in the Windows clipboard and can be used to run the same query that the designer produces to populate the table visual. To test and modify the query, do the following.

  1. Make sure you have the latest version of DAX Studio installed. If not, install it from daxstudio.org. Save your work, close and reopen Power BI Desktop after installing DAX Studio.
  2. From the External Tools menu in Power BI Desktop, click DAX Studio. This opens DAX Studio with a query session connected to the local Power BI dataset.
  3. Paste the clipboard contents to the query window and run the query.

Here’s the query that was generated by Power BI:

DEFINE
  VAR __DS0FilterTable = 
    TREATAS({2009}, 'Order Date'[Year])

  VAR __DS0FilterTable2 = 
    TREATAS({"October"}, 'Order Date'[Month Name])

  VAR __DS0Core = 
    SUMMARIZECOLUMNS(
      'Product'[Category Name],
      __DS0FilterTable,
      __DS0FilterTable2,
      "Online_Sales_Qty", '_Measures'[Online Sales Qty]
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(501, __DS0Core, 'Product'[Category Name], 1)

EVALUATE
  __DS0PrimaryWindowed

ORDER BY
  'Product'[Category Name]

This code can be simplified for our needs. The essential code is the expression using the SUMMARIZECOLUMNS function. You can disregard everything after that. Make the following changes to simplify the DAX query code:

  1. Delete everything after the closing parenthesis for the SUMMARIZECOLUMNS function.
  2. Remove the variable line just before the SUMMARIZECOLUMNS function and replace it with “EVALUATE”

The resulting query code should look like this:

DEFINE
  VAR __DS0FilterTable = 
    TREATAS({2009}, 'Order Date'[Year])

  VAR __DS0FilterTable2 = 
    TREATAS({"October"}, 'Order Date'[Month Name])
EVALUATE
    SUMMARIZECOLUMNS(
      'Product'[Category Name],
      __DS0FilterTable,
      __DS0FilterTable2,
      "Online_Sales_Qty", '_Measures'[Online Sales Qty]
    )

This is the query I would use to populate a new table in my validation dataset. Just use Get Data to open the Power Query editor, create a new SQL Server Analysis Services connection, paste the workspace path for a Premium-enabled workspace and connect to the published dataset for the main Power BI solution. Then, finally, paste this query code into the query window.

The next step is to import the validation file(s) that were exported from the LOB report. In our validation solution, I used the SharePoint connector and imported all files with a predefined file name prefix. After the validation files are imported, make sure that the column data types are compatible, load the table and wire up a relationship between the two tables. In our last validation solution, I created measures that compare a value from the summary table imported from the main “dashboard” dataset with the comparable value from the validation file table, calculating the difference and the % difference between the two. I then used a KPI indicator icon in the table visual, applied with conditional formatting, to show whether the differences were within an acceptable range.
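As an illustration only, the comparison measures follow a pattern similar to this. The table names (‘Dashboard Summary’ and ‘LOB Validation’), the column names and the 0.1% tolerance are placeholders for this sketch, not the actual names used in the client solution:

Dashboard Online Sales Qty = SUM( 'Dashboard Summary'[Online_Sales_Qty] )

LOB Online Sales Qty = SUM( 'LOB Validation'[Online Sales Qty] )

Online Qty Difference = [Dashboard Online Sales Qty] - [LOB Online Sales Qty]

Online Qty % Difference = DIVIDE( [Online Qty Difference], [LOB Online Sales Qty] )

// Returns 1 when the variance is within tolerance; drives the KPI icon conditional formatting
Online Qty Validation Passed = IF( ABS( [Online Qty % Difference] ) <= 0.001, 1, 0 )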

This screen capture from the validation page has the real data blurred out but you can see the grouping and comparisons resulting in either green or red indicators.

Each month, I receive an email from our user who is responsible for the report validation. He tells me that at a quick glance he is able to verify that the totals from their line-of-business application and Power BI solution match, and that the executive leaders and financial analysts who use these reports can be certain that they are accurate and reliable. When an item doesn’t match, they can pinpoint the account in question and usually are able to track it down to a data refresh timing issue or a mistake made in their source system that hasn’t yet been reconciled.

We are now setting up alerts using KPI tiles on a dashboard that will message key users when validation either succeeds or fails in key areas. As a result, the confidence in this solution has increased significantly and business leaders trust the information they get from these reports.

Doing Power BI the Right Way: 7. Validating data model results – Part 1

Part of the series: Doing Power BI the Right Way

Moving important business data into a data model for analytic reporting can often be a two-edged sword. Data retrieval is fast and can support all kinds of analytic trending and comparisons. But data in the model may be one or two layers away from the original source data, making it more challenging to compare with familiar user reports. Often the first validation effort after transforming and loading data into the model and visualizing the initial results is having a business user say “yeah, that looks about right.” Then, sometime later, after more development and extensive user testing, the feedback might be “hmm, that seems a bit off.” …not exactly scientific.

I have been doing a lot of data validation work lately – both formally and informally. Informally: Validating calculation results from a Power BI data model is just a normal part of the development process. Formally: After developing and delivering an entire business dashboard solution, a formal validation process is used to validate the ongoing results after future data refresh cycles and to certify reports so that business leaders know they can trust them.

Particularly for database developers, making the leap to using reports built on a tabular data model instead of the source database is a mental transition. The developer has to stop thinking in SQL and start thinking in model mechanics. This can pose a challenge when needing to validate numbers in a report, which should be done using the data model and not only by running a SQL query.

Validation 101: Create & Use COUNT Measures

When designing a new data model, this is typically the first thing I do… For every fact table and for each large dimension table, I create a measure that returns the record count for that table. Users normally think about their data in business terms (like sums, ratios and averages) and not about how many records there are in a table. Record counts are a convenient sanity check for record completeness and distribution across groups; and may also be a good indicator for model size and performance.

Let’s say I have a data model containing the tables: Sales, Orders, Customers, Products and Date. At minimum, I will create the following measures with code similar to this:

Sales Row Count = COUNTROWS( 'Sales' )

Orders Row Count = COUNTROWS( 'Orders' )

Customer Count = COUNTROWS( 'Customers' )

These measures allow you to quickly troubleshoot problems that might otherwise go unnoticed – with model relationships, key matching, orphaned records and missing records. Just drag-and-drop the measure and a field to group the values. For example, if I drag the Sales Row Count measure and Year from the Date table onto the report canvas, I can immediately see sales records across years. If I do the same thing with the customer region field, I might be able to spot a gap in the region distribution of records over time. If every column in a column chart (created by default when dragging a numeric field) shows 100%, then I know that I either forgot to create a relationship, have a mismatched data type or forgot the “Set as Date Table” setting for the Date table. Easy peasy.
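If a count measure reveals a suspicious distribution, a quick way to quantify mismatched keys is a measure along these lines. This is only a sketch that assumes an active relationship between the example Sales and Customers tables and a hypothetical CustomerKey column:

Orphaned Sales Rows =
  COUNTROWS(
    FILTER(
      'Sales',
      ISBLANK( RELATED( 'Customers'[CustomerKey] ) )
    )
  )

A non-zero result means some Sales rows don’t match a row in the Customers table.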

Can’t I just count any field value rather than creating explicit measures? – Yes, but don’t. Coming from an SSAS multidimensional background, I learned to define measures for every useful numeric metric. If we needed to show the SUM or the COUNT of anything, we needed to define a measure for that. In Power BI and SSAS Tabular, you can simply use implicit measures – meaning you can just drag-and-drop any field and then indicate that you want to count each occurrence or sum up the numeric value of the field. For quick-and-dirty Power BI reports, this is a convenient way to get simple aggregate results, but it is not a best practice in my opinion because of inconsistent behavior and other limitations. There is a growing consensus in the Power BI community that using explicit measures should be the norm and not the exception. If you always use explicit measures rather than aggregating numeric columns, users and report developers will always have a consistent experience.

Validation 201: Measure Dependency Chains

This is a fairly normal pattern:

  1. Create a base measure using a simple expression, let’s say:
    [Employee Count]
  2. Create additional measures that reference and then perform some logic on the first measure. For example:
    [Employee Count Females] and [Employee Count Males]
  3. Layer another new measure on the second layer of measures. Like:
    [Female to Male Employees Ratio]
  4. Create another measure to apply special business logic, let’s say to conditionally apply row-level security and non-visual totals:
    [Female to Male Employees Ratio (RLS Adjusted)]
  5. ..and so on

This is a simple example but it is not unusual. Reusing measures helps to centralize calculation logic, but it can also create deep dependency chains. At some point, you will inevitably discover a calculation error that you will need to track down and debug. I can’t remember my own name some mornings, so I certainly cannot work through the mental gymnastics needed to keep track of every dependent measure.
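To make the pattern above concrete, the layered measures might look something like this. It is only a sketch with assumed table and column names, and the RLS-adjusted variation is omitted:

Employee Count = COUNTROWS( 'Employees' )

Employee Count Females = CALCULATE( [Employee Count], 'Employees'[Gender] = "F" )

Employee Count Males = CALCULATE( [Employee Count], 'Employees'[Gender] = "M" )

Female to Male Employees Ratio = DIVIDE( [Employee Count Females], [Employee Count Males] )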

When I have a visual that is not displaying the expected results, I usually do the following:

  1. Duplicate the report page and prefix the page name with TEST: (something meaningful)
  2. Delete visuals not needed for testing
  3. Convert the visual to a table so I can just see the numbers grouped by a field
  4. Drag the referenced measures into the table to create multiple columns. Validate each dependent measure.

This helps to break the problem down into manageable steps and components. If you know that the first measure is always behaving as expected, move on to the next measure that depends on it. If the problem isn’t immediately apparent, recreate the same conditions with slicers. By isolating the problem, I can find and fix the issue every time.

How to Track Object Dependencies

Keeping track of dependencies between layers of measures can be challenging. I keep a scratchpad on my desk to make lists of things I need to remember. Sometimes I don’t even save my notes but it just helps me organize my thoughts. I use a whiteboard for a similar purpose, diagramming dependencies between measures and other objects with boxes and lines. Just standing and diagramming on the whiteboard is a very effective problem-solving technique for me. Electronic tools like OneNote and Microsoft Whiteboard can be just as effective if you are in the habit of using them.

The Object Dependencies feature in Tabular Editor is pure magic and a tool I am using more frequently to test, debug and to document my data models.

Tabular Editor Object Dependencies dialog

Reusing Measures

On a different but related topic… Create separate measures for calculations that will be used in visuals and reused by other measures, but keep in mind that having too many measures can be difficult to manage. Strike a balance between code reuse and a more succinct, easier-to-manage portfolio of measures. If you have to jump through seven hoops (e.g. trace multiple dependent layers of measures) every time you need to debug or validate a calculated value, you probably have too many dependent measures.

Organize measures but don’t over-organize them… I prefer to put all my measures in a central table or at least assign them to just a few subject-area fact tables. Don’t leave measures in different tables, scattered throughout the data model. Although I’m a fan of using display folders, it can be more work to assign measures to several different folders than it is just to use meaningful naming conventions. The implementation of display folders in Power BI Desktop is, in my opinion, an unfinished feature. If you want to use display folders to organize measures within container tables, use Tabular Editor. Name measures so they’re predictable and easy to find in a sorted list. For example, if you will have a series of measures based on “Sales Amount”, don’t name the variations “MTD Sales” or “Ratio of Sales to Prior Year”. Instead, name them with the same prefix, like “Sales Amount”, “Sales Amount MTD”, “Sales Amount YTD”, “Sales Amount Ratio to Prior Year” and “Sales Amount Region % of All Regions”. This way, they all get sorted and grouped together in the field list. The theme here should be finding the right balance to organize your objects while keeping things simple.
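As a sketch of that naming convention (assuming a Sales table with a Sales Amount column and a marked Date table), the related measures might be defined like this:

Sales Amount = SUM( 'Sales'[Sales Amount] )

Sales Amount MTD = TOTALMTD( [Sales Amount], 'Date'[Date] )

Sales Amount YTD = TOTALYTD( [Sales Amount], 'Date'[Date] )

Sales Amount Ratio to Prior Year =
  DIVIDE(
    [Sales Amount],
    CALCULATE( [Sales Amount], SAMEPERIODLASTYEAR( 'Date'[Date] ) )
  )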

Validation 301: Creating a Validation Report Solution, Part 1

This is a big topic and I have learned some great techniques, which will be covered in a follow-up post. So, let’s call this a teaser.

How do users and business leaders know with certainty that they can trust the information in a Power BI report? There are a variety of factors that can affect data reliability. Even if the data source is known to be correct, there could be issues in the extraction, staging, loading, transformation or the calculations used to present the information in an analytical report. Testing the report with data at a point in time is a good place to start but it doesn’t absolutely guarantee accuracy down the road. A report that shows a simple set of aggregate values read directly from the business source of truth, alongside calculated values from the data model after it has been refreshed, provides a litmus test to check results and instill ongoing confidence. When (I said “when”, not “if”) a user tells us that report information isn’t right, we can confidently and easily prove that analytic report information from the data model is consistent with the source. This is what a validation report is all about.

To set the stage for a longer post on this topic, think about this typical scenario:
Every month we load fresh manufacturing production volume and forecast records into our data mart or data lake, and then into the Power BI data model. The results of the data model can be used as a data source for the validation report. Using the XMLA endpoint, you can run a DAX query against the published data model to get the values shown in report visuals, complete with filters and slicers. For comparison, you can load data from a report produced by the business system into the validation data model and compare the monthly totals. We can then flag the data that does and doesn’t match, and calculate the differences.

More on this later.

Creating a Validation Report Solution – Part 2

Doing Power BI the Right Way

This is an introduction to a series of posts and pages that will provide a comprehensive set of best practices for successful Power BI solutions. In previous posts, I have asked readers to suggest topics for future posts. Based on that and other feedback, I will be addressing questions and suggested topics. The topics list at the end of this post is a brainstorm list and I ask that you help me make sure it is complete. My goal is to provide a set of guidelines and practices that provide the best chance of success as you navigate many decisions about how to stage and transform source data, how and where to perform data shaping and calculations, how to model data and the best way to visualize the results. The biggest question of all may be how to make the right decisions so that the small project you design today will work when you add more data, more users and transition into a formal, managed solution.

There are many fantastic resources to learn about Power BI and the rest of the Microsoft BI and reporting platform; but learning about Power BI and choosing among design options can be like drinking from multiple firehoses at full pressure at the same time. I will be the first to admit that my “best practices” are “my opinions”. In many cases they work consistently for me, but some points are debatable and open for discussion. I’ll tell you when I have a very strong opinion about something being done a certain way, or when I have found a pattern that works for me and that I offer for your consideration. I’m not always right… just ask my wife :-). Please comment, ask and offer alternative points of view.

Rather than offering another training course or duplicating the contributions that so many others in the industry make through their blogs, courses, books and articles; this will be a condensed set of guidelines about the many choices you must make when designing a solution. In the posts to follow, I will reference other resources and discussions on various topics.

Just tell me what to do

Any attempt to apply universal best practices to Power BI solution design is a slippery slope. The tools are so flexible and powerful, and the requirements of each project are so varied, that it is challenging to establish a set of steps or rules that, if followed, will always yield the absolute best design for a given scenario. With that out of the way, I’ll say this: In my job, I see a lot of poorly-designed Power BI projects. I’ve worked with dozens or scores (maybe even hundreds?) of consulting clients who bring us projects – some partially completed, some finished, and many that are just broken – to be fixed or completed. My reactions range from “that’s just downright wrong” to “hmmm… I wouldn’t have done it that way but I guess it will work for the time being”. I try not to cast stones and do, on occasion, realize that others have found a better way to solve a problem. I don’t have all the answers but I do have a lot of experience with Microsoft Business Intelligence solution design, and have learned many good practices and design patterns from other community leaders and many successful projects over the past twenty or so years.

A little less conversation and a little more action

Let’s start with a simplified flowchart and condensed decision tree. This first whiteboard drawing is the first half of the Power BI design process, ending with the data model, before measures, visualization and solution delivery. There is a lot more but I think this is a good starting point. Let’s start the conversation here and then I will enhance this post with a more complete list of topics.

Best practice guidelines topics

The following topic list will serve as a link menu for future posts. Expect this list to be updated and completed:

  1. Futureproofing Power BI solutions
  2. Preparing, shaping & transforming source data
  3. Choosing the best option to shape & transform data
  4. Power Query design best practices
  5. Power Query in dataflows or Power BI Desktop
  6. Data modeling essentials and best practices in Power BI and AS tabular
  7. Validating data model results
  8. Planning for separation – data models and reports
  9. Power BI delivery options
  10. Choosing the right report type: analytic or paginated
  11. Designing and managing large datasets in Power BI
  12. Multi-developer and lifecycle management for Power BI
  13. Certified reports, certified datasets & the self-service mindset

How to Assign Pro Licenses to a Power BI Tenant

This is a question that comes up all the time.  Power BI licensing is not complicated but a common challenge is that the person who sets up a new Power BI subscription and tenant within an organization is often not the same person who manages Office 365 or Azure service licensing for the organization.  I’ve consulted on projects for several organizations where folks just didn’t know who to talk to or how to proceed after testing the water with Power BI.  After setting up a new subscription, IT professionals and business data analysts often don’t know how to license Power BI for company use and share reports and datasets with others.

This post will show you how licenses are assigned to users and, more importantly, what to request from your support desk or administrators who may be unfamiliar with Power BI and Office 365 user licensing.  Keep reading for background information about why this is important and necessary.

You can use Power BI in one of three modes:

1) If you’re a one-person organization or don’t need to securely share content online with anyone, you can just use Power BI for free.  Yep, free.  No feature limits.

2) If your organization has a few hundred users or less, you will need to buy a Power BI Pro license for every user to author or view reports.  Unless you publish report content publicly to the web, every user must have a Pro license – period.

3) If you have several hundred users or so much data that you need to manage it with dedicated capacity, it may be cost-effective to purchase a Premium capacity tenant for a few thousand dollars a month.  Pro licenses are still required to publish reports, but anyone in the organization can view published reports online or on an on-premises Power BI Report Server.

Power BI Subscription Basics

Let’s say that I work for XYZ company and my work address is Paul@xyz.com.  Assuming that a Power BI subscription doesn’t yet exist, if I go to PowerBI.com and set up an account using my email address, I have created a Power BI subscription for my company that is a tenant within the Power BI service.  I could be the janitor for a multinational corporation but I am now the administrator of the tenant.

By way of definitions: the Power BI service is the entire Power BI offering within the Microsoft Azure cloud.  At any time, it could encompass hundreds of virtual machines geolocated in data centers throughout the world.  When you subscribe to the service, you are – in effect – renting some space within the service.  The “space” that you rent and manage for your subscription is called a tenant.  It’s sort of like renting some office space or an apartment in a large building.  You don’t really own it but you are paying for the right to live there.  You can read about these and other administrative concepts here.

After setting up a new Power BI subscription, you really have one of two options:

1) If you have the authority to purchase licensing and manage services on behalf of your organization, proceed to purchase and assign licenses for report developers and users.

2) Make a service request or contact the appropriate administrator within your organization to add and assign licenses.  This might be your help desk, systems or operations admin or Office/cloud services administrator.

The Admin Take-Over

After a “less privileged” user sets up the first Power BI subscription for the organization, no one else can do the same.  This can be a little confusing if some person in Finance sets up a trial account and then a BI developer tries to do the same thing.  If the organization plans to move forward with a governed Power BI tenant, they can perform an “Admin Take-Over”.  Contrary to what the name suggests, this is not the same as a government coup with troopers storming the building in Kevlar vests and automatic weapons.  It simply means that an administrator assumes control of the new tenant and transfers admin rights from the person who established the subscription to the appropriate person(s).  Adam Saxton describes how this works in this Guy-In-A-Cube installment.

How to Configure the Power BI Gateway to use Dataset Connection Parameters

A service provider or vendor might want to publish multiple copies of a report that should connect to different database servers or databases.  In a true multitenant service solution, we would have a single database with row-level user mapping tables that filter data by the logged-in user.  True multitenant solutions require quite a lot of planning and development work to implement.  In smaller-scale or interim solutions, copies of a report can be deployed to different workspaces and then queries can be parameterized to use different database connections.

In this post, I’ll demonstrate deploying and configuring such a solution where the server name and database name have been parameterized and setup to use the on-premises gateway to connect and refresh data.  I’ll also setup scheduled refresh.  The full video walk-through is below but I’ll do a quick review to set the stage.

This is the Power Query Editor in Power BI Desktop.  I have two parameters that are used to specify the ServerName and DatabaseName for each SQL Server query:

Power BI Gateway with Parameters (Time 0_00_57;24)

Once deployed to a workspace in the service, the gateway must be configured with a data source for every possible server name and database combination.  In my example, I can connect to my local server using the NetBIOS name, IP address or LocalHost.  These are all acceptable methods, but a data source must be added to the gateway configuration for each so that the connection strings match exactly.  Remember that the connection is from the on-prem gateway to the database server, so names like LocalHost or an internal IP address will work just fine.  In my example, I’m using the IP address of my local loopback adapter on the local machine to connect to a local instance of SQL Server over the TCP connection.

Power BI Gateway with Parameters (Time 0_04_31;15)

In the workspace, the dataset is bound to the gateway.  Click the ellipsis and choose Settings.

Power BI Gateway with Parameters (Time 0_02_44;22)

To bind the gateway to the dataset, click to select the radio button next to the gateway.  This flips the switch titled “Use a data gateway”.  Apply the setting and then you can refresh the data or schedule refresh.

Power BI Gateway with Parameters (Time 0_05_47;00)

Power BI Gateway with Parameters (Time 0_06_09;27)

Finally, the parameters can be updated right here in the dataset settings.

Power BI Gateway with Parameters (Time 0_06_46;21)

 

Video Demonstration

 

Power BI for Grownups

The message from the Microsoft Business Applications Summit this week was very clear: Power BI is growing up.

We have known for a while that Power BI is a great front-end tool for enterprise-scale tabular and multidimensional models.  Power BI Premium will now allow almost limitless data scale using extremely large datasets by leveraging composite models that employ both DirectQuery and VertiPaq cached mode tables.  This feature shatters the previous barriers where data models could only support either DirectQuery or cached mode (imported data).  Additionally, Microsoft publicly unveiled plans to add IT developer centric features to Power BI Desktop for managing large models, version control and application development lifecycle.

These announcements leave many open questions about how Microsoft will continue to support self-service and enterprise customers with one tool, and about the long-term future of SSAS and Visual Studio-based SSDT. At this point, none of these tools are on the chopping block, but it is clear that James Phillips and his team have significant momentum and are not slowing down to wait for other products to catch up.

In this interview with Christian Wade, he describes the benefits of this new capability.  Christian is Group Program Manager on the Power BI and Analysis Services product team with focus on data model development and scalability features.

Composite Models & Aggregations

In the opening keynote and again in his sessions, Christian demonstrated Power BI reports on the taxi driver activity database with over a trillion rows of raw data.  The larger dataset was in a Spark cluster, accessed using DirectQuery.  Aggregated tables were stored in the in-memory model using the new composite model feature.  As the data was explored in report visuals, the Power BI engine would seamlessly switch from tabular in-memory aggregate tables to DirectQuery source data in order to return low-level details.  Composite models will allow mashing up imported database and file-based data with DirectQuery sources.

There are limits and complexities with these new features.  You cannot mash up imported tables in a Power BI model based on a direct connection to SSAS, but the enterprise-scale features in Power BI arguably may not steer a solution architect to select SSAS over Power BI for serious data modeling.  With incremental data refresh, large model support, row-level security and many other “big kid” features, Power BI might be a preferable choice.  I’m not ready to rule out Analysis Services as the better option for most enterprise solutions – at least not in the near future – but Power BI is definitely heading in that direction.

Power BI Desktop Features for Developers

News Flash: Power BI Desktop will be the primary tool for serious, enterprise data model development.  A bunch of features are in the works and will arrive soon for preview.  The obvious question is what will happen to SSDT for Visual Studio and the answer is that it’s not going away but the product team is putting their energy into enhancing Power BI Desktop for now.  It will likely have some kind of “Beginner/Advanced” switch to simplify things for the novice user and isolate advanced or complicated, developer-centric features.

Here are a few screen captures of the forthcoming features:

Schema Compare & Model Integration

Like the BISM Normalizer tool created by Christian Wade, there will be a fully-supported compare and merge tool baked into Desktop.  The early build is branded with components from MAQ Software.

image

Relationship Diagram Designer, Properties Pane and Application Lifecycle Management Features

A new-and-improved diagram designer is imminent.  It will allow large models to be more effectively managed and displayed.  You will have the ability to save individual views of the diagram with selected, subject-area tables.

image

Like in Visual Studio, properties can be set and managed in a Properties sheet window.  Support for standard SSAS features like display folders is coming and can be managed here.

image

Business Applications Release Notes

With the velocity of features and updates, a comprehensive set of release notes is available here.  This document will be updated frequently with the revised roadmap, feature history and announcements.

APIs and PowerShell Cmdlets for Power BI Administration

At the MVP Summit in March, Kay Unkroth promised API endpoints and PowerShell cmdlets for managing all aspects of Power BI.  He even did a dance and sang a song to make the announcement memorable.  This week, the product team delivered what they promised, with the official announcement here.

The following features and capabilities are on the delivery short list coming in the next 2-3 months:

Power BI Service

The new “modern workspace” will eventually replace the current workspaces integrated with Office 365 groups. The new app workspace will no longer be bound to an Office 365 Group. The feature will be optional at first, but will replace current workspaces soon.

  1. Workspace users can be assigned the Member, Admin or Contributor role
  2. Groups can be added to a workspace
  3. Premium capacity tenants can have nodes in different regions
  4. Admins will have visibility and access to all objects and resources

Microsoft Business Applications Summit

Keynotes and Featured Sessions from the Microsoft Business Applications Summit are here.

SQL, M or Dax? – part 2

[ Related posts in this series: SQL, M or DAX: When Does it Matter? SQL, M or Dax? – part 2 SQL or M? – SSAS Partitions Using SQL Server Table-Valued Functions (UDFs) SQL or M? – SSAS Partitions in Power Query/M ]

This is a post about a post about a post.  Thanks to those of you who are entering comments in the original May 12 post titled SQL, M or DAX?  This is a popular topic. And thanks to Adam Saxton for mentioning this post in his Guy in A Cube Weekly Roundup.

This is a HUUUUGE topic and I can tell that I’ve struck a chord with many BI practitioners by bringing it up.  Please post your comments and share your ideas.  I’m particularly interested in hearing your challenging questions and your thoughts about the pros-and-cons of some less-obvious choices about whether to implement transformations & calculations in SQL, M or DAX.

This week, I have had engaging conversations on this topic while working on a Power BI consulting project for a large municipal court system.  As a consultant, I’ve had three weeks of experience with their data and business environment.  The internal staff have spent decades negotiating the intricacies and layers upon layers of business process so of course, I want to learn from their experience but I also want to cautiously pursue opportunities to think outside the box.  That’s why they hired me.

Tell me if this situation resonates with you…  Working with a SQL Server database developer who is really good with T-SQL but fairly new to Power BI & tabular modeling, we’re building a data model and reports sourced from a line-of-business application’s SQL Server database.  They’ve been writing reports using some pretty complicated SQL queries embedded in SSRS paginated reports.  Every time a user wants a new report, a request is sent to the IT group.  A developer picks up the request, writes some gnarly T-SQL query with pre-calculated columns and business rules.  Complex reports might take days or weeks of development time.  I needed to update a dimension table in the data model and needed a calculated column to differentiate case types.  Turns out that it wasn’t a simple addition and his response was “I’ll just send you the SQL for that…you can just paste it”.  The dilemma here is that all the complicated business rules had already been resolved using layers of T-SQL common table expressions (CTEs), nested subqueries and CASE statements.  It was very well-written SQL and it would take considerable effort to re-engineer the logic into a dimensional tabular model to support general-use reporting.  After beginning to nod-off while reading through the layers of SQL script, my initial reaction was to just paste the code and be done with it.  After all, someone had already solved this problem, right?

The trade-off of using the existing T-SQL code is that the calculations and business rules are applied at a fixed level of granularity and within a certain business context.  The query would need to be rewritten to answer different business questions.  If we take the “black box” approach and paste the working and tested SQL script into the Power Query table definition, chances are that we won’t be able to explain the query logic in a few months, after we’ve moved on and forgotten this business problem.  If you are trying to create a general-purpose data model to answer yet-to-be-defined questions, then you need to use design patterns that allow developers and users to navigate the model at different levels of grain across different dimension tables, and in different filtering contexts.  This isn’t always the right answer but in this case, I am recommending that we do as little data merging, joining and manipulation as possible in the underlying source queries.  But the table mapping between source and data model is not one-to-one.  In some cases, two or three source tables are combined using SQL joins into a flattened and simplified lookup table – containing only the necessary, friendly-named columns and keys, and no unnecessary clutter like CreatedDateTime, ModifiedDateTime and CreatedByUser columns.  Use custom columns in M/Power Query for row-level calculated values, and DAX measures to perform calculations in aggregate and within filter/slicing/grouping context.
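To illustrate that division of labor (with hypothetical names loosely based on the court system example): suppose a Case Duration Days column is computed per row in Power Query. The DAX measures then only aggregate and compare within whatever filter context the report applies:

Avg Case Duration Days = AVERAGE( 'Cases'[Case Duration Days] )

Avg Case Duration vs All Case Types =
  DIVIDE(
    [Avg Case Duration Days],
    CALCULATE( [Avg Case Duration Days], ALL( 'Cases'[Case Type] ) )
  )

The row-level logic stays in the query where it is easy to audit, while the measures remain reusable at any grain.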

I’d love to hear your thoughts and ideas on this topic.
Power BI Global Hackathon Contest Results

The results of last month’s Power BI Global Hackathon are in! The Hackathon was facilitated by our PUG here in Portland, with the goal of welcoming global contenders in subsequent contests. Five teams entered the contest, using publicly available data to visualize and tell data stories using our favorite data analysis platform.  Congratulations to Xinyu Zheng and Ron Barrett for winning the challenge with their entry analyzing Yelp restaurant star ratings.  These were all great entries and you can view the contest results in the Power BI report below.

image  image  image

Here are the published projects that were entered in the Hackathon:

Xinyu and Ron analyzed ratings from nearly 1,200 restaurants in Pittsburgh, Phoenix and Las Vegas.  Results compare ratings and reviews by restaurant and food categories, sentiment and key phrases in the review comments.

image

I loved the creativity of this solution from Jeremy Black and Kirill Perian who analyzed alcohol sales statistics using infographics and bookmarks to switch out visuals on the same page.  The presentation concludes on the last page of the report with an auto-advancing variation of “100 Bottles of Beer on The Wall”.  Nice touch.

image

I’m admittedly a bit biased because this was my design, with a lot of help from Brian, Ron and Greg.  We used a series of tables to prompt a user for Power BI solution business requirements and recommend fitting solution architectures and components.  We pushed some practical and technical limits in our project and I’ll write a separate post about it.

image

This entry from Ron Ellis Gaut is a nice, clean orchestration of county health data, measuring health and comparing personal well-being and program efficacy.

image

The entry from Daniel Claborne emphasizes machine learning predictions performed with R Script, commonly used in data science.  He actually includes the annotated code and explains the technique and the approach using training and prediction data sets.

image

The Portland Power BI User Group was one of the first and continues to be one of the most active in the international community.  We meet on the 4th Wednesday evening every month in Beaverton, Oregon. Today there are many active PUGs all over the world.

 

Keeping Up with Power BI – A Never Ending Story

This first week of the new year has been a lot of housecleaning for me (literally and figuratively…  my office desk and cabinet will be clean by the end of the day!).  Three years into teaching classes and workshops on being productive with Power BI, I find that the product still requires a lot of work to stay current with.  New features are introduced in every Power BI Desktop monthly update, not to mention updates to the cloud service and on-prem server.  You would have to be a child of the 80s to get the Never Ending Story reference.  Otherwise, it’s just a really bad flying dog – and pop song – which are both hard to explain, so we won’t.  The point is that it’s an ongoing effort to keep skills, courseware and presentation material up-to-date.


If you’re like me, sometimes all these updates can be a bit of a distraction (we’re talking about Power BI again – not the dog, movie or song… case in point).  I’m excited by the continual innovations and improvements to Power BI.  However, the foundational rules of good design don’t really change that much.  Effective data curation, correct modeling and good core visualization design are as critical as ever.  The trick is to know which new features may be used to improve foundational design and which ones you can leave as icing on the cake for minor enhancements.  Updating courseware and workshop labs seems to be a never ending task and I’m hard at work revising older material and adding new content to prepare for new events this year.  An important topic I will continue to revisit this year is how Power BI is used along with other Microsoft tools to create different types of solutions.  I’m working on a new presentation to describe all of the incarnations of Power BI, used to deliver everything from self-service desktop reports – all the way up to enterprise-scale solutions for corporate users with governed and secured data – and most everything in-between.

The first workshop of the new year will be a one day preconference before the Redmond SQL Saturday, on Microsoft campus Friday, February 9.  You can sign-up for this event here.  I’m working on a few others in the following months and will update this and other blog posts when they are scheduled.  I hope to see you there.  You are welcome to send me questions or suggestions about specific topics of focus.  Just add a comment to this post, or reach me through Twitter or LinkedIn.

image

 

Implementing Row-Level Security in Power BI

The best method to implement row-level security in a published Power BI model or SSAS Tabular model consumed from the Power BI service will depend to some degree on your data structure and requirements.  The method I demonstrate here is one of the most flexible approaches and one that I commonly use in my projects. Continue reading
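The full post walks through the details. Purely as a generic illustration of the concept (not necessarily the exact approach demonstrated in the full article), a common dynamic RLS pattern is a role whose DAX filter expression restricts a security-mapping table to the signed-in user, using hypothetical table and column names:

// Role filter expression on a hypothetical 'User Region Security' table
'User Region Security'[User Email] = USERPRINCIPALNAME()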