Guy In A Cube Power BI Livestream

I just wanted to give a big, big Thank You to Adam Saxton and Patrick Leblanc at GuyInACube.com for inviting me to participate in the Power BI Livestream yesterday. It was a blast and an honor to hang with the Saxton clan (even the sheep and the Great Dane!). You can watch the replay for the Saturday morning YouTube livestream here:

I’m trying to minimize crossposting between my long-time blog, SQLServerBI.blog, and DataOnTheRoad.blog, but we are still getting started and I just wanted to make sure you all got the word. Please subscribe to DataOnTheRoad.blog if you are interested in following the data community leader interviews along our journey in the Data Bus. Regular technical blog posts and videos will be on SqlServerBI.blog.

Here are a few quick shots from yesterday. Interviews with Adam will follow in the next few days.

Power BI Live Stream this Saturday

Only the most dedicated data professionals are willing to invest their personal time and spend any part of a weekend improving their skills. Having organized many SQL Saturday events, I know that there is a big difference between those who punch out on the proverbial time clock on Friday and those who love working with data so much that they will invest their personal time on a Saturday. One of the greatest examples of this kind of dedication is the group who participate in the weekly Power BI Live Stream from Guy In A Cube.

When Adam invited me to participate in the weekly Guy In A Cube Power BI livestream on Oct 23, I was elated. I am honored to join. If you are not already familiar with the Saturday morning live stream, you can join by subscribing to the Guy In A Cube YouTube channel. The channel is member-supported. Patrick and Adam do all of this outside of their day jobs at Microsoft and then member donations help cover their costs. I encourage you to support the channel with a donation but it is not a requirement. Even without supporting the channel, you can attend the hour-long livestream. Everyone can chat and submit questions during the first half and then the second half of the session is open for members to submit their questions.

So, why am I going to be “in the cube” rather than joining remotely? Well, Houston is one of the stops in our DataOnTheRoad journey in the Data Bus! I’m interviewing Adam and other data professionals as we tour the country, working on the road, on the way to the Orlando Live! conference in November.

PASS Data Community Summit & Orlando Live! Sessions

There are two conferences in November where I am thrilled to be presenting sessions. The second annual PASS Data Community Summit is on November 8-12. Like last year, this is an online event and IT IS FREE! I will be presenting a session titled Power BI Paginated Reports: the New Old Operational Reporting Platform. Join me on Thursday, Nov 11 at 12:30 PM Eastern Time.

You can register to attend at PASS Data Community Summit November 8-12 2021

The Orlando Live conference is an in-person event, and in case the name of the event didn’t make it obvious, it will be in Orlando, Florida the week of November 15-19. This event is multiple co-located conferences in one, for attendees from multiple technical disciplines.

I will be presenting Reporting Services and Paginated Report Recipes and Doing Power BI The Right Way For Database Developers. Sessions are on November 17 at 8:00 AM and 4:00 PM Eastern, respectively.

We’re heading to Orlando in the Data Bus now and taking our time to get there. To follow our journey, visit DataOnTheRoad.blog.

A Developer’s Guide to Creating Bad Power BI Projects – Part 1

It occurred to me that we have put so much effort into promoting best practices and proper design that there is far less information about how to create bad reports and data models. In that light, the purpose of this article is to talk about what to do if you want things to go poorly and make sure your projects fail – if not immediately, then sometime in the future – and if not for you then for whoever inherits the work that you have done.

I welcome your comments and thoughts about the worst practices you have seen in Power BI. Post your ideas in the comments below. What examples have you encountered of sure-fire ways to make projects go badly?

Keep in mind that the rules of “good” and “bad” design might be different for small, informal projects vs larger-scale, formal projects where longevity and scale are more important. Arguably, bad practices for enterprise-scale, multi-user report projects might be just peachy keen for small, informal desktop reports for one user with limited data. This article is written from the perspective of formal, larger-scale projects.

So, you’ve taught yourself how to use Power BI! You’ve Googled some questions and let YouTube show you how to cobble together a little data from different places, wire-up a few tables and drag-and-drop fields to make some attractive visual reports. It wasn’t all that hard to do, right? You’ve demonstrated your new Power BI design chops and now all the managers in the company want more cool charts and reports. Since the first round of reports were so well received, the company bought licenses so everyone in the organization can use Power BI to run their business. Welcome to the big leagues, folks! Just import data from all the corporate systems, desktop Excel files and whatever web feeds you can find. If it’s on the Internet it MUST be right! Munge all this data together and create dashboards to help the executives run the business!

This guide demonstrates common examples of how enterprise Power BI projects are bound to fail when self-taught, undisciplined techniques are used to build larger, enterprise solutions. Most, if not all, of the examples in this and the posts that follow are ones that my colleagues and I encounter when we go in to fix failed Power BI projects.

Bad practice #1: Keep PBIX files on your local computer rather than shared storage

Many Power BI projects evolve through stages of prototypes, so developers might create multiple PBIX files. After some trial-and-error, solution developers often save off multiple files with specially-encoded names including things like the date, their initials and other information, each one with different features and techniques. One example might be: FinancialAccountingReport_20210615_BobR_TestDisconnectedSlicer and another one might be FinancialAccountingReport_20210618_BobR_FixedForecastMeasureButNowItRunsSuperSlow. Reminds me of when I was eight. We would write notes with ciphers that only the writer and recipient would understand. There is nothing wrong with making a few backup copies of your work in a convenient place where you will find it, but someone has to keep track of it all and recover work when there is a staff change.

Without a deliberate effort to exercise discipline, delete or archive the old files and then store the most current copies in a central storage location that gets backed up and synced to the cloud, you are sure to have a mess on your hands. Worse than that, if the project ever needs to be turned over to someone else, they wouldn’t have a clue about how to continue the work. I always ask myself what would happen if I had to hand over my work to another developer tomorrow. Would they have what they need to continue and would they know where to pick things up? Some transition documentation is good to have in this case, but make sure the master files are in a safe location. Better yet, keep them in a source code repository like Azure DevOps or GitHub, or in OneDrive or SharePoint with automatic file versioning.

Oh, wait… this article is about creating bad projects! Bad practice is to just leave files on your local drive. If you want things to go badly, that’s what you should do. When you leave this mess for someone else, they will have no idea about what you did or how to continue the work.

Bad practice #2: Keep the only copy of a report and dataset published in the online workspace to download when you need it

Today, we’re all about using the cloud to safely store files. This is a good practice. You can store all your PBIX and other important files in a OneDrive folder where they are backed up to the cloud and the files are even versioned so you can revert back to an earlier version if you do something silly and need to recover. However, relying on the option to store your one-and-only copy of a report or data model file in a Power BI workspace is optimistic at best.

There are several conditions that will disable the ability to download a PBIX file. The thing is that enterprise-scale data models need to take advantage of features that are not supported in Power BI Desktop (and probably never will be). This is by design. Desktop is primarily a self-service tool and most of the features we use in enterprise-scale projects require the model definition to be saved either to a JSON model definition file (e.g. “Model.BIM”) or to a PBIX file with an internal model version number unusable by Power BI Desktop. I’ve heard from numerous novice developers who were accustomed to publishing their latest PBIX file to the service, only to discover that after enabling some advanced feature, they could no longer download the file.

If you want to shoot yourself in the foot, do not keep a master copy of your model and report files (which you are managing separately, right?) in OneDrive and roll the dice that you will be able to download a copy when you need to edit the model or report in Desktop.

Bad practice #3: Import source data files from your local file system and then publish the report to the service

If your source data files are on your local computer, then after you publish the data model to the service as a dataset, you will not be able to refresh the dataset from the service unless you take a few more steps. There are a few options to remedy this situation, such as:

  1. Move the source file folder to a shared OneDrive folder or SharePoint library, and then use that connector to refresh the dataset from the service.
  2. Move the source file folder to a network share and then set up the enterprise gateway to import and refresh the published dataset from the service.
  3. Install and set up a personal gateway to refresh files from your local computer.

Option #3 is a sure-fire way to run into problems down the road. This might work for a small project managed by a single developer. But, if you have source files on your laptop which happens to be asleep or shut off during a scheduled refresh event, that’s not gonna work, and the refresh will fail.

Another common gotcha is that if you import source files that are stored in a OneDrive folder, and use a local synced folder as the source, the connection is to the local file system (like “C:\Users\Paul\OneDrive\…”) and not to the OneDrive service in the cloud. In this case, the data refresh will try to read from the local folder which is inaccessible from the Power BI service.
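To make the distinction concrete, here is a minimal sketch in Python (not something Power BI itself runs) that flags whether a source location looks like a cloud URL or a local path. The function name, the file names and the contoso URL are all made up for illustration.

```python
from urllib.parse import urlparse

def is_cloud_reachable(source: str) -> bool:
    """Return True when the source looks like an http(s) URL, not a local or synced path."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

# Both locations below are hypothetical examples.
sources = [
    r"C:\Users\Someone\OneDrive\Data\Orders.xlsx",
    "https://contoso.sharepoint.com/sites/Finance/Orders.xlsx",
]

for source in sources:
    label = "cloud" if is_cloud_reachable(source) else "local (needs a gateway or a move to cloud storage)"
    print(f"{source} -> {label}")
```

A check like this could be run over a list of query sources before publishing, as a quick way to spot connections that still point at a local drive.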

Bad practice #4: Use hand-written SQL queries to import large database tables

SQL is the tool of choice for most database professionals, so it would stand to reason that you should write a SQL query when connecting to a relational database (like SQL Server or Oracle.) Right? Sure, if you want things to be slow and inefficient. Best practice is to connect your queries in Power BI to read from a database table or view, and not to use SQL statements so that Power Query can employ a method called “query folding”. This means that the Power Query processing steps, where possible, will be translated into SQL or the native query language for the data provider. There are advanced techniques to promote query folding over a SQL statement, but this is the exception to the rule. If you need to write SQL logic to join, filter, group or sort data from multiple tables before it hits Power Query, write a view and store it in the source database.
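To illustrate the idea behind query folding, here is a toy model in Python. It is not how Power Query works internally; it only shows the concept of collapsing a list of transformation steps into one statement that the database can run. The step names, the table name and the columns are all assumptions made up for this sketch.

```python
def fold_to_sql(table, steps):
    """Translate a list of simple steps into one SELECT statement (toy model of folding)."""
    columns = "*"
    filters = []
    for kind, arg in steps:
        if kind == "select_columns":
            columns = ", ".join(arg)
        elif kind == "filter_rows":
            filters.append(arg)
        else:
            # A step with no SQL equivalent would have to run locally instead.
            raise ValueError(f"step {kind!r} cannot be folded in this toy model")
    sql = f"SELECT {columns} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    return sql

# Hypothetical table and columns.
steps = [
    ("select_columns", ["OrderDate", "ProductKey", "SalesAmount"]),
    ("filter_rows", "OrderDate >= '2021-01-01'"),
]
print(fold_to_sql("dbo.FactSales", steps))
```

The point of the sketch is that the filtering and column selection happen at the source, so only the rows and columns that are needed travel over the wire.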

Simple answer: If you want things to go badly, write SQL queries in your Power Query queries using Power BI Desktop.

Bad practice #5: Use DirectQuery for “real-time” analytics

Power BI includes an awesome capability called “DirectQuery” that enables certain data source engines to handle query operations as users interact with report visuals, rather than using the VertiPaq in-memory analytic engine to process queries. DirectQuery doesn’t use the VertiPaq storage engine, which is really the heart and soul of Power BI and SSAS Tabular data models. With DQ, records are read live from the source. The downside to DQ is huge. Most interactive report visuals can take several seconds to load, and some DAX functions simply can’t be used due to resource overhead and performance cost.

Database professionals often consider DQ to be a cure-all solution because it enables familiar database technologies. Everything is managed centrally, there is no record duplication in the data model and users see the current state of a record when they interact with reports. That’s all fine and dandy but compared to Import mode, DirectQuery performance can be several orders of magnitude slower.

Import mode is fast because data sits in memory, compressed by column. DQ is slow because queries must be translated and pushed down to the data store, then results must be sent back. DirectQuery is a powerful feature when used strategically to address a few specific problems. It is a good solution for massive fact tables that would otherwise use a lot of memory. Just filter large tables and don’t do a lot of grouping and aggregation over a large set of records.

In a Composite data model, with most tables using Import mode, and one or a few detail tables using DirectQuery, you can achieve a “best-of-both-worlds” solution by allowing users to drill to details against a filtered set of records.

If you want your visual, analytic reports to be really slow, use DirectQuery for all the tables.

Bad practice #6: Combine source data and load one big table into the Power BI data model

People who approach data problems in Power BI the same way they do in Excel tend to combine source tables into a large spreadsheet-like table. That approach creates a seemingly easy-to-navigate grid of data with a lot of columns, but you can lose a lot of functionality and create a very inefficient model this way. The VertiPaq Tabular modeling engine was architected around the premise of “star schema” dimensional models.

It’s not that Power BI won’t work with flattened data, or that you “have to” build a star schema, with separate dimension tables for every attribute! The natural rules of data behavior just cause Power BI to work better when data is modeled in related fact and dimension tables. There are several DAX functions that expect records to be filtered through a relationship. One of the most common examples is a date dimension table, used to filter a fact table to perform time-series calculations. But that same behavior can exist for Products, Customers, Geography and other filtering or grouping attributes that can be used to filter a related table of numeric business facts.
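Here is a small Python sketch of what “filtering through a relationship” means, using a date dimension and a fact table held as plain dictionaries and lists. The table contents and column names are made up, and this is only a conceptual model of the behavior, not how the engine evaluates DAX.

```python
# Hypothetical date dimension, keyed on DateKey.
dim_date = {
    20200101: {"Year": 2020, "Month": "Jan"},
    20210101: {"Year": 2021, "Month": "Jan"},
    20210201: {"Year": 2021, "Month": "Feb"},
}

# Hypothetical fact table; each row relates to dim_date through DateKey.
fact_sales = [
    {"DateKey": 20200101, "SalesAmount": 75.0},
    {"DateKey": 20210101, "SalesAmount": 100.0},
    {"DateKey": 20210201, "SalesAmount": 250.0},
]

def total_sales(year):
    """Filter the dimension first, then let the surviving keys filter the fact rows."""
    keys = {key for key, row in dim_date.items() if row["Year"] == year}
    return sum(row["SalesAmount"] for row in fact_sales if row["DateKey"] in keys)

print(total_sales(2021))
```

The fact table never stores the year; the filter is applied to the small dimension table and flows to the facts through the key, which is the pattern a star schema is built around.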

If you want to struggle with DAX calculations and limit users’ options to interact with your data, pound your data into a large, flat table, rather than individual dimension or look-up tables.

How to Name Worksheets in a Paginated Report Exported to Excel

This question comes up every few years in SQL Server Reporting Services. Of course, in Power BI Paginated Reports, we have the same features. A couple of days ago, Karthik posted this question as a comment to my post titled Chapter 7 – Advanced Report Design:

I am working on a SSRS report where the grouping is done to group the records in to multiple tabs/worksheets. When the report is exported to excel, the worksheets has the default name (Sheet1, Sheet2, Sheet3,…). Here I want to override the default worksheet name with (Tab1, Tab2, Tab3, …). The number of tabs/worksheets varies each time depending on the parameter selection by the user. How to address this? any suggestions please.

There are a couple of requirements to set this up. First, you must drive report pagination using a Group and then set a Page Break on the Group. This figure shows that I am grouping my table data region on the Brand field and then setting a page break on that group. You can get to the Page Break settings by using the little arrow next to the group item in the Row Groups pane or you can use the properties window like I’m doing here.

Next, you want to give the page a name under the PageBreak properties. Expand PageBreak and then use the Expression builder in the PageName property to use the grouped field name in an expression. In this case, I am referencing the Brand field.

The result is shown here when I open the file resulting from exporting this report to Excel. You can see that the worksheet tabs are titled with each Brand value and each sheet contains the product information grouped by the corresponding brand.

New Blog: Data On The Road

That’s right! I’m starting a new blog site today. Just to be clear, it will not replace this one. I will continue to post here at SqlServerBi.blog about everything related to Business Intelligence best practices, Power BI, Paginated Reports, SQL and the Azure Data Platform.

The new blog is called Data On The Road and it is dedicated to connecting with the professional community while on the road as a working technology professional. In May of 2021, my wife and I decided we needed an adventure so we transitioned our living space into a new Tiffin Class A motorhome. We had it outfitted with a private office and tech so we could stay connected while working fulltime from the road. We call it the “Data Bus”.

We’re just getting started and taking short trips to practice and work-out the kinks – but it has been great so far. We hit the road for good in the Fall to go experience the US from the Southwest, the South, the East coast – and who knows where?

We’re still assembling the tech and working out the logistics to visit and interview friends and colleagues along the way. In the database and BI community, we’re a dedicated and social bunch so let’s talk about work and about fun! We might chat a little tech but the purpose of this blog – and this adventure – is to talk about work/life balance. In our jobs, we work hard and play hard – so we want to know what our community friends and leaders do when they need to get away. How do they balance their careers with other activities?

Please follow both of my blogs but head on over to DataOnTheRoad.blog and follow us on this new adventure!

Doing Power BI the Right Way: 10. Designing and Managing Large Datasets

Part of the series: Doing Power BI the Right Way

Later in this post, you will find a 27 minute framed video including a series of interviews and discussions with a few of my colleagues. They provide perspectives and recent experiences related to this topic. The best approach to design and maintain large data models depends on different factors. However, many years of experience have proven that there are best practices to embrace and common mistakes to avoid.

I was just talking to a consulting client about the best approach to build a data model and he told me something very interesting about the way they were loading data into Power BI. He said “We don’t use facts and dimensions, we load all of our data into one huge table.” He said that their data model performs well and that it meets their reporting needs. It is a difficult point to argue, when something is working at the time although the design might not follow the accepted rules. Life is like that and there are plenty of analogies to make the point that a practice, even a real bad practice, might solve a problem for a period of time and under certain conditions. You can drive a car at excessive speed to get to your destination faster. You might not get caught by the police on that day and you might not crash but eventually, if you make it a habit, this practice will catch up to you. Data is like that. If you don’t play by the rules, you limit your options. Bending the rules lets you move faster and sometimes with less hassle. But, as the project scope expands – and after adding enough data or other complexities to the solution, it will not endure. The data model won’t perform well, won’t load the correct data or it just won’t be reliable.

This post will explore the realities of best practice design for large data models; some important considerations and trade-off decisions when working with both “big data” and “large data”.

One of the great challenges with Power BI is that it is so easy to build simple data models and reports quickly. But, often when those projects need to grow up, they aren’t ready for enterprise use. This isn’t a problem with the tool, it is a problem with the way it is often used. If someone says “Hey, Paul, I need you to create a report quickly. Can you import data from these three sources into Power BI and create a quick dashboard report?” I might say “sure, let’s just do this:”

  • Merge records from multiple files and queries into a single table in the model.
  • Use in-line SQL queries in Power Query to join source tables together and apply transformation logic.
  • In reports, summarize numeric columns from a table in the data model, rather than defining DAX measures.
  • Reload the entire model when source data changes.

There is nothing inherently wrong or bad about using these quick-and-easy techniques in a smallish project, but if it needs to grow, scale and be future-proof, you may need to follow a longer path to achieve a more durable result.

The answer to a question we often ask at the beginning of a BI project, “what information do we need to include in the data model?”, is often the same. I think Freddie Mercury said it best: “I want it all and I want it now!” More specifically, stakeholders might say that they want:

  1. A lot of rows of detail or historical fact records (let’s say tens of millions or billions of rows).
  2. Many columns from the source table(s) so they can slice, filter or group values on anything imaginable.
  3. Reports must perform well and return results quickly.

…to these requests, we might say “pick any two but you can’t have all three”. Maybe you can have all three to a certain point but not to an extreme.

Tall and Wide tables

A “tall” table contains many rows or records. A million might be a substantial number of fact table rows in a desktop Power BI data model, and row counts in the billions are not out of reach in an enterprise-scale solution built in either Power BI or Analysis Services. The VertiPaq analytic data processing engine (the data modeling engine behind SSAS, AAS, Power Pivot and Power BI) is very good at handling very tall tables when the columns are conservative, numeric data types; especially when they contain repeating, non-unique values to aid columnar compression. In rare cases, Power BI data models can even read and process petabytes of data! The point is that tall tables are very achievable.
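To show why repeating values help, here is a toy run-length encoding in Python. This is a simplified illustration of the general idea of columnar compression and not VertiPaq’s actual implementation; the sample column values are made up.

```python
from itertools import groupby

def run_length_encode(column):
    """Collapse each run of identical consecutive values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(column)]

# A column with repeating values compresses into a few pairs...
repeating = ["Red"] * 4 + ["Blue"] * 3 + ["Red"] * 2
print(run_length_encode(repeating))

# ...while a column of unique values does not shrink at all.
unique = ["A1", "B2", "C3", "D4"]
print(run_length_encode(unique))
```

Nine values collapse into three pairs in the first column, while the four unique values still need four pairs. That is the intuition behind keeping high-cardinality columns, like unique identifiers and long text, out of a tall table when they aren’t needed.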

Tables with many columns can be managed by the modeling engine but tall (many rows) and wide (many columns) are often not a good combination. The fact is that analytic reporting solutions shouldn’t include the whole kitchen sink. The problem often stems from developers not building data models based on business requirements but simply exposing all the available data just in case someone might find it valuable. This is just not the way BI works. If you don’t need certain columns to meet the current report requirements, leave them out. Columns can usually be added later when they are needed. Additional tables can also be added until the model exceeds its design goals. Herein lies the need for balance.

The thing is that simple data models can be relatively easy to build and often don’t take up a lot of space. It may be more cost-effective in the long run to create another data model with some duplicate design elements, than to spend all of your time and energy trying to craft an all-encompassing data model.

Large Data and Big Data

By “Large Data”, we generally mean that tables contain a high number of rows. Conversely, “Big Data” refers to a set of technologies that were engineered to manage large volumes of data records using distributed storage and processing.

Big Data

In the first wave of cloud computing, we saw the introduction of “Big Data” which was a new concept for data professionals. Technologies like Hadoop and Apache Spark use distributed computing to process large sets of data in parallel, using highly scaled-out architecture. The term “Big Data” doesn’t just mean that there is a lot of data but it really means “distributed data processing” which can potentially be used to handle very large data sets. Like any architecture, there are strengths and weaknesses. Big Data implementations like Spark and Databricks can transform and process an enormous amount of information very quickly with portions of the data processed on distributed nodes.

Conversely, the VertiPaq engine at the center of a Power BI and SSAS tabular data model, is not a distributed architecture. Because VertiPaq keeps all of the data in memory on a single computing node, it can perform calculations extremely fast. For this to work, data must reside in contiguous memory on a single computer. So, Power BI is not Big Data but it can be used in conjunction with Big Data sources when you use these respective technologies the way they were designed and architected to work.

Complex Models

Unique problems require unique data model designs. Sometimes a Power BI report and accompanying data model just need to address a few business reporting objectives. As time goes by, additional requirements are introduced and the model grows. Along the way, certain data challenges might require additional data massaging or new tables to bridge tables and address special needs. Over time, a model can get complicated. This often raises the question: should we create one big, complex data model to address many requirements or multiple, less complicated data models? There are trade-off decisions either way. It is often possible to engineer larger, high-value models. However, you have to weigh the cost to maintain a complex model against the relative simplicity and cost savings of less-complicated models. Sometimes starting over is cheaper than layering more messiness onto a data model with a shaky foundation.

Partitioning

Once considered an essential activity in the design of any data model, partitioning breaks a table into multiple storage units that can be managed and loaded separately. Partitions containing unchanged records don’t need to be reprocessed, and the smaller partitions containing new records or those that have changed, can be processed quickly and in parallel, so they don’t block other operations.
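As a rough sketch of the idea, this Python example splits rows into monthly partitions and then works out which partitions would need to be reprocessed after new data arrives. The monthly grain, the column names and the compare-the-rows approach are assumptions for illustration; a real model would normally decide this from a refresh policy rather than by comparing data.

```python
from collections import defaultdict

def partition_by_month(rows):
    """Group rows into partitions keyed on the YYYY-MM prefix of OrderDate."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["OrderDate"][:7]].append(row)
    return dict(partitions)

def partitions_to_refresh(old, new):
    """Return keys of partitions that are new or whose rows have changed."""
    return sorted(key for key in new if old.get(key) != new[key])

# Hypothetical data: the first load, then a later load with one changed and one new row.
first_load = [
    {"OrderDate": "2021-01-15", "SalesAmount": 100.0},
    {"OrderDate": "2021-02-03", "SalesAmount": 250.0},
]
second_load = [
    {"OrderDate": "2021-01-15", "SalesAmount": 100.0},
    {"OrderDate": "2021-02-03", "SalesAmount": 275.0},
    {"OrderDate": "2021-03-01", "SalesAmount": 50.0},
]

old = partition_by_month(first_load)
new = partition_by_month(second_load)
print(partitions_to_refresh(old, new))
```

January is untouched, so only the February and March partitions would be processed, which is the whole benefit: the unchanged history stays put.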

Video Interviews

This video runs about 27 minutes and includes a series of interviews and discussions with some of my colleagues, with their thoughts and experiences on this topic.

Power BI Data Modeling Sessions

This is going to be a very busy week for presentations. I am presenting five sessions on the topic of data modeling!

Data modeling is the heart and foundation of a useful Power BI solution. If you can get the data model right, most everything else falls into place. Correct data modeling is relatively simple on the surface but can be challenging in practice. Evidence of the interest and importance of this topic is that I submit abstracts on a few different topics for these events and this is the one that keeps getting selected.

Silicon Valley Microsoft Data Platform Meetup
Tuesday, April 20 at 6:00 PM Pacific Time

Power BI Summit
April 19-23 – multiple sessions and times

The session will be repeated three times for attendees in different time zones.

https://globalpowerbisummit.com/

BI & Analytics community of the Norwegian Computer Association
April 21 5:00 PM Central European Time

Supporting materials are available on the Presentations page on my blog here: Presentations | Paul Turley’s SQL Server BI Blog. In addition to the material and demonstrations I present in these sessions, I wanted to share some additional thoughts which I have recorded in this short video:

Drill-through from Power BI to Paginated Report – Report Recipe #4

Navigation between reports is the hallmark of an interactive reporting solution, enabling the ability to drill-through and see relevant details and contextual filtered information in a target report. Power BI and Paginated Reports both offer navigation capabilities suited for different purposes. Having the ability to navigate between reports can provide a tremendous amount of power and flexibility.

This is simple math: with two different report types, there are four different possible report navigation combinations. They are:

  • Power BI report to Power BI report
  • Power BI report to Paginated report
  • Paginated report to Power BI report
  • Paginated report to paginated report

Continuing this series of blog posts to highlight our recently published eBook about Paginated Report recipes, this post summarizes the technique detailed in recipe #4, “Drill-through from Power BI to Paginated Report On-premises”.

If you have worked with the native SSRS report drill-through using report actions, that capability is not yet baked into the product. It probably will be some day but likely not soon. That’s OK because this technique, using URL navigation, works quite well. There is one other twist which is that the URL parameters are a little different when comparing on-premises Paginated (SSRS) reports to Paginated reports in the Power BI service. The technique in this recipe uses an on-prem target report. You can apply the same pattern in the cloud with slight modifications and I will post an example in the near future.

This image depicts the source Power BI report with links that navigate to the target paginated report, showing filtered details:

The mechanism to enable navigation is a report URL produced with a DAX measure. The code emits a URL with parameters that preserve the row and filter context of the selected item or record. This URL measure is exposed as a link in a table or matrix visual in the Power BI report, optionally with a URL “link” icon:

To see the instructions in detail, follow this link to the recipe: 04-Drill-through from Power BI to Paginated Report On-premises | Paul Turley’s SQL Server BI Blog
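The recipe builds the URL in a DAX measure, but the string-building logic can be sketched in Python to show the moving parts. The URL shape below follows the classic Reporting Services URL access pattern as I understand it; the server name, report path and parameter names are made up, and the exact format used in the recipe may differ.

```python
from urllib.parse import quote

def build_report_url(base_url, report_path, parameters):
    """Build a drill-through URL with URL-encoded report parameters appended."""
    query = "".join(
        f"&{quote(name)}={quote(str(value))}" for name, value in parameters.items()
    )
    return f"{base_url}?{quote(report_path)}&rs:Command=Render{query}"

# Hypothetical server, report path and parameter values.
url = build_report_url(
    "https://reports.example.com/ReportServer",
    "/Sales/Order Details",
    {"Brand": "Contoso & Co", "Year": 2021},
)
print(url)
```

The important detail is the encoding: a value like “Contoso & Co” has to be escaped, otherwise the ampersand would be read as the start of another parameter.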

Creating a Paginated Report Template – Report Recipe #3

This series of posts introduces the recipes in the Paginated Report Recipes eBook. In recipe #3, “Reusable Report Template”, Kathi Kellenberger demonstrates how to create a universal report template that you can use to create standardized reports that have consistent styling, branding and look-and-feel. This is an effective best practice to ensure that all reports meet organizational standards and conform to a standard style and format. Kathi’s approach provides tremendous flexibility by using parameters as variables to change several report styling elements. Thank you, Kathi, for contributing this recipe to the collection.

The Reusable Report Template recipe allows you to build Paginated reports and SQL Server Reporting Services reports in Visual Studio, starting with a new report template that allows you to easily switch-out a logo image, background, border and line colors and the report title in the header.

Back when I was doing a lot of SSRS project work, I used techniques similar to those described in this recipe to create “starter” report templates. At the beginning of a new project, I would create a standard template report and then create new reports from the template. I also used parameters and expressions to define dynamic styling properties. However Kathi has taken all of these tricks to the next level by building a comprehensive report template with all of these techniques built-in. At the center of the template is a custom parameter handler function that helps manage several styling properties in a single parameter and then maps them to various report settings.
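To give a feel for the single-parameter idea, here is a minimal Python sketch that unpacks several styling properties from one delimited string and falls back to defaults for anything not supplied. This is not Kathi’s function; the “name=value;name=value” format, the property names and the default colors are all assumptions made up for this example.

```python
# Hypothetical default styling properties.
DEFAULT_STYLE = {"TitleColor": "Black", "BorderColor": "Gray", "LineColor": "LightGray"}

def parse_style_parameter(raw):
    """Split one 'name=value;name=value' string into a dictionary of style settings."""
    style = dict(DEFAULT_STYLE)
    for pair in raw.split(";"):
        if not pair.strip():
            continue  # tolerate empty segments such as a trailing semicolon
        name, separator, value = pair.partition("=")
        if not separator:
            raise ValueError(f"expected name=value, got {pair!r}")
        style[name.strip()] = value.strip()
    return style

print(parse_style_parameter("TitleColor=Navy; BorderColor=SteelBlue"))
```

Each report property would then look up its own setting by name, so one parameter value can restyle the whole report.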

Use the detailed instructions in this recipe to define your own standard report template with your corporate branding and style, and then use the advanced coding technique to swap out styling property values. The report recipe is available here. The sample files and other resources related to this recipe are available here.

Paginated Reports Recipe eBook – first two recipes

First and most importantly, I want to acknowledge and thank these contributing authors who wrote report recipes and working samples included in this eBook:

  • Robert Bruckner
  • Kirill Perian
  • Kathi Kellenberger
  • Stacia Varga
  • Tomaž Kaštrun
  • Tom Martens

As I am working on posting the initial report recipes, I wanted to share some comments about the first two recipes. These recipes are step-by-step instructions that lead you through the start-to-finish process using the provided sample database, and can be used with SQL Server Reporting Services (SSRS), Power BI Report Server or Paginated Reports published to a Power BI Premium capacity workspace. Reports may be authored with Power BI Report Builder, Report Builder or SQL Server Data Tools/Visual Studio. Additional recipes will follow.

The eBook with all of the recipes will be accessible from the main menu here on the blog. Each recipe is published as a page rather than a post, so you don’t have to search and try to figure out when each one was originally posted. As of tonight, I have published eight of twelve recipes that have been completed thus far. I’ll get others posted in the near future. Please enjoy these and watch for more paginated report recipes here: Paginated Report Recipes eBook | Paul Turley’s SQL Server BI Blog

The first two recipes in the Paginated Report Recipe eBook are about applying alternate row and group shading. Dating back to the early days of printed reports, this style of report was called a “greenbar report” because it was literally printed on continuous, pin-fed paper that had alternating shaded green and white rows to make it easier for someone reading the report to follow along.

As the industry evolved from scrolling paper reports to laser-printed pages and then to on-screen reports, the greenbar effect was still useful. Recipe #1 introduces expression-based styling by changing the background color using the modulus (divisional remainder) of the row number in a table data region. If the row number is an odd number, set the background color to a shading color. Otherwise, set it to white or transparent.

01: Alternate Row Table “Green Bar” Report | Paul Turley’s SQL Server BI Blog
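The modulus test behind the greenbar effect is simple enough to sketch outside of a report. The Python below only mirrors the logic described for recipe #1. The color names and sample rows are assumptions, and in an actual report the test would live in an expression on the row’s background color property.

```python
def row_background(row_number, shade="LightGreen", plain="White"):
    """Return the shading color for odd rows and the plain color for even rows."""
    return shade if row_number % 2 == 1 else plain


rows = ["Bikes", "Helmets", "Gloves", "Tires"]

# Row numbers start at 1, as they do in a table data region.
shaded = [(name, row_background(number)) for number, name in enumerate(rows, start=1)]
for name, color in shaded:
    print(f"{name:10} {color}")
```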

In recipe #2, contributing author Kirill Perian takes this technique to the next level by maintaining the alternate shading across column and row groups in a matrix data region. The method is a little more involved than just using the row number. He uses a hidden column to track the column group and then an expression to reference that value to manage the background shading for each row.

02: Alternate Row Shading in Matrix (with Column Groups) | Paul Turley’s SQL Server BI Blog

The end result looks similar to the simple table report used in recipe #1, but the problem is a little more complex in a matrix report because of the column groups. The advantage of this technique is that it will work across row and column groups at multiple levels.

Paginated Reports eBook Finally Released

At last, the second edition of “SQL Server Report Recipes”, written by Paul Turley, Robert Bruckner and a host of contributors, is being released a few recipes at a time. This time around, it will be a free book published through my blog and perhaps other sources. I’ve posted the introductory chapter and six report design recipes. You may also download the sample SQL Server database used in all the recipe reports. I sincerely hope that you and others will use this as a free resource to serve the common interest of the greater community. Please spread the word and send those who might learn and benefit to this page.

The work on this book really started about eighteen years ago…

Late December back in 2003; what a very special time for me! SQL Server Reporting Services was due to be released to the market and I had been using the pre-released version to integrate reports into a custom web application. After using a variety of products like FoxPro, Access and Crystal Reports, and after mastering essential T-SQL queries and .NET programming, using SSRS was a dream. It was simple and elegant, programmable and powerful. I was so excited that I wanted to tell the world, so I recruited authors to write our first book about Reporting Services. Three editions later, I teamed up with Robert Bruckner, one of the lead developers on the SSRS product team at Microsoft, to write a different kind of book. Robert and I had both blogged quite a lot about report design techniques. Robert had done some crazy things in SSRS that no one knew were possible. He found very creative ways to develop games using SSRS such as tic-tac-toe, hangman and Battleship. Honestly, there are easier ways to do game programming, but he proved that almost anything was possible if you were willing to think outside the box and maybe even draw outside the lines a bit.

Several contributing authors have worked tirelessly over the past difficult year to make this eBook a reality. Others, through no fault of their own, signed up to contribute to the book but “the year that shall not be named” had other plans. At first we paused, and then we stopped, and then we regrouped and decided to publish a shorter but comprehensive, modern kind of book.

Please visit the new Welcome and Introductory page with the Table of Contents to the current and future recipes: Paginated Report Recipes: 2020-2021 | Paul Turley’s SQL Server BI Blog

Doing Power BI the Right Way: 4. Power Query design best practices

Part of the series: Doing Power BI the Right Way (link)

Although my professional focus is building enterprise-scale BI solutions, I’ve created my share of informal Power BI reports that were put together quickly, with the goal to create something “good enough” rather than achieving perfection. This guide is about designing proper and formal solutions but these practices apply to any Power BI project that needs to survive future maintenance.

When you need to load data into a data model, you have a few options and the right choice is going to depend on a few factors. This equation is usually a balance between quickly and conveniently generating a table or using a disciplined approach to get reliable data from a sustainable source of record. The following image shows that data can be transformed in the source (or before data is loaded into the source) or within Power Query using Power BI Desktop.

Convenience or Scale?

Life is full of choices and trade-off decisions. Let’s say that you need to create a lookup table containing sales regions and this information doesn’t exist in the source database. You could easily create a new table in Power Query using the “Enter Data” feature, and just manually enter the sales regions. This would solve the immediate problem with very little effort, but how will the table be maintained if sales regions are added or change in the future? You could keep the list of values in an Excel file stored in SharePoint for a business owner to maintain when information changes. You could also go all the way back to the beginning of the process and load a dimension table in the source database. IT professionals might take the hard line and say that all data must flow through the data warehouse regardless of the effort and cost. Which of these is the right choice? It depends on the scope of the reporting project and the long-term ownership of the solution. Sometimes quick and convenient is OK, but particularly for tables that need to scale and handle larger data volumes in the future, the following guidelines are critical.

I’ll get to the point and then explore the details afterward. Start with these fundamentals:

  • A proper data model is the heart of a Power BI report solution.
  • The purpose of Power Query is to shape and prepare each of the tables loaded into the data model.
  • The tables, fields and measures in the data model should be uncomplicated and user-friendly; intuitive and easy to navigate.

Recommended Practices

Following are the general best-practices I religiously apply when creating queries. After briefly describing each recommended practice, I’ll break it down and review some in detail.

| Practice | Explanation |
| --- | --- |
| Use parameters for query paths & connections | Parameters are used to make the solution portable. Any connection information, like a file path or database server name, should be stored in a parameter so it can be changed without modifying query code. |
| For large SQL tables, reference either tables or views | Power Query has optimizations built in to work with different data connectors. Several connectors support query folding, where Power Query translates query steps into the native query language. Starting with a SQL query rather than selecting a table or view from the list of database objects will ensure that query folding will not work. When possible, start with a table and if you need to use SQL to prepare data before loading it with Power Query, create a view. |
| Limit large table volume using date range parameters, compatible with Incremental Refresh | Incremental Refresh enables the Power BI service to partition large tables and only load data that changes rather than the entire table when the dataset is refreshed. This was once a Premium-only feature that now works with shared capacity licensing with datasets up to 1GB in size. Even if you don’t intend to use the Incremental Refresh feature, using a pair of date range parameters allows you to filter large tables and keep the PBIX file size small. After publishing the file to the service, you can update the parameters and load more records. Create two date/time type parameters named RangeStart and RangeEnd, and then add a date range filter according to these instructions. |
| Remove all unneeded columns | Resist the urge to leave columns that you are not sure that you need for reporting. In each query, remove all unneeded columns early in the sequence of applied steps. The easiest way to do this is to use the Choose Columns button on the Home ribbon and deselect columns. To change the selection later, click the gear icon next to the Remove Other Columns step. |
| Rename using title case for all table names and column names that will be visible in the data model | Although it may seem trivial, it is absolutely necessary to apply friendly naming conventions to all tables and fields. Chris Webb wrote an excellent post about object naming conventions. As a rule, rename all columns that will not be hidden in the data model, using friendly title names (with spaces and mixed case). There is no need to rename primary key, foreign key and other utility fields. After tables are added to the data model, hide those fields to remove clutter and confusion for report developers (even if you are the report developer). |
| Explicitly set data type for all columns | Column renaming and changing the data types can be time-consuming work, but be meticulous about checking every column returned by the query. |
| Consolidate multiple steps of the same type (such as change type and rename columns) | As query design evolves, you will inevitably create inefficient queries with redundant steps. There will always be opportunities to improve the design, often by consolidating and reorganizing steps. |
| Rename key steps to document purpose and aid future development | Renaming query steps allows you to understand the function and purpose of each step in the Applied Steps list. This creates a self-documented series of operations that will be easier to maintain down the road. |
| Add code comments in the M script and/or step descriptions to document queries | You can further document query steps by changing the step Description in the Properties window or by adding code comments in the Advanced Query Editor. |
| Use steps that promote query folding | Many query steps allow records to flow through without blocking subsequent steps. There are also certain transformation steps that must read all the records into memory to apply the transformation. |
| Move steps that break query folding as late as possible | Steps that support query folding, which can be translated into the native query language of the data provider, should be performed first. If non-foldable steps can’t be avoided, they should be performed as late as possible in the sequence of query steps. |
| Perform Pivot, Unpivot, Transpose transformations only on small result sets | These transformations must load all records into memory. They are powerful but have volume and performance limitations. Test them with production-scale source data. |
| Create row-level calculations in Power Query instead of DAX | Row-level calculations and derived column values can be performed either using Power Query or DAX, but M is a far more capable language for data transformation. Effective data preparation using Power Query will simplify and ease the burden of data model design. Regardless of the relative efficiency, if you perform all data preparation and transformation in one place, this can simplify maintenance in the future. There are less-common cases where DAX is the best choice to create calculated tables and calculated columns outside the scope of a single row. |
| Avoid using DirectQuery when not warranted by latency and data volume requirements | DirectQuery has its place in data model design but it should be an exception to the normal pattern of importing data into an in-memory data model. DirectQuery tables can work with simple aggregation calculations but don’t perform well using many other DAX functions. Use DirectQuery to aggregate values over very large tables that won’t otherwise fit into memory, or to support drill-through into non-aggregated detail records. These are advanced design patterns that should be treated as rare and exceptional. |
| Avoid using native SQL queries, especially with large source tables | Native SQL queries don’t support query folding so they should be avoided when possible. This may be an acceptable exception to load low-volume query results but generally use a database table or view for query data sources. |
| Use Date or Time values rather than Date/Time values for dates | Load column data only at the necessary level of grain so values in the data model will compress. In particular, reduce date/time values to date only. For time-level resolution, store date and time values in separate columns. |
| Import or create Date/Time tables in Power Query rather than using auto-generated date hierarchies | Auto-generated calendar tables in Power BI Desktop are fine for self-service projects but purposely-built date tables will afford more flexibility. If available, use a date dimension table in your data warehouse or source database. Secondarily, generate a date table using Power Query. Date tables can effectively be created with DAX functions but if all tables are loaded with Power Query this gives you the convenience of managing all tables centrally. |
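The date range practice from the table above comes down to one filter: keep rows on or after RangeStart and before RangeEnd. The sketch below shows that logic in Python rather than M. The column name `OrderDate` and the sample rows are assumptions, and the half-open comparison (inclusive start, exclusive end) is used so that a row on a boundary is never counted in two adjacent ranges.

```python
from datetime import datetime


def filter_by_range(rows, range_start, range_end, date_key="OrderDate"):
    """Keep rows where range_start <= date < range_end (half-open interval)."""
    return [row for row in rows if range_start <= row[date_key] < range_end]


# Hypothetical sample data standing in for a large fact table.
sales = [
    {"OrderDate": datetime(2019, 12, 31), "Qty": 5},
    {"OrderDate": datetime(2020, 1, 1), "Qty": 3},
    {"OrderDate": datetime(2020, 6, 15), "Qty": 8},
    {"OrderDate": datetime(2021, 1, 1), "Qty": 2},
]

RangeStart = datetime(2020, 1, 1)
RangeEnd = datetime(2021, 1, 1)

in_range = filter_by_range(sales, RangeStart, RangeEnd)
print([row["Qty"] for row in in_range])
```

Narrowing the two parameter values during development keeps the local file small; widening them after publishing loads the remaining history.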

Breaking It Down

Let’s take a deeper look at some of these recommendations.

Use parameters for query paths & connections

For every file path, web address or server name in source connections, use a parameter. It is not hard to go back and edit source connection information using the Advanced Editor but the easiest way to build parameterized connections is to build them as you go.

Start by enabling the feature “Always allow parameterization in data source and transformation dialogs” on the Power Query Editor page in the Options dialog.

As you build each query connection, for most connection types, you will be prompted to select or create a new parameter.

Here is an example of the parameters in a demonstration project. Without modifying any code or editing a query, any of these values can be changed easily.

Here are two examples of parameterized connections. For the SQL Server connection, the server/instance name is passed as the first argument to the Sql.Database function. The second example concatenates the folder path (stored in the SourceFolderPath parameter) with the file name to create the full folder and file path.

For large SQL tables, reference either tables or views

The best way to guarantee poor query performance with a relational data source is to start with a hand-written native query (like SQL) and then perform transformations on the results.

If a query is based on a relational table or view, Power Query can generate the native SQL (or one of a few other supported native query languages) for the steps that follow. This does not happen when the query starts with a SQL statement rather than selecting a table or view.
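As a rough mental model of query folding, think of the applied steps being collapsed into a single native query that the database runs, instead of rows being pulled into memory and transformed one step at a time. The toy Python function below illustrates only that idea. It is not how the Power Query engine is implemented, and the step names, table name and filter text are hypothetical.

```python
def fold_to_sql(table, steps):
    """Collapse a list of (step_kind, argument) pairs into one SELECT statement.

    Only two foldable step kinds are modeled; anything else raises an
    error, standing in for a step that would break folding.
    """
    columns = "*"
    filters = []
    for kind, argument in steps:
        if kind == "select_columns":
            columns = ", ".join(argument)
        elif kind == "filter_rows":
            filters.append(argument)
        else:
            raise ValueError(f"step '{kind}' cannot be folded in this sketch")
    sql = f"SELECT {columns} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    return sql


native_query = fold_to_sql(
    "dbo.FactSales",
    [
        ("select_columns", ["OrderDate", "SalesAmount"]),
        ("filter_rows", "OrderDate >= '2020-01-01'"),
    ],
)
print(native_query)
```

When the query starts from a table or view, steps like these can be pushed to the source as one statement, which is why column removal and row filters belong early in the step sequence.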

Use Query Diagnostics to Diagnose and Performance Tune

The following image shows the results of Power Query Diagnostics which I describe here: Power BI Query Performance & Query Diagnostics | Paul Turley’s SQL Server BI Blog. This post demonstrates how to capture timings for each step and overall queries.

Doing Power BI the Right Way: 4. Power Query in Dataflows or Power BI Desktop

Part of the series: Doing Power BI the Right Way (link)

Power BI, more than any other Microsoft product in my recollection, offers more options and choices to architect and deliver a solution. Without compromise, Power BI can effectively be used to do anything from creating a simple chart using an Excel spreadsheet to delivering enterprise reporting and analytics on a Fortune 100’s massive data warehouse. At the end of this post, I will share a comprehensive list of resources and insights from Matthew Roche, a Program Manager on the Power BI Customer Advisory Team (CAT). To tease that series, I’ll start with this quote from Matthew’s blog:

Succeeding with a tool like Power BI is easy – self-service BI tools let more users do more things with data more easily, and can help reduce the reporting burden on IT teams.

Succeeding at scale with a tool like Power BI is not easy. It’s very difficult, not because of the technology, but because of the context in which the technology is used. Organizations adopt self-service BI tools because their existing approaches to working with data are no longer successful – and because the cost and pain of change has become outweighed by the cost and pain of maintaining course.

Matthew Roche, Building a data culture – BI Polar (ssbipolar.com)

When should you use dataflows vs regular Power Query? I didn’t jump on the dataflows bandwagon and struggled to adopt them at first. Frankly, Power Query is easier to use. The browser-based dataflows designer is quite impressive but it is not as responsive and convenient as the desktop app, so this is a bit of a trade-off. The power and value of dataflows becomes apparent when the business reaches a certain stage of data culture maturity.

Before we can address the question of whether to use Power BI Dataflows, conventional Power BI queries, or any other approach to source and transform data, we need to briefly review different options for orchestrating a Business Intelligence solution in the Microsoft cloud ecosystem.

Solution Architecture Choices

On a scale of one to ten, with ten being the most formalized, strictly-governed and complex corporate reporting platform, the self-service Power BI option might range from one to four.

For the self-service data analyst, working entirely in Power BI Desktop, data can be imported and transformed using Power Query. Tables are modeled, calculations are defined and data is visualized. This mode is simple and works well for small to moderate-scale solutions with less emphasis on data governance and centralized control.

Even using this simple approach, data models can be developed separately from reports, certified and shared with multiple report developers and self-service report authors. So, to a point, business data can be managed and governed – but the queries in the Power BI solution read directly from source systems or files that are not curated for analytic reporting.

Data Warehouses and Data Marts

The “single version of the truth” or “golden record” repository, a data warehouse (or smaller-scale “data mart”) is the ideal solution to store and manage reliable corporate information. The challenge with creating a central data warehouse to manage centrally-governed organizational data is that it is costly and time-consuming. However, the trade-off is that self-service data models can be inaccurate and out of date. When business leaders need answers quickly, it is not always feasible to add more data sources to a data warehouse quickly.

On the complexity scale of one to ten, versions of this option might be from seven to ten.

A conventional DW/BI solution typically uses on-prem data transformation tools like SSIS to stage and transform source data into a central data warehouse built using a relational database product like SQL Server. Although viable for on-prem systems, this old-school architecture model doesn’t embrace scalable and cost-effective cloud technologies.

The first generation of the Microsoft cloud-based modern data warehouse can utilize several different Azure services. The components in the following example are easily equated to the conventional data warehouse solution in the previous example. Azure Data Lake serves as the staging environment, typically using text files and structured file storage as an inexpensive landing area for ingested source data. Azure Data Factory is used to orchestrate and transform files and data streams into and out of the data lake – and the data warehouse. Depending on the need for scale and size, Azure SQL Database or Azure Data Warehouse (now called Azure Synapse) may be used for data warehouse storage.

If your organization has a comprehensive data warehouse to serve-up all or most of the data needed for analytic reporting, this is probably the best fit for a Power BI solution in your business environment.

Constructing an enterprise data warehouse solution is not a trivial endeavor, often involving as much effort to negotiate business process challenges as the technology development to implement the solution.

The newer generation of the Azure modern data warehouse is a best-of-breed collection of tightly-integrated cloud services called Azure Synapse Analytics. Compared to the previous set of independent Azure services, Synapse Analytics provides a unified development and management interface. Apache Spark and other industry standard technologies designed for data science and platform-agnostic analytics provide the open source data prep engine. Azure Synapse is the evolution of Azure Data Warehouse, Microsoft’s read-optimized, scalable, massively parallel processing (MPP) SQL-based database engine.

Power BI Dataflows

Dataflows can fill an important gap between purely self-service data prep and formal data warehouse solutions. If you don’t have a comprehensive data warehouse to meet your analytic reporting requirements but need to provide more data quality control over standardized entities, incorporating dataflows might be the ticket.

In their simplest form, dataflows provide reusable transformation logic (queries) that can be shared by multiple Power BI data models. Using dataflows deployed to a workspace can save data model developers from repeating the same transformation steps in multiple datasets. But these are more than just Power Query scripts stored in the cloud.

A long list of capabilities is enabled by using dataflows. They can provide integrity and standard entity definitions stored in Dataverse (previously known as the Common Data Model) to enforce standard naming, data types and schema compliance among other features.

In Premium Capacity, dataflow results may be persisted in Azure Data Lake Gen2 storage. This essentially allows you to use dataflows to create a moderate-scale data warehouse without a big investment. Entities may be linked to related entities which creates virtual joins and referential constraints. Other Premium features include DirectQuery, Computed entities and Incremental refresh – all managed in the dataflow rather than for each dataset. Integrations with Azure AI, Machine Learning and Cognitive Services allow you to utilize AI features without writing code. For example, in a recent project, we used AutoML on a dataflow containing high school student data to predict graduation outcomes.


Dataflows Start with M Queries

Dataflows begin with an M query, just like the queries in Power BI Desktop, before adding the additional capabilities mentioned earlier. Queries are authored entirely in the browser but migrating from Power Query in Power BI Desktop is fairly easy. Start with a Power BI solution (PBIX file) in Desktop and open a query in the Advanced Query Editor. You can create a new dataflow in the browser and then copy and paste the existing query M code from Desktop to the dataflow designer. You do have to copy each query one at a time and there are just a few compatibility differences but for the most part, it should be a one-to-one transfer.

Building a Data Culture

Matthew Roche from the Power BI Customer Advisory Team has an excellent 17-part blog series about Building a data culture. Dataflows sit at the crossroads between business process, data stewardship and technology. The industry has been throwing technology and software at data governance and quality problems for decades, with marginal success. It is much easier for data practitioners to acknowledge that these are multi-faceted business culture challenges than it is to formulate a plan to be successful. If anyone can effectively carry and deliver this message, it is Matthew. In this video series, he offers prescriptive guidance to enlist an executive sponsor, work with business stakeholders, and to navigate the landmines of a business landscape to a successful data culture transition.

Honestly, I’ve only pursued this series in fragments over the past year and now that I’ve caught the vision, I plan to watch the entire series start-to-finish. It is that good. Think of it as Game Of Thrones with data.

Resources

Matthew also provides a comprehensive list of Power BI Dataflows resources here. Matthew recently presented to our 3Cloud Power BI and Analytics development team about using dataflows to promote a data culture. This presentation was an epiphany for me that helped me better understand how dataflows fit into the BI solution puzzle – that’s when the gauge metaphor popped into my head. I encourage you to watch and perhaps you will have a similar moment of reckoning.

The Power BI Adoption Framework is a set of presentations from Microsoft that can serve as a checklist of important tasks and areas that should be covered in any Power BI implementation, large and small. These decks are also a great tool for adopting and sharing your organization’s BI and Analytics strategy with business leaders and stakeholders. You can use them as a picklist to assemble your own presentations.

A First Look at Gen 2 Composite Models with Live Power BI Datasets

About three years ago when the data model development engineers from the Power BI product team told me they were working on the ability for tabular data models to share other published data models, that sounded amazing and almost too good to be true. They said it would be a lot of work to make this a reality. The Utopian dream of distributed datasets was publicly announced on the Wave 2 Roadmap last year and then the feature went to Preview in December with this announcement: Using DirectQuery for datasets and Azure Analysis Services (preview) – Power BI | Microsoft Docs. We’ve seen the new feature demonstrated and explained several times but now I’m exploring this exciting new feature for myself.

For this test, I have two different data models in separate PBIX files. One published dataset contains Internet Sales information from the ContosoDW Azure SQL database and the other contains Store Sales information. These models contain many of the same tables and a few unique tables. Notably, each data model contains a different fact table. Here are the two published data models in the Contoso Sales workspace:

After making sure that I had upgraded to the December 2020 version of Power BI Desktop, I open the Options dialog and enable the preview feature: DirectQuery for Power BI datasets and Analysis Services.

After restarting Power BI Desktop, I use Get Data and choose Power BI datasets. After connecting to the workspace and selecting the ContosoDW Internet Sales published dataset, the local model is now connected using a live connection.

After connecting to the remote model, all the tables are listed in the Fields list. At this point, I’m using a capability that has been available for a long time: one Power BI report file connected to a separate, published dataset. Previously I would not be able to add additional dataset connections or import from other sources into a live connected data model.

The status bar shows a new message with a link to enable changes to the model. On the Model ribbon, a new button is also displayed. Using either of these options will modify the local model metadata to allow multiple live dataset connections.

When I click the link Make changes to this model, a dialog box allows me to confirm that I want to switch the live connection to a DirectQuery connection and prepare the local model to allow additional DirectQuery connections.

What happens if I start with an existing local model?

The first important lesson I learned was this: if I create a new PBIX report file containing existing tables and then connect to a published dataset, this does not enable the ability to make changes to the local model.

With an imported table in the data model before using Get Data to select and connect to a published dataset, the status bar shows this message:

I found this to be unintuitive: The text “(click to change)” is not a clickable link. At this point there is no way to update the local model with the necessary changes. I had to start with an empty PBIX to add the first live connection. After that, I could use Get Data to import additional tables from other sources, but not the other way around.

Adding More Dataset Connections

Next, I use Get Data to connect to the ContosoDW Store Sales published dataset. After the local model had been converted, there was nothing particularly unique about the experience. I could just connect to an additional dataset and all of the tables will simply show up in the data model diagram view. The tables from each model are displayed with a different colored heading, in this case they are blue and red.

I had several duplicate table names after connecting the second data model. The second set of tables were renamed and postfixed with the number 2. I renamed the tables that I wanted to keep visible. The redundant tables that I don’t need are displayed on the right side of this diagram layout. Tables in a remote model cannot be deleted from the local model. I used the Properties pane to set the unused tables to be hidden.

If I were connecting to a large remote model that had a lot of tables just for the sake of bringing one table into the local model, it would be cumbersome to hide all of the unneeded tables – so I hope there will be an easier way to manage this in the future.

I opened Tabular Editor from the External Tools ribbon and was able to use it just as if I were designing a normal import data model, to rename tables and set properties with multiple objects selected.

Adding relationships between tables from each remote model was pretty much the same process as it would have been in a local model. I can use the Manage Relationships dialog or use the drag-and-drop method in the diagram view. The latter method opens the Manage Relationships dialog rather than simply creating the relationship.

Finally, I added a new Combined Sales Qty measure to one of the remote tables in my local model that references an existing measure from each of the remote models. For demonstration, the table visual shown below includes the Order Date field which is related to each of the two fact tables (Online Sales and Store Sales), the Online Sales Qty measure from the Internet Sales remote model, the Store Sales Qty measure from the Store Sales remote model, and the new Combined Sales Qty measure which I added to the local model:

After publishing the new composite data model and report, the Lineage View shows that the Combined Sales dataset has connections and dependencies on the two remote datasets that show up as Analysis Services DirectQuery connections, apparently using the XMLA endpoint in the Power BI service.

Once I worked through the design process a couple of times, the data model behavior seems to be exactly the same as always, and designing reports is no different at all. I’m just scratching the surface and I’m anxious to dig deeper and compare notes with others in the community. I have no doubt that behaviors, limits and issues will arise with this brand new and very compelling capability but I’m very encouraged so far.