Power BI Performance Analysis

How can you find the cause of slow performance issues in Power BI? This post demonstrates some of the advanced tools you can use to test slow-performing queries and to do deep troubleshooting and performance analysis in your data models. Using the new Performance Analyzer in the Power BI Desktop May 2019 release, finding the cause of slow-performing reports is easy.

The Performance Analyzer allows you to see the time it takes to run DAX queries for each visual and slicer on a report page. It also shows how long it takes to render the visual and for “other” activities such as web service calls and waiting for other blocking visual queries. After viewing the timings for each visual, you can export all the results to a JSON file for comparison and trend analysis.

The Performance Analyzer will be my first step for analyzing report performance. DAX Studio has several advanced features that are useful for deep analysis, troubleshooting and DAX measure development. It is a tool that requires more investment and some advanced knowledge to get beyond the surface of simple issues, but it has more advanced query tracing, development and problem-solving features.

Tell me about your experience. Please post your comments about your experience using the Performance Analyzer. What other techniques and tools are you using to find and troubleshoot performance?

Advanced DAX Training from Marco Russo in Portland: June 2019

It’s very rare to have the opportunity to learn DAX from the world’s leading expert, but we will have that opportunity in Portland on June 19-21! The one-and-only Marco Russo is coming to Portland to teach The Italians’ Mastering DAX course. No one knows DAX like Marco, and his ability to teach simple, advanced and uniquely insightful concepts is truly remarkable. DAX is simple but it’s not always easy. You can create some amazing calculations and analytical expressions with DAX, but many advanced concepts elude most users without some help with the challenging parts. If you need to learn DAX for Power BI, SSAS Tabular or Excel Power Pivot – or to take your DAX skills to the next level – do not miss this training event.

While in town, Marco will also be the keynote speaker at the 2019 Pacific Northwest Power BI Symposium on June 18th. Last year’s event was a big success, with great speakers and a large number of local attendees, and this year’s event will be just as great.

Getting Started with Power BI Report Design level 101 – What Every New User Should Know

Let’s step through a few very simple things that every new Power BI user needs to know. Power BI is a great tool and it’s fairly easy to get started, but there are a few gotchas that everybody experiences. It’s easy to get frustrated with minor things until you master a few simple features in the designer. I’m going to walk you through a few of these items. It doesn’t matter how simple or complex the project or your data model is, or what your data looks like. What I’m going to show you are things that everyone experiences.

The first thing is that Power BI, like most business intelligence tools, really exists for the purpose of grouping and aggregating data. That’s not to say that we can’t use it for detail-level reporting – and we can work with a lot of data – but by default, it wants to group and it wants to aggregate. Sometimes that’s not what we want it to do.

I have a table here that I’ve just created. This is just to simplify things; I’m going to work with a single table, but you would experience the same thing if you were working with multiple tables in a data model. As you can see, I have a name column with the names of some of my favorite cartoon characters. I have a location, and I have a numeric column that I want to report on. We’re going to call that my measure, or my implicit measure column; it’s just a column of numbers. Keep in mind that we’ve got three records for Fred, two records for Barney, two for Wilma, etc., and notice that those are in combination with different locations. Now I’m going to switch back to a blank report page. I would like to see the number of shells that are sold to a customer at a location, so I’m going to drag and drop the name column onto my report canvas, and that’s going to create this table of names. The first thing I notice is that the text is very small, so I want to bump up the text size. I’m looking up here… looking at all of my ribbons for a place to bump up the text, and I can’t figure out how to do that. The first thing is that you need to make sure that you have this visual selected. This is a table visual – there you can see that it’s a table – and I can move that around. It can get a little confusing because when the table is already selected and I go to grab it, it doesn’t always move.

If you click off of the table and then hover over it, you can grab it with the left mouse button and move it around, but that might seem a little inconsistent. I can also resize it, but with this white table over the white background… it’s hard to see where that table is. I want to bump up the text, so I’m going to go to the paint roller. You can see that this is the format option. It’s simple – once you’ve done it a few times. With this table visual selected, when I choose the paint roller, I get to the formatting options. I’m going to go down to “Grid” and here you can see that I can bump up the text size. So, I can just spin that up to, let’s say, 14 points. That’s a whole lot easier to read. Because the table shows the unique values in that column, I’m only seeing four of them right now. I’m going to add location. Now, I can drag and drop into the visual.

With the visual selected, if I’m looking at the field list, I can drag and drop into what we call the “field wells”. This is the field list (on the right of the panel)… these are the field wells for the selected visual (on the left of the panel). Now you’re seeing the unique combination of name and location. I’ll drag the Shells field into the field wells and notice that for Fred, I only see two rows. If I’m looking at the rows for Fred, how do I know if these are detail rows or if they’re being rolled up? We could go back and look at the table and figure that out pretty easily, but if I’m just coming into this data for the first time, I may not be able to tell how many rows or records there are in the underlying tables behind this visual. How can I figure that out so I can validate this data and make sure it is accurate? Well, the best thing you can do is get a row count. There are a couple of different ways to get a row count. If I drag and drop one of the fields that I’ve already used into the visual, I can use it to count rows. I drop this down and I can say “let’s count that”. I don’t want to count it distinctly… I just want to count it, and that gives me a row count because it is the count of the number of occurrences of location. Here you can see that this row in the table visual actually represents two records in the underlying table, and that solves the puzzle for me by letting me know that 104.08 is actually the sum of the shells for two underlying records where the name is Fred and the location is Bedrock. There are two records that represent that one row in the table visual. Okay… so that solves mystery number one.

Another thing that I can do is drag and drop location onto the report canvas. Then I can switch this to Count, and that gives me an overall count for the entire report. I’ll switch that to a card visual, which makes it a little bit bigger, so now I can reconcile the number of rows that I see here (which is six) against the number of records in my table (which is 8), knowing that there’s at least one row here that consists of two or more underlying records in the table. That’s an easy way to resolve that.
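If you also want to sanity-check the record count at the source, outside the report canvas, you can do it in Power Query. Here is a minimal sketch, assuming the table shown earlier is a query named Characters with Name, Location and Shells columns (those names are my assumption, not necessarily the real ones):

```
// Hypothetical validation query: counts rows in the source table before any grouping happens.
let
    Source = Characters,                    // reference to the existing table query
    TotalRows = Table.RowCount(Source),     // total records behind the visuals
    FredRows = Table.RowCount(
        Table.SelectRows(Source, each [Name] = "Fred"))   // records for one character
in
    [TotalRows = TotalRows, FredRows = FredRows]
```

Comparing these counts against what the table and card visuals show is a quick way to confirm whether a visual row is a detail record or a rolled-up group.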
Alright, item number two is: how do I know where these visuals are on the page? I see that I’ve resized this visual and there’s whitespace all over the place, and that’s not very intuitive. What I like to do in new reports is to click the background of the report canvas and go to the paint roller (make sure that we’re looking at page information), so you should see this property group called “Page Background”. Again, that’s because I just clicked on the background of my page. I’m going to go to the page background and set the background color to a light shade of gray. I notice that it doesn’t change, and that’s because by default the background of a page has 100% transparency. Why? I don’t know, but if I change that, you can see that it is now a very light shade of gray. You’ll see that my table is also transparent, so I’m going to go to the paint roller and turn the background on, which does show me a slightly different background color. The default background is going to be white. I’m just going to double-check that it is white. I’ll set the background transparency to 0%. I prefer to see solid-colored visuals sitting on a light gray background because now I can see where things are and it’s much easier to differentiate where those visuals start and stop. That goes a long way toward helping my users understand what they’re seeing on this report.

What if I didn’t want to roll up the name and location to a single row on this table? How can I see my detail records regardless of whether the name and location are the same or not? The fact of the matter is that Power BI is always going to group on any fields that have the same value – so you have to have a unique column. You have to have a unique value that you can group on, and that is the purpose of this Unique ID column. Before I show you that I can use that Unique ID to show detail rows, I want to show you something that happens with any numeric column by default in Power BI. Numeric columns are automatically set to summarize – that’s what the little Sigma icon just to the left of the field means. We do want the Shells column to be rolled up and summed, but not the Unique ID column; we want to be able to group on the Unique ID and use it to make sure that we get our detail rows. I need to come over to the Modeling tab, drop down the “Default Summarization” list and set this to “Do not summarize”. That gets rid of the little Sigma icon, and now, with my table selected, when I drag and drop that field… (oh, look at that: it’s going to give me a count by default, so it’s even overriding my summarization setting). I have to tell it not to summarize right here in the field list. Now I’m getting one row in the table visual for every record in the underlying table, and that’s exactly what I wanted. Unfortunately, there’s no way to hide or get rid of this Unique ID column. If I don’t want to see it, I can just make that column wide, resize the table, and hide the header text.

Hopefully, this helps dispel some very simple quandaries that a lot of new users struggle with. Best of luck as you’re getting started with Power BI!

Data Geeks and Data Jocks Meetup in Vancouver, Washington

Our little “Data Geeks” Meetup group in Vancouver, Washington is growing with a line-up of heavy-hitters and bigger crowds.

This month, on April 16th, we are honored to welcome Reid Havens, Microsoft MVP and Business Intelligence superstar from the Seattle area. Reid will be presenting “Demystifying Chart Types & Report Design Principles”. Reid is the Founder of Havens Consulting Inc. He has a formal background in technology and organizational management, having obtained separate Master’s Degrees in Organizational Development & Business Analytics. He has experience working with many of the Fortune 50, 100, & 500 companies. Additionally he guest lectures at the University of Washington and other universities, teaching Business Intelligence, Reporting, & Data Visualization.

Please RSVP on the user group Meetup page for the event. Thanks to Lexicon Solutions for sponsoring this month’s meeting. Please visit their site if you would like to upgrade your career or are seeking technical talent.

We’re seeing considerable cross-pollination of complementary user groups in the greater Portland area, which gives professionals in the region several quality, free learning and networking opportunities every month. These groups include the Oregon SQL PASS Chapter, which meets in downtown Portland on the second Wednesday evening, and the Portland Power BI User Group, which meets in Beaverton on the 4th Wednesday evening of each month. With the Vancouver Data Geeks meeting on the 3rd Wednesday evening, if you attend these data-centric user groups three out of four Wednesday evenings each month, you will be smarter than you can imagine and should have immense career opportunities!

In February, we had international trainer and well-known conference speaker Peter Meyers share his Power BI insights with a crowd so large that we had to find a larger venue. Indeed, our Southwest Washington technical community is growing. Little did we know that if you search the Meetup site for ‘Data Geeks’ in Vancouver, you might find the “Vancouver Data Geeks” group in that other Vancouver, way up north of the 49th parallel. We didn’t plan that, nor did they, but it has been a minor point of confusion. We may be the smaller of the Vancouvers, but we are making a name for ourselves, even in the shadow of Portland, Oregon; our big sister city to the south, just over the river.

Thank you all for participating in our growing community and we look forward to seeing you on April 16th for Reid’s session at the Interject Data Systems Office in Salmon Creek, right off I-5, Exit 7A.

Azure Everyday Roundup

Short (2-4 minute) video posts to help you learn about the Microsoft Azure analytics and data platform.

In November of last year, I joined the Pragmatic Works consulting team as Principal Consultant, where I work alongside several great professionals, many of whom I have known and worked with over the years. One of the many things I appreciate is the deep expertise and collaboration we share across this diverse team of professionals. In early 2018, we began sharing a short video post every single day in a series called “Azure Everyday”. I’ve contributed several of these video posts in the past couple of months that I’d like to share with you here. I encourage you to check out the complete collection of Azure Everyday recordings using the link at the end of this post.

Each video post is a short insight or learning about a unique capability in Microsoft Azure. These short videos are a great way to become familiar with all of the Azure services and new features in the Microsoft cloud data platform.

New Features That Help Manage Large Volumes of Data

Data Governance with Power BI Premium

Azure Data Lake Storage to the Rescue!

Modeling Options for Enterprise Scale BI Solutions

Paginated Reports Integrated into the Power BI Service

To see the entire collection of Azure Everyday video posts, visit this page on the Pragmatic Works blog site.

Power BI at Enterprise Scale

Last night I was scheduled to present this topic at the Power BI User Group in Portland, but the meeting was cancelled because of a snowstorm; except the snowstorm didn’t happen… not even a little. In fact, the roads were so clear and traffic was so light (because everyone stayed home from work to avoid the alleged storm) that I nearly got a speeding ticket on the freeway during what would normally have been rush hour traffic. I made the 60-mile round trip from my house to the venue and back in under an hour.

Since I had prepped up my presentation, I decided to record and post it here on my blog. Please like it on YouTube but post your questions here so other community members can see them. I’ll add the transcription below.

This presentation demonstrates how to implement incremental data refresh and discusses a recent reference project that required versioning and managing high-volume data. We discuss composite data models, dual-mode tables, and partitions & parameters in Power Query.

We’re going to talk about new features in Power BI specifically designed for managing large-scale, serious IT-driven and managed solutions. This presentation was actually prepared for a live session at our local Power BI user group meeting which was cancelled due to a snowstorm that didn’t occur. The pop-music references were there to incent participation and there were prizes for correctly guessing the artists of the songs that I referenced – just for fun.

What’s this all about?
We’re going to talk about BI models… the way that we did serious data modeling in the past, today, and where that’s going in the future. We’ll talk about some new, emerging and existing Power BI enterprise features, which includes different modeling choices, how the design tool features are merging and where that’s heading over time. We’ll talk about working with data in high volume and at massive scale, and then I’ll show you a reference project – a real project that I worked on recently. I’m using screenshots of that project simply because there was a lot of sensitive data. A live demo of a sample project will follow.

We’ll talk about composite models and then I’ll demonstrate using parameterised queries – which you can use even if you’re just in a regular shared capacity tenant without premium features. And then incremental refresh policies, which are a premium feature, to take that to the next level.

The Way We Were
So let’s think about the song title first; we’re going to talk about the way that serious BI solutions used to be, and then we’ll talk about where that’s going in the wake of Power BI. The Way We Were, of course, is a reference to a song sung by Barbra Streisand for the movie of the same title, in which she co-starred with Robert Redford.

Serious BI solutions have in the past used SQL Server Analysis Services for data modeling and storing that data – and of course we’re talking about analytical datasets that are used for analytical and trend reporting, the kinds of things we typically use tools like Power BI for. In the past, Analysis Services was installed with SQL Server onto an on-premises server. You pay for that server; you not only have to buy the hardware and manage, optimize and tune it, but there are also licensing costs associated with SQL Server outside of Power BI. Typically, the person who designs and manages those data models is a person like this: a very technical person. It’s a fairly complicated, time-consuming thing, and these people generally work for an IT organization. That’s the traditional approach.

Times, They Are A-Changin’
What’s the song reference? …of course that was a song performed by Bob Dylan in the 60s.
Visual Studio has long been the design tool of choice for serious semantic data model design, and up until as recently as a year ago, that was my recommendation. If you’ve got serious, large-volume data that you need to protect, it needs to be a governed dataset that is managed seriously within the organization. That’s typically going to be owned by IT; they’re going to design it, deploy it and manage it, and then users can connect to it – or IT will build reports that users can use. That’s the way things have been, but things are changing.

Come Together
Tool sets are coming together, and of course “Come Together” is a reference to a song by the Fab Four, the Beatles.
What’s coming together? The features of Visual Studio SQL Server Data Tools (SSDT) for Analysis Services tabular and Power BI Desktop are coming together. It’s a bit of an evolution: we’re seeing features added to each of the tools, but they’re definitely converging. We would typically use the SQL Server Data Tools add-in for Visual Studio to create tabular data models like this, but now we have these same capabilities – and even more – in Power BI Desktop. Power BI Desktop is really becoming the preferred tool.

That Was Then But This Is Now
You’d have to have grown up in the early 80s to get this song title. This was a song, and the name of an album, from a pop/dance group in the 80s called “ABC”. They did some great songs like Poison Arrow and When Smokey Sings. Things are changing, and we’re now at a place where Power BI is moving at a very rapid pace. The development of that product is very quick, and the product team will focus on adding features to the product that they own and manage; that is Power BI Desktop. In order for this to be a cohesive development experience, and for Power BI Desktop to be a tool that is aligned with the Power BI service, Power BI Desktop is really becoming the recommended enterprise BI development tool. That’s not to say that there aren’t features and capabilities of SSDT in Visual Studio that may be preferable in some cases, but today Power BI Desktop is the tool that I typically go to first before falling back on Visual Studio. So, think about that experience: we author in Desktop. We manage all of our development in Desktop – our queries, data sources, security, the data model, calculations and the report visuals. All of that is done in Desktop. We deploy those assets up to the cloud, into workspaces in our Power BI tenant.

Peeking behind that cloud out there… that’s not the sun that you see; that’s actually the Analysis Services tabular engine. That’s right: Power BI in the service, up in Azure, actually has an instance of Analysis Services tabular that manages our data models. This is the same analytic database engine that’s available when you install Analysis Services on your own server with the SQL Server install bits. The enterprise features of SSAS just have to be lit up through the Power BI tools that we’re using.

What enterprise features are we talking about? Well, we’re talking about DirectQuery, a capability that allows you to not import data into your data model but instead store metadata about tables, so that when users interact with reports, queries are generated in the native query language of the data source – whether that’s SQL Server, Teradata, Oracle or whatever – and those queries are processed by that data platform and results are sent back to your Power BI reports. There are trade-offs, but it is still viable and comes with some important features.

Composite data models give us the ability to combine the capabilities of DirectQuery with the capabilities of imported/cached tables. There are performance trade-offs with DirectQuery, and working with data at scale – very large volumes of data – impacts performance drastically in DirectQuery mode, but it can also impact performance with imported/cached tables as well.

Aggregations give us a mechanism for speeding up those report queries by storing aggregate values at different levels of grain. I’ll show you what that looks like. Incremental data refresh policies allow us to automatically generate partitions over our tables for a couple of reasons: first, so we don’t have to run huge queries to reload high volumes of data, and second, so that we can break down the data stored for those tables and avoid reloading unchanged historical data. We can refresh only new data, and perhaps more recent data, for our large tables.

The XMLA endpoint… as of this recording, this is still a forthcoming feature. We know that the product team is actively working on the XMLA endpoint for Power BI, which will give us the same capabilities we have for Analysis Services today, so that third parties can develop tools, we can write PowerShell scripts, we can do automation, and we can generate scripts. We can do all of those things that we have access to with Analysis Services – against Power BI – that we haven’t had access to in the past. This capability will really open the gate for serious automation, DevOps, build management and version control. Those things are going to become much easier when we have this programmatic and metadata-layer access to data models.

If you’ve attended a Microsoft event or a conference where the Microsoft product team folks have presented about Power BI recently, you’ve probably seen this demonstration.
How is it possible that in a Power BI report we can surface six trillion rows of data?

Most of us don’t work with datasets this large but this is an excellent example of how these new features can enable such capabilities and we’re going to talk about how that works. Before we talk about enterprise scale, let’s talk about the limitations of different levels of Power BI and Analysis Services.

First of all, it’s important to understand that when working with Power BI data models and Analysis Services tabular data models (which technically are the same thing), data is always compressed. The compression ratio (the degree to which the size of your data model will be reduced) depends greatly on the characteristics of your data. Correctly modeled data will allow you to achieve compression ratios of typically up to about 20x. We generally expect to see 5x, 10x and 20x compression, but I’ve seen tables that compress much more – say 50x or even 100x – though that’s not very usual. This depends on characteristics like the sparsity of the data, duplicate column values, the data types and the granularity of key values. The more sparse the data, the more duplication or empty values there are in columns, and the less granular specific values are, the more that data is going to compress.
Model Size Limitations

Thinking about compressed data sizes, a standard shared capacity Power BI tenant will allow you to upload up to a one gigabyte PBIX file (that’s again, compressed data). We want to reduce the size of these files as much as possible and especially reduce the size of the file that you actually upload. If we can perform a data refresh that will then expand the size of the data, making all necessary data available after it has been refreshed, that’s better than having to refresh your Desktop file before you upload it.

What about if we are dealing with more data than one gigabyte?
In Power BI Premium dedicated capacity, a premium tenant will allow you to upload and store datasets of three to twelve gigabytes, depending on the level of the premium capacity subscription that you pay for each month. The largest capacity will allow you to upload a file that is up to ten gigabytes in size, and when you refresh it, it can expand to up to twelve gigabytes. That’s actually a lot of data for most solutions. If you need to exceed that, though, you can use Azure Analysis Services in the cloud (so no hardware to buy and nothing on-premises). It is in Azure, which means that Azure Analysis Services is a neighbor to Power BI – possibly in the same data center – so connections between that data are very fast. Depending on the service level, you can store up to 400 gigabytes of compressed data in Azure Analysis Services.

On premises, if you have Analysis Services installed on your own server: if you use SQL Server standard licensing, you are limited to 16 gigabytes per database.

If you use enterprise licensing, there is no stated limit. The product itself can deal with any size dataset; you’re limited only by your hardware. Think about modern server hardware, where we can have very large shared storage. Of course, memory is a limitation for the tabular engine, but modern, well-equipped servers can have 1 to 4 terabytes (or more) of RAM. That’s a lot of memory that can manage a lot of data… so really, there is virtually no limit to the size of the dataset that you can manage in SQL Server Analysis Services with enterprise licensing.
Let’s talk about how to optimize that experience. Another song reference…

Blurred Lines
Of course, that refers to the dance song from Robin Thicke and Pharrell Williams, who were famously sued by the Marvin Gaye family for copyright infringement.
What do I mean by “blurred lines”? Well, the distinction between Analysis Services and Power BI is becoming blurred because Power BI uses the Analysis Services tabular modeling engine. In the past, Power BI had a subset of those features and now it has a superset of them, but Analysis Services also has features that Power BI doesn’t have. The two really are converging, feature by feature, as capabilities are added to each of those services and tools. There is definitely a shift from Analysis Services to the Power BI toolset and the Power BI service as the preferred tool for developing solutions.

I want to cite a reference solution that I recently finished up for a client. I’m not going to show any sensitive data here but I did want to show the row counts of various tables. These table sizes have actually increased since I captured this screen but you can see that we have some relatively large tables here… just shy of five million appointments; just over six million charge capture records (which is an extension of patient encounters), about 20 million test results. The largest table in this model today has about 30 million records. The “person-measures” table in the bottom row had 16 million at the time that I captured this but that’s up to about 30 million now.
In this reference solution, how did I design for scale? There were several high-volume tables that I filtered using parameters. That was to reduce the working dataset size and to reduce the size of the PBIX file. This was, 1) to speed up development and, 2) so that it didn’t take forever every time I deployed a new version of that dataset file.

I’ll point out that in new Power BI projects I separate the dataset and report files. I create two different PBIX files. There are some minor challenges in doing that so actually I start with one PBIX where we manage the queries, the data model, measures (calculations) and the reports. But then when we get ready to transition to a true version management & ALM model, I create a separate copy of the PBIX file where the reports are connected to a separate, published dataset. That copy of the PBIX file only has the reports in it. You can actually copy and paste visuals from one Desktop file to another now. In the end we only have the queries and data modeling calculations in one file, and the reports in another file. That makes it a whole lot easier for users to do their own reporting as well. The dataset PBIX file can be large and I use parameters to reduce the size of that file. Then we can fill it up with data when we do a full data refresh, once it’s been deployed to the service.

How are parameters used to manage the dataset size? Well, we define two parameters. If you look at the documentation for the new incremental data refresh policy feature, you define two parameters called RangeStart and RangeEnd. They have to have these case-sensitive names and they have to be date/time datatypes. Even if you’re working with date-only data, they must be date/time types in order for this feature to work correctly. You then give them default values. Here, I used a range of one year: January 1, 2018 through December 31, 2018. Then, I imported the tables from views.
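For reference, this is roughly what those two parameter queries look like in the Power Query Advanced Editor – a minimal sketch, since in practice you create them through the Manage Parameters dialog rather than typing the M yourself:

```
// RangeStart parameter query (the name and the DateTime type are required by incremental refresh)
#datetime(2018, 1, 1, 0, 0, 0)
    meta [IsParameterQuery = true, Type = "DateTime", IsParameterQueryRequired = true]

// RangeEnd parameter query
#datetime(2018, 12, 31, 0, 0, 0)
    meta [IsParameterQuery = true, Type = "DateTime", IsParameterQueryRequired = true]
```

The default values only matter at design time; when the incremental refresh policy runs in the service, the service substitutes its own partition boundary values for these parameters.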

This is key: database professionals tend to try to solve problems in SQL, which is fine because SQL is an effective tool for joining tables, creating aliases, and doing groupings and aggregations. All of those things can be done in SQL. However, for this and many other features to work correctly, Power BI prefers – and in some cases only works with – single database objects that it perceives to be tables. That could be a table, a view, or a table-valued user-defined function. As you can see here, once I’ve connected to the data source, I have this list of views (I actually wrote a view for every single table that I wanted to import) and then I just select those views rather than using inline SQL. You need to select database objects from the source and not write any inline SQL queries. Again, I created views to consolidate database queries on the server, and we actually created a separate database just to contain the views so we didn’t have to put those views into a vendor database. Next, we create a row filter step. That’s very simple: you go to the column with a date/time datatype (this is the partition key that we want to filter on to create partitions) and create a date filter on that column.
The row filter must use this logic on the column: is after or equal to RangeStart, and is before RangeEnd. Again, the column has to be date/time, just as the parameters are date/time – even if you’re just dealing with dates, it has to be date/time. You want to perform that filter as early in the query as possible (by “query”, I’m referring to the Power Query query in Power BI Desktop), and the reason is that we want the filter to get folded into a native query. This only works with data sources that support query folding, and there are many data connectors that do. Query folding means that the “M” language query in Power Query gets translated into the native query language of the data source. In this case (since we’re using SQL Server), the native query language is T-SQL. If it were Oracle, it would be PL/SQL, and if it were Teradata, it would use the Teradata SQL dialect, etc. Here, you can see the predicates at the bottom of the native query that I’m viewing. It says WHERE the encounter timestamp is greater than or equal to… and less than… the parameter values. Power Query generates the folded native query with this filter, and if you have later steps in your query that break query folding with complex transformations, things still work as they should. That’s OK because the query has already been folded at this stage, and the results are cached in memory for later steps to be applied. Power Query works against that cached result set produced by this folded, parameterised filter query.
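Putting the pieces together, the fact table query ends up looking something like this minimal sketch. The server, database, view and column names here are placeholders, not the actual ones from the project:

```
// Hypothetical fact table query with a filter that folds back to SQL Server as a WHERE clause.
let
    Source = Sql.Database("sqlserver01", "ClinicalReporting"),
    Encounters = Source{[Schema = "rpt", Item = "vw_PatientEncounter"]}[Data],  // select the view, no inline SQL
    // Filter as early as possible so it folds: >= RangeStart and < RangeEnd
    Filtered = Table.SelectRows(
        Encounters,
        each [EncounterTimestamp] >= RangeStart and [EncounterTimestamp] < RangeEnd)
in
    Filtered
```

You can confirm the fold by right-clicking the filter step and choosing View Native Query; the generated T-SQL should contain the date predicates described above.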
Within the data model, we define an incremental refresh policy. That’s done in the data model designer, not in Power Query. The incremental refresh option will only light up when you have this filter in place referencing this pair of parameters. You define the period number and period type; in this case, I’m saying five years. If I were to say sixty months, it would generate sixty partitions instead of five. I actually had a table in this project where the SQL Server query plan was so poor (because so many tables were being joined together and there were some indexing issues in SQL Server) that the queries were very slow and the one-year partitions would time out. By using months instead of years to partition, the Power BI service generated a series of smaller queries that didn’t time out, and that solved the data refresh problem. You can optionally detect changes using the check box at the bottom of this dialog. If you select that option, it prompts you to select a column in the table with the last update date. This would typically be a timestamp-type column that a database designer would use for ETL maintenance. This will detect changes and refresh them automatically.

To Cache Or Not To Cache
Even though the spelling is different, this is a reference to the late great Johnny Cash, just for fun.
Imported, in-memory data tables provide high performance. Okay, so this this is an important learning about using the Vertipaq Analysis Services engine: In its most native mode, where you import data into the model that is held in memory, you can expect high performance but the trade-off is that you have to have a place in memory to put all of that data. DirectQuery tables, by contrast, offer scalability so if you have you know massively large data at the data source; it may not be feasible to import all of that data. There’s a lot of goodness that you get if it is possible to import all the data into an in-memory data model. It makes things simple, development is faster, performance is going to be good and your projects are uncomplicated. All good news, but in cases where there is a bonafide reason to use DirectQuery to deal with data at scale, perhaps using unique data sources that deal efficiently with large volume data. There’s a lot of goodness in this option as well but there’s such a trade-off between those two different options.
Now, we have composite data models. These are the best of both worlds with what we call “dual mode” tables and optionally predefined aggregations.
So, what does that look like? In the past, the only option was to use one or the other – either DirectQuery or import. You could not combine tables in both modes in a single data model. Today we have the ability to create composite models where tables can be either DirectQuery or imported. It’s important to note that you can’t take an import table and convert it to a DirectQuery table, but you can take a DirectQuery table and convert it to an import table. Composite models provide the performance of cached, in-memory tables but also provide flexibility, supporting high-volume and near real-time access to data sources. DirectQuery has some performance trade-offs and some calculation restrictions; there are certain DAX functions that either are not supported and won’t work at all, or will just perform very poorly because of the way they have to handle a lot of data at aggregate. There are reasons that DirectQuery has been kind of a difficult pill to swallow in the past. Aggregations allow us to mitigate some of these performance concerns. The idea behind aggregations is that we define a virtual table in the data model that is the aggregate of values grouped by a set of keys. As you can see, sitting between the Date and Geography tables we have this aggregate table that we’re calling Sales Aggs, and if the grain of the Date table is one row per day, and the grain of the Geography table is City, then we’re going to have one row in the Sales Aggregate table for each combination of Date and City, with grouped and pre-aggregated values. When a user navigates to a visual or uses a report that is grouped on Date and City, they’re going to get that pre-aggregated value rather than having to go back to a fact table and perform the aggregation on the fly. If that fact table were DirectQuery, it won’t have to go out and run a GROUP BY SQL query against the data source. So, this resolves performance issues by storing the cached aggregate results, and the aggregate tables are already rolled up at a certain level of grain. You can have multiple aggregate tables. This all requires some planning and advanced design skills. Typically, you’re going to use aggregations to solve performance problems. Perhaps, after we have years of experience with this feature, we’ll be able to look at a model design and say “OK, we ought to put an aggregate table there”, but today, we see that there’s a performance issue and say “let’s add some aggregations to take care of that performance problem.” It’s a brand new feature and it does work well; we just don’t have much experience with it yet.
As you can see in this image, we specify the columns for a table that we want to group on. This is much like a lot of user experience interfaces that we already have in Power Query and in query building tools. We just say “I want to group by the Order Date, the Customer and the Product Subcategory and then I want to aggregate the Sales Amount and the Unit Price using sums.”
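The dialog described above only maps the aggregation columns in the model. If you also need to materialize the aggregate table itself from an existing query (rather than importing it from a pre-built view at the source), one option is a referenced Power Query query that groups the fact rows. This is a hedged sketch, not the exact design from the slide; the query and column names are assumptions:

```
// Illustrative sketch: build a Sales Aggs table by grouping the fact query.
// "Sales", "OrderDate", "CustomerKey", "ProductSubcategory", "SalesAmount" and "UnitPrice"
// are assumed names; substitute the real ones from your model.
let
    Source = Sales,
    SalesAggs = Table.Group(
        Source,
        {"OrderDate", "CustomerKey", "ProductSubcategory"},
        {
            {"Sales Amount", each List.Sum([SalesAmount]), type number},
            {"Unit Price",   each List.Sum([UnitPrice]),   type number}
        })
in
    SalesAggs
```

However the aggregate table is produced, the mapping itself is configured in the model with the Manage aggregations dialog, not in Power Query.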
You can create multiple aggregate tables to support different levels of detail or grain. Since the aggregate values are stored in the model, this does increase the model size, but if designed well, the storage cost shouldn’t be significant. The purpose is to improve performance and mitigate the performance handicap of DirectQuery tables. You’re not limited to DirectQuery, but that’s a very common use case.
Something to keep in mind is that these aggregates are stored in the model and, of course, DirectQuery results are not, so there is an opportunity for the DirectQuery tables and the aggregate tables to become out of sync. Refresh often and manage expectations around that. Just know that this is a trade-off of using aggregations with DirectQuery.

Show Me The Way
What’s the song reference? I’m going to show you a picture of the group, but I’m not going to tell you who they are yet. What are we talking about? Well, these are our demos, so I’m going to demonstrate a couple of things. I’m going to show you query parameterization, which is not limited to premium capacity tenants. Then, I’m going to show you incremental refresh policies, which are kind of an extension of query parameterization. Incremental refresh policies are a premium feature.
Show Me The Way was a great song in the 80s from the group Styx. Peter Frampton also had a hit with a song by the same name in the mid 70s. Alright, let’s take a look at our demos. This is a Power BI project that I’ve created with data from an on-premises SQL Server database… the Contoso DW database, with retail sales information here in Power Query…
(demos)
Thanks for watching. Contact information is on the last slide.

It’s about solving business problems & meeting people… Allen White interview

I’ve been remiss about blogging for a little while and have some great stuff to share, starting with this interview with Allen White at the PASS Summit conference in November. Allen is a 15-year PASS veteran and one of the most consistent SQL Server experts in the industry. He’s a powerhouse of knowledge, experience and wisdom. He’s a long-time SQL Server and Microsoft Data Platform MVP and owner of DataPerf Professionals Consulting, mentor, coach, trainer, co-founder and leader of the Ohio North SQL Server user group. He’s served on the PASS Board of Directors as the Program Director.

Allen shared some pearls of wisdom in this sit-down interview during PASS Summit 2018 in Seattle. He knows a lot about SQL Server tools and his first bit of advice for industry newcomers… “it’s not about tools.”

P: Allen, as an experienced leader in this industry, what advice would you give to someone breaking into the industry or really thinking about planning their career around the Microsoft data platform?

A: What people just starting out in the industry tend to do is focus on the tools and the technologies – that’s the exciting thing. The problem is, the focus really needs to be on the business problems that need to be solved. The tools are there to solve business problems, not in and of themselves. So, focus on what it is about the business that makes the tools useful to solve those business problems.

P: OK, and… so for a student, or someone who has not yet been immersed in facing business users and business leaders in their career, where does a person go to get that experience? What do they look for?

A: It’s a function of studying what it is about a particular line of business that… poses the challenge. So, a transportation industry example… You’ve got trucks and they’ve got to get from point A to point B, and you’ve got to measure that. What tools out there can measure that? Do they have built-in GPS? Are they able to send those GPS signals to some centralized device that can report on and track the progress of those vehicles, etc.? …those kinds of things. Think about the business and the information needs that will help that business succeed.

P: Okay, good. What are you looking forward to most about PASS Summit this week?

A: Meeting the people. People make all the difference. My first Summit was 15 years ago and I’ve been to every one since and it’s always about the people.

Learn more about Allen and the PASS organization here.

Interviews from PASS Summit 2018: Matthew Roche, Matt Masson, Adam Saxton & Patrick LeBlanc

I had a chance to catch up with two of the three Matts from the Power BI product team at PASS Summit 2018.  Speaking with Matthew Roche and Matt Masson about the new Power BI Dataflows announcement, they told me that Power Query is becoming a truly enterprise-class data transformation technology. Matt Roche explained that not only is Dataflows a cloud-based implementation of Power Query with a browser-based query designer but it allows you to enforce business and data quality rules using strongly-typed entities in the Common Data Service.

Power BI Dataflows & Self-service Data Warehousing

Matt Masson told me that the expansion of Power Query into multiple products and services demonstrates how truly powerful the Power Query platform has become. He said that behind the simple-to-use user interface is an extremely flexible data transformation language, and underneath the M language is a great, flexible dataflow engine.  He’s currently working on the Intellisense features of the Power Query M language advanced editor. Matt Masson also said that he is expecting more third-party vendors to create custom data connectors for Dataflows and the Common Data Service.

I also caught up with Adam Saxton and Patrick LeBlanc from the Power BI Customer Advisory Team (CAT).  You may recognize Adam and Patrick from the “Guy In A Cube” video series.  Patrick described Power BI Dataflows as “self-service data preparation for the masses”. Adam added that Dataflows takes Power Query to the next level by writing data to tables in Azure Data Lake Storage rather than directly to a VertiPaq data model.  This capability enables the Power BI platform to create a truly enterprise-scale data warehouse and Business Intelligence solution.

I asked Adam what skills are most important to someone growing a career in IT-based Business Intelligence.  He said that the most important thing is understanding proper data modelling in data warehouse design.  Power BI – by itself – is a great platform downstream from data sources or an existing data warehouse.  Dataflows and the CDS fill an important gap in self-service BI reporting & data analysis.  These technologies provide a common platform for data preparation, data warehouse design and standardized data entity management for self-service and BI reporting.  Dataflows shows a lot of promise for serious, next-generation data warehouse solutions.  Stay tuned for more on this as I explore and prepare demos with Dataflows, Azure Data Lake Gen2 Storage and the Common Data Service.

Data Model Options for Power BI Solutions

At the heart of every Business Intelligence reporting solution is a data model to optimize queries and enable ad hoc report interactions.  Data modelling technology has evolved quite a lot over the past 20 years or so.  You have several modelling technology choices, with options depending on the simplicity or formality of your project and factors like data size and security.  In the past, choices were simpler.  Power BI was the choice for smallish, “good enough” projects; and when data quality, high volume and exactness were the priority, Analysis Services (SSAS) was the better choice.  Now, using Power BI for modelling larger data sets is even advantageous by comparison, with new features like hybrid models, aggregations and incremental data refresh.  Just in the past year or so, enhancements have been added – or are in the process of being added – to the Power BI platform, such that the list of options and considerations continues to grow.  Here’s a quick (and I mean quick) summarized list of the modelling options for Power BI solutions:

  • Power BI Desktop model published to the Power BI service – limited to 1 GB model size, no partitioning.  Pay per user with Power BI Pro licenses.
  • Power BI Desktop model published to a Premium Capacity Power BI Service – 12 GB size limit, dataflows & numerous “grown-up” features.  Unlimited read-only domain users.
  • Power BI Desktop model published to on-prem Power BI Report Server – limited to 2 GB model size, no partitioning.  Prior version & feature restrictions.
  • SSAS on-prem or AAS with reports published to PBIRS – no model size limit.  Prior version & feature restrictions.
  • SSAS tabular or multidimensional on prem connected to Power BI report in service via the data gateway.  No size limits.  License server cores for SSAS.  License Power BI using Pro or Premium.
  • Azure Analysis Services = SSAS tabular.  No size limit, pay as you go and scale as needed.
  • Power BI reports in the service or PBIRS using DirectQuery to connect to data source with no cached model.  Report interaction queries may be slow, with some calculation limits.  *Aggregations & hybrid model design can optimize & reduce query latency, with moderately complex model development overhead (Premium features).

There are numerous advantages and disadvantages with each option, so the best choice isn’t always clear, but we can certainly compare the pros and cons.  I’ve been on a quest to find the best medium to break these choices down into simple terms, but it truly is a journey and not a destination.  Along this journey, earlier this year, I presented a session called “The Nine Realms of Power BI” where I enumerated different categories of solution architectures for Power BI solutions; and they are numerous.

One important consideration is Microsoft’s commitment to support a product with new features in the future.  They have made it very clear that the Power BI platform is their primary focus and that they will continue to invest in enterprise-scale capabilities in the cloud service.  Never plan on a crucial feature being added later to a product but give serious consideration to  where a product is headed.

Making a side-by-side comparison of features between products and options is a little like comparing oranges, apples, grapes and bananas.  As a best effort, I started with the feature lists for SQL Server Analysis Services and added the Power BI variations.  Admittedly, this type of comparison doesn’t fit perfectly into this type of pivot format but I think it serves the purpose.  This post will likely evolve a bit with time.  Your feedback and input are welcome.

Some additional resources from trusted colleagues: Matthew Roche from Microsoft recently posted this article.  He refers back to this blog post and mentions an excellent session at the recent PASS Summit from Gabi Münster and Oliver Engels.  On Matt’s post, he links to the session slide deck that Gabi and Oliver shared.  An official whitepaper written by Chris Webb and Melissa Coates covers Power BI enterprise deployment options in 105 pages.  Keep in mind that many new features have been added since this paper was written, but the whitepaper will likely be updated in the future.  Matt also references another recent post on this topic by James Fancke where he contrasts features and costs.  This is, no doubt, a timely and important topic.

Guy-In-A-Cube’s Adam Saxton and Patrick LeBlanc have posted numerous short videos highlighting specific features of Power BI with Analysis Services and here are a few to get you started.  

Side-by-Side Feature Comparison

In the following table I compare capabilities, starting with SSAS Tabular and SSAS Multidimensional, with Power BI Pro and Premium capacity.  My blog theme doesn’t currently support wide tables like this (apologies for the overlapping text), so you can open or download the Excel file here.  I hope this is helpful.  I’ll work on updates to optimize this comparison chart as features are added and time allows.

| Feature | SSAS Enterprise/Developer | SSAS Standard | Azure Analysis Services | Power BI Standard | PBI Report Server | Power BI Premium | Comments |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Max model size (compressed) | No limit* | 16 GB | No limit* | 1 GB | 2 GB | 10-12 GB* | *Premium supports 10 GB upload & 12 GB refresh |
| Hybrid models (DirectQuery & cached mode) | No | No | No | No | Yes | Yes | |
| M/Power Query sources & transformations | Yes | Yes | Yes | Yes | Yes | Yes* | *Query extensions in the service using dataflows |
| Integrated application lifecycle management (ALM) tools | Yes | Yes | Yes | No | No | No | |
| Integrated version control tools | Yes | Yes | Yes | No | No | No | |
| Tabular Models | | | | | | | |
| Programmability (AMO, ADOMD.Net, OLEDB, XML/A, ASSL, TMSL) | Yes | Yes | Yes* | Yes* | Yes* | Yes* | *Third-party tool support, new XMLA endpoint for Power BI |
| Hierarchies | Yes | Yes | Yes** | Yes* | Yes* | Yes* | *Simple hierarchies, **AS supports HideMemberIfBlank |
| KPIs | Yes | Yes | Yes | No | No | No | |
| Perspectives | Yes | | Yes | No | No | No | |
| Translations | Yes | Yes | Yes | No | No | No | |
| DAX calculations, DAX queries, MDX queries | Yes | Yes | Yes | Yes | Yes | Yes | |
| Row-level security | Yes | Yes | Yes | Yes | Yes | Yes | |
| Multiple partitions | Yes | | Yes | No | No | Yes* | *Incremental refresh builds partitions |
| In-memory storage mode | Yes | Yes | Yes | Yes | Yes | Yes | |
| DirectQuery storage mode | Yes | | Yes | Yes | Yes | Yes | |
| Multidimensional Models | | | | | | | |
| Semi-additive measures | Yes | No 1 | Yes* | Yes* | Yes* | Yes* | *Using DAX code, Effort: moderate |
| Hierarchies | Yes | Yes | Yes** | Yes* | Yes* | Yes* | *Simple hierarchies, **AS supports HideMemberIfBlank |
| KPIs | Yes | Yes | Yes | No | No | No | |
| Perspectives | Yes | | Yes | No | No | No | |
| Actions | Yes | Yes | *Using 3rd party tool | *Possible, limited | *Possible, limited | *Possible, limited | |
| Account intelligence | Yes | Yes | Yes* | Yes* | Yes* | Yes* | *Using DAX code, Effort: high |
| Time intelligence | Yes | Yes | Yes | Yes | Yes | Yes | |
| Custom rollups | Yes | Yes | Yes* | Yes* | Yes* | Yes* | *Using DAX code, Effort: moderate |
| Writeback cube | Yes | Yes | No | No | No | No | |
| Writeback dimensions | Yes | | No | No | No | No | |
| Writeback cells | Yes | Yes | No | No | No | No | |
| Drillthrough | Yes | Yes | Yes* | Yes* | Yes* | Yes* | *Multiple techniques |
| Advanced hierarchy types (parent-child and ragged hierarchies) | Yes | Yes | Yes* | No | No | No | *Supports HideMemberIfBlank |
| Advanced dimensions (reference dimensions, many-to-many dimensions) | Yes | Yes | Yes | Yes | Yes | Yes | |
| Linked measures and dimensions | Yes | Yes 2 | No* | No* | No* | No* | *Equivalent functionality |
| Translations | Yes | Yes | No | No | No | No | |
| Aggregations | Yes | Yes | Yes | Yes | Yes | Yes | |
| Multiple partitions | Yes | Yes, up to 3 | Yes | No | No | Yes* | *Incremental refresh builds partitions |
| Proactive caching | Yes | | * | * | * | * | *In-memory model is always cached |
| Custom assemblies (stored procedures) | Yes | Yes | No | No | No | No | |
| MDX queries and scripts | Yes | Yes | * | * | * | * | *Supports MDX queries & DAX scripts |
| DAX queries | Yes | Yes | Yes | Yes | Yes | Yes | |
| Role-based security model | Yes | Yes | Yes | Yes | Yes | Yes | |
| Dimension and cell-level security | Yes | Yes | No* | No* | No* | No* | *Equivalent functionality for measures |
| Scalable string storage | Yes | Yes | Yes* | Yes* | Yes* | Yes* | *Equivalent functionality |
| MOLAP, ROLAP, and HOLAP storage models | Yes | Yes | Yes* | Yes* | *Cached or SSAS direct | Yes* | *DirectQuery & hybrid models are equivalent or superior |
| Binary and compressed XML transport | Yes | Yes | Yes* | Yes* | Yes* | Yes* | *VertiPaq in-memory compression on all data types |
| Push-mode processing | Yes | | Yes | Yes* | Yes* | Yes* | *Multiple & different processing techniques supported |
| Direct writeback | Yes | | No | No | No | No | |
| Measure expressions | Yes | | Yes | Yes | Yes | Yes | |

Interviews with Microsoft Data Community Speakers and Leaders

What is the key to advancing your career in the Microsoft data platform?  Here is some advice from some of the most successful people in the industry…

Every year we have some big community events that bring together community leaders from all over – international user group and community leaders who write books and speak at conferences. We had our local Oregon SQL Saturday, the “SQL Train” (a chartered train – some coaches that one of our speakers chartered to bring all of our speakers and attendees up to the PASS Summit after Oregon SQL Saturday), and then the big PASS Summit conference up in Seattle. I had a chance to sit down with a number of our speakers, community leaders and attendees and just ask questions about what brought them there, and the advice that they would give people in the industry about how to get the most value out of that experience… and this is what they said:

Well-run monthly user group meetings and big annual local events like SQL Saturday don’t just happen by themselves. It takes a lot of planning, a lot of volunteers and a lot of coordination to make these events successful. Part of that effort is the annual leadership planning meetings that we hold during the week of PASS Summit. Here are some short clips from those meetings, where several hundred local leaders from all over the world got together to share ideas, exchange notes and coordinate to make these events successful. Leaders cross-pollinate, exchange ideas and work together to make this a great community. Why? …because somebody did that for us when we were getting started, and we want to give back to the community. So, get involved; join the leadership committees at your local user groups, volunteer at SQL Saturday. Volunteer to do a short speaking engagement. Just get up and talk to some of your peers. Get started by volunteering in the community so that you can be part of the ongoing great community we have around the Microsoft data platform.

How to Assign Pro Licenses to a Power BI Tenant

This is a question that comes up all the time.  Power BI licensing is not complicated but a common challenge is that the person who sets up a new Power BI subscription and tenant within an organization is often not the same person who manages Office 365 or Azure service licensing for the organization.  I’ve consulted on projects for several organizations where folks just didn’t know who to talk to or how to proceed after testing the water with Power BI.  After setting up a new subscription, IT professionals and business data analysts often don’t know how to license Power BI for company use and share reports and datasets with others.

This post will show you how licenses are assigned to users and, more importantly, what to request from your support desk or administrators who may be unfamiliar with Power BI and Office 365 user licensing.  Keep reading for background information about why this is important and necessary.

You can use Power BI in one of three modes:

1) If you’re a one-person organization or don’t need to securely share content online with anyone, you can just use Power BI for free.  Yep, free.  No feature limits.

2) If your organization has a few hundred users or less, you will need to buy a Power BI Pro license for every user to author or view reports.  Unless you publish report content publicly to the web, every user must have a Pro license – period.

3) If you have several hundred users or you have so much data that you need to manage it with dedicated capacity, it may be cost-effective to purchase a Premium capacity tenant for a few thousand dollars a month.  Pro licenses are still required to publish reports, but anyone in the organization can view published reports online or on an on-premises Power BI Report Server.

Power BI Subscription Basics

Let’s say that I work for XYZ company and my work address is Paul@xyz.com.  Assuming that a Power BI subscription doesn’t yet exist, if I go to PowerBI.com and set up an account using my email address, I have created a Power BI subscription for my company that is a tenant within the Power BI service.  I could be the janitor for a multinational corporation, but I am now the administrator of the tenant.

By way of definitions; the Power BI Service is the entire Power BI offering within the Microsoft Azure cloud.  At any time, it could encompass hundreds of virtual machines geolocated in data centers throughout the world.  When you subscribe to the service, you are – in effect – renting some space within the service.  The “space” that you rent and manage for your subscription is called a tenant.  It’s sort of like renting some office space or an apartment in a large building.  You don’t really own it but you are paying for the right to live there.  You can read about these and other administrative concepts here.

After setting up a new Power BI subscription, you really have one of two options:

1) If you have the authority to purchase licensing and manage services on behalf of your organization, proceed to purchase and assign licenses for report developers and users.

2) Make a service request or contact the appropriate administrator within your organization to add and assign licenses.  This might be your help desk, systems or operations admin or Office/cloud services administrator.

The Admin Take-Over

After a “less privileged” user sets up the first Power BI subscription for the organization, no one else can do the same.  This can be a little confusing if someone in Finance sets up a trial account and then a BI developer tries to do the same thing.  If the organization plans to move forward with a governed Power BI tenant, they can perform an “Admin Take-Over”.  Contrary to what the name suggests, this is not the same as a government coup with troopers storming the building in Kevlar vests and automatic weapons.  It simply means that an administrator assumes control of the new tenant and transfers admin rights from the person who established the subscription to the appropriate person(s).  Adam Saxton describes how this works in this Guy-In-A-Cube installment.