Doing Power BI the Right Way: 4. Power Query in Dataflows or Power BI Desktop

Part of the series: Doing Power BI the Right Way (link)

Power BI offers more options and choices for architecting and delivering a solution than any other Microsoft product in my recollection. Without compromise, Power BI can effectively be used for anything from creating a simple chart from an Excel spreadsheet to delivering enterprise reporting and analytics on a Fortune 100 company’s massive data warehouse. At the end of this post, I will share a comprehensive list of resources and insights from Matthew Roche, a Program Manager on the Power BI Customer Advisory Team (CAT). To tease that series, I’ll start with this quote from Matthew’s blog:

Succeeding with a tool like Power BI is easy – self-service BI tools let more users do more things with data more easily, and can help reduce the reporting burden on IT teams.

Succeeding at scale with a tool like Power BI is not easy. It’s very difficult, not because of the technology, but because of the context in which the technology is used. Organizations adopt self-service BI tools because their existing approaches to working with data are no longer successful – and because the cost and pain of change has become outweighed by the cost and pain of maintaining course.

Matthew Roche, Building a data culture – BI Polar (ssbipolar.com)

When should you use dataflows versus regular Power Query? I didn’t jump on the dataflows bandwagon and struggled to adopt them at first. Frankly, Power Query is easier to use. The browser-based dataflows designer is quite impressive, but it is not as responsive and convenient as the desktop app, so this is a bit of a trade-off. The power and value of dataflows become apparent when the business reaches a certain stage of data culture maturity.

Before we can address the question of whether to use Power BI dataflows, conventional Power BI queries, or any other approach to source and transform data, we need to briefly review the different options for orchestrating a Business Intelligence solution in the Microsoft cloud ecosystem.

Solution Architecture Choices

On a scale of one to ten, with ten being the most formalized, strictly governed and complex corporate reporting platform, the self-service Power BI option might range from one to four.

For the self-service data analyst, working entirely in Power BI Desktop, data can be imported and transformed using Power Query. Tables are modeled, calculations are defined and data is visualized. This mode is simple and works well for small to moderate-scale solutions with less emphasis on data governance and centralized control.

Even using this simple approach, data models can be developed separately from reports, certified and shared with multiple report developers and self-service report authors. So, to a point, business data can be managed and governed – but the queries in the Power BI solution read directly from source systems or files that are not curated for analytic reporting.

Data Warehouses and Data Marts

The “single version of the truth” or “golden record” repository, a data warehouse (or smaller-scale “data mart”), is the ideal solution for storing and managing reliable corporate information. The challenge is that creating a central data warehouse to manage centrally governed organizational data is costly and time-consuming; the trade-off of going without one is that self-service data models can be inaccurate and out of date. When business leaders need answers quickly, it is not always feasible to add new data sources to a data warehouse on short notice.

On the complexity scale of one to ten, versions of this option might be from seven to ten.

A conventional DW/BI solution typically uses on-prem data transformation tools like SSIS to stage and transform source data into a central data warehouse built using a relational database product like SQL Server. Although viable for on-prem systems, this old-school architecture model doesn’t embrace scalable and cost-effective cloud technologies.

The first generation of the Microsoft cloud-based modern data warehouse can utilize several different Azure services. The components in the following example map directly to the conventional data warehouse solution in the previous example. Azure Data Lake serves as the staging environment, typically using text files and structured file storage as an inexpensive landing area for ingested source data. Azure Data Factory is used to orchestrate and transform files and data streams into and out of the data lake – and the data warehouse. Depending on the need for scale and size, Azure SQL Database or Azure SQL Data Warehouse (now Azure Synapse Analytics) may be used for data warehouse storage.

If your organization has a comprehensive data warehouse to serve up all or most of the data needed for analytic reporting, this is probably the best fit for a Power BI solution in your business environment.

Constructing an enterprise data warehouse solution is not a trivial endeavor; it often involves as much effort negotiating business process challenges as developing the technology to implement the solution.

The newer generation of the Azure modern data warehouse is a best-of-breed collection of tightly-integrated cloud services called Azure Synapse Analytics. Compared to the previous set of independent Azure services, Synapse Analytics provides a unified development and management interface. Apache Spark and other industry-standard technologies designed for data science and platform-agnostic analytics provide the open-source data prep engine. Azure Synapse is also the evolution of Azure SQL Data Warehouse, Microsoft’s read-optimized, scalable, massively parallel processing (MPP) SQL-based database engine.

Power BI Dataflows

Dataflows can fill an important gap between purely self-service data prep and formal data warehouse solutions. If you don’t have a comprehensive data warehouse to meet your analytic reporting requirements but need to provide more data quality control over standardized entities, incorporating dataflows might be the ticket.

In their simplest form, dataflows provide reusable transformation logic (queries) that can be shared by multiple Power BI data models. Using dataflows deployed to a workspace can save data model developers from repeating the same transformation steps in multiple datasets. But these are more than just Power Query scripts stored in the cloud.
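To make that reuse concrete, here is a minimal sketch of what consuming a shared dataflow entity from Power BI Desktop looks like in M. The PowerPlatform.Dataflows connector function is what Get Data generates; the workspace, dataflow and entity names below are placeholders, and the navigation record fields are illustrative since the connector writes the exact path for you:

    let
        // Connect to Power BI dataflows (the connector takes no meaningful argument)
        Source = PowerPlatform.Dataflows(null),
        // Navigate to the workspace, dataflow and entity.
        // These navigation steps are illustrative placeholders; the
        // Get Data experience generates the actual path.
        Workspaces = Source{[Id = "Workspaces"]}[Data],
        SalesWorkspace = Workspaces{[workspaceName = "Sales Analytics"]}[Data],
        CommonEntities = SalesWorkspace{[dataflowName = "Common Entities"]}[Data],
        Customer = CommonEntities{[entity = "Customer"]}[Data]
    in
        Customer

Every dataset that connects this way consumes the same curated entity, so the transformation logic is maintained in exactly one place.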

Dataflows enable a long list of capabilities. They can provide integrity and standard entity definitions based on the Common Data Model and stored in Dataverse (previously known as the Common Data Service) to enforce standard naming, data types and schema compliance, among other features.

In Premium capacity, dataflow results may be persisted in Azure Data Lake Storage Gen2. This essentially allows you to use dataflows to create a moderate-scale data warehouse without a big investment. Entities may be linked to related entities, which creates virtual joins and referential constraints. Other Premium features include DirectQuery, computed entities and incremental refresh – all managed in the dataflow rather than for each dataset. Integrations with Azure AI, Machine Learning and Cognitive Services allow you to utilize AI features without writing code. For example, in a recent project, we used AutoML on a dataflow containing high school student data to predict graduation outcomes.
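As a sketch of how linked and computed entities fit together, assume a Premium workspace with a linked entity named Customer; a computed entity is simply another M query that references it by name (the entity and column names here are hypothetical):

    let
        // Reference the linked "Customer" entity; in Premium, this query
        // becomes a computed entity and its results are persisted to the
        // workspace's Azure Data Lake Storage Gen2 store on refresh.
        Source = #"Customer",
        // Keep active customers and summarize the row count by region
        Active = Table.SelectRows(Source, each [IsActive] = true),
        ByRegion = Table.Group(Active, {"Region"},
            {{"CustomerCount", each Table.RowCount(_), Int64.Type}})
    in
        ByRegion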

Dataflows Start with M Queries

Dataflows begin with an M query, just like the queries in Power BI Desktop, before the additional capabilities mentioned earlier are layered on. Queries are authored entirely in the browser, but migrating from Power Query in Power BI Desktop is fairly easy. Start with a Power BI solution (PBIX file) in Desktop and open a query in the Advanced Editor. You can create a new dataflow in the browser and then copy and paste the existing M code from Desktop into the dataflow designer. You do have to copy each query one at a time, and there are just a few compatibility differences, but for the most part it should be a one-to-one transfer.
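For example, a typical query copied from the Advanced Editor might look like the following. Assuming the source is a cloud-reachable SQL database (the server name and column names are placeholders), M like this generally pastes into the dataflow designer unchanged:

    let
        // Connect to the source database (placeholder server name)
        Source = Sql.Database("myserver.database.windows.net", "ContosoDW"),
        DimCustomer = Source{[Schema = "dbo", Item = "DimCustomer"]}[Data],
        // Keep only current customer records
        CurrentRows = Table.SelectRows(DimCustomer, each [IsCurrent] = true),
        // Apply friendly column names and explicit types
        Renamed = Table.RenameColumns(CurrentRows,
            {{"CustomerKey", "Customer Key"}, {"FullName", "Customer Name"}}),
        Typed = Table.TransformColumnTypes(Renamed, {{"Customer Key", Int64.Type}})
    in
        Typed

One difference worth noting: sources behind your firewall require an on-premises data gateway to be configured for the dataflow, just as they do for datasets.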

Building a Data Culture

Matthew Roche from the Power BI Customer Advisory Team has an excellent 17-part blog series about Building a data culture. Dataflows sit at the crossroads between business process, data stewardship and technology. The industry has been throwing technology and software at data governance and quality problems for decades, with marginal success. It is much easier for data practitioners to acknowledge that these are multi-faceted business culture challenges than it is to formulate a plan to be successful. If anyone can effectively carry and deliver this message, it is Matthew. In this video series, he offers prescriptive guidance on enlisting an executive sponsor, working with business stakeholders, and navigating the landmines of a business landscape on the way to a successful data culture transition.

Honestly, I’ve only followed this series in fragments over the past year, and now that I’ve caught the vision, I plan to watch the entire series start to finish. It is that good. Think of it as Game of Thrones with data.

Resources

Matthew also provides a comprehensive list of Power BI Dataflows resources here. Matthew recently presented to our 3Cloud Power BI and Analytics development team about using dataflows to promote a data culture. This presentation was an epiphany for me that helped me better understand how dataflows fit into the BI solution puzzle – that’s when the gauge metaphor popped into my head. I encourage you to watch, and perhaps you will have a similar moment of clarity.

The Power BI Adoption Framework is a set of presentations from Microsoft that can serve as a checklist of important tasks and areas that should be covered in any Power BI implementation, large or small. These decks are also a great tool for adopting and sharing your organization’s BI and Analytics strategy with business leaders and stakeholders. You can use them as a picklist to assemble your own presentations.

A First Look at Gen 2 Composite Models with Live Power BI Datasets

About three years ago, when the data model development engineers from the Power BI product team told me they were working on the ability for tabular data models to connect to other published data models, it sounded amazing and almost too good to be true. They said it would be a lot of work to make this a reality. The Utopian dream of distributed datasets was publicly announced on the Wave 2 Roadmap last year, and then the feature went to Preview in December with this announcement: Using DirectQuery for datasets and Azure Analysis Services (preview) – Power BI | Microsoft Docs. We’ve seen the feature demonstrated and explained several times, but now I’m exploring it for myself.

For this test, I have two different data models in separate PBIX files. One published dataset contains Internet Sales information from the ContosoDW Azure SQL database and the other contains Store Sales information. These models contain many of the same tables and a few unique tables. Notably, each data model contains a different fact table. Here are the two published data models in the Contoso Sales workspace:

After making sure that I had upgraded to the December 2020 version of Power BI Desktop, I open the Options dialog and enable the preview feature: DirectQuery for Power BI datasets and Analysis Services.

After restarting Power BI Desktop, I use Get Data and choose Power BI datasets. After connecting to the workspace and selecting the ContosoDW Internet Sales published dataset, the local model is now connected using a live connection.

After connecting to the remote model, all the tables are listed in the Fields list. At this point, I’m using a capability that has been available for a long time: one Power BI report file connected to a separate, published dataset. Previously, I would not have been able to add additional dataset connections or import from other sources into a live-connected data model.

The status bar shows a new message with a link to enable changes to the model. On the Model ribbon, a new button is also displayed. Using either of these options will modify the local model metadata to allow multiple live dataset connections.

When I click the link Make changes to this model, a dialog box allows me to confirm that I want to switch the live connection to a DirectQuery connection and prepare the local model to allow additional DirectQuery connections.

What happens if I start with an existing local model?

The first important lesson I learned was this: if I create a new PBIX report file containing existing tables and then connect to a published dataset, this does not enable the ability to make changes to the local model.

If an imported table already exists in the data model before using Get Data to select and connect to a published dataset, the status bar shows this message:

I found this to be unintuitive: the text “(click to change)” is not a clickable link. At this point, there is no way to update the local model with the necessary changes. I had to start with an empty PBIX to add the first live connection. After that, I could use Get Data to import additional tables from other sources, but not the other way around.

Adding More Dataset Connections

Next, I use Get Data to connect to the ContosoDW Store Sales published dataset. After the local model had been converted, there was nothing particularly unique about the experience. I could just connect to an additional dataset, and all of the tables simply show up in the data model diagram view. The tables from each model are displayed with a different colored heading; in this case, blue and red.

I had several duplicate table names after connecting the second data model. The second set of tables was renamed and suffixed with the number 2. I renamed the tables that I wanted to keep visible. The redundant tables that I didn’t need are displayed on the right side of this diagram layout. Tables in a remote model cannot be deleted from the local model, so I used the Properties pane to set the unused tables to be hidden.

If I were connecting to a large remote model with a lot of tables just for the sake of bringing one table into the local model, it would be cumbersome to hide all of the unneeded tables – so I hope there will be an easier way to manage this in the future.

I opened Tabular Editor from the External Tools ribbon and was able to use it just as if I were designing a normal import data model: renaming tables and setting properties with multiple objects selected.

Adding relationships between tables from each remote model was pretty much the same process as it would have been in a local model. I can use the Manage Relationships dialog or the drag-and-drop method in the diagram view. The latter method opens the Manage Relationships dialog rather than simply creating the relationship.

Finally, I added a new Combined Sales Qty measure to one of the remote tables in my local model that references an existing measure from each of the remote models. For demonstration, the table visual shown below includes the Order Date field, which is related to each of the two fact tables (Online Sales and Store Sales); the Online Sales Qty measure from the Internet Sales remote model; the Store Sales Qty measure from the Store Sales remote model; and the new Combined Sales Qty measure, which I added to the local model:
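Based on the measure names above, the local measure is probably as simple as this bit of DAX (a sketch, since the formula isn’t shown here):

    Combined Sales Qty = [Online Sales Qty] + [Store Sales Qty]

The interesting part is not the formula but where it lives: a local measure referencing measures defined in two different remote models.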

After publishing the new composite data model and report, the Lineage View shows that the Combined Sales dataset has connections to, and dependencies on, the two remote datasets, which show up as Analysis Services DirectQuery connections, apparently using the XMLA endpoint in the Power BI service.

Once I worked through the design process a couple of times, the data model behavior seems to be exactly the same as always, and designing reports is no different at all. I’m just scratching the surface and I’m anxious to dig deeper and compare notes with others in the community. I have no doubt that behaviors, limits and issues will arise with this brand-new and very compelling capability, but I’m very encouraged so far.

Power BI: The Week in Graphics

If you’re like me right now, you need a distraction. I had started on a new blog post and tweet about project management best practices, and after yesterday’s events, I just had to put it on a shelf. So, a couple of quick things…

Power BI Top Features

Melissa Coates, long-time BI community rock star, published this amazing infographic depicting all of the features and services in the Power BI ecosystem. Next time you hear someone incorrectly describe Power BI as “just a visualization tool”, show them this:

You can download a hi-res version and get more background here: Diagrams — Coates Data Strategies

BI-Survey Searchable Maps

I recently heard from Donald Farmer, one of the original architects of the Microsoft Business Intelligence platform. Donald went on to work for QlikTech and now runs an advisory practice called Treehive Strategy Services. Donald has been working with the BI-Survey, one of the leading analyst firms in Europe, to enable an interactive map of the Azure analytics ecosystem. Similar to Gartner and Forrester research, the BI-Survey compares industry platforms and products, but does so using survey data. I’ve been participating in the BI-Survey for about 12 years and find it to be a comprehensive and insightful resource for comparing BI tools, among other products. They just released these interactive guide maps for Power BI and Azure Services, Partners and Success Stories. These are just a couple of examples:

Each map has search and filtering options that allow you to visualize the heatmap based on pricing, features and a variety of categories. These are really useful tools.