Part of the the series: Doing Power BI the Right Way
Power BI, more than any other Microsoft product in my recollection, offers more options and choices to architect and deliver a solution. Without compromise, Power BI can effectively be used to do anything from create a simple chart using an Excel spreadsheet, to enterprise reporting and analytics on a Fortune 100’s massive data warehouse. At the end of this post, I will share a comprehensive list of resources and insights from Matthew Roche, a Program Manager on the Power BI Customer Advisory Team (CAT). To tease that series, I’ll start with this quote from Matthews’s blog:
Succeeding with a tool like Power BI is easy – self-service BI tools let more users do more things with data more easily, and can help reduce the reporting burden on IT teams.
Succeeding at scale with a tool like Power BI is not easy. It’s very difficult, not because of the technology, but because of the context in which the technology is used. Organizations adopt self-service BI tools because their existing approaches to working with data are no longer successful – and because the cost and pain of change has become outweighed by the cost and pain of maintaining course.Matthew Roche, Building a data culture – BI Polar (ssbipolar.com)
When should you use dataflows vs regular Power Query? I didn’t jump on the dataflows bandwagon and struggled to adopt them at first. Frankly, Power Query is easier to use. The browser-based dataflows designer is quite impressive but it is not as responsive and convenient as the desktop app, so this is a bit of a trade-off. The power and value of dataflows becomes apparent when the business reaches a certain stage of data culture maturity.
Before we can address the question of whether to use Power BI Dataflows, conventional Power BI queries, or any other approach to source and transform data; we need to briefly review different options for orchestrating a Business Intelligence solution in the Microsoft cloud ecosystem.
Solution Architecture Choices
On a scale of one to ten, with ten being the most formalized, strictly-governed and complex corporate reporting platform; the self-service Power BI option might range from one to four.
For the self-service data analyst, working entirely in Power BI Desktop, data can be imported and transformed using Power Query. Tables are modeled, calculations are defined and data is visualized. This mode is simple and works well for small to moderate-scale solutions with less emphasis on data governance and centralized control.
Even using this simple approach, data models can be developed separately from reports, certified and shared with multiple report developers and self-service report authors. So, to a point, business data can be managed and governed – but the queries in the Power BI solution read directly from source systems or files that are not curated for analytic reporting.
Data Warehouses and Data Marts
The “single version of the truth” or “golden record” repository, a data warehouse (or smaller-scale “data mart”) is the ideal solution to store and manage reliable corporate information. The challenge with creating a central data warehouse to manage centrally-governed organizational data is that it is costly and time-consuming, however the trade-off is that self-service data models can be inaccurate and out of date. When business leaders need answers quickly, it is not always feasible to add more data sources to a data warehouse quickly.
On the complexity scale of one to ten, versions of this option might be from seven to ten.
A conventional DW/BI solution typically uses on-prem data transformation tools like SSIS to stage and transform source data into a central data warehouse built using a relational database product like SQL Server. Although viable for on-prem systems, this old-school architecture model doesn’t embrace scalable and cost-effective cloud technologies.
The first generation of the Microsoft cloud-based modern data warehouse can utilize several different Azure services. The components in following example are easily equated to the conventional data warehouse solution in the previous example. Azure Data Lake services as the staging environment typically using text files and structured file storage as an inexpensive landing area for ingested source data. Azure Data Factory is used to orchestrate and transform files and data streams into and out of the data lake – and the data warehouse. Depending on the need for scale and size, Azure SQL Database or Azure Data Warehouse (now called Azure Synapse) may be used for data warehouse storage.
If your organization has a comprehensive data warehouse to serve-up all or most of the data needed for analytic reporting, this is probably the best fit for a Power BI solution in your business environment.
Constructing an enterprise data warehouse solution is not a trivial endeavor, often involving as much effort to negotiate business process challenges as the technology development to implement the solution.
The newer generation of the Azure modern data warehouse is a best-of-breed collection of tightly-integrated cloud services called Azure Synapse Analytics. Compared to the previous set of independent Azure services, Synapse Analytics provided a unified development and management interface. Azure Databricks is built on Apache Spark and other industry standard technologies designed for data science and platform-agnostic analytics. Azure Synapse is the evolution of Azure Data Warehouse, Microsoft’s read-optimized, scalable massive parallel-processing (MPP) SQL-based database engine.
Power BI Dataflows
Dataflows can fill an important gap between purely self-service data prep and formal data warehouse solutions. If you don’t have a comprehensive data warehouse to meet your analytic reporting requirements but need to provide more data quality control over standardized entities, incorporating dataflows might be the ticket.
In its simplest form, dataflows provides reusable transformation logic (queries) that can be shared by multiple Power BI data models. Using dataflows deployed to a workspace can save data model developers from repeating the same transformation steps in multiple datasets. But these are more than just Power Query scripts stored in the cloud.
A long list of capabilities are enabled by using dataflows. They can provide integrity and standard entity definitions stored in Dataverse (previously known as the Common Data Model) to enforce standard naming, data types and schema compliance among other features.
In Premium Capacity, dataflow results may be persisted in Azure Data Lake Gen2 storage. This essentially allows you to use dataflows to create a moderate-scale data warehouse without a big investment. Entities may be linked to related entities which creates virtual joins and referential constraints. Other Premium features include DirectQuery, Computed entities and Incremental refresh – all managed in the dataflow rather than for each dataset. Integrations with Azure AI, Machine Learning and Cognitive Services allow you to utilize AI features without writing code. For example, in a recent project, we used AutoML on a dataflow containing high school student data to predict graduation outcomes.
Dataflows Start with M Queries
Dataflows begin with an M query, just like the queries in Power BI Desktop before adding the additional capabilities mentioned earlier. Queries are authored entirely in the browser but migrating from Power Query in Power BI Desktop is fairly easy. Start with a Power BI solution (PBIX file) in Desktop and open a query in the Advanced Query Editor. You can create a new dataflow in the browser and then copy and paste the existing query M code from Desktop to the dataflow designer. You do have to copy each query one at a time and there just a few compatibility differences but for the most part, it should be a one-to-one transfer.
Building a Data Culture
Matthew Roche from the Power BI Customer Advisory Team has an excellent 17-part blog series about Building a data culture. Dataflows sit at the crossroads between business process, data stewardship and technology. The industry has been throwing technology and software at data governance and quality problems for decades, with marginal success. It is much easier for data practitioners to acknowledge that these are multi-faceted business culture challenges than it is to formulate a plan to be successful. If anyone can effectively carry and delivery this message, it is Matthew. In this video series, he offers prescriptive guidance to enlist an executive sponsor, work with business stakeholders, and to navigate the landmines of a business landscape to a successful data culture transition.
Honestly, I’ve only pursued this series in fragments over the past year and now that I’ve caught the vision, I plan to watch the entire series start-to-finish. It is that good. Think of it as Game Of Thrones with data.
Matthew also provides a comprehensive list of Power BI Dataflows resources here. Matthew recently presented to our 3Cloud Power BI and Analytics development team about using dataflows to promote a data culture. This presentation was an epiphany for me, that helped to better understand how dataflows fit into the BI solution puzzle – that’s when the gauge metaphor popped into my head. I encourage you to watch and perhaps you will have a similar moment of reckoning.
The Power BI Adoption Framework is a set of presentations from Microsoft that can serve as a checklist of important tasks and areas that should be covered in any Power BI implementation, large and small. These decks are also a great tool for adopting and sharing your organization’s BI and Analytics strategy with business leaders and stakeholders. You can use them a s a picklist to assemble your own presentations.