I Am Done Using Visual Studio …for BI data model development

For several years, Visual Studio has been my go-to tool for designing semantic data models used for Business Intelligence reporting. Back in 2005, I used the Business Intelligence Development Studio (BIDS) Visual Studio add-in for SSIS, SSRS and SSAS projects to develop BI solutions with multidimensional cubes. In 2012, when Microsoft began the transition from on-disk cubes to in-memory SSAS Tabular models, I used SQL Server Data Tools (SSDT) to create tabular models. It was a rocky road at first. The Tabular designer was fragile, to put it mildly.

Enter Power BI… Initially intended for self-service data model and report design, Power BI Desktop has quickly grown into a robust and full-featured BI design tool. Not only does Power BI Desktop include a lot of great features, it is stable and streamlined. It’s a joy to use compared with my early experiences using SSDT for tabular model design. I prefer to use Desktop for model design: it’s faster, more convenient and just easier than SSDT. However, at some point in the life of a project, it just makes more sense to transition the data model to an enterprise-scale effort.

Now, before anyone has a chance to comment and say “Paul, what the #$@! are you thinking? Visual Studio is an essential tool and there are certain things that you can’t do without it!”, I agree and will continue to use SSDT for a few key features. So, yes, I am not absolutely done using Visual Studio: I’ll keep using it for managing projects other than SSAS, and perhaps for code check-in …I’ll finish this part of the story in a bit.

I want to be clear – I love Visual Studio. It’s a great product for developing software and a variety of business and data solutions. However, history has demonstrated that the notion of stitching together several different products and expecting them all to work together seamlessly is just untenable. Without getting into all the reasons that it has been difficult for Microsoft to develop and maintain a rock-solid tabular model design add-in for Visual Studio, contrast that effort with the evolution of the Power BI product. Power BI is built by a single product team under unified leadership, with a focused set of objectives. Negotiating co-development of any product by several different teams is difficult within any organization, particularly one as large as Microsoft. The reason that new features can be added weekly to the Power BI service and monthly to Power BI Desktop is that one product team manages all those features.

Some of you will remember the time when the Business Intelligence message from Microsoft was that we were supposed to create solutions relying on coordinated components of many products like SQL Server (relational, SSIS, SSAS and SSRS), Windows Server, SharePoint and Office – all orchestrated to work together seamlessly. It was a good idea – and still is, in moderation – but this approach produced a delicate and complicated beast that was difficult to manage and had many potential points of failure.

One of the reasons Power BI Desktop is such a wonderfully streamlined product is that the feature set is optimized for data analysts and not for IT developers. To keep the product streamlined, we are not at all likely to see enterprise capabilities (like version control, multi-developer code merging and scriptable objects) added to it. These capabilities do exist, however, for Analysis Services projects and in community-supported tools like Tabular Editor and DAX Studio. But now (drum-roll, please) Power BI datasets can be developed and deployed to a workspace using enterprise tools through the magic of the XMLA endpoint.

The Visual Studio/SSDT Quandary

<RANT>

Call it a learning disability, but I have tried time and time again to use the Visual Studio tabular designer to manage SSAS projects, with the same outcome. Smallish demo and POC projects go well, but not so much when tackling the complexities of production-scale design. I guess it is just my natural optimism to hope things work out better than they did last time, but the laws of the universe dictate that if you do the same thing, history will repeat itself.

Here’s how it goes… I start developing a data model in SSDT by importing some tables and queries, and adding relationships and measures. All good, right? At this point in the timeline, I often convince myself that the development environment is stable and that everything will work out, so I forge ahead believing that all will be good. I then add some more tables and a whole bunch of new DAX calculations – and soon everything goes to hell. The model designer stops responding or behaves sporadically, Visual Studio crashes, the model definition file gets corrupted, and then I remember that I’ve been down this dark road before.

Recounting the painful past, it is frustrating to open a support ticket and explain to the engineer that “sometimes when I do that, this happens but not always” and “in all the confusion, I really can’t remember exactly how I got to this state.”

Pondering these memories, I draft a blog post titled “I Am Done Using Visual Studio.”

</RANT>

I sincerely appreciate the efforts of Kasper de Jonge from the SSAS product team back in 2012 as we spent hours in remote meetings trying to reproduce various odd behaviors in the tabular designer with a large data model. The fundamental problem was that the Model.bim file, which defines all the objects in the data model, was an enormous XML document (ours was approaching 100,000 lines). Every change in the designer required the entire file to be re-written to disk and then loaded back into memory. Things improved significantly in 2016 and 2017 when the model definition was streamlined using JSON rather than XML, and the file structure was simplified to reduce the file size. Similar meetings with several other product leaders have shown that the product team is seriously dedicated to optimizing the enterprise tabular model experience.

I’m all about solutions and not just ranting about problems. So what’s the answer? How should we manage enterprise BI data models and Power BI solutions from now on? Using Tabular Editor alongside Visual Studio is really a best-of-both-worlds experience. You can open the Model.bim file stored in the Visual Studio SSAS project folder.

Tabular Editor

Tabular Editor is a superb tool for developing and managing tabular data models for SQL Server Analysis Services (SSAS), Azure Analysis Services (AAS) and Power BI. It is a community-supported tool created by Daniel Otykier, a Microsoft MVP and Senior Business Intelligence Architect with Kapacity.dk in Denmark. The most comprehensive resource for finding this and other community-supported BI tools for the Microsoft platform is the Italians’ site at SqlBi.com/Tools

If the project is under source code control, changes made with Tabular Editor will be detected and can be synchronized with the remote source repository from Team Explorer in Visual Studio.

Here are some quick facts and recommendations:

Power BI and Version Control
Power BI Desktop files (.PBIX) do not support version control and code merging.
Recommendation:
Don’t try to do this – it will turn out badly. Starting model design in Power BI Desktop will save you time, but once you transition to the Model.bim file format, use Tabular Editor.

Separating Reports and Dataset
A monolithic PBIX file created with Power BI Desktop, containing reports, data model and queries, is simple and easy to manage – until you need to move beyond the limitations that this imposes.
Recommendation:
Power BI reports and datasets (data models) should be managed separately in all serious projects. Period. …whether you need to transition the data model to Model.bim or not.
Separating Power BI reports from the data model/dataset has many advantages, including allowing report and data model development to be performed in parallel by different team members. This is an absolute must when creating a certified dataset that users can connect to for their own reporting and analysis.

Visual Studio and Version Control
Visual Studio SSAS projects allow you to manage version control.
Recommendation:
This is a good thing. Keep doing this but use Tabular Editor as your primary model design tool.

A data model stored as a Model.bim file can have its changes compared, split and merged between data model version files, deployed AS databases or Power BI datasets. Manage integrated source control with Azure DevOps or GitHub. Check in changes, branch, merge, and push and pull changes made by other developers – but don’t use the Visual Studio Compare tool. Dependencies within the model definition can easily get you into trouble.

Tabular Editor
Tabular Editor offers a far superior design experience to Visual Studio. It is streamlined, easy to use and it won’t blow up when writing measure calculations. You can switch back and forth between tools since each tool has features that the other lacks. Just be sure to save and close the model file with one tool before opening it in the other …AND MAKE BACKUPS! The more I do this, the more I prefer using Tabular Editor.

Tabular Editor doesn’t have a graphical model designer, so I prefer to use Visual Studio to model tables and relationships. Set table and column properties, create calculated columns and measures, and manage partitions and other tabular model features in Tabular Editor.

Start in Power BI Desktop and Migrate to an Enterprise Model
From Power BI Desktop, save the file as a .PBIT (template), which can then be opened in Tabular Editor. Once you save the file to the .BIM format, this is a one-way trip: an enterprise model cannot be saved back to a PBIT or PBIX file. Of course, if you start designing the data model in Visual Studio, there is no need to resave the model – you can just use Tabular Editor to make new model enhancements.

Power BI Premium, AAS & SSAS
Power BI Premium capacity is required to deploy a Model.bim file as a dataset in a Power BI workspace.

Power BI Premium capacity is the E-ticket and is the best way to access all Power BI enterprise capabilities. With the XMLA endpoint, it will make more sense for most customers to deploy enterprise data models as datasets to a workspace rather than as Azure Analysis Services (AAS) models.

Future-Proofing Data Models

If your organization chooses not to use Premium – which might be a more economical choice until you have enough licensed users to justify the cost – you can use AAS data models or on-prem SSAS data models to exceed the capacity limits that Power BI datasets have under Pro licensing alone.

If industry certification or compliance restrictions prevent your organization from using Power BI in the cloud, using enterprise models is the norm. Use a local instance of SQL Server Analysis Services Tabular. If you move to the cloud in the future, the transition should be relatively simple.

Keep in mind that Premium capacity, or perhaps a more attractive licensing alternative, may be in your future, so architect solutions in such a way that you can easily make that transition.

Demystifying the Power BI XMLA Endpoint

When the Power BI product team began promoting the new XMLA endpoint connectivity for Power BI, I thought that this could be a game changer for IT-driven, enterprise-class BI solutions. Now that I have used it in a few real solutions and am actively working with clients, I’m sharing my experience about how it works and what you can do with it. The read-only endpoint is generally available for Premium capacities and the read/write endpoint is currently in preview.

Before my long-winded introduction, I’ll get to the point:
Using the XMLA endpoint, the Power BI Premium service now includes the capabilities of SQL Server Analysis Services and Azure Analysis Services, combined with the newer data modeling capabilities of Power BI. Data models published to the Power BI service now support version control, scripted builds and team application life cycle management, enterprise IT tooling and scripted object management.

…wait, what? How?

Let’s just start by reviewing some of the challenges that have existed in the Power BI platform prior to the availability of this capability:

  • SSAS/AAS enterprise features
    Buried deep within the Power BI cloud service is the SQL Server Analysis Services (SSAS) Tabular model “VertiPaq” in-memory analytics database engine. The SSAS engine and on-prem product itself has many useful features not exposed in the Power BI implementation of data models and datasets. These “enterprise class” features are numerous, including object-level scripting for deployments, source control, application life cycle and continuous integration/build management, data partitioning, perspectives, translations, currency conversion, KPI definitions, measure groups and calculation groups.
  • SSAS/AAS redundant services & costs
    At one point, Azure Analysis Services was a superset of data modeling features, but now many exclusive features are available in Power BI datasets in a Premium capacity workspace, so the choice isn’t so clear. Power BI and AAS also have separate costs. As the Power BI service continues to evolve, many new and compelling features are available only in Power BI datasets and not in Analysis Services, such as hybrid mixed-mode data models, aggregations, incremental refresh policies, dataflows and AutoML.
  • Source control, Application life cycle, builds & Team data model development
    Power BI Desktop is a convenient tool for creating an entire Power BI solution from soup to nuts, but it isn’t optimized for IT-scale projects. A single PBIX file contains connections, queries, data model objects, measures and report pages. A PBIX file structure cannot be easily parsed, compared, branched or merged by established source control and build management tools. Analysis Services projects, on the other hand, support Microsoft development tools like Visual Studio, Azure DevOps and SQL Server Management Studio. Several mature third-party development tools like Tabular Editor, DAX Studio and the ALM Toolkit enhance or exceed the features of Microsoft’s development suite.

The XMLA endpoint in Power BI Premium bridges this divide by exposing the underlying Analysis Services database instances in the Power BI service. It really is as simple as that. Each workspace is in fact an SSAS instance and each Power BI dataset is an SSAS database. What you can do with traditional SSAS through management and development tools, TMSL or XMLA script, PowerShell or API automation, you can do with Power BI.
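
To make that concrete, here is a minimal PowerShell sketch that sends a TMSL refresh command to a dataset through the XMLA endpoint. It assumes the SqlServer module is installed, and the workspace and dataset names are hypothetical placeholders – a sketch, not a prescribed deployment pattern.

# Requires the SqlServer module (Install-Module SqlServer)
Import-Module SqlServer

# XMLA endpoint address for a hypothetical Premium workspace
$server = "powerbi://api.powerbi.com/v1.0/myorg/Sales Analytics"

# TMSL command to fully refresh a hypothetical dataset (an SSAS database)
$tmsl = @"
{ "refresh": { "type": "full", "objects": [ { "database": "Sales Model" } ] } }
"@

# Execute the TMSL script against the workspace, just as you would against an SSAS server
Invoke-ASCmd -Server $server -Query $tmsl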

Use a Premium capacity workspace

If you have Premium capacity set up in your tenant, you are good to go. Several of our consulting clients do, but to experiment and test new features, I use my own private tenant. In lieu of paying for Premium capacity, I can set up an embedded capacity in the Azure portal. An embedded capacity sized to A4 is equivalent to a P1 Premium capacity, but it can be paused when I don’t need to use it. The cost is about $8.50 per hour, so MAKE SURE TO PAUSE WHEN DONE.
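
To make “pause when done” harder to forget, the capacity can also be paused and resumed from PowerShell. This is a rough sketch assuming the Az.PowerBIEmbedded module; the capacity and resource group names are made up.

# Requires the Az.PowerBIEmbedded module (Install-Module Az.PowerBIEmbedded)
Connect-AzAccount

# Hypothetical capacity and resource group names
$capacityName  = "pbidevcapacity"
$resourceGroup = "rg-powerbi"

# Pause the A4 capacity when finished to stop the hourly charge
Suspend-AzPowerBIEmbeddedCapacity -Name $capacityName -ResourceGroupName $resourceGroup

# Resume it again when needed
Resume-AzPowerBIEmbeddedCapacity -Name $capacityName -ResourceGroupName $resourceGroup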

Here’s the Power BI Embedded capacity I created in my Azure tenant, sized to A4 and currently paused. It takes about a minute to start or pause the embedded service.

After starting the capacity, I jump over to the Power BI Admin Portal and click the capacity name to change settings. Note that because my capacity was added as an embedded service, it shows up on the Power BI Embedded page, but the settings are the same as for a Premium capacity.

Enable the Read/Write endpoint

By default, the XMLA endpoint will be set to Read Only. Under the Workloads group on the Capacity settings page, switch XMLA endpoint to Read Write and then save the changes.

Now any workspace set to use Premium capacity can be accessed with an XMLA endpoint address. I’ve created a test workspace in my tenant to test migrating an SSAS tabular database to a Power BI dataset. I select the workspace in the Power BI portal and, in the workspace settings, make sure that Dedicated capacity is switched on. I know that it is because of the diamond icon next to the workspace name. The Workspace Connection address is shown below it. Click the Copy button to copy the address to the clipboard.
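
For reference, the workspace connection address follows this general pattern, where the workspace name at the end is just a placeholder:

powerbi://api.powerbi.com/v1.0/myorg/<Workspace Name>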

Now, I can use that address in any tool that knows how to connect to SSAS tabular in compatibility level 1450 or above. Let’s try to connect using SQL Server Management Studio. I need a newer version of SSMS, 18.4 or higher.

Connect to SQL Server Analysis Services and paste the XMLA endpoint address for the Server name. You need to use Azure Active Directory authentication. If your organization uses MFA, you can use that option but I will choose regular AAD authentication.

…and voila! Connected.

I can run queries and scripts here in DAX, MDX, XMLA or TMSL. Most but currently not all capabilities are supported in the preview. In particular, if you have RLS roles, the members must be dropped and then added back in the Power BI Portal.

So far, I have scripted existing Power BI datasets, migrated them to Analysis Services projects in Visual Studio, and then deployed them to a new dataset from Visual Studio. The lesson here is that migration is a one-way street from Desktop to Visual Studio. Whether development starts in Power BI Desktop or Visual Studio, there is no going back to Desktop. Ongoing development must be in Visual Studio.

Definitions: “Report”, “Database” and “Dataset”

In the self-service Power BI world, the term “Report” has been used to mean at least two different things. With the queries and data model managed separately and deployed as a Power BI dataset, the term “Report” in this context means only report pages with visuals (don’t get me started about “dashboards”). A data model originating from a Power BI Desktop file is published as a dataset. Now that we see these objects through the lens of Analysis Services, a Power BI dataset is a Database.

In migrated projects, continue to author and manage reports in Desktop, connected to the deployed dataset as an Analysis Services connection. You can actually switch the connection between an SSAS instance, an AAS instance or a published dataset using the XMLA endpoint address. I usually use query parameters for the Server and Database to easily change these values.
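
For example, the Source step of a report query can reference two query parameters instead of hard-coded values. This is a minimal Power Query (M) sketch; ServerName and DatabaseName are hypothetical parameter names defined in the same file.

let
    // ServerName can hold an SSAS server, an AAS server or an XMLA endpoint address;
    // DatabaseName holds the model (dataset) name
    Source = AnalysisServices.Database(ServerName, DatabaseName)
in
    Source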

There’s a lot more that I will need to cover in later posts, but I’ll mention a few things briefly.

Best practices: After working with the XMLA endpoint, the choices now seem pretty clear to me but there was some confusion until I got to that point. Best practices will continue to emerge. In light of this and other recent announcements, I can say that I have a much clearer vision for how to plan and manage solutions (and to delineate between self-service and enterprise BI projects) than I did a year ago. If you have questions, please post them in the comments and I’ll do my best to address them in future posts.

Combining & querying datasets: The ability to use the endpoint to query one data model from another model enables some very compelling composite model scenarios – but planning these solutions is important.

Large models & storage: The size and scale limits are similar to those in AAS and are generally governed only by your chosen Premium capacity. Since models are compressed and typically only include column data needed for analytic reporting, it is unusual to see datasets larger than a few gigabytes, but Premium will support model sizes up to 400 GB.

By default, datasets deployed using the endpoint are stored in the Azure data centers using single file storage. This is fine for small models but after creating larger partitioned models, using large file storage will improve development and deployment performance by managing objects in multiple files on the instance backend. There will eventually be a UI for this setting but it is currently available only through the management API or via PowerShell.

COVID-19 Three Day Change Report

In my ongoing quest to present the Coronavirus data from the CDC and WHO in useful ways, I have created another set of report pages. There are two primary pages that show all countries with the percent of change in non-recovered cases in the past three days. Our previous reports provided a lot of information for deep analysis, but it wasn’t so easy to make quick comparisons and to see how a state or country was progressing. This latest report connects to the same dataset but lets you see how the numbers have changed in the past three days across countries and states.

Click the following screen image to view the publish-to-web report. Just right-click a country to drill-through and view state/province three-day changes. The CDC and WHO provide state & province case data only for the United States, Canada and Australia.

I’ll update the GitHub repository and the template app with these additions as well.

Non-recovered cases are the number of confirmed cases less the number of recovered cases. The “Confirmed Cases” metric used in the reports we see in most news programs and websites is the all-time accumulated total for a country or state. Attempts to report only the number of active cases have been inconsistent across geographies. This method of calculating the recovery rate seems most reliable at the country level, but we don’t yet have consistent recovery numbers at the state and county level in the unified data set.

Roadtrip, Climbing Case Numbers and Public Perception

Having this information at our fingertips makes a huge difference. My wife and I are currently on an unplanned road trip attending to some urgent family business. In light of the outbreak, we are driving about 3,500 miles rather than flying. Driving through Oregon, Idaho, Utah, Arizona and New Mexico; it is apparent that the situation is perceived very differently in different areas. For example, the Navajo reservation is observing curfew and no one is allowed to be outside their homes under penalty of law. In New Mexico, absolutely everyone must wear a mask to go into a store. But in other places where we have traveled in the past two days, it is life as usual with little or no restrictions – even though the number of confirmed cases has risen by about fifteen percentage points in the past three days. Here in New Mexico, cases have increased by about 28% so we will be wearing our masks and cautiously practicing social distancing.

Countries with Improved COVID-19 Recovery Rates

Is there any good news in the COVID-19 Coronavirus data? Everyone is familiar with the tragic growing number of cases and fatalities globally and in most countries that have high populations. However, several countries have increasing recovery rates. You can click here to view the new Power BI report shown in the image below.

The experts keep talking about looking for “the peak”, when the number of people recovering from infection increases faster than new cases. We’re not there yet, but where is the recovery rate increasing? The “Recovery Rate (3 Day Change)” measure compares the Recovery Rate (i.e., the number of recovered cases divided by the number of confirmed cases) with its value three days prior to the most recent report date. This report ranks countries by this measure.

On the left side are the countries ranked highest in improved recovery rates over the past three days (pictured as of April 10). The line chart on the right shows the top 20 countries with their trend of recovery rates over the past 30 days.

Among these are countries with small populations or those with a small number of confirmed COVID cases, so even a small number of recovered cases yields a high ratio. To remove some of these outliers, the report page only includes countries with 100 or more cases. Some could have had a high infection rate that has since slowed, or it could be a lag in the available recovery data. This report uses the dataset from the main COVID-19 Daily Updates report but focuses only on recovery rates.

By the time you read this, the top 15 countries by improved recovery rates may change. Right now, these countries have the highest increase in recovery rates. Why is this and what can we learn from these results?

  • Djibouti
  • Algeria
  • South Africa
  • Guatemala
  • El Salvador
  • Austria
  • Australia
  • Germany
  • Cambodia
  • Azerbaijan
  • Brunei
  • Iraq
  • Venezuela
  • Iceland

I Need Your Questions about Power BI and Paginated Reports

For a new series of posts to include tutorials and “how to” demonstrations for Power BI, SQL Server Reporting Services (SSRS) and Paginated Reports, I need your help to know what questions are most compelling and top of mind.

Please scroll to the Comments section at the bottom of this post and help me create a list with your questions and suggested topics.

What questions do you have and what do you need to learn how to do?

Make suggestions and cast your votes on others’ ideas for video posts, demos and tutorials you would like to see. I will add these to a list and work through as many as I can. The following are some rough topic areas and ideas to get started:

Power BI Solution Questions:

  • Power Query techniques
  • Data modeling techniques
  • DAX calculation techniques
  • Visualization best practices
  • How to design a specific report feature
  • Of the many choices or approaches to address a business need, which is a recommended practice?

Paginated Reports & SSRS Questions:

  • How do Paginated Reports work in the Power BI cloud service?
  • How are Paginated Reports and SSRS licensed from Microsoft?
  • When should I use Power BI vs SSRS or Paginated Reports?
  • Navigating between different report types

New Coronavirus (COVID-19) Daily Updates Report

Updated again on April 7, 2020. See the video tutorial below.

After publishing the original daily COVID-19 cases report on March 14, Johns Hopkins University changed the file format. This required us to publish an updated dataset with a new report, which you can access >here<.

Since the original solution was posted, I have received a tremendous amount of feedback, suggestions for enhancements and corrections. I introduced the tool in the earlier post. Members of the Microsoft MVP program have been a lot of help, and several of my colleagues from Pragmatic Works have jumped in to add their insights and design support. We’re working on a Power BI app template that can be installed into a Power BI tenant. In the interim, you can access the publicly accessible report through this link.

Using the report, we can track the daily progression of confirmed cases, recovered cases and deaths by country. Where available, these metrics are also broken down by state or province and by US county.

The current project and future updates can be accessed using this GitHub repo.

There are now three different ways to access the COVID-19 Daily Updates Power BI dataset & report:

COVID-19 Coronavirus Daily Updates in Power BI

March 24 update – This post will be updated frequently for the next few days. Please watch for updates.

This is a Power BI report (<link) I have developed and published with public access to the web to be shared with anyone who needs access to this information. It contains daily updates from the Centers for Disease Control (CDC) using data curated by the Johns Hopkins University Center for Systems Science & Engineering. To the best of our collective ability, the data is accurate, but I cannot make any guarantees. Please validate with other sources before making any decisions with this information.

Additional enhancements and contributions are being made by Microsoft MVPs and community members:

After the initial version, my colleague Robin Abramson spent late evenings and a weekend to help work through design details. I appreciate members of the Microsoft MVP community, Reza Rad and Miguel Escobar, stepping in to help with query updates to get scheduled data refresh working.

I’m very hopeful that this report will be a valuable resource. It’s been a labor of love and considerably more work than I envisioned, but I will continue to work on enhancements and corrections as I am able – based on feedback. I started working on this project to help a consulting client try to understand how the virus outbreak is affecting their customer order shipment delays and materials supply chain. That grew into an off-the-clock side project, demanding nights and weekends to get this far. Now, I hope we can use this information to proactively respond to this threat.

Please post comments here or contact me through Twitter if you have feedback, comments or questions.

link to published report

The CDC and WHO began collecting COVID-19 case information from various sources on January 22, with the latest count of confirmed cases, recovered cases and deaths recorded by country, state or province. Johns Hopkins University collects this data every day and stores the files in a publicly accessible GitHub repository. On March 1st, they began geocoding the location for each case, where available, with latitude and longitude. Location information is sparse, but available frequently enough to observe trending.

Pete Gil at Pragmatic Works initially discovered this data source from another report published at worldometers.info. He scraped their web page and created an attractive Power BI report with the latest daily numbers. Springboarding from that project, I went back to the source files and created this new data model with daily snapshots and cumulative updates.

Watch for updates (where I’ll explore the design and more details), but this is a quick tour of the initial set of report pages based on my published data model:

The first page provides some background information about data sources, credits and a report page menu:

Use the bookmark buttons to navigate to each page. You can also use the page number navigation buttons below the report.

The three measures displayed at the top of this and other pages show the latest counts, as of the latest date selected in the date range. Use the range slicer to limit the time-series charts and to set the “as of” date for the latest measures (Confirmed, Recovered and Deaths).

Right-click the line/area or stacked column charts to drill-through to details for a specific date.

The Global Cases page displays the aggregate case counts by country and for known locations. You can switch between the three measures using the radio button slicer. This changes every value on this page to use the selected measure.

On every page, you can narrow the view of countries using the Country Region drop-down list slicer. Use this slicer to deselect countries that have a high number so you can view and compare other countries. Hold Ctrl to select and deselect multiple items from the slicer item list.

The Country shape map definition was created by David Eldersveld. I’ve made some modifications to accommodate country names provided by the CDC.

I have created three separate pages with Country/State & Province maps. Only a limited number of shape map files are available in Power BI so I have selected the US, Canada and Australia for now.

Either use drillthrough or navigate to the Detail Matrix page. The matrix shows the progression of the selected measure over time within a region. Expand the geography hierarchy to view details by states or provinces for a country. The date range slicer in the top-right can be used to control the range of dates displayed as columns. Within the scope of the displayed data, the largest values are displayed with graduating shades of red.

To narrow the comparison, use the Country Region slicer to filter by country and change the scope of the conditionally colored cells. This allows you to remove irrelevant regions and focus on those of interest.

The Novel COVID-19 Coronavirus outbreak is a serious matter that is affecting our world in ways that we are only beginning to understand. If we can use this data to better understand what is happening, maybe we can use this information to mitigate the effects of this global event.

What questions do you need to answer and how do you need to use this information?

How can we look at it differently to provide better insight?

How do you need to correlate the state of cases with other data to make decisions and forecast outcomes?

Correlating Corona Virus Outbreaks with Business Data using the ESRI Map in Power BI

At the time of this post, the world is dealing with a serious health and economic crisis. The COVID-19 Corona Virus is impacting the lives of people around the world and, in turn, it is affecting world markets and industries in many different ways. For example, I am working with a consulting client whose material shipping and supply chain are being impacted by the outbreak, and they need to quickly respond by making order changes and logistics choices. Forecasting and planning analysts must make adjustments to help the company prepare for these impactful changes.

This demonstration shows you how to create a multi-layer map to correlate current outbreak case locations with your own data, using Power BI and the ESRI map visual. I’m using sample data for demonstration but this is the same technique I am using for our client. In the real data set, correlation is impactful where shipping orders are being delayed and cancelled in areas most affected. For more, visit my blog at SqlServerBiBlog.com.

The technique used in this map report is relatively easy to implement because both data sources are separate feeds to the map service. There are different ways to correlate map data from two different sources. In our solution, we are also integrating the CDC data into the data model, which will allow us to perform comparison calculations. Using AI and machine learning, we may be able to perform predictions and associations.

Power BI Query Performance & Query Diagnostics

This post demonstrates how the order of steps added to a query can make a big performance difference and drastically affect the number of steps generated by the designer. I’ll demonstrate how to use the new Query Diagnostics tools to compare and understand query performance.

The Power Query Editor for Power BI simplifies data transformation processing by generating query steps for each action you perform in the query designer. This whiteboard diagram shows the high-level flow of information through a Power BI solution. Every query has a source (“SRC” in the diagram) followed by a connection. The query consists of a series of transformations (“XForm”) prior to populating a table in the data model.

These steps are defined in “M” code which is executed when the data model is processed. In simple projects, all the query steps are automatically generated. The order in which you add these steps makes a difference. Not only does the order help organize and manage a query, it can have a significant impact on performance and the computer resources needed for the query to run. A little planning and iterative clean-up as you work through the design process can make a big difference.

The two queries shown here have exactly the same outcome, and they were both created just by choosing transformations from the query designer menus. The only difference is the order in which I chose the options.
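
To make the difference concrete, here is a hand-written sketch of the consolidated pattern, with a made-up file path and column names: one Table.RenameColumns step and one Table.TransformColumnTypes step, instead of a separate generated step for every column.

let
    Source = Excel.Workbook(File.Contents("C:\Demo\Sales.xlsx"), null, true),
    SalesSheet = Source{[Item = "Sales", Kind = "Sheet"]}[Data],
    PromotedHeaders = Table.PromoteHeaders(SalesSheet, [PromoteAllScalars = true]),
    // One step renames all of the columns...
    RenamedColumns = Table.RenameColumns(PromotedHeaders, {{"Col1", "Order Date"}, {"Col2", "Sales Amount"}}),
    // ...and one step sets all of the data types
    ChangedTypes = Table.TransformColumnTypes(RenamedColumns, {{"Order Date", type date}, {"Sales Amount", type number}})
in
    ChangedTypes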

Introducing Query Diagnostics

To understand how query steps are being processed and to compare the two test queries, I use the new Query Diagnostics features on the Tools ribbon. In this simple test, this is really easy.

I select a query in the designer, start the diagnostics, perform a refresh and then stop the diagnostics. This generates two new queries with the diagnostics results.

I then choose the other query and repeat the same steps to get diagnostics for that query.

There is a boatload of useful information in the diagnostic results query but it’s way more than we need.

The most important information for this test is the Exclusive Duration column. For this test, all I need is to summarize this column. I did the same thing with both diagnostic queries and then compared the two results. Appending the two summarized diagnostic query results clearly shows the difference in performance:
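
As a rough sketch of that summarization step, assuming the generated diagnostics query is named Diagnostics_Query1 and the column is named Exclusive Duration (the names the tool generates may differ):

let
    Source = Diagnostics_Query1,
    // Sum the exclusive duration across every recorded operation
    TotalDuration = List.Sum(Source[#"Exclusive Duration"])
in
    TotalDuration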

Video Demonstration

This video demonstration is an exaggerated yet effective example of working through the process of importing a simple Excel worksheet and then transforming a series of columns. In the first example, I rename and change the data type of each column, one at a time. In the second example, I consolidate the steps, renaming all the columns and then changing the column data types. How does this simple change to my approach affect the generated query and execution performance?

Getting Started with the New Power BI Activity Log API

When the new Power BI service activity logging API was announced over the holidays, I was intrigued and anxious to start working with it. I’ve had some experience with report usage monitoring using the existing Office logs and usage metric reports, which do provide some useful information but can be a chore to use. Activity monitoring and troubleshooting with the new logging API is focused entirely on Power BI tenant events like dashboard, interactive report and paginated report views, deployments, errors and data refresh events. This should be easier than before, enabling admins to be more proactive by tracking usage patterns. In a short series of blog posts, I’ll demonstrate how to build a complete activity logging and reporting solution for your entire Power BI tenant. In this first post of the series, we’ll get started with the basics by capturing just a few log events for a brief window of activity and surfacing them in a simple report.

Before we get started, a brief history lesson is in order:


When the Power BI cloud-based service was initially offered back in 2013 as a feature extension to Office 365, it used SharePoint Online as the storage architecture. Some additional logging events were added to the existing and numerous Office 365 event catalog. It has always been cumbersome to find relevant reporting information amid all of the other noise in these logs. Components of the Power BI back-end services have since been migrated to a more specialized service infrastructure, but the activity logging remained in Office 365 until December of 2019. The Office logs required special privileges within the Office tenant and produced volumes of event data related only to folders, files and Office documents.

The new Power BI Activity Log API is specially suited and optimized just for Power BI. By contrast, it will be much easier to identify and track relevant service and user activity for workspaces, apps, dashboards, interactive Power BI reports and paginated reports in the Power BI service.

I envision that my production-scale logging solution will use an orchestration tool like Azure Data Factory to iterate over historical activity logs, store log files to Azure Data Lake Storage and then incrementally update a Power BI data model for reporting and analysis. This first example will use PowerShell script manually executed from my desktop.

PowerShell Me, Baby

The new Get-PowerBIActivityEvent cmdlet is added to the Microsoft Power BI Management library. Install the latest version to gain access to the activity logs.

In my project, the first step is to open and run the PowerShell ISE as a local administrator. To install the latest Power BI Management library locally, I execute this code:

Install-Module -Name MicrosoftPowerBIMgmt.Admin

I need to sign in to my Power BI tenant with a service or user account that has tenant admin privileges. This code opens a standard login dialog and prompts for a user name and password, populating a Credential object variable used to open a connection to the Power BI service:

$Cred = Get-Credential

Connect-PowerBIServiceAccount -Credential $Cred

In the Power BI product team’s Developer blog, Senior Program Manager Kay Unkroth explains that the Get-PowerBIActivityEvent cmdlet can be called with date/time parameters to include only one day at a time. This line requests all activity on January 5, 2020, caching the activity log information as a JSON structure:

$activities = Get-PowerBIActivityEvent -StartDateTime '2020-01-05T00:00:00.000' -EndDateTime '2020-01-05T23:59:59.999'

Finally, I write the log contents to a local file using this code:

$activities | Out-File -FilePath C:\Temp\RSActivityLog.json

Importing this file into Power BI Desktop produces the following simple table:

A couple of important things to point out

The API is optimized to handle large numbers of events. As such, it is limited to returning records for a range of time up to one full day using the StartDateTime and EndDateTime parameters. The web service returns a continuation token to let you know when there is more data beyond a fixed frame size, which will typically return about 5,000 to 10,000 records.
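
To collect more than one day, you can simply loop over a date range and make one call per day. This is a minimal sketch using the same cmdlet; the date range and output path are placeholders.

# Loop over a range of days, requesting one full day of activity per call
$startDate = Get-Date '2020-01-01'
$endDate   = Get-Date '2020-01-07'

for ($day = $startDate; $day -le $endDate; $day = $day.AddDays(1)) {
    $from = $day.ToString('yyyy-MM-dd') + 'T00:00:00.000'
    $to   = $day.ToString('yyyy-MM-dd') + 'T23:59:59.999'

    # One call per day, matching the API's one-day limit
    $activities = Get-PowerBIActivityEvent -StartDateTime $from -EndDateTime $to

    # Write each day's results to a dated file (path is hypothetical)
    $activities | Out-File -FilePath ("C:\Temp\PBIActivityLog_{0:yyyyMMdd}.json" -f $day)
}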

Incidentally, I’ve played with a few different file formats. JSON is by far the most flexible format, but you may not get all the keys/values you want just by consuming the results right out of the box, without parsing all the nested levels. In Kay’s article, he uses the ConvertTo-Json cmdlet to flatten the native JSON document into a more conventional colon-delimited log flat file with named key/value pairs. Using this method, I was able to get more field information than the fields that otherwise slipped through the cracks of the JSON document, although I had to massage the output a bit and then transform the rows into columns using some fancy pivot dancing in Power Query.

This simple report is a first effort showing records for a short time frame. I’ve produced a single table with some recognizable grouping and slicing attributes, but we can do considerably more. Using these fields, I can analyze activities for only dashboard, Power BI report or paginated report views. We can filter by user, object type, operation type, the web browser or devices used to view content, type of workload or the containing workspace.

In a fully-fleshed-out data model, some of the attributes might exist as separate dimension/lookup tables. But, this is enough for now.

Post Conclusion

Please share your questions and your own experience with the activity logging API, and watch for subsequent posts about tenant monitoring and activity management solutions.

Curator’s Corner: December 2019

As I read the many posts from those in the community who I follow, I am reminded that the community brain trust is much greater than any individual. As a writer and blogger, I’m occasionally compelled to express an original thought or opinion that I think is uniquely my own. However, we work in a world where everything comes from somewhere and there are many contributors who I trust and rely upon for advice and cutting-edge information. This “corner” of my blog is to highlight these community contributions that I find informative.

James Serra, Microsoft Solution Architect and former Data Platform MVP, continues a deep exposé of Azure Synapse Analytics with the sixth post in the series. This new Azure service, which headlined at both Ignite and PASS Summit and is currently in preview from Microsoft, is the evolution of the modern data warehouse. Azure Synapse Analytics is an orchestration of services including Azure SQL Data Warehouse, Databricks and Data Lake Storage Gen2. It will be an important consideration for serious cloud-based BI, analytics and data warehouse solutions at enterprise scale.

Azure Synapse Analytics & Power BI performance
Azure Synapse Analytics new features
Azure SQL Data Warehouse Gen2 announced
Azure SQL Database vs SQL Data Warehouse
What is Microsoft Azure Stream Analytics?
Azure Synapse Analytics & Power BI concurrency

Marco Russo, a name synonymous with DAX and BI expertise, captures what happened in the DAX world in 2019 in an aptly-named blog post: “What has happened in the DAX world in 2019” :-). He also writes “I’ve seen the future, and it’s bright – but unfortunately, it’s under NDA!” and actually goes on to describe some of the announcements expected in the next year and at major conference events.

David Eldersveld, Data Platform MVP, writes about improvements to the Power BI theme capabilities. In addition to the new design ribbon in Power BI Desktop, themes can now be exported from the designer. Adding to his collection of reference material that I have found valuable in my Power BI toolbelt, David posted this data color reference to assist with color selection for reports.

The new Power BI Activity Log was announced this month. This will make it easier to capture and monitor user and report activity. It also simplifies Power BI tenant administration by isolating report activity from Office 365 events and other log events. Power BI started out as an extension of Office 365 and SharePoint Online services but not all organizations use or manage Office 365 and Power BI under the same administration role. Microsoft continues to deliver on the promise to provide comprehensive automation APIs and tooling for administration.

The consistent contributions of Guy In A Cube’s Adam Saxton and Patrick LeBlanc are too numerous to mention. Notably, they were awarded the “Most Helpful Data Video Channel” by Data Literacy Awards. Data Literacy LLC is a Seattle-based training and education company founded by Ben Jones.

A Conversation with Ásgeir Gunnarsson about Power BI in the Enterprise

As I continue to explore and develop best practices for managing serious business-scale Power BI solutions, I’m having conversations with recognized community leaders. Last month I chatted with Ásgeir Gunnarsson on the SQL Train ride from Portland to PASS Summit in Seattle. Ásgeir is a data platform MVP and seasoned Business Intelligence expert from Reykjavik, Iceland who works as Chief Consultant for Datheos, a Microsoft-focused BI and Analytics consultancy in Copenhagen. He leads the Icelandic Power BI User Group and PASS Chapter.

He gave an inspiring presentation at Oregon SQL Saturday about Enterprise Power BI Development. You can view his presentation deck from the Schedule page here.

Ásgeir talked primarily about the development life cycle for projects centered around Power BI, and about data and object governance. As I’ve mentioned in my earlier posts on this topic, the development experience for BI projects in general is different from application development and database projects, and you cannot use the same management tools – at least not in the same way. He promoted using OneDrive for Business to manage version control.

He shared several excellent resources, many of which I either use or have evaluated, to help manage Power BI projects. The ALM Toolkit is a useful tool for comparing objects in great detail between two PBIX files. Ásgeir also showed some efforts from community contributors to automate change-tracking, file-level source control (which really made the point that it’s a difficult thing to do with Power BI). We know that Microsoft is working on an integrated release management solution for the Power BI service which may amend or replace the need for existing tools.

Guidance for publishing and managing the life cycle of Power BI solutions included deployment automation using OneDrive and PowerShell. Using multiple workspaces for development, testing and pre-production, deployment can be managed using the Power BI REST APIs and PowerShell script, which is detailed in this tutorial. These PowerShell examples demonstrate how to clone workspace content, publish files and rebind data sources.
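
As a small taste of what those scripts look like, this sketch signs in with the Power BI Management cmdlets and lists the workspaces in the tenant through the raw REST API; the publishing and rebinding examples in the linked tutorial build on the same pattern.

# Requires the MicrosoftPowerBIMgmt module (Install-Module MicrosoftPowerBIMgmt)
Connect-PowerBIServiceAccount

# Call the REST API directly to list the workspaces (groups) in the tenant
$response = Invoke-PowerBIRestMethod -Url "groups" -Method Get | ConvertFrom-Json
$response.value | Select-Object id, name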

Regarding governance and security, he made reference to the extensive Microsoft whitepaper: Planning a Power BI Enterprise Deployment. He steps through diagrams that help simplify each of the important processes and tasks for developing, deploying and managing Power BI solutions.

If you need to manage Power BI solutions, I encourage you to review his presentation and you can connect with Ásgeir on LinkedIn and Twitter.

Setting up Power BI project Team Collaboration & Version Control

Team file sharing and version control may be managed through Power BI workspace integration with OneDrive and SharePoint team sites. In this post, I will show you how to get started and setup a new project.

In this demonstration, I’m going to show you how to set up a workspace and a team collaboration site so that you can share Power BI files among members of your team and manage version control. In another post, we’ll discuss the nuances and differences in version control, build processes and DevOps when comparing a Business Intelligence project with an application development project – and how those are different experiences. An important lesson is to learn to work with this tool the way that it was designed to be used rather than to try to force Power BI to work with incompatible build management and version control tools. Power BI, along with companion technologies like Analysis Services and Reporting Services, can participate in an application life cycle management (ALM) and continuous integration (CI) framework, but sometimes not in the way we might imagine. This post addresses a piece of that larger puzzle.

Get Started in the Office Portal

If your organization uses Office 365, you have all the tools that you need. The first thing is to go to the Office Portal at office.com. Click on the App menu and choose “SharePoint”. Here, we’re going to create a new site. I’m going to choose the Team Site template, and this is going to be for my Contoso Sales project. We’ll give the site the name “Contoso Sales Project” and click “Next”. The only other thing I really need to do here is add some users as owners of this site. You can add an Office group, or you can add users later on. I’ll go ahead and add a couple of users and we’ll finish, and that creates the site in SharePoint Online. Here, you see my new site. What happens is that in Azure Active Directory, a new group is created for this site and we don’t need to do anything extra. You’re going to see that here in the Power BI portal.

Create a New Workspace and Assign a OneDrive Group Alias

I’m going to create a new workspace for development. Depending on the formality of your project, you could have a DEV, a TEST and a Production workspace. You can create one or perhaps two workspaces for a project. I’ll postfix the name to indicate that this is going to be my DEV workspace. In the Workspace Settings, I’m going to choose the Contoso Sales Project group that was automatically created for my site as my workspace OneDrive group. That gives every member of that group access to this shared space and gives me the ability to sync files with this workspace.

Add a Folder and Setup Sync

Let’s go back to SharePoint Online and view the team site. Since the site was just set up, you’ll see that it shows up at the top of the most recent list. The next thing that I’m going to do is add a folder that I can sync with my desktop so I can share files with other members of my team. We’ll go to the Documents library within the new site, and here I’m going to add a new folder. I’ll click New and then Folder, and we’ll give the folder a name – this is going to be my Contoso Sales Project Files folder. We’ll create that folder, and then the next thing I’m going to do is configure that folder to sync with my desktop. Each member of your team can do the same thing, which will enable them to put copies of the files within this synced folder onto their desktops. You always want to use Power BI Desktop to work in a local file folder. So, there’s my synced folder on my desktop. It’s probably a good idea to add that to my Windows Quick Launch bar or add it to my Favorites so that I can get to it quickly.

I’ll put a PBIX file into this folder and, in the future, this is where I’ll do all of my Power BI development work. I have a file from an existing project that I’ll use for simplicity’s sake. I’m just going to add a .PBIX file and designate this as my dataset file. In another post, I’ll talk about separating datasets from reports – which is something that I routinely do, especially in larger, more formal projects. I’ll go ahead and just copy, paste and rename a .PBIX file. Eventually, I’ll have one dataset .PBIX file that will contain all of my data, and that’s what this file is for. It currently contains a report with pages in it. When I make a change to that file, it immediately syncs – which you can see in the OneDrive app. If you just click on the OneDrive icon, you’ll see that it will sync up. This is a fairly small file so it goes pretty quickly.

Publish a File from the Synchronized Folder

Back to the workspace: I’m on the Datasets page and I’m going to click “Get Data”, which takes us to a page where we can see the Files tile. I’m going to click “Get” on that tile, and that’s going to show me that there’s a new OneDrive tile with the name of the new team site. I’ll click that and it takes me to the team site where I’ll see the folder that I created. I click that and now I see the .PBIX file.

When I choose “Connect”, that actually imports the file as a new dataset in the service. Using this method to publish the dataset will keep that file synced up, and any changes I make will automatically be deployed. There is no more need to deploy updated dataset or report files from Power BI Desktop. This file will remain synced up all of the time. I can set up a gateway, schedule refresh and do all of the other things that I would normally do with a Power BI file as if I had deployed it from Desktop. Using this method, it’s going to remain synced up through this SharePoint Online folder, which is managed through the OneDrive for Business application.

Testing and Demonstrating Synchronization

I’m going to open this file in Power BI Desktop, make a simple change and save the file. We should see that it automatically gets synced up. I’ll do that a couple of times and then we can go take a look at the automatic version control that takes place through SharePoint Online. OneDrive for Business is actually SharePoint Online under the hood. When you create a team site, it essentially creates a OneDrive for Business folder. That’s how all of that magic happens.

Incidentally, while syncing the file – as you can see, I just got an alert and an invitation from SharePoint that welcomes me to the site. Each member of your group will automatically get an invitation like this with a link to the folder that I’ve shared with them. I’ve changed the folder view to large icons, and you can see the little green check mark icon that shows that the file is synced up with my desktop.

Make sure to always open the .PBIX file from the synced folder on your computer rather than from the SharePoint site through your browser. If you make a change to the file in Power BI Desktop, you’ll see the change sync within seconds to minutes, depending on the file size. It doesn’t really matter what kind of change I make… I’ll just go to the data model diagram view and make a change… I moved a table around (imagine that I’m adding a measure, adding a relationship, or adding anything to the table that would constitute a change)… this flags the file as changed and causes me to be prompted to save it. Now we go back to File Explorer and notice the little sync icon, which I’m going to see for a few seconds in this case. Because it’s a small file, it’s synced up again quickly.

Edit Local Files in the Synchronized Folder

After making changes, always close Power BI Desktop so the file isn’t locked by the application. That releases the file so that it can be locked momentarily by the OneDrive process and synced up with the online service. You can watch the progress here in the OneDrive app. If this were a large file, you’d see a little progress bar and it might take a couple of minutes to synchronize. All members of your team will see the new file here in the file system afterward.

Using Version Control and Team Collaboration Features

From File Explorer, choose the “View Online” menu option to go to the SharePoint library in your browser. You can use the ellipsis menu next to the file name to check out the file. This prevents changes by other team members while you have it locked. You can use the “Version History” menu option like I’m doing here. You can see that every single change generates a new version of the file. If I have admin access – as I do here – I could choose a version and restore or revert back to that version. You have complete control here, so this gives us a really good version control story for Power BI and it gives us the ability to collaborate as members of a team.

Next Level Power BI – M Language Basics

This is a series of posts and exercises from a recent full-day workshop called Next Level Power BI. Check back for additional lessons, and feel free to post your questions and feedback in the comments section after this post.

Power Query is a modern data acquisition and transformation technology, and it is the “Get Data” component of Power BI and many other Microsoft tools. The query script language of Power Query is known as “M”, or the Microsoft Data Mashup language. The menu options and selections in the Power Query Editor build this “M” script for you, but to use Power Query at an expert level, you must understand the fundamental concepts of the language. This post steps you through some essential mechanics of the language to build those fundamental skills.

What you learn here can be applied to the Power Query implementation of other Microsoft products, which currently include:

  • Power BI Desktop Query Editor
  • Power BI Data Flows
  • Excel “Get & Transform” on the Data ribbon
  • SQL Server Analysis Services (SSAS) Tabular projects in SSDT/Visual Studio, or Azure Analysis Services.
  • Azure Data Factory: Power Query task

By the time you read this, there are likely to be more implementations of Power Query.

Basic Syntax

A quick review of the characters used to define or encapsulate groups of objects will help you understand and recognize what you are seeing in the code examples that follow.

Parentheses, square brackets and Set braces all have different meanings. For reference, here they are in that order:

( ) – Parentheses are used to pass parameters to a function and to control mathematical order of operations.
[ ] – Square brackets encapsulate a record (a set of field names and values).
{ } – Set braces encapsulate a list of values.
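
To see all three in context, here is a quick throwaway query (not one of the workshop queries, and with no data source) that uses each delimiter once:

let
    // ( ) pass parameters to a function and control the order of operations
    Rounded = Number.Round( (3 + 7) / 4, 1 ),
    // [ ] define a record of named fields
    MyRecord = [Product = "Lemons", Qty = 3],
    // { } define a list of values
    MyList = { "Oranges", "Lemons", "Apples" }
in
    MyList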

Most functions in M are part of an object class (such as Date, Table or Text) and the syntax is something like this:

  • ObjectClass.Function( )
  • ObjectClass.Function( Parameter )
  • ObjectClass.Function( Parameter1, Parameter2 )

Examples:

  • Date.AddYears( <date value>, <number of years> )
  • Table.FromList( <list object> )
  • Text.End( <text string>, 2 ) – returns the last two characters from the end of the string

A few older functions that were part of the initial release of Power Query are not prefixed with an object class name. For example, the #date( ) function has three parameters: a year number, a month number and a day number.
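
Just for illustration (this is my own snippet, not from the workshop files), the older #date function and two class-prefixed functions can be combined like this:

let
    // #date(year, month, day) is one of the older functions without a class prefix
    StartDate = #date(2019, 11, 15),
    // Date.AddYears follows the ObjectClass.Function( Parameter1, Parameter2 ) pattern
    NextYear = Date.AddYears( StartDate, 1 ),
    // Text.End returns the last two characters of the string ("ry")
    LastTwo = Text.End( "Power Query", 2 )
in
    NextYear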

Any word may be used as a variable. Common examples include step names like Source. Variables are often multiple words with no spaces, such as ServerName. If a name does contain spaces, it must be written as a quoted identifier, like #"Expanded Column1" (you will see this pattern in the generated code later in this post). You are strongly encouraged to rename steps with short but meaningful phrases to make them self-documenting and easy to understand.
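
Here is a small, source-free sketch (the step names are mine, purely for illustration) showing renamed, self-documenting steps, including one quoted name that contains a space:

let
    ProductList = { "Oranges", "Lemons", "Apples" },
    ProductTable = Table.FromList( ProductList, Splitter.SplitByNothing(), { "Product Name" } ),
    #"Sorted Products" = Table.Sort( ProductTable, {{ "Product Name", Order.Ascending }} )
in
    #"Sorted Products"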

Are you with me so far?

The Mighty #shared Function

This little keyword (technically an intrinsic variable rather than a function) is literally the key to the entire kingdom. It returns every Power Query object defined in your file, along with every available M object class and function in the entire language, complete with documentation and code examples. This is the best-kept secret in Power Query.

Add a new blank query and type the following into the formula bar, and watch while magic things happen:

= #shared

The result is returned as a record of objects and functions. Convert that record to a table and you can view the definitions of several hundred objects and functions. The M language contains nearly 900 functions, and more are added with each new product version.

Click the cell next to the link for any function to peek at the definition or click the link to explore related details.
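
If you prefer to do the same conversion in code, a minimal sketch (starting from a new blank query) looks like this:

let
    // #shared is a record of every name visible in the current environment
    SharedRecord = #shared,
    // Record.ToTable converts that record to a two-column table of Name and Value
    SharedTable = Record.ToTable( SharedRecord )
in
    SharedTable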

Getting Started

The following examples are hand-written queries entered into the Advanced Editor in Power Query. None of these queries have an actual data source. Values are generated within the code itself but all of these examples can be applied to real queries based on any data source.

Each of the following queries adds a little more to the previous query to explore progressively more advanced objects and code techniques in the M language.

If you create a new blank query in Power Query, it will generate code like this:

let
    Source = ""
in
    Source

This is the fundamental pattern to get you started.

Values

Let’s modify the script to give the Source variable a text value of “This is a value” (creative, I know).

Use the Advanced Editor to view the generated M script.

Here’s just the script outside of the Advanced Editor:

//-------------------------------------------------------
//#1: Object Types: Value
//-------------------------------------------------------
let
    Source = "This is a value"
in
    Source

Use the Done button to close the Advanced Editor to see the result.

Unless they are explicitly data-typed, the object and data types of variables are assigned automatically, as you will see in the following examples.

The Source variable represents a value type object that is implicitly typed as text. You can see the resulting value in the query designer:

Records

In M, a record is encapsulated in square brackets. This script generates a single record with two fields:

//-------------------------------------------------------
//#2: Object Types: Record
//-------------------------------------------------------
let
    Source = [Record1 = "Value 1", Record2 = "Value 2"]
in
    Source

The result is one record with two fields, Record1 and Record2:

Lists

The List object in M is denoted using Set braces, sometimes called “squiggly brackets”. The following script creates a list of four text-type values:

//-------------------------------------------------------
//#3: Object Types: List of Values
//-------------------------------------------------------
let
    Source =
        { "Value 1", "Value 2", "Value 3", "Value 4" }
in
    Source

The list is shown below. Note that the column header in the preview grid reminds us that this is a List object. Also note that the designer shows the List Tools ribbon, with an option to convert the List to a Table, because that’s what we normally do in Power Query to be able to consume list data.

Now we can start to combine these concepts and techniques. The following query script creates a list of two records, each containing two fields with corresponding values:

//-------------------------------------------------------
//#4: Object Types: List of Records
//-------------------------------------------------------
let
    Source =
        {
            [Column1 = "Value 1", Column2 = "Value 2"],
            [Column1 = "Value 3", Column2 = "Value 4"]
        }
in
    Source

The designer shows two records as links. This is because these are not consumable objects. To use them, you can either convert them to a table or expand one record by clicking the link to view the values.

To peek inside a record without adding a new query step, you can click in a cell (but not on the link) and the value will be displayed in the preview area at the bottom of the screen.

The first time you see a link presented in this way, you are likely to click on it to see what happens. No damage done, but this will generate a step that produces a single record and you will lose sight of the entire list. To go back, just delete the generated step in the Applied Steps list.
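
For reference, the step generated by clicking a record link is simply positional item access on the list. Using query #4 above, it would look something like this (the generated step name may differ):

let
    Source =
        {
            [Column1 = "Value 1", Column2 = "Value 2"],
            [Column1 = "Value 3", Column2 = "Value 4"]
        },
    // drills into the first record of the list (zero-based); delete this step to get the full list back
    Record1 = Source{0}
in
    Record1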

Tables

If you use the Convert to Table button on the toolbar, that will add the steps you see in this query.

The Table.FromList function is a good example of a family of functions that serve a similar purpose. As you look through the functions within the different object classes, you will notice many “From” and “To” functions that convert an object of one type to another. They are easy to use, and their purpose is easy to identify from the name. This one should be obvious: as the FromList function in the Table class, it converts a List type object to a Table. Another example, the ExpandRecordColumn function in the Table class, expands a column of records into separate columns.

//-------------------------------------------------------
//#5: Object Types: Table
//-------------------------------------------------------
let
    Source =
        Table.FromList(
            {
                [Column1 = "Value 1", Column2 = "Value 2"],
                [Column1 = "Value 3", Column2 = "Value 4"]
            },
            Splitter.SplitByNothing(), null, null, ExtraValues.Error
        ),
    #"Expanded Column1" = Table.ExpandRecordColumn(Source, "Column1", {"Column1", "Column2"}, {"Column1", "Column2"})
in
    #"Expanded Column1"

These steps produce the results you see here. We began with a list of two records, each having two fields. After the conversion steps, we have a Table object with two columns, two rows of values and column headers.

Transforming Each Row

When you use the Transform ribbon or menu options to perform almost any transformation on a column, you are in essence saying “I want to perform this action on the specified column in each row of the table.” The key word in this statement is “each”, and this is an important distinction in the following M code, which was generated when extracting the last character of the column named Column2.

An example of this generated query is shown in Query #6 below.

//-------------------------------------------------------
//#6: Object Types: Transform each row
//-------------------------------------------------------
let
    Source =
        Table.FromList(
            {
                [Column1 = "Value 1", Column2 = "Value 2"],
                [Column1 = "Value 3", Column2 = "Value 4"]
            },
            Splitter.SplitByNothing(), null, null, ExtraValues.Error
        ),
    #"Expanded Column1" = Table.ExpandRecordColumn(Source, "Column1", {"Column1", "Column2"}, {"Column1", "Column2"}),
    #"Extracted Last Characters" = Table.AddColumn(#"Expanded Column1", "Right Char", each Text.End([Column2], 1), type text)
in
    #"Extracted Last Characters"

Let’s isolate just that step and examine the M code in the last line of the query:

#"Extracted Last Characters" = Table.AddColumn(#"Expanded Column1", "Right Char", each Text.End([Column2], 1), type text)

The #"Extracted Last Characters" variable is assigned the outcome of the Table.AddColumn function. AddColumn returns the input table with the new column added, so we know the result is a Table type object. The function takes three required parameters, plus an optional fourth parameter that specifies the new column’s data type:

  • A table type object (usually the name of the previous step variable)
  • Name of the new column as text
  • An expression that returns a value for the new column

The column expression in this case is the output of the Text.End function, so you might guess that it produces a text type value. The End function, similar to the RIGHT function in Visual Basic, returns a specified number of characters from the end of a text string. But since Column2 exists in every record of the table returned by the #"Expanded Column1" variable, how do we specify that we want to do this for each record, one at a time? …You got it. We use “each”.

Note the Right Char column in the following results. It contains the right-most character from Column2:

Transforming Columns In-place

Rather than creating a new column containing a value extracted from an existing column, you can also modify the existing column in place. This is done using the TransformColumns function of the Table class. Again, we’re looking at the last step of the following query, which was generated by using the Extract option on the Transform ribbon:

//-------------------------------------------------------
//#7: Object Types: Transform each column in place
//-------------------------------------------------------
let
    Source =
        Table.FromList(
            {
                [Column1 = "Value 1", Column2 = "Value 2"],
                [Column1 = "Value 3", Column2 = "Value 4"]
            },
            Splitter.SplitByNothing(), null, null, ExtraValues.Error
        ),
    #"Expanded Column1" = Table.ExpandRecordColumn(Source, "Column1", {"Column1", "Column2"}, {"Column1", "Column2"}),
    #"Extracted Last Characters" = Table.TransformColumns(#"Expanded Column1", {{"Column2", each Text.End(_, 1), type text}})
in
    #"Extracted Last Characters"

Here is the last step isolated:

#"Extracted Last Characters" = Table.TransformColumns(#"Expanded Column1", {{"Column2", each Text.End(_, 1), type text}})

Instead of a column name being passed to the Text.End function, we see an underscore. This is just shorthand for the value of the column currently being transformed.
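
As a side note, “each” is just shorthand for declaring a one-parameter function, and the underscore is that parameter. This throwaway query (my own example, not generated by the designer) shows the two equivalent forms:

let
    // shorthand form, as generated by the designer
    ShortForm = each Text.End( _, 1 ),
    // the same function written out explicitly
    LongForm = (_) => Text.End( _, 1 ),
    // either function returns "s" for this input
    Result = ShortForm( "Lemons" )
in
    Result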

Creating a Table from Records

Now back to completely hand-written code… This one’s easy:
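Here is a sketch of such a query, built with the same product records used in the final example below:

let
    Source = Table.FromRecords(
        {
            [Product ID = 123, Product Name = "Oranges"],
            [Product ID = 124, Product Name = "Lemons"],
            [Product ID = 125, Product Name = "Apples"]
        }
    )
in
    Source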
By default, new columns are data-typed as “Any”, which is essentially an unclassified data type, and leaving them that way is not a best practice. Columns should be strongly typed so that grouping, aggregation and DAX functions work correctly in the data model.

In the following query, we apply the TransformColumnTypes function from the Table class. The function requires a table and at least one column name and data type reference. You can pass any number of columns to this function, each encapsulated with set braces, and with another set of braces around the whole thing.
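
A sketch of that query might look like this, with the type transformation added as the last step (I’m assuming Int64.Type and text as the column types here):

let
    Source = Table.FromRecords(
        {
            [Product ID = 123, Product Name = "Oranges"],
            [Product ID = 124, Product Name = "Lemons"],
            [Product ID = 125, Product Name = "Apples"]
        }
    ),
    // one {column name, type} pair per column, wrapped in an outer set of braces
    #"Changed Type" = Table.TransformColumnTypes(
        Source,
        { {"Product ID", Int64.Type}, {"Product Name", type text} }
    )
in
    #"Changed Type"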

Here, you can see the results: one table with headers, two columns and three records. This all looks good.

Creating a Table from a Set of Records

Now, we’re going to start putting things together by using object type variables. I think you’ll agree that the code is much easier to read.
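
A sketch of the variable-based version, reusing the record variables that appear in the final example below, looks like this:

let
    Rec1 = [Product ID = 123, Product Name = "Oranges"],
    Rec2 = [Product ID = 124, Product Name = "Lemons"],
    Rec3 = [Product ID = 125, Product Name = "Apples"],

    MySet = { Rec1, Rec2, Rec3 },

    MyTable = Table.FromRecords( MySet )
in
    MyTable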

And we see similar results as before but with a much cleaner approach and readable code.

Extracting a Record from a Table

As a final assignment, I need to get just one record, so I essentially need to unwind the objects that were used to assemble this table. I want lemons, so I need to figure out the ordinal position of that record within the table’s record collection.

Since there are three records, an ordinal (index) number is automatically assigned to each. If you don’t already know whether the index starts at zero or one, sometimes you just have to test it. It turns out that all of these collections are zero-based, so the second record in the table (Lemons) is at index 1.

Here’s the syntax:

let
    Rec1 = [Product ID = 123, Product Name = "Oranges"],
    Rec2 = [Product ID = 124, Product Name = "Lemons"],
    Rec3 = [Product ID = 125, Product Name = "Apples"],

    MySet = { Rec1, Rec2, Rec3 },

    MyTable = Table.FromRecords( MySet ),
    MyLemons = MyTable{1}

in
    MyLemons

So what’s all the fuss… what does this actually do?

I’m glad you asked. Let’s take a look.
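
If everything behaves as expected, MyTable{1} returns the single record for Lemons, which the preview displays something like this:

[Product ID = 124, Product Name = "Lemons"]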

Isn’t she a beauty?

It’s the best looking lemon I’ve seen all day.

Now you’ve seen the mechanics of how M language functions can be used to perform very simple value manipulations and transformations. More importantly, you have learned about the relationship between important objects like Values, Records, Lists of Values and Lists of Records, oh – and let’s not forget – Tables. That’s the one that actually does stuff in a data model.

I hope this has helped you broaden your horizons with Power Query and the M language. Please comment with your feedback.

Chat with Donald Farmer at PASS Summit 2019

Enabling the modern data analyst by becoming shopkeepers rather than gatekeepers of information

I appreciated the opportunity to sit down and talk with one of my heroes in the industry. Donald Farmer has been a recognized leader in the Business Intelligence and Analytics community for the past two decades, and his thought leadership is even more relevant today. He played a key role in shaping the Microsoft BI toolset as a Principal Program Manager from 2001, when the products we use today were in their infancy. He has also worked outside of Microsoft circles, which gives him a broad and unique industry perspective: as the Vice President of Innovation and Design at Qlik Technologies, he brought Qlik Sense to market. He remains a friend and advocate for the community. Donald holds multiple patents and has published several books. Today he is doing his own thing as the Principal of TreeHive Strategy.

The chat was 9:41 long and is worth watching every second. I have added captions to highlight a few key points which are also listed below. Please enjoy.

Key Points

Route Knowledge is a view of information at a micro level, like turn-by-turn directions. This might be important when you only need to answer a single question without a broad understanding of the entire landscape.
By contrast, Survey Knowledge is a view of information in the context of the bigger picture. Someone with Survey Knowledge knows “how the whole thing hangs together”.

Serving the needs of the modern Business Data Analyst requires a change in attitude. We should become provisioners of information rather than protectors of information. In other words, become shopkeepers rather than gatekeepers.

Actionable data is not always detail data; consuming a summary can be easier, and often more useful, than getting lost in the details or in the accuracy and precision of a number. Drilling to detail in context is a natural part of the exploration process. He talked about a pattern in the process of providing actionable business information, which includes these stages:

  1. Orientation
  2. Glimpsing
  3. Examination
  4. Acquisition
  5. Action

I appreciate Donald taking the time both to speak at PASS Summit and to chat with me for this blog series. Please watch my blog for additional interviews and insights from PASS Summit 2019.