After publishing the original daily COVID-19 cases report on March 14, Johns Hopkins University changed the file format. This required us to publish an updated dataset with a new report, which you can access here.
Since the original solution (introduced in an earlier post) was published, I have received a tremendous amount of feedback, suggestions for enhancements, and corrections. Members of the Microsoft MVP program have been a great help, and several of my colleagues from Pragmatic Works have jumped in to add their insights and design support. We’re working on a Power BI app template that can be installed into a Power BI tenant. In the interim, you can access the publicly accessible report through this link.
Using the report, we can track the daily progression of confirmed cases, recovered cases and deaths by country. Where available, these metrics are also broken down by state or province, and by county for the US.
The current project and future updates can be accessed using this GitHub repo.
March 24 update – This post will be updated frequently for the next few days. Please watch for updates.
This is a Power BI report (link) I have developed and published with public access to the web, to be shared with anyone who needs this information. It contains daily updates from the Centers for Disease Control and Prevention (CDC) using data curated by the Johns Hopkins University Center for Systems Science & Engineering. To the best of our collective ability, the data is accurate, but I cannot make any guarantees. Please validate with other sources before making any decisions based on this information.
Additional enhancements and contributions are being made by Microsoft MVPs and community members:
After the initial version, my colleague Robin Abramson spent late evenings and a weekend helping work through design details. I appreciate members of the Microsoft MVP community, Reza Rad and Miguel Escobar, stepping in to help with query updates to get scheduled data refresh working.
I’m very hopeful that this report will be a valuable resource. It’s been a labor of love and considerably more work than I envisioned, but I will continue to work on enhancements and corrections as I am able, based on feedback. I started working on this project to help a consulting client understand how the virus outbreak is affecting their customer order shipment delays and materials supply chain. That grew into an off-the-clock side project, demanding nights and weekends to get this far. Now, I hope we can use this information to proactively respond to this threat.
Please post comments here or contact me through Twitter if you have feedback, comments or questions.
The CDC and WHO began collecting COVID-19 case information from various sources on January 22, with the latest count of confirmed cases, recovered cases and deaths recorded by country, state or province. Johns Hopkins University collects this data every day and stores the files in a publicly accessible GitHub repository. On March 1st, they began geocoding the location for each case, where available, with latitude and longitude. Location information is sparse but available frequently enough to observe trending.
Pete Gil at Pragmatic Works initially discovered this data source from another report published at worldometers.info. He scraped their web page and created an attractive Power BI report with the latest daily numbers. Springboarding from that project, I went back to the source files and created this new data model with daily snapshots and cumulative updates.
Watch for updates (where I’ll explore the design and more details), but here is a quick tour of the initial set of report pages based on my published data model:
The first page provides some background information about data sources, credits and a report page menu:
Use the bookmark buttons to navigate to each page. You can also use the page number navigation buttons below the report.
The three measures displayed at the top of this and other pages show the latest counts, as of the highest selected date in the range. Use the range slicer to limit the time-series charts and to set the “as of” date for the latest measures (Confirmed, Recovered and Deaths).
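For reference, an “as of” measure of this kind can be written in DAX roughly as follows. This is a sketch, not the published model’s actual definition, and the table and column names (DailyCases, Confirmed, 'Date'[Date]) are hypothetical:

```dax
Confirmed (as of) =
CALCULATE (
    SUM ( DailyCases[Confirmed] ),
    LASTDATE ( 'Date'[Date] )    -- the latest date visible in the slicer-filtered range
)
```

Because LASTDATE respects the filter context, dragging the range slicer automatically moves the “as of” date for all three card measures.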
Right-click the line/area or stacked column charts to drill-through to details for a specific date.
The Global Cases page displays the aggregate case counts by country and for known locations. You can switch between the three measures using the radio button slicer. This changes every value on this page to use the selected measure.
On every page, you can narrow the view of countries using the Country Region drop-down list slicer. Use this slicer to deselect countries that have a high number so you can view and compare other countries. Hold Ctrl to select and deselect multiple items from the slicer item list.
The Country shape map definition was created by David Eldersveld. I’ve made some modifications to accommodate country names provided by the CDC.
I have created three separate pages with Country/State & Province maps. Only a limited number of shape map files are available in Power BI, so I have selected the US, Canada and Australia for now.
Either use drillthrough or navigate to the Detail Matrix page. The matrix shows the progression of the selected measure over time within a region. Expand the geography hierarchy to view details by states or provinces for a country. The date range slicer in the top-right can be used to control the range of dates displayed as columns. Within the scope of the displayed data, the largest values are displayed with graduating shades of red.
To narrow the comparison, use the Country Region slicer to filter by country and change the scope of the conditionally colored cells. This allows you to remove irrelevant regions and focus on those of interest.
The Novel COVID-19 Coronavirus outbreak is a serious matter that is affecting our world in ways that we are only beginning to understand. If we can use this data to better understand what is happening, maybe we can use this information to mitigate the effects of this global event.
What questions do you need to answer and how do you need to use this information?
How can we look at it differently to provide better insight?
How do you need to correlate the state of cases with other data to make decisions and forecast outcomes?
At the time of this post, the world is dealing with a serious health and economic crisis. The COVID-19 coronavirus is impacting the lives of people around the world and, in turn, it is affecting world markets and industries in many different ways. For example, I am working with a consulting client whose material shipping and supply chain are being impacted by the outbreak, and they need to quickly respond by making order changes and logistics choices. Forecasting and planning analysts must make adjustments to help the company prepare for these impactful changes.
This demonstration shows you how to create a multi-layer map to correlate current outbreak case locations with your own data, using Power BI and the ESRI map visual. I’m using sample data for demonstration, but this is the same technique I am using for our client. In the real data set, the correlation is striking where shipping orders are being delayed and cancelled in the areas most affected. For more, visit my blog at SqlServerBiBlog.com.
The technique used in this map report is relatively easy to implement because both data sources are separate feeds to the map service. There are different ways to correlate map data from two different sources. In our solution, we are also integrating the CDC data into the data model, which will allow us to perform comparison calculations. Using AI and machine learning, we may be able to perform predictions and associations.
This post demonstrates how the order of steps added to a query can make a big performance difference and drastically affect the number of steps generated by the designer. I’ll demonstrate how to use the new Query Diagnostics tools to compare and understand query performance.
The Power Query Editor for Power BI simplifies data transformation processing by generating query steps for each action you perform in the query designer. This whiteboard diagram shows the high-level flow of information through a Power BI solution. Every query has a source (“SRC” in the diagram) followed by a connection. The query consists of a series of transformations (“XForm”) prior to populating a table in the data model.
These steps are defined in “M” code, which is executed when the data model is processed. In simple projects, all the query steps are automatically generated. The order in which you add these steps makes a difference. Not only does it help organize and manage a query, but it can have a significant impact on performance and the computer resources needed for a query to run. A little planning and iterative clean-up as you work through the design process can make a big difference.
The two queries shown here have exactly the same outcome and they were both created just by choosing transformations from the query designer menus. The only difference is the order that I chose the options.
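To illustrate the idea, here is a sketch in M of the consolidated approach: one step renames all of the columns and a second step sets all of the data types, rather than one generated step per column. The file path and column names are hypothetical, not taken from my actual query:

```m
let
    Source = Excel.Workbook(File.Contents("C:\Data\Sales.xlsx"), null, true),
    Sheet1 = Source{[Item = "Sheet1", Kind = "Sheet"]}[Data],
    // one step renames every column...
    Renamed = Table.RenameColumns(Sheet1,
        {{"Column1", "OrderDate"}, {"Column2", "Region"}, {"Column3", "Amount"}}),
    // ...and one step converts every data type
    Typed = Table.TransformColumnTypes(Renamed,
        {{"OrderDate", type date}, {"Region", type text}, {"Amount", type number}})
in
    Typed
```

The one-at-a-time approach produces the same result, but with a separate Renamed Columns and Changed Type step for each column, and each generated step is another pass the mashup engine has to evaluate.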
Introducing Query Diagnostics
To understand how query steps are being processed and to compare two test queries, I use the new Query Diagnostics features on the Tools ribbon. In this simple test, this is really easy.
I select a query in the designer, start the diagnostics, perform a refresh and then stop the diagnostics. This generates two new queries with the diagnostics results.
I then choose the other query and repeat the same steps to get diagnostics for that query.
There is a boatload of useful information in the diagnostic results query but it’s way more than we need.
The most important information for this test is the Exclusive Duration column. For this test, all I need is to summarize this column. I did the same thing with both diagnostic queries and then compared the two results. Appending these two summarized diagnostic query results clearly shows the difference in performance:
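The summary itself is a single Group By over the generated diagnostics query. Sketched in M (the source step name here is hypothetical):

```m
// group with no key columns to total Exclusive Duration across the whole query
Summarized = Table.Group(
    DetailedDiagnostics,
    {},
    {{"Total Exclusive Duration", each List.Sum([Exclusive Duration]), type number}}
)
```

Doing the same thing to both diagnostics queries and appending the two one-row results produces the side-by-side comparison.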
This video demonstration is an exaggerated yet effective example of working through the process of importing a simple Excel worksheet and then transforming a series of columns. In the first example, I rename and change the data type of each column, one at a time. In the second example, I consolidate the steps, renaming all of the columns in one step and then changing the column data types in another. How does this simple change to my approach affect the generated query and execution performance?
When the new Power BI service activity logging API was announced over the holidays, I was intrigued and anxious to start working with it. I’ve had some experience with report usage monitoring using the existing Office logs and usage metric reports, which do provide some useful information but can be a chore to use. Activity monitoring and troubleshooting with the new logging API is focused entirely on Power BI tenant events like dashboard views, interactive and paginated report views, deployments, errors and data refresh events. This should be easier than before, enabling admins to be more proactive by tracking usage patterns. In a short series of blog posts, I’ll demonstrate how to build a complete activity logging and reporting solution for your entire Power BI tenant. In this first post of the series, we’ll just get started with the basics by capturing a few log events for a brief window of activity and surfacing them in a simple report.
Before we get started, a brief history lesson is in order:
When the Power BI cloud-based service was initially offered back in 2013 as a feature extension to Office 365, it used SharePoint Online as the storage architecture. Some additional logging events were added to the existing and numerous Office 365 event catalog. It has always been cumbersome to find relevant reporting information amid all of the other noise in these logs. Components of the Power BI back-end services have since been migrated into a more specialized service infrastructure, but the activity logging remained in Office 365 until December 2019. The Office logs required special privileges within the Office tenant and produced volumes of event data related only to folders, files and Office documents.
The new Power BI Activity Log API is specially suited and optimized just for Power BI. By contrast, it will be much easier to identify and track relevant service and user activity for workspaces, apps, dashboards, interactive Power BI reports and paginated reports in the Power BI service.
I envision that my production-scale logging solution will use an orchestration tool like Azure Data Factory to iterate over historical activity logs, store log files to Azure Data Lake Storage and then incrementally update a Power BI data model for reporting and analysis. This first example will use PowerShell script manually executed from my desktop.
PowerShell Me, Baby
The new Get-PowerBIActivityEvent cmdlet has been added to the Microsoft Power BI Management library. Install the latest version to gain access to the activity logs.
In my project, the first step is to open and run the PowerShell ISE as a local administrator. To install the latest Power BI Management library locally, I execute this code:
Install-Module -Name MicrosoftPowerBIMgmt.Admin
I need to sign in to my Power BI tenant with a service or user account that has tenant admin privileges. This code opens a standard login dialog and prompts for a user name and password, populating a Credential object variable used to open a connection to the Power BI service:
$Cred = Get-Credential
Connect-PowerBIServiceAccount -Credential $Cred
In the Power BI product team’s Developer blog, Senior Program Manager Kay Unkroth explains that the Get-PowerBIActivityEvent cmdlet can be called with date/time parameters to include only one day at a time. This line requests all activity on January 5th, 2020, caching the activity log information as a JSON structure:
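The call looks roughly like this; the output file path is hypothetical, and by default the cmdlet returns the day’s events as a JSON string that can simply be written to a file:

```powershell
# request one full day of tenant activity and cache it as a JSON file
$Start = '2020-01-05T00:00:00'
$End   = '2020-01-05T23:59:59'
Get-PowerBIActivityEvent -StartDateTime $Start -EndDateTime $End |
    Out-File -FilePath 'C:\Temp\PBIActivityLog-20200105.json'
```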
Importing this file into Power BI Desktop produces the following simple table:
A couple of important things to point out
The API is optimized to handle large numbers of events. As such, it is limited to returning records for a range of time up to one full day, using the StartDateTime and EndDateTime parameters. The web service returns a continuation token to let you know if there is more data beyond a fixed frame size, which will typically return about 5,000 to 10,000 records.
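If you call the underlying REST endpoint directly rather than the cmdlet (which handles paging for you), the continuation pattern looks something like the following sketch. The property names reflect my reading of the Admin API response, so treat the details as an assumption rather than a definitive implementation:

```powershell
# page through one day of events via the admin REST endpoint
$url = "admin/activityevents?startDateTime='2020-01-05T00:00:00'&endDateTime='2020-01-05T23:59:59'"
$events = @()
do {
    $response = Invoke-PowerBIRestMethod -Url $url -Method Get | ConvertFrom-Json
    $events  += $response.activityEventEntities
    $url      = $response.continuationUri   # null once the last frame has been returned
} while ($null -ne $url)
```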
Incidentally, I’ve played with a few different file formats. JSON is by far the most flexible format, but you may not get all the key/values you want just by consuming the results right out of the box, without parsing all the nested levels. In Kay’s article, he uses the ConvertTo-Json directive to flatten the native JSON document into a more conventional colon-delimited log flat file with named key/value pairs. Using this method, I was able to get field information that had apparently slipped through the cracks in the JSON document, although I had to massage the output a bit and then transform the rows into columns using some fancy pivot dancing in Power Query.
This simple report is a first effort showing records for a short time frame. I’ve produced a single table with some recognizable grouping and slicing attributes, but we can do considerably more. Using these fields, I can analyze activities for only dashboard, Power BI report or paginated report views. We can filter by user, object type, operation type, the web browser or device used to view content, type of workload, or the containing workspace.
In a fully-fleshed-out data model, some of the attributes might exist as separate dimension/lookup tables. But, this is enough for now.
Please share your questions and your own experience with the activity logging API, and watch for subsequent posts about tenant monitoring and activity management solutions.
As I read the many posts from those in the community who I follow, I am reminded that the community brain trust is much greater than any individual. As a writer and blogger, I’m occasionally compelled to express an original thought or opinion that I think is uniquely my own. However, we work in a world where everything comes from somewhere and there are many contributors who I trust and rely upon for advice and cutting-edge information. This “corner” of my blog is to highlight these community contributions that I find informative.
James Serra, Microsoft Solution Architect and former Data Platform MVP, continues a deep exposé of Azure Synapse Analytics with the sixth post in the series. This new Azure service, headlined at both Ignite and PASS Summit and currently in preview, is the evolution of the modern data warehouse. Azure Synapse Analytics is an orchestration of services including Azure SQL Data Warehouse, Databricks and Azure Data Lake Storage Gen2. It will be an important consideration for serious cloud-based BI, analytics and data warehouse solutions at enterprise scale.
Azure Synapse Analytics & Power BI performance
Azure Synapse Analytics new features
Azure SQL Data Warehouse Gen2 announced
Azure SQL Database vs SQL Data Warehouse
What is Microsoft Azure Stream Analytics?
Azure Synapse Analytics & Power BI concurrency
Marco Russo, a name synonymous with DAX and BI expertise, captures what happened in the DAX world in 2019 in an aptly-named blog post: “What has happened in the DAX world in 2019” :-). He also writes “I’ve seen the future, and it’s bright – but unfortunately, it’s under NDA!” and goes on to describe some of the announcements expected in the next year and at major conference events.
David Eldersveld, Data Platform MVP, writes about improvements to the Power BI theme capabilities. In addition to the new design ribbon in Power BI Desktop, themes can now be exported from the designer. Adding to his collection of reference material that I have found valuable in my Power BI toolbelt, David posted this data color reference to assist with color selection for reports.
The new Power BI Activity Log was announced this month. This will make it easier to capture and monitor user and report activity. It also simplifies Power BI tenant administration by isolating report activity from Office 365 events and other log events. Power BI started out as an extension of Office 365 and SharePoint Online services but not all organizations use or manage Office 365 and Power BI under the same administration role. Microsoft continues to deliver on the promise to provide comprehensive automation APIs and tooling for administration.
The consistent contributions of Guy In A Cube’s Adam Saxton and Patrick LeBlanc are too numerous to mention. Notably, they were awarded the “Most Helpful Data Video Channel” by Data Literacy Awards. Data Literacy LLC is a Seattle-based training and education company founded by Ben Jones.
As I continue to explore and develop best practices for managing serious business-scale Power BI solutions, I’m having conversations with recognized community leaders. Last month I chatted with Ásgeir Gunnarsson on the SQL Train ride from Portland to PASS Summit in Seattle. Ásgeir is a data platform MVP and seasoned Business Intelligence expert from Reykjavik, Iceland who works as Chief Consultant for Datheos, a Microsoft-focused BI and Analytics consultancy in Copenhagen. He leads the Icelandic Power BI User Group and PASS Chapter.
Ásgeir talked primarily about the development life cycle for projects centered around Power BI, and about data and object governance. As I’ve mentioned in my earlier posts on this topic, the development experience for BI projects in general is different from application development and database projects, and you cannot use the same management tools – at least not in the same way. He promoted using OneDrive for Business to manage version control.
He shared several excellent resources, many of which I either use or have evaluated, to help manage Power BI projects. The ALM Toolkit is a useful tool for comparing objects in great detail between two PBIX files. Ásgeir also showed some efforts from community contributors to automate change-tracking, file-level source control (which really made the point that it’s a difficult thing to do with Power BI). We know that Microsoft is working on an integrated release management solution for the Power BI service, which may reduce or replace the need for existing tools.
Regarding governance and security, he made reference to the extensive Microsoft whitepaper, Planning a Power BI Enterprise Deployment. He steps through diagrams that help simplify each of the important processes and tasks for developing, deploying and managing Power BI solutions.
If you need to manage Power BI solutions, I encourage you to review his presentation. You can connect with Ásgeir on LinkedIn and Twitter.