If you are fortunate enough to be attending the Microsoft Data Insights Summit in Seattle along with about 2,000 others on June 12 & 13, I’ll look forward to seeing you there. This will be a great community event and a chance to meet, network and learn from well-known speakers, authors and product team leaders responsible for the best data analytics tools in the industry. The summit has been sold-out for a few weeks. If you can’t make it to Seattle to be there in person, you can still attend the conference! I don’t just mean that you can watch a few cherry-picked sessions and keynote addresses… You can watch every session for free. Selected sessions, which do include the keynote addresses, will be streamed lived during the conference but all of the sessions are recorded and will be available within two weeks after the last sessions wrap-up on June 13. This an awesome learning experience. I will post links and more details here about the recorded sessions after they become available.
If you are going to the summit, please attend my two sessions:
Monday, June 12, at 1:00 PM
Power BI Solution Options and Approaches
Power BI is a terrific self-service data analysis tool but it is also an enterprise-class reporting and dashboard tool for serious business data stored in a data center. Which options are best and most cost-effective as you plan your solutions? Learn how to build serious, secure, and scalable business solutions with on-premises data sources, SQL server, analysis services and other business systems. You’ll also see how you can architect solutions with security and user-access control utilizing groups and content packs.
(BI Pro / 200 LEVEL / Power BI, SQL Server Analysis Services, SQL Server Reporting Services / Administration and Governance)
Tuesday, June 13, at 10:30 AM
Advanced Data Transformation Techniques
Learn how to solve challenging data import and cleansing problems with Power Query and M functions. This session covers advanced data transformations and shaping techniques for unique data formats and structures. See how to transpose and pivot data, and work with structured data sources like XML & JSON. We will explore the “M” Data Mashup language, create and use query functions and parameters to create durable & manageable automated data refresh processes.
Co-presented with Brian Grant from CSG Pro
(BI Pro / 300 LEVEL / Excel, Power BI / Data Prep and Modeling)
Here’s a sneak peek at the material from both sessions. You’ll be able to download the slide decks when the sessions are available to view online.
In “Power BI Solution Options“, I’ll introduce different ways Power BI can be used with enterprise data & on-premises data sources. Using Power BI to create self-service data models is a great capability. In the right hands and used in the right way, it can be a valuable business asset. Used incorrectly and in the hands of inexperienced users, good data can product bad results. How can you prevent a train wreck?
Power BI, used as a client tool for managed and governed data sources, whether managed on-premises or in managed cloud services, can provide a solution with the best of both worlds; control and governance over enterprise data and flexibility for reporting and analysis.
The Power BI platform provides all the components needed to manage analytical data; including queries for data transformation and shaping, data modeling, calculations and visualizations.
In enterprise-scale solutions, Power BI is a great data presentation tool but may just be the tip of the iceberg. Below the surface, we may use SQL Server Analysis Services to manage models and calculations, and data transformation tools – such as SSIS -for data acquisition and transformations. Formal solutions may incorporate an enterprise data warehouse, data marts and master data management.
Categorically, there are four options to use Power BI with enterprise data. These include:
- Using scheduled refresh with a conventional embedded data model, a cache of data is stored in the Power BI model and refreshed at regular intervals.
- Direct Connect uses an on-premises or cloud-based Analysis Services data model. Larger data volumes are managed and secured within the model and calculations are stored within a centrally-managed repository. This option often provides the best balance between IT governance and reporting flexibility.
- DirectQuery allows data to remain in SQL Server or another supported on-premises database platform. This is a good option for solution teams having strong relational database skills who don’t need complex calculations or the flexibility of a tabular or multidimensional model.
- Developers and solution architects can stream data directly to Power BI dashboard tiles through a REST API. Streaming can be performed from other data streaming data sources and using the streaming service available through Power BI Premium.
I will demonstrate using Direct Connect and DirectQuery using SSAS and SQL Server on-premises through the On-premises Gateway.
In the session titled “Advanced Data Transformation Techniques”, I’ve teamed-up with Power Query guru, Brian Grant. I love watching Brian teach people how to use these tools, which he knows back-and-forth and always has interesting analogies and object lessons to make learning M and Power Query easier.
These are the topics we are covering in our 50 session. It will be lightening fast and nearly all live demos!
First the basics… real quick. When you use menu options in the Power BI Query Editor, the design tool actually generates steps as M script.
One of the most powerful features of the Power Query (officially called “Get Data” feature) is the ability to preview data at the completion of any step in a query. Essentially, you have the ability to rewind and play-forward each of the query steps and see the data as it will exist after each transformation step. This rapid design preview experience allows you to effectively build queries in a highly-interactive fashion.
Here are three common examples of data that would need to be transformed prior to consumption in a tabular data model.
We show you how to transform structured data in a JSON document .
How can you pivot Data? Get it… Pivot Data! Corny, I know. There are actually several different ways that data can be transposed, pivoted or unpivoted. In short, transposing means that all rows become columns and all columns become rows. …easy. To Pivot data, we must establish an anchor value of some kind (like the push pin in the corner of the Lt. Commander Data picture ) that will bind like-rows together so other rows can be rotated and transformed into columns. We’ll show you a practical example by extending the analogy.
Error handling is a reality in most any data project and there are several approaches that may be appropriate depending on your data quality requirements. We share a simple flow chart for deciding how to handle and manage erroneous data.
The bottom line is that we must either deal with the erroneous data or not. If he erroneous rows can be transformed and corrected, life is bliss and we ca move on. In a financial accounting system where every record must e accounted for, efforts would be made to find and correct all te errors. If a sampling of rows aren’t worthy to pass the transformation test, and loosing them would not significantly affect the statistical outcome, that may be an acceptable loss. In that case, removing the noise caused by the erroneous records would be the better option.
We will explore the “M” or Data Mashup formula language and demonstrate using some useful examples.