sharing my experiences with the Microsoft data platform, SQL Server BI, Data Modeling, SSAS Design, Power Pivot, Power BI, SSRS Advanced Design, Dashboards & Visualization since 2009
The message from the Microsoft Business Applications Summit this week was very clear: Power BI is growing up.
We have known for a while that Power BI is a great front-end tool for enterprise-scale tabular and multidimensional models. Power BI Premium will now allow almost limitless data scale using extremely large datasets by leveraging composite models that employ both DirectQuery and VertiPaq cached mode tables. This feature shatters the previous barrier where a data model could support either DirectQuery or cached mode (imported data), but not both. Additionally, Microsoft publicly unveiled plans to add IT developer-centric features to Power BI Desktop for managing large models, version control and the application development lifecycle.
These announcements leave many open questions about how Microsoft will continue to support self-service and enterprise customers with one tool and about the long-term future of SSAS and Visual Studio-based SSDT. At this point, none of these tools are on the chopping block, but it is clear that James Phillips and his team have significant momentum and are not slowing down to wait for other products to catch up.
In this interview with Christian Wade, he describes the benefits of this new capability. Christian is Group Program Manager on the Power BI and Analysis Services product team, with a focus on data model development and scalability features.
Composite Models & Aggregations
In the opening keynote and again in his sessions, Christian demonstrated Power BI reports on the taxi driver activity database with over a trillion rows of raw data. The larger dataset was in a Spark cluster, accessed using DirectQuery. Aggregated tables were stored in the in-memory model using the new composite model feature. As the data was explored in report visuals, the Power BI engine would seamlessly switch from tabular in-memory aggregate tables to the DirectQuery source data in order to return low-level details. Composite models will allow mashing up imported database and file-based data with DirectQuery sources.
There are limits and complexities with these new features. You cannot mash up imported tables in a Power BI model based on a direct connection to SSAS, but the new enterprise-scale features arguably mean a solution architect may no longer need to choose SSAS over Power BI for serious data modeling. With incremental data refresh, large model support, row-level security and many other “big kid” features, Power BI might be a preferable choice. I’m not ready to rule out Analysis Services as the better option for most enterprise solutions – at least not in the near future – but Power BI is definitely heading in that direction.
Power BI Desktop Features for Developers
News Flash: Power BI Desktop will be the primary tool for serious, enterprise data model development. A bunch of features are in the works and will arrive soon for preview. The obvious question is what will happen to SSDT for Visual Studio, and the answer is that it’s not going away, but the product team is putting its energy into enhancing Power BI Desktop for now. It will likely have some kind of “Beginner/Advanced” switch to simplify things for the novice user and isolate advanced or complicated, developer-centric features.
Here are a few screen captures of the forthcoming features:
Schema Compare & Model Integration
Similar to the BISM Normalizer tool created by Christian Wade, a fully supported compare-and-merge tool will be baked into Power BI Desktop. The early build is branded with components from MAQ Software.
Relationship Diagram Designer, Properties Pane and Application Lifecycle Management Features
A new-and-improved diagram designer is imminent. It will allow large models to be more effectively managed and displayed. You will have the ability to save individual views of the diagram with selected, subject-area tables.
Like in Visual Studio, properties can be set and managed in a Properties sheet window. Support for standard SSAS features like display folders is coming and can be managed here.
Business Applications Release Notes
With the velocity of features and updates, a comprehensive set of release notes is available here. This document will be updated frequently with the revised roadmap, feature history and announcements.
APIs and PowerShell Cmdlets for Power BI Administration
At the MVP Summit in March, Kay Unkroth promised API endpoints and PowerShell cmdlets for managing all aspects of Power BI. He even did a dance and sang a song to make the announcement memorable. This week, the product team delivered what they promised, with the official announcement here.
The following features and capabilities are on the delivery short list coming in the next 2-3 months:
Power BI Service
The new “modern workspace” will eventually replace the current workspaces integrated with Office 365 groups. The new app workspace will no longer be bound to an Office 365 Group. The feature will be optional at first, but will replace current workspaces soon.
Now, users can be: Member, Admin, Contributor
Can add groups to workspace
Premium capacity tenants can have nodes in different regions
Admins will have visibility and access to all objects and resources
Microsoft Business Applications Summit
Keynotes and Featured Sessions from the Microsoft Business Applications Summit are here.
Since starting the Guy In A Cube series over three years ago, Adam Saxton has become the front man for Microsoft Business Intelligence feature announcements and Power BI how-to tutorials. Joined by Patrick Leblanc last year, the Guy In a Cube series features over 200 short video tutorials with at least two new clips added every week. The Guy In A Cube YouTube channel currently has over 13,000 followers.
I always look forward to spending time with both of these guys at community events. I first met Adam several years ago when he was a Senior Support Escalation Engineer for Microsoft who helped with a number of tough BI server setups. Patrick did senior-level consulting work for customers in the field before joining Microsoft as a Data Platform Solutions Architect. Adam and Patrick are natural-born entertainers. With their deep knowledge and expertise in database technology, security and the entire BI, analytics and reporting stack, they offer one of the top online resources for learning and keeping up with the rapidly expanding Microsoft BI platform.
At conferences and events, I rarely see Adam without a camera in his hand, so it was a real treat to get them in front of my camera for this interview. We chatted about how they got started and how they continue to use their series to highlight new and important features, and to provide guidance to support the Microsoft BI community.
In this interview during the 2017 PASS Summit, we continue to explore the topic of this series of blog posts on Microsoft BI solutions for the enterprise. Patrick and Adam share their guidance about managing large-scale solutions, version control and multi-developer projects using Power BI, Analysis Services and SQL Server.
Between sessions at the PASS Summit, I had the privilege of interviewing Alberto Ferrari and Marco Russo; SSAS Maestros and BI industry thought leaders. Alberto and Marco are revered as expert trainers and practitioners of SSAS and Power BI solutions. They are also well known for evangelizing best practices using the DAX calculation and query language. We spoke about some of the challenges and learnings they have gained from years of large-scale project implementations.
Both SSAS Tabular and Power BI Desktop share the same characteristic in that the semantic data model definition is described in a single file. Compared with older multidimensional model projects, the single file approach simplifies much of the development effort but it can be challenging to coordinate projects with multiple developers. They shared tool recommendations to assist with project integration, version management and development. Marco and Alberto recommend a number of community supported add-ins and other freely available software to enhance Visual Studio and Power BI Desktop.
I sincerely appreciate these two gentlemen taking the time to share their insights in this short interview, and their many contributions over the years that continue to improve Microsoft’s industry-leading Business Intelligence platform and the community that supports it. A comprehensive library of resources, training, tools and articles is available at SQLBI.com.
During the interview we talked about several different tools that they adamantly endorse for any SSAS developer to have at their disposal. From my own experience, in rough order of priority, I recommend:
BIDS Helper / BI Developer Extensions – This Visual Studio add-in is an important tool for anyone developing Business Intelligence solutions in any version of Visual Studio or SQL Server. It includes several essential features and capabilities for SSIS, SSRS and SSAS projects. It has been a community-developed and community-supported tool for many years and is considered by most experienced BI developers to be a core extension of the Visual Studio SSDT (formerly called “BIDS”) add-in.
Community-developed tools like this help to spur future enhancements to the officially-supported capabilities in later Microsoft development tooling. I suggest that you carefully read the documentation for SSAS Tabular features in this tool because some of the options only apply to certain versions of SSAS. Some of the BIDS Helper features for SSAS 2012 and 2014 have since been added to the officially-supported designer for SSAS 2016 and 2017. BIDS Helper was initially maintained on the Microsoft CodePlex site (here: http://bidshelper.codeplex.com/) and has since been moved to GitHub here: https://bideveloperextensions.github.io/. It can now be installed from the Visual Studio Tools > Extensions and Updates menu by searching for “BI Developer Extensions” in the Online section.
DAX Studio – This is a stand-alone application and the most comprehensive DAX, SSAS and Power BI calculation script and query editing and performance tool available. It is an open source project managed and maintained by a collaboration of several community MVPs and industry leaders (coordinators are Darren Gosbell & Marco Russo). The latest version and codebase have recently been moved from CodePlex to http://daxstudio.org.
Tabular Editor – An open source, stand-alone SSAS Tabular model design tool that runs outside of Visual Studio. It is distributed through a GitHub repo at: https://tabulareditor.github.io. The source code is maintained at: https://github.com/otykier/TabularEditor. It is described as a lightweight editor for SSAS Tabular Models built in .NET/WinForms and utilizes the Analysis Management Objects (AMO) library.
DAX Editor – an extension to SSDT that makes it easier to edit DAX measures in a Tabular project. It includes DAX syntax highlighting, DAX formatting, a textual representation of measures defined in a Tabular project, and extracting/importing measures from/to a Tabular model file (.BIM). This add-in is developed and maintained by SQLBI. It can be found in the Visual Studio > Tools > Extensions and Updates menu or downloaded from SQLBI at: https://www.sqlbi.com/tools/dax-editor.
BISM Normalizer – An add-in for Visual Studio/SSDT that can be installed from the Visual Studio Tools > Extensions and Updates menu by searching for “BISM Normalizer”. This is an essential tool if you need to compare multiple SSAS Tabular projects and databases (including workspace databases) and then manage the steps to selectively merge additions and differences between objects. I have found this useful for integrating parallel development work on the same SSAS Tabular project, but I will warn you that if used routinely to merge multiple changes to a single project, it can be a complex undertaking and not something I recommend on a daily basis.
BISM Normalizer is developed and maintained by Christian Wade, formerly an independent consultant and now a Senior Program Manager on the SSAS product team. Although Christian works for Microsoft, this is still considered a community-supported, third-party tool not officially supported by Microsoft. It is, however, the only tool designed specifically to perform schema comparisons and object merging in SSAS.
In our discussion, we also refer to these tools and resources:
VertiPaq Analyzer – An older tool (actually an Excel Power Pivot workbook) created by Kasper de Jonge, Program Manager from the Power BI/SSAS product team, that imports SSAS DMV query results and reports object size, object data compression rates and optimization statistics. It has since been updated and enhanced. This is a convenient way to get information quickly about your SSAS or Power BI model. The same information can be obtained, with a little more effort, by running DMV queries in DAX Studio or SSMS. Links to the original and updated versions are available on the SQL BI site at: https://www.sqlbi.com/articles/data-model-size-with-vertipaq-analyzer/.
DAX Formatter – An online tool for optimizing and reformatting DAX expressions, available from SQLBI, at: http://www.daxformatter.com/
Tabular Modeling in Microsoft SQL Server Analysis Services – a Microsoft Press book written by Alberto and Marco that applies mainly to SSAS Tabular 2016 and 2017 projects. This is an absolutely invaluable reference and tutorial on SSAS design best practices and design techniques. Previous books from our Italian friends include DAX Patterns and The Definitive Guide to DAX; all excellent learning resources that I have in my book library.
Shortly after this conversation, Marco and Alberto delivered two sessions at the PASS Summit conference about DAX best practices and performance optimizations. A little later, Marco sat with Guy In A Cube hosts Adam Saxton and Patrick Leblanc to chat about these sessions. You can find that interview here. I also had a chance to catch up with Adam and Patrick in an earlier interview during the conference, which I will share in a later blog post.
As I visit businesses, consulting clients and training classes to teach data reporting and analytics, there is a recurring conversation. It is one that I have had for eighteen years. The audiences change and the technology implementations are a little different over time, but the essential conversation is still the same.
This happened again last week as I was trying to explain the unique characteristics of Multidimensional and Tabular storage to a client. I’m developing a training course where I needed to explain the concepts once again – and then it hit me! …these unique data storage and analytic technologies do what we dream about in science fiction stories and achieve capabilities we imagine existing in the distant future. Channel surfing on television this week, I watched an episode of Timeless, a Star Trek rerun and a Star Wars movie – where time travel, space-warp travel and teleportation were commonplace realities. Although fictional as portrayed, I think these concepts are very real in our technology landscape. Please indulge me as I explain.
We live in a Linear world. We reside in a place, work in a place, store and move stuff from place to place. Centuries ago, if man needed to move something (perhaps just himself) to a distant place, he would walk, ride a horse or take a sailboat. In weeks or months, he would arrive in another place. Today, we get in a car, navigate the streets, perhaps highways and Interstates and then arrive in a different place within minutes or hours. For longer trips, we board a large metal tube, sit very close to several complete strangers as we climb up into the sky; some period of time goes by and then we somehow arrive in a distant place along with our stuff. At 35,000 feet where the air is very thin, a jet airplane can travel many times faster; leaving an altitude where certain physical laws restrict speed, only to re-enter that atmosphere after it has arrived. To someone from the distant past, this experience would seem fictional.
image: nasa.gov
On a daily basis, I sit or stand in front of my computer monitors, and see and speak with people in distant places. We have live conversations about the weather in their time zone or hemisphere. Through some strange but now common process, technology converts our speech, images, directions, documents, effort and thoughts into 8-bit network packets and electrons that travel almost instantly through space; leaving the earth for a short period, to be transported and reassembled somewhere else.
Years ago, when I wanted new stuff, I would drive to the store. If they didn’t have what I wanted or if it was too expensive, I would spend the day driving to different stores. Today, I “go” to the store on my computer and the very next day, my stuff arrives on my doorstep. We are continually inventing ways to bend space and teleport things within the changing confines of reality.
Data storage is much like the real world. We store terabytes and petabytes (numbers we can’t even explain) in a small space. But, to navigate through the linear storage structure of relational databases and files, the data might as well be thousands of miles or kilometers apart. In order to perform time-variance and statistical calculations, program code must access a group of records in one location to aggregate them, and then find a related group of records – perhaps millions of rows apart – to perform another aggregation. The query might need to perform this operation dozens, hundreds or thousands of times; before it can calculate the final results. One of the best examples is a time-series calculation where a query must “visit” records for each time period to perform an aggregation. The effort is compounded when the goal is to compare aggregate values and variances for parallel and relative periods (e.g. “one year ago”, “three years ago”, etc.) and then perform forecasting or regressions.
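To make the idea concrete, here is a minimal DAX sketch of the kind of relative-period calculation described above. The 'Internet Sales' fact table, Sales Amount column and 'Date' table are assumed names for illustration, not objects from a specific model in this post.

Total Sales := SUM ( 'Internet Sales'[Sales Amount] )
-- Same aggregation shifted back one year; the engine must locate and aggregate an entirely different group of rows for each cell
Sales 1 Year Ago := CALCULATE ( [Total Sales], SAMEPERIODLASTYEAR ( 'Date'[Date] ) )
-- Variance between the parallel periods
Sales YoY Variance := [Total Sales] - [Sales 1 Year Ago]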
image: ctdn.com
In relational storage, the query engine must “drive” from place to place, navigating the streets (file groups, files, pages and rows) to find the records. In an analytic structure (like an Analysis Services cube, or SSAS Tabular model or Power Pivot/Power BI in-memory model), the query engine performs a calculation operation and then teleports to a new location to find a different group of related records.
image: space.com
In a multidimensional cube, the storage processing code quite literally folds space to create the cube structure and pre-calculate some of the aggregate values. Rather than leaving all the data in a linear table, it actually moves data segments close together, reshaping the data structure (thus, the term “cube”). This approach required a great deal of complexity in both the software and the resulting data structure. Advances in computer hardware and software techniques helped simplify the approach. The in-memory tabular model still exists in linear space, yet in fragments of compressed space where the actual distance between adjacent records is much shorter than it appears to be. When the query process is done looking up a group of records, it warps to the next group of related records using a memory address. Instead of traveling through space, it steps out of it, like the jet airplane leaving the atmosphere, to re-enter in a different location. It takes much less time to run queries because it is not moving through all the rows of linear tables.
By using unique and innovative data storage methods (to reduce storage space and access data faster without moving through data), we can, in effect, achieve time travel, space warping and teleportation.
I just finished posting a 4-part series demonstrating how to use Power BI with on-premises data. The running time for all four parts is about 44 minutes. I cover using the Power BI Enterprise Gateway to connect published Power BI reports to on-premises data sources:
SQL Server database connected with DirectQuery
Large SQL Server database with clustered column store index connected with DirectQuery
SSAS multidimensional/cube on-prem with direct connect
SSAS Tabular with large data volume & multiple partitions on-prem with direct connect
The four-part series is on my YouTube channel using the following links:
Part 1
Part 2
Part 3
Part 4
Adam Saxton also gives a nice walkthrough of using SQL Server DirectQuery with Power BI that may provide an additional perspective and information about securing and sharing the solution.
I’d also like to give a plug for a terrific Power BI architecture diagram poster created by Dustin Ryan. I love the simplicity of this comprehensive graphic. You can download the PDF file from his blog.
A new ground-breaking feature in SQL Server 2016 allows report user context to be passed through shared data sources, enabling role-based, row-level security. This feature doesn’t require complex administration configurations, service principal delegation, Kerberos, Claims/Windows Token conversion or impersonation. You may not be as excited as I am about this unless you’ve also lived through the tedious configuration process we’ve had to endure for many years until recently.
A little over a week ago I heard from my friend and past co-author, Robert Bruckner – a Senior Architect and Development Lead on the Reporting Services product team. He told me I needed to check out a new capability they had just enabled in SQL Server 2016 that would make it much easier than before to implement row-level security for SQL Server Analysis Services. I had to check it out for myself and was pleased to discover that this technique works with Reporting Services mobile and paginated reports. It also works in a variety of server architectures. I tested it in a domain environment and on a simple workgroup server. It will work in scaled-out scenarios, and with or without SharePoint in the mix.
Until now, there were two major blocking points to authenticate report users on the SSAS server so data could be restricted or filtered for each user. Kay Unkroth talks about the prior authentication landscape in his post to the SQL Server Reporting Services Team Blog titled Passing user names to Analysis Services with personalized connection strings in SQL Server 2016 Reporting Services. He describes the challenges in more detail and I’ll summarize from my experience.
1. The first challenge is that user context gets lost as database connections are passed from one service to another. Before we get into the weeds with this story, let’s just categorize this as complicated and confusing. It raises high-visibility questions and concerns that necessitate handing things off to support and administrative folks to take care of settings and components outside the environments you manage – namely the database and report servers.
Not long ago, when we tried to use SharePoint as the platform for all BI and reporting systems, getting connection and user information to pass through all the layers in the server architecture was more like running a relay race… a user logged into SharePoint and navigated to a report. At that point, the credential object for the user was exchanged for a Claims token. The Claims token for the user who chose to run the report had to be delegated to a Windows service principal name (SPN), which in turn was delegated to the service account running a specific service on the Windows server. It was then up to that service to interface with your database in Analysis Services. Without getting into more detail, the service account was not the user who had access to certain important data. For the service account to connect to the database at all, it needed elevated privileges – a potential security risk – managed by your system or database administrators, who probably know very little about your reporting solution, your users’ needs or business objectives. They just want to make sure there are no security holes, and they will lock down anything suspicious or unfamiliar.
2. The second challenge is that Reporting Services requires shared data sources with stored user credentials in order to use many of its advanced features, like shared datasets (under some conditions), data-driven subscriptions, caching, history or snapshots. In every case, the credentials of the user actually running the report get exchanged for the service proxy account and the original user context is lost. Analysis Services sees the same user connect for every report request. A security genius might conclude that these are all issues that can be handled through some security configuration mechanism – and they would be right. The problem is that they often don’t get handled because of the complexity, or the simple fact that management falls within another team’s realm of responsibility.
Over the many years that Reporting Services has been a leading reporting tool in the industry, practitioners have asked if they could somehow add an expression to a shared data source definition that would inject the username into the connection string so it could be deciphered on the server. Interesting idea, with a lot of reasons it wouldn’t happen – sort of.
The conventional approach required principal delegation to be configured and used correctly, which may or may not have involved the Kerberos protocol and NTLM authentication.
The New And Improved Technique: Using the CUSTOMDATA Property
Drumroll, please… Here’s the big change that allows new, cool things to happen.
In every data source connection string, you can add a simple expression that maps the current Windows username to the CUSTOMDATA property of the data source provider. This works in SSRS embedded data sources, shared data sources, in a SharePoint Office Data Connection (ODC) file and in a SharePoint BISM connection file. In each case, the syntax should be similar. Here is my shared data source on the SSRS 2016 report server:
The connection string modification is simple. Just add this text to the end, following a semicolon:
;CUSTOMDATA = {{ USERID }}
This value travels through the connection stack whenever a connection is opened on the database server. Since the CUSTOMDATA connection property value is accessible and exposed by query and expression languages and APIs on the server, code can be written to respond to the user. In this case, we will filter queries to return only the records and data that the current user should see.
Windows and Report Users
Report users, whether in Active Directory on domain-joined computers or local accounts on a simple workgroup server, are standard Windows users and need not be added to the Local or Domain Administrators group. Any SSAS users with restricted data access should also not be added to any Analysis Services administrator role.
As you can see, Paul and Fred are standard Windows users with limited access to elevated resources.
Analysis Service Roles and Membership
I’ve tested this technique with both SSAS multidimensional and Tabular, with the same results. I’ll show you the setup on a named instance on my server, SVR201R21, named SSAS_Tabular. The SSAS Security properties page for the instance shows only the Administrator local admin user and the SSAS service accounts. This is important to check because if you accidentally add a user as an administrator, row filtering won’t work. Testing and debugging solutions on a single development machine requires a lot of logging out and logging back in to test under different user conditions.
My SSAS Tabular database is named Adventure Works Tabular 2016. Under the database Roles, in the properties for the Users role, you see that there are three users: Fred, Paul and ReportUser. ReportUser is our proxy user. When we need to set up a shared data source with saved credentials, I supply the user name and password for the ReportUser account.
It’s important to note that ReportUser must have Read access to the database in order to connect and execute queries. But, since that same account may be used to connect on behalf of other users, the ReportUser account will have restricted access to data through the security bridge tables that are implemented by the role’s row-level filtering expressions.
Row-level Filtering Bridge Table
The core approach for row-level, user-specific filtering is the same as always but the programmatic implementation is only slightly different. This is the essential foundation of the feature. The following table image demonstrates two things. One is that the table itself can be completely generated using DAX. This is a calculated table (one of many new cool features added in SQL Server 2016). I’ve just used inline ROW functions to add each row to the UNION function that defines the entire table.
In production, you may want to store this lookup data in a SQL Server table for easy maintenance, and then process it into the model as you would all your tables.
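As a rough sketch of that pattern, a calculated table for the security bridge can be defined entirely in DAX like this. The user names and countries shown here are illustrative placeholders; the UserName values must match exactly what the report server passes in through the CUSTOMDATA property.

Security Bridge Country =
UNION (
    ROW ( "UserName", "Paul", "Country", "United Kingdom" ),
    ROW ( "UserName", "Fred", "Country", "France" ),
    ROW ( "UserName", "Administrator", "Country", "Australia" )
)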
For each row, the UserName is an exact match for the value passed in from the USERID token in the connection string, which is returned by the CUSTOMDATA DAX function. Do you see how this will work? For testing and debugging, you can add a couple of measures useful for knowing who the current user is and what the CUSTOMDATA function returns.
User Name:=USERNAME()
Connected User:=CUSTOMDATA()
In a Reporting Services report, you can add these measures to your query and then add the dataset fields to textboxes in your report.
Implementing a Role-based Row-level Filter
Start by opening the Roles window…
Checking the user membership of the Users role: ReportUser must be a member if the report server or client tool connects as ReportUser. Even though no records are mapped to this user in the security bridge table, it has served its purpose by getting us connected to the database.
Paul and Fred are in the role in case they connect on their own, like from Excel or from SSRS using a data source without stored credentials.
This demonstration scenario calls for filtering all the data related to the Sales Territories for users who have permission to see it, according to the Country/User mapping in the bridge table. If the data related to other business entities should have user-specific filtering, you would follow the same pattern and multiple filters would be in effect.
The filter on the Sales Territory table is a DAX expression. You can use other functions to provide additional logic but this is a good example that demonstrates the essential pattern. As each row is scanned during a query, the expression will determine if there is a match. If so, the row is returned. If there is no match, the row is ignored and not returned in the query results.
Here’s the filter expression with a brief explanation:
(again, each row in the table is considered in turn)…
Does the current row’s [Sales Territory Country] field value match the value of…
the result of the LOOKUPVALUE function, which looks in the Security Bridge Country table for a Country matching the current Sales Territory[Country] AND where the User field value on the same row matches the value returned by the CUSTOMDATA function.
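Expressed as DAX, the row filter on the Sales Territory table looks something like the following sketch. The exact table and column names depend on your model; these are based on the Adventure Works sample and the bridge table sketched earlier.

'Sales Territory'[Sales Territory Country]
    = LOOKUPVALUE (
        'Security Bridge Country'[Country],
        'Security Bridge Country'[Country], 'Sales Territory'[Sales Territory Country],
        'Security Bridge Country'[UserName], CUSTOMDATA ()
    )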
Let’s try it out…
My Reporting Services report uses a data source where I have added the text ;CUSTOMDATA = {{ USERID }} to the end of the connection string.
The dataset is a simple MDX query that returns the Sales Territory Country, the Connected User test measure and the Reseller Total Sales measure. Notice that when the query runs locally in the designer, the CUSTOMDATA function returned the literal text in the parameter token. This is expected.
Here’s the report design. I’ve added the Connected User measure so you can see what the CUSTOMDATA function returns, along with the filtered results.
Run as the Administrator user, only the Country row corresponding to that user’s row in the Security Bridge table is returned.
Let’s test the report as a real user.
I’ll sign-out as Administrator and then sign-in as Paul…
Running the deployed report in the SSRS web portal this time, only the Country records that were mapped to my user name in the Security Bridge Country table are returned. Furthermore, if I were to run a query that didn’t explicitly use the Sales Territory Country, only the records related to the countries in my filter scope would contribute to any totals, due to the relationships in the model.
As I mentioned earlier, I also tested this in a domain environment and it also worked exactly as expected.
This means that the next time I’m called on to implement a row-level security solution, I won’t have to keep the system administrator up all night setting up complicated service principal names and configuring authentication delegation. If I have rights to setup the SSAS server and the data sources on the report server, I can now do all the configuration work myself – and that is pretty exciting.
Thanks again to Robert Bruckner for bringing this to my attention, and to Kay Unkroth for laying out the basics in his post. And a big, big thanks to the Reporting Services product team for making this a reality and saving the next generation of BI solution practitioners from reliving the challenges of the past.
Self-service data modeling tools like Power BI Desktop, Power Pivot and SSAS Tabular promote the advantage of using data from multiple data sources. I’m including SSAS Tabular on the “self-service” list because it is derived from Power Pivot and often used for “quicker” IT-managed BI projects. The alleged advantage of sourcing data directly from within the modeling tool might seem to suggest that a formal ETL process and dimensional modeling exercise is unnecessary. The more experience I gain with these newer tools in business settings, the more I think about using best practices and the right approach for scaling BI solutions of all sizes.
The bottom line, before getting into the weeds on this topic, is that there are cases where I think it makes perfect sense to skip the usual steps of staging, transforming and modeling data into a relational data mart. However, by using this approach, you are also removing the safety net and limiting options to scale and expand the solution. I love tools like Power Query and Power BI Desktop for developing simple BI solutions quickly. Perhaps it’s a related topic for another time, but always consider options to incorporate enterprise scale with modern analytics. Solutions always grow.
The Self-Service Modeling Paradigm
More and more of the projects I inherit from consulting customers and previous developers resemble the following diagram:
In less-formal solutions, the “ODS” layer in this diagram may be little more than a small database on an existing server with lookup tables populated by a weekly scheduled SSIS package or stored procedure. A lot of these projects are either departmental or the result of a self-service project handed-off to the IT organization to be “right-sized” and transitioned from the desktop to a server-hosted solution.
Power Pivot, Power BI Desktop or SSAS Tabular are often used to mash up data from multiple data sources. Some of the data may be staged and transformed to simplify a complicated query or to work around security restrictions in the original data source. The data feeding multiple tables in the model arrives in separate streams, and the keys then need to be manipulated in order to match up fact table keys to identifying keys in the lookup/dimension tables.
What’s the Problem With This Approach?
In some solutions with a manageable scale and a reasonable tolerance for a certain amount of data loss and inconsistency, this approach may be just fine. There are very good reasons for inconsistencies between sets of data that come from different sources. When 20 million fact rows are obtained from an online ordering system and 0.1% don’t have matching customer records that come from the system used to manage the master customer list, it may simply be a question of timing. Or, maybe the definition of “customer” is slightly different in the two systems. Chances are that there are also legitimate data quality issues affecting a small percentage of these records.
Whatever the case may be, a data conformity or potential data quality issue in this design scenario falls on the model designer to either fix or ignore. Again, this may or may not be a concern but it is a decision point and might raise the question of ownership when questions about data quality are raised.
Before I continue, let’s revisit how this common issue is handled in more conventional Business Intelligence solutions…
Managing Data Conformity in Traditional BI
Here’s a quick overview of the conventional Business Intelligence approach. For the past twenty years or so, we’ve promoted strict dimensional modeling principles that can be summarized in the following diagram. This diagram is my interpretation of The Kimball Method, applied to my experience and tools available from Microsoft. I learned to orchestrate BI solutions using this approach; however tedious, time-consuming and expensive. This is my frame of reference for any BI project. Many variables can influence adaptations from this model such as product choices, big data sources, real-time analysis, data volume, budget or the stakeholder’s realm of influence or ownership – but this is my starting point for a BI project done “right”.
If we take this general approach and scale it down to a departmental or business unit level, where an enterprise data warehouse doesn’t exist to satisfy analytic reporting requirements, we commonly end up with a solution that looks more like the following diagram. Data is still procured from multiple sources through extract and transform processes, loaded into a staging database and/or directly into a dimensional data mart. Even when using an Analysis Services Tabular project to build the semantic model, the model only has one data source, with all the fact and dimension tables populated from the relational data mart. The dimensionalized data is then surfaced through a semantic model and then presented through ad-hoc analysis, analytic and operational reports, and executive dashboards.
Different Solution Architecture – Same Problem
The most significant difference between the previous two diagrams and the one I started with is that the solution developer has control over the data before it enters the semantic model. Data conformity and quality can be managed in different places which include:
At the source
In the ETL process after source data is staged, and before it enters the data warehouse/data mart
Within the semantic model
In theory, fixing “bad” or non-conforming data at the source seems to be ideal for everyone. In a utopian setting, the business can incorporate a master data management solution to let data stewards correct erroneous or unmatched records and then feed them back into source systems. Operational reports produced by business applications should match analytical reports and dashboards from the data warehouse/data marts and the models they feed. Please don’t misunderstand – many businesses incorporate MDM solutions and they work well. The reality in most scenarios is that this is a difficult undertaking. It requires business process changes and an enormous level of cooperation and coordination across business units, leadership and interdependent teams.
Weeding out bad records and making corrections as data flows downstream to the data warehouse is often a practical trade-off that has many advantages. Reports produced from the data warehouse or data marts should match up to analytic reports and dashboards fed by semantic models which use congruent data.
Any provisions to compensate for data anomalies within the data model will produce some degree of variation between reports produced upstream and analytic reports produced from the semantic model. This may be acceptable as long as users understand and expect certain discrepancies. Reconciling financial results and auditing to a narrow margin can be a difficult and time-consuming exercise. Oftentimes user confidence can be maintained when the analytic model can produce the same bottom-line figure and discrepancies are evident only in different grouping buckets or categories – like a “catch-all” or “Unknown” member for records that don’t match.
How Analysis Services Multidimensional and Tabular Deal With Data Non-Conformity
SQL Server Analysis Services (which we now call “Multidimensional SSAS”) has handled mismatched keys the same way for several years. When fact rows are processed with key values that don’t match a related dimension, errors are reported and the model processing fails. The developer must make a deliberate exception for these records to match a specially-generated “Unknown” member of the dimension. Under the default behavior, a solution developer following The Kimball Method would have defined Unknown member records in every dimension table and then mapped the unmatched keys in the ETL process.
For the purpose of this discussion, SSAS Tabular, Power Pivot and Power BI Desktop are the same. These tools, which are derived from different versions of the same code base, all deal with mismatched keys the same way.
SSAS Tabular is significantly different from Multidimensional in that unmatched fact rows simply flow through model processing and a blank member is generated in each affected dimension table. Visually, this shows up as an aggregated value for an unlabeled heading. Casual users will miss this and won’t notice that a certain number of records have fallen through the cracks. There are currently no properties or settings to change this behavior. In fact, a Connect item filed back in Feb 2014 was dismissed as a low-priority suggestion. I’m not debating the issue but just informing you that this is how it works. I’d personally like to at least have the option to label the blank dimension record as “Unknown”. I’ve found some creative ways to work around this limitation. The best documented approach is the one that Javier Guillen posted in 2012.
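Short of changing the engine’s behavior, one simple way to at least surface the problem is a diagnostic measure that counts fact rows with no matching dimension record. This is my own hedged sketch with placeholder table names, not Javier’s technique:

Unmatched Customer Rows :=
COUNTROWS ( FILTER ( FactSales, ISBLANK ( RELATED ( DimCustomer[CustomerKey] ) ) ) )

A non-zero result tells you how many fact rows are rolling up under the blank dimension member.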
One of the reasons I like SSAS Tabular is that it’s simple. But “simple” means that it has less “stuff”, less “clutter” and fewer features than its more complicated ancestor. I suppose that if every SSAS practitioner were able to get the product developers to add their favorite feature, we might end up right where we started – and that’s a dilemma. This is an over-simplification, as there is always room for product improvement.
This is a topic worthy of deeper discussion so it’s on the list of items for me to explore in follow-up posts. Please let me know what you think, post your questions and provide some feedback.
The DAX function documentation in MSDN can be found on separate pages, organized by function categories and then details for every function. This is useful but you can’t view all of the functions in one searchable page. If you don’t know exactly what you’re looking for, it can be a bit of a hunting expedition. I find that sometimes it is convenient to have all of the documentation in one list, making it easier to find the function I’m looking for.
I’ve organized the complete set of 2016 DAX functions for SSAS Tabular, Power Pivot and Power BI in two different views; grouped by function category and in one single list. Using Excel Online, the lists can be sorted, searched and filtered. I hope this is useful.
This post is really a collection of field notes and some lessons learned from recent project experience. I’ve done plenty of SSAS Tabular projects over the past few years – usually starting with a Visual Studio project rather than Power Pivot. I’ve also done a bit of Power Pivot work for clients. These projects were either delivered to an analyst using Excel on their desktop or some business users through SharePoint. But, authoring in Excel Power Pivot and deploying to a server-hosted Tabular model has been mainly theoretical up to this point so I thought I’d share my experience.
I’ve just finished a series of four articles for SQL Server Pro Magazine, along with sample projects and hands-on exercises. The series will take you through SSAS Tabular model design from start to finish, using the Adventure Works sample data in SQL Server 2012 or 2014. Here are links to all four articles followed by an excerpt from each.
Starting Your Modeling Career with Analysis Services Tabular Models Part 1
Four-part series
This is the first of a four-part series about getting started with Tabular model design using SQL Server Analysis Services in SQL Server 2012 and 2014. You will learn the basics from start to finish and build a complete solution. A sample project is provided for each stage of the solution so you can follow along with each article.
It was a great honor to be asked to join my associates from SolidQ at the Microsoft Virtual Academy Studios in Redmond and talk about how to upgrade to SQL Server 2012 and 2014. These recordings, which are also on my YouTube Channel, include the material I covered in these sessions. The entire series of studio presentations are hosted on Channel 9 and here at the MVA with accompanying assessment surveys and resources.
In these studio sessions, I am joined by my fellow authors of the 429 page SQL Server 2014 Upgrade Technical Guide; Richard Waymire, Ron Talmage and Jim Miller from SolidQ. Jim and I were responsible for the Business Intelligence content. In our sessions Jim covered SSIS and SSAS Multidimensional, and I covered SSAS Tabular, BI tools and SSRS. In this edited portion of the SSAS session, Jim begins with a brief summary of multidimensional upgrade options and I continue to discuss opportunities to implement SSAS Tabular solutions. BI topics apply equally to SQL Server 2012 and 2014 upgrades.
This post actually applies to both multidimensional (cube) projects and Tabular projects which manage data source connectivity in a similar fashion. The multidimensional design experience is different but the mechanics of the SSAS engine are essentially the same.
When loading data from a SQL Server data source into a Tabular model project, chances are very good that you’ll see an error on the first attempt. I’ve been creating SSAS projects for over 15 years and, even though this is a relatively simple process, I still usually work through the initial configuration in trial-and-error mode. So, let’s call this a normal part of the process.
A Word of Advice
In Tabular model design, a common practice is to select all the tables and columns, rename objects and apply filters before finalizing the import step. If something goes wrong and you can’t connect to the data source, you’ll lose all this work and will need to start over. Test connectivity with one table before going through this aggravating cycle.
Here’s an example of an “ouch” moment after taking a half hour to make all the table/column selections. Note the lack of a “Back” button!
Getting Connected
When connecting to a SQL Server data source in the SSAS project designer, there are two different identities used at different times. When you initially create and test the connection in the designer, you will authenticate as yourself – the user logged into the local computer. However, when you actually load data into the table or process the model, even in the designer, Analysis Services authenticates using the identity of the SSAS service. If the SSAS instance is installed on your local development machine and you just used the default settings during setup, this is an account generated by SQL Server setup. If you’re using a production-like SSAS server, your system administrator may have configured the service to run under a specifically-defined domain principal. If the latter is the case, you’ll need to get your DBA or sys admin involved. If you have an isolated development environment, with everything installed on one computer, the following instructions will take care of it.
I’m running all my SQL Server services on my local development machine and created a Tabular project that will use the AdventureWorksDW2012 sample database. The SSAS instance I plan to use for my workspace database is the local instance. First, let’s get the service account for the SSAS instance. The easiest way to view local services is to right-click “My Computer” or “This PC” in Windows Explorer and select “Manage”. Navigate to the Services node like so:
I have two SSAS instances installed and the default instance is called MSSQLSERVER. Double-click to view the Properties dialog and select the Log On page:
As you can see, the default service account is “NT Service\MSSQLServerOLAPService”. I’ve selected this text and copied it to the clipboard. I’m done with the Services and Properties windows.
Now, I open SSMS and view the Logins for the SQL Server instance my SSAS project uses for the data source. After connecting to the relational instance, expand the object explorer, Security and Logins. Two logins are needed: one for myself and one for the SSAS service account. Since this is my development machine and I installed SQL Server, I already have a login for my Windows user account and I’m a member of the sysadmin server role, which grants me access to every database on the server. If you’re not a privileged user on this SQL Server instance, you will need a login created for your Windows credential and will need to at least be in the db_datareader role for the target database (example below).
To create the service account login, right-click Logins and choose New Login… Paste the service account name here and move to the User Mapping page. Select the target database and then check the box next to the db_datareader database role.
Click OK to save changes and you should be good to go.
Back in the Tabular model designer, I’m going to import only one small table to check connectivity. Once verified, I’ll click the Existing Connections toolbar icon, select all my tables, select and de-select columns, apply filters and then load the data with successful results. As soon as I see record counts, I know I’m in the clear…
Just One More Thing
(Columbo reference)
On a different but related topic, since you’re getting your development environment ready… If you’re not a local administrator on the SSAS machine you’re using for the Tabular workspace database (typically a development machine), you need to make sure your Windows account is in the Server Administrators role for the SSAS instance. If this isn’t the case, you’ll get an error when you process the model. To do this, connect to the SSAS instance in SSMS, right-click the server and open Properties. On the Security page, use the Add button to add your Windows account.
Is it possible to duplicate the same many-to-many relationship behavior in VertiPaq that we have in SSAS multidimensional?
Since Tabular model projects were introduced in SQL Server 2012, one of the major blocking points for adoption has been the lack of out-of-the-box support for many-to-many relationships. I’ve been using, demonstrating and showcasing SSAS Tabular and Power Pivot for at least three years, and in that time it became apparent that, even though this might only be a perceived limitation in many cases, for quite a few enterprise IT shops many-to-many support has become a litmus test for whether this technology is ready to be taken seriously. Honestly, most business data models don’t need many-to-many relationships to be useful, but it is still a reality. This week Chris Webb posted about bidirectional relationship support in the new Power BI Designer and demonstrated an example using sales of fruit that can exist in different categories.
The Power BI Designer is the first publicly available product from Microsoft that supports M2M relationships in a VertiPaq model, and I wanted to conduct a test to see if I could duplicate the same M2M functionality that we have in a familiar multidimensional model, using the Adventure Works sample data. So, here goes…
Many-to-Many Example in SSAS Multidimensional
In the Enterprise examples for the Adventure Works Cycles sample data, the Adventure Works cube contains two many-to-many dimensional relationships. Here is one of those relationships; in the Dimension Usage grid of the cube editor, you can see that the relationship between the Internet Sales measure group and the Sales Reasons dimension is defined as many-to-many:
The M2M relationship involves three tables in the data source view with the Internet Sales Reason Facts table bridging any number of Internet Sales Facts records to any number of Sales Reasons records. Not in view is the DimCustomer table which is related to Internet Sales Facts.
A measure group named Internet Customers contains a distinct count measure based on the CustomerKey column in the Fact Internet Sales table. This measure will exhibit behavior unique to M2M aggregation as you’ll see in these queries:
In the first query, the Customer Count measure is grouped by Sales Reason on rows. If we were to add the result for each group, the total would be 24,135. This is incorrect because it’s a non-additive measure – the same customer is counted for placing sales orders for different reasons so some customers are counted multiple times:
This is evident by replacing the Sales Reason attribute with the All member on rows, showing that there were actually only 18,484 distinct customers placing orders:
Narrowing the results to only one customer helps to explain. This is the key value for customer “Aaron Zhang”. You can see that Aaron placed two orders with a different reason for each:
Again, replacing the Sales Reason attribute with the All member shows that there was really only one customer:
Many-to-Many Example in a VertiPaq Model
To get started, I opened the Power BI Designer and imported the same tables from AdventureWorksDW2012. I’m using the Power BI Designer Preview as of early January, 2015. Because we don’t have a Power Pivot interface in the designer (yet?), we’re essentially using Power Query to populate the VertiPaq semantic model.
The FactInternetSales and FactInternetSalesReason tables are related using two columns: SalesOrderNumber and SalesOrderLineNumber. These columns needed to be converted to strings and then concatenated into a single column to facilitate the relationship. This is a common practice in Power Pivot/Tabular.
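In a Power Pivot or Tabular model where DAX calculated columns are available, that combined key can be sketched as shown below; in the early Power BI Designer the same concatenation is done in the query step instead. The column name SalesOrderLineKey is just my choice for illustration, and a matching column must be added to FactInternetSalesReason the same way.

SalesOrderLineKey = FactInternetSales[SalesOrderNumber] & "-" & FactInternetSales[SalesOrderLineNumber]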
Using the Manage Relationships dialog, the tables are related with these three relationships:
I switched to the Report page which finalized the data load and populated the model. When I started, I was concerned about not being able to explicitly define DAX measures but it was a simple matter to change the field aggregation for the CustomerKey to “Count (Distinct)”.
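For reference, the explicit DAX measure equivalent to that Count (Distinct) field aggregation would simply be:

Customer Count := DISTINCTCOUNT ( FactInternetSales[CustomerKey] )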
Using the Power View report tool, it was also a simple matter to verify the results and duplicate all of the query results as before. Here, you can see that Aaron Zhang placed two different orders for different reasons and the total reflects the non-additive behavior of this many-to-many relationship.
In conclusion, many-to-many relationships are now supported in this implementation of VertiPaq and it behaves the same as in multidimensional cubes! This is very good news and may give us a good idea of what to expect in the near future.
Please join my associates and me for an all-day SQL Server Upgrade workshop on November 3rd
If you are planning an upgrade to a newer version of SQL Server, you won’t want to miss this all-day, preconference workshop.
John Martin, Sr. Technology Evangelist from Microsoft, will spend the morning discussing migration planning from SQL Server 2005.
In the afternoon, SolidQ mentors Ron Talmage, Richard Waymire, Jim Miller and Paul Turley will talk about and demonstrate upgrading specific workloads (projects and features) from older product versions to newer product versions. We will introduce the comprehensive SQL Server 2014 Upgrade Whitepaper that we recently wrote for publication by Microsoft.
In addition to upgrading specifically from SQL Server 2005 to SQL Server 2014, we will also talk about upgrading from SQL Server 2008 and 2008 R2 to SQL Server 2012 and 2014.
An upgrade and/or migration is far more than just a case of moving a database or installing a new version of SQL Server. There have been so many changes since SQL Server 2005 arrived that we need to do a lot of work to ensure we have a successful upgrade project.
This session will guide you through the process, looking at not only the technology but the methodology, processes and tools that we have at our disposal to make sure that when we do move from SQL Server 2005 to 2012/2014 or to the cloud with SQL Server in an Azure VM or Azure SQL Database that we do so in a way that we can be confident of success. We will take a special look at workload-specific upgrade needs for OLTP, HA, SSAS/SSRS, and SSIS environments.
In my afternoon session, I will demonstrate the capabilities of SSAS Tabular models and discuss upgrading and migrating Reporting Services from 2005, 2008 and 2008 R2 to 2012 and 2014.
Requirement: Allow users to enter comments after they review results from a Power Pivot model and then show the comment text in the Pivot Table report results. Here’s a quick example of the final result. My example uses Flintstones characters in place of employee or customer names and a simple measure value. After reviewing the results, the user moves to another worksheet and enters a comment for the date and person, with an expiration date for the comment. After refreshing the report, each cell with a comment is indicated with a little “dog ear” and the comment is displayed by clicking the cell.
In practice, the data model could be in a server-hosted SSAS Tabular model, Office 365/Power BI or SharePoint-hosted Power Pivot workbook, or a desktop Excel file. In one real world application, if entries don’t meet company standards, the person reviewing the data entry can enter comments about these late entries and other policy violations. A comment has an expiration date and in the production solution, the comment expiration is also dependent on a calculated “severity level” which can be baked into the expiration logic.
My first thought when considering the need to incorporate text strings into a semantic model was that it might not be a good idea to store large text strings in a Power Pivot or Tabular model. After all, the tabular model technology is really optimized to handle numeric and redundant data. Upon carefully talking through the requirements, we’re pretty comfortable that the number of columns will be fairly low and the length of the comment strings should be conservative. The VertiPaq/VertiScan engine is actually pretty forgiving about having “stuff” added to a model that it will ignore except when it’s being used in a calculation or query. So, that’s good – my comment rows shouldn’t get in the way of core features like measure aggregation and dimension navigation. I’m really curious to find out how viable this is in a larger-scale application, so if you end up doing this with a lot of data or have the opportunity to test the limits of this technique, please share your results.
With that in mind, I took the plunge and built a simple proof-of-concept solution, which is what I’m demonstrating here. If the storage and query engine are OK coexisting with these new columns, the big question that remains is how the query engine will react when we start to send it some unconventionally long string results. Will these long values just get crunched through the query engine expressions along with the usual measures without making a ruckus?
Now remember that this is an über-simplified model and that the real one contains about 30 tables, but the major components are the same. The main fact table (tbl_fact) contains keys related to some outside tables and a measure called “Sum of Some Value” which aggregates the SomeValue column for every Person in a pivot table report. So, here’s the solution:
The Comments table (tbl_Comments, which I’ll call “Comments”) is essentially a fact table related to Dates and People:
The source for the Comments table is a worksheet/table. Here are some of the comments:
In the production solution, we want the comment to show up until the expiration date. Rather than waiting for the system date to change, I’ve added a table of EffectiveDate values to use as a slicer for prototyping. The DAX measure expressions are added to the calculation area in the Comment table.
I’ll start with a simple version of the Current Comment measure before adding the expiration logic:
Current Comment:=IF(HASONEVALUE(tbl_Comments[Comment]), VALUES(tbl_Comments[Comment]))
It’s important to test the comment for only one value using the HASONEVALUE function to avoid an error. The VALUES function just returns the column value as a string. Now, I’ll add the expiration logic, which uses the first measure:
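Assuming the Comments table stores the expiration date in an ExpirationDate column, the measure looks something like this (a sketch; the exact expression can vary):
Comment Until Expiration:=IF( FIRSTDATE( tblEffectivedate[EffectiveDate] ) <= MAX( tbl_Comments[ExpirationDate] ), [Current Comment] )
Because IF returns a blank when the test is false, an expired comment simply drops out of the result and no Excel comment will be created for that cell.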
Once this is all tested and working, we can just substitute “TODAY()” in place of “FIRSTDATE( tblEffectivedate[EffectiveDate] )” to use the system date.
Now, to add the Excel comments. This is the fun part (as if it hasn’t been exciting enough thus far!)
Add a pivot table to a new sheet named “Report”. Add Dates and People on rows and the “Sum of Some Value” measure on columns (which actually adds it to the VALUES). Add the “Comment Until Expiration” measure to columns as well. Select the last column added to the pivot table and hide it. In my example, this is column D.
You’ll need to have macros enabled and trust VBA in Options > Macro Settings.
Use Alt+F11 to open the Visual Basic for Applications editor and enter the following code into a code module (usually Modules > Module1):
'***********************************************************
' Paul Turley, 9-22-14
' Dynamically add comments to Pivot Table value cells based
' on an adjacent hidden column.
'***********************************************************
Dim ws As Worksheet
Dim pt As PivotTable
Const iPTRowOffSet As Integer = 3         ' Pivot Table starts on this row
Const iPTHeaderRows As Integer = 2        ' Number of Pivot Table header rows
Const iPTClearSafetyRows As Integer = 100 ' Number of rows beyond the last PT row that get cleared (in-case the filtered table shrinks by a large # of rows)
Dim iRows As Integer
Dim iRow As Integer
Dim CommentRng As Range
Dim ValueRange As Range

Sub SetObjects()
    Set ws = Worksheets("Report")
    Set pt = ws.PivotTables(1)
End Sub

Sub btnCreateComments()
    SetObjects
    ClearAllComments
    CreateComments
End Sub

Sub CreateComments()
    For iRow = (iPTRowOffSet + iPTHeaderRows) To pt.RowRange.Rows.Count + iPTRowOffSet
        Set CommentRng = pt.ColumnRange(iRow, 2)
        If CommentRng.Cells(0).Value <> "" And iRow >= iPTRowOffSet Then
            Set ValueRange = Worksheets("Report").Cells(iRow + 1, 3)
            If ValueRange.Comment Is Nothing Then
                ValueRange.AddComment (CommentRng.Cells(0).Value)
            End If
            ValueRange.Comment.Visible = False
        End If
    Next
End Sub

Sub ClearAllComments()
    SetObjects
    For iRow = (iPTRowOffSet + iPTHeaderRows) To (pt.RowRange.Rows.Count + iPTRowOffSet + iPTClearSafetyRows)
        Set ValueRange = Worksheets("Report").Cells(iRow + 1, 3)
        If Not ValueRange.Comment Is Nothing Then ValueRange.Comment.Delete
    Next
End Sub
Note that there are some assumptions made in this code. I don’t think the pivot table needs to start at cell B3 but that’s where mine is. Note the constants at the top that are used to skip the header cells. These values will need to be adjusted if you make changes.
Next, open the code module for the workbook (named ThisWorkbook) and add the following code for the SheetCalculate event (three lines added between the generated Private Sub & End Sub lines):
Private Sub Workbook_SheetCalculate(ByVal Sh As Object)
    SetObjects
    ClearAllComments
    CreateComments
End Sub
Now, let the magic happen. Every time the sheet is calculated (this event fires whenever the data is refreshed, filtered, sliced or the pivot table layout changes), the code iterates through the measure column cells and removes all the Excel comments, then iterates through again and adds new Excel comments using the values in the adjacent hidden column. The hidden column contains the calculated “Comment Until Expiration” measure text values.
You can add or remove comment text in the Comments sheet, move to the report sheet and use Data > Refresh to update the model. Use slicers to change the EffectiveDate and filters to add or remove row values. With each action, you should see comments come and go. To view a comment, click the cell and hover the mouse pointer over the dog-ear in the upper-right corner of the cell. There are more comment options on the Review ribbon. Make sure you save this as a macro-enabled workbook (.xlsm) file.
As always, I’m interested in feedback, thoughts and suggestions.