I’ve created a simple query performance logging tool for Analysis Services, called the SSAS Performance Logger. The tool allows you to navigate through the metadata for a tabular model and select from measures and table attributes to build a simple query which is executed and timed. The query can be executed many times to get an accurate sampling of test variations and to test performance while various loads are placed on the server.
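For context, the queries the tool builds are simple DAX statements that pair one selected measure with one selected table attribute. A hypothetical sketch of the kind of query it might generate (the table, column and measure names here are illustrative, not from the tool itself):

EVALUATE
SUMMARIZE (
    'Product',                 -- the table chosen from the model metadata
    'Product'[Category],       -- the selected attribute
    "Internet Total Sales", [Internet Total Sales]   -- the selected measure
)
ORDER BY 'Product'[Category]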
I literally created this over the weekend and plan to add additional capabilities as time allows – so please check back. To provide feedback, add comments to this post.
My initial objective was to choose two different measures that were alternate methods of calculating the same value, run them side-by-side and then see which performed best. Then it occurred to me that we can run any number of measure performance tests in iterative loops and compare the results by playing back the captured log results. Since the results are captured in a log database, the test results for each measure or query can easily be compared and analyzed using tools like Excel and Reporting Services.
One of my main objectives for a future version of the tool is to add logging for server performance counters, like memory usage, thread counts and CPU utilization, while these queries are running.
What you will need:
I’ve developed SSAS Performance Logger for use with SSAS Tabular in SQL Server 2012. To manage the logging database, it uses the OLE DB data provider for SQL Server 2012, which may be backward compatible as far back as SQL Server 2008. It also uses the ADOMD.NET data provider for Analysis Services. When MDX support is added to a future version, it should support SSAS sources as far back as SSAS 2008.
Microsoft .NET Framework 4.5
ADOMD 6.0 (can be installed with the SQL Server 2012 SP1 Feature Pack)
An instance of SQL Server 2008 or better with administrative/create database rights for the logging database
With these components installed, you should be able to copy the executable to a folder and run it.
Download the zip file and extract it to a folder on your system
Verify that the required dependent components are installed
Run SSAS Perf Tester.exe
The first time it runs, the application will check for the logging database and prompt you to create it
On the SSAS Query page, enter the Analysis Services server or instance name and press Enter
Select a model or perspective, measure and table/attribute from the drop-down list boxes to build the query
Click the Start button to run the query once and see how long it took to run
On the Options page, type or select the number of times to repeat the query
Click the Start button to run the query in a loop. The results are displayed in a line chart showing the duration in milliseconds for each execution
Every query execution is logged for later analysis. Here is a view of the logItem table in the SSASPerformanceTest database:
As I continue to explore the many facets of DAX functions, I’m finding more creative ways to add business value and simplify user experiences. In this post, I will demonstrate how to use slicers to pass values into a calculated measure in an SSAS Tabular model to dynamically change the results, parameterizing the date interval, number of periods and the type of aggregation used in the measure.
The DAX language can make solving business problems both remarkably easy and deceptively complicated. I’ve spent many late nights working on a problem that seemed like I was just one step away from solving. When I began using DAX calculations with Power Pivot and SSAS Tabular models for reporting, I thought that it wasn’t possible to parameterize an expression and control the results dynamically like we’ve done with shell dimensions and scoped MDX calculations in a cube. In fact, some of my constituents who are expert multidimensional SSAS practitioners think of DAX as a toy version of MDX for its apparent lack of flexibility and dynamic logic. Using any technology to solve complex problems requires us to work with it rather than trying to force it to behave like another tool. I do have to admit that I was a bit proud of myself for working out a technique to use slicers as parameters in this way, but I’m certain that optimizations and enhancements can be made to this most basic example. Thanks to Hrvoje Piasevoli and Darren Gosbell for their ideas.
The example below shows a pivot chart in Excel based on a tabular model. The Accumulated Departure Delay measure shows the average delay for five selected US airlines for each month in 2013. In this case, for each month, the measure calculates the 3-month rolling average:
The conventional standard in most model designs is to add a measure for every requirement, which can make our models large and cumbersome to navigate. For example, if the business wants to have measurements for “3 month rolling average”, “6 month rolling average”, “7 day accumulated sum” and “14 day accumulated sum”, we would define a separate measure for each of these requirements. Using a dynamic calculation, only one measure is defined and slicers are used to modify its behavior.
Any combination of Period, DateInterval and Aggregate slicer values can be used to dynamically change the results. For example, selecting Period=15, DateInterval=Day and Aggregate=Sum would display the sum of departure delay values for the past 15 days. Since the X axis of this chart uses a Day-Month-Year hierarchy, I can double-click the month of October to make Excel show each date in that month and the rolling accumulation for each day.
Note that all of the calculation logic could be handled in a single calculated measure but while developing a solution, I usually break the problem down into bite-size pieces. It might make sense to leave these as separate measures, perhaps hidden from users, or it might make sense to combine all of the logic into a single, more complex measure expression.
To simplify, I’ll leave the Aggregate slicer out of the scenario and just hard-code the measure to perform sums – and then I’ll change this later. Here’s a pivot table with the Calendar hierarchy on rows and three different measures displayed as values. These measures are Date Range Start, Sum of Dep Delay and Accumulated Departure Delay:
I want to aggregate groups of daily values using a date range starting with a date that is X number of intervals (e.g. days, months, quarters or years) before the current reporting date. In the model, I’ve imported three tables without adding any relationships. These are shown in the bottom right side of the diagram.
Here are the values in these tables:
To resolve the first numeric Period value selected from the slicer, I can use the MIN function. If no values are selected, the first row in the table is returned; if multiple values are selected, the first one is returned; and if one value is selected, that’s the one returned.
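As a minimal sketch (assuming the disconnected slicer table is named Period with a numeric Period column, and the measure name is my placeholder), the intermediate measure could be as simple as:

Selected Period := MIN ( Period[Period] )   -- smallest visible value; with no slicer selection this is the first (lowest) Period in the table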
Since the DateInterval column is a text data type, I can’t use MIN but the FIRSTNONBLANK function will essentially do the same thing. Note that there are two required arguments but I’m just passing the value True() since we don’t want to filter any rows. Here’s the script for the Date Range Start measure:
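The original screen capture of the script isn’t reproduced here. A rough sketch of how a Date Range Start measure could be written, branching on the selected interval name and building on the Selected Period sketch above (the Period and DateInterval table and column names are assumptions), might look like this:

Selected Interval := FIRSTNONBLANK ( DateInterval[DateInterval], TRUE () )   -- first selected interval name

Date Range Start :=
SWITCH (
    [Selected Interval],
    "Day",     FIRSTDATE ( DATEADD ( LASTDATE ( 'Date'[Date] ), - [Selected Period], DAY ) ),
    "Month",   FIRSTDATE ( DATEADD ( LASTDATE ( 'Date'[Date] ), - [Selected Period], MONTH ) ),
    "Quarter", FIRSTDATE ( DATEADD ( LASTDATE ( 'Date'[Date] ), - [Selected Period], QUARTER ) ),
    "Year",    FIRSTDATE ( DATEADD ( LASTDATE ( 'Date'[Date] ), - [Selected Period], YEAR ) )
)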
With the dynamic start date sorted out, the actual measure calculation is fairly simple. The Accumulated Departure Delay measure applies the [Sum of Dep Delay] measure filtered using the DATESBETWEEN function:
Accumulated Departure Delay :=
CALCULATE (
    [Sum of Dep Delay],
    DATESBETWEEN ( 'Date'[Date], [Date Range Start], MAX ( 'Date'[Date] ) )
)
At this point, the pivot table example will work. After connecting a new pivot table to the model, I created two slicers based on the Period and DateInterval attributes and connected them to the pivot table.
Adding the Aggregate element is a variation on the same theme. I’ve added another intermediary measure named Aggregate Dep Delay and then used it in place of the Sum of Dep Delay measure in the script for the Accumulated Departure Delay measure.
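A sketch of what that intermediary measure might look like, switching on the selected aggregate name (the Aggregate table and the Flights[Dep Delay] column names are assumptions):

Aggregate Dep Delay :=
SWITCH (
    FIRSTNONBLANK ( Aggregate[Aggregate], TRUE () ),   -- first selected aggregate function name
    "Sum",     SUM ( Flights[Dep Delay] ),
    "Average", AVERAGE ( Flights[Dep Delay] ),
    "Min",     MIN ( Flights[Dep Delay] ),
    "Max",     MAX ( Flights[Dep Delay] )
)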
Finally, I add another slicer based on the Aggregate table allowing the name of an aggregate function (e.g. “Sum”, “Average”, “Min” or “Max”) to be selected and used to dynamically change the calculated result. Referring back to the first screen capture, a pivot chart is used with my Calendar hierarchy (consisting of Year-Month-Day attributes) which enables drill-down on the X axis, a filtered set of Airline Carriers is on the legend and the Accumulated Departure Delay measure is added to the chart values.
This series of video sessions will demonstrate how to create a business intelligence solution with sets of business data from multiple sources. In the first session, I import business records from 5 different D&B data feeds using the Windows Azure Marketplace with desktop BI tools in Excel 2013 ProPlus. In future sessions, I will combine public data sets, use features of Power BI and migrate to an enterprise scale BI solution with SQL Server, SSAS tabular and SharePoint.
D&B data feeds in the Windows Azure Marketplace can be used to get detailed, current demographic and financial information about specific businesses or companies in different regions and categories. These business records can be used for things like market analysis, sales, verification and research.
Session 1 – Getting started (48:00) Analyzing Dun & Bradstreet business records data from the Windows Azure Marketplace using Excel 2013 Power Query, Power Pivot and Power View to create a basic dashboard
To watch this video with more playback controls, you can also view it on ScreenCast here.
Session 2 – Enhancing the Solution (32:00) Combining Dun & Bradstreet Customer Records with Public & Enterprise Data Geographic visualization with Power Map, publishing to Power BI and using Q&A to search a model with natural language
Good technology books usually come in one of two forms. Some of the books on my shelves go deep technically, contain useful tidbits of code but read like the phonebook. Just a few are interesting and insightful. This book is a rare gem that does both. Paul te Braak is well-known in the Business Intelligence community for his expertise and contributions and he delivers a unique guide that starts with the basics and proceeds to cover all of the essentials with depth and practical examples to solve some challenging business problems.
You might expect a book that introduces fundamental concepts to gloss-over advanced topics and avoid complex applications. This book covers the essentials of data modeling and analysis with Excel and Power Pivot in language that is plain and easy to understand but it doesn’t stop with the basics. It covers practical and useful applications without insulting the reader’s intelligence. As an experienced data modeler, I found several useful techniques and new methods to use the tools and language. Paul’s coverage of practical techniques spans the spectrum of business applications and product features. This is a rare book that is not only a good tutorial with many hands-on examples that can be repeated by the reader but it’s also a great reference of useful techniques and code samples.
The integration of Excel features such as ranges, tables, pivot tables and pivot charts with the additional features of Power Pivot and Power View that extend and enhance these capabilities
Examples and instructions are directed at Excel 2010 users and the author compares some of the different features in Excel 2010 and Excel 2013.
Fundamentals of the DAX calculation language
Importing data as text, different date formats and implied data type columns
Beyond the basics, a schema.ini file is used to define column data types
Importing data from a database, working with database tables, views and queries, managing connections and challenges encountered running the table import wizard multiple times
Data feeds using OData and using a Reporting Services report to provide a data feed
Decisions a designer makes to shape the user’s experience when browsing a model, including sorting values and building hierarchies that enable drill-down interaction.
DAX “X” functions (SUMX, MINX, etc.) to perform row-level aggregation
Working with parent-child hierarchies using specialized DAX path functions.
Advanced browsing features, adjusting pivot table options to optimize the user experience
Building and using KPIs and using alternate table relationships
Time calculations and date functions. This chapter covers running totals and totals to date
Date part aggregate functions (MTD, YTD, etc.),
Essential data math and comparisons
LastYear and PriorPeriod functions, TotalYTD
Manufacturing calendar, working with “445” dates
Creating a dynamic relative time measure, using a shell dimension table
Using DatesBetween to show the average value for the past 10 days
Apply advanced modeling technique to bin, sort and rank values for reporting
Expand concepts introduced in chapter 3, using the DAX “X” functions to perform row iteration in advanced financial applications
Defining and working with many-to-many relationships. It is often no trivial task to completely understand many-to-many relationship requirements and to apply a working solution that provides the intended results
Addressing inventory and stock-keeping challenges
Conditional aggregation at different levels
Budgeting and forecasting vs actuals
Programming Excel to enhance the user’s experience
Excel VBA event programming to respond to slicers
Using cube functions
Interacting with charts and slicers
Building solutions for the enterprise
Using the SSDT Tabular designer
Migrating Power Pivot models to Tabular server solutions
What I’m going to show you is not rocket science. This one gets filed under the category of “well, duh. That was easy …now that I see how to do it.”
Power View sets the maximum value for a chart series based on the maximum unfiltered axis value; pretty simple but not too flexible. Let’s say that I want to show the progress of project completion. If I add a percentage type measure to the Value of a column chart with no fields on the Axis, it will plot a single column “gauge” chart with the maximum value (in this case 65%) somewhere near the top of the chart. I suppose that’s what we asked it to do but there is no way to manually set the maximum range of the axis scale. Oh, what to do?
Like most beyond-the-basics capabilities in Power View, the answer is to enhance the semantic model – and this one’s easy. In the PowerPivot or SSAS model, create another measure. I’ll do it right next to the existing Percentage calculation named “Project Pct Complete”. The objective is to get the gauge visual (our no axis column chart) scale to add up to 100% so what we need is another measure that when added to the first one, adds up to 100%. I’ll call that calculation “Remainder Pct Complete” using the DAX expression:
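In essence, the new measure is just one minus the existing percentage measure (the screen capture of the expression isn’t reproduced here):

Remainder Pct Complete := 1 - [Project Pct Complete]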
By subtracting the % Complete from the number one, the result will always be the remainder of 100%. I’ll format the new measure as a percentage.
Now I add both measures to the chart/gauge like so:
I haven’t quite got the legend to work the way I want it to. In this case, I’d probably just hide it and maybe show data labels for the two series values. That’s all up to you, of course. For extra credit, let’s add another progress gauge that shows the overall progress even when a slicer is used to filter the data. First of all, test this. I’ve added a Priority slicer and selecting one or a combination of items slices the project rows and changes the progress gauge value.
Back in the model, I add two more measures starting with the existing expressions. You can just copy one of the measure cells and paste it into an empty cell. This must be done one at a time. The expression is enhanced using the CALCULATE and ALL functions, like this:
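A sketch of the two new measures (the “Overall” names are my placeholders; the Projects table name comes from the description below):

Overall Pct Complete := CALCULATE ( [Project Pct Complete], ALL ( Projects ) )   -- ignores slicer and filter context
Overall Remainder Pct Complete := 1 - [Overall Pct Complete]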
This says “Apply the Project Pct Complete measure calculation to whatever filter context is provided in the CALCULATE function.” In this case, we are saying “ignore the natural row or filter context of the query and always apply the calculation to all rows in the Projects table.”
Now, when a slicer or filter is used on the report, the Overall gauge continues to show the same percentage while the rest of the report, including the Progress gauge shows only filtered values.
A client came to me with a challenge this week that, although simple in concept, posed an interesting problem that I hadn’t solved before using a tabular model and DAX. She had cost information for several hospitals in a PowerPivot model. Using an Excel pivot table published to a SharePoint site, users were to select any combination of hospitals using a slicer and compare relative costs for these facilities and other measure values. After working on the problem, I have two working solutions. I don’t think one is necessarily better or worse than the other, but each has advantages over the other. Thanks to Rob Collie and Marco Russo for their contributions as we’ve discussed potential solutions.
In production, the model has multiple tables and a lot of cost information but the essence of the problem is represented in this sample:
A fact table contains cost information and a HospitalID key, which is related to a lookup table of hospitals. Now the challenge is to allow users to select two different hospitals and return the name and related measure values for the selected hospitals so we can perform comparisons on the same row. Later, in production, users will select 3 and maybe up to 6 hospitals for comparison. My example only shows two. My first thought was to just add the hospitals to rows in the PivotTable before the measures but they have some specific statistical calculations to perform with a group of selected hospitals that can’t be done if these are rows in a PivotTable.
Solution #1: One slicer per selection
I added two copies of the Hospital table to the model without any relationships. These are “disconnected tables,” meaning that including them in queries will have no effect on filtering data in other tables unless they are used explicitly in a calculation. They will be used to drive slicers and filters, and used in special calculations.
In an Excel PivotTable or Power View report, a slicer is created based on each of these two tables.
Calculated measures are added to the fact table to calculate and filter the existing Total Cost measure. Of course, similar calculated measures could be created to calculate any other measure as well.
Selected 1 Total Cost:=CALCULATE([Total Cost], FILTER(Cost, [HospID]=MIN(SelectedHospital1[HospID])))
Selected 2 Total Cost:=CALCULATE([Total Cost], FILTER(Cost, [HospID]=MIN(SelectedHospital2[HospID])))
I used the MIN function only to make the expression syntactically correct and because the HospID field in the fact table can only be compared to a single value in the lookup table. Functionally, this doesn’t really do anything if the user chooses one value in the first slicer. If they happen to select more than one item, this will use the one with the smallest HospID value. This technique works fine with numeric values but aggregate functions can’t be used with text type fields. To return the selected hospital name in another calculation, I used the FIRSTNONBLANK function. Notice the required second argument, normally used to filter values in a field. In this case, I just used an expression that always evaluates to True:
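A sketch of that name-returning measure for each slicer table (the measure names and the Hospital Name column name are assumptions):

Selected 1 Hospital Name := FIRSTNONBLANK ( SelectedHospital1[Hospital Name], TRUE () )
Selected 2 Hospital Name := FIRSTNONBLANK ( SelectedHospital2[Hospital Name], TRUE () )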
A simple calculation is used to calculate the difference between the first selected item cost and the second selected item cost:
Selected Diff Total Cost:=[Selected 1 Total Cost]-[Selected 2 Total Cost]
In an Excel PivotTable, I add a slicer for each of the two lookup tables and the calculated measures for the values. The final product is easy to use. We select any hospital from the first slicer and any hospital from the second slicer to compare costs between them:
Solution #2: One slicer to select multiple items
This solution is a little more elegant in my opinion. I need to give credit to Marco Russo for working out the DAX calculations for this. He has a gift for looking at a problem differently and finding a unique solution. Rather than using one table for each selected facility, one table is used to drive a multi-select slicer:
If this were a multidimensional model, the problem could be addressed in MDX by using the ITEM property for a set of members. For example, to return the second selected hospital in a calculated member expression, an expression like this could be used: [Selected Hospital].[Hospital Name].ITEM(1)
…but DAX doesn’t work that way. We think in terms of Rows and not Sets and there is no Items collection.
To get the first selected hospital, we can use the TOPN function to get the first item in the set of rows returned by this table. This expression calculates the total cost for that item:
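The original expression isn’t reproduced here; a rough sketch that follows the TOPN pattern described in the next paragraph (SelectedHospitals is the disconnected slicer table, and the measure name is illustrative) might look like this:

First Selected Total Cost :=
CALCULATE (
    [Total Cost],
    FILTER (
        Cost,
        Cost[HospID]
            = MINX (
                TOPN ( 1, VALUES ( SelectedHospitals[HospID] ), SelectedHospitals[HospID], 1 ),   -- first selected row, ascending by HospID
                SelectedHospitals[HospID]
            )
    )
)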
The TOPN function needs 4 arguments: the first tells it how many rows to return (in our case only 1… the first row), the second is a table (the VALUES function turns a bunch of field values into a table), the third is the Order By expression, and the fourth argument specifies descending or ascending sort order.
To return the second selected item…
Since DAX has no way to just go get a specific row by row number or ordinal position, now we have to get really creative. Marco’s solution is to use two different TOPN expressions. The inner expression gets the first two rows and returns the results in descending order, using the expression: TOPN( 2, VALUES( SelectedHospitals[HospID]), SelectedHospitals[HospID], 1 ). The outer TOPN expression just returns the first row from that result:
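Putting that together, a sketch of the second-item measure (again, the names are illustrative and the published expression may differ slightly):

Second Selected Total Cost :=
CALCULATE (
    [Total Cost],
    FILTER (
        Cost,
        Cost[HospID]
            = MAXX (
                TOPN (
                    1,
                    TOPN ( 2, VALUES ( SelectedHospitals[HospID] ), SelectedHospitals[HospID], 1 ),   -- inner: the first two selected items
                    SelectedHospitals[HospID], 0                                                       -- outer: keep the later of the two
                ),
                SelectedHospitals[HospID]
            )
    )
)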
After working on a couple of serious tabular model projects, I’ve been talking to other BI practitioners lately about their thoughts regarding tabular model technology and I’m finding that there is little consensus among all these industry experts about which option to recommend for new BI projects. At SolidQ, I work with some of the most experienced Business Intelligence experts in the industry. At conferences and in discussion groups, there has been a lot of discussion about the choice between Analysis Services multidimensional models and the newer tabular semantic model technology introduced in SQL Server 2012. There are pros and cons for either side but I think there are some important factors to consider. Among some of the most experienced Analysis Services consultants, I have heard that some are primarily using and recommending tabular for most new projects and others are arguing that tabular offers little significant value over the tried-and-true multidimensional cubes and MDX script solutions.
As is typical for a newer product, the usability experience for the model designer isn’t perfect. In some ways, it’s different and just may take some getting used to, but in other ways there is clearly room for improvement. The question now is: do the advantages of tabular vs multidimensional outweigh the somewhat rocky design experience? I’m not taking sides in this comparison but merely offering an objective analysis of the state of the tabular model option in SQL Server 2012 Analysis Services, Service Pack 1. I expect this product to go through changes so for future reference, this information is for the released product at the time of this posting in June of 2013.
The core VertiPaq (or xVelocity) query and aggregation engine is stable & reliable. Originally developed about five years ago and released with PowerPivot for Excel and SharePoint in SQL Server 2008 R2 over three years ago, this technology has proven to be ready for serious use.
Under most conditions, for analytic reporting, data scanned and aggregated from an in-memory data structure performs faster than other conventional options; including relational and multidimensional storage.
Tabular can be less complex than multidimensional, OLAP SSAS. The core design and usage concepts are easier both for those who design models and for those who use them for analysis and reporting.
Tabular models can be easier & faster to implement because the model structure is simpler and there may be fewer steps in the design process.
DAX, the core calculation expression language for tabular models, is fairly easy to learn. Fundamental DAX expression concepts can be easier to understand than equivalent MDX commands used in multidimensional modeling and calculations.
Comparing features, tabular can be hard to justify when compared to multidimensional. Traditional SSAS cubes still have more capabilities than tabular models and for someone who already has OLAP skills and background, tabular design is a new learning curve.
PowerPivot is a version 2 product. As such, it’s simple and fairly reliable in the majority of design scenarios. In some, mostly rare, cases, error handling and debugging capabilities aren’t as mature and robust as those of more tenured technologies like SSAS multidimensional or relational.
Tabular SSAS is a version 1 product. As such, it has a list of well-known design annoyances and features that could (and should) improve in the next product go-round.
The recommended design patterns & best practices, both from Microsoft development and support teams and from members of the expert practitioner community, are still emerging.
One model.bim file = one developer working at a time. In a departure from the long-time SSAS project development environment, where every object (such as data sources, dimensions and cubes) was defined in an isolated file managed by the project, SSAS tabular manages all of the database objects in a single model definition file. The model designer treats the model as more of a black box than a solution comprised of independent objects. However, the fact is that most of the same objects we work with in multidimensional/cube projects are all defined as XML elements in this file. It can be opened and its properties manipulated, and there are a growing number of third-party tools to provide enhancements. Regardless, it is one big project file that would need to be checked out in source code management as a single project asset.
The tabular SSAS support community is thriving but still small. A core group of trusted professionals from all over the world are the loudest voices right now. They promote tabular solutions and provide active and collective support for one another.
The DAX expression editor in PowerPivot & the Visual Studio-based SSDT designer is quirky. You have to know its strengths and weaknesses and be willing to use it in that context. It attempts to assist with code completion but, in the end, it doubles up brackets and duplicates, rather than replaces, the old code it suggests. No doubt the experience will get better as enhancements are developed and released, but today we must live with a product that is useful and reliable a lot of the time yet leaves plenty of opportunity for improvement.
The entire tabular model must fit in memory. There’s no caching, swapping or distributed processing option for large models. This means that very large models need very large hardware and there is currently no scale-out option to distribute work loads or fail-over. Got a lot of data? Get a lot of memory.
After you get into serious data with 30, 40 or 50 tables and some complexity to your model, the version 1.0 SSDT tabular model designer can be cumbersome and error-prone. I’ve seen it stop responding and throw errors when there were no errors. I will say that it’s helpful and reliable most of the time but on occasion, when it falls down, I save and close my work, shut down Visual Studio altogether and then fire it back up.
My biggest peeve about the SSDT model designer is that all design work is performed while connected to the workspace database. This means that for every table and column rename, and for every single property setting change, this information is written to the model workspace database on the local SSAS tabular server, and after the changes have been committed, control is given back to the model designer.
Some key SSAS features are not currently supported. These include things like parent-child hierarchies, many-to-many relationships, cube actions and display folders. Actually, this is not entirely true; actions and display folders can be added using after-market tools (like BIDS Helper, available in the CodePlex library) and by hand-writing the XMLA script, but they are not currently supported through the SSDT model designer. There is simply a category of features that didn’t find their way into this first version of the product. There are work-arounds and methods to enable these capabilities but they’re not supported, or at least not to the same degree as they are in multidimensional SSAS.
There is no doubt that in-memory, tabular model technology is the promise of the future. It just makes sense. Several vendors have come to the same conclusion and are developing products following this paradigm. Oracle just made a big announcement about adding in-memory column store to their future 12C product.
Data residing and processed in memory is faster than data residing on disk. This is what VertiPaq does; whether implemented as PowerPivot, an SSAS tabular model or as a SQL Server column store, it works efficiently and elegantly without the complexities and overhead of indexes, partitions, file groups and other techniques typically used to optimize on-disk data stores.
Even though tabular is fairly new, many useful & valuable features are supported today and work well.
PowerPivot models upgrade seamlessly to tabular models. This provides a path for business users and IT professionals to author models in familiar tools (Excel or Visual Studio) and then promote them to a server hosted environment.
Tabular models are managed and stored by SQL Server Analysis Services! Although some components of the tabular engine and the designer are new and still have wrinkles to be ironed-out, the core product is based on the solid and time-tested foundation of SSAS. This means that many of the features not implemented now will be available in future builds.
Client applications that support SSAS multidimensional will also support tabular. In fact, any application built to work with SSAS cubes will natively work with PowerPivot and tabular as if it were a cube. This is because SSAS tabular uses the same data provider that understands both MDX & DAX queries.
A Getting-Started and Survival Guide for planning, designing and building Tabular Semantic Models with Microsoft SQL Server 2012 Analysis Services.
by Paul Turley
This post will be unique in that it will be a living document that will be updated and expanded over time. I will also post-as-I-go on the site about other things but this particular post will live for a while. I have a lot of good intentions – I know that about myself and I also know that the best way to get something done is to get it started – especially if I’m too busy with work and projects. If it’s important, the “completing” part can happen later. In the case of this post, I’ll take care of building it as I go, topic by topic. Heck, maybe it will never be “finished” but then are we ever really done with IT business solutions? I have been intending to get started on this topic for quite some time but in my very busy project schedule lately, didn’t have a concise message for a post – but I do have a lot to say about creating and using tabular models.
I’ve added some place-holder topic headers for some things that are on my mind. This list is inspired by a lot of the questions my consulting customers, students, IT staff members and business users have asked me on a regular basis. This will motivate me to come back and finish them and for you to come back and read them. I hope that you will post comments about your burning questions, issues and ideas for related topics to cover in this living post about tabular model design practices and recommendations.
SQL Server Analysis Services is a solid and mature platform that now serves as the foundation for two different implementations. Multidimensional models are especially suited for large volumes of dimensionally-structured data that have additive measure values that sum-up along related dimensional attributes & hierarchies.
By design, tabular architecture is more flexible than multidimensional in a number of scenarios. Tabular works well with dimensional data structures but also works well in cases where the structure of the data doesn’t resemble a traditional star or snowflake of fact and dimension tables. When I started using PowerPivot and tabular SSAS projects, I insisted on transforming data into star schemas like I’ve always done before building a cube. In many cases, I still do because it’s easier to design a predictable model that performs well and is easy for users to navigate. A dimensional model has order and discipline; however, the data is not always shaped this way and it can take a lot of effort to force it into that structure.
Tabular is fast not only for additive, hierarchically structured data but, in many cases, it also works well with normalized and flattened data as long as all the data fits into memory and the model is designed to support simple relationships and calculations that take advantage of the function engine and VertiPaq compression and query engine. It’s actually pretty easy to make tabular do silly, inefficient things, but it’s also not very hard to make it work really well.
James Serra has done a nice job of summarizing the differences between the two choices and highlighted the strengths and comparative weaknesses of each in his April 4 blog post titled SQL Server 2012: Multidimensional vs Tabular. James points out that tabular models can be faster and easier to design and deploy, and that they consistently perform well without a lot of extra attention to tuning and optimization. Honestly, there isn’t that much to maintain and a lot of the tricks we use to make cubes perform better (like measure group partitioning, aggregation design, strategic aggregation storage, usage-based optimization, proactive caching and cache-warming queries) are simply unnecessary. Most of these options don’t really exist in the tabular world. We do have partitions in tabular models but they’re really just for ease of design.
What About Multidimensional – Will Tabular Replace It?
The fact is that multidimensional databases (which most casual SSAS users refer to as “cubes”) will be supported for years to come. The base architecture for SSAS OLAP/UDM/Multidimensional is about 13 years old, dating to when Microsoft originally acquired a product code base from Panorama and then went on to enhance and eventually rewrite the engine over the years as it has matured. In the view of many industry professionals, this is still the more complete and feature-rich product.
Both multidimensional and tabular have some strengths and weaknesses today and one is not clearly superior to the other. In many cases, tabular performs better and models are simpler to design and use, but the platform is lacking equivalent commands and advanced capabilities. In the near future, the tabular product may inherit all of the features of its predecessor and the choice may become more clear; or, perhaps a hybrid product will emerge.
Isn’t a Tabular Model Just Another Name for a Cube?
No. …um, Yes. …well, sort of. Here’s the thing: The term “cube” has become a de facto term used by many to describe the general concept of a semantic model. Technically, the term “cube” defines a multidimensional structure that stores data in hierarchies of multi-level attributes and pre-calculated aggregate measure values at the intersect points between all those dimensions and at strategic points between many of the level members in-between. It’s a cool concept and an even cooler technology but most people who aren’t close to this product don’t understand all that. Users just know that it works somehow but they’re often confused by some of the fine points… like the difference between hierarchies and levels. One has an All member and one doesn’t but they both have all the other members. It makes sense when you understand the architecture but it’s just weird behavior for those who don’t.
Since the tabular semantic model is actually Analysis Services with a single definition of object metadata, certain client tools will continue to treat the model as a cube, even though it technically isn’t. A tabular Analysis Services database contains some tables that serve the same purpose as measure groups in multidimensional semantic models. The rest of the tables are exposed as dimensions in the same way that cube dimensions exist in multidimensional. If a table in a tabular model includes both measures and attribute fields, in certain client tools like Excel, it will show up twice in the model; once as a measure group table and once as a dimension table.
(more to come)
Tabular Model Design: The Good, the Bad, the Ugly & the Beautiful
I’ve taught a few PowerPivot training sessions to groups of business users (now, remember that Tabular SSAS is really just the scaled-up version of PowerPivot.) Admittedly I’m more accustomed to working with IT professionals and when I teach or work with users, I have to throttle my tendency to go deep and talk about technical concepts. In these classes, I find myself restating the same things I’ve heard in conference presentations and marketing demos about PowerPivot data sources, like “you can import just about anything into PowerPivot”. As I read the bullet points and articulate the points on the presentation slides to these users, I have this nagging voice in the back of my mind. I’ve spent many years of my career unraveling the monstrosities that users have created in Access, Excel & Visual Basic.
Whether stated or implied, there is a common belief that a PowerPivot solution doesn’t require the same level of effort to transform, prepare and cleanse data before it gets imported into a data model. For many years, we’ve been telling these users that it will take a serious effort, at significant cost, to prepare and transform data before we can put it into a data mart or cube for their consumption. In a typical BI solution, we usually burn 70-80% of our resource hours and budget on the ETL portion of the project. Now, using the same data sources, users are being told that they can do the same thing themselves using PowerPivot!
Data Modeling 101 for Tabular Models
One of the things that I really enjoy about building tabular models is that I can have my data in multiple structures and it still works. If the data is in a traditional BI “Kimball-style” Star schema, it works really well. If the data is normalized as it would be in a typical transactional-style database, it still works. Even if I have tables that are of a hybrid design; with some characteristics of both normalized and dimensional models, it all works beautifully.
Here’s the catch: one of the reasons we build dimensional data models is because they are simple and predictable. It’s really easy to get lost in a complex data structure, and when you start combining data from multiple source systems, that’s where you’re likely to end up. Getting business data into a structure that is intuitive, that behaves correctly and gives reliable results can be a lot of work, so be cautious. Just because a tabular model can work with different data structures doesn’t mean that you don’t need to prepare your data, clean it up and organize it before building the semantic model.
The classic star schema is one of the most effective ways to organize data for analysis. Rather than organizing all data elements into separate tables according to the rules of normal form, we consolidate all the measures that are related to common dimensional attributes and with a common grain (or aggregation level), into a fact table. The dimensional attributes are stored in separate dimension tables – one table per unique business entity, along with related attributes. Any group of measures not related to the same set of dimensions at the same level would be stored in their own fact table. In the example, Invoice measures that are related to stores and customers, recorded every quarter are in one fact table. The sales debit records for customers and stores that are recorded daily go in a different fact table. The account adjustments don’t record the store key but they are uniquely related to accounting ledger entries stored in the ledger table. Note the direction of the arrows showing that facts are related to lookup values in the dimension tables.
Exhibit 1 – A Fully conformed Star Schema
If you can pound your data into the shape of a star schema and this meets your requirements, this is what I usually recommend. It’s a simple and predictable method to organize data in a well-defined structure. Now, let’s look at a variation of this approach that has characteristics of both the star schema and normalized form. We’ll call this a “hybrid” model.
The following hybrid schema contains two fact tables in a master/detail relationship. The cardinality of the Invoice and LineItem tables is one-to-many, where one invoice can have multiple line items. This would be considered a normalized relationship, with the InvoiceID primary key related to an InvoiceID foreign key in the LineItem table.
The Invoice table contains a numeric measure called Invoice Amount that can be aggregated by different dimensional attributes, such as Store Name, Customer Name or any of the calendar date units in the Dates table, which are organized into a natural hierarchy (with Year, Month and Date levels). To facilitate this, the Invoice table is related to three different dimension tables: Stores, Customers and Dates. Each of the dimension tables has a primary key related to a corresponding foreign key in the fact table. The LineItem table also contains numeric measures and is related to the Products table, also a dimension table.
Exhibit 2 – A Hybrid Star / Master-Detail Schema
This semantic model supports two levels of aggregation with respect to the Invoice and LineItem records. If I were to browse this model in an Excel Pivot Table and put all the stores on rows, I could aggregate the Invoice Amount and see the sum of all Invoice Amount values for each store.
<< need pivot table graphic here >>
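For reference, the base measures behind such a pivot table might be defined like this (a minimal sketch; the Amount column name in the LineItem table is an assumption):

Invoice Amount Total := SUM ( Invoice[Invoice Amount] )
Line Item Amount Total := SUM ( LineItem[Amount] )   -- detail-level measure aggregated at the line item grain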
Are There Rules for Tabular Model Design?
Oh, absolutely. Tabular SSAS and PowerPivot allow you to work with data in a variety of formats – structured & unstructured, dimensional & normalized. You have a lot of flexibility but there are rules that govern the behavior and characteristics of data. If you don’t follow the rules, your data may not meet your requirements in the most cost-effective way.
This reminds me of an experience when I started high school.
Rule #1: Model the data source
Rule #2: Cleanse data at the source
Tabular Model Design Checklist
What’s the Difference Between Calculated Columns & Measures?
What are the Naming Conventions for Tabular Model Objects?
What’s the Difference Between PowerPivot and Tabular Models?
How to Promote a Business-created PowerPivot Model to an IT-managed SSAS Tabular Model
Getting Started with DAX Calculations
DAX: Essential Concepts
DAX: Some of the Most Useful Functions
DAX: Some of the Most Interesting Functions
Using DAX to Solve real-World Business Scenarios
Do I Write MDX or DAX Queries to Report on Tabular Data?
Can I Use Reporting Services with Tabular & PowerPivot Models?
Do We Need to Have SharePoint to Use Tabular Models?
What Do You Teach Non-technical Business Users About PowerPivot and Tabular Models?
What’s the Best IT Tool for Reporting on Tabular Models?
What’s the Best Business User Tool for Browsing & Analyzing Business Data with Tabular Models?
Survival Tips for Using the Tabular Model Design Environment
How Do You Design a Tabular Model for a Large Volume of Data?
How Do You Secure a Tabular Model?
How to Deploy and Manage a Tabular Model SSAS Database
Tabular Model Common Errors and Remedies
Tabular Model, Workspace and Database Recovery Techniques
Scripting Tabular Model Measures
Simplifying and Automating Tabular Model Design Tasks
Tuning and Optimizing a Tabular Model
How do you tune a tabular model? You don’t.
You can prevent performance and usability problems through proper design.
We can now use Visual Studio 2012 to create and edit SQL Server BI projects! Woo Hoo.
In the past, if you wanted to develop and maintain BI projects in the same development environment as other Visual Studio projects, you had to install multiple versions of Visual Studio. This may no longer be necessary if you install the new Business Intelligence for Visual Studio 2012 (SSDT BI). Continue reading →
This is a very significant day in the progression of the Microsoft Business Intelligence platform. Earlier this year, Power View was released with SQL Server 2012; a remarkable step forward in data visualization. It wasn’t all bad news that Power View only worked with the new tabular semantic models and PowerPivot worksheets published in SharePoint, but it did limit our options. We’ve quietly lived with the fact that this great new reporting tool couldn’t be used with Microsoft’s flagship analytical data (OLAP) engine without building new semantic models. Well, now it can be, and that’s very good news.
Literally, minutes after I began posting my running notes from the keynote presentations and the first session I attended, I received a request to fill a last minute opening on the schedule and prepare a second session. I’m working on preparing a new version of “Visual Report Design – Bringing Sexy Back”. I will be presenting that session tomorrow. Continue reading →
I had a really good experience presenting a session at the SQL Saturday BI Edition in Dallas this weekend about Power View and PowerPivot. I’ve posted the presentation online and it’s ready for viewing here: Continue reading →
I really enjoy this topic. One of the reasons, I suppose, it’s such a hot topic is that a lot of people believe that there is a perfect ad hoc reporting solution out there, just barely beyond the current set of capabilities in all of the available tools. Continue reading →
The SQL Server 2012 upgrade white paper is now available from Microsoft. I’m proud to have made a contribution and to work with these industry leaders. This extensive 454 page guide was published and edited by Microsoft and written by mentors from SolidQ. It is a valuable reference for administrators, developers and business intelligence professionals planning to upgrade solutions to the latest version of SQL Server.
Thanks and congratulations to the SolidQ authors: Ron Talmage, Nigel Sammy, Allan Hirt, Herbert Albert, Antonio Soto, Danilo Dominici, Régis Baccaro, Milos Radivojevic, Jesús Gil, Dejan Sarka, Johan Åhlén, Simran Jindal, Paul Turley, Craig Utley, Larry Barnes, Pablo Ahumada
It’s exciting to see all the pieces of SQL Server 2012 coming together. Many new and compelling features have been in development for several years. Today we finally get to see which features have made the cut and will ship in the released product in a few months. Release Candidate 0 (RC0) can be downloaded here.
I’ve been watching this product for more than two years since the product teams gave MVPs and selected partners our first look at the SQL Server "Denali" software development review back in October, 2009. Since then, we’ve been meeting with the developers, leaders and program managers in review discussions and insider first-look sessions every few months. This release will be bigger and more capable than any before.
Of course, I’m paying attention to the revolutionary business intelligence features; SSRS alerting, BISM tabular models, and the crème de la crème: Power View. Take a look at this post on Robert Bruckner’s blog for a comprehensive demonstration of the Power View features added since CTP3.
Image courtesy of Robert Bruckner
All of these capabilities will be covered extensively in the next release of our Wrox Press Professional series book on Reporting Services.