Last night’s Oregon SQL user group meeting was very enlightening. Mark Simms from the SQL Server Customer Advisory Team (SQLCAT) came down from Redmond to talk about StreamInsight, a substantial new feature in SQL Server 2008 R2 that allows data to be analyzed and aggregated in-memory as it flows through the server. I’ve read a little about this new capability and have seem some short demos but Mark gave us a comprehensive look under the hood after walking the group through a few scenarios that really made this technology very relevant.
StreamInsight is part of the SQL Server product suite, which makes sense because it is a platform for working with and analyzing data. However, it has very little to do with traditional database storage and relational or stored analytical data and query paradigms. It looks like more of a programming tool than a database technology. StreamInsight sits on top of Dot Net and uses data adaptors to received and send streams of data that travel through memory – a little bit like SQL Server Integration Services (SSIS). It uses a scripting language called LINQ, which is actually a set of .NET extension libraries, typically used with the C# .NET programming language. The tenants if StreamInsight are events, windows and various operators such as aggregates. Like .NET object-oriented method coding, nested commands and operators can be built into named code constructs called macros but this code doesn’t have to be pre-compiled into assemblies. In fact, the programming and debugging experience feels a lot like writing and testing SQL using rich programming tools.
Mark did a great job of demonstrating some very meaningful scenarios. By itself, StreamInsight has nothing to do with data storage. The data flowing through an application can originate as live transactions, streaming video, messages or something like chats and traffic on a social network. Most applications are used to analyze data in-flight without the latency and overhead of disk I/O; but all or part of this data could also be read from persistent storage or written to a database after it’s been processed. It would normally run on a dedicated server that just pipes the data through for analysis and to control other components of a larger solution.
He gave examples of reading flow information from oil pipelines, monitoring large-scale streaming media events like the Olympics or a championship game, active stock trading, and monitoring traffic patterns on a highway system. There’s little value in storing all these events but it’s critically important to monitor it in real-time to capture samples and states form time to time. I get the vision of StreamInsight now and I can envision a variety of useful applications.
One of my takeaways as a BI practitioner is this: In our practice, we are often asked to work with a consulting client to build business dashboards. It’s common to sit down with a business user who says something like “I would like to see a dashboard that shows me a timeline of historical KPIs, measures and dimensions” – that’s typical BI analysis that we would read from a data warehouse. …”then I’d like to see the continuum of that data as it relates to our daily production activity”. This is operational data, stored in a live, transactional system. The volume of data is low but and the data can change. It comes from system that stores records as they occur. “Finally, I want to see what’s going on in our live processes, how our production equipment is operating and what components are performing above or below capacity.” That’s not necessarily information they may need to keep around but it is certainly useful data during and shortly after thousands of events are taking place.
Click here to view the recording of Mark’s StreamInsight presentation The presentation begins about 26 minutes into the meeting.
Next month, March 9th, Buck Woody, Microsoft Senior Technology Specialist and renowned speaker will talk about the future of cloud computing. Don’t miss it! Buck is always a riot.
Hi,
I am Microsoft BI developer and I am in evaluation process of tool for complex event processing for one of my client.
I came across lot of tools from various background like WSo2, Jboss + drools, Nesper and Microsoft StreamInsight and
As I am from Microsoft SQL Server background I prefer the Microsoft tools only.
There is one concern before we finalize this tool is that limit of Streaminsight event processing capacity for per second.
It is 5000 per seconds for Standard version but we are looking for current 15000-20000 (more in coming years) events per
seconds. So my question is can we cluster multiple Streamlight server to add capacity to handale events per seocnds and
Required if yes can tell me the location to get this knowledge “how to cluster multiple Streaminsight server”.
Thanks and Regards
Abhijeet Barbate
abhijeetbarbate@live.com