Saturday, August 30, 2008

On the streaming SQL evolving standard

Kudos to our colleagues from Oracle and Streambase for their presentation in the industrial section of VLDB 2008:
Towards a Streaming SQL Standard
Stan Zdonik (Streambase, Inc.), Namit Jain (Oracle), Shailendra Mishra (Oracle), Anand Srinivasan (Oracle), Johannes Gehrke (Cornell University, USA), Jennifer Widom (Stanford University), Hari Balakrishnan (Streambase, Inc.), Mitch Cherniack (Streambase, Inc.), Ugur Cetintemel (Streambase, Inc.), Richard Tibbetts (Streambase, Inc.).
Unlike last year, I did not attend VLDB this year, though I would love to visit New Zealand when an opportunity arises. VLDB is certainly a respectable conference, and the list of authors includes some respected members of the database research community. Mark Palmer also blogs about it, under the title "Towards a CEP standard".
A few comments about it:
  • I think this work is important: there are currently multiple variations of SQL extensions for various event processing purposes, and it will be easier if they are consolidated.
  • The paper mentions "event based" vs. "set based" views. Looking at the patterns being detected, some are indeed best approached with an "event based" view, meaning that as each individual event arrives, it is evaluated whether a pattern has been completed. A "set based" view is more convenient when the pattern involves set operations, for example: checking whether the average value of some attribute, over all events that belong to a certain context, exceeds some threshold. An example of an "event based" pattern is looking for a sequence of two events (customer-complained, delivery-arrived); an example of a "set based" pattern is: the average delivery-actual-time over all delivery-summary events in a certain shift is more than 30 minutes, where delivery-summary is an event derived from order-made and delivery-arrived.
  • Retrospective patterns, i.e. patterns on historical events, are "set oriented" by nature, but as the example above shows, there are cases in which set-oriented thinking also applies to running events (this, of course, can also be emulated by an "event based" pattern).
  • SQL extensions, of course, cover only part of the languages that exist in the event processing universe, and those who don't believe in the SQL religion will probably not become believers if a streaming SQL standard is approved. I have written in the past about the Babylon tower and have not changed my opinion since then: I view SQL (with all of its extensions) as a natural way to express queries about "states", but not about "collections of transitions", and I think there is a more natural way to think about the latter. The EPDL work we are doing is a step in that direction; however, the idea is to use it (at least initially) as a meta-language, of which Streaming SQL may be one of the major implementations. I'll provide more information about the EPDL project later this year.
  • Another comment: while the language standard is certainly the most challenging one, there are also other standards that need to be discussed, in the areas of interoperability, event formats, modeling and more. At the EPTS symposium next month, we'll dedicate some of the time to standards, starting with a keynote address by a standards expert on the impact of standards on industries, followed by a panel with various participants discussing these issues.
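To make the "event based" vs. "set based" distinction discussed above a bit more concrete, here is a minimal Python sketch. It is my own illustration, not the syntax or semantics proposed in the VLDB paper. The `Event` record, correlating events by a hypothetical `order_id`, attributing each delivery to the shift recorded on its `order-made` event, and measuring time in minutes are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape: type, timestamp in minutes, order key, shift.
    kind: str
    time: float
    order_id: str
    shift: str = ""


class SequenceDetector:
    """Event based: on each arriving event, check whether the sequence
    (first_kind, then second_kind) has completed for the same order."""

    def __init__(self, first_kind: str, second_kind: str) -> None:
        self.first_kind = first_kind
        self.second_kind = second_kind
        self._pending: Dict[str, float] = {}  # order_id -> time of first event

    def on_event(self, ev: Event) -> Optional[Tuple[str, float, float]]:
        if ev.kind == self.first_kind:
            self._pending.setdefault(ev.order_id, ev.time)
        elif ev.kind == self.second_kind and ev.order_id in self._pending:
            start = self._pending.pop(ev.order_id)
            return (ev.order_id, start, ev.time)  # pattern completed
        return None


def derive_delivery_times(events: List[Event]) -> List[Tuple[str, float]]:
    """Derive (shift, delivery time) pairs from order-made and
    delivery-arrived, playing the role of delivery-summary events."""
    made: Dict[str, Event] = {}
    summaries: List[Tuple[str, float]] = []
    for ev in events:
        if ev.kind == "order-made":
            made[ev.order_id] = ev
        elif ev.kind == "delivery-arrived" and ev.order_id in made:
            order = made.pop(ev.order_id)
            summaries.append((order.shift, ev.time - order.time))
    return summaries


def late_shifts(events: List[Event], threshold: float = 30.0) -> Dict[str, float]:
    """Set based: look at the whole set of events at once and return the
    shifts whose average delivery time exceeds the threshold."""
    by_shift: Dict[str, List[float]] = {}
    for shift, duration in derive_delivery_times(events):
        by_shift.setdefault(shift, []).append(duration)
    averages = {s: mean(d) for s, d in by_shift.items()}
    return {s: avg for s, avg in averages.items() if avg > threshold}


class RunningShiftAverage:
    """Event based emulation of the same set based pattern: keep a running
    sum and count per shift, updated on every arriving event."""

    def __init__(self, threshold: float = 30.0) -> None:
        self.threshold = threshold
        self._made: Dict[str, Event] = {}
        self._sum: Dict[str, float] = {}
        self._count: Dict[str, int] = {}

    def on_event(self, ev: Event) -> Optional[Tuple[str, float]]:
        if ev.kind == "order-made":
            self._made[ev.order_id] = ev
        elif ev.kind == "delivery-arrived" and ev.order_id in self._made:
            order = self._made.pop(ev.order_id)
            s = order.shift
            self._sum[s] = self._sum.get(s, 0.0) + (ev.time - order.time)
            self._count[s] = self._count.get(s, 0) + 1
            avg = self._sum[s] / self._count[s]
            if avg > self.threshold:
                return (s, avg)  # running average currently above threshold
        return None


if __name__ == "__main__":
    events = [
        Event("order-made", 0.0, "o1", "morning"),
        Event("order-made", 5.0, "o2", "morning"),
        Event("customer-complained", 20.0, "o1"),
        Event("delivery-arrived", 30.0, "o2"),
        Event("delivery-arrived", 40.0, "o1"),
    ]
    detector = SequenceDetector("customer-complained", "delivery-arrived")
    for ev in events:
        match = detector.on_event(ev)
        if match:
            print("sequence:", match)  # sequence: ('o1', 20.0, 40.0)
    print("late shifts:", late_shifts(events))  # late shifts: {'morning': 32.5}
```

The design difference is the point of the example: `late_shifts` evaluates the complete set once (the retrospective, set-oriented view), while `RunningShiftAverage` reaches a similar result by updating state on each arrival, so it can report as soon as the running average crosses the threshold instead of only after the shift has ended.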

1 comment:

Mark Palmer said...

Hi Opher, thanks, as usual, for your balanced and pragmatic take on industry issues. Richard Tibbetts, co-author of the paper, posted some perspectives on the paper tonight on the StreamBase blog.

Hope they help advance the standardization discourse!