Showing posts with label VLDB. Show all posts

Saturday, September 25, 2010

On the duration of an event

I have neglected blogging for a while: I returned from my trip to Asia, am planning my next business trip to the USA (I am travelling too much; I hope for a non-travelling period after that, but one can never know), and also took some days off for the Succot holiday.    Yesterday I traveled with most of my family to Tel-Aviv, to "Beit Hatfutsot", which stands for "Diaspora house" and documents the life of Jewish communities over history in many countries.    Here is an artifact from the exhibition:

There was also an exhibition of Andy Warhol paintings of notable Jewish persons; one of the pictures is of Golda Meir, the only Israeli woman prime minister (time for the second one?).
The VLDB conference also uploaded pictures from the conference, so here are two: one from my tutorial, and the second showing me in the first row (it was not really the first row, but it was the first captured by the camera) listening to the keynote talk:


While I was away there were some blog posts by Paul Vincent that are worth focusing on. I have already commented briefly on this one, but I want to give a longer reaction on the issue of event duration that Paul raised.


In most models events are considered instantaneous, occurring within a single time point; the temporal database glossary from 1998 puts "instantaneous" as part of the definition of event. The rationale is to look at an event as a transition between two states, and a transition in most models takes zero time.    A few years ago, when we started the discussions about terms, I pointed out the temporal glossary as a source for an event definition, and David Luckham issued a strong objection to that definition, claiming that no event is really instantaneous: even simple events like "aircraft is landing" take more than zero time, while events that are composed of other events - "complex events" - like the 1929 crisis (now we can talk about the 2008 crisis) are composed of many events and occur over an interval.


This is, of course, true, yet from a computational point of view it is more convenient to deal with discrete time points than with intervals. Furthermore, some systems have detection-time semantics, looking at the time-stamp at which the event entered the system rather than the time it occurred; this is the reason we find time-point semantics in most systems.


We can look at the following cases:



  1. The event really occurs within a time point, e.g. a time series of sensor measurements, or stock quotes.  There is indeed an interval between two successive events, but this relates to the state between the events and not to the events themselves.
  2. The event occurs within an interval, but the granularity of our time computation is coarser than the interval, thus we can approximate the interval to a time point.   Example:  the granularity we are interested in is an hour, thus even if an event occurs over several minutes, we can still approximate it to the closest hour.
  3. The event occurs within an interval, and it is important to process it with interval semantics, since we would like to see its relationship to another time interval (e.g. a temporal context).
  4. The event occurs at an unknown time point that is bounded by an interval; there is some probability (e.g. a uniform distribution) that it happened at any point of time within the interval.  In VLDB there was a paper by Yanlei Diao and her students entitled: Recognizing Patterns in Streams with Imprecise Timestamps.    Note that this paper also includes some references to interval-based semantics (of type 3).
  5. Derived events are another type of event whose temporal semantics may be tuned.   For example:  the derived event "frustrated customer" is derived when a customer approaches a call center for the third time about the same topic.  The question is whether the customer is frustrated only at the third approach, or over all the time from the frustrating event until it is fixed. Furthermore, a derived event may also indicate an event that will happen in a future interval.   I'll write more about this issue in the future.
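To make cases 2-4 above concrete, here is a minimal Python sketch; the `Event` structure, the hour granularity, and the uniform-distribution assumption are all illustrative choices of mine, not taken from any particular product:

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    start: float  # occurrence time (illustrative units: seconds)
    end: float    # equal to start for a point event (case 1)

def snap_to_granularity(t: float, granularity: float = 3600.0) -> float:
    """Case 2: approximate an occurrence time to the nearest hour."""
    return round(t / granularity) * granularity

def occurs_during(e: Event, ctx_start: float, ctx_end: float) -> bool:
    """Case 3: interval semantics -- the whole event interval must
    fall inside the temporal context."""
    return ctx_start <= e.start and e.end <= ctx_end

def occurrence_probability(e: Event, ctx_start: float, ctx_end: float) -> float:
    """Case 4: assuming a uniform distribution over [start, end], the
    probability that the unknown occurrence point lies in the context."""
    if e.end == e.start:  # degenerate point event
        return 1.0 if ctx_start <= e.start <= ctx_end else 0.0
    lo, hi = max(e.start, ctx_start), min(e.end, ctx_end)
    return max(0.0, hi - lo) / (e.end - e.start)
```

For example, an event known only to have occurred sometime in [0, 600] has probability 0.5 of falling inside the context [300, 900] under the uniform assumption.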

Bottom line:  the event processing systems of the next generation should support both time-point and interval semantics, along with uncertainties (Paul also had a posting about "fuzzy patterns", on which I'll write in the future).
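The derived-event case (5) can also be sketched in a few lines of Python; the three-call threshold follows the "frustrated customer" description above, but the event representation and names are invented for the example:

```python
from collections import defaultdict

def detect_frustrated(calls, threshold=3):
    """Derive a 'frustrated customer' event on the third call (by default)
    from the same customer about the same topic."""
    counts = defaultdict(int)
    derived = []
    for ts, customer, topic in calls:
        counts[(customer, topic)] += 1
        if counts[(customer, topic)] == threshold:
            # Point semantics: stamp the derived event with the third
            # call's time, although the frustration arguably spans the
            # interval from the first call until the issue is fixed.
            derived.append(("frustrated", customer, topic, ts))
    return derived
```

Whether the derived event should carry the third call's time point or the whole interval since the first call is exactly the tuning question raised in case 5.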


Wednesday, September 15, 2010

On VLDB 2010 and the event processing tutorial

Still in Singapore; after the vacation, this is VLDB time.  VLDB is one of the largest database conferences; however, databases are a large, heterogeneous area, and I wonder if there is a single person who can understand all the talks.   The keynote talk by Divesh Srivastava from AT&T Research was about stream warehouses, or event stores in my language.  There were also some event processing related demos, and a stream research track, with one interesting talk that compared the semantics of the Coral8 CCL language to that of Streambase and reached the (not surprising) conclusion that their semantics are different and similar queries would yield different results; it then tried to come up with a framework generalizing the two types of semantics, extending it to other languages.   I think that this is in line with the work we are doing on a common model, and I will follow up with them (they are from ETH Zurich) about collaboration on that one.


Today I delivered a tutorial under the title "Event processing - past, present and future"; much of it follows the EPIA book.    Since this is a database conference, I opened by showing various opinions about the relations between event processing and data stream management, which is the name used in the database community. The various opinions are:



  1. They are aliases -- a stream is just a collection of events; likewise, an event is just a member of a stream, and the functionality is the same.
  2. Stream management is a subset of event processing -- there are different ways to do event processing, and streams are one of them.
  3. Event processing is a subset of stream management -- event streams are just one type of stream, but there are voice streams, video streams and more.
  4. Event processing and stream management are distinct and there is no overlap between them.


As I have heard all four opinions, I'll let you judge which is the right one. Hint:  option 4 is totally false; there is some truth in options 1-3, depending on the viewpoint.


Anyway - the tutorial has been uploaded to slideshare, and you can view it there. Enjoy.


Tomorrow is my last planned day in Singapore, and I'll write more about this very impressive country soon.

Sunday, September 5, 2010

Going east



Packing to go abroad again -- this time to the east, the destination is Singapore, where I am participating in VLDB 2010 and giving a tutorial on event processing - past, present and future. 
But first -- I am flying later today to Hong Kong, as a first stop in the east.    

Thursday, July 1, 2010

On VLDB 2010 -- events and streams related papers

After a few years of missing VLDB, I plan to participate this year in the VLDB conference in Singapore (which will be an opportunity to visit Singapore; I have never been there). I have a tutorial accepted, entitled: Event Processing - past, present, future. VLDB is one of the major research conferences of the database community (my original home community).

The list of accepted papers is now on the website -- looking at it there are some papers whose title include either events or streams:

Complex Event Detection at Wire Speed with FPGAs
High-Performance Dynamic Pattern Matching over Disordered Streams
Achieving High Output Quality under Limited Resources through Structure-based Spilling in XML Streams
SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems
Recognizing Patterns in Streams with Imprecise Timestamps
On Dense Pattern Mining in Graph Streams
Database-support for Continuous Prediction Queries over Streaming Data
Conditioning and Aggregating Uncertain Data Streams: Going Beyond Expectations
From a Stream of Relational Queries to Distributed Stream Processing
iFlow: An Approach for Fast and Reliable Internet-Scale Stream Processing Utilizing Detouring and Replication

And some demos:
Active Complex Event Processing: Applications in RealTime Health Care
Efficient Event Processing through Reconfigurable Hardware for Algorithmic Trading
Geospatial Stream Query Processing using Microsoft SQL Server StreamInsight


As you can notice, in the database community the term "streams" is more common than the term "events"; I'll come back to the discussion of streams vs. events soon.

IBM will have a substantial presence in VLDB with 9 research papers, 5 industrial papers, 3 demos and 1 tutorial.

More - later.



Sunday, December 16, 2007

On Event Stream Processing






This is in part a response to my friend and colleague Claudi for his recent post in the CEP Interest Group.

There are many types of streams in the universe: the Gulf Stream that affects the weather, a water stream that provides a pastoral sight, and an audio stream, to name just a few.
In the event processing area the name "stream" appeared first in the database research community, as a research project at Stanford. Interestingly, the name "event" is never mentioned there, and the term "data stream" is the central concept. The first to blend the "stream" concept with the "event processing" concept was my friend Mark Palmer from Progress, who did not like the word "complex" and thought that the term "event stream processing" would be better accepted; Mark certainly did not mean to talk about data streams in the academic sense. In the discussion session on the term event stream processing in Wikipedia, Mark writes:
ESP SHOULD GO AWAY AND I HELPED CREATE THE PROBLEM!!!
You are completely correct in my opinion; these should be merged. And I say this from the perpsective of the software vendor that popularized and caused the confusion in the first place. I'm the general manager of the Progress Apama software division and we coined the term "event stream processing" in April of 2005 when we acquired Apama for $30M - we didn't like the term "complex event processing" and decided to make up another term. Yes, stream processing, and data stream processing have been used as terms in academia, but we made up the term ESP as a synonym for CEP. Some on this list will argue that there are subtle, technical differences, but, being in the center of this quagmire of a debate, I think they should be merged, and that ESP should basically go away!
- Mark Palmer, General Manager, Progress Apama, mpalmer@PROGRESS.COM

Another indication of the blurring between ESP and CEP is that the vendor descendants of the academic projects - Streambase and Coral8 - now position themselves as "complex event processing" vendors. Both have "complex event processing" all over their homepages; Streambase labels its product a "complex event processing platform" (well -- we'll discuss platforms in another posting); Coral8 has a portal which offers self-service CEP. Aleri, which also provides an SQL-oriented API, uses the term CEP as well, although they are also using the term "Aleri streaming platform" for the way to do CEP. Thus, while the term "stream processing" is very much alive in the academic database community - see the VLDB 2007 program, for example - it seems that the market has already voted for the unification of these two terms, behind the CEP term.
Why did it happen? In the beginning we saw some 2 x 2 matrices, showing that CEP is complex and low-performance, while ESP is simple and high-performance. It does not seem that any vendor thought it was positioned at one of the extremes, since most applications are somewhere in the middle, and confusing the customers with two names from vendors who competed on roughly the same applications and customers did not help any of the vendors. Thus, the market wisely moved to one name (BTW - this name could also have been "event stream processing", as Progress suggested, but for some reason the term CEP caught on; some potential customers are still nervous about the word "complex", but it got the traction nevertheless).
Until now this has been a discussion of branding, and it did not answer the question: are there real differences between ESP and CEP? In some cases, people indicated theoretical differences, the most notable being: stream processing is ordered, while CEP is partially ordered.
This may be true, though I was never convinced that "total order" is an inherent property of a stream; it is just the way it happened to be defined in the academic projects. I think that the more important difference is whether we start from set-oriented thinking (stream processing) or from individual-event-oriented thinking (event processing), and there are pros and cons to each of them. The bottom line is that real applications may be mixed: they may have ordered events of the same type (e.g. when we are looking at trends in time series), or they can have unordered events of the same type (e.g. when we are looking at information from various sensors whose original timestamps may not be synchronized); in fact, an application can have both. It is true that the space of CEP applications is not monolithic, but there are other classifications that are more useful than the classification of partially ordered vs. totally ordered. Thus, for practical purposes, let's assume that "stream processing", as defined by those who are looking for the theoretical differences, indeed covers a subset of the space of functionality; however, this subset is not important enough to have separate products covering it, or even to mention it as a sub-class.
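For the unordered-sensor case mentioned above, one common (if simplistic) technique is a reorder buffer that assumes a bound on how late an event can arrive; the sketch below is my own illustration of the idea, and real systems typically use watermarks or retractions instead of a fixed skew bound:

```python
import heapq

def reorder(events, max_skew):
    """Buffer arriving (timestamp, payload) pairs and emit them in
    occurrence-time order, assuming no event arrives more than
    `max_skew` time units after an event with a later timestamp."""
    heap = []  # min-heap keyed on occurrence timestamp
    for ts, payload in events:
        heapq.heappush(heap, (ts, payload))
        # Anything older than the newest arrival minus the skew bound
        # can no longer be preceded by a late event, so emit it.
        while heap and heap[0][0] <= ts - max_skew:
            yield heapq.heappop(heap)
    while heap:  # flush the remainder at end of stream
        yield heapq.heappop(heap)
```

With a skew bound of 2, the out-of-order input (1, 3, 2, 6) comes out in timestamp order, at the cost of delaying each emission by up to the bound.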
Last but not least -- an answer to Claudi on his claim that there is not really a CEP engine, since none of the current products knows how to obtain general relations among events and calculate transitive closures.
My answer is that event relationship definitions do exist, but this is not the main point. The point is that one may claim that "there is not really a CEP engine that contains all the possible language features one can think of", and this is true: the EP discipline is young, and I am sure that we have just scratched the surface, and EP products will include many features that we did not even think of today (otherwise it would be an indication that the area has failed!). However, without talking about a certain feature, CEP engines do exist today; none is perfect, but they are probably sufficient for the big majority of existing applications, so theoretical perfection may not be the criterion to call something a "CEP engine"; we'll have to settle for "sufficient conditions".
I'll relate to relations among events, including transitive closure, in another posting - but whether they exist or not does not really matter for this question. Long posting today - so this is all for now.








Sunday, September 30, 2007

Event Processing - a footnote to databases ?

More in the spirit of the VLDB conference I attended last week: there is a conception in the database community that event processing is really part of database technology, and that the functionality of event processing can be obtained using regular databases, by inserting the events into the database and asking "continuous queries" over it. According to this outlook, the only reason customers want engines outside the database engine is that some performance properties - typically throughput and latency - cannot be satisfied by database engines, but this can be handled by some tricks, like in-memory databases.


This reminds me that in the first conference I ever organized, NGITS 1993 (we did not have conference webpages in those days), there was a discussion about the relations between Artificial Intelligence and Databases that followed the keynote address of John Mylopoulos, whom I have always considered one of the most visionary people I've ever met. John said something like this: "the difference between the AI and database disciplines is that AI is a scientific discipline and databases are an engineering discipline, which deals with efficiency issues". He, of course, made the database people who were present quite angry; however, now that I am looking from the outside (at that time I looked from the inside) at the way database people think, I realize that he was, as usual, right.


While high performance is one of the reasons customers turn to COTS in this area, it is only the secondary reason; the main reason event processing software is used is the level of abstraction it provides, and consequently the improvement in ROI. It seems also that the main competition between different products will be more on the ROI (ease of use) front than on the performance front.


Event processing differs from database processing in the required functionality: the fact that database processing processes a state ("snapshot"), while event processing processes a set of transitions ("event cloud"), imposes different thinking, and hence different abstractions. Trying to introduce event pattern detection as an extension to database processing (as we have seen in the EPTS meeting, in the proposal being prepared now) has several attributes - simplicity is not one of them - and thus it totally misses the point of "ease of use", only to satisfy the assertion that event processing should be done within database processing. While these are nice academic attempts, and researchers will probably be able to write a lot of papers about the pattern extensions to SQL, I don't believe they will catch on in reality.


However, databases do have several roles in event processing; here are a few of them:

(1). Databases will be used to store events for retrospective processing. These databases need to support temporal (or even spatio-temporal) characteristics; the database products don't yet provide good support in this area, and this deserves a separate blog post.

(2). Databases (or in-memory databases) will be used to store intermediate states for recoverability.

(3). Databases will be used to enrich events for processing (mainly reference data, but sometimes transaction data).
(4). Data warehouses will be used for embedded analytics.
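As a small illustration of role (3), here is a sketch of event enrichment in Python; the customer table, field names, and event shape are invented for the example, with a plain dict standing in for the reference table in a database:

```python
# A plain dict stands in for a reference-data table in a database
# (the customer ids and attributes are illustrative).
customers = {"C42": {"name": "Alice", "segment": "gold"}}

def enrich(event, reference):
    """Return a copy of the event augmented with reference attributes
    looked up by customer id (a no-op if the customer is unknown)."""
    extra = reference.get(event.get("customer_id"), {})
    return {**event, **extra}

call_event = {"type": "call", "customer_id": "C42", "topic": "billing"}
enriched = enrich(call_event, customers)
```

The enriched event carries both the original attributes and those pulled from the reference data, so downstream pattern detection can use, e.g., the customer segment without a join at query time.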


I think that the database community should concentrate on enhancing database technology to support these functions in event processing -- e.g. temporal database support, both at the abstraction level and in implementation efficiency -- instead of insisting on extending SQL in an unnatural way.
I still need to discuss several topics in more depth, such as temporal databases, retrospective processing, and an alternative approach to SQL patterns, but I will leave that for later.

Tuesday, September 25, 2007

VLDB - and computer science 2.0

Hello from Vienna. Today the VLDB conference started with an interesting talk by Werner Vogels, the CTO of Amazon, whose blog is entitled AllThingsDistributed, about the framework they have built (and that other retailers use); he referred to Amazon as a technology company that happens to do retail. I think that there are many touch points between event processing technology and the Amazon model, but he did not talk about it. I have been an Amazon customer for years. Somewhere in the late nineties I remember ordering a bunch of books from Amazon and not receiving them in the designated time. I sent an email to Amazon asking about it, and the answer amazed me: "we don't know what happened, we are sending the order again". A day later I received the original shipment, and sent another email to Amazon - I got the original shipment, you may stop the substitute one. The answer I got was even more amazing: "We cannot trace an order once it was issued, keep the books with our compliments". It seems that now they know how to track their orders.

The other two keynote speakers were Mike Stonebraker and Michael Brodie, two old-timers who have been around for a while. Stonebraker gave some variation on his repeating message, "One size fits all: an idea whose time has come and gone", which talks about the elephants' (Oracle, Microsoft, IBM) DBMS products as an obsolete concept, and shows that for various types of functionality (including "stream processing", of course) a specialized engine is better than a monolithic one; in fact, the monolithic engines excel at nothing and should be eliminated. The idea that one size does not fit all is probably true, in databases (and also in event processing). One thing to note (and this follows Mike's talk in EDAPS yesterday): he looks at everything through a single criterion -- speed (latency?); I think that reality is a little bit more complex.

Mike Brodie started with a nice video with music that was getting louder showing facts about quantities -- size of various databases, internet webpages, use of search engines etc -- and trend (the time everything is duplicated is getting shorter and shorter), he also talked briefly about SOA, and about the need to take a new approach that is application-based, semantics-based, and create Computer Science 2.0 -- however I did not understand what new science is required, and in response to a question he answered --- I presented the problems, leaving the solutions to you. I am not sure that I have understood the problem (except for engineering issues), but let's wait to see if computer science 2.0 will arrive (I think that the term 2.0 is starting to be over-hyped, there were some attempts at SOA 2.0 as combination of SOA and EDA, but I am not sure it caught as a buzzword). Anyway -- whatever Computer Science 2.0 is -- event processing should be one of its fundementals. More later.