Saturday, April 26, 2008

On Streams and Events

The picture above is taken from a UCLA project that deals with multimedia stream systems. While the terms "data streams" and, later, "event streams", which deal with continuous queries over structured data, were introduced in the last decade by the database research community (with spin-offs into products), the term "streams" has a more general and more traditional meaning: multimedia streams - video, voice, news, etc. - which by nature belong to the family of unstructured data. In previous postings I discussed some of the problems around "event stream processing" and around the classification of event processing technologies. In this posting, however, I would like to point out that "stream processing" in its more traditional meaning is an important complementary technology to event processing.

First - the result of stream processing is the detection that an event has happened. An example is the detection of a vehicle's registration plate on automatic toll roads (we have one of these roads in Israel, and there are others in Canada and Sweden, and maybe in more places), which yields the event "vehicle with registration plate X entered the highway at entrance Y at time T". This event can be further processed (after correlating it with the exit event) for billing purposes, but it can also serve security and other applications. In this case, from the event processing architectural view, the "stream processing" is done in a producer application, which generates events that are processed in the event processing system.
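To make the toll-road example concrete, here is a minimal sketch of the consumer side: correlating "entered" and "exited" events by registration plate to produce one trip record per vehicle. The names `PlateEvent`, `Trip` and `correlate_trips` are hypothetical and are not taken from any real toll system; the sketch also assumes that events arrive in time order and that an exit without a known entry is simply ignored.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PlateEvent:
    """Event emitted by the (hypothetical) plate-recognition producer."""
    kind: str    # "enter" or "exit"
    plate: str   # registration plate X
    gate: str    # entrance or exit Y
    time: float  # time T, in seconds


@dataclass(frozen=True)
class Trip:
    """Derived event: one complete pass through the toll road."""
    plate: str
    entrance: str
    exit: str
    duration: float


def correlate_trips(events: Iterable[PlateEvent]) -> List[Trip]:
    """Pair each exit event with the latest open entry for the same plate."""
    open_entries: Dict[str, PlateEvent] = {}
    trips: List[Trip] = []
    for ev in events:
        if ev.kind == "enter":
            open_entries[ev.plate] = ev
        elif ev.kind == "exit":
            entry = open_entries.pop(ev.plate, None)
            if entry is not None:  # assumption: unmatched exits are dropped
                trips.append(
                    Trip(ev.plate, entry.gate, ev.gate, ev.time - entry.time)
                )
    return trips
```

A billing application would consume the resulting `Trip` events; a security application could instead look at the raw enter/exit events.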

Second - the result of an event processing system can be an input to a stream. For example, a game is presented to the players as a video stream. Decisions made by the electronic player (or by the human player) can be assisted by an event processing system. The result of a decision can be the movement of a player in a certain direction, and this is fed back into the video stream. In that case, the video stream is processed in a consumer application, which gets events as input.

Of course, a producer can also be a consumer, especially in games, which are iterative in nature; thus an application may communicate with an event processing system on both sides.

Since many of the events that happen in the universe are sensed through various unstructured media, the area of creating events out of multimedia streams, and of embedding events to control the behavior of multimedia streams, will be one of the major future directions; we can see some of this happening already.

Thursday, April 24, 2008

On the science and engineering of event processing

This is a holiday week here, and yesterday I drove about 2 hours south to the Weizmann Institute, a research institute with a graduate school in several scientific disciplines - among them computer science - which is considered a great place for researchers who are good enough to be accepted and are satisfied with academic salaries... Anyway, the Weizmann Institute hosted a "science festival" for children yesterday; in the picture above you can see the main idea - showing scientific principles through games. Since there were several sites within the institute, the organizers provided air-conditioned buses (it was also an extremely hot day). However, when we arrived there was a big crowd at the entrance station, and though there were 4 buses waiting, they loaded passengers sequentially -- all of them waited until one finished loading, and people wondered why they did not load passengers in parallel. It seems that sometimes engineering is needed to augment science.... Talking about science, there is one country that asks you to state your profession when you fill in the "landing card" on the aircraft before landing: the United Kingdom. I always fill in my profession as scientist. This is a matter of self-identity, but more than that, it is also a way of life - at the risk of generalizing, I would say that engineers think by induction, while scientists think by deduction. At the NGITS 1993 conference that we held in Israel, in one of the discussions, John Mylopoulos said: "the distinction between the Artificial Intelligence and Database disciplines is that AI is science, while DB is engineering". Of course, the database guys did not like it.
Well - I also wanted to tie the science / engineering issue to "event processing". As is typical for new areas, while this area has some scientific origins, its first generation is the engineering era: different vendors came up with implementations that attempted to solve various problems, and the thinking is very much centered on the product one is trying to sell. Thus, if a customer's requirement is not easy to implement, the typical reaction is to do ad-hoc hacking around it; I know this from personal experience, having been there a couple of times with different products. Engineering solutions are inductive, sometimes based on induction with N = 1.
The engineering approach is typically the first wave -- I often like to use the analogy of databases in the 1960s.
However, a maturing discipline also needs science - which is looking beyond (maybe behind) the engineering -- getting back to the fundamentals and coming up with a model (like the relational model in databases -- but not really an extension of the relational model, whose purpose is much different). Getting the science part right will be a vital part of the discipline's maturing; however, this is a longer-term effort. The 2nd generation of event processing products will be more incremental on top of the first one, and still engineering-oriented. More about the science of event processing - in later posts.

Monday, April 21, 2008

On Event Clouds

Marc Adler, in a couple of his blog postings, wondered about support for event clouds in the product he chose, and in the end settled on the opinion of the vendor (Mark Tsimelzon from Coral8), who claims that "cloud" is an abstract term and that in reality we are facing multiple streams that may or may not be ordered. The response comes from Greg-the-architect, who has recently been in "everybody are confused" mode. Greg-the-architect claims that vendors have sinned by misinforming their customers in order to hide their inability to cope with hidden causal relations.

So - what can I contribute to that party ?

First - let's look again at the definition of event cloud in the glossary:

Event cloud: a partially ordered set of events (poset), either bounded or unbounded, where the partial orderings are imposed by the causal, timing and other relationships between the events.

Clouds have become a fashionable term; we have heard so much about cloud computing in the past year that we all feel like we are flying in various clouds.

What about the clouds/streams debate? One of the stated differences is that a cloud is a poset (partially ordered set) while a stream is totally ordered. I agree that these terms come from two different origins. The question is whether a cloud can indeed be supported by multiple streams, yet people focus the discussion on whether streams are always totally ordered or can also support non-ordered sets of events - this is not really an interesting distinction. I agree here with Mark Tsimelzon that a stream can also be unordered; this is up to the implementation. If one wants to make a distinction between "streams" being ordered and other things that can be unordered, I propose the term "pipes" - where an ordered pipe is a stream. But ordered/unordered is not the main difference. Reading the cloud definition again, it is the notion of causality that is important for having a cloud. The "partial ordering" in the cloud is a result of causality relations between events. I have discussed the notion of causality in a past posting; support for causality (including pre-determined causality that may be the result of mining or of an inference system) is the enabler for the support of clouds (i.e. partial order vs. no order).

A cloud is indeed the collection of events that an enterprise is faced with, and this cloud may be implemented by a collection of pipes (or streams, if you wish) together with support for causality relations.

We can also look at a (small) cloud, which is the collection of all events that a single EPA (Event Processing Agent) faces as input - and this is just a subset of the big "Cloud" - with its own pipes and causality relations.
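As an illustration of the point that the partial order of a cloud comes from causality, here is a minimal sketch in which events carry explicit "caused by" links and the order is derived from them. The class `EventCloud` and its methods are hypothetical names invented for this sketch; it assumes that causes are registered before their effects, which keeps the causality graph acyclic.

```python
from typing import Dict, Iterable, Set


class EventCloud:
    """Events plus direct causality links; the partial order is derived."""

    def __init__(self) -> None:
        # event id -> ids of the events that directly caused it
        self._causes: Dict[str, Set[str]] = {}

    def add_event(self, event_id: str, caused_by: Iterable[str] = ()) -> None:
        if event_id in self._causes:
            raise ValueError(f"duplicate event: {event_id}")
        causes = set(caused_by)
        unknown = {c for c in causes if c not in self._causes}
        if unknown:
            raise KeyError(f"unknown cause(s): {sorted(unknown)}")
        self._causes[event_id] = causes

    def lineage(self, event_id: str) -> Set[str]:
        """All direct and indirect causes of an event."""
        seen: Set[str] = set()
        stack = list(self._causes[event_id])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._causes[current])
        return seen

    def precedes(self, a: str, b: str) -> bool:
        """True if a is a direct or indirect cause of b."""
        return a in self.lineage(b)

    def comparable(self, a: str, b: str) -> bool:
        """Two events are ordered only if one causally precedes the other."""
        return a == b or self.precedes(a, b) or self.precedes(b, a)
```

Events with no causal path between them stay unordered, which is what makes this a poset rather than a totally ordered stream; adding a total order per pipe (for example by timestamp) would be a separate design decision.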

Now - to the most important question - beyond the terminology game, is it important to make these distinctions?

As stated before, the world of event processing is not monolithic: some applications need total order, other applications need partial order, and still others don't care about the notion of order at all. Causality relations are required by some applications, either because the pre-defined relations between the events play a role in the event processing, or because there is a need to trace back the lineage of a certain event / action. For other applications they may be just an unnecessary overhead. So my (2 cents worth of) advice to people who are looking at CEP products is to look at their requirements and determine whether they need causality and partially ordered sets. It may be that support for totally ordered streams is entirely sufficient for their applications; if it is not, they should look at whether and how causality is implemented. I hope that I have not confused you even more... More - later.

Sunday, April 20, 2008

On Event Pattern Semantics

Today is Passover. While I am far from being religious, there are several traditions we keep; one of them is to have a family dinner on Passover eve and to read (at least part of) the Haggadah. So I searched the internet for a fancy Haggadah in English, and here is the result.

The call for EPTS founding members is also progressing - by now more than 20 companies have either signed or indicated that they are in an internal approval process and intend to sign as EPTS members, in addition to about 20 individual members. We expect this number to grow towards the deadline, and call on anybody who has not joined and wishes to contribute to the emerging EP community to join.

Moving to today's topic: Tom Puzak has posted on the CEP interest group a message about nine features a CEP engine should have. This discussion is useful, since there is no agreed-upon "CEP manifesto" - a definition of the functions that should be supported by "CEP engines" - and we are going to need one, sooner or later.

Since I am working on a tutorial for the DEBS conference which will talk about event pattern semantics as a major theme, here is a sneak preview of the types of semantic decisions that are needed; these are in addition to the semantics of the specific pattern (conjunction, disjunction, absence, sequence...).

1. In which context is this particular pattern relevant? Context can be temporal (within working hours, within 1 hour of a power outage), spatial (within the headquarters building), semantic (only for platinum customers), or state-oriented (while it is raining) - or a combination of the various dimensions (I have written before about the notion of context).

2. Does an event participate in the same pattern in a single context or in multiple contexts? This can happen when there is overlap among contexts.

3. Should the action / notification about the fact that the pattern has been detected execute immediately or in a deferred mode (example: at the end of the temporal context)?

4. Within a context - is the pattern existential (i.e. we are looking for a single pattern per context) or can there be multiple instances?

5. Using quantifiers on synonyms - taking the example from Tom Puzak's message: we are looking for A, B within 60 seconds (temporal context), and the actual flowing events are: A1 A2 B1 A3 B2 B3 - we may want the cartesian product, but typically this is not what we really wish - thus, we can use quantifiers to select among the A and B events. Quantifiers can be according to order - first, last, each - or according to the content of attributes (or both).

6. Can a single event participate in more than one pattern within the same context?

7. Should newer synonyms kill older synonyms?

These are just titles - and in the DEBS tutorial I'll explain each with examples and show how they impact the pattern detection behavior.
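To show how much one of these decisions changes the result, here is a small sketch of the order-based quantifiers (first, last, each) applied to the A1 A2 B1 A3 B2 B3 example. The function names `select` and `match_conjunction` are hypothetical, and the sketch makes several assumptions that are not in Tom Puzak's message: the pattern is treated as a conjunction of A and B regardless of their relative order, all six events fall inside the same 60-second temporal context, and detection is deferred to the end of that context.

```python
from itertools import product
from typing import List, Sequence, Tuple


def select(candidates: Sequence[str], quantifier: str) -> List[str]:
    """Apply an order-based quantifier to the events of one type."""
    if not candidates:
        return []
    if quantifier == "first":
        return [candidates[0]]
    if quantifier == "last":
        return [candidates[-1]]
    if quantifier == "each":
        return list(candidates)
    raise ValueError(f"unknown quantifier: {quantifier}")


def match_conjunction(
    events: Sequence[Tuple[str, str]],
    quant_a: str = "each",
    quant_b: str = "each",
) -> List[Tuple[str, str]]:
    """Return the (A, B) pairs detected in one context, in arrival order.

    `events` is a sequence of (event_type, event_name) pairs that are
    assumed to belong to the same temporal context.
    """
    a_events = [name for etype, name in events if etype == "A"]
    b_events = [name for etype, name in events if etype == "B"]
    return list(product(select(a_events, quant_a), select(b_events, quant_b)))


context_events = [
    ("A", "A1"), ("A", "A2"), ("B", "B1"),
    ("A", "A3"), ("B", "B2"), ("B", "B3"),
]
all_pairs = match_conjunction(context_events)                     # cartesian product
first_pair = match_conjunction(context_events, "first", "first")  # a single pair
```

With "each" on both sides the sketch yields the full cartesian product of nine pairs, while "first"/"first" yields only the pair (A1, B1); attribute-based quantifiers would replace `select` with a predicate over event content.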

Bottom line -- tuning the semantics of a pattern consists of several decisions; if these decisions are not supported in the language, and the application does not conform to the default, the result is hacking around it... more - later.