Thursday, April 24, 2008

On the science and engineering of event processing


This is holiday week here, and yesterday I drove about 2 hours south to the Weizmann Institute, a research institute with a graduate school in several scientific disciplines, computer science among them. It is considered a great place for researchers who are good enough to be accepted and are satisfied with academic salaries... Anyway, the Weizmann Institute hosted a "science festival" for children yesterday; in the picture above you can see the main idea - showing scientific principles through games. Since there were several sites within the institute, the organizers provided air-conditioned buses (it was also an extremely hot day). However, when we arrived there was a big crowd at the entrance station, and though there were 4 buses waiting, they loaded passengers sequentially -- every bus waited until the previous one finished loading, and people wondered why the buses were not loaded in parallel. It seems that sometimes engineering is needed to augment science....
Talking about science, there is one country that asks you to state your profession when you fill in the "landing card" on the aircraft before landing: the United Kingdom. I always write my profession as scientist. This is a matter of self-identity, but more than that, it is also a way of life - risking generalization, I would say that engineers think by induction, while scientists think by deduction. At the NGITS 1993 conference that we held in Israel, in one of the discussions, John Mylopoulos said: "the distinction between the Artificial Intelligence and Database disciplines is that AI is science, while DB is engineering". Of course, the database guys did not like it.
Well - I also wanted to tie the science / engineering issue to "event processing". As is typical for new areas, while this area has some science origins, its first generation is the engineering era: different vendors came up with implementations that attempted to solve various problems, and the thinking is very much centered on the product one is trying to sell. Thus, if a customer's requirement is not easy to implement, the typical reaction is to do ad-hoc hacking around it; I know from personal experience, having been there a couple of times with different products. Engineering solutions are inductive, sometimes based on induction with N = 1.
The engineering approach is typically the first wave -- I often like to use the analogy of databases in the 1960s.
However, a maturing discipline also needs science, which means looking beyond (maybe behind) the engineering -- getting back to the fundamentals and coming up with a model (like the relational model in databases -- but not really an extension of the relational model, whose purpose is much different). Getting the science part right will be a vital part of the discipline's maturing; however, this is a longer-term effort. The 2nd generation of event processing products will be more incremental on top of the first one, and still engineering oriented. More about the science of event processing - in later posts.

Monday, April 21, 2008

On Event Clouds

Marc Adler, in a couple of his blog postings, wondered about support for event clouds in the product he chose, and in the end settled on the opinion of the vendor (Mark Tsimelzon from Coral8), who claims that "cloud" is an abstract term and that in reality we are facing multiple streams that may or may not be ordered. The response comes from Greg-the-architect, who has recently been in "everybody is confused" mode. Greg-the-architect claims that vendors have sinned by misinforming their customers in order to hide their inability to cope with hidden causal relations.

So - what can I contribute to that party ?


First - let's look again at the definition of event cloud in the glossary:


Event cloud: a partially ordered set of events (poset), either bounded or unbounded, where the partial orderings are imposed by the causal, timing and other relationships between the events.


Clouds have become a fashionable term; we have heard so much about cloud computing in the past year that we all feel like we are flying in various clouds.


What about the clouds/streams debate? One of the stated differences is that a cloud is a poset (partially ordered set) while a stream is totally ordered. I agree that these terms come from two different origins; the question is whether a cloud can indeed be supported by multiple streams. People focus the discussion on whether streams are always totally ordered or can also carry a non-ordered set of events - this is not really an interesting distinction. I agree here with Mark Tsimelzon that a stream can also be unordered; this is up to the implementation. If one wants to make a distinction between "streams" being ordered and other things that can be unordered, I propose the term "pipes" - where an ordered pipe is a stream. But ordered/unordered is not the main difference. Reading the cloud definition again, it is the notion of causality that is important for having a cloud. The "partial ordering" in the cloud is a result of causality relations between events. I have discussed the notion of causality in a past posting; support for causality (including pre-determined causality that may be the result of mining or of an inference system) is the enabler for the support of clouds (i.e. partial order vs. no order).


The cloud is indeed the collection of events that an enterprise faces, and this cloud may be implemented by a collection of pipes (or streams, if you wish) plus support for causality relations.


We can also look at a (small) cloud, which is the collection of all events that a single EPA (Event Processing Agent) faces as input - this is just a subset of the big "Cloud", with its own pipes and causality relations.
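
To make the poset idea concrete, here is a minimal sketch of a cloud in which the partial order comes only from explicitly recorded causality relations. The class name, the method names and the sample events are hypothetical, made up for illustration; timing relations, which the glossary definition also allows, are ignored here.

```python
class EventCloud:
    """Events plus explicit cause -> effect edges (all names here are hypothetical)."""

    def __init__(self):
        self.events = set()
        self.effects = {}  # event id -> set of events it directly causes

    def add_event(self, event_id):
        self.events.add(event_id)

    def add_causality(self, cause, effect):
        self.events.update((cause, effect))
        self.effects.setdefault(cause, set()).add(effect)

    def precedes(self, a, b):
        """True if a causes b, directly or through a chain of events."""
        stack, seen = list(self.effects.get(a, ())), set()
        while stack:
            current = stack.pop()
            if current == b:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.effects.get(current, ()))
        return False

    def comparable(self, a, b):
        """Two events are ordered only if one causally precedes the other."""
        return a == b or self.precedes(a, b) or self.precedes(b, a)


cloud = EventCloud()
cloud.add_causality("order_placed", "payment_received")
cloud.add_causality("payment_received", "shipment_sent")
cloud.add_event("price_changed")  # no causal link to the order events

print(cloud.precedes("order_placed", "shipment_sent"))     # True
print(cloud.comparable("price_changed", "shipment_sent"))  # False
```

The second call is the point: some pairs of events are simply not ordered relative to each other, which is what separates a partial order from the total order of an ordered stream.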


Now - to the most important question - besides the game in terminology, is it important to make these distinctions?


As stated before, the world of event processing is not monolithic: some applications need total order, other applications need partial order, and others don't care about the notion of order at all. Causality relations are required by some applications, either because the pre-defined relations between the events play a role in the event processing, or because there is a need to trace back the lineage of a certain event / action. For other applications they may be just unnecessary overhead. So my (2 cents worth of) advice to people who are looking at CEP products is to look at their requirements and determine whether they need causality and a partially ordered set. It may be that support for a totally ordered stream is totally sufficient for their applications; if it is not, they should look at whether and how causality is implemented. I hope that I have not confused you even more... More - later.



Sunday, April 20, 2008

On Event Pattern Semantics


Today is Passover. While I am far from being religious, there are several traditions we keep; one of them is to have a family dinner on Passover eve and to read (at least part of) the Haggadah. So I looked on the internet for some fancy Haggadah in English, and here is the result.

The call for EPTS founding members is also progressing - by now more than 20 companies have either signed or indicated that they are in an internal approval process and intend to sign as EPTS members, in addition to about 20 individual members. We expect this number to grow towards the deadline, and call on anybody who has not joined and wishes to contribute to the emerging EP community to join.

Moving to today's topic: Tom Puzak has posted a message on the CEP interest group about nine features a CEP engine should have. This discussion is useful, since there is no agreed-upon "CEP manifesto" - a definition of the functions that should be supported by "CEP engines" - and we are going to need one, sooner or later.

Since I am working on a tutorial for the DEBS conference in which event pattern semantics will be a major theme, here is a sneak preview of the types of semantic decisions that are needed; these are in addition to the semantics of the specific pattern (conjunction, disjunction, absence, sequence...).

1. In which context is this particular pattern relevant? Context can be temporal (within working hours, 1 hour from the power break), spatial (within the headquarters building), semantic (only for platinum customers) or state-oriented (while it is raining) - or combinations of the various dimensions (I have written before about the notion of context).

2. Does an event participate in the same pattern in a single context or in multiple contexts? This can happen when there is overlap among contexts.

3. Should the action / notification about the fact that the pattern has been detected execute immediately or in a deferred mode (example: at the end of the temporal context)?

4. Within a context - is the pattern existential (i.e. we are looking for a single instance of the pattern per context) or can there be multiple instances?

5. Using quantifiers on synonyms - Taking the example from Tom Puzak's message: we are looking for A, B within 60 seconds (temporal context), and the actual flowing events are: A1 A2 B1 A3 B2 B3 - we may want the Cartesian product, but typically this is not what we really wish - thus, we can use quantifiers to select among the A and B events. Quantifiers can be according to order - first, last, each - or according to the content of attributes (or both).

6. Can a single event participate in more than one pattern within the same context?

7. Should a newer synonym kill older synonyms?

These are just titles - in the DEBS tutorial I'll explain each with examples and show how they impact the pattern detection behavior.
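
As a small illustration of decision 5, here is a sketch of how order-based quantifiers change the result for the events A1 A2 B1 A3 B2 B3. It rests on several assumptions that are not in the original discussion: the pattern is treated as an unordered conjunction of one A and one B, all six events fall inside the same 60-second context, matching is evaluated once at the end of the context, and the function names are invented for this example.

```python
from itertools import product


def select(events, quantifier):
    """Pick events of one type according to an order-based quantifier."""
    if quantifier == "first":
        return events[:1]
    if quantifier == "last":
        return events[-1:]
    if quantifier == "each":
        return list(events)
    raise ValueError(f"unknown quantifier: {quantifier}")


def match_conjunction(stream, a_quantifier, b_quantifier):
    """Return the (A, B) pairs reported for the pattern under the given quantifiers."""
    a_events = [e for e in stream if e.startswith("A")]
    b_events = [e for e in stream if e.startswith("B")]
    return list(product(select(a_events, a_quantifier),
                        select(b_events, b_quantifier)))


stream = ["A1", "A2", "B1", "A3", "B2", "B3"]

print(len(match_conjunction(stream, "each", "each")))  # 9, the full Cartesian product
print(match_conjunction(stream, "first", "first"))     # [('A1', 'B1')]
print(match_conjunction(stream, "last", "each"))       # [('A3', 'B1'), ('A3', 'B2'), ('A3', 'B3')]
```

The same six events yield nine, one or three matches depending only on the quantifiers, which is why the choice has to be expressible in the language rather than left to a default.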

Bottom line -- tuning the semantics of a pattern consists of several decisions; if these decisions are not supported in the language, and the application does not conform to the default, the result is hacking around... more - later.

Tuesday, April 15, 2008

On Event Processing Agents


There are different types of agents -- double agents (as seen above, which is a series of sweets), insurance agents, travel agents, and some computerized agents - in my past I have dealt with mobile agents, and there are the intelligent agents in AI, and our own event processing agents (EPA).
David Luckham has written an amusing piece that follows Paul Vincent's "CEP and Agents" in TIBCO's Blog. The amusing part is that Paul has written about AI agents, which use somewhat different terminology than the event processing terminology, and putting it in a "CEP Blog" is somewhat confusing. I am a "product" of the database community, and have done some work in the past that was on the AI border; alas, the AI folks use different terminology to talk about the same thing, and I thought at that time that they were doing it on purpose to confuse me. So, while there are many types of agents, I'll concentrate on the concept of "event processing agent" that was coined by David Luckham. I like this term and have adopted it in the following way: an EPA (Event Processing Agent) is a software artifact that receives an event cloud, a stream, a collection of events or a single event (depending on the agent type and capabilities), does some computation on these events, and produces one or more events as output. That's it. An EPA is also a node in the EPN (Event Processing Network). There are different types of EPAs:
  • "simple event processing" EPAs - filter and routing,
  • "mediated event processing" EPAs - enrichment, transformation, validation
  • "Complex event processing" EPAs - pattern detection
  • "intelligent event processing" EPAs - prediction, decisions...

The common denominator: each of them receives events as input, emits events as output and does a single type of function.
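
A minimal sketch of this abstraction: each EPA below is a function from an iterable of events to an iterable of events, and the EPN is reduced to a simple linear chain (a real EPN is a network, not just a pipeline). The event fields, the helper names and the sample data are all hypothetical, chosen only for illustration.

```python
def filter_epa(predicate):
    """Simple EP: pass through only the events that satisfy the predicate."""
    def agent(events):
        for event in events:
            if predicate(event):
                yield event
    return agent


def enrich_epa(lookup, key, target):
    """Mediated EP: add a field looked up from reference data."""
    def agent(events):
        for event in events:
            yield {**event, target: lookup.get(event[key])}
    return agent


def repeated_epa(key, times):
    """Pattern detection: emit a derived event on the N-th event with the same key."""
    def agent(events):
        counts = {}
        for event in events:
            counts[event[key]] = counts.get(event[key], 0) + 1
            if counts[event[key]] == times:
                yield {"type": "repeated", key: event[key], "count": times}
    return agent


def run_epn(agents, events):
    """Wire the agents in sequence: the output of one is the input of the next."""
    for agent in agents:
        events = agent(events)
    return list(events)


events = [
    {"type": "withdrawal", "customer": "c1", "amount": 500},
    {"type": "deposit", "customer": "c1", "amount": 50},
    {"type": "withdrawal", "customer": "c2", "amount": 700},
    {"type": "withdrawal", "customer": "c1", "amount": 900},
]
epn = [
    filter_epa(lambda e: e["type"] == "withdrawal"),
    enrich_epa({"c1": "platinum", "c2": "gold"}, "customer", "tier"),
    filter_epa(lambda e: e["tier"] == "platinum"),
    repeated_epa("customer", 2),
]
result = run_epn(epn, events)
print(result)  # [{'type': 'repeated', 'customer': 'c1', 'count': 2}]
```

Each agent does a single type of function and knows nothing about its neighbours, so the wiring between agents could just as well be handled by standard middleware.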

I find this type of abstraction both very easy to use when explaining to people how EP systems work, and also a basis for architecture. The EPN routing can be done by standard middleware, or in a stand-alone mode. Other terminology issues raised by David Luckham are the relationships to the "actor model" and to "engines".

The actor model is a model that helps reasoning about concurrency, while agents in AI are autonomous goal-driven artifacts. These are orthogonal terms, of course. In the context of EPA - when looking at EPAs as an executable network, we can look at each EPA as an actor and apply actor models.

Last but not least -- the relationship of EPAs to engines -- an EPA is a software artifact: it can be an instance of an engine, it can be some software that contains an engine, and it can be a hard-coded program, as long as it complies with the EPA definition. In a future world, with inter-operability (and perhaps also language) standards, we'll be able to run (and maybe to self-select) multiple engines for the same EPN, residing in different EPAs.

More about EPA types -- later.

Monday, April 14, 2008

On the spectrum of event processing applications


Back in my office and reading some of the EP Blogs. In the picture above, somebody has tried to draw the spectrum of Blogs (you may want to follow the link to the original in order to see it better). One of the latest Blogs has dealt again with "simple and complex event processing", claiming that everything done so far in this area is in fact "simple event processing", while real "complex event processing" should support uncertainty and backward chaining. Several posts on this topic have been written by Greg Reemler. We don't have a "CEP manifesto" that gives an official definition of what is CEP and what is not, and I am not sure that this would be very useful, as it would confuse the customers even more. There is a spectrum of applications that have a spectrum of functional and non-functional requirements. Wearing my scientist hat, I am a partner in research work on "uncertainty in event processing", together with Avi Gal and our co-supervised Ph.D. student Segev Wasserkrug. However, while there are applications that require uncertainty reasoning in event processing, there are many others that don't. As I have written several times before, I am not a big fan of the term "complex event processing", due to its ambiguity - some people mean complex processing of events, some mean processing of complex (derived from more than 1) events, and some actually mean complex processing of complex events. I think that we should continue to classify applications and match the right functional and non-functional requirements to the right applications, but we'll never get to a single functional or non-functional benchmark that will cover all applications in this area. It is better to direct the energy to areas that can help most customers deal with the problems for which they would like to apply event processing. See my previous posting on: killer applications of EP
More - later.

Saturday, April 12, 2008

On Semantic Event Processing


This nice castle seems European, but it is really located in Tarrytown, NY. I am not living in a castle, but in a hotel in the small town of Tarrytown, which has many hotels. After a busy week - starting in Pasadena, visiting Mani Chandy at Caltech, spending three days at IBM's Impact 2008 and visiting the IBM research center in Hawthorne yesterday - and before going home this evening, I have some time for Blogging. Catching up with the recent Blogs I have found several that have gone semantic: Jack Rusher from Aleri, Marco from Rulecore and Paul Vincent from TIBCO have all written about "semantic CEP", with the idea of using domain-dependent ontologies. The idea of using semantic relations between entities as part of event processing is something we played with a few years ago (during my previous tenure in IBM research), and I believe it has a big future; certainly when the trend is to move parts of the development to the business user, this can help. So - this is certainly in our road map. Will write more on this topic later.

Thursday, April 10, 2008

On Impact 2008




This is the MGM Grand Hotel and Casino in Las Vegas, the place that hosts IMPACT 2008 this week, a very big customer conference of IBM WebSphere (more than 6000 people).


The hotel is huge, the logistics are very good, and the program is very diversified. James Taylor described the first two days in his Blog. One of the main themes of this conference was business event processing; IBM's product WebSphere Business Events was announced last week. Today I gave a talk on the concept of "business event processing" together with my colleague Steve Lyons (a former Aptsoft guy), thus I have done my share. This was well covered by the press. The main message behind it is enabling the business user to control the behavior (e.g. define patterns) of EP applications; this concept won a lot of interest from the IBM clients that were present.


On another matter - congratulations to Mark Palmer on his new position at StreamBase; Mark is definitely one of the notable figures in the EP area.