Saturday, February 2, 2008

On Immutability of events

Israel is the land of milk and honey, but not the land of snow - snow is quite rare in Israel. This week the rare event of snow occurred in the high mountains across the country - in the picture, snow near one of the gates of the Old City of Jerusalem. On the Carmel mountain ridge, where I live, there was a little bit of snow higher up the ridge, and some people went there and brought bags full of snow to the office... As I lived for several years in the Philadelphia area in the USA, I am personally less excited by snow - but it is a notable event here.

Today's topic relates to a question that Marco, a fellow EP blogger and the person behind rulecore, asked about the previous posting. His question related to enrichment, but I'll extend it to the rest of the transformations: "event is something that is happened, and as such it is immutable, cannot be changed; Do transformations and enrichments break this model?".

The answer to this question is no. Transformations do not break the model, and events are still immutable. According to the pure model, events cannot be altered or deleted, and when they are represented in an event store, it has to be an "append only" type of store.

Enrichment and transformation are, in fact, the creation of new derived events as a function of raw events. Thus, according to the pure model, a transformed or enriched event is not the same event as the raw one:
  • It has a different event type, with a different structure.
  • It has a different event-id.

An enrichment example: the event is an order, and it has an attribute that refers to a customer. The enrichment function looks up the customer in some database and fetches the values of customer type (platinum, gold, silver, nobody) and customer_credit_limit.

The fact that the raw event and the transformed event have different event-ids also provides traceability: the transformation itself can be traced (maybe the problem is a wrong identification of the customer?).
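To make the pure model concrete, here is a minimal sketch in Python of the order example. It is only an illustration under assumptions of mine: the `Event` class, the `derived_from` field, the `EventStore` class and the in-memory `CUSTOMERS` table (standing in for "some database") are all hypothetical names, not part of any product.

```python
import uuid
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class Event:
    """An immutable event; frozen=True rejects attribute assignment."""
    event_type: str
    attributes: Mapping[str, object]
    derived_from: Optional[str] = None  # event-id of the raw event, if any
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))


# Hypothetical customer table standing in for "some database".
CUSTOMERS = {"C42": {"customer_type": "gold", "customer_credit_limit": 5000}}


def enrich_order(order: Event) -> Event:
    """Create a NEW derived event; the raw order is left untouched."""
    customer = CUSTOMERS[order.attributes["customer"]]
    return Event(
        event_type="enriched_order",  # different type, different structure
        attributes=MappingProxyType({**order.attributes, **customer}),
        derived_from=order.event_id,  # keeps the derivation traceable
    )


class EventStore:
    """Append-only store: events can be added and read, never changed."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def get(self, event_id: str) -> Event:
        return next(e for e in self._events if e.event_id == event_id)


store = EventStore()
raw = Event("order", MappingProxyType({"order_no": 1, "customer": "C42"}))
store.append(raw)
enriched = enrich_order(raw)
store.append(enriched)

# Trace back from the enriched event to the raw event it was derived from.
source = store.get(enriched.derived_from)
```

Both events live side by side in the store, and following `derived_from` from the enriched event leads back to the raw order, which is what makes it possible to check whether the customer was identified correctly.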

There are some considerations that may, in practice, push towards not obeying the pure model, and I'll talk about them some other time.

Wednesday, January 30, 2008

On Mediated Event Processing

I have mentioned the term "mediated event processing" in the past, but looking at previous postings, I never explained exactly what it is.

So back to the educational mission of this Blog: mediations are exactly what they mean in messaging middleware or ESBs - transforming events. There are several types of mediators.

  • Enrichment: Receives an event as an input and adds attributes as a result of a lookup in a data store (database, spreadsheet, file) --- there are various levels of sophistication in enrichment.
  • Translation: Receives an event as an input and translates it to an event that is semantically equivalent. Example: an XSLT transformation (when the event is in XML format).
  • Aggregation: Receives N events and creates a single event with one or more aggregated attributes. Note that this is a stateful mediation, but unlike pattern detection in CEP, the original events are not kept; there are just incremental updates of the state, and thus the state is bounded.
  • Split: Receives a single event and creates M events - either identical clones or distinct events - all of them functions of the input event.
  • Composition: Aggregation + Split: N input events ---> M output events.
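The mediator types above can be sketched as small Python functions. This is only an illustration under my own assumptions: the `Event` shape, the attribute names (`amount_cents`, `recipient`) and the `CUSTOMER_TYPES` lookup table are hypothetical. Note that the aggregator keeps only a count and a running total, never the input events, which is why its state stays bounded.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    event_type: str
    payload: Dict[str, object]


# Hypothetical lookup table standing in for a data store.
CUSTOMER_TYPES = {"C42": "gold"}


def enrich(event: Event) -> Event:
    """Enrichment: add attributes found by a lookup in a data store."""
    customer_type = CUSTOMER_TYPES.get(event.payload["customer"], "nobody")
    return Event("enriched_order", {**event.payload, "customer_type": customer_type})


def translate(event: Event) -> Event:
    """Translation: semantically equivalent event (cents become dollars)."""
    payload = dict(event.payload)
    payload["amount_usd"] = payload.pop("amount_cents") / 100
    return Event("order_usd", payload)


class TotalAggregator:
    """Aggregation: N events in, one event out, with bounded state."""

    def __init__(self) -> None:
        self.count = 0
        self.total_cents = 0

    def add(self, event: Event) -> None:
        # Incremental update only; the input event itself is not stored.
        self.count += 1
        self.total_cents += event.payload["amount_cents"]

    def emit(self) -> Event:
        return Event("order_total", {"count": self.count, "total_cents": self.total_cents})


def split(event: Event, recipients: List[str]) -> List[Event]:
    """Split: one event in, M events out, each a function of the input."""
    return [Event("notification", {**event.payload, "recipient": r}) for r in recipients]


def compose(events: Iterable[Event], recipients: List[str]) -> List[Event]:
    """Composition: aggregate N input events, then split into M outputs."""
    aggregator = TotalAggregator()
    for event in events:
        aggregator.add(event)
    return split(aggregator.emit(), recipients)


orders = [
    Event("order", {"customer": "C42", "amount_cents": 1250}),
    Event("order", {"customer": "C7", "amount_cents": 750}),
]
outputs = compose(orders, ["sales", "billing"])
```

Here two orders go in and two notifications come out, each carrying the same aggregated count and total, so N = 2 and M = 2.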

One of the interesting questions is whether MEP is a subset of CEP. Again, this is a matter of implementation: in monolithic stand-alone engines, MEP can be done by the CEP engine; when the CEP engine is part of a middleware, the ESB mediations may be used for this with some twist; and in an agent-based EPN, there may be an agent type for each of the mediator types.

BTW - I have written before about the CITT meeting in Regensburg - if you want more details, the presentations are now on the CITT website. My presentation can also be found on this website, as well as some other interesting stuff. More - later.

Monday, January 28, 2008

Why I prefer to use "event processing" with prefix, infix or suffix - a subjective tour of acronyms

Recently there have been more discussions about terms and acronyms. I am not sure that this is so important an issue to spend much time on, but before moving to more interesting points, I would like to provide some personal thoughts about acronyms in this area.

First, as you can see from the Blog's name, I prefer the term "event processing" with any prefix, infix, or suffix. The reason is that I view it as the name of a discipline and not of a trend. Names of disciplines typically consist of two words: signal processing, information retrieval, machine learning, software engineering etc., although there are exceptions. Three-letter acronyms, AKA TLAs, are typically not names of core disciplines but of other things - protocols, architectures, trends etc.

Historically, when the first "event processing symposium" (which created EPTS) was established, we needed a name. The original founders were David Luckham, Roy Schulte, Mark Palmer (from Progress Software) and myself. David, of course, thought that CEP is an appropriate name for the discipline, while Mark proposed ESP - "Event Stream Processing" - since he did not like the word "complex" (read further about it). Roy and I proposed to take the part on which both agree: "event processing". Neither David nor Mark was completely happy, but both agreed; thus we advanced with the name "event processing symposium" and have used "event processing" ever since.

Getting back to history - I preferred to use the name "active technologies", being a veteran of the active database community. Although the autonomic computing community adopted the "active" term and had conferences named "active middleware services", this name did not actually get into the mainstream. David Luckham used the term "complex event processing" in his famous book. The term "complex event processing" has an ambiguous meaning: one interpretation is that this is processing of complex events, where a complex event is an event that consists of more than one event (analogous to a complex object); the other interpretation is that this is complex processing of events. I started to use the term CEP in 2004 to differentiate such functionality from "event correlation" in system management, since there had been some confusion in IBM around these terms. I also made a modest contribution to getting the name CEP known by giving a tutorial at ICWS in July 2004, attended by many people whose common denominator was that they had not heard this term before. Anyhow, there are two schools of thought around CEP:

Interpretation one ("the monolithic approach"): CEP = EP, everything is a subset of CEP.
Interpretation two ("the layered approach"): EP is a collection of technologies, whereas CEP is one of them (a link in the chain).

Some people take the first interpretation, saying that "simple" event processing (whether it is simple events or simple processing) is a subset of complex event processing; the rationale behind it is that if an engine is capable of doing complex things, it is surely capable of doing simple things. Interpretation two comes from Roy Schulte (Gartner), who introduced the following slide in December 2005:

In this slide Roy Schulte talks about four types of processing (later he realized that the BPM one is of another category) - simple event processing (filter and route), mediated event processing (transform and enrich) and complex event processing (stateful pattern detection). This is consistent with a market view, since there are products that do only simple event processing (messaging), other products that do mediated event processing (ESB), and CEP as the next layer, as a stateful engine. I think that this approach is liked by those who are putting CEP on top of existing middleware, while the first ("monolithic") approach is liked by those who have a stand-alone CEP engine. Anyway, the existence of these two approaches, and the fact that people may not realize that the other person is using a different interpretation, causes confusion.
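As a rough illustration of the difference between the bottom and top layers, here is a small Python sketch of a stateless filter-and-route step next to a stateful pattern detector. Everything in it is an assumption of mine for the sake of the example: the dictionary event shape, the 1000 threshold, the destination names and the "repeated large orders" pattern. The mediated layer (transform and enrich) is left out to keep the sketch short.

```python
from collections import deque
from typing import Deque, Dict, Optional

Event = Dict[str, object]  # hypothetical event shape: a plain dictionary


def filter_and_route(event: Event) -> Optional[str]:
    """Simple event processing: stateless filter, then pick a destination."""
    if event["amount"] <= 0:
        return None  # filtered out
    return "large_orders" if event["amount"] >= 1000 else "regular_orders"


class RepeatedLargeOrders:
    """Complex event processing: stateful detection over retained events.

    Hypothetical pattern: the last `size` orders of a customer were all large.
    """

    def __init__(self, size: int = 3) -> None:
        self.size = size
        self.recent: Dict[str, Deque[Event]] = {}

    def on_event(self, event: Event) -> Optional[Event]:
        # The detector retains the recent events themselves, per customer.
        window = self.recent.setdefault(event["customer"], deque(maxlen=self.size))
        window.append(event)
        if len(window) == self.size and all(e["amount"] >= 1000 for e in window):
            return {
                "type": "repeated_large_orders",
                "customer": event["customer"],
                "members": list(window),
            }
        return None


detector = RepeatedLargeOrders(size=2)
stream = [
    {"customer": "C42", "amount": 1500},
    {"customer": "C7", "amount": 20},
    {"customer": "C42", "amount": 2500},
]
routes = [filter_and_route(e) for e in stream]
detections = [detector.on_event(e) for e in stream]
```

The routing decision depends only on the current event, while the detector can only fire on the third event because it remembers the first one.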

The next acronym was "event stream processing". The term "data stream manager" was coined at Stanford with a similar meaning, but with an SQL API, and it continued with other academic projects and some descendant products (Coral8 is a descendant of the Stanford project). When Progress Software acquired Apama, Mark Palmer looked for an alternative term for CEP, since he was of the opinion that customers don't like anything labelled "complex". Thus, he borrowed the term "stream", although Apama's API is not SQL and does not have much to do with the academic stream projects, and introduced the term ESP, "Event Stream Processing" (which was dropped later). In response, David Luckham published an article to defend the word "complex", starting with the words: "some people, I'm told, get scared when they hear the word complex, as in complex event processing.... start with the basic question, is life simple ? most people when asked about it will truthfully answer no...." and the rest you can read yourself. It seems that David has won this battle - all vendors (including the SQL-oriented ones) have at some point or another positioned themselves as CEP vendors. This also created some objections from people who thought that it is important to differentiate between ESP and CEP, some saying that ESP is a subset of CEP, and some that these are completely different focus areas. As I have written before, there are many ways to define subsets of EP functionality, and I did not find any evidence that the one defined by this distinction (totally ordered events vs. partially ordered events) is the important one (in many applications we need both types for different purposes).

What other acronyms have floated around? Well, Forrester at some point made a distinction between CEP and BEM (Business Event Management), which has been defined as "a process of capturing real-time business events from multiple source and assigning them to the appropriate decision-maker for resolution based on the business context of the events". I have struggled to understand the distinction - maybe it is the fact that BEM deals with simple events; however, since the definition mentions context, determining the context may by itself require CEP.

We in IBM are using the term IEP (Intelligent Event Processing) to denote stochastic and intelligent reasoning beyond the deterministic pattern detection of CEP. This is consistent with the layered approach; fans of the monolithic approach view IEP as part of CEP.

The new term we heard this week from IBM is BEP (Business Event Processing). It is intended to denote event processing applications in which the business user can control the behavior (i.e. define and modify patterns without the help of a programmer), a topic I have also discussed in the past.

Last but not least, some people in the academic community don't like the term "processing", which they think is too elementary, and talk about "event-based computing" as the name of the discipline.

After this unusually long posting, my bottom lines are:

(1). The upcoming glossary should provide a consistent taxonomy of terms here - there is still much confusion about the names, and the glossary can be a good reference point.

(2). Personally, I still prefer to talk about types of functions and not about boundaries of names, however, I understand the importance of branding.

(3). I still prefer the name "event processing" without prefix, infix or suffix - and thus continue to use this name.

(4). Hopefully, this is the last posting I am writing on the *-E-P; E-*-P; E-P-* topic - I have more interesting topics to deal with.... more - later.