Showing posts with label Event Stream Processing. Show all posts

Monday, November 24, 2008

On evaluation criteria for EP products


Typically, I refrain from reacting in this Blog to any marketing material presented by vendors, a restriction I have taken upon myself as the chair of EPTS. I am not deviating from this rule, but since my friends at Coral8 have posted their article, entitled "Comprehensive Guide to Evaluating Event Stream Processing Engines", on David Luckham's site as a vendor-neutral service to the community, I am taking the liberty of adding some footnotes to this paper.

On the positive side, I think that this type of work is useful, that discussion about it is also useful, and that many of the criteria presented are valid. In the past, we at IBM devised evaluation criteria for internal purposes that included many of the criteria mentioned; I have to check whether we can expose them.

On the critical side, here are several comments:

1. The first claim is that the authors view "event stream processing" and "complex event processing" as one and the same, saying that customers do not distinguish between the terms and that there is no agreed-upon terminology. I refer the authors to the EPTS glossary as a reference for terminology. Regardless of that, I agree that customers typically don't care which TLA is used; the substance is more important.

2. The statement that this document covers ESP and CEP, which are one and the same, created the impression that the document is general. However, reading further, I found among the criteria that define an ESP engine the following condition: "...process large volumes of incoming messages or events". This criterion confuses me -- is it a fundamental property of an ESP/CEP engine? In the past year I have heard some analysts say that most of the potential EP applications are actually not the "high volume" ones; furthermore, the customers I know have various degrees of event volumes, some high, some low. So maybe this is not part of the definition of an engine, but an evaluation criterion for a certain class of applications.

3. Reading further, I see terms like continuous queries and windows -- terms that already assume a certain type of implementation (namely, query-based stream processing). This fits the title of "event stream processing", assuming there is agreement that this is what ESP is; however, it does not represent the entire spectrum. Continuous queries are a technique intended to achieve some functionality, and that functionality can be achieved by other means...

Personally, I believe that "one size fits all" does not work, and that different event processing applications have different functional and non-functional requirements. Various performance aspects are more important in some applications and less important in others; note that there are also no standard benchmarks yet. I hope that the EPTS work group on use cases, whose work is planned to result in a classification of event processing applications, will arrive at a finite, manageable number of application classes, so that the evaluation criteria can be partitioned by type.

And, if possible, hands-on experience indeed makes the evaluation more accurate and removes the noise of preconceptions and false assumptions... More on evaluation - later.

Sunday, November 23, 2008

On the rain in the window -- windows and temporal contexts


I realized that I have not written for a while. I am not out of topics, just trying to do too many things in parallel... Anyway, I am typically late, relative to most others, in changing from summer clothing to winter clothing, but it happened yesterday; maybe the noise of the heavy rain on the window brought me to change from a short-sleeved shirt and sandals to a long-sleeved shirt and shoes.
Winter is a relative term; people who live in some other climates would not call our winter a winter.

Last night I attended the concluding session of a "student exchange trip" in which my 13-year-old daughter Hadas participated. They visited a school in Foster City, California, as part of a program called "ambassadors". They also had to give speeches about various aspects of Israel, and one of their challenges was to convince their hosts to come to Israel on a return visit. Since the international media create the preconception that Israel is a dangerous place to be, with wars in the streets etc., some people (typically those who have never been here) are afraid to come... It seems that the children succeeded in convincing their hosts that in Haifa we live a normal life, and that there is no war in the streets... Actually, I am used to telling people who ask me that I feel much safer in Haifa than in New York, London, or Paris. Paris is the only place where I was attacked by thieves, so it is the most terrifying city for me.


Back to the rain in the window. The notion of a "window", which came from stream processing, is used to process a sub-stream that is bounded by time (or by number of occurrences). In some cases a window can be specified by a starting time and a duration, or can slide at certain time intervals. In other cases, however, we need to process events in a time interval such as "while it is raining" - either to find certain patterns that are only relevant while it rains, or to run the classic stream processing application: aggregation within a sub-stream. In such a case the window is not determined by fixed times, and its duration is not known in advance. It can be either "while something is in state S" or a time interval that starts with the occurrence of event E1 and ends with the occurrence of event E2. An interval may also expire if the state lasts too long...
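
The event-bounded interval described above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not the API of any product: the names `Event`, `IntervalContext`, `feed` and the event kinds are made up for the example. The interval opens on an initiator event (E1), closes on a terminator event (E2), and optionally expires after a maximum duration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    kind: str     # hypothetical event kinds, e.g. "rain_start", "rain_stop", "car"
    time: float   # occurrence time, in seconds


@dataclass
class IntervalContext:
    """A window opened by one event kind and closed by another, or by a timeout."""
    initiator: str                        # kind of the event E1 that opens the interval
    terminator: str                       # kind of the event E2 that closes the interval
    max_duration: Optional[float] = None  # None means the interval never expires
    opened_at: Optional[float] = None
    members: List[Event] = field(default_factory=list)

    def feed(self, ev: Event) -> Optional[List[Event]]:
        """Consume one event; return the collected events when the interval closes."""
        # An interval that lasted too long expires before the new event is considered.
        if (self.opened_at is not None and self.max_duration is not None
                and ev.time - self.opened_at > self.max_duration):
            closed = self._close()
            # The event arrived after expiry, so it can only open a new interval.
            if ev.kind == self.initiator:
                self.opened_at = ev.time
            return closed
        if self.opened_at is None:
            # Outside any interval: only the initiator is of interest.
            if ev.kind == self.initiator:
                self.opened_at = ev.time
            return None
        if ev.kind == self.terminator:
            return self._close()
        self.members.append(ev)
        return None

    def _close(self) -> List[Event]:
        closed, self.members, self.opened_at = self.members, [], None
        return closed


ctx = IntervalContext("rain_start", "rain_stop", max_duration=3600)
stream = [Event("car", 1), Event("rain_start", 2), Event("car", 3),
          Event("car", 4), Event("rain_stop", 5), Event("car", 6)]
results = [ctx.feed(e) for e in stream]
cars_in_rain = results[4]   # the two "car" events observed while it was raining
```

Note that in this sketch expiry is only noticed when the next event arrives; a real engine would presumably use a timer instead.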


I'll re-visit the notion of context and its formal definition soon.


Saturday, April 26, 2008

On Streams and Events

The picture above is taken from a UCLA project that deals with multimedia stream systems. While the terms "data streams" and, later, "event streams", which deal with continuous queries over structured data, were introduced in the last decade in the database research community (with spin-offs into products), the term "streams" has a more general and more traditional meaning - referring to multimedia streams (video, voice, news etc.), which by nature belong to the family of unstructured data. In previous postings I discussed some of the problems around "event stream processing" and around the classification of event processing technologies. In this posting, however, I would like to point out that "stream processing" in its more traditional meaning is an important complementary technology to event processing.

First - the result of stream processing is the detection that an event has happened. An example is the detection of a vehicle's registration plate on automatic toll roads (we have one of these roads in Israel; there are other roads like this in Canada and Sweden, and maybe in more places), which yields the event "vehicle with registration plate X entered the highway at entrance Y at time T". This event can be further processed (after correlating it with the exit event) for billing purposes, but can also serve security and other applications. In this case, from the event processing architectural view, the "stream processing" is done in a producer application, which generates events that are processed in the event processing system.
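
To make the correlation of entry and exit events concrete, here is a small Python sketch. It assumes that the plate recognition (the "stream processing" part) has already happened in the producer and that the event processing side only receives structured events; `PlateEvent`, `Trip` and `TripCorrelator` are hypothetical names, and pricing is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PlateEvent:
    plate: str   # registration plate X, as detected by the producer
    gate: str    # entrance or exit Y
    time: float  # time T, in seconds
    kind: str    # "entry" or "exit"


@dataclass(frozen=True)
class Trip:
    """Derived event: one complete drive on the toll road, ready for billing."""
    plate: str
    entry_gate: str
    exit_gate: str
    duration: float


class TripCorrelator:
    def __init__(self) -> None:
        # Vehicles currently on the road, keyed by registration plate.
        self._open: Dict[str, PlateEvent] = {}

    def on_event(self, ev: PlateEvent) -> Optional[Trip]:
        if ev.kind == "entry":
            self._open[ev.plate] = ev
            return None
        entry = self._open.pop(ev.plate, None)
        if entry is None:
            return None  # exit without a known entry; ignored in this sketch
        return Trip(ev.plate, entry.gate, ev.gate, ev.time - entry.time)


correlator = TripCorrelator()
first = correlator.on_event(PlateEvent("12-345-67", "Y1", 100.0, "entry"))
trip = correlator.on_event(PlateEvent("12-345-67", "Y4", 700.0, "exit"))
```

Keying the open entries by plate is the simplest possible choice; a security application might instead want to keep the unmatched exits that this sketch drops.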

Second - the result of an event processing system can be an input to a stream. For example, a game is presented to the players as a video stream. Decisions made by the electronic player (or by the human player) can be assisted by an event processing system. The result of a decision can be the movement of a player in a certain direction, and this is fed back to the video stream. In that case, the video stream is processed in a consumer application, which gets events as input.

Of course, a producer can also be a consumer, especially in games, which are iterative by nature; thus an application communicates with an event processing system in both directions.

Since many of the events that happen in the universe are sensed through various unstructured media, the area of creating events out of multimedia streams, and of embedding events to control the behavior of multimedia streams, will be one of the major directions for the future; we can see some of this already happening.

Monday, January 28, 2008

Why I prefer to use "event processing" without prefix, infix or suffix - a subjective tour of acronyms




Recently there have been more discussions about terms and acronyms. I am not sure that this is an issue important enough to spend much time on, but before moving on to more interesting points, I would like to provide some personal thoughts about acronyms in this area.


First, as you can see from the Blog's name, I prefer the term "event processing" without any prefix, infix, or suffix. The reason is that I view it as the name of a discipline and not as a trend. Names of disciplines typically consist of two words: signal processing, information retrieval, machine learning, software engineering etc., although there are exceptions. Three-letter acronyms, AKA TLAs, are typically not names of core disciplines but of other things - protocols, architectures, trends etc.


Historically, when the first "event processing symposium" (which created EPTS) was established, we needed a name. The original founders were David Luckham, Roy Schulte, Mark Palmer (from Progress Software) and myself. David, of course, thought that CEP was an appropriate name for the discipline, while Mark proposed ESP - "Event Stream Processing" - since he did not like the word "complex" (read further about it). Roy and I proposed to take the part on which both agreed: "event processing". Neither David nor Mark was completely happy, but they agreed; thus we advanced with the name "event processing symposium" and have used "event processing" ever since.



Getting back to history - being a veteran of the active database community, I preferred the name "active technologies". Although the autonomic computing community adopted the "active" term and held conferences named "active middleware services", this name did not actually get into the mainstream. David Luckham used the term "complex event processing" in his famous book. The term "complex event processing" has an ambiguous meaning: one interpretation is that it is the processing of complex events, where a complex event is an event that consists of more than one event (analogous to a complex object); the other interpretation is that it is complex processing of events. I started to use the term CEP in 2004 to differentiate such functionality from "event correlation" in system management, since there was some confusion in IBM around these terms. I also made a modest contribution to getting the name CEP known by giving a tutorial at ICWS in July 2004, attended by many people whose common denominator was that they had not heard this term before. Anyhow - there are two schools of thought around CEP:


Interpretation one ("the monolithic approach"): CEP = EP, everything is a subset of CEP.
Interpretation two ("the layered approach"): EP is a collection of technologies, and CEP is one of them (a link in the chain).

Some people take the first interpretation, saying that "simple" event processing (whether it is simple events or simple processing) is a subset of complex event processing; the rationale behind it is that if an engine is capable of doing complex things, it is surely capable of doing simple things. Interpretation two comes from Roy Schulte (Gartner), who introduced the following slide in December 2005:









In this slide Roy Schulte talks about four types of processing (later he realized that the BPM one belongs to another category); the remaining three are simple event processing (filter and route), mediated event processing (transform and enrich), and complex event processing (a stateful pattern detector). This is consistent with a market view, since there are products that do only simple event processing (messaging), other products that do mediated event processing (ESB), and CEP as the next layer, a stateful engine. I think that this approach is liked by those who put CEP on top of existing middleware, while the first ("monolithic") approach is liked by those who have a stand-alone CEP engine. Anyway - the existence of these two approaches, and the fact that people may not realize that the other person is using a different interpretation, causes confusion.
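
The three layers can be illustrated with a toy Python pipeline. This is my own sketch of the distinction and not something taken from the slide: the function names, the event fields and the "three failed logins" pattern are all assumptions chosen for the example. The point to notice is that the first two stages look at each event in isolation, while only the third keeps state across events.

```python
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]


def simple_filter(events: Iterable[Event], kind: str) -> Iterator[Event]:
    """Simple event processing: a stateless filter (routing is omitted here)."""
    return (e for e in events if e["kind"] == kind)


def mediated_enrich(events: Iterable[Event],
                    segments: Dict[str, str]) -> Iterator[Event]:
    """Mediated event processing: transform and enrich each event on its own."""
    for e in events:
        yield {**e, "segment": segments.get(e["customer"], "unknown")}


def complex_detect(events: Iterable[Event], threshold: int) -> Iterator[Event]:
    """Complex event processing: a stateful pattern over several events."""
    counts: Dict[str, int] = {}
    for e in events:
        customer = e["customer"]
        counts[customer] = counts.get(customer, 0) + 1
        if counts[customer] == threshold:
            # Derived event, emitted once when the pattern is first matched.
            yield {"kind": "repeated_failure", "customer": customer,
                   "segment": e["segment"], "count": threshold}


raw = [
    {"kind": "login_failed", "customer": "a"},
    {"kind": "login_ok", "customer": "b"},
    {"kind": "login_failed", "customer": "a"},
    {"kind": "login_failed", "customer": "a"},
]
derived = list(complex_detect(
    mediated_enrich(simple_filter(raw, "login_failed"), {"a": "gold"}), 3))
```

In the layered reading each function could live in a different product (messaging, ESB, CEP engine); in the monolithic reading all three would be features of one engine.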

The next acronym has been "event stream processing". The term "data stream manager" was coined at Stanford with a similar meaning, but with an SQL API, and continued in other academic projects and in some descendant products (Coral8 is a descendant of the Stanford project). When Progress Software acquired Apama, Mark Palmer looked for an alternative to the term CEP, since he was of the opinion that customers don't like anything labelled "complex". Thus he borrowed the term "stream", although Apama's API is not SQL and has not much to do with the academic stream projects, and introduced the term ESP, "Event Stream Processing" (which was dropped later). In response, David Luckham published an article to defend the word "complex", starting with the words: "some people, I'm told, get scared when they hear the word complex, as in complex event processing.... start with the basic question, is life simple ? most people when asked about it will truthfully answer no...." and the rest you can read yourself. It seems that David has won this battle -- all vendors (including the SQL-oriented ones) have at some point or another positioned themselves as CEP vendors. This also raised some objections from people who thought that it is important to differentiate between ESP and CEP, some saying that ESP is a subset of CEP, and some that these are completely different focus areas. As I have written before, there are many ways to define subsets of EP functionality, and I did not find any evidence that the one defined by this distinction (totally ordered events vs. partially ordered events) is the important one (in many applications we need both types for different purposes).

What other acronyms have flown around? Well, Forrester at some point made a distinction between CEP and BEM (Business Event Management), which was defined as "a process of capturing real-time business events from multiple source and assigning them to the appropriate decision-maker for resolution based on the business context of the events". I have struggled to understand the distinction. Maybe it lies in the fact that BEM deals with simple events; however, since the definition mentions context, determining the context may by itself require CEP.

We in IBM are using the term IEP (Intelligent Event Processing) to denote stochastic and intelligent reasoning that goes beyond the deterministic pattern detection of CEP. This is consistent with the layered approach; fans of the monolithic approach view IEP as part of CEP.


The new term we heard this week from IBM is BEP (Business Event Processing), which is intended to define event processing applications in which the business user can control the behavior (i.e. define and modify patterns without the help of a programmer), a topic I have also discussed in the past.

Last but not least, some people in the academic community don't like the term "processing" which they think is too elementary and talk about "event-based computing" as the name of the discipline.
After this unusually long posting, my bottom lines are:


(1). The upcoming glossary should provide a consistent taxonomy of terms here - there is still much confusion about the names, and the glossary can be a good reference point.

(2). Personally, I still prefer to talk about types of functions and not about boundaries of names, however, I understand the importance of branding.

(3). I still prefer the name "event processing" without prefix, infix or suffix - and thus continue to use this name.

(4). Hopefully, this is the last posting I am writing on the *-E-P; E-*-P; E-P-* topic - I have more interesting topics to deal with.... more - later.

Thursday, December 20, 2007

On - "one size fits all" and Event Processing


Like a commercial TV station, if a Blog wants to get "rating" one has to post something somewhat controversial - the number of visitors to this Blog has more than doubled in the last few days, when I had exchanges of opinions and folk stories with Tim Bass. Anyway, I got tired and did not continue that discussion. One somewhat related question that I have received was: does the fact that I don't think it is worth talking about ESP and CEP as separate entities mean that I believe there is a "one size fits all" in event processing? Well, this is a fair question. In the past I did believe it to be true, until I read Mike Stonebraker's immortal assertion: "One size fits all is a concept whose time has come and gone". Actually, I ceased to believe in it a little earlier. I think that the event processing area is not a monolithic area, and that some variations are needed - however:
  • I don't believe that ESP vs. CEP is the right type of partition in this area;
  • There may be a need to have various implementations under one roof (the heterogeneous framework approach).

For the first point -- what is the right type of partition? This is a multi-dimensional question, and we still have to learn more to know the most useful combinations.

One of the important dimensions is the "reason for use" dimension; in an internal IBM study we arrived at five different reasons for use. I'll write about it in one of the next postings.

EPTS has recently launched a workgroup that tries to identify these classifications by doing a comprehensive survey of use cases that will be compared using the same template. A team that consists of Tao Lin (SAP), Dieter Gawlick (Oracle) and Pedro Bizarro (University of Coimbra, Portugal) is working on this template, and a larger team will handle the survey and analysis. The end result, a collaborative white paper about the state of the practice in event processing, is expected sometime in the second quarter of 2008. Stay tuned.

More - Later.