Wednesday, February 1, 2012

On "CEP and Big Data 2" - comments on Philip Howard's observations.

Philip Howard from Bloor Research has posted some observations on his Blog under the title "CEP and Big Data 2". Here are some comments (actually nothing new, just a summary of things I have written about before).
Philip deals with four issues:

  • whether the name CEP is appropriate or should be changed
  • who should be credited as the pioneer of this area
  • whether CEP implies real-time processing
  • who the CEP big data platforms are

Here is a summary of my views on each of these topics.

The name "Complex Event Processing"

Exactly four years ago I posted on this Blog an explanation of "why I prefer to use the name event processing without any prefix, infix or suffix". My particular dislike of the term "complex event processing" stems from the ambiguity in the name: some people (including David Luckham, who coined the term) view it as the processing of complex events, others interpret it as complex processing of events, and then they debate when something is complex enough and what type of complexity is needed to qualify as CEP. Moreover, some vendors use the term for products that fit neither interpretation. I think that two words are enough for the name of a discipline; examples include information retrieval, machine learning, image processing, and many more. Thus, from my point of view, the term "event processing" subsumes all the other terms, such as complex event processing, business event processing, event stream processing, and more.

Who gets the pioneering credit

Philip, as a good UK patriot, wonders why the Wikipedia entry on CEP and other sources give credit to David Luckham and forget the Apama work that came from Cambridge, UK. Looking at the Wikipedia entry, it has one mention of David, as well as other references (like our EPIA book). It indeed does not mention Apama or any paper by John Bates, but, this being Wikipedia, anybody can suggest additions.
David Luckham had a major influence on this area, since he was the first to publish a full book and expose the young area to the general public. An article in IEEE Computer, published in 2009, investigated the history of the area and determined that in the 1990s there were four parallel projects that can be classified as its starting points: David Luckham's project at Stanford, John Bates' project at Cambridge (UK, not Boston), Mani Chandy's project at Caltech, and our Amit project at the IBM Haifa Research Lab. I share Philip's view that John Bates should get full credit as one of the pioneers, and I still view David Luckham as the "elder statesman" of the community.

Is CEP necessarily associated with real-time?

I have written several times about this topic, most recently in response to Chris Carlson, to whom Philip also responds. There is some abuse of the term real-time in the industry: while its meaning is "within time constraints", many people interpret it as "with very low latency". These are not the same. In any case, event processing is a functionality whose applications vary: some require very low latency, some must react within real-time constraints (which can be as long as 2 hours), some require both, and some require neither.
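
To make the distinction concrete, here is a minimal sketch in Python. The function names and the 10 ms low-latency threshold are my own assumptions for illustration, not taken from any product or standard.

```python
def meets_deadline(elapsed_s: float, deadline_s: float) -> bool:
    """Real-time in the strict sense: the reaction arrives within its time constraint."""
    return elapsed_s <= deadline_s


def is_low_latency(elapsed_s: float, threshold_s: float = 0.010) -> bool:
    """Low latency: the reaction is very fast (the threshold is an assumed 10 ms)."""
    return elapsed_s <= threshold_s


# A reaction after 90 minutes satisfies a 2-hour real-time constraint,
# yet nobody would call it low latency.
slow_but_on_time = 90 * 60
print(meets_deadline(slow_but_on_time, deadline_s=2 * 3600))  # True
print(is_low_latency(slow_but_on_time))                       # False

# A reaction after 5 ms is low latency, yet it misses a 1 ms deadline.
fast_but_late = 0.005
print(is_low_latency(fast_but_late))                          # True
print(meets_deadline(fast_but_late, deadline_s=0.001))        # False
```

The two checks are independent, which is why an application may need one, both, or neither.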

Who are the CEP big data platforms?

I have taken upon myself the limitation not to state opinions on commercial products within this Blog, leaving that to analysts. Thus I will make only one comment. There is a distinction between two types of software entities, which are sometimes confused in the language people use.

  • An Event Processing Platform is software that enables the creation of an event processing network (EPN) and handles the routing of events among agents, management, and other common infrastructure issues.
  • An Event Processing Engine is software that enables the creation of the actual function; in EPN terms, it implements the agents.
This is similar to the difference between an application server and a single component (programming in the large vs. programming in the small). Some of the available platforms for "event processing for big data" provide only the first: they supply infrastructure but do not implement any type of functionality, leaving developers to create their own, and thus they don't do full-fledged event processing. It seems that many people put both under the same classification (of course, there are products that do both).
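
The split can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Platform` class, the two agent factories, and the login events are all hypothetical and do not describe any particular product.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, Iterable, List

Event = dict
Agent = Callable[[Event], Iterable[Event]]


class Platform:
    """Platform role: holds the network and routes events; implements no function."""

    def __init__(self) -> None:
        self.agents: Dict[str, Agent] = {}
        self.routes: Dict[str, List[str]] = defaultdict(list)
        self.sink: List[Event] = []  # output of agents that have no successor

    def add_agent(self, name: str, agent: Agent) -> None:
        self.agents[name] = agent

    def connect(self, src: str, dst: str) -> None:
        self.routes[src].append(dst)

    def publish(self, entry: str, event: Event) -> None:
        queue = deque([(entry, event)])
        while queue:
            name, ev = queue.popleft()
            for out in self.agents[name](ev):
                targets = self.routes.get(name, [])
                if not targets:
                    self.sink.append(out)
                for target in targets:
                    queue.append((target, out))


# Engine role: the actual functions that the agents implement.
def make_filter(predicate: Callable[[Event], bool]) -> Agent:
    def agent(event: Event) -> Iterable[Event]:
        if predicate(event):
            yield event
    return agent


def make_threshold_counter(key: str, limit: int) -> Agent:
    counts: Dict[str, int] = defaultdict(int)

    def agent(event: Event) -> Iterable[Event]:
        counts[event[key]] += 1
        if counts[event[key]] == limit:  # emit once, when the limit is reached
            yield {"type": "alert", key: event[key], "count": limit}
    return agent


platform = Platform()
platform.add_agent("filter", make_filter(lambda e: e["type"] == "login_failed"))
platform.add_agent("detect", make_threshold_counter("user", 3))
platform.connect("filter", "detect")

for e in [
    {"type": "login_failed", "user": "alice"},
    {"type": "login_ok", "user": "alice"},
    {"type": "login_failed", "user": "bob"},
    {"type": "login_failed", "user": "alice"},
    {"type": "login_failed", "user": "alice"},
]:
    platform.publish("filter", e)

print(platform.sink)
```

A product that ships only something like `Platform` leaves the filter and the counter for the developer to write; a product that ships only the agent logic needs something else to wire the network together.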

1 comment:

Celticht32 said...

Ahh Amit... That's what started me on the EPA road years ago... there are still functionalities which AMIT was ahead of its time for.

I have given quite a bit of thought to the question of whether EDA is more real time or batch, and to be honest I believe it's both. Here is why... It solely concerns the criticality of the action: essentially, if the action has to happen in real time, then the EDA should perform in real time. If, however, the action does not have a criticality associated with it, then one can use a more relaxed architecture. The key is simply the lifespan of the action and its importance during that lifespan.

The CEP debate (CEP or EP) has been one I have been struggling with for a long time... CEP itself is just a layer of the total EDA layout. From my standpoint you have Events (raw events, as it were) on which you perform some form of SEP (simple event processing... aka filtering out the white noise). SEP is a very important step, as it narrows down the stream to the events of interest. After SEP is performed, CEP should be performed on the refined event stream. This is where one should perform things such as enrichment, correlation and the like... Now this is where it gets confusing to quite a few people: that whole process can be extrapolated out several times into a federated hierarchy.
Ultimately I use the term Event Driven Architecture, not CEP, to explain this federation to people.

I agree additionally that CEP is an overused word that many vendors throw around with no concept of what it means... Just because you have a stateful machine does not a CEP / EDA product make.

Those are just my thoughts on the matter for now.