Saturday, September 6, 2008

On subjective view about discussion agenda


I tend to believe in positive thinking; my previous blog posting had a somewhat negative tone, since it explained why certain discussions in blog-land turn me off (I am quoting Giles Nelson from Apama). Sometimes one has to voice a negative tone to make a point, but today I would like to return to a positive tone and say what I do want to see in blog-land. This is, of course, my own subjective view, as noted in the title, and other people may have other interests.
First - there are both macro-level issues and micro-level issues that are worth discussing; macro-level issues relate to the "event processing" discipline in general, while micro-level issues relate to particular topics of interest. I'll partition the discussion into applications and technology.
Macro issues about event processing applications:
  • What are the types of event processing applications, and what is the motivation/ROI for using event processing?
  • What are the conditions for EP COTS to "beat" hand-coded EP?
  • Business evaluation of trends, markets, industries, etc.
  • Relationships with other areas - BPM, SOA, XTP, CODA, cloud computing and more.

Micro issues about event processing applications:

  • Description of interesting applications
  • Experience reports from customers - what worked well? What needs improvement?
  • Patterns of use and methodologies for best practices.

Macro issues about event processing technology:

  • Technology trends and Research trends
  • Technology challenges and engineering challenges
  • Standards

Micro issues about event processing technology:

  • New functional features (e.g. supporting uncertain events)
  • Non-functional requirements (e.g. what is latency?)
  • Languages
  • Development tools
  • Optimizations

Some of the recent postings on blogs are exactly on such topics: The Smart Order Routing application, The Value of State, and more.

This is probably not a complete list, but it indicates directions; these are more or less the topics that will be discussed at the fourth event processing symposium, and in the next postings on this blog - I already have a backlog of several topics to post...

Thursday, September 4, 2008

Is my cat CEP or not?



One of my favorite cinema artists is Mel Brooks, and one of his great films is known as "Silent Movie" (see flyer above). For those who don't know or don't remember, "Silent Movie" is a film about making a silent movie, and most of the plot revolves around Mel Brooks and his gang going to famous actors and asking them to participate in a silent movie; they agree, and that sums up their participation, since their voices are not heard. There is one exception - when they ask the world-famous mime Marcel Marceau to be in the film, he replies "Non", the only spoken word in the entire film; since he talked, he did not participate in the silent movie.


So here is my dilemma - recently blog-land is full of postings on the topic "is X CEP?". Personally, I don't find that this debate leads anywhere, so my inclination is not to participate in it; instead, I'll contribute my part by explaining why I don't find this discussion interesting.


The reasons why I don't find it worthwhile to spend time on these discussions are:



  1. I don't see that the results of this debate have any importance in reality.

  2. One can debate terms forever - but this is not useful; there is a better way.

  3. This debate will not converge anyway; the same arguments are being repeated, and it does not enrich the reader's intellect to read them again.

Let's start with reality - what the current vendors attempt to do is build generic products that enable the development of various sorts of applications in the area of event processing. This is different from the past, when event processing was done in a hard-wired, hand-coded, ad-hoc fashion to build a single application. While any single application may be challenging, the challenges in building generic products that are a cost-effective alternative to hand-coding are somewhat different from those of building ad-hoc applications. The content of these products is determined by several factors - not by the boundaries between "simple" and "complex". There are multiple dimensions of complexity, and different applications may exhibit different types of complexity (functionality, topology, scalability, event types)... No matter where we put the border between the "simple" and the "complex", a generic product will have to support both sides of the border. Bottom line - what is important is the ROI for the customers, not whether something is called "complex".

Debate over terms: when we held the first event processing symposium, in March 2006, the conference started with nine presentations about the "state of the art", and a quick observation was that each of us had invented his own set of terms. Consequently, we decided that the first community effort would be to produce a glossary edited by our non-vendor colleagues, to reduce bias. They collected input from many sources with different opinions, sometimes strong opinions (it is amazing how much people care about terms). The glossary was published recently, and its announcement was endorsed by many people in the community. I am sure that nobody is happy with all the terms, but as a compromise among all opinions, the editors have done a pretty good job.

Personally, I don't have strong feelings about terminology; the glossary definition is a possible one, there are others as well, and there is no absolute truth here. But I think it is good to have a precise definition of terms that is accepted by a large community, rather than continuing to debate terminology forever, even if I personally would have defined things otherwise. Among the possible interpretations, the editors chose to define CEP as: computing that performs operations on complex events, where a complex event is defined as: an event that is an abstraction of other events, called its members. As far as I am concerned, this is what CEP is.

Last but not least - if somebody insists on his own (different) definition of a term T, and judges everything according to that definition, then obviously, if the term T has two different definitions, X may be T according to one definition and not T according to the other. When the two sides stick to their definitions, neither will convince the other, and the arguments will keep repeating themselves. Actually, I have not learned any new insight from this discussion. Have I mentioned that the arguments repeat themselves? In case I have not, I'll mention that the arguments repeat themselves.

Bottom line - is my cat CEP? Well, I don't have a cat (I wonder if Roger King, who wrote the famous article "My cat is object-oriented", had a cat).

There are plenty of more useful topics to blog about - so no more on that issue.

Wednesday, September 3, 2008

On event processing as a paradigm shift


The readers are probably familiar with this picture, which shifts between two faces facing each other (in black) and a white vase. I came across a (relatively) new blogger in this area, Pern Walker, blogging for Oracle's "event driven architecture". The title of the posting is:
Event servers, a disruptive technology. It describes the components of the (former) BEA framework - nothing new here - but the interesting part is the conclusion: event processing COTS is a disruptive technology; it displaces custom code in event processing, since it is more cost-effective.
This reminds me of a discussion we had in May 2007 at the Dagstuhl seminar on event processing. It was a night discussion over wine, led by Roy Schulte, and the question that Roy posed to the participants was: "Will Event Processing (EDA) become a paradigm shift in the next few years or not?"
Today, I don't intend to answer this question; instead, I'll post part of the Dagstuhl discussion that included observations about "paradigm shifts" (thanks to my colleague Peter Niblett, who documented the entire Dagstuhl seminar). I'll return to this topic, with my (and maybe others') opinions about the answer, after the EPTS event processing symposium.
Observations (from the Dagstuhl discussion):
  • Paradigm shifts can’t happen if there are too many barriers; have the entry barriers for "event processing" already been removed?
  • Paradigm shifts are more likely to happen when adopters decide they need a whole new avenue of applications; they are less likely to happen as a way of re-engineering existing systems. For example, the German population will reach a 1:2 old:young ratio by 2020, which requires a paradigm shift in healthcare models. Can we identify new avenues of relevant applications?
  • Paradigm shifts usually happen as a result of some external change, not just because of innate strengths of the technology itself. Can we identify such external changes?
  • Standardization is not necessary for a paradigm shift, but good, appropriate standards (de facto or otherwise) certainly help.

Another question is where, in essence, the "paradigm shift" lies - is it the decoupled "event-driven" paradigm? Is it "complex event processing", i.e. the ability to find patterns over multiple events? Or is it the entire processing framework, as the Oracle blog claims?

More - Later

Tuesday, September 2, 2008

On flow oriented and component oriented development of EP applications




I got an invitation from some company I had not heard of for a kickoff of a product in the area of "sitting in front of computers". I don't have time to go - but since they also attached a picture of their product, I have copied it - I wonder if this is the current trend in ergonomics...
Anyway, today, some thoughts following recent discussions here about development tools for event processing applications.
There are two possible ways, from the developer's point of view, to build an event processing network.
  • The first, which I'll call "component oriented" (maybe there is a better name?), in which the developer defines the different components (patterns, rules, queries - use your favorite language style), each of them individually, and then some compilation process builds the network implicitly. This is a kind of "bottom up" approach.
  • The second, which I'll call "flow oriented", in which the developer has some graphical representation of the flow; when building the flow, one can put boxes inside it and zoom into each box to define the component. This is a kind of "top down" approach.

It seems that each of them has benefits under different assumptions. If the application is dynamic and mainly subscription based, then the first approach is probably better, since the notion of a flow is not a stable one; if the application is relatively static, then there is a benefit to using the second approach, since it can provide more visibility into what the application as a whole is doing and help in validation - the "decoupling" principle may give developers a feeling of chaos (this is indeed one of the barriers to the use of event processing...). As said, the "flow oriented" approach can ease the validation of an event processing application; there are also some tools that help validate an "implicit" flow, but the validation issue deserves a discussion in its own right. More - later.
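The "bottom up" vs. "top down" contrast above can be made concrete with a few lines of Python. This is a minimal, hypothetical mini-framework of my own, not any vendor's API: in the component-oriented style the network emerges implicitly from which event types each rule consumes and derives, while in the flow-oriented style the pipeline of boxes is laid out explicitly.

```python
from collections import defaultdict

# Component oriented ("bottom up"): each rule is defined independently;
# the network is derived implicitly from event-type subscriptions.
class ComponentNetwork:
    def __init__(self):
        self.rules = defaultdict(list)  # event type -> subscribed rules

    def rule(self, consumes):
        """Decorator: register a rule on the event type it consumes."""
        def register(fn):
            self.rules[consumes].append(fn)
            return fn
        return register

    def emit(self, event_type, payload, sink):
        """Route an event to its rules; derived events are routed recursively,
        so the 'flow' emerges from subscriptions rather than being drawn."""
        for fn in list(self.rules.get(event_type, [])):
            for derived_type, derived_payload in fn(payload):
                if derived_type in self.rules:
                    self.emit(derived_type, derived_payload, sink)
                else:
                    sink.append((derived_type, derived_payload))

# Flow oriented ("top down"): the developer lays out the flow explicitly as
# an ordered pipeline of boxes, then "zooms into" each box to give it a body.
def run_flow(stages, events):
    for stage in stages:  # the flow itself is explicit and inspectable
        events = [out for ev in events for out in stage(ev)]
    return events
```

As a usage sketch: in the component style, a rule on order-made deriving delivery-requested, plus a rule on delivery-requested deriving an alert, produce the same two-step network that the flow style would draw as two boxes in a row; the difference is only in what the developer writes down.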

Saturday, August 30, 2008

On the streaming SQL evolving standard


Kudos to our colleagues from Oracle and Streambase for their presentation in the industrial section of VLDB 2008 -
Towards a Streaming SQL Standard
Stan Zdonik (Streambase,Inc.), Namit Jain (Oracle), Shailendra Mishra (Oracle), Anand Srinivasan (Oracle), Johannes Gehrke (Cornell University, USA), Jennifer Widom (Stanford University), Hari Balakrishnan (Streambase,Inc.), Mitch Cherniack (Streambase,Inc.), Ugur Cetintemel (Streambase,Inc.), Richard Tibbetts (Streambase,Inc.).
Unlike last year, I did not participate in VLDB this year, though I would love to visit New Zealand when an opportunity arises. VLDB is certainly a respectable conference, and the list of authors includes some respectable members of the database research community. Mark Palmer also blogs about it, under the title: towards a CEP standard.
A few comments about it:
  • I think that this work is important; currently there are multiple variations of SQL extensions for various event processing purposes, and it will be easier if they are consolidated.
  • There is a mention of "event based" vs. "set based" views. Looking at the patterns being detected, there are indeed patterns that are best approached in an "event based" view, meaning that when each individual event arrives, there is an evaluation of whether a pattern has been completed; a "set based" view is more convenient when the pattern involves set operations - for example, checking whether the average value of some attribute, over all events that belong to a certain context, exceeds some threshold. An example of an "event based" pattern is looking for a sequence of two events (customer-complained, delivery-arrived); an example of a "set based" pattern is: the average of all delivery-actual-times in a certain shift is more than 30 minutes, where delivery-summary is a derived event, derived from order-made and delivery-arrived.
  • Retrospective patterns, i.e. patterns over historical events, are "set based" by nature, but as shown there are cases in which set-based thinking is also applicable to live events (this, of course, can be emulated by an "event based" pattern).
  • SQL extensions, of course, cover only part of the languages that exist in the event processing universe, and those who don't believe in the SQL region will probably not become believers if a streaming SQL standard is approved; I have written in the past about the Babylon tower and have not changed my opinion since then. I view SQL (with all of its extensions) as a natural way to express queries about "states", but not about "collections of transitions", and I think there is a more natural way to think about the latter. The EPDL work we are doing is a step towards it; however, the idea is to use it (at least initially) as a meta-language, where streaming SQL may be one of its major implementations. I'll provide more information about the EPDL project later this year.
  • Another comment: while the language standard is certainly the most challenging, there are also other standards that need to be discussed, in the areas of interoperability, event formats, modeling and more. At the EPTS symposium next month we'll dedicate some of the time to standards, starting with a keynote address by a standards expert about the impact of standards on industries, followed by a panel with various participants to discuss these issues.
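The "event based" vs. "set based" distinction can be illustrated with the examples used above. This is a hedged sketch in plain Python - the dict-based event representation and function names are my own assumptions, not anything from the VLDB paper:

```python
def sequence_pattern(events, first, second):
    """Event based: each arriving event triggers an evaluation of whether
    the pattern (`first` later followed by `second`) is now complete."""
    matches, pending = [], False
    for ev in events:  # one evaluation step per individual event
        if ev["type"] == first:
            pending = True
        elif ev["type"] == second and pending:
            matches.append(ev)  # pattern completed on this event's arrival
            pending = False
    return matches

def average_threshold(events, event_type, attribute, threshold):
    """Set based: collect all matching events within a context (here the
    whole input stands in for "one shift") and test an aggregate over the
    resulting set, rather than reacting to any single event."""
    values = [ev[attribute] for ev in events if ev["type"] == event_type]
    return bool(values) and sum(values) / len(values) > threshold
```

With events [customer-complained, delivery-arrived] the sequence pattern fires once, on the arrival of the second event; with delivery-summary events whose delivery times average above 30 minutes, the set-based threshold pattern holds over the shift as a whole.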

Friday, August 29, 2008

On research and practice in event processing




Triggered by a question from Hans Glide on a previous posting, today's topic is the relationship between research and practice in event processing. I'll not go into the ancient history of the event processing area and its ancestors, such as simulation, active databases, etc., but start from the mid-to-late 1990s, when the idea of creating generic languages, tools and engines for event processing emerged. This area emerged in the research community: David Luckham and his team at Stanford did the Rapide project; Mani Chandy and his team at Caltech did the Infospheres project; John Bates was a faculty member at Cambridge University, and Apama was a continuation of his academic work; my own contribution was in establishing the AMIT project at the IBM Haifa Research Lab, which is also part of the research community (kind of...). On the "stream processing" front there were various academic projects - the STREAM project at Stanford, the Aurora project at Brown/MIT/Brandeis - these are just samples, and there were more. The interesting observation is that the research projects were there before the commercial implementations; furthermore, many of the commercial implementations were descendants of academic projects: iSpheres was a descendant of Infospheres, Apama was a descendant of John Bates' work, Streambase was a descendant of Aurora, Coral8 was a descendant of the Stanford STREAM project, and there are probably more. However, when commercial products are introduced, the world changes, and there is a danger of a disconnect between the research community and the commercial world: products have a life of their own and are developed in various directions, while people in the research community in many cases continue, by inertia, to work on topics that may not be consistent with the problems that the vendors and customers see.
While wild research is essential to breakthroughs, reality provides a lot of research topics that were not anticipated in the lab, and there is a need for synchronization in order to obtain relevant research.

The Dagstuhl seminar in May 2007, where people from academia and industry met for five days and discussed this issue, was one step; my friend Rainer von Ammon organizes periodic meetings on these issues, and a European project may spin off from these meetings. We shall discuss this topic at the EPTS symposium; we have more than 20 members who are part of the research community, and many of them will also participate in the meeting.


Bottom line: the life cycle is --


1. Ideas start in the research community.

2. At some point the commercial world catches up.

3. Parallel directions - research continues; commercial products evolve in their own way.

4. Synchronization, exchange of knowledge, ideas flowing in both directions - this needs guidance.


More - later.

Thursday, August 28, 2008

On the "Event Processing Thinking" Blog - after the first year

One of the ways to obtain events is through "calendar events"; this is useful for time-out management, periodic triggering, etc. Today I saw in my calendar a reminder: this is the one-year anniversary of the "Event Processing Thinking" blog - you should write something about it. Actually, yesterday I got a note from one of the analyst firms that researches the impact of Web 2.0 on companies, and I was asked to participate in this study wearing my blogger hat... This is not the first time that people have approached me for various purposes based on reading my blog, and actually I can say that I had under-estimated the power of blogs and the amount of visibility they get. This is probably the most visible communication vehicle that exists today (how many people read papers?).

Looking at blog-land, I also realized that this visibility can be a double-edged sword, since people can easily expose their own ignorance, so I am trying to write only on stuff that I think I know something about...

One interesting thing is the statistics (who reads the blog) - it seems that the previous time I wrote about statistics was one of the most-read postings (see below).

Looking at the Google Analytics statistics, it seems that since the start of measurement (I installed Google Analytics two weeks after the blog started) more than 10,000 distinct persons (10,139 to be exact) have read this blog. I have no illusion that there are 10,000 people interested in event processing - some arrived due to the wonders of the almighty Google (e.g. looking for a picture of a unicorn) - so a better metric is that 1/3 of the readers returned more than once, and 1,432 readers returned more than 50 times, which is a more reasonable estimate of the number of people interested in the content. It seems that the number of people who read all, or at least 2/3, of the blog postings is around 800, and this seems to be the size of the effective readership.

What else can I learn from the statistics? The most popular postings are:

(1). Agnon, the dog, playing and downplaying is still, by far, the most popular one; this is one of the postings where I claim that "event processing" is a discipline that stands on its own feet, and is not a footnote to database technology or business rules technology.

(2). Revisiting the Blog **2 again, which, like this posting, talks about statistics around this blog; I wonder why that posting is so popular (or maybe people wanted to look at the map of Arkansas to plan their next holiday).

(3). On infant, professor and unicorn - despite the fact that this posting is much younger, it has had a lot of traction, partly because people are looking for pictures of unicorns, and partly because disputes always bring higher ratings... However, ratings are not everything, and when I think that I've said all I need to say about a particular topic, I move on.

As for the geographical distribution of readers: there have been readers from 124 countries.
In terms of the number of visits, the big ones are:
(1). USA, (2). UK, (3). Israel, (4). Japan, (5). Germany, (6). Canada, (7). France and (8). India. In terms of the number of individual readers, the big ones are:
(1). USA, (2). UK, (3). Germany, (4). India, (5). Australia, (6). Israel, (7). France and (8). Holland. So it seems that in Japan I have a relatively small (fewer than 100) but loyal set of readers - I am still looking for an opportunity to travel to Japan; I've never been there (actually, I have never been to India either).
In the USA there are now readers from all 50 states (+ DC), and the leading ones are California, Massachusetts and New York. Putting up the Arkansas map helped - Arkansas is now in 16th place in the USA in visits.

The three big cities in terms of visits are still: (1). London, (2). New York City, (3). Bangalore.

I'll not survey the negative and positive reviews of this blog - I'll let every reader judge; that is the essence of the entire Web 2.0 business! Well, that's all for today; I will return soon with a more professional posting.