Saturday, February 14, 2009

Quantum Leap -- take II


This morning was a sunny Saturday after a few rainy ones, and along with many other people, I went out with my family into nature... We live in Haifa, which besides its beaches and beautiful view of the bay, also has a big nature reserve close by called "Carmel forests"; not really a forest in global terms, but it has many nice hiking trails, a 15-minute drive from home. Here are some of the flowers we saw today... good to take a break sometimes..

As a follow-up to my previous posting on quantum leap, here are some more insights. We at the IBM Haifa Research Lab have signed up to look at the "next generation of event processing" and are working on this topic; I may present a tutorial about our findings at DEBS 2009, if accepted.


Here are some initial insights:

  • As in databases, there needs to be a formal model that gains wide acceptance (over time) to enable the quantum leap, since acceptance provides a critical mass of work aimed in the same direction. Our belief is that the "event processing network" model is the one, but it still lacks a solid formal basis.
  • Besides this -- there are four areas that will show significant developments in the future; if these are built on the basis of the model, it can provide a coherent play. The pyramid below shows the four:


  • Platform: While the first generation of event processing is the "engine" land, we are starting to see movement toward platforms that provide shared services (e.g. global state management, routing, load balancing, security, high availability...), with a possibly heterogeneous collection of event processing agents running on them. There may be platforms with various orientations -- grid platforms, database-oriented platforms, messaging-oriented platforms, and streaming (data flow) oriented platforms, to name a few. A platform may be an "event processing platform" or a platform with wider functions (e.g. hosting event processing agents and other decision agents). Some analysts are talking about extreme transaction processing (XTP) and context-oriented platforms; maybe the platform will mix some of all of the above. As with application servers in enterprise computing, the platform orientation is one of the facets of the next generations.
  • Engineering: Engineering progress is not really considered a revolution, but it is required to enable the higher layers to work in reality. This is the equivalent of query optimization, tuning, configuration, scheduling, load balancing, parallel programming assignments and various other systems-related topics in other areas. Relational databases became widespread only after the vendors succeeded in getting the engineering parts right, so advancement in this area is critical.
  • Functional: The functionality that products have today is just the start; more functionality will be supported, maybe even substantially more. Some directions: the "intelligent event processing" direction -- looking at discovery of unknown patterns and prediction of future events; adding more context information, such as geo-spatial; getting better temporal handling; and probably much more.
  • Usability: Much of the quantum leap will probably be here -- raising the abstraction levels. Hierarchy of events and causality, advocated by David Luckham, are really abstractions. However, there is more to it than abstractions from the implementation up; there also need to be abstractions from the user's thinking down. Instead of trying to visualize and abstract out the implementation model, the opposite direction is to have the abstractions in the user's domain of thinking and translate them (perhaps not 1-1) to the implementation.
The quantum leap will occur with a coherent combination of all these aspects. There may be some new vendors that offer the next generation as their first generation, since they are liberated from supporting legacy (and may be acquired by larger vendors), and there are existing vendors that are moving into some of this in an incremental way....

EPTS will attempt to contribute to the thinking about the next quantum leap through the work of its working groups. We also saw at the last EPTS event processing symposium that the use cases working group presented a variety of use cases, which cover a broad range of application types and requirements; this will be one vehicle for determining requirements. Other working groups will contribute in the various areas. In May 2010 we'll hold a major summit of industry and academic people (Dagstuhl Seminar); EPTS members will get a more detailed note about it.

More - Later.

Friday, February 13, 2009

On Quantum Leap in products


David Luckham posted a question on the complexevents site: is there a commercial need for a quantum leap in CEP products? David also has a continuation article that discusses event hierarchy abstractions as a quantum leap possibility. Before answering David's question, let me make some observations. Early in my career, 33 years ago, I was a programmer in the Israeli Air Force (the Air Force was of the opinion that letting me fly aircraft would be too dangerous for public safety...) and we were early adopters of IMS, which was relatively new, had severe performance issues, needed a lot of manual tuning, and did not really work as advertised. IMS was actually a second generation database; it had a predecessor called DL/I and still used the DL/I language. IMS was a huge improvement over the file systems we had used before in level of abstraction, concurrency control, and many other utilities, yet it had many issues that were resolved over time. Relational databases are actually the third generation of database systems. The relational database was a disruptive technology, and it also had a rough childhood, until query optimization matured and was better understood.

Back to event processing --- I assume that event processing products of 2019 will be totally different from those of 2009, the questions are:

  • Is there going to be a "disruptive technology", as the relational database was in the database area, OR will "just" gradual evolution occur?
  • What will drive the progress to the next generations?
What will trigger the next generations?
  • Customer requirements that require substantial change
  • Competitive pressure
  • Disruptive technology
  • (more reasons) ?
I'll leave these questions for now as food for thought and will discuss them in subsequent postings.



Thursday, February 12, 2009

On the EPTS Language Analysis Working Group

Yesterday we kicked off the "languages analysis working group" of EPTS. The EPTS members have been part of the approval process, and as a result of ongoing brainstorming in the community that started at the event processing symposium in September 2008, we are kicking off six working groups: glossary, interoperability analysis, meta-modeling, reference architectures, use cases and languages analysis. These working groups will be the main activity of EPTS in 2009, and will hopefully result in a better understanding of the area and of the potential standards play. In this working group we have around a dozen people from the vendor, academic and customer communities. I have decided to concentrate my own technical contribution to EPTS (besides the substantial time I invest in facilitating the overall activities) in the language analysis area, since this is the closest to my area of interest. My partner in moderating this working group is Dr. Jon Riecke, an experienced programming language researcher, who works for Aleri. It is a challenge to have a diverse team with various opinions, which is a symptom of the challenge of achieving an event processing language standard somewhere in the future.

What is this working group chartered to do? We have committed to two deliverables.

The first one is -- [the exact terminology is still under discussion] --- a [semantic model] that abstracts out the functions (and maybe non-functional annotations) of event processing languages, without getting into the question of programming style (SQL or not SQL). We shall look at existing commercial languages as well as languages that have been developed in the research community and try to abstract them out; another source will be feedback from the use cases working group, which works in parallel and will attempt to discover more requirements from the use case analysis.

The second one is a discussion and recommendation (possibly with alternatives) about the roadmap to standards in the event processing languages area (can we aspire to one standard or multiple standards? maybe a standard at the semantic "meta" level only, or we may decide to table the issue of standards for a certain period). Frankly, I don't have a clue what the conclusion here will be.


We'll submit a tutorial proposal for DEBS 2009; if accepted, we shall present an interim report of the work in July.

I'll report on it as we progress.

Sunday, February 8, 2009

Event Processing Platform (EPP) --- yes, but...





STAC, as cited in the popular Blog of Tim Bass, has determined that the correct name for event processing products is EPP (Event Processing Platforms) rather than CEP. I actually like the term EPP; in fact I have used this term before, so I should have copyrighted the name...

However, EPP should be used with the right meaning.

In event processing software there are two different things:
  • Platforms that provide the "programming in the large" -- indeed a container in which different types of functionality can be plugged in.
  • EPA implementation software -- this performs the actual event processing work, e.g. pattern matching, enrichment, filtering etc... This is the "programming in the small" (I called it "event processing engines", but I am not convinced that this is the best name).
In the EPTS glossary terminology, platforms implement "event processing networks", while the other type implements various types of event processing agents.
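To make the split concrete, here is a minimal Python sketch. It assumes that events are plain dicts with a "type" key; the `Platform` class, the two agents and all the event names are hypothetical illustrations, not any vendor's API. The platform only knows how to route events between plugged-in agents ("programming in the large"), while each agent does one small piece of processing ("programming in the small").

```python
from collections import defaultdict
from typing import Callable, Iterable

# An event is modeled as a plain dict with a "type" key (an assumption of this sketch).
Event = dict
# An agent takes one event and returns zero or more derived events.
Agent = Callable[[Event], Iterable[Event]]


class Platform:
    """Hypothetical container: it only routes events between plugged-in agents."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)  # event type -> list of agents
        self.output = []  # events that no agent consumed

    def plug_in(self, event_type: str, agent: Agent) -> None:
        self._subscribers[event_type].append(agent)

    def publish(self, event: Event) -> None:
        agents = self._subscribers.get(event["type"], [])
        if not agents:
            self.output.append(event)
            return
        for agent in agents:
            for derived in agent(event):
                self.publish(derived)


def large_order_filter(event: Event) -> Iterable[Event]:
    """Filtering agent: passes on only orders above a threshold."""
    if event["amount"] > 1000:
        yield {"type": "large_order", "amount": event["amount"]}


def add_priority(event: Event) -> Iterable[Event]:
    """Enrichment agent: attaches a priority attribute."""
    yield {**event, "type": "prioritized_order", "priority": "high"}


platform = Platform()
platform.plug_in("order", large_order_filter)
platform.plug_in("large_order", add_priority)

for amount in (50, 5000):
    platform.publish({"type": "order", "amount": amount})

print(platform.output)
```

In this sketch the agents could be swapped for ones written by somebody else without touching `Platform`, which is the sense in which a platform vendor can host another vendor's agent implementation.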

These are not the same: there are vendors who provide platforms but use other software to implement agents. For example, BEA provided a platform and used Esper for various functions; if I am not mistaken this is also true for Event Zero. IBM's InfoSphere Streams is also a platform -- all are indeed platforms. Some products provide both the EPN platform and various EPA implementations; some provide just the EPA implementation and run on various platforms (or as a centralized stand-alone engine).

So, while I agree that the EPN implementations are platforms, I am not sure that the EPA implementations are also platforms, and we may need a different name (engines? not sure)...

And one sentence about the term CEP. As I have written several times, I am not a big fan of this term at all; I consistently talk about event processing, and not about complex event processing, as the name of the discipline that this Blog covers. However, this reminds me that I was once a member of a Hebrew technical terminology committee, and one of the terms that came up for discussion was "real-time". For some strange reason, it had been translated into Hebrew literally as "true time", and when it came to writing the official glossary endorsed by the Israeli Academy of Hebrew Language, their representative, who knew Hebrew linguistics but not computer science, insisted that the Hebrew word should be a true translation of "real time", giving a long talk about "real numbers" and other real stuff. I argued that from a linguistic point of view he was probably right, but the scientifically wrong name was already well known in the industry, and a decision on another name would not be accepted by the public. After a long discussion he agreed to include my wrong version as an alias to his true version. You can guess which of the two is still being used for "real-time" in Hebrew.
The moral of this story is that it may be too late to change names, since the name CEP has been accepted in the industry for any type of event processing system, whether or not it is scientifically accurate, and as somebody once said -- resistance is futile...

I'll continue to use "event processing", will use "event processing platform" for a platform, and am still looking for a term for the "EPA implementation" (engines or otherwise). But my guess is that the people who use CEP to denote any type of EP will continue doing so, since this name may have already taken root. More - Later.

Saturday, February 7, 2009

On Classification of Event Processing Applications


The illustration above talks about classification in the animal universe; classification is one of the best ways to understand the universe. In our context, I started in the previous posting to discuss the types of functions that exist in generic event processing tools. Today I'll complete the picture by discussing classes of applications. This classification is not a partition; a given application can have elements of multiple classes. This classification answers the question ---
what benefit does the customer expect to obtain from an event processing system?

The illustration below is an IBM classification of what "Business Event Processing" is; this is a slightly modified version of the results of a study we conducted within the IBM Academy of Technology that analyzed some use cases. The use cases working group of EPTS is now repeating this exercise, three years later and probably with a somewhat broader perspective, so the end result may be different, but this will provide a sense of this type of classification:



Starting from the top and going anti-clockwise (I am left-handed...)

  • Business Activity Monitoring (BAM): Observation of a collection of activities to find exceptions and monitor key performance indicators in order to alert business stakeholders. This typically requires aggregations and predefined pattern matching.
  • Business logic derived events (sometimes called RTE - Real-Time Enterprise): detecting situations that require reaction (typically with some time constraints). The situation may be derived either by predefined patterns (e.g. regulation enforcement) or by discovered patterns (e.g. fraud detection). Most of the applications use predefined patterns.
  • Predictive Processing: Processing predicted future events in order to eliminate or mitigate them.
  • Stream Analytics: Analysis of various streams (video, voice, data etc..) to derive individual events (e.g. from a video stream) or trends - this includes "real-time business intelligence".
  • Business Service Management: Monitoring compliance with the Service Level Agreements (SLAs) of IT systems.
  • Active Diagnostics: Finding the root cause of a problem by looking at a collection of symptoms.
  • Information Derived Events (also known as "information dissemination") -- personalized subscriptions that provide the right information at the right granularity to the right person at the right time.
I'll dedicate (in the next few weeks) a separate posting to each of them with some examples, and refer back to functional and non-functional requirements.

Friday, February 6, 2009

On the first step on the way to an "event processing manifesto"


It was a very busy week and alas I had to neglect the blogging hobby. Now it is Friday night, I am watching a TV program with old Hebrew songs (my favorite), and I decided it is a good time to blog a bit. However, our relatively new cat, who looks somewhat like this (this is not his picture, but that of a similar cat I found on the web), decided that I am a good place to rest on and did not want to move -- another creature who is trying to manage me... He is really a kitten that my daughters found and adopted, and as I have written before, giving names in our family is not an easy task, so he has several names and is known as "the cat". I call him Gilgamesh the terrible.

In 2007 we had the first Dagstuhl seminar on event processing, and we, the same set of organizers (Mani Chandy, Rainer von Ammon and myself), decided to apply for a second Dagstuhl seminar in 2010. The seminar has passed the committee, with some clarifications that we need to provide about scope. I'll let you know if and when it is finally approved.

The intention of this Dagstuhl seminar (which lasts for 4.5 days) is to give a selected group of people an opportunity to meet in an isolated place and have in-depth discussions. The proposed goal of this Dagstuhl Seminar is to work on an "event processing manifesto". There have been several manifestos in different areas in the past, for example the OODB manifesto. Hopefully, by the time of the Dagstuhl seminar we'll have advanced work done by the various EPTS working groups that are being launched this month, and we'll be able to utilize their results in order to better define what "event processing" is -- note that I don't use "complex event processing", and I have explained the reasons before.





One of the questions asked is what the scope of "event processing" is, since working with events is quite a wide area -- starting from interrupt handling in operating systems, moving through graphical programming and more. Much of this relates to programming with events in conventional programming, and there are even books dealing with this area. However, our scope is more modest: generic tools for processing events in IT systems. This scope is about what is needed to build a generic tool, and not ad-hoc programming hard-coded for every single application; and about IT systems, and not operating systems, embedded systems etc..

The illustration above is a first step in thinking about what an event processing system should include. Parts of it should be mandatory and some optional; from a functionality point of view there are:
  • Routing and filtering -- the most basic form of event processing.
  • Mediation -- transformation, enrichment, aggregation, split -- the next level of sophistication.
  • Pattern Matching --- (I called it in the past "pattern detection") which may involve multiple events of multiple types.
At the bottom of the illustration there are two other entities:

Event processing platforms which are enablers for scalability, distribution and other good qualities. Event processing platforms may have their own functions or host others (or both)...

Pattern discovery that falls under the category "Intelligent Event Processing". It can be done off-line (typically this is the case) or on-line - and then the pattern matching may be unified with the discovery.

In different types of applications we may need different subsets. For example, fraud detection requires pattern discovery, and security-type detection (e.g. denial-of-service attack or intrusion) may use on-line pattern detection. On the other hand, other applications don't require pattern discovery at all, for example: compliance with regulations, where the regulations are given and cannot be discovered, or BAM systems in which the Key Performance Indicators are determined according to the corporate strategy and cannot be discovered. Furthermore, there are applications in which pattern matching is not required at all, and all processing is of the filtering, routing, enrichment and aggregation type.
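As a rough illustration of the three functional levels listed above (filtering, mediation, and pattern matching over multiple event types), here is a small Python sketch. The event tuples, the event type names and the `failed_then_changed` function are all made up for this example; a generic tool would let you declare such logic rather than hand-code it.

```python
from collections import Counter

# Hypothetical event records: (timestamp_seconds, event_type, account).
events = [
    (0, "login_failed", "acct-1"),
    (5, "heartbeat", "acct-1"),
    (10, "login_failed", "acct-1"),
    (20, "login_failed", "acct-2"),
    (30, "password_changed", "acct-1"),
]

# 1. Filtering: drop event types nobody downstream is interested in.
relevant = [e for e in events if e[1] != "heartbeat"]

# 2. Mediation (aggregation in this case): count failed logins per account.
failed_per_account = Counter(
    acct for _, kind, acct in relevant if kind == "login_failed"
)


# 3. Pattern matching across event types: a failed login followed by a
#    password change on the same account within `window` seconds.
def failed_then_changed(stream, window=60):
    last_failure = {}  # account -> timestamp of the most recent failed login
    matches = []
    for ts, kind, acct in stream:
        if kind == "login_failed":
            last_failure[acct] = ts
        elif kind == "password_changed" and acct in last_failure:
            if ts - last_failure[acct] <= window:
                matches.append((acct, last_failure[acct], ts))
    return matches


print(failed_per_account)
print(failed_then_changed(relevant))
```

An application that needs only the first two steps never touches the third, which matches the point that pattern matching is optional for some application types.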

And I'll finish with a footnote to David Luckham's recent article. David is trying to answer "criticism in the Blogosphere" about CEP as marketing hype, and the lack of value from the current set of products. First, I never thought that there is over-hype; on the contrary, relative to the potential of event processing there is under-hype. I am re-posting this illustration, taken from Brenda Michelson's panel presentation at the last EPTS annual symposium.


The hype is relatively low, and in contrast, the analysts' reports all indicate that the EP market grew by 50% or so in 2008, and IDC even claims that for the second year in a row it is the fastest growing middleware type. About the Blogosphere criticism: as I have already written before, much of it stems from different interpretations of the term "complex event processing". For example, some of Tim Bass's postings lead me to the conclusion that he believes in the equation: complex event processing = on-line pattern discovery. Again, eliminating the qualifier "complex", there is a large set of event processing applications (probably most of the applications I know) that do not require stochastic reasoning at all.


I'll post continuation Blogs about application types and the functions they require.. It is very late - going to sleep.

Sunday, February 1, 2009

On Off-Line Event Processing



A comment made by Hans Glide on one of my previous postings on this Blog prompted me to dedicate today's posting to Off-Line Event Processing. Well - as a person who is constantly off any line, I feel at home here...

Anyway -- some people may wonder and think that the title above is an oxymoron, since they put "real-time" as part of the definition of event processing. I have used this picture before; it best describes some of what is written about event processing - by everybody:



This, of course, illustrates a collection of blind people touching an elephant; each of them will describe the elephant quite differently. The phenomenon of people saying "event processing is only X", where X defines a subset of the area, is quite common. In our case X = "on line".

The best thing here is to tell you about a concrete example of a customer's application I am somewhat familiar with. The customer is a pharmaceutical company that monitors its supplier-related activities. It looks at events related to these activities and checks them against its internal regulations. The number of such events is several thousand per day, and from a business point of view there are no real-time requirements: the observation of any regulation violation, and the action taken, can happen the next day. The way this system works is to accumulate events during the day and activate the event processing system at the end of each day, which is actually batch processing done off-line.

An interesting question is why this customer chose to use an event processing system, rather than a more traditional approach of putting everything in a database and using SQL queries. The answer is quite simple: this application has some interesting properties:
  • The number of regulations is relatively high (in the higher range of three digits);
  • Many of the regulation rules are indeed detection of temporally oriented patterns that include multiple events;
  • Regulations are inserted or modified frequently.
Given all these, it turned out that the off-line use of an event processing system was the most cost-effective solution. While using SQL is nominally possible, writing these regulations in SQL is not easy, and their sheer number makes the investment in development and maintenance quite high.

So - the benefit of using event processing here is neither the real-time aspect nor high-throughput support, but simple TCO considerations.
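To give a feel for why this style keeps maintenance cost down, here is a minimal Python sketch of off-line processing over one day's accumulated events. Everything in it is hypothetical (the event kinds, the `PrecedenceRule` shape, the supplier names); it is not the customer's system. The point it tries to show is that a temporal regulation is a row of data, so adding or modifying a regulation does not mean writing new query code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    hour: int        # hour of the day at which the event was recorded
    kind: str
    supplier: str


@dataclass(frozen=True)
class PrecedenceRule:
    """Hypothetical regulation: `trigger` must be preceded by `required`
    for the same supplier, no more than `within_hours` earlier."""
    name: str
    trigger: str
    required: str
    within_hours: int


def check_day(events, rules):
    """Run every rule over one day's accumulated events; return violations."""
    ordered = sorted(events, key=lambda e: e.hour)
    violations = []
    for rule in rules:
        last_required = {}  # supplier -> hour of the latest `required` event
        for e in ordered:
            if e.kind == rule.required:
                last_required[e.supplier] = e.hour
            elif e.kind == rule.trigger:
                seen = last_required.get(e.supplier)
                if seen is None or e.hour - seen > rule.within_hours:
                    violations.append((rule.name, e.supplier, e.hour))
    return violations


# Regulations are data: a new one is one more entry in this list.
rules = [
    PrecedenceRule("inspect-before-receipt", "shipment_received", "inspection_passed", 4),
]

day = [
    Event(9, "inspection_passed", "supplier-A"),
    Event(11, "shipment_received", "supplier-A"),  # compliant: inspected 2 hours earlier
    Event(14, "shipment_received", "supplier-B"),  # violation: never inspected
    Event(16, "shipment_received", "supplier-A"),  # violation: inspection is 7 hours old
]

print(check_day(day, rules))
```

With several hundred regulations of this kind, the batch run stays the same single call at the end of the day; only the list of rules grows.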

This is not the only application of this type; in fact, I have seen several other cases in which event processing has been used off-line. There is also another branch of off-line processing, which combines on-line and off-line together, but I'll write about it in another posting...

More - Later.