Friday, January 23, 2009

On Complexities and event processing

For those who read the title and grinned -- not again, a discussion about the meaning of the term CEP -- relax. I explained in a previous posting entitled "Is my cat CEP" why such a discussion is futile, and I am typically consistent. BTW, when I wrote that posting I did not have a cat; since then my daughter has adopted one, and he does not seem complex to me.

However, I would like to answer a more interesting question that somebody asked me recently: what are the sources of complexity in event processing?

In high school we learned about "complex numbers". We liked this topic, since it was one of the simplest topics in the matriculation exam in Mathematics... A complex number is just a combination of two numbers, thus the complexity is in the structure. David Luckham also coined the term "complex events", where the complexity is also in the structure. However, there are more kinds of complexity that may serve as a motivation to use COTS (commercial off-the-shelf) products instead of hand-coding this functionality. What types of complexity can we observe besides the structural complexity?

Complexity derived from uncertainty:

  • The application's specification is not known a priori and has to be discovered; an example is fraud detection. This is related to the "pattern discovery" I discussed in the previous posting.
  • There are no reliable sources from which to obtain the desired events, or the events obtained have uncertainty associated with them. This is a distinct complexity: the application specification may be well defined while the events cannot be obtained, and vice versa, the patterns may be unknown but, once discovered, the required events are easily available.
Complexity derived from connectivity:

  • Producer-related complexities: semantic differences among various sources, problems of time synchronization among various sources, etc.
  • Consumer-related complexities: similar to the producer ones. These two are, of course, orthogonal to each other and to all other complexities.
  • Interoperability complexity where various processing elements are involved.
Complexity derived from functionality:

  • Complex function requirements, e.g. complex patterns that may involve temporal, spatial, and statistical operators and combinations of them.
  • Complex topology of the event processing graph, with a lot of dependencies among the various agents, which creates a complexity in validation and control.
  • Complex subscription / routing decisions.
Complexity derived from quantities:
  • High throughput of input events.
  • High throughput of output events.
  • High number of producers.
  • High number of consumers.
  • High number of event processing agents (imagine 1M agents in a single application).
  • Requirement to maintain a large amount of space for processing state.
Complexity derived from quality of service requirements:

  • Hard real-time latency constraints.
  • Compliance with QoS measurements, such as a threshold on average latency, or a threshold on the percentage of events that don't comply with some latency constraint, etc.
  • High availability requirements.
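
The two QoS measurements mentioned above boil down to simple arithmetic over observed latencies. Here is a minimal sketch of such a check; it is not taken from any product, and the names `QosReport` and `check_qos`, the parameters, and the sample latencies are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QosReport:
    average_ms: float     # mean latency over all observed events
    violation_pct: float  # percentage of events above the per-event limit
    compliant: bool       # True only if both thresholds are respected


def check_qos(latencies_ms: Sequence[float],
              max_average_ms: float,
              per_event_limit_ms: float,
              max_violation_pct: float) -> QosReport:
    """Check two hypothetical QoS thresholds over a batch of event latencies."""
    if not latencies_ms:
        raise ValueError("no latencies to evaluate")
    average = sum(latencies_ms) / len(latencies_ms)
    # Count events whose latency exceeds the per-event constraint.
    violations = sum(1 for latency in latencies_ms if latency > per_event_limit_ms)
    violation_pct = 100.0 * violations / len(latencies_ms)
    compliant = average <= max_average_ms and violation_pct <= max_violation_pct
    return QosReport(average, violation_pct, compliant)


# Made-up sample: four events, one of them slow.
report = check_qos([10.0, 20.0, 30.0, 140.0],
                   max_average_ms=60.0,
                   per_event_limit_ms=100.0,
                   max_violation_pct=30.0)
print(report)
```

The hard part in practice is not this arithmetic but guaranteeing the thresholds under load, which is where the complexity comes from.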
Complexity derived from agility requirements:

  • Dynamic, frequent changes in the logic of the event processing.
  • Need for programming by various types of "semi-technical" people in the business user community.

I am sure that this list is not complete, but it provides some indication...

Of course, a single application may be the ultimate complex event processing application and exhibit ALL of these complexities. Finding such an application is, for sure, the dream of every researcher: a lifetime of research challenges. In reality, though, different applications have different combinations of complexities. An application can be simple in all metrics but have hard real-time constraints; it can have very complex functionality but no quality of service or quantity issues. Another application may need pattern discovery while the rest is simple. Yet another can be relatively simple, with complexity only in the number of producers and consumers and in the semantic integration with all of them. With the wonder of combinatorics, one can get to many more combinations...

More on complexities - later.

Monday, January 19, 2009

On Event Pattern Detection vs. Event Pattern Discovery

This drawing, in various forms, has been used by us for many years to illustrate the notion of a pattern; in the PowerPoint version it is animated, and the geometric shapes keep moving. The term pattern is a bit overloaded in event processing, as noted in my DEBS 2008 tutorial on this topic, but this illustration refers to a pattern that shows some combination of events; to be more accurate, it is a predicate on the event history such that, if it evaluates to "true", something should happen. This illustration was created by Tali Yazkar-Haham from IBM Haifa Research Lab as an exercise in a presentation course, and has been used in dozens of presentations ever since (including presentations by some people outside IBM, who typically forgot to give credit to the source).

Paul Vincent, in an ongoing debate with Tim Bass on the complex events forum, has written about "detection of new instance" and "detection of new type". While these terms make sense, I prefer not to overload the term detection, and to use the terms event pattern detection and event pattern discovery instead.

Pattern detection deals with detecting that a predefined pattern has happened. This is what is illustrated in the picture above. An example: a patient is hooked up to a heartbeat monitor, and the physician presets a pattern, "the heartbeat is monotonically increasing within 10 minutes, and the amount of increase is more than 30 during that period". This is actually a predicate over a part of the event history of a single source and type (other examples can involve multiple sources and types, but the principle is the same).
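
To make the heartbeat predicate concrete, here is a minimal sketch of how it could be evaluated over a stream of readings. It is only one possible reading of the physician's pattern, under these assumptions: readings arrive in time order as (timestamp in seconds, beats per minute) pairs, "monotonically increasing" means strictly increasing between consecutive readings, and the function name `detect_rising_heartbeat` and the sample readings are made up for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, beats per minute)


def detect_rising_heartbeat(readings: Iterable[Reading],
                            window_s: float = 600.0,
                            min_increase: float = 30.0) -> List[Tuple[float, float]]:
    """Return (start_ts, end_ts) for every reading at which the pattern holds.

    The pattern holds when the readings of the last `window_s` seconds are
    strictly increasing and the total increase exceeds `min_increase`.
    Overlapping matches are all reported; a real engine would need a policy
    for that, which this sketch does not attempt.
    """
    matches: List[Tuple[float, float]] = []
    run: Deque[Reading] = deque()  # current increasing run, limited to the window
    for ts, bpm in readings:
        # A reading that does not increase breaks the monotonic run.
        if run and bpm <= run[-1][1]:
            run.clear()
        run.append((ts, bpm))
        # Drop readings that fell out of the time window.
        while ts - run[0][0] > window_s:
            run.popleft()
        # The run is increasing, so its oldest reading holds the minimum.
        if bpm - run[0][1] > min_increase:
            matches.append((run[0][0], ts))
    return matches


# Made-up readings taken every two minutes.
steady_rise = [(0, 70), (120, 80), (240, 95), (360, 105)]
print(detect_rising_heartbeat(steady_rise))
```

With these sample readings the pattern is matched once, at the fourth reading, when the heartbeat has climbed by 35 within six minutes.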

So event pattern detection is defined as the detection that a predefined pattern has occurred. This is equivalent to what Paul called pattern instance detection.

In contrast, when we talk about event pattern discovery we mean that the pattern is not known in advance, and the pattern discovery function determines what the pattern is. Legend has it that Archimedes discovered his famous laws about floating bodies while sitting in the bathtub, and shouted "Eureka" (this illustration was taken from the homepage of a company that has the word Eureka in its name, again animated in the source).
A pattern can be discovered by machine learning techniques using decision trees, statistical modeling, Bayesian networks and numerous other methods. In the end, once a pattern is discovered it also needs to be detected in reality; there are also cases in which there is continuous detection, since the patterns change after a short time.

Getting back to the previous example about the heartbeat, it may be the case that this pattern has not been set by a physician; instead it was discovered by some method that looked at past events and found out that this pattern has some significance.
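
As a toy illustration of the difference, the sketch below discovers the "increase of more than N" part of the heartbeat pattern from labeled past episodes instead of having a physician preset it. It is a one-level decision tree (a decision stump), chosen only because it is the simplest instance of the techniques mentioned above; the function name `discover_increase_threshold` and the history data are made up, and real discovery would need far more modeling than this.

```python
from typing import List, Optional, Tuple

# Each past episode: (heartbeat increase within 10 minutes, was it significant?)
Episode = Tuple[float, bool]


def discover_increase_threshold(history: List[Episode]) -> Tuple[Optional[float], int]:
    """Pick the threshold t for which the rule "increase > t" misclassifies
    the fewest past episodes. Returns (threshold, number_of_errors)."""
    best_threshold: Optional[float] = None
    best_errors = len(history) + 1
    # Only the observed increases need to be tried as candidate thresholds.
    for candidate in sorted({increase for increase, _ in history}):
        errors = sum((increase > candidate) != significant
                     for increase, significant in history)
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold, best_errors


# Made-up history, for illustration only.
history = [(5, False), (12, False), (30, False), (34, True), (41, True), (22, False)]
threshold, errors = discover_increase_threshold(history)
print(threshold, errors)
```

Once the threshold has been discovered this way, it is handed to the detection side as an ordinary predefined pattern.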

Most people thinking about "complex event processing" are actually talking about pattern detection, regardless of whether the patterns were composed by a human or discovered by machine learning. The illustration at the top of this page shows what people typically mean. As stated in the past, I don't want to get into the meaning of TLAs, and I leave that to my colleagues who are doing marketing. Thus the term that I used, "event pattern detection", is the more accurate one.

Another observation about the difference is that event pattern detection can be applied as COTS -- a user can use such a product, compose some patterns, hook it up to event sources, and get the pattern to be detected.

On the other hand, while there are many tools that can help in event pattern discovery, we cannot hook one up to an event source and tell it: discover all. There is a need to do some formal modeling of the system, of the kinds of patterns that are sought, etc. In other words, this is not something that a typical developer or business analyst can do, since it requires some expertise.

It is getting late - so I'll finish at this point and return to this topic at some later point.

Sunday, January 18, 2009

On Another Event Processing

This is a past event that was held in Waterloo almost two years ago and dealt with "quilt hanging".

One of the interpretations of the term "event" in the Webster dictionary online is: something that happens. This one has been used by the EPTS glossary editors. But there are also some other interpretations; one of them is: social occasion or activity.

Today, in a meeting with some students, they drew my attention to an interesting set of products that process events; however, the events they process are of the second type and not the first type. There are several companies that have products in this area, and the one name I remember is Eventful. I mention it just as an example, as I have not investigated this area; I understand that there are some others as well.

As food for thought: what is the difference between the two types of "event processing"?

First Try:

Products in the first interpretation are typically geared towards the enterprise market; an event is typically assumed to occur at a single time-point (actually an interval reduced to a time-point), and the reason for processing it is typically one of those in the list that I have mentioned in a previous posting.

Products in the second interpretation are typically geared towards the consumer market through the Web, thus the business model is totally different; an event typically occurs over an interval, the participants in the events are often at the center, and the processing is related to them. The motivations relate to a person's free time, and sometimes to social or other collaboration.

An interesting question is whether, from a technology point of view, these two interpretations are totally distinct or have some commonality (e.g. both have producers, consumers, processing, events, subscription etc.), whether this commonality is interesting enough to try to generalize them, and whether there is any good reason to interoperate and mix events of the two types...