Monday, April 28, 2008

On Posets and Red Herrings

Hans Glide, one of the knowledgeable people in the EP area, has written a couple of postings about Posets on his blog. One of them claims that the term Poset in EP is a red herring, of interest to mathematicians only; the other says that a causality graph and causality inference are not the same thing.

Let's make some assertions in this area, in order to clarify (or, god forbid, confuse...):

1. The EP application space is not monolithic; different applications may have different requirements.
2. The issue of total order vs. partial order has been blown out of proportion, since it is not a major differentiator among products. Products that lack some functionality (see below) can somehow hack it in practice.
3. The main importance of the causality relation in general is the ability to trace all causal descendants of a certain event or, conversely, to trace the reasons for taking some action. This is a key requirement in anything that requires auditing. It does not cover the entire space of applications, and if an application does not require tracing and auditing, then the notion of causality has little value to it.
4. On the other hand, there are applications that manage scenarios, such as software verification, simulation, and games, in which causality is an important abstraction, since reasoning is indeed done on a partially ordered set (a directed acyclic graph) of events.
5. There are several types of orders and several types of causality relations.
6. Order may be according to the time at which the event "happens in reality" (as reported by the source), or according to the time at which it is detected by the event processing system (when it reaches one of the system's APIs). In some cases the order does not play any role. In other cases, such as time series whose events are far enough apart, the sequence in which events reach the system is assumed to be good enough. In yet other cases the "reality" order has to be preserved. How order is preserved, if that is at all possible, is a topic for another discussion.
7. Causality can be explicit or implicit. Explicit causality is meta-data in the event processing system. It exists because somebody put it there (maybe using mining techniques), saying that if event E1 happens, then event E2 also happens, so we can assume that E2 happened even if there is no explicit indication of it. Implicit (inferred) causality exists because E2 is an output of some computation to which E1 is an input. At the class level it can be modelled; at the instance level it is dynamic and created at run-time.
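
To make assertions 3 and 4 a bit more concrete, here is a minimal sketch in Python of a causality relation kept as a directed acyclic graph. It is only an illustration, not how any particular EP product works: the class `CausalityGraph`, its methods, and the event names are all hypothetical. It shows tracing causal descendants (impact), tracing causal ancestors (audit), and why the resulting order is only partial.

```python
from collections import defaultdict, deque


class CausalityGraph:
    """Hypothetical causality graph: an edge cause -> effect means
    'cause contributed to effect'. Kept acyclic, so it induces a partial order."""

    def __init__(self):
        self.effects = defaultdict(set)  # event -> its direct effects
        self.causes = defaultdict(set)   # event -> its direct causes

    def _reach(self, start, edges):
        # Breadth-first search: everything reachable from `start` along `edges`.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def add_cause(self, cause, effect):
        # Reject an edge that would close a cycle, so the graph stays a DAG.
        if cause == effect or cause in self._reach(effect, self.effects):
            raise ValueError(f"{cause} -> {effect} would create a cycle")
        self.effects[cause].add(effect)
        self.causes[effect].add(cause)

    def descendants(self, event):
        """All events that were (transitively) caused by `event`."""
        return self._reach(event, self.effects)

    def ancestors(self, event):
        """All events that (transitively) led to `event`: the audit trail."""
        return self._reach(event, self.causes)

    def comparable(self, a, b):
        """True if a and b are causally ordered one way or the other."""
        return a == b or b in self.descendants(a) or a in self.descendants(b)


# Made-up events, for illustration only.
g = CausalityGraph()
g.add_cause("sensor_reading", "threshold_alert")
g.add_cause("threshold_alert", "ticket_opened")
g.add_cause("sensor_reading", "dashboard_update")

impact = g.descendants("sensor_reading")   # everything the reading led to
audit = g.ancestors("ticket_opened")       # why the ticket was opened
unordered = not g.comparable("threshold_alert", "dashboard_update")
```

In this sketch `threshold_alert` and `dashboard_update` share a cause, but neither causes the other, so they are incomparable. That is exactly what makes the set of events partially, rather than totally, ordered. Whether the edges are entered explicitly or inferred from computations (assertion 7) does not change the tracing logic.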

Closing statement: there is no "one size fits all" in event processing. For each type of application, some types of functionality are more important to support than others. Better understanding the existing types, and mapping functionality to them, is work we are pursuing as one of the EPTS tasks, by analyzing a significant number of use cases with a large team of volunteers from all sectors (vendors, customers, academic people). We'll have more news on this work later this year.


Hans said...

The picture of the red herring is great.

With respect to assertion 1, I would go so far as to say that even applications that do causal inference or prediction have widely varying requirements. I just don't see a way to make a statement like "all simple pattern detection have these basic requirements" while "all causal inference or advanced detection applications have those more advanced requirements".

Opher Etzion said...

Hans, you are right: classification of requirements is multi-dimensional and includes functional as well as non-functional capabilities. Attempts to draw a 2 X 2 matrix that classifies according to two properties, as analysts like to do, do not really reflect reality.



Tim Bass said...

Hi Opher,

Excellent post.

Thanks for debunking Hans's posts, which seem to argue that advanced EP applications do not exist, and that using the term "POSETS" to describe parts of the concept, initiated by Dr. Luckham, somehow gives Hans cause to be the "antiPOSET cyberstalker".

Ironically, Hans eventually comes around to agreeing with what we have been saying for over two years: that there is a wide variety of EP applications, one size does not fit all, and we must match the goal(s) of the processing with the correct approach.

As you often correctly point out, Opher, CEP is about processing "complex events", and "complex events" are generally derivatives of other events. Looking for unknown causality in "complex events" is very different from taking a stream of events, enriching it, and creating a derived event.

Cheers and best regards,


Hans said...

I think that most people have gotten the point here already, but to reiterate for Tim, who would do well to go back and read my posts without assuming that he already knows what I'm saying:

It is a mystery to me why anyone would think that I argue that advanced EP applications do not exist. I argue simply that classifying "advanced applications" as those that process POSETs is wrong. Saying that "modern EP products do not have the features of more advanced inference engines because EP engines all fail to work with POSETs" is like saying "modern cars do not have the features of airplanes because modern cars are all made of wood."

The first part of each of these statements is correct, but the second part makes the whole statement absurd. Processing and even forming POSETs is not what gives something advanced inference capability. Processing and forming POSETs is easy. The hard part is determining the criteria on which to form the POSET.

Also, not all "advanced" data analysis has the goal of forming a POSET, and not every POSET comes from a useful causal inference. The same goes for backward chaining: not all backward chaining produces useful causal inference, and not all causal inference uses backward chaining.

So classifying anything as "simple" or "advanced" based on these one or two criteria of processing POSETs or using backward chaining is misleading. It gives the false impression that data analysis can be classified as "red" and "blue". Would you like the red or the blue analysis, sir?

"Advanced" data analysis has to be more accurate than other methods. Clearly there are cases where sophisticated techniques produce more accurate results. That really depends on the scenario and the kind of data being processed and not on whether those techniques involve POSETs or use backward chaining.