Monday, April 21, 2008

On Event Clouds

In a couple of his blog postings, Marc Adler wondered about support for event clouds in the product he chose, and in the end settled on the opinion of the vendor (Mark Tsimelzon from Coral8), who claims that "cloud" is an abstract term and that in reality we are facing multiple streams that may or may not be ordered. The response comes from Greg-the-architect, who has recently been in "everybody is confused" mode. Greg-the-architect claims that vendors have sinned by misinforming their customers in order to hide their inability to cope with hidden causal relations.

So - what can I contribute to this party?

First - let's look again at the definition of event cloud in the glossary:

Event cloud: a partially ordered set of events (poset), either bounded or unbounded, where the partial orderings are imposed by the causal, timing and other relationships between the events.

Clouds have become a fashionable term; we have heard so much about cloud computing in the past year that we all feel like we are flying in various clouds.

What about the clouds/streams debate? One of the stated differences is that a cloud is a poset (partially ordered set) while a stream is totally ordered. I agree that these terms come from two different origins. The real question is whether a cloud can indeed be supported by multiple streams; people instead focus the discussion on whether streams are always totally ordered or can also carry an unordered set of events, which is not really an interesting distinction. I agree here with Mark Tsimelzon that a stream can also be unordered; this is up to the implementation. If one wants to make a distinction between "streams" being ordered and other things that can be unordered, I propose the term "pipes", where an ordered pipe is a stream. But ordered versus unordered is not the main difference. Reading the cloud definition again, it is the notion of causality that is important for having a cloud. The "partial ordering" in the cloud is a result of causality relations between events. I have discussed the notion of causality in a past posting; support for causality (including pre-determined causality that may be the result of mining or of an inference system) is the enabler for the support of clouds (i.e. partial order vs. no order).
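To make the "partial order from causality" point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Event` class, the `caused_by` field, the pipe names and the sample events are hypothetical, and each event is assumed to have at most one direct cause. Two events are ordered only when one is a causal ancestor of the other; all other pairs stay unordered, which is what makes the set a poset rather than a totally ordered stream.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Event:
    event_id: str
    pipe: str                        # hypothetical pipe (or stream) that delivered the event
    caused_by: Optional[str] = None  # id of the direct cause, if one is known


def precedes(cloud: Dict[str, Event], a: str, b: str) -> bool:
    """Return True if event a is a causal ancestor of event b."""
    current = cloud[b].caused_by
    while current is not None:
        if current == a:
            return True
        parent = cloud.get(current)  # the cause may lie outside this cloud
        current = parent.caused_by if parent is not None else None
    return False


def comparable(cloud: Dict[str, Event], a: str, b: str) -> bool:
    """Two events are ordered only if one causally precedes the other."""
    return precedes(cloud, a, b) or precedes(cloud, b, a)


# Illustrative events arriving on four different pipes.
events = [
    Event("order", pipe="sales"),
    Event("payment", pipe="billing", caused_by="order"),
    Event("shipment", pipe="logistics", caused_by="payment"),
    Event("price_update", pipe="catalog"),
]
cloud = {e.event_id: e for e in events}

print(precedes(cloud, "order", "shipment"))        # causally ordered
print(comparable(cloud, "order", "price_update"))  # no causal link, so unordered
```

Note that nothing in the sketch depends on the order in which the pipes delivered the events; the ordering comes only from the causality links.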

A cloud is indeed the collection of events that an enterprise is faced with, and this cloud may be implemented by a collection of pipes (or streams, if you wish) together with support for causality relations.

We can also look at a (small) cloud, which is the collection of all events that a single EPA (Event Processing Agent) faces as input. This is just a subset of the big "Cloud", with its own pipes and causality relations.
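As a rough illustration of the small cloud of a single EPA, the sketch below (in Python) restricts an enterprise-wide cloud to the pipes that one agent reads from. The names `event_pipe`, `causes` and `epa_cloud`, as well as the sample events, are hypothetical and exist only for this example; the causality relation is assumed to be given as explicit (cause, effect) pairs.

```python
from typing import Dict, Set, Tuple

# Hypothetical enterprise cloud: event id -> pipe that carries the event.
event_pipe: Dict[str, str] = {
    "order": "sales",
    "payment": "billing",
    "shipment": "logistics",
    "price_update": "catalog",
}

# Causality relation of the big cloud, as (cause, effect) pairs.
causes: Set[Tuple[str, str]] = {
    ("order", "payment"),
    ("payment", "shipment"),
}


def epa_cloud(
    event_pipe: Dict[str, str],
    causes: Set[Tuple[str, str]],
    input_pipes: Set[str],
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Restrict the big cloud to the events and causal links one EPA can see."""
    events = {e for e, pipe in event_pipe.items() if pipe in input_pipes}
    # Keep a causal link only if both of its endpoints are visible to the EPA.
    links = {(c, e) for (c, e) in causes if c in events and e in events}
    return events, links


# An EPA that subscribes to the sales and billing pipes only.
events, links = epa_cloud(event_pipe, causes, {"sales", "billing"})
print(sorted(events))
print(sorted(links))
```

In this sketch the link from "payment" to "shipment" is dropped because the shipment event travels on a pipe the agent does not read; whether an EPA should still know about such outside links is a design decision that the sketch does not try to settle.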

Now - to the most important question - besides the game in terminology, is it important to make these distinctions?

As stated before, the world of event processing is not monolithic: some applications need total order, others need partial order, and others don't care about order at all. Causality relations are required by some applications, either because the pre-defined relations between the events play a role in the event processing, or because there is a need to trace back the lineage of a certain event or action. For other applications they may be just an unnecessary overhead. So my (2 cents worth of) advice to people who are looking at CEP products is to look at their requirements and determine whether they need causality and partially ordered sets. It may be that support for totally ordered streams is entirely sufficient for their applications; if it is not, they should look at whether and how causality is implemented. I hope that I have not confused you even more... More - later.

1 comment:

Hans said...

With respect to "It may be that the support of totally ordered stream is totally sufficient for their applications, if it is not - they should look for if and how causality is implemented."

This may be just a technicality, so forgive me if I'm making an issue over nothing. But I'd have to say that I don't understand where "support for totally ordered streams" comes into this. Who says that determining causality is disjoint from processing ordered streams? AFAIK, the whole ordered stream idea started as a way to show that simply because you observe events in a particular order does not mean that this is the order that you want to use when determining causality. But does any software really force you to form a graph of causality based on the order in which you received the events? If so, I've never seen it.

If every event comes with an ID of the event that "caused it" then I can store these events easily with pretty much any EP software and have my causality. It makes no difference whether there was an ordered stream involved.