Thursday, March 19, 2009

On data flows event flows and EPN


Bob Hagmann from Aleri (ex-Coral8) has advocated "data flow" model as an underlying model that unifies both engines of Aleri, and contrasts it with "event delivery systems" in which programmers create state manually if needed. I am not really familiar with the phrase "event delivery system" and don't know what he refers to, but there are event processing systems that employ different programming styles from stream processing, in which states are handled implicitly by the system and the programmer does not really deal with creating states.

But -- I have no interest in "language wars", my interest these days is somewhat different -- to find a conceptual model that can express in a seamless way functionality that exists by different programming styles.

Actually the conceptual model of EPN (event processing network) can be thought as a kind of data flow (although I prefer the term event flow - as what is flowing is really events). The processing unit is EPA (Event Processing Agent). There are indeed two types of input to EPA, which can be called "set-at-a-time" and "event-at-a-time". Typically SQL based languages are more geared to "set-at-a-time", and other languages styles (like ECA rule) are working "event-at-a-time". From conceptual point of view, an EPA get events in channels, one input channels may be of a "stream" type, and in other, the event flow one-by-one. As there are some functions that are naturally set-oriented and other that are naturally event-at-a-time oriented, and application may not fall nicely into one of them, it makes sense to have kind of hybrid systems, and have EPN as the conceptual model on top of both of them...

This is the short answer. More detailed discussion -- later.

5 comments:

Hans said...

Actually, that is not quite true about SQL languages. That StreamBase/Oracle paper was an example of how different SQL languages take different approaches.

All SSQL languages can be represented as a graph, which is a sort of EPN.

As you say, the one-by-one languages pass events straight through the graph edges. Meaning that there is no notion of uniqueness or order in the edges. But there might be a node that does implement the concept of ordering or uniqueness. If you push many elements through an edge at once, the result will be no different than if you push them one at a time.

In contrast, the set-at-a-time languages have some logic attached to the edges. If you push many elements through an edge at once, you may find them bunched, sorted and even deduplicated before they arrive at the destination node.

Opher Etzion said...

Hello Hans.

Your assertions about event-at-a-time languages is not accurate, I'll blog about it in detail, but some points in shorts:

1. The notion of order is materialized in the pattern computation and not in the input stream
2. Uniqueness is materialized through the notion of context.
3. In some patterns you may get different results if you do "event-at-a-time" computation, since in this cases order functions like "first" or "last" or threshold functions can yield different results. A very simple example: if we are looking to provide an alert when the average temperature over some time interval is more that T, if we report the events one by one, and calculate the average incrementally, we may have match for this pattern, however, if we are looking at all measurments at once, we don't have match for this pattern.. More about this in one of the next postings.

cheers,

Opher

Hans said...

I may still not get the idea with event vs set-at-a-time, so I look forward to that post.

Charles Young said...

It's a fascinating discussion. Another relevant area to consider is the comparison and contrast with approaches taken by set-based pattern-matching production engines (e.g., Rete rules engine). A stream-based EPA can be considered to be a rules engine, as David Luckham makes clear in 'Power of Events'. He also describes traditional rules engines (he appears to be talking about Rete engines, though he doesn’t explicitly state this) as 'heavyweight EPAs'.

Now, I don't think David is correct in some of his arguments about rules engines and EPNs, but that is another matter. A Rete rules engine is orientated towards set-based pattern matching. It exhaustively finds all possible matches over the poset of all 'facts' currently asserted to the engine. Those facts may represent events, and we are beginning to see temporal logic introduced into Rete engines to handle sliding windows and automatically garbage-collect expired events. Nevertheless, the exhaustive nature of Rete networks is different to pattern matching using recognised event consumption modes (i.e., contexts including recent, chronicle, cumulative and continuous). Maybe 'exhaustive' should be considered to be another type of consumption mode? Or maybe this indicates a more fundamental difference between what you term "set-at-a-time" and "event-at-a-time". I'm inclined to think the latter.

There is a lot of work going on in various places to add temporal logic and event processing features to Rete engines. It will be interesting to see if "event-at-a-time" contexts can be introduced effectively.

woolfel said...

If anyone is interested, I wrote up my experimental results here.

http://jamocha.svn.sourceforge.net/viewvc/jamocha/morendo/doc/Temporal_Logic_Extension_for_RETE_Algorithm.doc?view=log

I like to point out many business rule engines that implement RETE like JESS and JRules support reactive rules (aka ECA) and don't use conflict resolution.

In my own explorations, I implemented a hybrid reactive + conflict resolution approach, which should be more powerful than engines that only do one or the other. There's a wealth of knowledge out there on pattern matching algorithms that could benefit CEP engines.