Event Processing Thinking: event-at-a-time

Showing posts with label event-at-a-time. Show all posts

Saturday, March 21, 2009

More on event-at-a-time processing

I have borrowed this picture from Brian Connell's postings a few months ago entitled: "one-by-one can still be CEP". I have not really returned to this topic, but it is a good time to do it now, following my last posting on "set-at-a-time" vs. "event-at-a-time". Many people are used to program in set-oriented languages like SQL, thus a simple match between two entities become first creating the Cartesian products of the sets they they belong to, and then selecting the element from this Cartesian product. This is not a natural way that people are thinking about processing events. Let's look at a nice penguins in the picture, we want to trace want happen in the penguin colony, by putting observing them. Let's say that we want to observe when a young penguin stays away more than 1KM from the home glacier f0r more than an hour, which may indicate that it can get lost. Furthermore, we wish to get the alert immediately when this happens, and not at the end of any time window. Some of the "set thinkers" view the "stream processing" paradigm as something organized in which the partition of events is well defined by the notions of streams and windows, and the processing already has the input set, and all it has to do is to apply some function on the input set, and "event-at-a-time" processing as ad-hoc programming in which events arrive to some event processor who then has to hard-code the entire semantics. But this is, of course, a misconception. Let's look again at the penguin example; assuming that we are looking at the following pattern: a young penguin may be lost if it stays over a 1 KM away from the glacier for over one hour. This can be expressed by set oriented processing, as looking at observations of the penguin (let's say we watch it once every minute), and determine at the end of an hour that all observations are more than 1KM away. But -- when do we start the one hour period? the answer is -- first time that the penguin crosses the 1KM bound, so we need a notion of an event that starts a time window, actually the term context is wider than time window -- it contains here : temporal aspect (within one hour, when the one hour can be initiated by a specific event), semantic aspect : an EPA is tracing a single specific penguin, so it is associated with some penguin id, and the pattern is spatial :
all events are more than 1KM away from the glacier. Now, what is the benefit of having event-at-a-time implementation, a simple benefit: if the young penguin starting to head back then we can close this context instance, and terminate the EPA, say after 3 minutes, and don't trace this penguin anymore, until the next time it swimming far, while in the set-at-a-time we'll determine only at the end of the time window that there is nothing to detect here. Of course, the set thinkers will immediately say that we can reduce the window, so reducing the window to units of single events exactly gives us the "event-at-a-time" notion. More than that, it is not only a question of efficiency, it is also question of expressing a fine-tuned semantics. Let's look at another penguin scenario: we are now tracing lazy penguin who return to the glacier after less than 2 minutes after jumping to the water. here we have a sequence of two events relate to the same penguin within a specific temporal context ("within 2 minutes"). This is not a set operation at all, it is looking of a sequence of two individual event. True- it can be expressed in set-oriented-programming we'll have to create two streams (or one heterogeneous stream) of "jumping to the water" and "returning to the glacier", and then join them, select the appropriate instance and thus determine which members of the set matched this pattern, but this is not a natural way to think about it. While in "event-at-a-time" this can be done by just opening a context-instance for every penguin that jumped into the water, if it does not return within 2 minutes, this context-instance is closed, if the penguin returns, then there is pattern match, and the context-instance is closed even earlier.

But let's move to an example about the tune-up of semantics. Assume that we are looking for the pattern saying that the average stay in the water of a penguin is less than 2 minutes, which may indicate some laziness plague, or any other plague that makes the penguins lazy. In a set-oriented programming we'll have to define the set -- let's say a time window of one hour, thus, when the set is all accumulated we can calculate the average and match it to the threshold; however, it becomes tricky when the average is actually a moving average, thus it may be possible that if we do this calculation after 30 minutes the pattern is matched, since the average in this 30 minutes of staying in the water is 1 minutes and 56 seconds, while if we consider the whole hour, the pattern is not matched, since the average is now 2 minutes and 9 seconds. Doing the calculation event-at-a-time enables us to get the average even to set of any size, even without committing ahead on the size of the set.

This is not done in ad-hoc processing, but supported with high-level programming primitives that are sometimes easier to express than their equivalent set-oriented notation.

There are of course cases in which the set-oriented calculation makes sense -- exactly when we are doing aggregations at the end of some fixed time intervals, and in some applications this may be the main function we need -- but, I assume that we'll see more and more hybrid applications.

Last but not least -- the distinction between "simple" and "complex" event processing is considered in the fact that simple deals with events being processed "one by one" while complex process multiple events, however "event-at-a-time" is not processing "one-by-one" since in the "one-by-one" processing, an event is processed without looking at other events, and in "event-at-a-time" each event is processed individually but within the state of a certain context-driven EPA. More - Later

Thursday, March 19, 2009

On data flows event flows and EPN

Bob Hagmann from Aleri (ex-Coral8) has advocated "data flow" model as an underlying model that unifies both engines of Aleri, and contrasts it with "event delivery systems" in which programmers create state manually if needed. I am not really familiar with the phrase "event delivery system" and don't know what he refers to, but there are event processing systems that employ different programming styles from stream processing, in which states are handled implicitly by the system and the programmer does not really deal with creating states.

But -- I have no interest in "language wars", my interest these days is somewhat different -- to find a conceptual model that can express in a seamless way functionality that exists by different programming styles.

Actually the conceptual model of EPN (event processing network) can be thought as a kind of data flow (although I prefer the term event flow - as what is flowing is really events). The processing unit is EPA (Event Processing Agent). There are indeed two types of input to EPA, which can be called "set-at-a-time" and "event-at-a-time". Typically SQL based languages are more geared to "set-at-a-time", and other languages styles (like ECA rule) are working "event-at-a-time". From conceptual point of view, an EPA get events in channels, one input channels may be of a "stream" type, and in other, the event flow one-by-one. As there are some functions that are naturally set-oriented and other that are naturally event-at-a-time oriented, and application may not fall nicely into one of them, it makes sense to have kind of hybrid systems, and have EPN as the conceptual model on top of both of them...

This is the short answer. More detailed discussion -- later.

Friday, November 7, 2008

On "event at a time" vs. "set at a time" processing

Today I have broken life-long record in the length of a single session with the dentist --- more than 3 hours (I had to go out in the middle and restart the parking card which allows 2 hours at a time); this is time of setting up plans for next year and there are plenty of plans --- to be involved in 1/3 of my time in some activity, only when the number of "1/3 of your time" becomes bigger than 3, then the phrase "1/3 of your time" becomes tricky. Anyway -- today I would like to write about "event-at-a-time" and "set-at-a-time" processing, as it seems that people don't understand this issue well. We'll take it by examples:

"event-at-a-time" is a processing type in which when an event is detected, this event is evaluated against the relevant patterns to determine if the addition of this event (together with possible more events) satisfies the pattern.
"set-at-a-time" is a processing type in which patterns are evaluated against a set of events after all the set has been detected.

Examples:

look at the pattern: And (E1, E2) -- when the relevant instance of E1 arrives this becomes a potential pattern, whenever the relevant instance of E2 arrives - then the pattern is satisfied. A concrete example: E1 = the buyer received the merchandise, E2 = the seller received the money for the same transaction. The order does not matter - the transaction is closed if both of them occurred. This is an "event-at-a-time" processing since it looks at each event individually when it comes and determined what is the state of that pattern- instance.
look at the pattern: Increasing (E1.A) -- when the set of all relevant instances of E1 arrives - then the evaluation is done on the entire set. A concrete example: E1 = Temperature measurement, A = Value. When all the values (e.g. in a specific time window) arrive, there is an ability to determine if the values are indeed always increasing in time.

It is interesting to point out that patterns that are geared towards "event-at-a-time" can be implemented as "set-at-a-time", in this case, looking for patterns periodically instead of immediately, note that "set-at-a-time' does not necessarily means : time series processing, the size of the set may be unknown in advance, and the time difference between the different elements in the set may not be fixed.

It is also possible to implement "set-at-a-time" patterns using "event-at-a-time" - in an incremental fashion.

While some applications are better handled easily and more efficiently by one of these two styles -- some applications will need both, so hybrid processing will be required for those.

Event Processing Thinking