Thursday, January 29, 2009

On state processing and event processing


Yesterday I was visited by my (now ex-) Master's student Elad Margalit, whose thesis on the dynamic setting of traffic light policies I have written about before. For some strange reason he decided that I deserve a gift for his graduation, so he brought me a "flip clock". Strangely enough, it flips its labels to show the correct time. Everyone who happened to come by my office yesterday thought it was a cool gadget, and it now sits in front of my eyes.

Today's topic echoes the discussion started by my friend and ex-IBM colleague Claudi, AKA patternstorm, on the forum of the complexevents site. Claudi defined state as a sequence of events, while several others answered that this is not really the definition.

Before getting to the definition, there was also a very concrete motivation that Claudi mentioned -- if we equate state to a "sequence of transitions", then state processing becomes a kind of event processing. I think that it is important to discuss this statement.

While state is not exactly a sequence of transitions, it is true that the value represented by the state can be reconstructed by applying the series of transitions to an initial state. If we consider the initial state to be null, so that the first transition creates the state, all the information can be obtained from the series of transitions.

Let's take a simple example. The state represents the balance of my checking account in the bank. The transitions start with the one that opens the account and continue through deposits, withdrawals, bank commissions, etc. I opened my current checking account in 1984. Assume that I would like to process this state, for example to get an alert every time my account balance becomes negative (unlike in the USA, in Israel overdraft is a common practice). I can make this an event processing activity by taking all the transitions since 1984 and reconstructing the state with each new transition. However, this is not an efficient way to do it: first, I would need to keep all the historical transitions forever; second, it is much more cost-effective to maintain the balance as an entity and process it.
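The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under simple assumptions: transitions are signed integer amounts, the initial state is a zero balance, and the names `replay_balance` and `Account` as well as the sample amounts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


def replay_balance(transitions: Iterable[int]) -> int:
    """Event-style: rebuild the balance by replaying every transition."""
    balance = 0  # assumed initial state before the account is opened
    for amount in transitions:
        balance += amount
    return balance


@dataclass
class Account:
    """State-style: keep the balance as an entity and update it in place."""
    balance: int = 0
    alerts: List[int] = field(default_factory=list)

    def apply(self, amount: int) -> None:
        was_negative = self.balance < 0
        self.balance += amount
        # Alert only when the balance *becomes* negative.
        if self.balance < 0 and not was_negative:
            self.alerts.append(self.balance)


# Hypothetical history: opening deposit, then withdrawals and a deposit.
history = [1000, -300, -900, 500, -400]

account = Account()
for amount in history:
    account.apply(amount)  # constant work per transition, no history kept

# Replaying needs the whole history and does work proportional to its length.
replayed = replay_balance(history)
```

Both routes end with the same balance, but only the replay has to retain and rescan everything since the account was opened.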

State processing and event processing are complementary: in state processing we process a snapshot of the present time, while in event processing we process the history of transitions. If I want to get an alert on overdraft -- this is state processing. If a compliance officer looking for money laundering suspects checks whether three deposits of more than $10,000 each were made to my account within a single week, he is doing event processing. In reality we need both, but each of them has different techniques for cost-effective processing.
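The compliance officer's pattern can be sketched as a sliding window over the deposit events. This is a rough illustration, not how any particular product does it. It assumes that "a single week" means any span shorter than seven days (a calendar week is another possible reading), that events arrive ordered by time, and that the function name `suspicious_deposits` and the sample events are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Iterable, List, Tuple

THRESHOLD = 10_000          # each deposit must exceed this amount
WINDOW = timedelta(days=7)  # assumed meaning of "a single week"
REQUIRED = 3                # number of large deposits that triggers a hit


def suspicious_deposits(events: Iterable[Tuple[datetime, int]]) -> List[datetime]:
    """Return the times at which REQUIRED large deposits fall inside WINDOW."""
    recent: Deque[datetime] = deque()  # times of large deposits still in the window
    hits: List[datetime] = []
    for when, amount in events:
        if amount <= THRESHOLD:
            continue  # withdrawals and small deposits do not take part in the pattern
        recent.append(when)
        # Discard large deposits that are a week or more older than this one.
        while when - recent[0] >= WINDOW:
            recent.popleft()
        if len(recent) >= REQUIRED:
            hits.append(when)
    return hits


# Hypothetical event history: (time, signed amount).
events = [
    (datetime(2009, 1, 5), 12_000),
    (datetime(2009, 1, 7), -500),
    (datetime(2009, 1, 8), 15_000),
    (datetime(2009, 1, 10), 11_000),
    (datetime(2009, 1, 20), 20_000),
]
hits = suspicious_deposits(events)
```

Note that the current balance plays no role here: the answer depends only on the recent history of transitions, which is what makes this event processing rather than state processing.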

More on this topic -- later.

9 comments:

Hans said...

This brings up (somewhat indirectly) the idea of off-line event processing. Mostly we hear about EP in terms of detecting this-or-that in real time. But there is opportunity to improve off-line processing as well with event semantics and such. I would like to suggest this as a subject for a future post.

Opher Etzion said...

Hello Hans.

True - we even had some real cases where we have done event processing off-line. I'll write about it in one of my next postings.

cheers,

Opher

Hans said...

In finance, off-line EP is pretty common. Of course no one calls it EP, but plenty of off-line processing uses code that treats records like events. Take the bank example from this post. Although some account calculations happen in real time, a surprising amount happens off-line (not to mention activities like auditing). I'll bet this is common in plenty of places when one stops to think about it.

Marco Seiriö said...

Interesting, this is how we do it internally in ruleCore. The whole system can be single stepped by processing events from a stream or event store and it ends up in the same internal state every time.

It gives us several nice capabilities, like off-line processing, so I obviously think this is a good idea.

Hans said...

That is what vendors mean when they say that their engine is deterministic.

RuleCore could definitely find off-line use cases. ILOG is frequently used off-line, for example.

Opher Etzion said...

Hello Hans.

First - ILOG is doing state processing and not event processing according to the classification provided.

It also makes sense to do event processing off-line, as Marco mentions; I'll write about it soon.

Deterministic behavior is another dimension and also deserves a special discussion. At the headline level, deterministic behavior means that when a certain input arrives (in a certain order), the output in different runs will be identical and predictable. I may write more about it soon and provide some examples of where the difficulties are.

cheers,

Opher

Hans said...

I only mention ILOG because those use cases are a start on looking for off-line EP use cases.

Looking closely at many off-line data-processing use cases, one finds elements of event processing creeping in. Over time, it will become more clear how the various terms blend together.

Marco Seiriö said...

Yes, I think there are off-line use cases that ruleCore could be used for. We have just not done any yet, but I think there are some coming up when we start to talk more about simulation and better support for "what if.." testing...

cke said...

Offline processing is indeed a very common practice. Very often, the right thing to do is to keep the history of the events (or of the records, the requests, etc.) and process them when machine resources are available and the transaction state is stable. Seen this way, offline processing amounts to batch processing. Batch processing leads to some considerations about optimization, deployment and transactional aspects: do I process 100 at once or 1000 at once, per day or per week? How can I parallelize processing, and what is the organizational model? What do I do if the processing of an item in my batch fails? Etc.

Event processing can of course be offline. I like to give the example of credit card fraud detection. If you scan all the transactions during the night, you may find fraud patterns. For some fraud patterns, you are not bound to real time. You can discover the fraud tonight, or tomorrow, or next week. Depending on the time constraint, the algorithm that you put in place will change, even significantly.

I confirm that ILOG JRules deals with state, and not (really) with events. The ILOG JRules engine implements the RETE algorithm, which by design deals with state. RETE is especially known for working well when the state is stable, with few changes in the data. What we should succeed in doing, though, is a good combination of events and state. Events do not, and cannot, carry much data as they are sent across event infrastructures; they must be complemented with state data coming from elsewhere, and that is the way to make really accurate decisions. Currently, the ILOG JRules technical language provides several primitives for dealing with events, which were designed to suit specific market needs. Today they ought to be placed within a bigger picture and rethought.

Regards,

Changhai Ke