Monday, January 5, 2009

On event processing and some interesting queries

Some people have returned from vacation with a surplus of energy; otherwise I cannot explain why my inbox today was full of mails from a single discussion thread in the everlasting Yahoo CEP interest group, triggered by a question sent by Luis Poreza, a graduate student from the University of Coimbra in Portugal. I am taking the liberty of rewriting the question: it was phrased in terms of a trading system, so some of the responders answered with trading-related material that did not help to answer Luis' question. To get as far away as possible from the stock market, I will base the rewritten question on the fish market.

The story is as follows: the price of 1 kg of fish is determined by the hour, the demand, the supply and the general mood of the seller. At 10:50 he set the price to 71, then at 11:15 the price went down to 69, with no further changes until 12:00. There is a computerized system that works in time windows of one hour, starting every hour. The request is to find out, for the time window 11:00 - 12:00, whether the price of 1 kg of fish was ever > 70. The claim is that intuitively the answer is yes, since the price in the interval [10:50, 11:15] was 71; but if we look only at the events that occurred within this window, there was no event with a value > 70, so current "window oriented" tools will answer: no.
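To make the mismatch concrete, here is a minimal Python sketch of the "window oriented" reading of the request. It does not model any particular product; the function name, the event representation as (timestamp, price) pairs and the half-open window are my own assumptions, and the calendar date is arbitrary.

```python
from datetime import datetime

# Price-change events from the fish market story: (timestamp, price per kg).
# The calendar date is arbitrary; only the time of day matters here.
EVENTS = [
    (datetime(2009, 1, 5, 10, 50), 71),
    (datetime(2009, 1, 5, 11, 15), 69),
]


def window_has_event_above(events, start, end, threshold):
    """Window-oriented check: only events stamped inside [start, end) count."""
    return any(start <= ts < end and value > threshold for ts, value in events)


window_start = datetime(2009, 1, 5, 11, 0)
window_end = datetime(2009, 1, 5, 12, 0)

# The 10:50 event falls outside the window, so the price of 71 is never seen.
answer = window_has_event_above(EVENTS, window_start, window_end, 70)
print(answer)  # False
```

The only event inside the 11:00 - 12:00 window carries the value 69, so the check answers "no" even though the price of 71 was still in force when the window opened.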

There have been plenty of answers; some even tried to answer the question, for example by adding dummy events with the value 71 (one at the end of the interval? one every minute?).

However -- I am going to make the following assertions:

(1). The requirement given is not an event processing pattern.
(2). Attempts to treat it as an event processing pattern are not very useful.
(3). It is in fact a kind of temporal query.
(4). It may make sense to have the capability to issue temporal queries as a response to events (AKA retrospective event processing), but this has to be done right.

Assertion one - the requirement is not an event processing pattern. An event processing pattern is a function of events, so it is no surprise that Luis found it difficult to phrase the requirement as one. Let me take two other examples that look syntactically the same and try to understand what the problem is here:



The government agency example: A government agency known for its long service queues tries to monitor the length of the queue. Periodically a clerk goes out and counts the number of people waiting in the queue. At 10:50 he found 71 people in the queue and at 11:15 he found 69, with no more samples until 12:00. Now the question is whether there was some point in the time window [11:00, 12:00] at which the number of people in the queue was > 70.

Before starting the discussion, let's look at another example, the bank account example.
At 10:50 Mr. X deposited $30; his previous balance was $41, which made his balance $71.
At 11:15 Mr. X withdrew $2, and his balance became $69.

From a syntactic point of view the fish market example looks exactly like the queue monitoring example: in both cases we have events at 10:50 and 11:15 with attribute values 71 and 69 respectively. However, they are not the same. The price in the fish market is fixed until it is changed, while the length of the queue may have gone up and down several times between the two events, since each event here is only a sample and the samples do not cover all the changes. Both kinds of event observe some state (price or queue length), but the semantics are quite different. If we use the dummy event solution in the queue case, the value will probably be wrong; furthermore, we cannot really answer the query in the queue case with "true" or "false". Yet in reality periodic sampling is a totally valid type of event. The bank account example, in turn, looks very different from the fish market example: it has two types of events, and the events do not observe a state but report a change, along with the size of that change (the "delta"). Looking only at the deposit and withdrawal events we cannot answer the question either, but knowing the state (the balance of the account) together with the deltas (of the deposit and the withdrawal) we get something that is semantically similar to the fish market example.
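The difference between the bank account and the queue can be sketched in a few lines of Python. This is only an illustration under my own assumptions: times are zero-padded "HH:MM" strings (so plain string comparison orders them correctly), both function names are made up for this post, and "unknown" is represented by None.

```python
def balances_from_deltas(opening_balance, deltas):
    """Turn (time, delta) change events into (time, balance) state observations."""
    balance = opening_balance
    history = []
    for ts, delta in deltas:
        balance += delta
        history.append((ts, balance))
    return history


def sampled_ever_above(samples, start, end, threshold):
    """Samples only: True if a sample inside the window exceeds the threshold.

    Otherwise None, meaning "unknown": the value may have moved between
    samples, so the absence of a high sample does not prove "false".
    """
    if any(start <= ts <= end and value > threshold for ts, value in samples):
        return True
    return None


# Bank account: deposit of 30 and withdrawal of 2 on an opening balance of 41.
bank_history = balances_from_deltas(41, [("10:50", 30), ("11:15", -2)])
print(bank_history)  # [('10:50', 71), ('11:15', 69)]

# Queue: the clerk's two head counts, which are samples and nothing more.
queue_samples = [("10:50", 71), ("11:15", 69)]
print(sampled_ever_above(queue_samples, "11:00", "12:00", 70))  # None
```

Once the deltas are applied to the known opening balance, the bank account history has the same shape as the fish market prices, and its values do hold until the next change. The queue samples have the same shape too, but the honest answer for the 11:00 - 12:00 window is "unknown".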

What can we learn from these examples? First, that the property "the value stays the same until it is changed" is not a property of an attribute of an event; it is a property of the state (data) that may be created or updated by events. It is true for some states and not for others. The solutions offered rely on the fact that a human knows the semantics of this state and writes an ad-hoc query. However, this is processing of the state, based on its semantic properties, and not processing of the events.

Assertion two -- Attempts to treat it as an event processing pattern are not very useful.

In the past I've blogged about the hammer and the nail. There is a natural tendency for anybody who has a product to try to stretch its boundaries. This may also backfire: trying to do things that the product is not good at, and doing them poorly, can overshadow the good parts of the product. A solution like adding "dummy events" is a kind of hack. It abuses the notion of an event (since a dummy event did not really happen); moreover, given that this is just one ad-hoc query and there can be many such queries, covering all of them may require an exponential number of dummy events... Anyway, event processing software is just one part of a bigger picture, and instead of improvising or hacking to get this functionality, it may be more advisable to use a product that is a better fit.


Assertion three -- This requirement is in fact a temporal query. I will not get into temporal queries now, but the actual query is over the price of 1 kg of fish as it changes over time. It is an existential query: it asks whether some predicate holds somewhere in the interval. Another example of a temporal query: was there any day during the last 30 days on which the customer withdrew more than $10,000 in a single withdrawal?
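For a state that holds until it is changed, such an existential query can be sketched as follows. This is a rough illustration and not a temporal query language: the name ever_holds, the (timestamp, value) history format and the closed interval are assumptions of mine, and the calendar date is arbitrary.

```python
from datetime import datetime


def ever_holds(history, start, end, predicate):
    """Existential temporal query over a state that holds until changed.

    history: (timestamp, value) pairs sorted by timestamp.
    Returns True if predicate(value) holds at some instant in [start, end].
    """
    value_at_start = None
    has_value_at_start = False
    for ts, value in history:
        if ts <= start:
            # Remember the latest value already in force when the interval opens.
            value_at_start, has_value_at_start = value, True
        elif ts <= end and predicate(value):
            # A change inside the interval set a value that satisfies the predicate.
            return True
    return has_value_at_start and predicate(value_at_start)


# Price history of 1 kg of fish, kept as state rather than as raw events.
PRICE_HISTORY = [
    (datetime(2009, 1, 5, 10, 50), 71),
    (datetime(2009, 1, 5, 11, 15), 69),
]
start = datetime(2009, 1, 5, 11, 0)
end = datetime(2009, 1, 5, 12, 0)

print(ever_holds(PRICE_HISTORY, start, end, lambda price: price > 70))  # True
```

The query runs over the stored state and its semantics (the value in force at 11:00 is the one set at 10:50), not over the events that happen to fall inside the window, which is why it can answer "yes".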

And this example brings us to assertion four: it may make sense to couple event processing software with temporal queries. For example, we have an event that makes a customer a "suspect" in money laundering, but we need reinforcement from some temporal queries over the past, like the one written above... I'll write about this type of functionality at a later stage.

Well, it is 1:15 AM, so I'd better get some sleep; tomorrow is again a busy day. So, in conclusion: first, not everything that looks simple to do manually is simple to do with a generic type of thinking; second, event processing software should concentrate on doing event processing right, and not on doing other stuff wrong... Some follow-up blog postings will come later.

4 comments:

Hans said...

With respect to assertion one, I believe that you are saying that if a state is maintained and subsequently queried in an ad-hoc fashion, then we are not talking about event processing. Is that right?

I believe that I could find some case that is clearly EP but involves an ad-hoc query on state. Maybe I'm wrong.

Anyway, I do not see an urgent need to differentiate what is EP from what is not. The boundary never seems to become quite clear, and we do not yet seem to have a comprehensive enough theory to make practical use of such a distinction.

Opher Etzion said...

Hello Hans. As I said, an ad-hoc query over the state created by event processing is a temporal query: it queries the state and not the events, thus it is not event processing. I assume that one of the things we need to agree on is what event processing is; an underlying theory is needed, and will probably get there.

Products that do event processing can in addition support all kinds of actions. One of them can be temporal queries over state, if required by the applications they support. There are also other relevant actions that can be supported, which does not make those actions event processing either.

cheers,

Opher

Richard Brown said...

Hi Opher.

I found this discussion very interesting - and had been trying to think how I would solve it with AMiT or WBE or plain SQL.

However, it was only with your "event" vs "state" insight that the pieces fell into place... thanks for writing up your thoughts in more detail.

Richard (IBM UK)

Hans said...

Ok, I understand your idea as a starting point for defining EP.

> event processing software should concentrate on doing event processing right, and not doing other stuff wrong

As you said, though, there are many cases where the main functionality required for processing events involves temporal queries. So it is possible that for software to be sold across a broad range of uses, it needs to include both kinds of functionality.