Monday, September 29, 2008

On Semantics and Race Conditions - introduction

In this blog post I'll touch upon an issue that requires careful attention to the exact semantics.

I'll introduce the topic today, wait a few days to see if there are comments, and then post the analysis of this case.

Given the simple application shown below:

Let me explain this simple example. Since I would like to concentrate on a single issue, I'll simplify everything else to eliminate any noise.

  • There is a single event source (so there are no clock synchronization issues), which generates events of three types: e1, e2, and e3.

  • Let's also say that in our story a single event of each type is published (so there are no synonym issues). The table shows their occurrence time (when they occurred in reality) and detection time (when they were reported to the system); each event was reported 1 time unit after its occurrence, so there is no reordering problem.

  • Events e1 and e2 serve as input to an EPA of type "pattern detection", which detects the temporal sequence pattern "e1 before e2"; when this pattern is detected, it derives an event e4 that is some function of e1 and e2.

  • Events e3 (raw event) and e4 (derived event) serve as input to another EPA of type "pattern detection", which again detects a temporal sequence pattern, "e3 before e4"; if this pattern is detected, it creates event e5, which triggers some action in the consumer.
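To see why the answer hinges on semantics, here is a minimal Python sketch of the two EPAs. The occurrence times (e1 at 1.0, e2 at 2.0, e3 at 2.5) are hypothetical, chosen only so that e3 occurs after e2 but before e2 is reported. The `policy` values "inherit" and "derivation" (which timestamp e4 receives) and the `compare` switch (whether EPA 2 orders events by occurrence or by arrival) are assumptions for illustration, not the behavior of any particular product.

```python
from dataclasses import dataclass
from typing import Optional

DETECTION_LAG = 1.0  # every raw event is reported 1 time unit after it occurs


@dataclass(frozen=True)
class Event:
    name: str
    occurred: float  # when it happened in reality
    detected: float  # when it was reported to the system


def raw(name: str, occurred: float) -> Event:
    """A raw event from the single source, detected DETECTION_LAG later."""
    return Event(name, occurred, occurred + DETECTION_LAG)


def derive_e4(e1: Event, e2: Event, policy: str) -> Optional[Event]:
    """EPA 1: detect 'e1 before e2' and derive e4.

    `policy` (hypothetical) decides e4's occurrence time:
      "inherit"    -> e2's occurrence time (the event that completed the pattern)
      "derivation" -> the moment EPA 1 processes e2, i.e. e2's detection time
    EPA 1 is assumed to emit e4 instantly, so e4 is detected when e2 is.
    """
    if not e1.occurred < e2.occurred:
        return None
    if policy == "inherit":
        occurred = e2.occurred
    elif policy == "derivation":
        occurred = e2.detected
    else:
        raise ValueError(f"unknown policy: {policy}")
    return Event("e4", occurred, e2.detected)


def e3_before_e4(e3: Event, e4: Event, compare: str = "occurred") -> bool:
    """EPA 2: evaluate 'e3 before e4' on the chosen timestamp.

    compare="detected" mimics an engine that orders events by arrival.
    """
    return getattr(e3, compare) < getattr(e4, compare)


if __name__ == "__main__":
    # Hypothetical times: e3 occurs after e2 but before e2 is reported.
    e1, e2, e3 = raw("e1", 1.0), raw("e2", 2.0), raw("e3", 2.5)
    for policy in ("inherit", "derivation"):
        e4 = derive_e4(e1, e2, policy)
        for compare in ("occurred", "detected"):
            fired = e4 is not None and e3_before_e4(e3, e4, compare)
            print(f"e4 time: {policy:<10} order by: {compare:<8} e5 fired: {fired}")
```

With these made-up times, e5 fires only when e4 is stamped with its derivation time and EPA 2 orders by occurrence time; the other three combinations suppress it. Changing a single assumption flips the answer.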

The question is: given the above, will the action triggered by e5 occur? That is, will the pattern "e3 before e4" evaluate to true?

Before getting to the analysis, I wonder what the results would be in current EP solutions:

  1. The action will always be triggered.

  2. The action will never be triggered.

  3. The behavior is non-deterministic (sometimes yes and sometimes no).

  4. Any other possibility (specify).

Please send your answer as a comment to this post; I'll publish an interesting analysis of this case next week.

Happy New Year.


Anonymous said...

Hi Opher, I guess it depends on what you mean by a derived event having a "timestamp", and how you calculate a derived event's "occurred" or "detected" time. The answers to both are probably context-sensitive!

Hans said...

As Paul implies, with the products that I have used, there are several ways to code for this scenario and each will respond differently.

There is of course the issue of how the "time" of e4 is calculated. But even if we fix how this time is calculated, the answer will still depend on implementation.

A survey of techniques and resulting time semantics would probably be interesting, but I do know that most products at least will not give the non-deterministic answer here.

M said...

Now this is an interesting situation ;)

In ruleCore's default configuration this would trigger e5 every time. Flip a switch and you will never detect e5!

I suppose the main problem might arise if the rightmost pattern detector sees e4 before it sees the next input, e3.

We have a simple/primitive/efficient/naïve (pick one) solution/fix to this in ruleCore. Every event created by a pattern detector ('rule' in our world) is pushed out of the server and onto the input queue again.

So e4 ends up in the input queue as a newborn event e4', which has to be created based on e4 and sent back into the server. Basically, we create a new event with the same content and type, but with a new id and time.

We have to do it this way to support multiple unsynchronized event sources, where we need to keep every event in a sort buffer for a short while in order to resynchronize them. This allows us to receive events from sources that are slightly unsynchronized (by up to the length of the temporal sort buffer).
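A temporal sort buffer of the kind described here can be sketched as a priority queue keyed on occurrence time: an event is released only once the clock has moved at least a fixed window past its timestamp, so events from slightly unsynchronized sources leave in order. This is a minimal illustration under that assumption, not ruleCore's implementation; the class `SortBuffer` and its `window` parameter are hypothetical.

```python
import heapq
import itertools
from typing import List, Tuple


class SortBuffer:
    """Hypothetical temporal sort buffer: holds events for `window` time units
    so that events from slightly unsynchronized sources leave in order."""

    def __init__(self, window: float) -> None:
        self.window = window
        self._heap: List[Tuple[float, int, str]] = []  # (occurred, seq, name)
        self._seq = itertools.count()  # tie-breaker: equal times keep arrival order

    def push(self, name: str, occurred: float) -> None:
        heapq.heappush(self._heap, (occurred, next(self._seq), name))

    def release(self, now: float) -> List[Tuple[str, float]]:
        """Pop, in occurrence order, every event with occurred <= now - window."""
        out = []
        while self._heap and self._heap[0][0] <= now - self.window:
            occurred, _, name = heapq.heappop(self._heap)
            out.append((name, occurred))
        return out


if __name__ == "__main__":
    buf = SortBuffer(window=2.0)
    buf.push("e2", 2.0)  # arrives first from a fast source
    buf.push("e1", 1.5)  # arrives later from a slower source
    print(buf.release(now=3.5))  # [('e1', 1.5)]
    print(buf.release(now=4.0))  # [('e2', 2.0)]
```

In this sketch the price of resynchronization is latency: every event, including a re-queued e4' with its new timestamp, waits at least `window` time units before any rule sees it.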

If we switch on internal event routing, the semantics change so that e5 is never detected: e4 is considered to be detected at the same timestamp as e2. As e3 occurs after e2, this might be more correct, since we would consider e4 to be before e3.

So this could be made to behave in two different ways, depending on how the server is used. Tricky indeed...

Anonymous said...

It all depends on semantics. If we take extremes:
- e4 may be an event related to an action that started long before e1 and e2. For example, if I detect a wedding when all the people come out of the church, the wedding started some time ago. In this case e4 is completely out of order when it is generated, since it relates to something long in the past.
- e4 may be a deduction about something that will happen; in this case e4 is well ordered relative to e1 and e2, but may still be out of order relative to e3.

All this is linked to the difference between the time of occurrence (creation, for a derived event), the time of detection, and also the time of the happening the event relates to. It seems this is a three-timestamp problem.
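The three-timestamp view can be written down as an event record with separate fields, so that "before" has to name which timestamp it compares. In this sketch the field names (`happened`, `occurred`, `detected`) and all the times are made up for illustration; e4 plays the derived "wedding" event from the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    name: str
    happened: float  # when the real-world situation began (hypothetical field name)
    occurred: float  # when the event occurred, or was created if derived
    detected: float  # when the system learned about it


def before(a: Event, b: Event, field: str) -> bool:
    """Evaluate 'a before b' on one of the three timestamps."""
    return getattr(a, field) < getattr(b, field)


def order_by(events: List[Event], field: str) -> List[str]:
    """Event names sorted by the chosen timestamp (stable for ties)."""
    return [e.name for e in sorted(events, key=lambda e: getattr(e, field))]


# Made-up times. e4 is the derived "wedding" event: created at 13.0 when e2
# is detected, but describing a situation that began at 10.0.
e1 = Event("e1", happened=11.0, occurred=11.0, detected=12.0)
e2 = Event("e2", happened=12.0, occurred=12.0, detected=13.0)
e3 = Event("e3", happened=12.5, occurred=12.5, detected=13.5)
e4 = Event("e4", happened=10.0, occurred=13.0, detected=13.0)

if __name__ == "__main__":
    for field in ("happened", "occurred", "detected"):
        print(field, order_by([e1, e2, e3, e4], field), before(e3, e4, field))
```

Under these assumed times, "e3 before e4" holds only when the occurrence (creation) time is compared; by the happening time e4 lies far in the past, and by detection time e3 arrives last.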


M said...

An interesting design choice is to decide at what time e4 is detected!

In this example it is detected a short while after e2.

Another option, which we use in ruleCore, is to consider e4 to be detected at the same time as e2. The motivation is that the first pattern consists of two events, and when the second one (e2) occurs, the pattern is considered detected at that exact time; thus e4 gets the same timestamp.