Saturday, October 11, 2008

More on semantics and race conditions

In previous posting I have posed the following sceanario:

Given the simple application shown below:

  • There is a single event source (so no clock synchronization issues) which generates events of three types e1, e2, e3.

  • Let's also say that in our story there is a single events of each type that is published (so no synonyms issues), the table shows their occurrence time (when they occurred in reality) and detection time (when they have been reported to the system) - each of them has been reported 1 time unit after its occurrence, no re-ordering problem.

  • Events e1, e2 serve as an input to an EPA of type "pattern detection" which detects a temporal sequence pattern "e1 before e2", and when this is detected, it derives an event e4 - some function of e1 and e2.

  • Events e3 (raw event) and e4 (derived event) serve as input to another EPA of type "pattern detection" which again detects a temporal sequences pattern "e3 before e4", if this pattern is detected - create event e5 which triggers some action in the consumer.

I also asked the question is -- given the above - will the action triggered by e5 occur?, i.e. will the pattern - "e3 before e4" be evaluated to true?

I got a few answers to this and you can read them as comments to the original posting; as promised I am dedicating this postings to analysis of this simple case:

The first thing to discuss is the semantics of "temporal sequence". There are two possible types of semantics for temporal sequence, which I call "detection time semantics" and "occurrence time semantics".

  • The detection time semantics is implemented in various languages and means that the temporal order is the order of the time-stamps in which the "event processing platform" detects that this event occurs; if there is a single thread of such detection, then the events are totally ordered, otherwise, there may be several events with the same "detection timestamp".
  • The occurrence time semantics also implemented in various languages means that the temporal order is the order of the time-stamps that are provided as part of the event information, and designate - when this event happend in reality. There are some complexity of synchronization of time in multi-producer environment, however, in this example we assume a single producer (I'll write about multi-producer cases in another posting).
  • Note that this two order relations may not be identical.
  • There is also kind of hybrid solution ("total order semantics") -- the semantics is really "detection time" semantics, but in order to allow events that arrive a bit late to take their proper role, the events are queued at a buffer (and not considered as detected) until time-out to let "out of order" events to arrive and re-order the buffer, and then send the events according to the buffer order.

Getting back to the example - in the small table on the bottom left-hand side of the figure above, there are occurrence and detection times of e1, e2, e3. For e4 there is only detection time - e4 is different from {e1, e2, e3} by the fact that it is a derived event and not raw event like the other three. The question is "what is the occurrence time of a derived event" ? -- there is no clear answer for it - there are several possible answers:

  • In the derived event case the occurrence time = detection time, since this event is not real event but a virtual one, thus, its source is the EPA that creates it, and it occurred when created. In our case it means that occurence-time (e4) = 4.
  • Its occurence time is the occurrence time of the last event that completed the pattern - since the participating events in the creation of e4 are {e1, e2} and e2 was the last that completed the pattern, occurrence-time (e4) = occurrence-time (e2) = 2
  • Interval semantics: The event e4 occurs in the interval in which all the participants occur, which is this case means occurrence-time (e4) = [1, 2].

The phenomenon of multiple semantic interpretations apply to various other semantic decisions in the semantic of event processing language, and the preferred solution is to provide the user with semantic "fine tuning" policies, under which the user can chose the desired semantics, instead of "hard code" a certain semantics (using the most common one as a default), this is one of the benefits of using COTS for event processing, since it is quite difficult to think about such issues when developing EP manuaully using conventional language.

The semantics of the second "temporal sequence" (e3, e4) is thus:

  • According to "detection time" semantics -- both have detection-time of 4. As such the sequence condition is not satisfied. However, if we impose total order by a single thread, this may create race conditions between the two events. In this case it is recommended to use a consistent priority policy - either breadth first (the raw event always comes first) or depth first (the derived event always comes first) to ensure deterministic result.
  • According to the "occurence time" -- it depends on the policy chosen, but according to all interprerations - e4 occurs before e3 - thus the temporal sequence is not satisfied.

Bottome line: the temporal sequence (e3, e4) is satisfied if:

  • The temporal semantics is detection time
  • It is implemented by total order
  • The total order policy is "breadth first" - namely priority for the raw events.

In all other cases the temporal sequence will not be satisfied and the corollary action will not execute.


harvey said...

Apologies, but I am a bit confused.

First, I understand the need to be flexible in semantics, and I am a huge fan of using COTS to keep an overall implementation fairly standards based and consistent.

However, in the case you examined --
semantics of the second "temporal sequence" (e3, e4):

I do not understand why we do not have the option of considering the "derived event" [e4] (from the perspective of Pattern Detector (e1, e2), to be viewed as a "raw event" from the perspective of Pattern Detector (e3, e4). If so then we would have a 'occurrence time' of 4 which is one after e2 (last in to trigger detection). Then it would be detected at 5 in the second pattern detector, and e5 is generated at 6.

Or am I missing something?

In any case, this is making me think :-)

Opher Etzion said...

Hi Harvey,

Sorry if I confused you, I have such a tendancy to confuse people sometimes...

As for your question - let's analyze the option that you propose - what you propose is :

1. chose occurrence time semantics.
2. set occurrence time (e3) = 4 -- setting the occurrence time of a derived event to be its detection time. Thus you prefer the interpretation that I have phrased as:
"In the derived event case the occurrence time = detection time, since this event is not real event but a virtual one, thus, its source is the EPA that creates it, and it occurred when created. In our case it means that occurence-time (e4) = 4".

This is a totally valid interpretation, however, note that this is not the ONLY interpretation. The rationale behind the interpretation you have chosen is since e3 is a "virtual event", it occurred when it is created by the system, regardless of the participant events.

However, one can claim that the derived event occurred when the last event that completed the pattern occurred, and thus according to that interpretation
occurence time (e3) = occurrence time (e2) = 2

When we cannot say that there is a unique semantically valid interpretation we can either decide arbitrarily on one of them (and then if the customer needs to have the other interpretation -- then tough luck... need to hack around it), or let the customer chose the valid interpretation using policies, which is what I propose. I'll write more about policies for semantic tuning.

Hope I did not confuse you even more --



harvey said...

Ok, I just wanted to make sure the scenario I described was valid. I understand the need for a variety of policies and corresponding semantics, and COTS would be needed to keep it all straight.

One problem I am interested in is a meta-problem where a variety of organizations are in an EPN, most likely using a variety of COTS. It is still desirable to have a view of the entire network, even though the implementation is fragmented [meaining there is no single "platform"]

So in this case, maybe there could be some sort of "EPN-Policy-xml" standard (perhaps OASIS) where the various COTS components could ingest the policy and one COTS (owned by some agreed governing body) could be the steward for the top level EPN policy.

I think this is where you are going. In this case, we need to think about possible specialization or extension of policy in each EPA, yes?

This is a good topic. And starting with a simple case is good.

Opher Etzion said...

The idea is that policies can be local, thus, any node in the EPN can have its own policies and thus the federated COTS work on the local level. Your question hints at imposing global policies in a federated scenario, this can be done using a meta-level language definition and translating it to the local implementations. Indeed a meta-language is the solution I believe in (for various other reasons as well).

Stay well,


harvey said...

Exactly. So if we (perhaps EPTS) were to recommend languages (or at least that a TC work on an OASIS XML spec), there should be two levels to consider - a local policy(for EPA), and a federated policy (for EPN)

Take care,

Hans said...

Hi Opher, I wanted to point out that one could use occurrence time here but the semantics become muddy.

The event e4 could be stamped with the time of the first event that triggers it (e1) and not the last event (e2).

On one hand, this is problematic because e4 will seem to occur before e2 and that makes no sense. But in the greater scheme of things, maybe I don't care because I'm never comparing the times of e2 and e4.

I can report from the field that this technique is used frequently. Although to illustrate the reason, the scenario would be expanded to more than one event producer and include the problem of delayed arrival.

Opher Etzion said...

Hi Hans,

It is true that there are many scenarios in which the ocurrence time semantics is the only one that makes sense - otherwise one can get to wrong results. About the derived event e4 be considered as occurred when the first participant in its derivation (e1) occurred -- I am not sure when this intepretation makes sense, do you have an example? I think that it may make more sense that the ocurrence time of e4 will have an interval semantics instead of time-point semantics. There are some claims that "complex events" happen within an interval and not a time-stamp; David Luckham's favorite examples for the interval-based semantics is the economic crisis (well - he meant the previous one, but we can also talk about the current one, this is an event that has not terminated yet..)

stay well,


Hans said...

So a very basic situation where we would want to use the time of e1 rather than e2: when e1 and e3 come from the same system, but e2 comes from a different system that is on a WAN. Not only are the occurrence times of e1 and e3 now directly comparable, but we can not be sure even that e2 will arrive before e3, which makes using detection time a risky bet at best. This scenario is not as contrived as it might seem, it can easily happen.

So it is not really a matter of interpretation but of practicality - I use e1 because it works. And there are no negative side effects, because I will never compare the times of e2 and e4 - so no one will ever notice that e4 magically occurred before one of its components did.

This reminds me of a problem that I have with the "event a followed by event b" line of thinking, but I will put that into a blog post.