Event Processing Thinking: policies


Tuesday, November 17, 2009

When does a derived event actually happen? - (posting II)

In the previous posting I showed some possible anomalies when dealing with derived events. The picture above shows a snowfall as a derived event; where I am located, in Haifa, this is a very rare event (once every 20 years, for a few minutes). There are various types of derived events; this time I'll discuss derived events of two different patterns: the sequence pattern and the time-out pattern.

Example 1: The pattern is: if a sequence of events E1 and then E2 occur, derive event E3.

Let's assume that event E1 occurs at 9:00 and arrives at the system at 9:02, and event E2 occurs at 9:30 and arrives at the system at 9:31. The derived event is derived by the system at 9:33. The question is: when does event E3 occur? One can think of three logical possibilities:

I: E3 occurs when it is produced, at 9:33; the rationale: since it is a virtual event, it does not occur in reality, and exists only because it is derived by the system.

II: E3 occurs when the last event that triggers the pattern matching occurs, in this case at 9:30; the rationale: the derived event occurs when the pattern's conditions are satisfied in reality, and this happens when E2 occurs.

III: E3 occurs over the interval [9:00, 9:30]; the rationale: the derived event occurs over the interval of all participating events.
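To make the three possibilities concrete, here is a minimal sketch that computes the occurrence time of E3 under each of them. The `Event` class, the `at` helper, the `derived_occurrence` function, and the calendar date are all assumptions made for illustration; they are not taken from any particular event processing product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Sequence, Tuple, Union

# A derived occurrence time is either a time point or an interval.
OccurrenceTime = Union[datetime, Tuple[datetime, datetime]]


@dataclass(frozen=True)
class Event:
    name: str
    occurred: datetime  # occurrence time: when it happened in reality
    detected: datetime  # detection time: when it arrived at the system


def at(hhmm: str) -> datetime:
    """Build a timestamp from 'HH:MM' (the date itself is arbitrary)."""
    return datetime.strptime("2009-11-17 " + hhmm, "%Y-%m-%d %H:%M")


def derived_occurrence(policy: str,
                       participants: Sequence[Event],
                       derived_at: datetime) -> OccurrenceTime:
    """Occurrence time of a derived event under policy I, II or III."""
    occurred = [e.occurred for e in participants]
    if policy == "I":    # the moment the system produced the derived event
        return derived_at
    if policy == "II":   # the occurrence of the event that completed the pattern
        return max(occurred)
    if policy == "III":  # the interval spanned by all participating events
        return (min(occurred), max(occurred))
    raise ValueError(f"unknown policy: {policy}")


e1 = Event("E1", occurred=at("09:00"), detected=at("09:02"))
e2 = Event("E2", occurred=at("09:30"), detected=at("09:31"))

for policy in ("I", "II", "III"):
    print(policy, derived_occurrence(policy, [e1, e2], derived_at=at("09:33")))
```

Note that policy III returns an interval rather than a time point, so anything that consumes E3 has to be able to cope with both kinds of occurrence time.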

Example 2: The pattern is time-out (absence event). Example: if there is no bid for an auction by the end of the auction time, derive an event "no bidders".

Scenario: An auction opens at 9:00 and is valid for 2 hours; at 11:00 it closes without any bidders, and at 11:02 the system issues the derived event.

We have three similar alternatives here:

I: The "no bidders" event occurs at 11:02, the time at which the derived event is issued.

II: The "no bidders" event occurs at 11:00, when the "bid close" event occurs, which completes the pattern.

III: The "no bidders" event occurs during the interval [9:00, 11:00], since the "absent" event relates to the entire interval.

As in some other cases, there is no single solution that fits all cases; the actual semantics of a specific case is a matter of policy. We see here three policies, which seem to cover most cases but not necessarily all. That is why there is also a need to enable explicit derivation of the occurrence time of a derived event, i.e. the value of the occurrence time itself can be computed and derived.
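One way to picture "occurrence time as a policy" is to treat each policy as a function from the pattern's temporal context to an occurrence time; explicit derivation is then just one more such function, supplied by the pattern's author. The sketch below applies this idea to the time-out example. All names (`TimeoutContext`, `at_derivation`, `at_completion`, `over_interval`) are hypothetical, and the one-minute grace period in the explicit rule is an invented illustration, not a recommendation.

```python
from datetime import datetime, timedelta
from typing import Callable, NamedTuple, Tuple, Union


class TimeoutContext(NamedTuple):
    opened: datetime      # the auction opens
    closed: datetime      # the auction closes with no bid
    derived_at: datetime  # the system emits "no bidders"


OccurrenceTime = Union[datetime, Tuple[datetime, datetime]]
Policy = Callable[[TimeoutContext], OccurrenceTime]


def at_derivation(ctx: TimeoutContext) -> OccurrenceTime:
    """Policy I: the time at which the derived event is issued."""
    return ctx.derived_at


def at_completion(ctx: TimeoutContext) -> OccurrenceTime:
    """Policy II: the time at which the pattern was completed."""
    return ctx.closed


def over_interval(ctx: TimeoutContext) -> OccurrenceTime:
    """Policy III: the whole interval the absence relates to."""
    return (ctx.opened, ctx.closed)


# Explicit derivation: any computed expression over the context.
# The grace period is a made-up rule, used only to show the mechanism.
explicit: Policy = lambda ctx: ctx.closed + timedelta(minutes=1)

day = datetime(2009, 11, 17)  # arbitrary date
ctx = TimeoutContext(opened=day.replace(hour=9),
                     closed=day.replace(hour=11),
                     derived_at=day.replace(hour=11, minute=2))

for name, policy in [("I", at_derivation), ("II", at_completion),
                     ("III", over_interval), ("explicit", explicit)]:
    print(name, policy(ctx))
```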

More about temporal issues -- later.

Saturday, November 14, 2009

When does a derived event actually happen? - (posting I)

Just finished reading the book "Flash Forward" by Robert Sawyer. Science fiction was always my favorite type of literature, and my favorite writers are Asimov and Heinlein. There are science fiction writers among the following generation who stand out, and the Canadian writer Sawyer, who does not forget to give Canada a role in each of his books, is one of them. I have read several (not yet all) of his books. The best of those I have read so far is the Neanderthal Parallax trilogy, which is also very thought-provoking besides being fascinating. "Flash Forward", which is now also becoming a TV series, deals with an experiment that gets everybody in the universe to jump forward 21 years in time for 2 minutes. It is a combination of science fiction, a book that raises some philosophical issues, and a suspense story; highly recommended.

The question of time and deep temporal issues has also been one of my favorite research topics, since time has physical, philosophical, and also computer science implications. Back to event processing: recently I have written the "warnings" chapter in the EPIA book, and one of the interesting questions is: when does a derived event occur?

As discussed before, there are two dimensions for answering the question: occurrence time, which stands for the time at which an event occurs in reality, and detection time, which stands for the time at which an event is detected by the event processing system. Neither of these is obvious when the event is derived. If we take the naive approach that a derived event occurs when the system computes it, then we can have several anomalies. Consider the following simple example: there is an auction system; each auction has an auction context time interval in which the auction is valid and people place bids. The auction works on a fairness criterion, which gives preference to people who placed their bid earlier when multiple bidders made the maximal bid. The raw event is the bid request, but the entry to the bid process is a derived event, since the event has to be enriched, validated, and supplemented with some details from the previous bid of the same bidder (if one exists). If we take the time at which the derived event was actually produced as its occurrence time, then we can have some semantic anomalies, as shown in the following figure:

Anomaly 1 (on the right hand side) arises because, although the bid request is made within the auction validity interval, the bid entry occurs after the auction interval ends and will not get into the auction processing.

Anomaly 2 (on the left hand side) arises because the order of the bid requests can be reversed by their corresponding derived events, and thus the outcome of the auction may not be consistent with the auction's rules.

This is just one example, which creates a bias toward a particular solution; however, the reality is even more complicated, since in different cases the answer to the question posed in the title of this posting may not be the same. Thus policies should be used to disambiguate the semantics here.
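Both anomalies can be reproduced in a few lines. The sketch below uses invented bids (the names, amounts, and times are assumptions, with times given in minutes since the auction opened) and compares the outcome when the auction uses the time of the raw bid request with the outcome when it uses the time at which the derived bid entry was produced.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Bid:
    bidder: str
    amount: int
    requested_at: int  # occurrence time of the raw bid request
    entered_at: int    # time at which the derived "bid entry" was produced


AUCTION_END = 120  # the auction is valid during minutes [0, 120]

by_request: Callable[[Bid], int] = lambda b: b.requested_at
by_entry: Callable[[Bid], int] = lambda b: b.entered_at


def accepted(bids: List[Bid], time_of: Callable[[Bid], int]) -> List[str]:
    """Bidders whose bid falls inside the auction validity interval."""
    return [b.bidder for b in bids if time_of(b) <= AUCTION_END]


def winner(bids: List[Bid], time_of: Callable[[Bid], int]) -> str:
    """Fairness rule: among the maximal bids, the earliest one wins."""
    top = max(b.amount for b in bids)
    return min((b for b in bids if b.amount == top), key=time_of).bidder


# Anomaly 1: requested in time, but the derived entry is produced too late.
late = [Bid("carol", 60, requested_at=119, entered_at=121)]
print(accepted(late, by_request), accepted(late, by_entry))

# Anomaly 2: equal bids whose order is reversed by the derivation delay.
tied = [Bid("alice", 50, requested_at=100, entered_at=105),
        Bid("bob", 50, requested_at=101, entered_at=103)]
print(winner(tied, by_request), winner(tied, by_entry))
```

With the request time, Carol's bid is accepted and Alice wins the tie; with the derivation time, Carol's bid is dropped and Bob wins, although Alice asked first.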

I'll have a follow-up posting with discussion about the proposed policies for this case.

Monday, October 12, 2009

On the ingredients of pattern definitions

In the previous posting I started the discussion about the notion of a pattern, and stated that it is a function that selects an event subset. Continuing to drill down on patterns, I'll say a few words about the ingredients of pattern definitions, i.e. the information we need in order to perform pattern matching in a well-defined way:

Pattern type: determines WHAT the pattern matching is looking for. There are a variety of pattern types, and I'll dedicate a posting to the pattern types I have collected so far. Some examples: sequence (temporal pattern), moving north (spatio-temporal pattern), threshold pattern (e.g. related to the average of some value over a set), and more.
Participant set: the set of event types whose instances are candidates for the matching set.
Context: the context with which the pattern is associated (actually, with which the agent executing the pattern is associated).
Pattern assertion: Some patterns have assertions associated with them. An assertion can determine whether events are relevant (e.g. if we are looking at a sequence of two events, say an event of type E1 and an event of type E2, we may require for a match that E1.A > E2.B, where A and B are names of attributes). There are also assertions associated with certain pattern types, e.g. if the pattern type is a threshold pattern then there is an assertion associated with it, such as Average (E1.A) [over each context partition] > 40.
Pattern policy: determines when a matching set is going to be activated, how many times, whether an event can count for more than one pattern, how repeated events are treated, and more.
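The five ingredients map naturally onto a small record type. The following is only a sketch of such a record: the class names, the field names, the policy option values, and the context name "per-customer" are all assumptions chosen for illustration, not the vocabulary of any existing language.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Event = Dict[str, float]  # attribute name -> value, kept deliberately simple


@dataclass
class PatternPolicy:
    # Hypothetical option names, one per question a policy has to answer.
    evaluation: str = "immediate"   # when is a matching set activated?
    cardinality: str = "single"     # how many times?
    reuse: str = "consume"          # can an event count for more than one match?
    repeated: str = "last"          # how are repeated events treated?


@dataclass
class PatternDefinition:
    pattern_type: str                # WHAT we are looking for
    participant_set: List[str]       # event types that may take part
    context: str                     # context the executing agent belongs to
    assertion: Optional[Callable[[Event, Event], bool]] = None
    policy: PatternPolicy = field(default_factory=PatternPolicy)


# The sequence example from the text: E1 followed by E2, with E1.A > E2.B.
sequence = PatternDefinition(
    pattern_type="sequence",
    participant_set=["E1", "E2"],
    context="per-customer",          # invented context name
    assertion=lambda e1, e2: e1["A"] > e2["B"],
)

print(sequence.assertion({"A": 5}, {"B": 3}))
print(sequence.policy)
```

Keeping the policy as a separate object with defaults reflects the idea that a definition should work without any policy being spelled out, and that the policy is tuned only when the default does not fit.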

This was only at the headline level; in the next postings I'll provide more information about pattern types and pattern policies.

Monday, October 20, 2008

More on the semantics of synonyms

Still the lazy days of the holiday week. I took advantage of the time to help my mother, who decided that she wants to leave home and move to a seniors' residence located a 10-minute walk from my house; this requires dealing with many details, so that is what I was doing in the last three days. In the picture above (taken from the residence site) you can see the front entrance and the view seen from the residence tower; on the right hand side of the upper picture one can see part of the neighborhood we are living in (Ramat Begin), surrounded by pine trees all over.

Now, holiday eve again, and this is a good time to visit the blog again. Last time I started the discussion on the semantics of synonyms by posing a simple example of conjunction over a bounded time interval (the same pattern that Hans Glide referred to in his blog), which is slightly different from the "temporal sequence" pattern.

In the previous posting I posed the following example:

Detect a pattern that consists of a conjunction of two events (order is not important): E1, E2.
E1 has two attributes = {N, A}; E2 also has two attributes = {N, B}; the pattern matching is partitioned according to the value of N (I'll write about context partitions another time).
For each detection, create a derived event E3 which includes two attributes = {N, C}; the values of E3 are derived as: E3.N := E1.N; E3.C := E1.A * E2.B.

Let's also assume that the relevant temporal context is time-stamps = [1, 5] - and the events of types E1 and E2 that arrived during this period are displayed in the table below:

The question is: how many instances of event E3 are going to be created, and what will be the values of their attributes?

Looking at this example, for N = 2 there is exactly one pair that matches the pattern: E1, which occurs at timestamp 5, and E2, which occurs at timestamp 4, so E3 will have the attributes {N = 2, C = 24}. However, for N = 1 things are more complicated. If we take the set-oriented approach that looks at it as a "join" (Cartesian product), then since we have three instances of E1 and two instances of E2, we'll get six instances of E3 with all combinations. In some cases we may be interested in all combinations, but typically in event processing we are looking for a match and not for a join; that is the difference between "event-at-a-time" patterns and the "set-at-a-time" patterns used by some of the stream processing semantics.

So what is the correct answer? There is no single correct answer, so what is needed is to fine-tune the semantics using policies. For those who hard-code event processing, or use imperative event processing languages, this entire issue seems a non-issue, since when they develop the code for a particular case they also (implicitly) build the semantics that case requires. However, policies are required when using higher-level languages (descriptive, declarative, visual, etc.): they bridge between the fact that the semantics is built into the higher-level abstraction and the need to fine-tune that semantics in some cases. In our case we can have several types of policies:

Policies based on order of events - example:

For E1 - select the first instance; for E2 - select the last instance.
For E1 - select the last instance; for E2 - select the last instance.

Policies based on values - example:

For E1 - select the highest 2 instances for the value of A; for E2 - select the lowest instance for the value of B.

These are examples only -- it is also important to select a reasonable default which satisfies the "typical case", so if the semantics fits this default, no more action is needed.
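The effect of such policies can be sketched in a few lines. The event instances below are invented: they only respect what the text states (three instances of E1 and two of E2 for N = 1, and for N = 2 one E1 at timestamp 5 and one E2 at timestamp 4 whose product is 24), while the individual values of A and B are assumptions. The selector names (`first`, `last`, `highest`, `lowest`) and the `derive` function are hypothetical as well. With no selectors, `derive` falls back to the Cartesian "join".

```python
from itertools import product

# Invented instances inside the temporal context [1, 5].
E1 = [{"ts": 1, "N": 1, "A": 5}, {"ts": 2, "N": 1, "A": 2},
      {"ts": 3, "N": 1, "A": 3}, {"ts": 5, "N": 2, "A": 4}]
E2 = [{"ts": 2, "N": 1, "B": 10}, {"ts": 4, "N": 1, "B": 7},
      {"ts": 4, "N": 2, "B": 6}]


def partition(events, n):
    """Instances belonging to context partition N = n, in time order."""
    return sorted((e for e in events if e["N"] == n), key=lambda e: e["ts"])


# Order-based selectors (they receive events already in time order).
def first(events, k=1):
    return events[:k]


def last(events, k=1):
    return events[-k:]


# Value-based selectors: build a selector for a given attribute.
def highest(attr, k=1):
    return lambda events: sorted(events, key=lambda e: e[attr], reverse=True)[:k]


def lowest(attr, k=1):
    return lambda events: sorted(events, key=lambda e: e[attr])[:k]


def derive(n, select_e1=None, select_e2=None):
    """Derive E3 instances for partition n; no selector keeps every instance."""
    e1, e2 = partition(E1, n), partition(E2, n)
    if select_e1:
        e1 = select_e1(e1)
    if select_e2:
        e2 = select_e2(e2)
    return [{"N": n, "C": a["A"] * b["B"]} for a, b in product(e1, e2)]


print(len(derive(1)))                          # "join": all combinations
print(derive(2))                               # the single pair for N = 2
print(derive(1, first, last))                  # first E1, last E2
print(derive(1, last, last))                   # last E1, last E2
print(derive(1, highest("A", 2), lowest("B"))) # value-based policy
```

The same pattern definition yields six, one, or two derived events for N = 1 depending only on the policy, which is exactly why the policy has to be stated rather than left implicit.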

In one of the next postings I'll deal with identifying the set of policies required in order to make the semantics precise.
