Monday, June 13, 2011

On the boundaries of windows

Today I heard rain knocking on the window. Rain in June is quite a rare event in Haifa, but it has happened before. When I lived in the USA it was always peculiar to me to watch heavy rains and thunderstorms in the summer.

Speaking of windows: a window in event processing is an important concept, designating temporal context. We have discussed this issue at length in chapter 7 of the EPIA book. Windows can be isolated or sliding. Isolated windows can start by event or at a fixed time, end by event or at a fixed time, or expire after some time offset.
Sliding windows can slide by time, or by event count, and can be overlapping or non-overlapping.  

In any of these variations, every single window is a time interval. The question is what type of interval it is: open or closed at each end.

In the book we mention two types of intervals:

For most types of window we used the half-open interval: if the interval boundaries are denoted by Ts and Te (for start and end), then an event whose time-stamp is T belongs to the window if Ts ≤ T < Te. That is, events that occur at the interval's starting point are included, and those at its ending point are not. This guarantees, for example, that in a non-overlapping sliding time window an event belongs to exactly one window instance.
For the sliding event window we used the closed interval semantics Ts ≤ T ≤ Te. The rationale is that if the sliding window has a count of five events, we typically mean that all five of those events belong to this window.
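
The two membership rules can be sketched in a few lines of Python. This is only an illustration: the function names `in_half_open` and `in_closed` and the integer time-stamps are assumptions made for the example, not notation from the book.

```python
def in_half_open(ts, te, t):
    """Half-open semantics: Ts <= T < Te."""
    return ts <= t < te


def in_closed(ts, te, t):
    """Closed semantics: Ts <= T <= Te."""
    return ts <= t <= te


# Two adjacent, non-overlapping time windows that share the boundary 10.
windows = [(0, 10), (10, 20)]
boundary_event = 10  # time-stamp that falls exactly on the shared boundary

half_open_hits = [w for w in windows if in_half_open(w[0], w[1], boundary_event)]
closed_hits = [w for w in windows if in_closed(w[0], w[1], boundary_event)]

print(half_open_hits)  # [(10, 20)]
print(closed_hits)     # [(0, 10), (10, 20)]
```

With the half-open rule the boundary event lands in exactly one window instance; with the closed rule it would be counted in both.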

Some comments here: I have heard the opinion that this is not an issue, since there are systems that create a total order of events, serializing them by having a single process that assigns time-stamps. This can be valid for some applications, but it is not valid in the general case, for two reasons:

  1. The event that starts or ends the window can itself participate in some EPAs that are active in this window, thus a decision is needed whether it participates or not.
  2. In various applications the applicable time-stamp is the occurrence time of the event as reported by its source, and not the detection time assigned by the system, thus several events can occur at the same time-point. Furthermore, even with assigned detection time, distributed systems may have multiple entry points, and ensuring total order may not be cost effective.
While in the book we mentioned two of the four possibilities for time intervals, one can think of cases in which the other two may be useful, which indicates that such semantics might be required to be configurable. 
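
As a rough sketch of what "configurable" could look like, the four possibilities can be expressed as one parameter of the membership test. The `Boundary` enum and the `belongs` function are hypothetical names invented for this example; they do not come from the book or from any particular product.

```python
from enum import Enum


class Boundary(Enum):
    """The four possible interval semantics (hypothetical names)."""
    CLOSED_CLOSED = "[Ts, Te]"
    CLOSED_OPEN = "[Ts, Te)"   # the half-open window used for most window types
    OPEN_CLOSED = "(Ts, Te]"
    OPEN_OPEN = "(Ts, Te)"


def belongs(ts, te, t, boundary):
    """Return True if time-stamp t is inside the window under the given semantics."""
    include_start = boundary in (Boundary.CLOSED_CLOSED, Boundary.CLOSED_OPEN)
    include_end = boundary in (Boundary.CLOSED_CLOSED, Boundary.OPEN_CLOSED)
    after_start = ts <= t if include_start else ts < t
    before_end = t <= te if include_end else t < te
    return after_start and before_end


# Does the event that starts the window (T == Ts) participate in it?
starts_included = {b.name: belongs(0, 10, 0, b) for b in Boundary}
# Does the event that ends the window (T == Te) participate in it?
ends_included = {b.name: belongs(0, 10, 10, b) for b in Boundary}
```

Here the answer to "does the initiating or terminating event participate?" becomes a property of the window definition rather than something hard-wired into the engine.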


Ronen Vaisenberg said...

It is particularly annoying when a rain event happens to occur when your window is open :-)

Anonymous said...

What do you say about those kinds of windows that contain events based on the state of entities? For example, events that come from all green trucks in Haifa?

Are those windows or is it another type of concept?

Opher Etzion said...

Hi Marco. Any event processing agent (rule in your terminology) lives within a context. The context can have several dimensions: the window relates to the temporal dimension, and the green trucks are a segmentation dimension. A context can be a combination of several dimensions; e.g., you may want to find some patterns on trucks by sliding window and color, so the context instance is a combination of the two dimensions. See our tutorial in DEBS'10 on contexts for a more thorough discussion:



Anonymous said...

Is the second interval type you mentioned really necessary? I suppose, you could always use the first type to model the second one, e.g. if the window is to contain 5 events, the arrival of the 6th event "closes" the window. Then, only one window semantics would be needed, simplifying the description of processing behavior.

Opher Etzion said...

To anonymous (I would prefer that people identify themselves when posting comments)

One can indeed define an event-based sliding window with the half-open semantics; however, there are two problems with that.

1. The 6th event can arrive a long time later (say 2 hours), which means that to do the derivation/action associated with the window end, one needs to wait for the 6th event to happen (there are indeed some systems that work this way, and people work around it by creating dummy events).
2. When people define an event interval of 5 events, intuitively they mean that it includes all 5 events; defining an interval of 6 events with the semantics that the last of them does not count is less intuitive.
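
A minimal sketch of the first problem, assuming non-overlapping count windows for simplicity. The generator names below are hypothetical and do not describe any particular system.

```python
def closed_count_windows(events, size):
    """Closed semantics: emit a window as soon as its size-th event arrives."""
    window = []
    for event in events:
        window.append(event)
        if len(window) == size:
            yield list(window)
            window = []


def half_open_count_windows(events, size):
    """Half-open emulation: a window is emitted only when the next event closes it."""
    window = []
    for event in events:
        if len(window) == size:
            yield list(window)  # the closing event is not part of this window
            window = []
        window.append(event)


five = ["e1", "e2", "e3", "e4", "e5"]

closed_result = list(closed_count_windows(five, 5))
half_open_result = list(half_open_count_windows(five, 5))
half_open_after_sixth = list(half_open_count_windows(five + ["e6"], 5))

print(closed_result)          # [['e1', 'e2', 'e3', 'e4', 'e5']]
print(half_open_result)       # []
print(half_open_after_sixth)  # [['e1', 'e2', 'e3', 'e4', 'e5']]
```

After five events the closed version has already produced its window, while the half-open emulation produces nothing until e6 shows up, however long that takes.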