Thursday, June 16, 2011

On different roles of events in context

Continuing the series of posts on sliding windows,  following some questions and comments.

An event can have multiple roles: it can participate in determining the context boundaries (the start and end of a window), in an event processing function (e.g. aggregation or pattern matching), or in both.

Let's look at the following example (which I thought of while driving to work today - not a long thought, since I live 3.7 km from the office).

Consider a sliding window of every 100 cars that pass through some  point in the road (assuming there is a sensor that creates for every passing car an event with the velocity of the car).

Assume also that there is a traffic light for pedestrians that is activated when a person presses a button, and that an event is created each time this traffic light turns green for the pedestrians.

Our application consists of two aggregation EPAs that derive the following events:

1. Create a derived event with the average velocity per window.
2. Create a derived event with the count of pedestrian crossings per window.

Now, recall that the window is a non-overlapping sliding window that counts 100 passing cars.

For derivation 1: The "car passing event" has three roles: an instance of this event initiates the window, an instance of this event terminates the window, and the aggregation consumes instances of the same event. The boundary decision determines whether the 1st and the 100th events are included in the aggregation function. Here intuition says that the answer is positive for both, so it makes sense to use closed interval semantics.
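Derivation 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `average_velocity_per_window` and the plain list of velocities are hypothetical stand-ins for the sensor's event stream, and a trailing window that has not yet seen 100 cars is assumed to derive nothing.

```python
def average_velocity_per_window(velocities, size=100):
    """Average velocity for each complete, non-overlapping window of `size` cars.

    Closed interval semantics: both the car that opens the window (the 1st)
    and the car that closes it (the size-th) are included in the average.
    """
    averages = []
    # Step through the stream one full window at a time; an incomplete
    # trailing window is skipped because it has not been terminated yet.
    for start in range(0, len(velocities) - size + 1, size):
        window = velocities[start:start + size]
        averages.append(sum(window) / size)
    return averages


# Hypothetical data: 250 cars with velocities 1, 2, ..., 250.
velocities = list(range(1, 251))
print(average_velocity_per_window(velocities))  # [50.5, 150.5]
```

Under open interval semantics the slice would instead drop the first or the last car of each window, and the averages would change accordingly.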

Now going to derivation 2. Here the event that participates in the derivation, the pedestrian crossing, is different from the events that determine the boundaries of the window.

Assume that the 1st instance of "car passing event" happens at 1:00, the 100th instance at 3:45, and the 101st instance at 4:15 (not many cars are driving at that hour).

In this case, a pedestrian crossing that occurs at 3:42 is counted in this window, a pedestrian crossing that occurs at 4:18 is counted in the next window, and a pedestrian crossing that occurs at 4:01 is not counted in any window, since the semantics of the sliding window does not require the windows to cover all time points.
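The timeline above can be replayed with a small Python sketch. The helper names (`t`, `count_windows`, `count_crossings`) are hypothetical, the clock times are assumed to be hours and minutes, and the evenly spaced arrivals of cars 2 to 99 are made up purely to fill in the window; only the 1st, 100th and 101st arrival times come from the example.

```python
def t(hours, minutes):
    """Clock time as seconds since midnight (assumes the times are h:mm)."""
    return hours * 3600 + minutes * 60


def count_windows(car_times, size=100):
    """Non-overlapping count windows as (start, end) pairs.

    A window opens at its 1st car and closes at its size-th car.
    A trailing window that has not yet seen `size` cars has end=None.
    """
    windows = []
    for i in range(0, len(car_times), size):
        chunk = car_times[i:i + size]
        end = chunk[-1] if len(chunk) == size else None
        windows.append((chunk[0], end))
    return windows


def count_crossings(windows, crossing_times):
    """Count crossings per window (closed interval); return the rest as gaps."""
    counts = [0] * len(windows)
    in_gap = []
    for ct in crossing_times:
        for k, (start, end) in enumerate(windows):
            if start <= ct and (end is None or ct <= end):
                counts[k] += 1
                break
        else:
            # No window was open at this time point.
            in_gap.append(ct)
    return counts, in_gap


# Assumed arrivals: cars 1..100 evenly spaced from 1:00 to 3:45, car 101 at 4:15.
car_times = [t(1, 0) + i * 100 for i in range(100)] + [t(4, 15)]
crossings = [t(3, 42), t(4, 1), t(4, 18)]

windows = count_windows(car_times)
counts, in_gap = count_crossings(windows, crossings)
print(counts)                # [1, 1]
print(in_gap == [t(4, 1)])   # True: the 4:01 crossing falls between windows
```

The gap exists because the next window is opened by the 101st car rather than by the closing of the previous window.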

This semantics is OK if that is what we meant; if not, we have to use different semantics. More - later.


Roman said...

Intuitively I would have interpreted the "sliding" notion as overlapping windows, with each car starting a new window. What would you call such a window, with a size of 100 elements and a "hop" of 1 element?

Opher Etzion said...

Hi Roman. A sliding window can be either overlapping or non-overlapping; it is defined by two parameters: the size of the window, and the periodicity (or frequency, or hop as you call it). In your example the size is 100 and the periodicity is 1, which means that there are 100 concurrent instances of this window. With a non-overlapping window (the original example in this posting) there is at most one window active at any point in time. Both types are useful (and are being used) for different cases.
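To make the two parameters concrete, here is a minimal Python sketch. It assumes that windows are opened at event indices 0, hop, 2*hop, ... and that each one spans `size` consecutive events; the function name `window_starts_covering` is hypothetical.

```python
def window_starts_covering(index, size, hop):
    """Start indices of all count-based windows that contain event `index`.

    A window starting at s covers events s .. s + size - 1, so it contains
    `index` exactly when index - size + 1 <= s <= index.
    """
    earliest = max(0, index - size + 1)
    # Round `earliest` up to the next multiple of `hop`.
    first_start = -(-earliest // hop) * hop
    return list(range(first_start, index + 1, hop))


# Size 100, hop 1: once the stream is running, every car is in 100 windows.
print(len(window_starts_covering(500, size=100, hop=1)))    # 100

# Size 100, hop 100 (non-overlapping): every car is in exactly one window.
print(window_starts_covering(499, size=100, hop=100))       # [400]
print(window_starts_covering(500, size=100, hop=100))       # [500]
```

A hop larger than the size would leave some events in no window at all.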