Tuesday, June 14, 2011

More on the two types of sliding windows

In a comment on my previous postings on window boundaries, I was asked why we need two types of interval semantics: half-closed for the time-oriented sliding window, and closed for the event-oriented sliding window (the first counts time, the second counts the number of events of certain types).

The question is: why can't we use the half-closed interval semantics for the event-oriented sliding window as well? Let's say we have a sliding window that counts 5 events of one type; the 6th event, which serves as the starting point of the next window, would also terminate the previous window.

The answer: this solution is not really equivalent to closing the window on the 5th event.

Let's take an example:

Instance 1 occurs at 10:02
Instance 2 occurs at 10:03
Instance 3 occurs at 10:13
Instance 4 occurs at 10:14
Instance 5 occurs at 10:17
Instance 6 occurs at 11:01

According to the closed interval semantics, the interval is [10:02, 10:17]. According to the half-closed interval semantics, where the window ends on the 6th instance, the interval is [10:02, 11:01). This means that an event that occurs at 10:35 does not belong to the window according to the first interpretation, but does belong to it according to the second interpretation.
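The difference can be checked with a minimal Python sketch. The function names, the `instances` list and the `probe` variable are made up for this illustration and are not taken from any product; the sketch also assumes that the 10:35 event is not one of the counted instances.

```python
from datetime import time

# Arrival times of the six counted instances from the example above.
instances = [time(10, 2), time(10, 3), time(10, 13),
             time(10, 14), time(10, 17), time(11, 1)]

def in_closed_window(t, arrivals, n=5):
    """Closed semantics: [first arrival, n-th arrival]."""
    return arrivals[0] <= t <= arrivals[n - 1]

def in_half_closed_window(t, arrivals, n=5):
    """Half-closed semantics: [first arrival, (n+1)-th arrival)."""
    return arrivals[0] <= t < arrivals[n]

# Hypothetical event at 10:35 that is not one of the counted instances.
probe = time(10, 35)

print(in_closed_window(probe, instances))       # False: 10:35 is after 10:17
print(in_half_closed_window(probe, instances))  # True: 10:35 is before 11:01
```

The two windows contain the same five counted instances; they differ only in what happens between the 5th and the 6th instance.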

Furthermore, if some derived events are emitted or some action is triggered at the end of the window, this will now occur at 11:01 and not at 10:17, which again may create other problems.

In some applications the distance between events is very small: the assumption is that the events of the types that bound the windows are very dense, so the distinction between the two semantics becomes marginal. However, this is not the general case. In many applications the distance between the 5th and 6th instances may be quite substantial.

This reminds me that in the course I taught, the students implemented projects using various products available on the market today. One of the teams (I will not disclose the product name) wrote in its report that the window is indeed closed only when the next event arrives, so when they debugged their system they added a dummy event; otherwise the window would never close.
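The behavior the students ran into can be reproduced with a small sketch. It assumes a non-overlapping count window of 5 and uses "HH:MM" strings as timestamps; `close_on_count` and `close_on_next` are hypothetical names and do not model any particular product.

```python
def close_on_count(events, n=5):
    """Closed semantics: the window is emitted as soon as the n-th event arrives."""
    window, closed = [], []
    for ts in events:
        window.append(ts)
        if len(window) == n:
            closed.append((ts, window))  # (emission time, window contents)
            window = []
    return closed, window  # second value: events still waiting in an open window

def close_on_next(events, n=5):
    """Half-closed semantics: a full window is emitted only when the next event arrives."""
    window, closed = [], []
    for ts in events:
        if len(window) == n:
            closed.append((ts, window))  # emitted at the arrival time of the next event
            window = []
        window.append(ts)                # the arriving event starts the next window
    return closed, window

five = ["10:02", "10:03", "10:13", "10:14", "10:17"]

print(close_on_count(five))              # one window, emitted at 10:17
print(close_on_next(five))               # nothing emitted, five events still pending
print(close_on_next(five + ["11:01"]))   # the window is emitted only at 11:01
```

In this sketch the extra event that closes the window also becomes the first event of the next window, so a dummy event added for debugging ends up being counted there.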

More window-related discussion later.


Marco said...

Dear Opher,

thanks for replying to my comment on your previous post. I am, however, not completely convinced that the first point you are making is valid.

If there was an event at 10:35, the window would already close at the occurrence of this event. Using the same course of argument, I could say that an event occurring at 10:15 would be included in your closed interval semantics, therefore the window would contain more than 5 events.

Regarding the point in time at which events are generated, you are completely right. It differs between the two window semantics, and this difference might be a major one, depending on the application type.

So, thanks for your clarifications!

Warm Regards

Opher Etzion said...

Hi Marco.

I'll write a follow-up posting to answer your question. The key point is that events may have various distinct roles.