Today I heard rain knocking on the window. Rain in June is quite a rare event in Haifa, but it has happened before. When I lived in the USA it was always peculiar to me to watch heavy rains and thunderstorms in the summer.
Speaking of windows: in event processing, a window is an important concept, designating temporal context. We discussed this issue at length in chapter 7 of the EPIA book. Windows can be isolated or sliding. Isolated windows can start by event or at a fixed time, end by event or at a fixed time, or expire after some time offset.
Sliding windows can slide by time, or by event count, and can be overlapping or non-overlapping.
In any of these variations, every single window is a time interval. The question is what type of interval it is: open or closed at each of its two ends.
In the book we mention two types of intervals:
For most types of window we used the half-open interval: if the interval boundaries are denoted by Ts and Te (for start and end), then an event whose time-stamp is T belongs to the window if Ts ≤ T < Te. In other words, events that occur at the interval's starting point are included, and those that occur at its ending point are not. This guarantees, for example, that in a non-overlapping sliding time window an event belongs to exactly one window instance.
For the sliding event window we used the closed interval semantics, Ts ≤ T ≤ Te. The rationale is that if the sliding window has a count of five events, we typically mean that all of those five events belong to this window.
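To make the difference concrete, here is a minimal Python sketch of the two membership tests applied to consecutive non-overlapping time windows. The function names, the window length of 10, and the numeric time-stamps are illustrative assumptions of mine and do not come from the book.

```python
def half_open_contains(ts, te, t):
    """Half-open semantics: Ts <= T < Te."""
    return ts <= t < te


def closed_contains(ts, te, t):
    """Closed semantics: Ts <= T <= Te."""
    return ts <= t <= te


def window_bounds(size, count):
    """(Ts, Te) pairs of `count` consecutive non-overlapping windows of length `size`."""
    return [(i * size, (i + 1) * size) for i in range(count)]


def windows_containing(t, bounds, contains):
    """All windows that an event with time-stamp `t` belongs to."""
    return [(ts, te) for ts, te in bounds if contains(ts, te, t)]


# Hypothetical setup: three windows covering 0..10, 10..20 and 20..30.
bounds = window_bounds(size=10, count=3)

# An event that occurs exactly on the boundary between the first two windows.
half_open_hits = windows_containing(10, bounds, half_open_contains)
closed_hits = windows_containing(10, bounds, closed_contains)

print(half_open_hits)  # [(10, 20)]
print(closed_hits)     # [(0, 10), (10, 20)]
```

With the half-open test the boundary event lands in exactly one window instance, while the closed test places it in two adjacent ones.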
Some comments here: I have heard the opinion that this is not an issue, since there are systems that create a total order of events, serializing them through a single process that assigns time-stamps. This can be valid for some applications, but it is not valid in the general case, for two reasons:
- The event that starts or ends the window can itself participate in some EPAs that are active in this window, thus a decision is needed whether it participates or not.
- In various applications the applicable time-stamp is the occurrence time of the event as reported by its source, and not the detection time assigned by the system, thus several events can occur at the same time-point. Furthermore, even when the detection time is assigned by the system, in distributed systems there may be multiple entry points, and ensuring total order may not be cost-effective.
While in the book we mentioned two of the four possibilities for time intervals, one can think of cases in which the other two may be useful, which indicates that such semantics might be required to be configurable.
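As a rough illustration of what configurable semantics could look like, the following Python sketch treats the interval type as a parameter of the membership test. The `Bounds` enum, the `in_window` function and the choice of default are hypothetical names and decisions of mine, not an API from the book or from any particular product.

```python
from enum import Enum


class Bounds(Enum):
    """The four possible interval types for a window (hypothetical naming)."""
    CLOSED = "[Ts, Te]"
    HALF_OPEN_RIGHT = "[Ts, Te)"
    HALF_OPEN_LEFT = "(Ts, Te]"
    OPEN = "(Ts, Te)"


def in_window(t, ts, te, bounds=Bounds.HALF_OPEN_RIGHT):
    """Return True if time-stamp `t` belongs to the window under the given semantics."""
    # The start point is included only when the interval is closed on the left.
    if bounds in (Bounds.CLOSED, Bounds.HALF_OPEN_RIGHT):
        lower_ok = ts <= t
    else:
        lower_ok = ts < t
    # The end point is included only when the interval is closed on the right.
    if bounds in (Bounds.CLOSED, Bounds.HALF_OPEN_LEFT):
        upper_ok = t <= te
    else:
        upper_ok = t < te
    return lower_ok and upper_ok


# An event exactly at the window's end point, under two of the four semantics.
print(in_window(10, 5, 10))                          # False
print(in_window(10, 5, 10, Bounds.HALF_OPEN_LEFT))   # True
```

The default here mirrors the half-open choice used for most window types; a sliding event window would pass `Bounds.CLOSED` instead.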