Showing posts with label temporal semantics.

Saturday, September 25, 2010

On the duration of an event

I have neglected blogging for a while: I returned from my trip to Asia, I am planning my next business trip to the USA (I am travelling too much; I hope for a non-travelling period after that, but one never knows), and I also took some days off for the Succot holiday. Yesterday I traveled with most of my family to Tel-Aviv, to "Beit Hatfutsot", which means "Diaspora House" and documents the life of Jewish communities in many countries throughout history. Here is an artifact from the exhibition:

There was also an exhibition of Andy Warhol paintings of notable Jewish people; one of the pictures is of Golda Meir, the only Israeli woman prime minister so far (time for a second one?).
The VLDB conference has also uploaded pictures from the conference, so here are two: one from my tutorial, and one showing me in the first row (it was not really the first row, but it was the first one captured by the camera) listening to the keynote talk:


While I was away, Paul Vincent wrote some blog posts that are worth focusing on. I have already commented briefly on this one, but I want to give a longer reaction to the issue of event duration that Paul raised.


In most models, events are considered instantaneous, occurring at a single time point. The 1998 temporal database glossary includes "instantaneous" as part of the definition of an event; the rationale is to look at an event as a transition between two states, and in most models a transition takes zero time. A few years ago, when we started the discussions about terms, I pointed to the temporal glossary as a source for the event definition, and David Luckham strongly objected to that definition, claiming that no event is really instantaneous: even simple events like "the aircraft is landing" take more than zero time, while events that are composed of other events ("complex events"), like the 1929 crisis (now we can talk about the 2008 crisis), are composed of many events and occur over an interval.


This is, of course, true; yet from a computational point of view it is more convenient to deal with discrete time points than with intervals. Furthermore, some systems use detection time semantics, looking at the timestamp at which the event entered the system rather than the time at which it occurred. This is why we find time point semantics in most systems.


We can look at the following cases:



  1.  The event really occurs at a time point, e.g. a time series of sensor measurements, or stock quotes. There is indeed an interval between two successive events, but it relates to the state between the events and not to the events themselves.
  2. The event occurs over an interval, but the granularity of our time computation is coarser than the interval, so we can approximate the interval by a time point. Example: the granularity we are interested in is an hour, so even if an event spans several minutes, we can still approximate it to the closest hour.
  3. The event occurs over an interval, and it is important to process it with interval semantics, since we would like to see its relationship to another time interval (e.g. a temporal context).
  4. The event occurs at an unknown time point that is bounded by an interval; there is some probability (e.g. a uniform distribution) that it happened at any point of time within the interval. At VLDB there was a paper by Yanlei Diao and her students entitled "Recognizing Patterns in Streams with Imprecise Timestamps". Note that this paper also contains some references to interval-based semantics (of type 3).
  5. Derived events are another type of event whose temporal semantics may be tuned. For example, the derived event "frustrated customer" is derived when a customer approaches a call center for the third time about the same topic. The question is whether the customer is frustrated only when approaching for the third time, or over the whole time from the frustrating event until it is fixed. Furthermore, a derived event may also indicate an event that will happen in a future interval. I'll write more about this issue in the future.
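To make the difference between cases 1, 2, and 3 concrete, here is a minimal Python sketch. It assumes an occurrence time represented as a closed interval (a time point being an interval whose start equals its end); the names `Occurrence`, `to_hour_point`, `during`, and `overlaps` are hypothetical, and the midpoint-then-round rule used for case 2 is just one possible approximation policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Occurrence:
    """Occurrence time as a closed interval; a time point is start == end (case 1)."""
    start: datetime
    end: datetime

    @property
    def is_point(self) -> bool:
        return self.start == self.end

def to_hour_point(occ: Occurrence) -> datetime:
    """Case 2: approximate the interval by a time point at hour granularity.
    Assumption: use the interval's midpoint, rounded to the closest hour."""
    mid = occ.start + (occ.end - occ.start) / 2
    floored = mid.replace(minute=0, second=0, microsecond=0)
    if mid - floored >= timedelta(minutes=30):
        return floored + timedelta(hours=1)
    return floored

def during(occ: Occurrence, ctx: Occurrence) -> bool:
    """Case 3: the whole event lies inside the context interval."""
    return ctx.start <= occ.start and occ.end <= ctx.end

def overlaps(occ: Occurrence, ctx: Occurrence) -> bool:
    """Case 3: the event and the context interval share at least one instant."""
    return occ.start <= ctx.end and ctx.start <= occ.end

# Hypothetical example: an aircraft landing from 09:50 to 10:05,
# and an hourly temporal context 10:00-11:00.
landing = Occurrence(datetime(2010, 9, 25, 9, 50), datetime(2010, 9, 25, 10, 5))
context = Occurrence(datetime(2010, 9, 25, 10), datetime(2010, 9, 25, 11))

approx = to_hour_point(landing)             # 10:00, a point inside the context
fully_inside = during(landing, context)     # False: the landing started at 09:50
partly_inside = overlaps(landing, context)  # True
```

With hour granularity, the point approximation places the landing inside the 10:00 context, while interval semantics shows that it only partly overlaps it; which answer is appropriate depends on the application, which is exactly the line between case 2 and case 3.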

Bottom line: the event processing systems of the next generation should support both time point and interval semantics, along with uncertainties (Paul also had a posting about "fuzzy patterns", which I'll write about in the future).
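For case 4, one basic question an order-sensitive pattern has to answer is: how likely is it that event A occurred before event B when both timestamps are only known to lie within intervals? The sketch below assumes independent uniform distributions over each interval and integrates numerically with the midpoint rule. It only illustrates the question; it is not the method of the Diao et al. paper, and the names `Interval`, `prob_after`, and `prob_before` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Bounds on an imprecise timestamp; start == end means a precise time point."""
    start: float
    end: float

def prob_after(x: float, y: Interval) -> float:
    """P(Y > x) when Y is uniformly distributed over y."""
    if y.start == y.end:
        return 1.0 if x < y.start else 0.0
    clamped = min(max(x, y.start), y.end)
    return (y.end - clamped) / (y.end - y.start)

def prob_before(a: Interval, b: Interval, steps: int = 10_000) -> float:
    """P(A occurred before B), assuming A and B are independent and each is
    uniform over its own interval. Integrates P(B > x) over A's interval
    using the midpoint rule."""
    if a.start == a.end:
        return prob_after(a.start, b)
    width = (a.end - a.start) / steps
    total = sum(prob_after(a.start + (i + 0.5) * width, b) for i in range(steps))
    return total / steps

p_disjoint = prob_before(Interval(0, 1), Interval(2, 3))  # 1.0: A is surely first
p_same = prob_before(Interval(0, 1), Interval(0, 1))      # about 0.5
p_overlap = prob_before(Interval(0, 2), Interval(1, 3))   # about 0.875
```

Only when the intervals are disjoint does the order become certain; with overlapping intervals a pattern such as "A followed by B" can only be matched with some probability.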


Wednesday, March 10, 2010

Revisiting race conditions with the FFD example

In the past I have written about race conditions, and this triggered some responses. We recently realized that the example we created for the EPIA book (the Fast Flower Delivery application, which already has around ten different implementations; six of them can be viewed on the book's webpage, and more will be added) contains a case that, if not handled carefully, may yield wrong results due to race conditions. Here is the case:

There is an aggregation EPA per driver and day that collects assignment events for a driver and, at the end of the day, creates a derived event that counts the number of assignments for that driver. A second EPA, per day, collects the counts of all drivers for that day and calculates the mean and standard deviation of the number of assignments per active driver on that day. A third EPA, again per driver and day, gets the derived events from the first two EPAs and calculates, for each driver, its deviation from the mean in standard deviation units. These three EPAs are all aggregation EPAs with some order among them; so far, no problem. The issue is that all these calculations occur at the end of the day and have causal dependencies. If we are not careful, the first EPA calculates the count per driver at the end of the day, but by the time it finishes the calculation the time is, say, 00:01, so the result is classified to the next day. Yet it is required for calculating the statistics of this day, and if it gets into the statistics of the next day, we get an inconsistency in the system. Obviously, a naive implementation will get wrong results here. There are various ways to handle this and ensure correctness; however, the main issue is whether the developer needs to be aware of it while designing the application, or whether the compiler that takes the definitions of these EPAs and creates the actual implementation should do the job. My opinion is that if developers have to take care of such things by hard coding, life will be quite difficult, as this is only one case of race condition; it is better for it to be transparent to the developer. This way we can have our cake and eat it too: use a high-level tool that makes programming easier and lowers the total cost of ownership, and still fine-tune the semantics in a way that typically requires dedicated, and even complicated, programming.
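A minimal Python sketch of the three EPAs may help to see where the race bites. It assumes hypothetical names (`DriverCount`, `count_per_driver`, `daily_statistics`, `deviations`) and uses the population standard deviation; it is an illustration, not the book's FFD implementation. The point is that the derived count event carries the daily context it was computed for, and the second EPA classifies by that context rather than by the wall-clock time at which the event was emitted.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class DriverCount:
    driver: str
    count: int
    context_day: date      # the daily context the count was computed for
    computed_at: datetime  # wall-clock time at which EPA 1 emitted the event

def count_per_driver(assignments: List[Tuple[str, datetime]], day: date,
                     computed_at: datetime) -> List[DriverCount]:
    """EPA 1 (per driver and day): count the day's assignments for each driver."""
    counts = Counter(driver for driver, ts in assignments if ts.date() == day)
    return [DriverCount(d, n, day, computed_at) for d, n in counts.items()]

def daily_statistics(counts: List[DriverCount], day: date,
                     naive: bool = False) -> Optional[Tuple[float, float]]:
    """EPA 2 (per day): mean and population standard deviation of the counts.

    naive=True classifies each derived event by the wall-clock time at which
    it was emitted, which reproduces the race: counts emitted at 00:01 fall
    into the next day's context."""
    values = [c.count for c in counts
              if (c.computed_at.date() if naive else c.context_day) == day]
    if not values:
        return None
    return mean(values), pstdev(values)

def deviations(counts: List[DriverCount],
               stats: Tuple[float, float]) -> Dict[str, float]:
    """EPA 3 (per driver and day): deviation from the mean in std-dev units."""
    avg, sd = stats
    return {c.driver: (c.count - avg) / sd if sd else 0.0 for c in counts}

# Hypothetical data: three drivers on March 10, 2010.
day = date(2010, 3, 10)
assignments = ([("alice", datetime(2010, 3, 10, h)) for h in (9, 12, 15)]
               + [("bob", datetime(2010, 3, 10, 10))]
               + [("carol", datetime(2010, 3, 10, h)) for h in (11, 14)])
# EPA 1 finishes its end-of-day computation one minute after midnight.
counts = count_per_driver(assignments, day, computed_at=datetime(2010, 3, 11, 0, 1))
naive_stats = daily_statistics(counts, day, naive=True)  # None: counts "moved" to March 11
stats = daily_statistics(counts, day)                     # mean 2, std dev about 0.816
devs = deviations(counts, stats)
```

Stamping derived events with their context, instead of with the time they were computed, is the kind of decision that a compiler for the EPA definitions could make on the developer's behalf.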
More about other aspects of semantic fine tuning - later.

Saturday, November 14, 2009

When does a derived event actually happen? - (posting I)



I just finished reading the book "Flash Forward" by Robert Sawyer. Science fiction has always been my favorite type of literature, and my favorite writers are Asimov and Heinlein. Some science fiction writers of the following generation stand out, and the Canadian writer Sawyer, who never forgets to give Canada a role in each of his books, is one of them. I have read several (not yet all) of his books. The best of those I have read so far is the Neanderthal Parallax trilogy, which is very thought-provoking besides being fascinating. The book "Flash Forward", which is now also becoming a TV series, deals with an experiment that causes everyone in the world to jump forward 21 years in time for 2 minutes. It is a combination of science fiction, a book that raises some philosophical issues, and a thriller; highly recommended.

Time and deep temporal issues have also been among my favorite research topics, since time has physical, philosophical, and computer science implications. Back to event processing: I recently wrote the "warnings" chapter in the EPIA book, and one of the interesting questions is: when does a derived event occur?
As discussed before, there are two dimensions for answering the question: occurrence time, which stands for the time at which an event occurs in reality, and detection time, which stands for the time at which the event is detected by the event processing system. Neither is obvious when the event is derived. If we take the naive approach that a derived event occurs when the system computes it, we can have several anomalies. Consider the following simple example: there is an auction system, and each auction has an auction context time interval in which the auction is valid and people place bids. The auction works on a fairness criterion, which gives preference to people who bid earlier when multiple bidders made the maximal bid. The raw event is the bid request, but the entry to the bid process is a derived event, since the event has to be enriched, validated, and supplemented with some details from the previous bid of the same bidder (if one exists). If we take the time at which the derived event was actually produced as its occurrence time, we can have some semantic anomalies, as shown in the following figure:


Anomaly 1 (on the right-hand side): although the bid request is made within the auction validity interval, the bid entry occurs after the auction interval ends, so it will not get into the auction processing.
Anomaly 2 (on the left-hand side): the order of the bid requests can be reversed by their corresponding derived events, and thus the outcome of the auction may not be consistent with the auction's rules.

This is just one example, and it biases us toward a particular solution; however, reality is even more complicated, since in different cases the answer to the question posed in the title of this posting may not be the same, so policies should be used to disambiguate the semantics here.
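Both anomalies can be reproduced in a few lines of Python by comparing two policies for the occurrence time of the derived bid entry: inherit the occurrence time of the raw bid request, or (naively) use the time at which derivation finished. This is a minimal sketch under stated assumptions: times are plain numbers, the auction interval is [0, 100], enrichment and validation are reduced to a hypothetical per-request `derivation_delay`, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class BidRequest:             # raw event
    bidder: str
    amount: int
    occurrence_time: float    # when the bid was made
    derivation_delay: float   # hypothetical time spent enriching/validating it

@dataclass(frozen=True)
class BidEntry:               # derived event that enters the bid process
    bidder: str
    amount: int
    occurrence_time: float

def derive_entry(req: BidRequest, inherit_occurrence_time: bool) -> BidEntry:
    """Derive the bid entry (the enrichment itself is omitted here).

    inherit_occurrence_time=True: the derived event keeps the raw event's time.
    inherit_occurrence_time=False: naive, the time at which derivation finished."""
    t = req.occurrence_time
    if not inherit_occurrence_time:
        t += req.derivation_delay
    return BidEntry(req.bidder, req.amount, t)

def winner(entries: List[BidEntry], start: float, end: float) -> Optional[str]:
    """Highest bid inside the auction interval; ties go to the earliest bid."""
    valid = [e for e in entries if start <= e.occurrence_time <= end]
    if not valid:
        return None
    best = max(e.amount for e in valid)
    earliest = min((e for e in valid if e.amount == best),
                   key=lambda e: e.occurrence_time)
    return earliest.bidder

def run_auction(requests: List[BidRequest], inherit: bool) -> Optional[str]:
    return winner([derive_entry(r, inherit) for r in requests], start=0, end=100)

# Anomaly 2: ann bids first, but her enrichment is slower than ben's.
tie = [BidRequest("ann", 50, 10, 30), BidRequest("ben", 50, 20, 1)]
# Anomaly 1: cy bids inside the interval, but the derived entry lands after it ends.
late = tie + [BidRequest("cy", 60, 99, 5)]

naive_tie, inherited_tie = run_auction(tie, False), run_auction(tie, True)      # "ben", "ann"
naive_late, inherited_late = run_auction(late, False), run_auction(late, True)  # "ben", "cy"
```

Inheriting the raw event's occurrence time fixes this auction, but it is one policy among several, not a universal answer.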

I'll have a follow-up posting with discussion about the proposed policies for this case.

Sunday, November 8, 2009

On challenging topics for event processing developers and users

I spent much of the weekend working on the EPIA book; the time to finish is getting closer, and we are now on the last third of the book. While the first two thirds of the book concentrate on explaining what event processing is and go step by step through the different ingredients of building applications, the last part of the book deals with some implementation issues, focuses on challenging topics, and gives our view of the event processing of tomorrow. The chapter I worked on in the last few days, chapter 11 (which has nothing to do with bankruptcy), deals with challenging topics for event processing developers and users. These are topics that developers and users have to pay attention to, since they can influence the quality of the results obtained from an event processing system, and the current state of the art does not have magic bullets to resolve them. In this posting I'll just provide the list of topics discussed in this chapter; I'll write about some of them in the future. Here is the list:
  • Occurrence time over intervals: Events typically occur over intervals, but for computational reasons it is convenient to approximate them as time points and look at events in a discrete space; however, for some events this approximation is not accurate, and interval-based temporal semantics should be supported, along with the operations associated with them.
  • Temporal properties of derived events: For a raw event, we defined occurrence time as the time at which it occurred in reality, and detection time as the time at which the system detected its existence. What are the temporal properties of derived events? There is no unique answer to this question.
  • Out-of-order events: This is the most investigated of the challenging topics; however, current solutions are based on assumptions that are sometimes problematic. The problem concerns events that arrive out of order when the event processing operation is order-sensitive.
  • Uncertain events: Uncertainty whether an event has happened, due to malfunctioning, malicious, or inaccurate sources.
  • Inexact content of events: Similar to uncertain events, some content in the event payload, including the temporal and spatial properties of the event, may not be accurate.
  • Inexact matching between events and situations: Situations are the events that require reaction in the user's mind. This takes us back from the computer domain to the real-world domain. A situation is represented as a raw or derived event, but this may be only an approximation, since there may be false positives and false negatives in the transfer between the domains.
  • Traceability of lineage for an event or action: This gets to the notion of determining causality. Since in some cases there are operations in the middle of the causality network that are outside the event processing system's boundaries (e.g. an event consumer who is also an event producer), causality may not be determined automatically.
  • Retraction of events: Ways to undo the logical effects of events; sometimes tricky or impossible, but it seems to be a recurring pattern.
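As an illustration of the kind of assumption behind many out-of-order solutions, here is a minimal Python sketch of a reorder buffer that assumes bounded lateness: no event arrives more than `max_lateness` time units after the newest event seen so far. Events within the bound are released in occurrence-time order; events that violate the assumption can no longer be placed correctly and are only reported. The class and method names are hypothetical, and this is one common approach rather than the chapter's recommendation.

```python
import heapq
from typing import List, Tuple

Event = Tuple[float, str]  # (occurrence_time, payload)

class ReorderBuffer:
    """Release events in occurrence-time order, assuming no event is more than
    `max_lateness` time units older than the newest event seen so far."""

    def __init__(self, max_lateness: float):
        self.max_lateness = max_lateness
        self._heap: List[Tuple[float, int, str]] = []  # (time, arrival seq, payload)
        self._seq = 0
        self._newest = float("-inf")
        self._last_released = float("-inf")
        self.late: List[Event] = []  # events that violated the assumption

    def push(self, occurrence_time: float, payload: str) -> List[Event]:
        """Add an event; return the events that can now be safely released."""
        if occurrence_time < self._last_released:
            # Too late: releasing it now would break the order already emitted.
            self.late.append((occurrence_time, payload))
            return []
        heapq.heappush(self._heap, (occurrence_time, self._seq, payload))
        self._seq += 1
        self._newest = max(self._newest, occurrence_time)
        watermark = self._newest - self.max_lateness
        released: List[Event] = []
        while self._heap and self._heap[0][0] <= watermark:
            t, _, p = heapq.heappop(self._heap)
            released.append((t, p))
            self._last_released = t
        return released

    def flush(self) -> List[Event]:
        """Release everything still buffered (e.g. at the end of the stream)."""
        released = [(t, p) for t, _, p in sorted(self._heap)]
        self._heap.clear()
        if released:
            self._last_released = released[-1][0]
        return released

buf = ReorderBuffer(max_lateness=2)
ordered: List[Event] = []
for t, p in [(1, "a"), (3, "b"), (2, "c"), (6, "d"), (0, "e")]:
    ordered += buf.push(t, p)
ordered += buf.flush()
# ordered is [(1, "a"), (2, "c"), (3, "b"), (6, "d")]; buf.late is [(0, "e")]
```

The trade-off is visible in the choice of `max_lateness`: a larger bound delays every result, while a smaller one turns more events into "late" events that the order-sensitive operation never sees.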

More about some of them - later.


Wednesday, October 7, 2009

On composite contexts

Today is a happy day for the Israeli scientific community: it was announced that Professor Ada Yonath from the Weizmann Institute won the Nobel Prize in Chemistry. This adds to the two Israeli scientists who won the Nobel Prize in Chemistry a few years ago, so Israel is a chemistry superpower. Computer science did not exist when Nobel decided on the prizes; its equivalent is the Turing Award, which, if I am not mistaken, has already been awarded to three Israeli scientists, so we are not bad in this area either.

Back to event processing thinking: we are making progress on the EPIA book. Yesterday we had a two-hour review with the editor on editorial matters in the last three chapters, and after revision the book will go to the 2/3 review. Our target date for publication is now April 2010, and this will probably be the final target.

Anyway, continuing my previous posting on contexts, I would like to discuss the notion of context composition. Recall that a context groups events based on one of the following: time, space, state, or segment, either in order to apply operations to each group or to give distinct behavior to distinct groups. A composite context, as its name suggests, is just a multi-dimensional context, i.e. it contains the cross-section of several contexts.
In the picture below there is a composition of segmentation ("per customer") and time (in this case a sliding fixed temporal interval, per hour); each square is a context partition.


This is just one example of a combination; there are other useful compositions, such as a spatio-temporal context, in which one dimension of the partition is time and the other is location, or a combination of space and state contexts. Example: spatial context = within the city of Trento, and weather state = {sunny, cloudy, rainy, snowing}; in each of these partitions, different agents are applicable.
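A composite context can be sketched as a partition key that is the tuple of the keys of its dimensions. The Python sketch below composes a segmentation context (per customer) with an hourly temporal context, assuming for simplicity non-overlapping one-hour windows; the `Order` event type and the helper names are hypothetical. Other dimensions, such as a spatial or state key for the Trento weather example, would be further functions passed to `composite`.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Hashable, List, Tuple

@dataclass(frozen=True)
class Order:  # hypothetical event type
    customer: str
    timestamp: datetime
    amount: float

def per_customer(e: Order) -> Hashable:
    """Segmentation context: one partition per customer."""
    return e.customer

def per_hour(e: Order) -> Hashable:
    """Temporal context: non-overlapping one-hour windows (assumed for simplicity)."""
    return e.timestamp.replace(minute=0, second=0, microsecond=0)

def composite(*dimensions: Callable[[Order], Hashable]) -> Callable[[Order], Tuple]:
    """Composite context: the partition key is the cross-section of all dimensions."""
    return lambda e: tuple(dim(e) for dim in dimensions)

def partition(events: List[Order],
              context: Callable[[Order], Tuple]) -> Dict[Tuple, List[Order]]:
    """Group events into context partitions (the 'squares' in the picture)."""
    groups: Dict[Tuple, List[Order]] = defaultdict(list)
    for e in events:
        groups[context(e)].append(e)
    return dict(groups)

orders = [
    Order("ann", datetime(2009, 10, 7, 10, 5), 5.0),
    Order("ann", datetime(2009, 10, 7, 10, 40), 7.0),
    Order("ann", datetime(2009, 10, 7, 11, 10), 1.0),
    Order("bob", datetime(2009, 10, 7, 10, 15), 2.0),
]
parts = partition(orders, composite(per_customer, per_hour))
# An agent runs separately in each partition, e.g. a sum of amounts:
totals = {key: sum(o.amount for o in group) for key, group in parts.items()}
```

The agent (here a sum) is written once and knows nothing about customers or hours; the composite context does the grouping, which is the work-saving effect of the abstraction.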

Somebody asked me what the benefit of using the context abstraction is anyway. The answer is that, like any other abstraction, it saves work: the same application can be written with much less code and is much simpler to develop, maintain, and understand. The use of context is quite useful in this sense; see also some discussion by Marco. Next, I'll write more about the next chapters, on event processing patterns; stay tuned.

Thursday, June 4, 2009

On temporal semantics of events - or when has the shipment not arrived?




In the early 1990s, my home away from home was Berkeley, where I stayed for joint work with Arie Segev on temporal databases. One weekend I strolled along the famous Fisherman's Wharf in SF, and there was a store for left-handed people. Since I am part of the deprived minority of left-handed people, I was curious and entered the store. Among the different items there (mostly not very practical), I saw this clock; if you notice, it is a backwards clock, which goes anti-clockwise. I am sure that the owner of the store was right-handed :-). Anyway, I recalled this clock when working on the final version of a paper entitled "Temporal perspectives in event processing", which was recently accepted for publication. While re-reading the paper (as with any paper, it is written and submitted, then after a few months a review arrives, the author has to be reminded what it was about, revises according to the comments, sends it back, and so on, until it is either accepted or rejected), I thought that the temporal semantics of events could be a good topic to write about here. The temporal semantics of a backwards clock is, of course, different from that of a regular clock, and this brings me to the temporal semantics of derived events. Some background: an event may have two timestamps (or intervals) associated with it: occurrence time and detection time. Occurrence time is the time at which the event happened in reality; detection time is the time at which the event processing system detected the event message sent to it. It is easier to process events (when did they happen? in what order?) according to detection time; however, for some applications this may yield incorrect results. There are several issues around obtaining the correct occurrence time, but let's assume that we know how to do it.
While the occurrence time of a raw event (an event that has arrived from an external producer, which assigns the value) is explicitly provided, the question is what the occurrence time of a derived event is. Let's take a simple example: on May 2nd, 10:30, the customer John Galt issued an order for books with a guaranteed delivery within 48 hours (see my story about Amazon in its early days in the footnote to this posting). On May 4th, 10:30, Mr. Galt looked at his (forward-going) watch and said: "the shipment has not arrived by its deadline". The fact that no arrival was reported by the deadline caused the event processing system to derive the event "shipment did not arrive", which is a time-out event (or "non-event" event, as some vendors call it). Now the question is WHEN did this event happen? The detection time is easy: when some computational process derived the event and emitted it to the event processing system, the detection time is set. Let's say that this happened on May 4th at 10:32. The occurrence time is trickier. Actually, I can think of three different interpretations:

1. The occurrence time of the "shipment not arrived" is the same as its detection time, which means May 4th, 10:32.

2. The occurrence time of the "shipment not arrived" is the deadline by which it should have arrived, in this case May 4th, 10:30.

3. The occurrence time of the "shipment not arrived" is the entire interval of the 48 hours, since the shipment did not arrive during this interval [May 2nd 10:30, May 4th 10:30].

What are the right semantics? There is no right answer; as in some other cases in event processing, the system designer should choose among these (and maybe other) alternatives.
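Making the designer's choice explicit can be as simple as a policy parameter on the derivation. The Python sketch below stamps the derived time-out event according to each of the three interpretations above, with an interval represented as a (start, end) pair. The names `OccurrencePolicy`, `ShipmentNotArrived`, and `derive_timeout` are hypothetical, and the year 2009 is an assumption, since the example gives none.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Tuple, Union

class OccurrencePolicy(Enum):
    DETECTION_TIME = 1   # interpretation 1: same as the detection time
    DEADLINE = 2         # interpretation 2: the deadline that was missed
    WHOLE_INTERVAL = 3   # interpretation 3: the whole guarantee interval

OccurrenceTime = Union[datetime, Tuple[datetime, datetime]]

@dataclass(frozen=True)
class ShipmentNotArrived:
    order_id: str
    occurrence_time: OccurrenceTime
    detection_time: datetime

def derive_timeout(order_id: str, ordered_at: datetime, guarantee: timedelta,
                   detected_at: datetime,
                   policy: OccurrencePolicy) -> ShipmentNotArrived:
    """Derive the time-out event, stamping its occurrence time per the policy."""
    deadline = ordered_at + guarantee
    occurrence: OccurrenceTime
    if policy is OccurrencePolicy.DETECTION_TIME:
        occurrence = detected_at
    elif policy is OccurrencePolicy.DEADLINE:
        occurrence = deadline
    else:
        occurrence = (ordered_at, deadline)
    return ShipmentNotArrived(order_id, occurrence, detected_at)

# The John Galt example; the year (2009) is an assumption.
ordered = datetime(2009, 5, 2, 10, 30)
detected = datetime(2009, 5, 4, 10, 32)
events = {p: derive_timeout("galt-1", ordered, timedelta(hours=48), detected, p)
          for p in OccurrencePolicy}
```

The detection time is the same under every policy; only the occurrence time changes, and with it the answers to questions such as "did this event happen within a given context?".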

More about temporal semantics -- later.


Footnote: A story from the early days of Amazon.

I was an early customer of Amazon, buying science fiction books through the web (I still do). Typically it took 3 weeks for a shipment to arrive in Israel, so once, after three and a half weeks in which the package did not arrive, I sent an email to Amazon customer service to ask about it. Their response was surprising: we don't know what happened, we are re-sending you the books. Two days later I received the original package, and since I thought that maybe the substitute package could still be stopped, I sent another email to Amazon's friendly customer service and got an even more amazing response: after we issue the reservation we cannot control the rest of the process, so please keep the extra books with our compliments. At that time I thought that this company was not going to survive... Of course, since then they have developed a much better logistics system... and I have two copies of each of the books in that shipment.