Event Processing Thinking: temporal databases


Sunday, April 15, 2012

On temporal extension to SQL:2011

I have written before about the recent return of bi-temporal databases, in conjunction with DB2. The first attempt to create bi-temporal extensions to SQL was made in the 1990s. At that time there was a language war, some of which is reflected in the book that I co-edited, published in 1998. Now, after several attempts, SQL:2011 does include support for bi-temporal databases. The terminology has changed from the original terms: what the original version called "valid time" is called "application time" in the SQL version, and what the original version called "transaction time" is called "system time" in the SQL version.
More details about the SQL extension can be found in the overview presentation that Craig Baumunk uploaded to SlideShare. As I have written before, temporal databases are vital for maintaining historical events, so this standard, and the databases that support it, are of noticeable importance to event processing applications.

Friday, August 19, 2011

On temporal databases and DB2

I have written about temporal databases in this blog before; in general, I worked on temporal databases around 15 years ago, and co-edited the book whose cover is shown here, published in 1998. Temporal databases are notable for the fact that academics tried to drive a standard in this area: TSQL2, led by Rick Snodgrass and his colleagues. At that time it did not succeed, since none of the DBMS vendors saw it as a high priority; those were the days when the Internet emerged along with XML, and the DBMS vendors had many other things to worry about. However, over time some DBMS vendors have adopted temporal capabilities within their products. Oracle already implemented temporal extensions supporting TSQL in its DBMS product, and recently IBM produced its own version of a temporal database within DB2. It seems that there is now traction for temporal databases in various industries.
Today my colleague Guy Sharon drew my attention to a new article on IBM DeveloperWorks entitled "going back in time" that describes the DB2 temporal capabilities. The traditional dimensions in temporal databases, transaction time and valid time, got new names: system time and business time (I am making a note to write a post about the overuse of the term "business"). These two dimensions make it possible to ask queries like: what was the value of a certain attribute on 7/7/2011, as observed from 8/8/2011? The answer can differ for different observation days, since knowledge about the past changes over time. While the title of that article talks about "going back in time", and temporal databases are indeed typically viewed as a way of recording the past, they can also be used to record the future; this was noted in a work published in 1994 by Arie Segev, Avi Gal and myself entitled "retroactive and proactive database processing" (I don't think an online version is available).
Since over the last year we have been working on proactive event-driven processing, the ability to look at predicted future events that can be revised over time is very useful, and we are indeed looking at temporal database techniques for that. More on that later.
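To make the two-dimensional query above concrete, here is a minimal Python sketch, assuming a simple in-memory list of row versions. The `Version` class, the `as_of` function and the price data are hypothetical illustrations, not the actual DB2 or SQL:2011 schema. Each version carries a valid-time (business time) interval and a transaction-time (system time) interval, and a query fixes one point on each axis.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical bitemporal row version. Both intervals are half-open [from, to);
# date.max stands for "until further notice".
@dataclass(frozen=True)
class Version:
    key: str
    value: str
    valid_from: date  # valid time (SQL:2011 "application time")
    valid_to: date
    tx_from: date     # transaction time (SQL:2011 "system time")
    tx_to: date


def as_of(rows, key, valid_at, known_at):
    """Value of `key` at valid time `valid_at`, as the database recorded it at `known_at`."""
    for r in rows:
        if (r.key == key
                and r.valid_from <= valid_at < r.valid_to
                and r.tx_from <= known_at < r.tx_to):
            return r.value
    return None


history = [
    # Recorded on 2011-07-01: the price is 100 from July 1 onward.
    Version("price", "100", date(2011, 7, 1), date.max, date(2011, 7, 1), date(2011, 8, 10)),
    # On 2011-08-10 we learn the price actually changed to 120 on July 5:
    # the old row is closed in transaction time and two corrected rows are appended.
    Version("price", "100", date(2011, 7, 1), date(2011, 7, 5), date(2011, 8, 10), date.max),
    Version("price", "120", date(2011, 7, 5), date.max, date(2011, 8, 10), date.max),
]

print(as_of(history, "price", date(2011, 7, 7), date(2011, 8, 8)))   # prints 100
print(as_of(history, "price", date(2011, 7, 7), date(2011, 8, 15)))  # prints 120
```

Because `valid_from` may also lie in the future, the same structure can hold predicted values that are later revised in transaction time, which is the proactive use of temporal databases.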

Sunday, October 31, 2010

Back to temporal databases

In 1998 I edited a book of articles about temporal databases (together with Sushil Jajodia and Sury Sripada). It followed a Dagstuhl seminar we held in 1997 about temporal databases, an area that was hot in the research community at that time and has since cooled off somewhat. Today a Master's student I supervised took her final exam on her "final work" (a project that is less than a thesis; this track requires more course credits instead), in which she implemented a temporal database model from a paper in this book co-authored by Arie Segev, Avi Gal and myself. This model is somewhat more expressive than the TSQL-based models, and has its own interesting features, such as: the ability to freeze and unfreeze data, the ability to distinguish between modification and revision, and the ability to deal with simultaneous values. In fact, some of these ideas found their way into our work on event processing (e.g., policies for when there are repeating events that may match the same pattern).

Temporal databases as an area started in the Israeli army. Kobi Ben-Zvi, who went from the Israeli army to do a PhD at UCLA, invented the area by formalizing its terms, and there was a lot of later work in the research community in the 1990s. There was even a big fight between two parties about how to extend the SQL standard to support temporal databases. I don't really remember the details, but the book presents the positions of both sides of this battle, as the Dagstuhl seminar was one of its battlefields. The end result is that it never became part of the SQL standard, partly because of the fights, and more importantly because at that time the DBMS vendors had higher priorities on their minds, e.g. Web-related stuff, XML data etc. Some features exist, but temporal support did not fully get into the mainstream of databases, although there are quite a few specialized implementations. One of the future directions of event processing will involve getting back to temporal databases as an infrastructure, in the area of retrospective event processing; I'll write more about it in the future.

Saturday, March 13, 2010

On events versus data

The word "data" always reminds me of the android from Star Trek: The Next Generation whose name was Data. The word data (in computing) is typically very general and refers to anything that is represented on digital media; the picture of Data above is also a piece of data, like many other things. The word "event" also has a broad meaning: something that happened.

Recently Paul Vincent wondered in his blog about the difference between event and data, as some people think that events are footnotes to data. Since, by the definitions above, events and data are obviously not the same, I'll try to talk about the touch points between them, since those are the source of the misconceptions.

There are various touch points between events and data:

Event representation contains data. An event is represented in the computing domain by an "event object" or "event message", which is usually also called "event" for short. This event representation includes some information about what the event type is, where it happened, when it happened, what happened, who the players were, etc. Example: the event is "entering the building"; the event's payload contains information that answers questions such as: which building? who entered? when? and maybe more. The payload of the event is data; it may be stored (see event store), or just pass through the system.
Data stores can store historical events. Event representations can be accumulated and stored in a data store for further use; there are large data stores that collect weather events. Note that in order to navigate historical events, these events may be stored in a temporal database, an area that I've dealt with in the past; if the events are spatial, they may have to be stored in a spatiotemporal database.
A database can be an event producer. In active databases the events were database operations: insert, modify, delete and retrieve. In this case the fact that some data element has been updated or accessed is the "something that happens" (which may or may not reflect something that happens in reality), and the database acts as an event producer and emits events for processing by an event processing network. Note that in fact every event producer contains some data that is turned into events, for example transaction instrumentation, as IBM has done with CICS as an event producer.
Derived events as database updates. An event processing application takes events from somewhere as input, does something, creates derived events, and sends them somewhere; this is all of event processing in one sentence. A derived event created in this process may go to an event consumer, and that consumer may be a DBMS or another type of consumer whose action is to update some data store.
Event enrichment by data during event processing. During event processing, enrichment of events is sometimes required. Let's return to the event of a person entering a building: the event processing application deals with security access control, and needs to know the person's security clearance. This information is not provided with the event, which only identifies the person, so there needs to be an enrichment process in which an enrichment event processing agent accesses some global store, in this case reference data, to extract the clearance value and put it inside the event for further processing.
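As an illustration of the first and last touch points, here is a minimal Python sketch of an event object whose payload is plain data, and of an enrichment agent that copies a value from reference data into the event. The `Event` class, the `CLEARANCE` table and `enrich_with_clearance` are hypothetical names; a real system would look the clearance up in a database or cache rather than in a dictionary.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical event object: a type, an occurrence time and a payload of data."""
    event_type: str
    occurrence_time: datetime
    payload: dict = field(default_factory=dict)


# Hypothetical reference data: person id -> security clearance.
CLEARANCE = {"p-17": "secret", "p-42": "public"}


def enrich_with_clearance(event: Event, reference: dict) -> Event:
    """Enrichment agent: returns a new event whose payload also carries the clearance."""
    clearance = reference.get(event.payload["person_id"], "unknown")
    return replace(event, payload={**event.payload, "clearance": clearance})


entered = Event("EnterBuilding", datetime(2010, 3, 13, 9, 5),
                {"building": "B1", "person_id": "p-17"})
enriched = enrich_with_clearance(entered, CLEARANCE)
print(enriched.payload)
# prints {'building': 'B1', 'person_id': 'p-17', 'clearance': 'secret'}
```

The agent returns a new event instead of changing the incoming one, so the original event stays available as it was reported.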

Thus the main issue is not the "versus" issue but the various relationships between the two terms.

Saturday, September 12, 2009

On temporal aspects of event processing

In the past I was involved in work on temporal databases; in the picture you can see a 1998 book about temporal databases that I co-edited with Sury Sripada and Sushil Jajodia. There were some attempts to create a substantial extension to SQL with temporal capabilities and to move temporal databases to the mainstream, but this did not work, for several reasons. The event processing area now provides a second chance for these ideas to come to the mainstream, as event processing has strong relations to temporal issues. Bob Hagman from Aleri (formerly Coral8) has recently written a survey of implementation alternatives related to time aspects in the Aleri blog. In the DEBS 2008 language analysis tutorial we dealt quite briefly with the topic of time. Earlier this year I wrote a chapter in the upcoming book "Handbook of Research on Advanced Distributed Event-Based Systems, Publish/Subscribe and Message Filtering Technologies", edited by Annika Hinze and Alejandro Buchmann.

This chapter is entitled: "Temporal Perspectives in Event Processing".
Here are the chapter's main topics:

Temporal dimensions: in temporal databases we dealt with the temporal semantics of a collection of snapshots (states); in event processing we deal with the temporal semantics of events (transitions). Are the temporal dimensions the same? Do they have the same semantics?
The "instantaneous" issue -- does an event occur at a time point or over an interval, and if it is an interval, what does that mean from a computational point of view?
Time granularity -- in temporal databases we introduced the term "chronon", which stands for the time granularity that makes sense for a particular use. This idea is also applicable to event processing: for different events, different chronons make sense.
Temporal contexts: the term "time window" in stream processing is a kind of temporal context. What kinds of temporal contexts are required, and what are their computational implications? I'll write more about contexts soon, as this is the topic of chapter 7 of the EPIA book.
Temporal patterns: "complex event processing" is about finding patterns among collections of events; some (but not all) of these patterns are temporal in nature. What are the temporally oriented patterns?
Temporal properties of derived events: an event processing system derives events as a result of its processing. What are the time properties of the derived events? This is a rather tricky question that deserves a discussion.
Ordering events: for some temporal patterns, knowing the order of events is important. What are the issues associated with keeping such an order, and how should out-of-order events be handled?
A related issue is "retrospective events" -- what happens if events that relate to the past are detected, when the assumption that they did not occur has already triggered some processing?
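For the ordering topic, one common technique (not necessarily the one surveyed in the chapter) is to buffer incoming events and release them in occurrence-time order once enough later events have been seen. The sketch below assumes integer timestamps and a hypothetical `ReorderBuffer` class with a fixed `slack`; events that arrive after their slot has already been released are reported as late, which is exactly where retrospective handling would have to take over.

```python
import heapq


class ReorderBuffer:
    """Releases events in occurrence-time order, tolerating lateness up to `slack`.

    An event is held until the highest occurrence time seen so far exceeds its
    own time by at least `slack`. Events older than what was already released
    are reported as late instead of being silently reordered.
    """

    def __init__(self, slack: int):
        self.slack = slack
        self.heap = []             # entries: (occurrence_time, seq, event_id)
        self.max_seen = None
        self.released_up_to = None
        self.seq = 0               # tie-breaker for equal timestamps

    def push(self, t: int, event_id: str):
        """Add an event; return (ids released in order, late event id or None)."""
        if self.released_up_to is not None and t < self.released_up_to:
            return [], event_id    # too late: the order before t is already committed
        heapq.heappush(self.heap, (t, self.seq, event_id))
        self.seq += 1
        self.max_seen = t if self.max_seen is None else max(self.max_seen, t)
        released = []
        while self.heap and self.heap[0][0] <= self.max_seen - self.slack:
            t0, _, eid = heapq.heappop(self.heap)
            self.released_up_to = t0
            released.append(eid)
        return released, None

    def flush(self):
        """Release everything still buffered (e.g. at the end of the stream)."""
        out = [eid for _, _, eid in sorted(self.heap)]
        self.heap.clear()
        return out


buf = ReorderBuffer(slack=2)
order = []
for t, eid in [(1, "a"), (3, "c"), (2, "b"), (6, "d"), (1, "x")]:
    released, late = buf.push(t, eid)
    order.extend(released)
    if late:
        print("late event:", late)   # prints: late event: x
order.extend(buf.flush())
print(order)                          # prints ['a', 'b', 'c', 'd']
```

A larger `slack` tolerates more disorder at the cost of extra latency before each event is processed; no finite slack removes the need to deal with late events altogether.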

Issues of time in a distributed environment -- clock synchronization, time-zone handling, time validity for mobile clients -- are all applicable to event processing.

As noted, this is an outline of the topics surveyed in that chapter; I'll write more about some of them in the future.

Saturday, April 25, 2009

On Revision

Saturday morning, and I am spending some spare time (well -- ignoring my huge to-do list...) reading the autobiography of Shmuel Tamir, who was probably the most influential lawyer in Israel, as well as a political leader for whom I have always had great respect (I don't admire people).

Today I would like to write about the notion of "revision" and relate it to event processing.
This is inspired by, but not a direct response to, a thread of discussion started by Peter Lin in the complexevents forum, under the name "mutability and aggregation".

Revision is somewhat different from modification: in a modification facts are modified, while in a revision they are revised. For example, if John Smith moves from the USA to Canada, then the facts about John Smith are modified; while if it was recorded by mistake that John Smith lives in the USA, whereas in reality he always lived in Canada, this is a correction of a recording mistake. Some people may wonder why it is important to distinguish between the two.

The first use of "revision" that I came across was in AI, in the context of "non-monotonic logic". The rationale is that using "classic logic" one can reason about the universe only if there is perfect knowledge. The example used is that although birds typically can fly, there are some exceptions: penguins do not fly, ostriches do not fly, a bird with broken wings cannot fly, etc.
Let's say that Tweety is a bird and we don't know anything else about it. According to classic logic we cannot say whether it flies; however, according to the various non-monotonic logics, we can say that since birds typically fly, we can assume for all practical purposes that Tweety flies, as long as we are ready to withdraw this assumption when new information (such as: Tweety is a penguin) becomes available. In that case we may need to retract all the assertions that were inferred, directly or indirectly, from the revised assertion.
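The Tweety example can be sketched in Python under a simplifying assumption: instead of a real non-monotonic logic or a truth maintenance system, beliefs are simply recomputed from the explicit facts whenever a fact is added. The `conclusions` function, the `EXCEPTIONS` set and the `can_reach_nest` property are hypothetical; the point is only that adding information can withdraw a default conclusion together with what was derived from it.

```python
# Hypothetical facts that block the default "birds fly".
EXCEPTIONS = {"penguin", "ostrich", "broken_wing"}


def conclusions(facts: dict) -> dict:
    """Recompute all beliefs from the explicit facts (individual -> set of properties).

    Because beliefs are derived from scratch, adding a fact can withdraw an
    earlier conclusion and everything inferred from it (non-monotonic behaviour).
    """
    beliefs = {}
    for name, props in facts.items():
        derived = set(props)
        if "penguin" in derived or "ostrich" in derived:
            derived.add("bird")
        # Default rule: a bird flies unless an exception is known.
        if "bird" in derived and not (derived & EXCEPTIONS):
            derived.add("flies")
        # A second-level inference that depends on the default assumption.
        if "flies" in derived:
            derived.add("can_reach_nest")
        beliefs[name] = derived
    return beliefs


facts = {"tweety": {"bird"}}
print("flies" in conclusions(facts)["tweety"])        # prints True
facts["tweety"].add("penguin")                         # new information arrives
after = conclusions(facts)["tweety"]
print("flies" in after, "can_reach_nest" in after)     # prints False False
```

Recomputing everything is the simplest correct approach for a toy example; a truth maintenance system instead records which assertions support which, so that only the affected conclusions are retracted.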

Later in life, I worked on temporal databases; one of the motivations for temporal databases has been the ability to issue "as-of" queries, meaning: looking at what was known from the viewpoint of a certain time point in the past. For example, if we investigate possible malpractice by a physician (I heard that the national sport of Americans is suing their physicians), then in order to determine whether the physician made a reasonable decision, we need to know what information was available to the physician at the time the decision was made. To achieve that, facts cannot be deleted or changed; we need an "append only" database. The distinction between "modification" and "revision" is important for the decision analysis: there is a difference between the case where the fever was high only in the next measurement, and the case where the fever was also high in the measurement before the decision, but was reported wrongly and this information was revised later. Eleven years ago I co-edited a book about temporal databases which (among other things) discusses these issues.
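The fever example can be sketched as an append-only log in Python, assuming integer time points and a hypothetical `Record` type with a `kind` label; nothing is ever updated in place. An as-of query filters by recording time, so "what did the physician know when deciding at time 6" differs from "what do we know now about time 6" only because of the revision, not because of the later modification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One append-only entry; times are plain integers for brevity."""
    recorded_at: int  # when the database learned this (transaction time)
    valid_from: int   # when the value starts to hold in reality (valid time)
    value: str
    kind: str         # hypothetical labels: "initial", "modification" or "revision"


records = []


def append(recorded_at, valid_from, value, kind):
    records.append(Record(recorded_at, valid_from, value, kind))


def believed(at_valid, as_of):
    """What the database, as of time `as_of`, said held at time `at_valid`."""
    known = [r for r in records if r.recorded_at <= as_of and r.valid_from <= at_valid]
    if not known:
        return None
    # Latest valid_from wins; for equal valid_from, the latest recording wins.
    return max(known, key=lambda r: (r.valid_from, r.recorded_at)).value


def revised_after(at_valid, decision_time):
    """Revisions recorded after the decision that concern the state at `at_valid`."""
    return [r for r in records if r.kind == "revision"
            and r.valid_from <= at_valid and r.recorded_at > decision_time]


append(recorded_at=5, valid_from=5, value="normal", kind="initial")
append(recorded_at=8, valid_from=8, value="high", kind="modification")  # fever rose later
append(recorded_at=12, valid_from=5, value="high", kind="revision")     # time-5 reading was wrong

print(believed(at_valid=6, as_of=6))   # prints normal (known at decision time)
print(believed(at_valid=6, as_of=20))  # prints high (known now about that moment)
```

`revised_after(6, 6)` returns the single revision record, showing that the picture of the decision moment changed only after the fact.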

Now, something about revisions and event processing. Recall that an event is something that happens, and it is reported to an event processing system using its projection, which is also known as an event (sometimes: event object, event message). An event that actually happened in reality cannot be modified or deleted. However, when we move to the projection in the processing system, again, if we assume that knowledge is not perfect, we can have several cases of revision:

1. The event really did not happen, but it was reported by mistake that it happened, and the mistake was realized later.
2. The event happened, but it was not reported, and this was realized later.
3. The event both happened and was reported, but some information associated with the event (reported through the event's payload) had a wrong value due to an error that was corrected later.

I'll soon post a continuation that discusses the implications of revisions on the processing of events. More later.

Sunday, August 24, 2008

On Event Stores and Temporal Databases

I am an old-fashioned guy who carries handkerchiefs, like this one, wherever he goes; they are handy for multiple uses. Anyway, while in the past all department stores in Israel carried handkerchiefs and it was quite a popular product, for some reason it went out of fashion, and I have a hard time renewing my inventory of handkerchiefs. In this sense, I wish I could step into the past for a minute, buy two dozen handkerchiefs and return. In the past, I have been involved in work on temporal databases and even co-edited a book in this area. Temporal databases had two major goals:

(1). Keep historical data, and enable easy retrieval of this data

(2). Enable issuing queries "as of" any point in time, i.e. queries that take into account the information that was available at a certain point in time (not as seen from "now"); again, returning to the past.

One may wonder why I am writing about temporal databases today. Well, the issue of temporal databases comes back when thinking about "event stores". I know that some of my database colleagues don't like the term "event store" or "event repository", since it does not explicitly include the word "database", but for me, using a DBMS is just one possible implementation, while others, such as a grid cache, are also possible; but this is a topic for another discussion.

Anyway, why do we need an "event store"? In some cases we need to maintain historical events and use them, and sometimes even apply pattern detection to past events. For auditing purposes we may also want to issue "as of" queries. Note that the temporal representation of events can be done according to multiple temporal dimensions (see the discussion about temporal dimensions of events). One of the characteristics of temporal databases is that they are "append only" databases, meaning that database records can be added, but not modified or deleted; modifications and deletions are logical operations that create other instances, keeping the old ones. This is linked to one of the properties of events, immutability, which is actually a controversial property that still requires discussion about the conditions under which it is needed. Temporal databases seem to be a proper way to represent historical events.

Some concluding comments:

(1). Current DBMSs do not support temporal databases as a primitive, although temporal databases have been built as a second layer above them.

(2). Not all events need to be persisted for historical processing; this is a property of the event type and its retention policy. Different events need to be persisted for different purposes.

(3). The question of what language should be used to process "event stores" is also a matter of opinion. Some believe that SQL is the answer (however, for some patterns it is an awkward language), and there is an attempt to extend the SQL language with pattern extensions. Here I will quote a wise person, Paul Vincent, who wrote in a footnote to this posting: "This will be especially good news for those who like their SQL statements to run to multiple pages…" Another option is to use the pattern language that is used for on-line patterns, and translate it to SQL (or one of its variations).
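Comment (2) can be sketched as a per-event-type retention table in Python. The event types, the retention periods and the `vacuum` function below are hypothetical; they only show that persistence is decided by the event type, and that dropping expired events from the store is a separate step.

```python
from datetime import datetime, timedelta

# Hypothetical per-event-type retention policies (None means: do not persist).
RETENTION = {
    "StockTick": None,                     # used on-line only
    "Trade": timedelta(days=7 * 365),      # kept for auditing
    "SensorReading": timedelta(days=30),
}


def should_persist(event_type: str) -> bool:
    """Unknown event types are not persisted."""
    return RETENTION.get(event_type) is not None


def vacuum(store: list, now: datetime) -> list:
    """Keep only stored events whose retention period has not yet expired."""
    return [e for e in store
            if should_persist(e["type"]) and now - e["time"] <= RETENTION[e["type"]]]


store = [
    {"type": "Trade", "time": datetime(2008, 8, 1)},
    {"type": "SensorReading", "time": datetime(2008, 6, 1)},
    {"type": "SensorReading", "time": datetime(2008, 8, 20)},
]
kept = vacuum(store, now=datetime(2008, 8, 24))
print([e["type"] for e in kept])  # prints ['Trade', 'SensorReading']
```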

There are several issues that still need deeper discussion - but enough for today.

Wednesday, December 19, 2007

On deleted event, revised event and converse event

First, congratulations to my student Ayelet Biger, who today successfully passed her M.Sc. thesis defense exam. Ayelet's thesis topic has been "Complex Event Processing Scalability by Partition", which deals with parallel execution of CEP logic when there are complex dependencies among the different agents. I'll discuss this issue in one of the later postings; we still need to compose a paper on this thesis for one of the coming conferences. Ayelet is my 17th M.Sc. student to graduate (together with 5 Ph.D. students, this makes it the 22nd thesis exam). Most of the students have done theses on active databases, temporal databases (my past interest areas) and, in the last few years, event processing. Supervising graduate students is a great way to work on new ideas that I don't have the ability to work on in my regular work; the only thing needed is three more hours in each day...

Today's topic is inspired by a recent blog post by Marco Seiriö. Marco is one of the pioneers of EP blogging; I started reading his blog in January 2006, when he started it as "Blog on ESP". At some point, however, his blog became "Marco writes about complex event processing", another piece of evidence that the name ESP has disappeared. Anyway, in his blog, Marco talks about the event model. I'll not discuss the event model today, but concentrate on one interesting point that Marco raises about "undoing events". This is indeed a pragmatic issue with some semantic difficulties. There are systems in which events can be deleted, and some actions can be triggered by the event deletion. However, an event is not regular data and cannot be treated as such: since an event represents something that happens in reality, events are conceptually "append only"; in database terms, one can only insert events, but not modify or delete them. Deleting events also blocks the ability to trace decisions/actions or to perform retrospective processing of the events. So when, in reality, do we need to delete/undo/revise events?

when an event is entered by mistake (typically not the event itself, but some details in the event attributes), we'll need a way to revise the event.
when we wish an event to no longer affect the processing.
when the event itself has expired or we no longer need it, and don't need to use it in any other processing, including retrospective processing.

The first case is a revision case: if we are in an "append only" mode, then the way to handle it is to enter another event and allow it to override an existing event (or set of events) for the purpose of processing. Example: somebody sent a bid to an electronic auction and realized that one of the details (say, the price he is ready to pay) is wrong; he can then add another bid that overrides the first one. Why not delete the original bid? It may be that the original bid is already in process, and the override cannot stop this process; even if not, retrospective processing may need to reconstruct a past state that includes the original bid. (These considerations are actually not new; we thoroughly discussed these issues within the temporal database community a decade ago, when Sushil Jajodia, Sury Sripada and I edited a book about temporal databases research and practice.)
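The override approach can be sketched in Python, assuming each bid carries an optional, hypothetical `overrides` field naming the bid it replaces. The log is only appended to; the set of bids in force is computed from it, and passing `as_of` reconstructs the earlier state that still contains the original bid.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bid:
    bid_id: str
    bidder: str
    price: float
    arrival: int                     # arrival order / time
    overrides: Optional[str] = None  # hypothetical: id of the bid this one replaces


def effective_bids(bid_log, as_of=None):
    """Bids in force at `as_of` (default: now); overridden bids stay in the log."""
    visible = [b for b in bid_log if as_of is None or b.arrival <= as_of]
    replaced = {b.overrides for b in visible if b.overrides}
    return [b for b in visible if b.bid_id not in replaced]


bid_log = [
    Bid("b1", "alice", 150.0, arrival=1),
    Bid("b2", "bob", 120.0, arrival=2),
    Bid("b3", "alice", 105.0, arrival=3, overrides="b1"),  # alice corrects her price
]
print([b.bid_id for b in effective_bids(bid_log)])           # prints ['b2', 'b3']
print([b.bid_id for b in effective_bids(bid_log, as_of=2)])  # prints ['b1', 'b2']
```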

The second case is even more interesting, but similar in its type of thinking: here we would like to prevent an event from taking effect. This can be done by sending a "converse event" that reverses the effect of the event, e.g. cancel bid. The implementation problem is that this event, and maybe its descendant events, may have been flowing all over the event processing network: some may even have left the EPN with actions already triggered, some are in process, and some are part of a state but have not been processed yet (e.g. since a pattern has not been detected yet). Theoretically it is possible to apply something similar to a "truth maintenance system" in AI that also covers the actions and compensates for all of them, but this complicates the system, so it is recommended only when it is critical (I'll discuss such cases in other postings). When the event has not left the EPN, it is still possible to stop it, but most systems do not provide a language primitive to do it globally in an EPN; recently I watched a concrete customer case where they had to do it manually.

The third case is the "vacuuming" case: when an event is no longer needed (in agents' state, in the global state, etc.). I never went deep into this issue, and intuitively thought that it is a relatively easy problem; however, when this issue was discussed in the Dagstuhl seminar last year, the claim was that the general issue of event vacuuming is still an open question.
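A converse event can be sketched for a single agent in Python. The `AuctionAgent` class below is hypothetical: it holds bids in its state until a matching ask arrives. A cancel event can simply withdraw a bid that is still pending, but once an action has left the agent, the agent can only report that compensation is needed; that is where truth-maintenance style machinery, or manual work, would come in.

```python
class AuctionAgent:
    """Hypothetical agent that matches a bid with an ask at or below the bid price."""

    def __init__(self):
        self.pending = {}  # bid_id -> price: still inside the agent's state
        self.emitted = {}  # bid_id -> action that has already left the EPN

    def on_bid(self, bid_id, price):
        self.pending[bid_id] = price

    def on_ask(self, ask_id, price):
        """Match the first pending bid that covers the ask and emit a trade action."""
        for bid_id, bid_price in list(self.pending.items()):
            if bid_price >= price:
                del self.pending[bid_id]
                self.emitted[bid_id] = f"trade({bid_id},{ask_id})"
                return self.emitted[bid_id]
        return None

    def on_cancel(self, bid_id):
        """Converse event: reverse the effect of an earlier bid where still possible."""
        if bid_id in self.pending:
            del self.pending[bid_id]
            return "withdrawn"  # the bid never took effect
        if bid_id in self.emitted:
            return "needs compensation: " + self.emitted[bid_id]
        return "unknown bid"


agent = AuctionAgent()
agent.on_bid("b1", 100)
agent.on_bid("b2", 90)
print(agent.on_ask("a1", 95))   # prints trade(b1,a1)
print(agent.on_cancel("b2"))    # prints withdrawn
print(agent.on_cancel("b1"))    # prints needs compensation: trade(b1,a1)
```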

I'll stop here now -- spent enough time on this one... more - later
