Friday, May 15, 2009

More on the structure of event representation

The Pope has just completed a five-day visit to Israel, with a lot of press coverage (pictured here with the Israeli president, Shimon Peres).

Back to less global issues: I continue the previous posting, which follows chapter 4 of the book "Event Processing in Action" that Peter Niblett and I are writing.

This time, I'll get one level deeper into the information we have found useful in describing events.

This figure, taken from the book, shows the different kinds of information that an event may contain. We partition this information into three types: header, payload, and event to event relations.

The header concept, taken from the messaging world (but also used in file systems), contains information that answers the following questions:
  • What is the event type of this particular event?
  • Is it a composite event (i.e., composed of other events) or not?
  • What is the temporal granularity (chronon) of all time stamps in this event? For some applications the granularity of time stamps should be 1 ms, in others a granularity of 1 minute is sufficient, and surplus precision is not helpful. This may be an application default that can be overridden at the level of the particular event type.
  • What is the occurrence time of the event (when did it happen in reality)?
  • What is the certainty that this event has occurred?
  • Is there an annotation associated with this event?
  • What is the unique event identity generated by the system?
  • When did the system detect that this event has occurred? (Note that the actual semantics of this information is implementation dependent: it may be when the event hits the system's API, when the event has been put into the engine's execution queue, or any other interpretation.)
  • Who emitted this event?
Note that most systems need only a subset of this information, not all of it.
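To make the header questions concrete, here is a minimal Python sketch of a header record. The field names (`event_type`, `chronon`, `certainty`, `event_source`, and so on) and the helper `truncate_to_chronon` are hypothetical illustrations, not the book's definition elements language. The sketch assumes timezone-aware UTC timestamps, a certainty expressed as a probability in [0, 1], and a one-minute application default chronon that an event type may override.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical application-wide default granularity; an event type may override it.
APPLICATION_DEFAULT_CHRONON = timedelta(minutes=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def truncate_to_chronon(ts: datetime, chronon: timedelta) -> datetime:
    """Round a timezone-aware timestamp down to a whole number of chronons since the epoch."""
    return EPOCH + ((ts - EPOCH) // chronon) * chronon


@dataclass
class EventHeader:
    event_type: str
    occurrence_time: datetime                   # when the event happened in reality
    event_source: str                           # who emitted the event
    is_composite: bool = False
    chronon: Optional[timedelta] = None         # None means "use the application default"
    certainty: float = 1.0                      # assumed to be a probability in [0, 1]
    annotation: Optional[str] = None
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # system-generated identity
    detection_time: Optional[datetime] = None   # exact meaning is implementation dependent

    def effective_chronon(self) -> timedelta:
        return self.chronon if self.chronon is not None else APPLICATION_DEFAULT_CHRONON

    def __post_init__(self) -> None:
        if not 0.0 <= self.certainty <= 1.0:
            raise ValueError("certainty must be between 0 and 1")
        # Drop precision finer than the chronon: surplus precision is not helpful.
        granularity = self.effective_chronon()
        self.occurrence_time = truncate_to_chronon(self.occurrence_time, granularity)
        if self.detection_time is not None:
            self.detection_time = truncate_to_chronon(self.detection_time, granularity)


header = EventHeader(
    event_type="Withdrawal",
    occurrence_time=datetime(2009, 5, 15, 10, 42, 17, tzinfo=timezone.utc),
    event_source="ATM-17",
)
print(header.occurrence_time)  # 2009-05-15 10:42:00+00:00
```

Passing `chronon=timedelta(milliseconds=1)` for a particular event type would keep millisecond precision instead, which is the per-type override described in the chronon bullet.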

The payload part of the information consists of attributes that provide additional information about the event; each attribute has a data type and may have a semantic role (e.g., a reference to some entity).
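A payload schema along these lines might look like the following sketch. `PayloadAttribute`, `EventTypeSchema`, and the single `SemanticRole.ENTITY_REFERENCE` role are hypothetical names chosen for illustration. The assumption is that each attribute declares a Python type and an optional semantic role, and that an event's payload must supply exactly the declared attributes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional, Tuple


class SemanticRole(Enum):
    # Hypothetical: only one role is modeled here, a reference to some entity.
    ENTITY_REFERENCE = "entity reference"


@dataclass(frozen=True)
class PayloadAttribute:
    name: str
    data_type: type
    semantic_role: Optional[SemanticRole] = None


@dataclass(frozen=True)
class EventTypeSchema:
    name: str
    payload: Tuple[PayloadAttribute, ...]

    def validate(self, values: Dict[str, Any]) -> None:
        """Check that the payload supplies exactly the declared attributes, with the declared types."""
        declared = {attr.name for attr in self.payload}
        unknown = set(values) - declared
        if unknown:
            raise ValueError(f"undeclared attributes: {sorted(unknown)}")
        for attr in self.payload:
            if attr.name not in values:
                raise ValueError(f"missing attribute: {attr.name}")
            if not isinstance(values[attr.name], attr.data_type):
                raise TypeError(f"{attr.name} must be of type {attr.data_type.__name__}")

    def entity_references(self, values: Dict[str, Any]) -> Dict[str, Any]:
        """Return only the payload values whose semantic role is an entity reference."""
        return {attr.name: values[attr.name] for attr in self.payload
                if attr.semantic_role is SemanticRole.ENTITY_REFERENCE}


withdrawal = EventTypeSchema("Withdrawal", (
    PayloadAttribute("account_id", str, SemanticRole.ENTITY_REFERENCE),
    PayloadAttribute("amount", float),
))
values = {"account_id": "ACC-123", "amount": 250.0}
withdrawal.validate(values)
print(withdrawal.entity_references(values))  # {'account_id': 'ACC-123'}
```

Keeping the semantic role separate from the data type lets two attributes share a type (both strings, say) while only one of them is treated as a reference to an entity.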

The event to event relations information makes it possible to define event types as specializations or generalizations of other event types, and to classify a particular event into a super-type (possibly when some condition is satisfied). An event can also have a retraction relationship to another event, i.e., it is a logical cancellation of a previous event. (I have referred to it in the past as a converse event, but my co-author Peter Niblett, who is very keen on terminological precision, convinced me that "converse" is a symmetric relation, while here we mean an anti-symmetric relation.)
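The three kinds of relations can be sketched as follows, under stated assumptions: specialization is modeled as a single `supertype` link, conditional classification as a list of hypothetical (super-type, predicate) rules, and retraction as a `retracts` field holding the identity of the cancelled event. None of these names come from the book. Because the retraction link points in one direction only, `e3` retracting `e1` says nothing about `e1` retracting `e3`, which reflects the anti-symmetry point.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class EventType:
    name: str
    supertype: Optional["EventType"] = None  # specialization link to a more general type

    def is_a(self, other: "EventType") -> bool:
        """True if this type is `other` or a direct or transitive specialization of it."""
        current: Optional[EventType] = self
        while current is not None:
            if current == other:
                return True
            current = current.supertype
        return False


@dataclass(frozen=True)
class Event:
    event_id: str
    event_type: EventType
    payload: Dict[str, object] = field(default_factory=dict)
    retracts: Optional[str] = None  # identity of an earlier event that this event logically cancels


# Hypothetical conditional classification rules: (super-type, condition on the event).
Rule = Tuple[EventType, Callable[[Event], bool]]


def classify(event: Event, rules: List[Rule]) -> List[str]:
    """Names of the super-types whose condition this particular event satisfies."""
    return [supertype.name for supertype, condition in rules if condition(event)]


def effective_events(events: List[Event]) -> List[Event]:
    """Drop retraction events together with the events they cancel."""
    cancelled = {e.retracts for e in events if e.retracts is not None}
    return [e for e in events if e.retracts is None and e.event_id not in cancelled]


withdrawal = EventType("Withdrawal")
atm_withdrawal = EventType("ATMWithdrawal", supertype=withdrawal)
large_withdrawal = EventType("LargeWithdrawal")
rules: List[Rule] = [
    (large_withdrawal, lambda e: e.event_type.is_a(withdrawal) and e.payload.get("amount", 0) > 1000),
]

e1 = Event("e1", atm_withdrawal, {"amount": 2000})
e2 = Event("e2", atm_withdrawal, {"amount": 50})
e3 = Event("e3", EventType("WithdrawalCancellation"), retracts="e1")

print(atm_withdrawal.is_a(withdrawal))                       # True
print(classify(e1, rules))                                   # ['LargeWithdrawal']
print([e.event_id for e in effective_events([e1, e2, e3])])  # ['e2']
```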

In the book we show how these events are defined in a use case, using our definition elements language. We also intend to add code samples from various products to the book.

Thursday, May 14, 2009

On the structure of event representation

Today I took the day off, and spent the afternoon and early evening in the "Achziv Park" seen above. My third daughter Hadas is on a few-day trip to the Western Galilee, and today was a day when the families could join for falafel in the park, so I traveled there with my fourth daughter Daphna, spent some quality time with the girls, and even played ball.

Earlier in the day, I spent a couple of hours in my favorite coffee shops, starting chapter 5 of the "Event Processing in Action" book. Writing a book is a major commitment, and since I am doing it in my (imaginary) spare time, it is quite a burden on my time. I had lunch last week with Roy Schulte from Gartner, who is also writing a book (together with Mani Chandy), and he also complained that writing the book eats up all his spare time.

I was asked about the scope of the book. Well, the book concentrates on pure event processing. There are complementary technologies, such as image processing, text analysis, speech recognition, sensor networks, statistical reasoning, and machine learning, that can be used to automatically generate either events or patterns (in a relatively small number of applications at this point). Since they are really complementary technologies, they are mentioned briefly in the advanced topics chapter; the main stream of the book is about event processing.

After this long introduction, I'll turn to write about a portion of chapter four of the book, which we submitted yesterday to the publisher. Chapter four deals with the representation of events (meta-data). There are currently no standards for the representation of events; thus, we have taken ideas from different directions to form such a model. We are looking at typed events; thus, each event type has some information particular to that event type. We distinguish between three types of such information: header, payload, and event to event relations.

  • Header attributes provide information about the event: type, time granularity for the event, times associated with the event (occurrence time, detection time), event identity, and some others.
  • Payload attributes provide information about the event content: references to entities and other attributes.
  • Event to event relations provide information about semantic relations among events.
In subsequent postings I'll discuss each of them in depth. It is late; time to rest.

Tuesday, May 12, 2009

On Gartner's EPN Reference Architecture

Today is a holiday called Lag Baomer (for children; there is no vacation for adults). The highlight (besides not going to school) is that last night children gathered around bonfires, as seen in the picture. Fun.

Recently Gartner has published a report called "A Gartner Reference Architecture for Event Processing Networks".

On the positive side, it seems that the concept of EPN as an underlying model for event processing is catching on. Readers of this Blog may realize that I am of the opinion that we need an agreed-upon conceptual and execution model for event processing (playing the same role that the relational model plays in relational databases; however, I never believed that the relational model per se is appropriate as the model behind event processing). The book I am writing now, "Event Processing in Action", is centered on the notion of EPN and takes a deep dive into the construction of EPN-based applications.

Reading Gartner's report, I found some slight differences between the way they describe EPN and my own description. The Gartner report defines a term called "dissemination network" that consists of event processing agents, channels, and the event flow among them, and then defines an EPN as a dissemination network plus producers and consumers. I could not find any compelling reason to introduce the notion of a dissemination network. According to the definition we are using, an event processing network is a directed graph that has nodes for producers, channels, EPAs, and consumers, and edges that determine the event flow among them.

Another difference is that the Gartner report views event producers and event consumers as types of event processing agents. My opinion is slightly different: I think that event producers and consumers are not really event processing agents, since an event processing agent is a software module that processes events and may generate more events. Event producers and consumers have nodes representing them in the EPN in order to make the event flow from and to them explicit; however, these nodes are only proxies of the actual producer and consumer, and for the event processing network they are sources and sinks. The main difference is that EPA functionality is explicitly specified in the EPN definition, while what the producer and consumer do is a "black box". We don't want to include their functionality, since we don't want to extend the event processing language ad infinitum.
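The EPN definition above (a directed graph whose nodes are producers, channels, EPAs, and consumers) can be sketched in a few lines of Python. This is a hypothetical illustration, not Gartner's model or the book's: only EPA nodes carry a function, producers may only be sources, consumers may only be sinks, and consumer nodes merely record what they received, acting as proxies for the real black-box consumers. The sketch assumes an acyclic graph and represents events as plain dictionaries for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional, Set, Tuple


class NodeKind(Enum):
    PRODUCER = "producer"
    CHANNEL = "channel"
    EPA = "event processing agent"
    CONSUMER = "consumer"


Event = dict  # hypothetical: events as plain dictionaries


@dataclass
class EPN:
    kinds: Dict[str, NodeKind] = field(default_factory=dict)
    functions: Dict[str, Callable[[Event], List[Event]]] = field(default_factory=dict)
    edges: Set[Tuple[str, str]] = field(default_factory=set)         # (source, target): event flow
    delivered: Dict[str, List[Event]] = field(default_factory=dict)  # what each consumer proxy received

    def add_node(self, name: str, kind: NodeKind,
                 function: Optional[Callable[[Event], List[Event]]] = None) -> None:
        # Only EPAs carry a function in the EPN definition; producers and consumers stay black boxes.
        if (kind is NodeKind.EPA) != (function is not None):
            raise ValueError("exactly the EPA nodes must be given a function")
        self.kinds[name] = kind
        if function is not None:
            self.functions[name] = function
        if kind is NodeKind.CONSUMER:
            self.delivered[name] = []

    def connect(self, source: str, target: str) -> None:
        # Unknown node names raise KeyError here.
        if self.kinds[source] is NodeKind.CONSUMER:
            raise ValueError(f"consumer {source!r} is a sink")
        if self.kinds[target] is NodeKind.PRODUCER:
            raise ValueError(f"producer {target!r} is a source")
        self.edges.add((source, target))

    def emit(self, producer: str, event: Event) -> None:
        """Push an event from a producer proxy along the edges until it reaches consumer proxies."""
        if self.kinds[producer] is not NodeKind.PRODUCER:
            raise ValueError(f"{producer!r} is not a producer")
        pending = [(producer, event)]
        while pending:
            node, ev = pending.pop()
            for src, dst in sorted(self.edges):
                if src != node:
                    continue
                kind = self.kinds[dst]
                if kind is NodeKind.CONSUMER:
                    self.delivered[dst].append(ev)
                elif kind is NodeKind.CHANNEL:
                    pending.append((dst, ev))
                else:  # EPA: apply its specified function, which may derive zero or more events
                    pending.extend((dst, derived) for derived in self.functions[dst](ev))


epn = EPN()
epn.add_node("bank", NodeKind.PRODUCER)
epn.add_node("in", NodeKind.CHANNEL)
epn.add_node("large_filter", NodeKind.EPA, lambda e: [e] if e["amount"] > 1000 else [])
epn.add_node("fraud_desk", NodeKind.CONSUMER)
epn.connect("bank", "in")
epn.connect("in", "large_filter")
epn.connect("large_filter", "fraud_desk")
epn.emit("bank", {"amount": 50})
epn.emit("bank", {"amount": 5000})
print(epn.delivered["fraud_desk"])  # [{'amount': 5000}]
```

Keeping functions on EPA nodes only mirrors the point that producer and consumer behavior stays outside the event processing language, while their nodes still make the event flow explicit.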

Speaking of the EPIA book: Chapter 3 is now on the Web and can be obtained through the MEAP program. It is the last chapter in the introductory part, and deals with the principles of programming with events. Chapter 4, the first in the deep-dive part, will be sent to the publisher soon. It has been much more challenging to write; it deals with what information we need to store about events. I'll Blog about it soon.