Friday, May 15, 2009

More on the structure of event representation

The Pope has just completed a five-day visit to Israel, with a lot of press coverage (here in the picture with the Israeli president, Shimon Peres).

Back to less global issues: I continue the previous posting, which follows chapter 4 of the book "Event Processing in Action" that Peter Niblett and I are writing.

This time, I'll get one level deeper into the information we have found useful in describing events.

This is a figure taken from the book, showing the different kinds of information that an event may contain. We partition the information into three types: header, payload, and event-to-event relations.

The header concept, taken from the messaging world (but also used in file systems), contains information that answers the following questions:
  • What is the event type of this particular event?
  • Is it a composite event (i.e. composed of other events) or not?
  • What is the temporal granularity (chronon) of all time stamps in this event? For some applications the granularity of time stamps should be 1 ms; in others a granularity of 1 minute is sufficient, and surplus information is not helpful. This may be an application default that can be overridden at the level of the particular event type.
  • What is the occurrence time of the event (when did it happen in reality)?
  • What is the certainty that this event has occurred?
  • Is there an annotation associated with this event?
  • What is the unique event identity generated by the system?
  • When did the system detect that this event has occurred? (Note that the actual semantics of this information are implementation dependent: it may be when the event hits the system's API, when the event has been put into the engine's execution queue, or any other interpretation.)
  • Who emitted this event?
Note that most systems need only a subset of this information, not all of it.
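
As a rough illustration, the header questions above could map onto a record like the one below. This is a minimal Python sketch and not the definition language used in the book: the class name EventHeader, the helper truncate_to_chronon, and all field names and defaults are assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional
import uuid


@dataclass(frozen=True)
class EventHeader:
    """Hypothetical header record; names and defaults are illustrative only."""
    event_type: str                 # what is the event type?
    occurrence_time: datetime       # when did it happen in reality?
    detection_time: datetime        # when did the system detect it?
    source: str                     # who emitted the event?
    is_composite: bool = False      # is it composed of other events?
    chronon: timedelta = timedelta(milliseconds=1)  # temporal granularity
    certainty: float = 1.0          # certainty that the event occurred
    annotation: Optional[str] = None
    # Unique identity generated by the system (here: a random UUID).
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __post_init__(self):
        if not 0.0 <= self.certainty <= 1.0:
            raise ValueError("certainty must be between 0 and 1")


def truncate_to_chronon(ts: datetime, chronon: timedelta) -> datetime:
    """Round a timezone-aware time stamp down to a multiple of the chronon."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + ((ts - epoch) // chronon) * chronon
```

With a chronon of one minute, a time stamp of 10:30:45.123456 would be stored as 10:30:00, which is one way to avoid carrying surplus precision. A system that needs only a subset of the header would simply drop the optional fields.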

The payload part of the information consists of attributes that provide additional information about the event. Each attribute has a data type, and may have a semantic role (e.g. a reference to some entity).
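
To make the idea of typed payload attributes concrete, here is a small Python sketch. The names AttributeDef and validate_payload, and the role string "reference:customer", are hypothetical and chosen only for this example; they are not taken from the book.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class AttributeDef:
    """Hypothetical description of one payload attribute."""
    name: str
    data_type: type
    semantic_role: Optional[str] = None  # e.g. a reference to some entity


def validate_payload(schema: List[AttributeDef], payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; the list is empty if the payload fits the schema."""
    problems = []
    for attr in schema:
        if attr.name not in payload:
            problems.append(f"missing attribute: {attr.name}")
        elif not isinstance(payload[attr.name], attr.data_type):
            problems.append(f"{attr.name}: expected {attr.data_type.__name__}")
    return problems


# Example schema for an imaginary "order placed" event type.
order_schema = [
    AttributeDef("customer_id", str, semantic_role="reference:customer"),
    AttributeDef("amount", float),
]
```

Calling validate_payload(order_schema, {"customer_id": "c1", "amount": 10.0}) returns an empty list, while a payload with a string in place of the amount is reported as a type mismatch.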

The event-to-event relations make it possible to define event types as specializations or generalizations of other event types, and to classify a particular event under a super-type (possibly only when some condition is satisfied). An event can also have a retraction relationship to another event, i.e. it is a logical cancellation of a previous event. (I have referred to it in the past as a converse event, but my co-author Peter Niblett, who is very keen on terminological precision, convinced me that "converse" is a symmetric relation, while we mean here an anti-symmetric relation.)
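
The two relations described above, conditional generalization and retraction, can be sketched in a few lines of Python. Everything here is an assumption made for illustration (the classes Generalization, EventType and Event, the functions classify and effective_events, and the disk-usage example); it is not the notation used in the book.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Payload = Dict[str, Any]


@dataclass
class Generalization:
    """Link from an event type to a super-type, optionally guarded by a condition."""
    super_type: str
    condition: Optional[Callable[[Payload], bool]] = None  # None = unconditional


@dataclass
class EventType:
    name: str
    generalizations: List[Generalization] = field(default_factory=list)


@dataclass
class Event:
    event_id: str
    event_type: str
    payload: Payload = field(default_factory=dict)
    retracts: Optional[str] = None  # id of the earlier event this one cancels


def classify(event_type: EventType, payload: Payload) -> List[str]:
    """Return the event's own type plus every super-type whose condition holds."""
    names = [event_type.name]
    for g in event_type.generalizations:
        if g.condition is None or g.condition(payload):
            names.append(g.super_type)
    return names


def effective_events(events: List[Event]) -> List[Event]:
    """Drop retracted events, and the retraction events themselves."""
    cancelled = {e.retracts for e in events if e.retracts is not None}
    return [e for e in events
            if e.retracts is None and e.event_id not in cancelled]


# A disk-usage reading is always a "measurement", and a "warning" above 80%.
disk_usage = EventType("disk_usage", [
    Generalization("measurement"),
    Generalization("warning", condition=lambda p: p["used_fraction"] > 0.8),
])
```

Note that the retraction link points in one direction only: the cancelling event refers to the cancelled one, never the other way round.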

In the book we show how these events are defined in a use case, using our definition elements language. We intend also to add to the book code samples from various products.


Marco Seiriö said...

Do you think that event relationships are a property of the events themselves or something that is decided by the one looking at the events?

Could we have different relationships depending on the context the events are used in?

For someone, one particular event might be considered a subclass of "warning", and for others it might be "fatal error", all depending on your perspective.

Opher Etzion said...

Hi Marco.

You are right: unlike regular semantic data models, in which generalization and specialization are absolute terms, in the case of events they may be conditioned on context or on any predicate, or may be absolute. This is still a property of the event type, and is used to determine how each event instance is classified for processing.



Marco Seiriö said...

If the information about semantics is attached to the event or its type, would it not be hard for different observers to assume varying semantics when it comes to generalizations, for example?

Not sure if I am thinking about this in the right way, but I sometimes (I have not made up my mind about this one) think that it is the private information of the observer which decides how event instances and event types are related.

Thus I would think it would be wise to decouple this information from the events.

Opher Etzion said...

Hi Marco. You can, of course, have a model in which an event is classified to an event type only at run-time; furthermore, different subscribers to the same event can classify it differently. This may be appropriate for applications whose main function is event dissemination. However, for other types of applications, events are easier to process if they are typed, and it becomes possible to create patterns/predicates based on attribute values. So there is a trade-off here. Most applications that I came across had events which are uniquely typed.



Hans said...

I am wondering about your use of the term header here. I think you are using this term to mean the fields that should be defined on every event, while the payload is the application specific part.

This use of "header" has become popular, and for the most part, the conflicting meanings (which I'm getting to) are easily differentiated based on context. But since you are writing a reference book, I thought I'd bring this up.

The word header implies that it is some kind of record that comes physically first in a message. Traditionally, the two uses for a header were to describe the parsing of the subsequent data (record length, record type, checksum, etc) or to describe some set of fields that could be parsed in order to route or otherwise process the message, without bothering to parse the underlying payload. For an example of the second type of header, a TCP packet has two headers: the IP header parsed by the routing infrastructure and the TCP header parsed by the protocol stack (although of course, these days plenty of modern network components parse both headers).

At some point, messaging products like MQSeries defined custom properties that could be added to the header of a bus message to facilitate handling of the message without parsing the payload. And because they were a structured way of passing around some convenient data, applications started using this kind of thing for properties that should probably have been included in the payload of the message instead (the payload should possibly have its own header that is separate from the MQ header). And after years of this use, the term header is used for fields that are common to all messages or that are application standard.

But "header" is different from standard fields, which are not necessarily in a header. In practice, most experienced people would not be confused by this, but I wonder if junior practitioners reading about the topic for the first time might be.