Thursday, December 6, 2007

On Event Representation

Back to micro-oriented issues: today I'll start discussing what's behind the definitions in the event processing glossary, beginning with the issue of event representation. As the glossary says, an event is something that happens in reality. We also tend to use the word "event" for the representation of that happening for the purpose of processing by a computer. In the glossary, this second notion of event has several aliases: event object, event message and event tuple.

The various aliases indicate that the space of event representation is not uniform. Some think of an event as a message that moves around; some think of it as a tuple that is part of a stream, really the twin brother of a tuple in a relational database; some think of it as an object with arbitrary structure (which may also be hidden). Obviously, there is no "universal event". Unfortunately, in many cases events arrive from their sources in given formats, and the event processing designer has little say about it. A generic event processing system therefore has to support multiple types of events, or have adapters that translate all types of events to some canonical type (and typically both: supporting some canonical type of event and having adapters that translate the other types to it). Events can be structured, semi-structured (XML), or unstructured (the area of unstructured event processing deserves more focused attention).

One of the questions is whether there are common attributes that each event should have to enable event processing. In the data world the answer is no: there is not a single attribute that must exist in all relations (although each tuple must be a member of some relation, so there are no floating tuples). For event processing, some attributes have been proposed as common attributes:
  • Event-type
  • Source
  • Time-Stamp (or Time-Interval)
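To make the adapter idea concrete, here is a minimal Python sketch of a canonical event that carries the three proposed common attributes, with one adapter for a keyword-oriented message and one for a positional tuple. Everything in it is an assumption made for illustration: the class name `CanonicalEvent`, the input field names (`type`, `sender`, `epoch_seconds`, `body`), the tuple layout and the `StockTick` example are hypothetical and do not come from the glossary or from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict


@dataclass
class CanonicalEvent:
    """Hypothetical canonical event: three common attributes plus a payload."""
    event_type: str
    source: str
    timestamp: datetime
    payload: Dict[str, Any] = field(default_factory=dict)


def from_message(msg: Dict[str, Any]) -> CanonicalEvent:
    """Adapter for a keyword-oriented message (field names are assumptions)."""
    return CanonicalEvent(
        event_type=msg["type"],
        source=msg["sender"],
        timestamp=datetime.fromtimestamp(msg["epoch_seconds"], tz=timezone.utc),
        payload=dict(msg.get("body", {})),
    )


def from_tuple(row: tuple) -> CanonicalEvent:
    """Adapter for a positional stream tuple: (timestamp_iso, symbol, price)."""
    timestamp_iso, symbol, price = row
    return CanonicalEvent(
        event_type="StockTick",   # assumed: the stream carries a single type
        source="tick-stream",     # assumed: the source is known from the channel
        timestamp=datetime.fromisoformat(timestamp_iso),
        payload={"symbol": symbol, "price": price},
    )


# Registry of adapters, keyed by the representation the source emits.
ADAPTERS: Dict[str, Callable[[Any], CanonicalEvent]] = {
    "message": from_message,
    "tuple": from_tuple,
}


def to_canonical(kind: str, raw: Any) -> CanonicalEvent:
    """Translate a raw source event into the canonical type."""
    return ADAPTERS[kind](raw)
```

The processing logic then only ever sees `CanonicalEvent`, and supporting a new source format means adding one more adapter to the registry.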

Let's look at the question of whether these common attributes are mandatory or not:

  • The first question is whether each event is an instance of an event-type (or event-class). The glossary says yes: "all events must be instances of event-type". This seems reasonable; however, we may think of some exceptions, such as rare events that have not been classified. I need to drill down on rare events in some other post.
  • The second question is whether the source should be mandatory. Again, this is desirable if we want lineage, or want to trace back actions/decisions, but there may be cases in which the source is indefinite, or in which we wish to hide the source (e.g. in the case of leaked information).
  • The third question is whether each event must have a time-stamp (or a time-interval in case it happens over an interval, another area that needs more discussion). The answer is that many event processing patterns are time related: if we want to know which event occurred first, or whether two events occurred within 5 minutes of each other, we need to know WHEN each event occurred in reality. However, in some cases this is not known, and in other cases it is not really needed.

It seems that all common attributes are useful, but may be optional in some cases.
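One way to picture "useful, but optional" is a header in which every common attribute may be missing, and a temporal pattern that declines to answer when a time-stamp is unknown. The Python sketch below is an illustration under my own assumptions: `EventHeader` and `occurred_within` are hypothetical names, and returning `None` for "cannot be decided" is just one possible design choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class EventHeader:
    """Common attributes, each of which may be absent (None)."""
    event_type: Optional[str] = None     # None: an unclassified (e.g. rare) event
    source: Optional[str] = None         # None: indefinite or deliberately hidden
    timestamp: Optional[datetime] = None  # None: occurrence time not known


def occurred_within(a: EventHeader, b: EventHeader,
                    window: timedelta) -> Optional[bool]:
    """Did a and b occur within `window` of each other?

    Returns True or False when both time-stamps are known, and None when
    the pattern cannot be evaluated because a time-stamp is missing.
    """
    if a.timestamp is None or b.timestamp is None:
        return None
    return abs(a.timestamp - b.timestamp) <= window


login = EventHeader("Login", "web", datetime(2007, 12, 6, 10, 0))
withdrawal = EventHeader("Withdrawal", None, datetime(2007, 12, 6, 10, 4))
untimed = EventHeader("Complaint", "call-center")

print(occurred_within(login, withdrawal, timedelta(minutes=5)))  # True
print(occurred_within(login, untimed, timedelta(minutes=5)))     # None
```

Note that the missing source on `withdrawal` does not prevent the temporal pattern from being evaluated; only the attribute a pattern actually needs has to be present.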

Other attributes are common only to certain kinds of events, such as probability for uncertain events or spatial coordinates for spatial events. All of this comes before we get to the content.

The content is determined according to domain-related ontologies, and there is a lot of work today in different application domains and industries to define such ontologies. XML is the language typically used to express them. It has its own benefits, but it also carries overhead relative to "flat" events, in which attributes are identified by position rather than by keyword.
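The trade-off between keyword-oriented and positional events can be seen in a small Python sketch that parses the same content from both forms. The event, its attribute names and the comma-separated flat layout are made up for the illustration; the only library used is the standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# The same hypothetical event in two representations.
XML_EVENT = ("<event><type>StockTick</type>"
             "<symbol>IBM</symbol><price>105.5</price></event>")
FLAT_EVENT = "StockTick,IBM,105.5"

# A flat event is only meaningful together with a schema agreed in advance.
FLAT_SCHEMA: Tuple[str, ...] = ("type", "symbol", "price")


def parse_xml(text: str) -> Dict[str, str]:
    """Keyword-oriented: each value carries its own attribute name."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def parse_flat(text: str, schema: Tuple[str, ...] = FLAT_SCHEMA) -> Dict[str, str]:
    """Position-oriented: the meaning of a value comes from its position."""
    return dict(zip(schema, text.split(",")))


print(parse_xml(XML_EVENT) == parse_flat(FLAT_EVENT))  # True
print(len(XML_EVENT) > len(FLAT_EVENT))                # True
```

Both forms yield the same attributes, but the XML form repeats the attribute names in every event instance, while the flat form is shorter and depends on sender and receiver sharing the schema.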

Events also carry semantic information, such as references to entities in certain roles. In fact, an event can be thought of as a transition from one state to another, and the information included in the event refers to a change in the universe, such as:

  • What was changed?
  • What entities are affected?
  • When was it changed?
  • Where did the change take place?
  • What other information is important about the change?
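The questions above map naturally onto the fields of a record. Here is a minimal Python sketch of an event viewed as a state transition; the names `ChangeEvent` and `apply_event`, the account example and the choice to model state as a plain dictionary are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class ChangeEvent:
    """An event as a transition between two states."""
    what: str                          # what was changed (a state attribute)
    entities: Dict[str, str]           # affected entities, keyed by role
    when: Optional[datetime] = None    # when the change occurred, if known
    where: Optional[str] = None        # where it took place, if known
    old_value: Any = None              # value before the transition
    new_value: Any = None              # value after the transition
    extra: Dict[str, Any] = field(default_factory=dict)  # other information


def apply_event(state: Dict[str, Any], event: ChangeEvent) -> Dict[str, Any]:
    """Return the state after the transition, leaving the input untouched."""
    new_state = dict(state)
    new_state[event.what] = event.new_value
    return new_state


before = {"balance": 100}
withdrawal = ChangeEvent(
    what="balance",
    entities={"account": "A-17", "customer": "C-3"},
    when=datetime(2007, 12, 6, 10, 4),
    where="ATM-42",
    old_value=100,
    new_value=60,
    extra={"channel": "atm"},
)
after = apply_event(before, withdrawal)
print(after)  # {'balance': 60}
```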

This short discussion has already raised several open issues that deserve further discussion, so I'll put these topics on the queue for further postings. More later.


harvey said...

Despite the DoD's attempt to catalog everything universally (DDMS, built from Dublin Core):
...which would include event metadata...

Some folks believe there can be no one taxonomy (or ontology) for everything (as you indicated):

...then we are forced to accept attributes or metadata from the discipline we are taking events from (weather, particle detectors, border patrol, etc.)

Beyond metadata used to describe an individual event (discipline related), I will assert that we need additional metadata to form appropriate relations with other event metadata.

It's the relationships (in my opinion) that will be of higher value than any individual event, and it's the relationships which determine if a pattern or situation is "found".

Take care,

Opher Etzion said...

Thanks Harvey. I'll try to get the "disorder of things" to watch the omni-potent ontology.