Thursday, December 6, 2007

On Event Representation


Back to micro-oriented issues. Today I'll start a discussion of what is behind the definitions in the event processing glossary, and get to the issue of event representation. As the glossary says, an event is something that happens in reality. We also tend to use the word "event" for the representation of this happening for the purpose of processing by a computer. This second notion of event has several aliases in the glossary: event object, event message and event tuple. The various aliases indicate that the space of event representation is not uniform: some think of an event as a message that moves around; some think of it as a tuple, which is part of a stream and really the twin brother of a tuple in a relational database; some think of it as an object with arbitrary structure (which may also be hidden).

Obviously, there is no "universal event". Unfortunately, in many cases events are already given by the sources in their own formats, and the event processing designer has little to say about it. A generic event processing system therefore has to support multiple types of events, or have adapters that translate all types of events to some canonical type (and typically both: supporting some canonical type of event and having adapters that translate other types to it). Events can be structured, semi-structured (e.g. XML), or unstructured (the area of unstructured event processing deserves more focused attention).

One of the questions is whether there are common attributes that each event should have to enable event processing. In the data world the answer is no: there is no single attribute that must exist in all relations (besides the fact that each tuple should be a member of some relation, no floating tuples). For event processing, some attributes have been proposed as common attributes:
  • Event-type
  • Source
  • Time-Stamp (or Time-Interval)
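
To make the adapter idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `CanonicalEvent` class, the two adapter functions, and the shapes of the incoming tuple and XML events are hypothetical and are not taken from the glossary or from any product. The three proposed common attributes are declared as optional fields.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional
import xml.etree.ElementTree as ET


@dataclass
class CanonicalEvent:
    """Hypothetical canonical event: three common attributes plus content."""
    event_type: Optional[str] = None
    source: Optional[str] = None
    timestamp: Optional[datetime] = None
    payload: Dict[str, Any] = field(default_factory=dict)


def from_tuple(row):
    """Adapter for a 'flat' positional event.

    Assumed layout: (event type, source, epoch seconds, value).
    """
    event_type, source, epoch, value = row
    occurred = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return CanonicalEvent(event_type, source, occurred, {"value": value})


def from_xml(text):
    """Adapter for a semi-structured (XML) event.

    Assumed layout: the root tag is the event type, 'source' and 'time'
    are optional root attributes, and child elements carry the content.
    """
    root = ET.fromstring(text)
    time_text = root.get("time")
    return CanonicalEvent(
        event_type=root.tag,
        source=root.get("source"),
        timestamp=datetime.fromisoformat(time_text) if time_text else None,
        payload={child.tag: child.text for child in root},
    )
```

With such adapters, the rest of the processing only has to deal with one representation, at the price of writing one adapter per source format.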

Let's look at the question of whether they are mandatory or not:

  • The first question is whether each event is an instance of an event-type (or event-class). The glossary says yes: "all events must be instances of event-type". This seems reasonable; however, we may think of some exceptions, such as rare events that have not been classified. I need to drill down on rare events in some other post.
  • The second question is whether the source should be mandatory - again, this is desirable if we want to have lineage or tracing back actions/decisions, but there may be cases in which the source is indefinite, or we wish to hide the source (e.g. leaking of information).
  • The third question is whether each event must have a time-stamp (or a time-interval, in case it happens over an interval; another area that needs more discussion). Many event processing patterns are time related: if we want to know which event occurred first, or whether two events occurred within 5 minutes of each other, we need to know WHEN each event occurred in reality. However, in some cases this is not known, and in other cases it is not really needed.

It seems that all common attributes are useful, but may be optional in some cases.
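
As a small illustration of what an optional time-stamp means for a time-related pattern, here is a sketch of the "within 5 minutes of each other" check. The function name `occurred_within` and the choice to return `None` when a time-stamp is unknown are my assumptions for the example; a real system might instead reject the event or fall back to the detection time.

```python
from datetime import datetime, timedelta
from typing import Optional


def occurred_within(a: Optional[datetime],
                    b: Optional[datetime],
                    window: timedelta) -> Optional[bool]:
    """True/False if both occurrence times are known, None if undecidable."""
    if a is None or b is None:
        # The pattern cannot be evaluated without both time-stamps.
        return None
    return abs(a - b) <= window


five_minutes = timedelta(minutes=5)
first = datetime(2007, 12, 6, 10, 0)
second = datetime(2007, 12, 6, 10, 4)
close_enough = occurred_within(first, second, five_minutes)
unknown = occurred_within(first, None, five_minutes)
```

The three-valued result makes the consequence explicit: a time-related pattern over events that lack a time-stamp is neither satisfied nor violated, it simply cannot be decided.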

There are also attributes that are common to certain types of events, such as probability for uncertain events and spatial coordinates for spatial events. All of this is before relating to the content.

The content is determined according to domain-related ontologies, and there is a lot of work today in different application domains and industries to define such ontologies. XML is commonly the language in which they are expressed; it has its own benefits, but it also carries overhead relative to "flat" events, in which the attributes are positional rather than keyword oriented.

Events also carry semantic information, such as references to entities in certain roles. In fact, an event can be thought of as a transition from one state to another, and the information included in the event refers to a change in the universe, such as:

What was changed? What entities are affected? When was it changed? Where did the change take place? What other information is important about the change?

This short discussion has already raised several open issues that deserve further discussion, so I'll put these topics on the queue for further postings. More later.

Wednesday, December 5, 2007

On False Positives and False Negatives


From a syntactic point of view, CEP looks for patterns and derives an event or triggers an action based on each pattern detected. However, detecting the pattern is the mechanical work; the pattern designates a "situation", which is an "event" in the customer's frame of reference to which the customer wants to react (there are also "internal" situations, used for further processing). There is obviously a gap between the intention (the situation) and the way it is detected (a pattern on the event flow). In many cases, satisfying the pattern is a sufficient condition for detecting the intended situation; in other cases, it serves only as the "best approximation". This leads to the phenomenon of false positives (the pattern is detected, but the situation did not really happen) and false negatives (the situation occurred, but the pattern has not been detected). Some reasons are:
  • Raw events are missed: they do not arrive at all, or do not arrive on time (source or communication issues).
  • Raw events are not accurate: their values are not accurate (source issues).
  • Temporal order issues: uncertainty about the correct order of events.
  • The pattern does not accurately reflect the conditions for the situation (e.g. there are probabilistic elements).
  • (other reasons) ?

As in the case of time constraints, there are various utility functions to designate the damage from either false positives or false negatives.
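
One very simple way to picture such a utility function is sketched below. It is only an illustration under my own assumptions: situations and detections are identified by comparable ids (for example time-window numbers), and the damage is a linear function with a fixed cost per false positive and per false negative. The names `confusion_counts` and `total_damage` and the cost values are hypothetical.

```python
def confusion_counts(detected, actual):
    """Compare detected patterns with situations that really happened.

    Both arguments are sets of situation identifiers.
    Returns (true positives, false positives, false negatives).
    """
    true_pos = len(detected & actual)
    false_pos = len(detected - actual)   # pattern detected, no situation
    false_neg = len(actual - detected)   # situation occurred, not detected
    return true_pos, false_pos, false_neg


def total_damage(false_pos, false_neg, cost_fp, cost_fn):
    """Assumed linear utility: a fixed cost per error of each kind."""
    return false_pos * cost_fp + false_neg * cost_fn


detected = {1, 2, 3}
actual = {2, 3, 4, 5}
tp, fp, fn = confusion_counts(detected, actual)
damage = total_damage(fp, fn, cost_fp=10.0, cost_fn=50.0)
```

Making a missed situation five times as costly as a false alarm, as in this example, is the kind of trade-off that would push a designer towards a more permissive pattern.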

More on that issue - later.