Event Processing Thinking: event types


Wednesday, December 15, 2010

Revisiting EPN

This illustration, taken from the EPIA book and drawn by Peter Niblett, is a portion of the EPN that describes the "Fast Flower Delivery" example that accompanies the book. In an internal discussion today somebody asked why we need EPN at all, instead of the alternative that has been used in Amit and other places: each EPA subscribes to an event type; whenever an event of this type is detected, the appropriate EPA receives it and processes it; the event flow is entirely implicit, and the person defining the system does not need to worry about it.

Since this is actually a good question, I wanted to share my response. There are two main reasons why we have shifted our thinking to the EPN model: efficiency and usability.

I'll start with usability. Experience shows (and this observation is also true for inference-based systems) that people feel more comfortable when they can control the flow explicitly rather than rely on implicit flows: they understand better what the system does, can better debug and validate it, and trust such systems more. Note that an EPN is not a workflow: it does not represent control flow, it represents event streaming flow (in a way similar to data flow, with some semantic distinctions).

The other reason is efficiency. If an EPA subscribes to an event type, then either the EPA has to process and filter out a substantial number of irrelevant events, or the number of event types has to be increased substantially. Imagine the following scenario: an event of type ET1 arrives. First it meets a filter that filters out many of the events using some assertion, and then there are various EPAs that process only the filtered-in events. One of these EPAs is an enrichment, adding some information from a database, and the enriched event is then sent to an aggregator for further processing. If we use the "event type" subscription, there are two choices. The first is to create event type ET2 for the filtered-in events, identical to ET1, and create a derived event of type ET2 for each filtered-in event of type ET1, then create event type ET3 for the enriched event with the added enrichment attribute; then indeed each EPA subscribes to a single event type. The second choice is to use ET1 for all three cases, but add an indication (using some derived attribute) of which variation of ET1 it is, and filter inside the aggregator to keep only the right variation of ET1. Both are inefficient: the first because many more event types have to be managed, the second because many more events are transmitted to each EPA only to be filtered out, and the order also becomes important here.

The explicit EPN resolves this: each EPA sends its output to a channel, and the channel can route according to source, type, assertion, etc. Thus a specific output terminal of a channel is really the topic to which an EPA subscribes. Note that all the possibilities mentioned before are just special cases of EPN, and if one insists, such an EPN can be constructed. In the extreme case, one can construct an EPN with a single channel that routes every event to every EPA, leaving each EPA to decide whether it wants to use it or not, but I would not recommend it as a good design pattern. More - later.
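To make the filter, enrich and aggregate scenario concrete, here is a minimal sketch of an explicit EPN in Python. It is an illustration only, not the design of any particular product: the Channel class, the helper names, the "amount" threshold and the customer-to-region lookup (standing in for the database) are all assumptions invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]

# Hypothetical reference data standing in for the database used by the enricher.
CUSTOMER_REGIONS = {"alice": "north", "carol": "south"}


@dataclass
class Channel:
    """Routes each published event to the consumers whose assertion accepts it."""
    routes: List[Tuple[Callable[[Event], bool], Callable[[Event], None]]] = field(
        default_factory=list
    )

    def connect(self, consumer, assertion=lambda event: True):
        self.routes.append((assertion, consumer))

    def publish(self, event):
        for assertion, consumer in self.routes:
            if assertion(event):
                consumer(event)


def make_filter(predicate, out):
    """Filter EPA: forwards only the events that satisfy the predicate."""
    def epa(event):
        if predicate(event):
            out.publish(event)
    return epa


def make_enricher(lookup, out):
    """Enrichment EPA: adds a 'region' attribute taken from the lookup table."""
    def epa(event):
        enriched = dict(event)
        enriched["region"] = lookup.get(event["customer"], "unknown")
        out.publish(enriched)
    return epa


class Aggregator:
    """Aggregation EPA: sums the amounts per region."""
    def __init__(self):
        self.totals = {}

    def __call__(self, event):
        region = event["region"]
        self.totals[region] = self.totals.get(region, 0) + event["amount"]


def run_demo():
    source, filtered, enriched = Channel(), Channel(), Channel()
    aggregator = Aggregator()
    # The source channel routes by type; the later channels need no assertion,
    # because only the right events ever reach them.
    source.connect(make_filter(lambda e: e["amount"] >= 100, filtered),
                   assertion=lambda e: e["type"] == "ET1")
    filtered.connect(make_enricher(CUSTOMER_REGIONS, enriched))
    enriched.connect(aggregator)

    for event in [
        {"type": "ET1", "customer": "alice", "amount": 150},
        {"type": "ET1", "customer": "bob", "amount": 50},     # filtered out
        {"type": "ET1", "customer": "carol", "amount": 120},
        {"type": "ET1", "customer": "alice", "amount": 200},
        {"type": "ET9", "customer": "alice", "amount": 999},  # not routed at all
    ]:
        source.publish(event)
    return aggregator.totals
```

Note that the event type stays ET1 throughout, and the aggregator never sees an event it has to discard: the wiring of the channels carries the information that the extra event types or the derived attribute would otherwise have to carry.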

Wednesday, November 11, 2009

On Defining "EVENT" in Earnest

Professional books are not that funny; this is left for comedies. My favorite comedy of all time is Oscar Wilde's "The Importance of Being Earnest". In Hebrew the title was translated literally to something like "The importance of seriousness", and everybody who knows what the play is about understands that this translation totally misses the point of the comedy. Anyway, I recalled Oscar Wilde's old play when recently reading the book by Mani Chandy and Roy Schulte, since their book has a section called "defining "EVENT" in Earnest". In this section they say that there are three schools of thought about how EVENT is defined:

State-change view -- an event is a change in the state of something, reported as such. Its properties: a change must occur, and this change must be reported. Example: an item previously outside the range of an RFID reader is now within the range of this RFID reader.
Happening view -- an event is anything that happens, or is contemplated as happening (the EPTS glossary definition). In this case a change must occur, but its reporting to the system is optional: not every event according to this definition is of interest to be reported. Example: a person sending an Email.
Detectable-condition view -- an event is a detectable condition that can trigger a notification. In this case a change does not have to occur, but reporting should occur. Example: a GPS device reporting a truck's location (note: the location may not have changed since the last report, since the truck driver went for lunch).

This is an interesting observation. Some people argue that only the first type is an event, while the other types are not. My view is that all of the above are actually events. The question is whether we can come up with an inclusive, agreed-upon definition of event; maybe the glossary team (co-led by Roy Schulte) should take up this challenge.

More about event types - later.

Saturday, March 15, 2008

The Babylon tower and event formats

Still in the USA: after the OMG meeting in the Washington DC area, I got to the Boston area, and now I am in Burlington, MA, visiting the (former) Aptsoft guys. When driving abroad I rent a car with GPS (see above to determine which one), and it typically gets me to where I want. However, it is still not totally reliable; in the last day it confused me twice. Once, last night on the way to the hotel, it told me to turn left and meant the next turn, not the current one, but it did not say so; thus I found myself back on the I95, and had to turn around at the next exit and try again. Today it sent me to some shopping center instead of the Aptsoft site, and after I ignored it and found the site, the local people told me that the GPS maps have the street numbers in the wrong direction (starting from the other end). Well, sometimes the technology is fine, and the weakest link is the data it uses.

Talking about data, one of the topics discussed in the OMG meeting about standards was semantic/structural standards for events. I used the term "Babylon tower" in one of the earliest postings in this Blog, meaning that we have a Babylon tower of languages, like the original tower that separated the languages. However, there is another Babylon tower that relates to event formats, both syntax and semantics, and here the problem is even harder: there are multiple formats in multiple domains, and nobody has even made an inventory survey. One of the presenters said (and it is true in some cases) that 80 percent of the cost of building an EP application goes into obtaining the events from the sources and transforming them to a processable format by composing adapters (hand coded, or built using transformation engines). This is one of the areas that need more investigation, and perhaps we need a meta-data standard here.
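As a rough illustration of what such hand-coded adapters look like, here is a small Python sketch. The two source formats (a comma-separated RFID reading and a JSON GPS message), the field names and the target dictionary layout are all hypothetical, invented for this example; real sources and formats will differ.

```python
import json
from datetime import datetime, timezone


def from_rfid_csv(line):
    """Adapter for a hypothetical RFID line: reader_id,tag_id,epoch_seconds."""
    reader_id, tag_id, epoch_seconds = line.strip().split(",")
    return {
        "event_type": "TagRead",
        "source": reader_id,
        "timestamp": datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc),
        "payload": {"tag": tag_id},
    }


def from_gps_json(message):
    """Adapter for a hypothetical GPS message encoded as a JSON object."""
    raw = json.loads(message)
    return {
        "event_type": "LocationReport",
        "source": raw["device"],
        "timestamp": datetime.fromisoformat(raw["time"]),
        "payload": {"lat": raw["lat"], "lon": raw["lon"]},
    }


# One adapter per source format; every adapter produces the same layout.
ADAPTERS = {"rfid_csv": from_rfid_csv, "gps_json": from_gps_json}


def to_processable(source_format, raw):
    """Translate a raw message into the common layout used downstream."""
    return ADAPTERS[source_format](raw)
```

Even in this toy version, each new source needs its own parsing code, its own time conventions and its own field mapping, which hints at where the cost goes.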

Thursday, December 6, 2007

On Event Representation

Back to micro-oriented issues: today I'll start a discussion about what's behind the definitions of the event processing glossary, and get to the issue of event representation. As the glossary says, an event is something that happens in reality. We also tend to call "event" the representation of this reality for the purpose of processing by a computer. This notion of event has several aliases in the glossary: event object, event message and event tuple. The various aliases indicate that the space of event representation is not uniform: some think of an event as a message that moves around; some think of it as a tuple that is part of a stream, really the twin brother of a tuple in a relational database; some think of it as an object with arbitrary structure (which may also be hidden).

Obviously, there is no "universal event". Unfortunately, in many cases events are already given by the sources in their given formats, and the event processing designer has little to say about it; a generic event processing system therefore has to support multiple types of events, or have adapters that translate all types of events to some canonical type of event (and typically both: supporting some canonical type of events and having adapters that translate other types of events to the canonical type). Events can be structured, semi-structured (XML), or unstructured (the area of unstructured event processing deserves more focused attention).

One of the questions is whether there are common attributes that each event should have to enable event processing. In the data world the answer is no: there is not a single attribute that must exist in all relations (besides the fact that each tuple should be a member of some relation, no floating tuples). For event processing, there are some attributes that have been proposed as common attributes:

Event-type
Source
Time-Stamp (or Time-Interval)

Let's look at the question of whether they are mandatory or not:

The first question is whether each event is an instance of an event-type (or event-class). The glossary says yes: "all events must be instances of event-type". This seems reasonable; however, we may think of some exceptions, such as rare events that have not been classified. I need to drill down on rare events in some other post.
The second question is whether the source should be mandatory - again, this is desirable if we want to have lineage or tracing back actions/decisions, but there may be cases in which the source is indefinite, or we wish to hide the source (e.g. leaking of information).
The third question is whether each event must have a time-stamp (or time-interval in case it happens over an interval - another area that needs more discussion) - the answer is that many event processing patterns are time related, and if we want to know which event occurs first, or if two events occurred within 5 minutes of each other, we need to know WHEN this event occurred in reality. However - in some cases it is not known, in other cases it is not really needed.

It seems that all common attributes are useful, but may be optional in some cases.
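One way to picture "useful, but may be optional" is the following Python sketch. The Event class, its field names and the occurred_within helper are assumptions made for illustration; they do not come from the glossary or from any specific product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


@dataclass
class Event:
    """Event representation in which all three common attributes are optional."""
    payload: Dict[str, Any] = field(default_factory=dict)
    event_type: Optional[str] = None      # None for a rare, unclassified event
    source: Optional[str] = None          # None when indefinite or hidden
    timestamp: Optional[datetime] = None  # None when the occurrence time is unknown


def occurred_within(a: Event, b: Event, window: timedelta) -> Optional[bool]:
    """Temporal pattern check; returns None when it cannot be evaluated."""
    if a.timestamp is None or b.timestamp is None:
        return None
    return abs(a.timestamp - b.timestamp) <= window
```

For example, two events stamped 10:00 and 10:04 satisfy occurred_within with a five minute window, while the same check against an event with no time-stamp yields None rather than a guess. The price of optional attributes is that every time-related pattern has to decide what to do in that third case.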

There are also attributes that are common to certain types of events, such as probability for uncertain events, spatial coordinates for spatial events, etc. All this is before relating to the content.

The content is determined according to domain-related ontologies, and there is a lot of work today in different application domains and industries to define such ontologies. XML is the ontology language; it has its own benefits, but it also carries overhead relative to "flat" events, in which the attributes are positional rather than keyword oriented.
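A tiny Python sketch of the same event in both forms may help to see the trade-off. The element names, the "|" delimiter and the positional schema are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# The same hypothetical event, keyword oriented (XML) and positional (flat).
XML_EVENT = "<event><type>Delivery</type><driver>d17</driver><minutes>42</minutes></event>"
FLAT_EVENT = "Delivery|d17|42"

# A flat event is meaningless without an agreed schema giving each position a name.
FLAT_SCHEMA = ("type", "driver", "minutes")


def parse_xml(text):
    """Read the attribute names from the event itself."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def parse_flat(text, schema=FLAT_SCHEMA):
    """Read the attribute names from the external positional schema."""
    return dict(zip(schema, text.split("|")))
```

Both parsers yield the same attributes, but the XML form repeats every attribute name in every event, while the flat form is shorter and depends on sender and receiver agreeing on the positions in advance.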

Events also carry semantic information, such as references to entities in certain roles. In fact, an event can be thought of as a transition from one state to another, and the information included in the event refers to a change in the universe, such as:

what was changed? what entities are affected? when was it changed? where did the change take place? what other information is important about the change?

This short discussion has already raised several open issues that deserve further discussion, so I'll put these topics on the queue for further postings.... more - later.
