Friday, October 24, 2008

On EDA and maturity model



This is the first posting I am writing on my new laptop, a Lenovo T61p. New laptops are a trauma, since with a new laptop one gets new versions of all the software, which does not always behave in a familiar way. I still have not succeeded in convincing the new laptop to work with my home wireless network (it connects, and then disconnects after a few seconds), so I have arranged a workplace near the router and work wired, catching up on reading blogs.
Jack van Hoof claims in his blog that the market does not understand EDA, since people talking about EDA are really talking about CEP; in a previous posting he makes a distinction between EDA, which is about business events, and CEP, which processes messages.
It is true that people often mix the terms EDA and CEP (both TLAs), which are not quite the same; however, the "message" vs. "event" distinction needs some discussion, as do the relations between all of these, and why a customer of these technologies should care anyway.
First of all - I agree that EDA is an architectural pattern in which various services/components/agents/artifacts communicate in a decoupled way by sending events to some mediator, which routes them based on subscription or another type of routing decision, such that source and sink may not know each other; CEP, in contrast, is the functionality of detecting patterns among multiple events and deriving new (derived/synthetic) events as a result.
On the question of "messages" vs. "events" -- CEP is called CEP because its inputs and outputs are events, according to the definition of an event as "something that happened" (with all its variations - see the EPTS glossary). A message is in this case one way to represent an event (not the only one - CEP can also process events represented as objects that are not messages).
The claim that messages in general may not represent events is valid; I can send my picture as a message, and of course some smart guy can find an event in it, but it is not intended to represent an event. However, if we apply pattern detection to messages that do not represent events, then it is not CEP, since CEP assumes, by definition, that its inputs are events. This can be called pattern detection on messages, and one can invent some name for it, if that makes sense.
Interestingly, people have also realized that the same types of patterns may be useful for data and not only for events, which makes it detection of patterns on data and again not CEP; we have started to see some work on patterns in the database community, although they are still quite complex.
EDA by itself has value for architectural thinking. Recently I came across a plan made by a big IBM customer looking at their future architecture as a combination of SOA and EDA, in the sense that services will communicate among themselves in an event-driven way, but they have not yet discovered CEP at all. In fact, observing various customers in different phases of thinking about the "event" issue leads me to think of a kind of maturity model in this area:
  • phase I -- simple event processing: determine what events are important, how they can be obtained, who should receive them, how they should be transmitted, etc. In some cases this is a major effort -- teaching the customer to "think in events" and building the means to emit and consume events -- and can by itself be a big investment, since it can affect running systems and requires investing in instrumentation/adapters/periodic pull mechanisms. The type of "event processing" provided by basic EDA is typically routing (including subscriptions) and filtering. This phase should have a business benefit in its own right to justify the investment.
  • phase II -- mediated event processing: to bridge semantic and syntactic gaps between event producer and consumer, additional mediators provide ESB-type functionality: transformation (including split and aggregation), enrichment (from external sources), and validation. This is often needed, and since many of these features are fairly common, it may make sense to use COTS for them, although in simple cases hard-coding may be cost-effective enough. This phase is optional, and it may make sense to unify it with phase III.
  • phase III -- complex event processing: this is a step function towards looking at "event clouds" instead of events one by one, and is typically built on top of phase I, possibly phases I + II. I have seen cases in which phase III was implemented as the first step and included phases I and II -- this makes phase III much longer, but makes sense if the customer has a business problem that needs phase III but really did not think in events before.
  • phase IV (optional) -- this extends the functionality towards intelligent event processing, which I have already discussed in the past.
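To make phase I concrete, here is a minimal sketch of routing-by-subscription plus filtering, the kind of event processing basic EDA provides. The broker class, event shapes, and predicate are all invented for illustration, not any product's API:

```python
# Minimal sketch of phase-I event processing: routing by subscription
# plus filtering. All names here are illustrative.

class Broker:
    def __init__(self):
        # event type -> list of (filter predicate, handler)
        self.subscriptions = {}

    def subscribe(self, event_type, handler, predicate=lambda e: True):
        self.subscriptions.setdefault(event_type, []).append((predicate, handler))

    def publish(self, event_type, event):
        # Producer and consumer are decoupled: the producer only knows
        # the broker, never the subscribers.
        for predicate, handler in self.subscriptions.get(event_type, []):
            if predicate(event):
                handler(event)

received = []
broker = Broker()
broker.subscribe("order", received.append, predicate=lambda e: e["amount"] > 100)
broker.publish("order", {"amount": 50})    # filtered out
broker.publish("order", {"amount": 250})   # routed to the subscriber
```

The point of the sketch is the decoupling: the producer calls only `publish` and never knows which handlers, if any, receive the event.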

More about maturity models and event processing - later.

Monday, October 20, 2008

More on the semantics of synonyms


Still lazy days of the holiday week; I took advantage of the time to help my mother, who decided that she wants to leave home and move to a seniors' residence located a 10-minute walk from my house. This requires dealing with many details, so that is what I have been doing for the last three days.... In the picture above (taken from the residence's site) you can see the front entrance and the view from the residence tower; on the right-hand side of the upper picture one can see part of the neighborhood we live in (Ramat Begin), surrounded by pine trees all over.
Now, holiday eve again, and this is a good time to visit the blog. Last time I started the discussion of the semantics of synonyms by posing a simple example of conjunction over a bounded time interval (the same pattern that Hans Glide referred to in his blog), slightly different from the "temporal sequence" pattern.
In the previous posting I have posed the following example:

Detect a pattern that consists of a conjunction of two events (order is not important) - E1, E2.
E1 has two attributes = {N, A}; E2 also has two attributes = {N, B}; the pattern matching is partitioned according to the value of N (on context partitions I'll write another time).

For each detection, create a derived event E3 which includes two attributes = {N, C}; E3's values are derived as: E3.N := E1.N; E3.C := E1.A * E2.B.

Let's also assume that the relevant temporal context is time-stamps [1, 5], and that the events of types E1 and E2 that arrived during this period are displayed in the table below:




The question is: how many instances of event E3 are going to be created, and what will be the values of their attributes?

Looking at this example, for N = 2 there is exactly one pair that matches the pattern: the E1 that occurs at timestamp 5 and the E2 that occurs at timestamp 4, so E3 will have the attributes {N = 2, C = 24}.

However, for N = 1 things are more complicated. If we take the set-oriented approach that treats it as a "join" (Cartesian product), then, since we have 3 instances of E1 and two instances of E2, we get 6 instances of E3 with all combinations. In some cases we may be interested in all combinations, but typically in event processing we are looking for a match and not for a join -- that is the difference between "event-at-a-time" patterns and the "set-at-a-time" patterns used by some of the stream processing semantics.

So what is the correct answer? There is no single correct answer; what is needed is to fine-tune the semantics using policies. For those who hard-code event processing, or use imperative event processing languages, this entire issue seems a non-issue, since when they develop the code for a particular case they also build (implicitly) the semantics they require for that case. Policies are required when using higher-level languages (descriptive, declarative, visual, etc.): they bridge between the fact that semantics is built into a higher-level abstraction and the need to fine-tune that semantics in several cases. In our case we can have several types of policies:

Policies based on order of events - example:

For E1 - select the first instance; for E2 - select the last instance.
For E1 - select the last instance; for E2 - select the last instance.

Policies based on values - example:

For E1 - select the highest 2 instances for the value of A ; for E2 select the lowest instance for the value of B.

These are examples only -- it is also important to select a reasonable default that satisfies the "typical case", so that if the semantics fits this default, no further action is needed.
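As an illustration only, the order-based and value-based selection policies above can be sketched like this (Python; the event records, timestamps, and policy names are invented for the example):

```python
# Sketch of policy-based instance selection for the conjunction pattern
# above. Each event instance is a dict with the partition key N, its
# attribute value, and an arrival timestamp ts (all illustrative).

def select(instances, policy):
    if policy == "first":   # earliest arrival
        return min(instances, key=lambda e: e["ts"])
    if policy == "last":    # latest arrival
        return max(instances, key=lambda e: e["ts"])
    if policy == "max":     # highest attribute value
        return max(instances, key=lambda e: e["value"])
    if policy == "min":     # lowest attribute value
        return min(instances, key=lambda e: e["value"])

def match(e1_instances, e2_instances, p1, p2):
    # A match, not a join: each side contributes a single instance
    # chosen by its policy, so exactly one E3 is derived.
    e1 = select(e1_instances, p1)
    e2 = select(e2_instances, p2)
    return {"N": e1["N"], "C": e1["value"] * e2["value"]}

e1s = [{"N": 1, "value": 3, "ts": 1}, {"N": 1, "value": 5, "ts": 2}]
e2s = [{"N": 1, "value": 4, "ts": 3}, {"N": 1, "value": 2, "ts": 4}]
print(match(e1s, e2s, "first", "last"))  # {'N': 1, 'C': 6}
print(match(e1s, e2s, "max", "min"))     # {'N': 1, 'C': 10}
```

With a join semantics the same input would yield four E3 instances; the policy pair collapses it to one.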

In one of the next postings I'll deal with identifying the set of policies required to make the semantics precise.



Wednesday, October 15, 2008

On semantics of synonyms


Holiday time - and I had to spend time replacing my home wireless router (not of the firm shown in the picture), which did not work well, with another expensive one whose signal is not received well on the lower floor -- I hate wasting my time on restoring things to working order; it is a never-ending story, every time something else needs fixing... we need more robust appliances..

Well - back from my personal frustrations to event processing thinking. I have posted a couple of postings about the different possible interpretations of "event_1 occurs before event_2"; however, different interpretations are not unique to temporal issues and the specific anomaly of having different temporal semantics. Let's take a case in which time does not matter - the function is defined this time as follows:
Detect a pattern that consists of a conjunction of two events (order is not important) - E1, E2.
E1 has two attributes = {N, A}; E2 also has two attributes = {N, B}; the pattern matching is partitioned according to the value of N (on context partitions I'll write another time).

For each detection, create a derived event E3 which includes two attributes = {N, C}; E3's values are derived as: E3.N := E1.N; E3.C := E1.A * E2.B.

Let's also assume that the relevant temporal context is time-stamps [1, 5], and that the events of types E1 and E2 that arrived during this period are displayed in the table below:



The question is: how many instances of event E3 are going to be created, and what will be the values of their attributes? Think about it -- I'll discuss it next week.


Saturday, October 11, 2008

More on semantics and race conditions

In a previous posting I posed the following scenario:



Given the simple application shown below:




  • There is a single event source (so no clock synchronization issues) which generates events of three types e1, e2, e3.

  • Let's also say that in our story there is a single event of each type published (so no synonym issues); the table shows their occurrence time (when they occurred in reality) and detection time (when they were reported to the system) - each of them was reported 1 time unit after its occurrence, with no re-ordering problem.

  • Events e1, e2 serve as an input to an EPA of type "pattern detection" which detects a temporal sequence pattern "e1 before e2", and when this is detected, it derives an event e4 - some function of e1 and e2.

  • Events e3 (raw event) and e4 (derived event) serve as input to another EPA of type "pattern detection", which again detects a temporal sequence pattern, "e3 before e4"; if this pattern is detected, create event e5, which triggers some action in the consumer.

I also asked: given the above, will the action triggered by e5 occur? I.e., will the pattern "e3 before e4" be evaluated to true?

I got a few answers, which you can read as comments to the original posting; as promised, I am dedicating this posting to an analysis of this simple case:

The first thing to discuss is the semantics of "temporal sequence". There are two possible types of semantics for temporal sequence, which I call "detection time semantics" and "occurrence time semantics".

  • The detection time semantics, implemented in various languages, means that the temporal order is the order of the time-stamps at which the "event processing platform" detects that the event occurred; if there is a single thread of such detection, then the events are totally ordered; otherwise, there may be several events with the same "detection timestamp".
  • The occurrence time semantics, also implemented in various languages, means that the temporal order is the order of the time-stamps provided as part of the event information, designating when the event happened in reality. There are some complexities in synchronizing time in a multi-producer environment; however, in this example we assume a single producer (I'll write about multi-producer cases in another posting).
  • Note that these two order relations may not be identical.
  • There is also a kind of hybrid solution ("total order semantics") -- the semantics is really "detection time" semantics, but in order to allow events that arrive a bit late to take their proper place, the events are queued in a buffer (and not considered detected) until a time-out, to let "out of order" events arrive and re-order the buffer, and then the events are sent according to the buffer order.

Getting back to the example - in the small table at the bottom left-hand side of the figure above are the occurrence and detection times of e1, e2, e3. For e4 there is only a detection time - e4 differs from {e1, e2, e3} in that it is a derived event and not a raw event like the other three. The question is: "what is the occurrence time of a derived event?" -- there is no clear answer; there are several possible ones:

  • In the derived event case, occurrence time = detection time: since this event is not a real event but a virtual one, its source is the EPA that creates it, and it occurred when created. In our case this means that occurrence-time (e4) = 4.
  • Its occurrence time is the occurrence time of the last event that completed the pattern - since the events participating in the creation of e4 are {e1, e2} and e2 was the last to complete the pattern, occurrence-time (e4) = occurrence-time (e2) = 2.
  • Interval semantics: the event e4 occurs in the interval in which all the participants occur, which in this case means occurrence-time (e4) = [1, 2].
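These three interpretations can be sketched as follows (an illustrative Python fragment; the field and policy names are invented, and the timestamps follow the example above):

```python
# Sketch of the three occurrence-time interpretations for a derived event,
# given the raw events that matched the pattern and the time at which the
# EPA created (detected) the derived event.

def derived_occurrence_time(participants, detection_time, policy):
    if policy == "detection":
        # The derived event is virtual: it "occurs" when the EPA creates it.
        return detection_time
    if policy == "last-participant":
        # Time of the event that completed the pattern.
        return max(e["occurrence"] for e in participants)
    if policy == "interval":
        # Interval covering all participating events.
        times = [e["occurrence"] for e in participants]
        return (min(times), max(times))

e1 = {"occurrence": 1}
e2 = {"occurrence": 2}
print(derived_occurrence_time([e1, e2], 4, "detection"))         # 4
print(derived_occurrence_time([e1, e2], 4, "last-participant"))  # 2
print(derived_occurrence_time([e1, e2], 4, "interval"))          # (1, 2)
```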

The phenomenon of multiple semantic interpretations applies to various other semantic decisions in event processing languages, and the preferred solution is to provide the user with semantic "fine tuning" policies, under which the user can choose the desired semantics, instead of hard-coding a certain semantics (using the most common one as a default). This is one of the benefits of using COTS for event processing, since it is quite difficult to think about such issues when developing EP manually using a conventional language.

The semantics of the second "temporal sequence" (e3, e4) is thus:

  • According to "detection time" semantics -- both have a detection time of 4, so the sequence condition is not satisfied. However, if we impose total order by a single thread, this may create race conditions between the two events. In this case it is recommended to use a consistent priority policy - either breadth first (the raw event always comes first) or depth first (the derived event always comes first) - to ensure a deterministic result.
  • According to "occurrence time" semantics -- it depends on the policy chosen, but under all interpretations e4 occurs before e3, thus the temporal sequence is not satisfied.

Bottom line: the temporal sequence (e3, e4) is satisfied only if:

  • The temporal semantics is detection time
  • It is implemented by total order
  • The total order policy is "breadth first" - namely, priority for raw events.

In all other cases the temporal sequence will not be satisfied and the corollary action will not execute.
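The bottom line above can be illustrated with a small sketch (Python; the field names and policy names are invented for the example, with the timestamps taken from the scenario):

```python
# Sketch of when "e3 before e4" holds under the different semantics.
# Under detection-time semantics both events carry timestamp 4, so the
# outcome depends on a tie-breaking (total-order) policy.

def before(a, b, semantics, tie_policy=None):
    ta, tb = a[semantics], b[semantics]
    if ta != tb:
        return ta < tb
    if tie_policy == "breadth-first":   # raw events ordered first
        return a["raw"] and not b["raw"]
    if tie_policy == "depth-first":     # derived events ordered first
        return not a["raw"] and b["raw"]
    return False                        # tie with no policy: not "before"

e3 = {"occurrence": 3, "detection": 4, "raw": True}
e4 = {"occurrence": 2, "detection": 4, "raw": False}  # derived from e1, e2

print(before(e3, e4, "occurrence"))                  # False: e4 precedes e3
print(before(e3, e4, "detection"))                   # False: tie, no policy
print(before(e3, e4, "detection", "breadth-first"))  # True: raw event first
```

Only the last combination, detection-time semantics with a breadth-first total order, satisfies the pattern, matching the analysis above.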

Wednesday, October 8, 2008

On HITC and some small stuff


About a month ago I wrote about the Arab Israeli High Tech Center, whose first goal is to prepare Arab Israeli engineers and mathematicians to work in the very demanding Israeli high-tech industry. Earlier this week there was a very impressive kick-off ceremony for the center (whose name somehow migrated from AITC to HITC). In the two pictures above, the top one is a group photo of the management and industry advisory committee of the center, and in the bottom one you can see me sitting in the crowd with my eldest daughter, Anat, who decided to come and watch. This project is highly supported by the Israeli high-tech industry: some companies were represented by their country general managers -- HP, Oracle, BMC, EDS and Matrix (an Israeli services company) -- and other companies, such as IBM, Motorola, Intel, Microsoft and Checkpoint, were represented by a senior person. Several hundred people were present, and it was quite impressive. Studies will start in February 2009, and there is still a lot of work to be done to make it happen, but people came away with a feeling of history in the making.


And back to event processing -- in the next posting I'll talk about the semantic question I posed last week; meanwhile, just some short comments:

  • Mark Tsimelzon from Coral8 tries to define what a "CEP engine" is, stating that there is some confusion in the market about this. I almost agree with what he has written and wondered whether I should react, since my reaction might further confuse people... So I'll just remark that the term "platform" is starting to become very popular, but with somewhat different meanings. I'll write more about platforms in the future.
  • Marc Adler is blogging about MSFT Oslo and his CEP application. Without going into further details, I believe that the direction of having the ability to interface in the user's domain terminology and way of thinking, and then map it automatically to an execution language (directly or through an intermediate representation), is a correct idea, somewhat beyond the state of the art today; there will probably be several ways to do it, but it is a good topic to work on.
  • Marco from RuleCore is blogging about the pains of their SAAS model and mentions some obstacles; this is a good topic for further discussion, and it was also presented at the EPTS meeting by Bob Marcus. Clearly, there are some applications that can be served by this model and some (e.g. distributed applications) that cannot. I will discuss this issue at length in one of the coming postings.

More topics - Later

Saturday, October 4, 2008

On Event Processing Network and Transaction Processing






It is a holiday period, a time in which we have four holidays during three weeks, and it is quite a lazy time here, with many people taking vacations (like the second part of December - beginning of January in countries with a Christian majority). Due to the holidays and some other events I'll see my office in the coming week only on Tuesday, but I am working a bit from home now...


In an IBM internal Email exchange this week, a person who does not really understand event processing saw an illustration of an EPN (Event Processing Network) and wondered: this seems like regular transaction processing - what is the difference?

Indeed, from a bird's eye view everything looks like a directed graph, such as the one shown at the top of this page; both transactional flows and EPNs, as well as many other things, are expressed using directed graphs. However, there is a major difference in the semantics of the graph.

To refer to a concrete example, let's take an EPN from an application of remote patient monitoring.






The semantics of an EPN means that a node in the graph creates events, and these events are then consumed by other nodes in the graph. For example, the "enrich" node takes a blood pressure reading and enriches it with an indication of whether the patient is diabetic, thus creating a derived event; this derived event is consumed by the node that looks for a pattern to alert the physician. Without going too much into the application's details, we may also state that, unlike in a control flow, the pattern detection node does not start its execution when all of its predecessors have finished: the pattern may look at multiple blood pressure measurements of the same patient, so it may exist for a longer period relative to the enrich node, which is activated any time there is a blood pressure reading. So the graph does not show control flow; moreover, these two nodes don't know each other and communicate through a router (channel) node. There are thus some differences between an event processing network and a transactional flow:

  1. The EPN graph does not represent control flow, but event flow.
  2. In a control flow graph, the relationship between predecessor and successor nodes is typically "finish to start" (either "meets" or "after" in Allen's operators, which I'll discuss in a separate posting), meaning the predecessor node must terminate for the successor node to start; in an EPN, this may not be the case.
  3. An EPN is not necessarily atomic: one node in the EPN may fail while others continue - no "atomic commitment protocol" (e.g. 2PC) is applied.
  4. It also may not be isolated -- a node can emit events while it is still working; even if it fails later, its emitted events may still be valid, if atomicity is not required.
  5. An EPN can be restricted to behave in a transactional way - this is an interesting observation, as transaction support violates the decoupling principle; however, there are cases in which it is required (again, this deserves more discussion). More - Later.
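To illustrate the event-flow semantics, here is a minimal sketch of the enrich, channel, and pattern-detection fragment of the patient-monitoring EPN. The registry, threshold, and node functions are all invented for the example; the point is that producer and consumer nodes communicate only through the channel and never call each other directly:

```python
# Sketch of a small event processing network: an enrich node derives an
# enriched reading, a channel routes it, and a pattern node consumes it.
# All names, data, and thresholds are illustrative.

diabetic_registry = {"pat-1": True, "pat-2": False}  # hypothetical enrichment source

def enrich(reading):
    # Create a derived event by adding external information to the raw reading.
    return {**reading, "diabetic": diabetic_registry[reading["patient"]]}

alerts = []

def pattern_node(event):
    # This node never calls the enrich node (or vice versa); the channel
    # below mediates, so the nodes stay decoupled.
    if event["diabetic"] and event["systolic"] > 160:
        alerts.append(f"alert physician: {event['patient']}")

channel = [pattern_node]  # the channel's subscribers

def publish(event):
    # Event flow, not control flow: consumers react whenever events arrive.
    for consumer in channel:
        consumer(event)

publish(enrich({"patient": "pat-1", "systolic": 175}))  # matches the pattern
publish(enrich({"patient": "pat-2", "systolic": 180}))  # not diabetic: no alert
print(alerts)  # ['alert physician: pat-1']
```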

Friday, October 3, 2008

On the Genesis and Exodus in Event Processing


One of the greatest scientists I have had the honor to meet in person (at a conference in France, 1991) is Lotfi Zadeh, the inventor of "fuzzy sets", which is one of the major ways to formalize inexact thinking. When I was an undergraduate student, there was an urban legend that Zadeh came up with the fuzzy notion when his wife went out of town and left him a cooking recipe; trying to formalize the recipe, he came up with the notion of fuzziness. Later in life I met another great scientist and wonderful person, the late Manfred Kochen, who told me over a lunch in Ann Arbor that he had been a graduate student together with Zadeh at Columbia University; so I told him the urban legend and asked him whether it was true. He was quite amused to hear it, and said that the problem that actually started the thinking about fuzzy theory was formalizing the process of parking a car between two parked cars -- assuming Fred Kochen told me the truth, that was the genesis of fuzzy logic. It was interesting to observe that Tim Bass, in a couple of his latest blog postings, has returned to the genesis of "complex event processing", citing topics that emerge from the papers that David Luckham's group at Stanford published in the late 1990s - the list contained:
  • Network Level Monitoring and Management;
  • Cyber Security: Network Intrusion Detection;
  • Enterprise Monitoring and Management,
  • Modelling and Simulation of Collaborative Business Processes;
  • Business Policy Monitoring;
  • Analysis and Debugging of Distributed Systems.

These applications are all still very much alive and kicking in the event processing space.

It is interesting to note that the genesis of data stream management, in one of the earliest papers of the "stream" project, was, surprise, surprise -- "network traffic management". It should also be noted that David Luckham and Jennifer Widom reside in the same building.

Since the area of event processing has many ancestors, it has several more genesis books. For example, the term "active database" was first coined by Morgenstern in his VLDB paper from 1983, and Morgenstern's genesis was consistency and integrity applications; we still see compliance and governance (their current names) as major applications. Other ancestors are in the area of systems management, whose genesis was the "root cause analysis" application - i.e., diagnosing problems from symptoms. We in the AMiT project at IBM Haifa Research Lab also started by looking at systems management applications and what is now called "business services management" - impact analysis of IT events on business processes. I think that at least some of the pub/sub companies started with distribution of new software versions to subscribers, and of course some of the current event processing vendors started with applications like algorithmic trading in capital markets.

Having used the biblical term genesis, we may also remember that the successor of "genesis" is "exodus" - in our terms, moving on and not staying just where we happened to start. Some of the software industry is based on niche players, where the niche may be quite big (one of the biggest IT companies in Israel concentrated for many years mostly on the area of Telco billing, probably big enough to enable niche companies of several thousand employees). However, for more basic software like event processing tools, there is a big benefit in the ability to generalize beyond the genesis, and indeed we now see some vendors going after other markets that may seem beyond their "comfort zone" and needing to make some adjustments (this phenomenon may be one of the drivers for standardization in this area, but I'll discuss that another time). Thus, we are watching a growing list of applications and business problems of which event processing can be part of the solution, both in the infrastructure area (which should grow to internet-scale infrastructure) and in the enterprise application area.

To conclude this posting by citing another great speaker, Professor Stu Madnick from MIT, whom I remember giving an amusing talk about theoretical computer science, saying something like: a bunch of people went into a closed room, taking with them some problems from the outside world, and since then they are still in the same closed room, still working on the same problems, and sometimes inventing new ones. Well - we shall still solve the original problems, but also look around to find new ones; we are just in the early days of the event processing area, and probably have not discovered much of its power to impact the business world. More - Later.