Friday, October 24, 2008

On EDA and maturity model

This is the first postings I am writing on my new Laptop, Lenovo T61p. New laptops are a trauma, since with the new laptop one gets new versions of all software, that somehow not always behave in a familiar way. I still did not succeed to convince the new laptop work with my home wireless network (it succeeds to connect, and then disconnects after a few seconds), so I have arranged workplace near the router and works wired, catching up in reading Blogs.
Jack Wan Hoof in his Blog claims that the market does not understand EDA, since people when talking about EDA are really talking about CEP, in a previous posting he makes a distinction between EDA that is about business events and CEP which processes messages.
It is true that people are often mix the terms of EDA and CEP (both TLAs) which are not quite the same, however, the "message" vs. "event" is something the needs some discussion, as well as the relations between all these, and why should a customer of these technology care anyway.
First of all - I agree that EDA is an architectural pattern, where various services/components/agents/artifacts are communicating in a decoupled way by sending events to some mediator which routes it based on subscription or other type of routing decision, such that the source and sink may not know each other, while CEP is a functionality of detecting patterns among multiple events and deriving new (derived/synthetic) events as a result.
To the question of "messages" vs. "events" -- CEP is called CEP since its inputs and outputs are events according to the definition of event as "something that happened" (with all its variations - see the EPTS glossary), message is in this case a way to represent event (not the only one - CEP can also process events represented as objects that are not messages).
The claim that messages in general may not represent events is valid, I can send my picture as a message and of course some smart guy can find event in it, but it is not intended to represent an event. However, if we apply pattern detection on messages that not represent events, then it is not CEP, since CEP assumes, by definition, that it inputs are events. This can be pattern detection on messages, and one can invent some name for it, if it makes sense.
Interestingly, people also realized that the same type of patterns may be useful for data and not only for events, which makes is detection of patterns on data and not CEP again, and we have started to see some works on patterns in the database community, although they are still quite complex.
EDA by itself has a value to the architectural thinking, recently I came across a plan made by an IBM big customer looking at their future architecture as a combination of SOA and EDA, in the sense that services will communicate among themselves in an event-driven way, but they have not yet discovered CEP at all. In fact, observation on various customers that are in different phase of thinking about the "event" issue leads me to think on a kind of maturity model in this area :
  • phase I -- simple event processing: determine what events are important, how they can be obtained, who should receive these events, how these event should be transmitted etc.. This in some cases is a major effort, teaching the customer to "think in events", and building the way to emit and consume events, in some cases that by itself can be a big investment, since it can affect running systems, and invest in development of instrumentation/adapters/periodic pull mechanisms. The type of "event processing" that is being provided by basic EDA is typically routing (including subscriptions) and filtering. This phase should have a business benefit by its own right to justify the investment.
  • phase II - mediated event processing: to bridge semantic and syntactic gaps between the event producer and consumer, there are additional mediators that provide ESB type functionality: transformation (including split and aggregation), enrichment (from external sources) and validation - this is often needed, and since many of these features are pretty much common, it may make sense to use COTS for it, although in simple cases, hard-coding may be cost-effective enough, this phase is optional, it may make sense to unify it with phase III.
  • phase III - complex event processing -- this is a step-function towards looking at "event clouds" instead of events one-by-one, and typically built on top of the phase I, possibly I +II. I have seen cases in which phase III has been implemented as first step and included step I and II -- this makes step III much longer, but makes sense if the customer has a business problem for which it needs step III, but really did not think in event before.
  • phase IV (optional) -- This extends the functionality towards intelligent event processing - which I already discussed in the past.

More about maturity models and event processing - later.

Monday, October 20, 2008

More on the semantics of synonyms

Still lazy days of the holiday week, I took advantage of the time to help my mother, who decided that she wants to leave home and move to seniors residence located 10 minutes walk from my house, this requires to deal with many details, so that is what I was doing in the last three days.... In the picture above (taken from the residence site) you can see the front entrance and the view seen from the residence tower, on the right hand side of the upper picture one can see part of the neighborhood we are living in (Ramat Begin) surrounded by pine trees all over.
Now, holiday eve again, and this is a good time to visit the Blog again. Last time I started the discussion in the semantics of synonyms by posing a simple example of conjunction over a bounded time interval (same pattern that Hans Glide referred to in his Blog), and slightly different from the "temporal sequence" pattern.
In the previous posting I have posed the following example:

Detect a pattern that consists of conjunction of two events (order is not important) - e1, e2.
e1 has two attributes = {N, A}; e2 has also two attributes = {N, B} ; the pattern matching is partitioned according to the value of N (on context partitions I'll write another time).

For each detection, create a derived event e3 which includes two attributes = {N, C}; E3 values are derived as: E3.N := E1.N ; E3. C = E1. A * E2. B.

Let's also assume that the relevant temporal context is time-stamps = [1, 5] - and the events of types E1 and E2 that arrived during this period are displayed in the table below:

The question is: how many instances of event E3 are going to be created, and what will be the values of their attributes?

Looking at this example, for N = 2, there is exactly one pair that matches the pattern
E1 that occurs in timestamp 5, and E2 that occurs in timestamp 4, so E3 will have the attributes {N = 2, C = 24}. However, for N = 1 things are more complicated. If we'll take the set oriented approach that looks at it as "join" (Cartesian product), since we have 3 instances of E1 and two instances of E2, we'll get 6 instances of E3 with all combinations. In some cases we may be interested in all combinations, but typically in event processing we are looking for match and not for join -- that is the difference between "event-at-a-time" type of patterns and "set-at-a-time" patterns that is being used by some of the stream processing semantics. So what is the correct answer ? -- there is no single correct answer, thus what is needed is to fine tune the semantics using policies. For those who are hard-coding event processing, or using imperative event processing languages, this entire issue seems a non-issue, since when they develop the code for a particular case they also build (implicitly) the semantics they require for a specific case, however policies are required when using higher level languages (descriptive, declarative, visual etc...), policies are needed to bridge between the fact that semantics is built-in inside higher level abstraction, and the need to fine-tune the semantics in several cases. In our case we can have several types of policies:

Policies based on order of events - example:

For E1 - select the first instance; for E2 - select the last instance.
For E1 - select the last instance; for E2 - select the last instance

Policies based on values - example:

For E1 - select the highest 2 instances for the value of A ; for E2 select the lowest instance for the value of B.

These are examples only -- it is also important to select a reasonable default which satisfies the "typical case", so if the semantics fits this default, no more action is needed.

These have been examples only, in one of the next postings I'll deal with identifying the set of policies required in order to make the semantics precise.