This is a blog describing some thoughts about issues related to event processing and to my current role. It is written by Opher Etzion and reflects the author's own opinions.
Friday, September 14, 2007
In-Line vs. Observations
Anyway, in my last post I discussed RTE vs. BAM and managed to confuse some people, so some clarification is needed. First, as said before, event processing is a collection of functions (derivation, pattern detection, enrichment, transformation and others) that take events as input, process them, and produce (possibly other) events as output.
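As a minimal sketch of this "events in, events out" view, assume a simple Python representation of events. The `Event` class, the field names and the customer-tier lookup are hypothetical. Each function takes one event and returns zero or more events, so the functions can be chained into a small network:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass(frozen=True)
class Event:
    type: str
    payload: dict = field(default_factory=dict)

# An event processing function: one input event in, zero or more events out.
EventFunction = Callable[[Event], list[Event]]

CUSTOMER_TIERS = {"c1": "gold"}  # hypothetical reference data used for enrichment

def filter_orders(event: Event) -> list[Event]:
    # Filtering: pass order events through, drop everything else.
    return [event] if event.type == "order" else []

def enrich(event: Event) -> list[Event]:
    # Enrichment: add the customer's tier to the payload.
    tier = CUSTOMER_TIERS.get(event.payload.get("customer"), "standard")
    return [Event(event.type, {**event.payload, "tier": tier})]

def derive(event: Event) -> list[Event]:
    # Derivation: emit a new kind of event when a condition holds.
    if event.payload.get("tier") == "gold":
        return [Event("priority_order", event.payload)]
    return []

def run_chain(functions: list[EventFunction], event: Event) -> list[Event]:
    # The outputs of each function become the inputs of the next one.
    current = [event]
    for fn in functions:
        current = [out for ev in current for out in fn(ev)]
    return current
```

An order from customer `c1` comes out as a single `priority_order` event. An order from any other customer, or a non-order event, produces nothing.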
The distinction is where, relative to running transactions, the event processing is done. In-line event processing means that the processing happens inside a transaction rather than observing the transaction from the outside. In this case, creating the event may be part of the transaction (e.g. an application emits an event, and the resulting actions belong to the same transaction), or it may be the starting point of a transaction. To give a concrete example (repeating the example from the previous post): some application emits an event saying "cancel order" (this by itself may or may not be the result of an event processing function). Emitting it should be atomic with executing the cancellation. If cancellation is not possible, some compensation is needed in the producer application. Likewise, an event can start a transaction that spans all its descendants and actions. If an action in the consumer cannot execute, or there is some validation issue, the entire thread of the "event processing network" should abort or compensate.

This is in contrast to "observation mode": observe whether there is some anomaly, and if there is, notify somebody. In the in-line case, the transaction can change its course due to event processing. In the observation case, the impact is indirect. Of course, in some cases we have a combination of both.

Two issues should be discussed. What is the relationship between the "observation" mode of event processing and what the market knows as BAM? And what is the semantic meaning of rollback or compensation when talking about events? More discussion on each of these issues later. Happy New Year.
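A minimal sketch of the difference follows. It assumes an in-memory dictionary as the "database" and a hand-rolled undo log standing in for a real transaction manager. All names are hypothetical and belong to no product's API. In in-line mode, the event and the actions it triggers succeed or fail together. In observation mode, the processor only looks at what has already happened and notifies:

```python
from typing import Callable

_MISSING = object()  # marker for keys that did not exist before a write

class CancellationFailed(Exception):
    """Raised by a consumer action that cannot execute."""

class Transaction:
    """Illustrative undo-log transaction over a dict; not a real transaction manager."""

    def __init__(self, store: dict):
        self.store = store
        self.undo: list = []

    def write(self, key: str, value: object) -> None:
        self.undo.append((key, self.store.get(key, _MISSING)))
        self.store[key] = value

    def rollback(self) -> None:
        # Restore the old values in reverse order of writing.
        for key, old in reversed(self.undo):
            if old is _MISSING:
                self.store.pop(key, None)
            else:
                self.store[key] = old
        self.undo.clear()

def mark_cancelled(tx: Transaction, event: dict) -> None:
    tx.write(f"order:{event['order']}", "cancelled")

def refund(tx: Transaction, event: dict) -> None:
    if event.get("shipped"):
        raise CancellationFailed("order already shipped")
    tx.write(f"refund:{event['order']}", "issued")

def process_inline(store: dict, event: dict,
                   actions: list[Callable[[Transaction, dict], None]]) -> bool:
    """In-line mode: recording the event and all triggered actions form one atomic unit."""
    tx = Transaction(store)
    try:
        tx.write(f"event:{event['id']}", event)
        for action in actions:
            action(tx, event)
    except CancellationFailed:
        tx.rollback()
        return False  # the producer application must now compensate
    return True

def observe(committed_events: list[dict], notify: Callable[[str], None]) -> None:
    """Observation mode: inspect events after the fact and notify; nothing is undone."""
    for ev in committed_events:
        if ev["type"] == "cancel order" and ev.get("shipped"):
            notify(f"anomaly: cancel requested for shipped order {ev['order']}")
```

Here a "cancel order" event for an order that has already shipped leaves the store exactly as it was, and `process_inline` returns `False` so the producer can compensate. `observe` can only report the same situation after the fact.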
Thursday, September 13, 2007
Event processing and transactions - real, real time and real time enterprise
RTE applications, in the event processing context, are thus defined as those applications that impact running transactions directly. Other types of applications that use event processing in observation mode (e.g. BAM) typically look at running transactions from the outside, while RTE applications are part of the transaction processing itself. More on BAM applications and other types later.
Wednesday, September 12, 2007
Event processing - hard-coded? specialized software? generic software?
So what is new here? It is true that we have been doing event processing for a long time, sometimes in "exception" mode, but it has been done in a hard-coded way, embedded in regular programming. I guess that even today, if we took most of the implemented functionality that falls under the category of event processing, we would find it hard-coded within applications. The new thing is the existence of generic software. When should we use generic software? If all that is needed in event processing sums up to one or two functions, it is probably not cost-effective to purchase, install, learn and develop in a generic tool. It starts to be cost-effective if one of these conditions is satisfied:
- The number of "event processing" functions required in the application is at least medium.
- The complexity of these functions is not trivial, so there is a benefit in programming them in a high-level language rather than a lower-level one.
- The event processing is not internal to a single application, so there is a connectivity issue (e.g. pulling events, using adapters to re-format events, listening to events, publishing events etc.). Again, hard-coding all of this may not be cost-effective.
- Agility requirement: frequent changes. Again, this is easier with higher-level abstractions.
- The need to let business users control the behavior without involving programmers. This is a mountain we have not conquered yet, but conceptually it will be easier with a generic tool.
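The hard-coded vs. generic contrast can be made concrete with a minimal sketch. All names in it are hypothetical. In the first function, the logic is embedded in application code. In the second, rules are data handed to a small reusable engine, so new behavior can be added without changing the engine. In a real generic tool, the conditions would be written in a higher-level event processing language rather than as Python lambdas:

```python
from dataclasses import dataclass, field
from typing import Callable

def handle_trade_hardcoded(trade: dict) -> list[str]:
    # Hard-coded: the event logic is buried in application code; changing the
    # threshold or adding a new check means editing and redeploying this function.
    alerts = []
    if trade["amount"] > 10_000:
        alerts.append("large trade")
    return alerts

@dataclass
class Rule:
    name: str
    event_type: str
    condition: Callable[[dict], bool]

@dataclass
class RuleEngine:
    # Generic: a reusable engine that evaluates whatever rules it is given.
    rules: list[Rule] = field(default_factory=list)

    def add_rule(self, rule: Rule) -> None:
        # Agility: new behavior is added as data; the engine code is untouched.
        self.rules.append(rule)

    def process(self, event_type: str, event: dict) -> list[str]:
        return [r.name for r in self.rules
                if r.event_type == event_type and r.condition(event)]
```

The engine pays off only once there are enough rules, integration points or changes to amortize it, which is exactly the cost-effectiveness argument above.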
The analogy, in a way, is a generic DBMS with all its utilities vs. implementing your own database on top of file systems. There are certainly cases in which it is worthwhile to use file systems and not the abstraction layer that a DBMS provides, but in most cases the TCO (total cost of ownership) is substantially lower when the level of abstraction goes up. Note that the main cost saving is in maintenance time rather than development time, especially if agility is a consideration.
There is another option between "generic" software and hard-coding: specialized software. It still provides a high-level language, but its functionality is limited to a specific application or application type. Should we use generic software in all such cases, or are we better off using specialized software? This raises an interesting question: should we strive for "one size fits all"?
This discussion will continue later.
Tuesday, September 11, 2007
Event Processing - A paradigm shift?
We can realize that the paradigm shift has not happened yet (maybe it has happened for those of us who live and breathe this topic, but not on a large scale). Looking at one paradigm shift that has succeeded, relational databases, we can analyze some of the reasons for that success:
- We need an underlying theory in two areas: a semantic one (like relational algebra) and an engineering-oriented one (like query optimization) built on top of the first;
- We need vendors that understand the theory to build products that implement it;
- We need to be able to explain to the developer community (with its various types of developers, a topic for another article) how to use it. A good textbook, like Date's book on relational databases, is a good step, but developing use patterns, methodologies etc. will be helpful too;
- Standards (the topic of another forthcoming article) are complementary, but relational databases were a paradigm shift before the SQL standard was published.
The state of the practice in event processing is similar to the database area in pre-relational times: there are several approaches, all of which grew out of implementations. This does not at all undermine the importance of the first generation of event processing products. Without more experience in the field, we cannot get anywhere...
While the various vendors continue to advance their products incrementally, some of the effort (perhaps a community effort) should now go towards bridging the gap between the first generation and the "paradigm shift"...
Bottom line: Event processing has big potential to make the paradigm shift that Roy Schulte is talking about, making event processing a major paradigm in enterprise computing. It can happen, it should happen, and we should make it happen, but there are mountains to climb and oceans to cross.
More on the challenges and obstacles as well as the futuristic vision - later
Monday, September 10, 2007
If SQL extensions are the answer then what is the question ?
If the idea (as most vendors aspire) is to have a fully general event processing language, there are some cases in which I find a mismatch between SQL-style thinking and EP thinking. Let me point out some of them.

First, SQL is set-oriented, which means that a "join" conceptually starts from the Cartesian product and then creates subsets using the select and project operators. In event processing, some applications are set-oriented (e.g. finding trends in time series), but many of them are "event at a time": for each individual event, there is a check whether some pattern is matched. While it is possible (sometimes with difficulty) to express pattern matching in SQL, it is not a natural way to think about it.

Second, SQL lacks abstractions that allow fine-tuning the semantics. In the past I presented a relatively simple example on the Yahoo CEP-Interest group and was shown SQL solutions that can solve it, but at the price of highly complex queries. Anybody interested in the details can read the example at http://tech.groups.yahoo.com/group/CEP-Interest/message/678 (the follow-up responses show how it is done in SQL, and you can form your own impression).
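The following minimal sketch contrasts the two styles of thinking for a hypothetical pattern: "an A event followed by a B event with the same key within 10 time units". The set-oriented version builds the Cartesian product and selects from it. The event-at-a-time version keeps state and checks each arriving event. The sketch assumes that events arrive in timestamp order. The `Ev` class, the pattern and the window size are illustrative only:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ev:
    type: str
    key: str
    ts: int

WINDOW = 10  # hypothetical time window

def set_oriented(events: list[Ev]) -> list[tuple[Ev, Ev]]:
    # SQL-style thinking: start from the Cartesian product, then select.
    return [(a, b) for a in events for b in events
            if a.type == "A" and b.type == "B" and a.key == b.key
            and 0 < b.ts - a.ts <= WINDOW]

class EventAtATime:
    # Pattern-matching thinking: react to each arriving event against stored state.
    def __init__(self) -> None:
        self.pending: dict = {}  # key -> A events that may still match

    def on_event(self, ev: Ev) -> list[tuple[Ev, Ev]]:
        if ev.type == "A":
            self.pending.setdefault(ev.key, []).append(ev)
            return []
        if ev.type == "B":
            # Drop A events that have left the window (input is in timestamp order).
            live = [a for a in self.pending.get(ev.key, []) if ev.ts - a.ts <= WINDOW]
            self.pending[ev.key] = live
            return [(a, ev) for a in live if a.ts < ev.ts]
        return []
```

Both functions find the same matches on an ordered stream. The difference is in how the problem is thought about: the first materializes all pairs, while the second reacts to each event as it arrives.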
However, since event processing is not a monolithic area, there may be a place for specific cases that do not intend to provide a general language. Is there a benefit to using SQL in such specific cases?
This goes into the issue of the relationship between databases and event processing, which deserves more attention and will be the topic of one of the next postings on this blog.
More - Later
Sunday, September 9, 2007
EP, CEP, ESP, DSP, SEP, MEP, BPEP and more
According to this distinction, SEP (simple event processing) deals with filtering and routing (e.g. pub/sub) and is the most pervasive form of event processing. MEP (mediated event processing) enables transformation (translation, aggregation, splitting), validation and enrichment of events. CEP (complex event processing) derives events based on patterns detected in the event history. BPEP is really about making the consumer and producer ready for event processing, through instrumentation on one side and orchestration on the other. Thus, while it is part of the architecture, it is a somewhat different creature.
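A minimal sketch of the three levels follows, with hypothetical names and event shapes. SEP routes events to subscribers, possibly filtered, without changing them. MEP transforms or splits single events. CEP derives a new event from a pattern over several events (here, a deposit followed by a withdrawal of the same amount on the same account). BPEP is left out, since it concerns instrumenting producers and orchestrating consumers rather than processing events:

```python
from collections import defaultdict
from typing import Callable, Optional

class PubSub:
    """SEP: filtering and routing; events are delivered unchanged."""

    def __init__(self) -> None:
        self.routes = defaultdict(list)  # topic -> list of (predicate, handler)

    def subscribe(self, topic: str, handler: Callable[[dict], None],
                  predicate: Callable[[dict], bool] = lambda e: True) -> None:
        self.routes[topic].append((predicate, handler))

    def publish(self, topic: str, event: dict) -> None:
        for predicate, handler in self.routes[topic]:
            if predicate(event):
                handler(event)

def split_order(order: dict) -> list[dict]:
    """MEP: one order event becomes one event per line item, enriched with the order id."""
    return [{"order": order["id"], **item} for item in order["items"]]

class RoundTripDetector:
    """CEP: derive a new event from a pattern over the event history."""

    def __init__(self) -> None:
        self.last_deposit: dict = {}  # account -> most recent deposit event

    def on_event(self, e: dict) -> Optional[dict]:
        if e["type"] == "deposit":
            self.last_deposit[e["account"]] = e
        elif e["type"] == "withdrawal":
            dep = self.last_deposit.get(e["account"])
            if dep is not None and dep["amount"] == e["amount"]:
                return {"type": "suspected_round_trip",
                        "account": e["account"], "amount": e["amount"]}
        return None
```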
There is, however, some potential overlap between the different levels; e.g. aggregation can be either MEP or CEP. One possible borderline is "stateless" (MEP) vs. "stateful" (CEP), but this would limit the concept of MEP. Another possible border is that MEP may have state, but in this state raw events are not preserved; only cumulative information is preserved.
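The two possible borders can be illustrated with three hypothetical operators, assuming the MEP/CEP labels follow the two proposals above. The first is a stateless conversion. The second is an average that keeps only cumulative information (count and sum). The third is a sliding-window average, which has to keep raw events because each value must be removed again when it leaves the window:

```python
from collections import deque

def fahrenheit_to_celsius(event: dict) -> dict:
    # Stateless: MEP under either border; the output depends only on the current event.
    return {**event, "temp_c": (event["temp_f"] - 32) * 5 / 9}

class RunningAverage:
    # Cumulative state only: raw events are discarded, count and sum are kept.
    # Stateful, yet MEP under the second border.
    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def on_event(self, value: float) -> float:
        self.count += 1
        self.total += value
        return self.total / self.count

class SlidingWindowAverage:
    # Raw-event state: the last `size` values are kept, because each one must
    # drop out of the result when it leaves the window. CEP under either border.
    def __init__(self, size: int) -> None:
        self.window = deque(maxlen=size)

    def on_event(self, value: float) -> float:
        self.window.append(value)
        return sum(self.window) / len(self.window)
```

Both averages are "aggregation", which is why aggregation can fall on either side of the line.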
And what about DSP, ESP etc.? Some people in the community make the distinction that CEP processes "posets" (partially ordered sets), while ESP processes sequences (totally ordered sets). Since a sequence is a special case of a poset, one can claim that if posets are supported, then sequences are supported too. In some applications the order is important, and we can indeed look at them as a class of event processing applications. However, there are other classes of applications with their own characteristics, for example: transactional applications, applications that support uncertain events, applications that require retrospective processing, applications that do not require recoverability, and more. Thus, "total order" is just one of many sub-types of event processing. Furthermore, within the same application, some patterns may require total order while others don't.
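Vector clocks are one common way to represent a partial order of events. They are assumed here purely for illustration; the distinction above does not prescribe any representation. The sketch defines the "happened before" relation and checks whether a set of events is totally ordered, i.e. whether the sequence special case applies. Concurrent events are exactly the pairs that a sequence-only model cannot express:

```python
from itertools import combinations

VectorClock = dict[str, int]  # process id -> counter; missing entries count as 0

def happened_before(a: VectorClock, b: VectorClock) -> bool:
    # a precedes b iff every component of a is <= that of b and at least one is smaller.
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))

def concurrent(a: VectorClock, b: VectorClock) -> bool:
    # Neither event precedes the other: allowed in a poset, impossible in a sequence.
    return not happened_before(a, b) and not happened_before(b, a)

def is_sequence(clocks: list[VectorClock]) -> bool:
    # A totally ordered set (sequence) is a poset in which no two events are concurrent.
    return not any(concurrent(a, b) for a, b in combinations(clocks, 2))
```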
A common misconception about "complex event processing" stems from a possible ambiguity in parsing the expression. While some people interpret it as "complex processing of events", the meaning is "processing of complex events"; this processing can indeed be quite simple...
Interestingly, the term "stream processing", which came from academia, did not survive as a marketing term. Looking at the homepage of StreamBase, I find the terms CEP and complex event processing all over (the term "stream processing" has disappeared). Coral8 also has "complex event processing" in its title, and so does Apama, which used "event stream processing" in the past. Apama is now talking about "event processing platform" and "complex event processing language", which seems consistent with what is written here.
Bottom line: It seems that Event Processing is catching on as the name of the area (the end-to-end game), and Complex Event Processing as the name of the part that processes multiple events (detects patterns and derives events). Stream processing is very much alive in the academic community but did not survive in the market.
More - about the relations of event processing to databases -- later ...
Saturday, September 8, 2007
Event Processing and the Babylon Tower
Looking at current approaches, the following classification can be made:
1. Extensions of an existing programming language (e.g. Java) with some abstractions, where the programming is still imperative.
2. Scripting languages.
3. SQL-based languages, based on a pure or extended subset of SQL.
4. Rule-oriented languages. The term "rule" is also quite overloaded, and under rules we can find several kinds of languages:
a. Reaction rules (ECA rules).
b. Inference rules.
c. Derivation rules.
5. Pattern-oriented languages.
6. Visual languages.
In some cases there is no single pure language, but a combination - e.g. pattern + derivation or pattern + SQL or script + reaction rules.
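Two of the rule styles above can be sketched minimally, with hypothetical names. A derivation rule produces a new event from an input event without side effects, and a reaction (ECA) rule performs an action when an event satisfies a condition. Chaining them gives a simple "derivation + reaction" combination in the spirit of the mixed languages just mentioned:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class DerivationRule:
    # Derivation rule: from an input event of a given type, derive a new event (or nothing).
    event_type: str
    derive: Callable[[dict], Optional[dict]]

    def apply(self, event_type: str, event: dict) -> Optional[dict]:
        return self.derive(event) if event_type == self.event_type else None

@dataclass
class ECARule:
    # Reaction rule: ON event IF condition DO action.
    event_type: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]

    def fire(self, event_type: str, event: dict) -> None:
        if event_type == self.event_type and self.condition(event):
            self.action(event)
```

For example, a derivation rule can turn sensor readings above a threshold into "overheat" events, and an ECA rule on "overheat" can trigger a shutdown action.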
Each style has its own merits and drawbacks. Since "event processing" is not a monolithic area (a topic for a short article of its own), a study is still needed of programming style vs. type of developer and nature of application.
Anyway, two questions are raised by this situation: Is it possible to have a standard language that will replace all current languages? And is it feasible to have such a language in practice?
The answer to the first question is probably yes; the answer to the second is: not in the near future.
While there are various approaches, their functionality overlaps to a large extent, and with some effort it is possible to arrive at a language that unifies the functions of all existing approaches (perhaps with a basic language level and various extensions). However, given the investment already made and the strong belief of various vendors in their own approaches, consolidation here will take time and market pressure.
In the interim period, we can devise a "meta-language" corresponding to a "meta-model" that includes the standard functionality in this area, along with translations between this meta-language and the various implementations. A weaker form of standard compliance would then be supporting the meta-language.
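The following toy sketch of the meta-language idea assumes a single hypothetical meta-model element (a threshold filter) and two invented target syntaxes. Neither output string is any vendor's actual language. One definition is translated into an SQL-style query, an ECA-style rule and an executable Python predicate, so all three targets share the same meaning:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ThresholdFilter:
    """Hypothetical meta-model element: events of a type whose attribute exceeds a threshold."""
    event_type: str
    attribute: str
    threshold: float

def to_sql_style(f: ThresholdFilter) -> str:
    # Invented SQL-like target syntax.
    return f"SELECT * FROM {f.event_type} WHERE {f.attribute} > {f.threshold}"

def to_eca_style(f: ThresholdFilter) -> str:
    # Invented ECA-rule target syntax.
    return f"ON {f.event_type} IF {f.attribute} > {f.threshold} DO emit"

def to_predicate(f: ThresholdFilter) -> Callable[[str, dict], bool]:
    # Executable target: a Python function with the same semantics.
    return lambda event_type, event: (event_type == f.event_type
                                      and event[f.attribute] > f.threshold)
```

In this picture, a product would "support the meta-language" by providing a translator from the meta-model into its own language, rather than by replacing that language.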
More discussion on the meta-language later...