Monday, June 16, 2008

On the right event processing language

Recently, the discussion about the "right" event processing language has resumed in a couple of blog entries. First, Mark Tsimelzon, the energetic CTO of Coral8, claims that the most important property of a language is familiarity, and thus that similarity to SQL is a benefit. Louis Lovas from Apama argues that familiar syntax is indeed important, but that readability is even more important, and makes the case for an imperative language, since most programming in the universe is done in Java/C style; from the comments it seems that David Luckham consistently thinks that Rapide is the right answer.

This discussion reminds me of the famous story illustrated in the picture above, in which a group of blind people touch an elephant in different places, and consequently each of them describes the elephant differently. Different people have arrived at the event processing area from different backgrounds, have different types of usage and applications in mind, and may also have different types of users in mind: verification engineers can work comfortably with a temporal logic language, C/Java programmers may have an inclination towards scripting languages, business users need a somewhat higher-level language, and so on.

In some applications the natural main abstraction is the "stream", a set of events in a certain time window, and most patterns are set-oriented (aggregation, threshold, trend seeking, etc.), while in other applications the natural main abstraction is the "event", since events are processed individually and the patterns deal with individual events (conjunction, disjunction, time-out, etc.). The challenge, as the area matures, is to build a language with all the right abstractions, but we need to understand them well first.
As for the claim that familiar syntax is a benefit: yes, it has some benefit, but following this line of reasoning we would still be writing in assembly languages, which were what programmers were most familiar with when I started my career as a programmer (once upon a time...); we have been advancing ever since by providing abstractions. SQL itself began as an abstraction over imperative languages; I still wrote imperative queries with loops in the pre-SQL era. Likewise, spreadsheet programming (which started with Lotus 1-2-3) did not look familiar, yet it became the most pervasive programming style that exists today (for non-programmers). Thus, IMHO, abstractions that enable one to think naturally about a certain type of function are more important than familiar syntax.
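To make the two abstractions described above concrete, here is a minimal Python sketch; it is not the language of any product mentioned here, and all names in it are hypothetical. The first function is a set-oriented "stream" pattern (a sliding-window average); the second is an individual-event pattern (a conjunction of an "A" and a "B" within a timeout):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Event:
    kind: str
    value: float
    ts: float  # timestamp in seconds

# Stream abstraction: a set-oriented pattern (a sliding-window average)
# computed over all events in the last `window` seconds.
def window_average(events, window=60.0):
    buf = deque()
    for e in events:
        buf.append(e)
        while buf[0].ts < e.ts - window:   # expire events outside the window
            buf.popleft()
        yield e.ts, sum(x.value for x in buf) / len(buf)

# Event abstraction: an individual-event pattern (conjunction) that fires
# when both an "A" and a "B" arrive within `timeout` seconds of each other.
def conjunction(events, timeout=10.0):
    last_seen = {}  # kind -> most recent timestamp
    for e in events:
        last_seen[e.kind] = e.ts
        other = "B" if e.kind == "A" else "A"
        if other in last_seen and e.ts - last_seen[other] <= timeout:
            yield ("A-and-B", e.ts)
            last_seen.clear()              # consume the matched pair
```

An SQL-style stream language expresses the first pattern naturally; a rule- or pattern-oriented language expresses the second; the point above is that a mature language needs both kinds of abstraction.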


Richard Veryard said...

You wrote: "abstractions that enable one to think naturally about a certain type of function are more important than familiar syntax"

I'm not convinced that programmers think naturally about anything - perhaps that's what makes them programmers in the first place - but apart from that I think you've got a point.

In my post on Programming Languages I point out that vendors are often more concerned about easy adoption than correct use, and this is why they think familiarity is important.

gilad said...

And I thought the whole idea is to do things faster...
As an end user who is looking into buying one of these products, and with financial data rates growing so fast, I will need whatever C/Java/SQL/you-name-it language turns out to be the fastest in practice.

Opher Etzion said...

Gilad, it is a common misconception that the main reason to use EP software is low latency. While this is true for some applications, it is not true in the general case; see my earlier post for a discussion.

gilad said...

Opher, you wrote:
"I think there are two sources - the first, the early adopters were from the capital markets industry, where some (not all !) of the candidate applications has indeed high performance characteristics"

And as I said, with market data rates going ever higher, the most important thing for me, being in the capital markets industry, is speed.

so we agree (regarding financial industry) :)

Jon said...

Interesting post, and I like the elephant metaphor. To the idea of "abstractions", though, I'd add a related but slightly different idea, namely "models of computation". The rules-based approach, for instance, seems closest to tuple stores (like Linda and its ancestors and descendants), where computation proceeds by pulling tuples (events) out of the store and writing new ones in. In many of the streaming products, the model of computation is a different one, namely dataflow networks (based on Kahn's work in the 1970s). Data moves on edges from one node to another, and each node processes the data. Both are different from the RDBMS model, where data is stuffed into base tables and queried after the fact.

An understanding of the model of computation can lead one to different abstractions. I've commented on this in the context of the Aleri CEP product (see also a recent blog entry). Our model of computation is dataflow, where the data is insert/update/delete/upsert of records. We take the dataflow model more seriously than the relational model. Nodes are programmable with operations other than SQL, through our concepts of Flex Streams (in a little imperative language) and Pattern Streams. That's quite useful in practice, and indeed the dataflow model suggests these extensions. In other words, by focusing on the model of computation rather than on a particular language, we think we see more of the elephant.
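The dataflow-network model described above can be sketched minimally as follows. This is not the actual API of Aleri or of any other product; all names are hypothetical, and where Kahn's original networks are pull-based with blocking reads, this sketch simplifies scheduling to synchronous pushes, keeping only the idea that data moves along edges between processing nodes:

```python
# Minimal push-based dataflow-network sketch (all names hypothetical).
# Each node applies a function to an incoming item and forwards zero or
# more results along its outgoing edges to downstream nodes.
class Node:
    def __init__(self, fn):
        self.fn = fn          # item -> list of output items
        self.downstream = []  # outgoing edges

    def connect(self, other):
        self.downstream.append(other)
        return other          # allows chaining: a.connect(b).connect(c)

    def push(self, item):
        for result in self.fn(item):
            for node in self.downstream:
                node.push(result)

results = []
source = Node(lambda x: [x])                      # pass-through entry point
double = Node(lambda x: [x * 2])                  # transform node
keep_big = Node(lambda x: [x] if x > 3 else [])   # filter node
sink = Node(lambda x: (results.append(x) or []))  # collect final outputs

source.connect(double).connect(keep_big).connect(sink)
for v in [1, 2, 3]:
    source.push(v)
# results now holds the doubled values greater than 3:
# 1 doubles to 2 and is filtered out; 2 and 3 yield 4 and 6
```

The contrast with the RDBMS model is visible even at this scale: nothing is stored and queried after the fact; each item is processed as it flows through the graph.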