Saturday, August 27, 2011
On streams, events, programming-in-the-large and programming-in-the-small
In the tutorial I gave at VLDB 2010, one of the first slides posed a rhetorical question - see above.
There are four opinions. Some people think stream processing and event processing are aliases. Some view stream processing as a subset of event processing that deals with ordered events. Some view event processing as a subset of stream processing, saying that an event stream is one type of stream and that there are other types of data streams, such as voice streams and video streams. And some think the two are totally different concepts, relating to different types of applications. There is some truth in each of them under some interpretation, but IMHO none of the above is really true.
Curt Monash decided to renew the old terminology discussion on his blog, taking the "stream" approach favored by database people, who look at "data streams" as data in motion and view events as a type of data that does not need any special handling.
The difference of opinions and terminology stems from the fact that some people are thinking about apples and some about oranges.
What is the apple? Let's take as an example S4 from Yahoo Labs. In the blog post I referenced here, I mentioned that S4 is a platform for doing "programming in the large" for stream processing. What does that mean? It supports a data flow graph, where streams flow on the graph's edges and the processing logic is embedded in the graph's nodes. How is this logic implemented? That is not part of the model: each developer uses the platform and implements the nodes, while the platform takes care of the flow and of some non-functional properties (distribution, fault tolerance, cluster management, scalability in some aspects, etc.).
It is a pure programming-in-the-large framework, and there are others like it. In this case the model is blind to the type of stream, and the stream can indeed be a video stream, a voice stream, etc. I would call such a framework "stream processing".
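To make the "apple" concrete, here is a minimal sketch of a data flow graph in Python. It is not S4's API; the class `DataFlowGraph` and its methods are hypothetical names chosen for illustration, and distribution and fault tolerance are left out entirely. The point it tries to show is that the platform only wires nodes and routes items, and never looks inside either the items or the node logic.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Dict, Iterable, List

# A node's logic is an opaque callable: one stream item in, zero or more out.
# The platform never inspects the items, so they could be events, audio
# frames, or anything else. (Hypothetical type alias, not from any library.)
NodeLogic = Callable[[Any], Iterable[Any]]


class DataFlowGraph:
    """Toy 'programming-in-the-large' layer: wiring and routing only."""

    def __init__(self) -> None:
        self.logic: Dict[str, NodeLogic] = {}
        self.edges: Dict[str, List[str]] = defaultdict(list)
        self.sinks: Dict[str, List[Any]] = defaultdict(list)

    def add_node(self, name: str, logic: NodeLogic) -> None:
        self.logic[name] = logic

    def connect(self, source: str, target: str) -> None:
        self.edges[source].append(target)

    def push(self, entry: str, item: Any) -> None:
        """Route one item through the graph, starting at node `entry`."""
        queue = deque([(entry, item)])
        while queue:
            name, current = queue.popleft()
            outputs = list(self.logic[name](current))
            targets = self.edges.get(name, [])
            if not targets:
                # A node without outgoing edges collects its own results.
                self.sinks[name].extend(outputs)
            for out in outputs:
                for target in targets:
                    queue.append((target, out))


# The developer supplies the node logic; the graph only moves items along.
graph = DataFlowGraph()
graph.add_node("split", lambda line: line.split())
graph.add_node("upper", lambda word: [word.upper()])
graph.connect("split", "upper")
graph.push("split", "streams and events")
print(graph.sinks["upper"])  # ['STREAMS', 'AND', 'EVENTS']
```

Nothing in `DataFlowGraph` knows what a word, an event, or a video frame is, which is the sense in which such a model is blind to the type of stream.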
What is the orange? If we look at the abstract model of event processing, the way we defined it in the EPIA book, it is a model centered on programming-in-the-small, with language primitives that relate to the semantics of events: mainly the notion of the context of events (when? where? to whom?) and patterns over multiple event occurrences. The orange does not sound at all like the apple.
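As a rough illustration of the "orange", the sketch below expresses one pattern over multiple event occurrences, evaluated inside a context. It is not the language defined in the EPIA book; `Event`, `RepeatedEventPattern`, and the field names are hypothetical, and the code assumes events arrive in timestamp order. The context here has two parts: a partition by subject ("to whom?") and a sliding temporal window ("when?").

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    kind: str         # event type, e.g. "login_failed"
    subject: str      # who the event is about (the "to whom?" context)
    timestamp: float  # occurrence time in seconds (the "when?" context)


class RepeatedEventPattern:
    """Match `threshold` events of one kind, per subject, within `window` seconds."""

    def __init__(self, kind: str, threshold: int, window: float) -> None:
        self.kind = kind
        self.threshold = threshold
        self.window = window
        self._recent: Dict[str, Deque[Event]] = defaultdict(deque)

    def on_event(self, event: Event) -> Optional[List[Event]]:
        if event.kind != self.kind:
            return None
        recent = self._recent[event.subject]
        recent.append(event)
        # Drop events that have fallen out of the sliding temporal window.
        while event.timestamp - recent[0].timestamp > self.window:
            recent.popleft()
        if len(recent) >= self.threshold:
            matched = list(recent)
            recent.clear()  # consume matched events so they are not reused
            return matched
        return None


pattern = RepeatedEventPattern("login_failed", threshold=3, window=60.0)
events = [
    Event("login_failed", "alice", 0.0),
    Event("login_failed", "bob", 5.0),
    Event("login_failed", "alice", 10.0),
    Event("login_ok", "alice", 12.0),
    Event("login_failed", "alice", 20.0),
]
matches = []
for e in events:
    result = pattern.on_event(e)
    if result is not None:
        matches.append(result)
print(len(matches))  # 1
```

Everything interesting here lives in the semantics of the events (type, subject, time), and nothing says how the events are transported or where the logic runs.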
Can something be both an apple and an orange? The answer is positive. While event processing can be implemented using various "programming in the large" models, we advocate the "event flow" one, and the "event processing network" can be mapped to the data-flow graph model of streams. So it is possible, but not necessary, to implement event processing as a kind of stream processing. It turns out that there are some benefits to doing so, and we see that this indeed seems to be becoming a dominant way of "programming-in-the-large", while the programming-in-the-small is still based on the semantics of events.
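One way to picture this mapping, as a sketch under simplifying assumptions: each event processing agent becomes a stage that consumes one stream and produces another, and the network is the composition of those stages. The example uses plain Python generators and a linear chain, which is only the simplest special case of a data-flow graph. The helper names (`filter_agent`, `sequence_agent`, `event_processing_network`) and the dictionary shape of an event are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, object]  # hypothetical event shape: {"kind": ..., "key": ...}
Agent = Callable[[Iterable[Event]], Iterator[Event]]


def filter_agent(kinds: Tuple[str, ...]) -> Agent:
    """Pass through only events whose kind is in `kinds`."""
    def run(stream: Iterable[Event]) -> Iterator[Event]:
        for event in stream:
            if event["kind"] in kinds:
                yield event
    return run


def sequence_agent(first: str, then: str, derived_kind: str) -> Agent:
    """Emit a derived event whenever `then` follows `first` for the same key."""
    def run(stream: Iterable[Event]) -> Iterator[Event]:
        pending: Dict[object, Event] = {}
        for event in stream:
            key = event["key"]
            if event["kind"] == first:
                pending[key] = event
            elif event["kind"] == then and key in pending:
                start = pending.pop(key)
                yield {"kind": derived_kind, "key": key, "causes": [start, event]}
    return run


def event_processing_network(agents: List[Agent]) -> Agent:
    """Chain agents so that each one's output stream feeds the next."""
    def run(stream: Iterable[Event]) -> Iterator[Event]:
        for agent in agents:
            stream = agent(stream)
        return iter(stream)
    return run


raw = [
    {"kind": "order_placed", "key": "o1"},
    {"kind": "heartbeat", "key": "n/a"},
    {"kind": "order_cancelled", "key": "o1"},
    {"kind": "order_cancelled", "key": "o2"},
]
network = event_processing_network([
    filter_agent(("order_placed", "order_cancelled")),
    sequence_agent("order_placed", "order_cancelled", "quick_cancel"),
])
derived = list(network(raw))
print([(d["kind"], d["key"]) for d in derived])  # [('quick_cancel', 'o1')]
```

The chaining function is the programming-in-the-large part and knows nothing about events; the agents are the programming-in-the-small part and carry all the event semantics.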
The viewpoint is always the hammer-and-nail issue. Those who have a stream processing "programming-in-the-large" platform see event processing as just one application of their platform, and think that the platform is the main thing. Those who have an event processing language view the semantics and functionality of the language as the main thing, and the platform as a facilitator.
The two do not fully overlap. In stream processing one can add a node dealing with audio processing, where an event processing language might be of little value; likewise, there are implementations of event processing that are based on other programming-in-the-large models (such as a logic programming framework) and not on the stream model.
When looking at the current state of the art, we see that many products indeed lie in the intersection of both, thus each side can classify them in its own way. The fact that most classify them as event processing may show where the market thinks the value is.