Saturday, May 2, 2009

Packing for a one-week (net) trip to the USA. We were recently informed that the paper entitled "A stratified approach for supporting high throughput event processing application" has been accepted for presentation at the DEBS 2009 conference, which will take place in early July in Nashville. The paper, written by Yuri Rabinovich, Geetika Lakshmanan and myself, describes results obtained last year in our scalability project. The project is still going on, and its results will be flowing into IBM products.

Here is the abstract of this paper:

The quantity of events that a single application needs to process is constantly increasing, RFID related events have been doubled within the past year and reached 4 trillion events per day, financial applications in large banks are processing 400 million events per day, and Massively Multiplayer Online (MMO) games are monitoring in peaks 1 million events per second. It is evident that scalability in event throughput is a major requirement for these types of applications. While the first generation of event processing systems has been centralized, we see various solutions that attempt to use both scale-up and scale-out techniques. Alas, partitioning of the processing manually is difficult due to the semantic dependencies among various event processing agents. It is also difficult to tune up the partition dynamically in a manual way. Manual partitioning is typically vertical, i.e. there is a single partition set with centralized routing. This paper proposes a horizontal partition that is automatically created by analyzing the semantic dependencies among agents using a stratification principle. Each stratum contains a collection of independent agents, and events are always routed to subsequent strata. We also implement a profiling-based technique for assigning agents to nodes in each stratum with the goal of maximizing throughput. A complementary step is to distribute the load among the different execution nodes dynamically based on performance characteristics of nodes and agents and the event traffic model. Experimental results show significant improvement in the ability to process high throughput of events relative to both centralized solutions as well as vertical partitions. We find this to be a promising approach to achieve high scalability without requiring difficult manual tuning, especially when the traffic model and the topology of the event processing network is often changed.
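
To make the stratification idea a bit more concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, only an illustration under my own assumptions: each agent is described by the event types it consumes and the event types it produces, an agent depends on any other agent that produces one of its input types, and the agent names and event types are invented for the example. Agents with no unplaced dependencies form the next stratum, so the agents inside a stratum are independent of each other and events only flow to later strata.

```python
def stratify(agents):
    """Group agents into strata.

    agents: dict mapping agent name -> (input event types, output event types).
    Returns a list of strata; each stratum is a sorted list of agent names.
    """
    # Map each event type to the agents that produce it.
    producers = {}
    for name, (_, outputs) in agents.items():
        for event_type in outputs:
            producers.setdefault(event_type, set()).add(name)

    # An agent depends on every other agent producing one of its input types.
    depends_on = {
        name: {p for t in inputs for p in producers.get(t, ()) if p != name}
        for name, (inputs, _) in agents.items()
    }

    strata, placed, remaining = [], set(), set(agents)
    while remaining:
        # Agents whose dependencies all sit in earlier strata are ready.
        ready = sorted(a for a in remaining if depends_on[a] <= placed)
        if not ready:
            raise ValueError("cyclic dependency among agents")
        strata.append(ready)
        placed.update(ready)
        remaining.difference_update(ready)
    return strata


# Hypothetical event processing network, invented for illustration only.
example_agents = {
    "filter_bids": ({"raw_bid"}, {"valid_bid"}),
    "filter_asks": ({"raw_ask"}, {"valid_ask"}),
    "match": ({"valid_bid", "valid_ask"}, {"trade"}),
    "alert": ({"trade"}, {"alert"}),
}
example_strata = stratify(example_agents)
print(example_strata)
```

In this toy network the two filters land in the first stratum, the matcher in the second and the alerting agent in the third. Assigning the agents of each stratum to execution nodes, which is the profiling-based step of the abstract, is not covered by this sketch.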

More about event processing distribution and parallelization will be discussed in subsequent postings.

DEBS has also recently issued a call for fast abstracts, posters and demos: an opportunity to share with the community work that is in a less mature phase, show interesting demos, and discuss ideas.

Wednesday, April 29, 2009

On events and relativism

Today was a holiday, the Independence Day of Israel, and we spent part of the day at an exhibition called "Body World", which presents the human body and its various functions using parts taken from dead people who donated their bodies, preserved by a method developed by somebody at the University of Heidelberg. Actually, I also spent some of the holiday working, since something had become artificially urgent. These high-tech corporations make you a slave; I am getting too old for this...

Anyway, in Israel the day before Independence Day is Memorial Day, to remind us that independence has its cost. It reminds me that in my first year in the USA (I lived in the USA for several years, around 20 years ago), there were signs in the street saying "memorial day sale". We thought that somebody was making a joke in bad taste, which does not really fit the famous "political correctness" of the Americans, but found out after talking with some local people that the typical American does not attribute any semantics to Memorial Day; it is just a long weekend with sales and travel like any other long weekend. Well, a cultural difference, since in Israel Memorial Day is taken seriously.

Today, I wanted to say something about events and relativism. One of the questions on the forum about chapter 1 of the "Event Processing in Action" book came from Richard Veryard.
The question was:
How do you count how many events? If you have a three-car pile-up, does that count as one collision or two, given that the third car hits a few seconds after the first two? Or three collisions, if the third car hits both of the first two cars?

My answer was that the decision is relative to the application. From the point of view of the insurance company or companies of the cars involved, it may look like three different events, since each event refers to a single car; from the point of view of the traffic police, it may be considered a single event, where the number of cars involved is an attribute.
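
The point can be illustrated with a small Python sketch. Everything in it is an assumption of mine for the sake of the example: the Impact record, the event type names "CarDamaged" and "Accident", and the two view functions are hypothetical, not taken from the book. The same raw observations of the pile-up yield three events in one application and one event in the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impact:
    """One observed impact between two cars (hypothetical raw observation)."""
    incident: str
    striking_car: str
    struck_car: str
    seconds: float  # time offset from the first impact


def insurer_view(impacts):
    # The insurer's event refers to a single car: one event per car involved.
    pairs = sorted(
        {(i.incident, car) for i in impacts
         for car in (i.striking_car, i.struck_car)}
    )
    return [{"type": "CarDamaged", "incident": inc, "car": car}
            for inc, car in pairs]


def police_view(impacts):
    # The police see one event per incident; the car count is an attribute.
    cars_by_incident = {}
    for i in impacts:
        cars_by_incident.setdefault(i.incident, set()).update(
            (i.striking_car, i.struck_car)
        )
    return [{"type": "Accident", "incident": inc, "cars_involved": len(cars)}
            for inc, cars in sorted(cars_by_incident.items())]


# Three-car pile-up: B hits A, then C hits B a few seconds later.
pile_up = [Impact("I-1", "B", "A", 0.0), Impact("I-1", "C", "B", 3.0)]
insurer_events = insurer_view(pile_up)
police_events = police_view(pile_up)
print(len(insurer_events), len(police_events))
```

Neither count is the "right" one; each application defines its own event types over the same occurrence.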

Another facet of relativism is whether an event is raw or derived. An event can be a raw event from the point of view of a certain application, since it is provided from the outside; however, the producer that sends it may itself be another event processing application, and from the point of view of that producing application it is a derived event. There are probably more examples of relativism.

Monday, April 27, 2009

More on Revision

Long day today: I got to the office around 8AM and left around 9PM. Since we have a holiday this week, I am trying to condense the work into the remaining days of the week, and the result is a long day with plenty of conference calls. The picture above is a glance (from below) at the IBM Haifa Lab (the pair of connected buildings on the right-hand side of the picture); my office is in the back building (known as the "banana" due to its shape), and is not really in the nice part of the building, the one with the view of Haifa Bay. Well, one can't have everything in life :-).

I still need to complete the previous posting on revision. I gave some explanation of the concept of revision, and now I need to discuss the implementation of revision in event processing. To recall: a revision in event processing is later knowledge asserting that either a reported event did not really happen, or some information associated with the event was wrong.

Let's look at two separate cases: one in which the results of the processing have not yet left the event processing network, and a second in which the results of the processing have already gone out to the "outside world".

In the first case, there may be an opportunity to revise the impact of the revised event by doing a kind of undo/redo for all the event processing agents that it passes through, directly or indirectly. The direct ones are easy: those in which the revised event participates as an input. The indirect ones are trickier, since we need to trace the causality among events. An event that is an output of an event processing agent in which the revised event participates (relating to the same context) has a causality relation to the original event; thus, an event processing agent in which this output event participates as an input also needs to do an undo/redo. Causality is a transitive relation, so this continues as far as the processing in the EPN has progressed so far. It should be noted that the existence of causality may not require a real undo/redo. Take as an example an event of type E1 that designates a bid, and an event of type E2 that designates the bid with the maximal value that arrived in a certain time interval. Let's assume that a certain bid has been revised, but neither the original bid nor the revising bid changes the selection of E2; in that case nothing downstream of E2 needs to be redone.
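
Here is a minimal Python sketch of this kind of propagation, using the bid example. It is only an illustration under my own assumptions: the EPN is reduced to a list of agents, each with named input events, one output event and a function that recomputes the output, and the event identifiers (b1, b2, b3, e2, n1) and agent names are hypothetical. An agent is redone when one of its inputs changed, and the propagation stops wherever the recomputed output equals the old one.

```python
from collections import deque


def propagate_revision(values, agents, revised_id, new_value):
    """Apply a revision and redo the agents that are causally affected.

    values: dict event id -> current value (updated in place).
    agents: list of (name, input event ids, output event id, function).
    Returns the names of the agents that were redone, in order.
    """
    values[revised_id] = new_value
    redone = []
    queue = deque([revised_id])
    while queue:
        changed = queue.popleft()
        for name, input_ids, output_id, fn in agents:
            if changed not in input_ids:
                continue
            redone.append(name)
            new_output = fn([values[i] for i in input_ids])
            # Causality alone is not enough: follow it only if the output changed.
            if new_output != values[output_id]:
                values[output_id] = new_output
                queue.append(output_id)
    return redone


# E1 events are the bids b1..b3; e2 is the maximal bid; n1 is derived from e2.
values = {"b1": 100, "b2": 150, "b3": 120, "e2": 150, "n1": "winner bid 150"}
agents = [
    ("max_bid", ["b1", "b2", "b3"], "e2", max),
    ("notify_winner", ["e2"], "n1", lambda v: f"winner bid {v[0]}"),
]

# Revising b1 from 100 to 110 does not change the maximum: only max_bid is redone.
first = propagate_revision(values, agents, "b1", 110)
# Revising b3 from 120 to 200 changes e2, so the next agent is redone as well.
second = propagate_revision(values, agents, "b3", 200)
print(first, second, values["e2"], values["n1"])
```

A real implementation would also have to deal with contexts, time windows and an event that is retracted altogether rather than changed; none of that is handled here.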

The second case is that the revised event has consequences that have already been sent to an external consumer. It may have triggered an action, a collection of actions, or a workflow that has been carried out, and this may propagate further ("the butterfly effect"). In this case, we can either treat it as "too late" and do nothing, or, in cases where it is critical (e.g. the revised event has some financial meaning), undo/redo the consequences as well. For that we'll need to issue compensation for the triggered action, which may be impossible (the consumer does not support compensation) or difficult. I'll blog again about revising the history and its aspects at a later phase.
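
As a rough illustration of the compensation option, here is a small Python sketch. All of it is assumption on my side: the action log, the optional undo callables and the action names are hypothetical, and a real consumer would expose compensation through its own interface, if at all. Actions are compensated in reverse order of execution, and those without a compensation are reported back so that somebody can decide what to do with them.

```python
def compensate(action_log, critical):
    """Try to compensate the actions triggered by a revised event.

    action_log: list of (action name, undo callable or None), in execution order.
    critical: if False, the revision is treated as "too late" and nothing is done.
    Returns (compensated action names, action names that could not be compensated).
    """
    if not critical:
        return [], []
    compensated, stuck = [], []
    # Undo in reverse order, as later actions may depend on earlier ones.
    for name, undo in reversed(action_log):
        if undo is None:
            stuck.append(name)  # the consumer does not support compensation
        else:
            undo()
            compensated.append(name)
    return compensated, stuck


# Hypothetical consequences of a financial event that was later revised.
ledger = {"balance": 100}
ledger["balance"] -= 30  # the action that was already carried out


def refund():
    ledger["balance"] += 30


action_log = [("debit_account", refund), ("send_email", None)]
compensated, stuck = compensate(action_log, critical=True)
print(compensated, stuck, ledger["balance"])
```

In this example the debit can be refunded, while the e-mail that was sent cannot be taken back, which is exactly the "impossible or difficult" part.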