Event Processing Thinking: stratification

Showing posts with label stratification. Show all posts

Tuesday, July 7, 2009

Live from DEBS 2009 II - our paper presentation

Stratification is one of the terms that computer scientists borrowed from Geology. In 2007, Ayelet Biger, my former M.Sc. student has done her thesis about Complex Event Processing Scalability by Partition which looked at semantic partition of an EPN graph to strata, where in each stratum all agents are independent and can run safely in parallel. Today we have reported about 2008 research project that took the stratification idea and developed a system to assign agents to machines in a distributed environment. Geetika Lakshmanan delivered the talk about this project today. I have posted it on Slidshare. Enjoy !

This project is an interesting example for a life-cycle of a project:

It started as an academic thesis;
It has flown to a research project within IBM Haifa Research Lab ;
After showing promising results in the lab, it evolved to a more "down to earth" project that deals with assignment of agents to machines and threads within the IBM product - WBE-XS (Websphere Business Events - Extreme Scale) which enable event processing on grid environment. This project has to take into account products and their implementation, deal with product instrumentation, and other stuff that pure research projects do not deal with.

While starting an idea in the academia and then going to a start-up has its magic, the work in IBM Research enables to get stuff from academic projects until impact through products, and using my hat as an adjunct professor in the Technion, this is possible. However -- as noted before, getting things pushed in a big company are not necessarily easy.

Saturday, May 2, 2009

Packing for a one week (net) travel to the USA. We had recently been informed that the paper entitled: A stratified approach for supporting high throughput event processing application

has been accepted for presentation in the DEBS 2009 conference that will take place in early July in Nashville. The paper written by Yuri Rabinovich, Geetika Lakshmanan and myself describes results obtained last year in our scalability project, the project is still going on, and its results will be flowing to IBM products.

Here is the abstract of this paper:

The quantity of events that a single application needs to process is constantly increasing, RFID related events have been doubled within the past year and reached 4 trillion events per day,
financial applications in large banks are processing 400 million events per day, and Massively Multiplayer Online (MMO) games are monitoring in peaks 1 million events per second. It is evident that scalability in event throughput is a major requirement for these types of pplications. While the first generation of event processing systems has been centralized, we see various solutions that attempt to use both scale-up and scale-out techniques. Alas, partitioning of the processing manually is difficult due to the semantic dependencies among various event rocessing agents. It is also difficult to tune up the partition dynamically in a manual way. Manual partitioning is typically vertical, i.e. there is a single partition set with centralized routing. This paper proposes a horizontal partition that is automatically created by analyzing the semantic dependencies among agents using a stratification principle. Each stratum contains a collection of independent agents, and events are always routed to subsequent strata. We also implement a profiling-based technique for assigning agents to nodes in each stratum with the goal of aximizing throughput. A complementary step is to distribute the load among the different execution nodes dynamically based on performance characteristics of nodes and agents and the event traffic model. Experimental results show significant improvement in the ability to process high hroughput of events relative to both centralized solutions as well as vertical partitions. We find this to be a promising approach to achieve high scalability without requiring difficult manual tuning, especially when the traffic model and the topology of the event processing network is often changed.

More about event processing distribution and parallelization will be discussed in subsequent postings.

DEBS has also issued recently a call for fast abstracts, posters and demos, an opportunity to share with the community work that is in less mature phase. show interesting demos, and discuss ideas.

Saturday, January 17, 2009

On Distribution and parallelism in Event Processing

This picture, taken from the site of Nature Reviews as part of an article about "parallel processing in mammalian retina", illustrates that structures like the human body distribute the functions it needs to perform, and performs many of them in parallel and by specialized systems.

Getting to event processing, the producers and consumers of event processing can be distributed, as events can come to many sources, and situations may be consumed by many sinks. The first generation of event processing was mostly centralized in processing, the centralization has been twofold: functional centralization using an monolithic engine that performs all processing fucntions, and location centralization, this engine runs on a single server.

Today I'll concentrate on the second aspect of centralization, there are various reasons to decentralize the processing, one is to do some of the activities closer to the producers or consumers, example: if a producer produces events, where only 1% is relevant to the defined event processing, and it can be done by independent filtering that does not depend on other events, then it will be more efficient that the filtering will take place at or close to the consumer site, and thus eliminate the unnecessary network traffic.

Another reason to distribute the functionality is the scalability aspect, which is really an old idea to "divide and conquer" problems. The challenge is how to do a "good" partition. First there is a need to define what a "good" partition is, i.e. looking at it as an optimization problem, what is the goal function, then solving it is a function of the topology, semantics and behavior of a particular application which can be dynamic.

IBM has recently released the first version of WBEXS (Websphere Business Event Exterme Scale) and made a statement of direction for another product: Infostreme Streams
both are aimed to handle scalability by distribution in different environments. While details about IBM products you can obtain from the appropriate people in IBM, we in the IBM Haifa Research lab are working on related topics, we have exposed initial results in DEBS 2008, in the fast abstract session introducing the statification approach.The project has substantially advanced since that time, and I'll discuss it further in future Blogs (well - I need to go over the Blog and list all the topics I promised to discuss later and have not done so yet...).

More - later

Event Processing Thinking

Tuesday, July 7, 2009

Live from DEBS 2009 II - our paper presentation

Saturday, May 2, 2009

Saturday, January 17, 2009

On Distribution and parallelism in Event Processing

Popular Posts

Contributors