
Friday, July 29, 2011

The 10 rules of scalability

Yesterday I got an email from LinkedIn with 77 pictures showing those of my contacts who have moved since the beginning of 2011. While some of them just changed titles within their organizations, most of them indeed moved to other organizations, some from senior positions in big companies to start-ups. Is it a trend, and is 2011 a year in which many people are moving?

ACM keeps sending me a hard copy of the Communications of the ACM, in addition to sending an email whenever the electronic copy is available. Yesterday I browsed through the June 2011 issue (it takes more than a month for the hard copy to be delivered), and found a paper entitled "10 rules for scalable performance in 'simple operation' datastores" by Michael Stonebraker and Rick Cattell. The 10 rules are summarized in the illustration below.

[Illustration: the paper's 10 rules, summarized]
The rules are a mix of various types of advice: from "use high availability and automatic recovery" and "don't try to build ACID consistency yourself", through "don't be afraid to use high-level languages", to "use open source". The domain of this paper is, as declared in the title, "simple operation" data stores; the question is what we can learn about scalability in event processing, which is somewhat different: it is focused neither on data stores nor on simple operations. Also, scalability in event processing has several dimensions, not just scalability in the number of events; in fact, in the DEBS 2011 tutorial we mapped all the scalability dimensions in event processing.

I guess that a shared-nothing architecture is always a good practice, and the use of high-level languages and the utilization of main memory are also good practices. Recovery is a matter of application requirements: for some applications recoverability is vital, for others it is not really necessary. As for the use of open source, it again depends on the context. In summary: some of the rules are well-known best practices, some are subjective, and some are context dependent.
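
To make the shared-nothing point concrete in an event processing setting, here is a minimal sketch of my own (not taken from the paper): events are hash-partitioned across workers, each of which keeps private state and shares nothing with its peers:

```python
import hashlib
from collections import defaultdict

# Minimal shared-nothing sketch: each worker owns a disjoint key range
# and keeps its own state; no locks or shared memory between workers.

NUM_WORKERS = 4

def partition(key: str) -> int:
    """Stable hash-partitioning of an event key to a worker."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_WORKERS

class Worker:
    def __init__(self, worker_id: int):
        self.worker_id = worker_id
        self.counts = defaultdict(int)   # private state, never shared

    def process(self, event: dict) -> None:
        self.counts[event["key"]] += 1   # e.g., a per-key aggregation

workers = [Worker(i) for i in range(NUM_WORKERS)]

for event in [{"key": "sensor-1"}, {"key": "sensor-2"}, {"key": "sensor-1"}]:
    workers[partition(event["key"])].process(event)

print([dict(w.counts) for w in workers])
```

Because no worker ever touches another worker's state, adding nodes adds capacity without coordination overhead, which is exactly the property the paper's first rule is after.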

Tuesday, April 13, 2010

On the virtualization of event processing functions

There is some discussion about scale-up and scale-out as measures of system scalability, as indicated in a recent blog post by Brenda Michelson; I would like to refer to the programming model aspects of it. Parallel computing is becoming more and more a means of scalability, due to hardware developments and to barriers to the scalability of a single processor that stem from energy consumption issues. In event processing, both parallel and distributed computing will play an important role, as we see large, geographically distributed event processing networks.
The main issue in terms of the programming model is that manually programming a combination of parallel and distributed programming is very difficult, since many considerations come into play. The solution relies on the notion of virtualization. Event processing applications should be programmed at a conceptual level, providing both the application logic and flow, and also policies that define non-functional requirements, since different applications may have different important metrics. Then, given a certain distributed configuration, which may also consist of multi-core machines, the conceptual model should be compiled directly into an efficient implementation based on the objectives set by the policies. This is not easy, but it has already been done in limited domains. The challenge is to make it work for multiple platforms. This is part of the grand challenge of "event processing anywhere" that I'll describe at more length in subsequent posts. Achieving both scale-up and scale-out in event processing requires intelligence in the automatic creation of the implementation, and the ability to fully virtualize all functional and non-functional requirements. A toy sketch of what such a conceptual definition might look like appears below. More - later.
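
Here is a hypothetical sketch (the names, policies, and "compiler" are my own invention, not any real platform's API): the event processing network is declared at a conceptual level together with non-functional policies, and a compilation step chooses a configuration accordingly:

```python
from dataclasses import dataclass, field

# Hypothetical conceptual-level definition of an event processing network:
# the logic (agents and their wiring) plus non-functional policies.

@dataclass
class Policy:
    max_latency_ms: int        # non-functional objective
    recoverable: bool          # whether state must survive failures

@dataclass
class EPN:
    agents: list = field(default_factory=list)   # agent names, wired in order
    policy: Policy = None

def compile_epn(epn: EPN, cores: int) -> dict:
    """Naive 'compiler': map the conceptual model onto a configuration.
    A real platform would weigh many more considerations."""
    if epn.policy.max_latency_ms < 10:
        # Tight latency objective: spread agents across cores in parallel.
        placement = {a: f"core-{i % cores}" for i, a in enumerate(epn.agents)}
    else:
        # Relaxed objective: co-locate agents and save resources.
        placement = {a: "core-0" for a in epn.agents}
    return {"placement": placement, "checkpointing": epn.policy.recoverable}

epn = EPN(agents=["filter", "enrich", "detect"],
          policy=Policy(max_latency_ms=5, recoverable=True))
print(compile_epn(epn, cores=4))
```

The point is that the application developer states objectives once, and the same conceptual model can be recompiled for a different configuration without touching the logic.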

Thursday, March 25, 2010

More on event processing agents


There are various types of agents in reality, and likewise there are various types of agents in computing, as I discussed a long time ago in this blog, but since that time the thinking has evolved somewhat. Recently, Jim Odell, a long-time agents advocate, was hosted on the TIBCO CEP blog, where he advocated the use of agent technology in event processing, citing scalability as the main motivation. In the Event Processing in Action book we make the EPA (Event Processing Agent), a term coined by David Luckham, the most notable building block of our model. The term agent is used here in the sense of a software agent, not necessarily an agent in the AI sense. We use the term agent at the meta level, where at the run-time level there are agent instances that can be implemented in various ways. Event processing agents are event-driven in the sense that they take one or more events as input, perform some processing on these events, and derive one or more events.
An EPA can filter events, transform events, detect event patterns, or do any combination of the above. Event processing agents are typically (but not always) associated with contexts, and context-related operations determine when an EPA instance is opened or closed. Assume, for example, that the context is a temporal window of non-overlapping periods of 2 hours, and that an EPA is associated with this context; then every 2 hours an instance of this EPA terminates and another instance is initiated. In essence, an EPA interacts with events in various ways (a sketch in code follows the list):
  • An EPA receives events as input
  • An EPA processes events
  • An EPA may query historical events
  • An EPA derives events as output
  • Through context, an EPA instance may be initiated or terminated by events
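
As a rough illustration of these interactions (my own sketch, not the book's notation), the EPA instance below receives input events, matches a toy pattern, and derives output events, while the context logic terminates the instance and initiates a new one every 2 hours, per the example above:

```python
from datetime import datetime, timedelta

# Rough sketch of an EPA instance scoped to a non-overlapping temporal
# context: every 2 hours the current instance closes and a new one opens.

WINDOW = timedelta(hours=2)

class EPAInstance:
    def __init__(self, opened_at: datetime):
        self.opened_at = opened_at
        self.buffer = []                 # input events seen in this window

    def receive(self, event: dict) -> list:
        """Receive an input event; derive output events when a pattern matches."""
        self.buffer.append(event)
        derived = []
        # Toy pattern: two 'high-temp' events within the same window.
        highs = [e for e in self.buffer if e["type"] == "high-temp"]
        if len(highs) == 2:
            derived.append({"type": "overheating-alert", "evidence": highs})
        return derived

def route(event: dict, current: EPAInstance) -> EPAInstance:
    """Context logic: terminate the instance when its window has elapsed."""
    if event["time"] - current.opened_at >= WINDOW:
        current = EPAInstance(opened_at=event["time"])   # initiate a new instance
    for out in current.receive(event):
        print("derived:", out)
    return current

instance = EPAInstance(opened_at=datetime(2010, 3, 25, 8, 0))
for minutes, etype in [(10, "high-temp"), (30, "high-temp"), (150, "high-temp")]:
    instance = route({"time": datetime(2010, 3, 25, 8, 0) + timedelta(minutes=minutes),
                      "type": etype}, instance)
```

The third event arrives after the 2-hour window has elapsed, so it lands in a fresh instance with empty state and does not match against the earlier events.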

Each EPA instance is autonomous in the sense that it does not communicate with any other EPA instance, and thus can be implemented as a separate run-time artifact, which indeed can enable scalability. However, there can be various groupings of EPA instances into run-time artifacts, where the two extremes are: a run-time artifact for every EPA instance, and a single run-time artifact that contains all the EPA instances within the application.
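A minimal way to picture the two grouping extremes, assuming threads stand in for run-time artifacts (again my own sketch, not a prescribed implementation):

```python
from concurrent.futures import ThreadPoolExecutor

# Two grouping extremes for mapping autonomous EPA instances onto
# run-time artifacts (threads stand in for artifacts here).

def run_instance(name: str) -> str:
    return f"{name} done"            # placeholder for an EPA instance's work

instances = ["epa-1", "epa-2", "epa-3", "epa-4"]

# Extreme 1: one artifact per instance (maximal parallelism).
with ThreadPoolExecutor(max_workers=len(instances)) as pool:
    print(list(pool.map(run_instance, instances)))

# Extreme 2: a single artifact hosting all instances (sequential).
with ThreadPoolExecutor(max_workers=1) as pool:
    print(list(pool.map(run_instance, instances)))
```

Because the instances never communicate, any grouping in between is also valid, and the choice can be driven purely by performance objectives.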

The benefits of using EPAs are simplicity of the model, modularity, and, as said, flexibility of implementation that may support various scalability and performance objective functions. I'll write more about EPAs later.

Sunday, December 13, 2009

On event processing as a service



While working on the website of the EPIA book, we asked the language owners to provide a downloadable version of the product implementing their language. Some of the language owners asked whether they could instead offer their software as a service and let readers run it on their servers. My answer was positive, and we'll see a couple of such examples (one is already there, another is coming up).

The book's website is just a resource for readers who wish to study the languages, but it brought me to a thought about event processing as a service in general.

One of the reasons for doing so is to gain the benefits of cloud computing in terms of scalability. I recently came across some material about activeinsights, which seems to be a new Israeli company developing open-source "event stream processing" in the cloud (well, I have some terminology comments for them, but this is not the main point), and which advocates the use of event processing in the cloud to cope with scalability issues. Using a SaaS model for event processing can give rise to some interesting cost models, related either to the input (the number of input events processed) or to the output (the number of situations detected by the event processing service, with some cost per situation, or the number of aggregated/transformed derived events), which ties the cost directly to the benefit. One of the barriers to using event processing as a service is the lack of standards, especially for interoperability, which does not allow one to just connect and run, but requires substantial investment in writing adapters in a proprietary way. I assume that we'll see more of this when the cost/benefit model becomes clearer.
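
As a toy illustration of the two cost models mentioned above, a few lines of arithmetic (all prices and volumes are invented for the example):

```python
# Toy comparison of the two SaaS cost models mentioned above;
# all prices and volumes are invented for illustration.

input_events = 1_000_000        # events sent to the service per month
situations_detected = 250       # situations the service detected per month

price_per_input_event = 0.00002     # input-based pricing
price_per_situation = 0.50          # output-based pricing

input_based_cost = input_events * price_per_input_event
output_based_cost = situations_detected * price_per_situation

print(f"input-based:  ${input_based_cost:.2f}")    # $20.00
print(f"output-based: ${output_based_cost:.2f}")   # $125.00
```

The output-based model charges nothing when nothing interesting happens, which is what makes it attractive: the customer pays in proportion to the detected situations, i.e., to the benefit.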

There are more interactions between cloud computing and event processing, such as the use of event processing as part of the cloud infrastructure, but this deserves a separate discussion.