
Thursday, April 18, 2013

Progress Apama announces a version which compiles to native machine code

Progress Software announced today the release of a new version that compiles Apama EPL into native machine code, claiming a 2000% performance improvement over the previous version.   They don't mention what they actually measured.   The big data era is renewing investment in scalable event processing solutions, with various kinds of optimization.   We may even start to see specialized event processing hardware.
I think it would be useful to establish a set of benchmarks, since some studies have shown huge differences in performance between types of event processing applications - for example: those doing mainly filtering, those doing mainly aggregation, and those doing pattern matching.  It would be good to have a set of benchmarks that fit different types of applications, and a method to map application characteristics to a specific benchmark - to avoid the phenomenon of vendors citing numbers that cannot be compared.  More later.
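To make the distinction between these workload types concrete, here is a minimal Python sketch of the three classes; the function names, the windowing choice, and the event shapes are my own illustrative assumptions, not taken from any vendor's benchmark.

```python
# Illustrative micro-kernels for three event processing workload classes.
# These are sketches of the workload shapes, not an actual benchmark suite.
from collections import defaultdict, deque

def filter_events(events, predicate):
    """Filtering workload: pass through only events matching a predicate."""
    return [e for e in events if predicate(e)]

def aggregate_events(events, key, value, window=100):
    """Aggregation workload: rolling sum of a value over a window, per key."""
    sums = defaultdict(float)
    windows = defaultdict(lambda: deque(maxlen=window))
    for e in events:
        windows[e[key]].append(e[value])
        sums[e[key]] = sum(windows[e[key]])
    return sums

def match_pattern(events, first, then):
    """Pattern-matching workload: detect 'first followed by then' sequences."""
    matches, pending = [], None
    for e in events:
        if first(e):
            pending = e
        elif pending is not None and then(e):
            matches.append((pending, e))
            pending = None
    return matches

# Tiny demonstration on synthetic events.
events = [{"sym": "A", "px": float(i % 7)} for i in range(1000)]
print(len(filter_events(events, lambda e: e["px"] > 3)))
print(dict(aggregate_events(events, "sym", "px"))["A"])
print(len(match_pattern(events, lambda e: e["px"] == 6, lambda e: e["px"] == 0)))
```

Even on the same event stream, these three kernels stress an engine very differently, which is why a single headline number cannot cover all of them.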

Friday, December 26, 2008

Footnotes to Philip Howard's "Untangling Events"

My employer, IBM, does not allow transferring vacation days across years; thus, even though I do not celebrate any major holiday this week, I decided that this is a good time to spend the rest of my vacation days for the year and take two weeks off (one of them is already behind me) - spending some time with my children, taking care of some neglected health issues, and also reading books (it is rainy and cold, not a time to wander around much...). I looked today a little bit on the Web to see if I had missed something, and found on David Luckham's site a reference to Philip Howard from Bloor, who writes about untangling events. I understood that Philip is trying to look at various event-related marketing terms and determine whether they are synonyms, and whether there is a distinct market for each. In doing that he tries to list the various functions performed by event processing applications and then reaches the (unsurprising) conclusion that each application does some subset of this functionality, but at the end he admits that he did not get very far and left the question unanswered, promising to dive deeper into it.

In essence his conclusion is right -- the various functions form a continuum, of which a specific application may need all or a subset. Typically there is a progression - starting from getting events and disseminating them (pub/sub with some filtering), then advancing to do the same with transformation, aggregation, enrichment etc. -- so the dissemination relates to derived events and not just to the raw events - and then advancing to pattern detection to determine which cases need reactions ('situations') and which events should participate in the derived events (yes - I still owe one more posting to formally define derived events).
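As a small illustration of this progression, here is a hypothetical Python pipeline; the event shapes, the reference data, the threshold, and the 'large-move' situation are all invented for the example, not a definition of how any product works.

```python
# Hypothetical pipeline: raw events -> filtering -> enrichment (derived
# events) -> pattern detection producing 'situations'.

RAW = [
    {"type": "trade", "symbol": "ABC", "price": 101.0},
    {"type": "heartbeat"},
    {"type": "trade", "symbol": "ABC", "price": 95.0},
]

REFERENCE = {"ABC": 100.0}  # enrichment source, e.g. yesterday's close

def filtered(events):
    # Stage 1: pub/sub-style filtering of raw events.
    return (e for e in events if e["type"] == "trade")

def derived(events):
    # Stage 2: enrichment produces derived events, not just raw ones.
    for e in events:
        yield {**e, "move": e["price"] - REFERENCE[e["symbol"]]}

def situations(events, threshold=4.0):
    # Stage 3: pattern detection decides which cases need a reaction.
    for e in events:
        if abs(e["move"]) > threshold:
            yield {"situation": "large-move", "source": e}

for s in situations(derived(filtered(RAW))):
    print(s)
```

An application early in the progression stops after stage 1; a more advanced one runs all three stages.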

One can also move above all of these and deal with uncertain events, mine event patterns, or apply decision techniques for routing.

I think that there are multiple dimensions of classification of applications:
  • Based on functionality, as noted above.
  • Based on non-functional requirements -- QoS, scalability in state, event throughput, etc.
  • Based on type of developers -- programmers vs. business developers.
  • Based on goal of the application -- e.g. diagnostics, observation, real-time action...

There may be more classifications -- the question is whether we can determine distinct market segments? Probably yes -- with some overlaps. This requires empirical study, and indeed this is one of the targets of the EPTS use-cases working group, which is chartered to analyze many different use cases and try to classify them. Conceptually, for each type there should be a distinct benchmark that captures its important characteristics.

Still - I think that all the vendors going after "event processing" in the large sense will strive to support all the functionality. As an analogy: not all programs require the rich set of built-in functions that exist in programming languages, but languages are typically not offered for subsets of the functionality. Likewise, looking at DBMS products, most vendors support the general case. Note that there is some tension between supporting the general case and supporting a specific function in the most efficient way, but I'll leave this topic for when I am blogging at an earlier hour of the day -- happy holidays.

Thursday, November 15, 2007

The MARK on the BENCH - and the mythical event per second

A recent news item from BEA talks about a benchmark and cites some EPS (events per second) figures. Unlike some vendors that just cite numbers, there is also a white paper describing the benchmark. I don't wish to refer to the BEA benchmark specifically, but to share some insights about benchmarks in general. Benchmarks have a positive side, in that they enable either comparing different products based on the same criteria, or evaluating some properties of a product, even without comparing it to others. Currently there is no "standard" benchmark in the event processing area; thus, vendors invent their own benchmarks, carefully designed to expose many of the strengths, and none of the weaknesses, of their products, and these benchmarks may not be reproducible in other environments, or with some change in the application. Thus, to make any meaningful comparison between different products, standard benchmarks need to be constructed. Standard benchmarks, by themselves, may be a double-edged sword: since ours is a benchmark-driven industry, vendors will invest a lot of resources in optimizing for the standard benchmark, yet this may not help a specific application, whose requirements may be far from the benchmark. Event processing is a heterogeneous area, which means that a single benchmark will not be sufficient - we need a collection of benchmarks, and each customer will have to choose the one or more benchmarks that are closest to its requirements. The standard benchmarks should come from a vendor-neutral organization. I know of some academic work in this area, but more needs to be done.
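To show how such an EPS figure might be produced, here is a minimal hypothetical measurement harness in Python; the workload, event count, and percentile choice are my own assumptions, and a real benchmark would be far more careful about warm-up, clock resolution, and measurement overhead.

```python
# A minimal, hypothetical harness for measuring throughput (events/second)
# and per-event latency of an event processing step. The resulting number
# depends entirely on the workload fed to it.
import time

def run_benchmark(process, events):
    latencies = []
    start = time.perf_counter()
    for e in events:
        t0 = time.perf_counter()
        process(e)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    eps = len(events) / elapsed          # throughput in events per second
    latencies.sort()
    p99 = latencies[int(0.99 * len(latencies))]  # 99th-percentile latency
    return eps, p99

events = list(range(1_000_000))
cheap_filter = lambda e: e > 500_000     # a filtering-style workload
eps, p99 = run_benchmark(cheap_filter, events)
print(f"{eps:,.0f} events/sec, p99 latency {p99 * 1e6:.1f} microseconds")
```

Swap the trivial filter for an aggregation or pattern-matching step and the reported EPS changes dramatically, which is exactly why a single vendor-chosen number is hard to compare.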
And a word of caution - all these benchmarks refer to performance characteristics such as latency and throughput. But as noted in a previous post on the mythical event per second, I doubt that these are the main decision criteria in most applications - thus benchmarks should also refer to other dimensions (functions, consumability, other non-functional requirements). While there are certainly cases in which high performance characteristics are critical, in general I think this is a bit over-hyped. More later.