Event Processing Thinking

Sunday, September 5, 2010

Going east

Packing to go abroad again, this time to the east. The destination is Singapore, where I am participating in VLDB 2010 and giving a tutorial on event processing: past, present and future.
But first, I am flying to Hong Kong later today, my first stop in the east.

Tuesday, August 31, 2010

Some thoughts on data mining and event processing - take one

Somebody I talked to last week said, with some irony, that to get attention today, no matter what you do, you have to say that it has something to do with clouds and that it performs some kind of analytics. Well, it is a clear bright day here without a single cloud, so I'll postpone the discussion about clouds to a rainy day. As for analytics, there are various types of analytics; today I'll write about data mining and event processing. There are two sides here: what event processing can do for data mining, and what data mining can do for event processing. Let's focus now on the second issue. The answer seems easy: an event processing application is modeled by an event processing network, which consists of event processing agents; in most current implementations these agents are realized as rules, queries or some other constructs. Today the application is composed manually using some authoring tool. However, a frequently asked question is whether the computer can somehow, by magic, compose the application itself. This is a natural candidate problem for data mining. The achievements of data mining in event processing so far are somewhat modest, but there still might be promise there.
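To make "a network of agents implemented by rules" concrete, here is a minimal sketch, assuming a toy model in which each agent is a rule that maps an input event to an optional derived event and forwards it to downstream agents. The names `Event`, `Agent`, `large_order_filter` and `tag_large_order`, and the amount of 100, are all hypothetical illustrations, not the API of any event processing product. The point is that the wiring at the bottom is exactly the part that is composed by hand today, and the part one would hope data mining could produce.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Event:
    type: str
    payload: dict


@dataclass
class Agent:
    """Hypothetical event processing agent: a rule that maps an input event
    to a derived event (or None when the rule does not fire)."""
    name: str
    rule: Callable[[Event], Optional[Event]]
    targets: List["Agent"] = field(default_factory=list)

    def process(self, event: Event, sink: List[Event]) -> None:
        derived = self.rule(event)
        if derived is None:
            return  # rule did not fire; nothing flows downstream
        if not self.targets:
            sink.append(derived)  # no downstream agents: deliver to the consumer
        for target in self.targets:
            target.process(derived, sink)


def large_order_filter(e: Event) -> Optional[Event]:
    # Filter agent: pass through only orders above an illustrative amount of 100
    return e if e.type == "order" and e.payload["amount"] > 100 else None


def tag_large_order(e: Event) -> Optional[Event]:
    # Transformation agent: derive a new event type from the filtered event
    return Event("large_order", dict(e.payload))


# Hand-composed network (filter -> tagger): the kind of wiring done manually today
tagger = Agent("tag", tag_large_order)
order_filter = Agent("filter", large_order_filter, [tagger])

out: List[Event] = []
for ev in [Event("order", {"amount": 50}), Event("order", {"amount": 250})]:
    order_filter.process(ev, out)
```

Only the 250 order reaches `out`, now as a `large_order` event.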

So, let's go further and explore the potential. We can look at three types of functions in which data mining may assist an event processing application:

Case in which we are looking for anomalies in general: data mining can assist in identifying that an anomaly is occurring now. This is actually a different type of application, in which there are no preset patterns.
Case of trend/threshold-oriented detection, where the thresholds can be adjusted by data mining.
Case of pattern detection, where the patterns themselves can be determined by data mining.
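The first two cases can be illustrated with a short sketch. It assumes two deliberately simple techniques chosen for illustration, not methods proposed in this post: "normal behavior" is learned as the mean and standard deviation of past values, with anything more than k standard deviations away flagged as an anomaly; and the threshold of a rule "alert when value >= t" is learned from labeled history by picking the t that misclassifies the fewest past samples. All function names and the default k = 3 are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


# Case 1: learn what "normal" looks like, then flag values that deviate from it.
def learn_normal(history: Sequence[float]) -> Tuple[float, float]:
    """Crude model of normal behavior: mean and (population) standard deviation."""
    return statistics.mean(history), statistics.pstdev(history)


def is_anomaly(value: float, model: Tuple[float, float], k: float = 3.0) -> bool:
    """Flag values more than k standard deviations away from the learned mean."""
    mean, std = model
    if std == 0:
        return value != mean
    return abs(value - mean) / std > k


# Case 2: learn the threshold of a threshold-oriented rule from labeled history.
def learn_threshold(samples: Sequence[Tuple[float, bool]]) -> float:
    """samples: (value, reaction_was_needed). Returns the cut-off t for the
    rule 'alert when value >= t' that misclassifies the fewest samples;
    ties go to the lowest candidate, and +inf means 'never alert'."""
    candidates = sorted({v for v, _ in samples}) + [float("inf")]

    def errors(t: float) -> int:
        return sum((v >= t) != needed for v, needed in samples)

    return min(candidates, key=errors)
```

Note that neither function knows anything about *why* a value matters; the labels in case 2 carry that semantic knowledge, which is exactly what is missing when we move to pattern learning.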

The first type is a classification problem, and it can be addressed by learning what normal behavior looks like.
For the second type, learning thresholds, data mining also offers some known methods.
The main problem is the third one: learning patterns. There are several difficulties there. The first is intent: data mining typically discovers events that happen together, which by itself may not be of interest, since the aim of patterns is to detect situations that require reaction. There is additional semantic knowledge here that data mining does not capture unless additional information is provided. Furthermore, the pattern may occur so rarely that it is not captured in the existing data at all. Another difficulty is the richness of pattern types and their many variations, which makes the space of possibilities very large.
Successes in this area were typically limited to a certain type of pattern within a certain temporal window. For example, there was some work I am familiar with that mined sequences of two events within a given temporal window; this again belongs to the category of events that happen together, where a human has to go over all the combinations and decide whether they create an interesting situation.
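The kind of limited success described above, mining sequences of two events within a temporal window, can be sketched as follows. This is a simple counting approach under assumed inputs (a list of `(timestamp, event_type)` tuples), not a reconstruction of the specific work mentioned; the function name, the sample log and the `min_support` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def mine_pairs(events: List[Tuple[float, str]], window: float,
               min_support: int) -> Dict[Tuple[str, str], int]:
    """Count ordered pairs (a, b) where an event of type b occurs at or after an
    event of type a, no more than `window` time units later. Returns only the
    pairs seen at least `min_support` times."""
    counts: Counter = Counter()
    ordered = sorted(events)  # sort by timestamp
    for i, (t_a, a) in enumerate(ordered):
        for t_b, b in ordered[i + 1:]:
            if t_b - t_a > window:
                break  # later events are even further away
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


log = [(0, "login_fail"), (1, "login_fail"), (2, "account_lock"),
       (10, "login_fail"), (11, "account_lock"), (30, "deposit")]
print(mine_pairs(log, window=2, min_support=2))
# {('login_fail', 'account_lock'): 3}
```

The output only says that these two event types tend to happen together; deciding whether "account lock shortly after a failed login" is a situation that requires reaction is still left to a human.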

Bottom line: no magic bullet, but any breakthrough in this area will be helpful.

Sunday, August 29, 2010

Congratulations to Richard Tibbetts for being named a TR35 2010 young innovator

Congratulations to one of the notable people in the event processing community, Richard Tibbetts, StreamBase's CTO, for being named a TR35 2010 young innovator by "Technology Review", a media company owned by MIT.

This award is granted to innovators under the age of 35 whose inventions and research are, in the editors' judgment, the most exciting.

There are also some other awards with age limits, most notably some of the prestigious mathematics awards; well, the world belongs to young people these days.

Anyway, Richard is certainly both young and a notable innovator, and this is another indication of the interest in event processing in various forums. The award is well deserved.

Wednesday, August 25, 2010

First glance on DEBS 2011

We are now in the first stages of DEBS 2011; I have taken upon myself the role of general chair (some people never learn...). DEBS 2011 will be hosted by IBM Research in Yorktown Heights, NY.
The team has been established, and you can view it on the conference website, built by the conference Web chair, Darko Anicic. We have also started working with ACM on the logistics; hopefully the CFP will be released within a couple of weeks.

DEBS 2011 will continue the tradition of DEBS conferences (the 5th as a conference and the 10th if we include the workshops), and will also include several new components that will be announced soon. At the headline level: we plan a "DEBS challenge" demo session and a "new ideas gong show" session; more details to follow.
DEBS 2011 will also have a strong collection of keynote speakers. The four keynote speakers (all already confirmed) are:

Chris Bird - Chief Architect of Sabre airlines, who will give a talk from the point of view of an end user

Dr. Don Ferguson - CTO of CA, who will give a talk from the point of view of technology providers

Professor Johannes Gehrke - from Cornell University, and

Professor Calton Pu - from Georgia Tech
both of whom will give talks from the research point of view.

More details about DEBS 2011 -- later.

Tuesday, August 24, 2010

EPIA on JavaRanch

The "Big Moose Saloon" is the JavaRanch forum. I am new to this site (which ranks me as a "greenhorn"); it hosts a Q&A forum about the EPIA book, open from today until Friday, the 27th of August, with a chance to win a free copy of the book. For more details: http://www.coderanch.com/t/507730/java/java/Welcome-Opher-Etzion-Peter-Niblett

More on event processing as a business and as a technology, and how is this related to CEP 2.0?

It seems that the question of whether event processing is a stand-alone technology or a technology embedded in other areas, which I wrote about recently following Philip Howard's Blog, is spreading across the community Blogland. Louie Lovas, in his relatively new role at OneMarketData, takes advantage of this discussion to highlight his company's solution for integrating EP with tick databases to yield a certain type of application. Rainer von Ammon, in response to my Blog, gives his opinion that organizations typically purchase industry-oriented solutions and not technologies (which is true in many cases, but not all). Paul Vincent asks in his Blog whether "CEP is just a supporting act?", concluding from TIBCO's experience that the answer is negative. Marco Sierio asks where CEP is heading and preaches for CEP 2.0, which will be based on abstractions and will not have rules or SQL as a basis. Some other comments on the Blogs ask about metrics to determine when EP technology should be used and when other technologies should be used instead.

So, what can we learn from all this? I am not sure we can learn anything new that has not already been discussed; such discussions actually recur in cycles from time to time. When we complete the document we started writing at the Dagstuhl seminar (it will probably take 2 months or so), we'll articulate some of this.

There are three basic issues: Does event processing have vitality as a technology? Does it define a stand-alone market? And how do we go about the 2.0 generation?

If we look at the considerations for using event processing as a technology, as an alternative to hacking the functionality into other technologies or hard-coding it ad hoc for each application, one of the considerations mentioned is the ability to deal with a high throughput of events, which is not trivial to achieve with hard coding or regular technologies. However, experience seems to have led us to realize that the more noticeable benefit is the TCO (total cost of ownership).

Dan Galorath produced a chart of software TCO; there is also evidence that software development of event-driven applications achieved substantial reductions (sometimes in a ratio of 1:4) relative to conventional solutions. At the last DEBS conference somebody remarked that the Fast Flower Delivery use case, which serves as a pivot example in the EPIA book, is a BPM example, since the event processing network looked to this person like a workflow (it has totally different semantics!). So my challenge is: go ahead and implement it with a BPM system, and then we'll compare the development time.

As Galorath's chart shows, software maintenance contributes much more to the TCO than software development, and here the use of higher-level abstractions, which make change easier, widens the difference. Rainer quotes somebody who says that there are charlatans in this area. True, there are charlatans in every area: when relational databases emerged, all of a sudden implementations of relational databases were offered by people who did not understand what a relational database is (not just flat files!) and did not understand that they did not understand what a relational database is. Nothing is new. The investment in developing the various solutions is something that can be measured.

As for the other question, whether event processing is a stand-alone business, I have already referred to it in the previous post. The answer is: yes for certain types of customers and applications, and it is also an embedded technology, both in other technologies and in applications. My guess is that the embedded mode represents the larger portion of the market, and that it will keep growing.

As for CEP 2.0, here I agree with Marco that the next generation should not be incremental. In the EPIA book we introduced some abstractions that are independent of the implementations of the first generation, and we are now exploring them as a basis for the second generation. I guess this is also a challenge for the research community. More later.

Sunday, August 22, 2010

On event-driven vs. business-intelligence-driven viewpoints

The last few days in Israel were extremely hot; one local newspaper claimed that Friday was the hottest day in Israel in 112 years. Now it is somewhat less hot, but still very hot. Relief is expected later this week.

I am now playing with the new Blog editor, which looks like Wiki editors; it seems that web editors are starting to converge into a common form.

Anyway, recently I have read some "business intelligence" material ("analytics" is now a hot buzzword in IBM, and probably outside IBM as well). In business analytics terminology people talk about three phases: descriptive, predictive and prescriptive, while in event processing we also talk about three phases: responsive, reactive and proactive. So I was asked whether those terms are equivalent. The answer: not exactly.

Let's start with business intelligence, or analytics in general. The main starting point is that we have historical data: we can present it in different ways, learn from it something that yields observations, predict future data (e.g. by trends), and then propose actions to bridge gaps towards our goals.

The basic starting point is analyzing existing past data. The first phase is descriptive: it describes what is seen in the data. This is the most common use of business intelligence.

The second phase is predictive: finding trends and extrapolating into the future, predicting future values of the same data.

The third phase is prescriptive: given the predicted data and possible gaps between it and the enterprise's goals, propose a way to bridge the gap, e.g. change inventory policies, change risk policies, or even change business processes.
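The three business-analytics phases can be shown side by side in a toy sketch. It assumes a short series of hypothetical quarterly sales figures, uses a least-squares straight line as the "trend" (one simple choice among many forecasting methods), and reduces the prescriptive step to comparing the forecast with a goal and suggesting a policy change. Note that everything here runs over stored past data, in batch: nothing reacts to an event as it happens.

```python
from statistics import mean
from typing import Dict, Sequence


def describe(history: Sequence[float]) -> Dict[str, float]:
    """Descriptive: summarize what is seen in the past data."""
    return {"mean": mean(history), "min": min(history), "max": max(history)}


def predict(history: Sequence[float], steps_ahead: int) -> float:
    """Predictive: fit a least-squares line y = a + b*x over x = 0..n-1
    (needs at least two points) and extrapolate steps_ahead periods forward."""
    n = len(history)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(history)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a + b * (n - 1 + steps_ahead)


def prescribe(predicted: float, goal: float) -> str:
    """Prescriptive: compare the prediction with the goal and propose an action."""
    gap = goal - predicted
    if gap <= 0:
        return "on track: keep current policy"
    return f"gap of {gap:.1f}: revise policy (e.g. inventory or risk policy)"


sales = [100, 110, 120, 130]          # hypothetical quarterly sales
print(describe(sales))                # {'mean': 115, 'min': 100, 'max': 130}
forecast = predict(sales, steps_ahead=2)
print(forecast)                       # 150.0
print(prescribe(forecast, goal=160))  # gap of 10.0: revise policy (e.g. inventory or risk policy)
```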

Event processing starts from a different viewpoint: there are events happening now, and we would like to react to them. The metaphor is a dangerous bear approaching, to which I need to react.

In event processing the evolution starts with "responsive". In this case the event is indeed treated as data, and information about events is retrieved using queries, search, or even any kind of analytics. This is the regular mode of programming, but it is data-driven rather than event-driven. It may be applicable to some applications, but it will not be very helpful when the bear is chasing you. Event-driven architectures and programming have enabled the next phase in the evolution, reactive programming, in which predefined alerts or actions are triggered by the fact that an event has been detected or that an event pattern has been identified. The current state of the practice in what is defined as event processing applications fits this category. The next step in the evolution is proactive, which means that by computerized means we will be able to identify predicted events and then decide how to mitigate or eliminate them. For example, when a bear is chasing me, I need to decide quickly whether my best bet is to hide, escape, or shoot tranquilizing darts at the bear.

The decision is made on-line and has timing constraints (depending on how close the bear is).
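The difference between reactive and proactive can be sketched with the bear example. This is a hypothetical illustration, not an implementation from the EPIA book: the reactive function fires a predefined alert when the detected event matches a pattern (here, an assumed alert distance of 100 m), while the proactive function predicts when the bear will arrive and chooses the most preferred mitigation that can still be completed in time. The option list, its preference order and the durations are made-up numbers standing in for whatever a real decision model would supply.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BearSighting:
    distance_m: float
    speed_m_per_s: float  # closing speed toward me


def reactive(s: BearSighting, alert_distance_m: float = 100.0) -> Optional[str]:
    """Reactive: a predefined alert fires when the detected event matches the pattern."""
    return "ALERT: bear nearby" if s.distance_m <= alert_distance_m else None


# Hypothetical mitigation options and the seconds each needs, in order of preference.
OPTIONS: List[Tuple[str, float]] = [
    ("escape", 30.0),
    ("shoot tranquilizing darts", 10.0),
    ("hide", 5.0),
]


def proactive(s: BearSighting) -> str:
    """Proactive: predict when the bear will arrive, then choose the most preferred
    action that can be completed before that deadline (the timing constraint)."""
    if s.speed_m_per_s <= 0:
        return "no action needed"  # bear is not approaching
    time_to_arrival = s.distance_m / s.speed_m_per_s
    for action, seconds_needed in OPTIONS:
        if seconds_needed <= time_to_arrival:
            return action
    return "no feasible action"
```

For a bear 300 m away closing at 5 m/s, `reactive` does nothing yet, while `proactive` already picks "escape" (60 s until arrival); at 60 m only the darts (10 s) still fit the 12 s left.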

Having explained the basic terminology, back to the original question: how are these terms related?
First, the goals of business intelligence and event processing are typically distinct; however, there are some points of overlap. From the BI perspective, reactive event patterns can be used as a component of predictive analytics, and proactive event-driven processing can be thought of as a type of prescriptive analytics. The overlap occurs when the analytics system has a real-time component, which requires the prescriptive analytics to be done on-line and under timing constraints; this turns it from data-driven into event-driven. But one can also think of a prescriptive system that is totally off-line: analyzing data in batch, predicting shifts in trends, and changing the policies for the next year or quarter.

From the event processing perspective, analytics tools can be used to populate the event patterns, but this is not that easy; I'll write soon about some thoughts on the feasibility of learning patterns.