
Monday, January 25, 2010

On reasoning about events and states

Yesterday we watched the play "Amadeus" at our local theater. I saw the movie when it was first launched, and another version of this play in the nineties. It is still very impressive, and an opportunity to hear some of Mozart's good music.

Thanks to Paul Vincent's Blog, I clicked the link and got to Paul Haley's posting entitled "time of the next generation of knowledge automation". Paul Haley starts his posting by classifying four types of reasoning:

  1. Reasoning at a point within a [business] process
  2. Reasoning about events that occur over time.
  3. Reasoning about a [business] process (as in deciding what comes next)
  4. Reasoning about and across different states (as in planning)
The claim is that the first kind is supported by business rules (called EDM in the original), and the second kind is supported by event processing, while the third and fourth kinds are not really supported by the state of the art. The third type requires getting the notion of context into reasoning, while the fourth type requires cross-state reasoning (which is different from cross-event reasoning). I agree with the classification, and will write more in the future about the third and fourth types in depth. Actually, we also need reasoning that combines the different types.

Monday, November 9, 2009

On the Stream Data Processing book by Chakravarthy and Jiang


Another related book that arrived yesterday is the book entitled: "Stream Data Processing: A Quality of Service Perspective - modeling, scheduling, load shedding and complex event processing".

First - let's start with a lesson in economics. Looking at the Amazon query about "event processing books", one can see that the Amazon price for the book by Chandy and Schulte that I described yesterday is $32.97, the new EDA book by Taylor et al costs $37.30 on Amazon, and the book I am talking about today has an Amazon price of $112.45 -- roughly the price of four books. So the economic question is: what makes it so expensive? My guess is that books of the type of the two just mentioned (and probably our upcoming book is in the same category) rely on the fact that people will want to buy them out of their own pocket, while academic books, especially those that are part of a Springer series (this one is part of the series "Advances in Database Systems"), have a captive audience of university libraries. I wonder how many people are willing to pay this price out of their own pocket for that book.

Now -- from the business side to the book itself. Sharma is an old colleague from my active database days. The book takes a database approach. It starts by explaining why data streams are a paradigm shift relative to traditional databases, then moves to explain the notion of data streams, gets into QoS metrics, moves on to data stream challenges, and introduces CEP as a complementary technology whose support within a data stream management system is posed as a challenge. This is followed by a literature review, including a survey of commercial and open-source stream and CEP systems, which seems to me to have false positives and false negatives. Then starts the more academically oriented discussion about modeling continuous queries, with theorems and Greek letters, followed by a discussion of engineering-oriented aspects of DSMSs such as scheduling and load shedding.

After discussing all this, the authors move to the integration of stream and complex event processing, starting with the differences and stating that it will be difficult to combine incompatible execution models. Nevertheless, the authors are not afraid of difficulties, and a page later describe an integrated, layered architecture: stream processing is done first, event generation as a result of it is the second layer, event processing is the third layer, and rule processing is the fourth. I think that strict hierarchical architectures are somewhat simplistic for realistic scenarios (I'll need to write something about that at a later point). The authors then dedicate two chapters to describing their prototypes, and the book concludes with conclusions and future directions, though these seem to be ideas for extending the issues already discussed.

Bottom line -- it seems like an academic journal paper that has been scaled up (324 pages, including a long list of references (not lexicographically sorted) and an index). It may be of interest to those who want to study the formal aspects of stream processing.

The package also included two books about causality models, but I need to read them first before making any comment on them.

Friday, September 25, 2009

More on what's next for EPTS

This is the logo of the event processing symposium; the building in the picture is opposite the conference site. I am back home now, after the event processing symposium and the co-located meeting of the EASSy consortium, which is submitting a proposal for the EU R&D program.

This is also a good time to summarize the output of the symposium on the question of what's next for EPTS. Earlier today this summary was sent to the EPTS members.

The main agreed goal of the "next steps" is accelerating the activities and impact of EPTS. In order to do that, we'll need more people to enter the circle of activities in EPTS; some participants in the symposium who have not been active so far expressed willingness to participate in activities, and we are, as always, calling on more people to join.

Here are the activities, classified according to the activity area.

1. Existing working groups:

1.1. Glossary
There has been a discussion that included a conference call.
It was agreed that all comments and proposals from other work groups about terms in the glossary should be posted on the members' wiki during the next month; after that time the glossary team will consider all comments and propose a revised version.

1.2 Use Cases
The use cases work group will publish its questionnaire and solicit use cases; we also need to think about how to motivate people to provide these use cases (see the proposal on awards).

1.3 Language Analysis
The language analysis work group was asked to take the language dimensions to the next level, so that languages can be classified based on criteria. It will also collect patterns and devise a library of patterns.

1.4 Reference Architecture
The reference architecture work group will continue its work to publish the reference architecture(s) of event processing.

1.5 Interoperability
This workgroup has been delayed. During the meeting it was re-established and will start activities.

There has been general agreement that the workgroups should accelerate their activities; it was proposed that deliverables be linked to external commitments (e.g. presentations/tutorials at external conferences) so that deadlines will be established.

2. New proposed workgroups

Four new workgroups have been proposed. Each has an assigned leader who will write a charter to be put to a vote by all members:

2.1. Awards workgroup.

Awards will increase visibility, provide motivation for people to report various things (e.g. use cases), and establish EPTS as a recognized authority. Types of possible awards:
Research innovation award -- given to a researcher; Application innovation award -- given to a customer.

2.2. ROI

It was proposed that EPTS publish a document about the ROI of event processing in general and in different domains and industries; this will be used as a source for educating customers in a manner that is independent of any specific product.

2.3 EPTS promotion

It is proposed to promote the EPTS brand and awareness of its activities using the website, webinars, white papers, press releases, participation in various conferences, and more. This workgroup will recommend and coordinate these promotion activities.

2.4. Collecting Datasets.

Researchers require datasets for various research activities. This workgroup will help collect these datasets and make EPTS a source of a dataset repository.

3. Other activities


3.1 Coordination among work groups.

The issue of coordination among workgroups has been raised; the steering committee will work with the work group leaders to determine the best way to act.

3.2. Grand challenges

There has been initial discussion on grand challenges. This discussion will continue on the members' Wiki. The Dagstuhl seminar in May 2010 will deal with the grand challenges issue.

3.3. Reach out to adjacent communities:

EPTS will continue to trace and participate in joint activities with the BPM, IT event management and robotics communities.

3.4. Association with DEBS

It was agreed to suggest to the DEBS steering committee that EPTS be included as "in cooperation with" on the DEBS website.


Saturday, September 12, 2009

On temporal aspects of event processing


In the past I was involved in work on temporal databases; in the picture you can see a 1998 book about temporal databases that I co-edited with Sury Sripada and Sushil Jajodia. There were some attempts to create a substantial extension of SQL with temporal capabilities and move temporal databases into the mainstream; this did not work, for several reasons. The event processing area now provides a second chance for these ideas to reach the mainstream, as event processing has strong relations to temporal issues. Bob Hagman from Aleri (formerly Coral8) has recently written a survey of implementation alternatives related to time aspects on the Aleri Blog. In the DEBS 2008 language analysis tutorial we dealt quite briefly with the topic of time. Earlier this year I wrote a chapter for the upcoming book "Handbook of Research on Advanced Distributed Event-Based Systems, Publish/Subscribe and Message Filtering Technologies", edited by Annika Hinze and Alejandro Buchmann.

This chapter is entitled: "Temporal Perspectives in Event Processing".
Here are the chapter's main topics:

  • Temporal dimensions: in temporal databases we dealt with the temporal semantics of a collection of snapshots (states); in event processing we deal with the temporal semantics of events (transitions). Are the temporal dimensions the same? Do they have the same semantics?
  • The "instantaneous" issue -- do events occur at a time point or over an interval, and if an interval, what does that mean from a computational point of view?
  • Time granularity -- in temporal databases we introduced the term "chronon", which stands for the time granularity that makes sense for a particular use. This idea is also applicable to event processing: for different events, different chronons make sense.
  • Temporal contexts: the term "time window" in stream processing is a kind of temporal context. What kinds of temporal contexts are required, and what are their computational implications? I'll write more about contexts soon, as this is the topic of chapter 7 of the EPIA book.
  • Temporal patterns: "complex event processing" is about finding patterns among collections of events; some (but not all) of these patterns are temporal in nature -- what are the temporally oriented patterns?
  • Temporal properties of derived events: an event processing system derives events as a result of its processing. What are the time properties of the derived events? This is a rather tricky question that deserves a discussion.
  • Ordering events: for some temporal patterns, knowing the order of events is important. What are the issues associated with keeping such an order, and how should out-of-order events be handled? (A small sketch of one ordering technique appears after this list.)
  • A related issue is "retrospective events" -- what happens if events that relate to the past are detected, when the assumption that they did not occur has already triggered some processing?
  • Issues of time in distributed environments -- clock synchronization, time-zone handling, time validity for mobile clients -- are all applicable to event processing.
As written, this is an outline of the topics surveyed in that chapter; I'll write more about some of them in the future.
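To make the bullet on ordering a bit more concrete, here is a minimal sketch, in plain Java, of one common technique for handling out-of-order arrival: a reorder buffer that releases events in occurrence-time order once a configurable tolerance has passed. The Event record, its fields, and the watermark policy are my own illustrative assumptions, not something prescribed by the chapter.

    import java.time.Duration;
    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.PriorityQueue;

    public class ReorderBuffer {

        // Hypothetical event record carrying its occurrence time.
        record Event(String payload, Instant occurrenceTime) {}

        private final Duration tolerance;
        private Instant maxSeen = Instant.MIN;
        private final PriorityQueue<Event> buffer =
                new PriorityQueue<>(Comparator.comparing(Event::occurrenceTime));

        ReorderBuffer(Duration tolerance) {
            this.tolerance = tolerance;
        }

        // Accept one arriving event; release, in occurrence-time order, every
        // buffered event older than (latest occurrence time seen - tolerance).
        List<Event> accept(Event e) {
            buffer.add(e);
            if (e.occurrenceTime().isAfter(maxSeen)) {
                maxSeen = e.occurrenceTime();
            }
            Instant watermark = maxSeen.minus(tolerance);
            List<Event> released = new ArrayList<>();
            while (!buffer.isEmpty() && buffer.peek().occurrenceTime().isBefore(watermark)) {
                released.add(buffer.poll());
            }
            return released;
        }
    }

Events arriving later than the tolerance become a policy decision of their own (drop, raise a retrospective event, etc.), which ties back to the retrospective-events bullet.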

Monday, September 7, 2009

On Event Processing Patterns


This is an illustration that was created by one of my former colleagues on the AMiT team, Tali Yatzkar, when she attended a "presentation course"; as an exercise in the course she had to explain what an event processing pattern is (we did not use this term at that time). This is the original picture; it is animated (the animation is not preserved when copying from the file to a picture) and the geometric shapes on the left-hand side of the picture are moving. The idea is simple: there are patterns that designate the relationship between a set of events, e.g. a conjunction: event E1 and event E2 both occur in the same context (e.g. relate to the same person within 2 hours). This rather simple idea is the jewel in the crown of event processing systems, and the basis of what David Luckham called Complex Event Processing. It is also what makes a composite event in active database terminology (I have discussed in the past the subtle differences between these terms' definitions). This illustration, in some variations, has a life of its own, and we have seen it in presentations of other companies and people; I even once had to comment on a Slideshare presentation when it was misattributed (see my comment to this presentation).

Anyway, besides giving Tali her due credit, I am writing about event processing patterns since one of the chapters we are completing now for the second-third review of the EPIA book deals with the notion of event processing pattern as a major abstraction. As with all abstractions in our meta-language, a specific language may implement a certain pattern as a language primitive, or implement it through a combination of language primitives. Those interested in the formal definition will need to read the book, since the formal definition requires definitions of several terms, so I'll give a less formal definition here: a pattern is really a function that takes a collection (or stream) of input events that satisfy some filtering assertions (e.g. they have to be within context, and have certain other properties) and returns zero or more "matching sets", each of which is a collection of individual events that collectively satisfy the pattern. Let's take a couple of examples:

The first example: the bid example. There is a bid for some auction that has been posted on an auction site; the idea is to select a single winner. The input events are auction offering events and bid events. The bid events are partitioned according to the auction offering they are referring to, and are also filtered according to time (each auction is open for a certain amount of time only) and according to a threshold condition (a bid has to be no less than a minimal price). The matching set in this case consists of a single bid event per auction offering. The matching pattern here is "relative max", which means that we are looking for the event with the maximal value of some attribute relative to the other input events (in this case the bid amount). Note that the "relative max" pattern does not necessarily provide a single bidder, thus we also need a "synonyms policy" to determine what happens when we have multiple events of the same type that match the criteria. In this case we take the fairness criterion of FCFS, and the synonyms policy will be "first", meaning the first bidder that offered the maximal price. In our meta-language this looks like:
Pattern name = Bidder selection; Pattern type = relative max; Input events = (Auction offering, Bid); Context = (segment = by auction offering, temporal = auction offering is open); Filtering assertion = (Bid.Price >= Auction Offering.Minimal Price); policies = (cardinality = single, synonyms = first)

Note that in these three and a quarter lines we have expressed logic that is quite complex, and this is the magic of patterns. As an exercise to the reader, write the equivalent logic in Java, and then change it so that it will choose all bidders that provided the relative maximum, for a second round of bids.
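For readers who want a head start on that exercise, here is a minimal hard-coded sketch in Java of the single-winner selection. The BidEvent record and its field names are hypothetical, and the context handling (partitioning by auction offering, and the temporal window while the auction is open) is left out -- exactly the plumbing that the pattern specification above spares you.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    public class BidderSelection {

        // Hypothetical event record; field names follow the example above.
        record BidEvent(String auctionId, String bidder, double price, long arrivalOrder) {}

        // Relative max on price, with the filtering assertion (price >= minimal price)
        // and synonyms policy "first": among ties on price, the earliest bid wins.
        static Optional<BidEvent> selectWinner(List<BidEvent> bids, double minimalPrice) {
            return bids.stream()
                    .filter(b -> b.price() >= minimalPrice)          // filtering assertion
                    .max(Comparator.comparingDouble(BidEvent::price) // relative max
                            .thenComparing(Comparator.comparingLong(BidEvent::arrivalOrder).reversed()));
        }

        public static void main(String[] args) {
            List<BidEvent> bids = List.of(
                    new BidEvent("A1", "alice", 120.0, 1),
                    new BidEvent("A1", "bob", 150.0, 2),
                    new BidEvent("A1", "carol", 150.0, 3));
            // bob and carol tie on price; the "first" synonyms policy selects bob.
            selectWinner(bids, 100.0).ifPresent(w -> System.out.println(w.bidder()));
        }
    }

Even this simplified version dwarfs the three-and-a-quarter-line specification, which is the point of the exercise.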

The second example is a sequence example; this figure is taken from the EPIA book. The example looks at the case in which a patient is released from the hospital and then admitted again within 48 hours with the same complaint that brought the patient to the hospital the first time.


Here we are looking for a sequence (the order is important, of course) of the patient release event and the patient admission event, for the same patient with the same complaint, within 48 hours. The definition in our meta-language will be roughly:

Pattern name = Repeating admission, Pattern type = sequence, input events = (Patient Release, Patient Admission), Context = (segment = by patient and complaint, temporal = Patient Release + 48 hours).

This pattern creates a matching set which consists of a pair of events of types patient release and patient admission.

Note that the pattern returns the selected events, and the EPA can derive new events as a function of these selected events.
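For illustration, here is a minimal event-at-a-time sketch of this sequence pattern in Java, again with hypothetical event and field names: the (patient, complaint) pair plays the role of the segmentation context, and the release time opens the 48-hour temporal context.

    import java.time.Duration;
    import java.time.Instant;
    import java.util.HashMap;
    import java.util.Map;

    public class RepeatingAdmission {

        // Hypothetical event record; type is "PatientRelease" or "PatientAdmission".
        record HospitalEvent(String type, String patientId, String complaint, Instant time) {}

        // Last release time per (patient, complaint) segmentation context.
        private final Map<String, Instant> openContexts = new HashMap<>();

        // Process one event at a time; returns true when the sequence matches.
        boolean onEvent(HospitalEvent e) {
            String key = e.patientId() + "|" + e.complaint();
            if (e.type().equals("PatientRelease")) {
                openContexts.put(key, e.time()); // open the 48-hour temporal context
                return false;
            }
            if (e.type().equals("PatientAdmission")) {
                Instant released = openContexts.remove(key);
                return released != null
                        && Duration.between(released, e.time()).compareTo(Duration.ofHours(48)) <= 0;
            }
            return false;
        }
    }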

Here we saw two types of patterns: relative max, which is a set-oriented pattern, and sequence, which is an event-oriented pattern. I'll provide the list of patterns collected so far in one of the future postings.

Thursday, September 3, 2009

Getting closer to the peak of inflated expectations

The Gartner hype cycle has a notion called the "peak of inflated expectations": different technologies go up the hype ladder until reaching the peak, where people think they can solve much of the universe's problems; then, somehow, people realize that this is not the case and go through frustration and disillusionment, until realizing the true value (if any!) and getting back on track, now with the right set of expectations. Two recent Blog postings give some indication that we are getting closer to that peak where event processing is concerned:

Actually, both are right. Event processing may have a role in decision management: there are some applications of decision management that are pure event processing, some in which event processing has a role but does not do the entire trick, and some that are really batch-oriented data management. Likewise, continuous analytics can be done in response to events (raw or derived) or just periodically. Event processing may or may not have a role in deciding when to do the analytics (e.g. optimizing traffic light settings); the optimization itself is typically not event processing per se.

I think it is very good to observe that event processing can play a role in many areas; likewise, it is also good to be clear about its possible role, about the cases in which it has value, and about the cases in which it hasn't. I guess we'll have to wait for the enlightenment phase, in hype cycle terminology, until there is more universal clarity about the role and value of event processing. More - Later.

Tuesday, August 4, 2009

On the Gartner 2009 application architecture hype cycle

Here is a revised version of my Blog entry that relates to the Gartner application architecture hype cycle report (Gartner Report ID number G00168300 from July 16, 2009). The revision was done at the request of Gartner, who asked that I make exact citations from their report and make a clear distinction between what is quoted from the Gartner report and my own remarks.

Here are a collection of citations from the report that are of interest from the Event Processing perspective:


  1. "Event-driven architecture (EDA) is an architectural style in which a component (or several components) in a software system executes in response to receiving one or more event notifications". In the report EDA is positioned under the hype cycle phase "Climbing the slope of enlightenment" which according to Gartner's terminology is defined as " Focused experimentation and solid hard work by an increasingly diverse range of organizations lead to a true understanding of the technology's applicability, risks and benefits. Commercial off-the-shelf methodologies and tools ease the development process"
  2. CEP is positioned under the hype cycle phase of "Technology Trigger" which according to Gartner's terminology is defined as "A breakthrough, public demonstration, product launch or other event generates significant press and industry interest", and is the phase that precedes the "peak of inflated expectations" phase.
  3. For CEP: "market penetration is 1% to 5% of target audience"
  4. CEP use is expected to grow at approximately 25% per year from 2009 to 2014, but the use of COTS CEP products is expected to grow more than 40% per year in this time frame
  5. For CEP COTS products: " Most of these products are immature and incomplete"
  6. "Most business analysts do not know how to identify business situations that could be addressed through CEP, and that is limiting the rate at which CEP use can expand. Most software engineers are not familiar with CEP development"
  7. "The Event Processing Technical Society (EPTS) was launched in June 2008, and it is expected to facilitate the adoption of CEP".


Here are my own comments:

  • Note that EDA and CEP are positioned in different phases of the hype cycle.
  • The fact that the market penetration is low indicates that there is still substantial growth potential, if we can overcome the adoption challenges.
  • The adoption challenges consist of product maturity and market awareness. We are still in the first generation of products in this area, and maturity is typically achieved in later generations. Awareness and understanding of value and positioning are indeed a challenge.
  • EPTS was indeed formed to facilitate the adoption of event processing, and it addresses both challenges mentioned here: advancing the state of the art to accelerate the next generations, and educating the general community about the value and positioning of event processing within enterprise computing.


Tuesday, April 14, 2009

On Innovation, Twitter and Event Processing


Reading in the newspapers about some troubles in Thailand reminded me of my family trip to Thailand two years ago. It is a country full of paradoxes, but one of the most attractive to tourists; tourism is a major industry in Thailand, and the local people have very innovative ideas for tourist attractions. One of them is to let people touch tigers, an animal known as one that is not easily tamed; here is a proof.

Talking about innovation, two recent postings mentioned Twitter in conjunction with event processing. The first, by Richard Veryard, talked about "innovation in a small bakery", describing a small bakery that sent a Twitter notification whenever freshly baked bread came out of the oven, and wondered why we don't see more innovation like this from IT departments in organizations. The second is a discussion thread started by David Luckham on his CEP forum, asking whether CEP can be used to trace Twitter notes.

Some people have already answered David. I guess that one of the difficulties stems from the fact that Twitter messages are written in free text; however, trying to apply full-scale information retrieval techniques with statistical reasoning would miss the point of simplicity, since the main consumers of such applications are individual consumers who cannot apply complex software and reasoning -- one cannot kill flies with a cannonball. So what can be done? The messages (as in the bakery example) should be fixed in format and include keywords that can be easily filtered. For example, the bakery may have three messages about three products -- bread, rolls and cakes -- and can also send an alert when each of them has been fully sold. The bakery can provide a way to subscribe to each, or to make more complex subscriptions, and here event processing patterns come into play. For example: I want to go to the bakery only when both fresh bread and fresh rolls are available (and not sold out yet), which is a conjunction pattern; or I want to restrict it so that the conjunction of the two events happens within 15 minutes, since I like it very fresh... In this case the identification of the event type can be done by a simple filter that searches for the keyword "bread" or "cake" together with the event source (bakery) and timestamp. If more stores join this type of service, I can make a conjunction of Twitter messages from two stores, buying fresh bread and fresh shrimps in the same trip (adding some condition on the distance between the stores, given by an additional service). Furthermore, I can also condition any subscription on my personal context (e.g. only when I am at home, only in the morning hours, etc...). These are initial ideas; for sure, Twitter as a platform for event notifications can have many usages (which does not mean that every Twitter message carries event information).
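As a small illustration of the bakery idea, here is a minimal Java sketch of keyword filtering over fixed-format messages combined with the 15-minute conjunction pattern; the message format, keywords, and method names are all my own assumptions.

    import java.time.Duration;
    import java.time.Instant;

    public class BakeryAlert {

        private Instant lastFreshBread;
        private Instant lastFreshRolls;

        // Process one incoming message from the bakery feed, event at a time.
        // Returns true when fresh bread AND fresh rolls were both announced
        // within 15 minutes of each other (the conjunction pattern).
        boolean onMessage(String text, Instant time) {
            if (text.contains("fresh bread")) lastFreshBread = time; // keyword filter
            if (text.contains("fresh rolls")) lastFreshRolls = time;
            if (lastFreshBread == null || lastFreshRolls == null) return false;
            Duration gap = Duration.between(lastFreshBread, lastFreshRolls).abs();
            return gap.compareTo(Duration.ofMinutes(15)) <= 0;
        }
    }

A real subscription service would also consume the matched events so that the same pair does not keep firing, and would treat the "sold out" messages as terminators.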

Richard's question about innovation and IT departments is a more complicated one; however, there is some truth in his observation. Based on my own experience in an IT department of a large organization, IT departments may be more conservative than their users, and will typically be cautious about anything that gives "programming capabilities" to the users (which is perceived as "losing control"). Since the value of many event processing applications to business users is exactly this (giving business users more power to do their own programming), it is sometimes a challenge to get them through the IT department, but this is a longer discussion...

Friday, April 10, 2009

On the boundaries of event processing

We are in the Passover vacation, celebrating the biblical story of the exodus from Egypt, in which the sea split and people could move through the gap that was created. Well -- I guess that at that time people just walked, but if it had occurred today it might have looked like this.
Today I would like to write something about the "boundaries" of event processing, based on some discussions last week related to writing a book about event processing. There are two issues related to the scope:
  • Is pre-processing to emit events by the producer, and post-processing of events by the consumer, part of the event processing system?
  • Is pre-processing to obtain the event processing patterns that have to be monitored (e.g. using machine learning techniques) part of the event processing system?
From the point of view of an "event processing language", if we include the pre-processing and post-processing we'll have to extend the language to have the expressive power of a general-purpose programming language, which will lose the focus on specific event processing functionality. Thus, while an application may require pre- and post-processing, these are typically outside the "event processing network". The main point of using an "event processing language", rather than hard-coding the event processing functionality in Java, C# or any other imperative general-purpose language, is the higher level of abstraction. As an analogy, before the days of SQL we had to read from the database, loop over records, and evaluate the conditions in a hard-coded way; SQL did not provide anything we could not write in Cobol or PL/I (the languages of that time...), it just provided a more concise way to write it. The situation in event processing is similar; we can write something that is specified as:
" Match a pattern of events which is a conjunction of type E1, E2, E3 that refer to the same person and all occur within one hour since an event of type E0 for the same person, if there are several instances of E1, E2, E3 take the most recent of each at the point that the match occurred, and if there are multiple matches within this same time interval, ignore all but the first". Of course, one can write it in Java, but a language that enables to write this pattern in less than 1 minute is more cost-effective.

Back to the scope -- pre- and post-processing of events and patterns are not part of the event processing system, and are typically done with different technologies. This does not mean that they are not important; sometimes the pre-processing of events is more complicated than the event processing itself, especially since it is hard-coded.

More on this - later

Tuesday, April 7, 2009

"Complex Event Processing poised to growth" in IEEE Computer


IEEE Computer, the flagship magazine of the IEEE Computer Society, publishes in its April issue an article entitled "Complex-Event Processing Poised for Growth" by Neal Leavitt, under the section "industry trends". The magazine has a relatively large distribution; the article explains the basic concepts and trends of event processing, and cites some of the EPTS steering committee members, such as John Morrell from Aleri, Alan Lundberg from TIBCO, David Luckham, Roy Schulte from Gartner, and myself. It also cites some other people in the community, and EPTS is mentioned explicitly. The fact that one of the popular professional magazines chose to dedicate an article to the area indicates a growing interest, and this is just one indication. As I noted before, the February issue of International Banking Systems, which is a specific industry journal, published an article on "event horizon". Enjoy!

Sunday, April 5, 2009

On the "Return on Investment" in Event Processing



Part of the orders I got from my physician, with which I humbly comply, is to spend around one hour walking every day; if I have time I do it outside, and if not, at home on an electric walker. Yesterday I walked around, not far from home, and decided to take a shortcut via a woodland -- not far from home, but not a familiar one. I saw a trail that seemed to go up the hill, where the top of the hill was supposed to lead me back to somewhere in my neighborhood; there was a split in the trail and I chose one branch at random, and after five minutes realized that it led to a dead end. However, I did not feel like going down, so I continued to climb and pave my way among bushes, fallen trees, etc. -- quite irresponsible of me, especially as it was getting dark outside -- and after 40 minutes of wandering I saw the back yard of a house, navigated there, and indeed got safe and sound to somewhere in the neighborhood. It somehow felt like a return to childhood -- but not for long.

Anyway, today I would like to write something about the "Return on Investment" in event processing.


Mark Palmer, the current CEO of Streambase, recently blogged that CEP is not about "feeds and speed" but about "ease of use". It is actually refreshing to see this from a Streambase person, since in the past some Streambase people claimed that the only reason to use a CEP engine is its scalability properties. Actually, one of my first postings on this Blog, entitled "the mythical event per second", said something about this. I agree that there are some applications for which satisfying high throughput or other QoS metrics is a crucial requirement, but this is a secondary type of ROI. The major one is providing abstractions that reduce the cost of development, and consequently of maintenance, of event-driven applications. This is similar to what the DBMS discipline provided us: as a grey-bearded old timer who is not completely senile, I still remember the times we worked with file systems; DBMSs provided many abstractions that make data-oriented applications much easier to develop. The same goes for event processing. To people who ask "is there something new in event processing?" I constantly answer: not really -- event processing was hard-coded within regular programs for ages. However, since traditional programming languages and environments were not created to process events, the manual work required is quite substantial. The reduction in cost relative to hard coding can be substantial, and some customers have estimated it at a 75% reduction. It would be interesting to do an empirical study about it, probably a challenge for our EPTS use case work group. More about ROI -- in later posts.

Tuesday, March 31, 2009

On the next generation of event processing


I was asked by several people to post the presentation I have given recently in several places. The presentation includes some views about where we should go in the next generation of event processing and what the challenges are in the various areas (the above illustration is a slide classifying the challenge areas), and a survey of some activities we are doing either in the IBM Haifa Research Lab or with graduate students at the Technion. The presentation is now on slideshare;
enjoy.

Saturday, March 28, 2009

What can the event processing discipline learn from other disciplines?


This is a picture of the Bahai Gardens, one of the famous sites in my home city of Haifa. The Bahai religion is an interesting one, more modern than most religions (I myself am agnostic and do not practice any religion, so this is not meant as an endorsement); the Bahai people see Haifa as one of their holy sites and invest a lot in the city, and this week they have a major celebration. I have returned to Haifa after my short trip abroad, in which I gave the same talk about "event processing - the next generation" four times (I'll post it on the web soon). One of the discussion points has been what the event processing discipline (complex or not) can learn from other disciplines that succeeded. Coming from a database background, it is always interesting to me to make the comparison there. When relational databases started to become products in the early 1980s, I was a database practitioner with experience in several DBMS products, and in the beginning I looked at the relational model without much respect; it seemed to me an over-simplification that gives up semantics and creates a lot of anomalies. However, that simplicity has been the main benefit. The relational model won, and also created a big research community around it that concentrated its forces on a single model and developed query optimizations, more semantic abstractions on top of it, and some other stuff. The fact that substantial brain power was dedicated to a single direction contributed to the success; the fact that there was no critical mass of work around object-based databases contributed to the fact that their success was modest. What can we learn from that in the "event processing" discipline? We need to strive to find the formal model around which the community can concentrate. That model is not an extension of the relational model, since such an extension would lose the main benefit of the relational model -- simplicity. There are several extended relational algebras around, but none of them meets the simplicity criterion; on the contrary, they are quite complex. So this is still a major challenge for the community. More on that, and possible directions -- in subsequent postings.

Wednesday, March 25, 2009

Visiting Aston University and MEAP for "event processing in action"


These are pictures from Aston University in Birmingham, which I am visiting today within my short trip. It turns out that from the university network I cannot get into the IBM VPN, so my hosts brought in one of the university IT guys, who confirmed that they block access to private VPNs as a matter of security policy; the only person who knows how to configure a bypass is away today, and even if he had not been away, he doubts whether he would have been willing to do it. I could in principle go into town and look for an Internet cafe, but I am too lazy, and I can live a couple of days without Email. Besides that, the visit (which is not over yet) has been interesting, and we may have grounds for collaboration. I have given my talk about "event processing - the next generation" for the third time this week (and will give it a fourth time tomorrow in another place...); I'll post it on the web after my trip. One of the questions I was asked was whether (complex?) event processing techniques could have predicted the economic crisis, but I'll leave the discussion about predictions to another Blog posting. They also may teach an event processing course next year and are looking for a textbook on which to base the course.

Speaking of the book -- the publisher of the "Event Processing in Action" book has taken another step and included the EPIA book (currently the first two chapter drafts) in MEAP (the Manning Early Access Program). The referenced site explains how readers can become part of the authoring process by receiving drafts of new chapters and by using the forum to ask questions and communicate with the authors. As mentioned before, the introduction chapter has been posted as a green paper, and is a free download for all. Although this open type of writing is somewhat more difficult and time-consuming for the authors than just writing the book without interruptions, I believe that this process can improve the quality of the book beyond the formal review process. So this is a call for the community to take advantage of this program and help in creating this book. Hans Gilde has already made the first set of comments on chapter one.

I am not sure whether it is related to blogging about the book being written, but yesterday (Tuesday, the 24th of March, 2009) was a record high in the number of visitors to this Blog in a single day; today has not ended yet and also looks strong, so I wonder why.

Tomorrow -- visiting the IBM Hursley Lab in UK, before returning home.



Sunday, March 22, 2009

On Event processing as part of DBMS

Paris. I arrived a few hours ago in Paris, and went for a walk in the streets to stretch my legs after the flight; my hotel is not far from the Bastille, so I went there and watched the monument that you can see in the picture, and the people who watch it. Now I have returned to my hotel to check Email and rest, before my hosts come to take me to dinner.

Today's topic is a short reply to a discussion claiming that event processing should actually be done as part of a DBMS. This is not a new claim; it is repeated from time to time by one database person or another. In my past I dealt with active databases, which attempted to put some form of event processing functionality into a DBMS engine; overall, this approach has not led to much traction in DBMS products. The main idea was to add some language constructs in the form of ECA rules (that also support composite events) to DBMS engines. The only traction in products from this work is the notion of the "trigger", which does not really do justice to what the active database community tried to do...

Anyway, twenty years have passed, and event processing thinking has evolved from the early thinking on active databases. As said, the main issue here is not performance, as some of the vendors claim, but TCO. Much of what is called "complex event processing applications" deals with the detection of patterns over multiple event instances and types, and SQL may not be a natural language to express such patterns, in some cases due to its set-oriented thinking and some other limitations. In fact, in some cases customers reported that they could save 75% of the development cost by using a language that can express patterns more naturally. This difference may not materialize in languages that are themselves variations or extensions of SQL, but those are only part of the EP universe.

Of course, the DBMS community can return to the idea of active databases and add language constructs to express patterns in the DBMS engine, and I guess that this may be a valid variation of event processing, but it will not blend naturally into SQL; it will have to be a hybrid language. More about this - later.

Saturday, March 21, 2009

More on event-at-a-time processing

I have borrowed this picture from Brian Connell's posting a few months ago entitled "one-by-one can still be CEP". I have not really returned to this topic, but it is a good time to do so now, following my last posting on "set-at-a-time" vs. "event-at-a-time". Many people are used to programming in set-oriented languages like SQL, where a simple match between two entities first becomes the creation of the Cartesian product of the sets they belong to, followed by a selection of elements from this Cartesian product. This is not the natural way people think about processing events.

Let's look at the nice penguins in the picture; we want to trace what happens in the penguin colony by observing them. Say we want to observe when a young penguin stays more than 1KM away from the home glacier for more than an hour, which may indicate that it can get lost; furthermore, we wish to get the alert immediately when this happens, and not at the end of some time window. Some of the "set thinkers" view the "stream processing" paradigm as something organized, in which the partition of events is well defined by the notions of streams and windows, the processing already has the input set, and all it has to do is apply some function to that input set, while they view "event-at-a-time" processing as ad-hoc programming in which events arrive at some event processor which then has to hard-code the entire semantics. But this is, of course, a misconception.

Let's look again at the penguin example, assuming we are looking for the following pattern: a young penguin may be lost if it stays over 1KM away from the glacier for over one hour. This can be expressed with set-oriented processing as looking at observations of the penguin (say we watch it once every minute) and determining at the end of an hour that all observations are more than 1KM away. But when do we start the one-hour period? The answer is: the first time the penguin crosses the 1KM bound, so we need a notion of an event that starts a time window. Actually, the term context is wider than time window -- here it contains a temporal aspect (within one hour, where the hour is initiated by a specific event), a semantic aspect (an EPA traces a single specific penguin, so it is associated with some penguin id), and a spatial pattern (all events are more than 1KM away from the glacier). Now, what is the benefit of an event-at-a-time implementation? A simple one: if the young penguin starts to head back, then we can close this context instance and terminate the EPA, say after 3 minutes, and not trace this penguin anymore until the next time it swims far, while with set-at-a-time we'll determine only at the end of the time window that there is nothing to detect here. Of course, the set thinkers will immediately say that we can reduce the window -- and reducing the window to units of single events exactly gives us the "event-at-a-time" notion. More than that, it is not only a question of efficiency; it is also a question of expressing fine-tuned semantics. Let's look at another penguin scenario: we are now tracing a lazy penguin who returns to the glacier less than 2 minutes after jumping into the water. Here we have a sequence of two events relating to the same penguin within a specific temporal context ("within 2 minutes"). This is not a set operation at all; it is looking for a sequence of two individual events. True, it can be expressed in set-oriented programming: we'll have to create two streams (or one heterogeneous stream) of "jumping into the water" and "returning to the glacier" events, join them, select the appropriate instances, and thus determine which members of the set matched this pattern, but this is not a natural way to think about it. With "event-at-a-time" this can be done by just opening a context instance for every penguin that jumps into the water; if it does not return within 2 minutes, the context instance is closed, and if the penguin returns, there is a pattern match and the context instance is closed even earlier.
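Here is a minimal event-at-a-time Java sketch of the first scenario, mainly to show the early-termination benefit just described; the observation format, event names, and thresholds are my own illustrative assumptions.

    import java.time.Duration;
    import java.time.Instant;
    import java.util.HashMap;
    import java.util.Map;

    public class LostPenguinDetector {

        // Open context instances: when each penguin first crossed the 1KM bound.
        private final Map<String, Instant> crossedAt = new HashMap<>();

        // One observation per penguin per minute; returns true on a pattern match.
        boolean onObservation(String penguinId, double distanceKm, Instant time) {
            if (distanceKm <= 1.0) {
                crossedAt.remove(penguinId); // heading back: close the context instance early
                return false;
            }
            Instant opened = crossedAt.putIfAbsent(penguinId, time); // first crossing opens it
            return opened != null
                    && Duration.between(opened, time).compareTo(Duration.ofHours(1)) >= 0;
        }
    }

The set-at-a-time formulation has to wait for the window to close; here the context instance for a returning penguin disappears as soon as an observation inside the 1KM bound arrives.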

But let's move to an example about the fine-tuning of semantics. Assume that we are looking for a pattern saying that the average stay in the water of a penguin is less than 2 minutes, which may indicate a laziness plague, or any other plague that makes the penguins lazy. In set-oriented programming we'll have to define the set -- say, a time window of one hour -- and when the set is fully accumulated we can calculate the average and match it against the threshold. However, it becomes tricky when the average is actually a moving average: it may be that if we do this calculation after 30 minutes the pattern is matched, since the average stay in the water over these 30 minutes is 1 minute and 56 seconds, while if we consider the whole hour, the pattern is not matched, since the average is now 2 minutes and 9 seconds. Doing the calculation event-at-a-time enables us to maintain the average over a set of any size, even without committing in advance to the size of the set.
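Maintaining such an average incrementally is trivial, which is exactly the point: the threshold can be tested after every event instead of at the end of a fixed window, and without committing to a set size. A minimal sketch, with illustrative names:

    public class IncrementalAverage {

        private long count;
        private double sum;

        // Feed one completed stay (in seconds); returns the running average,
        // so a threshold such as "average < 120 seconds" can be tested
        // after every event rather than at the end of a one-hour window.
        double onStay(double staySeconds) {
            count++;
            sum += staySeconds;
            return sum / count;
        }
    }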

This is not ad-hoc processing; it is supported by high-level programming primitives that are sometimes easier to express than their equivalent set-oriented notation.

There are, of course, cases in which the set-oriented calculation makes sense -- exactly when we are doing aggregations at the end of fixed time intervals -- and in some applications this may be the main function we need, but I assume that we'll see more and more hybrid applications.

Last but not least -- the distinction between "simple" and "complex" event processing is sometimes grounded in the claim that simple deals with events processed "one by one" while complex processes multiple events. However, "event-at-a-time" is not "one-by-one" processing: in "one-by-one" processing an event is processed without looking at other events, while in "event-at-a-time" each event is processed individually, but within the state of a certain context-driven EPA. More - Later


Wednesday, March 18, 2009

Event Processing In Action


Event Processing In Action - this is the title of the book on which I have recently started to work together with my colleague Peter Niblett. Although I typically write this Blog as I and not as we, this time I'll use we in any case where what I write refers to Peter as well. In the next few days the first chapter of the book will be made public by the publisher as a green paper. In the picture above you can see a provisional cover of this book, but it is not final yet. The book is planned to be available towards the end of 2009.

Web 2.0 plays a role in this process, as explained below. Here are some Q&A about it.


What is the motivation for a new book?

The book was initiated by Manning Publications, a computing books publisher; their market survey indicated that there is a significant market need for a new book that will articulate and provide a deep dive into the concepts and facilities of event processing applications. This book is intended to be a major reference book for enterprise architects and application developers (both technical and semi-technical), and is also expected to be used for instructional purposes (a textbook for a university-level course on event processing).

The book written by David Luckham, entitled "The Power of Events" (Addison-Wesley, 2002), was very influential in creating the initial awareness of the event processing area, and it still is a big inspiration for us; the new book is intended to reflect the contemporary thinking around event processing, which has evolved since 2002.


Why have we agreed to write this book?

Writing a book is a big responsibility, and it is a substantial burden on our time. Furthermore, it is a tremendous challenge to produce a high-quality book in an emerging area for these target audiences - especially considering the very high expectations that have already been generated around this book. We believe that this book is indeed required, and as technical leaders in the community it is our duty to take on this task and help shape the newly emerging discipline of event processing this way. We were also encouraged by our management and colleagues to take on this mission.

What is the approach taken in this book?

The approach taken in the book will not be surprising to readers of this Blog. Indeed, the book can be considered a direct descendant of the Blog; it seems that the publisher approached me based on recommendations of anonymous members of the event processing community who referred him to this Blog. I have gotten feedback that this Blog is one of the popular sources today for learning what event processing is, but the Blog, as a Blog, is not written in a methodical way: it jumps from one topic to another, treats the various topics in a relatively superficial way, and includes "noise" like this posting. The book should be more focused, getting things in the right order and at the proper level of depth. The style of writing is similar to that of the Blog.

The book will explain all the event processing concepts by showing step-by-step how a single use case is constructed. The explanation, like my approach in the Blog, aims to be language-style neutral and to explain the concepts using a pattern-oriented model (although, due to the ambiguity of the term patterns in event processing, we use the term building blocks). We are planning an appendix in which we will list existing EP products and open source offerings and provide some high-level details, without providing an evaluation of, or endorsement for, any of them. We'll ask for the collaboration of the various product owners to get accurate information about their products.


What is the relationship between this book and IBM ?

Both Peter and myself are IBM employees; Peter works in the IBM Hursley Lab in England, while I work in the IBM Haifa Research Lab in Israel. However, we are writing this book (after clearing the legal and managerial permissions) as individuals and not as IBM employees; a disclaimer stating that the book represents our opinions and not necessarily the opinion of IBM will be clearly made in the preface to the book, as is done at the top of this Blog. There is a big EP-oriented community inside IBM and we hope to get feedback from this community, as part of the feedback from the larger community.


How are Web 2.0 technologies going to impact the authoring process?

As with any other book, there is a formal review process in which the publisher consults a collection of reviewers representing the target audiences; thus most reviewers are architects and developers from various industries, and academic instructors teaching EP courses. In addition, nowadays book authoring is also considered an interactive process between the authors and the readers. The MEAP program (Manning Early Access Program) enables readers to interact with the authors through a forum, and to contribute comments and questions on the book while it is being written; when the book gets into the MEAP program I'll explain it further.


What are the next steps?


As I have said, Peter and myself face a substantial challenge in creating a high-quality book for the readers, and we are sure that feedback and reviews from the larger community can help us provide a better book for the target audience. The green paper is due to appear, hopefully, by the end of this week; I'll post the URL on this Blog as soon as it is available. The MEAP for this book will be set up in the next few weeks. I'll also use this Blog to tell about some dilemmas and challenges in the writing process (another Web 2.0 means of communication).

More -- Later.


Sunday, March 15, 2009

On Cool Event Processing


Thanks to a recent posting by Tim Bass, I have now watched a really cool video from the MIT Media Lab; if you have not already done so, watch and enjoy! It is still in early phases, but very impressive!

This brings us to two interesting questions:

  1. Does this demo show an event processing application?
  2. Should creating cool applications be our target?
As for the first question -- the main achievement of the MIT Media Lab video demonstration is the ability to point with a finger at some item (a person, a product in the supermarket, etc.), use image processing technologies to identify it, bring information from the Web, and project it on the item itself (e.g. project the Amazon book reviews on the book, or project annotations about a person on the person's body). This is an extremely impressive blend of technologies, but not really event processing. To me it looks like a request-response type of application and not an event-driven one: the action of pointing at an object is a request to identify it, which in turn sends another request to search the web. Not really event processing, but certainly very cool...

Which leads to the interesting question number 2 -- for sure, it is easier to impress and sell technologies through cool applications. Event processing has some cool applications: processing events in games, or processing events in the smart house that automatically turns the lights on and off, re-stocks the refrigerator, and calls a technician to fix the air conditioning. I think that event processing for the individual consumer market has not been investigated well, and in that context the "cool" stuff is certainly a good way to sell...
Looking at the majority of the work done today in event processing, it relates to enterprise computing, where the main criterion is ROI. There may be nothing exciting about accounting, procurement or regulation enforcement applications, but since they are part of the enterprise's bread and butter, technologies that enable the enterprise to do them more effectively or more efficiently may bring a lot of ROI. And since the decision makers are people, and decision making is not necessarily a rational process, cool demos are highly recommended...

More - Later.

Monday, March 2, 2009

International Banking Systems article on "Event Horizon"


The February issue of the journal "International Banking Systems" has an article called "Event Horizon", written by James Ling. I tried to get a soft copy of this article, but the journal has a strict policy that only subscribers can access soft copies, and agreed to send me a hard copy only; the hard copy arrived today and I am looking at it now. I was interviewed for this article in my capacity as EPTS chair, and some other people from the EP community were interviewed as well, such as John Morrell from Coral8, Giles Nelson from Apama and Jeff Wooton from Aleri, to name a few, as well as several customers of EP technologies in the financial services sector. Some interesting points from the article:
  • EPTS is mentioned at length, including quotes from its charter; it seems that EPTS is gaining traction as both an authoritative source and a representative of the community.
  • Various EP applications besides algorithmic trading are mentioned: external surveillance by regulators, risk management, auditing, market depth analysis, and more.
  • It is mentioned that there is confusion around the distinction between event processing and BRMS, with some people saying that they are exactly the same thing. They quote me as saying that they are complementary technologies, but do not cite my explanation, preferring to leave the reader in the dark about this issue.
  • They mention that most vendors use the term CEP, but stick to the name EP throughout the article.
  • The concluding remarks are: "EP is certainly one of those technologies that will play an important role in the future of financial services".

Saturday, February 28, 2009

On fusion confusion and infusion


This is a picture of Akko (AKA Acre), an ancient walled city with a typical Middle Eastern market full of the smells of spices, and a fishermen's harbor. I am now hosting some visitors from Germany, Rainer von Ammon and his colleagues from CITT, to discuss some collaboration topics, including a consortium for an EU project that we are establishing with other partners. Unfortunately, they chose to arrive in the rainiest period we have had this year, so we could not do much sightseeing today; however, we managed to get a two-hour break in the rain to stroll around the old city of Akko.

I'll get back soon to the discussion about the questions I posed yesterday, as I would like to see whether more people want to react before stating my opinion (I am in learning mode...).

Today I'll write something about information fusion and its relationship to event processing; I came across a recent survey article about data fusion in ACM Computing Surveys.

There are various kinds of fusion - data fusion, information fusion and sensor fusion -- and all of them are intended to take information from distinct sources, blend it together, and understand what has happened. A very simple example of sensor fusion is in traffic monitoring: there is a sensor that senses the speed of a car, and there is a camera that takes pictures of the car and its license plate; fusion of both can identify the fact that a certain car has violated the speed laws. This is a relatively simple case that requires some basic image processing, but it is quite easy to determine what happened. In the area of military intelligence, by contrast, it is much more complicated to understand what happened / is happening / is going to happen, and various techniques are used. The Center for Multisource Information Fusion at the University at Buffalo maintains a site with a collection of bibliography about fusion issues, including tutorials and their proposals to modify the relatively old JDL model, so you can find much more information there.

So where is the confusion? There are people who confuse event processing with other, different areas: somebody in IBM who once saw an illustration of an event processing network tried to convince me that we are re-inventing workflows, and some data management people think that event processing is just a footnote to existing query processing -- everyone with a hammer looks at the rest of the world as a bunch of nails.
Likewise, there are people who confuse fusion with event processing.



So what is the infusion? The fact of the matter is that information fusion and event processing are complementary technologies. The goal of fusion is to determine what happened, i.e. to determine what event has occurred; event processing processes the event after somebody has determined that it happened, and it has multiple goals. The techniques are also different: fusion uses conflict resolution techniques and stochastic modeling, while event processing uses pattern matching, transformation, aggregation, etc. Thus an event can be created using fusion techniques and then processed using an event processing system -- this is the infusion.

However -- there are also potential synergies between these two technologies. A partnership in which fusion technology serves as a preprocessor for events feeding event processing can be beneficial for certain applications; this is the most obvious synergy. Another type of synergy is that techniques used in fusion can be used in event processing and vice versa; this is an interesting direction to investigate further, along with possible real applications for it. More on this - later.