Friday, December 17, 2010

Who is the developer of event processing applications?

One topic that has been discussed frequently of late is: who should be the developer of event processing applications? A computer programmer (the top picture) or a business analyst? The bottom picture, taken from a site called "business analysts mentor", shows the business analyst's skills. Will a future list also include event processing development?

In the BRMS area, one of the claims is that business analysts can develop, maintain, modify and manage rules.
There is still a need for programmers to set up the data, connect it, deploy it, etc.

The desire to have business analysts develop event processing applications exists in the industry. There are surveys indicating that most users want it, and it fits a general trend in enterprise computing.

Getting to event processing, there are still challenges in having business analysts develop event processing applications. The challenge stems from the fact that the range of possibilities in event processing is quite wide, and thus there are two main options:

  1. Restrict the expressive power and provide the business analyst with an "easy" sub-language. This may be enough for some classes of applications, but not for others.
  2. Provide assisting tools that let business analysts cope with the entire capabilities of the event processing language (e.g. patterns, policies). This requires both the right abstractions (intention language) and tools to validate that the system is working properly.

Why is it not easy? First, the semantics of the various event processing functions need to be understood; this is not a technical difficulty, but it requires training. Second, understanding the flow, i.e. the interactions among the various functions, adds to the complexity.
It is doable, but requires more work on setting up the right tools. More on this later.

Thursday, December 16, 2010

Where does the edBPM term come from?

Having written earlier today about one German friend, here is another: Rainer von Ammon has written on the complexevents forum about the source of the term edBPM. Rainer is also the one who drew the nice illustration above showing the reference model. He is the person promoting this term, and has organized several workshops around the concept of pairing event processing and BPM technologies. Rainer wrote that the term came from Gartner, and quoted me. This is almost true: the original term that Gartner used is "event-based BPM", and I slightly modified it to "event-driven BPM" when I asked Rainer to write an entry about it in the Database encyclopedia (I was the editor of the event processing related terms).
Here is Gartner's original slide from 2005.

This is the original slide,  source:  "Event-driven applications make event-driven businesses work better", a presentation by Roy Schulte from Gartner in 2005. 

As you can see, Gartner classified the event processing functions into: simple event processing, mediated event processing, event-based BPM, and complex event processing. Roy Schulte later realized that "event-based BPM" is a dimension orthogonal to the three others, and changed the positioning. So this is the source, for all history lovers.

On Alex Buchmann's 60th birthday book

Alex Buchmann is an old friend; we first met when both of us were 20 years younger and worked on active databases. Alex is a little bit older than me, and recently celebrated his 60th birthday. I could not travel to the ceremony in Darmstadt, but as a gift I contributed to the book, which is a collection of papers edited by Alex's students (or ex-students). Today the mail brought me a copy of this book, with a personal inscription from Alex. The book is called "From Active Data Management to Event-Based Systems and More".

More details about the book can be found on the Springer LNCS site. The book includes a paper entitled "Spatial perspectives of event processing", which I co-authored with Nir Zolotorevsky. Browsing the book, I see that I am in very good company; some of the other authors are Jean Bacon and Ken Moody, Mani Chandy, Umesh Dayal, Tamer Ozsu, Gerhard Weikum, and many others.

When I summarized the year 2009 in this blog, I wrote that the quote of the year was taken from Alex's keynote address at DEBS 2009, stating that using regular database techniques for event-based systems is like trying to drink the water in a waterfall using a straw. Alex also does not like the term "event processing", claiming that "processing" sounds like "data processing", which is an archaic term, and prefers to talk about event-based systems, as shown in the title of the book.

I wish Alex many more years of  good health, fruitful work and fun.

Wednesday, December 15, 2010

Revisiting EPN

This illustration, taken from the EPIA book and drawn by Peter Niblett, is a portion of the EPN that describes the "Fast Flower Delivery" example that accompanies the book. In an internal discussion today somebody raised the question: why do we need an EPN at all, instead of using the alternative that has been used in Amit and other places? In that alternative, each EPA subscribes to an event type; whenever an event of this type is detected, the appropriate EPA receives and processes it. The entire event flow is implicit, and the person defining the system does not need to worry about it.

Since this is actually a good question, I wanted to share my response. There are two main reasons why we have shifted our thinking to the EPN model: efficiency and usability.

I'll start with usability. Experience shows (and this observation is also true of inference-based systems) that people feel more comfortable when they can control the flow rather than rely on implicit flows: they understand better what the system does, can better debug and validate it, and trust such systems more. Note that an EPN is not a workflow; it does not represent control flow, it represents event streaming flow (in a way similar to data flow, with some semantic distinctions).

The other reason is efficiency. If an EPA subscribes to an event type, then either the EPA has to process and filter out a substantial number of irrelevant events, or the number of event types has to grow considerably. Imagine the following scenario: an event of type ET1 arrives. First it meets a filter that filters out many of the events using some assertion, and then various EPAs process only the filtered-in events. One of these EPAs is an enrichment agent, adding some information from a database, and the enriched event is then sent to an aggregator for further processing. If we use "event type" subscription, there are two choices. The first is to create event type ET2 for the filtered-in events, identical to ET1, and create a derived event of type ET2 for each filtered-in event of type ET1, then create event type ET3 for the enriched event with the added attribute; then indeed each EPA subscribes to a single event type. The second choice is to use ET1 for all three cases, but add an indication (using some derived attribute) of which variation of ET1 it is, and filter inside the aggregator to keep only the right variation of ET1. Both are inefficient: the first because many more event types need to be managed, the second because many more events are transmitted to each EPA only to be filtered out, and the order also becomes important here.
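To make the first choice concrete, here is a minimal sketch of pure event-type subscription, where the flow is implied by the derived types ET2 and ET3. Everything except the names ET1, ET2 and ET3 is an assumption made for illustration: the `TypeBus` class, the `amount`, `store` and `region` attributes, the threshold of 100, and the in-memory lookup table standing in for the database.

```python
class TypeBus:
    """Hypothetical bus: every agent subscribes to exactly one event type."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, event_type, agent):
        self.subscribers.setdefault(event_type, []).append(agent)

    def publish(self, event):
        # Delivery depends on the event type only; the flow is implicit.
        for agent in self.subscribers.get(event["type"], []):
            agent(event)


def run_type_based(events):
    bus = TypeBus()
    totals = {}
    regions = {"s1": "north", "s2": "south"}  # stand-in for the database

    def filter_agent(event):
        # A filtered-in ET1 event is re-emitted as a derived ET2 event.
        if event["amount"] >= 100:
            bus.publish({**event, "type": "ET2"})

    def enrich_agent(event):
        # The enriched event needs yet another type, ET3.
        region = regions.get(event["store"], "unknown")
        bus.publish({**event, "type": "ET3", "region": region})

    def aggregate_agent(event):
        totals[event["region"]] = totals.get(event["region"], 0) + event["amount"]

    bus.subscribe("ET1", filter_agent)
    bus.subscribe("ET2", enrich_agent)
    bus.subscribe("ET3", aggregate_agent)
    for event in events:
        bus.publish(event)
    return totals, sorted(bus.subscribers)
```

Even in this toy pipeline, three event types have to be defined and managed for what is logically a single kind of event.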

The explicit EPN resolves this by having each EPA send its output to a channel, and the channel can route according to source, type, assertion, etc. Thus a specific output terminal of a channel is really the topic to which an EPA subscribes. Note that all the possibilities mentioned before are just special cases of an EPN, and if one insists, such an EPN can be constructed. In the extreme case, one can construct an EPN with a single channel that routes every event to every EPA, leaving each EPA to decide whether it wants to use the event or not, but I would not recommend it as a good design pattern. More later.
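The explicit wiring can be sketched as follows. This is only an illustration under assumptions, not the EPN definition from the EPIA book: the `Channel` class, its `connect` and `publish` methods, the `amount`, `store` and `region` attributes, and the lookup table are all hypothetical names. The point it shows is that the event keeps the single type ET1 throughout, and the route is decided by which channel terminal an agent is connected to.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Agent = Callable[[Event], None]


@dataclass
class Channel:
    """Hypothetical channel: each output terminal has a routing condition."""
    terminals: List[Tuple[Callable[[Event], bool], Agent]] = field(default_factory=list)

    def connect(self, agent, accepts=lambda event: True):
        self.terminals.append((accepts, agent))

    def publish(self, event):
        for accepts, agent in self.terminals:
            if accepts(event):
                agent(event)


def make_filter(assertion, out):
    def agent(event):
        if assertion(event):
            out.publish(event)  # same event, same type, next channel
    return agent


def make_enricher(lookup, out):
    def agent(event):
        region = lookup.get(event["store"], "unknown")
        out.publish({**event, "region": region})
    return agent


def run_epn(events):
    raw, filtered, enriched = Channel(), Channel(), Channel()
    totals = {}

    def aggregator(event):
        totals[event["region"]] = totals.get(event["region"], 0) + event["amount"]

    # The flow is explicit: raw -> filter -> enrich -> aggregate.
    raw.connect(make_filter(lambda e: e["amount"] >= 100, filtered),
                accepts=lambda e: e["type"] == "ET1")
    filtered.connect(make_enricher({"s1": "north", "s2": "south"}, enriched))
    enriched.connect(aggregator)

    for event in events:
        raw.publish(event)
    return totals
```

The aggregator never sees filtered-out or unenriched events, and no derived event types are needed.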

Monday, December 13, 2010

On Hadoop and event processing

The region I live in has not had much luck recently: first the big fire on the Carmel ridge, which lasted for three and a half days until it was brought under control, and now a major storm, with winds of more than 100 km/h and a lot of rain. These two pictures, from Israeli news sites on the Internet, were taken in Haifa yesterday. The storm is now over and some nicer days are ahead of us.

Back to professional issues -- Alex Alves (who, among other things, represents Oracle in the EPTS Steering Committee) wrote a nice posting in his blog explaining the Hadoop programming model. If you are still not familiar with it, the posting provides a good explanation.

Hadoop is batch oriented and provides a rather imperative programming model, but it can be wrapped and concealed by a higher-level language. I am now working with a graduate student who is investigating the usability of the map-reduce model for some event processing functions (e.g. aggregation). I am curious to see the analysis from this work. More later.
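For readers who have not seen the model, here is a minimal sketch of aggregation expressed in map-reduce style. It is plain Python and does not use the Hadoop API, and it is not the approach taken in the student's work; the function names and the `type` and `amount` attributes are assumptions made for illustration.

```python
from collections import defaultdict


def map_event(event):
    # Map phase: emit a (key, value) pair for each input event.
    yield event["type"], event["amount"]


def shuffle(pairs):
    # Shuffle phase: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    # Reduce phase: collapse the values of one key into an aggregate.
    return key, {"count": len(values), "sum": sum(values)}


def aggregate(events):
    pairs = (pair for event in events for pair in map_event(event))
    return dict(reduce_group(key, values) for key, values in shuffle(pairs).items())
```

Note that the sketch runs over a finite collection of events, which reflects the batch orientation mentioned above: the reduce step only starts once all the input has been mapped.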