Wednesday, December 15, 2010

Revisiting EPN

This illustration, taken from the EPIA book and drawn by Peter Niblett, is a portion of the EPN (event processing network) that describes the "Fast Flower Delivery" example that accompanies the book. In an internal discussion today somebody asked why we need an EPN at all, rather than the alternative used in Amit and other places: each EPA (event processing agent) subscribes to an event type; whenever an event of that type is detected, the appropriate EPA listens to it and processes it; the entire event flow is implicit, and the person defining the system does not need to worry about it.

Since this is actually a good question, I wanted to share my response. There are two main reasons why our thinking has shifted to the EPN model: efficiency and usability.

I'll start with usability. Experience shows (and this observation is also true of inference-based systems) that people feel more comfortable when they can control the flow rather than rely on implicit flows: they understand better what the system does, can debug and validate it more easily, and trust such systems more. Note that an EPN is not a workflow; it does not represent control flow, it represents event streaming flow (in a way similar to data flow, with some semantic distinctions).

The other reason is efficiency. If an EPA subscribes to an event type, then either the EPA has to process and filter out a substantial number of irrelevant events, or the number of event types has to grow substantially. Imagine the following scenario: an event of type ET1 arrives. First it meets a filter that filters out many of the events using some assertion, and then various EPAs process only the filtered-in events. One of these EPAs is an enrichment that adds some information from a database, and the enriched event is then sent to an aggregator for further processing. If we use the "event type" subscription, there are two choices. The first: create an event type ET2 for the filtered-in events, identical to ET1, and create a derived event of type ET2 for each filtered-in event of type ET1; then create an event type ET3 for the enriched event with the added attribute. Then each EPA indeed subscribes to a single event type. The second: use ET1 for all three cases, but add an indication (using some derived attribute) of which variation of ET1 it is, and filter inside the aggregator so that it sees only the right variation of ET1. Both are inefficient: the first because many more event types have to be managed, the second because many more events are transmitted to each EPA only to be filtered out, and the order also becomes important here.
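
To make the cost of the second choice concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from Amit or the EPIA book: the `TypeBus` and `Agent` classes, the `stage` attribute that marks the variation of ET1, the sample events, and the amount threshold. It only counts how many events each EPA receives versus how many it actually processes.

```python
from collections import defaultdict


class TypeBus:
    """Hypothetical broker: every subscriber of a type gets every event of that type."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, agent):
        self.subscribers[event_type].append(agent)

    def publish(self, event):
        for agent in self.subscribers[event["type"]]:
            agent.receive(event, self)


class Agent:
    """Hypothetical EPA that has to check the derived 'stage' attribute itself."""

    def __init__(self, name, wanted_stage, handler):
        self.name = name
        self.wanted_stage = wanted_stage
        self.handler = handler
        self.received = 0
        self.processed = 0

    def receive(self, event, bus):
        self.received += 1
        if event["stage"] != self.wanted_stage:
            return  # wrong variation of ET1: transmitted, then thrown away
        self.processed += 1
        derived = self.handler(event)
        if derived is not None:
            bus.publish(derived)


REGIONS = {"c1": "north", "c2": "south"}  # stand-in for the database lookup
totals = defaultdict(int)


def keep_large(event):
    # The filter assertion (assumed here: amount of at least 100).
    return {**event, "stage": "filtered"} if event["amount"] >= 100 else None


def enrich(event):
    return {**event, "stage": "enriched", "region": REGIONS[event["customer"]]}


def aggregate(event):
    totals[event["region"]] += event["amount"]
    return None


bus = TypeBus()
agents = [
    Agent("filter", "raw", keep_large),
    Agent("enrich", "filtered", enrich),
    Agent("aggregate", "enriched", aggregate),
]
for agent in agents:
    bus.subscribe("ET1", agent)  # all three subscribe to the same event type

for customer, amount in [("c1", 50), ("c2", 150), ("c1", 20), ("c1", 300)]:
    bus.publish({"type": "ET1", "stage": "raw", "customer": customer, "amount": amount})

for agent in agents:
    print(agent.name, "received", agent.received, "processed", agent.processed)
```

In this toy run every EPA is handed all eight ET1 events (four raw, two filtered, two enriched), although the enrichment and the aggregator each need only two of them.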

The explicit EPN resolves this: each EPA sends its output to a channel, and the channel can route according to source, type, assertion, etc. Thus a specific output terminal of a channel is really the topic to which an EPA subscribes. Note that all the possibilities mentioned before are just special cases of an EPN, and if one insists, such an EPN can be constructed. In the extreme case, one can construct an EPN with a single channel that routes every event to every EPA and leaves each EPA to decide whether it wants to use the event or not, but I would not recommend it as a good design pattern. More later.
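
As a rough illustration of explicit routing, the filter, enrichment and aggregator scenario described earlier in this post can be wired as a tiny EPN in Python. This is only a sketch under assumptions: the `Channel` and `EPA` classes, the predicate-based `connect` method, the sample events, and the unrelated "ET9" type are all hypothetical names invented for the example, not an API from the EPIA book or any product.

```python
from collections import defaultdict


class Channel:
    """Hypothetical channel: each (predicate, consumer) pair is one output terminal."""

    def __init__(self):
        self.terminals = []

    def connect(self, consumer, predicate=lambda event: True):
        self.terminals.append((predicate, consumer))

    def send(self, event):
        for predicate, consumer in self.terminals:
            if predicate(event):  # routing decision is made by the channel
                consumer.receive(event)


class EPA:
    """Hypothetical EPA: processes what it is given and emits to its output channel."""

    def __init__(self, name, handler, output=None):
        self.name = name
        self.handler = handler
        self.output = output
        self.received = 0

    def receive(self, event):
        self.received += 1
        derived = self.handler(event)
        if derived is not None and self.output is not None:
            self.output.send(derived)


REGIONS = {"c1": "north", "c2": "south"}  # stand-in for the database lookup
totals = defaultdict(int)


def keep_large(event):
    # The filter assertion (assumed here: amount of at least 100).
    return event if event["amount"] >= 100 else None


def enrich(event):
    return {**event, "region": REGIONS[event["customer"]]}


def aggregate(event):
    totals[event["region"]] += event["amount"]
    return None


# Explicit wiring: ingress -> filter -> enrich -> aggregate.
ingress, filtered, enriched = Channel(), Channel(), Channel()
filter_epa = EPA("filter", keep_large, output=filtered)
enrich_epa = EPA("enrich", enrich, output=enriched)
aggregate_epa = EPA("aggregate", aggregate)

ingress.connect(filter_epa, lambda e: e["type"] == "ET1")  # route by type
filtered.connect(enrich_epa)
enriched.connect(aggregate_epa)

for customer, amount in [("c1", 50), ("c2", 150), ("c1", 20), ("c1", 300)]:
    ingress.send({"type": "ET1", "customer": customer, "amount": amount})
ingress.send({"type": "ET9", "customer": "c2", "amount": 999})  # never reaches an EPA

for epa in (filter_epa, enrich_epa, aggregate_epa):
    print(epa.name, "received", epa.received)
```

Here the event type stays ET1 throughout, no stage attribute is needed, and each EPA receives only the events routed to it: four for the filter and two each for the enrichment and the aggregator.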


Alessandro said...

Hi Opher,

as far as efficiency is concerned, I completely agree with you. However, I am not convinced that offering users the possibility to define their EPNs explicitly is the best choice in terms of usability.
Indeed, to work with EPNs, you need to take into account a large number of variables (which basic EPAs you are using, the order of EPAs a given flow has to pass through, etc.). Moreover, wrong design decisions may impact system performance.
In my opinion, it would be better to try to "raise the level of abstraction", providing a language to specify which "situations" or "complex events" we are interested in through a set of declarative rules (as in the case of Amit and other systems), and let the system "compile" these rules into EPNs.
I see two advantages of this approach: 1) users don't have to care about low-level details; 2) the system can optimize the EPN when it is created (at "compile time"), taking into account several variables (reuse of existing EPAs, load balancing in the presence of different processing nodes, connectivity between nodes, etc.).

I really like the language of Amit, and I believe it represents a step forward in the usability of event processing systems.
So, can't we combine the two worlds? A high level declarative language, which is automatically compiled into efficient EPNs? Thanks.


Opher Etzion said...

Hi Alessandro. Thanks for your response. I think that we should have multiple types of interfaces. A high-level language can be used in many cases, and we should also allow using the flow-based interface when it is easier; I have noticed that some people like a visual development environment and some like a textual one. All this relates to technical people, of course; for semi-technical users we need a totally different kind of abstraction.



Rainer von Ammon said...

At the beginning of this year, 2010, we already had an interesting discussion when reviewing the EPIA book. I remember the two threads:

This was actually the job of table 1 of our WorldCafe at our edBPM/U-CEP workshop 13 Dec 2010 in Gent about:

"Grand challenges for modelling of Complex Dynamics and for execution platforms of edBPM and U-CEP"

The discussion is not yet finalized; in fact it has not even started, and it would be a nice job for a Master's or even a PhD thesis.

If I were still teaching a university course, I would ask some student teams to model the FFD application and make it executable:

- as an OOAD model
- as a BPMN model
- perhaps also as UML based models, starting from Use Case diagram to Activity diagram, perhaps using State diagrams where needed etc.
- then as EER model as used in the EPIA book

and then let's discuss the findings and results and what is missing where

or so...

Opher Etzion said...

Hello Rainer.

Right - we are still looking at the most appropriate modeling abstractions that can capture EP applications. I guess that there is not a single solution, as you mention, but we need a methodology.
I'll write further postings about this issue soon.

Happy new year,


Alex said...


Could you explain, or give a reference to an explanation of, those "semantic distinctions", please? You wrote: "Note that EPN is not a workflow, it does not represent control flow, it represents event streaming flow (in a way similar to data flow, with some semantic distinctions)."

Thanks and a Happy New Year,