Event Processing Thinking: EPN


Saturday, July 27, 2013

Taking the complex out of complex event processing

The quote of this week is taken from an article in InformationAge that talks about operational intelligence.

The article explains what operational intelligence means, and you can read it to see if you find anything new.

The point of this post is a quote by Ivan Casanova from TIBCO:

"We should all be focused on taking the 'complex' out of complex event processing"

This quote is in the context of explaining TIBCO's acquisition of Streambase. I don't know Mr. Casanova personally, but what I have learned from his statement is that he believes that, going forward, the programming model and tools represented by Streambase are a better fit, and less complex to use, than what TIBCO has done before, when it extended a RETE-based business rules system to handle stateful event processing cases while retaining the rule-based programming model. Streambase uses an "event flow" model that is a variation of the event processing network. Without getting into an analysis of specific products (a restriction I have taken upon myself in this Blog), I would say that, as a conceptual model for event processing, I believe in the EPN model (which belongs to the family of data flow models), and in a visual working environment (rather than a textual one) for design and programming. This reduces the complexity for IT developers, which I think is a very important trend. The ultimate reduction of complexity requires one more step: event processing modeling at the business-user level, with automatic translation to an implementation language.

Bottom line: I agree with the statement in the quote -- actually this is my main area of interest nowadays.

Saturday, June 16, 2012

On the world wide event processing network

In a recent post on the complexevents site, David Luckham and Roy Schulte write about "complex event processing and the future of business decisions". There are some examples, and some analysis of the current market, but I would like to write about a single sentence in this article: " In the early days of the Internet, some communication experts remarked that there was theoretically only one network in the world, although some segments (subnets) hadn’t be connected into the whole yet. A similar thing can now be said about EPNs: there is theoretically only one EPN in the world, although some stove-pipes are not yet tied in – and some never will be".

This draws a similarity between the WWW and the world of events. In the WWW we view the sites as the nodes in the graph and the links as edges. Typically we draw an EPN with event processing agents as the nodes and event streams as the edges, but to draw the analogy with the WWW we may need to switch the roles: make events the nodes, and agents the links that create other events; we can also add other types of causality relations among events as edges in the graph. The idea that all events (raw or derived) in the universe are conceptually linked within a single network has been mentioned in David Luckham's idea of "holistic event processing". This can be thought of as an active addition to the WWW that will make the world situation-aware. It will require standardization at several levels, covering both the semantic and the interoperability aspects.

Wednesday, December 15, 2010

Revisiting EPN

This illustration, taken from the EPIA book and drawn by Peter Niblett, is a portion of the EPN that describes the "Fast Flower Delivery" example that accompanies the book. In an internal discussion today somebody raised the question of why we need an EPN at all, rather than the alternative that has been used in Amit and in other places: each EPA subscribes to an event type, and whenever an event of this type is detected, the appropriate EPA listens to it and processes it; all the event flow is implicit, and the person defining the system does not need to worry about it.

Since this is actually a good question, I wanted to share my response. There are two main reasons why our thinking has shifted to the EPN model: efficiency and usability.

I'll start with usability. Experience shows (and this observation is also true for inference-based systems) that people feel more comfortable when they can control the flow rather than having implicit flows: they better understand what the system does, can better debug and validate it, and trust it more. Note that an EPN is not a workflow; it does not represent control flow, but the flow of event streams (similar to data flow, with some semantic distinctions).

The other reason is efficiency. If an EPA subscribes to an event type, then either the EPA has to process and filter out a substantial number of irrelevant events, or the number of event types may increase significantly. Imagine the following scenario: an event of type ET1 arrives; first it meets a filter that filters out many of the events using some assertion, and then various EPAs process only the filtered-in events. One of these EPAs is an enrichment, which adds some information from a database, and the enriched event is then sent to an aggregator for further processing. If we use "event type" subscription, there are two choices. The first is to create an event type ET2 for the filtered-in events, identical to ET1, and to derive an event of type ET2 for each filtered-in event of type ET1; then to create an event type ET3 for the enriched event with the added attribute, so that each EPA indeed subscribes to a single event type. The second choice is to use ET1 for all three cases, but add an indication (using some derived attribute) of which variation of ET1 each event is, and filter inside the aggregator to get only the right variation of ET1. Both are inefficient: the first due to the need to manage many more event types, the second because many more events are transmitted to each EPA only to be filtered out, and the order also becomes important here.

The explicit EPN resolves this, since each EPA sends its output to a channel, and the channel can route according to source, type, assertion etc., so a specific output terminal of a channel is really the topic to which an EPA subscribes. Note that all the possibilities mentioned before are just special cases of an EPN, and if one insists, such an EPN can be constructed; in the extreme case, one can construct an EPN with a single channel that routes every event to every EPA, which then decides whether it wants to use it or not, but I would not recommend this as a good design pattern. More - later.
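A minimal Python sketch of the ET1 scenario above, under stated assumptions: the `Channel` class, the filter threshold of 100 and the `CUSTOMER_REGION` table (standing in for the enrichment database) are all hypothetical, not part of any product or of the book. Because the channel's output terminals route on the emitting source, every stage keeps emitting type ET1, no ET2/ET3 types are created, and no EPA receives events it has to discard.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    type: str       # the same type, "ET1", is reused at every stage
    source: str     # name of the processing element that emitted the event
    payload: dict


@dataclass
class Channel:
    # Each output terminal pairs a routing condition with a target callable.
    terminals: list = field(default_factory=list)

    def add_terminal(self, condition: Callable[[Event], bool], target: Callable[[Event], None]) -> None:
        self.terminals.append((condition, target))

    def send(self, event: Event) -> None:
        # Route the event, unchanged, to every terminal whose condition holds.
        for condition, target in self.terminals:
            if condition(event):
                target(event)


channel = Channel()
aggregated = []

# Hypothetical lookup table standing in for the enrichment database.
CUSTOMER_REGION = {"c1": "north", "c2": "south"}


def filter_agent(e: Event) -> None:
    # Filter EPA: keep only events whose amount exceeds an assumed threshold of 100.
    if e.payload["amount"] > 100:
        channel.send(Event("ET1", "filter", dict(e.payload)))


def enrich_agent(e: Event) -> None:
    # Enrichment EPA: add a region attribute; the output is still of type ET1.
    region = CUSTOMER_REGION.get(e.payload["customer"], "unknown")
    channel.send(Event("ET1", "enrich", dict(e.payload, region=region)))


def aggregator(e: Event) -> None:
    aggregated.append(e.payload)


# Terminals route by source, which plays the role of the "topic" each EPA subscribes to.
channel.add_terminal(lambda e: e.source == "producer", filter_agent)
channel.add_terminal(lambda e: e.source == "filter", enrich_agent)
channel.add_terminal(lambda e: e.source == "enrich", aggregator)

for customer, amount in [("c1", 50), ("c1", 150), ("c2", 300)]:
    channel.send(Event("ET1", "producer", {"customer": customer, "amount": amount}))

print(aggregated)
```

Routing on `source` is only one choice; a terminal condition could equally test the event type or an assertion on the payload.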

Monday, October 5, 2009

A simple example of agents and contexts

For the last few days, dishes in my house have been washed by hand. Our dishwasher has shown signs of old age, so I bought a new one. The first step was to order delivery from the store. There is the software engineering principle of "separation of concerns", so those who deliver don't install. I am not allowed to install it myself, since if I open the package I lose the warranty; only a technician of the service company related to the importer is authorized to open the package. After some coordination this technician arrived today, looked at our kitchen and said: "this is very common, they have changed the standard, now there is a thermostat on the pipe, so the hole that you need in the kitchen cabinet is larger". Since there is a separation of concerns, he does not make holes, and I had to call the carpenter, who came later today, so I'll be able to call the technician again tomorrow... This is the result of the fact that the event "the pipe standard changed, which affects the size of the hole" was not reported by the store, since they don't install dishwashers and probably don't know about it. This separation of concerns, and not seeing events as part of a larger story, wasted a lot of time and money for society...

Anyway -- enough complaining.

I got a question from Hans, related to my posting on context; I am copying it here:

" I would be interested to hear about how the context concept works in a particular use case: Let's say I have events that contain, among other things, a number. I would like to capture the third quartile (a data point that sits immediately above 75% of the other points) of the number, every hour. I would then like to perform some operation over the previous 8 of these hourly numbers.

How would this be expressed as contexts and agents?"

OK -- here it is:

We start with an event of type E1 that flows into the system from some producer and has a numeric attribute A, for which we want to capture the third quartile.

First we define a context C1 of type sliding fixed temporal interval, with both duration and period of 1 hour.
Then we define an agent A1, which is valid within C1, calculates the third quartile and produces an event of type E2, with an attribute B that captures the derived value.
Next we define a context C2 that is a sliding event interval over E2 events, with both the event count period and the event count duration equal to 8.
Finally we define an agent A2 that subscribes to events of type E2, is valid within context C2, performs some operation on the E2 events, and may in turn produce (say) an event of type E3, which flows to some consumer.
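The steps above can be sketched in Python as plain functions, under several assumptions: the function names are hypothetical, the third quartile uses the nearest-rank definition (other quartile definitions exist), E1 events are `(timestamp_in_seconds, A)` pairs, and the "some operation" of A2 is taken to be a mean. This is a sketch of the context/agent decomposition, not an event processing engine.

```python
import math
from collections import deque
from statistics import mean

HOUR = 3600  # seconds


def third_quartile(values):
    # Nearest-rank definition (an assumption): the smallest value that is
    # greater than or equal to at least 75% of the values.
    ordered = sorted(values)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def c1_windows(e1_events):
    # Context C1: fixed temporal intervals with duration = period = 1 hour,
    # so each E1 event (timestamp, A) falls into exactly one window.
    windows = {}
    for ts, a in e1_events:
        windows.setdefault(ts // HOUR, []).append(a)
    return [windows[k] for k in sorted(windows)]


def agent_a1(e1_events):
    # Agent A1, valid within C1: one E2 event per window, attribute B = third quartile of A.
    return [{"type": "E2", "B": third_quartile(w)} for w in c1_windows(e1_events)]


def agent_a2(e2_events, duration=8, period=8):
    # Agent A2, valid within C2: a sliding event-count window over E2 events.
    # With duration == period == 8 the windows do not overlap; period=1 would
    # instead emit on every new hourly value once 8 values are available.
    window = deque(maxlen=duration)
    e3_events = []
    for count, e2 in enumerate(e2_events, start=1):
        window.append(e2["B"])
        if count % period == 0 and len(window) == duration:
            # Hypothetical "some operation": the mean of the 8 hourly quartiles.
            e3_events.append({"type": "E3", "value": mean(window)})
    return e3_events


# 16 hours of E1 events, four per hour, with A = 10*hour + 1 .. 10*hour + 4.
e1 = [(h * HOUR + i * 60, 10 * h + i) for h in range(16) for i in range(1, 5)]
e2 = agent_a1(e1)
e3 = agent_a2(e2)
print([e["B"] for e in e2[:3]], [e["value"] for e in e3])
```

In this sketch an hour with no E1 events produces no window and hence no E2 event; a real temporal context might instead emit an empty window.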

This is the basic model. Of course, it is a bit more complicated if our agent library does not contain an agent that calculates quartiles (this may not be a basic function): the calculation can then be composed from a combination of agents, since an agent can be recursive and include a mini-EPN, but we can still model it as a single agent in the high-level view.

The nice thing about this abstraction is that it is quite simple to model such problems... More - Later.

Sunday, September 13, 2009

On event channels

Last week, the Disney Channel arrived on the Israeli cable system and enriched the set of existing children's channels; so, speaking about channels, it is a good time to discuss another type of channel: the event channel, which is discussed in chapter 6 of the EPIA book draft. Some people view a channel as an edge in the event processing graph, but we view a channel as a type of node, since it has some processing associated with it. We define a channel as a processing element that receives events from one or more source processing elements (we refer to EPAs, producers and consumers as processing elements), makes routing decisions, and sends the input events unchanged to one or more target processing elements in accordance with these routing decisions. Note that, like the term Event Processing Agent, channels are abstractions and can be implemented in various ways (e.g. through messaging systems, buffers, persistent stores etc.). Channels are classified according to their routing schemes. Some of the common routing schemes are:

Fixed routing scheme: The channel has predefined input terminals wired to predefined processing elements, and predefined output terminals wired to predefined processing elements. Every event that is received on any input terminal is sent to all output terminals. Note that this type of channel can be defined implicitly.

Subscription-based: EPAs or consumers can subscribe to the channel dynamically. The routing decision is determined according to the list of subscribers that is valid at the time that a decision is made.

Itinerary-based: The identifier or identifiers of the sink's input terminal are obtained from some attribute in the event's payload. This is used to send an event to a specific consumer instance when the EPN node represents the consumer class.

Context-based: The channel makes routing decisions based on the context to which the EPA belongs. This is applicable to EPAs of the pattern detection ("complex event processing") type. The channel selects the appropriate run-time EPA based on the context defined in the pattern. I'll discuss contexts at length in one of the next postings, as this is the topic of the next chapter in the book.

Type-based: The channel makes routing decisions based on the event type of the event that is being routed.

Content-based: The routing decision is based on the content of the input event, as well as on context information; it can be phrased as assertions, rules, decision trees or decision tables.
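To make the routing schemes concrete, here is a minimal Python sketch in which a channel forwards events unchanged and only the routing function varies. Everything here is hypothetical and for illustration: events are plain dicts with a "type" key, targets are identified by name, and deliveries are recorded in a dictionary instead of invoking real processing elements.

```python
from typing import Callable, Dict, List

Event = dict                             # assumption: an event is a dict with a "type" key
Router = Callable[[Event], List[str]]    # returns the names of the target elements


class Channel:
    """Forwards each event, unchanged, to the targets chosen by its routing scheme."""

    def __init__(self, route: Router):
        self.route = route
        self.delivered: Dict[str, List[Event]] = {}

    def send(self, event: Event) -> None:
        for target in self.route(event):
            self.delivered.setdefault(target, []).append(event)


def fixed(targets: List[str]) -> Router:
    # Fixed: every event goes to all wired output terminals.
    return lambda event: list(targets)


class SubscriptionRouter:
    # Subscription-based: uses the subscriber list valid when the decision is made.
    def __init__(self) -> None:
        self.subscribers: List[str] = []

    def __call__(self, event: Event) -> List[str]:
        return list(self.subscribers)


def itinerary_based(attribute: str) -> Router:
    # Itinerary-based: the target consumer instance is named in the payload.
    return lambda event: [event[attribute]]


def context_based(partition_key: str, agent: str) -> Router:
    # Context-based: select the run-time instance of `agent` for the event's context.
    return lambda event: [f"{agent}[{event[partition_key]}]"]


def type_based(table: Dict[str, List[str]]) -> Router:
    # Type-based: look up targets by event type.
    return lambda event: table.get(event["type"], [])


def content_based(rules: List[tuple]) -> Router:
    # Content-based: (predicate, target) pairs evaluated against the event.
    return lambda event: [target for predicate, target in rules if predicate(event)]


subs = SubscriptionRouter()
sub_channel = Channel(subs)
sub_channel.send({"type": "Order"})      # no subscribers yet, so nobody receives it
subs.subscribers.append("auditor")
sub_channel.send({"type": "Order"})      # delivered to "auditor"

content = Channel(content_based([(lambda e: e["amount"] > 1000, "fraud_check"),
                                 (lambda e: True, "ledger")]))
content.send({"type": "Order", "amount": 5000})
content.send({"type": "Order", "amount": 10})

ctx = Channel(context_based("customer", "frustrated_customer"))
ctx.send({"type": "Complaint", "customer": "alice"})
print(sub_channel.delivered, content.delivered, ctx.delivered)
```

Keeping the channel itself identical across schemes mirrors the definition above: a channel never changes events, it only decides where they go.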

This is just the basic definition; in one of the next postings I'll show an example of how all these concepts fit together.

Monday, August 31, 2009

On conceptual and run-time EPN

I am now working in my spare time on completing the second third of the EPIA book, so I'll have several postings related to the next three chapters of the book, which are now in the "cleaning phase". The topic I'll discuss today is the concept of the EPN (Event Processing Network), which is a major concept in our book. The approach we have taken in the book is to explain event processing through a meta-language that provides the various event processing concepts, and to describe the event flow through a model based on the event processing network. We are now also completing an editor that will enable the reader to play with the meta-language. However, this meta-language is not an executable language (at least not in this phase), and thus we also show the readers how the same application described in the meta-language is implemented using various executable languages of different styles. The EPN described by the meta-language is a "conceptual EPN" that consists of logical EPAs, while the run-time EPN consists of run-time artifacts that implement the run-time instances of the EPAs.
The conceptual EPN can be mapped into physical implementation in various ways, as shown in this picture:

The traditional centralized implementation executes the entire EPN using a single run-time artifact, and the EPN describes the internal flow within this artifact.

When talking about distributed EPN, the EPN can be distributed according to several criteria:

Segment partition: All the EPAs that relate to "platinum customers" are being executed by one run-time artifact, all the EPAs that relate to "gold customers" are being executed by another run-time artifact etc...
Function partition: All the EPAs that perform a certain function are being executed by a unique run-time artifact
Location partition: All the EPAs that relate to events created in a certain location are executed by one run-time artifact.

These, of course, are just examples. The most distributed option is, of course, a direct mapping of each EPA instance to its own run-time artifact.
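A minimal Python sketch of the mapping from a conceptual EPN to run-time artifacts, assuming each logical EPA is tagged with its function, customer segment and location (the `EPA` class, the artifact names and the `STRATEGIES` table are hypothetical). The list of EPAs stays the same; only the partitioning function changes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EPA:
    name: str        # logical agent in the conceptual EPN
    function: str    # e.g. "filter", "aggregate"
    segment: str     # customer segment it serves, e.g. "platinum"
    location: str    # where its input events are created


# Each strategy maps a logical EPA to the name of a run-time artifact.
STRATEGIES = {
    "centralized": lambda epa: "engine",
    "segment": lambda epa: f"engine-{epa.segment}",
    "function": lambda epa: f"engine-{epa.function}",
    "location": lambda epa: f"engine-{epa.location}",
    "one-per-epa": lambda epa: f"engine-{epa.name}",
}


def deploy(epas, strategy):
    """Group the EPAs of a conceptual EPN into run-time artifacts."""
    artifacts = defaultdict(list)
    for epa in epas:
        artifacts[STRATEGIES[strategy](epa)].append(epa.name)
    return dict(artifacts)


epas = [
    EPA("platinum-filter", "filter", "platinum", "haifa"),
    EPA("platinum-count", "aggregate", "platinum", "london"),
    EPA("gold-filter", "filter", "gold", "haifa"),
    EPA("gold-count", "aggregate", "gold", "london"),
]
for strategy in STRATEGIES:
    print(strategy, deploy(epas, strategy))
```

Keeping the partitioning outside the EPA definitions is what lets the conceptual EPN serve design and validation while the run-time mapping is managed separately.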

The conceptual EPN is important for design and validation of the event processing application, while the run-time EPN is useful for control and management of the run-time.

More about EPNs and their components - in later postings.

Thursday, July 23, 2009

On logical and physical interpretations of EPN and EPA

My youngest daughter Daphna finished her summer course at the Technion last week, within the framework of the "science seeking youth" program. She took her first programming course using "Microworlds", a variation of the rather old Logo language; it is of course translated into a lower-level language when executed, but this fact is totally transparent to those who program in Microworlds. I am using this analogy since there seems to be some terminology discussion going on recently about the terms EPA and EPN. These terms were introduced in the past by David Luckham, who used them to describe a physical, operational view of an event processing application. Thus, an EPA is mapped in a 1-1 fashion to a software module, and the EPN describes the running software modules and the connections among them using physical channels; the first version of the EPTS glossary reflects this view.

However, the way I am using the terms EPN and EPA is slightly different: the physical view is of interest to system administrators, but for users, designers and developers the logical view is more relevant, so I am using these terms in a logical way and not a physical way. To demonstrate the difference, let's look at the following simple example: there are many patterns that relate to the management of a call center; one of them is frustrated customer detection: if a gold customer complains three times within a single day (possibly on multiple issues), then a supervisor should call this customer immediately.
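The frustrated customer pattern can be sketched as one logical function in Python. The function name, the `(customer_id, datetime)` event shape and the reading of "a single day" as a calendar day are assumptions; a sliding 24-hour window would be an equally valid context. The function says nothing about which software module executes it.

```python
from collections import defaultdict
from datetime import datetime


def frustrated_customers(complaints, gold_customers, threshold=3):
    """Detect gold customers who complain `threshold` times within a single day.

    complaints: iterable of (customer_id, datetime) Complaint events.
    The context is one partition per (customer, calendar day), an assumption.
    """
    counts = defaultdict(int)
    alerts = []
    for customer, when in sorted(complaints, key=lambda c: c[1]):
        if customer not in gold_customers:
            continue                        # the pattern applies only to gold customers
        key = (customer, when.date())
        counts[key] += 1
        if counts[key] == threshold:        # fire once, on the third complaint
            alerts.append(key)              # derived event: a supervisor should call now
    return alerts


complaints = [
    ("alice", datetime(2009, 7, 23, 9, 0)),
    ("bob", datetime(2009, 7, 23, 9, 30)),
    ("alice", datetime(2009, 7, 23, 11, 15)),
    ("carol", datetime(2009, 7, 23, 12, 0)),
    ("carol", datetime(2009, 7, 23, 12, 5)),
    ("carol", datetime(2009, 7, 23, 12, 10)),
    ("alice", datetime(2009, 7, 23, 16, 40)),
    ("bob", datetime(2009, 7, 24, 8, 0)),
]
print(frustrated_customers(complaints, gold_customers={"alice", "bob"}))
```

Here Bob's two complaints fall on different days and Carol is not a gold customer, so only Alice triggers the pattern.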

However, there is a spectrum of ways that this application can be implemented in reality:

It is possible to have a centralized implementation with a single software module that executes all the different functions within this application, and actually the EPN is internal to this module;
At the other extreme, we can have a software module that implements each single function instance; for example, one agent detects the frustrated customer pattern for Alice, while a different agent detects the frustrated customer pattern for Bob.
Another possibility is a context-oriented implementation: all patterns related to Alice are processed within a single software module.
Yet another possibility is a functional partition: there is a single module for detecting the frustrated customer pattern for all customers.
Other combinations are also possible.

Should the user / system designer / developer care about it and build a different EPN for each variation? In the past, when event processing was hard-coded in general-purpose programming languages, the logical EPN was also the physical EPN, but one of the gains from using dedicated event processing languages is the ability to abstract the implementation away.
The actual mapping of functions to software modules is left to an optimizer, and can be changed dynamically based on changes in system behavior, load balancing etc. Actually, the paper we presented at DEBS 2009 is part of such an optimization scheme. Thus, I use the term EPA for a single logical function and not necessarily a software module. In the EPIA book we are building our entire concept on a logical-level meta-language that can be translated to various implementations, and even programming styles. As said, there is also interest in the physical realization of an EPN, but it is mainly of interest to system administrators and implementers of event processing products, and it should be transparent to the users of event processing applications. More on this topic - later.

Thursday, May 21, 2009

On EPN and N-Tier Architecture

This week I watched a theater play called The Boys Next Door (in Hebrew, of course), which deals with the life of four retarded young persons who share an apartment (well - Wikipedia calls them mentally disabled, so I realize that retarded is not a politically correct word these days; however, it is a matter of culture: in Hebrew we don't do word laundering for political correctness, so I am quite ignorant of politically correct words). The play itself is very good and thought provoking.

Anyway, I have realized that I have not written for a while; this week has been quite busy, and next week I am travelling again for a few days, this time to Europe. I also had to do some catching up on the community Blogs, and found an interesting posting by Paul Vincent about N+1 Tier Architecture, describing a multi-tier architecture that starts with a messaging tier, moves through a filtering and preprocessing tier, then to a distribution tier (data grid, caching), then to event processing agents, which are by themselves multi-tier, then to a process tier, and then to a persistence tier.

To me it seems a somewhat complicated way to look at the universe; actually I see two different architectures here: infrastructure and event processing platform. I view data grids, messaging, communication and caches as part of a general infrastructure that is not specific to event processing and is used for other purposes as well; various functions of the event processing network are implemented using those infrastructure parts, which in turn are implemented on top of operating systems etc. Thus, the architecture of this infrastructure is really not part of the event processing architecture at all. Getting to the event processing architecture: this includes the tiers mentioned as filtering, preprocessing and event processing agents. Actually, I prefer to look at it as a "network", not as a "hierarchy", where each node in the network may be exploded into another network; the reason is that there is really going back and forth. Let's take an example: a producer produces events, the events are filtered, and then the filtered events go through EPA1; the events derived by EPA1 can then go to EPA2, and then they need to be filtered again, and the result of this filtering is consumed by a consumer that applies some non-event-processing service, which is itself an event producer that emits events that go into the same EPA2 in the same EPN as additional input. So if we adopt the layered approach, our quite modest application has already gone back and forth through four or five layers; while there is nothing conceptually wrong with that, it seems a somewhat complex way to think about it, but maybe I am too simple-minded... More - Later.
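One way to see why the example fits a network better than a stack of tiers is to write its event flow as a directed graph and try to order it into strictly sequential layers. The sketch below is a hedged illustration using Python's standard `graphlib` module (Python 3.9+); the node names `filter1`, `filter2` and `EPA1`/`EPA2` are taken loosely from the example, and treating "layering" as a topological order is an assumption made for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Event flow of the example: each node maps to the nodes it receives events from.
flow = {
    "filter1": {"producer"},
    "EPA1": {"filter1"},
    "EPA2": {"EPA1", "consumer"},   # the consumer's own events come back as extra input
    "filter2": {"EPA2"},
    "consumer": {"filter2"},
}


def layer_order(graph):
    """Try to arrange the nodes as strictly ordered tiers; return None if a cycle prevents it."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # graphlib stores the detected cycle as the second element of args.
        print("feedback loop:", err.args[1])
        return None


print(layer_order(flow))                        # None: EPA2 -> filter2 -> consumer -> EPA2
print(layer_order(dict(flow, EPA2={"EPA1"})))   # without the feedback edge, a simple chain
```

Removing the feedback edge makes a layered order possible again, which suggests the "back and forth" comes from the application itself rather than from the notation.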

Tuesday, May 12, 2009

On Gartner's EPN Reference Architecture

Today is a holiday (for children; no vacation for adults...) called Lag Baomer; the highlight (besides not going to school) is that last night all the children gathered around bonfires, as seen in the picture. Fun.

Recently Gartner has published a report called "A Gartner Reference Architecture for Event Processing Networks".

On the positive side, it seems that the concept of EPN as an underlying model for event processing is catching on. Readers of the Blog may realize that I am of the opinion that we need an agreed-upon conceptual and execution model for event processing (the same role that the relational model plays in relational databases; however, I never believed that the relational model per se is also appropriate as the model behind event processing). The book I am writing now, "Event Processing in Action", centers on the notion of the EPN, with a deep dive into the construction of EPN-based applications.

Reading Gartner's report, I found some slight differences between the way they describe EPN and my own description. In the Gartner report they define a term called "dissemination network" that consists of event processing agents, channels and the event flow among them, and then they define an EPN to be a dissemination network + producers + consumers. I could not find any compelling reason to introduce the notion of a dissemination network. According to the definition we are using, an event processing network is a directed graph that has nodes for producers, channels, EPAs and consumers, and edges that determine the event flow among them. Another difference is that the Gartner report views event consumers and event producers as types of event processing agents. I have a slightly different opinion: I think that event producers and consumers are not really event processing agents, since an event processing agent is a software module that processes events and may generate more events. Event consumers and producers have nodes representing them in the EPN in order to make the event flow from and to them explicit; however, they are only proxies of the actual producer and consumer, and for the event processing network they are sources and sinks. The main difference is that EPA functionality is explicitly specified in the EPN definition, while what the producer and consumer do is a "black box". We don't want to include their functionality, since we don't want to extend the event processing language ad infinitum.
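The definition above (a directed graph whose nodes are producers, channels, EPAs and consumers, with producers as sources and consumers as sinks) can be sketched as a small Python data structure. The `EPN` class and its method names are hypothetical; producer and consumer nodes carry only their kind, reflecting that their behavior is a black box to the network.

```python
from dataclasses import dataclass, field

NODE_KINDS = {"producer", "channel", "epa", "consumer"}


@dataclass
class EPN:
    kinds: dict = field(default_factory=dict)    # node name -> node kind
    edges: set = field(default_factory=set)      # (source, target) event-flow edges

    def add_node(self, name: str, kind: str) -> None:
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.kinds[name] = kind

    def connect(self, source: str, target: str) -> None:
        self.edges.add((source, target))

    def problems(self) -> list:
        """Check that producers act only as sources and consumers only as sinks."""
        found = []
        for source, target in sorted(self.edges):
            for node in (source, target):
                if node not in self.kinds:
                    found.append(f"undeclared node: {node}")
            if self.kinds.get(target) == "producer":
                found.append(f"producer {target} has incoming events")
            if self.kinds.get(source) == "consumer":
                found.append(f"consumer {source} emits events")
        return found


epn = EPN()
for name, kind in [("sensor", "producer"), ("in", "channel"),
                   ("threshold-filter", "epa"), ("dashboard", "consumer")]:
    epn.add_node(name, kind)
epn.connect("sensor", "in")
epn.connect("in", "threshold-filter")
epn.connect("threshold-filter", "dashboard")
print(epn.problems())     # []
epn.connect("dashboard", "threshold-filter")
print(epn.problems())     # the consumer now emits events into the network
```

Under this reading, an application that both consumes and produces events would be represented by a separate producer node rather than by an outgoing edge from the consumer node.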

Mentioning the EPIA book: Chapter 3 is now on the Web and can be obtained through the MEAP program; this is the last chapter in the introductory part, and deals with the principles of programming with events. Chapter 4, the first in the deep dive, will be sent to the publisher soon. It has been much more challenging to write; it deals with what information we need to store about events. I'll Blog about it soon.

Monday, April 27, 2009

More on Revision

Long day today: I got to the office around 8AM and left around 9PM. Since we have a holiday this week, I am trying to condense the remaining days of the week, and the result is a long day with plenty of conference calls. The picture above is a glance (from below) at the IBM Haifa Lab (the pair of connected buildings on the right hand side of the picture); my office is in the back building (known as the "banana" due to its shape), and is not really in the nice part of the building, the one with the view of Haifa Bay. Well, one can't have everything in life :-)

I still need to complete the previous posting on revision. I gave some explanation about the concept of revision, and now I still need to discuss implementation of revision in event processing. To recall -- a revision in event processing is getting later knowledge that asserts that either a reported event did not really happen, or some information associated with the event was wrong.

Let's look at two separate cases, one in which the processing has not gone out of the event processing network, and second that the results of the processing have gone out to the "outside world".

In the first case, there may be an opportunity to revise the impact of the revised event by doing a kind of undo/redo for all the event processing agents that it passes through, directly or indirectly. The direct ones are easy: those in which the revised event participates as an input. The indirect ones are trickier, since we need to trace the causality among events. An event that is an output of an event processing agent in which the revised event participates (relating to the same context) has a causality relation to the original event; thus an event processing agent in which this derived event participates as an input also needs to do an undo/redo, and since causality is a transitive relation, this continues as far as the processing has propagated through the EPN so far. It should be noted that the existence of a causality relation may not require a real undo/redo. Take as an example an event of type E1 that designates a bid, and an event of type E2 that designates the bid with the maximal value that arrived in a certain time interval. Let's assume that a certain bid has been revised; however, neither the revised bid nor the revising bid changes the selection of E2.
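A minimal Python sketch of both points, under stated assumptions: the causality record `derived_from` (derived event id mapped to its input event ids) is a hypothetical structure that the EPN would have to maintain, and only revisions of an attribute value are handled, not revisions stating that an event never happened. `affected_events` follows causality transitively; `needs_redo` shows the bid example, where a revision may leave E2 unchanged.

```python
from collections import deque


def affected_events(derived_from, revised):
    """Return every event that depends, directly or transitively, on `revised`.

    derived_from maps each derived event id to the ids of its input events.
    """
    # Invert the map: for each event, which derived events used it as input.
    used_by = {}
    for output, inputs in derived_from.items():
        for event_id in inputs:
            used_by.setdefault(event_id, []).append(output)
    affected, queue = set(), deque([revised])
    while queue:                        # breadth-first walk of the causality graph
        for output in used_by.get(queue.popleft(), []):
            if output not in affected:
                affected.add(output)
                queue.append(output)
    return affected


def max_bid(bids):
    # The EPA that derives E2: the bid with maximal value in the interval.
    return max(bids, key=lambda bid: bid["value"])


def needs_redo(bids, revised_id, new_value):
    """Recompute E2 after a bid's value is revised; report whether E2 changed."""
    before = max_bid(bids)
    revised = [dict(b, value=new_value) if b["id"] == revised_id else b for b in bids]
    after = max_bid(revised)
    return (before["id"], before["value"]) != (after["id"], after["value"])


derived_from = {"e2-max": ["b1", "b2", "b3"], "alert": ["e2-max"], "report": ["x1"]}
bids = [{"id": "b1", "value": 10}, {"id": "b2", "value": 25}, {"id": "b3", "value": 40}]
print(affected_events(derived_from, "b2"))   # e2-max and alert, not report
print(needs_redo(bids, "b1", 20))            # False: b3 is still the maximum
print(needs_redo(bids, "b1", 50))            # True: b1 becomes the maximum
```

The walk only identifies candidates for undo/redo; as the bid example shows, each candidate EPA can still decide that its derived event is unaffected.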

The second case is that the revised event has consequences that have been sent to an external consumer; thus it may have triggered an action, a collection of actions, or a workflow that has been carried out, and this may propagate further ("the butterfly effect"). In this case we can treat it as "too late" and do nothing; however, there may be cases where it is critical to undo/redo the consequences as well, e.g. when the revised event has some financial meaning. In this case we'll need to issue compensation for the triggered action, which may be impossible (the consumer does not support compensation) or difficult. I'll blog again about revising the history and its aspects at a later phase.

Saturday, April 18, 2009

On Event Processing Building Blocks

Back to work for one day in the office, with five conference calls (one with Germany, one with France, one with the UK, and two with the USA...) and then back home for the weekend. When I have free time I like to read books; the current book I am reading is "A Lion Among Men", one of the books by Gregory Maguire, who writes stories that take famous children's stories as background (in this case the Wizard of Oz). Actually this is the third one set behind the scenes of the Wizard, now with the Lion as its main character. I have another book by the same author still waiting...

We also submitted the draft of chapter 3 of the "Event Processing in Action" book to the publisher; hopefully it will be posted on the MEAP site soon.

The approach we have taken in the book, as I have written before, is the "building block" approach: we describe event processing principles, together with a use case whose construction demonstrates how they are applied, using building blocks, which are like the chemical elements. The application itself is built from "definition elements", which are like atoms (my partner in writing this book, Peter Niblett, came up with the analogy from the world of chemistry). We believe that this is the right approach to teaching event processing: in the "deep dive" part of the book we dedicate a chapter to each of the seven major building blocks, and then dive deeper into the types of event processing agents (which deserve a separate discussion). We'll also provide samples of how each building block is realized in different models.

The seven building blocks are:

Event type: defines the event schema
Event producer: the projection of the event producer over the event processing network (note that the event producer itself is outside the scope)
Event consumer: same -- the projection of the event consumer over the EPN.
Event channel: the glue that holds the EPN together
Event processing agent: the brain that does the entire work; each agent is doing a specific task of processing.
Context: the semantic partition of events and agents
Event derivation: A building block that is possibly part of each EPA that specifies the derived event.
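As a rough illustration only, the seven building blocks could be written down as Python dataclasses acting as "definition elements". The class and field names below are hypothetical and do not reproduce the book's meta-language; they only show how the blocks reference one another (an EPA refers to input event types, a context and a derivation).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class EventType:          # defines the event schema
    name: str
    attributes: Dict[str, type]


@dataclass
class EventProducer:      # projection of an external producer onto the EPN
    name: str
    produces: List[str]   # names of event types


@dataclass
class EventConsumer:      # projection of an external consumer onto the EPN
    name: str
    consumes: List[str]


@dataclass
class EventChannel:       # routes events between processing elements
    name: str
    sources: List[str]
    targets: List[str]


@dataclass
class Context:            # semantic partition of events and agents
    name: str
    partition_by: Optional[str] = None      # e.g. a "customer" attribute
    window_seconds: Optional[int] = None    # e.g. 86400 for one day


@dataclass
class Derivation:         # specifies the event an EPA derives
    output_type: str
    compute: Callable[[List[dict]], dict]


@dataclass
class EventProcessingAgent:   # does the processing work, within a context
    name: str
    input_types: List[str]
    context: Context
    derivation: Optional[Derivation] = None


complaint = EventType("Complaint", {"customer": str, "issue": str})
desk = EventProducer("call-center-desk", ["Complaint"])
per_customer_day = Context("customer-day", partition_by="customer", window_seconds=86400)
count = Derivation("ComplaintCount", lambda events: {"count": len(events)})
agent = EventProcessingAgent("count-complaints", ["Complaint"], per_customer_day, count)
print(agent.derivation.compute([{"customer": "alice"}] * 3))   # {'count': 3}
```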

There are some more building blocks that are used to support these ones, but our claim is that this set of building blocks is what is needed to build an event processing application.

Chapter 4 which is in advanced phases of being written starts the deep dive by discussing the event type building block, and in one of the next posts I'll say more about it.

Saturday, April 11, 2009

Some footnotes to the forthcoming book "Event Processing in Action" - Take One

Last night, I went to see a movie (a rare event for me) and chose to see "Slumdog Millionaire"; my daughter told me later that people who have not read the book enjoyed it more. The movie is OK, even cute; however, for a movie that won 8 Academy Awards, I had somewhat higher expectations (compare it, for example, to "Gone with the Wind", which also won 8 Academy Awards). Well, the movie industry is probably not peaking these days...

Today, together with (most of) my tribe, we did some hiking in a place called "Judge River". Well, a river in local terms, with a modest amount of water, but with a bridge, a lot of trees, some flowers, and, since it is a holiday, a lot of people.

Now I am back home, and like every Sunday morning I plan to go to one of the coffee shops (I rotate among the coffee shops in Haifa, or, to be exact, those with free parking nearby) to work on revisions to the draft of chapter four of the "Event Processing in Action" book.

From time to time I'll blog some footnotes from behind the scenes of the book being written. Today I'll blog about several issues: scope, language and exercises.

Scope: The idea is to focus on teaching the event processing concepts step by step, using a use case that accompanies the book throughout, so the question is: what is the scope of event processing? We define this scope by defining the "event processing network", and thus the question that I started discussing in my previous posting is whether pre-processing and post-processing are part of the event processing network. While we have a chapter dedicated to event producers (and pre-processing) and another chapter dedicated to event consumers (and post-processing), the scope of what we discuss as part of the specification of the event processing part does not include what is done by the producers and consumers, whose projection on the EPN is the events they produce and consume. However, there is a case in which a consumer is also a producer, and this is important since there is a possible causality relationship between the event it consumes and the event it produces. As an example: the use case is about "fast flower delivery", and one of its functions is choosing the driver that will get the delivery among the drivers that have issued a bid. Some of the stores prefer automatic assignment by the system, and some want to get the bids and do the assignment on their own. The automatic assignment is definitely an EPA (Event Processing Agent), since it is software that performs some operation on events; the other assignment can be either manual or done by some external software the store uses, but either way it is not really part of the EPN, and thus it is not modelled by the system. We are, of course, interested in tracing the assignment to the bid that is the input to the store. This is also a good example to show that the same event type can include both raw events (the manual assignments are raw events from the EPN POV) and derived events (the automatic assignments).
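A minimal Python sketch of the last point, assuming a single hypothetical `Assignment` event type with a `derived` flag and a `bid_id` attribute for tracing. The selection criterion of the automatic EPA (lowest price) is an invented placeholder, not the rule used in the book's use case; the manual path only records the raw event the store reports.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Assignment:
    # A single event type for both origins of an assignment.
    store: str
    driver: str
    bid_id: str      # traces the assignment back to the bid it accepted
    derived: bool    # True: derived by the EPA; False: raw event reported by the store


def automatic_assignment(store: str, bids: List[dict]) -> Optional[Assignment]:
    """EPA for stores that prefer automatic assignment.

    Choosing the lowest-priced bid is a hypothetical criterion.
    """
    if not bids:
        return None
    best = min(bids, key=lambda bid: bid["price"])
    return Assignment(store, best["driver"], best["id"], derived=True)


def manual_assignment(store: str, driver: str, bid_id: str) -> Assignment:
    """Raw event: the store assigned a driver itself, outside the EPN."""
    return Assignment(store, driver, bid_id, derived=False)


bids = [
    {"id": "b1", "driver": "dana", "price": 30},
    {"id": "b2", "driver": "eli", "price": 25},
]
auto = automatic_assignment("store-a", bids)
manual = manual_assignment("store-b", "dana", "b1")
print(auto)
print(manual)
```

Both events carry a `bid_id`, so the link from an assignment back to its bid is available whether the assignment was derived inside the EPN or arrived as a raw event.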

Language: We decided neither to use any single language to explain the concepts nor to invent a new language. However, we believe that a purely theoretical discussion will not be enough. What we have decided to do is to take a "building block" approach, in which the different parts of the system (event types, event processing agents etc.) are specified using "definition elements", which are platform-independent concepts, or in other words, a meta-language. In each section we'll provide the full corresponding part of the application using this meta-language; in order to connect it to the "ground", we'll also show samples of these definitions in a variety of languages of various styles. Thus, chapter 4, which I am writing now, talks about defining the event schema. We define the schema using our "event type" building block, and will also show definitions in various schema languages (XML, positional relational-schema-like etc.); the same will go for all types of event processing agents. We intend to ask owners of existing languages (those who agree to have their languages analyzed by the EPTS event processing languages analysis -- taking on another hat) to provide definitions of our use case in their languages, and will check the possibility of posting them all.
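A minimal sketch of the "building block" idea, assuming a hypothetical `EventType` definition element that lists attribute names with abstract types. The two renderers show how the same platform-independent definition could be grounded in an XML Schema fragment and in a positional, relational-schema-like signature. The type mapping and the example names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple
from xml.sax.saxutils import quoteattr


@dataclass(frozen=True)
class EventType:
    """Platform-independent 'event type' definition element (hypothetical shape)."""
    name: str
    attributes: List[Tuple[str, str]]  # (attribute name, abstract type)


# Assumed mapping from abstract types to XML Schema built-in types.
XSD_TYPES = {"string": "xs:string", "integer": "xs:int", "timestamp": "xs:dateTime"}


def to_xml_schema(et: EventType) -> str:
    """Render the definition as a simplified XML Schema element declaration."""
    lines = [f"<xs:element name={quoteattr(et.name)}>",
             "  <xs:complexType><xs:sequence>"]
    for attr, typ in et.attributes:
        lines.append(f"    <xs:element name={quoteattr(attr)} "
                     f"type={quoteattr(XSD_TYPES[typ])}/>")
    lines.append("  </xs:sequence></xs:complexType>")
    lines.append("</xs:element>")
    return "\n".join(lines)


def to_positional(et: EventType) -> str:
    """Render the definition as a relational-schema-like positional signature."""
    cols = ", ".join(f"{attr} {typ}" for attr, typ in et.attributes)
    return f"{et.name}({cols})"
```

For example, `to_positional(EventType("DeliveryBid", [("bid_id", "integer"), ("driver", "string")]))` gives `DeliveryBid(bid_id integer, driver string)`.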

Last but not least are the exercises. As one of our targets is for the book to be a textbook for an academic course on event processing, we have decided to put exercises at the end of each chapter for the benefit of students and instructors (we also plan to provide slides in the future). One of the questions we agreed with the publisher to ask the reviewers (there is a formal review for each third of the book) is whether this is the right way, or whether it may make other readers uncomfortable. The options now are: leave as is (exercises at the end of each chapter), move all exercises to an appendix, or remove them completely from the book and make them available on a website.

That's all for now -- more footnotes - later.

Thursday, March 19, 2009

On data flows, event flows and EPN

Bob Hagmann from Aleri (ex-Coral8) has advocated the "data flow" model as an underlying model that unifies both engines of Aleri, and contrasts it with "event delivery systems", in which programmers create state manually if needed. I am not really familiar with the phrase "event delivery system" and don't know what he refers to, but there are event processing systems that employ programming styles different from stream processing, in which state is handled implicitly by the system and the programmer does not really deal with creating it.

But I have no interest in "language wars"; my interest these days is somewhat different: to find a conceptual model that can express, in a seamless way, functionality that is written in different programming styles.

Actually, the conceptual model of EPN (event processing network) can be thought of as a kind of data flow (although I prefer the term event flow, as what is flowing is really events). The processing unit is the EPA (Event Processing Agent). There are indeed two types of input to an EPA, which can be called "set-at-a-time" and "event-at-a-time". Typically, SQL-based languages are more geared to "set-at-a-time", while other language styles (like ECA rules) work "event-at-a-time". From a conceptual point of view, an EPA gets events through channels: one input channel may be of a "stream" type, while in another the events flow one by one. As some functions are naturally set-oriented and others are naturally event-at-a-time oriented, and an application may not fall neatly into either category, it makes sense to have a kind of hybrid system, with EPN as the conceptual model on top of both...
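The hybrid idea can be sketched as an agent with two input channels. In this Python sketch (all names are hypothetical), `receive_one` is an event-at-a-time channel handled ECA-style, while `receive_stream` buffers events until `close_set` processes them as a set, roughly the way a SQL-style window would. Deciding when a set is closed is left to the caller here, which is a simplifying assumption.

```python
from statistics import mean
from typing import Callable, Dict, List

Event = Dict[str, float]


class HybridAgent:
    """An EPA with one event-at-a-time channel and one set-at-a-time channel."""

    def __init__(self,
                 on_event: Callable[[Event], List[Event]],
                 on_set: Callable[[List[Event]], List[Event]]):
        self.on_event = on_event        # ECA-style logic, applied per event
        self.on_set = on_set            # SQL-style logic, applied to a whole set
        self.pending: List[Event] = []  # buffered events of the set-oriented channel
        self.output: List[Event] = []   # derived events produced so far

    def receive_one(self, event: Event) -> None:
        """Event-at-a-time channel: process the event immediately."""
        self.output.extend(self.on_event(event))

    def receive_stream(self, event: Event) -> None:
        """Set-at-a-time channel: buffer the event until the set is closed."""
        self.pending.append(event)

    def close_set(self) -> None:
        """Close the current set (e.g. a window boundary) and process it as a whole."""
        if self.pending:
            self.output.extend(self.on_set(self.pending))
        self.pending = []


# Usage: alert on single large readings, and average the readings of each closed set.
agent = HybridAgent(
    on_event=lambda e: [{"alert": e["id"]}] if e["value"] > 100 else [],
    on_set=lambda es: [{"avg": mean(e["value"] for e in es)}],
)
agent.receive_one({"id": 1, "value": 150})
for v in (10, 20, 30):
    agent.receive_stream({"id": 0, "value": v})
agent.close_set()
```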

This is the short answer. More detailed discussion -- later.

Friday, March 6, 2009

On event processing engines and platforms

Today, Friday, is part of our weekend, so it is a good time to do shopping and other arrangements.
My wife and I went to our friendly local bank to open a new account for some purpose. The lady who handles our account said that they have new account-opening software that is extremely difficult to operate, with a lot of screens where one has to figure out what is being asked, and suggested that she do it off-line and call us when it is ready, so we could come to sign the papers. Once, opening an account was simple and took a few minutes of signing some forms; the more sophisticated software becomes, the more difficult it is to operate, and sometimes it becomes an obstacle to the business. Often, developers don't really care about the human engineering aspects. Hans Gilde wrote recently about the fact that CEP software is not smart. I agree. On several occasions I have given talks to audiences of high-school students, offering a rough introduction to AI under the title "Can a computer think?" While there is some work in AI that strives to achieve this, today's software cannot really think, and is not really smart. One can use software to do things that look smart, but the wisdom is not in the software itself; it is in the way it is used. In the bank case, the software does not even look smart...

This week I had three visitors from Germany, Rainer von Ammon and two of his CITT colleagues, and we made some progress towards defining the EDBPM project that we plan to submit as an EU project. They asked me to pose in my office under my "wall of plaques" (half of them are in Hebrew, so they could not really read them...). So this is my most current picture.

One short clarification: after my posting entitled "event processing platforms - yes, but...", I received some private communication claiming that there is confusion between the terms "platforms" and "engines". The claim is that there are vendors who refer to their engines as platforms; moreover, some people refer to any run-time software as an engine. So I thought it worth clarifying how I see the distinction:

An event processing platform is software that enables the creation of an event processing network, and handles the routing of events among agents, management, and other common infrastructure issues.
An event processing engine is software that enables the creation of the actual functions; in EPN terms, it implements the agents.

This is similar to the difference between an application server and a single component.

What is the connection between them?

At one extreme, there are closed platforms, i.e. platforms that can run only one type of engine; in this case the distinction becomes fuzzier.
At the other extreme, there are open platforms, i.e. platforms that can run multiple engines; in this case the two concepts are totally separated. The main issue here is that the different engines may come with a collection of different languages, and this may make the development of an application more difficult.
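The platform/engine split, and what makes a platform "open", can be sketched in Python as follows. The `Engine` protocol, the two toy engines and the `Platform` routing table are all hypothetical; the point is only that the platform handles the routing of events among agents, while each engine implements an agent's function, and an open platform accepts any engine that satisfies the protocol.

```python
from collections import defaultdict
from typing import Dict, List, Protocol, Sequence


class Engine(Protocol):
    """Anything that implements an agent's function ("engine" in the post's terms)."""
    def process(self, event: dict) -> List[dict]: ...


class FilterEngine:
    """Toy engine 1: passes on events that satisfy a predicate."""
    def __init__(self, predicate):
        self.predicate = predicate

    def process(self, event: dict) -> List[dict]:
        return [event] if self.predicate(event) else []


class CountEngine:
    """Toy engine 2: stateful, emits a running count of the events it has seen."""
    def __init__(self):
        self.count = 0

    def process(self, event: dict) -> List[dict]:
        self.count += 1
        return [{"type": "count", "value": self.count}]


class Platform:
    """Toy open platform: hosts agents backed by any Engine and routes events among them."""
    def __init__(self):
        self.agents: Dict[str, Engine] = {}
        self.routes: Dict[str, List[str]] = defaultdict(list)  # agent -> downstream agents
        self.sink: List[dict] = []  # events leaving the network, i.e. delivered to consumers

    def add_agent(self, name: str, engine: Engine, downstream: Sequence[str] = ()) -> None:
        self.agents[name] = engine
        self.routes[name].extend(downstream)

    def inject(self, agent: str, event: dict) -> None:
        """Deliver an event to an agent and route each output downstream (depth-first)."""
        for out in self.agents[agent].process(event):
            targets = self.routes[agent]
            if not targets:
                self.sink.append(out)
            for target in targets:
                self.inject(target, out)
```

A closed platform, in these terms, would be one whose `add_agent` accepts only a single engine class. Even in this toy, the two engines are configured differently (a predicate versus internal state), a small-scale echo of the multiple-languages issue.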

The first generation of event processing started with stand-alone engines; the emergence of platforms, and making them open, are the signs of the second generation. I'll say more about the challenges of constructing the next generations later.

Monday, September 8, 2008

A footnote to the streamSQL paper

The comment that my good friend Claudi (AKA Pattern Storm) made in the complexevents forum made me curious enough to actually read this paper. Reading it, I had the uncomfortable feeling that people insist on using a language style that implies a certain type of thinking about event processing; this creates semantic problems, which they then try to solve with the same type of thinking, using more complicated constructs.

I'll use one simple example taken from the paper, in which they had to deal with semantic problems caused by the language semantics.

The scenario (translated to my language - without the "streams") -- Events are reported about cars that move through some segment of the road; each event consists of

There are also simultaneous events, i.e. several events that happen in the same time unit (whatever the time granularity is). The inputs are events of this type; the output is, for each event, a derived event that includes the original attributes of the event and the average speed of cars in the same time unit. If you want to see the types of problems that SQL implementers face in this simple example, read the streamSQL paper. Instead of discussing SQL, I would like to show an alternative way to think about the same problem.

The slide below shows an alternative way to think about this problem: a very simple EPN (Event Processing Network) with two functional agents, one producer (e.g. an event emitter that creates events from the video stream of a camera looking at the road) and one consumer (whoever wants to see the output events).

The two agents work under the same temporal context (it can be spatio-temporal if we also want to group by road segment); in this case, a temporal context is opened at the beginning of each time unit and closed at its end.

The raw event is called "car position event" and it goes to both agents.
The first agent is an aggregator, which calculates the average incrementally; since it is bound to the context, the average is over events from the same time unit. At the end of the time unit it produces a single event, "speed-average-event", with the structure

The second agent is a "pattern detector" that takes two input events: the "car position event" again, and the derived event "speed-average-event". The pattern to be identified is AND, and for this agent the "speed-average-event" has a consumption policy of "reuse" (which means that the same event can be used in multiple pattern matches). For each AND pattern detected, the agent produces a derived event, the "output-event", whose structure is:

This EPN does not involve "streams"; the thinking is "event oriented", and it attempts to provide a natural way of thinking about event processing functionality.
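The EPN described above can be sketched in Python. The attribute names (`car_id`, `speed`, `time_unit`) are assumptions, since the original event structures are not reproduced here. The temporal context is modelled simply as the `time_unit` value, and the input is replayed as a finite batch with contexts closed in time order, whereas a live system would close each context when its time unit ends. The pattern detector consumes each car position event once but reuses the speed-average-event for every match in the same context, mirroring the "reuse" consumption policy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CarPosition:      # raw event; attribute names are assumptions
    car_id: str
    speed: float
    time_unit: int      # identifies the temporal context the event falls in


@dataclass(frozen=True)
class SpeedAverage:     # derived by the aggregator, one per time unit
    time_unit: int
    average: float


@dataclass(frozen=True)
class Output:           # derived by the pattern detector
    car_id: str
    speed: float
    time_unit: int
    average: float


class Aggregator:
    """Incrementally averages speeds within each temporal context (time unit)."""
    def __init__(self):
        self.sums: Dict[int, float] = {}
        self.counts: Dict[int, int] = {}

    def on_event(self, e: CarPosition) -> None:
        self.sums[e.time_unit] = self.sums.get(e.time_unit, 0.0) + e.speed
        self.counts[e.time_unit] = self.counts.get(e.time_unit, 0) + 1

    def close_context(self, time_unit: int) -> List[SpeedAverage]:
        """At the end of the time unit, emit a single speed-average-event."""
        if time_unit not in self.counts:
            return []
        total, n = self.sums.pop(time_unit), self.counts.pop(time_unit)
        return [SpeedAverage(time_unit, total / n)]


class PatternDetector:
    """AND(car position, speed average) within the same context.
    Car events are consumed once; the average has a 'reuse' policy."""
    def __init__(self):
        self.waiting: Dict[int, List[CarPosition]] = {}

    def on_car(self, e: CarPosition) -> None:
        self.waiting.setdefault(e.time_unit, []).append(e)

    def on_average(self, avg: SpeedAverage) -> List[Output]:
        cars = self.waiting.pop(avg.time_unit, [])
        return [Output(c.car_id, c.speed, c.time_unit, avg.average) for c in cars]


def run_epn(events: List[CarPosition]) -> List[Output]:
    """Route each raw event to both agents, then close each context in time order."""
    agg, det = Aggregator(), PatternDetector()
    for e in events:
        agg.on_event(e)
        det.on_car(e)
    outputs: List[Output] = []
    for t in sorted({e.time_unit for e in events}):
        for avg in agg.close_context(t):
            outputs.extend(det.on_average(avg))
    return outputs
```

With two cars at speeds 50 and 70 in time unit 1 and one car at 90 in time unit 2, each output event carries its car's original attributes plus 60.0 or 90.0 respectively.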

Comments:

1. This is a rather simple example. It can also be solved by putting the average speed event in a global state (or event store/database) and then enriching the events with it, but the event-oriented solution is closer to the spirit of the original example, which works on streams.
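For comparison, the global-state alternative from comment 1 can be sketched as a two-pass computation: first accumulate per-time-unit totals in a shared store (a plain dict standing in for the event store or database), then enrich every raw event with the stored average. Events are assumed to be `(car_id, speed, time_unit)` tuples; this is an illustrative sketch, not a recommended design.

```python
from typing import Dict, List, Tuple

CarEvent = Tuple[str, float, int]         # (car_id, speed, time_unit), assumed layout
Enriched = Tuple[str, float, int, float]  # original attributes + average speed


def enrich_via_global_state(events: List[CarEvent]) -> List[Enriched]:
    """Accumulate per-time-unit totals in a shared store, then enrich each event."""
    store: Dict[int, Tuple[float, int]] = {}  # stands in for the global state / event store
    for _car, speed, t in events:
        total, n = store.get(t, (0.0, 0))
        store[t] = (total + speed, n + 1)
    averages = {t: total / n for t, (total, n) in store.items()}
    return [(car, speed, t, averages[t]) for car, speed, t in events]
```

Note that the enrichment pass can only start once the store holds complete totals for a time unit, which is exactly the role the context boundary plays in the event-oriented version.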

2. Aggregator and pattern detector are types of agents; there are a few more types. Typically, an event processing network consists of multiple types of agents.

3. "Pattern Storm" claims that stream SQL ignores causality. One can view the relation between the input events and output events of the same agent as a causality relation (he uses another scenario from the paper), and this relation can be set while defining the EPN.

One general comment (not related to this posting) to "anonymous": I'll gladly answer your question if you send it again and identify yourself. I don't publish anonymous comments.

I can post solutions to the rest of the examples in the stream SQL paper if anybody is interested...

Saturday, June 28, 2008

On embedded intelligence within event processing application

In the previous post I referred to the term "Intelligent Event Processing", and one question that I asked was: is this a new term? How does it relate to the other "X event processing" terms? I am not sure whether the term "intelligent event processing" will stick around; I would say that a better way to explain what it is may be "embedded intelligence in event processing".

If we look at event processing architectures, there are producers who produce events, consumers who consume the processing results, and the EPN (Event Processing Network) in the middle, which really does the processing. So where can intelligent techniques help? Here are some (real) examples:

1. In the producer: the producer has a video stream of all cars that pass below the camera; an intelligent process (using image processing techniques) isolates the license plate number of the car and sends it for further processing (security, traffic violation, billing etc.).

2. In the "meta-data" composition: a "pattern detection" node typically looks for pre-defined patterns and attempts to detect them at run-time. In current applications the patterns are entered by the developers or users. In some cases the patterns are a "moving target", as in fraud detection: once fraud patterns are discovered they lose much of their value, since people on the other side of the law are constantly looking for new loopholes. Thus, intelligent techniques such as machine learning are used to refresh the patterns that are looked for at run-time. The run-time does not change - it is the same pattern-detection mechanism; only the source of the patterns is different.
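A minimal Python sketch of this separation, assuming patterns can be represented as named predicates over a single event (real fraud patterns typically span several events, so this is a simplification). The learning component that produces new patterns is not shown; `refresh` is a hypothetical hook it would call, while `detect`, the run-time mechanism, never changes.

```python
from typing import Callable, Dict, List

Pattern = Callable[[Dict], bool]  # simplified: a pattern is a predicate over one event


class PatternDetectionNode:
    """The run-time mechanism stays fixed; only the set of patterns is swapped."""

    def __init__(self, patterns: Dict[str, Pattern]):
        self.patterns = dict(patterns)

    def refresh(self, new_patterns: Dict[str, Pattern]) -> None:
        """Hook for an external (e.g. learning) component supplying updated patterns."""
        self.patterns = dict(new_patterns)

    def detect(self, event: Dict) -> List[str]:
        """Return the names of all current patterns that the event matches."""
        return [name for name, p in self.patterns.items() if p(event)]
```

After a refresh, the same `detect` call starts reporting matches for the newly supplied patterns, without any change to the node itself.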

3. Intelligent nodes within the EPN: in some cases the derivation of new events cannot be expressed as a derivation expression and needs an intelligent derivation process, e.g. a heuristic algorithm that determines traffic light policies based on traffic events.
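As an illustration of an "intelligent node" whose derivation is an algorithm rather than an expression, here is a toy Python heuristic. The rule itself is an assumption made up for this sketch, not a real traffic-engineering method: each approach gets a minimum green time, the rest of a fixed cycle is shared in proportion to the car events observed per approach, and seconds lost to rounding go to the busiest approach.

```python
from collections import Counter
from typing import Dict, Iterable


def green_split(car_events: Iterable[str], approaches: Iterable[str],
                cycle_s: int = 60, min_green_s: int = 10) -> Dict[str, int]:
    """Derive a (hypothetical) traffic-light policy from raw 'car on approach X' events."""
    approaches = list(approaches)
    counts = Counter(car_events)  # number of car events observed per approach
    spare = cycle_s - min_green_s * len(approaches)
    if spare < 0:
        raise ValueError("cycle too short for the minimum green times")
    total = sum(counts[a] for a in approaches)
    split = {}
    for a in approaches:
        # Proportional share of the spare time; equal shares if no traffic was seen.
        share = spare * counts[a] / total if total else spare / len(approaches)
        split[a] = min_green_s + int(share)  # rounded down
    # Give the seconds lost to rounding to the busiest approach.
    busiest = max(approaches, key=lambda a: counts[a])
    split[busiest] += cycle_s - sum(split.values())
    return split
```

For instance, two cars on "N" and one on "E" over a 60-second cycle yield 37 seconds of green for "N" and 23 for "E".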

There are many more examples, like creating predicted events and more, but these are just to give some flavor. Is it useful? Yes, it is useful for a variety of applications. Does every CEP application need embedded intelligence? Not really. More later.
