Saturday, January 10, 2009

On disciplines and marketing devices


Yesterday I participated in the "parents teaching" program at my third daughter's junior high (8th grade) and gave the children a short introduction to the question -- does a computer think? I did not give them an answer to this question, but gave them several basic puzzles and explained to them how we can teach a computer to solve them -- one of them was the good old missionaries and cannibals problem.



From the question -- does a computer think -- I will move to the blog of Hans Gilde, who phrased his posting in the form of a question -- CEP is a marketing device, so what does that say about CEP products?

The answer is --- not much.

Let's change the TLA (three-letter acronym) from CEP to SOA and ask the same question. The answer is that there are good and bad products marketed under the TLA of SOA; some of them were here before SOA, and maybe some of them will still be here when another TLA dominates.

I have blogged before about the various interpretations of CEP. The observation about what are called "CEP products" is that a variety of implementations call themselves CEP; this does not teach us anything about the quality of these products, their benefits to the business, etc.

While TLAs have become the property of marketing people for positioning products, disciplines are usually named with one or two words, such as data management, image processing, graphics, information retrieval and many more -- that's why I consistently use "event processing" when talking about the discipline.

Disciplines normally start in multiple places that try to solve similar (but not necessarily identical) problems; a first generation of products is developed, and sometimes hype is also created, which is consistent with Gartner's "hype cycle" concept. At the EPTS conference Brenda Michelson argued that, if anything, this area is under-hyped rather than over-hyped. There are some other indications that support her observation.

The early phases of a discipline lack standards, an agreed-upon theory, and coherent thinking.
In the OMG meeting in March 2008, I used the following slide as an example of the indications/conditions for a discipline to succeed:

The fact that EP is not at the maturity level of relational databases or other more mature disciplines is obvious. However, while there are people who have made a career out of criticizing and complaining that what other people are doing is not good enough, I think that our challenge is to advance. It took years until there was agreement on what a relational database is, during which all databases suddenly became relational (for anybody old enough to remember, there were some funny situations of products that claimed to have a relational extension when their vendors did not understand the term). We need an event processing manifesto and a collection of standards, but they will not be constructed in a single day, so we also need patience and persistence... I believe that 10 years from now EP will be one of the major disciplines of computing, and that we have the challenge to get there...

BTW -- I agree with Hans that if products have business value for customers, they will be used regardless of whether in the end they are classified as EP or not. More -- later.

Monday, January 5, 2009

On event processing and some interesting queries

Some people have returned from vacation with a surplus of energy; otherwise I cannot explain why my inbox today was full of mails from the same discussion thread in the everlasting Yahoo CEP interest group, triggered by a question sent by Luis Poreza, a graduate student from the University of Coimbra in Portugal. I am taking the liberty of rewriting the question, since it was phrased as a question about a trading system, and thus some of the responders answered with trading-related stuff that did not help to answer Luis' question. So, getting as far away as possible from the stock market, I will base the rewritten question on the fish market.

The story is as follows: the price of 1 KG of fish is determined according to the hour, the demand, the supply and the general mood of the seller. At 10:50 he set the price to 71, then at 11:15 the price went down to 69, with no more changes until 12:00. There is a computerized system that works in time windows of one hour, starting every hour. The request is to find out, for the time window 11:00 - 12:00, whether the price of 1 KG of fish was ever > 70. The claim is that intuitively the answer is yes, since the price in the interval [10:50, 11:15] was 71; but if we look at all the events that occurred in this window, there was no event with value > 70, thus current "window oriented" tools will answer -- no.
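To make the mismatch concrete, here is a minimal Python sketch of a "window oriented" check over the two price events from the story. The function name and the event representation (time, price) are my own illustrative assumptions, not taken from any particular product.

```python
from datetime import time

# Price-change events from the fish market story: (time of change, new price per KG).
EVENTS = [
    (time(10, 50), 71),
    (time(11, 15), 69),
]

def window_has_price_above(events, start, end, threshold):
    """Window-oriented check (hypothetical helper): looks only at events
    whose timestamp falls inside [start, end)."""
    return any(start <= t < end and price > threshold for t, price in events)

# Only the 11:15 event (price 69) falls inside the 11:00 - 12:00 window.
answer = window_has_price_above(EVENTS, time(11, 0), time(12, 0), 70)
print(answer)  # False
```

The check never sees the 10:50 event, so it answers "no", even though the price of 71 was still in force when the window opened.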

There have been plenty of answers; some even tried to answer the question, for example by adding dummy events (one at the end of the interval? every minute?) with the value 71.

However -- I am going to make the following assertions:

(1). The requirement given is not an event processing pattern.
(2). Attempts to treat it as an event processing pattern are not very useful.
(3). It is in fact a kind of temporal query.
(4). It may make sense to have the capability to issue temporal queries as a response to events (AKA retrospective event processing), but this has to be done right.

Assertion one -- the requirement is not an event processing pattern. An event processing pattern is a function of events, so it is no surprise that Luis found it difficult to phrase it as such. Let me take two other examples that look syntactically the same and try to understand what the problem is here:



The government agency example: A government agency known for its long queues for getting service tries to monitor the length of the queue. Periodically some clerk goes out and counts the number of people waiting in the queue. At 10:50 he found 71 people in the queue, at 11:15 69 people in the queue, with no more samples until 12:00. Now the question is -- whether there has been some point in the time window [11:00, 12:00] at which the number of people in the queue was > 70.

Before starting the discussion, let's look at another example, the bank account example.
At 10:50 Mr. X deposited $30; his previous balance was $41, which made his balance $71;
at 11:15 Mr. X withdrew $2, and his balance was set to $69.

From a syntax point of view, the fish market example looks exactly like the queue monitoring example: in both cases we have events at 10:50 and 11:15 with attributes 71 and 69 respectively. However, they are not the same. The reason is that the price in the fish market is fixed until changed, while the length of the queue may have changed several times, up and down, since the event here is only a sample and does not cover all changes. Both of these events observe some state (price or length of queue), but the semantics are quite different. If we use the dummy event solution for the queue case, the value will probably be wrong; furthermore, we cannot really answer the query in the queue case with "true" or "false". Yet, in reality, periodic sampling is a totally valid type of event. Moreover, the bank account example looks very different from the fish market example -- it has two types of events, and the events do not observe a state, but report a change, and report the change value ("delta"). Thus, looking at the two events of deposit and withdrawal alone, we will not be able to answer the question either, but knowing the state (the balance of the account) and the deltas (for the deposit and the withdrawal) we get something that is semantically similar to the fish market example.

What can we learn from these examples? First, that the property "the value is the same until it is changed" is not a property of an attribute in an event; it is a property of the state (data) that may be created or updated by events. This is true for some states and not true for others. The solutions given are based on the fact that a human knows the semantics of this state and writes an ad-hoc query. However, this is processing of the state, based on its semantic properties, and not of the events.
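The three examples can be put side by side in a small Python sketch. It is only an illustration under my own assumptions: the helper names and the `holds_until_changed` flag are hypothetical, observations are assumed to be sorted by time, and `None` stands for "cannot be decided from these events".

```python
from datetime import time

def state_from_deltas(initial, deltas):
    """Turn change events (time, delta) into state observations (time, value)."""
    timeline, value = [], initial
    for t, delta in deltas:
        value += delta
        timeline.append((t, value))
    return timeline

def ever_above(observations, start, end, threshold, holds_until_changed):
    """Was the state ever above threshold in [start, end)?
    Returns True, False, or None when the observations cannot decide."""
    if any(start <= t < end and v > threshold for t, v in observations):
        return True
    before = [v for t, v in observations if t < start]
    if not holds_until_changed or not before:
        # Samples only: the state may have moved between observations.
        return None
    # The last value set before the window is still in force when it opens.
    return before[-1] > threshold

start, end = time(11, 0), time(12, 0)
readings = [(time(10, 50), 71), (time(11, 15), 69)]
balance = state_from_deltas(41, [(time(10, 50), 30), (time(11, 15), -2)])

fish = ever_above(readings, start, end, 70, holds_until_changed=True)
queue = ever_above(readings, start, end, 70, holds_until_changed=False)
bank = ever_above(balance, start, end, 70, holds_until_changed=True)
print(fish, queue, bank)  # True None True
```

The same two readings give different answers only because of a flag that describes the state; nothing in the events themselves carries that information.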

Assertion two -- attempts to treat it as an event processing pattern are not useful.

In the past I've blogged about the hammer and the nail. There is a natural tendency of anybody who has a product to try and stretch its boundaries. This may also backfire, since trying to do functions that the product is not good at, and not doing great work, can overshadow the good parts of the product. A solution like adding "dummy events" is a kind of hacking. It abuses the notion of event (since the dummy event did not really happen); moreover, given the fact that this is just an ad-hoc query, and there can be many such queries, in order to cover all of them we may need an exponential number of dummy events... Anyway -- event processing software is just a part of a bigger picture, and instead of improvising or hacking to get this functionality, it may be more advisable to use a product with a better fit.


Assertion three -- this requirement is in fact a temporal query. I will not get into temporal queries now, but the actual query is over the price of 1 KG of fish as it changes over time. It is an existential query -- checking whether some predicate holds somewhere in the interval. Another example of a temporal query: was there any day during the last 30 days in which the customer withdrew more than $10000 in a single withdrawal?
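As a rough illustration of such an existential query, here is a Python sketch of the withdrawal example. The function name, the record layout (day, amount) and the sample data are all made up for the illustration.

```python
from datetime import date, timedelta

def large_withdrawal_in_last_days(withdrawals, today, days=30, limit=10_000):
    """Existential temporal query (hypothetical helper): True if some single
    withdrawal within the last `days` days exceeds `limit`."""
    cutoff = today - timedelta(days=days)
    return any(cutoff <= day <= today and amount > limit
               for day, amount in withdrawals)

# Made-up history of one customer: (day, amount withdrawn).
history = [
    (date(2008, 12, 1), 12_000),
    (date(2008, 12, 20), 500),
    (date(2009, 1, 2), 3_000),
]
today = date(2009, 1, 5)

print(large_withdrawal_in_last_days(history, today))           # False
print(large_withdrawal_in_last_days(history, today, days=60))  # True
```

Note that the query runs over stored history and a time interval, not over a stream of incoming events.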

And this example brings us to assertion four -- it may make sense to couple event processing software with temporal queries. An example: we have an event that makes a customer a "suspect" in money laundering, but we need reinforcement by looking at some temporal queries over the past -- like the one written above... I'll write about this type of functionality at a later phase.

Well -- it is 1:15 AM, so I'd better get some sleep; tomorrow is again a busy day. So, conclusions: first -- not everything that looks simple to do manually is simple to do with a generic type of thinking; second -- event processing software should concentrate on doing event processing right, and not on doing other stuff wrong... Some follow-up blog postings -- later.

Sunday, January 4, 2009

On Event Processing Networks


Back in my office, with the machine-made coffee; I started the day by reading some stuff on the Web, and the first thing I saw was David Luckham's request to write some predictions about the CEP market in 2009. It seems that I've misplaced my crystal ball, which probably means that I am not in the prophecy business these days. While there are things that are beyond our control, I think that the approach taken by Paul Vincent, to talk about challenges, is more constructive.

I agree that one of the challenges is to define the borders of the area -- like other areas that have established a clear definition of their scope -- and maybe to partition it into sub-types. There are other challenges of interoperability -- how connectivity to many producers and many consumers of various types can be achieved, and also interoperability between event processors that can send events to each other. I view the EPTS work groups that will hopefully be launched later this month (and those that continue from the pre-EPTS era) as vehicles for the community effort to advance in these areas: the use-cases work group in defining the various sub-types, the language analysis one in working on required functions, the interoperability analysis one on interoperability issues, meta-modeling on the modeling perspective, and of course the glossary and the reference architecture as pivots in defining terms and the relationships among them. We shall not finish all the work in 2009, but my challenge to the community is to achieve significant progress in all of these during 2009, and to make it the year in which much of the discipline will be defined.

In addition, I have also read with interest Philip Howard's short article on "Event Processing Networks" (Below is Philip's picture on the Web)

I received it by direct email from David Tucker, Event Zero's CEO, and later also found it on David Luckham's site. Anybody who reads my blog may realize that I view the EPN as the basis of the conceptual and execution model for event processing. Anybody who reads Philip's article may infer that EPN is a new concept invented by Event Zero, and this is not really true, though Event Zero is indeed one of the first companies to implement an EPN-based solution.
The glossary defines EPN as: set of event processing agents (EPAs) and a set of event channels connecting them.
The glossary definition is very general, and there can be many implementations that fit it. One view of EPN is as a conceptual model that is implemented using existing tools; another view of EPN is as an execution architecture. With the few implementations of EPN right now, we see the known phenomenon of the "Tower of Babel" that I have written about in the past -- each implementation chooses its own set of primitives (in this case -- agent types).
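To show how little the glossary definition requires, here is a toy Python sketch of an EPN as a set of agents plus a set of channels. Everything in it is an assumption made for illustration: the class names, modeling a channel as a (source, target) pair of agent names, the two agent types, and the restriction to networks without cycles. It does not describe any vendor's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

@dataclass
class Agent:
    """An event processing agent (EPA): maps one input event to zero or more outputs."""
    name: str
    process: Callable[[dict], Iterable[dict]]

@dataclass
class Network:
    """A toy EPN: agents by name, and channels as (source, target) name pairs."""
    agents: dict = field(default_factory=dict)
    channels: set = field(default_factory=set)

    def add_agent(self, agent):
        self.agents[agent.name] = agent

    def connect(self, source, target):
        self.channels.add((source, target))

    def inject(self, agent_name, event):
        """Send an event to an agent and route outputs along the channels.
        Returns the events emitted by agents with no outgoing channel.
        Assumes the network has no cycles."""
        results, pending = [], [(agent_name, event)]
        while pending:
            name, current = pending.pop(0)
            targets = sorted(t for s, t in self.channels if s == name)
            for out in self.agents[name].process(current):
                if targets:
                    pending.extend((t, out) for t in targets)
                else:
                    results.append(out)
        return results

net = Network()
net.add_agent(Agent("filter", lambda e: [e] if e["price"] > 70 else []))
net.add_agent(Agent("alert", lambda e: [{**e, "alert": True}]))
net.connect("filter", "alert")

print(net.inject("filter", {"price": 71}))  # [{'price': 71, 'alert': True}]
print(net.inject("filter", {"price": 69}))  # []
```

Even in this toy version the choice of agent types ("filter", "alert") is arbitrary, which is exactly where implementations start to diverge.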


The benefits of the EPN model are its relative simplicity, its generality, and its natural support for distributed environments and parallel processing (not for free, some more wisdom is required here!). My view is that the concept of EPN should be at the center of the event processing community efforts mentioned before -- from the fundamental theory to the execution optimizations. I'll write more on that in later blogs.