Saturday, April 11, 2009

Some footnotes to the forthcoming book "Event Processing in Action" - Take One

Last night I went to see a movie (a rare event for me) and chose "Slumdog Millionaire". My daughter told me later that people who have not read the book enjoyed it more. The movie is OK, even cute; however, for a movie that won 8 Academy Awards I had somewhat bigger expectations (comparing it, for example, to "Gone with the Wind", which also won 8 Academy Awards). Well, the movie industry is probably not peaking these days...

Today, together with (most of) my tribe, I went hiking in a place called "Judge River". Well, it is a river in local terms, with a modest amount of water, but with a bridge, a lot of trees, some flowers and, since it is a holiday, a lot of people.

Now I am back home, and like any Sunday morning I plan to go to one of the coffee shops (I rotate among the coffee shops in Haifa or, to be exact, among those that have free parking nearby) to work on revisions to the draft of chapter four of the "Event Processing in Action" book.

From time to time I'll blog some footnotes from behind the scenes of the book being written. Today I'll blog about several issues: scope, language and exercises.

Scope: The idea is to focus on teaching the event processing concepts step by step, using a use case that will accompany the book throughout, so the question is: what is the scope of event processing? We define this scope by defining the "event processing network" (EPN), and thus the question, which I started discussing in my previous posting, is whether pre-processing and post-processing relative to the event processing network are part of it. While we have one chapter dedicated to event producers (and pre-processing) and another dedicated to event consumers (and post-processing), the scope of what we discuss as part of the specification of the event processing itself does not include what is done by the producers and consumers, whose projection on the EPN is the events they produce and consume. However, there is a case in which a consumer is also a producer, and this is important since there may be a causality relationship between the event it consumes and the event it produces.

As an example: the use case is about "fast flower delivery", and one of its functions is choosing the driver who will get the delivery from among the drivers that have issued a bid. Some of the stores prefer automatic assignment by the system, and some want to get the bids and do the assignment on their own. The automatic assignment is definitely an EPA (Event Processing Agent), since it is software that performs some operation on events. The manual assignment, on the other hand, may be done by a person or by some external software that the store uses; either way it is not really part of the EPN, and thus it is not modelled by the system. We are, of course, still interested in tracing the assignment back to the bid that was the input to the store. This is also a good example showing that the same event type can include both raw events (the manual assignments are raw events from the EPN's point of view) and derived events (the automatic assignments).
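
To make the raw-versus-derived distinction a bit more tangible, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the book's notation: the `Bid` and `Assignment` classes, their attribute names, and the "shortest pickup time wins" rule are all hypothetical, since the selection criterion is not specified above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bid:
    bid_id: str
    request_id: str
    driver: str
    pickup_minutes: int  # hypothetical attribute used as the selection criterion


@dataclass(frozen=True)
class Assignment:
    request_id: str
    driver: str
    origin: str     # "derived" (produced inside the EPN) or "raw" (sent in by a store)
    caused_by: str  # id of the bid this assignment traces back to


def automatic_assignment(bids: List[Bid]) -> Optional[Assignment]:
    """Plays the role of an EPA: derives an Assignment from the bids it receives.

    Assumption: the bid with the shortest pickup time wins.
    """
    if not bids:
        return None
    best = min(bids, key=lambda bid: bid.pickup_minutes)
    return Assignment(best.request_id, best.driver, "derived", best.bid_id)


def manual_assignment(request_id: str, driver: str, bid_id: str) -> Assignment:
    """The store decided outside the EPN; the EPN only sees a raw event.

    The bid id is still recorded, so the causality link is not lost.
    """
    return Assignment(request_id, driver, "raw", bid_id)
```

Both functions return the same event type; only the `origin` field tells the two cases apart, and `caused_by` keeps the link from the assignment to the bid in either case.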

Language: We decided neither to use any single language to explain the concepts, nor to invent a new language. However, we believe that a purely theoretical discussion will not be enough. What we have decided to do is to take a "building block" approach, in which the different parts of the system (event types, event processing agents etc.) are specified using "definition elements", which are platform-independent concepts or, in other words, a meta-language. In each section we'll provide the full relevant part of the application using this meta-language, and in order to connect it to the "ground" we'll also give samples of these definitions in a variety of languages of various styles. Thus, chapter 4, which I am writing now, talks about defining the event schema. We define the schema using our "event type" building block, and will also show definitions in various schema languages (XML, positional relational-schema-like etc.); the same will go for all types of event processing agents. We intend to ask owners of existing languages (among those who agree to have their languages analyzed by the EPTS event processing languages analysis, which is me wearing another hat) to provide definitions of our use case in their languages, and we will check the possibility of posting them all.
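
As a rough illustration of the "building block" idea, the following Python sketch holds one platform-independent event type definition and renders it in two styles, an XML-like one and a positional, relational-schema-like one. Everything here is an assumption made for the example: the `EventType` class, the `DeliveryBid` type and its attributes, and both output formats are hypothetical and are not the book's actual meta-language.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EventType:
    """Platform-independent definition: a name plus (attribute, data type) pairs."""
    name: str
    attributes: Tuple[Tuple[str, str], ...]


def to_xml(event_type: EventType) -> str:
    """Render the definition as a small XML document (hypothetical element names)."""
    root = ET.Element("eventType", name=event_type.name)
    for attr_name, attr_type in event_type.attributes:
        ET.SubElement(root, "attribute", name=attr_name, type=attr_type)
    return ET.tostring(root, encoding="unicode")


def to_positional(event_type: EventType) -> str:
    """Render the definition as a positional, relational-schema-like line."""
    columns = ", ".join(f"{name} {dtype}" for name, dtype in event_type.attributes)
    return f"{event_type.name}({columns})"


# Hypothetical event type from the flower delivery use case.
delivery_bid = EventType(
    name="DeliveryBid",
    attributes=(("driver", "string"), ("store", "string"), ("pickup_time", "datetime")),
)
```

Calling `to_xml(delivery_bid)` and `to_positional(delivery_bid)` yields two concrete renderings of the same definition, which is the point of keeping the definition itself language-neutral.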

Last but not least are the exercises. Since one of the book's targets is to serve as a textbook for academic courses on event processing, we have decided to put exercises at the end of each chapter for the benefit of students and instructors (we also plan to provide slides in the future). One of the questions we agreed with the publisher to ask the reviewers (there is a formal review for each third of the book) is whether this is the right way, or whether it may make other readers uncomfortable. The options now are: leave as is (exercises at the end of each chapter), move all exercises to an appendix, or remove them completely from the book and make them available on a website.

That's all for now -- more footnotes - later.

Friday, April 10, 2009

On the boundaries of event processing

We are on Passover vacation, celebrating the biblical story of the exodus from Egypt, in which the sea split and people could pass through the gap that was created. Well, I guess that at that time people just walked, but if it had happened today it might have looked like this.
Today I would like to write something about the "boundaries" of event processing, based on some discussions last week related to writing a book about event processing. There are two issues related to the scope:
  • Are the pre-processing done by the producer in order to emit events, and the post-processing of events by the consumer, part of the event processing system?
  • Is the pre-processing done to obtain the event processing patterns that have to be monitored (e.g. using machine learning techniques) part of the event processing system?
From the point of view of an "event processing language", if we include the pre-processing and post-processing we'll have to extend the language to have the expressive power of any programming language, which will lose the focus on specific event processing functionality. Thus, while an application may require pre- and post-processing, this is typically outside the "event processing network". The main point of using an "event processing language", and not hard-coding the event processing functionality in Java, C# or any other imperative general-purpose language, is the use of a higher-level abstraction. As an analogy, before the days of SQL we had to read from the database, loop over the records, and evaluate the conditions in a hard-coded way. SQL did not provide anything we could not write in Cobol or PL/I (the languages of that time...); it just provided a more concise way to write it. The situation in event processing is similar. We can write something that is specified as:
"Match a pattern of events which is a conjunction of types E1, E2, E3 that refer to the same person and all occur within one hour of an event of type E0 for the same person; if there are several instances of E1, E2, E3, take the most recent of each at the point that the match occurred; and if there are multiple matches within this same time interval, ignore all but the first." Of course, one can write it in Java, but a language that lets you write this pattern in less than 1 minute is more cost-effective.

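To give a feeling for what hard-coding the quoted pattern involves, here is a minimal Python sketch. It is one possible reading of the pattern, not a reference implementation, and it rests on assumptions the text does not settle: events arrive in timestamp order, timestamps are in seconds, and an E0 that arrives while a window is already open for the same person is ignored. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

WINDOW_SECONDS = 3600.0               # "within one hour" of the E0 event
REQUIRED_TYPES = ("E1", "E2", "E3")   # the conjunction to be matched


@dataclass(frozen=True)
class Event:
    type: str         # "E0", "E1", "E2" or "E3"
    person: str
    timestamp: float  # seconds; events are assumed to arrive in time order


@dataclass
class Window:
    opened_at: float                                       # timestamp of the opening E0
    latest: Dict[str, Event] = field(default_factory=dict)  # most recent event per type
    matched: bool = False                                  # True once the first match is reported


class ConjunctionMatcher:
    def __init__(self) -> None:
        self._windows: Dict[str, Window] = {}  # one open window per person

    def on_event(self, event: Event) -> Optional[List[Event]]:
        """Return the matched [E1, E2, E3] events, or None if there is no (new) match."""
        window = self._windows.get(event.person)
        # Close the person's window once the hour has passed.
        if window is not None and event.timestamp - window.opened_at > WINDOW_SECONDS:
            del self._windows[event.person]
            window = None
        if event.type == "E0":
            # Assumption: an E0 inside an already open window is ignored.
            if window is None:
                self._windows[event.person] = Window(opened_at=event.timestamp)
            return None
        if window is None or window.matched or event.type not in REQUIRED_TYPES:
            return None
        # Overwriting keeps only the most recent instance of each type.
        window.latest[event.type] = event
        if len(window.latest) < len(REQUIRED_TYPES):
            return None
        window.matched = True  # later matches in the same window are ignored
        return [window.latest[t] for t in REQUIRED_TYPES]
```

Even this small version has to deal with per-person state, window expiry, overwriting of older instances and suppression of repeated matches, which is exactly the bookkeeping a dedicated language is supposed to hide.
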
Back to the scope: pre- and post-processing of events and patterns are not part of the event processing system, and are typically done in different technologies. This does not mean that they are not important; sometimes the pre-processing of events is more complicated than the event processing itself, especially since it is hard-coded.

More on this - later

Tuesday, April 7, 2009

"Complex-Event Processing Poised for Growth" in IEEE Computer

IEEE Computer, the flagship magazine of the IEEE Computer Society, publishes in its April issue an article entitled "Complex-Event Processing Poised for Growth" by Neal Leavitt, under the section "Industry Trends". The article, in a magazine with a relatively large distribution, explains the basic concepts and trends of event processing, and cites some of the EPTS steering committee members, such as John Morrell from Aleri, Alan Lundberg from TIBCO, David Luckham, Roy Schulte from Gartner and myself. It also cites some other people in the community. EPTS is also mentioned explicitly. The fact that one of the popular professional magazines chose to dedicate an article to the topic indicates a growing interest in the area, and this is just one indication. As I noted before, the February issue of International Journal of Banking Systems, which is an industry-specific journal, published an article on the "event horizon". Enjoy!

Sunday, April 5, 2009

On the "Return on Investment" in Event Processing

One of the orders I got from my physician, which I humbly obey, is to spend around one hour walking every day. If I have time I do it outside; if not, I do it at home on an electric walker. Yesterday I walked around, not far from home, and decided to take a shortcut through a woodland that is close by but not familiar to me. I saw a trail that seemed to go up the hill, and the top of the hill was supposed to lead me back to somewhere in my neighborhood. There was a split in the trail, I chose a branch at random, and after five minutes I realized that it led to a dead end. However, I did not feel like going back down, so I continued to climb and made my way among bushes, fallen trees etc. This was quite irresponsible of me, especially as it was getting dark outside. After 40 minutes of wandering I saw the back yard of a house, navigated there, and indeed arrived safe and sound somewhere in the neighborhood. Somehow it felt like a return to childhood, but not for long.

Anyway, today I would like to write something about the "Return on Investment" in event processing.

Mark Palmer, the current CEO of Streambase, has recently blogged about the fact that CEP is not about "feeds and speeds" but about "ease of use". It is actually refreshing to see this from a Streambase person, since in the past some Streambase people claimed that the only reason to use a CEP engine is its scalability properties. In fact, one of my first postings on this Blog, entitled "the mythical event per second", said something about it. I agree that there are some applications for which satisfying high throughput or some other QoS metric is a crucial requirement, but this is a secondary type of ROI. The major one is providing abstractions that reduce the cost of developing, and consequently maintaining, event-driven applications. This is similar to what the DBMS discipline provides us. As a grey-bearded old timer who is not completely senile, I still remember the times we worked with file systems; the DBMS provided many abstractions that make data-oriented applications much easier to develop. The same goes for event processing. I constantly say to people who ask "is there something new in event processing?" that the answer is: not really. Event processing has been hard-coded within regular programming for ages; however, since traditional programming languages and environments were not created to process events, the manual work required is quite substantial. The reduction in cost relative to hard-coding can be substantial, and some customers have estimated it at a 75% reduction. It will be interesting to do an empirical study about it, probably a challenge for our EPTS use case work group. More about ROI in later posts.