Friday, November 20, 2009

On Inexact events


Back to chapter 11 of the EPIA book, which deals with challenges that developers and users of event processing systems should be aware of. One of these topics is the issue of inexact events. The basic assumption in current systems is a projection of the "closed world assumption" kind of thinking: every event that is reported really happened, every detail in the event payload is accurate, and every event that happened was indeed reported. In reality, one or more of these assumptions may be invalid for several reasons, as shown in the following figure:

As shown in this figure there are several reasons for making one or more of these assumptions invalid.

The source (e.g. sensor) may malfunction; if the source is an instrumented program, there may be a bug in the instrumentation.

The source (event producer) may be malicious, and send wrong information in order to sabotage the system.

The inexactness may be a projection of the temporal anomalies discussed before, e.g. a derived event that has not been detected.

This inexactness may be propagated, as when a derived event is derived from an event that is itself inexact.

The source itself may be imprecise, thus some of the content may not be accurate.

The input events may be based on samples or estimates.
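To make the propagation point concrete, here is a minimal sketch in Python. It is not taken from the book: the `Event` class, the `certainty` attribute and the rule of multiplying the input certainties (which assumes the inputs are independent) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Event:
    """Hypothetical event record. `certainty` is the assumed probability
    that the event really happened as reported (1.0 = closed world)."""
    name: str
    certainty: float = 1.0


def derive(name, inputs, rule_certainty=1.0):
    """Derive an event from `inputs`, propagating their inexactness.

    Assumption: the inputs are independent, so the chance that all of
    them are correct is the product of their certainties.
    """
    certainty = rule_certainty * prod(e.certainty for e in inputs)
    return Event(name, certainty)


sensor_reading = Event("temperature_high", certainty=0.9)  # imprecise sensor
estimate = Event("load_estimate_high", certainty=0.8)      # based on a sample
alert = derive("overheating_risk", [sensor_reading, estimate])
second_level = derive("shutdown_recommended", [alert])     # inherits the doubt

print(alert)
print(second_level)
```

The derived `alert` is less certain than either of its inputs, and anything derived from it inherits that uncertainty, which is the propagation effect described above.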

The uncertainty does not stop at event content; it also exists in the bridge between events and situations. I'll write on that topic in a separate posting.


Thursday, November 19, 2009

On the Fast Flower Delivery example and various programming styles of event processing



As explained in one of the previous postings, we use a single example throughout the EPIA book; this is part of the book's methodology. I'll write more about the methodology later, as we are now writing the preface for the book, which explains the methodology (among other things). Actually, today I received an additional review of the book from the publisher. The reviewer criticized the example, claiming that since most of the readers are men, an example that relates to flowers may be considered too feminine, and, realizing that it is too late to change it, suggested that we select another example for the second edition.

Well, I am thinking what will appeal to the real machos.



Maybe we should go for a poker example -- this is real macho stuff...




On second thought, the real machos are engaged in boxing, so maybe we should have an example around a boxing match... go figure...

Getting serious now. The FFD (Fast Flower Delivery) example is explained in the book using our building block approach; it is also demonstrated using several languages. I approached the entire community earlier this year, and there has been a great willingness to participate in this game, with the "language owners" implementing the FFD example in their various languages. We have six languages participating in the game now. Four are languages implemented by commercial products:
  • Aleri (actually the CCL language, originally from Coral8)
  • Apama (owned by Progress Software)
  • Rulecore (a Sweden-based company)
  • Streambase
and two are open source:

  • Esper
  • Etalis
The reader will be able to look at the example implemented in these six languages and, furthermore, to download a full or demo version of the engine implementing each language. As written before, the logistics of constructing this website, validating the solutions, etc. are handled by students taking my event processing course. Some examples will be brought inline into the various chapters of the book to give the readers a glimpse of the different styles.
We should have a "Beta version" of this website within a couple of weeks.

I'll post more updates about this experiment.


Tuesday, November 17, 2009

When does a derived event actually happen? - (posting II)

In the previous posting I showed some possible anomalies when dealing with derived events. The picture above shows a snowfall as a derived event; where I am located, in Haifa, this is actually a very rare event (once every 20 years, for a few minutes). There are various types of derived events; this time I'll discuss derived events of two different patterns: the sequence pattern and the time-out pattern.

Example 1: The pattern is: if a sequence of events E1 and then E2 occurs, derive event E3.
Let's assume that event E1 occurs at 9:00 and arrives at the system at 9:02, and event E2 occurs at 9:30 and arrives at the system at 9:31. The derived event is derived by the system at 9:33. The question is when event E3 occurs. One can think of three logical possibilities:

I: E3 occurs when it is produced, at 9:33; the rationale: since it is a virtual event, it does not occur in reality, and exists only because it is derived by the system.
II: E3 occurs when the last event that triggers the pattern matching occurs, in this case at 9:30; the rationale: the derived event occurs when the pattern's conditions are satisfied in reality, and this happens when E2 occurs.
III: E3 occurs over the interval [9:00, 9:30]; the rationale: the derived event occurs over the interval of all participating events.
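
The three possibilities for Example 1 can be written down as a small Python sketch. The `Event` class, the function name and the policy labels "I", "II", "III" are illustrative assumptions, not the notation of any particular engine; every result is returned as a (start, end) pair, with start equal to end for a time point.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    name: str
    occurrence_time: datetime  # when it happened in reality
    detection_time: datetime   # when it arrived at the system


def occurrence_of_derived(policy, participants, derivation_time):
    """Occurrence time of a derived event under one of the three policies."""
    times = [e.occurrence_time for e in participants]
    if policy == "I":    # when the system produced the derived event
        return (derivation_time, derivation_time)
    if policy == "II":   # when the last participating event occurred
        return (max(times), max(times))
    if policy == "III":  # interval spanned by all participating events
        return (min(times), max(times))
    raise ValueError(f"unknown policy: {policy}")


def at(hour, minute):
    """Helper: a time of day on an arbitrary fixed date."""
    return datetime(2009, 11, 17, hour, minute)


e1 = Event("E1", occurrence_time=at(9, 0), detection_time=at(9, 2))
e2 = Event("E2", occurrence_time=at(9, 30), detection_time=at(9, 31))
derived_at = at(9, 33)

for policy in ("I", "II", "III"):
    start, end = occurrence_of_derived(policy, [e1, e2], derived_at)
    print(policy, start.strftime("%H:%M"), end.strftime("%H:%M"))
```

Note that the detection times of E1 and E2 play no role in any of the three policies; only their occurrence times and the derivation time do.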

Example 2: The pattern is a time-out (absence event). Example: if there is no bid for an auction by the end of the auction time, derive an event "no bidders".
Scenario: An auction is opened at 9:00 and is valid for 2 hours; at 11:00 it is closed without any bidders; at 11:02 the system issues the derived event.
We have three similar alternatives here:
I: The no bidders event occurs at 11:02, the time at which the derived event is issued.
II: The no bidders event occurs at 11:00, when the "bid close" event occurs, which completes the pattern.
III: The no bidders event occurs during the interval [9:00, 11:00], since the "absent" event relates to the entire interval.

As in some other cases, there is no single solution that fits all cases, and the actual semantics of a specific case is a matter of policy. We see here three policies, which seem to cover most cases but not necessarily all. That is why there is also a need to enable explicit derivation of the occurrence time of a derived event, i.e. the value of the occurrence time itself can be computed and derived.
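
One possible way to combine named policies with explicit derivation is to let the occurrence time be either the name of a built-in policy or an arbitrary function. The Python sketch below applies this to the time-out example; the policy names, the `no_bidders_event` function and the `last_half_hour` rule are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def at(hour, minute):
    """Helper: a time of day on an arbitrary fixed date."""
    return datetime(2009, 11, 17, hour, minute)


# Each policy maps (context start, context end, derivation time) to an
# occurrence interval (start, end); a time point has start == end.
POLICIES = {
    "derivation_time": lambda start, end, derived: (derived, derived),  # I
    "pattern_completion": lambda start, end, derived: (end, end),       # II
    "whole_interval": lambda start, end, derived: (start, end),         # III
}


def no_bidders_event(bids, opened, closed, derived, occurrence):
    """Derive a "no bidders" event if no bid time falls in [opened, closed].

    `occurrence` is either the name of a built-in policy or a callable
    that computes the occurrence time explicitly.
    """
    if any(opened <= bid <= closed for bid in bids):
        return None  # a bid arrived, so the absence pattern does not match
    compute = POLICIES[occurrence] if isinstance(occurrence, str) else occurrence
    return {"type": "no bidders", "occurrence": compute(opened, closed, derived)}


def last_half_hour(start, end, derived):
    """Hypothetical explicit rule: only the final 30 minutes count."""
    return (end - timedelta(minutes=30), end)


opened, closed, derived = at(9, 0), at(11, 0), at(11, 2)
built_in = no_bidders_event([], opened, closed, derived, "whole_interval")
explicit = no_bidders_event([], opened, closed, derived, last_half_hour)
print(built_in["occurrence"])
print(explicit["occurrence"])
```

The built-in names cover the three policies above, while the callable covers the cases they miss, at the price of making the semantics of each derived event a per-pattern decision.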

More about temporal issues -- later.

Saturday, November 14, 2009

When does a derived event actually happen? - (posting I)



Just finished reading the book "Flash Forward" by Robert Sawyer. Science fiction was always my favorite type of literature, and my favorite writers are Asimov and Heinlein. There are science fiction writers among the following generation that stand out, and the Canadian writer Sawyer, who does not forget to give Canada a role in each of his books, is one of those. I have read several (not yet all) of his books. The best of those I have read so far is the Neanderthal Parallax trilogy, which is also very thought provoking besides being fascinating. The "Flash Forward" book, which is now also becoming a TV series, deals with an experiment that gets everybody in the universe to jump forward 21 years in time for 2 minutes. It is a combination of science fiction, a book that raises some philosophical issues, and a suspense story. Highly recommended.

The question of time and deep temporal issues has also been one of my favorite research topics, since time has physical, philosophical, and also computer science implications. Back to event processing: recently I have written the "warnings" chapter in the EPIA book, and one of the interesting questions is: when does a derived event occur?
As discussed before, there are two dimensions for answering the question: occurrence time, which stands for the time at which an event occurs in reality, and detection time, which stands for the time at which an event is detected by the event processing system. Neither of these is obvious when the event is derived. If we take the naive approach that a derived event occurs when the system computes it, then we can have several anomalies. Consider the following simple example: there is an auction system; each auction has an auction context time interval in which the auction is valid and people place bids. The auction works on a fairness criterion, which, in the case of multiple bidders who made the maximal bid, gives preference to the person who bid earlier. The raw event is a bid request, but the entry to the bid process is a derived event, since the event has to be enriched and validated, and some details have to be added from the previous bid of the same bidder (if one exists). If we take the time at which the derived event was actually produced as its occurrence time, then we can have some semantic anomalies, as shown in the following figure:


Anomaly 1 (on the right hand side) arises because, although the bid request is made within the auction validity interval, the bid entry occurs after the auction interval ends and will not get into the auction processing.
Anomaly 2 (on the left hand side) arises because the order of the bid requests can be reversed by their corresponding derived events, and thus the outcome of the auction may not be consistent with the auction's rules.
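
Both anomalies can be reproduced with a few lines of Python. This is a sketch under assumed numbers: the `BidEntry` class, the bidders, the amounts and the times (in minutes from the start of the auction) are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BidEntry:
    bidder: str
    amount: int
    requested_at: int  # occurrence time of the raw bid request (minutes)
    derived_at: int    # time the system finished deriving the bid entry


AUCTION_START, AUCTION_END = 0, 60  # auction context interval, in minutes


def winner(entries, time_of):
    """Highest bid inside the auction interval wins; ties go to the earliest
    bid according to `time_of`, the chosen occurrence-time policy."""
    valid = [e for e in entries if AUCTION_START <= time_of(e) <= AUCTION_END]
    if not valid:
        return None
    return min(valid, key=lambda e: (-e.amount, time_of(e))).bidder


def naive(entry):
    return entry.derived_at    # derived event "occurs" when it is computed


def by_request(entry):
    return entry.requested_at  # derived event inherits the request's time


# Anomaly 1: Carol bids in time, but her entry is derived after the close.
late = [BidEntry("alice", 100, 10, 14), BidEntry("carol", 120, 59, 61)]

# Anomaly 2: Alice bids first, but her entry takes longer to derive.
tie = [BidEntry("alice", 100, 10, 14), BidEntry("bob", 100, 11, 12)]

print(winner(late, naive), winner(late, by_request))
print(winner(tie, naive), winner(tie, by_request))
```

In both cases the naive policy picks a different winner from the one the auction rules intend, which is exactly the kind of semantic anomaly described above.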

This is just one example, which creates a bias toward a particular solution. However, the reality is even more complicated, since in different cases the answer to the question posed in the title of this posting may not be the same; thus policies should be used to disambiguate the semantics here.

I'll have a follow-up posting discussing the proposed policies for this case.

Friday, November 13, 2009

On EPIA and Friday the 13th

Today is Friday the 13th. Some people have superstitions about the number 13 in general (many hotels don't have a 13th floor, and sometimes not even a room X13), and about Friday the 13th in particular. It seems that Manning, the publisher of the EPIA book, is taking $13 off the list price in the Manning Early Access Program, so today is an opportunity to purchase the book $13 cheaper: go to the book's MEAP site and, when checking out, use the code fri13 as a promotion code.

This is also a good opportunity to give an update on the book's status. We have received the review reports from the 2nd review (actually the 3rd, including the reviews of the book proposal). Somehow the reviewers keep changing, which makes them somewhat inconsistent with previous reviews. Reviews are good for improving the quality of the manuscript; they also show the necessity of writing a foreword to the book explaining exactly what its focus is, as various people have various things in mind. As I have written in the three recent book reviews on this Blog, books come with different focuses and for different audiences, so it is important to set expectations right about what the book is (an in-depth technical book about the concepts behind designing event processing applications) and what it is not. It does not follow a single language; it is generic and demonstrated through multiple languages, a concept that is new for some readers. It is also not a book about how EDA fits SOA, BPM, messaging and other adjacent concepts, and it does not take a business oriented perspective. We write briefly about these topics (some reviewers think they are vital, others think they are boring), we leave the business oriented discussion to the book of Chandy and Schulte, and we'll devise an "additional reading" section for each chapter. We are now working on the last third of the book and intend to finish by early December, and also to get the first version of the website live.

Yesterday we also had an internal briefing at IBM about the book, and this is the slide that ended our presentation.



Wednesday, November 11, 2009

On Defining "EVENT" in Earnest

Professional books are not that funny; this is left for comedies. My favorite comedy of all time is Oscar Wilde's "The Importance of Being Earnest". In Hebrew the title was translated literally to something like "The importance of seriousness", and everybody who knows what the play is about understands that this translation totally misses the point of the comedy. Anyway, I recalled Oscar Wilde's old play when reading the book by Mani Chandy and Roy Schulte recently, since they have in their book a section called "defining "EVENT" in Earnest". In this section they say that there are three schools of thought about how EVENT is defined:
  • State-change view -- an event is a change in the state of something, and is reported as such. Its properties: a change must occur, and this change must be reported. Example: An item previously outside the range of an RFID reader is now within the range of this RFID reader.
  • Happening view -- an event is anything that happens, or is contemplated as happening (the EPTS glossary definition). In this case, a change must occur, but its reporting to the system is optional; not every event according to this definition is of interest to be reported. Example: A person sending an email.
  • Detectable-condition view -- an event is a detectable condition that can trigger a notification. In this case a change does not have to occur, but reporting should occur. Example: A GPS device reporting a truck's location (note -- the location may not have changed since the last report, since the truck driver went for lunch).
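
The difference between the three views can be read as two yes/no questions: did a change occur, and was it reported? The Python sketch below encodes that reading; the `Candidate` class and the predicate table are my own illustrative assumptions, not definitions taken from the book or from the EPTS glossary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    description: str
    state_changed: bool  # did something actually change?
    reported: bool       # was it reported to the system?


# One predicate per school of thought, following the properties listed above.
VIEWS = {
    "state-change": lambda c: c.state_changed and c.reported,
    "happening": lambda c: c.state_changed,        # reporting is optional
    "detectable-condition": lambda c: c.reported,  # change is optional
}


def views_accepting(candidate):
    """Names of the views under which `candidate` counts as an event."""
    return [name for name, accepts in VIEWS.items() if accepts(candidate)]


rfid = Candidate("item enters the range of an RFID reader", True, True)
email = Candidate("person sends an email that nobody reports", True, False)
gps = Candidate("GPS device reports an unchanged truck location", False, True)

for candidate in (rfid, email, gps):
    print(candidate.description, "->", views_accepting(candidate))
```

Under this reading the RFID example is an event in all three views, while the unreported email and the unchanged GPS report are each accepted by only one of them.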

This is an interesting observation. Some people argue that only the first type is an event, while the other types are not. My view is that all of the above are actually events. The question is whether we can come up with an inclusive, agreed-upon definition of event; maybe the glossary team (co-led by Roy Schulte) should take up this challenge.

More about event types - later.

Tuesday, November 10, 2009

On the Event-Driven Architecture book

Last in the series of 2009 event-related books is the book entitled "Event-driven architecture - How SOA enables the Real-Time Enterprise". This book was published early this year, and I actually purchased it while visiting the USA earlier this year. Since I am doing the other book reviews, this is a good time to write about this book as well.

The book, unlike the others, does not deal with event processing; it deals with EDA as a central concept, starting with a "working definition": an event-driven architecture is one that has the ability to detect events and react intelligently to them. I have some trouble digesting this definition, since in my mind, architectures don't possess abilities. Part I of the book talks about "The Theory of EDA", and starts with a second "working systemic definition", saying that EDA is the complete array of architectural elements, including design, planning, technology, organization, and so on, which enables the ability to disseminate events immediately to all interested parties, human or automated. So now this is a definition of an architecture for event/message routing, but I already noted that the book is not about event processing. Next it goes in depth into the relationships between EDA and SOA, explaining along the way what SOA is. The metaphor used throughout is a nervous system, in this case an enterprise nervous system. The discussion of SOA and related concepts spans four chapters, ending with some hints of how to calculate the ROI of selecting an architecture style, but the ROI discussion stays at the headline level. The second part of the book goes from theory to practice; here the authors say that the products implementing EDA are called ESBs (Enterprise Service Buses), and (rightfully) claim that the main gap in using EDA is that people are not used to thinking in EDA. However, while they have a chapter called "thinking EDA", its insights into how to "think EDA" stay at a very high level. Going from the thinking to the examples, the book discusses in great detail three examples: Airline flight control, Anti-money laundering, and event-driven productivity infrastructure (under this name there is a description of a framework to connect workflows, E-mail, phone, document repositories, blogs, wikis, social networks and some other stuff).
The book ends after these four example chapters (which actually take more than 50% of the pages), without any conclusion chapter.

It seems that the examples are the essence of the book, and the previous chapters are an introduction. The examples also remain at the transport level, and while in one of the examples "rule engines" are mentioned as part of the architecture, the book says very little about them.

Looking at the reviews on Amazon, the opinions are polarized, going from 1 star to 5 stars. I guess that I am somewhere in the middle: for somebody who does not have a clue about what EDA is, it provides a simple non-technical explanation, and such people found it useful; however, I agree with the 1-star reviewer that it does not really make a convincing story for the subtitle's promise - "How SOA enables the real-time enterprise".

This completes my book reviews. We'll see some more books in this area coming in 2010.