Saturday, March 21, 2009

More on event-at-a-time processing

I have borrowed this picture from Brian Connell's posting of a few months ago entitled "one-by-one can still be CEP". I have not really returned to this topic since, but now is a good time to do so, following my last posting on "set-at-a-time" vs. "event-at-a-time". Many people are used to programming in set-oriented languages like SQL, so a simple match between two entities becomes first creating the Cartesian product of the sets they belong to, and then selecting the element from this Cartesian product. This is not the natural way in which people think about processing events. Let's look at the nice penguins in the picture: we want to trace what happens in the penguin colony by observing them. Let's say that we want to detect when a young penguin stays more than 1 km away from the home glacier for more than an hour, which may indicate that it is getting lost. Furthermore, we wish to get the alert immediately when this happens, and not at the end of some time window.

Some of the "set thinkers" view the "stream processing" paradigm as something organized, in which the partition of events is well defined by the notions of streams and windows, the processing already has its input set, and all it has to do is apply some function to that set; they view "event-at-a-time" processing as ad-hoc programming in which events arrive at some event processor that then has to hard-code the entire semantics. But this is, of course, a misconception. Let's look again at the penguin example, assuming that we are looking for the following pattern: a young penguin may be lost if it stays more than 1 km away from the glacier for over one hour. This can be expressed in set-oriented processing by looking at observations of the penguin (let's say we watch it once every minute) and determining at the end of an hour that all observations are more than 1 km away. But when do we start the one-hour period?

The answer is: the first time that the penguin crosses the 1 km bound. So we need a notion of an event that starts a time window. Actually the term context is wider than time window; here it contains a temporal aspect (within one hour, where the hour can be initiated by a specific event) and a semantic aspect (an EPA is tracing a single specific penguin, so it is associated with some penguin id), and the pattern is spatial: all events are more than 1 km away from the glacier.

Now, what is the benefit of an event-at-a-time implementation? A simple one: if the young penguin starts to head back, say after 3 minutes, then we can close this context instance, terminate the EPA, and stop tracing this penguin until the next time it swims far away, while in set-at-a-time processing we'll determine only at the end of the time window that there is nothing to detect. Of course, the set thinkers will immediately say that we can reduce the window, but reducing the window to units of single events gives us exactly the "event-at-a-time" notion.

More than that, it is not only a question of efficiency, it is also a question of expressing fine-tuned semantics. Let's look at another penguin scenario: we are now tracing a lazy penguin that returns to the glacier less than 2 minutes after jumping into the water. Here we have a sequence of two events related to the same penguin within a specific temporal context ("within 2 minutes"). This is not a set operation at all; it is looking for a sequence of two individual events. True, it can be expressed in set-oriented programming: we'll have to create two streams (or one heterogeneous stream) of "jumping into the water" and "returning to the glacier", join them, select the appropriate instances, and thus determine which members of the set matched this pattern. But this is not a natural way to think about it. In "event-at-a-time" this can be done by just opening a context instance for every penguin that jumps into the water: if it does not return within 2 minutes, this context instance is closed; if the penguin returns, then there is a pattern match, and the context instance is closed even earlier.
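To make the lazy-penguin case concrete, here is a minimal Python sketch of the context-instance idea. It is only an illustration, not the syntax of any event processing product: the class name `LazyPenguinDetector`, the event kinds `"jump"` and `"return"`, and the use of seconds as the time unit are all my assumptions, and events are assumed to arrive in time order.

```python
LAZY_LIMIT_S = 120  # "within 2 minutes", expressed in seconds (assumed unit)


class LazyPenguinDetector:
    """Hypothetical EPA: one open context instance per penguin in the water."""

    def __init__(self):
        # penguin_id -> time (seconds) at which it jumped into the water
        self.open_contexts = {}

    def _expire(self, now_s):
        # Close every context instance whose 2 minutes have run out.
        expired = [pid for pid, start in self.open_contexts.items()
                   if now_s - start >= LAZY_LIMIT_S]
        for pid in expired:
            del self.open_contexts[pid]

    def on_event(self, kind, penguin_id, time_s):
        """Process a single event; return True when the lazy pattern matches."""
        self._expire(time_s)
        if kind == "jump":
            # Open a context instance for this penguin.
            self.open_contexts[penguin_id] = time_s
            return False
        if kind == "return":
            # A match exists only if the context instance is still open;
            # either way the instance is closed by the pop.
            return self.open_contexts.pop(penguin_id, None) is not None
        raise ValueError(f"unknown event kind: {kind}")
```

Each call handles one event, yet the decision depends on the state kept for that penguin's context instance; no set of events is ever accumulated or joined.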

But let's move to an example about the fine-tuning of semantics. Assume that we are looking for the pattern saying that the average stay of a penguin in the water is less than 2 minutes, which may indicate some laziness plague, or any other plague that makes the penguins lazy. In set-oriented programming we'll have to define the set, let's say a time window of one hour; when the set has been fully accumulated we can calculate the average and compare it to the threshold. However, it becomes tricky when the average is actually a moving average: it may be that if we do this calculation after 30 minutes the pattern is matched, since the average stay in the water during these 30 minutes is 1 minute and 56 seconds, while if we consider the whole hour the pattern is not matched, since the average is now 2 minutes and 9 seconds. Doing the calculation event-at-a-time enables us to get the average over a set of any size, without committing ahead of time to the size of the set.
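As a small illustration of that last point, the sketch below keeps a running average that is updated on every event, so the threshold can be checked at any moment rather than only when a window closes. The class name `RunningAverage` and the choice of seconds as the unit are assumptions made for this example.

```python
class RunningAverage:
    """Hypothetical aggregator: average stay in the water, updated per event."""

    def __init__(self, threshold_s=120.0):
        self.threshold_s = threshold_s  # 2 minutes, in seconds
        self.count = 0
        self.total_s = 0.0

    def on_event(self, stay_s):
        """Add one stay duration; return True if the average is below threshold."""
        self.count += 1
        self.total_s += stay_s
        return self.total_s / self.count < self.threshold_s
```

Only a count and a sum are kept as state, so nothing has to be decided in advance about how many events the average will eventually cover.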

This is not done by ad-hoc programming, but is supported by high-level programming primitives that are sometimes easier to express than their equivalent set-oriented notation.

There are of course cases in which the set-oriented calculation makes sense -- exactly when we are doing aggregations at the end of some fixed time intervals, and in some applications this may be the main function we need -- but, I assume that we'll see more and more hybrid applications.

Last but not least: the distinction between "simple" and "complex" event processing is often taken to be that simple event processing handles events "one by one" while complex event processing handles multiple events. However, "event-at-a-time" is not "one-by-one" processing: in "one-by-one" processing an event is processed without looking at other events, while in "event-at-a-time" each event is processed individually but within the state of a certain context-driven EPA. More - Later

Thursday, March 19, 2009

On data flows event flows and EPN

Bob Hagmann from Aleri (ex-Coral8) has advocated the "data flow" model as an underlying model that unifies both engines of Aleri, and contrasts it with "event delivery systems" in which programmers create state manually if needed. I am not really familiar with the phrase "event delivery system" and don't know what he refers to, but there are event processing systems that employ programming styles different from stream processing, in which state is handled implicitly by the system and the programmer does not really deal with creating it.

But -- I have no interest in "language wars"; my interest these days is somewhat different -- to find a conceptual model that can express in a seamless way functionality that exists in different programming styles.

Actually the conceptual model of EPN (event processing network) can be thought of as a kind of data flow (although I prefer the term event flow, as what is flowing is really events). The processing unit is the EPA (Event Processing Agent). There are indeed two types of input to an EPA, which can be called "set-at-a-time" and "event-at-a-time". Typically SQL-based languages are more geared to "set-at-a-time", and other language styles (like ECA rules) work "event-at-a-time". From a conceptual point of view, an EPA gets events through channels: one input channel may be of a "stream" type, while in another the events flow one by one. As there are some functions that are naturally set-oriented and others that are naturally event-at-a-time oriented, and an application may not fall nicely into one of them, it makes sense to have a kind of hybrid system, with EPN as the conceptual model on top of both of them...
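A rough way to picture the two channel types is the following Python sketch. It is not taken from any product or from a formal EPN definition: the class names `EventChannel` and `StreamChannel`, the `close_window` method, and the idea of modelling an agent as a plain callable are all assumptions made for illustration.

```python
class EventChannel:
    """Hypothetical channel that hands each event to the agent on arrival."""

    def __init__(self, agent):
        self.agent = agent  # any callable taking a single event

    def push(self, event):
        self.agent(event)


class StreamChannel:
    """Hypothetical channel that buffers events into a set per window."""

    def __init__(self, agent):
        self.agent = agent  # any callable taking a list of events
        self.buffer = []

    def push(self, event):
        self.buffer.append(event)

    def close_window(self):
        # Hand over the accumulated set and start a fresh window.
        batch, self.buffer = self.buffer, []
        self.agent(batch)
```

In a hybrid network the same event can be pushed into both kinds of channel, so one agent reacts to it immediately while another aggregates it when its window closes.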

This is the short answer. More detailed discussion -- later.

Wednesday, March 18, 2009

Event Processing In Action

Event Processing In Action - this is the title of a book on which I have recently started to work together with my colleague Peter Niblett. Although I typically write my Blog as I and not as We, this time I'll use We wherever what I am writing refers to Peter as well. In the next few days the first chapter of the book will be made public by the publisher as a green paper. In the picture above you can see a provisional cover of this book, but this is not final yet. The book is planned to be available towards the end of 2009.

The Web 2.0 plays a role in this process, as explained below. Here are some Q&A about it,

What is the motivation for a new book ?

The book has been initiated by Manning Publications, a computing books publisher; their market survey indicated that there is a significant market need for a new book that will articulate and provide a deep dive into the concepts and facilities of event system applications. This book is intended to be the major reference book for enterprise architects, application developers (both technical and semi-technical), and is also expected to be used for instructional purposes (a textbook for a university level course on event processing).

The book written by David Luckham entitled "The power of events" (Addison-Wesley, 2002) has been very influential in creating the initial awareness of the event processing area, and it is still a big inspiration for us; the new book is intended to reflect the contemporary thinking around event processing, which has evolved since 2002.

Why have we agreed to write this book ?

Writing a book is a big responsibility, and it is a substantial burden on our time. Furthermore it is a tremendous challenge to produce a high quality book in an emerging area for these target audiences, especially considering the very high expectations that have already been generated around this book. We believe that this book is indeed required, and as technical leaders in the community it is our duty to take on this task and help shape the newly emerging discipline of event processing this way. We were also encouraged by our management and colleagues to take on this mission.

What is the approach taken in this book ?

The approach taken in the book will not be surprising to the readers of this Blog. Indeed, the book can be considered a direct descendant of the Blog; it seems that the publisher approached me based on recommendations of anonymous members of the event processing community who referred him to this Blog. I got feedback from others that this Blog is one of the popular sources today for learning what event processing is. But the Blog, as a Blog, is not written in a methodical way: it jumps from one topic to another, treats the various topics in a relatively superficial way, and includes "noise" like this posting. The book should be more focused, putting things in the right order and at the proper level of depth. The style of writing is similar to that of the Blog.

The book will explain all the event processing concepts by showing step-by-step how a single use case has been constructed. The explanation, like my approach in the Blog, is aimed to be language-style neutral and explain the concepts using a patterns oriented model (although, due to the ambiguity of the term patterns in event processing we use the term building blocks). We are planning to have an appendix in which we will list existing EP products and open source offerings, and provide some high-level details, without providing evaluation or endorsement to any of them. We'll ask for collaboration of the various product owners to get accurate information about their products.

What is the relationship between this book and IBM ?

Both Peter and I are IBM employees; Peter works in the IBM Hursley Lab in England, while I work in the IBM Haifa Research Lab in Israel. However, we are writing this book (after clearing the legal and managerial permissions) as individuals and not as IBM employees. A disclaimer stating that the book represents our opinions and not necessarily the opinion of IBM will be clearly made in the preface to the book, as is done at the top of this Blog. There is a big EP-oriented community inside IBM and we hope to get feedback from this community, as part of the feedback from the larger community.

How are Web 2.0 technologies going to impact the authoring process?

As with any other book, there is a formal review process, in which the publisher consults a collection of reviewers representing the target audiences; thus most reviewers are architects and developers from various industries, and academic instructors teaching EP courses. In addition, nowadays, book authoring is also considered an interactive process between the authors and the readers. The MEAP program (Manning Early Access Program) enables readers to interact with the authors through a forum, and to contribute comments and questions on the book while it is being written; when the book gets into the MEAP program I'll explain it further.

What are the next steps?

As I have said, Peter and I are facing a substantial challenge in creating a high-quality book for the readers, and we are sure that feedback and reviews from the larger community can help us provide a better book for the target audience. The green paper is due to appear, hopefully, by the end of this week, and I'll post the URL on this Blog as soon as it is available; the MEAP for this book will be set up in the next few weeks. I'll also use this Blog to tell about some dilemmas and challenges in the writing process (another Web 2.0 means of communication).

More -- Later.

Sunday, March 15, 2009

On Cool Event Processing

Thanks to a recent posting by Tim Bass, I have just watched a really cool video from the MIT Media Lab. If you have not already done so, watch and enjoy! It is still in its early phases, but very impressive!

This brings us to two interesting questions:

  1. Does this demo show an event processing application ?
  2. Should creating cool applications be our target ?
As for the first question -- the main achievement of the MIT Media Lab video demonstration is the ability to point a finger at some item (a person, a product in the supermarket, etc.), use image processing technologies to identify it, bring information from the Web, and display it on the item itself (e.g. display the Amazon book reviews on the book, display annotations about a person on the person's body, etc.). This is an extremely impressive blend of technologies, but not really event processing. To me it looks like a request-response type of application and not an event-driven one. The action of pointing at an object is a request to identify it, which in turn sends another request to search the Web. Not really event processing, but certainly very cool...

Which leads to the interesting question number 2 --- for sure, it is easier to impress and sell technologies through cool applications. Event processing has some cool applications, such as processing events in games, or processing events in the smart house that automatically turns the lights on and off, re-stocks the refrigerator, and calls a technician to fix the air conditioner. I think that event processing for the individual consumer market has not been investigated well, and in that context the "cool" stuff is certainly a good way to sell...
The majority of the work done today in event processing, however, relates to enterprise computing, where the main criterion is ROI. There may be nothing exciting about accounting, procurement or regulation enforcement applications, but since they are part of the enterprise's bread and butter, technologies that enable the enterprise to do them more effectively or more efficiently may bring a lot of ROI. Since the decision makers are people, and decision making is not necessarily a rational process, cool demos are highly recommended...

More - Later.

On the act of creation

This is a kind of side topic, but somehow related to this Blog. I am getting various reactions to this Blog, some of them as comments, and some as off-line Emails. Recently I received an Email about it; I'll not cite it, but reading it reminded me of a really old book that I read at the age of 16-17 or so, called The Act of Creation by Arthur Koestler, whose cover (from the Amazon site) you can see here. I think that this was the first heavy book I read in English, and it was not very easy reading. Anyway --- the book makes the claim that there are three types of "creative people": the scientist, the artist and the joker, and that all of them apply similar patterns of thinking to achieve different goals. I have moved through different phases in life -- since early childhood I used to do what is today called "stand-up comedy" at all class parties, with the biggest performance at the final party after high-school graduation; in another phase in life I spent some time writing poetry and short stories (and even published some), and from a certain point in life I deserted both and became a scientist, which is still how I define myself today.

Thinking in retrospect I tend to agree with Arthur Koestler that these three types of creation share something in common. How does it relate to my Blog ? -- I'll leave it as an exercise to the reader.... Back to "event processing thinking" - soon.