Saturday, November 14, 2009

When does a derived event actually happen? - (posting I)

Just finished reading the book "Flash Forward" by Robert Sawyer. Science fiction was always my favorite type of literature, and my favorite writers are Asimov and Hienlein. There are science fictions writers among the following generation that stand out, and the Canadian writer Sawyer, who does not forget to give Canada a role in each of his books, is one of those. I have read several (not yet all) of his books. The best of these I read so far is the Neanderthal Parallax trilogy, which is also very though provoking besides being fascinating. "Flash Forward" book, which is now also becoming a TV series deals with an experiment that get everybody in the universe to jump forward 21 years in time for 2 minutes, this is a combination of science fiction, a book that raises some philosophical issues, and a suspender, highly recommended.

The question of time and deep temporal issues also was one of my favorite research topics, since time has physical, philosophical, and also computer science implication. Back to event processing, recently I have written the "warnings" chapter in the EPIA book, and one of the interesting question is: when does a derived update occur?
As discussed before, there are two dimensions for answering the question: occurrence time which stand for the time in which an event occurs in reality, and detection time which stands for the time in which an event is detected by the event processing system. Both of these are not obvious in the case that the event is derived. If we take the naive approach that a derived data occurs when the system computes it then we can have several anomalies. Consider the following simple example: there is an auction system, each auction has some auction context time interval, in which this auction is valid, and people are doing bids. The auction works on fairness criterion, which gives preference to people who did the bid earlier, in case of multiple bidders that made the maximal bid. The raw event is bid request, but the entry to the bid process is a derived event, since the event has to be enriched, validated, and some details added from the previous bid of the same bidders (if exists). If we take the time that the derived event actually happened as its occurrence time then we can have some semantic anomalies, as shown in the following figure:

Anomaly 1 (on the right hand side) is realized by the fact that though the bid request is done within the auction validity interval, the bid entry occurs after the auction interval ends and will not get into the auction processing.
Anomaly 2 (on the left hand side) is realized by the fact that orders of the bid requests can be reversed by their corresponding derived events and thus the outcome of this auction may not be consistent with the auctions' rules.

This is just one example that create a bias into a particular solution, however, the reality is even more complicated, since in different cases the answer to the question poses in the title of the postings may not be the same, thus policies should be used to disambiguate the semantics here.

I'll have a follow-up posting with discussion about the proposed policies for this case.

Friday, November 13, 2009

On EPIA and Friday the 13th

Today is Friday the 13th, some people have superstitions about the number 13th in general (many hotels don't have 13th floor, sometimes not even X13 room), and about Friday the 13th in specific. It seems that Manning, the publisher which publishes the EPIA book is having $13 off the list price in the Manning Early Access Program, so today is an opportunity to purchase the book $13 cheaper, get into the book's MEAP site and if you purchase the book, when checking out use the code: fri13 as a promotion code.

This is also a good opportunity to update about the book status. We have received the review reports from the 2nd review (actually 3rd including the reviews on the book proposal). Somehow the reviewers keep changing, which make them somewhat inconsistent with previous reviews. Reviews are good for improving the quality of the manuscript, it is also shows the necessity of writing a forward to the book explaining exactly what is the focus of this book, as various people have in mind various thing, and as I have written in the recent three book reviews on this Blog, books come from different focus, to different audience, so it is important to set the expectations right about what the book is (a in-depth technical book about the concepts behind designing event processing applications) and what it is not: It does not follow a single language, it is generic and demonstrated through multiple languages, a concept that is new for some readers, also it is not book about how EDA fits SOA, BPM, Messaging and other adjacent concepts and does not take a business oriented perspective, we write briefly about these topics (some reviewers think they are vital, other think they are boring), we leave the business oriented discussion to the book of Chandy and Schulte, and we'll devise an "additional reading" section for each chapter. We are now working on the last 1/3 of the book and intend to finish by early December, and also get the first version of the website alive.

Yesterday we also had an internal briefing in IBM about the book, and this is the slide that ended our presentation.

Wednesday, November 11, 2009

On Defining "EVENT" in Earnest

Professional books are not that funny, this is left for comedies. My favorite comedy of all times is Oscar Wilde's "The Importance of being Earnest". In Hebrew it was translated literally to something like "The importance of seriousness", and everybody who know what it is talking about understands that this translation totally misses the point of this comedy. Anyway, I recalled Oscar Wilde's old play, when reading the book by Mani Chandy and Roy Schulte recently, since they have in their book a section called "defining "EVENT" in Earnest". In this section they are saying that there are three school of thoughts about how EVENT is defined:
  • State-change view - an event is a change in the state of something and as such is reported. Its properties: a change must occur, and this change must be reported. Example: An item previously outside the range of RFID reader, is now within the range of this RFID reader.
  • Happening view -- an event is anything that happens, or is contemplated as happening (the EPTS glossary definition), in this case, a change must occur, but its reporting to the system is optional, not every event according to this definition is of interest to be reported. Example: A person sending Email
  • Detectable-condition view -- an event is a detectable condition that can trigger a notification, in this case a change does not have to occur, but reporting should occur. Example: A GPS devise reporting track location (note -- location may not have changed since last report. since the track driver went for lunch).

This is an interesting observation, some people argue that only the first type is an event, while the other types are not. My view is that all the above are actually events. The question is whether we can come with an inclusive, agreed upon definition of event, maybe the glossary team (co-lead by Roy Schulte) should take this challenge.

More about event types - later.

Tuesday, November 10, 2009

On the Event-Driven Architecture book

Last in the series of 2009 event related books is the book entitled: "Event-driven architecture - How SOA enables the Real-Time Enterprise". This book was published early this year, and I actually purchased it while visiting the USA earlier this year, and while doing the other book reviews it is a good time to write about this book as well.

The book, unlike the others, does not deal with event processing, it deals with EDA as a central concept, starting with a "working definition": event-driven architecture is one that has the ability to detect events and react intelligently on them. I have some trouble to digest this definition, since in my mind, architectures don't possess abilities. Part I of the book talks about "The Theory of EDA", in which it starts with a second "working systemic definition" saying that EDA is the complete array of architectural elements, including design, planning, technology, organization, and so on, which enables the ability to disseminate event immediately to all interest parties, human or automated. So now this is a definition of architecture for event/message routing, but I already noted that this is not about event processing. Next it goes in depth about the relationships between EDA and SOA, explaining on its way what SOA is. The metaphor used throughout is a nervous system, and this is talking about enterprise nervous systems, the discussion about SOA and related concepts spans over four chapters, ending with some hints of how to calculate ROI of selecting architecture style, but the ROI discussion remains in title levels. The second part of the book goes from theory to practice, in this case they are saying that the products implementing EDA are called ESB (Enterprise Service Bus), and (rightfully) claiming that the main gap in using EDA is that people are not used to think in EDA. However, while they have a chapter called "thinking EDA", its insights of how to "think EDA" stay in a very high level area. Going from the thinking to the examples, the book discusses in big details three examples: Airline flight control, Anti-money laundering, and event-driven productivity infrastructure (under this name there is a description of a framework to connect workflows, E-mail, phone, document repositories, blogs, wikis, social networks and some other stuff).
The book ends after these four example chapters (which actually take more than 50% of the pages), without any conclusion chapter.

It seems that the examples are the essence of the book, and the previous chapters are introduction, the examples also remain in the transport level, and while in one of the example "rule engines" are mentioned as part of the architecture, the book says very little about them.

Looking at the reviews in Amazon, it has polar opinions going from 1 star to 5 starts, I guess that I am somewhere in the middle, for somebody who does not have a clue about what EDA is it provides simple non-technical explanation, and such people found it useful; however, I agree with the 1 star reviewer that it does not really making a convincing story on the sub-title promise - "How SOA enables the real-time enterprise".

This completes my book reviews. We'll see some more books in this area coming in 2010.

Monday, November 9, 2009

On Stream Data Processing book by Chkravarthy and Jiang

Another related book that arrived yesterday is the book entitled: "Stream Data Processing: A Quality of Service Perspective - modeling, scheduling, load shedding and complex event processing".

First - let's start with a lesson in economics. Looking at the Amazon query about "event processing books", one can realize that the Amazon price for the book of Chandy and Schulte that I described yesterday is $32.97, the new EDA book, by Taylor et al costs in Amazon $37.30, and the book I am talking about today has Amazon price of $112.45 -- roughly a price of four books. So the economic question is what makes it so expensive? My guess is that the answer is that books of the type of the two referred book (and probably our upcoming book is within the same category) relies on the fact that people will want to buy these books out of their own pocket, while academic books, especially part of Springer series (this one is part of the series "Advances in Database Systems"), have captive audience of university libraries. I wonder how many people are willing to pay this price out of their own pocket for that book.

Now -- from the business side to the book itself. Sharma is an old colleague from my active database days. The book takes a database approach and starts by explaining why data streams are paradigm shift relative to traditional databases, then it moves to explain the notion of data streams, and gets into QoS metrics, moving to data stream challenges, and introduces CEP as a complementary technology whose support as part of the data stream management system is posed as a challenge, follows by a literature review, including a survey of commercial and open sources stream and CEP systems, that seems to me to have false positives and false negatives. Then start the more academic oriented discussion about modeling continuous queries, with theorems and Greek letters, next is discussion about engineering oriented aspects of DSMS like scheduling and load shedding.

After discussing all this, the authors move to discuss integration between stream and complex event processing, starting with differences, and stating that it will be difficult to combine incompatible execution models, nevertheless, the authors are not afraid of difficulties and a page later describe an integrated architecture, which is a layered architecture, where the stream processing is done first, as a result there is a phase of event generation, as a second layer, where the event processing is a third layer, and rule processing as a fourth layer. I think that strict hierarchical architectures are somewhat simplistic for realistic scenarios (I'll need to write something about it at later point) , then the authors dedicate two chapters to describe their prototypes, and the books concludes with conclusions and future directions, but they seem to be ideas to extend the current issues discussed.

Bottom line -- seems like an academic journal paper that has scaled up (324 pages including long list of references (not lexicographically sorted), and index. May have interest to those who wants to study the formal aspects of stream processing.

I also got with the package two books about causality models, but I need to read them first before making any comment on them.

Sunday, November 8, 2009

On the Event Processing book by Chandy and Schulte

Today I got a package of books from Amazon that included two new event processing related books. I'll review the first of them today. This is the book by Mani Chandy and Roy Schulte, called "event processing - designing IT systems for agile companies". The title itself (agile companies) indicates that the book is business related, and indeed it is primarily answers the questions: why use event processing, and how it is related to other concepts in enterprise architecture concepts. The book is non-technical and fits the level of managers/CIOs/ business analysts. The book starts with overview and business context of event processing, talks about business patterns of event processing (another type of patterns, besides all other types of event processing patterns), talks about costs and benefits of event-processing applications, and types of event processing applications. After doing the ROI part, it goes to more architectural discussion -- getting top-down approach: EDA, events, and employing the architecture. Next there are two chapters about positioning event processing against the rest of the universe: SOA, BPM, BAM, BI, rule engines (I'll write about this positioning attempts in later postings). Towards the end there is a chapter of advices how to handle event processing applications (and this chapter reads like analysts report). Last chapter talks about the future of event processing, again from business perspective, future applications, barriers and dangers (again a topic for which I should dedicate a complete discussion), and drivers for adoption.

In conclusion: good book to everybody who wants to know what event processing is and what is its business value. Things that I thought such a book might also include --- some reference to what currently exists in the industry, how the state-of-the-practice relates to these theoretical concepts presented in the book, when COTS event processing should be used vs. hard-coded, which are practical considerations of event processing applications
(maybe in the second edition?)

For those who asked me what is the relationships between the book Peter Niblett and myself are writing and this book, the answer is that our book has a totally different focus, explaining step-by-step, what is needed to build an event processing technology, providing the reader an opportunity to experience the various approaches in the state-of-the-practice by providing a free downloadable versions of various products and open source. The target population is also different - we aim for designers, architects, developers and CS students, while The book by Mani and Roy is aimed at managers, business analysts and MBA students. The review of the second related book - later.

On challenging topics for event procesing developers and users

Spent much of the weekend in working on the EPIA book, time is getting closer to finish, and now it is the last 1/3 of the book. While in the first 2/3 of the book we concentrate on explaining what event processing is, and going step-by-step on the different ingredients of building applications, the last part of the book deal with some implementation issues, focus on challenging topics, and our view for the event processing of tomorrow. The chapter that I worked on in the last few days - chapter 11 (has nothing to do with bankruptcy), deals with challenging topics for event processing developers and users. This means -- topics that the developers and users have to pay attention, since: there are issues that can influence the quality of results obtained from an event processing systems, and the current state of the art does not have magic bullets to resolve them. In this postings I'll just provide the list of topics discussed in this chapter, I'll write about some of them in the future, here is the list:
  • Occurrence time that occur over intervals: Events typically occur over intervals, but for computational reasons it is convenient to approximate it to a time-point, and look at events in the discrete space; however, for some events this is not an accurate thing to do, and interval-based temporal semantics should be supported, along with operations associated with them.
  • Temporal properties of derived events: For raw event, we defined occurrence time as the time it occurred in reality, and detection time, as the time that the system detected its existence. What are the temporal properties of derived events? there is no unique solution to this question.
  • Out-of-order events: This topic is the topic most investigated among the challenging topics, however, current solutions are based on assumptions that are sometimes problematic. This problem is about events that arrive out of order, where the event processing operation is order-sensitive.
  • Uncertain events: Uncertainty whether event has happened, due to malfunction, malicious or inaccurate sources
  • Inexact content of events: Similar to uncertain events, some content in the event payload including temporal and spatial properties of the events may not be accurate.
  • Inexact matching between events and situations. Situations are the events that require reaction in the user's mind. This is in getting us back from the computer domain to the real-world domain. Situation is being represented as a raw or derived event, but this may be only approximation, since there may be false positives and false negatives in the transfer between the domains.
  • Traceability of lineage for event or action, this gets to the notion of determination of causality. Since in some cases there are operations in the middle of the causality network outside the event processing systems boundaries (e.g. event consumer who is also event producer) causality may not be automatically determined.
  • Retraction of event: ways to undo the logical effects of events, sometimes tricky or impossible, but seems to be a repeating pattern.

More about some of them - later.