Monday, November 9, 2009

On the Stream Data Processing book by Chakravarthy and Jiang


Another related book that arrived yesterday is "Stream Data Processing: A Quality of Service Perspective - modeling, scheduling, load shedding and complex event processing".

First - let's start with a lesson in economics. Searching Amazon for "event processing books" shows that the book by Chandy and Schulte that I described yesterday costs $32.97, the new EDA book by Taylor et al. costs $37.30, and the book I am talking about today costs $112.45, more than three times the price of either of the others. So the economic question is: what makes it so expensive? My guess is that books like the other two (and our upcoming book is probably in the same category) rely on people buying them out of their own pockets, while academic books, especially those in a Springer series (this one is part of the series "Advances in Database Systems"), have a captive audience of university libraries. I wonder how many people are willing to pay this price out of their own pocket for this book.

Now, from the business side to the book itself. Sharma is an old colleague from my active database days. The book takes a database approach. It starts by explaining why data streams are a paradigm shift relative to traditional databases, then explains the notion of data streams, covers QoS metrics, and moves on to data stream challenges. It introduces CEP as a complementary technology and poses its support within the data stream management system (DSMS) as a challenge. A literature review follows, including a survey of commercial and open source stream and CEP systems that seems to me to have both false positives and false negatives. Then the more academically oriented discussion starts, on modeling continuous queries, with theorems and Greek letters. Next comes a discussion of the engineering-oriented aspects of a DSMS, such as scheduling and load shedding.

After all this, the authors discuss the integration of stream processing and complex event processing. They start with the differences, stating that it will be difficult to combine incompatible execution models. Nevertheless, the authors are not afraid of difficulties, and a page later they describe an integrated, layered architecture: stream processing is done first, event generation from its results is the second layer, event processing is the third layer, and rule processing is the fourth. I think that strictly hierarchical architectures are somewhat simplistic for realistic scenarios (I'll need to write something about it at a later point). The authors then dedicate two chapters to describing their prototypes, and the book ends with conclusions and future directions, though these seem to be ideas for extending the issues already discussed.
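
To make the layering concrete, here is a minimal Python sketch of a strict four-layer pipeline of this kind. It is an illustration only, not code from the book or its prototypes: the function names, the sliding-window average, the threshold, and the "sustained_high" composite event are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Event:
    name: str
    value: float


def stream_layer(readings: Iterable[float], window: int) -> Iterator[float]:
    """Layer 1 (stream processing): a continuous query, here a sliding-window average."""
    buf: List[float] = []
    for reading in readings:
        buf.append(reading)
        if len(buf) > window:
            buf.pop(0)
        if len(buf) == window:
            yield sum(buf) / window


def event_generation_layer(averages: Iterable[float], threshold: float) -> Iterator[Event]:
    """Layer 2 (event generation): turn interesting query results into primitive events."""
    for avg in averages:
        if avg > threshold:
            yield Event("high_average", avg)


def event_processing_layer(events: Iterable[Event], count: int) -> Iterator[Event]:
    """Layer 3 (event processing): emit one composite event per `count` primitive events."""
    pending: List[Event] = []
    for event in events:
        pending.append(event)
        if len(pending) == count:
            yield Event("sustained_high", max(p.value for p in pending))
            pending = []


def rule_layer(events: Iterable[Event]) -> Iterator[str]:
    """Layer 4 (rule processing): a single condition-action rule (hypothetical)."""
    for event in events:
        if event.name == "sustained_high":
            yield f"ALERT: {event.name} (peak average {event.value:.1f})"


def run_pipeline(readings: Iterable[float], window: int = 2,
                 threshold: float = 10.0, count: int = 2) -> List[str]:
    # Each layer consumes only the output of the layer directly below it.
    averages = stream_layer(readings, window)
    primitive = event_generation_layer(averages, threshold)
    composite = event_processing_layer(primitive, count)
    return list(rule_layer(composite))


if __name__ == "__main__":
    print(run_pipeline([8, 12, 14, 16, 4]))
```

Note that every layer here sees only what the layer below chooses to pass up; a rule cannot look at the raw stream, and a detected event cannot feed back into a continuous query. That restriction is the kind of simplification I have in mind.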

Bottom line: it seems like an academic journal paper that has been scaled up (324 pages, including a long list of references, not lexicographically sorted, and an index). It may be of interest to those who want to study the formal aspects of stream processing.

I also got with the package two books about causality models, but I need to read them first before making any comment on them.

1 comment:

Anonymous said...

"the stream processing is done first, as a result there is a phase of event generation, as a second layer, where the event processing is a third layer, and rule processing as a fourth layer."

Yes this does seem a mite simplistic, although there is no problem in having a default (but overridable) layering of "functions"... eg filter, detect, decide, react or somesuch.

PS thanks for the review!

Cheers