Saturday, July 30, 2011

More on event processing as analytics/business intelligence

This is a nice picture I borrowed from a BI blog, talking about seven ways to report BI results.

Last week I wrote about the fact that nowadays we see various sources that classify event processing as a kind of analytics. Gartner, in a recent report, used the term "active analytics" while classifying event processing as analytics. Jeff Wooton from Sybase said in the Sybase Blog that I am pondering whether event processing is analytics, and stated his positive answer to that. After looking up the meaning of the word "pondering" (I still have some holes in my English vocabulary), I am not sure it is the right phrase to express my attitude. I made the observation that people did not use to classify event processing as analytics in the past, and started to do so in the last year or so. Since I did not see a big shift in what event processing is doing during that time, either there is a shift in how analytics is defined, or the event processing marketing guys are riding on the analytics hype. I guess that there is some truth in both. The analytics guys are talking about event processing as part of analytics from their side.
Paul Vincent noticed that there is a recent Communications of the ACM article entitled "An overview of Business Intelligence technology". Paul complains about the fact that the authors look at event processing as equal to continuous queries, while in reality many of the models (and products) are based on other programming models such as rules (Paul has illustrated the distribution into programming models in a genealogy illustration of event processing languages). Paul is right in the sense that people from the database research community view the whole universe as centered around their terminology, and are sometimes ignorant of other approaches. What makes it more interesting is that one of the CACM article authors is Umesh Dayal (currently at HP Labs). Umesh (who is one of my early professional inspirations) is considered to be the father of "active databases", and the HiPAC project at CCA that he managed coined the Event-Condition-Action architecture. In a conversation I had with him a few years ago he thought that the ECA architecture had a lot of traction; it seems that Umesh did not keep track of the descendants of HiPAC!
Anyway -- besides the complaints, this is an indication that the BI guys have adopted event processing as part of BI from their side, which somewhat extends the definition of what BI is. Interesting!

Friday, July 29, 2011

The 10 rules of scalability

Yesterday I got an Email from LinkedIn with 77 pictures showing those of my contacts who have moved since the beginning of 2011. While some of them just changed title within their organizations, most of them indeed moved to other organizations, some from senior positions in big companies to start-ups. Is it a trend now that 2011 is a year in which many people are moving?

ACM keeps sending me a hard copy of the "Communications of the ACM", in addition to sending an Email whenever the electronic copy is available. Yesterday I browsed through the June 2011 issue (it takes more than a month to get it delivered), and found the paper entitled "10 rules for scalable performance in 'simple operation' datastores" by Mike Stonebraker and Rick Cattell. The 10 rules are summarized in the illustration below.

This is a mix of various types of advice: from using high availability and automatic recovery, to not trying to build ACID consistency yourself, through not being afraid to use high-level languages, and even using open source. The domain of this paper is, as declared in the title, "simple operation" datastores. The question is what we can learn about scalability in event processing, which is somewhat different: it is focused neither around data stores nor around simple operations. Also, scalability in event processing has several dimensions, not just scalability in the number of events; in fact, in the DEBS 2011 tutorial we mapped all the scalability dimensions in event processing.

I guess that shared-nothing architecture is always a good practice; the use of high-level languages and the utilization of main memory are also good practices. Recovery is a matter of the application's requirements: for some applications recoverability is vital, for others it is not really necessary. As for the use of open source, it again depends on the context. In summary -- some of the rules are rather well known best practices, some are subjective, and some are context dependent.

Tuesday, July 26, 2011

Event processing = big data + real time ?

BIG DATA is a hot phrase; one can even be labelled as a "BIG DATA NERD" by purchasing the shirt shown above. The term BIG DATA refers to the explosion of data in the universe, and especially to methods to store and process huge amounts of data.

Paul Vincent poses the question: CEP = real-time big data? Actually he doesn't answer this question in his blog posting, but the spirit of the posting gives the impression that the "=" relation evaluates to "True".

The answer is that it is not really an equality, but more like two intersecting concepts.

Event processing, of course, neither necessarily works on big data nor necessarily has to satisfy real-time constraints.

On the other hand, real-time big data is not necessarily event-driven and need not process events at all, as big data can be fairly static. As an example of real-time big data, I'll return to an invited talk at DEBS 2011 on Watson, in which there was a live demonstration of the Jeopardy! game with the Watson computerized system. You can see in the picture that Paul Vincent is playing (very skilfully) the host.

Watson is a "big data" crunching application with very strict real-time constraints, so it certainly can qualify as "real-time big data". However, it has nothing to do with event processing: the data exists prior to the game, and no additional data is added during the game.

While the equality does not work, there is an obvious relation: in some cases there is a substantial number of data inputs, much of which can be reduced or filtered out. Event processing can be used to process streaming data on the fly, rather than follow the paradigm of store now, process later. This is especially important when the processing is required in real time.
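To make the contrast with "store now, process later" concrete, here is a minimal Python sketch of filtering a stream on the fly. The sensor readings, the `temperature` field and the threshold are all made up for illustration; the only point is that each event is examined as it arrives and nothing is stored first.

```python
from typing import Iterable, Iterator


def readings() -> Iterator[dict]:
    """Hypothetical stream of sensor readings, produced one at a time."""
    for seq, temperature in enumerate([20.1, 20.3, 35.7, 20.2, 41.0]):
        yield {"seq": seq, "temperature": temperature}


def filter_on_the_fly(stream: Iterable[dict], threshold: float) -> Iterator[dict]:
    """Pass on only the readings above the threshold; the rest are dropped unseen by later stages."""
    for event in stream:
        if event["temperature"] > threshold:
            yield event


# Only the interesting readings survive; the full stream is never kept in memory.
alerts = list(filter_on_the_fly(readings(), threshold=30.0))
```

In this toy run only two of the five readings get through, which is the kind of reduction meant above.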

More on event processing and big data -- later

Monday, July 25, 2011

On event processing critical success factors

Another input related to students' work is the posting of Sascha Retter, based on his diploma thesis (the term discloses the origin in Germany). The observation is about critical success factors of event processing systems: integration with workflow management systems, and modeling tools. There is some truth in both observations, but I would like to take a broader view.

There is a market for stand-alone event processing systems, but we have observed that the larger market is event processing embedded in other systems. Workflow management systems, or in the more modern name "Business Process Management" (BPM), are one kind of such systems, but not the only one; there are other systems in which event processing can be embedded, e.g. analytic frameworks, sensor networks and more. It may also be embedded within packaged applications, like trading platforms, supply chain management and much more.

Modeling tools are a vehicle for usability, and indeed usability and ease of use have been identified as a critical success factor. This is now a well recognized fact.

There are other critical success factors for various applications, like non-functional requirements (performance). I also believe that standards are a critical success factor for the entire area.

The EPTS use case survey provides more insights into properties and preferences of event processing customers. 


On terminology again

While the EPTS glossary team is about to produce an additional version of the glossary, I received by Email a question about terminology from a student investigating this area: whether "event driven architecture" and "event processing" (both titles of books) are indeed the same thing.

The answer is -- not really, but the difference is simple:   

Event Driven Architecture is an architecture in which there are event producers, event processors, and event consumers, with the following principles: 1) all players are decoupled; 2) all communication between players consists of sending events to one another; 3) all communication between players is asynchronous.

Event Processing is a type of processing that takes as input one or more events and produces as output one or more events, applying functions such as filtering, aggregation, transformation and pattern matching.
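As a rough illustration of this definition, the Python sketch below shows the four kinds of functions as plain functions from events to events. The `Event` class, the banking-style event kinds and the pattern ("a deposit followed by a withdrawal on the same account") are hypothetical examples, not taken from any product or from the glossary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str
    account: str
    amount: float


def filter_events(events, kind):
    """Filtering: keep only the events of one kind."""
    return [e for e in events if e.kind == kind]


def transform(events, rate):
    """Transformation: derive new events with the amount converted by a rate."""
    return [Event(e.kind, e.account, e.amount * rate) for e in events]


def aggregate(events):
    """Aggregation: derive one 'total' event per account."""
    totals = {}
    for e in events:
        totals[e.account] = totals.get(e.account, 0.0) + e.amount
    return [Event("total", account, amount) for account, amount in totals.items()]


def match_pattern(events, first, second):
    """Pattern matching: derive an event when `first` is later followed by `second` on the same account."""
    seen_first = set()
    derived = []
    for e in events:
        if e.kind == first:
            seen_first.add(e.account)
        elif e.kind == second and e.account in seen_first:
            derived.append(Event(f"{first}_then_{second}", e.account, e.amount))
            seen_first.discard(e.account)
    return derived


events = [
    Event("deposit", "A", 100.0),
    Event("withdrawal", "A", 90.0),
    Event("deposit", "B", 50.0),
    Event("deposit", "A", 25.0),
]
deposits = filter_events(events, "deposit")
converted = transform(deposits, 2.0)
totals = aggregate(deposits)
suspicious = match_pattern(events, "deposit", "withdrawal")
```

Note that every function here is called synchronously on a list; nothing in the definition of event processing requires an asynchronous architecture underneath.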

The concepts are related in the sense that it is common to implement event processing on top of event driven architecture, but they are not tightly coupled:
Event Driven Architecture can implement pub/sub systems that do routing only and do not do any processing of events;
Event Processing can be implemented in synchronous mode and not on top of EDA.
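The first case, an event driven architecture that only routes, can be sketched in a few lines of Python. The `Broker` class and the topic names are invented for this illustration; it uses `asyncio` queues so that the publisher never calls the subscriber directly, and the broker forwards each event untouched.

```python
import asyncio
from collections import defaultdict


class Broker:
    """Hypothetical topic-based router: it forwards events and never inspects or changes them."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic):
        """Register interest in a topic and get back a private queue of events."""
        queue = asyncio.Queue()
        self._subscribers[topic].append(queue)
        return queue

    async def publish(self, topic, event):
        """Deliver the event to every queue subscribed to the topic."""
        for queue in self._subscribers[topic]:
            await queue.put(event)


async def main():
    broker = Broker()
    inbox = broker.subscribe("orders")
    await broker.publish("orders", {"id": 1})
    await broker.publish("payments", {"id": 2})  # no subscriber, so nobody receives it
    await broker.publish("orders", {"id": 3})
    received = []
    while not inbox.empty():
        received.append(inbox.get_nowait())
    return received


received = asyncio.run(main())
```

The producer and consumer know only the broker and the topic name, which is the decoupling principle; there is no filtering, aggregation or pattern matching anywhere, so this is EDA without event processing.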