Friday, May 10, 2013

Event processing - small data vs. big data and the Sorites Paradox.

This picture is taken from a blog post in the Big Data Journal by Jim Kaskade entitled "Real-time Big Data or Small Data".

Kaskade attempts to attach quantitative metrics to what counts as "small data" versus "big data".
In terms of throughput, big data is defined as >> 1K events per second, while small data is << 1K events per second; I guess that around 1K events per second qualifies as medium data...
On variety, big data is defined as having at least 6 sources of structured events and at least 6 sources of unstructured events. There are other dimensions as well: small data relates to a single function in the organization, while big data spans several lines of business.
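As a toy illustration of the throughput dimension, here is a minimal Python sketch that classifies a workload by Kaskade's ~1K events-per-second boundary. The function name is mine, and reading ">>" and "<<" as "an order of magnitude away" is my own assumption for illustration, not part of Kaskade's post:

```python
def classify_by_throughput(events_per_second: float) -> str:
    """Classify a workload around the ~1K events/sec boundary.

    '>>' and '<<' are read here as 'an order of magnitude away',
    an assumption for illustration, not Kaskade's definition.
    """
    if events_per_second >= 10_000:      # >> 1K events/sec
        return "big data"
    if events_per_second <= 100:         # << 1K events/sec
        return "small data"
    return "medium data"                 # the fuzzy middle ground

print(classify_by_throughput(50_000))   # big data
print(classify_by_throughput(1_000))    # medium data
```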

The attempt to define where "big data" starts is interesting. The main question is under what conditions the implementation of systems should become different, and here the borders are not that clear, since there are now systems that can scale both up and down.

Interestingly, "big" and "small" are fuzzy terms, which reminds me of one of the variations of the Sorites Paradox that I came across during my philosophy studies many years ago. It goes roughly like this:

Claim: Every heap of stones is a small heap.
Proof by mathematical induction.
Base: A heap of 1 stone is a small heap.
Inductive step: Take a small heap of K stones and add 1 stone; surely it stays a small heap.
Conclusion: By induction, a heap of any number of stones is a small heap.
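For readers who prefer the argument in symbols, here is a minimal LaTeX rendering of the induction; the predicate Small(n), standing for "a heap of n stones is small", is my own notation:

```latex
% Sorites induction over heap size n, with Small(n) meaning
% "a heap of n stones is a small heap".
\begin{align*}
&\text{Base:}\quad \mathit{Small}(1)\\
&\text{Step:}\quad \forall k\,\bigl(\mathit{Small}(k) \rightarrow \mathit{Small}(k+1)\bigr)\\
&\text{Hence:}\quad \forall n\,\mathit{Small}(n)
\end{align*}
% The inference is valid; the trouble is the inductive premise,
% which trades on the fact that "small" has no sharp boundary.
```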



Thursday, May 9, 2013

Causality vs. correlation - statistical reasoning is not enough - NY Times interview with Dave Ferrucci


Dave Ferrucci, who was until several months ago an IBM Fellow and was known as the father of Watson, was interviewed by the NY Times at his new workplace, Bridgewater Associates.

In the interview Ferrucci somewhat continues Noam Chomsky's line of thought, saying that AI has concentrated on statistical reasoning based on correlations, whose drawback is that one cannot understand why a prediction made by statistical reasoning is correct. While Chomsky bluntly stated that statistical reasoning does not create a solid model of the universe, Ferrucci claims that a complementary approach is required: understanding causality.

This is a rather old issue. In symbolic logic there is a distinction between "material implication", which states that IF A is true THEN B is true, meaning only that whenever A is true B is also true; since material implication (A → B, equivalent to ¬A ∨ B) requires no connection between A and B, a sentence like "If the week has seven days then the capital city of France is Paris" is a valid statement in logic. Entailment, on the other hand, says that A ENTAILS B only if the connection is necessary and relevant; in other words, there is causality between them.

Thus, Ferrucci now concentrates on building causality models to model the world economy. I concur with the assertion that understanding causality gives better abilities of reasoning and prediction. As David Luckham already noted, causality among events is one of the major abstractions of event processing models. Here is a rather old discussion about causality of events.
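To make the event-causality abstraction concrete, here is a minimal Python sketch, entirely my own illustration rather than Luckham's model or any product's API, in which each event carries references to the events that caused it, so a causal history can be traced back from any derived event:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Event:
    """An event that remembers which events caused it."""
    name: str
    caused_by: List["Event"] = field(default_factory=list)

    def causal_history(self) -> List[str]:
        """Trace all (transitive) causes of this event, oldest first."""
        history = []
        for cause in self.caused_by:
            history.extend(cause.causal_history())
            history.append(cause.name)
        return history

# A hypothetical causal chain of events:
order = Event("order_placed")
payment = Event("payment_received", caused_by=[order])
shipment = Event("shipment_dispatched", caused_by=[payment])
print(shipment.causal_history())  # ['order_placed', 'payment_received']
```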

Tuesday, May 7, 2013

Event processing academic course at the University of Potsdam

I am following academic courses on event processing, and today came across a graduate seminar entitled "event processing technology", given by Mathias Weske, who is known for his work on business process management, at the Hasso Plattner Institute of the University of Potsdam.

It is interesting to note the topics covered in this seminar: 
  1. Scalability: complex event processing solutions for high performance and low latency 
  2. Aggregation concepts: event processing approaches to extract business information from raw events (a small sketch of this appears after the list)
  3. Correlation: combining BPM and CEP
  4. Uncertainty: handling of noise in data streams
  5. Prediction: predict future events
  6. Heterogeneity: Processing heterogeneous events
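Topic 2 is easy to make concrete. Here is a minimal Python sketch of windowed aggregation, my own illustrative example rather than material from the seminar, in which raw events are rolled up into a per-window business metric:

```python
from collections import defaultdict

def aggregate_by_window(events, window_seconds=60):
    """Roll raw (timestamp, amount) events up into per-window totals.

    A purely illustrative aggregation: group events into fixed
    time windows and sum the amounts in each window.
    """
    totals = defaultdict(float)
    for timestamp, amount in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        totals[window_start] += amount
    return dict(totals)

# Hypothetical raw events: (unix timestamp, purchase amount)
raw_events = [(0, 10.0), (30, 5.0), (70, 2.5), (130, 7.0)]
print(aggregate_by_window(raw_events))
# {0: 15.0, 60: 2.5, 120: 7.0}
```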

All of them are active research topics in event processing. Some of them cite our work on uncertainty and on proactive event processing. It would be interesting to collect information about event processing academic courses worldwide and the lessons learned from them. Academic courses are enablers of making a technology part of mainstream computing.