Saturday, October 20, 2012

multidisciplinary research -- duck, cheetah, sailfish and spine-tailed-swift

I have recently written about "lead vs. impact" in industry research. Today I would like to continue this series of thoughts by observing that achieving a lead is often a result of multidisciplinary research. There are two ways to approach multidisciplinary research: one is to develop versatile, multidisciplinary researchers, and the other is tight collaboration of researchers from multiple disciplines. The difference can be explained by examples from the animal world.

A duck is a multidisciplinary animal


It swims, it walks and it flies. It is not excellent at any of them, but it can do them all.

Looking at the question of who is the fastest swimmer, flier and runner, we come across the sailfish, the spine-tailed-swift and the cheetah.



It is obviously cheaper to keep one duck than to keep these three animals, and in many cases the walking, swimming and flying abilities of the duck are "good enough". This is particularly true when the scale of ambition for the research is "impact". However, when the aim is "lead", it is often the combination of people who are excellent in their own disciplines, collaborating together, that constitutes the lead. Versatility cannot replace excellence. More posts in this series - later.

Friday, October 19, 2012

Lead vs. impact in industrial research

The quarterly letter of John Kelly, the head of the IBM Research Division, reminded us that his first directive is to shift the main goal of what IBM Research is doing from impact and contribution to lead.

Much of the work done in industrial research today falls into the category of contribution and impact: as illustrated in the picture, the human models are already there, and the contribution is in coloring them. Such research concentrates on incremental contributions to the company's products or services.




The alternative is lead. Take the Israeli invention known as the USB stick (in Israel we call it disk-on-key); the picture below explains why.



Lead means the creation of something new that is not an increment of some existing stuff.

This leads to several questions. First, is this scalable, in the sense of whether there are enough revolutionary ideas, and whether there are enough people, even in the research community, skilled enough to generate such ideas?
Second, work towards lead is much riskier than work towards incremental contribution. It might also take fewer resources, although there is no exact correlation; some of the revolutions were done by small teams, e.g. the relational model was devised by a single person. Furthermore, leading may mean disrupting some existing interest, and thus accumulating enemies and encountering the corporate immune system; see my post about the "innovator's dilemma".
In my opinion the answer is that while incremental contribution should not be eliminated, the best researchers have to be geared towards the drive for lead. This requires a supporting culture in which risks are tolerated. This sometimes goes against the DNA of risk-averse companies, and against the tendency to focus on the pressing business-as-usual stuff, which is mainly incremental. I think that even if we assume a certain rate of failures, one revolutionary result is worth a thousand evolutionary ones, and this is the relative scale by which results should be weighted.

While I referred to the context of research in industry and corporate cultures, this is also somewhat true for academic research, which also has a great deal of incremental work, AKA "delta papers".


I have recently been investigating the impact of my own research work over the years and will write about it soon, with some interim conclusions.


Call for papers - DEBS 2013 in Arlington Texas


The DEBS conference is returning to the USA; in 2013 it will take place in Arlington, Texas.
The call for papers (in the research and industrial tracks), tutorials, demos, PhD workshop, and grand challenge can be found on the conference's website. Note that the industrial track consists of industry papers and industry experience reports; the latter do not require full papers.

The deadline for submission in most categories is February 8, 2013  and the conference itself will be held on June 29 - July 3, 2013.


Thursday, October 11, 2012

On gesture events as regular expressions - Proton from Berkeley

Proton is the name of a project in which we have investigated the proactive event-driven approach (see our talk in DEBS'2012). I came across another Proton, this time from UC Berkeley. It deals with codifying gestures as regular expressions of touch event symbols. On the website you can find a tutorial, a downloadable version and papers. Interesting idea, enjoy!
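
To make the idea concrete, here is a minimal sketch of gestures written as regular expressions over touch event symbols. It is not the Berkeley Proton API; the symbol encoding (D = down, M = move, U = up, followed by a single-digit touch id), the two patterns and the function names are all assumptions made for illustration.

```python
import re

# Assumed encoding (hypothetical): one letter for the touch event kind
# (D = down, M = move, U = up) followed by a single-digit touch id.
# "D1M1M1U1" therefore reads: finger 1 down, moves twice, then lifts.

# One finger goes down, moves any number of times, and lifts.
ONE_FINGER_DRAG = re.compile(r"D1(M1)*U1")

# A second finger joins while the first is still down; both may move,
# and the fingers may lift in either order.
TWO_FINGER = re.compile(r"D1(M1)*D2(M1|M2)*(U1(M2)*U2|U2(M1)*U1)")


def encode(events):
    """Join (kind, touch_id) pairs into a symbol string such as 'D1M1U1'."""
    return "".join(f"{kind}{touch_id}" for kind, touch_id in events)


def recognize(events):
    """Return the name of the gesture matching the whole stream, or None."""
    stream = encode(events)
    if ONE_FINGER_DRAG.fullmatch(stream):
        return "drag"
    if TWO_FINGER.fullmatch(stream):
        return "two_finger"
    return None


if __name__ == "__main__":
    print(recognize([("D", 1), ("M", 1), ("M", 1), ("U", 1)]))
    print(recognize([("D", 1), ("D", 2), ("M", 1), ("M", 2), ("U", 2), ("U", 1)]))
```

The point of the sketch is that the gesture definition is declarative: the pattern itself states which sequences of touch events are legal, and recognition is just matching the stream against it.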

Wednesday, October 10, 2012

SAS announcement on event processing


SAS announced today that a new "SAS DataFlux Event Stream Processing Engine" will be available in December. It is described as: "the new software is a form of complex event processing (CEP) technology...incorporates relational, procedural and pattern-matching analysis of structured and unstructured data". Welcome to the event processing club. This seems to be an indication that the analytics guys see the value of adding event processing to their portfolio; I guess that the "limited appeal" of event processing has changed enough in the last couple of years to justify it. Anyway - I welcome SAS to the club, and hope that they will also become an active part of the event processing community.


Sunday, October 7, 2012

On big data, small things and events that matter

In a recent post in the Harvard Business Review Blog entitled "Big Data Doesn't Work if You Ignore the Small Things that Matter", Robert Plant argues that in some cases organizations invest a lot in "big data" projects, trying to get insights around their strategy, while failing to notice the small things, like customers leaving due to bad service. Indeed, big data and analytics are now fashionable and somewhat over-hyped. There is also some belief, fueled by the buzz, that big data solves all the problems of the universe, as argued by Sethu Raman in his DEBS'12 keynote address. Events play in the big data game, but also in the small data game, where the aim is to observe a current happening, such as a time-out on a service or long queues when it relates to service, and other phenomena in other domains. Sometimes the small things are the most critical.
I'll write more about big data and statistical reasoning in a subsequent post.

Saturday, October 6, 2012

More on the semantic overloading of derived events


I have recently been getting back to the time in which I dealt with semantic data models, and now I am trying to view current event-driven applications in that way; semantic overloading is one of the first interesting issues that emerge. I'll write more about semantic modeling of event processing later, but right now I'll concentrate on the semantic overloading of derived events. There are various definitions of the term "event", but in all of them an event represents a VERB in natural language. Looking at what we defined as derived events, it seems that some of them can indeed be described by a verb in natural language, while others are really described by nouns. Thus my current thinking is to have the semantic notion of DERIVATION, where the derivation can yield different concepts:
Events - when indeed the derived conclusion is that something (virtually) happened.
Entity facts - when the derived conclusion is a value of some fact
Messages - when the derived conclusion is some observation that has to be notified to some actor. 

Here are examples from the Fast Flower Delivery use case that we used in the EPIA book:

The automatic assignment creates a real event -- it can be expressed by the verb ASSIGN.
The timeout pattern "pickup alert", which means that a pickup was not done on time --- this is an observation that is notified to somebody. It is therefore a message that can be expressed by NOTIFICATION.
The driver-ranking, calculated as a function of assignment count, is actually a fact related to the driver; driver-ranking is a noun, thus it is a derived fact.
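
The classification above can be sketched as a small data structure. This is only an illustration of the idea, not a model taken from the EPIA book; the class names, field names and helper function are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class DerivationKind(Enum):
    """The three concepts a derivation can yield (names are illustrative)."""
    EVENT = "event"              # something (virtually) happened: a verb
    ENTITY_FACT = "entity fact"  # a value of some fact: a noun
    MESSAGE = "message"          # an observation notified to some actor


@dataclass(frozen=True)
class Derivation:
    name: str              # the derived item in the use case
    kind: DerivationKind   # which concept the derivation yields
    expressed_by: str      # the word that describes it in natural language


# The three Fast Flower Delivery examples from the text.
FAST_FLOWER_DERIVATIONS = [
    Derivation("automatic assignment", DerivationKind.EVENT, "ASSIGN"),
    Derivation("pickup alert", DerivationKind.MESSAGE, "NOTIFICATION"),
    Derivation("driver-ranking", DerivationKind.ENTITY_FACT, "driver-ranking"),
]


def by_kind(derivations, kind):
    """Return the names of all derivations that yield the given concept."""
    return [d.name for d in derivations if d.kind is kind]


if __name__ == "__main__":
    for kind in DerivationKind:
        print(kind.value, "->", by_kind(FAST_FLOWER_DERIVATIONS, kind))
```

Keeping the kind as an explicit attribute, rather than calling everything a "derived event", is what removes the overloading: a consumer can tell whether it is handling a happening, a fact update or a notification.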

More - later.