Saturday, March 15, 2014

Google flu trends as a lesson in big data prediction

A recent article in the science section of TIME magazine reports that prediction using "big data" techniques is not as easy as portrayed. It analyzes the Google Flu Trends case, which assumed a strong correlation between the spread of flu and searches for flu-related terms on Google. It seems that this assumption does not produce accurate results. The article claims that while big data methods are useful, they should be combined with traditional "small data" methods. There are various definitions of small data; for example, the one from the "small data group": "Small data connects people with timely, meaningful insights (derived from big data and/or “local” sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks."

I guess that this also relates to the question of understanding causality in addition to statistical correlation, which I have discussed before on this blog.
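To make the correlation point concrete, here is a minimal sketch in Python. All the numbers are made up for illustration (they are not real flu or search data), and the scenario in which searches double while illness stays flat is my own assumption, not something taken from the TIME article. It fits a straight line to a period in which searches and cases move together, and then applies it to a later week in which they no longer do.

```python
from math import sqrt

# Made-up weekly figures, for illustration only (not real data).
searches = [10, 20, 30, 40, 50]   # flu-related searches, in thousands
cases = [12, 19, 31, 42, 48]      # reported flu cases, in hundreds


def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Least-squares line plus the Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = fit_line(searches, cases)

# Hypothetical later week: searches double (say, because of news coverage)
# while the number of actual cases stays about the same.
later_searches, later_cases = 100, 50
predicted = slope * later_searches + intercept
overestimate = predicted - later_cases

print(f"correlation in the fitted period: {r:.2f}")
print(f"predicted cases: {predicted:.1f}, actual cases: {later_cases}")
```

The correlation in the fitted period is very high, yet the prediction for the later week is far off, because nothing in the fitted line captures why people search.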

Sunday, March 9, 2014

Big Data analytics by Robin Bloor

Today I had to give students in a seminar an introduction to big data analytics, and I chose a recent presentation by Robin Bloor (from SlideShare). Bloor states that the term "data science" is a misnomer, since all science is empirical and involves analysis of data. This is true for many of the sciences; still, if my memory does not mislead me, Einstein did not use empirical analysis of data to come up with the theory of relativity. It also relates to the discussion of causality vs. correlation in science. In any event, Bloor asserts that data science is actually a multidisciplinary effort that involves software engineering, statistics and domain knowledge.

BI, according to this presentation, is partitioned into:
  • Hindsight: regular reporting
  • Oversight: dashboards, etc.
  • Insight: data mining & statistical analysis
  • Foresight: predictive analytics
He does not get as far as prescriptive analytics, and puts most of the weight on insight.
The second part of the presentation gives a quick introduction to machine learning. Overall, it gives an introductory-level view of how insights are derived from big data, and is well presented as such.

Sunday, March 2, 2014

On building bad (and good) research centers


I am sitting in my new office, where my task is to construct the "Technological Empowerment Institute".
My YVC colleague Rachel Or-Bach drew my attention to an article in the current issue of CACM entitled "How to Build a Bad Research Center". The author, David Patterson from UC Berkeley, first presents the negative side (how to build a bad research center) and then, by reversing each premise, builds the positive side. His advice for the good side is:

Good Commandment 1. Thou shalt mix disciplines in a center.

Good Commandment 2. Thou shalt limit the expanse of a center.

Good Commandment 3. Thou shalt limit the duration of a center. 

Good Commandment 4. Thou shalt build a centerwide prototype.

Good Commandment 5. Thou shalt disturb thy neighbors.

Good Commandment 6. Thou shalt talk to strangers.

Good Commandment 7. Thou shalt find a leader.

Good Commandment 8. Thou shalt honor impact.

Some advice to take into consideration. I do believe in mixing disciplines, though it also means mixing cultures, which is not always easy. Focus and short- to medium-range targets are also key properties.
Last but not least, I also believe in tangible impact beyond publications, though the impact criteria should be well defined.

More about our own activity - later.   

Wednesday, February 26, 2014

Comptel announces event processing on mobile

The Wall Street Journal carries a news report from Helsinki that Comptel has announced the use of its mediation software by a tier 1 mobile operator in the USA. According to the report, the mobile operator required a high-performing solution capable of handling real-time data collection and complex event processing (CEP) with ease. Looking at Comptel's website for more details, I found that Comptel says it has comprehensive and complex event processing; however, no more details were available about the solution. I have written before about event processing and the mobile world, and believe that we'll see a lot more of it. It would be interesting to see how this one was implemented.

Thursday, February 20, 2014

Moving On

Next week I am moving on. After 16.5 years at IBM, I have decided to pursue a societal challenge by accepting an offer to establish and manage a new applied research institute, whose temporary name is "Technological Empowerment Institute". The empowerment aspect aims to empower both sectors and populations that require substantial enhancement of their capabilities to enjoy high technology.
The societal aspect is twofold: both the societal challenges and the location, as it will be based in the periphery. Does it have any relation to my past work on event processing? The answer is definitely yes: I view the Internet of Things as a major vehicle for meeting many of the challenges, and the work I was recently involved in, making event processing accessible to larger audiences, as one of the key areas. The institute will include multidisciplinary activities and implementation projects to meet the challenges. It is initiated by YVC, a relatively young but ambitious academic institute, and I'll report to the president of YVC. However, it will include activities and affiliate members that will span multiple sites and researchers.
I'll provide more details about it; in fact, this Blog will from now on reflect my experience in trying to establish and operate this institute (but I'll not change its name, as I always think in events).
Next week I'll wrap up my presence at IBM (this week I packed my office and unpacked in the new one, which took a toll on my aching back), and I'll summarize my tenure there. I am taking on an exciting but very challenging task, and will issue a call for people to be involved in various capacities.

Sunday, February 2, 2014

Revisiting the FFD example in the EPIA book - call for contributions

In 2009, when Peter Niblett and I wrote the "Event Processing in Action" book, we came up with an example called "Fast Flower Delivery" that accompanied the book. It seems that the book is still being used (it has been cited 276 times so far), and one idea that came up recently in discussions with the publisher is to refresh the implementation page. We now issue a call for implementations. All owners of event processing tools (commercial or open source) are invited to show how this example is developed in their tool, so that readers (and the interested general audience) will be able to view it. In order to qualify, one has to create a webpage that contains the source code of the use case, with any additional reference to the language and product behind it.
The book's webpage will include links to all such solutions. For more details, please contact me.
The full details of the FFD example are linked here.

Tuesday, January 21, 2014

Some simplification goals in the design of the event model

I have written in this Blog about our work on "The Event Model", which is based on the search for simplification in event-based modeling. Here are some of the high-level simplification goals that we strive to achieve while designing the model.

1. Stick to the basics by eliminating technical details.    Looking at designs and implementations of event-driven applications, one can observe that there are two types of logic: the business logic, which directly states how derived events are generated and how the values of their attributes are assigned, and supporting logic that is intended to enrich events, or query databases as part of the processing.  Our first design goal is to have the model concentrate on the business logic and treat the supporting logic as a technical detail.
2. Employ top-down, goal-oriented design.    Many of the design tools require logic completeness (such as referential integrity) at all times.  This entails the need to build the model in a bottom-up fashion, namely all the meta-data elements (events, attributes, data elements) must be defined prior to referring to them in the logic definition.   Our second simplification design goal is to support top-down design, and to allow temporary inconsistency by working in a “forgive” mode, in which some details may be completed at a later phase.  This design goal complements the “stick to the basics” goal, by concentrating on the business logic first, and completing the data aspects later.
3. Reduce the quantity of logic artifacts.  In a typical event processing application, there may be multiple logic artifacts (event processing agents, queries, or processing elements, depending on the programming model) that stand for different circumstances in which a single derived event is derived.  Our design goal is to have a single logic artifact for every derived event, which accumulates all the circumstances in which this derived event is generated.  This goal reduces the number of logic artifacts and bounds it by the number of derived events.  It also eases the verifiability of the system, since possible logical contradictions are resolved by the semantics of this single logic artifact.
4. Use fact types as first-class citizens in the model.  In many of the models, terms in the user’s terminology are modeled as attributes that are subordinate to entities or relationships.  In some cases it is more intuitive to view these concepts as “fact types” and make them first-class citizens of the model, where the entity or event they are associated with is secondary (and may be a matter of implementation decisions).  This is again consistent with the “stick to the basics” goal.
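
To illustrate the third goal, here is a minimal sketch in Python. It is not The Event Model itself; the class name DerivedEventLogic, the DeliveryAlert event and its two circumstances are hypothetical names chosen for the illustration. The first-match-wins rule is also only one assumed way in which a single artifact could resolve overlapping circumstances.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, object]
Condition = Callable[[Event], bool]


@dataclass
class DerivedEventLogic:
    """One logic artifact per derived event type (hypothetical sketch)."""
    derived_type: str
    # Every circumstance in which this derived event is generated, in order.
    circumstances: List[Tuple[str, Condition]] = field(default_factory=list)

    def when(self, name: str, condition: Condition) -> "DerivedEventLogic":
        """Add one more circumstance to the same artifact."""
        self.circumstances.append((name, condition))
        return self

    def derive(self, event: Event) -> Optional[Event]:
        """Return the derived event for the first matching circumstance.

        Assumption: the first match wins, so overlapping circumstances are
        resolved in one place rather than across separate agents or queries.
        """
        for name, condition in self.circumstances:
            if condition(event):
                return {"type": self.derived_type, "circumstance": name}
        return None


# Hypothetical derived event with two circumstances kept in a single artifact.
delivery_alert = (
    DerivedEventLogic("DeliveryAlert")
    .when("late_pickup", lambda e: e.get("pickup_delay_min", 0) > 5)
    .when("late_delivery", lambda e: e.get("delivery_delay_min", 0) > 10)
)

# Both circumstances hold here; the artifact decides which one is reported.
print(delivery_alert.derive({"pickup_delay_min": 7, "delivery_delay_min": 12}))
# No circumstance holds, so no derived event is generated.
print(delivery_alert.derive({"delivery_delay_min": 1}))
```

With this shape the number of logic artifacts cannot exceed the number of derived event types, and anyone verifying the system has one place to look for each derived event.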

These goals are high level.  In future posts I'll write in more detail about the ways we chose to satisfy each of them, and discuss the alternatives.  I guess that over time we'll accumulate more simplification goals.