Thursday, March 27, 2014

My talk in the Technion Big Data workshop




Yesterday I gave a talk at the Technion Computer Engineering Big Data days. The talk dealt with three topics: why the Internet of Things has not happened yet, a very brief introduction to "The Event Model", and a first introduction to the Technological Empowerment Institute. I'll write more about the institute soon.


Tuesday, March 25, 2014

Latency might be subjective

Last week I made a short visit to England, where I served as external examiner in a PhD exam (called by the locals a "viva"). The exam was strange in the sense that it was the first PhD exam I have ever attended where the advisor was not present (actually he was not invited; I guess it is a matter of culture). The dissertation, submitted by Jenny Li, dealt with performance metrics and a benchmark framework for event processing systems. One of the ideas raised in this dissertation is that measured latency may be subjective in what it means. It took me some time to understand the idea, so I'll explain it through an example.

Let's assume that the pattern that we are looking for is a sequence of four events, of types E1, E2, E3, E4.
The two common metrics associated with latency are:
  • Measurement starts at E4, since the consumer expects to see results only when the last event, the one that closes the loop, occurs.
  • Measurement starts at any event occurrence, and ends when the system finishes processing that individual event, which can be merely storing it in a buffer.

Note that the first metric is biased towards eager evaluation, which prepares sub-patterns in advance and does minimal work at the end, while the second metric is more balanced.

The proposed metric is to start measuring at the event most significant to the consumer, and to end when the processing of the pattern ends. Let's say that the most significant event to the consumer is E2; then the latency starts at the occurrence of E2, and ends either when there is a match of the pattern, or when the event E2 can be discarded since there is no match. This can be applicable in cases where all events in the sequence typically happen at relatively fixed intervals (e.g. when they are time series events). This is an interesting metric; we asked the student to define definite criteria for when it is applicable and when it is not.
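
To make the difference between the three metrics concrete, here is a minimal Python sketch. It is my own illustration, not code from the dissertation: the `EventRecord` structure, the function names, and the timestamps (in milliseconds) are all made up for the example, and it assumes the last record in the trace is the closing event E4.

```python
from dataclasses import dataclass


@dataclass
class EventRecord:
    """Hypothetical record of one event as seen by the event processing system."""
    event_type: str    # "E1" .. "E4"
    arrived_at: int    # ms, when the event occurred
    processed_at: int  # ms, when the system finished handling this single event


def closing_event_latency(trace, match_emitted_at):
    """First common metric: from the closing event (assumed last in the trace) to the emitted match."""
    return match_emitted_at - trace[-1].arrived_at


def per_event_latencies(trace):
    """Second common metric: handling time of each individual event."""
    return [e.processed_at - e.arrived_at for e in trace]


def significant_event_latency(trace, significant_type, resolved_at):
    """Proposed metric: from the event most significant to the consumer
    until the pattern is resolved (a match is found, or the event is discarded)."""
    start = next(e.arrived_at for e in trace if e.event_type == significant_type)
    return resolved_at - start


# Illustrative trace: events arrive at fixed 100 ms intervals (made-up numbers).
trace = [
    EventRecord("E1", 0, 2),
    EventRecord("E2", 100, 103),
    EventRecord("E3", 200, 201),
    EventRecord("E4", 300, 305),
]
match_emitted_at = 305  # the match is reported once E4 has been processed

print(closing_event_latency(trace, match_emitted_at))            # 5
print(per_event_latencies(trace))                                # [2, 3, 1, 5]
print(significant_event_latency(trace, "E2", match_emitted_at))  # 205
```

The same run yields very different numbers depending on where the clock starts, which is the sense in which latency is subjective. When events arrive at relatively fixed intervals, most of the third number is waiting time for E3 and E4 rather than processing time.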

Saturday, March 15, 2014

Google flu trends as a lesson in big data prediction

A recent article in the science section of TIME magazine reports that prediction using "big data" techniques is not as easy as portrayed. It analyzes the Google Flu Trends case, in which the assumption has been that there is a strong correlation between the spread of flu and the searches for flu-related terms in Google. It seems that this does not produce accurate results. The article claims that while big data methods are useful, they should be combined with traditional "small data" methods. There are various definitions of what small data is. For example, the one from the "small data group" reads: "Small data connects people with timely, meaningful insights (derived from big data and/or “local” sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks."

I guess that this also relates to the need to understand causality in addition to statistical correlation, which I've discussed before on this blog.

Sunday, March 9, 2014

Big Data analytics by Robin Bloor

Today I had to give students in a seminar an introduction to big data analytics, and I chose a recent presentation by Robin Bloor (from SlideShare). Bloor states that the term "data science" is a misnomer, since all science is empirical and involves analysis of data. This is true for many of the sciences; still, if my memory does not mislead me, Einstein did not use empirical analysis of data to come up with the theory of relativity. It also touches on the discussion of causality vs. correlation in science. In any event, Bloor asserts that data science is actually a multidisciplinary effort involving software engineering, statistics and domain knowledge.

BI, according to this presentation, is partitioned into:
  • Hindsight: regular reporting
  • Oversight: dashboards, etc.
  • Insight: data mining & statistical analysis
  • Foresight: predictive analytics
He does not get as far as prescriptive analytics, and puts the most weight on insight.
The second part of the presentation gives a fast introduction to machine learning. Overall, it gives introductory-level insights on insights from big data, and is well presented as such.

Sunday, March 2, 2014

On building bad (and good) research centers


I am sitting in my new office, where my task is to construct the "Technological Empowerment Institute".
My YVC colleague Rachel Or-Bach drew my attention to an article in the current issue of CACM entitled "How to Build a Bad Research Center". The author, David Patterson from UC Berkeley, first presents the negative side, how to build a bad research center, and then reverses each premise to build the positive side. His advice for the good side is:

Good Commandment 1. Thou shalt mix disciplines in a center.

Good Commandment 2. Thou shalt limit the expanse of a center.

Good Commandment 3. Thou shalt limit the duration of a center. 

Good Commandment 4. Thou shalt build a centerwide prototype.

Good Commandment 5. Thou shalt disturb thy neighbors.

Good Commandment 6. Thou shalt talk to strangers.

Good Commandment 7. Thou shalt find a leader.

Good Commandment 8. Thou shalt honor impact.

Some advice to take into consideration. I do believe in mixing disciplines, though it also means mixing cultures, which is not always easy. Focus and short- to medium-range targets are also key properties.
Last but not least, I also believe in tangible impact beyond publications, as long as the impact criteria are well-defined.

More about our own activity - later.   






Wednesday, February 26, 2014

Comptel announces event processing on mobile

The Wall Street Journal carries a news report from Helsinki that Comptel announced the use of its mediation software by a tier 1 mobile operator in the USA. The news report says that the mobile operator required a high-performing solution capable of handling real-time data collection and complex event processing (CEP) with ease. Looking at the Comptel website for more details, I found that Comptel says it has comprehensive and complex event processing; however, no further details were available about the solution. I have written before about event processing and the mobile world, and believe that we'll see a lot more of it. It would be interesting to see how this one was implemented.

Thursday, February 20, 2014

Moving On

Next week I am moving on. After 16.5 years at IBM, I have decided to pursue a societal challenge by accepting an offer to establish and manage a new applied research institute, whose temporary name is "Technological Empowerment Institute". The empowerment aspect aims to empower both sectors and populations that require substantial enhancement of their capabilities in order to benefit from high technology.
The societal aspect is twofold: both the societal challenges and the location, as it will be based in the periphery. Does it have any relation to my past work on event processing? The answer is definitely yes. I view the Internet of Things as a major vehicle for meeting many of the challenges, and the work I was recently involved in, making event processing accessible to larger audiences, as one of the key areas. The institute will include multi-disciplinary activities and implementation projects to meet the challenges. It is initiated by YVC, a relatively young but ambitious academic institute, and I'll report to the president of YVC. However, it will include activities and affiliate members that will span multiple sites and researchers.
I'll provide more details about it. In fact, this blog will from now on reflect my experience in trying to establish and operate this institute (but I'll not change its name, as I always think in events).
Next week I'll wrap up my presence at IBM (this week I packed my office and unpacked in the new one, which took a toll on my aching back), and I'll summarize my tenure there. I am taking on an exciting but very challenging task, and will issue a call for people to be involved in various capacities.