Saturday, March 29, 2014

The Technological Empowerment Institute -- first exposure



I wrote a month ago about moving on. I still need to post a summary of my IBM time, but it will have to wait, as I am quite busy in my new role.  The role is an attempt to establish (from scratch) an applied research institute called "The Technological Empowerment Institute" (TEI).   This is the first in a series of posts about the institute's plan.

The slide below shows the idea in a nutshell:


The domain we are looking at is, in general, exploiting the Internet of Things for societal purposes. 
The mission is to help developing areas: first the Israeli periphery (in this case, the concentration will be on the northern part of Israel), and later developing countries around the world.

The idea is to create partnerships with:

1. Multidisciplinary researchers dealing with technology, with the human aspect of creating and consuming smart systems (a topic that, as anybody following this blog has realized, I have focused on over the last couple of years), and with domain-oriented research (agriculture, gerontology, healthcare, and more). The affiliated researchers will be international.

2. High-tech companies, whose platforms and products will be used in the implementation projects (see below).

3. Student projects and an internship program to carry out concrete implementation projects that fit the institute's mission.

4. Academic institutes in developing countries, to collaborate on all of the above.

In the next posts in this series I'll write about each of these items.  I am now spending much of my time creating all these partnerships -- a big challenge, and also fun.

Friday, March 28, 2014

More from the Big Data workshop -- crowd wisdom vs. expert wisdom

Yesterday I spent the whole day at the second day of the Technion Computer Engineering center workshop on big data.  There were a few interesting talks, and the organizers promised to put the slides of all talks on their website (eventually).   I chose to write about an interesting talk given by Tova Milo from Tel-Aviv University.  Tova talked about her work on crowd wisdom, and presented a video in which a contestant in a TV show who did not know an answer used the "ask the audience" option, followed the audience to the wrong answer, and was eliminated.   The talk discussed some means of knowledge acquisition and how to phrase questions.   The examples she gave were: what to do when I have a headache, and looking for a children's attraction in NYC together with a nearby child-friendly restaurant.

I asked her whether, in the case of a constant headache, it is not better to ask an expert physician; her answer was that people trust crowd wisdom more than they trust their physicians. Well, I think it is a function of who the person is and who the physician is.  When we planned our trip to New Zealand, we could have used crowd wisdom -- there is a lot of material on the web, of course -- but we chose to go to an expert travel advisor and ask for a trip plan (including all travel arrangements). It certainly saved us time, but if one has enough time, getting advice from the crowd is useful.   I wonder if somebody has researched the trade-offs between expert wisdom and crowd wisdom, and classified the cases in which each should be used.

Thursday, March 27, 2014

My talk in the Technion Big Data workshop




Yesterday I gave a talk in the Technion Computer Engineering Big Data days.  The talk dealt with three topics: why the Internet of Things has not happened yet, a very brief introduction to "The Event Model", and an introduction to the new Technological Empowerment Institute.  I'll write more about the institute soon.


Tuesday, March 25, 2014

Latency might be subjective

Last week I made a short visit to England to serve as external examiner in a PhD exam (called a "viva" by the locals). The exam was strange in the sense that it was the first PhD exam I have ever attended where the advisor was not present (actually he was not invited; I guess it is a matter of culture).   The dissertation, submitted by Jenny Li, dealt with performance metrics and a benchmark framework for event processing systems.  One of the ideas raised in this dissertation is that what measured latency means may be subjective.   It took me some time to understand the idea, so I'll explain it through an example.

Let's assume that the pattern that we are looking for is a sequence of four events, of types E1, E2, E3, E4.
The two common metrics associated with latency are:

1. Measurement starts at E4, since the consumer expects to see results only when the last event, the one that closes the pattern, occurs.

2. Measurement starts at any event occurrence, and ends when the system finishes processing that individual event, which can be merely storing it in a buffer.

Note that the first metric is biased towards eager evaluation (preparing sub-patterns along the way so that minimal work remains when the last event arrives), while the second metric is more balanced.

The proposed metric is: start measuring at the event most significant to the consumer, and end when the processing of the pattern ends.   Say the most significant event to the consumer is E2; then the latency measurement starts at the occurrence of E2, and ends either when there is a pattern match, or when E2 can be discarded because no match is possible.   This is applicable in cases where all events in the sequence typically occur (e.g., when they are time-series events) at relatively fixed intervals.   This is an interesting metric; we asked the student to define definite criteria for when it is applicable and when it is not.
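To make the distinction concrete, here is a minimal sketch (names and structure are my own, not from the dissertation) of a trivial detector for the sequence E1 → E2 → E3 → E4 that records per-event arrival times, so that both the standard "measure from the last event" latency and the proposed "measure from the most significant event" latency can be computed when a match completes:

```python
class SequenceDetector:
    """Toy detector for the strict sequence E1 -> E2 -> E3 -> E4.

    Timestamps are passed in explicitly (e.g., seconds) so the two
    latency metrics can be compared deterministically.
    """

    PATTERN = ("E1", "E2", "E3", "E4")

    def __init__(self, significant="E2"):
        self.significant = significant  # event most significant to the consumer
        self.arrivals = {}              # event type -> arrival timestamp
        self.next_idx = 0               # position expected next in the sequence

    def on_event(self, etype, arrived_at, completed_at):
        """Feed one event; returns latency figures when the pattern matches.

        arrived_at: when the event occurred; completed_at: when the system
        finished processing it. Out-of-order events are simply ignored in
        this sketch.
        """
        if etype != self.PATTERN[self.next_idx]:
            return None
        self.arrivals[etype] = arrived_at
        self.next_idx += 1
        if self.next_idx < len(self.PATTERN):
            return None                 # partial match, keep waiting
        self.next_idx = 0               # full match; reset for the next window
        return {
            # standard metric: measured from the pattern-closing event E4
            "from_last_event": completed_at - self.arrivals["E4"],
            # proposed metric: measured from the consumer's significant event
            "from_significant_event": completed_at - self.arrivals[self.significant],
        }


# Usage: four events one time unit apart, processing finishing at t = 3.2
d = SequenceDetector(significant="E2")
d.on_event("E1", 0.0, 0.1)
d.on_event("E2", 1.0, 1.1)
d.on_event("E3", 2.0, 2.1)
result = d.on_event("E4", 3.0, 3.2)
# from_last_event is about 0.2, while from_significant_event is about 2.2:
# the same match looks very different depending on where measurement starts.
```

The gap between the two numbers is exactly the subjectivity the dissertation points at: a system tuned to look fast under the E4-based metric can still feel slow to a consumer who started waiting at E2.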

Saturday, March 15, 2014

Google flu trends as a lesson in big data prediction

A recent article in the science section of TIME magazine reports that prediction using "big data" techniques is not as easy as portrayed.  It analyzes the Google Flu Trends case, in which the assumption was that there is a strong correlation between the spread of flu and the searches for flu-related terms in Google.   It seems that this does not produce accurate results.   The article claims that while big data methods are useful, they should be combined with traditional "small data" methods.  There are various definitions of what small data is; for example, the one from the "small data group": "Small data connects people with timely, meaningful insights (derived from big data and/or “local” sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks."

I guess that this also relates to the discussion about understanding causality in addition to statistical correlation that I've discussed before on this blog.

Sunday, March 9, 2014

Big Data analytics by Robin Bloor

Today I had to give students in a seminar an introduction to big data analytics -- I chose a recent presentation by Robin Bloor (from SlideShare). Bloor states that the term "data science" is a misnomer, since all science is empirical and involves analysis of data.   This is true for many of the sciences; still, if my memory does not mislead me, Einstein did not use empirical analysis of data to come up with the theory of relativity.  It also goes to the discussion of causality vs. correlation in science.   In any event, Bloor asserts that data science is actually a multidisciplinary effort involving software engineering, statistics, and domain knowledge.

BI, according to this presentation, is partitioned to:  
  • Hindsight: regular reporting
  • Oversight: dashboards, etc.
  • Insight: data mining & statistical analysis
  • Foresight: predictive analytics
He does not go as far as prescriptive analytics, and puts the heaviest weight on insight. 
The second part of the presentation gives a fast introduction to machine learning.   Overall, it offers introductory-level insights on deriving insights from big data, and is well presented as such.

Sunday, March 2, 2014

On building bad (and good) research centers


I am sitting at my new office, where my task is to construct the "Technological Empowerment Institute".
My YVC colleague Rachel Or-Bach drew my attention to an article in the current issue of CACM entitled "How to Build a Bad Research Center".  The author, David Patterson from UC Berkeley, first presents the negative side -- how to build a bad research center -- and then turns each premise around to build the positive side.   His advice for the good side is:

Good Commandment 1. Thou shalt mix disciplines in a center.

Good Commandment 2. Thou shalt limit the expanse of a center.

Good Commandment 3. Thou shalt limit the duration of a center. 

Good Commandment 4. Thou shalt build a centerwide prototype.

Good Commandment 5. Thou shalt disturb thy neighbors.

Good Commandment 6. Thou shalt talk to strangers.

Good Commandment 7. Thou shalt find a leader.

Good Commandment 8. Thou shalt honor impact.

Some advice to take into consideration.  I do believe in mixing disciplines, though it also means mixing cultures, which is not always easy.   Focus, and short- to medium-range targets, are also key properties.   
Last but not least, I also believe in tangible impact beyond publications, while the impact criteria should be well-defined.

More about our own activity - later.