Showing posts with label real-time analytics. Show all posts

Thursday, August 15, 2013

On machine learning as a means for decision velocity

Chris Taylor has written a piece on the HBR Blog advocating the idea that machine learning should be used to handle the main issue of big data: decision velocity.  I have written recently on decision latency; according to some opinions, real-time analytics will be the next generation of what big data is about.
Chris' thesis is that the amount of data is increasing substantially with the Internet of Things, so decisions can no longer be made manually by viewing all the relevant data, and there will not be enough data scientists to look at the data either.  Machine learning, which is goal oriented rather than oriented toward asserting hypotheses, will take on this role.  I agree that machine learning will play a role in the solution, but here are some comments about the details:

Currently machine learning is an off-line technology, is case sensitive, and cannot be the sole source for decisions.


It is an off-line technology: systems have to be trained, and typically the training looks at historical data in perspective and learns trends and patterns using statistical reasoning methods.  There are cases of continuous learning, where the model is again built mostly off-line but is incrementally updated on-line.  Once a pattern has been learned, it needs to be detected in real time on streaming data, and here a technology like event processing is quite useful, since what it does is exactly that: detect occurrences of predefined patterns on streaming data.  These predefined patterns can be obtained by machine learning.  The main challenge will be on-line learning: when the patterns need to change, how fast can the learning techniques adapt them?  There are some attempts at real-time machine learning (see the presentation about Tumra as an example), but it is not a mature technology yet.
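To make the split between off-line learning and on-line detection concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the class names `RunningStats` and `SpikePatternDetector`, the "mean plus three standard deviations" threshold, the window sizes, and the sample readings are all made up, and a real event processing engine would express the pattern in its own language. The statistics are kept with Welford's algorithm, so the same object could also be updated incrementally as new readings arrive.

```python
import math
from collections import deque


class RunningStats:
    """Welford's algorithm: update mean and variance one reading at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def threshold(self, k=3.0):
        # Hypothetical learned pattern parameter: mean + k sample standard
        # deviations. Needs at least two readings.
        return self.mean + k * math.sqrt(self.m2 / (self.n - 1))


class SpikePatternDetector:
    """Detects the pattern: at least `count` of the last `window` readings
    exceed the learned threshold."""

    def __init__(self, threshold, window=5, count=3):
        self.threshold = threshold
        self.count = count
        self.recent = deque(maxlen=window)  # sliding window of True/False flags

    def on_event(self, value):
        self.recent.append(value > self.threshold)
        return sum(self.recent) >= self.count


# Off-line phase: learn from historical readings (made-up numbers).
stats = RunningStats()
for reading in [10, 11, 9, 10, 12, 10, 11, 9, 10, 10]:
    stats.update(reading)

# On-line phase: detect the learned pattern on a stream of new readings.
detector = SpikePatternDetector(stats.threshold())
matches = [detector.on_event(v) for v in [10, 14, 15, 11, 16, 10]]
print(matches)
```

Changing the pattern on-line would amount to calling `stats.update` on fresh readings and assigning a new `detector.threshold`; how quickly and safely that can be done is exactly the open question.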

Case sensitive means that there is no one-size-fits-all solution for machine learning; for each case the models have to be established in a way that is very specific to that case.  Thus the shortage of data scientists will be replaced by a shortage of statisticians: there are not enough skilled people around to build all these systems, so the state of the art needs to improve to make the machine learning process itself more automated.

Last but not least, I have written before that making decisions based merely on history is like driving a car by looking in the rear-view mirror.  Conclusions from historical knowledge should be combined with human knowledge and experience, sometimes over incomplete or uncertain information.  Thus, besides the patterns discovered by machine learning, a human expert may also insert additional patterns that should be considered, or modify the patterns introduced by machine learning.
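One way to picture this combination is a single pattern registry that holds both learned and expert-defined patterns. The sketch below is only an assumption about how such a thing might look: `Pattern`, `PatternRegistry`, the pattern names, and the thresholds are all hypothetical and not taken from any product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Pattern:
    name: str
    source: str  # "learned" or "expert" (hypothetical labels)
    matches: Callable[[dict], bool]


class PatternRegistry:
    """Holds learned and expert patterns side by side."""

    def __init__(self):
        self._patterns: Dict[str, Pattern] = {}

    def register(self, pattern: Pattern) -> None:
        # Registering under an existing name replaces the old pattern;
        # this is how an expert modifies a learned pattern.
        self._patterns[pattern.name] = pattern

    def evaluate(self, event: dict) -> List[str]:
        # Names of all patterns that match the event, in registration order.
        return [p.name for p in self._patterns.values() if p.matches(event)]


registry = PatternRegistry()

# A pattern produced by machine learning (made-up threshold).
registry.register(Pattern("high_temperature", "learned", lambda e: e["temp"] > 90))

# An additional pattern inserted by a human expert.
registry.register(Pattern("sensor_offline", "expert", lambda e: e.get("heartbeat") is False))

# The expert tightens the learned pattern based on experience.
registry.register(Pattern("high_temperature", "expert", lambda e: e["temp"] > 80))

fired = registry.evaluate({"temp": 85, "heartbeat": True})
print(fired)
```

With the expert's override in place, a reading of 85 fires `high_temperature` even though the learned threshold of 90 alone would have missed it.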




Tuesday, July 2, 2013

On DEBS 2013: First keynote speaker - Roger Barga from Microsoft

Roger Barga was an excellent choice (by the organizers) for the first keynote at DEBS 2013.
He has a wide perspective from the multiple roles he has held over the last few years at Microsoft.
Roger mentioned Thomas Kuhn's seminal work, The Structure of Scientific Revolutions, saying that there are two competing forces: those who push for a paradigm shift, and those who resist it and try to find a "good enough" way to resolve everything within the old paradigm, where they feel comfortable.  The use of event-driven thinking is just such a paradigm shift; this is consistent with the tutorial we gave on the previous day.
Some notes I took while listening to Roger:

1. Evolution of analytics: analytics 1.0 - descriptive analytics based on warehouses; analytics 2.0 - big data, with NoSQL and Hadoop; we are now moving to analytics 3.0 - "rapid analytics with business impact".  Hadoop will become a niche technology, and real-time analytics (based on event processing) will take over.
2. Some examples: Rolls-Royce is offering "engine up-time" as a service instead of selling aircraft engines, with a lot of instrumentation for maintaining the engines.  There were other examples in the area of telemetry.  Some companies are making huge investments in real-time analytics.
3. Today there are two types of analytics: operational analytics, which works at the speed of business but uses little information to reach decisions; and investigative analytics, which works at the speed of data scientists and is based on a lot of information.  There is a gap between the two, and they need to be unified.
4. People don't know how to use new technologies to find new useful applications, and tend to apply new technologies to old applications.  Example: in its first era TV was just visual radio, until the industry learned that TV opens up new opportunities.
5. Bottom line: there is velocity pressure to do real-time analytics, but it requires a paradigm shift and education.  This is very compatible with our conclusions.  More about DEBS 2013 later.