Tuesday, August 31, 2010

Some thoughts on data mining and event processing - take one

Someone I talked to last week said, with some irony, that to get attention today, no matter what you do, you have to say that it has something to do with clouds and that it performs some kind of analytics. Well, it is a clear, bright day here without a single cloud, so I'll postpone the discussion about clouds to a rainy day. As for analytics, there are various types; today I'll write something about data mining and event processing. There are two sides here: what event processing can do for data mining, and what data mining can do for event processing. Let's focus for now on the second.

The answer seems easy. An event processing application is modeled as an event processing network, which consists of event processing agents; in most current implementations these are implemented by rules, queries, or some other constructs. Today the application is composed manually using some authoring tool. However, a frequently asked question is whether the computer can somehow use magic to compose the application itself, and this is a natural candidate problem for data mining. The achievements of data mining in event processing so far are somewhat modest, but there may still be promise there.

So, let's go further and explore the potential. We can look at three types of functions in which an event processing application may be assisted by data mining:

  1. Looking for anomalies in general: data mining can help identify that an anomaly is occurring now. This is actually a different type of application, in which there are no preset patterns.
  2. Detecting trends or threshold crossings, where the thresholds can be adjusted by data mining.
  3. Pattern detection, where the patterns can be determined by data mining.

The first type is a classification problem, which can be handled by learning what normal behavior looks like.
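To make the idea of learning normal behavior concrete, here is a minimal sketch in Python. It is only one simple possibility, not a method taken from any particular product: it assumes that "normal" can be summarized by the mean and standard deviation of past numeric readings, and the names `learn_normal`, `is_anomaly`, and `max_z` are made up for illustration.

```python
from statistics import mean, stdev

def learn_normal(history):
    """Summarize 'normal' behavior as (mean, standard deviation) of past readings."""
    return mean(history), stdev(history)

def is_anomaly(value, model, max_z=3.0):
    """Flag a value lying more than max_z standard deviations from the learned mean.

    max_z is an assumed tuning knob, not a recommended setting.
    """
    mu, sigma = model
    if sigma == 0:
        # All past readings were identical: anything different is unusual.
        return value != mu
    return abs(value - mu) / sigma > max_z

# Hypothetical past readings of some event attribute.
history = [100, 102, 98, 101, 99, 100, 103, 97]
model = learn_normal(history)

print(is_anomaly(101, model))  # close to the learned mean
print(is_anomaly(150, model))  # far from the learned mean
```

Note that nothing here is a preset pattern: the only input is past data, which is what makes this a different kind of application.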

The second type, learning thresholds, also has some known methods in data mining.
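As a rough illustration of threshold learning, the sketch below derives a threshold from historical values instead of having a person pick it by hand. It assumes the threshold should sit at a chosen percentile of past readings; this is just one of many possible approaches, and `learn_threshold`, `exceeds`, and the sample data are hypothetical.

```python
from statistics import quantiles

def learn_threshold(history, percentile=95):
    """Return the value below which roughly `percentile` percent of past readings fall."""
    # quantiles(..., n=100) returns the 99 cut points between percentiles.
    cut_points = quantiles(history, n=100, method="inclusive")
    return cut_points[percentile - 1]

def exceeds(value, threshold):
    """The threshold-oriented check that an event processing rule would apply."""
    return value > threshold

# Hypothetical past readings: the integers 1 through 100.
history = list(range(1, 101))
threshold = learn_threshold(history, percentile=95)

print(threshold)
print(exceeds(97, threshold))
print(exceeds(90, threshold))
```

Re-running `learn_threshold` periodically over recent history would let the threshold adjust as the data changes, while the rule that uses it stays the same.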
The main problem is the third one: learning patterns. There are several difficulties. The first is intent: data mining typically discovers events that happen together, which by itself may not be of interest, since the aim of patterns is to detect situations that require a reaction. There is thus additional semantic knowledge that data mining does not capture unless further information is provided. Furthermore, a pattern may occur so rarely that it is not captured in the existing data. Another difficulty is the richness of pattern types and their many variations, which makes the space of possibilities to search very large.
Successes in this area have typically been limited to a certain type of pattern within a certain temporal window. For example, some work that I am familiar with mined sequences of two events within a given temporal window. This again belongs to the category of events that happen together, where a human has to go over all the combinations and decide whether they represent an interesting situation.
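To show what mining a sequence of two events within a temporal window might look like, here is a small sketch. It is not the work mentioned above, only a naive counting approach under assumed inputs: events arrive as `(timestamp, event_type)` pairs sorted by time, and `mine_pairs`, `window`, `min_support`, and the event names are all invented for the example.

```python
from collections import Counter

def mine_pairs(events, window, min_support):
    """Count ordered pairs (a, b) where b follows a within `window` time units.

    `events` must be a list of (timestamp, event_type) sorted by timestamp.
    Only pairs seen at least `min_support` times are returned.
    """
    counts = Counter()
    for i, (t_a, a) in enumerate(events):
        for t_b, b in events[i + 1:]:
            if t_b - t_a > window:
                break  # events are sorted, so later ones are even further away
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical event log.
events = [
    (0, "login_failed"), (2, "password_reset"),
    (10, "login_failed"), (11, "password_reset"),
    (20, "login_failed"), (40, "password_reset"),
]

frequent = mine_pairs(events, window=5, min_support=2)
print(frequent)
```

The output is only a list of pairs that tend to occur together. Whether any of them describes a situation that requires a reaction is exactly the semantic judgment that is still left to a human.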

Bottom line: no magic bullet, but any breakthrough in this area will be helpful.

1 comment:

SJ said...

Hi Opher,

I think "Cloud technologies" contain some promising approaches towards event analytics. Therefore I felt free to write some thoughts about this issue (http://tinyurl.com/2ff7sro)