Wednesday, December 24, 2008

On Data Mining and Event Processing

Today I travelled to Beer Sheva, the capital of the Negev, the southern part of Israel, which consists mostly of desert. I visited Ben-Gurion University, met some old friends, and gave a talk in a well-attended seminar on "the next generation of event processing". I travelled by train (2 hours and 15 minutes in each direction), and since my last visit there, five years ago or so, they have built a train station with a bridge that leads to the campus, which is very convenient. Since I am not a frequent train rider in Israel, I discovered that at both ends of the line there are no signs saying which trains leave from which track, and this is assumed to be common knowledge... They do announce, when a train enters the station, where it is going and from which track, but there is still room for improvement.

Since some of the people who attended my talk were data mining people, they wondered about the relationship between event processing and data mining. I have heard this question before, so I thought the answer would be of interest to more people.

The "pattern detection" function of event processing detects, at run-time, patterns that have been predetermined. The system therefore knows which patterns it is looking for, and its job is to ensure correct and efficient detection of those patterns.

Data mining is about looking at historical data to find various things. Among the things that can be found are patterns that have some meaning, where we know that if the pattern occurs again it requires some reaction. The classic example is "fraud detection", in which mining past data determines that a certain pattern of actions indicates suspicion of fraud. In this case data mining determines the pattern, and the event processing system detects at run-time that the pattern occurs. Note that not all patterns can be mined from past data. For example, if the pattern looks for violations of regulations, then the pattern stands for the regulation; it is not mined, but determined by some regulator and given explicitly.
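To make the division of labour concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `mine_threshold` and `BurstDetector`, the "mean plus k standard deviations" rule, and the sample numbers are hypothetical, and real fraud patterns are far richer. It only shows an off-line step that derives a pattern parameter from past data, and an on-line step that detects the resulting pattern event by event.

```python
from collections import deque
from statistics import mean, pstdev


def mine_threshold(historical_amounts, k=3.0):
    """Off-line step (illustrative): derive an amount threshold from past data."""
    return mean(historical_amounts) + k * pstdev(historical_amounts)


class BurstDetector:
    """On-line step: detect `count` large transactions within `window` seconds."""

    def __init__(self, threshold, count=2, window=60.0):
        self.threshold = threshold
        self.count = count
        self.window = window
        self.recent = deque()  # timestamps of recent large transactions

    def on_event(self, timestamp, amount):
        """Return True if this event completes the predetermined pattern."""
        if amount <= self.threshold:
            return False
        self.recent.append(timestamp)
        # Drop large transactions that have fallen out of the time window.
        while self.recent and timestamp - self.recent[0] > self.window:
            self.recent.popleft()
        return len(self.recent) >= self.count


# Made-up historical data and a made-up event stream of (timestamp, amount).
history = [10, 12, 11, 9, 13, 10, 11, 12]
threshold = mine_threshold(history)
detector = BurstDetector(threshold, count=2, window=60.0)
events = [(0, 11), (5, 500), (30, 700), (200, 900)]
alerts = [t for t, amount in events if detector.on_event(t, amount)]
print(alerts)
```

A regulation-based pattern would skip `mine_threshold` altogether: the threshold would simply be handed to the detector as a given value.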

So, typically data mining is done off-line and event processing is performed on-line, but again, this is not true in all cases; there are examples in which event processing and mining are mixed at run-time. An example: there is a traffic model, according to which the traffic light policies are set, but there is also constant monitoring of the traffic. When the monitored traffic deviates significantly from the model, the traffic model has to be changed and the policies set according to the new traffic model. This is a kind of mix between event processing and mining, since the actual mining process is triggered by events, and the patterns may change dynamically as a result of this mining process.
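The traffic example can be sketched in the same spirit. This is a toy under stated assumptions, not a real traffic control scheme: the class `AdaptiveTrafficModel`, the relative-deviation test, the "re-mining" step (which here just re-estimates the rate from a sliding window) and the green-time formula are all hypothetical. It also assumes the expected rate is always positive.

```python
from collections import deque
from statistics import mean


class AdaptiveTrafficModel:
    """Toy model: monitoring events trigger re-estimation of the model."""

    def __init__(self, expected_rate, tolerance=0.5, window=5):
        self.expected_rate = expected_rate  # vehicles per interval (must be > 0)
        self.tolerance = tolerance          # relative deviation that triggers re-mining
        self.recent = deque(maxlen=window)  # sliding window of observed counts
        self.remine_count = 0

    def green_seconds(self):
        # Hypothetical policy: 2 s of green per expected vehicle, clamped to 10..90 s.
        return max(10, min(90, 2 * self.expected_rate))

    def on_count(self, vehicles):
        """Process one monitoring event; return True if the model was changed."""
        self.recent.append(vehicles)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough observations yet
        observed = mean(self.recent)
        deviation = abs(observed - self.expected_rate) / self.expected_rate
        if deviation > self.tolerance:
            # "Mining" triggered by events: re-estimate the model from recent data.
            self.expected_rate = observed
            self.remine_count += 1
            return True
        return False


# Made-up vehicle counts per interval.
model = AdaptiveTrafficModel(expected_rate=10, tolerance=0.5, window=3)
counts = [9, 11, 10, 20, 25, 30, 26]
changes = [i for i, c in enumerate(counts) if model.on_count(c)]
print(changes, model.green_seconds())
```

The point of the sketch is only the control flow: the events both feed the detection and, when they deviate enough, change the model that the policy is derived from.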

More - Later.
