Wednesday, December 5, 2007

On False Positives and False Negatives


From a syntactic point of view, CEP looks for patterns and derives an event or triggers an action for each pattern detected. However, detecting the pattern is only the mechanical work; the pattern designates a "situation", which is an "event" in the customer's frame of reference to which the customer wants to react (there are also "internal" situations used for further processing). There is obviously a gap between the intention (the situation) and the way it is detected (a pattern on the event flow). In many cases, satisfying the pattern is a sufficient condition for detecting the intended situation; in other cases, the pattern serves only as a "best approximation". This leads to the phenomenon of false positives (the pattern is detected, but the situation did not really happen) and false negatives (the situation occurred, but the pattern was not detected). Some reasons are:
  • Raw events are missed - they do not arrive at all, or do not arrive on time (source or communication issues).
  • Raw events are not accurate - their values are inaccurate (source issues).
  • Temporal order issues - uncertainty about the correct order of events.
  • The pattern does not accurately reflect the conditions for the situation (e.g. there are probabilistic elements).
  • (other reasons) ?
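
To make the first reason concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Event` type, the `detect_unanswered` function, the transaction ids and the timeout are hypothetical and do not come from any particular CEP product. The situation is "a transaction was never answered", and the pattern is "a request with no response for the same transaction within a timeout". Losing one response produces a false positive, and losing one request produces a false negative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "request" or "response"
    tx_id: str   # hypothetical transaction identifier
    time: float  # occurrence time in seconds


def detect_unanswered(events, timeout):
    """Pattern: a request with no response for the same tx_id within `timeout`."""
    # Collect response times per transaction.
    responses = {}
    for e in events:
        if e.kind == "response":
            responses.setdefault(e.tx_id, []).append(e.time)

    # A request matches the pattern if no response falls inside its window.
    detected = set()
    for e in events:
        if e.kind == "request":
            window = responses.get(e.tx_id, [])
            if not any(e.time <= t <= e.time + timeout for t in window):
                detected.add(e.tx_id)
    return detected


# What really happened at the sources: only transaction B was never answered.
emitted = [
    Event("request", "A", 0.0), Event("response", "A", 1.0),
    Event("request", "B", 2.0),
    Event("request", "C", 3.0), Event("response", "C", 4.0),
]
truth = detect_unanswered(emitted, timeout=5.0)

# What the engine actually received: two raw events were lost on the way.
lost = {("response", "A"), ("request", "B")}
observed = [e for e in emitted if (e.kind, e.tx_id) not in lost]
detected = detect_unanswered(observed, timeout=5.0)

false_positives = detected - truth  # pattern fired, situation did not happen
false_negatives = truth - detected  # situation happened, pattern did not fire
print(false_positives, false_negatives)
```

In this sketch the pattern itself is correct for the situation; the errors come entirely from the missing raw events.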

As in the time constraints case, there are various utility functions that designate the damage from either false positives or false negatives.
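
One simple form such a utility function might take is a linear cost: a fixed damage per false positive and per false negative. The sketch below assumes exactly that; the function name `total_damage`, the counts and the costs are all made up for illustration, and real utility functions may well be non-linear or time-dependent.

```python
def total_damage(n_false_pos, n_false_neg, cost_fp, cost_fn):
    """Linear damage: each error type has a fixed, assumed cost per occurrence."""
    return n_false_pos * cost_fp + n_false_neg * cost_fn


# Hypothetical comparison of two variants of the same pattern, where a missed
# situation is assumed to be 50 times more damaging than a false alarm.
strict = total_damage(n_false_pos=2, n_false_neg=10, cost_fp=1.0, cost_fn=50.0)
lenient = total_damage(n_false_pos=40, n_false_neg=1, cost_fp=1.0, cost_fn=50.0)
print(strict, lenient)
```

Under these assumed costs the lenient pattern does less damage despite raising far more false alarms; with the costs reversed, the strict pattern would be the better choice.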

More on that issue - later.

4 comments:

Hans said...

In the Other Reasons category, I would put code bugs. A code bug might be classified under a pattern that doesn't properly reflect the situation, but I find differences between these cases.

A code bug can, in theory, be caught by traditional software development methods like testing or coding techniques.

A misspecified pattern can come from something as simple as an edge case that was missed or was missing from the data that was used to identify patterns. This can be caught using analysis techniques but not as much by software development methods.

So when planning for things that can go wrong, I separate these scenarios.

Opher Etzion said...

Hans.

You are right, of course.

Software bugs can cause false positives as well as false negatives. Debugging CEP applications is an area that deserves attention in its own right.

cheers,

Opher

Unknown said...

Here is an example of a real-world need to match patterns at scary rates, with big data for each event...

http://cosmicvariance.com/2007/11/16/high-energy-spam-filter/

AlbertM said...

Hi Opher, I think you've touched on a very important subject. It seems to me that EP technologies operate on one basic assumption -- that to detect a situation, EP must observe 100% of events with zero loss. That may not be achievable in practice, and it may require an architect to build into the CEP rules conditions that compensate for the loss, which complicates the whole process dramatically. In my view, this would limit the adoption and effectiveness of CEP-based technologies.
I see this problem when applying CEP to transaction tracking, where transaction exchange patterns are detected based on the inflow of message exchange events. Even a small percentage of event loss will cause false negatives or false positives.

Any suggestions on how to deal with this?