Saturday, February 11, 2012

Uncertainty in event processing

This cartoon indicates uncertainty about uncertainty.
And indeed, the research community has done a lot of work on uncertainty in data over the years, but very little of it has made its way into products. The prevailing assumption has been that while data may be noisy, a cleansing process is applied before the data is used. Now, with the "big data" trend, this assumption no longer always holds. The nature of the data (streaming data that needs to be processed online), its volume, and the velocity at which it arrives mean that in many cases the data cannot be cleansed before processing, and that decisions may be based on noisy, sometimes incomplete or uncertain data. Veracity (data in doubt) was thus added as one of the four Vs of big data.
Uncertainty in events is not really different from uncertainty in data (which may represent either facts or events).
Some of the types of uncertainty are:

  • Uncertainty about whether the event occurred (or is forecast to occur)
  • Uncertainty about when the event occurred (or is forecast to occur)
  • Uncertainty about where the event occurred (or is forecast to occur)
  • Uncertainty about the content of an event (its attribute values)
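To make the four types of uncertainty concrete, here is a minimal sketch of how a single event might carry all of them. Every name in it (`UncertainEvent`, `occurrence_prob`, `time_earliest`, `time_latest`, `location_candidates`, `attributes`) is hypothetical and is not taken from any event processing product. The sketch assumes that occurrence is a single probability, that time is an interval, and that location and attribute values are discrete probability distributions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class UncertainEvent:
    """Hypothetical event carrying the four kinds of uncertainty:
    whether, when, where, and what (content)."""
    event_type: str
    occurrence_prob: float        # whether: P(event occurred / will occur)
    time_earliest: datetime       # when: occurrence time is assumed to lie
    time_latest: datetime         #   somewhere in [time_earliest, time_latest]
    location_candidates: dict[str, float] = field(default_factory=dict)     # where: location -> probability
    attributes: dict[str, dict[object, float]] = field(default_factory=dict)  # content: attribute -> {value: probability}

    def __post_init__(self) -> None:
        # Reject values that cannot describe a real event.
        if not 0.0 <= self.occurrence_prob <= 1.0:
            raise ValueError("occurrence_prob must be in [0, 1]")
        if self.time_latest < self.time_earliest:
            raise ValueError("time interval is empty")

    def most_likely_location(self) -> str | None:
        # Pick the candidate location with the highest probability, if any.
        if not self.location_candidates:
            return None
        return max(self.location_candidates, key=self.location_candidates.get)

    def time_uncertainty(self) -> timedelta:
        # Width of the interval in which the event may have occurred.
        return self.time_latest - self.time_earliest


# Illustrative values only.
e = UncertainEvent(
    "door_opened",
    occurrence_prob=0.7,
    time_earliest=datetime(2012, 2, 11, 10, 0),
    time_latest=datetime(2012, 2, 11, 10, 15),
    location_candidates={"gate A": 0.6, "gate B": 0.4},
    attributes={"badge_id": {"1234": 0.9, "1284": 0.1}},
)
print(e.most_likely_location())  # gate A
print(e.time_uncertainty())      # 0:15:00
```

Keeping each uncertainty dimension in its own field lets downstream processing decide separately how to handle "did it happen", "when", "where", and "what", instead of collapsing everything into a single confidence score.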

There are further uncertainties related to the processing of events:

  • Aggregation of uncertain events (where some of them might be missing)
  • Uncertainty about whether a derived event matches the situation it needs to detect. This is a crucial point: the pattern is meant to indicate a situation that we wish to detect, but sometimes a single pattern does not define the situation well. Example: a threshold-oriented pattern such as "event E occurs at least 4 times during one hour" produces both false positives and false negatives. Also, if event E occurs 3 times during an hour, that does not necessarily mean the situation did not happen.
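The threshold example can be combined with aggregation of uncertain events. The following is a minimal sketch under one strong assumption: each reported occurrence of E has its own probability of having really occurred, and these occurrences are independent. Under that assumption, the number of real occurrences in the hour follows a Poisson-binomial distribution, and the pattern yields a probability rather than a yes/no answer. The function names and the 0.5 decision threshold in the usage note are illustrative choices, not part of any specific product.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def count_distribution(probs: list[float]) -> list[float]:
    """dist[j] = P(exactly j of the events really occurred), assuming each
    event occurred independently with its own probability (Poisson-binomial)."""
    dist = [1.0]  # before any event: P(count == 0) = 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1.0 - p)  # this event did not really occur
            new[j + 1] += q * p      # this event did occur
        dist = new
    return dist


def prob_at_least(probs: list[float], k: int) -> float:
    """P(at least k of the uncertain events really occurred)."""
    return sum(count_distribution(probs)[k:])


def threshold_pattern(events: list[tuple[datetime, float]], window_end: datetime,
                      k: int = 4, width: timedelta = timedelta(hours=1)) -> float:
    """Probability that 'E occurs at least k times during one hour', evaluated
    over the window (window_end - width, window_end].
    events: (timestamp, occurrence probability) pairs."""
    in_window = [p for t, p in events if window_end - width < t <= window_end]
    return prob_at_least(in_window, k)


# Illustrative data: four reported occurrences of E inside the hour, one before it.
events = [
    (datetime(2012, 2, 11, 9, 30), 0.99),  # outside the window, ignored
    (datetime(2012, 2, 11, 10, 5), 0.9),
    (datetime(2012, 2, 11, 10, 20), 0.8),
    (datetime(2012, 2, 11, 10, 40), 0.95),
    (datetime(2012, 2, 11, 10, 55), 0.5),
]
end = datetime(2012, 2, 11, 11, 0)
p = threshold_pattern(events, end)
print(f"P(at least 4 in the hour) = {p:.3f}")  # 0.342
```

Four occurrences were reported, yet the probability that all four really happened is only about 0.34. A simple rule such as "report the situation if the probability is at least 0.5" would therefore not fire here, even though the situation may well have happened. This is the false-negative risk described above.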

We are planning to submit a tutorial proposal on uncertainty in events to DEBS'12 and are now working on it. I'll write more on that over the next few months.


Celticht32 said...

Here is the issue I have, Opher...
How, then, can you determine whether an action is warranted if you take uncertainty into account? Would you use a weighting threshold, i.e., if 80% of your events come in, then it's OK, that's enough?
This is where I see PEP and EPA crossing over into each other, and I am not so sure it makes complete sense to mix and match the two.

I see EPA as a more concrete implementation, whereas PEP is just that: a predictive event architecture. Both have their places, but I don't know that they can be intermixed. Where I do see a synergy between the two is the following:
a PEP architecture that is just another component of a bigger EDA architecture. The PEP system feeds the EDA system its events of note (things that are likely to happen), business rules can then be applied to them, and other PEP feeds can be fed into the EDA for correlation.


Opher Etzion said...

Hi Chris.

In this post I've talked about the types of uncertainty, not about how to take action in the presence of uncertainty. Using thresholds is a simple method of decision making, but there are more sophisticated ones. This is an area we are investigating, and I'll share some insights about it in one of my next postings... stay tuned.