And indeed, there has been a lot of work on uncertainty in data over the years in the research community, but very little of it got into products; the conception has been that while data may be noisy, a cleansing process is applied before the data is used. Now, with the "big data" trend, this assumption does not always hold: the nature of the data (streaming data that must be processed online), its volume, and the velocity at which it arrives imply that in many cases the data cannot be cleansed before processing, and that decisions may be based on noisy, sometimes incomplete or uncertain, data. Veracity (data in doubt) was thus added as the fourth of the four Vs of big data.
Uncertainty in events is not really different from uncertainty in data (which may represent either facts or events).
Some of the uncertainty types are:
- Uncertainty about whether the event occurred (or is forecast to occur)
- Uncertainty about when the event occurred (or is forecast to occur)
- Uncertainty about where the event occurred (or is forecast to occur)
- Uncertainty about the content of the event (its attribute values)
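To make these dimensions concrete, here is a minimal sketch of an event record that carries all four kinds of uncertainty explicitly. The class and field names are hypothetical, chosen for illustration; no specific product or system is implied:

```python
from dataclasses import dataclass

@dataclass
class UncertainEvent:
    """An event record carrying the four uncertainty dimensions explicitly."""
    event_type: str
    occurrence_prob: float                 # did the event occur at all?
    time_interval: tuple                   # (earliest, latest) possible occurrence time
    location_candidates: dict              # possible locations with their probabilities
    attributes: dict                       # attribute name -> {candidate value: probability}

# Example: a sensor reading that probably occurred, sometime between t=100
# and t=130, most likely at gate A, with an uncertain temperature value.
reading = UncertainEvent(
    event_type="temperature_alert",
    occurrence_prob=0.9,
    time_interval=(100.0, 130.0),
    location_candidates={"gate_A": 0.7, "gate_B": 0.3},
    attributes={"temperature": {41.5: 0.8, 45.0: 0.2}},
)
```

Note that each dimension degrades gracefully to the certain case: a definite event simply has `occurrence_prob=1.0`, a point timestamp, a single location, and single-valued attributes.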
There are more uncertainties related to the processing of events:
- Aggregation of uncertain events (where some of them might be missing)
- Uncertainty about whether a derived event matches the situation it is intended to detect. This is a crucial point: the pattern indicates some situation that we wish to detect, but sometimes the situation is not well defined by a single pattern. Take a threshold-oriented pattern such as "event E occurs at least 4 times during one hour": there can be both false positives and false negatives, and if event E occurs only 3 times during an hour, that does not necessarily indicate that the situation did not happen.
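One way to soften the hard yes/no of such a threshold pattern: if each observed instance of E carries an occurrence probability, we can compute the probability that the threshold was actually met instead of a binary match. A minimal sketch, with hypothetical function names and assuming the event occurrences are independent:

```python
def count_distribution(probs):
    """Distribution of how many events really occurred, given per-event
    occurrence probabilities (independent events). dp[c] = P(exactly c occurred)."""
    dp = [1.0]
    for p in probs:
        new = [0.0] * (len(dp) + 1)
        for c, q in enumerate(dp):
            new[c] += q * (1 - p)      # this event did not occur
            new[c + 1] += q * p        # this event did occur
        dp = new
    return dp

def prob_threshold_met(probs, threshold):
    """P(at least `threshold` of the uncertain events really occurred)."""
    return sum(count_distribution(probs)[threshold:])

# Four sightings of E, each only 80% certain: the "at least 4" situation
# holds with probability 0.8**4 = 0.4096, not with certainty.
print(prob_threshold_met([0.8] * 4, 4))

# The false-negative side: 3 confirmed sightings plus 2 low-confidence ones.
# P(at least 4) = 1 - 0.7 * 0.8 = 0.44, so the situation may well have
# happened even though only 3 events were observed with certainty.
print(prob_threshold_met([1.0, 1.0, 1.0, 0.3, 0.2], 4))
```

The derived event can then carry this probability as its own occurrence uncertainty (tying back to the first uncertainty type above), rather than being emitted only when the raw count crosses the threshold.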
We are planning to submit a tutorial proposal for DEBS'12 discussing uncertainty in events, and we are now working on it. I'll write more about it during the next few months.