Wednesday, October 10, 2007

Causality and lineage in event processing

[Today I am drilling into the micro level (a technical issue), with a macro-level lesson at the end]

This is what I found on the web when looking for an image describing causality.

David Luckham, in his book, made "causality" one of the major abstractions in CEP. As you can see in the Wikipedia entry, this term has interpretations in multiple disciplines. Causality simply says that the existence of event E2 is caused by event E1. Anybody familiar with discussions around the meaning of "causality" in logic realizes that there are various approaches and terms in this area. In our context, event processing, we can look at different types of causality.

Type I: predetermined causality - event E2 always (or conditionally) occurs as a result of the occurrence of E1. Thus we don't need any sensor to get event E2: we may assume it happened if E1 happened (and the condition is satisfied). A time offset or interval may be attached to this causality. Note that in this case E1 and E2 are both raw events.
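
To make type I concrete, here is a minimal Python sketch. `Event`, `PredeterminedCausality`, the valve/tank event types and the 5-unit offset are hypothetical illustrations, not part of any CEP product: the rule infers E2 from E1 without any sensor, provided the condition holds, and shifts E2's timestamp by the attached offset.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Event:
    type: str
    time: float
    payload: dict


def always(_: Event) -> bool:
    return True


@dataclass(frozen=True)
class PredeterminedCausality:
    """Hypothetical type I rule: every `cause` event implies an `effect` event."""
    cause: str
    effect: str
    offset: float = 0.0                          # time offset attached to the causality
    condition: Callable[[Event], bool] = always  # optional condition on E1

    def infer(self, e1: Event) -> Optional[Event]:
        # No sensor reports E2: we assume it happened because E1 happened
        # and the condition is satisfied.
        if e1.type != self.cause or not self.condition(e1):
            return None
        return Event(self.effect, e1.time + self.offset, dict(e1.payload))


valve_rule = PredeterminedCausality(
    cause="valve_opened",
    effect="tank_filling",
    offset=5.0,
    condition=lambda e: e.payload.get("pressure", 0.0) > 1.0,
)
e2 = valve_rule.infer(Event("valve_opened", 10.0, {"pressure": 2.0}))
```

Here `e2` is an inferred `tank_filling` event at time 15.0; if the pressure condition fails, `infer` returns `None` and no E2 is assumed.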

Type II: The event E1 is an input to an EPA (Event Processing Agent) AG, and the event E2 is an output of AG. In this case E2 is a derived (virtual) event. This type can be further refined according to whether E2 is really a function of E1; this is known, since the agent's specification is part of the system.
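
A minimal sketch of type II, assuming a hypothetical `EPA` class whose `derive` function plays the role of its known specification. Because the agent is part of the system, the derived event can record the causal link (the id of E1 and the producing agent), and its payload is computed from E1's payload, so the functional dependency is known too. The trade threshold is an arbitrary example value.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

_next_id = itertools.count(1)


@dataclass
class Event:
    type: str
    payload: dict
    derived_from: Tuple[int, ...] = ()   # ids of the input events (lineage)
    agent: Optional[str] = None          # EPA that produced the event; None for raw events
    id: int = field(default_factory=lambda: next(_next_id))


@dataclass
class EPA:
    """Hypothetical EPA whose specification (`derive`) is part of the system."""
    name: str
    accepts: str
    output_type: str
    derive: Callable[[dict], Optional[dict]]  # E2's payload as a function of E1's payload

    def process(self, e1: Event) -> Optional[Event]:
        if e1.type != self.accepts:
            return None
        payload = self.derive(e1.payload)
        if payload is None:                   # the agent's condition was not met
            return None
        # E2 is a derived (virtual) event: causally AND functionally dependent on E1.
        return Event(self.output_type, payload, derived_from=(e1.id,), agent=self.name)


def large_trade(p: dict) -> Optional[dict]:
    value = p["qty"] * p["price"]
    return {"symbol": p["symbol"], "value": value} if value > 1_000_000 else None


detector = EPA("large_trade_detector", "trade", "large_trade", large_trade)
e1 = Event("trade", {"symbol": "IBM", "qty": 20_000, "price": 110.0})
e2 = detector.process(e1)
```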

Type III: The event E1 is sent from an EPN (Event Processing Network) to a consumer C. C (conditionally) applies some action AC whose specification is not known to us, but we observe that it emits the event E2. This is another type of causality (the event E2 would not have been emitted if E1 had not triggered AC); however, E2 may or may not be functionally dependent on E1 (i.e., the value of E2 is some function of E1).
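
For type III, the system can only observe the consumer from the outside. The sketch below (all names hypothetical) wraps an opaque consumer action and records a causal link for every event it emits while handling E1, leaving the functional dependency marked as unknown (`None`).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Event:
    id: str
    type: str
    payload: dict


@dataclass
class CausalLink:
    cause: str                   # id of E1
    effect: str                  # id of E2
    functional: Optional[bool]   # None: unknown whether E2 is a function of E1


Emit = Callable[[Event], None]


class ObservedConsumer:
    """Hypothetical wrapper around a consumer C whose action AC is a black box."""

    def __init__(self, action: Callable[[Event, Emit], None]):
        self.action = action
        self.links: List[CausalLink] = []

    def deliver(self, e1: Event) -> List[Event]:
        emitted: List[Event] = []

        def emit(e2: Event) -> None:
            # We only observe that AC emitted E2 while handling E1: the link
            # is causal, but functional dependency cannot be established.
            emitted.append(e2)
            self.links.append(CausalLink(e1.id, e2.id, functional=None))

        self.action(e1, emit)
        return emitted


def opaque_action(e: Event, emit: Emit) -> None:
    # Stands in for consumer logic that the event processing system cannot inspect.
    if e.type == "large_trade":
        emit(Event("e2", "ticket_opened", {"priority": "high"}))


consumer = ObservedConsumer(opaque_action)
out = consumer.deliver(Event("e1", "large_trade", {"symbol": "IBM"}))
```

Contrast this with type II, where the agent's specification lets us mark the link as functional as well.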

Some questions were asked on my previous post on ECA rules: isn't an EPA just an action, and why is this distinction important? The answer is that in the EPA case we have a type II dependency, which means both causal and functional dependency, while for a general action it is not known whether there is a functional dependency.

The question is: why is this whole causality discussion important? Is it just a theoretical notion? The answer here is lineage tracing.

In some cases, it is important to be able to answer questions like:
  • What was the chain of events and transformations that caused a certain action (decision) to occur?
  • What are all the consequences of a certain event (or of events of a certain type)?
  • What would have happened if a certain event that did not happen had happened at a certain time point? (Or the reverse: what if an event that happened had not happened?)
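
If every causal link (of type I, II, or III) is recorded as a pair of event ids, the first two questions become graph traversals: the first walks the links backward from the action, the second walks them forward from the event. A minimal sketch, assuming a hypothetical in-memory `LineageGraph` and made-up event ids:

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple


class LineageGraph:
    """Hypothetical store of recorded causal links (cause id -> effect id)."""

    def __init__(self, links: Iterable[Tuple[str, str]]):
        self.effects: Dict[str, List[str]] = defaultdict(list)
        self.causes: Dict[str, List[str]] = defaultdict(list)
        for cause, effect in links:
            self.effects[cause].append(effect)
            self.causes[effect].append(cause)

    @staticmethod
    def _reach(start: str, edges: Dict[str, List[str]]) -> List[str]:
        # Breadth-first traversal: every event transitively reachable from `start`.
        seen = {start}
        order: List[str] = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order

    def ancestry(self, event_id: str) -> List[str]:
        """Backward trace: the chain of events that led to this event or action."""
        return self._reach(event_id, self.causes)

    def consequences(self, event_id: str) -> List[str]:
        """Forward trace: all events that this event (transitively) caused."""
        return self._reach(event_id, self.effects)


graph = LineageGraph([
    ("trade1", "large_trade1"),
    ("trade2", "large_trade2"),
    ("large_trade1", "alert1"),
    ("large_trade2", "alert1"),
    ("alert1", "block_account"),
])
```

Here `graph.ancestry("block_account")` returns `["alert1", "large_trade1", "large_trade2", "trade1", "trade2"]` and `graph.consequences("trade1")` returns `["large_trade1", "alert1", "block_account"]`. Breadth-first order lists the nearest causes (or effects) first.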

Applications? Auditing, decision analysis, and more. The last type of question relates to some past work done in the temporal area; temporal issues are very pervasive in event processing and deserve one (or more) posts at a later point.
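
One simple way to approach the what-if question, assuming the agents are deterministic and their specifications are known (type II), is to replay the recorded history with an event added or removed and compare the derived events. This is a generic replay sketch, not the temporal work mentioned above; `detect`, `what_if`, the "price_drop" events and the 10-unit window are hypothetical. Type III links cannot be replayed this way, since the consumer's action is unknown.

```python
from typing import Iterable, List, Tuple

Event = Tuple[float, str]   # (time, type): a deliberately simplified representation


def detect(events: Iterable[Event], window: float = 10.0) -> List[Event]:
    """Hypothetical EPA: derive an 'alert' when two 'price_drop' events are within `window`."""
    derived: List[Event] = []
    last_drop = None
    for t, etype in sorted(events):
        if etype != "price_drop":
            continue
        if last_drop is not None and t - last_drop <= window:
            derived.append((t, "alert"))
        last_drop = t
    return derived


def what_if(history: List[Event], add: Iterable[Event] = (), remove: Iterable[Event] = ()):
    """Replay the history with events added and/or removed; return (actual, counterfactual)."""
    removed = set(remove)
    counterfactual = [e for e in history if e not in removed] + list(add)
    return detect(history), detect(counterfactual)


history = [(1.0, "price_drop"), (30.0, "price_drop"), (35.0, "news")]
actual, hypothetical = what_if(history, add=[(25.0, "price_drop")])
```

With this history, `actual` is empty (the two drops are 29 time units apart), while adding a drop at time 25 yields `[(30.0, "alert")]`.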


woolfel said...

I know many people use business rule engines to do backward chaining logic. I've only been using rules engines for 7 years, and one of the main reasons for using a BRE is tracing the cause. Many engines like OPSJ, Haley, iLog JRules and JESS provide backward chaining or equivalent functionality.

Tracing the cause of an event dates back to early AI research on backward chaining. I could be wrong, but BREs have had tracing capability in commercial products for a long time. As long as the rules are well defined, one can trace the cause.
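
For readers less familiar with the technique mentioned here, a minimal propositional backward-chaining sketch that returns a proof tree, i.e. the trace of which rules and facts establish a goal. It is a generic illustration with hypothetical rule and fact names, not the API or explanation facility of OPSJ, Haley, JRules, or JESS.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical propositional rule base: conclusion -> alternative sets of premises.
Rules = Dict[str, List[Tuple[str, ...]]]
Proof = Tuple[str, list]   # (goal, [sub-proofs]); a known fact has no sub-proofs


def explain(goal: str, facts: Set[str], rules: Rules,
            visiting: Optional[Set[str]] = None) -> Optional[Proof]:
    """Backward chaining: return a proof tree for `goal`, or None if it cannot be proved."""
    visiting = set() if visiting is None else visiting
    if goal in facts:
        return (goal, [])                 # a given fact: the trace stops here
    if goal in visiting:
        return None                       # guard against cyclic rules
    visiting.add(goal)
    try:
        for premises in rules.get(goal, []):
            subproofs = []
            for premise in premises:
                proof = explain(premise, facts, rules, visiting)
                if proof is None:
                    break                 # this rule fails; try the next alternative
                subproofs.append(proof)
            else:
                return (goal, subproofs)  # every premise was proved
        return None
    finally:
        visiting.discard(goal)


rules: Rules = {
    "block_account": [("fraud_suspected",)],
    "fraud_suspected": [("large_trade", "new_account"), ("stolen_card",)],
}
proof = explain("block_account", {"large_trade", "new_account"}, rules)
```

`proof` evaluates to `("block_account", [("fraud_suspected", [("large_trade", []), ("new_account", [])])])`; with only `large_trade` known, `explain` returns `None`.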

One difference I see is that CEP products are purpose-built for these scenarios, whereas with a BRE the developer has to do more work. Some companies provide vertical modules for different business domains to make that easier.

Opher Etzion said...

To Woolfel.

One thing is sure: there is nothing new in any of the concepts we are discussing in event processing. The term causality is known from logic; in AI there are "truth maintenance systems" to trace the causality of facts and rules; in expert systems, causality is an important factor.

The term causality in event processing is the projection of this term onto the world of events. While in a TMS there is a trace of the causality of facts using inference rules, in event processing there is a trace of events using event processing agents. The semantics and computational techniques have to be adjusted for this world.


woolfel said...

From a rule engine perspective, events are just facts. I tend to think truth maintenance is different from causality as you describe it for event processing.

There are many kinds of truth maintenance. The most common one used in BREs is logical truth maintenance, which guarantees that if a fact is derived from some other facts, the removal of the triggering fact also removes the derived fact. Since event processing tends to move forward and doesn't normally involve modification of facts or events, that kind of cause-effect doesn't apply.
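
The logical truth maintenance described here can be sketched as follows (a hypothetical `LogicalTMS` class, not any engine's API): each derived fact keeps the sets of facts that justify it, and retracting a fact removes every justification that relied on it, cascading to derived facts left without support. A derived fact with two independent justifications survives the loss of one of them.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, Iterable, Set


class LogicalTMS:
    """Minimal sketch of logical truth maintenance (hypothetical class, not a BRE API)."""

    def __init__(self) -> None:
        self.facts: Set[str] = set()
        # derived fact -> the sets of facts that justify it
        self.justifications: DefaultDict[str, Set[FrozenSet[str]]] = defaultdict(set)

    def assert_fact(self, fact: str) -> None:
        self.facts.add(fact)

    def derive(self, fact: str, support: Iterable[str]) -> None:
        # The derived fact stays true only while at least one justification holds.
        self.justifications[fact].add(frozenset(support))
        self.facts.add(fact)

    def retract(self, fact: str) -> None:
        pending = [fact]
        while pending:
            f = pending.pop()
            if f not in self.facts:
                continue
            self.facts.discard(f)
            self.justifications.pop(f, None)
            for derived, justs in self.justifications.items():
                dead = {j for j in justs if f in j}
                justs -= dead                  # drop justifications that relied on f
                if dead and not justs:
                    pending.append(derived)    # no support left: remove it as well


tms = LogicalTMS()
tms.assert_fact("order_placed")
tms.assert_fact("credit_ok")
tms.derive("ship_order", {"order_placed", "credit_ok"})
tms.derive("notify_warehouse", {"ship_order"})
tms.retract("credit_ok")
```

After `tms.retract("credit_ok")`, `tms.facts` is `{"order_placed"}`: both `ship_order` and `notify_warehouse` lose their support. As the comment points out, events are rarely retracted in event processing, so this kind of cascade seldom applies there.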

I know that both forward and backward chaining have been used to build applications with auditing and tracing. Doing it correctly from a developer perspective does require experience and skill, so it's not something a business user can do unassisted. If CEP products provided causality out of the box, it would definitely make life easier for the user.