- The data is not good enough: the claim is that the assumption that all required data is readily available does not match reality, where data suffers from quality, frequency, and spatial-coverage limitations of the sensors, as well as data integration issues.
- Networks aren't ubiquitous: the product owners don't have control over the availability of networks.
- Integration is tougher than analysis: the main problem is not analyzing the data, but integrating all the data needed for the analysis.
- More sensor innovation needed: the stated areas of required innovation are: combining video sources, which are under-utilized today; more refined and more affordable environmental sensors; and the software-defined sensor, a combination of multiple sensors plus computing power that sits out on a network and "calculates rather than measures".
- Status quo security doesn't cut it: security systems for IoT should be radically different from those developed for traditional IP.
This is a blog describing some thoughts about issues related to event processing, and thoughts related to my current role. It is written by Opher Etzion and reflects the author's own opinions.
Friday, May 9, 2014
Internet of Things - what's holding us back?
Tuesday, June 25, 2013
On speed and accuracy in event processing
Saturday, August 11, 2012
How can the level of uncertainty be determined?
Friday, August 10, 2012
On what you need vs. what you can consume
- How are uncertainty quantifications (such as probabilities) obtained?
- How are uncertain results consumed in decisions?
- How is the quality of uncertain decisions evaluated and taken into account?
Monday, July 16, 2012
DEBS 2012 tutorial on uncertainty in event processing
Saturday, February 11, 2012
Uncertainty in event processing
And indeed, there has been a lot of work on uncertainty in data over the years in the research community, but very little of it got into products; the conception has been that while data may be noisy, a cleansing process is applied before the data is used. Now, with the "big data" trend, this assumption does not always hold: the nature of the data (streaming data that needs to be processed online), its volume, and its velocity imply that, in many cases, the data cannot be cleansed before processing, and that decisions may be based on noisy, sometimes incomplete or uncertain data. Veracity (data in doubt) was thus added as one of the four Vs of big data.
Uncertainty in events is not really different from uncertainty in data (which may represent either facts or events).
Some of the uncertainty types are:
- Uncertainty whether the event occurred (or is forecast to occur)
- Uncertainty about when the event occurred (or is forecast to occur)
- Uncertainty about where the event occurred (or is forecast to occur)
- Uncertainty about the content of an event (attributes' values)
There are more uncertainties related to the processing of events:
- Aggregation of uncertain events (where some of them might be missing)
- Uncertainty whether a derived event matches the situation it needs to detect -- this is a crucial point, since the pattern indicates some situation that we wish to detect, but sometimes the situation is not well defined by a single pattern. Example: a threshold-oriented pattern such as "event E occurs at least 4 times during one hour". There are false positives and false negatives; also, if event E occurs 3 times during an hour, it does not necessarily mean that the situation did not happen.
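To make this concrete, here is a minimal sketch (the event representation and all names are my own illustration, not from any product) of how occurrence uncertainty turns the threshold pattern above from a yes/no test into a probability:

```python
import itertools
from dataclasses import dataclass

@dataclass
class UncertainEvent:
    occurrence_prob: float   # uncertainty whether the event occurred
    time_window: tuple       # (earliest, latest) -- uncertainty about when
    payload: dict            # attribute values, possibly imprecise

def prob_at_least_k(events, k):
    """Probability that at least k of the events really occurred,
    assuming independent occurrence uncertainties.

    Brute force over all outcomes -- fine only for small windows.
    """
    probs = [e.occurrence_prob for e in events]
    total = 0.0
    for r in range(k, len(probs) + 1):
        for subset in itertools.combinations(range(len(probs)), r):
            p = 1.0
            for i in range(len(probs)):
                p *= probs[i] if i in subset else (1 - probs[i])
            total += p
    return total

# Five sightings of E in one hour, each with its own belief:
# events = [UncertainEvent(p, (0, 0), {}) for p in (0.9, 0.8, 0.7, 0.6, 0.5)]
# prob_at_least_k(events, 4) -> ~0.52, rather than a binary match/no-match
```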
We are planning to submit a tutorial proposal for DEBS'12 to discuss uncertainty in events, and are now working on it. I'll write more on that during the next few months.
Friday, September 30, 2011
On the four Vs of big data
In my briefing to the EU guys about the "data challenge", I talked about IBM's view of "big data". Recently, Arvind Krishna, the IBM General Manager of the Information Management division, talked at the Almaden centennial colloquium about the 4Vs of big data. The first 3 Vs have been discussed before:
- Volume
- Velocity
- Variety
The regular slides talk about volume (for data at rest) and velocity (for data in motion), but I think that we sometimes also need velocity to process data at rest (e.g. Watson), and we sometimes also need to process high volumes of moving data; variety stands for poly-structured data (structured, semi-structured, unstructured).
Veracity deals with uncertain/imprecise data. In the past there was an assumption that this is not an issue, since it would be possible to cleanse the data before using it; however, this is not always the case. In some cases, due to the need for velocity with moving data, it is not possible to get rid of the uncertainty, and there is a need to process data with its uncertainty. This is certainly true when talking about events: uncertainty in event processing is a major issue that still needs to be conquered. Indeed, among the four Vs, veracity is the one least investigated so far. This is one of the areas we investigate, and I'll write more about it in later posts.
Friday, November 20, 2009
On Inexact events


Sunday, November 8, 2009
On challenging topics for event processing developers and users

- Occurrence time over intervals: Events typically occur over intervals, but for computational reasons it is convenient to approximate this to a time-point and look at events in a discrete space; however, for some events this is not accurate, and interval-based temporal semantics should be supported, along with the operations associated with them (see the sketch after this list).
- Temporal properties of derived events: For a raw event, we defined occurrence time as the time it occurred in reality, and detection time as the time the system detected its existence. What are the temporal properties of derived events? There is no unique answer to this question.
- Out-of-order events: This is the most investigated of the challenging topics; however, current solutions are based on assumptions that are sometimes problematic. The problem concerns events that arrive out of order when the event processing operation is order-sensitive.
- Uncertain events: Uncertainty whether an event has happened, due to malfunctioning, malicious, or inaccurate sources.
- Inexact content of events: Similar to uncertain events; some content in the event payload, including the temporal and spatial properties of the event, may not be accurate.
- Inexact matching between events and situations: Situations are the events that require reaction in the user's mind. This gets us back from the computer domain to the real-world domain. A situation is represented as a raw or derived event, but this may be only an approximation, since there may be false positives and false negatives in the transfer between the domains.
- Traceability of lineage for an event or action: This gets to the notion of determining causality. Since in some cases there are operations in the middle of the causality network that are outside the event processing system's boundaries (e.g. an event consumer who is also an event producer), causality may not be automatically determined.
- Retraction of events: Ways to undo the logical effects of events; sometimes tricky or impossible, but this seems to be a repeating pattern.
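On the first item, here is a minimal sketch of interval-based occurrence time with two of Allen's interval relations plus a simple intersection test; the representation and names are my own illustration, not taken from any product:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    start: float
    end: float   # the event occurs over [start, end], not at a single point

def before(a: Interval, b: Interval) -> bool:
    """Allen's 'before': a ends strictly before b starts."""
    return a.end < b.start

def intersects(a: Interval, b: Interval) -> bool:
    """a and b share at least one instant (a derived test, not one of
    Allen's 13 basic relations)."""
    return a.start <= b.end and b.start <= a.end

def during(a: Interval, b: Interval) -> bool:
    """Allen's 'during': a is fully contained in b."""
    return b.start <= a.start and a.end <= b.end
```

Once events carry intervals, an order-sensitive operator must choose which of these relations counts as "A happened before B", which is exactly where the point-in-time approximation breaks down.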
Friday, January 23, 2009
On Complexities and event processing

For those who read the title and grinned -- not again, a discussion about the meaning of the term CEP -- relax: I explained in a previous posting entitled "Is my cat CEP?" why such a discussion is futile, and I am typically consistent. BTW, when I wrote that posting I did not have a cat; since then my daughter has adopted one, and he does not seem to me complex.
However, I would like to answer a more interesting question that somebody asked me recently -- what are the sources of complexity in event processing?
In high school we learned about "complex numbers"; we liked this topic, since it was one of the simplest topics in the matriculation exam in Mathematics... A complex number is just a combination of two numbers, so the complexity is in the structure. David Luckham also coined the term "complex events", where the complexity is likewise in the structure. However, there are more levels of complexity that may serve as a motivation to use COTS instead of hand-coding this functionality. What types of complexity can we observe besides the structural complexity?
Complexity derived from uncertainty:
- The application's specification is not known a priori and has to be discovered; example: fraud detection. This is related to the "pattern discovery" I discussed in the previous posting.
- There are no reliable sources from which to obtain the desired events, or the events obtained may have uncertainty associated with them. This is a distinct complexity, since there may be cases where the application specification is well defined but the events cannot be obtained, and vice versa -- the patterns are unknown, but once discovered, the required events are easily available.
- Producer-related complexities --- semantic differences among the various sources, problems of time synchronization among sources, etc.
- Consumer-related complexities --- similar to the producer ones; these two are, of course, orthogonal to each other and to all the other complexities.
- Interoperability complexity where various processing elements are involved.
- Complex functions requirements -- e.g. complex patterns that may involve temporal, spatial, statistical operators and combinations of them.
- Complex topology of the event processing graph, with a lot of dependencies among the various agents, which creates a complexity in validation and control.
- Complex subscription / routing decisions.
- High throughput of input events.
- High throughput of output events.
- High number of producers
- High number of consumers
- High number of event processing agents (imagine 1M agents in a single application)
- Requirement to maintain a large amount of space for processing state.
- Hard real-time latency constraints.
- Compliance with QoS measurements, such as a threshold on average latency, or a threshold on the percentage of events that don't comply with some latency constraint, etc.
- High availability requirements.
- Dynamic, frequent changes in the logic of the event processing
- Need for programming by various types of "semi-technical" people in the business-user community...
Of course, a single application may be the ultimate complex application of event processing and exhibit ALL of these complexities; finding this application is, for sure, the dream of every researcher --- a lifetime of research challenges. In reality, though, different applications have different combinations of complexities. An application can be simple on all metrics but have hard real-time constraints; it can have very complex functionality but no quality-of-service or quantity issues; another application may need pattern discovery, but again the rest is simple; yet another combination is a relatively simple application with complexity in the quantity of producers and consumers and in the semantic integration with all of them. With the wonders of combinatorics, one can get to many more combinations...
More on complexities - later.
Wednesday, August 20, 2008
On Event Processing Network and Situations - the semantic bridge
One of the challenges in building a semantic meta-language that captures event processing behavior is to bridge the gap between terms that live in different domains. I have written before about situations, following the CITT meeting in which this term was discussed. In fact, we used the term "situation" in AMiT; we also called its core component the "situation manager". We used the term situation to denote a meta-data entity that defines a combination of pattern and derivation, but I must admit that this was a mismatch of terms, although there is a strong correlation between these terms.
First, let's go back to the operational world. In this world events flow within event processing networks; in the EPN there are agents of various types: simple (filtering), mediating (transformation, enrichment, aggregation, splitting), complex (pattern detection), and intelligent (uncertainty handling, decision heuristics...). At the roots of the network there are producers, and at the leaves there are consumers. This is an operational view of what is done. The term "situation" is, in fact, not an operational term but a semantic term in the consumers' terminology, and can be defined as circumstances that require reaction.
How can we move from the operational world to the semantic world? We have two cases here:
- Deterministic case: there is an exact mapping between concepts in the operational world and the situation concept;
- Approximate case: there is only approximate mapping.
In order to understand it, let's take two examples:
Example 1 - toll violation (deterministic, simple)
- The circumstance that requires reaction is the case where somebody crosses a highway toll booth without paying (in Israel the only toll highway is completely automatic, but we'll assume that this occurs elsewhere, and that the toll is not applicable between 9PM and 6AM and on weekends).
- Getting to the operational domain, there are two cases: one, going in the EZ pass lane and not paying; two, going in a manual lane and somehow succeeding in crossing the barrier (obstacle?) without paying.
- The action in both cases: apply a camera to capture the license plate, and SMS the picture to the officer on duty on the other side of the bridge.
From the EPN perspective, we have an event of a car crossing a certain section of the road (the raw event); the EZ pass reading is an attribute of this event, and if there is no EZ pass it gets a "null" value. There is context information, which is temporal (the hours and days when the toll is in effect) and spatial (the location of the EZ pass lane); note that people sometimes mix the notion of context with the notion of situation -- I have explained the difference in the past. Within this context, a filter agent that looks for a null value in the EZ pass reading is applied; if the filter agent evaluates to true, then the situation phrased above has been detected in a deterministic way, and indeed the edge going out of the filter agent goes directly to a consumer (the camera snapshot may or may not be considered part of the EPN). This is a case of "simple event processing" -- stateless filtering whose output is a situation -- which gives a counterexample to the misconception that situations are closely tied to complex event processing. [I could continue the example to the other case, but I think you've got the point by now.]
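A minimal sketch of that filter agent (the event shape, field names, and context test are hypothetical illustrations):

```python
def toll_in_effect(hour: int, is_weekend: bool) -> bool:
    """Temporal context: the toll applies on weekdays between 6AM and 9PM."""
    return not is_weekend and 6 <= hour < 21

def filter_agent(event: dict) -> bool:
    """Stateless filter: fires iff a car crossed without an EZ pass reading
    while the toll was in effect -- a deterministic situation."""
    if not toll_in_effect(event["hour"], event["is_weekend"]):
        return False
    return event["ez_pass_reading"] is None

# event = {"hour": 14, "is_weekend": False, "ez_pass_reading": None}
# filter_agent(event) -> True: the edge goes directly to the consumer (camera)
```

Note that the agent keeps no state at all; a single event is enough to determine the situation.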
Example 2 - Angry Customer (approximate, complex)
The setting is a call center; the situation is: detect an angry customer and refer him or her to the "angry customers officer".
Here life is not that easy. A human agent can detect angry customers by the tone of their voice (or electronic message), but this does not cover all cases of angry customers, so we can look at some pattern saying: a customer who calls for the 3rd time in a single day is an angry customer. We then need a "pattern detection" agent that detects the pattern "3rd instance of application", where the context partition refers to the same date, same customer, and same product. In this case a leaf edge is also mapped to a situation, but there are two differences from the previous case:
1. The agent is now a complex event processing agent, since it detects a pattern over multiple events;
2. The edge represents the situation in an approximate way, which means that it can have false positives (the CEP pattern is satisfied but the customer is not really angry, and just asked for a lot of information to install the product) or false negatives (the customer called twice and does not talk in an aggressive tone, yet he is furious).
In some cases it also makes sense to associate a "certainty factor" with the leaf edge, approximating the measure of belief that this edge represents the situation. I'll leave the discussion about uncertain situations to another time.
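As a minimal sketch (the names, threshold, and certainty value are illustrative assumptions, not a prescribed design), the pattern-detection agent with a certainty factor might look like this:

```python
from collections import defaultdict

class AngryCustomerAgent:
    """Stateful pattern detection: a 3rd call by the same customer, about
    the same product, on the same date, emits an approximate situation."""

    def __init__(self, threshold: int = 3, certainty: float = 0.7):
        self.counts = defaultdict(int)   # state kept per context partition
        self.threshold = threshold
        self.certainty = certainty       # belief that pattern => situation

    def on_event(self, call: dict):
        key = (call["customer"], call["product"], call["date"])
        self.counts[key] += 1
        if self.counts[key] == self.threshold:
            # Derived event; certainty < 1 because the mapping from the
            # pattern to the real-world situation is only approximate.
            return {"situation": "angry-customer",
                    "customer": call["customer"],
                    "certainty": self.certainty}
        return None
```

In contrast to the stateless toll filter, this agent must accumulate events over time, which is what makes it a pattern-detection (complex) agent.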
Monday, August 11, 2008
On faithful representation and other comments

1. As can be seen, I wrote there that composite events (a term taken from active database terminology) and complex events (which are not) may both represent situations; this does not say that this is the only way to represent a situation (just as saying that a fish is an animal does not define what an animal is).
2. I explained the basic idea of a situation in this posting. Simply said, a situation is a concept in the "real world" domain (not in the computer domain) that requires reaction. In some cases a single event determines a situation; in some cases detecting a pattern determines a situation; and in other cases patterns only approximate the notion of a situation, and there is no 1-1 mapping between events and situations. Note that in that posting I also provided an example of non-deterministic situations.
3. Regardless of the situation definition, Richard is absolutely right that all over the event processing life-cycle we may have instances in which the events are inaccurate or uncertain, and the reader is referred to this posting for some examples of the uncertainty issues we are dealing with. This is an area I have been investigating over the last few years together with Avi Gal from the Technion and Segev Wasserkrug (our joint Ph.D. student, who graduated recently with a dissertation denoted as excellent by the exam committee). Hot from the oven: a paper about it is published in the recent (August 2008) issue of IEEE Transactions on Knowledge and Data Engineering, which is dedicated to the "SPECIAL SECTION on Intelligence and Security Informatics". The actual paper can be downloaded from Avi Gal's website. Another paper related to the same study was presented at DEBS 2008.
4. While I totally agree that in some cases handling uncertainty is needed -- and certainly some security applications are examples -- I also believe that the potential market for the more basic deterministic world is much larger, and we are far from picking all the low-hanging fruit of deterministic event processing.
5. We still have challenges in defining the semantics of the different cases of handling uncertain events/patterns/situations. The fact that there are arithmetics of uncertainty helps, but not everything that exists in AI research fits the real-world requirements of scalability, performance, etc.
6. About the comment on my viewing event processing as an extension of active database technology -- I view event processing as a discipline in its own right (a topic for another discussion, which I'll defer). It has origins in several disciplines; one of them is active databases, but it has several more ancestors -- sensor fusion, discrete event simulation, distributed computing/messaging/pub-sub, and some more -- and it draws concepts from each of them. Anybody who reads my blog can see that there is a fundamental difference between active databases, which extend database engines, and event processing, which is not based on database technology; there are some other differences too.
7. My friendly advice to Tim is that before he makes assertions about how and what people think (and this does not necessarily refer to myself), he should re-read his own excellent posting, "red herring fallacies".
More on event processing as a discipline - at a later post.
Tuesday, May 6, 2008
On the three meanings of CEP
There is an old Jewish story about two people who had a dispute and decided to go to the Rabbi to ask his opinion. The Rabbi listened to the first person and told him: you are right; then he listened to the second person and also told him: you are right. The Rabbi's assistant, who was present, asked him: Rabbi, how can they both be right? And he received the obvious answer: you are also right.
Recently, a dispute between two opinionated persons, Hans Glide and Tim Bass, has stormed the network, and somehow also got to my blog. Somehow both of them understood that what I've written is consistent with their view -- which encourages me to change career and become a Rabbi, but I think there are some skills I lack -- so anyway, I'll stay with event processing thinking.
While the dispute started around POSETs, the last posting by Tim Bass in the CEP forum re-focuses the discussion on the question: what is CEP? So I'll stay with this topic for a while, since I think there are (at least) three different interpretations of what CEP is -- and this is a source of a lot of confusion. Thus I'll try to explain the three interpretations.
Interpretation One: the glossary interpretation -- complex event processing is the processing of complex events, where a complex event is an abstraction or aggregation of events. According to this interpretation, a software function can be defined as CEP if it involves the processing of multiple events -- typically collecting events or matching some pattern among multiple events. The test for CEP under this interpretation is typically the support of a state that accumulates the relevant events, since they typically arrive over time. This definition does not say anything about the number of events, the number of event types, whether they are totally ordered or not, or whether a causality relation is supported -- these are all attributes of specific implementations.
Interpretation Two: Event Processing = Complex Event Processing. It is common practice in the commercial world to label all event processing functions as CEP. This spans support of CEP (according to interpretation one) and other event processing functions (such as routing, enrichment, and transformation) that don't satisfy the CEP test (since they are stateless and deal with a single event at a time). It can reach the absurd point that a product that does not support CEP at all according to interpretation one calls itself CEP. This is the most confusing interpretation, IMHO.
Interpretation Three: Complex event processing is event processing that has some complexity associated with it. According to Tim Bass, hidden causality and Markov processes are vital for something to be defined as CEP. This really says that CEP must involve uncertain events, causality that needs to be discovered (by mining and other techniques), and that the general usage of CEP is to predict something according to the analysis of recent (past) events. According to interpretation three, the current products that call themselves CEP do not satisfy these criteria, and thus are not CEP.
My opinion: as stated in the past, the term CEP has some inherent ambiguity, and therefore I have always thought it a confusing term. As far as my own taste in terminology goes, I prefer interpretation one, saying that CEP is the subset of EP functions that deals with "complex events"; this also seems closest to the glossary definition. Interpretation two is confusing, as it turns CEP from a well-defined term into a marketing buzzword, leaving no test for what it is. Interpretation three is interesting: there are certainly applications that require prediction and various usages of stochastic processing and AI techniques (machine learning and others) in event processing. Hidden causality is an important term, and I'll refer to it in another posting, since it is pragmatically difficult to obtain. However, I prefer to stick with the concept that CEP is the processing of complex events, not the complex processing of events; under interpretation one we don't really need to apply AI techniques -- that is just one type of application, and there is a variety of applications for which detecting predefined patterns over events is sufficient.
So the terminology that I personally prefer is:
Interpretation One = Complex Event Processing
Interpretation Two = Event Processing
Interpretation Three = Intelligent Event Processing.
Monday, April 14, 2008
On the spectrum of event processing applications
Friday, January 18, 2008
More thoughts on Rules in the context of Event Processing
Monday, December 17, 2007
CEP and the story of the captured traveller
Reading the recent posting of my friend Tim Bass entitled "CEP and the story of the Fish", I decided to answer with another story (from the other side of Asia):
A traveller went into the jungle somewhere on the globe and, unfortunately, was captured by a tribe that still uses ancient weapons. He was brought to the chief, and the chief said: "You have trespassed into the tribe's territory, which is punishable by death; however, I am a very curious person -- if you show me something I haven't seen before, I'll let you go." Our unlucky traveller searched his pockets, and the only meaningful thing he found was a lighter, so he took his chance, showing it to the chief and saying: "this thing makes fire". However, since he was under great pressure, he pressed once -- no fire; pressed twice -- no fire; on the third try the lighter indeed produced the promised fire. The chief did not hesitate and said "let him go", so our relieved traveller muttered to himself: "I knew they had not seen a lighter". But to his surprise, the chief said: "Oh, I have seen many lighters, but a Zippo lighter that does not light on the first try I have never seen."
When someone disagrees with somebody else, it is very easy to assume that my point of view is right since I am smarter / know more / am more qualified / older / more experienced / generally always right, etc... My preference is not to doubt the wisdom, experience, or qualifications of anybody I am arguing / discussing / debating with, but to make the arguments about the issue and not about the person making them...
Enough introduction -- now for the main message of this posting. The term CEP (Complex Event Processing) is now more or less agreed in the industry to denote "computing that performs operations on complex events", where a complex event is an "abstraction or aggregation of events". The term complex does not say that the processing is complex, but that it deals with complex events, as defined. Complex event processing typically detects predefined patterns that can be expressed by queries/rules/patterns/scripts and are deterministic in nature. Regardless of whether I think this is the best term, I think it is important to have a commonly agreed terminology; otherwise we are confusing the industry, the customers (and sometimes ourselves). Now, Tim Bass claims that since event processing of a stochastic/probabilistic/uncertain nature is more complex than what we call "complex event processing", we should call that "complex event processing" and rename what we currently call "complex event processing" to "simple event processing". Unfortunately, it is too late for that -- and it is also not justified, since, again, the "complex" in "complex event processing" does not mean "complex processing of events" but "processing of complex events" (a very common misconception!). Bottom line: yes, there is another class of event processing capabilities that requires techniques from AI, machine learning, OR, etc., and that is not deterministic in nature; no, I don't think we should call it "complex event processing". We have suggested the term "intelligent event processing", which I have already referred to in a previous posting; there is a variety of other postings that I have dedicated to terminology.
More - later
Tuesday, December 11, 2007
On sources for uncertainty in Event Processing
There are various sources of uncertainty associated with event processing -- here is an attempt to list some of them:
Uncertainties related to the source:
- Uncertainty that an event happened, due to lack of a credible source or inaccuracy in the source's reporting (e.g. has the sensor really detected an object, or was there some power failure in the process?).
- Uncertainty in classifying an event that happened (murder? suicide? accident?).
- Uncertainty about the value of a certain attribute in the event (again, inaccuracy of measurement or lack of information).
- Uncertainty about the timing of an event (it happened sometime during last night, but we don't know when).
- Uncertainty that our sources reported all events (we cannot assume a "closed world").
- Events that are inherently probabilistic (e.g. future/predicted events).
Uncertainties related to the processing:
A pattern in the event history designates a "business situation" in the application domain:
- Uncertainty whether the pattern detection is a sufficient condition to identify the situation, or whether it is only an approximation (a major source of "false positives" and "false negatives").
- Uncertainty about the meaning of a "partial satisfaction" of the pattern; e.g. the pattern consists of a conjunction of four events -- what happens if three out of the four occur? Is it really a binary game? (See the sketch after this list.)
- Uncertainty driven by one of the uncertainties related to the source (e.g. uncertainty in the timing of an event occurrence may inflict uncertainty on a temporal-oriented pattern).
- Processing of probabilistic events.
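On the "partial satisfaction" item, here is a minimal sketch of one way to score it (the scoring rule and names are illustrative assumptions, not a method from any product):

```python
def conjunction_match_score(required_events, observed_events) -> float:
    """Score a conjunction pattern as a fraction in [0, 1] instead of a
    binary match: 3 of 4 required events observed gives 0.75."""
    seen = sum(1 for e in required_events if e in observed_events)
    return seen / len(required_events)

# conjunction_match_score({"A", "B", "C", "D"}, {"A", "B", "C"}) -> 0.75
# A consumer can threshold the score, or propagate it as the certainty
# of the derived event, rather than playing a binary game.
```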
There are also uncertainties associated with the event consumer, but those are, for now, outside the scope of this discussion. More - later.
Wednesday, December 5, 2007
On False positives and False negatives
- Raw events are missed -- they do not arrive at all, or do not arrive on time (source or communication issues).
- Raw events are not accurate -- values are inaccurate (source issues).
- Temporal order issues -- uncertainty about the correct order of events.
- The pattern does not accurately reflect the conditions for the situation (e.g. there are probabilistic elements).
- (other reasons?)
As in the time-constraints case, there are various utility functions to designate the damage from either false positives or false negatives; a sketch of one such function appears below.
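A minimal sketch (the costs and names are illustrative assumptions) of a utility function that weighs the two kinds of damage:

```python
def expected_damage(p_situation: float, cost_fp: float, cost_fn: float,
                    alert: bool) -> float:
    """Expected cost of alerting (or not) when the situation holds
    with probability p_situation."""
    if alert:
        return (1 - p_situation) * cost_fp   # false positive: needless alert
    return p_situation * cost_fn             # false negative: missed situation

def should_alert(p_situation: float, cost_fp: float, cost_fn: float) -> bool:
    """Alert iff alerting has the lower expected damage; this reduces to
    p_situation > cost_fp / (cost_fp + cost_fn)."""
    return (expected_damage(p_situation, cost_fp, cost_fn, True)
            < expected_damage(p_situation, cost_fp, cost_fn, False))
```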
More on that issue - later.