- The data is not good enough: the claim is that the assumption that all required data is readily available does not match reality, where data suffers from quality, frequency, and spatial-coverage limitations of the sensors, as well as data integration issues.
- Networks aren't ubiquitous: the product owners don't have control over the availability of networks.
- Integration is tougher than analysis: the main problem is not analyzing the data, but integrating all the data needed for the analysis.
- More sensor innovation needed: the stated areas of required innovation are: combining video sources, which are under-utilized today; more refined and more affordable environmental sensors; and the software-defined sensor, a combination of multiple sensors plus computing power that sits out on a network and "calculates rather than measures".
- Status quo security doesn't cut it: security systems for IoT should be radically different from those developed for traditional IP.
This is a blog describing some thoughts about issues related to event processing, and thoughts related to my current role. It is written by Opher Etzion and reflects the author's own opinions.
Friday, May 9, 2014
Internet of Things - what's holding us back?
Tuesday, June 25, 2013
On speed and accuracy in event processing
Saturday, August 11, 2012
How can the level of uncertainty be determined?
Friday, August 10, 2012
On what you need vs. what you can consume
- How are uncertainty quantifications (such as probabilities) obtained?
- How are uncertain results consumed in decisions?
- How is the quality of uncertain decisions evaluated and taken into account?
Monday, July 16, 2012
DEBS 2012 tutorial on uncertainty in event processing
Saturday, February 11, 2012
Uncertainty in event processing
And indeed, there has been a lot of work on uncertainty in data over the years in the research community, but very little of it got into products; the conception has been that while data may be noisy, a cleansing process is applied before the data is used. Now, with the "big data" trend, this assumption does not always hold: the nature of the data (streaming data that needs to be processed online), its volume, and its velocity imply that, in many cases, the data cannot be cleansed before processing, and that decisions may be based on noisy, sometimes incomplete or uncertain data. Veracity (data in doubt) was thus added as one of the four Vs of big data.
Uncertainty in events is not really different from uncertainty in data (which may represent either facts or events).
Some of the uncertainty types are:
- Uncertainty whether the event occurred (or is forecast to occur)
- Uncertainty about when the event occurred (or is forecast to occur)
- Uncertainty about where the event occurred (or is forecast to occur)
- Uncertainty about the content of an event (attributes' values)
There are more uncertainties related to the processing of events:
- Aggregation of uncertain events (where some of them might be missing)
- Uncertainty whether a derived event matches the situation it needs to detect -- this is a crucial point, since the pattern indicates some situation that we wish to detect, but sometimes the situation is not well defined by a single pattern. Example: a threshold-oriented pattern such as "event E occurs at least 4 times during one hour". There are false positives and false negatives; also, if event E occurs 3 times during an hour, it does not necessarily mean that the situation did not happen.
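To make this concrete, here is a minimal sketch (the event representation and all names are my own illustration, not from any product) of how occurrence uncertainty turns the threshold pattern above from a yes/no test into a probability:

```python
import itertools
from dataclasses import dataclass

@dataclass
class UncertainEvent:
    occurrence_prob: float   # uncertainty whether the event occurred
    time_window: tuple       # (earliest, latest) -- uncertainty about when
    payload: dict            # attribute values, possibly imprecise

def prob_at_least_k(events, k):
    """Probability that at least k of the events really occurred,
    assuming independent occurrence uncertainties.

    Brute force over all outcomes -- fine only for small windows.
    """
    probs = [e.occurrence_prob for e in events]
    total = 0.0
    for r in range(k, len(probs) + 1):
        for subset in itertools.combinations(range(len(probs)), r):
            p = 1.0
            for i in range(len(probs)):
                p *= probs[i] if i in subset else (1 - probs[i])
            total += p
    return total

# Five sightings of E in one hour, each with its own belief:
# events = [UncertainEvent(p, (0, 0), {}) for p in (0.9, 0.8, 0.7, 0.6, 0.5)]
# prob_at_least_k(events, 4) -> ~0.52, rather than a binary match/no-match
```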
We are planning to submit a tutorial proposal for DEBS'12 to discuss uncertainty in events, and are now working on it. I'll write more on that during the next few months.
Friday, September 30, 2011
On the four Vs of big data
In my briefing to the EU guys about the "data challenge", I talked about IBM's view of "big data". Recently, Arvind Krishna, the IBM General Manager of the Information Management division, talked at the Almaden centennial colloquium about the 4Vs of big data. The first 3 Vs have been discussed before:
- Volume
- Velocity
- Variety
The regular slides talk about volume (for data at rest) and velocity (for data in motion), but I think that we sometimes also need velocity to process data at rest (e.g. Watson), and we sometimes also need to process high volumes of moving data; variety stands for poly-structured data (structured, semi-structured, unstructured).
Veracity deals with uncertain/imprecise data. In the past there was an assumption that this is not an issue, since it would be possible to cleanse the data before using it; however, this is not always the case. In some cases, due to the need for velocity with moving data, it is not possible to get rid of the uncertainty, and there is a need to process data with its uncertainty. This is certainly true when talking about events: uncertainty in event processing is a major issue that still needs to be conquered. Indeed, among the four Vs, veracity is the one least investigated so far. This is one of the areas we investigate, and I'll write more about it in later posts.
Friday, November 20, 2009
On Inexact events


Sunday, November 8, 2009
On challenging topics for event processing developers and users

- Occurrence time over intervals: Events typically occur over intervals, but for computational reasons it is convenient to approximate this to a time-point and look at events in a discrete space; however, for some events this is not accurate, and interval-based temporal semantics should be supported, along with the operations associated with them (see the sketch after this list).
- Temporal properties of derived events: For a raw event, we defined occurrence time as the time it occurred in reality, and detection time as the time the system detected its existence. What are the temporal properties of derived events? There is no unique answer to this question.
- Out-of-order events: This is the most investigated of the challenging topics; however, current solutions are based on assumptions that are sometimes problematic. The problem concerns events that arrive out of order when the event processing operation is order-sensitive.
- Uncertain events: Uncertainty whether an event has happened, due to malfunctioning, malicious, or inaccurate sources.
- Inexact content of events: Similar to uncertain events; some content in the event payload, including the temporal and spatial properties of the event, may not be accurate.
- Inexact matching between events and situations: Situations are the events that require reaction in the user's mind. This gets us back from the computer domain to the real-world domain. A situation is represented as a raw or derived event, but this may be only an approximation, since there may be false positives and false negatives in the transfer between the domains.
- Traceability of lineage for an event or action: This gets to the notion of determining causality. Since in some cases there are operations in the middle of the causality network that are outside the event processing system's boundaries (e.g. an event consumer who is also an event producer), causality may not be automatically determined.
- Retraction of events: Ways to undo the logical effects of events; sometimes tricky or impossible, but this seems to be a repeating pattern.
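On the first item, here is a minimal sketch of interval-based occurrence time with two of Allen's interval relations plus a simple intersection test; the representation and names are my own illustration, not taken from any product:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    start: float
    end: float   # the event occurs over [start, end], not at a single point

def before(a: Interval, b: Interval) -> bool:
    """Allen's 'before': a ends strictly before b starts."""
    return a.end < b.start

def intersects(a: Interval, b: Interval) -> bool:
    """a and b share at least one instant (a derived test, not one of
    Allen's 13 basic relations)."""
    return a.start <= b.end and b.start <= a.end

def during(a: Interval, b: Interval) -> bool:
    """Allen's 'during': a is fully contained in b."""
    return b.start <= a.start and a.end <= b.end
```

Once events carry intervals, an order-sensitive operator must choose which of these relations counts as "A happened before B", which is exactly where the point-in-time approximation breaks down.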
Friday, January 23, 2009
On Complexities and event processing

For those who read the title and grinned -- not again, a discussion about the meaning of the term CEP -- relax: I explained in a previous posting entitled "Is my cat CEP?" why such a discussion is futile, and I am typically consistent. BTW, when I wrote that posting I did not have a cat; since then my daughter has adopted one, and he does not seem to me complex.
However, I would like to answer a more interesting question that somebody asked me recently -- what are the sources of complexity in event processing?
In high school we learned about "complex numbers"; we liked this topic, since it was one of the simplest topics in the matriculation exam in Mathematics... A complex number is just a combination of two numbers, so the complexity is in the structure. David Luckham also coined the term "complex events", where the complexity is likewise in the structure. However, there are more levels of complexity that may serve as a motivation to use COTS instead of hand-coding this functionality. What types of complexity can we observe besides the structural complexity?
Complexity derived from uncertainty:
- The application's specification is not known a priori and has to be discovered; example: fraud detection. This is related to the "pattern discovery" I discussed in the previous posting.
- There are no reliable sources from which to obtain the desired events, or the events obtained may have uncertainty associated with them. This is a distinct complexity, since there may be cases where the application specification is well defined but the events cannot be obtained, and vice versa -- the patterns are unknown, but once discovered, the required events are easily available.
- Producer-related complexities --- semantic differences among the various sources, problems of time synchronization among sources, etc.
- Consumer-related complexities --- similar to the producer ones; these two are, of course, orthogonal to each other and to all the other complexities.
- Interoperability complexity where various processing elements are involved.
- Complex functions requirements -- e.g. complex patterns that may involve temporal, spatial, statistical operators and combinations of them.
- Complex topology of the event processing graph, with a lot of dependencies among the various agents, which creates a complexity in validation and control.
- Complex subscription / routing decisions.
- High throughput of input events.
- High throughput of output events.
- High number of producers
- High number of consumers
- High number of event processing agents (imagine 1M agents in a single application)
- Requirement to maintain a large amount of space for processing state.
- Hard real-time latency constraints.
- Compliance with QoS measurements, such as a threshold on average latency, or a threshold on the percentage of events that don't comply with some latency constraint, etc.
- High availability requirements.
- Dynamic, frequent changes in the logic of the event processing
- Need for programming by various types of "semi-technical" people in the business-user community...
Of course, a single application may be the ultimate complex application of event processing and exhibit ALL of these complexities; finding this application is, for sure, the dream of every researcher --- a lifetime of research challenges. In reality, though, different applications have different combinations of complexities. An application can be simple on all metrics but have hard real-time constraints; it can have very complex functionality but no quality-of-service or quantity issues; another application may need pattern discovery, but again the rest is simple; yet another combination is a relatively simple application with complexity in the quantity of producers and consumers and in the semantic integration with all of them. With the wonders of combinatorics, one can get to many more combinations...
More on complexities - later.
Wednesday, August 20, 2008
On Event Processing Network and Situations - the semantic bridge
One of the challenges in building a semantic meta-language that captures event processing behavior is to bridge the gap between terms that live in different domains. I have written before about situations, following the CITT meeting in which this term was discussed. In fact, we used the term "situation" in AMiT; we also called its core component the "situation manager". We used the term situation to denote a meta-data entity that defines a combination of pattern and derivation, but I must admit that this was a mismatch of terms, although there is a strong correlation between these terms.
First, let's go back to the operational world. In this world events flow within event processing networks; in the EPN there are agents of various types: simple (filtering), mediating (transformation, enrichment, aggregation, splitting), complex (pattern detection), and intelligent (uncertainty handling, decision heuristics...). At the roots of the network there are producers, and at the leaves there are consumers. This is an operational view of what is done. The term "situation" is, in fact, not an operational term but a semantic term in the consumers' terminology, and can be defined as circumstances that require reaction.
How can we move from the operational world to the semantic world? We have two cases here:
- Deterministic case: there is an exact mapping between concepts in the operational world and the situation concept;
- Approximate case: there is only approximate mapping.
In order to understand it, let's take two examples:
Example 1 - toll violation (deterministic, simple)
- The circumstance that requires reaction is the case where somebody crosses a highway toll booth without paying (in Israel the only toll highway is completely automatic, but we'll assume that this occurs elsewhere, and that the toll is not applicable between 9PM and 6AM and on weekends).
- Getting to the operational domain, there are two cases: one, going in the EZ pass lane and not paying; two, going in a manual lane and somehow succeeding in crossing the barrier (obstacle?) without paying.
- The action in both cases: apply a camera to capture the license plate, and SMS the picture to the officer on duty on the other side of the bridge.
From the EPN perspective, we have an event of a car crossing a certain section of the road (the raw event); the EZ pass reading is an attribute of this event, and if there is no EZ pass it gets a "null" value. There is context information, which is temporal (the hours and days when the toll is in effect) and spatial (the location of the EZ pass lane); note that people sometimes mix the notion of context with the notion of situation -- I have explained the difference in the past. Within this context, a filter agent that looks for a null value in the EZ pass reading is applied; if the filter agent evaluates to true, then the situation phrased above has been detected in a deterministic way, and indeed the edge going out of the filter agent goes directly to a consumer (the camera snapshot may or may not be considered part of the EPN). This is a case of "simple event processing" -- stateless filtering whose output is a situation -- which gives a counterexample to the misconception that situations are closely tied to complex event processing. [I could continue the example to the other case, but I think you've got the point by now.]
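A minimal sketch of that filter agent (the event shape, field names, and context test are hypothetical illustrations):

```python
def toll_in_effect(hour: int, is_weekend: bool) -> bool:
    """Temporal context: the toll applies on weekdays between 6AM and 9PM."""
    return not is_weekend and 6 <= hour < 21

def filter_agent(event: dict) -> bool:
    """Stateless filter: fires iff a car crossed without an EZ pass reading
    while the toll was in effect -- a deterministic situation."""
    if not toll_in_effect(event["hour"], event["is_weekend"]):
        return False
    return event["ez_pass_reading"] is None

# event = {"hour": 14, "is_weekend": False, "ez_pass_reading": None}
# filter_agent(event) -> True: the edge goes directly to the consumer (camera)
```

Note that the agent keeps no state at all; a single event is enough to determine the situation.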
Example 2 - Angry Customer (approximate, complex)
The setting is a call center; the situation is: detect an angry customer and refer him or her to the "angry customers officer".
Here life is not that easy. A human agent can detect angry customers by the tone of their voice (or electronic message), but this does not cover all cases of angry customers, so we can look at some pattern saying: a customer who calls for the 3rd time in a single day is an angry customer. We then need a "pattern detection" agent that detects the pattern "3rd instance of application", where the context partition refers to the same date, same customer, and same product. In this case a leaf edge is also mapped to a situation, but there are two differences from the previous case:
1. The agent is now a complex event processing agent, since it detects a pattern over multiple events;
2. The edge represents the situation in an approximate way, which means that it can have false positives (the CEP pattern is satisfied but the customer is not really angry, and just asked for a lot of information to install the product) or false negatives (the customer called twice and does not talk in an aggressive tone, yet he is furious).
In some cases it also makes sense to associate a "certainty factor" with the leaf edge, approximating the measure of belief that this edge represents the situation. I'll leave the discussion about uncertain situations to another time.
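As a minimal sketch (the names, threshold, and certainty value are illustrative assumptions, not a prescribed design), the pattern-detection agent with a certainty factor might look like this:

```python
from collections import defaultdict

class AngryCustomerAgent:
    """Stateful pattern detection: a 3rd call by the same customer, about
    the same product, on the same date, emits an approximate situation."""

    def __init__(self, threshold: int = 3, certainty: float = 0.7):
        self.counts = defaultdict(int)   # state kept per context partition
        self.threshold = threshold
        self.certainty = certainty       # belief that pattern => situation

    def on_event(self, call: dict):
        key = (call["customer"], call["product"], call["date"])
        self.counts[key] += 1
        if self.counts[key] == self.threshold:
            # Derived event; certainty < 1 because the mapping from the
            # pattern to the real-world situation is only approximate.
            return {"situation": "angry-customer",
                    "customer": call["customer"],
                    "certainty": self.certainty}
        return None
```

In contrast to the stateless toll filter, this agent must accumulate events over time, which is what makes it a pattern-detection (complex) agent.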
Monday, August 11, 2008
On faithful representation and other comments

1. As can be seen, I wrote there that composite events (a term taken from active database terminology) and complex events (which are not) may both represent situations; this does not say that this is the only way to represent a situation (just as saying that a fish is an animal does not define what an animal is).
2. I explained the basic idea of a situation in this posting. Simply said, a situation is a concept in the "real world" domain (not in the computer domain) that requires reaction. In some cases a single event determines a situation; in some cases detecting a pattern determines a situation; and in other cases patterns only approximate the notion of a situation, and there is no 1-1 mapping between events and situations. Note that in that posting I also provided an example of non-deterministic situations.
3. Regardless of the situation definition, Richard is absolutely right that all over the event processing life-cycle we may have instances in which the events are inaccurate or uncertain, and the reader is referred to this posting for some examples of the uncertainty issues we are dealing with. This is an area I have been investigating over the last few years together with Avi Gal from the Technion and Segev Wasserkrug (our joint Ph.D. student, who graduated recently with a dissertation denoted as excellent by the exam committee). Hot from the oven: a paper about it is published in the recent (August 2008) issue of IEEE Transactions on Knowledge and Data Engineering, which is dedicated to the "SPECIAL SECTION on Intelligence and Security Informatics". The actual paper can be downloaded from Avi Gal's website. Another paper related to the same study was presented at DEBS 2008.
4. While I totally agree that in some cases handling uncertainty is needed -- and certainly some security applications are examples -- I also believe that the potential market for the more basic deterministic world is much larger, and we are far from picking all the low-hanging fruit of deterministic event processing.
5. We still have challenges in defining the semantics of the different cases of handling uncertain events/patterns/situations. The fact that there are arithmetics of uncertainty helps, but not everything that exists in AI research fits the real-world requirements of scalability, performance, etc.
6. About the comment on my viewing event processing as an extension of active database technology -- I view event processing as a discipline in its own right (a topic for another discussion, which I'll defer). It has origins in several disciplines; one of them is active databases, but it has several more ancestors -- sensor fusion, discrete event simulation, distributed computing/messaging/pub-sub, and some more -- and it draws concepts from each of them. Anybody who reads my blog can see that there is a fundamental difference between active databases, which extend database engines, and event processing, which is not based on database technology; there are some other differences too.
7. My friendly advice to Tim is that before he makes assertions about how and what people think (and this does not necessarily refer to myself), he should re-read his own excellent posting, "red herring fallacies".
More on event processing as a discipline - at a later post.
Tuesday, May 6, 2008
On the three meanings of CEP
There is an old Jewish story about two people who had a dispute and decided to go to the Rabbi to ask his opinion. The Rabbi listened to the first person and told him: you are right; then he listened to the second person and also told him: you are right. The Rabbi's assistant, who was present, asked him: Rabbi, how can they both be right? And he received the obvious answer: you are also right.
Recently, a dispute between two opinionated persons, Hans Glide and Tim Bass, has stormed the network, and somehow also got to my blog. Somehow both of them understood that what I've written is consistent with their view -- which encourages me to change career and become a Rabbi, but I think there are some skills I lack -- so anyway, I'll stay with event processing thinking.
While the dispute started around POSETs, the last posting by Tim Bass in the CEP forum re-focuses the discussion on the question: what is CEP? So I'll stay with this topic for a while, since I think there are (at least) three different interpretations of what CEP is -- and this is a source of a lot of confusion. Thus I'll try to explain the three interpretations.
Interpretation One: the glossary interpretation -- complex event processing is the processing of complex events, where a complex event is an abstraction or aggregation of events. According to this interpretation, a software function can be defined as CEP if it involves the processing of multiple events -- typically collecting events or matching some pattern among multiple events. The test for CEP under this interpretation is typically the support of a state that accumulates the relevant events, since they typically arrive over time. This definition does not say anything about the number of events, the number of event types, whether they are totally ordered or not, or whether a causality relation is supported -- these are all attributes of specific implementations.
Interpretation Two: Event Processing = Complex Event Processing. It is common practice in the commercial world to label all event processing functions as CEP. This spans support of CEP (according to interpretation one) and other event processing functions (such as routing, enrichment, and transformation) that don't satisfy the CEP test (since they are stateless and deal with a single event at a time). It can reach the absurd point that a product that does not support CEP at all according to interpretation one calls itself CEP. This is the most confusing interpretation, IMHO.
Interpretation Three: Complex event processing is event processing that has some complexity associated with it. According to Tim Bass, hidden causality and Markov processes are vital for something to be defined as CEP. This really says that CEP must involve uncertain events, causality that needs to be discovered (by mining and other techniques), and that the general usage of CEP is to predict something according to the analysis of recent (past) events. According to interpretation three, the current products that call themselves CEP do not satisfy these criteria, and thus are not CEP.
My opinion: as stated in the past, the term CEP has some inherent ambiguity, and therefore I have always thought it a confusing term. As far as my own taste in terminology goes, I prefer interpretation one, saying that CEP is the subset of EP functions that deals with "complex events"; this also seems closest to the glossary definition. Interpretation two is confusing, as it turns CEP from a well-defined term into a marketing buzzword, leaving no test for what it is. Interpretation three is interesting: there are certainly applications that require prediction and various usages of stochastic processing and AI techniques (machine learning and others) in event processing. Hidden causality is an important term, and I'll refer to it in another posting, since it is pragmatically difficult to obtain. However, I prefer to stick with the concept that CEP is the processing of complex events, not the complex processing of events; under interpretation one we don't really need to apply AI techniques -- that is just one type of application, and there is a variety of applications for which detecting predefined patterns over events is sufficient.
So the terminology that I personally prefer is:
Interpretation One = Complex Event Processing
Interpretation Two = Event Processing
Interpretation Three = Intelligent Event Processing.
Monday, April 14, 2008
On the spectrum of event processing applications
Friday, January 18, 2008
More thoughts on Rules in the context of Event Processing
Monday, December 17, 2007
CEP and the story of the captured traveller
Reading the recent posting of my friend Tim Bass entitled "CEP and the story of the Fish", I decided to answer with another story (from the other side of Asia):
A traveller went into the jungle somewhere on the globe and, unfortunately, was captured by a tribe that still uses ancient weapons. He was brought to the chief, and the chief said: "You have trespassed into the tribe's territory, which is punishable by death; however, I am a very curious person -- if you show me something I haven't seen before, I'll let you go." Our unlucky traveller searched his pockets, and the only meaningful thing he found was a lighter, so he took his chance, showing it to the chief and saying: "this thing makes fire". However, since he was under great pressure, he pressed once -- no fire; pressed twice -- no fire; on the third try the lighter indeed produced the promised fire. The chief did not hesitate and said "let him go", so our relieved traveller muttered to himself: "I knew they had not seen a lighter". But to his surprise, the chief said: "Oh, I have seen many lighters, but a Zippo lighter that does not light on the first try I have never seen."
When someone disagrees with somebody else, it is very easy to assume that my point of view is right since I am smarter / know more / am more qualified / older / more experienced / generally always right, etc... My preference is not to doubt the wisdom, experience, or qualifications of anybody I am arguing / discussing / debating with, but to make the arguments about the issue and not about the person making them...
Enough introduction -- now for the main message of this posting. The term CEP (Complex Event Processing) is now more or less agreed in the industry to denote "computing that performs operations on complex events", where a complex event is an "abstraction or aggregation of events". The term complex does not say that the processing is complex, but that it deals with complex events, as defined. Complex event processing typically detects predefined patterns that can be expressed by queries/rules/patterns/scripts and are deterministic in nature. Regardless of whether I think this is the best term, I think it is important to have a commonly agreed terminology; otherwise we are confusing the industry, the customers (and sometimes ourselves). Now, Tim Bass claims that since event processing of a stochastic/probabilistic/uncertain nature is more complex than what we call "complex event processing", we should call that "complex event processing" and rename what we currently call "complex event processing" to "simple event processing". Unfortunately, it is too late for that -- and it is also not justified, since, again, the "complex" in "complex event processing" does not mean "complex processing of events" but "processing of complex events" (a very common misconception!). Bottom line: yes, there is another class of event processing capabilities that requires techniques from AI, machine learning, OR, etc., and that is not deterministic in nature; no, I don't think we should call it "complex event processing". We have suggested the term "intelligent event processing", which I have already referred to in a previous posting; there is a variety of other postings that I have dedicated to terminology.
More - later
Tuesday, December 11, 2007
On sources for uncertainty in Event Processing
There are various sources of uncertainty associated with event processing -- here is an attempt to list some of them:
Uncertainties related to the source:
- Uncertainty that an event happened, due to lack of a credible source or inaccuracy in the source's reporting (e.g. has the sensor really detected an object, or was there some power failure in the process?).
- Uncertainty in classifying an event that happened (murder? suicide? accident?).
- Uncertainty about the value of a certain attribute in the event (again, inaccuracy of measurement or lack of information).
- Uncertainty about the timing of an event (it happened sometime during last night, but we don't know when).
- Uncertainty that our sources reported all events (we cannot assume a "closed world").
- Events that are inherently probabilistic (e.g. future/predicted events).
Uncertainties related to the processing:
A pattern in the event history designates a "business situation" in the application domain:
- Uncertainty whether the pattern detection is a sufficient condition to identify the situation, or whether it is only an approximation (a major source of "false positives" and "false negatives").
- Uncertainty about the meaning of a "partial satisfaction" of the pattern; e.g. the pattern consists of a conjunction of four events -- what happens if three out of the four occur? Is it really a binary game? (See the sketch after this list.)
- Uncertainty driven by one of the uncertainties related to the source (e.g. uncertainty in the timing of an event occurrence may inflict uncertainty on a temporal-oriented pattern).
- Processing of probabilistic events.
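On the "partial satisfaction" item, here is a minimal sketch of one way to score it (the scoring rule and names are illustrative assumptions, not a method from any product):

```python
def conjunction_match_score(required_events, observed_events) -> float:
    """Score a conjunction pattern as a fraction in [0, 1] instead of a
    binary match: 3 of 4 required events observed gives 0.75."""
    seen = sum(1 for e in required_events if e in observed_events)
    return seen / len(required_events)

# conjunction_match_score({"A", "B", "C", "D"}, {"A", "B", "C"}) -> 0.75
# A consumer can threshold the score, or propagate it as the certainty
# of the derived event, rather than playing a binary game.
```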
There are also uncertainties associated with the event consumer, but those are, for now, outside the scope of this discussion. More - later.
Wednesday, December 5, 2007
On False positives and False negatives
- Raw events are missed -- they do not arrive at all, or do not arrive on time (source or communication issues).
- Raw events are not accurate -- values are inaccurate (source issues).
- Temporal order issues -- uncertainty about the correct order of events.
- The pattern does not accurately reflect the conditions for the situation (e.g. there are probabilistic elements).
- (other reasons?)
As in the time-constraints case, there are various utility functions to designate the damage from either false positives or false negatives; a sketch of one such function appears below.
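A minimal sketch (the costs and names are illustrative assumptions) of a utility function that weighs the two kinds of damage:

```python
def expected_damage(p_situation: float, cost_fp: float, cost_fn: float,
                    alert: bool) -> float:
    """Expected cost of alerting (or not) when the situation holds
    with probability p_situation."""
    if alert:
        return (1 - p_situation) * cost_fp   # false positive: needless alert
    return p_situation * cost_fn             # false negative: missed situation

def should_alert(p_situation: float, cost_fp: float, cost_fn: float) -> bool:
    """Alert iff alerting has the lower expected damage; this reduces to
    p_situation > cost_fp / (cost_fp + cost_fn)."""
    return (expected_damage(p_situation, cost_fp, cost_fn, True)
            < expected_damage(p_situation, cost_fp, cost_fn, False))
```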
More on that issue - later.