Showing posts with label event processing. Show all posts

Saturday, August 27, 2011

On streams, events, programming-in-the-large and programming-in-the-small


In the tutorial I gave at VLDB 2010, one of the first slides posed a rhetorical question about the relationship between event processing and stream processing.
There are four opinions: some people think the two terms are aliases; some view stream processing as a subset of event processing that deals with ordered events; some view event processing as a subset of stream processing, saying that an event stream is one type of stream, alongside other types of data streams such as voice streams and video streams; and some think that the two are totally different concepts, relating to different types of applications. Each of them contains some truth under some interpretations, but IMHO none of them is really true.


Curt Monash decided to renew the old terminology discussion on his Blog, taking the "stream" approach favored by database people, who look at "data streams" as data in motion and view events as a type of data that does not need any real special handling.


The differences in opinion and terminology stem from the fact that some people are thinking about apples and others about oranges.


What is the apple? Let's take as an example S4 from Yahoo Labs. In the Blog post I referenced here, I mentioned that S4 is a platform for doing "programming in the large" for stream processing. What does this mean? It supports a data flow graph, where streams flow on the graph's edges and the processing logic is embedded in the graph's nodes. How is this logic implemented? This is not part of the model: each developer can use the platform and implement the nodes, while the platform takes care of the flow and of some non-functional properties (distribution, fault tolerance, cluster management, scalability in some aspects, etc.).
It is a pure programming-in-the-large framework. There are others like it; in this case the model is blind to the type of stream, and the stream can indeed be a video stream, a voice stream, etc. I would call such a framework "stream processing".
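To make the "apple" concrete, here is a minimal sketch of a programming-in-the-large framework in Python. It is not S4's actual API: `FlowGraph`, `add_node`, `connect` and `run` are hypothetical names, and distribution, fault tolerance and cluster management are not modeled at all. The point it shows is that the platform only owns the graph and the routing, while the logic inside each node is an opaque, developer-supplied function that could just as well process audio frames as events.

```python
from collections import deque
from typing import Any, Callable, Dict, List

# Hypothetical node logic: takes one item, returns zero or more output items.
NodeFn = Callable[[Any], List[Any]]


class FlowGraph:
    """Toy data-flow platform: routes items along edges, blind to what the nodes do."""

    def __init__(self) -> None:
        self.nodes: Dict[str, NodeFn] = {}
        self.edges: Dict[str, List[str]] = {}

    def add_node(self, name: str, fn: NodeFn) -> None:
        self.nodes[name] = fn
        self.edges.setdefault(name, [])

    def connect(self, src: str, dst: str) -> None:
        self.edges[src].append(dst)

    def run(self, source: str, items: List[Any]) -> Dict[str, List[Any]]:
        """Push items into `source`; collect outputs of sink nodes (no outgoing edges)."""
        collected: Dict[str, List[Any]] = {}
        queue = deque((source, item) for item in items)
        while queue:
            name, item = queue.popleft()
            outputs = self.nodes[name](item)  # opaque node logic
            targets = self.edges[name]
            if not targets:
                collected.setdefault(name, []).extend(outputs)
            for out in outputs:
                for dst in targets:
                    queue.append((dst, out))
        return collected


g = FlowGraph()
g.add_node("ingest", lambda x: [x])
g.add_node("double", lambda x: [x * 2])
g.connect("ingest", "double")
print(g.run("ingest", [1, 2, 3]))  # {'double': [2, 4, 6]}
```

Nothing in `FlowGraph` knows what an event, a context or a pattern is; that knowledge would have to live inside the node functions.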


What is the orange? If we look at the abstract model of event processing, the way we defined it in the EPIA book, it is a model centered on programming-in-the-small, with language primitives that relate to the semantics of events: mainly the notion of context (when? where? to whom?) of events, and patterns over multiple event occurrences. The orange does not sound at all like the apple.
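For contrast, here is a minimal sketch of the "orange" in Python. The names `Event`, `partition` and `sequence` are hypothetical and are not the EPIA book's notation; only two context dimensions are illustrated (a per-customer segment for "to whom?" and fixed one-hour windows for "when?"; "where?" is omitted), together with a single pattern type, a sequence over multiple event occurrences.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    kind: str      # event type, e.g. "deposit"
    customer: str  # segmentation context: "to whom?"
    minute: int    # occurrence time in minutes since midnight: "when?"


def partition(events: List[Event], window: int = 60) -> Dict[Tuple[str, int], List[Event]]:
    """Assign each event to a context partition: (customer, fixed time-window index)."""
    parts: Dict[Tuple[str, int], List[Event]] = defaultdict(list)
    for e in events:
        parts[(e.customer, e.minute // window)].append(e)
    return dict(parts)


def sequence(events: List[Event], first: str, then: str) -> bool:
    """Sequence pattern: an event of kind `first` followed later by one of kind `then`."""
    seen_first = False
    for e in sorted(events, key=lambda ev: ev.minute):
        if e.kind == first:
            seen_first = True
        elif e.kind == then and seen_first:
            return True
    return False


events = [
    Event("deposit", "X", 11 * 60),          # 11:00
    Event("withdrawal", "X", 11 * 60 + 15),  # 11:15
    Event("withdrawal", "Y", 10 * 60),       # 10:00, no preceding deposit
]
matches = sorted(ctx for ctx, evs in partition(events).items()
                 if sequence(evs, "deposit", "withdrawal"))
print(matches)  # [('X', 11)]
```

Here the semantics are in the primitives themselves: the pattern is evaluated per context partition, so with fixed hourly windows a deposit at 10:50 and a withdrawal at 11:15 land in different contexts and do not match.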


Can something be both an apple and an orange? The answer is yes. While event processing can be implemented using various "programming in the large" models, we advocate the "event flow" one, and the "event processing network" can be mapped to the data-flow graph model of streams. So it is possible, but not necessary, to implement event processing as a kind of stream processing. It turns out that there are some benefits in doing so, and indeed this seems to be becoming the dominant way of doing "programming-in-the-large", while the programming-in-the-small is still based on the semantics of events.


The point of view is always the hammer-and-nail issue. Those who have a stream processing "programming-in-the-large" platform see event processing as just one application of their platform and think that the platform is the main thing. Those who have an event processing language view the semantics and functionality of the language as the main thing, and the platform as a facilitator.

The intersection is not a complete overlap: in stream processing one can add a node dealing with audio processing, where the event processing language might be of little value; likewise, there are implementations of event processing that are based on other programming-in-the-large models (such as a logic programming framework) and not on the stream model.



Looking at the current state of the art, we see that many systems indeed lie in the intersection of both, so each side can classify them its own way. The fact that most classify them as event processing may show where the market thinks the value is.

Saturday, December 4, 2010

Event processing as blasphemy




The fire on Mount Carmel is still burning for the third day, although it has somewhat subsided thanks to fire-fighting aircraft from all over the world, the most notable from Russia, but also from other countries. It is good to see that when a disaster happens, many countries in the world come to help; Israel also has a record of helping other countries when disasters occur. I guess that the fire will eventually be overcome, but this does not change the very bitter feeling that the population here has about the incompetence of the government, where a series of failures came together to bring about this disaster. For everybody who sent me worried Emails from all over the universe: the fire did not get into the city of Haifa, so I have not been in real danger. The site of IBM Haifa Research Lab is relatively close to the fire area, as it is located on the Haifa University campus. The campus was taken over as the headquarters of the fire-fighting forces, so it is still closed. While in Haifa we were not in danger, people in some villages lost their homes to the fire, and 42 people were killed when a bus was caught in the fire.

Now for this posting's topic: you probably wonder what event processing has to do with blasphemy or religion at all. I think that I have written before about all of these, but will now put them within a single perspective. When I was young, I had several friends who went through the process of becoming religious (what Christians call "born again"; Jewish people use a different term). I watched this process with interest and had many discussions with them (well, they tried to convince me that they saw the light and that I just had to look more carefully to discover it). One thing that I have learned about religious people is that it is useless to argue with them, since their beliefs are based on axioms, and one cannot argue over axioms, since this is the nature of an axiom. Likewise, there are many professional religions; I have seen religious wars in other areas of computing, so this is not really a new phenomenon, just different gods.

Here are three religions for which event processing is blasphemy.

Religion one:   The data-centric religion.
The religion's belief: the world is data-centric, and everything can be done within database tools. There is a small niche that requires high scalability, but it can also be dealt with using database techniques.
The blasphemy: event processing is a distinct discipline with some unique characteristics.

Religion two:  The programming model religion
The religion's belief:   all functions need to be expressed using the programming languages we know and love.
The blasphemy: event processing has various language abstractions that are not part of the regular languages.

Religion three:  The "true CEP" religion
The religion's belief: the term CEP was coined to cope with intrusion-detection types of applications; any person who did not work directly on intrusion detection applications is not qualified to be a priest of the religion, and thus cannot really deal with event processing.
The blasphemy: anybody using the term "CEP" for any other application type is a blasphemer, and any technique that tries to address any other event processing application is simply irrelevant (comment: I don't really tend to use the term CEP, but some of the vendors indeed use it).


As I said, the prophets (and disciples) of these religions believe in them in an emotional way, and there is no use arguing with them, so the best way is just to expose the axioms they believe in and let people think about whether they believe in these axioms or not. One of the motivations of the EPTS use case survey is to find out about the usage of event processing today. Since it is generally agreed that the event processing area has barely scratched the surface of its potential, an equally important issue is to identify the gaps in the state of the art that must be closed in order to achieve it; this is another major activity of the community that will be discussed within the event processing manifesto and other related activities. More about these topics later.

Thursday, July 16, 2009

On the smarter planet and the big brother



An interesting comment on my previous posting on smart cities said the following:


Even if I am a big supporter of event based systems, I have a question on this topic. Can you explain to me who is the owner of the event based routing, filtering and integration services?

I hope that you are not thinking about either the government or some industrial companies, since this would empower them to rule the world.

There needs to be a self-organizing way, like e.g. the Internet, but getting there will be an even larger competition than introducing the web, because of the companies already competing in that market segment.

I have written before about the big brother when talking about the former NY governor whose fall from power began when a computer program flagged him as suspicious of money laundering. This reminds me of somebody who told me that he was afraid that information gathered in computerized systems would be used for other purposes. That person lived in NY and at some point moved to live with his girlfriend in New Jersey. He did not want to change his NY address for various reasons (among them, not to bother changing his driver's license). He said that he never used EZPASS to pay for toll roads, since somebody could conclude from the fact that he drove northbound on the "Garden State Parkway" every working day that he actually lived in NJ.

So the question is this: all the smarter planet services will make it possible to gather much information about the individual, so who owns this information, and what can it be used for? Governments typically do not own infrastructures, but they have the power to make laws that compel infrastructure owners to provide information, and they do so today. Actually, Internet service providers have the potential to know a lot about us. The same goes for events: if we have a smarter planet, there will be a lot of events about individual persons floating around, and if somebody is able to join events from various sources, this somebody has the potential to know much about us. I guess that there will be a need for some legal structure that prevents infrastructure suppliers as well as governments from abusing this information, like the bank secrecy laws in Switzerland.

Monday, January 5, 2009

On event processing and some interesting queries

Some people have returned from vacation with a surplus of energy; otherwise I cannot explain why my inbox today was full of mails from the same discussion thread in the everlasting Yahoo CEP interest group, triggered by a question sent by Luis Poreza, a graduate student from the University of Coimbra in Portugal. I am taking the liberty of rewriting the question: it was phrased in terms of a trading system, so some of the responders answered with trading-related material that did not help answer Luis' question. To get as far away as possible from the stock market, I will base the rewritten question on the fish market.

The story is as follows: the price of 1 KG of fish is determined according to the hour, the demand, the supply and the general mood of the seller. At 10:50 he set the price to 71; at 11:15 the price went down to 69, with no more changes by 12:00. There is a computerized system that works in time windows of one hour, starting every hour. The request is to find out whether the price of 1 KG of fish was ever > 70 in the time window 11:00 - 12:00. The claim is that intuitively the answer is yes, since the price in the interval [10:50, 11:15] was 71, but if we look at all the events that occurred in this window, there was no event with value > 70, so current "window oriented" tools will answer no.
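The two readings of the question can be contrasted in a short Python sketch. It assumes times are minutes since midnight and that the window is the half-open interval [11:00, 12:00); `window_oriented` and `carry_forward` are hypothetical names. The first function evaluates only the events inside the window, as a purely window-oriented tool would; the second adds the assumption that the price holds until the next change, which is knowledge about the state of the price rather than about the events themselves.

```python
from typing import List, Tuple

# Price-change events from the example: (minute since midnight, price per KG).
PRICE_EVENTS: List[Tuple[int, int]] = [(10 * 60 + 50, 71), (11 * 60 + 15, 69)]
WINDOW = (11 * 60, 12 * 60)  # [11:00, 12:00)


def window_oriented(events: List[Tuple[int, int]], window: Tuple[int, int], threshold: int) -> bool:
    """Only events whose timestamp falls inside the window are considered."""
    start, end = window
    return any(value > threshold for t, value in events if start <= t < end)


def carry_forward(events: List[Tuple[int, int]], window: Tuple[int, int], threshold: int) -> bool:
    """Also consider the last value set before the window opened,
    assuming (as a state property) that it holds until the next change."""
    start, end = window
    before = [v for t, v in sorted(events) if t < start]
    inside = [v for t, v in events if start <= t < end]
    return any(v > threshold for v in before[-1:] + inside)


print(window_oriented(PRICE_EVENTS, WINDOW, 70))  # False
print(carry_forward(PRICE_EVENTS, WINDOW, 70))    # True
```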

There have been plenty of answers; some even tried to answer the question, for example by adding dummy events with the value 71 (one at the end of the interval? every minute?).

However, I am going to make the following assertions:

(1). The requirement given is not an event processing pattern.
(2). Attempts to treat it as an event processing pattern are not very useful.
(3). It is in fact a kind of temporal query.
(4). It may make sense to have the capability to issue temporal queries in response to events (AKA retrospective event processing), but this has to be done right.

Assertion one: the requirement is not an event processing pattern. An event processing pattern is a function of events, so it is no surprise that Luis found it difficult to phrase the requirement as one. Let me take two other examples that look syntactically the same and try to understand what the problem is here:



The government agency example: a government agency known for its long service queues tries to monitor the length of the queue. Periodically, some clerk goes out and counts the number of people waiting in the queue. At 10:50 he found 71 people in the queue, at 11:15 69 people, and there were no more samples by 12:00. Now the question is whether there has been some point in the time window [11:00, 12:00] at which the number of people in the queue was > 70.

Before starting the discussion, let's look at another example: the bank account example.
At 10:50 Mr. X deposited $30; his previous balance was $41, which made his balance $71.
At 11:15 Mr. X withdrew $2, and his balance was set to $69.

From a syntax point of view, the fish market example looks exactly like the queue monitoring example: in both cases we have events at 10:50 and 11:15 with attributes 71 and 69 respectively. However, they are not the same. The reason is that the price in the fish market is fixed until changed, while the length of the queue may have changed several times, up and down, since each event here is only a sample and does not cover all the changes. Both kinds of events observe some state (price or queue length), but the semantics are quite different. If we use the dummy-event solution for the queue case, the value will probably be wrong; furthermore, we cannot really answer the query in the queue case with "true" or "false", yet in reality periodic sampling is a totally valid type of event. Moreover, the bank account example looks very different from the fish market example: it has two types of events, and the events do not observe a state, but report on a change and its value (the "delta"). Thus, by looking only at the two events of deposit and withdrawal we will not be able to answer the question either; but knowing the state (the balance of the account) and the deltas (of the deposit and the withdrawal), we get something that is semantically similar to the fish market example.

What can we learn from these examples? First, that the property "the value is the same until it is changed" is not a property of an event attribute; it is a property of the state (data) that may be created or updated by events. It holds for some states and not for others. The solutions given rely on the fact that a human knows the semantics of this state and writes an ad-hoc query. However, this is processing of the state, based on its semantic properties, and not processing of the events.
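The point about state can be illustrated with a Python sketch that first derives a state timeline from each kind of event and only then asks the existential question. It assumes minutes since midnight, a half-open [11:00, 12:00) window, and hypothetical function names. The "holds until changed" rule is applied explicitly to the fish price and to the balance reconstructed from deltas, and deliberately not to the queue samples, for which the honest answer here is "unknown" (`None`).

```python
from typing import List, Optional, Tuple

Timeline = List[Tuple[int, int]]  # (minute since midnight, value), sorted by time
WINDOW = (11 * 60, 12 * 60)       # [11:00, 12:00)


def ever_above_step(timeline: Timeline, window: Tuple[int, int], threshold: int) -> bool:
    """Existential query over a step-function state: each value holds until the next change."""
    start, end = window
    for i, (t, v) in enumerate(timeline):
        valid_until = timeline[i + 1][0] if i + 1 < len(timeline) else float("inf")
        # Does the validity interval [t, valid_until) overlap [start, end)?
        if t < end and valid_until > start and v > threshold:
            return True
    return False


def balance_from_deltas(initial: int, deltas: Timeline) -> Timeline:
    """Turn change events (deposits > 0, withdrawals < 0) into a balance timeline."""
    balance, timeline = initial, []
    for t, delta in sorted(deltas):
        balance += delta
        timeline.append((t, balance))
    return timeline


def ever_above_samples(samples: Timeline, window: Tuple[int, int], threshold: int) -> Optional[bool]:
    """Samples say nothing about values between them: a sample above the threshold
    inside the window proves 'yes'; otherwise the answer is unknown (None)."""
    start, end = window
    if any(start <= t < end and v > threshold for t, v in samples):
        return True
    return None


fish_price = [(650, 71), (675, 69)]                          # price set at 10:50 and 11:15
bank_balance = balance_from_deltas(41, [(650, 30), (675, -2)])  # deposit, then withdrawal
queue_samples = [(650, 71), (675, 69)]                       # clerk's counts

print(bank_balance)                                   # [(650, 71), (675, 69)]
print(ever_above_step(fish_price, WINDOW, 70))        # True
print(ever_above_step(bank_balance, WINDOW, 70))      # True
print(ever_above_samples(queue_samples, WINDOW, 70))  # None
```

The three inputs are syntactically almost identical; the different answers come entirely from which state semantics the query author chose to apply.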

Assertion two: attempts to treat it as event processing are not useful.

In the past I've blogged about the hammer and the nail. There is a natural tendency of anybody who has a product to try to stretch its boundaries. This may also backfire: trying to perform functions that the product is not good at, and not doing a great job at them, can overshadow the good parts of the product. A solution like adding "dummy events" is a kind of hack. It abuses the notion of an event (since a dummy event did not really happen); moreover, given that this is just an ad-hoc query and there can be many such queries, covering them all may require an exponential number of dummy events... Anyway, event processing software is just part of a bigger picture, and instead of improvising or hacking to get this functionality, it may be more advisable to use a product with a better fit.


Assertion three: this requirement is in fact a temporal query. I will not get into temporal queries now, but the actual query is over the price of 1 KG of fish as it changes over time. It is an existential query, checking whether some predicate holds somewhere in the interval. Another example of a temporal query: was there any day during the last 30 days on which the customer withdrew more than $10000 in a single withdrawal?
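The withdrawal example can be written as a small existential temporal query in Python. This is a sketch under stated assumptions: the withdrawal history is an in-memory list of `(date, amount)` pairs standing in for a real store, "the last 30 days" is taken to mean the 30 days ending today inclusive, and the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

Withdrawal = Tuple[date, float]  # (day of withdrawal, amount in $)


def large_withdrawal_in_last_days(history: List[Withdrawal], today: date,
                                  days: int = 30, limit: float = 10_000) -> bool:
    """Existential temporal query: did any single withdrawal above `limit`
    occur in the `days` days ending at `today` (inclusive)?"""
    since = today - timedelta(days=days)
    return any(since < d <= today and amount > limit for d, amount in history)


history = [
    (date(2009, 1, 2), 2_500.0),
    (date(2008, 12, 20), 12_000.0),  # inside the last 30 days and above the limit
    (date(2008, 11, 1), 50_000.0),   # above the limit, but too old
]
print(large_withdrawal_in_last_days(history, date(2009, 1, 5)))  # True
```

Unlike an event pattern, the query ranges over stored history; the events that produced that history may have been processed long before the question is asked.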

This example brings us to assertion four: it may make sense to couple event processing software with temporal queries. For example, we have an event that makes a customer a "suspect" of money laundering, but we need reinforcement by running some temporal queries over the past, like the one above... I'll write about this type of functionality later.
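One possible shape of this coupling, sketched in Python: an incoming "suspect" event triggers a retrospective temporal query over the stored past of that customer, and a derived event is emitted only if the past reinforces the suspicion. All names (`SuspectEvent`, `ConfirmedSuspect`, `on_suspect`, `big_withdrawal_last_30_days`) are hypothetical, and the history is a plain dictionary standing in for whatever database actually keeps the past.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict, List, Optional, Tuple

Past = List[Tuple[date, float]]  # withdrawals: (day, amount in $)
History = Dict[str, Past]        # customer -> withdrawals


@dataclass
class SuspectEvent:
    customer: str
    day: date


@dataclass
class ConfirmedSuspect:
    customer: str
    day: date
    reason: str


def on_suspect(event: SuspectEvent, history: History,
               query: Callable[[Past, date], bool]) -> Optional[ConfirmedSuspect]:
    """Retrospective step: run a temporal query over the customer's past and
    emit a derived event only if the past reinforces the suspicion."""
    past = history.get(event.customer, [])
    if query(past, event.day):
        return ConfirmedSuspect(event.customer, event.day, "reinforced by history")
    return None


def big_withdrawal_last_30_days(past: Past, today: date) -> bool:
    """Temporal query: any single withdrawal above $10000 in the 30 days ending today."""
    since = today - timedelta(days=30)
    return any(since < d <= today and amount > 10_000 for d, amount in past)


history: History = {
    "X": [(date(2008, 12, 20), 12_000.0)],
    "Y": [(date(2008, 12, 20), 500.0)],
}
print(on_suspect(SuspectEvent("X", date(2009, 1, 5)), history, big_withdrawal_last_30_days))
print(on_suspect(SuspectEvent("Y", date(2009, 1, 5)), history, big_withdrawal_last_30_days))  # None
```

Passing the query in as a parameter keeps the event handling and the temporal query separate, in line with the argument that each should be done by a component suited to it.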

Well, it is 1:15 AM, so I'd better get some sleep; tomorrow is again a busy day. So, to conclude: first, not everything that looks simple to do manually is simple to do with a generic type of thinking; second, event processing software should concentrate on doing event processing right, and not on doing other stuff wrong... Some follow-up Blog postings will come later.

Thursday, August 28, 2008

On the "Event Processing Thinking" Blog - after the first year

One of the ways to obtain events is through "calendar events"; this is useful for time-out management, periodic triggering, etc. Today I saw a reminder in my calendar: this is the one-year anniversary of the "event processing thinking" Blog; you should write something about it. Actually, yesterday I got a note from one of the analyst firms that research the impact of Web 2.0 on companies, asking me to participate in this study wearing my Blogger hat... This is not the first time that people have approached me, for various purposes, based on reading my Blog, and I can say that I underestimated the power of Blogs and the amount of visibility they get. This is probably the most visible communication vehicle that exists today (how many people read papers?).
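As a small illustration of the two uses of calendar events mentioned above, here is a Python sketch that produces, in time order, periodic "tick" events (periodic triggering) and "timeout" events for requests that received no response within a deadline (time-out management). The function name and the batch style are assumptions for illustration; a real calendar would fire these events in real time rather than return a list.

```python
import heapq
from typing import Dict, List, Tuple

CalendarEvent = Tuple[int, str]  # (time in minutes, event name)


def calendar(start: int, end: int, period: int,
             requests: Dict[str, int], responses: Dict[str, int],
             deadline: int) -> List[CalendarEvent]:
    """Calendar events over [start, end): a 'tick' every `period` minutes, plus
    'timeout:<id>' when a request got no response within `deadline` minutes."""
    heap: List[CalendarEvent] = []
    for t in range(start, end, period):
        heapq.heappush(heap, (t, "tick"))
    for req_id, sent in requests.items():
        due = sent + deadline
        answered = req_id in responses and responses[req_id] <= due
        if not answered and start <= due < end:
            heapq.heappush(heap, (due, f"timeout:{req_id}"))
    # Pop in time order, as a calendar would fire them.
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(calendar(0, 180, 60, {"a": 10, "b": 20}, {"a": 15}, 30))
# [(0, 'tick'), (50, 'timeout:b'), (60, 'tick'), (120, 'tick')]
```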

Looking at the Blogland, I also realized that visibility can be a double-edged sword, since people can easily expose their own ignorance, so I am trying to write only about stuff that I think I know something about...

One interesting thing is the statistics (who reads the Blog); it seems that the previous posting in which I wrote about statistics has been one of the most read postings (see below).

Looking at the Google Analytics statistics, it seems that since the start of measurement (I installed Google Analytics 2 weeks after the Blog started) more than 10,000 distinct persons (10,139 to be exact) have read this Blog. I don't have any illusion that there are 10,000 people who are interested in event processing; some got here through the wonders of the almighty Google (e.g. looking for a picture of a unicorn). So a better metric is that 1/3 of the readers returned more than once, and 1432 readers returned more than 50 times, which is a more reasonable estimate of the number of people interested in the content. It seems that the number of people who read all, or at least 2/3, of the Blog postings is around 800, and this seems to be the size of the effective readership.

What else can I learn from the statistics? The most popular postings are:

(1). Agnon, the dog, playing and downplaying is still, by far, the most popular one; it is one of the postings in which I claim that "event processing" is a discipline that stands on its own feet, and not a footnote to database technology or business rule technology.

(2). Revisiting the Blog **2 again, which, like this posting, talks about statistics around this Blog; I wonder why this posting is so popular (or maybe people wanted to look at the map of Arkansas to plan their next holiday).

(3). On infant, professor and unicorn: despite the fact that this posting is much younger, it got a lot of traction, partly because people look for pictures of unicorns, and partly because disputes always bring higher ratings... However, ratings are not everything, and when I think that I've said all that I need to say about a particular topic, I move on.

As for the geographical distribution of readers: there have been readers from 124 countries.
In terms of number of entries, the big ones are:
(1). USA, (2). UK, (3). Israel, (4). Japan, (5). Germany, (6). Canada, (7). France and (8). India. In terms of number of individual readers, the big ones are:
(1). USA, (2). UK, (3). Germany, (4). India, (5). Australia, (6). Israel, (7). France and (8). Holland. So it seems that in Japan I have a relatively small (fewer than 100) but loyal set of readers. I am still looking for an opportunity to travel to Japan; I have never been there (actually, I have never been to India either).
In the USA there are now readers from all 50 states (+ DC), and the leading ones are California, Massachusetts and New York. Putting the Arkansas map helped: Arkansas is now in 16th place in the USA in visits.

The three big cities in terms of visits are still: (1). London, (2). New York City, (3). Bangalore.

I will not survey the negative and positive reviews about this Blog, and will let every reader judge; that is the essence of the entire Web 2.0 business! Well, that's all for today; I will return soon with a more professional posting.

Saturday, June 14, 2008

On Event Processing Platforms and Engines



Here are pictures of an engine and a platform, and they don't seem to be compatible; however, if a car replaces the horse in the 2nd picture, all of a sudden the engine (of the car) will have a role in the platform. In event processing we hear more and more about platforms, and even about event-based middleware that will be a basis for XTP and other stuff. The platform provides services for agents to run, like a road system that enables getting from place to place and provides services such as signs, traffic lights, etc. The way to get there can be by foot, riding a horse or driving a car, among other options. Taking this analogy further: in the event processing world, every agent can be implemented in an ad-hoc fashion (going by foot), use C/Java code with some tools that help create EP applications, or use engines, which are COTS tools that do event processing. Implementing heterogeneous agents on the same platform requires interoperability standards; if we also wish to implement them as a seamless application, we'll need language standards. There are some early event processing platforms, quite basic at this point, but the direction seems promising, and at some point in time we may see event processing platforms as a basis for the new generation of enterprise application servers.