Thursday, March 25, 2010

More on event processing agents


There are various types of agents in reality, like the one shown above; likewise there are various types of agents in computing, as I discussed a long time ago in this Blog, but since then the thinking has somewhat evolved. Recently Jim Odell, a long-time agents advocate, was hosted by the TIBCO CEP Blog and advocated the use of agent technology in event processing, giving scalability as the main motivation. In the Event Processing in Action book we make the EPA (Event Processing Agent), a term coined by David Luckham, the most notable building block in our model. The term agent is used in the sense of software agent, and not necessarily agent in the AI sense. We use the term agent at the meta level, where at the run-time level there are agent instances that can be implemented in various ways. Event processing agents are event-driven in the sense that they take one or more events as input, perform some processing on these events, and derive one or more events.
An EPA can filter events, transform events, detect event patterns, or do any combination of the above. Event processing agents are typically (but not always) associated with a context, so context-related operations determine when an EPA instance is opened or closed. Assume that the context is a temporal sliding window of non-overlapping periods of 2 hours, and that there is an EPA associated with this context; then every 2 hours an instance of this EPA terminates and another instance is initiated. In essence, an EPA interacts with events in various ways:
  • An EPA receives events as input
  • An EPA processes events
  • An EPA may query historical events
  • An EPA derives events as output
  • Through context -- an EPA instance may be initiated or terminated by events
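
To make the 2-hour context example a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the model from the EPIA book: the names Event, CountingEPAInstance and WindowedEPA are hypothetical, events are assumed to arrive in timestamp order, and an instance is closed when the first event of a later window arrives rather than by a timer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

WINDOW_SECONDS = 2 * 60 * 60  # the 2-hour context from the text


@dataclass
class Event:
    type: str
    timestamp: float  # seconds since some epoch (assumed representation)
    payload: dict = field(default_factory=dict)


class CountingEPAInstance:
    """One EPA instance: lives for a single window and counts matching events."""

    def __init__(self, window_start: float, predicate: Callable[[Event], bool]):
        self.window_start = window_start
        self.predicate = predicate
        self.count = 0

    def on_event(self, event: Event) -> None:
        if self.predicate(event):  # filter step
            self.count += 1        # processing step

    def terminate(self) -> Event:
        # When the context closes, the instance derives one output event.
        return Event(
            "WindowCount",
            self.window_start + WINDOW_SECONDS,
            {"window_start": self.window_start, "count": self.count},
        )


class WindowedEPA:
    """EPA definition (meta level); opens and closes one instance per window."""

    def __init__(self, predicate: Callable[[Event], bool]):
        self.predicate = predicate
        self.instance: Optional[CountingEPAInstance] = None

    def process(self, event: Event) -> List[Event]:
        derived: List[Event] = []
        start = (event.timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        # An event from a later window terminates the current instance.
        if self.instance is not None and self.instance.window_start != start:
            derived.append(self.instance.terminate())
            self.instance = None
        # The context initiates a new instance for the new window.
        if self.instance is None:
            self.instance = CountingEPAInstance(start, self.predicate)
        self.instance.on_event(event)
        return derived
```

Feeding events to WindowedEPA.process returns an empty list while the window is open, and the derived count event once an event from the next window shows up. A real implementation would more likely close the window on a clock tick, so that a quiet period does not delay the derived event.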

Each EPA instance is autonomous in the sense that it does not communicate with any other EPA instance, and thus each can be implemented by a different run-time artifact, which indeed can enable scalability. However, there can be various groupings of EPA instances into run-time artifacts, where the two extremes are: a run-time artifact for every EPA instance, and a single run-time artifact that contains all the EPA instances within the application.

The benefits of using EPAs are simplicity of the model, modularity, and, as said, flexibility in implementation, which may support various scalability and performance objective functions. I'll write more about EPAs later.

Monday, March 22, 2010

Some media events

I wrote last week about participating in an entrepreneurship panel, where there was a discussion on whether innovation in big companies counts or not. It turns out that the echo of this discussion reached the national press in Israel, although you have to read Hebrew in order to understand what is written. Well, for those who do: they did not quote me accurately, and did not even spell my name correctly... go figure.

And from a past media event to a future one: on Thursday I am participating in an IBM Webinar given on the ebizQ platform. The Webinar is a blend of the IBM "business agility now" message and its relation to event processing, and some selected topics from the EPIA book. The Webinar is planned for Thursday, March 25th at noon USA EDT. You are all invited.

Saturday, March 20, 2010

On the goat hill

This picture was taken on the "goat hill", a very nice hill below our neighborhood, a 10-minute walk from home, which is used for retreats into nature and for many children's activities. Alas, there is now a decision to build a new neighborhood and destroy the nice hill. Yesterday the children active in the "nature protection society", including my daughter Hadas, led a guided excursion to show people what is about to be destroyed and to help mark flowers, since the society is going to move the flowers away at the end of the summer, while they hibernate. Unlike some of the other cities in Israel, which are built on sand and are just a collection of houses, Haifa is a city that was built on a ridge made up of a collection of hills, with a lot of trees and flowers. There is a balance between the green and the grey. However, the green parts are getting smaller. When I was a child the ridge was much greener; the place where our house now stands was a pine forest, and some pine trees still remain among the houses. It is important to leave some parts of the city as nature reserves that will stay as they are. Sometimes the city leaders give in to greedy constructors, and the people of Haifa are fighting against it on multiple fronts. All we could do was join the people who participated in this excursion, but it seems that there is no way to stop the construction of the new neighborhood.

Wednesday, March 17, 2010

Participating in entrepreneurship panel

Today was a rare event where people could see me walking around with a tie. This happened because I was invited to participate in a panel at a big conference organized by the entrepreneurship institute at the Technion, which also organizes a students' entrepreneurship competition. The panel dealt with "entrepreneurship everywhere" and included people who initiated various activities in various areas. I shared with the audience some of my experience as an intrapreneur. I heard that my usual kind of talk, which involves humor and some provocative statements, was well received, although, as usual, some people were not happy with things I said.
One person felt strongly that I really don't belong on a panel about entrepreneurship, as a real entrepreneur is a person who takes personal risks by investing his own money; the panel moderator did not agree with her. Well, I have never claimed to be an entrepreneur, but I think there are some similarities and also some differences between an entrepreneur and an intrapreneur. Another person who did not like what I said approached me when I was leaving. When talking about my background I had told the audience that I had been a faculty member at the Technion, and did not feel I had enough impact on the world by just writing papers. He said that there were people who had great impact on the world by just writing papers, like Einstein. I agreed with him and said that I had been talking about the fact that I did not think that I could have enough impact on the world by just writing papers, and he said that I should have emphasized that it is possible to have great impact on the world by just writing papers. This is of course true, though in the area of computer software it is not that easy.
When asked what is the most important advice for those who want to succeed in achieving things within large organizations, I quoted the rule: Remember it is easier to ask for forgiveness than for permission.

Saturday, March 13, 2010

On events versus data

The word "data" always reminds me of the android from Star Trek The Next Generation whose name was Data. The word data (in computing) is typically very general and refers to anything that is represented on digital media; the picture of Data above is also a piece of data, like many other things. The word "event" is also a broad term, which means something that happened.

Recently Paul Vincent wondered in his Blog about the difference between event and data, as some people think that events are footnotes to data. Since by the definitions above event and data are obviously not the same, I'll try to talk about the touch points between them, since those are the source of the misconceptions.

There are various touch points between events and data:

  1. Event representation contains data. An event is represented in the computing domain by an "event object" or "event message", which is usually also called "event" as a short name. This event representation includes some information about the event: what its type is, where it happened, when it happened, what happened, who the players were, etc. Example: the event is "enter the building", and the event's payload contains information that answers questions such as: what building? who entered? when? and maybe more. The payload of the event is data; it may be stored (see event store), or just pass through the system.
  2. Data store can store historical events. Event representations can be accumulated and stored in a data store for further usage. There are large data stores that collect weather events. Note that in order to navigate in historical events, these events may be stored in a temporal database, an area that I've dealt with in the past; if the events are spatial, they have to be stored in a spatiotemporal database.
  3. Database can be event producer. In active databases the events were database operations: insert, modify, delete and retrieve. In this case the fact that some data element has been updated or accessed is the "something that happens" (which may or may not reflect something that happens in reality), and the database acts as an event producer and emits events for processing by an event processing network. Note that actually every event producer contains some data that is turned into events, for example transaction instrumentation like what IBM has done with CICS as an event producer.
  4. Derived events as database updates. An event processing application takes events from somewhere as input, does something, creates derived events, and sends them somewhere; this is all of event processing in one sentence. A derived event created in this process may go to an event consumer, and the event consumer may be a DBMS or another type of consumer whose action is to update some data store.
  5. Event enrichment by data during the event processing. During event processing, enrichment of events is sometimes required. Let's return to the event of a person entering a building: the event processing application deals with security access control, and needs to know the person's security clearance. This information is not provided with the event, which provides only the identification of the person, so there needs to be an enrichment process in which an enrichment event processing agent accesses some global store, in this case reference data, to extract the clearance value and put it inside the event for further processing.
Thus the main issue is not the "versus" issue but the various relationships between the two terms.
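
The enrichment case in item 5 can be sketched in a few lines of Python. This is just an illustration under assumptions of mine: the BuildingEntry type, its field names, and the use of a plain dictionary as the reference data store are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Mapping, Optional


@dataclass(frozen=True)
class BuildingEntry:
    """Event payload: who entered which building, and when (assumed fields)."""
    person_id: str
    building: str
    timestamp: str
    clearance: Optional[str] = None  # not supplied by the event producer


def enrich_with_clearance(
    event: BuildingEntry, reference_data: Mapping[str, str]
) -> BuildingEntry:
    """Enrichment EPA: look up the clearance and return an enriched copy.

    reference_data stands in for the global store; an unknown person
    simply stays without a clearance value.
    """
    return replace(event, clearance=reference_data.get(event.person_id))
```

The input event is left untouched and an enriched copy is derived, so the event as produced can still be stored or audited separately from the version that carries reference data.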

Thursday, March 11, 2010

On automatic translation

A known urban legend about automatic translation is that an automatic translation program got as input the phrase "the spirit is willing but the flesh is weak", translated it from English to Russian and then back to English, and the end result was "the vodka is good but the steak is lousy"; there are some translation pearls collected all over the Web. I use automatic translation from time to time, mainly since my good friend Rainer von Ammon has a habit of forwarding me Emails and documents in German. The automatic translation programs I can find on the web are not that good, but I can understand more or less what is written. However, last night I had my moment of loud laughing. While searching the Web for something using the almighty Google search, I came across a webpage written in Hebrew. I realize that most of the Blog readers don't read Hebrew, so I'll summarize the reading experience: first, it looked like a collection of words in the wrong order and syntax that did not make any sense; second, looking closer, I realized that I actually wrote it. Well, it is not that I forgot how to write in Hebrew; on the contrary, my Hebrew is still much better than my English, but it seems that it is supposed to be a translation to Hebrew of a Blog posting I wrote in English in January 2009. Trying to get to the bottom of it, I found that there is a site called the "Unix and Linux form" which copied some of my Blog postings (not sure in what context) using some crawler that is called "Linux Bot"; it seems that it did not just copy them, but also translated them to Hebrew. Since Hebrew is not the most popular language in the universe, I wonder into how many other languages they are translated, and whether somebody is doing any quality control. Funny.

Wednesday, March 10, 2010

Revisiting race condition with FFD example

In the past I have written about race conditions, and this triggered some responses. We recently realized that in the example we created for the EPIA book (the Fast Flower Delivery example, which already has around ten different implementations; six of them can be viewed on the book's webpage, and more will be added) there is a case that, if not handled carefully, may yield wrong results due to race conditions. Here is the case:

There is an aggregate EPA per driver and day that collects assignment events for a driver and at the end of the day creates a derived event which counts the number of assignments for that driver. There is a second EPA, per day, that collects all the drivers' counts for that day and calculates the mean and standard deviation of the number of assignments per active driver on that day. There is a third EPA, again per driver and day, which gets the derived events from the first two EPAs and calculates for each driver the deviation from the mean, in standard deviation units. These three EPAs are all aggregation-type EPAs which have some order among them; until now, no problem.
Now, the issue is that all these calculations occur at the end of the day, and have causal dependencies. If we are not careful, the first EPA calculates the count per driver at the end of the day, but by the time it finishes the calculation the time is, say, 12:01 AM, so the result is classified as belonging to the next day. The count is needed for this day's statistics, and if it gets into the statistics of the next day instead, we get some inconsistency in the system. Obviously a naive implementation will get wrong results here.
There are various ways to handle it and ensure correctness; however, the main issue is whether the developer needs to be aware of it while designing the application, or whether the compiler that takes the definition of these EPAs and creates the actual implementation should be the one to do the job. My opinion is that if the developer has to take care of such things by hard coding, life will be quite difficult, as this is only one case of a race condition, and it is better that it be transparent to the developer. This way we can have our cake and eat it too: both use a high-level tool that makes the programming easier and lowers the total cost of ownership, and fine-tune the semantics in a way that typically requires dedicated, and even complicated, programming.
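
One of the possible ways to avoid this particular race, sketched here in Python purely as an illustration, is to let each derived count carry the day of the context that produced it, and to have the downstream EPAs partition by that tag rather than by the clock time at which the derived event shows up. The names DriverCount and deviations_for_day are hypothetical, the second and third EPAs are folded into one function for brevity, and the use of the population standard deviation is my assumption; none of this is taken from the FFD implementations.

```python
from dataclasses import dataclass
from datetime import date, datetime
from statistics import mean, pstdev
from typing import Dict, Iterable


@dataclass(frozen=True)
class DriverCount:
    """Derived event of the first EPA (assumed shape)."""
    driver: str
    context_day: date     # the day whose assignments were counted
    emitted_at: datetime  # wall-clock time the count was actually produced
    count: int


def deviations_for_day(counts: Iterable[DriverCount], day: date) -> Dict[str, float]:
    """Deviation of each driver from the daily mean, in standard deviation units."""
    # Select by the context tag, NOT by emitted_at.date(): a count produced
    # at 00:01 still belongs to the day that has just ended.
    todays = [c for c in counts if c.context_day == day]
    if not todays:
        return {}
    values = [c.count for c in todays]
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    # If all drivers have the same count, sigma is 0; report 0.0 for everyone.
    return {c.driver: ((c.count - mu) / sigma if sigma else 0.0) for c in todays}
```

This only covers the classification part of the problem; the statistics EPA still has to wait until all the counts for the day have arrived before it closes, which is exactly the kind of ordering I would like the compiler, and not the developer, to take care of.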
More about other aspects of semantic fine tuning - later.