Friday, October 5, 2007
Today's topic is about getting events from unstructured sources. An event has been defined as something that happens, and one of the questions is: how do we know it happened? Some events are pushed (e.g. stock quotes), but many of the events in the universe should be obtained by analysis of various sources of input. Video streams are one source - e.g. understanding from a video stream that somebody is climbing a security fence, a stolen car has just entered the highway, or a person whose driving licence has been revoked is driving. Other interesting events can be obtained from news streams, even from Emails, and other types of unstructured text.
I believe that the event processing of the future will deal not only with events that are easy to get, since somebody pushes them, but in many cases with events that are not easy to get, and not easy to realize have happened, since obtaining them will involve analyzing different media streams. We can see some early implementations in this area, and I believe it has big potential - good topics for theses, also.... have fun.
Wednesday, October 3, 2007
Looking at Bill Gassman's presentation in the Gartner EPS on BAM - he talks about the BAM market and partitions it into:
- 65% - embedded inside vertical solutions
- 20% - embedded inside BPM (+ ESB) middleware/products
- 10% - embedded inside BI products
- 3% - embedded inside IT operations (e.g. BSM)
- 2% - general purpose BAM products
Gassman's interpretation of BAM is quite wide (I have a somewhat narrower interpretation), and covers most of the EP types of applications, so let's take it as a starting point. Without commitment to the exact numbers, the order is consistent with my observation of this market. While the early adopters used stand-alone engines and built applications on top of them, this will become a relatively small route to market, and the segment that will grow most is that of EP technology embedded inside vertical solutions; we see signs of this in multiple industries now. The second largest segment is EP technology embedded inside middleware; we see that the big players in this area are taking this approach. The rationale behind it is twofold - from the middleware point of view, EP capabilities are now becoming a must, due to competitive pressures; and from the ROI-to-customers point of view, EP applications are typically not isolated, and the biggest investment is to connect them with the consumers/producers of events, thus application integration middleware with adapters to multiple systems may assist. There will always be a market for stand-alone event processing technologies, and this market can be segmented into "general purpose" engines and ones optimized for special purposes..... I am not sure that what I am writing now reflects the current reality, but it certainly reflects the trend....
More on the role of general and specific frameworks - later.
Monday, October 1, 2007
Event stream: a linearly ordered sequence of events. Here we have two issues - one is to distinguish between the "pipe" and the collection of events flowing on this "pipe", which is the same type of ambiguity we have when talking about "event" vs. "event message"; however, we can tolerate such ambiguity. The second problem is that, in general, the collection of such events may not be totally ordered, and thus it does not conform to the definition of a stream.
There are two possible solutions --- one is to modify the definition of stream to be more general (I am not sure that the word stream inherently means sequence; it just happened to be the way it was implemented in a certain academic project); the other possibility is to invent a new name for the edge - like event pipe (or some other creative name). What do you think?
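To make the total-order problem concrete, here is a minimal sketch (the class and event names are my own illustration, not from any product or from the glossary): two events from independent sources can carry the same timestamp, so ordering by time alone gives only a partial order, and any abstraction that insists on a sequence must break the tie artificially.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    source: str
    timestamp: int  # e.g. milliseconds since some epoch
    payload: str

events = [
    Event("sensor-A", 100, "fence motion"),
    Event("sensor-B", 100, "gate opened"),   # same timestamp as above
    Event("sensor-A", 105, "motion stopped"),
]

# Sorting by timestamp works, but the order among ties is arbitrary:
ordered = sorted(events, key=lambda e: e.timestamp)

# The two events at t=100 are incomparable by time alone; a "stream"
# that requires a total order must impose a tie-breaker (arrival
# order, source id, etc.).
ties = [e for e in ordered if e.timestamp == 100]
print(len(ties))  # 2
```

This is exactly the gap between "collection of events on a pipe" and "linearly ordered sequence": the data itself may only supply the partial order.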
I am also putting it as a comment on the glossary website.
I am taking this opportunity to suggest that anybody who wishes to make a terminology comment should do it soon, since we would like to close a first version in the next month or so.
More terminology issues - later.
Sunday, September 30, 2007
This reminds me of the first conference I ever organized, NGITS 1993 (we did not have conference webpages in those days), where there was a discussion about the relations between Artificial Intelligence and Databases that followed the keynote address of John Mylopoulos, whom I have always considered one of the most visionary people I've ever met. John said something like this: "the difference between the AI and database disciplines is that AI is a scientific discipline and databases are an engineering discipline, which deals with efficiency issues". He, of course, made the database people who were present quite angry; however, now that I am looking from the outside (at that time I looked from the inside) at the way that database people think, I realize that he was, as usual, right.
While high performance is one of the reasons that customers turn to COTS in this area, it is only a secondary reason; the main reason that event processing software is being used is the level of abstraction it provides, and consequently the improvement in ROI. It seems also that the main competition between different products will be more on the ROI (ease of use) front than on the performance front.
Event processing differs in its required functionality from database processing; the fact that database processing processes a state ("snapshot"), while event processing processes a set of transitions ("event cloud"), imposes different thinking, and hence different abstractions. Trying to introduce event pattern detection as an extension to database processing (as we have seen in the EPTS meeting, in the proposal being prepared now) has several attributes - simplicity is not one of them - and thus it totally misses the point of "ease of use", only to satisfy the assertion that event processing should be done within database processing. While these are nice academic attempts, and researchers will probably be able to write a lot of papers about the pattern extensions to SQL, I don't believe that they will catch on in reality.
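To illustrate the "transitions, not snapshots" point, here is a minimal sketch (my own illustration; the function, field names, and pattern are invented, not any product's language): detecting the temporal pattern "a withdrawal followed by a large deposit within 60 time units". The logic inherently tracks open partial matches across events, which is awkward to express as a query over a single state snapshot.

```python
def detect_sequence(events, first, second, window):
    """Yield (e1, e2) pairs where e1 matches `first`, e2 matches
    `second`, and e2 occurs within `window` time units after e1."""
    pending = []  # open instances of the first half of the pattern
    for e in events:
        # drop partial matches whose time window has expired
        pending = [p for p in pending if e["t"] - p["t"] <= window]
        if second(e):
            for p in pending:
                yield (p, e)
        if first(e):
            pending.append(e)

events = [
    {"t": 0,   "kind": "withdrawal", "amount": 500},
    {"t": 30,  "kind": "deposit",    "amount": 9000},
    {"t": 200, "kind": "deposit",    "amount": 9500},
]

matches = list(detect_sequence(
    events,
    first=lambda e: e["kind"] == "withdrawal",
    second=lambda e: e["kind"] == "deposit" and e["amount"] > 5000,
    window=60,
))
print(len(matches))  # 1 - the deposit at t=200 falls outside the window
```

The natural abstraction here is the pattern itself (sequence, window, predicates); expressing the same thing as a self-join with time arithmetic in SQL is possible, but it is hard to argue that it is simpler.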
However - databases do have several roles in event processing, here are a few of them:
(1). Databases will be used to store events for retrospective processing. These databases need to support temporal (or even spatio-temporal) characteristics; the database products don't yet provide good support in this area, and this deserves a separate blog.
(2). Databases (or in-memory databases) will be used to store intermediate states for recoverability.
(3). Databases will be used to enrich events for processing (mainly reference data, but sometimes transaction data).
(4). Data warehouses will be used for embedded analytics.
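Role (3) is perhaps the most common one in practice, so here is a small sketch of what enrichment looks like (all names are my own illustration; here the "reference database" is just a dictionary, standing in for a lookup against a reference table or an in-memory cache):

```python
# Reference data keyed by customer id - in a real deployment this
# would be fetched from a database or replicated into memory.
reference_data = {
    "C-17": {"name": "Acme Corp", "segment": "gold"},
    "C-42": {"name": "Initech",   "segment": "silver"},
}

def enrich(event, ref):
    """Return a copy of the event with reference attributes attached;
    events with an unknown key pass through unchanged."""
    enriched = dict(event)
    enriched.update(ref.get(event["customer_id"], {}))
    return enriched

raw = {"customer_id": "C-17", "type": "large_trade", "amount": 1000000}
print(enrich(raw, reference_data)["segment"])  # gold
```

The enriched event now carries the attributes that the downstream pattern logic actually needs (e.g. customer segment), which is why enrichment usually happens before, not during, pattern detection.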