Wednesday, March 10, 2010

Revisiting race condition with FFD example

In the past I have written about race conditions, and this triggered some responses. We recently realized that the example we created for the EPIA book (Fast Flower Delivery, which already has around ten different implementations; six of them can be viewed on the book's webpage, and more will be added) contains a case that, if not handled carefully, may yield wrong results due to race conditions. Here is the case:

There is an aggregate EPA per driver and day that collects assignment events for a driver and, at the end of the day, creates a derived event that counts the number of assignments for that driver. There is a second EPA per day that collects all the driver counts for that day and calculates the mean and standard deviation of the number of assignments per active driver in that day. There is a third EPA, again per driver and day, which gets the derived events from the first two EPAs and calculates, for each driver, its deviation from the mean in standard deviation units. These three EPAs are all aggregation-type EPAs with an order among them. So far, no problem.

Now, the issue is that all these calculations occur at the end of the day and have causal dependencies. If we are not careful, the first EPA calculates the count per driver at the end of the day, but by the time it finishes the calculation the time is, say, 12:01 a.m., so the result is classified to the next day. The result is required for calculating the statistics of this day, and if it gets into the statistics of the next day instead, we get an inconsistency in the system. Obviously a naive implementation will get wrong results here.

There are various ways to handle this and ensure correctness. The main issue, however, is whether the developer needs to be aware of it while designing the application, or whether the compiler that takes the definition of these EPAs and creates the actual implementation should be the one to do the job. My opinion is that if developers have to take care of such things by hard coding, life will be quite difficult, as this is only one case of a race condition, and it is better that it be transparent to the developer. This lets us have our cake and eat it too: using a high-level tool that makes programming easier and lowers the total cost of ownership, while still fine-tuning the semantics in a way that typically requires dedicated, and even complicated, programming.
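To make the end-of-day case concrete, here is a minimal Python sketch. It is not taken from any of the Fast Flower Delivery implementations; the event layout, the driver names, the counts, and the function names `daily_stats` and `deviation_in_std_units` are all assumptions made for illustration. It contrasts a naive second EPA, which assigns each derived count to a day by the time it was emitted, with one that uses the day of the context that produced it.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean, pstdev

# Hypothetical derived events from the first EPA (made-up data):
# (driver, context_day, emitted_at, assignment_count).
DAY = date(2010, 3, 9)
derived_events = [
    ("driver_a", DAY, datetime(2010, 3, 9, 23, 59, 59), 3),
    # Same context day, but the calculation finished after midnight.
    ("driver_b", DAY, datetime(2010, 3, 10, 0, 1, 0), 5),
]


def daily_stats(events, use_context_day):
    """Second EPA: mean and standard deviation of driver counts per day."""
    per_day = defaultdict(list)
    for _driver, context_day, emitted_at, count in events:
        # The naive variant classifies by emission time; the other variant
        # classifies by the day whose assignments were counted.
        day = context_day if use_context_day else emitted_at.date()
        per_day[day].append(count)
    return {day: (mean(c), pstdev(c)) for day, c in per_day.items()}


def deviation_in_std_units(count, day_mean, day_std):
    """Third EPA: a driver's deviation from the mean, in std units."""
    if day_std == 0:
        return 0.0  # assumption: no spread means no deviation
    return (count - day_mean) / day_std


naive = daily_stats(derived_events, use_context_day=False)
correct = daily_stats(derived_events, use_context_day=True)

day_mean, day_std = correct[DAY]
scores = {
    driver: deviation_in_std_units(count, day_mean, day_std)
    for driver, _day, _emitted, count in derived_events
}
```

In the naive variant, the late count for `driver_b` lands in the statistics of March 10, so March 9 is computed from a single driver. Carrying the context day on the derived event is only one of the possible ways to keep the three EPAs consistent; the point of the post is that a compiler could apply such a rule without the developer coding it by hand.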
More about other aspects of semantic fine-tuning later.
