Sunday, June 5, 2011

On the countrywide database seminar, Jeff Ullman and MapReduce

Today I took a long train ride to the southern part of Israel to attend the countrywide database seminar. We used to have such seminars in the past, but they somehow disappeared, so Ehud Gudes from Ben-Gurion University took the initiative and hosted us today. The program included various local speakers, some of them graduate students who got 18 minutes each to give a talk.

While the database community has a significant intersection with the event processing community, many of the database folks are still quite unfamiliar with the basics. Some of the questions I was asked were along the lines of: why do you have to re-invent the wheel, when everything can be done using triggers in databases? My answer was that everything could also be done by programming Turing machines, but it is a question of cost-effectiveness. In fact, the database area itself is also a set of abstractions over file systems, together with tools to implement them efficiently. The same goes for event processing. There were some other questions of the form "could you do it with X?", where X was quite diverse.

The keynote speaker at this event was Jeff Ullman, who talked about MapReduce.

After explaining the principal idea of MapReduce, he spent half of the time talking about one of his favorite topics from the distant past, computing transitive closures with Datalog, and the way they can be computed with MapReduce. His reasoning for going back to Datalog was the need to compute paths of links for search engines, blog postings that respond to one another, and social networks in general. I would not intuitively think of Datalog as a tool for that, but it was interesting.
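To make the idea concrete, here is a minimal sketch of a transitive closure computed as repeated map/reduce rounds. It is an illustration under my own assumptions, not the formulation from the talk: it runs in a single process, the function names (map_phase, shuffle, reduce_phase, transitive_closure) are hypothetical, and it uses the simplest Datalog program, path(X,Y) :- edge(X,Y) and path(X,Z) :- path(X,Y), edge(Y,Z). Each round joins the known paths with the edges on the shared variable Y and stops when no new pair is derived.

```python
from collections import defaultdict


def map_phase(paths, edges):
    """Emit both relations keyed on the join variable Y."""
    for x, y in paths:
        yield y, ("path", x)   # path(X, Y): remember where the path starts
    for y, z in edges:
        yield y, ("edge", z)   # edge(Y, Z): remember where the edge leads


def shuffle(pairs):
    """Group mapper output by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """For each Y, combine every path ending at Y with every edge leaving Y."""
    for values in grouped.values():
        sources = [v for tag, v in values if tag == "path"]
        targets = [v for tag, v in values if tag == "edge"]
        for x in sources:
            for z in targets:
                yield x, z     # derived fact path(X, Z)


def transitive_closure(edges):
    """Repeat the join round until a fixpoint is reached."""
    edges = set(edges)
    paths = set(edges)         # base rule: every edge is a path
    while True:
        derived = set(reduce_phase(shuffle(map_phase(paths, edges))))
        if derived <= paths:   # nothing new was derived, so we are done
            return paths
        paths |= derived
```

For example, transitive_closure({("a", "b"), ("b", "c")}) also contains the pair ("a", "c"). This naive version re-derives old facts in every round; a real job would try to join only the newly found paths, and would run each round as a separate MapReduce job over distributed files.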

Also interesting was Jeff's claim that the main benefit of MapReduce over other methods of parallel programming is that if one task fails, it is possible to restart only that task, and not the entire process, which he claims to be a unique property of MapReduce. I am not an expert in parallel programming, but it would be interesting to verify this claim.
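The restart property itself is easy to picture, even if its uniqueness is harder to judge. The sketch below is only a toy, single-process illustration under my own assumptions: run_with_task_retry is a hypothetical name, and each task is assumed to be a side-effect-free function of its own input, which is what makes re-running it safe. It says nothing about whether other parallel programming models can do the same.

```python
def run_with_task_retry(tasks, max_attempts=3):
    """Run independent tasks; on failure, re-run only the task that failed.

    tasks maps a task id to a zero-argument callable. Results of tasks that
    already succeeded are kept and never recomputed.
    """
    results = {}
    attempts = {}
    for task_id, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            attempts[task_id] = attempt
            try:
                results[task_id] = task()
                break                      # this task is done, move on
            except Exception:
                if attempt == max_attempts:
                    raise                  # give up after repeated failures
    return results, attempts
```

The function returns the results together with the number of attempts each task needed, so a task that fails once shows two attempts while all the other tasks show one.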
