Thursday, March 11, 2010

On automatic translation

A well-known urban legend about automatic translation says that a translation program was given the phrase "the spirit is willing but the flesh is weak", translated it from English to Russian and then back to English, and produced "the vodka is good but the steak is lousy". There are collections of such translation pearls all over the Web. I use automatic translation from time to time, mainly because my good friend Rainer von Ammon has a habit of forwarding me emails and documents in German. The automatic translation programs I can find on the Web are not that good, but I can understand more or less what is written.

However, last night I had my moment of laughing out loud. While searching the Web for something using the almighty Google search, I came across a webpage written in Hebrew. I realize that most readers of this Blog don't read Hebrew, so I'll summarize the reading experience. First, it looked like a collection of words in the wrong order, with syntax that does not make any sense. Second, looking closer, I realized that I had actually written it. Well, it is not that I forgot how to write in Hebrew; on the contrary, my Hebrew is still much better than my English. Rather, it seems to be a Hebrew translation of a Blog posting I wrote in English in January 2009.

Trying to get to the bottom of it, I found that there is a site called the "Unix and Linux forum" which copied some of my Blog postings (I am not sure in what context) using a crawler called "Linux Bot". It seems that it did not just copy the posting, but also translated it into Hebrew. Since Hebrew is not the most popular language in the universe, I wonder how many other languages it has been translated into, and whether anybody is doing any quality control. Funny.

1 comment:

Rainer v. Ammon said...

The funny thing is that machine translation (MT) was mainly initiated by IBM, starting as early as 1950...

BTW: This Wikipedia entry starts out written in German, then after a few paragraphs continues in English.

I did a project evaluating German-Japanese translation using English as a switching language. The results were horrible.

A distinction was made between good-enough translations (just understandable) and high-quality translations, which only work on the basis of special thesauri for special domains. Today it is usual to use Systran to translate documents in the automotive domain, etc.

But the ALPAC report from 1966 mentioned there still holds true today. And without such thesauri, translations are often funny stuff...