You may not know Luis von Ahn by name, but if you’re online, you’ve almost certainly encountered his work. Luis invented CAPTCHA and reCAPTCHA — those squiggly words and letters you often have to type into web forms to prove that you’re a human being.
Today, Luis launches his next big project: Duolingo.
It’s a website that aims to translate the entire web into every major world language, using volunteers. Nora interviewed Luis about Duolingo, and why he enjoys working on such large-scale crowdsourcing projects. An edited version of this interview will air on Spark 164, but right now, you can hear the full, uncut interview below, or download the MP3. [runs 19:59]
If you like hearing these extended interviews, why not subscribe to Spark Plus? It’s a podcast feed full of additional blog-only content like this. [Subscribe via RSS] or [Subscribe with iTunes]
Yeah, because if there's one thing you can do by crowdsourcing, it's mass out-of-context translation, a fragment at a time.
And you thought Google Translate came back with garbage.
He has a PhD in CS (specialized in human computation) and a proven track record of being able to solve complex tasks using "crowdsourcing". It's easy to be an armchair critic without knowing what you're talking about.
I agree with lol: Luis makes a major conceptual mistake (and PhD people can make major conceptual mistakes when they are talking about stuff they don't know very well). For him, the more massive the parallel corpus, the more efficient the system will become at translating. It's simply wrong, for one simple reason: the sheer volume of data can't make up for the computer's inability to "think", i.e., among other things, to understand context, and if the machine cannot think (and we are far from building thinking computers), the machine CAN NOT translate consistently well. Period.
I see some translators ranting about his idea. I am not a native English speaker myself but write specialized articles on a daily basis. My point is, do not be skeptical about changing the way things are.
Just a note:
"If you like hearing these extended interviews, why not subscribe to Spark Plus? **You’ll get regular weekly episodes**, plus additional blog-only content like this."
*I thought this was no longer true?
You’re right about that. Good catch. Making the change now. Thanks, Anson.
I find the idea intriguing and dropped my e-mail address into the sign up field. Could be fun.
Even though it isn't available for Italian, I'm going to try it out. Very curious about it.
I'm most interested in learning some of the Asian languages, but I suppose my French is a little rusty.
Part of me is wondering about context in this formula. For instance, I know some people, myself included, who like to eavesdrop on some foreign language websites mostly staffed by hobbyists. There still is a language barrier, but I can still tell that what they are saying is riddled with all sorts of vernacular and inside jokes specific to the website. How would a crowd-sourced system be able to tell if and when these items occur?
Interesting idea and I think it could work. Here's why.
A few years ago for work, I had to have a product monograph translated from Spanish into English for an FDA submission. The only Spanish-speaker we had wasn't available for 10 days, and the first drafts of the submission were due in a week. It was four pages of sometimes technical/medical words and descriptions.
I don't speak any Spanish, so I translated it myself, using Altavista babelfish. As von Ahn said, the online translators don't work perfectly, giving disjointed sentences and strange grammar and tenses. However, by translating the pamphlet from English back to Spanish, then back to English, then back to Spanish, it worked. Each iteration of the translation got a little bit closer and after several tries, it made sense.
I completed the submission on time, and a few days later had the services of the translator. Turned out this method worked, as there was only one word different in his translation from the one by babelfish. And in checking, since the translator wasn't familiar with medical terminology, my translation of that word was the correct one.
I LIKE IT
Me too, I like it! I can't wait for French to be added to the list of languages to translate.
i love free~! i'm gonna do-it! in my spare thyme.
This guy ROCKS!!!!!!!!!! He is thinking on all cylinders.