How Skype Used AI to Build Its Amazing New Language Translator

Very soon now, a select group of Skype beta testers will have a new Microsoft technology that seems borrowed from the world of Star Trek. It's called the Skype Translator, a Skype add-on that listens to the English words you speak into Microsoft's internet phone-calling software and translates them into Spanish, or vice versa.

As Microsoft's demos show, it's an amazing technology, and it's based on work that has been going on quietly inside Microsoft's research and development labs for more than a decade. Microsoft already uses some of the text translation technology underpinning Skype Translator to power its Bing Translator service and to jump-start the translation of its products, manuals, and hundreds of thousands of support documents into foreign languages. "One of the largest, published, untouched machine translation repositories on the internet is the Microsoft customer support Knowledge Base," says Vikram Dendi, strategy director with Microsoft Research.

The translation machine intelligence is only part of the story, though. Skype Translator takes the words you speak, converts them into text, translates that text, and then synthesizes the translation into speech in the language of the person on the other end of the call. And the voice recognition component, which recognizes your speech and converts it into text, has long been the trickiest part of the equation.
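The three-stage flow described above (speech to text, text to translated text, translated text to speech) can be sketched as a chain of independent functions. This is a minimal illustration under stated assumptions, not Microsoft's implementation: the stage names, the toy word-for-word English-to-Spanish dictionary, and the use of byte strings as stand-ins for audio are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical stage signatures; a real system would plug in trained models here.
Recognizer = Callable[[bytes], str]    # audio -> source-language text
Translator = Callable[[str], str]      # source-language text -> target-language text
Synthesizer = Callable[[str], bytes]   # target-language text -> audio


@dataclass
class SpeechTranslationPipeline:
    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer

    def run(self, audio: bytes) -> Tuple[str, str, bytes]:
        # Stage 1: speech recognition (audio -> text)
        source_text = self.recognize(audio)
        # Stage 2: text-to-text machine translation
        target_text = self.translate(source_text)
        # Stage 3: speech synthesis (text -> audio in the listener's language)
        return source_text, target_text, self.synthesize(target_text)


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def fake_recognize(audio: bytes) -> str:
    # Pretend the "audio" is already a UTF-8 transcript.
    return audio.decode("utf-8")


TOY_EN_ES = {"hello": "hola", "friend": "amigo"}


def toy_translate(text: str) -> str:
    # Word-for-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_EN_ES.get(word, word) for word in text.lower().split())


def fake_synthesize(text: str) -> bytes:
    # Pretend that encoding the text is "speaking" it.
    return text.encode("utf-8")


pipeline = SpeechTranslationPipeline(fake_recognize, toy_translate, fake_synthesize)
result = pipeline.run(b"hello friend")
print(result)  # ('hello friend', 'hola amigo', b'hola amigo')
```

Keeping each stage behind its own interface mirrors the point in the article: the recognition step can be improved on its own without touching translation or synthesis.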

But voice recognition has come a long way in the past few years, thanks in large part to a burgeoning field of artificial intelligence research known as deep neural networks. Neural networks have been around since the 1980s, but they're experiencing a renaissance. They hit the mainstream in 2012, when Google announced that it had used deep neural nets to recognize cats in YouTube videos. That work led directly to a big boost in Android's voice recognition software, but behind the scenes, Microsoft was quietly laying the groundwork for Skype Translator.

Microsoft researcher John Platt recently told WIRED that the company was tinkering with neural networks to improve handwriting recognition on tablet PCs nearly a decade before Google's cat experiment. But the work that led to Skype Translator's most startling breakthrough, the ability to reliably recognize almost anybody's speech, began just before Christmas 2009, when Microsoft sponsored a mini-symposium on the technology in Whistler, British Columbia. The invited speaker, University of Toronto's Geoff Hinton, had developed a machine learning model that mimicked neurons in the human brain, gradually building a deeper and deeper understanding of things such as English speech.
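To make "mimicked neurons" a little more concrete, here is a minimal forward pass through a stack of artificial-neuron layers, in which each layer's outputs become the next layer's inputs. It is an untrained toy with random weights, not Hinton's model or Microsoft's speech recognizer; the layer sizes (13 input features, three hidden layers of 32 units, 10 output classes) are hypothetical, and training, where the real work happens, is omitted entirely.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def sigmoid(x: float) -> float:
    # Squash any real number into the range (0, 1), like a neuron's firing rate.
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    # Each "neuron" takes a weighted sum of all inputs plus a bias, then applies sigmoid.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def make_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Small random weights; a trained network would learn these from data.
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(features: List[float], layers: List[Layer]) -> List[float]:
    # Pass the input through each layer in turn; deeper layers see
    # progressively more processed representations of the input.
    activations = features
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


rng = random.Random(0)
sizes = [13, 32, 32, 32, 10]  # hypothetical: acoustic features in, sound classes out
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = forward([0.5] * 13, layers)
print(len(output))  # 10
```

Stacking several such layers is what makes a network "deep"; the breakthroughs described here came from finding ways to train those stacks well, which this sketch does not attempt.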

Microsoft soon ponied up the funds so that Hinton's ideas could be tested on the latest graphics processing units (GPUs). The results were "stunning," a 25 percent boost in speech recognition accuracy, Microsoft Research head Peter Lee told us earlier this year. "We published those results, then the world changed," he said.

Now Microsoft is ready to change the world in earnest, offering people who speak different languages and come from different cultures a kind of immediate, face-to-face communication that has previously been the stuff of science fiction.

Vikram Dendi said that about 50,000 people had signed up for pre-beta access to Skype Translator before Monday, when the company announced that it was set to go into beta by year's end. A day later, the waiting list had roughly doubled. "Because people are so excited about what this means to communications, it's exploding," he says.