Facebook claims that a new approach to machine translation using convolutional neural networks (CNNs) can translate languages more accurately (read: score higher on the BLEU scale) and up to nine times faster than traditional systems based on recurrent neural networks (RNNs). CEO Mark Zuckerberg himself announced the news on his own Facebook page.
The company’s bold claims rest on the results of a study conducted by five members of Facebook’s Artificial Intelligence Research (FAIR) team and outlined in detail in a paper entitled “Convolutional Sequence to Sequence Learning.” The paper’s authors, Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin, shared in an accompanying post on the Facebook developer blog that the source code of the FAIR sequence modeling toolkit (fairseq) and the trained systems are available under an open source license on GitHub. “To help us get there faster, we’re sharing our work publicly so that all researchers can use it to build better translation tools,” Zuckerberg said.
Dr. John Tinsley, CEO and co-founder of Iconic Translation Machines Ltd., who reviewed the paper, told Slator that the results are impressive. “It’s quite a different approach, using convolutional neural networks (CNNs) as opposed to recurrent neural networks (RNNs). The reason this hasn’t been looked at for translation before is that CNNs typically work well with fixed-length input and RNNs with variable-length input. Obviously, with language, things are very variable so RNNs were the natural starting point,” he explained. His main concern is quality, although he observed that some of the reported shared task results are comparable to, and perhaps a little better than, those of existing approaches to neural MT. “However, the single biggest impact of this work is the speed,” he said. “One of the current drawbacks of Neural MT is how long it actually takes to train the models, and this approach by Facebook using CNNs allows them to be trained up to seven times faster. This is because it’s much easier to parallelise the training process of CNNs given how they process different parts of the data (simultaneously as opposed to sequentially). That being said, it still requires powerful hardware.”
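To make the "simultaneously as opposed to sequentially" point concrete, here is a minimal toy sketch in plain Python. It is not Facebook's fairseq code and leaves out everything that makes the real model work (learned weights, gating, attention). The function names, weights, and kernel values are all invented for illustration. It only shows the dependency structure: each recurrent state needs the previous one, while each convolution output reads a fixed window of the input and can be computed independently of the others.

```python
import math

def rnn_states(xs, w_in=0.5, w_rec=0.9):
    """Toy recurrent layer (hypothetical weights).

    Each state depends on the previous state, so the loop
    has to run one position at a time, in order.
    """
    h = 0.0
    states = []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def conv_output_at(xs, i, kernel=(0.25, 0.5, 0.25)):
    """Toy 1-D convolution output at position i (hypothetical kernel).

    Reads only a fixed-size window of the input around i,
    treating positions outside the sequence as zero.
    """
    half = len(kernel) // 2
    total = 0.0
    for k, w in enumerate(kernel):
        j = i + k - half
        if 0 <= j < len(xs):
            total += w * xs[j]
    return total

def conv_outputs(xs, order=None):
    """Compute every position, in any order.

    No position depends on another position's output, which is
    why this work could be spread across parallel hardware.
    """
    order = list(range(len(xs))) if order is None else order
    out = [None] * len(xs)
    for i in order:
        out[i] = conv_output_at(xs, i)
    return out

xs = [1.0, 2.0, 3.0, 4.0]
forward = conv_outputs(xs)
backward = conv_outputs(xs, order=[3, 2, 1, 0])
print(forward == backward)  # the order of computation does not matter
print(rnn_states(xs))       # each value here needed the one before it
```

Computing the convolution outputs front to back or back to front gives the same result, whereas the recurrent loop cannot be reordered. In this simplified picture, that independence is what lets training be spread across many processors at once.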
Dr. Tinsley also welcomed the release of the source code. “It’s good to see the tech behemoths taking this approach now and opening up their research to the wider community,” he said.