Machine translation – the task of automatically translating between languages – is one of the most active research areas in the machine learning community. Among the many approaches to machine translation, sequence-to-sequence (“seq2seq”) models [1, 2] have recently enjoyed great success and have become the de facto standard in most commercial translation systems, such as Google Translate, thanks to its ability to use deep neural networks to capture sentence meanings. However, while there is an abundance of material on seq2seq models such as OpenNMT or tf-seq2seq, there is a lack of material that teaches people both the knowledge and the skills to easily build high-quality translation systems.
Today we are happy to announce a new Neural Machine Translation (NMT) tutorial for TensorFlowthat gives readers a full understanding of seq2seq models and shows how to build a competitive translation model from scratch. The tutorial is aimed at making the process as simple as possible, starting with some background knowledge on NMT and walking through code details to build a vanilla system. It then dives into the attention mechanism [3, 4], a key ingredient that allows NMT systems to handle long sentences. Finally, the tutorial provides details on how to replicate key features in the Google’s NMT (GNMT) system  to train on multiple GPUs.
The tutorial also contains detailed benchmark results, which users can replicate on their own. Our models provide a strong open-source baseline with performance on par with GNMT results . We achieve 24.4 BLEU points on the popular WMT’14 English-German translation task.
Other benchmark results (English-Vietnamese, German-English) can be found in the tutorial.
In addition, this tutorial showcases the fully dynamic seq2seq API (released with TensorFlow 1.2) aimed at making building seq2seq models clean and easy:
- Easily read and preprocess dynamically sized input sequences using the new input pipeline in tf.contrib.data.
- Use padded batching and sequence length bucketing to improve training and inference speeds.
- Train seq2seq models using popular architectures and training schedules, including several types of attention and scheduled sampling.
- Perform inference in seq2seq models using in-graph beam search.
- Optimize seq2seq models for multi-GPU settings.
We hope this will help spur the creation of, and experimentation with, many new NMT models by the research community. To get started on your own research, check out the tutorial on GitHub!