Story flagged by:
Last year, Microsoft’s speech and dialog research group announced a milestone in reaching human parity on the Switchboard conversational speech recognition task, meaning we had created technology that recognized words in a conversation as well as professional human transcribers.
After our transcription system reached the 5.9 percent word error rate that we had measured for humans, other researchers conducted their own study, employing a more involved multi-transcriber process, which yielded a 5.1 percent human parity word error rate. This was consistent with prior research that showed that humans achieve higher levels of agreement on the precise words spoken as they expend more care and effort. Today, I’m excited to announce that our research team reached that 5.1 percent error rate with our speech recognition system, a new industry milestone, substantially surpassing the accuracy we achieved last year. A technical report published this weekend documents the details of our system.
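For readers unfamiliar with the metric, word error rate is conventionally computed as the minimum number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition only; the function name and the whitespace tokenization are assumptions for illustration, and it is not the scoring tool used for the Switchboard evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real scoring pipelines also normalize text before comparison.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
rate = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{rate:.1%}")
```

Under this definition, a 5.1 percent word error rate corresponds to roughly one word-level error for every 20 words of reference transcript.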
Switchboard is a corpus of recorded telephone conversations that the speech research community has used for more than 20 years to benchmark speech recognition systems. The task involves transcribing conversations between strangers discussing topics such as sports and politics.
We reduced our error rate by about 12 percent compared to last year’s accuracy level, using a series of improvements to our neural net-based acoustic and language models. We introduced an additional CNN-BLSTM (convolutional neural network combined with bidirectional long-short-term memory) model for improved acoustic modeling. Additionally, our approach to combining predictions from multiple acoustic models now does so at both the frame/senone and word levels.
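To give a rough sense of what frame-level combination can mean, one common scheme is to average, frame by frame, the senone posterior distributions produced by several acoustic models. The sketch below shows that generic idea only; the function name, the uniform default weights, and the nested-list data layout are all assumptions for illustration, and the post does not say which combination rule the team's system actually uses.

```python
def combine_frame_posteriors(model_posteriors, weights=None):
    """Weighted average of per-frame senone posteriors across models.

    model_posteriors[m][t][s] is the (hypothetical) probability that
    model m assigns to senone s at frame t. All models are assumed to
    share the same frames and the same senone inventory.
    """
    n_models = len(model_posteriors)
    if weights is None:
        # Assumption: with no tuned weights, every model counts equally.
        weights = [1.0 / n_models] * n_models

    n_frames = len(model_posteriors[0])
    combined = []
    for t in range(n_frames):
        n_senones = len(model_posteriors[0][t])
        frame = [
            sum(w * model[t][s] for w, model in zip(weights, model_posteriors))
            for s in range(n_senones)
        ]
        # Renormalize so each combined frame is a valid distribution
        # even if the weights do not sum exactly to one.
        total = sum(frame)
        combined.append([p / total for p in frame])
    return combined


# Two toy models, one frame, two senones.
model_a = [[0.7, 0.3]]
model_b = [[0.5, 0.5]]
print(combine_frame_posteriors([model_a, model_b]))
```

Word-level combination works on complete hypotheses rather than frames and is not covered by this sketch.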
Moreover, we strengthened the recognizer’s language model by using the entire history of a dialog session to predict what is likely to come next, effectively allowing the model to adapt to the topic and local context of a conversation.
Our team also has benefited greatly from using the most scalable deep learning software available, Microsoft Cognitive Toolkit 2.1 (CNTK), for exploring model architectures and optimizing the hyper-parameters of our models. Additionally, Microsoft’s investment in cloud compute infrastructure, specifically Azure GPUs, helped to improve the effectiveness and speed by which we could train our models and test new ideas.
Reaching human parity in speech recognition accuracy has been a research goal for the last 25 years. Microsoft’s willingness to invest in long-term research is now paying dividends for our customers in products and services such as Cortana, Presentation Translator, and Microsoft Cognitive Services. It’s deeply gratifying to our research teams to see our work used by millions of people each day.
Advances in speech recognition have created services such as Speech Translator, which can translate presentations in real-time for multi-lingual audiences.
Many research groups in industry and academia are doing great work in speech recognition, and our own work has greatly benefited from the community’s overall progress. While achieving a 5.1 percent word error rate on the Switchboard speech recognition task is a significant achievement, the speech research community still has many challenges to address, such as achieving human levels of recognition in noisy environments with distant microphones, in recognizing accented speech, or speaking styles and languages for which only limited training data is available. Moreover, we have much work to do in teaching computers not just to transcribe the words spoken, but also to understand their meaning and intent. Moving from recognizing to understanding speech is the next major frontier for speech technology.
As language-industry professionals, we hear a lot about endangered languages and how the number of spoken languages keeps dwindling worldwide. But what about language writing systems? With roughly 6,000 languages throughout the world, there are surprisingly only about 120 to 140 written language scripts and alphabets. Many of these are disappearing as well.
What does it mean to the people who speak languages with dying writing systems? What happens when a new generation can no longer read its traditional script? And why do writing systems matter when language is essentially an oral process?
These are just some of the questions Renato Beninatto and Michael Stevens discuss with Tim Brooks on this week’s episode of Globally Speaking.
Tim is the founder of the Endangered Alphabets Project, an organization whose mission is to help preserve endangered cultures by using their writing systems to create artwork and educational materials.
His story is a fascinating one, and so are the many different ways writing can impact and preserve cultures. Topics include:
- Why writing can be viewed as a beautiful form of art.
- What are some of the languages whose writing systems are disappearing?
- Why is there a growing effort to revive traditional scripts?
- How can we help protect more writing systems from disappearing?
Listen to podcast >>
Issues related to gender, the workplace, and family are among the most important social and political concerns today. Language services – translation, localization, interpreting, and related tasks – are frequently seen as a female-dominated profession, but little data has been available to compare with other industries.
“Gender and Family in the Language Services Industry” is the first in a series of CSA Research reports dealing with gender and family issues among those who are employed in the language industry or who work with language services. It covers high level issues concerning men’s and women’s experience with the workplace and family/work-life balance. Based on 2,200 global responses, the report shines light on topics ranging from pay to personality to promotions.
The data from this research separates perception from reality.
- What region of the world has the lowest language services gender pay gap?
- Should gender balance be mandatory in the hiring process?
- Which gender has experienced discrimination – either positive or negative – based on specific personal characteristics?
- Who are more likely to reduce work hours, take time off, quit jobs, or turn down promotions to care for family members?
“Gender and Family in the Language Services Industry,” and subsequent reports, are available for free with registration.
Download the report at Common Sense Advisory >>
During the Cold War, Soviet leader Nikita Khrushchev made a statement to a group of Western ambassadors that was translated as “We will bury you.” Naturally, what was seen as a rude and bellicose remark by a top Soviet official speaking to foreign diplomats made headlines, and it exacerbated tensions between the rival Eastern and Western blocs. But what Khrushchev actually said was slightly different.
ELSPEET, THE NETHERLANDS, AUGUST 10th 2017 – Pieter Beens, freelance translator and owner of Dutch translation company Vertaalt.nu, introduces xl8 review. This new review project focuses on products that will bring health and productivity improvements for translators. The project initially starts with a monthly review, but inventors and manufacturers are already eager to participate.
“xl8 review is a great new way for translators to look at innovations that can improve their lives”, says Pieter Beens. He started the project out of curiosity, initially reviewing books on his business blog. Combining his interest in product innovations and review experience for various newspapers and magazines, he decided to bring out xl8 review to specifically focus on products that can be of use for translators. “Every year many tools and products are introduced to improve our lives, but it is up to xl8 review to prove what they are worth.” The success of the new series of product reviews is already indicated by a huge list of inventors and manufacturers wanting to have their products reviewed, says Beens. “I have a book scanner, innovative flower pot for offices and hydration bottle among others. Manufacturers are really interested in having their products tested for the specific translation industry.”
The first review is to be published in a couple of weeks, and a new review will be added each month afterwards. Beens: “My initial plan is to publish monthly, but the long list makes it almost essential to increase the frequency.” xl8 reviews will be posted on The Open Mic as well. ProZ.com has also shown an interest in useful reviews for translators. The xl8 review project will therefore have a reach of tens of thousands of translators.
Facebook announced this morning that it had completed its move to neural machine translation — a complicated way of saying that Facebook is now using convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to automatically translate content across Facebook.
Google, Microsoft and Facebook have been making the move to neural machine translation for some time now, rapidly leaving old-school phrase-based statistical machine translation behind. There are a lot of reasons why neural approaches show more promise than phrase-based approaches, but the bottom line is that they produce more accurate translations.
Traditional machine translation is a fairly explicit process. Relying on key phrases, phrase-based systems translate sentences then probabilistically determine a final translation. You can think of this in a similar light as using the Rosetta Stone (identical phrases in multiple languages) to translate text.
In contrast, neural models deal in a higher level of abstraction. The interpretation of a sentence becomes part of a multi-dimensional vector representation, which really just means we’re trying to translate based on some semblance of “context” rather than phrases.
It’s not a perfect process, and researchers are still tinkering with how to deal with long-term dependencies (i.e. retaining understanding and accuracy throughout a long text), but the approach is incredibly promising and has produced great results, thus far, for those implementing it.
Google announced the first stage of its move to neural machine translation in September 2016 and Microsoft made a similar announcement two months later. Facebook has been working on its conversion efforts for about a year and it’s now at full deployment. Facebook AI Research (FAIR) published its own research on the topic back in May and open sourced its CNN models on GitHub.
“Our problem is different than that of most of the standard places, mostly because of the type of language we see at Facebook,” Necip Fazil Ayan, engineering manager in Facebook’s language technologies group, explained to me in an interview. “We see a lot of informal language and slang acronyms. The style of language is very different.”
Facebook has seen about a 10 percent jump in translation quality. You can read more into the improvement in FAIR’s research. The results are particularly striking for languages that lack a lot of data in the form of comparative translation pairs.
Source: TechCrunch, article by John Mannes posted on 3 August 2017 – Read the original at:
Just a reminder that voting in the 2017 ProZ.com community choice awards ends when July does, so if you haven’t cast your votes yet, don’t wait!
Most marketers are unsatisfied with the way their teams are localizing branded content for different markets, yet fail to prioritize this need in their budgets, a new CMO Council report revealed.
According to the report, 63% of marketers feel they’re “not doing well at all,” “need improvement,” or “getting better” when asked how effectively they adapt, modify and/or localize branded content for different markets, audiences, partners, and geographies. Just 33% rated themselves high, saying their organizations are “very advanced in this area” or “doing well.”
Despite the clear need to localize – with 50% of marketers saying it’s essential to business growth and profitability – most marketing teams simply do not have the budget to execute their goals. As many as 75% said they are spending 10% or less of their budgets on localization efforts.
Partnering with HH Global, the CMO Council released its “Age of Adaptive Marketer” report where it detailed the findings of a poll conducted among 150 marketing executives in a range of industries during the second quarter of 2017. The report included comments from the top management of US-headquartered companies Pepsi, Chobani, and Starwood Hotels and Resorts.
As consumers increasingly expect brands to engage with them in the most relevant ways, almost half of survey respondents cited localization demands – including language, cultural values, and other sensitivities – as the top factor “putting pressure” on marketing teams to more effectively deliver branded content at scale.
But at the same time, ensuring that content is properly localized (34%) without diluting the brand’s overall identity (43%), as well as shorter lead times and deadlines (47%) are among the biggest challenges for marketers.
“In today’s day and age, there is an expectation that customer experiences happen in total context to the consumer, yet localization – whether it’s around the globe or around the corner – is still a far-off goal for far too many organizations,” the CMO Council noted in its report.
Read more >>
Acclaro is excited to announce the general availability of the My Acclaro translation management platform. This SaaS-based platform provides clients instant access to their translation work as well as options to directly connect content to Acclaro’s translation environment via API, cloud and CMS (content management system) integrations.
With instant, at-a-glance access to a translation management dashboard, users will be able to create orders and request quotes, get up-to-the-minute translation statuses, pick up files when translations are completed, communicate with their dedicated project team, and track their translation budgets.
“We’ve received very positive feedback over the last several months from users who have been working within a fully functional, pre-launch version of My Acclaro. I am confident that new users will be impressed with My Acclaro’s capabilities including its ease of use and integration to content management tools,” said Michael Kriz, Acclaro’s founder and CEO.
A key feature of My Acclaro is the ability to connect and share content via popular web publishing and cloud storage tools such as Dropbox, Box, Zendesk, Hubspot, WordPress, Drupal, Craft CMS and Adobe Experience Manager eliminating the cost and errors typically associated with manual exports or copy and paste.
“We’ve made sure companies can establish seamless content integrations between their environments and Acclaro’s translation management platform and teams of professional linguists,” Kriz said. “The transparency, productivity and connectivity available through the My Acclaro translation management platform results in faster turnaround times and lower costs with the same high quality translation services – all benefits that are increasingly vital in a competitive global economy.”
Read more >>
With emojis increasingly showing up in everything from ad campaigns to legal cases, a clear understanding of what the symbols mean — especially across different cultures — has become an in-demand skill. So much so that last year, global firm Today Translations placed an ad for the position of “emoji translator.”
Following a months-long application process that included emoji tests and the drafting of an emoji handbook, the position was given to Irishman Keith Broni.
His qualifications encompass far more than frequent texting. Broni just completed his master’s in business psychology at University College London, where his dissertation was on “the influence that emojis can have in digital context when brands are using them to communicate with potential consumers.”
VICE News spoke to Broni about what exactly an emoji translator does, the problems that can arise when brands use emojis, and how to manage the ever-changing definitions of emojis and their varying uses across cultures.
View video >>
Media content localization across Europe, Middle East and Africa (EMEA) is expected to increase from USD 2bn this year to USD 2.5bn before 2020, according to research conducted on behalf of the Media & Entertainment Services Alliance (MESA) Europe.
According to MESA, media content localization involves preparing TV, film and video titles ready for global distribution. Jim Bottoms, MESA Europe’s Executive Director, told Slator the market covers subtitling, dubbing, video localization, and access services. Dubbing currently accounts for 70% of total spending, according to the MESA Europe report.
“There is a huge demand for content,” says Bottoms. “Some of it is new release, but a lot of it is catalog or stuff that they thought would never sell again.”
Back catalog TV series and movie titles are finding new outlets and a new audience in regions where they haven’t been seen previously, as they are licensed by foreign channels to include in their programming to appeal to a particular demographic or age group.
“So, the program makers are suddenly finding that not only is there a huge demand for new release titles to go out to more and more markets. There is also a demand for getting some of their catalog product localized,” shares Bottoms.
MESA Europe noted that the strong growth in channels is also driven in part by so-called over-the-top (OTT) players (i.e. those delivering content over the Internet), which have opened up more opportunities for program makers to sell their titles into new markets.
Netflix, for one, ended the year 2016 with 93 million users, delivering about 150 million hours of streaming video per day. This was a year after the company announced the global rollout of its streaming service to 130 countries, which was previously available only in select countries. Amazon, meanwhile, made its Prime Video available in 200 countries in December 2016, competing head on with Netflix.
With the fast growing global demand for content, a shortage of talent has become one of the industry’s biggest challenges.
“Given the way the market is growing, there are already capacity shortages and this is likely to get worse in the short term,” explains Bottoms.
Of course, dubbing has been done for decades, but the current shortfall in talent is because of the massive growth as well as an indication that new talent isn’t coming through. As Bottoms points out, “In Germany in particular, the concern is that the talent is aging and perhaps younger people aren’t coming into the sector for whatever reason.”
Read more >>
From Publisher Perspectives: “Using his winnings from the International Dublin Award, translator Daniel Hahn has established his own new prize for emerging translators—and their equally overlooked editors.”
One good competition has led to another. On June 21, when author José Eduardo Agualusa’s A General Theory of Oblivion was named winner of this year’s €100,000 (US$114,640) International Dublin Literary Award, the prize was split with translator Daniel Hahn.
The Dublin prize, now in operation for 22 years, is said to be the richest for a single novel published in English. When there’s a translator involved, the purse is divided, €75,000 going to the author and €25,000 to the translator. Having translated the book from the Portuguese, Hahn delivered Agualusa’s acceptance speech at Dublin’s Mansion House.
And then he took some of his own winnings and created a new award.
The TA First Translation Prize—“TA” for the UK’s Translators Association—is so new that it hasn’t yet been added to the Society of Authors list of other translation prizes the society administers. Antonia Lloyd-Jones, who is joint chair of the Translators Association, calls it “a ground-breaking addition to the world of literary translation. By encouraging talented new translators, as well as visionary editors, it will increase the range of great literature that’s available in translation, and strengthen the relationships between publishers and translators.”
Visionary editors? Yes, the prize’s £2,000 (US$2,570) purse will have something in common with the Dublin award, the Man Booker International Prize, and a few others. Just as those prizes are split between author and translator, the award Hahn has endowed will be split—equally, as the Booker does it—between a first-time translator and her or his editor.
In a conversation with Hahn from between Brighton and Lewes in Sussex, what comes across is that translators—at times overlooked and underappreciated in the industry–have learned the hard way how important it is to share recognition. And his prize honors new translators, Hahn says, because breaking into the business is so difficult without recognition.
“There’s a kind of bottleneck,” Hahn says. “If you’re a publisher and you want to commission a translation from Portuguese, you’ll ask Margaret [Jull Costa] and you’ll ask Alison [Entrekin], and if they can’t do it, you’ll ask one or two other people and,” he says wryly, “you might then ask me. But a new translator of Portuguese has relatively little odds of getting in because there’s a queue of people who are through the door already.”
And Hahn didn’t want to stop with his prize’s recognition of a new translator. “It’s funny,” he says, “we translators complain about not being sufficiently visible in our work—and I think it’s a legitimate complaint—but nobody thinks about editors. And not just for the acquisition and commissioning but for the actual editing. Something on behalf of our profession that recognizes that profession is important.”
Think about how many prizes you’ve encountered that honored editors. Right.
“If I’m a better translator than I was 10 years ago,” Hahn says, “it’s because I’ve been edited well.”
More on “Daniel Hahn on Translation, Awards, and Dodging Oblivion”, Publisher Perspectives.
Trailers for season 7 of Game of Thrones have supporters of the various in-universe character factions on tenterhooks. Meanwhile, more dedicated and geeky fans of Game of Thrones might be able to appreciate a new option being offered by language translating app Duolingo: learning Valyrian.
Unlike English, High Valyrian uses an aorist tense, similar to Ancient Greek and Sanskrit. David J. Peterson, the linguist who created the Dothraki and Valyrian languages for the TV series, worked on the Duolingo course, so you can be assured any dragon-training commands you learn will be effective.
Peterson created the language mostly from scratch, constructing the grammar around the two key phrases used in George R.R. Martin’s A Song of Ice and Fire books: “Valar Morghulis” (“All men must die”) and “Valar Dohaeris” (“All men must serve”).
The language, which has been in Duolingo’s “Incubator” for the last several months, has now been released in beta.
Arabic-Spanish translator Shadi Rohana is judging the Arabic portion of a rotating translation contest:
The contest was begun by a Mexico-based group; each judge chooses a poem from the language they work in, presents it, and then solicits submissions. Rohana chose Mona Kareem’s poem “حدود.”
“Like many other poets she is also a translator,” Rohana said over email, “and I chose ‘حدود‘ because it’s my favorite poem in her recent book.”
The contest is open until the end of July, and the Spanish translation of the poem should be sent directly to concurso1×email@example.com. There are no special requirements; the contest is open to all.
“The winner will be announced on the website,” Rohana said, “and there might be some ceremony involved in December—we’re still not so sure. This year we had concursos in Russian, Catalan and Mayan language. We are also trying to see what kind of prizes we can give, but for now what’s certain is that the winner will be published in places like Periódico de poesía, which is a poetry magazine that belongs to the UNAM (the National Autonomous University of Mexico) and another magazine called Sin Fin, and both are part of the organizers.”
The contest began two years ago as a translator-led initiative, and thus far it has featured poems from the German, Chinese, Italian, Japanese, Hebrew, Portuguese and modern Greek. Rohana added that, “Special emphasis is put on Mexican indigenous languages and there have been participations in Náhuatl, Ayuuk, Miixteco and Zapoteco. The idea is to celebrate literary dialogues, translation, poetry, poetry translation and linguistic diversity.”
I’m here with Dame Wendy Hall, Kluge Chair in Technology and Society, Regius Professor of Computer Science at the University of Southampton and early pioneer in web protocols; with Alexandre Loktionov, AHRC Fellow at the Kluge Center and an expert on hieroglyphic and cuneiform legal texts; and with Jessica Lingel, Kluge Fellow, assistant professor at the Annenberg School for Communication at the University of Pennsylvania and an expert on social media.
We ventured into talking about emoji and social media during a hallway conversation and thought it would be fun to pursue this further via blog.
The text of our Google Docs conversation was edited for length and clarity.
DT: There is much to explore, but it began with emoji, so let’s start there: elevated art form or corruption of language?
AL: For me, they’re essentially hieroglyphs and so a perfectly legitimate extension of language. They’re signs which, without having a phonetic value of their own, can ‘color’ the meaning of the preceding word or phrase. In Egyptology, these are called ‘determinatives’ — as they determine how written words should be understood. The concept has been around for 5,000 years, and it’s remarkably versatile because of its efficiency. You can cut down your character count if you supplement words with pictures — and that’s useful both to Twitter users today and to Ancient Egyptians laboriously carving signs into a rock stela.
DT: How does everyone feel about using emoji to write literature? The Library of Congress acquired an emoji version of none other than “Moby Dick” just a few years ago.
AL: I think you can definitely write literature with emoji — the question is, who will be able to read it? Do we have enough standardization in sign deployment? I think a full emoji dictionary/sign list would be necessary, unless, of course, we want to create a literature with multiple strands of interpretation (in a literal sense — where people see the same signs but interpret them in different ways).
JFL: I think part of it is about a fascination with how technology may be reshaping cultural production. I’m thinking of games around Twitter and literature, for example; the Guardian ran a challenge asking authors to write a story in 140 characters or less. (There’s a long and wonderful history of literature produced through challenges/games like these; I’m thinking of Shelley and Hemingway.) At the root, I think, is an anxiety around what it means to make art and how technology is making art better or worse.
DT: I’m optimistic because I see technological innovations opening up the range of what is possible artistically — Gutenberg, and so forth. On the other hand, certain technological turns have been very specific in their application. Think of Morse code: incredibly useful in certain contexts, but unlikely that we will ever write a novel in Morse.
AL: I think that gets to the heart of it — we have to think of the purpose of the means of communication, and in the case of emoji, we as a culture need to decide what they are: do we want them to be a bona fide script with full capability, or are they just a tool reserved for very specific purposes (alongside conventional means of writing)?
JFL: I don’t know about Morse code novels, but Morse code poetry is definitely a thing.
AL: It’s also worth thinking about canonicity — can emoji become canonical, in a way in which originally purely utilitarian hieroglyphs could after several millennia? Are we in this for the long run?
DT: Right, will there ever be an emoji dictionary? Perhaps there is already?
WH: There is a crowd-sourced emoji dictionary. It’s not very helpful at the moment, but then, neither was Wikipedia initially.
Read more >>
The number of people employed in the translation and interpretation industry has doubled in the past seven years, and the number of companies in the industry has jumped 24 percent in that same time period, according to the ATA, citing data from the Department of Labor. Through 2024, the employment outlook for those in the business is projected to grow by 29 percent, according to the Bureau of Labor Statistics.
“As the economy becomes more globalized and businesses realize the need for translation and interpreting to market their products and services, the opportunities for people with advanced language skills will continue to grow sharply,” said David Rumsey, president of the ATA, adding that the association predicts the largest growth is within contracted positions, giving workers and companies more flexibility.
While salaries within the industry vary, those who specialize in a difficult language can easily bring in six figures annually. The ATA helps connect freelance translators and interpreters with companies including Microsoft, Netflix and Honda, as well as government agencies such as the State Department and FBI, Rumsey said.
Philadelphia-based CETRA Language Solutions and companies like it work with about 1,000 independent contractors in translation services in any given year and recruit on a daily basis. And while there was once a fear that technology would replace humans in the process as demand for services increased, the opposite has happened — it’s enhanced their work.
“The overall industry is growing because of the amount of content out there — it’s increasing exponentially,” said Jiri Stejskal, president and CEO of CETRA. “Technology is helping to translate more content, but for highly specialized content, you need an actual human involved.”
But finding successful employment is about much more than just speaking multiple languages fluently. Translators who want to distinguish themselves as professionals have to continue to work and hone their skill sets, the ATA’s Rumsey said.
“It’s a lifelong practice, and it requires keeping up not only your language skills but your subject matter skills so that you really understand the industries and fields you are working in,” Rumsey said.
Read more >>
For the fifth consecutive year, independent market research firm Common Sense Advisory recognizes Lionbridge as the world leader in the growing, $43 billion global language services industry.
Waltham, Mass. – July 12, 2017 — Lionbridge Technologies, Inc., announced today its official ranking as the largest language services provider (LSP) in the global translation, localization and interpreting industry. Issued July 2017 by independent market research firm Common Sense Advisory (CSA Research), the report titled “The Language Services Market: 2017” ranked Lionbridge as a top-grossing LSP in the US $43.08 billion global market for outsourced language services and technology.
As part of the study, the firm surveyed providers from every continent to collect actual reported revenue for 2015 and 2016 and expected revenue for 2017. Lionbridge leads the industry due to its innovative language technology platform, its global program management excellence and its trusted network of in-country translation and localization professionals.
More than 800 global brands rely on Lionbridge to manage their business-critical content, applications and communications across channels, platforms and languages.
CSA Research, which has published market size estimates and global rankings for the past 13 years, found that the demand for language services and supporting technologies continues and is growing at an annual rate of 6.97%, representing an increase over last year’s rate of 5.52%.
Read more >>
Machine translation – the task of automatically translating between languages – is one of the most active research areas in the machine learning community. Among the many approaches to machine translation, sequence-to-sequence (“seq2seq”) models [1, 2] have recently enjoyed great success and have become the de facto standard in most commercial translation systems, such as Google Translate, thanks to their ability to use deep neural networks to capture sentence meanings. However, while there is an abundance of material on seq2seq models, such as OpenNMT or tf-seq2seq, there is a lack of material that teaches people both the knowledge and the skills to easily build high-quality translation systems.
Today we are happy to announce a new Neural Machine Translation (NMT) tutorial for TensorFlow that gives readers a full understanding of seq2seq models and shows how to build a competitive translation model from scratch. The tutorial is aimed at making the process as simple as possible, starting with some background knowledge on NMT and walking through code details to build a vanilla system. It then dives into the attention mechanism [3, 4], a key ingredient that allows NMT systems to handle long sentences. Finally, the tutorial provides details on how to replicate key features of Google’s NMT (GNMT) system to train on multiple GPUs.
The tutorial also contains detailed benchmark results, which users can replicate on their own. Our models provide a strong open-source baseline with performance on par with GNMT results. We achieve 24.4 BLEU points on the popular WMT’14 English-German translation task.
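To make the idea of attention concrete, here is a minimal sketch in plain Python. It is not taken from the tutorial's code: the function names are hypothetical, and dot-product scoring is only one common choice, picked here for brevity. At each decoding step the decoder state is scored against every encoder state, the scores are normalized with a softmax, and the resulting weights form a weighted average (the context vector). This is what lets the model look back at any part of a long source sentence.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(query, encoder_states):
    """Illustrative dot-product attention (hypothetical helper).

    query:          decoder state, a list of floats
    encoder_states: one list of floats per source position,
                    each with the same dimension as the query
    Returns (weights, context).
    """
    # One score per source position: dot product of query and state.
    scores = [sum(q * h for q, h in zip(query, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted average of the encoder states.
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context
```

For example, `dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` puts more weight on the first source position, because its state is better aligned with the query.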
Other benchmark results (English-Vietnamese, German-English) can be found in the tutorial.
In addition, this tutorial showcases the fully dynamic seq2seq API (released with TensorFlow 1.2), designed to make building seq2seq models clean and easy:
- Easily read and preprocess dynamically sized input sequences using the new input pipeline in tf.contrib.data.
- Use padded batching and sequence length bucketing to improve training and inference speeds.
- Train seq2seq models using popular architectures and training schedules, including several types of attention and scheduled sampling.
- Perform inference in seq2seq models using in-graph beam search.
- Optimize seq2seq models for multi-GPU settings.
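The padded batching and length bucketing mentioned in the list above can be illustrated without TensorFlow. The sketch below is plain Python with hypothetical names (`bucket_and_pad`, `PAD_ID`), not the tutorial's input pipeline. It groups sequences of similar length into buckets, then pads each batch only up to the longest sequence in that batch, which is why bucketing reduces wasted computation on padding.

```python
from collections import defaultdict

PAD_ID = 0  # hypothetical padding token id, chosen for illustration


def bucket_and_pad(sequences, bucket_width=5, batch_size=2, pad_id=PAD_ID):
    """Group token-id sequences by length, then pad each batch.

    Sequences whose lengths fall in the same range of `bucket_width`
    share a bucket, so each batch needs little padding.
    Returns a list of (padded_batch, true_lengths) pairs.
    """
    buckets = defaultdict(list)
    for seq in sequences:
        buckets[len(seq) // bucket_width].append(list(seq))

    batches = []
    for key in sorted(buckets):
        group = buckets[key]
        for start in range(0, len(group), batch_size):
            chunk = group[start:start + batch_size]
            max_len = max(len(s) for s in chunk)
            # Pad every sequence to the longest one in this batch only.
            padded = [s + [pad_id] * (max_len - len(s)) for s in chunk]
            lengths = [len(s) for s in chunk]
            batches.append((padded, lengths))
    return batches
```

The true lengths are returned alongside the padded batch so that a model can ignore the padded positions when computing the loss.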
We hope this will help spur the creation of, and experimentation with, many new NMT models by the research community. To get started on your own research, check out the tutorial on GitHub!
See more >>
Today’s post is about improvements in terminology support for interpreters through computer-assisted interpreting (CAI) tools. InterpretBank is one such tool; it was developed as part of a PhD project and uses IATE as one of its terminology sources. Our guest writer Claudio Fantinuoli (Johannes Gutenberg University Mainz in Germersheim) tells us all about it.
InterpretBank is a computer-assisted interpreting (CAI) tool originally developed at the Johannes Gutenberg Universität Mainz in Germersheim as part of a PhD research project. The objective of this project was to create a computer program to support professional interpreters during all phases of the interpreting workflow, from preparation to the act of interpreting. With the aim of improving interpreting quality especially in the context of specialised events, InterpretBank focuses on the creation and management of specialised glossaries as well as on facilitating terminology memorization and retrieval during interpretation.
InterpretBank implements the results of several years of research and feedback from a growing number of users. The tool integrates automatic translation and high-quality terminology databases, such as IATE, to reduce the effort and time involved in writing glossaries. During preparation, a memorization utility helps interpreters learn the event-related terms. While interpreting, intelligent algorithms allow the user to access relevant terminology quickly and without distracting the interpreter from his or her primary activity – translating between languages. Several independent studies have confirmed that the tool can contribute to increasing the overall interpreting quality. We have now taken a further step forward by integrating speech recognition.
Interest in the emerging field of CAI tools is growing: InterpretBank is taught in a large number of universities and in dedicated seminars held by professional associations around the world. InterpretBank is the tool of choice not only for many professionals but also for empirical research in the field of translation technology. In Germersheim, for example, an ongoing PhD project is investigating cognitive load in simultaneous interpreting with the support of terminology management tools.
More information about the tool is available at www.interpretbank.com
Looking for a good book to read this summer? Something both insightful and entertaining?
We have an exclusive discount for readers of ProZ.com news on The Ultimate Guide to Becoming a Successful Freelance Translator! Act now to get it at the price of a one-day sunbed rental – the deal is valid till the end of July. Just go to http://translatorsbook.com and apply the code “SummerDiscount” during checkout to get a 50% discount.
Topics within the book include:
• Skills and qualifications
• Finding and winning new clients
• Marketing tips for freelance translators
• How to handle some of the trickiest translation problems
There’s also a wealth of information beyond these subjects, including a comprehensive list of resources for translators.
If you’re interested in learning more, visit www.translatorsbook.com or our Amazon product page to see what The Ultimate Guide To Becoming A Successful Freelance Translator can do for you.
Once you’ve read the book, please do let us know your feedback.