Corpus Analysis
Thread poster: Juan Martín Fernández Rowda

Juan Martín Fernández Rowda
United States
English to Spanish
Aug 23, 2016

As you probably know, Statistical Machine Translation (SMT) needs large amounts of text data to produce good translations: we are talking about millions of words. At the same time, SMT can translate millions of words relatively quickly (VERY quickly compared with human translators). In this scenario, and speaking mainly from a linguist’s perspective, the challenge is how to make sense of all these millions of words. What do you do if you want to find out whether a corpus is good enough to be used in your MT system? If you realize a corpus is not good, how do you know what to improve? How can you find out what the main topics covered in your corpus are?

It’s unrealistic to try to understand your corpus by reading every single line or word.
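As a rough illustration of how software can stand in for reading every line, here is a minimal Python sketch of one of the simplest corpus-analysis techniques: a frequency list of content words, which gives a first idea of the main topics in a corpus. The tiny stopword list, the function name `top_terms`, and the sample sentences are illustrative assumptions, not taken from the linked post.

```python
import re
from collections import Counter

# Illustrative stopword list only (assumption): a real analysis would use a
# fuller, language-specific list for each side of the corpus.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "is", "are", "was", "be", "it", "this", "that", "with", "your", "you"}

def top_terms(lines, n=10):
    """Return the n most frequent non-stopword tokens across all lines."""
    counts = Counter()
    for line in lines:
        # Lowercase, then take runs of word characters; in Python 3, \w also
        # matches accented letters such as "ñ" or "é".
        tokens = re.findall(r"\w+", line.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and not t.isdigit())
    # Ties keep the order in which the words were first encountered.
    return counts.most_common(n)

corpus = [
    "Click the Save button to save your settings.",
    "The printer driver could not be installed.",
    "Restart the printer and click Save again.",
]
print(top_terms(corpus, 3))  # [('save', 3), ('click', 2), ('printer', 2)]
```

On a real corpus of millions of words, the top of such a list usually reveals the dominant domain at a glance (here, software and printer support), and comparing lists from two corpora can suggest whether one is a good fit for a given MT system.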

Corpus analysis can help you find answers to these questions. It can also help you understand how your MT system is performing and why. It can even help you understand how your post-editors are performing.
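On the point about post-editors: one common way to quantify how much post-editors change MT output is a word-level edit distance between the raw MT output and the post-edited text. The sketch below is a simplified measure in the spirit of TER (Translation Edit Rate), but without TER's block shifts; the function names and the normalization by the length of the post-edited text are assumptions for illustration, not the author's method.

```python
def word_edit_distance(hyp, ref):
    """Minimum number of word insertions, deletions and substitutions
    needed to turn hyp into ref (Levenshtein distance at word level)."""
    a, b = hyp.split(), ref.split()
    # prev[j] = distance between the first i-1 words of a and the first j words of b
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr[j] = min(prev[j] + 1,         # delete wa
                          curr[j - 1] + 1,     # insert wb
                          prev[j - 1] + cost)  # keep or substitute
        prev = curr
    return prev[-1]

def post_edit_rate(mt_output, post_edited):
    """Edits per word of the post-edited text; 0.0 means the MT output was kept as-is."""
    ref_len = len(post_edited.split())
    return word_edit_distance(mt_output, post_edited) / max(ref_len, 1)

mt = "the printer could not be install"
pe = "the printer could not be installed"
print(round(post_edit_rate(mt, pe), 3))  # 0.167
```

Averaged per document, per domain, or per post-editor, a rate like this may point to content where the MT system struggles (consistently high rates) or to unusually light or heavy editing; interpreting the numbers still calls for a linguist's judgment.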

In this post, I cover some analysis techniques and tips that I believe are useful and effective for understanding your corpus better: ín-fernández-rowda?trk=pulse_spock-articles