Corpus Analysis
Thread poster: Juan Martín Fernández Rowda

Aug 23, 2016

As you probably know, Statistical Machine Translation (SMT) needs very large amounts of text data to produce good translations: we are talking about millions of words. At the same time, SMT can translate millions of words relatively fast (VERY fast, compared to human translators). In this scenario, and speaking mainly from a linguist’s perspective, the challenge is how to make sense of all these millions of words. What do you do if you want to find out whether a corpus is good enough to be used in your MT system? How do you know what to improve if you realize a corpus is not good? How can you tell what the main topics covered in your corpus are?

It’s unrealistic to try to understand your corpus by reading every single line or word.

Corpus analysis can help you find answers to these questions. It can also help you understand how your MT system is performing and why. It can even help you understand how your post-editors are performing.
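
To give a rough idea of what a first pass could look like, here is a minimal Python sketch that builds a word-frequency profile of a corpus. It is only an illustration, not necessarily one of the techniques from the linked post: the function name corpus_profile, the simple regex tokenizer, and the tiny stopword list are all assumptions made for this example, and a real corpus would need a proper tokenizer and a stopword list for its language.

```python
import re
from collections import Counter

# Assumption: a naive tokenizer that treats any run of word characters as a token.
TOKEN_RE = re.compile(r"\w+")


def corpus_profile(lines, top_n=10, stopwords=frozenset()):
    """Hypothetical helper: basic size and frequency statistics for text lines.

    Tokens found in `stopwords` are skipped, so the token and type counts
    describe only the remaining (content) words.
    """
    counts = Counter()
    n_lines = 0
    for line in lines:
        n_lines += 1
        counts.update(
            tok for tok in TOKEN_RE.findall(line.lower()) if tok not in stopwords
        )
    n_tokens = sum(counts.values())
    n_types = len(counts)
    return {
        "lines": n_lines,
        "tokens": n_tokens,
        "types": n_types,
        # A low ratio suggests repetitive text; a high one suggests varied vocabulary.
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        # The most frequent content words hint at the main topics of the corpus.
        "top_words": counts.most_common(top_n),
    }


# Made-up sample sentences, used only to show the call.
sample = [
    "The printer driver must be installed before the printer is used.",
    "Install the printer driver from the support page.",
]
profile = corpus_profile(sample, top_n=3, stopwords={"the", "be", "is", "from"})
print(profile)
```

For a real corpus you would pass an open file instead of a list, since the function reads its input one line at a time and never loads the whole text into memory.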

In this post, I cover some analysis techniques and tips that I believe are useful and effective for understanding your corpus better: ín-fernández-rowda?trk=pulse_spock-articles

