Corpus Analysis
Thread poster: Juan Martín Fernández Rowda

Aug 23, 2016

As you probably know, Statistical Machine Translation (SMT) needs large amounts of text data to produce good translations: we are talking about millions of words. At the same time, SMT can translate millions of words relatively fast (VERY fast, compared to human translators). In this scenario, and speaking mainly from a linguist’s perspective, the challenge is: how can one make sense of all these millions of words? What do you do if you want to find out whether a corpus is good enough to be used in your MT system? How do you know what to improve if you realize a corpus is not good? How can you tell which main topics your corpus covers?

It’s unrealistic to try to understand your corpus by reading every single line or word.

Corpus analysis can help you find answers to these questions. It can also help you understand how your MT system is performing and why. It can even help you understand how your post-editors are performing.
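
As a rough illustration of the kind of answer a very simple analysis can give, here is a minimal Python sketch that counts lines, tokens and distinct word types, and lists the most frequent content words as a crude hint at the main topics. It is not necessarily one of the techniques covered in the linked post. The function name `corpus_profile`, the tiny stopword list and the sample sentences are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real analysis would use a much longer, language-specific list.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is",
             "for", "on", "with", "this", "your"}

def corpus_profile(lines, top_n=5):
    """Return basic size statistics and the most frequent content words."""
    tokens = []
    for line in lines:
        # Lowercase and split on non-word characters: a naive tokenizer.
        tokens.extend(re.findall(r"\w+", line.lower()))
    # Count only words that are not in the stopword list.
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return {
        "lines": len(lines),
        "tokens": len(tokens),
        "types": len(set(tokens)),  # number of distinct word forms
        "top_words": counts.most_common(top_n),
    }

# Made-up sample segments standing in for one side of a corpus.
sample = [
    "Free shipping on this camera",
    "Digital camera with zoom lens",
    "Camera lens for digital cameras",
]
profile = corpus_profile(sample, top_n=2)
print(profile)
```

On a real corpus you would read the lines from a file instead of a list. Even at this level, a frequency list quickly shows whether the text is about what you expect it to be about.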

In this post, I cover some analysis techniques and tips that I believe are useful and effective for understanding your corpus better:

https://www.linkedin.com/pulse/corpus-analysis-part-i-juan-martín-fernández-rowda?trk=pulse_spock-articles

