Help concerning bilingual aligned texts (Corpus)
Thread poster: GabLuz
Local time: 16:48
English to Portuguese
Jul 17, 2007

Hi, folks.

I have about 60 GB of plain-text (.txt) files to align (English to Portuguese, Portuguese to English, Japanese to English, Japanese to Portuguese), and I'd like to index all of this material, but I don't know how to proceed.
I tried Google Desktop, but it's frustrating because it doesn't show exactly what I need in the way I need it.
The best solution I could think of was a corpus system such as COMPARA, but I want to build my own corpus database and use it offline. I've been trying my best, but I can't find a useful tool.

I really want something like this:

Here I just have to type the text and all results are displayed. That's quick and simple!

I have already tried:
- Google Desktop (it works, but it requires manual searching);
- mkAlign (almost there, but it needs a lot of improvement);
- and many others...

My computer is fast enough for my needs.

It is a desktop PC with the following specs:
AMD Athlon 3200+, 512 MB RAM, 250 GB 7200 RPM HDD.

If anybody knows of a tool like this, please let me know.
I hope I'm not breaking any forum rules.

Vito Smolej
Local time: 21:48
Member (2004)
English to Slovenian
+ ...
two issues in one mail Jul 18, 2007

o aligning 60 GB: my maximum so far has been a few MB, so I can't really tell...
o mining the aligned material (aka using the translation memory).

The important thing is to have the result (whatever its size) in TMX (i.e. Translation Memory eXchange) format; that way the mining is decoupled from the first part of the job, the alignment.
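To give a rough idea of what "having the result in TMX" means in practice, here is a minimal sketch in Python. It assumes the aligner has already produced plain (source, target) sentence pairs; the function name pairs_to_tmx and the header values ("example-script", "plain-text") are made up for illustration and are not taken from any particular tool.

```python
import xml.etree.ElementTree as ET

# Attribute key that ElementTree serializes as xml:lang
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, src_lang, tgt_lang):
    """Build a TMX 1.4 document (as a string) from aligned sentence pairs.

    pairs is an iterable of (source_text, target_text) tuples.
    The header values below are placeholders chosen for this example.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",               # placeholder original format
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        # One translation unit (tu) per aligned pair,
        # with one variant (tuv) per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Calling pairs_to_tmx([("Good morning.", "Bom dia.")], "en", "pt") returns the TMX text, which can then be written to a .tmx file. For tens of GB you would not want to build the whole tree in memory like this; writing one tu at a time to several smaller files is probably the safer route.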

In any case, I have yet to hear of a TM application that can handle 60 GB of material. There's something in Canada/Nepean/Montreal (?) that goes in this direction, but I honestly can't remember the name.

You can start with OmegaT and the alignment tools that accompany it (see Wikipedia for OmegaT).

+Tools could also be good for aligning.

Keep us posted.


