Help concerning bilingual aligned texts (Corpus)
Thread poster: GabLuz
GabLuz
Local time: 23:29
English to Portuguese
Jul 17, 2007

Hi, folks.

I have about 60 GB of plain-text files to be aligned (English to Portuguese, Portuguese to English, Japanese to English, Japanese to Portuguese), and I'd really like to index all of this material, but I don't know how to proceed!
I tried Google Desktop, but it's really annoying because it doesn't show exactly what I need in the way I need it.
The best solution I could think of was a corpus system such as COMPARA, but I want to build my own corpus database and use it offline. I'm trying my best, but I can't find a useful tool.

I really want something like this: http://www.linguateca.pt/COMPARA/Welcome.html

Here I just have to type the text and all results are displayed. That's quick and simple!

I already tried:
- Google Desktop (it works but it requires manual searching);
- mkAlign (almost there, but it really needs A LOT of improvement!);
And many others...

And yes, my computer is fast enough for my needs.

I have a desktop PC with these specs:
AMD Athlon 3200+, 512 MB RAM, 250 GB 7200 RPM HDD.

If anybody knows of a similar tool, please let me know.
I hope I'm not breaking any forum rules.



Vito Smolej
Germany
Local time: 04:29
Member (2004)
English to Slovenian
+ ...
two issues in one mail Jul 18, 2007

o aligning 60 GB - my maximum so far has been a few MB, so I can't really tell...
o mining the aligned material (aka using the translation memory).

The important thing is to have the result (whatever its size) in TMX (i.e. Translation Memory eXchange) format; that way the mining is decoupled from the alignment, the first part of the job.
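To illustrate what "having the result in TMX" could look like, here is a minimal sketch that writes already-aligned sentence pairs into a TMX 1.4 document using only the Python standard library. The function name `build_tmx`, the tool name in the header and the sample sentences are all made up for the example; a real aligner would supply the pairs.

```python
import xml.etree.ElementTree as ET


def build_tmx(pairs, src_lang="en", tgt_lang="pt"):
    """Build a TMX 1.4 tree from (source, target) sentence pairs.

    `pairs` is assumed to be already aligned; this function only
    serializes, it does not align anything.
    """
    tmx = ET.Element("tmx", version="1.4")
    # Header attributes required by TMX 1.4; the values are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")      # one translation unit per pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


if __name__ == "__main__":
    tree = build_tmx([("Good morning.", "Bom dia.")])
    tree.write("sample.tmx", encoding="utf-8", xml_declaration=True)
```

For a collection of this size it would probably make sense to write one TMX file per language pair (or per source document) rather than a single huge file, since this sketch builds the whole tree in memory.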

In any case, I have yet to hear of a TM application that can handle 60 GB of material. There's something in Canada/Nepean/Montreal (?) that goes in this direction, but I honestly can't remember the name.
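As a rough idea of what "mining" aligned material offline could look like outside a TM application, here is a small sketch that stores aligned segments in an SQLite database and looks up a phrase together with its translation. This is only an assumption about one possible approach, not a tool mentioned in this thread; the table layout and the function names (`open_corpus`, `add_pairs`, `search`) are invented for the example.

```python
import sqlite3


def open_corpus(path=":memory:"):
    """Open (or create) the segment database; ':memory:' is for testing only."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS segments ("
        "src_lang TEXT, tgt_lang TEXT, src TEXT, tgt TEXT)"
    )
    return conn


def add_pairs(conn, src_lang, tgt_lang, pairs):
    """Insert already-aligned (source, target) pairs for one language pair."""
    conn.executemany(
        "INSERT INTO segments VALUES (?, ?, ?, ?)",
        [(src_lang, tgt_lang, s, t) for s, t in pairs],
    )
    conn.commit()


def search(conn, phrase, src_lang, tgt_lang):
    """Return (source, target) rows whose source text contains `phrase`."""
    # Escape LIKE wildcards so the phrase is matched literally.
    escaped = (
        phrase.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT src, tgt FROM segments "
        "WHERE src_lang = ? AND tgt_lang = ? AND src LIKE ? ESCAPE '\\'",
        (src_lang, tgt_lang, "%" + escaped + "%"),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_corpus()
    add_pairs(conn, "en", "pt", [("Good morning.", "Bom dia."),
                                 ("Good night.", "Boa noite.")])
    for src, tgt in search(conn, "morning", "en", "pt"):
        print(src, "|", tgt)
```

Note that `LIKE` scans every row, so this would be slow on tens of gigabytes; a proper full-text index would be needed at that scale. The sketch only shows the shape of the lookup.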

You can start with OmegaT and the aligning tools that accompany it (see Wikipedia for OmegaT).

+Tools could also be good for aligning:
http://www.global-tm.net/index.php?whichpage=plustools&lang=engb

Keep us posted

regards

Vito

