Wordfast - Getting started, building TM from old translations
Thread poster: twintrad
twintrad
French to English
Jul 23, 2008

Hello
I work in-house at a company and now need a CAT tool. I am looking into Wordfast and was wondering whether there is a way to build a TM from old translations (source/target versions)?
Thanks in advance
Melinda


 
Gerard de Noord  Identity Verified
France
Local time: 02:19
Member (2003)
English to Dutch
+ ...
You'll need Wordfast's free little helper Jul 23, 2008

Have a look at +Tools:

http://www.wordfast.net/index.php?whichpage=plustools&lang=engb

http://www.wordfast.net/index.php?whichpage=knowledge&Task=view&questId=67&catId=15

Regards,
Gerard


 
R.M. Susil Premaratne  Identity Verified
Australia
Local time: 09:49
Member (2007)
Sinhala (Sinhalese) to English
+ ...
Wordfast query Jul 23, 2008

I suggest that you visit the website www.wordfast.com and forward your question.

They will definitely give a satisfactory reply.


 
twintrad
French to English
TOPIC STARTER
Wordfast, getting started... Jul 24, 2008

Thanks both of you!
Melinda


 
FarkasAndras  Identity Verified
Local time: 02:19
English to Hungarian
+ ...
The best I know of... Jul 24, 2008

... is hunalign. I think it is the only aligner that reliably detects things like when a paragraph is missing in one of the texts. (Alignment is the "official" name of what you need to get done here.)

If you have a lot of material to align, use it. It'll spare you an eternity. There is a reason why teams that build parallel corpora/mega-TMs (Europarl & Acquis) invariably use tools like hunalign and NOT WinAlign or PlusTools. I haven't used PlusTools; it could be reasonably good for all I know. WinAlign isn't. But even if the PlusTools aligner works well, it will only provide an alignment based on the Wordfast segmentation of the documents. If the segmentation of the two documents doesn't match with great reliability (and it won't), then you'll have an awful lot of correcting to do, because if one segment is off somewhere, everything after it will be out of alignment until you correct it. Hunalign may misalign segments too, but it automatically recovers from the error further down the line.
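
To make the drift problem concrete, here is a toy sketch in Python. It is not how PlusTools, WinAlign or hunalign work internally; the sentence labels and the `pair_by_position` and `count_wrong` helpers are made up for illustration. It only shows what happens when segments are paired purely by position and one translation is missing.

```python
# Toy data (hypothetical): the target lacks the translation of the second sentence.
source = ["S1", "S2", "S3", "S4", "S5"]
target = ["T1", "T3", "T4", "T5"]


def pair_by_position(src, tgt):
    """Pair segments strictly by index, with no attempt to detect gaps."""
    return list(zip(src, tgt))


def count_wrong(pairs):
    """Count pairs whose numeric labels differ (only meaningful for this toy data)."""
    return sum(1 for s, t in pairs if s[1:] != t[1:])


pairs = pair_by_position(source, target)
for s, t in pairs:
    print(s, "<->", t)
print("wrong pairs:", count_wrong(pairs))
```

With this toy data only the first pair is right; the three pairs after the gap are all shifted by one, which is the kind of error you would then have to fix by hand.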

Google hunalign, read the description on the site, and, for preprocessing, use the sentence boundary detector from here: http://www.statmt.org/europarl/v3/tools.tgz

It's command-line, so it won't do fancy graphics... but then I prefer fancy performance to fancy graphics.

Basic workflow description: you convert your files to txt, run the europarl tool to chop them into sentences, feed them to hunalign, copy the output to Excel, make corrections, delete unnecessary bits, insert tags to make a standard TMX file (or Wordfast translation memory) out of it, copy to Notepad, save and use in WF.
All of this can be automated: PlusTools for the txt conversion, the command line for merging txt files, and lots of search and replace and copy/paste throughout.
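
The "insert tags to make a TMX file" step can also be scripted instead of done by search and replace. Below is a minimal Python sketch, not a finished tool. It assumes hunalign was run in text mode, so that every output line is tab-separated (source, target, score), and that merged sentences are joined with " ~~~ ". The function names, the `min_score` filter, the header values and the fr/en language codes are placeholders of mine, not anything Wordfast or hunalign prescribes. You would still want to review the pairs before using the TM.

```python
import xml.etree.ElementTree as ET

# ElementTree's spelling of the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Assumed marker between sentences that hunalign merged into one segment.
MERGE_MARK = " ~~~ "


def parse_aligned(text, min_score=None):
    """Turn tab-separated aligner output into a list of (source, target) pairs."""
    pairs = []
    for line in text.splitlines():
        cols = line.split("\t")
        if len(cols) < 2:
            continue
        src = cols[0].replace(MERGE_MARK, " ").strip()
        tgt = cols[1].replace(MERGE_MARK, " ").strip()
        if not src or not tgt:
            continue  # one side is empty: nothing usable for a TM
        if min_score is not None and len(cols) > 2:
            try:
                if float(cols[2]) < min_score:
                    continue  # drop low-confidence pairs (threshold is arbitrary)
            except ValueError:
                pass  # unreadable score: keep the pair for manual review
        pairs.append((src, tgt))
    return pairs


def build_tmx(pairs, src_lang, tgt_lang):
    """Wrap the pairs in a bare-bones TMX 1.4 document and return it as a string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "hunalign-to-tmx-sketch",  # placeholder values
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "hunalign-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, seg_text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = seg_text
    return ET.tostring(tmx, encoding="unicode")


# Tiny demo with made-up lines in the assumed format.
demo = "Bonjour.\tHello.\t0.9\nMerci ~~~ beaucoup.\tThanks a lot.\t0.5\n"
print(build_tmx(parse_aligned(demo), "fr", "en"))
```

ElementTree escapes characters such as & and < for you, which is one of the things that tends to go wrong when the tags are inserted by hand.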

Hunalign performs much better if you feed in a bilingual dictionary/glossary.


All of this requires what I consider fairly basic computer skills and some time investment. Nag me with questions if you need to (read manuals and google first).


If people are interested in the whole procedure, I may write up an article about how I did it. Also, if someone has a large amount of material they'd like aligned and no computer skills or time, we may be able to work something out.

[Edited at 2008-07-24 13:11]


 
Milan Condak  Identity Verified
Local time: 02:19
English to Czech
I am using hunalign and +Tools/+Align Jul 24, 2008

FarkasAndras wrote:

If you have a lot of material to align, use it. It'll spare you an eternity. There is a reason why teams that build parallel corpora/megaTMs (europarl & acquis)

Basic workflow description:
you convert your files to txt,
run the europarl tool to chop it into sentences,
feed them to hunalign,
copy the output to excel,
make corrections,
delete unnecessary bits,
and insert tags to make a standard tmx file
(or wordfast translation memory) out of it,
copy to notepad, save and use in WF.

All of this can be automated; plustools for the txt conversion and command line for merging txt's, lots of search and replace and copy/paste all through.

Hunalign performs much better if you feed in a bilingual dictionary/glossary.


My workflow description:

I convert/save my files to/as txt,
sometimes extract them into sentences with Wordfast/Tools/Extract,
and feed them to hunalign (I use a short editable bat file).
I open the output in MS Word (an Excel cell has a limited size), convert the text to a table and delete the 3rd column with the index.
I split the table into 100-page files.
I run PlusTools/+Align and open one short file with a table created with PlusTools to activate the +Align menu; then I open the file for correction and close the short file.
I make corrections (mostly splitting some segments and deleting tildes),
then I create a Wordfast TM with the Create TM button. I merge all created TMs.
-
I tested Hunalign without a bilingual glossary, only with "null.dic", on all EU languages paired with Czech.
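
For those who prefer a script to a bat file, the hunalign call can be sketched in Python as below. This is an assumption-laden sketch: it takes for granted that a `hunalign` executable is on the PATH, that it accepts the `-text` option followed by the dictionary, source and target files in that order, and that it writes the alignment to standard output. The file names and the two helper functions are hypothetical. Check the hunalign documentation for your build before relying on it.

```python
import subprocess


def hunalign_command(dictionary, source, target, exe="hunalign"):
    """Build the argument list for one hunalign run (assumed argument order)."""
    return [exe, "-text", str(dictionary), str(source), str(target)]


def run_hunalign(dictionary, source, target, output, exe="hunalign"):
    """Run hunalign and save whatever it prints to standard output in `output`."""
    cmd = hunalign_command(dictionary, source, target, exe)
    with open(output, "wb") as out:
        # check=True raises CalledProcessError if hunalign exits with an error.
        subprocess.run(cmd, stdout=out, check=True)


# Example call (hypothetical file names; uncomment once hunalign is installed):
# run_hunalign("null.dic", "en.txt", "cs.txt", "aligned.txt")
print(" ".join(hunalign_command("null.dic", "en.txt", "cs.txt")))
```

Here "null.dic" stands for the empty dictionary mentioned above; swapping in a real bilingual glossary file is the only change needed to test the difference.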

I thank the authors of Hunalign for this free tool.

Milan

[Edited at 2008-07-24 19:09]


 
Milan Condak  Identity Verified
Local time: 02:19
English to Czech
Example of using Hunalign and PlusTools/+Align Aug 2, 2008

Milan Condak wrote:

I tested Hunalign without the bilingual glossary only "null.dic" with Czech.

Milan


Here is an example of alignment: EN text + CS (machine translation)

http://www.condak.net/tools/hunalign2/en/00.html

Milan


 

