Wordfast - Getting started, building TM from old translations
Thread poster: twintrad
twintrad
French to English
Jul 23, 2008

Hello
I work in-house at a company and now need a CAT tool. I was looking into Wordfast and was wondering whether there is a way to build a TM from old translations (source/target versions)?
Thanks in advance
Melinda



Gerard de Noord  Identity Verified
France
Local time: 01:58
Member (2003)
German to Dutch
+ ...
You'll need Wordfast's free little helper Jul 23, 2008

Have a look at +Tools:

http://www.wordfast.net/index.php?whichpage=plustools&lang=engb

http://www.wordfast.net/index.php?whichpage=knowledge&Task=view&questId=67&catId=15

Regards,
Gerard



Susil  Identity Verified
Australia
Local time: 10:28
Member (2007)
Sinhala (Sinhalese) to English
+ ...
Wordfast query Jul 23, 2008

I suggest you visit www.wordfast.com and send them your question.

They will definitely give you a satisfactory reply.


twintrad
French to English
TOPIC STARTER
Wordfast, getting started... Jul 24, 2008

Thanks both of you!
Melinda


FarkasAndras
Local time: 01:58
English to Hungarian
+ ...
The best I know of... Jul 24, 2008

... is hunalign. I think it is the only aligner that reliably detects things like when a paragraph is missing in one of the texts. (Alignment is the "official" name of what you need to get done here.)

If you have a lot of material to align, use it. It'll spare you an eternity. There is a reason why teams that build parallel corpora/mega-TMs (Europarl & Acquis) invariably use tools like hunalign and NOT WinAlign or PlusTools. I haven't used PlusTools; it could be reasonably good for all I know. WinAlign isn't. But even if the PlusTools aligner works well, it will only provide an alignment based on the Wordfast segmentation of the documents. If the segmentation of the two documents doesn't match with great reliability (and it won't), you'll have an awful lot of correcting to do, because if one segment is off somewhere, everything after it will be out of alignment until you correct it. Hunalign may misalign segments too, but it automatically recovers from the error further down the line.
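To make the cascading problem concrete, here is a minimal Python sketch of purely positional pairing. The segment labels and the `pair_by_position` helper are made up for illustration; this is not how any of the tools mentioned work internally.

```python
def pair_by_position(source, target):
    """Naive alignment: pair the i-th source segment with the i-th target segment."""
    return list(zip(source, target))


source = ["S1", "S2", "S3", "S4", "S5"]
# Hypothetical target text in which the translation of S2 is missing.
target = ["T1", "T3", "T4", "T5"]

pairs = pair_by_position(source, target)
# A pair is wrong when the segment numbers no longer match.
wrong = [(s, t) for s, t in pairs if s[1:] != t[1:]]

print(pairs)
print(f"{len(wrong)} of {len(pairs)} pairs are misaligned")
```

One missing segment near the top shifts every pair after it, which is why a positional aligner needs so much manual correction.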

Google hunalign, read the description on the site, and, for preprocessing, use the sentence boundary detector from here: http://www.statmt.org/europarl/v3/tools.tgz

It's command line so it won't do fancy graphics... but then I prefer fancy performance to fancy graphics.

Basic workflow: convert your files to txt, run the Europarl tool to chop them into sentences, feed them to hunalign, copy the output to Excel, make corrections, delete the unnecessary bits, insert tags to turn it into a standard TMX file (or a Wordfast translation memory), copy it to Notepad, save, and use it in WF.
All of this can be automated: PlusTools for the txt conversion, the command line for merging txt files, and lots of search and replace and copy/paste throughout.
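The tag-insertion step could be scripted instead of done by search and replace. The following Python sketch is only an illustration under some assumptions: it assumes hunalign was run so that it writes tab-separated text (source, target, score) per line, that merged sentences are joined with " ~~~ ", and that French to English is the language pair. The function name `hunalign_text_to_tmx` and the header values are invented for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def hunalign_text_to_tmx(lines, src_lang="fr", tgt_lang="en", min_score=0.0):
    """Build a TMX string from lines of the assumed form 'source<TAB>target<TAB>score'."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "hunalign-to-tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "hunalign",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # not a complete source/target/score row
        src, tgt = parts[0].strip(), parts[1].strip()
        try:
            score = float(parts[2])
        except ValueError:
            continue  # score column is not a number
        if not src or not tgt or score < min_score:
            continue  # skip one-sided rows and low-confidence pairs
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            # Assumed separator for sentences that the aligner merged.
            ET.SubElement(tuv, "seg").text = text.replace(" ~~~ ", " ")
    return ET.tostring(tmx, encoding="unicode")
```

The returned string has no XML declaration; add one when writing the file if your tool expects it. Rows that are skipped here still deserve a manual look, so this replaces the tagging step, not the correction step.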

Hunalign performs much better if you feed in a bilingual dictionary/glossary.
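If your glossary lives in a two-column list, a few lines of Python can turn it into a dictionary file. This sketch assumes the hunalign dictionary format is one entry per line written as "target @ source"; that is my understanding of the hunalign documentation, so check the README of your version before relying on it. The `glossary_to_dic` helper is made up for this example.

```python
def glossary_to_dic(rows):
    """Convert (source_term, target_term) pairs into assumed hunalign .dic lines."""
    lines = []
    for source_term, target_term in rows:
        source_term = source_term.strip()
        target_term = target_term.strip()
        if source_term and target_term:
            # Assumed entry layout: target phrase, ' @ ', source phrase.
            lines.append(f"{target_term} @ {source_term}")
    return "".join(line + "\n" for line in lines)


glossary = [("chat", "cat"), ("chien", "dog")]
print(glossary_to_dic(glossary), end="")
```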


All of this requires what I consider fairly basic computer skills and some time investment. Nag me with questions if you need to (read manuals and google first).


If people are interested in the whole procedure I may write up an article about how I did it. Also, if someone has a large amount of material they'd like aligned and no computer skills or time, we may be able to work something out.

[Edited at 2008-07-24 13:11]



Milan Condak  Identity Verified
Local time: 01:58
English to Czech
I am using hunalign and +Tools/+Align Jul 24, 2008

FarkasAndras wrote:

If you have a lot of material to align, use it. It'll spare you an eternity. There is a reason why teams that build parallel corpora/megaTMs (europarl & acquis)

Basic workflow description:
you convert your files to txt,
run the europarl tool to chop it into sentences,
feed them to hunalign,
copy the output to excel,
make corrections,
delete unnecessary bits,
and insert tags to make a standard tmx file
(or wordfast translation memory) out of it,
copy to notepad, save and use in WF.

All of this can be automated; plustools for the txt conversion and command line for merging txt's, lots of search and replace and copy/paste all through.

Hunalign performs much better if you feed in a bilingual dictionary/glossary.


My workflow description:

I save my files as txt,
sometimes split them into sentences with Wordfast/Tools/Extract,
and feed them to hunalign using a short, editable bat file.
I open the output in MS Word (an Excel cell has a limited size), convert the text to a table and delete the third column with the index.
I split the table into files of 100 pages each.
I run PlusTools/+Align and open one short file containing a table created with PlusTools to activate the +Align menu; then I open the file to be corrected and close the short file.
I make corrections (mostly splitting some segments and deleting tildes),
then I create a Wordfast TM with the Create TM button and merge all the created TMs.
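As a rough illustration of the correction step, this Python sketch drops the third column and separates rows that probably need a manual look from rows that look clean. It assumes the hunalign output is tab-separated (source, target, third column) and that merged segments are marked with "~~~"; the `split_for_review` name is invented for this example.

```python
def split_for_review(lines):
    """Return (clean, review) lists of (source, target) pairs, ignoring the third column."""
    clean, review = [], []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        src = parts[0].strip() if len(parts) > 0 else ""
        tgt = parts[1].strip() if len(parts) > 1 else ""
        pair = (src, tgt)
        # Empty sides and the assumed merge marker both call for manual correction.
        if not src or not tgt or "~~~" in src or "~~~" in tgt:
            review.append(pair)
        else:
            clean.append(pair)
    return clean, review
```

Only the `review` list then has to be opened and fixed by hand; the `clean` pairs can go straight into the TM.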
-
I tested hunalign without a bilingual glossary, using only "null.dic", on all EU languages paired with Czech.

I thank the authors of hunalign for this free tool.

Milan

[Edited at 2008-07-24 19:09]



Milan Condak  Identity Verified
Local time: 01:58
English to Czech
Example of using Hunalign and PlusTools/+Align Aug 2, 2008

Milan Condak wrote:

I tested Hunalign without the bilingual glossary only "null.dic" with Czech.

Milan


Here is an example of aligning an EN text with its CS machine translation:

http://www.condak.net/tools/hunalign2/en/00.html

Milan

