Wordfast - Getting started, building TM from old translations Thread poster: twintrad
Hello, I work in-house at a company and now need a CAT tool. I was looking into Wordfast and was wondering whether there is a way to build a TM from old translations (source/target versions). Thanks in advance, Melinda | | | Gerard de Noord France Member (2003) English to Dutch + ... | Wordfast query | Jul 23, 2008 |
I suggest that you visit the website www.wordfast.com and forward your question there. They will definitely give a satisfactory reply. | | | twintrad French to English TOPIC STARTER Wordfast, getting started... | Jul 24, 2008 |
Thanks both of you! Melinda | |
The best I know of... | Jul 24, 2008 |
... is hunalign. I think it is the only aligner that reliably detects things like a paragraph missing from one of the texts. ("Alignment" is the official name for what you need to get done here.) If you have a lot of material to align, use it. It'll spare you an eternity. There is a reason why teams that build parallel corpora/mega-TMs (Europarl and Acquis) invariably use tools like hunalign and NOT WinAlign or PlusTools. I haven't used PlusTools; it could be reasonably good for all I know. WinAlign isn't. But even if the PlusTools aligner works well, it will only provide an alignment based on the Wordfast segmentation of the documents. If the segmentation doesn't match with great reliability - and it won't - then you'll have an awful lot of correcting to do, because if one segment is off somewhere, everything after that will be out of alignment until you correct it. Hunalign may misalign segments, but it automatically corrects the error further down the line. Google hunalign, read the description on the site, and, for preprocessing, use the sentence boundary detector from here: http://www.statmt.org/europarl/v3/tools.tgz It's command line, so it won't do fancy graphics... but then I prefer fancy performance to fancy graphics.
Basic workflow description: you convert your files to txt, run the Europarl tool to chop them into sentences, feed them to hunalign, copy the output to Excel, make corrections, delete unnecessary bits, insert tags to make a standard TMX file (or Wordfast translation memory) out of it, copy it to Notepad, save, and use it in WF. All of this can be automated: PlusTools for the txt conversion and the command line for merging txt files, with lots of search and replace and copy/paste throughout. Hunalign performs much better if you feed in a bilingual dictionary/glossary. All of this requires what I consider fairly basic computer skills and some time investment. Nag me with questions if you need to (read the manuals and google first). If people are interested in the whole procedure, I may write up an article about how I did it. Also, if someone has a large amount of material they'd like aligned and no computer skills or time, we may be able to work something out.
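The "insert tags to make a standard TMX file" step can also be scripted instead of done by search and replace. The following is only a minimal sketch, not the procedure used above: the function name `pairs_to_tmx`, the header values and the language codes are assumptions chosen for illustration, and the input is assumed to be a list of already corrected source/target pairs.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, src_lang, tgt_lang):
    """Build a TMX 1.4 document (as a string) from (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Header values below are illustrative placeholders, not Wordfast settings.
    ET.SubElement(tmx, "header", {
        "creationtool": "alignment-sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        # Skip pairs where one side is empty; they are not usable TM entries.
        if not src.strip() or not tgt.strip():
            continue
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            # ElementTree escapes &, < and > when serializing.
            ET.SubElement(tuv, "seg").text = text.strip()
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    demo = [("Hello & welcome", "Bonjour et bienvenue")]
    print(pairs_to_tmx(demo, "en", "fr"))
```

Building the file with an XML library rather than pasting tags by hand means special characters in the segments are escaped automatically. Whether a given Wordfast version imports the result directly is something to check against its manual.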
[Edited at 2008-07-24 13:11] | | | I am using hunalign and +Tools/+Align | Jul 24, 2008 |
FarkasAndras wrote: If you have a lot of material to align, use it. It'll spare you an eternity. There is a reason why teams that build parallel corpora/megaTMs (europarl & acquis) [...] Basic workflow description: you convert your files to txt, run the europarl tool to chop it into sentences, feed them to hunalign, copy the output to excel, make corrections, delete unnecessary bits, and insert tags to make a standard tmx file (or wordfast translation memory) out of it, copy to notepad, save and use in WF. All of this can be automated; plustools for the txt conversion and command line for merging txt's, lots of search and replace and copy/paste all through. Hunalign performs much better if you feed in a bilingual dictionary/glossary.
My workflow description: I convert/save my files as txt, sometimes extract them into sentences with Wordfast/Tools/Extract, and feed them to hunalign using a short, editable bat file. I open the output in MS Word (an Excel cell has a limited size), convert the text to a table and delete the third column with the index. I split the table into files of 100 pages each. I run PlusTools/+Align and open one short file with a table created with PlusTools to activate the +Align menu, then I open the file for correction and close the short file. I make corrections (mostly splitting some segments and deleting tildes) and create a Wordfast TM with the Create TM button. Finally, I merge all the created TMs. I tested hunalign without a bilingual glossary, only with "null.dic", on all EU languages paired with Czech. I thank the authors of hunalign for this free tool. Milan
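The table clean-up described here (dropping the third column, removing tildes, splitting into smaller batches) could be scripted along these lines. This is a rough sketch under stated assumptions, not Milan's actual bat file or Word procedure: it assumes hunalign's text output has one alignment per line in the form source TAB target TAB score, with merged sentences joined by `~~~`. The function names and the batch size in rows (rather than pages) are made up for illustration.

```python
def _join_merged(text):
    """Replace hunalign's ~~~ sentence separators with single spaces."""
    return " ".join(part.strip() for part in text.split("~~~") if part.strip())


def clean_rows(lines):
    """Split tab-separated alignment lines into usable pairs and leftovers.

    Returns (pairs, unmatched): pairs have text on both sides; unmatched
    rows have one empty side and are kept for manual review, not deleted.
    """
    pairs, unmatched = [], []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue  # not an alignment row
        # Any third column (the score/index) is simply ignored.
        src, tgt = _join_merged(cols[0]), _join_merged(cols[1])
        if src and tgt:
            pairs.append((src, tgt))
        elif src or tgt:
            unmatched.append((src, tgt))
    return pairs, unmatched


def chunks(rows, size):
    """Yield consecutive batches of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


if __name__ == "__main__":
    sample = [
        "Hello.\tBonjour.\t0.9\n",
        "One. ~~~ Two.\tUn, deux.\t0.4\n",
        "Orphan.\t\t-0.3\n",
    ]
    pairs, unmatched = clean_rows(sample)
    print(pairs)
    print(unmatched)
```

Rows with one empty side are returned separately because they usually mark the places that still need a human correction pass.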
[Edited at 2008-07-24 19:09] | | | Example of using Hunalign and PlusTools/+Align | Aug 2, 2008 |
Milan Condak wrote: I tested Hunalign without the bilingual glossary only "null.dic" with Czech. Milan
Here is an example of aligning an EN text with its CS machine translation: http://www.condak.net/tools/hunalign2/en/00.html Milan