How can I pull out internal repetitions and internal fuzzies on WFP
Thread poster: jyuan_us

jyuan_us  Identity Verified
United States
Local time: 15:00
Member (2005)
English to Chinese
Dec 6, 2015

I have been assigned a large document to translate, and I need to split the task among several translators. They will have to work simultaneously because of the time constraints.

More than 50% of the document's word count consists of repetitions, and there is a substantial number of 95%-99% fuzzy matches, too. If I divide the file into, say, 4 pieces of equal length, the repetitions and 95-99% fuzzy matches will be spread over the 4 pieces. In other words, the translators will be translating the same sentences simultaneously.

My question is: is there a way to pull out the segments of repetitions/95-99% fuzzy matches into a separate document? If so, I will have one translator translate this document first, and have the TM ready for the other linguists when they translate their part of the file.

I tried to figure this out on the PM panel in Wordfast Pro, and the only relevant tabs I can find there are "Exact Freq" and "Populate Freq". I used a 400-segment document (which doesn't contain a lot of repetitions) to test "Exact Freq", but the resulting document contained 330 segments. So I guess this is not the way to do it.

Can you kindly explain what to do in this situation? Any solution will be highly appreciated. Thanks!


 

Dušan Ján Hlísta  Identity Verified
Slovakia
Local time: 21:00
English to Slovak
I wonder Dec 6, 2015

why you accepted the work from such a "stupid" contractor, who doesn’t know that everything needs its own time? During the commie rat régime here in the CSSR we had the following proverb: "We do the impossible immediately, miracles within three days". The same stupid thinking applies to your contractor. Or else "Crazy man crazy", Bill Haley....

[Edited at 2015-12-06 11:44 GMT]


 

Andrea Bauer  Identity Verified
Italy
Local time: 21:00
Member (2011)
Italian to German
Extracting frequents Dec 6, 2015

Maybe this link can help: http://www.wordfast.com/WFP2/PM_Using_PM_plug-in/Extracting_frequents.htm

 

Dominique Pivard  Identity Verified
Local time: 22:00
Finnish to French
Not possible Dec 6, 2015

jyuan_us wrote:
I tried to figure this out on the PM panel on WordFast Pro, and the only relevant tabs I can find there are "Exact Freq" and "Populate Freq".

These would be your only "weapons" in the WFP arsenal. They wouldn’t work with fuzzies (whether from the TM or internal). I’m not aware of any other tool that would be able to extract internal fuzzies.

Your best bet would be to have all translators share the same TM in real-time, so they would get fuzzy matches from segments translated by other translators earlier on. This wouldn’t solve the issue of how to pay each translator in a fair way: for that, you would need to collect the TXML files from all of them and (manually) work out from them how each translator's segments split across the match rates.
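
The manual tally could be scripted once the match scores are out of the TXML files. Below is a minimal sketch in Python, assuming you have already pulled one (translator, score, word count) row per segment from each file by whatever means you prefer. The band limits, labels and function names are made up for illustration; they are not WFP's own categories.

```python
from collections import defaultdict

# Hypothetical match bands (lower bound inclusive); adjust to your own rate sheet.
BANDS = [(100, "100%"), (95, "95-99%"), (75, "75-94%"), (0, "no match")]


def band_for(score):
    """Return the label of the first band whose lower bound the score reaches."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    return BANDS[-1][1]


def tally(rows):
    """rows: iterable of (translator, score, word_count) tuples.

    Returns {translator: {band_label: total_words}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for translator, score, words in rows:
        totals[translator][band_for(score)] += words
    return {name: dict(bands) for name, bands in totals.items()}
```

Feeding it every translator's rows gives one word count per band per person, which is the figure you would need for invoicing.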


 

Samuel Murray  Identity Verified
Netherlands
Local time: 21:00
Member (2006)
English to Afrikaans
Just a thought Dec 6, 2015

jyuan_us wrote:
Is there a way to pull out the segments of repetitions/95-99% fuzzy matches into a separate document?


I haven't tried any of this.

For only exact matches and external fuzzies:

Pre-translate a copy of the file against the TM, and then use WFP to create a bilingual export of the file (PM Perspective > Bilingual Export). Open that export in MS Word, and you should be able to sort the table by score and delete the rows for segments that had no match. You should then be able to copy the source column (minus the non-matching segments that you removed) into a new TXML file, which the translator can translate (to create a TM that already contains those translations).
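
If sorting and deleting rows in Word gets tedious, the same filtering can be done with a few lines of Python. This is only a sketch under assumptions: it supposes you have saved the export table as tab-separated text with a header row, and the column names "score" and "source" are invented here, so check what your export actually contains.

```python
import csv
import io


def high_matches(tsv_text, threshold=95):
    """Return the source segments whose score is at or above the threshold.

    tsv_text is tab-separated text with (hypothetical) columns
    "score" and "source"; an empty score is treated as no match.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    keep = []
    for row in reader:
        # Accept "96" as well as "96%"; an empty cell counts as 0.
        score = int(row["score"].rstrip("%") or 0)
        if score >= threshold:
            keep.append(row["source"])
    return keep
```

The returned list is the text you would paste into the new file for the first translator.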

For internal fuzzies:

Copy the source column from a bilingual export of the file into a new MS Word file (either all segments or only the ones that had no matches in the previous step), and open it in WFP. Start with a blank TM, and set WFP to copy source to target if there is no match, and to automatically insert the best fuzzy match. Then run through the segments one by one (this is where an AutoIt script can be useful, if you want to get up and make some coffee while the script runs through the document). Then do a bilingual export again. Again the match percentages will be in one column, and you should be able to remove the segments that had no matches.

I'm no expert at WFP, but I think you need to set the following two options in Edit > Preferences: Translation Memory > Write unedited fuzzy and exact matches to a TM, and Copy source on no match in editor. While you're at it, lower the match threshold to 50% (you can decide afterwards what threshold you want to give your translator). You'll also have to use Alt+Insert on the first segment to get the ball rolling; after that you can just press Alt+Down. WFP will ask for confirmation once if you try to accept an unedited fuzzy match.
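
The same idea (compare each segment against everything that came before it) can be tried outside WFP. Here is a rough sketch in Python using the standard difflib module. Note the assumptions: the similarity is difflib's word-level ratio, which is not WFP's match percentage, so the numbers will differ from what WFP reports, and the function name and threshold are my own choices.

```python
from difflib import SequenceMatcher


def internal_fuzzies(segments, threshold=0.75):
    """For each segment, find the best-matching EARLIER segment.

    Returns a list of (index, best_earlier_index, ratio) tuples for
    segments whose best ratio reaches the threshold. Indexes are 0-based.
    A ratio of 1.0 means an internal repetition.
    """
    hits = []
    for i, seg in enumerate(segments):
        best_j, best = None, 0.0
        for j in range(i):
            # Compare word lists rather than characters.
            ratio = SequenceMatcher(None, segments[j].split(), seg.split()).ratio()
            if ratio > best:
                best_j, best = j, ratio
        if best_j is not None and best >= threshold:
            hits.append((i, best_j, round(best, 2)))
    return hits
```

Every segment is compared with every earlier one, so this gets slow on very large files, but it needs no clicking through segments.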

How many files do you have?

Samuel



[Edited at 2015-12-06 16:39 GMT]


 

B D Finch  Identity Verified
France
Local time: 21:00
Member (2006)
French to English
Only 100% matches should be extracted Dec 7, 2015

An apparently very small difference between source segments can sometimes require very different translations in the target segments. If you extract fuzzies, you risk ending up with a bad translation. The idea of translators sharing a real-time TM is also a bad one, because the first translator of a segment might not be the one who produces the best translation. I simply do not accept TMs produced by somebody else as anything other than reference material, because I take responsibility for my own translation, not for other people's. Also, remember that a real-time TM has not yet incorporated proofreading changes and will almost certainly need revision. By all means share glossaries between translators (so long as they are free to criticise and amend those glossaries), but not TMs.


[Edited at 2015-12-07 17:13 GMT]

[Edited at 2015-12-07 17:14 GMT]


 

Samuel Murray  Identity Verified
Netherlands
Local time: 21:00
Member (2006)
English to Afrikaans
@Finch Dec 7, 2015

B D Finch wrote:
Only 100% matches should be extracted.


How would that help maintain consistency in the rest of the translations? Presumably the 100% matches all have a translation in the TM already, so all translators would have access to those segments anyway. Extracting those segments and giving them to one translator will not help.

An apparently very small difference between source segments can sometimes require very different translations in the target segments.


That is true, but usually a very small difference in the source text means that the one segment's translation can be re-used for the most part to translate the other segment.

If you extract fuzzies, you risk ending up with a bad translation.


I'm sorry, I have no idea what you mean. Extracting fuzzies will likely result in a better translation, if the translation is done by more than one person.

For example, if you have these six segments that you want to split between two translators:

1. The rain in Spain falls mainly on the plains.
2. The rain in Germany falls mainly in the forests.
3. In France there is also sunshine on the plains.
4. In the forests, the weather can be nice.
5. The rain in France falls mainly on the plains.
6. The rain in Belgium falls mainly in the forests.

Notice how segments 1, 2, 5 and 6 are all fuzzy matches of each other.

If you were to split the segments down the middle (segments 1, 2, and 3 to one translator, and segments 4, 5 and 6 to the other), then the first two and the last two segments may end up having divergent translations because they are translated by two different translators.

But... if you were to extract the fuzzies (namely, segments 1, 2, 5 and 6) and give them to one translator, and give the remainder of the text (segments 3 and 4) to the other, you'll have a far more consistent translation as a whole.
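
The six-segment example can be checked mechanically. The sketch below is only an illustration: it uses Python's difflib word-level ratio as a stand-in for a CAT tool's fuzzy score (the two are not the same), and the 0.6 threshold and function names are assumptions chosen for this example.

```python
from difflib import SequenceMatcher

SEGMENTS = [
    "The rain in Spain falls mainly on the plains.",
    "The rain in Germany falls mainly in the forests.",
    "In France there is also sunshine on the plains.",
    "In the forests, the weather can be nice.",
    "The rain in France falls mainly on the plains.",
    "The rain in Belgium falls mainly in the forests.",
]


def similarity(a, b):
    """Word-level difflib ratio between two segments (0.0 to 1.0)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def split_for_consistency(segments, threshold=0.6):
    """Return (fuzzy, rest) as lists of 1-based segment numbers.

    A segment goes into 'fuzzy' if at least one other segment in the
    text reaches the threshold against it; otherwise into 'rest'.
    """
    fuzzy, rest = [], []
    for i, seg in enumerate(segments):
        has_match = any(
            similarity(seg, other) >= threshold
            for j, other in enumerate(segments)
            if j != i
        )
        (fuzzy if has_match else rest).append(i + 1)
    return fuzzy, rest


fuzzy, rest = split_for_consistency(SEGMENTS)
print("One translator:", fuzzy)
print("The other:", rest)
```

With these settings, segments 1, 2, 5 and 6 end up in the first list and segments 3 and 4 in the second, which is the split described above.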


 

jyuan_us  Identity Verified
United States
Local time: 15:00
Member (2005)
English to Chinese
TOPIC STARTER
More than 30 files Dec 8, 2015

Samuel Murray wrote:
How many files do you have?



I have only one large file approved to proceed but I have quoted for more than 30 files and the repetitions/fuzzies exist across all of these files.


 

Samuel Murray  Identity Verified
Netherlands
Local time: 21:00
Member (2006)
English to Afrikaans
@jyuan Dec 8, 2015

jyuan_us wrote:
Samuel Murray wrote:
How many files do you have?

I have only one large file approved to proceed but I have quoted for more than 30 files and the repetitions/fuzzies exist across all of these files.


Oh dear, your troubles multiply...

Well, if it is useful to you to merge TXML files into a single file, then my TXML merge/split script may be useful to you. Note that the split function has been known to fail for other people, but merging should work.

http://wikisend.com/download/179542/TXML%20MS%20scripts.zip

The real problem, as I see it, is that you should find internal fuzzies from all 30 files, but at first only give segments from the first file to your first translator. This means that after you have generated a list of internal fuzzy segments from all files, you should remove all segments from it that do not occur in the first large file (otherwise the translator might end up translating segments that you do not get authorisation for in the end). If you are certain that you will definitely get authorisation to translate all 30 files, then this step is not necessary, of course.
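
That filtering step is simple to script once both lists exist as plain text. A minimal sketch in Python, assuming you have the internal-fuzzy list (built from all files) and the segments of the first file as lists of source strings; the function name is made up for this example.

```python
def restrict_to_first_file(fuzzy_segments, first_file_segments):
    """Keep only the fuzzy-list segments that also occur in the first file.

    Comparison ignores leading/trailing whitespace; the order of the
    fuzzy list is preserved.
    """
    in_first_file = {seg.strip() for seg in first_file_segments}
    return [seg for seg in fuzzy_segments if seg.strip() in in_first_file]
```

The result is what you would hand to the first translator; the segments that were dropped can wait until the remaining files are authorised.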


 

Lori Cirefice  Identity Verified
France
Local time: 21:00
French to English
WFA? Dec 8, 2015

Could the translators share a TM over WFA or some other server-based TM solution, so that team members can re-use repeated/fuzzy segments previously translated by another team member?

 

Dominique Pivard  Identity Verified
Local time: 22:00
Finnish to French
The point in sharing resources in a large project split between several translators Dec 8, 2015

B D Finch wrote:
The idea of translators sharing a real-time TM is also a bad one, because the first translator of a segment might not be the one who produces the best translation.

If a large project must be split between translators, you are going to face the same problem with translators of varying quality, whether or not they share the same TM in real-time. What happens when a "bad" translator translates a unique segment (= one for which there is no internal fuzzy)? They may produce a "bad" (sub-par) translation that will go unnoticed. That unique segment may contain terms found in segments allocated to other translators. How can you ensure these common terms will be translated in a consistent way, if resources are not shared in real-time? With a shared TM, all translators working on the project can perform concordance searches (when encountering "difficult" terms): this will allow the "good" translators to spot poorly translated terms (by the "bad" translators), and also allow "bad" translators to adopt terms correctly translated (by the "good" translators). In addition to a shared TM, terminology could (and should) be shared as well: this means that any newly added term will be available to all translators, and again incorrectly translated terms can be spotted by the "good" translators. None of that will happen (during the translation process) if resources are not shared.


 

