How can I change Tag Editor segmentation rules?
Thread poster: Aleksandr Okunev (X)
Aleksandr Okunev (X)
Local time: 14:31
English to Russian
Sep 13, 2005

I have run into a problem. I translated a DTP export with Wordfast, excluding the tags as much as possible. The resulting bilingual RTF files and the TM consist mostly of English sentences and their Russian translations, with very few internal tags. The rest has been left outside the translation.

Now the client has cleaned my RTF files into a Trados TM, opened the TXT files in Tag Editor and run 'Translate to Fuzzy'. He got very few matches because Tag Editor segments differently. My major problem is that it joins a lot of sentences into one segment, like this:
--------------------------------
Replace motor assembly ("Axis Motor Removal/Installation").<r> <SIZE 14> • <SIZE 10> Wiring is broken, shorted, or missing shield (Alarms 153-156, 175, 182-185).<r> <SIZE 14> • <SIZE 10> Dust in the motor from brushes has shorted out the motor (WE only) (Alarms 153-156, 175, 182-185).
---------------------------------
(there are scarier chunks)

It happens when the End-of-Sentence punctuation is followed by a "<" and also when a sentence ends with a number or a closing bracket.
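
For illustration only, here is a minimal Python sketch of the effect described above. It is not Trados or Wordfast code: both splitting rules and the function names are assumptions made up for this example. The first rule breaks only where end-of-sentence punctuation is followed by whitespace, so the sample stays in one piece; the second also breaks where the punctuation is followed directly by "<".

```python
import re

# Sample taken from the post above: three sentences glued together by tags.
SAMPLE = (
    'Replace motor assembly ("Axis Motor Removal/Installation").'
    '<r> <SIZE 14> • <SIZE 10> Wiring is broken, shorted, or missing shield '
    '(Alarms 153-156, 175, 182-185).'
    '<r> <SIZE 14> • <SIZE 10> Dust in the motor from brushes has shorted out '
    'the motor (WE only) (Alarms 153-156, 175, 182-185).'
)

def split_whitespace_only(text):
    """Hypothetical rule: break only where . ! ? is followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text)
    return [p for p in parts if p]

def split_tag_aware(text):
    """Hypothetical rule: also break where . ! ? is followed directly by '<'.

    Needs Python 3.7+, because the pattern can match an empty string.
    """
    parts = re.split(r'(?<=[.!?])(?:\s+|(?=<))', text)
    return [p.strip() for p in parts if p.strip()]

if __name__ == "__main__":
    print(len(split_whitespace_only(SAMPLE)))  # number of segments, first rule
    for segment in split_tag_aware(SAMPLE):
        print(segment)
```

The tag-aware rule still leaves the leading tags inside the second and third segments, and it does not address the other cases mentioned (sentences ending with a number or a closing bracket).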

Besides, TE sometimes includes part numbers in the segment:
------------------------
87-8756 Bushing
-------------------------
and sometimes it leaves them out. The same goes for list item numbers (they are transferred from the DTP into the export TXT file).

I would like to change TE segmentation rules to make it segment by the sentence.

I tried to find the appropriate settings and asked a question on the Trados Yahoo group, but got no answer. Thus, my question is: are there any standard settings in TE that make it segment differently? Ones that could, of course, be replicated on my client's system.

Any advice will be very much appreciated.
Stay well,
Aleksandr
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~>


 
Antonín Otáhal
Local time: 12:31
Member (2005)
English to Czech
Segmentation rules Sep 13, 2005

In Translator's Workbench, go to File/Setup, open the Segmentation Rules tab and push the Help button; there are some explanations, even if they are not always easy to understand. Some of your problems might be resolved by setting the "Trailing whitespace" option in the "Full Stop" rule to 0 (zero); perhaps graying out some other fields of the "Full Stop" rule would make the segmenting behaviour a bit more predictable (but not necessarily "better").
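
To make the idea of a "trailing whitespace" count concrete, here is a rough Python sketch. It is only an analogy, not how Workbench implements the rule: the function name and the `min_trailing_ws` parameter are invented for the example and stand in for the setting of the same purpose.

```python
import re

def split_on_full_stop(text, min_trailing_ws=1):
    """Break after a full stop followed by at least `min_trailing_ws` whitespace characters.

    Hypothetical stand-in for a "Trailing whitespace" setting. Needs Python 3.7+,
    because a value of 0 lets the pattern match an empty string.
    """
    pattern = re.compile(r'(?<=\.)\s{%d,}' % min_trailing_ws)
    return [part for part in pattern.split(text) if part]

if __name__ == "__main__":
    glued = "Stop the motor.Check the wiring."
    print(split_on_full_stop(glued, 1))  # space required after the full stop
    print(split_on_full_stop(glued, 0))  # no space required
    print(split_on_full_stop("Use version 6.5 here.", 0))
```

With the value set to 0 in this sketch, two sentences with a forgotten space are separated, but a full stop inside a version number such as "6.5" also causes a break, which is one way the result can be more predictable without being better.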

The inconsistency in segmentation bothers me a lot too, and I do not know what to do about it. There does not seem to be any hard rule or setting which would say: leave all starting/ending tags and numbers (and other things of translator's choice) out of the segments.

If I could set this, my life would be much easier, especially with Word files opened as .ttx in TagEditor under Trados 7. Here one simple sentence preceded and followed by a lot of tags may be stored in several different ways in the TM, just because the segmentation of the tags before and after the text is different in each case (although the tags seem to be exactly the same).

Perhaps someone else knows what to do about it?

Antonin


 
Harry Bornemann
Mexico
Local time: 05:31
English to German
Déjà Vu X Sep 14, 2005

The segmentation problem with tagged text is nearly the same in DVX, but using its function "Assemble from portions" it would assemble your "agglutinated" sample sentence, if it had its elements in the TM already.

 
Aleksandr Okunev (X)
Local time: 14:31
English to Russian
TOPIC STARTER
Thanks for the tips! :) Sep 17, 2005

Thanks a lot for the feedback, folks. Currently I am translating the Trados export with Wordfast; the export was made with standard Trados segmentation in order to make the resulting TM compatible with the client's. When I encounter a segment made of several concatenated sentences (glued together for the reasons stated above), I use the 'unsegment' command of Wordfast, and Wordfast then segments the sentences properly, one by one. Afterwards I intend to clean the Word documents into Workbench and use a lot of 'shrink segment' in Tag Editor.

Yes, with DVX it would be just great, but WF is not capable of assembly by default yet. I thought up a simple procedure to make it autoassemble: a couple of FR passes in a copy of my TM in order to turn it into a glossary. But this a) is tinkering; b) overloads the terminology recognition engine of Wordfast; and therefore c) is strongly discouraged in the Wordfast manual. So I decided to do it by hand; luckily, context search in WF is fast, reliable and does not require the mouse (I hardly ever touch the rodent when I translate).

Now some philosophy.
What my client had. They had a corpus of documents with as many tags left outside translation as possible, premium quality, portable, suitable for more jobs and more clients and so on.
What they will have. They will have the same manual in TE format, where some sentences are joined and single sentences come basically in 4 variants:
a) the sentence proper - Stepless motor is running.
b) the sentence with leading tags - <GTABS $>Stepless motor is running.
c) the sentence with trailing tags - Stepless motor is running.<r>
d) the sentence with leading and trailing tags - <GTABS $>Stepless motor is running.<r>
Plus they will have variations of the above, like this: <FONT "Wingdings"><SIZE 14><P><GTABS $>Stepless motor is running.<r>
Which are numerous.
Here I see Trados beating itself even in what most people regard as its strong (selling) point. On the same tagged document where Wordfast gives several 100% matches, Trados will give, AFAIU, a considerable number of FUZZY matches for the very same sentence: "Stepless motor is running." This makes the translation more expensive. It also makes the TM suitable basically only for re-translating the next revision of the same client's manual next year, because the tag and formatting differences in my particular manual, given the short sentences, effectively drive good translations below a reasonable fuzzy level.
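
The point about variants a) to d) can be sketched in a few lines of Python. This is purely an illustration with invented names; neither Trados nor Wordfast is claimed to work this way. Under the assumption that a tag is anything between "<" and ">", stripping the leading and trailing tags reduces every variant listed above to the same text, which is why a TM holding only the bare sentence could in principle return an exact match for all of them.

```python
import re

# Assumption for this sketch: a tag is any run of characters between < and >.
TAG = r'<[^<>]*>'
LEADING_TAGS = re.compile(rf'^(?:\s*{TAG})+\s*')
TRAILING_TAGS = re.compile(rf'\s*(?:{TAG}\s*)+$')

def strip_outer_tags(segment):
    """Remove tags at the start and end of a segment; tags inside the text are kept."""
    segment = LEADING_TAGS.sub('', segment)
    segment = TRAILING_TAGS.sub('', segment)
    return segment.strip()

VARIANTS = [
    'Stepless motor is running.',
    '<GTABS $>Stepless motor is running.',
    'Stepless motor is running.<r>',
    '<GTABS $>Stepless motor is running.<r>',
    '<FONT "Wingdings"><SIZE 14><P><GTABS $>Stepless motor is running.<r>',
]

if __name__ == "__main__":
    # All variants collapse to a single distinct string.
    print({strip_outer_tags(v) for v in VARIANTS})
```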

I do not mention ease of use and features supported because these are incomparable.

Now, in Trados 7, the 'export unknown segments' feature in Analysis, which I was using in order to make my life easier and my translation better, is very wisely absent. That is a heartening trend. Given that, and all the problems I described and already have with version 6.5, I do not think I am ever going to upgrade. It is much easier and more cost-effective for me to find a few more direct clients who do not care about fuzzy matches or the type of CAT tool I use; instead, they are all out to get top quality and fast turnaround, two things that Wordfast has been made to deliver.

Thanks again,
Stay healthy,
Aleksandr
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~>

P.S. The rules are changed in Workbench (File - Setup - Segmentation rules); they are quite flexible, but so are the WF rules I am having to override.


 
nicomigo
Germany
Impossible to use just "dot" as a separator for segmentation Oct 15, 2010


As far as I can tell, it is still impossible to specify in the segmentation rules that the number of trailing spaces after a full stop is 0. This means that I will always get one segment for two sentences if the space was forgotten in the original file. This is quite a pain. Can someone confirm?


 
nicomigo
Germany
Any idea? Oct 18, 2010

Any idea on the above question?

 

