Hurrah! Proper Segmentation of Abbreviated Names at Last in Studio!
Thread poster: Tom Fennell

Tom Fennell
United States
Local time: 09:16
Member (2010)
Russian to English
Dec 9, 2011

Segmentation of initials with last name
Segmentation of initials and last name
Segmentation of Russian names in Trados
Segmentation of Russian names in the form N.P. Surname (initials and surname)
Segmentation of Russian names

Hello Trados forumites!

Yesterday the SDL support centre gave me a solution for a problem that has vexed me for several years:

Serious mis-segmentation in Trados due to a false segment break after abbreviated Russian names (V.V. Putin, P.I. Tchaikovsky). As many of you know, a sentence such as

There are many mixed feelings about V.V. Putin in Russia.

becomes:

1 There are many mixed feelings about V.V.
2 Putin in Russia.

P.I. Tchaikovsky was the first Russian composer with broad European fame.

becomes:

1 P.I.
2 Tchaikovsky was the first Russian composer with broad European fame.

(Note: "Tchaikovsky, P.I. is a great composer." is segmented correctly because the word after "P.I.", "is", is not capitalized.)
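To see why the break lands after "V.V." and "P.I." but not before "is", here is a minimal Python sketch of a simplified full-stop rule. It assumes, purely for illustration, that a break is placed after a full stop followed by whitespace and an uppercase letter; this is not Studio's actual default rule, and the names `FULL_STOP` and `segment` are hypothetical.

```python
import re
from typing import List

# Hypothetical, simplified full-stop rule (an assumption, NOT Studio's real rule):
# break after a full stop that is followed by whitespace and an uppercase
# Latin or Cyrillic letter.
FULL_STOP = re.compile(r"\.(?=\s+[A-ZА-ЯЁ])")


def segment(text: str) -> List[str]:
    """Split text using only the simplified full-stop rule (no exceptions)."""
    segments: List[str] = []
    start = 0
    for match in FULL_STOP.finditer(text):
        pos = match.end()  # the break falls right after the full stop
        segments.append(text[start:pos].strip())
        start = pos
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


if __name__ == "__main__":
    examples = [
        "There are many mixed feelings about V.V. Putin in Russia.",
        "P.I. Tchaikovsky was the first Russian composer with broad European fame.",
        "Tchaikovsky, P.I. is a great composer.",
    ]
    for sentence in examples:
        for number, seg in enumerate(segment(sentence), start=1):
            print(number, seg)
        print()
```

In this model the first two sentences split after "V.V." and "P.I.", while the third stays whole because "is" is lowercase.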

This may sound trivial, but it has caused me and others havoc: lots of extra segment merging and, as a result, an inability to pre-translate files, auto-propagate and populate segments properly.

Problem now solved.

1. Add an exception to the Language Resource Template:

1. View>Translation Memories View
2. File>Open>Language Resource Template>Russian Language Resources
- (or any other language resource template)
3. File>Settings, or right-click Russian Language Resources>Settings
4. Select Segmentation Rules
5. Click Edit button
6. Select full stop rule (it is the default)
7. Click Edit button
8. Click Add button (Exceptions group)
9. Click Advanced View button
10. Paste the following into the "Before break" box (you can type, but be careful!):
\w\.\w\.+
11. Confirm that the following is in the "After break" box:
\s
12. Click OK>OK>OK>OK

Voila!
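The effect of the exception can be modelled in a few lines of Python. This is a simplified sketch, not Studio's segmentation engine: it assumes a break is proposed after a full stop followed by whitespace and an uppercase letter, and that an exception cancels the break when its "Before break" pattern matches the text ending at the break and its "After break" pattern matches the text starting there. Python's `re` module may differ from the regex flavour Studio uses in edge cases; `FULL_STOP`, `EXCEPTIONS`, `is_exception` and `segment` are hypothetical names.

```python
import re
from typing import List, Tuple

# Simplified full-stop rule (an assumption, NOT Studio's real rule).
FULL_STOP = re.compile(r"\.(?=\s+[A-ZА-ЯЁ])")

# The exception from the steps above: ("Before break", "After break").
EXCEPTIONS: List[Tuple[str, str]] = [(r"\w\.\w\.+", r"\s")]


def is_exception(text: str, pos: int, exceptions: List[Tuple[str, str]]) -> bool:
    """True if any exception matches around the candidate break at pos."""
    before, after = text[:pos], text[pos:]
    for before_pat, after_pat in exceptions:
        # "Before break" must match text ending exactly at the break point;
        # "After break" must match text starting exactly at the break point.
        if re.search("(?:" + before_pat + r")\Z", before) and re.match(after_pat, after):
            return True
    return False


def segment(text: str, exceptions: List[Tuple[str, str]] = EXCEPTIONS) -> List[str]:
    """Split text with the full-stop rule, skipping breaks cancelled by an exception."""
    segments: List[str] = []
    start = 0
    for match in FULL_STOP.finditer(text):
        pos = match.end()
        if is_exception(text, pos, exceptions):
            continue  # the exception suppresses this break
        segments.append(text[start:pos].strip())
        start = pos
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


if __name__ == "__main__":
    for sentence in [
        "There are many mixed feelings about V.V. Putin in Russia.",
        "P.I. Tchaikovsky was a composer. He lived in Russia.",
        "В.В. Путин посетил Москву.",
    ]:
        print(segment(sentence))
```

In this model the initials stay with the surname (Python's `\w` also matches Cyrillic letters), while the genuine sentence boundary after "composer." is still split. The pattern also matches other letter-dot-letter-dot sequences such as "e.g." or "U.S.", so in this model a real sentence break directly after one of them would be suppressed as well.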

2. The Language Resource Template is only used when creating new translation memories, so you will need to repeat step 1 for any existing TM you plan to use. As far as I understand, segmentation is driven by the first TM in the project (I have not tested this extensively).

For an existing TM, the steps differ only slightly:

1. View>Translation Memories View
2. File>Open>Open Translation Memory
3. File>Settings, or right-click the TM>Settings
3a. Select Language Resources
4. Select Segmentation Rules
5. Click Edit button
6. Select full stop rule (it is the default)
7. Click Edit button
8. Click Add button (Exceptions group)
9. Click Advanced View button
10. Paste the following into the "Before break" box (you can type, but be careful!):
\w\.\w\.+
11. Confirm that the following is in the "After break" box:
\s
12. Click OK>OK>OK>OK

Enjoy!



Alexey Ivanov  Identity Verified
Russian Federation
Local time: 18:16
English to Russian
Many thanks, Tom Dec 11, 2011

Indeed, it was a big headache. I never expected to see it resolved, but thanks to your persistence with SDL support we now have one problem fewer.


Tom Fennell
United States
Local time: 09:16
Member (2010)
Russian to English
TOPIC STARTER
We are not quite there Dec 11, 2011

I realized that this works for the standard format, which is:

P.I. Tchaikovsky

but not if there is a (non-standard) space:

P. I. Tchaikovsky
(P.[space]I. Tchaikovsky)

The solution is adding one more exception rule.

I am almost sure the regex will be:

\w\.\s\w\.+

but I don't have the time or energy to test it right now.

I'll get to it eventually, but if someone else can test this and report the result, that would be great.
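One way to check the proposed rule before touching Studio is a small Python model like the one below. It is a simplified sketch under stated assumptions, not Studio's engine: a break is proposed after a full stop followed by whitespace and an uppercase letter, and an exception cancels it when its "Before break" pattern matches the text ending at the break and its "After break" pattern matches the text starting there. Under those assumptions, `\w\.\s\w\.+` does keep "I." with "Tchaikovsky", but the break after "P." survives, because "P." alone matches neither exception. The extra exception in `WITH_SINGLE_INITIAL` (Before break `\b\w\.`, After break `\s\w\.`) is a hypothetical addition, untested in Studio; all names in the code are hypothetical.

```python
import re
from typing import List, Tuple

# Simplified full-stop rule (an assumption, NOT Studio's real rule).
FULL_STOP = re.compile(r"\.(?=\s+[A-ZА-ЯЁ])")

Rule = Tuple[str, str]  # ("Before break", "After break")

ORIGINAL: List[Rule] = [(r"\w\.\w\.+", r"\s")]
PROPOSED: List[Rule] = ORIGINAL + [(r"\w\.\s\w\.+", r"\s")]
# Hypothetical extra exception (untested in Studio): do not break after a
# single initial when another initial follows, as in "P. I.".
WITH_SINGLE_INITIAL: List[Rule] = PROPOSED + [(r"\b\w\.", r"\s\w\.")]


def segment(text: str, exceptions: List[Rule]) -> List[str]:
    """Split text with the full-stop rule, skipping breaks cancelled by an exception."""
    segments: List[str] = []
    start = 0
    for match in FULL_STOP.finditer(text):
        pos = match.end()
        before, after = text[:pos], text[pos:]
        if any(
            re.search("(?:" + b + r")\Z", before) and re.match(a, after)
            for b, a in exceptions
        ):
            continue  # an exception suppresses this break
        segments.append(text[start:pos].strip())
        start = pos
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


if __name__ == "__main__":
    sentence = "P. I. Tchaikovsky was the first Russian composer with broad European fame."
    for name, rules in [("original", ORIGINAL), ("proposed", PROPOSED),
                        ("with single initial", WITH_SINGLE_INITIAL)]:
        print(name, segment(sentence, rules))
```

Under these assumptions the three rule sets give three, two and one segment respectively. Whether Studio really breaks after a lone "P." is exactly what a test in Studio itself would need to confirm.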

Best,

Tom

