How can I change the text segmentation rules in Trados?
Thread poster: Pavel Tsvetkov

Pavel Tsvetkov  Identity Verified
Bulgaria
Local time: 12:39
Member (2008)
English to Bulgarian
+ ...
Jan 25, 2009

Here is the problem: Trados segments texts automatically according to certain rules, such as breaking every time there is a full stop. However, that is not always wise, because a full stop does not always mark the end of a sentence. So, let us say that I would like to instruct Trados not to segment in certain cases, for example after "чл.", "т.", and so on. How and where do I list these exceptions so that Trados does not treat them as "segmentation flags"?


Attila Piróth  Identity Verified
France
Local time: 11:39
Member
English to Hungarian
+ ...
User-defined list of abbreviations Jan 25, 2009

Go to File / Setup / Segmentation Rules / User List / Abbreviations, where you can add a user-defined list of abbreviations.
HTH,
Attila
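
For readers who want to see what such an exception list does in principle, here is a minimal Python sketch. It is not Trados's actual algorithm: the function name `segment` and the rule "break after . ! ? followed by whitespace" are assumptions made purely for illustration. Only the abbreviations "чл." and "т." come from the question above.

```python
import re

# Example exception list; "чл." and "т." are taken from the question.
ABBREVIATIONS = frozenset({"чл.", "т."})

def segment(text, abbreviations=ABBREVIATIONS):
    """Split after '.', '!' or '?' followed by whitespace, unless the
    token ending at that mark is a listed abbreviation (case-sensitive)."""
    segments = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        end = match.start() + 1          # index just after the punctuation
        candidate = text[start:end]
        tokens = candidate.split()
        if tokens and tokens[-1] in abbreviations:
            continue                     # listed abbreviation: no break here
        segments.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)            # whatever remains is the last segment
    return segments

sample = "Съгласно чл. 5, т. 2 от закона. Следва ново изречение."
for s in segment(sample):
    print(s)
```

With the exception list the sample is split into two segments; with an empty list the same text is cut after "чл." and "т." as well.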



Pavel Tsvetkov  Identity Verified
Bulgaria
Local time: 12:39
Member (2008)
English to Bulgarian
+ ...
TOPIC STARTER
This is a TM specific change, no? Jan 25, 2009

Attila Piróth wrote:

Go to File / Setup / Segmentation Rules / User List / Abbreviations, where you can add a user-defined list of abbreviations.
HTH,
Attila


Thank you for your time, Attila!

Now, this should only work for the translation memory currently open, right? Is there a way to instruct Trados to segment ALL Bulgarian texts with these user-defined exceptions in mind? The wrong segmentation also affects WinAlign when it matches segments between the source and target files to be aligned, and so on.



Attila Piróth  Identity Verified
France
Local time: 11:39
Member
English to Hungarian
+ ...
Yes, it is TM specific Jan 26, 2009

Hi again,
You can set the segmentation rules in WinAlign (New project / General / Source Segmentation + Target Segmentation) by specifying the file that contains a user-defined list of abbreviations. This is also project specific. However, when you create a new project, go to the above tab and click the Browse button, you are taken to the folder you used the last time.
Therefore you should create a master list in an easily accessible place. Each time you need it in a new WinAlign project, you can reach it with a few clicks, whereas for each new TM you will need to open the file and copy and paste its contents. If you have one TM per client, you will need to do this fairly rarely. (Also, you can avoid creating new WinAlign projects by simply adding pairs of files to existing WinAlign projects.)
However, if you want to save time on alignment projects, consider dropping WinAlign in favor of PlusTools (Wordfast's free add-on, which is fully operational even with the trial version of WF). Learning the proper use of +Align takes less than half an hour, and segmentation can be very comfortably customized in Wordfast (as a global option).
Attila



Pavel Tsvetkov  Identity Verified
Bulgaria
Local time: 12:39
Member (2008)
English to Bulgarian
+ ...
TOPIC STARTER
? Jan 26, 2009

Thank you, Attila!

This master file that I should create... it is actually a text file with the extension *.abr, is that correct? What should be the delimitation of abbreviations inside, for example:

1) each one on a new line

aaa.
bbb.
ccc.

2) one following the other with some kind of a division symbol in between

3) Should there be empty spaces left (to instruct TRADOS to expect spaces after, say, a full stop)?



Attila Piróth  Identity Verified
France
Local time: 11:39
Member
English to Hungarian
+ ...
Entries in new lines Jan 26, 2009

Pavel Tsvetkov wrote:

This master file that I should create... it is actually a text file with the extension *.abr, is that correct? What should be the delimitation of abbreviations inside, for example:

1) each one on a new line

aaa.
bbb.
ccc.


Hi again, Pavel,
Yes, in Workbench each item in the customised list of abbreviations must be on a line of its own. Press [Ctrl]+[Enter] to move to a new line. (See the WB documentation.)
The WinAlign manual does not give any specific details, so I suppose the same applies to it. (It is indeed a simple text file with an .abr extension.) So, try putting each abbreviation on a new line, and if that does not work, try other formats (comma separated, etc.).
Attila
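
If you keep the master list as a plain text file with one abbreviation per line, as suggested above, a small script can tidy it before you paste or load it. This is only a sketch: the helper names `load_abbreviations` and `save_abbreviations` are made up for illustration, and the UTF-8 encoding is an assumption, since the thread does not say which encoding Workbench or WinAlign expects.

```python
from pathlib import Path

def load_abbreviations(path, encoding="utf-8"):
    """Read one abbreviation per line, dropping blank lines and duplicates
    while keeping the original order."""
    seen = set()
    result = []
    for line in Path(path).read_text(encoding=encoding).splitlines():
        entry = line.strip()
        if entry and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result

def save_abbreviations(path, abbreviations, encoding="utf-8"):
    """Write the list back with one abbreviation per line."""
    Path(path).write_text("\n".join(abbreviations) + "\n", encoding=encoding)
```

For example, `save_abbreviations("master.abr", load_abbreviations("master.abr"))` rewrites a (hypothetical) master file without blank lines or repeated entries.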


FarkasAndras
Local time: 11:39
English to Hungarian
+ ...
How is this supposed to work? Feb 9, 2011

I have dredged up this thread because I have a related question.
I use Trados Studio SP3 and want to add an abbreviation to the abbreviation list for Hungarian. This should be a trivial matter, but SDL has managed to obfuscate it quite remarkably. Perhaps a new category should be added to the obfuscated code contest for obfuscated UI/documentation.

Anyway, could somebody explain to me how this is supposed to work in Studio? SDL says that "There is a set of default language resources for every language supported by SDL Trados Studio". That sounds great, that's the default rule set that I would like to edit so that every Hungarian text is segmented correctly from here on.
However, just when the documentation appeared, for a sweet but brief moment, to make some sense, it starts to talk about what I can do when I create a new TM. I don't want to create a new TM at all. I want to change segmentation rules by adding a new abbreviation. How and why are "language resources", which are mostly segmentation rules, related to translation memories in SDL's mind? They have absolutely nothing to do with each other, and segmentation rules don't belong in the translation memories view at all.

Even if I were to accept the lunatic concept that I have to bind segmentation settings to TMs, how would I go about applying them to a translatable document/project? I mean, not all projects have a TM assigned to them, and some projects have several. This just makes no sense to me. Why do TMs contain this stuff at all?

In the Translation Memories view, I also get the much more attractive Language Resource Templates menu in the bar on the left, but it seems to be designed to create new templates only, not edit the defaults. Is that a feature SDL has simply left out? I still refuse to believe that. Do I really have to create a new language resource template, save it and specify it every single time I create a new project?

[Edited at 2011-02-09 18:21 GMT]



SDL Community  Identity Verified
United Kingdom
Local time: 11:39
English
Using Language Resources and Project Templates Feb 10, 2011

FarkasAndras wrote:
Anyway, could somebody explain to me how this is supposed to work in Studio? SDL says that "There is a set of default language resources for every language supported by SDL Trados Studio". That sounds great, that's the default rule set that I would like to edit so that every Hungarian text is segmented correctly from here on.


Hello Farkas,

Language resources are the place to go for this, and every TM has a set of these resources attached to it. The reason for the link to the TM is that people normally work with a TM (though granted, you can work without one if you want to), and when you use a TM you want the segmentation rules, abbreviation lists, etc. that are applied to be the same as the ones that were used during its creation.

However, to answer your more specific question of how to make sure you always get the one you want whether you use a TM or not: the answer lies in the use of Project Templates combined with Language Resources.

First of all you create a Language Resource template set up for all the languages you want, configured as you want, and save this somewhere safe (I guess not in My Documents). I'm assuming you know how to create the Language Resource Template so I'll skip that bit.

Then go to File - Set up - Project Templates. In this next window you'll see that you could create a new template and make that the default, but you could also edit the default already there. So, assuming this is what you want to do, select the default and then click on Edit. Navigate in the tree menu to Language Pairs - All Language Pairs - Translation Memory and ... - Language Resources. You can now select the Language Resource template you created and this will now be the set of resources that is always used by default.

I hope this is clear?

Regards

Paul


FarkasAndras
Local time: 11:39
English to Hungarian
+ ...
TM settings <-> project settings Feb 10, 2011

SDL Support wrote:
Then go to File - Set up - Project Templates. In this next window you'll see that you could create a new template and make that the default, but you could also edit the default already there. So, assuming this is what you want to do, select the default and then click on Edit. Navigate in the tree menu to Language Pairs - All Language Pairs - Translation Memory and ... - Language Resources. You can now select the Language Resource template you created and this will now be the set of resources that is always used by default.

I hope this is clear?

Thanks, that's pretty clear. I mean, not the UI, which is pretty atrocious, but your explanation. Essentially, it's Create a new language resource template in Translation memory view (Why there?) and then make it the default (well, sort of, in a roundabout way) in the File menu under project templates (Why there?). It would be a tad easier for users if it were edit the default language resource for the language in question in the Language Resources menu in File/Setup.

SDL Support wrote:
The reason for the link to the TM is that normally people work with a TM (but granted you can work without one if you want to) and you want the segmentation rules applied, and abbreviation lists etc. when you use this to be the same as the ones that were used during its creation.

I still don't follow this line of thinking. I can see why you would want the text in the TM to be segmented like the text you're translating (to get more TM hits), but how does Studio's behaviour help achieve this? You set up a TM, and specify segmentation settings. Fine. Now, if you populate the TM by importing a TMX, your segmentation settings go right out the window, but let's forget about that. When you are populating the TM as you're translating, the segmentation rules are determined by the project. You just explained to me how to set this up for new projects via the project template (or by picking language resources manually when you create the project). That's the set of language resources that's taken into account when you add a file to a project, isn't it? Do the TM's language resource settings override this? I sure hope not. Even if so, which TM's? Studio (at long last) introduced some flexibility here, so you can have 10 TMs in a project, 3 of which are updated as you translate, so all 3 can be considered the "main" TM. Then halfway through the project you can decide to change your "main" TMs to a set of 5 different TMs. What then? Newly imported files will be segmented based on different language resources? That sure won't be helpful.
This just makes no sense to me, I can't see how the TM segmentation settings could influence anything or be of any use if they did. Projects have language resource settings and that should suffice. What am I missing?

[Edited at 2011-02-10 11:40 GMT]



SDL Community  Identity Verified
United Kingdom
Local time: 11:39
English
Language Resources and TMs Feb 11, 2011

FarkasAndras wrote:
Thanks, that's pretty clear. I mean, not the UI, which is pretty atrocious, but your explanation. Essentially, it's Create a new language resource template in Translation memory view (Why there?) and then make it the default (well, sort of, in a roundabout way) in the File menu under project templates (Why there?). It would be a tad easier for users if it were edit the default language resource for the language in question in the Language Resources menu in File/Setup.


Hi Farkas,

You have some support on this; it would be better to put this in a more logical place. I think it sits in the Translation Memories view because we have linked the Language Resources to Translation Memories and so expect users to go there when working with them. I guess once you know, it's not an issue, but I think it would be good to be able to create a new one from the Project Template window, so that you only need to go to one place in the first instance (once you accept the concept of Project Templates).

FarkasAndras wrote:
This just makes no sense to me, I can't see how the TM segmentation settings could influence anything or be of any use if they did. Projects have language resource settings and that should suffice. What am I missing?


Well, I needed a little help to answer this one thoroughly, or try to anyway. So here's the only explanation I can give you.

Right now most of the Language Resource functionality is “at or near” the TM – and you are right in that the segmentation is not a functionality of the TM itself. There are several reasons for why things are as they are:

  • Language resources, including segmentation rules, are packaged up with TMs so that you can send a TM along with some files to a translator and have all the required resources in one place. With the package concept introduced by Studio that’s definitely the preferred way.
  • Historically, language resources were packaged in .tmw (Workbench) TMs, and we kept this approach in Studio, until everyone migrated to the package concept.
  • An alternative approach would be to package the language resources in a package and not include them in the TM (which is in the package).
  • However, although segmentation is not an inherent TM functionality, the leverage against a TM critically depends on the segmentation rules. It is quite beneficial to “link” TMs with “their” segmentation rules, even though the TM itself does not segment.


We see the point that you want to often modify “central” segmentation rules (including abbreviation lists etc.) and apply these across all TMs. In particular, once you change settings for the “global” resources, you’d ideally want these changes to be applied by all TMs. Right now, you’d have to re-apply the (global) language resource template to all the TMs once the “global”/default template was changed.

However, if you receive packages with embedded language resources (and there are many reasons to embed them in the package), the translator should use the packaged resources and not the global ones. Obviously a TM’s language resources or resources included in a package must get precedence over locally configured ones, as otherwise TM leverage will be negatively affected.

What would perhaps be needed, therefore, is a mechanism which allows a project manager to bundle language resource in a package (be that through additional package settings or by embedding the resources in the project TM, should that be a file-based TM), and that packaged resources get precedence over locally configured ones. Obviously, when using server-based TMs in a package or project, the language resources would need to be pulled from the server as well.

So, food for thought and it is an area we will probably continue to improve as we go forward. I hope this answers your questions anyway?

Regards

Paul


FarkasAndras
Local time: 11:39
English to Hungarian
+ ...
Thoughts Feb 11, 2011

SDL Support wrote:


  • An alternative approach would be to package the language resources in a package and not include them in the TM (which is in the package).

I have never used Trados packages so I'm not well versed in these affairs, but it seems pretty clear to me that this would be the best solution. This seems to be a classic case of the developers "getting high on their own supply" and failing to take a step back and realize that their whole approach needs to be revisited. I may be wrong, though. Fellow prozians can chime in and be the judge, I guess.

SDL Support wrote:
We see the point that you want to often modify “central” segmentation rules (including abbreviation lists etc.) and apply these across all TMs. In particular, once you change settings for the “global” resources, you’d ideally want these changes to be applied by all TMs. Right now, you’d have to re-apply the (global) language resource template to all the TMs once the “global”/default template was changed.

That's very badly broken, if I may be so bold. I wonder how many Studio users actually know this... My guess would be well under 5%, probably under 1%. At the risk of repeating myself, segmentation rules should not be linked to TMs in my opinion.

SDL Support wrote:
However, if you receive packages with embedded language resources (and there are many reasons to embed them in the package), the translator should use the packaged resources and not the global ones. Obviously a TM’s language resources or resources included in a package must get precedence over locally configured ones, as otherwise TM leverage will be negatively affected.

What would perhaps be needed, therefore, is a mechanism which allows a project manager to bundle language resource in a package (be that through additional package settings or by embedding the resources in the project TM, should that be a file-based TM), and that packaged resources get precedence over locally configured ones. Obviously, when using server-based TMs in a package or project, the language resources would need to be pulled from the server as well.

I can see the point of that, although this needs to be done transparently. The translator has to be told that the package/project/TM has overridden his default segmentation settings and be allowed to change them back.
This whole approach is focused fully on maximum leverage, which may not be the right way to do it. There are many projects out there with no or minimal leverage, where any sensible translator would prefer a "good" segmentation that requires less manual readjusting to a "TM-identical" segmentation. On top of that, (good) translators fix the segmentation manually as they go along anyway, which makes the fixed-in-TM segmentation useless or harmful in many cases.
Here's an example: Kft. is a Hungarian abbreviation (meaning Ltd.). It's missing from Studio's Hungarian abbreviation list. Translator 1 translates a contract, where Kft. occurs 150 times. That is 150 sentences cut in half by the segmenter. Translator 1, being conscientious, fixes all of them by merging segments, so the TM contains the correctly segmented sentences. Later, the contract is revised and the new version is sent to Translator 2 for translation. He is very smart, he has already added Kft. to his abbreviation list. He gets the new text and the old TM... you can already see what's coming: the TM has someone else's poor segmentation rules in it, which override Translator 2's own rules, and Trados segments the text incorrectly again. Thanks to the integration of segmenting rules and TMs, Translator 2 has to manually merge 150 segments again instead of getting 100% TM hits on them right away on opening the file*.
In contrast, the advantage that can be derived from putting the segmentation rules in the TM is pretty marginal: it only materializes if the translator who populates the TM fails to fix segmentation errors, which is hardly the best case scenario. What SDL should be shooting for is to allow the best, continually improving segmentation in every new text, hopefully matching the (manually fixed) segmentation that was previously entered into the TM. That's what's going to work best for most people most of the time, isn't it?
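
The Kft. scenario above can be mimicked with a toy Python sketch. Everything here is an assumption for illustration: the simplified `segment` function, the helper `exact_hits`, and the English placeholder sentences standing in for the Hungarian contract text. It only shows how the abbreviation list decides whether a new file lines up with a TM that holds manually merged segments; it says nothing about how Studio itself is implemented.

```python
import re

def segment(text, abbreviations):
    """Toy segmenter: break after . ! ? plus whitespace, except after a
    listed abbreviation (case-sensitive). Not the Trados algorithm."""
    segments, start = [], 0
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.start() + 1]
        tokens = candidate.split()
        if tokens and tokens[-1] in abbreviations:
            continue
        segments.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments

def exact_hits(segments, tm):
    """Count segments that have an exact (100%) match in the TM."""
    return sum(1 for s in segments if s in tm)

# Placeholder TM: Translator 1 merged the halves, so the full sentence is stored.
tm = {"Alfa Kft. signed the contract.": "<translation>"}
new_file = "Alfa Kft. signed the contract. Both parties agreed."

print(exact_hits(segment(new_file, set()), tm))       # rules without "Kft."
print(exact_hits(segment(new_file, {"Kft."}), tm))    # rules with "Kft."
```

Without "Kft." in the list the sentence is cut in half again and finds no exact match; with it, the stored sentence is matched directly.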


Now, we have both posted way too much on the issue already, but I'm afraid I still have no idea on how this works presently in Studio. Do the segmentation rules in the TM override the ones set in the project? Which TM wins out if there are several in a project with different rules? What if a TM is added to a project later on? If the TMs take precedence (which, I repeat, I consider a very poor decision) why even have segmentation settings in the project at all? As far as I can tell, all it does is confuse users as most projects have TMs assigned to them.


*By the way, the merge segments functionality, which worked so well in T2007, needs a revamp, or rather, needs to be restored. It used to require the Ctrl-Alt-Pgdn shortcut, which was really convenient. Now, you have to select the active segment and the next segment first. How is that useful? You want to merge with the next segment, what other segment would you want to merge with? The fifth one down the page? Even if you want to merge 4 segments, pressing Ctrl-Alt-Pgdn 3 times is still faster and more convenient than the current "click all over the place and rummage around in a context menu" solution. The two can coexist as well, of course.



SDL Community  Identity Verified
United Kingdom
Local time: 11:39
English
Segmentation and merge segments Feb 12, 2011

FarkasAndras wrote:
Now, we have both posted way too much on the issue already, but I'm afraid I still have no idea on how this works presently in Studio. Do the segmentation rules in the TM override the ones set in the project? Which TM wins out if there are several in a project with different rules? What if a TM is added to a project later on? If the TMs take precedence (which, I repeat, I consider a very poor decision) why even have segmentation settings in the project at all? As far as I can tell, all it does is confuse users as most projects have TMs assigned to them.


Once you have your Project the segmentation is done, so adding a TM won't change anything. The effect takes place when you add files, so if you add new files to your Project, then yes, it is possible to override the settings of the original Project.

If there are several TMs in a Project, the one at the top of the list wins out.

To be honest I'm not sure how many users will be confused by this. Certainly quite technical users like you may have different views on this (and as I said there is some sympathy with you on this, but any changes would not come soon), but many LSPs or corporates who create Projects to issue to translators may well want the initial segmentation to be based on their own TMs and rules. So adding your TMs afterwards would only be beneficial for reference.

Translators creating their own Project will be using the rules they set, so this is why I am not sure too many users will be confused. But as you say, we have discussed this quite a bit already, and I think we probably agree it could be improved.
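
As a reading aid, the behaviour described in this thread (segmentation is fixed at the moment a file is added; the TM at the top of the list supplies the rules; project settings apply when there is no TM) can be written down as a toy Python model. The class and method names below are invented for illustration, the abbreviation lists are hypothetical, and the model has not been verified against Studio itself.

```python
from dataclasses import dataclass, field

@dataclass
class TM:
    name: str
    abbreviations: frozenset          # language resources stored with the TM

@dataclass
class Project:
    default_abbreviations: frozenset  # e.g. from the project template
    tms: list = field(default_factory=list)    # ordered as in the project's TM list
    files: dict = field(default_factory=dict)  # file name -> rules used when added

    def effective_abbreviations(self):
        """Top TM in the list wins; fall back to the project settings."""
        return self.tms[0].abbreviations if self.tms else self.default_abbreviations

    def add_file(self, name):
        """Segmentation rules are decided once, when the file is added."""
        self.files[name] = self.effective_abbreviations()

project = Project(frozenset({"Kft."}))
project.add_file("contract_v1.docx")               # no TM yet: project rules apply
project.tms.append(TM("client TM", frozenset()))   # TM whose rules lack "Kft."
project.add_file("contract_v2.docx")               # new file: the TM's rules apply
print(project.files)
```

In this model the first file keeps the rules it was segmented with even after a TM is added, while the file added later picks up the rules of the top TM.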

FarkasAndras wrote:
*By the way, the merge segments functionality, which worked so well in T2007, needs a revamp, or rather, needs to be restored. It used to require the Ctrl-Alt-Pgdn shortcut, which was really convenient. Now, you have to select the active segment and the next segment first. How is that useful? You want to merge with the next segment, what other segment would you want to merge with? The fifth one down the page? Even if you want to merge 4 segments, pressing Ctrl-Alt-Pgdn 3 times is still faster and more convenient than the current "click all over the place and rummage around in a context menu" solution. The two can coexist as well, of course.


You could use Alt+Shift+Down and then Ctrl+Alt+S, but I agree a single shortcut would be far better. Maybe use AutoHotkey to combine these into Ctrl+Alt+PgDn?

It would also be good to post this on ideas.sdl.com.

Regards

Paul



