Adding new segmentation rule to TM in Studio 2011
Thread poster: Richard Hill

Richard Hill  Identity Verified
Mexico
Local time: 03:26
Member (2011)
Spanish to English
Nov 10, 2011

I think I might be losing some hits from my TMs because they aren't segmented at semicolons, so I've spent a couple of hours trying to add a new (semicolon) rule and to update the corresponding TM. Nothing has changed, though: my TM is still not segmented at semicolons. I'm not sure what I'm doing wrong, but I'll try to explain how I've gone about it, in the hope that someone can put me right.

As seen in this image I've added a new rule including a semicolon



and as seen in this image, when updating the TM, I'm applying the language resources of the applicable TM.



and finally, as seen here, segmentation is not applied where there are semicolons



If anyone could tell me the right way to go about adding a new rule to have segmentation applied to semicolons, I'd be really grateful.

Thanks,

Rich

[Edited at 2011-11-10 05:26 GMT]


 

SDL Community  Identity Verified
United Kingdom
Local time: 10:26
English
When the segmentation rules apply Nov 10, 2011

Hi Rich,

The change you made worked fine for me:

Before


After


I think there may be a question of how you expect this to work. The rules are applied when you first open the file, so applying a TM to a file that has already been segmented and converted to sdlxliff will not resegment it. Perhaps this is where you are going wrong?
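
To illustrate the idea only: the following is a rough Python sketch, not Studio's actual segmentation engine. The function name, the regular expressions and the sample sentence are all made up for the example. It treats a semicolon rule as an extra break point after a semicolon followed by whitespace, applied once when the source text is first read.

```python
import re

# Hypothetical rules for illustration; these are NOT Studio's real patterns.
# Default rule: break at whitespace that follows ".", "!" or "?".
DEFAULT_BREAK = re.compile(r"(?<=[.!?])\s+")
# Custom rule: the same, but a semicolon also ends a segment.
SEMICOLON_BREAK = re.compile(r"(?<=[.!?;])\s+")

def segment(text, rule):
    """Split text into segments at every position matched by rule."""
    return [s for s in rule.split(text.strip()) if s]

text = ("To buy and sell property; to open bank accounts; "
        "to sign contracts. This power is revocable.")

default_segments = segment(text, DEFAULT_BREAK)
semicolon_segments = segment(text, SEMICOLON_BREAK)

for s in semicolon_segments:
    print(s)
```

With the default rule the sample text stays as two segments; with the semicolon rule it becomes four. The point is that the rule acts on the text as it is read in, which is why a file that has already been converted keeps its old segmentation.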

Regards

Paul


 

Richard Hill  Identity Verified
Mexico
Local time: 03:26
Member (2011)
Spanish to English
TOPIC STARTER
I think you are right Nov 10, 2011

Hi Paul,

One thing I was expecting to happen is for the actual TM (not only the file I'm about to translate) to be segmented at the semicolons when I update the TM with the new semicolon rule. Is this not the case?

BTW, the third image in my last post is not a file I'm about to translate, but a view of the TM itself, in the Translation Memory View.

Thanks

Rich


 

SDL Community  Identity Verified
United Kingdom
Local time: 10:26
English
Nice idea ;-) Nov 10, 2011

Hi Rich,

I hadn't ever thought about an option like this, so a nice idea, but no... this is not how it works. The TM contains the results of the translated file, so to have segments in there that are split at the semicolon you need to send them there from your translation.

If you did change them here, you still wouldn't get the hits you want, because unless the file was similarly segmented it still wouldn't match. Chicken and egg, almost...
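
A small Python sketch of that chicken-and-egg point, under simplifying assumptions: the TM is modelled as a plain dictionary, only exact matches are considered (a real TM also returns fuzzy matches), and the Spanish/English strings are invented for the example.

```python
# Hypothetical TM whose units have been split at semicolons.
tm = {
    "Para comprar y vender bienes;": "To buy and sell property;",
    "para abrir cuentas bancarias.": "to open bank accounts.",
}

def exact_hits(document_segments, memory):
    """Return the document segments that have an identical source segment in the TM."""
    return [seg for seg in document_segments if seg in memory]

# The same sentence, segmented two ways when the file is opened.
doc_default = ["Para comprar y vender bienes; para abrir cuentas bancarias."]
doc_semicolon = ["Para comprar y vender bienes;", "para abrir cuentas bancarias."]

hits_default = exact_hits(doc_default, tm)      # file not split at semicolons
hits_semicolon = exact_hits(doc_semicolon, tm)  # file split the same way as the TM

print(len(hits_default), len(hits_semicolon))
```

The TM units are only found when the file has been segmented the same way as the units stored in the TM.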

Regards

Paul


 

hhl
Local time: 10:26
English to German
exceeding software intelligence Nov 10, 2011

Hi Rich,
A function doing what you want would not only have to separate the source text segments (which is all that a segmentation rule does); it would also need to introduce breaks into the target language segments, otherwise it would not be very useful.

Try doing it in the Editor, with a text where you have a populated target column.
Step into one segment (source column) and "split segment": your text will be split into two segments, but only the source side will have the break where you wanted it. The (existing) translation of the original segment will move entirely to the first of the two new segments, while the second remains empty. This is expected, because how should the software know where the corresponding break should be in the target translation?

Same thing with your idea. You may of course argue that the split should take place at each semicolon, be it source or target, but can you seriously rule out that any one segment in the whole TM may have fewer or more semicolons on the target side than on the source side?
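
The semicolon-count problem can be sketched in a few lines of Python. This is only an illustration: the translation units are invented, and `split_unit` is a hypothetical helper, not a Studio function.

```python
# Made-up translation units (source, target) for illustration only.
units = [
    ("comprar; vender; arrendar", "to buy; to sell; to lease"),
    ("firmar contratos; abrir cuentas", "to sign contracts and open accounts"),
]

def split_unit(source, target, sep=";"):
    """Split a unit at sep only if both sides yield the same number of pieces.

    Returns a list of (source_piece, target_piece) pairs, or None when the
    counts differ and a person would have to decide where the target breaks.
    """
    src_parts = [p.strip() for p in source.split(sep)]
    tgt_parts = [p.strip() for p in target.split(sep)]
    if len(src_parts) != len(tgt_parts):
        return None
    return list(zip(src_parts, tgt_parts))

results = [split_unit(s, t) for s, t in units]
print(results)
```

The first unit splits cleanly into three pairs; the second cannot be split automatically because the translator merged the two items with "and". Even when the counts do match, nothing guarantees that the pieces correspond in the same order, so the result would still need checking.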

The approach you have in mind is more like an "Alignment" function, where the software guesses which target part belongs to which source part, but you still need a human brain at the end of the process to confirm that the segmentation is OK on both sides.

[Edited at 2011-11-10 09:08 GMT]


 

Richard Hill  Identity Verified
Mexico
Local time: 03:26
Member (2011)
Spanish to English
TOPIC STARTER
I wonder how segmentation affects the Autosuggest function? Nov 10, 2011

Hi hhl,

Thanks for your explanation, it all makes much more sense to me now.

So I will have to work out whether it might be worth manually realigning my TM content at semicolon level and creating new TMs from the results. I'm not sure whether the benefits would be worth the work involved. They may be, though, considering that I do a lot of legal work, such as the example in image 3 of my first post, which lists powers of attorney that are often separated by semicolons and say the same thing, just in a different order. Still not sure it'd be worth it, though. It might be less work to add these short repetitive phrases to my termbase or autotext, or to wait till I have 25,000 segments in my TM, create a new autosuggest dictionary and see how many of these phrases are thrown up by autosuggest.

Mmmmmm! This makes me wonder how the Autosuggest dictionary comes up with its suggestions, i.e. if I took the trouble to separate some TM content at semicolon level, would this "help" the autosuggest function create and suggest more phrases rather than single words?


 

Aude Sylvain  Identity Verified
France
Local time: 10:26
English to French
+ ...
projects with several TMs Nov 11, 2011

SDL Support wrote:

The rules are applied when you first open the file


Hi Paul,

A short question in that regard. I tried to test this several times but couldn't reach a clear conclusion.

When one creates a project with several TMs, of which only one uses a specific segmentation rule (say, Rich's semicolon rule) while all the others use the standard segmentation rules, would the source text be segmented using the specific rule? Or is it necessary, for this, that all the TMs involved use this specific segmentation rule?

Many thanks,
have a good weekend all!


 

SDL Community  Identity Verified
United Kingdom
Local time: 10:26
English
Which language resources take precedence? Nov 11, 2011

Hi Aude,

A very good question..! It's the TM at the top of the list when you select your TMs that will take precedence. So if you have three TMs and only one of them uses segmentation rules for a semicolon then this will only be used if it is at the top of the list, like this:


Regards

Paul


 

Aude Sylvain  Identity Verified
France
Local time: 10:26
English to French
+ ...
- Nov 11, 2011

Thank you Paul, most useful!
Indeed, I didn't check that point when I ran my tests; I guess that is why I was getting inconsistent results.


 

Richard Hill  Identity Verified
Mexico
Local time: 03:26
Member (2011)
Spanish to English
TOPIC STARTER
BTW Nov 12, 2011

I've been doing a couple of tests to try and work out how worthwhile adding certain segmentation rules might be.

For example, I opened a 10,000-word document (with lots of repetitive text) with the default segmentation rules and ran an analysis in Studio, which showed 63.8% under "New". Then I added two new segmentation rules, semicolon and dot dash (".-" is often used as a form of punctuation in Mexican Spanish), opened the same document again and reran the same analysis, and now have 61.78% under "New", which is about 2% (200 words) fewer words to translate. I guess this isn't much, but then again I reckon this percentage could increase after having applied these two new rules over a few months' work in Studio.
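
As a quick check of that arithmetic, here is a short Python sketch. It assumes the "New" percentages are percentages of the roughly 10,000-word total, which the post does not state explicitly; the variable names are just for illustration.

```python
total_words = 10_000   # approximate document size quoted in the post
new_before = 63.8      # % "New" with the default segmentation rules
new_after = 61.78      # % "New" with the semicolon and ".-" rules added

# Difference in percentage points, rounded to avoid floating-point noise.
saving_points = round(new_before - new_after, 2)
# Equivalent number of words, under the word-based assumption above.
saving_words = round(total_words * saving_points / 100)

print(saving_points, saving_words)
```

Under that assumption the difference is 2.02 percentage points, or about 202 words, which agrees with the "virtually 2% (200 words)" quoted above.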


 

Emma Goldsmith  Identity Verified
Spain
Local time: 10:26
Member (2010)
Spanish to English
dot dash punctuation Nov 12, 2011

rich. wrote:

I added two new segmentation rules, semicolon and dot dash (".-" is often used as a form of Mexican Spanish punctuation), then opened the same document and reran the same analysis, and now have 61.78% under "New", being virtually 2% (200 words) fewer words to translate.


I don't want to be an "aguafiestas", but I suspect the 200 fewer words that you no longer have to translate after adding the dot dash segmentation rule stem from the fact that in Spanish a number often precedes a dot dash (especially in list form). So Studio will automatically translate those numbers for you.

However, your experiments with new segmentation are very interesting and I think you may well get more leverage from your TMs in the long run.


 

Richard Hill  Identity Verified
Mexico
Local time: 03:26
Member (2011)
Spanish to English
TOPIC STARTER
dot dash Nov 12, 2011

Emma Goldsmith wrote:
I don't want to be an "aguafiestas", but I suspect the 200 fewer words that you no longer have to translate after adding the dot dash segmentation rule stem from the fact that in Spanish a number often precedes a dot dash (especially in list form). So Studio will automatically translate those numbers for you.


Hi Emma,

No problem! I had considered that, and you're right, so I ran the same test with the semicolon rule as the only new segmentation rule, and this resulted in a saving of only 0.53%. So, two things here. One: I'd expect this saving to increase after a few months' work with the semicolon segmentation rule in place. Two: having the numbers outside of the translation unit may be beneficial in terms of TM match percentages, in that, at least theoretically, the same text may appear in the same or other documents listed in a different order or with a different number.

Cheers!

Rich


 

