Trados 2011 new segmentation rule
Thread poster: Pascal Zotto

Pascal Zotto  Identity Verified
Austria
Local time: 14:22
Member (2009)
Dutch to Letzeburgesch
+ ...
Oct 10, 2012

Hi,

I know how to implement new segmentation rules, but I'm having problems getting this one to work:

I need to break at \r\n and similar sequences, but when I tell Trados to use \r\n it segments after each r and each n. So how do I define the rule so that Trados breaks at each \r\n? (By the way, why on earth did the Trados programmers decide that a rule treats every character in a row individually, instead of character groups separated by some specified character, like normal programs do?)

thanks for the help,
Pascal



Pascal Zotto  Identity Verified
Austria
Local time: 14:22
Member (2009)
Dutch to Letzeburgesch
+ ...
TOPIC STARTER
found it Oct 15, 2012

http://kb.sdl.com/#tab:homeTab:crumb:7:artId:3676




Pascal Zotto  Identity Verified
Austria
Local time: 14:22
Member (2009)
Dutch to Letzeburgesch
+ ...
TOPIC STARTER
well at least in theory Oct 16, 2012

in practice it does not work at all... anyone with experience with regex segmentation rules?


SDL Community  Identity Verified
United Kingdom
Local time: 14:22
English
can you provide some... Oct 16, 2012

... example text with chars that you wish to break at... and also that which you don't? Just annotate in this thread should be fine.

Thanks

Paul



Pascal Zotto  Identity Verified
Austria
Local time: 14:22
Member (2009)
Dutch to Letzeburgesch
+ ...
TOPIC STARTER
examples Oct 16, 2012

Here are some examples: (taken from Excel cells)

E&nter Product Key\r\n\r\nChoose this option if you have obtained a Product Key from a full product package or some other authorized source.
Buy Product Key &Online\r\n\r\nChoose this option if you do not have a Product Key to use for the conversion and would like to buy one using a credit card over the Internet. After purchasing the key, return to this screen and choose "Enter Product Key" to procceed with the conversion.
Inviting people isn't enabled here. Please get in touch with your help desk for more information.\n\nThis is shared with anyone permitted to access its location:
I work on this \tIt does not work.

Chars to break at (and not to be shown in the segments):

\r\n
\n\n
\n
\t
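
For anyone who wants to check the intended result outside Studio, here is a minimal Python sketch of the splitting described above. It assumes the markers are literal typed text (a backslash followed by a letter), not real control characters, and it uses Python's re module rather than the regex engine inside Trados, so it only illustrates the goal, not the Studio rule itself. The first sample string is shortened from the examples above.

```python
import re

# One or more literal markers in a row (backslash + letter as typed text).
# The "+" also covers doubled sequences such as \r\n\r\n and \n\n.
MARKER_RUN = re.compile(r"(?:\\r\\n|\\n|\\t)+")


def split_at_markers(cell):
    """Split one Excel cell at the markers and drop the markers themselves."""
    parts = MARKER_RUN.split(cell)
    return [part.strip() for part in parts if part.strip()]


samples = [
    r"E&nter Product Key\r\n\r\nChoose this option",  # shortened sample
    r"I work on this \tIt does not work.",
]

for cell in samples:
    print(split_at_markers(cell))
```

Note that any genuine text containing a backslash followed by n or t (a Windows path, for instance) would also be split by this pattern.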



SDL Community  Identity Verified
United Kingdom
Local time: 14:22
English
And a dumb question... Oct 16, 2012

... for my sanity. Do your characters represent real line feeds, carriage returns and tabs, or is it actually literal text like this? Not that I know how to enter the real things inside an Excel cell.


Thanks

Paul



Pascal Zotto  Identity Verified
Austria
Local time: 14:22
Member (2009)
Dutch to Letzeburgesch
+ ...
TOPIC STARTER
that's the way it is... Oct 16, 2012

when I get the files. I guess they are taken from another program that interprets the \r\n and so on as what they really should be when the text is imported back.


SDL Community  Identity Verified
United Kingdom
Local time: 14:22
English
ok - two steps I think Oct 16, 2012

Pascal Zotto wrote:

when I get the files. I guess they are taken from another program that interprets the \r\n aso as what they really should be when imported back.


The first thing is to use ASCII hex codes instead... I don't know why, but I have more success with this, so I'll ask tomorrow when the right developer is back in.

It's not quite there yet... probably need to fiddle with this... but if you use this:

Before break:
Anything

Break characters
(?:\x5C\x72\x5C\x6E)|(?:\x5C\x6E\x5C\x6E)|(?:\x5C\x74)

After break:
Anything

Then it's almost right... the next step is to make the chars external tags so they don't appear at all. To do this, just create a placeholder tag out of \r, \n and \t in the embedded content part of the Excel file type. If you click the Advanced button when you do this, you can set the segmentation hint to Exclude, and then the segments with the chars disappear.
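
On the break-characters expression above: the hex escapes are just another spelling of the literal text. \x5C is the backslash, \x72 is r, \x6E is n and \x74 is t. A small Python sketch can confirm that the hex form and the plain escaped form find the same markers. Python's re module is used here as a stand-in for the engine in Studio, and the sample string is made up for illustration.

```python
import re

# The expression from the post, written with hex codes.
HEX = re.compile(r"(?:\x5C\x72\x5C\x6E)|(?:\x5C\x6E\x5C\x6E)|(?:\x5C\x74)")

# The same expression with escaped backslashes instead of hex codes.
LITERAL = re.compile(r"(?:\\r\\n)|(?:\\n\\n)|(?:\\t)")


def markers(pattern, text):
    """Return every marker the pattern finds, in order of appearance."""
    return pattern.findall(text)


# Hypothetical sample; the raw string keeps the backslashes as typed text.
sample = r"Key\r\n\r\nChoose \tthis\n\nend"

print(markers(HEX, sample))
print(markers(HEX, sample) == markers(LITERAL, sample))
```

Neither form matches a single \n on its own, which is one of the four break sequences requested earlier in the thread.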

My quick test isn't working properly... but it may set you on the right path. I get this so far:


So closer than you were before I think... but something is wrong. It might even be a bug so I'll check that out tomorrow too. Sorry I can't give you a complete job but perhaps this will help a little?

Regards

Paul



SDL Community  Identity Verified
United Kingdom
Local time: 14:22
English
If worse comes to worse... Oct 16, 2012

... you could use the embedded content to protect the tags at least and manually segment as you work?


Not as good as segmenting but may be a better solution than otherwise.

Regards

Paul



Pascal Zotto  Identity Verified
Austria
Local time: 14:22
Member (2009)
Dutch to Letzeburgesch
+ ...
TOPIC STARTER
yes, closer than I came Oct 16, 2012

for the first part ... waiting for results with your techs... and trying to go on on my side too...

SDL Support wrote:

... you could use the embedded content to protect the tags at least and manually segment as you work?


Manual segmentation is not possible... the example is only a very small excerpt. I get lots of files of up to a few thousand strings, so I guess you can imagine how time-consuming manual segmentation would be. On top of that, Trados files get buggier the more manual splits you do... not a good idea.



Pascal Zotto  Identity Verified
Austria
Local time: 14:22
Member (2009)
Dutch to Letzeburgesch
+ ...
TOPIC STARTER
tried to implement your solution... Oct 16, 2012

to no avail... I get no change... might be the tag part is not correctly set... never worked with that part of Trados till now.


SDL Community  Identity Verified
United Kingdom
Local time: 14:22
English
ok - still two steps... Oct 17, 2012

.... but different ones. You can't use the embedded content, because this parses the file first; the sequences then become tags and the segmentation doesn't pick them up. So I created two rules like this:

PZ_after
Before break : [\w\p{P}]
After break: (?:\\r\\n\\r\\n)|(?:\\r\\n)|(?:\\n\\n)|(?:\s\\t)

PZ_before
Before break : [\w\p{P}]?(?:\\r\\n\\r\\n)|(?:\\r\\n)|(?:\\n\\n)|(?:\\t)
After break: \w

This renders your file like this:


The problem that caused me to use ASCII last time is that I added this in the basic view, which then attempted to convert the regex I wrote into another regex. Newbie error on my part... so I switched to the advanced view, and then I could use the shortcut regex instead. The same problem was responsible for the segmentation not working properly too... so my error.
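
Going back to the two rules (PZ_after and PZ_before), here is a rough Python sketch of how a before-break / after-break pair can be read. It rests on assumptions: a break is placed wherever the text ending at a position matches the before-break pattern and the text starting there matches the after-break pattern, which is a simplified reading and may not match what Studio does internally. Python's re module has no \p{P}, so ASCII punctuation stands in for it. The function names and the sample string are hypothetical.

```python
import re
import string

# Assumption: ASCII punctuation replaces \p{P}, which Python's re lacks.
WORD_OR_PUNCT = r"[\w" + re.escape(string.punctuation) + "]"

# (before-break, after-break) pairs transcribed from PZ_after and PZ_before.
RULES = [
    (WORD_OR_PUNCT, r"(?:\\r\\n\\r\\n)|(?:\\r\\n)|(?:\\n\\n)|(?:\s\\t)"),
    (WORD_OR_PUNCT + r"?(?:\\r\\n\\r\\n)|(?:\\r\\n)|(?:\\n\\n)|(?:\\t)", r"\w"),
]


def find_breaks(text):
    """Positions where one rule matches on both sides of the position."""
    breaks = []
    for i in range(1, len(text)):
        for before, after in RULES:
            ends_here = re.search("(?:" + before + r")\Z", text[:i])
            starts_here = re.match("(?:" + after + ")", text[i:])
            if ends_here and starts_here:
                breaks.append(i)
                break  # one matching rule is enough for this position
    return breaks


def segment(text):
    """Cut the text at every break position."""
    cuts = [0] + find_breaks(text) + [len(text)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:])]


print(segment(r"Key\r\nChoose"))
print(segment(r"I work on this \tIt does not work."))
```

The point of having two rules is that one breaks in front of a marker and the other behind it, so each marker ends up in a segment of its own.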

The last problem is how to exclude the segments with \r\n\t etc. Unfortunately the only way I can do this is to convert them to tags, but because doing this prevents the segmentation rule from working I can't. So the best solution is probably to filter on these segments now that they are easily separated from the text, copy source to target and confirm and lock them, like this:


Then filter on unlocked like this:


So I guess if you merged files together you could do all of this in one go to prepare the files for translation... or maybe in groups of them depending on the sizes?
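
The lock-and-filter preparation described above boils down to a simple classification: segments that contain nothing but markers get copied to target and locked, and everything else stays open for translation. A minimal Python sketch of that logic follows. In Studio this is done by hand through the display filter, so the dictionary layout and function names here are purely hypothetical.

```python
import re

# A segment made up only of literal markers, with optional whitespace around them.
MARKER_ONLY = re.compile(r"(?:\s*(?:\\r\\n|\\n|\\t))+\s*")


def is_marker_only(source):
    """True if the segment holds nothing but \\r\\n, \\n or \\t markers."""
    return MARKER_ONLY.fullmatch(source) is not None


def prepare(segments):
    """Copy source to target and lock every marker-only segment."""
    prepared = []
    for source in segments:
        locked = is_marker_only(source)
        prepared.append({
            "source": source,
            "target": source if locked else "",
            "locked": locked,
        })
    return prepared


def unlocked(prepared):
    """What remains to translate after filtering on unlocked segments."""
    return [seg["source"] for seg in prepared if not seg["locked"]]


segments = ["E&nter Product Key", r"\r\n\r\n", "Choose this option", r" \t"]
print(unlocked(prepare(segments)))
```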

I hope this helps anyway... not exactly what you wanted but probably a lot better than nothing at all..!

Meant to add that these look like Java resource files (*.properties), so if you had the originals, or files in this format instead of Excel, you might not have to do any of this..!

Regards

Paul




[Edited at 2012-10-17 12:31 GMT]



SDL Community  Identity Verified
United Kingdom
Local time: 14:22
English
Thanks Paul Oct 17, 2012

Hi Paul

I think the way you are going is the right one.

Actually, Pascal, what you are trying to do here is not creating a "segmentation rule".
What you should use here is embedded content, as Paul describes.


Just wanted to make sure that the difference is known here not to confuse other users who are reading the post.


Cheers
Richard



Pascal Zotto  Identity Verified
Austria
Local time: 14:22
Member (2009)
Dutch to Letzeburgesch
+ ...
TOPIC STARTER
trados doesn't like me... Oct 17, 2012

Hi Paul,

looks great on your side but not on mine... I set up the rules with copy & paste on my TM but when I add a new file and have it prepared nothing happens.

Do I have to close Trados after every change of segmentation or can I just change it and then add a file?

The second part would be no problem... I just need to "translate" and confirm them once and they propagate and for the next translation they will be pretranslated



Pascal Zotto  Identity Verified
Austria
Local time: 14:22
Member (2009)
Dutch to Letzeburgesch
+ ...
TOPIC STARTER
Hi Richard, Oct 17, 2012

Paul stated that we could NOT use the embedded content but would have to create rules instead, but they don't seem to work on my side for some unknown reason. Or where can I create rules other than for segmentation?
