Segmentation and placeable recognition issues - help needed
Thread poster: Richard Hill

Richard Hill  Identity Verified
Mexico
Local time: 00:43
Member (2011)
Spanish to English
Jul 25, 2011

If you could give me some tips on how to sort out these segmentation and placeable recognition issues I'd be really grateful.



Firstly, all the numbers, like “1.” etc., were getting segmented, so I added “1.” to “0.” in the abbreviations list, which fixed that. I hope that doesn’t affect any other segmentation, and I wonder if there’s another way to achieve the same aim?
I would have thought that if I’d included “1.” in the abbreviations list then it wouldn’t get segmented at “11.”, but not so!
Also, is there a way to force Studio to recognize numbers as placeables if directly preceded by a currency symbol?
And lastly, is there a way to get date formats such as 25 de marzo del 2011 (as in the screenshot) recognized as placeables?
Thanks
rich



SDL Community  Identity Verified
United Kingdom
Local time: 06:43
English
Culture Sets and Automatic Numbering Jul 26, 2011

Hi Rich,

If I create a little document in Word like this:


Then open it in Studio I see this:


Note that when I use the automatic numbering, Studio moves the number outside the segment and gives you only the text to translate, because it knows what this is. When you use simple text and even Word doesn't know this is supposed to be a numbered list, Studio doesn't either and will treat it as text followed by a full stop that needs to be segmented. You could probably edit the segmentation rules so that they do not break after a number followed by a full stop, but unless you have lots of documents that are always created in this way I would not do this.
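To make the idea of "not breaking after a number followed by a full stop" concrete, here is a minimal sketch in Python. It is not Studio's rule syntax: the BREAK and LIST_NUMBER patterns and the segment function are hypothetical names, and the break rule is deliberately naive.

```python
import re

# Hypothetical break rule: split after ".", "!" or "?" when whitespace follows.
BREAK = re.compile(r"(?<=[.!?])\s+")

# Hypothetical exception: the text before the break is only a list number,
# such as "1." or "11.", so it should stay attached to what follows.
LIST_NUMBER = re.compile(r"^\d+\.$")


def segment(text: str) -> list[str]:
    """Split text into segments, keeping bare list numbers with the next piece."""
    segments = []
    carry = ""
    for piece in BREAK.split(text.strip()):
        candidate = f"{carry} {piece}" if carry else piece
        if LIST_NUMBER.match(candidate):
            carry = candidate  # hold the number back and join it to the next piece
        else:
            segments.append(candidate)
            carry = ""
    if carry:
        segments.append(carry)
    return segments


print(segment("1. First item. 11. Second item."))
```

Because the exception matches any run of digits, it covers "11." and "111." as well as "1.", which a fixed list of single-digit abbreviations does not.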

On the placeables: Studio recognises date/time/number/currency placeables based on the Microsoft Culture sets for the specific language you are using. So if a number in your source is not written using the recognised convention for that language, it won't be identified as a placeable. Similarly, the target number you place will be auto-localized to match the culture set for the target language. You can see these here:

http://msdn.microsoft.com/en-us/goglobal/bb896001.aspx
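As a rough illustration of convention-based recognition, the sketch below checks whether a number follows a language's grouping and decimal separators, then rewrites it for the target language. The CULTURES table is a simplified assumption covering two languages only, and is_placeable and autolocalize are hypothetical names, not Studio functions.

```python
import re

# Simplified, assumed separator conventions; real culture sets cover far more.
CULTURES = {
    "en-GB": {"group": ",", "decimal": "."},
    "es-ES": {"group": ".", "decimal": ","},
}


def number_pattern(culture: str) -> re.Pattern:
    """Build a pattern for grouped or ungrouped numbers in the given culture."""
    group = re.escape(CULTURES[culture]["group"])
    decimal = re.escape(CULTURES[culture]["decimal"])
    return re.compile(rf"(?:\d{{1,3}}(?:{group}\d{{3}})+|\d+)(?:{decimal}\d+)?")


def is_placeable(text: str, culture: str) -> bool:
    """True if the whole text is a number written in the culture's convention."""
    return number_pattern(culture).fullmatch(text) is not None


def autolocalize(text: str, source: str, target: str) -> str:
    """Swap the separators of a recognised number to the target convention."""
    if not is_placeable(text, source):
        raise ValueError(f"{text!r} is not a recognised {source} number")
    src, tgt = CULTURES[source], CULTURES[target]
    # Use a temporary marker so the two separators do not overwrite each other.
    swapped = text.replace(src["group"], "\0").replace(src["decimal"], tgt["decimal"])
    return swapped.replace("\0", tgt["group"])


print(is_placeable("1.234,56", "es-ES"))
print(is_placeable("1,234.56", "es-ES"))
print(autolocalize("1.234,56", "es-ES", "en-GB"))
```

A number typed in the other language's convention is simply not matched, which is the behaviour described above.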

I hope this explains a little anyway?
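The same reasoning applies to dates. The sketch below assumes, without having checked the culture set, that the expected Spanish long date reads "<day> de <month> de <year>"; under that assumption the variant with "del" from the question would not match. STRICT and LOOSE are hypothetical patterns for illustration only.

```python
import re

MONTHS_ES = (
    "enero", "febrero", "marzo", "abril", "mayo", "junio", "julio",
    "agosto", "septiembre", "octubre", "noviembre", "diciembre",
)
MONTH_ALT = "|".join(MONTHS_ES)

# Assumed convention: "25 de marzo de 2011".
STRICT = re.compile(rf"\b\d{{1,2}} de (?:{MONTH_ALT}) de \d{{4}}\b", re.IGNORECASE)

# Hypothetical looser rule that also accepts "del" before the year.
LOOSE = re.compile(rf"\b\d{{1,2}} de (?:{MONTH_ALT}) del? \d{{4}}\b", re.IGNORECASE)


def find_dates(text: str, pattern: re.Pattern) -> list[str]:
    """Return every substring of text that the pattern recognises as a date."""
    return [match.group(0) for match in pattern.finditer(text)]


sample = "Firmado el 25 de marzo del 2011."
print(find_dates(sample, STRICT))
print(find_dates(sample, LOOSE))
```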

Regards

Paul



Jerzy Czopik  Identity Verified
Germany
Local time: 06:43
Member (2003)
Polish to German
+ ...
No Jul 26, 2011

rich. wrote:
Also, is there a way to force Studio to recognize numbers as placeables if directly preceded by a currency symbol?


Unfortunately no, as it is supposed to be written separated by a space.
However, Studio will recognize 22mm as a measurement and can insert it in the target correctly separated by a non-breaking space (a setting in Auto Substitution for a specific language pair).
Please be aware that all programs have limits: if the person creating the text breaches all language-specific typographic rules and types dates, numbers and so on in the wrong format, no software can recognize them properly.
What Paul says about the numbered list is just as true as the example with the missing space in measurements; unfortunately, most text authors do not pay any attention to such "unimportant" things.
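The measurement case can be sketched as follows: find a number followed by a unit, with or without a space, and write it back with a non-breaking space. The unit list and the normalize_measurements name are assumptions for illustration and say nothing about how Studio implements this.

```python
import re

NBSP = "\u00a0"  # non-breaking space

# Hypothetical unit list; longer units come first so "mm" wins over "m".
MEASUREMENT = re.compile(r"(\d+(?:[.,]\d+)?)[ \u00a0]?(mm|cm|kg|m|g)\b")


def normalize_measurements(text: str) -> str:
    """Rewrite measurements such as '22mm' or '22 mm' with a non-breaking space."""
    return MEASUREMENT.sub(lambda m: f"{m.group(1)}{NBSP}{m.group(2)}", text)


print(repr(normalize_measurements("Use a 22mm bolt and a 1.5 kg weight.")))
```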

In my experience it is easier to merge segments when you need dotted numbers (1., 11. and so on) within the segment than to change the segmentation rules. Usually these numbers are used for numbering in the text and so can be kept outside the segment. Because Studio lets you deal with such numbers very quickly using the display filter, you can copy all such numbers to the target and change the status of these number-only segments to translated, or whatever you need, with just a few clicks.
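What that filter-and-copy step amounts to can be sketched like this. The Segment class, the status strings and the NUMBER_ONLY pattern are hypothetical stand-ins; in Studio itself the same thing is done through the display filter and a few clicks.

```python
import re
from dataclasses import dataclass

# Matches segments made up only of dotted numbers: "1.", "11.", "2.1." and so on.
NUMBER_ONLY = re.compile(r"\d+(?:\.\d+)*\.?")


@dataclass
class Segment:
    source: str
    target: str = ""
    status: str = "Not Translated"


def copy_number_only(segments: list[Segment]) -> int:
    """Copy source to target for number-only segments and mark them translated."""
    changed = 0
    for seg in segments:
        if NUMBER_ONLY.fullmatch(seg.source.strip()):
            seg.target = seg.source
            seg.status = "Translated"
            changed += 1
    return changed


doc = [Segment("1."), Segment("First item."), Segment("2.1."), Segment("11.")]
print(copy_number_only(doc))
```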

What I do miss in Studio, however, is the ability to replace strings like 111.xxx with 555.yyy automatically; this feature can be seen in Transit.



Richard Hill  Identity Verified
Mexico
Local time: 00:43
Member (2011)
Spanish to English
TOPIC STARTER
I see Jul 26, 2011

Jerzy Czopik wrote:

In my experience it is easier to merge segments when you need dotted numbers (1., 11. and so on) within the segment than to change the segmentation rules.


Hi Jerzy

I've seen you around the forum and have taken some of your really useful advice, even though some of it is beyond me. I don't understand how to change segmentation rules and don't feel ready to tackle that yet. I had read your post http://fra.proz.com/forum/sdl_trados_support/177254-studio_2009_auto_substitution.html and that is beyond me too. Anyway, no rush on that; I'll stay with the basics for a while yet.

Anyway, I do get a lot of documents with numbers in simple text, and for now, rather than changing the segmentation rules, I've just put "1." to "0." in the abbreviations list. I did so hoping that not only "1." would avoid segmentation but also "11.", "111." and so on, but no: only the numbers 1 to 9 avoid segmentation that way. I guess I could put, say, the numbers 1. to 50. in the abbreviations list, as not many lists go much further than that in the documents I translate. Then again, I'd have to add Roman numerals too, in upper and lower case, and numbers with brackets, etc., but it might be worth the trouble as it only needs to be done once.
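If the list of entries is built by script rather than typed by hand, the "do it once" effort is small. The sketch below only generates the text entries (Arabic numbers plus upper- and lower-case Roman numerals, each with a full stop); how they are then loaded into the abbreviations list is not covered, and the function names are hypothetical.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to an upper-case Roman numeral."""
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def list_markers(limit: int = 50) -> list[str]:
    """Return entries such as '11.', 'XI.' and 'xi.' for 1 up to limit."""
    markers = []
    for n in range(1, limit + 1):
        roman = to_roman(n)
        markers.extend([f"{n}.", f"{roman}.", f"{roman.lower()}."])
    return markers


# One entry per line, ready to paste wherever the entries are needed.
print("\n".join(list_markers(50)))
```

Bracketed variants could be added to the same loop if they turn out to be needed.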

Straying from this point, I'm messing around with creating AutoText dictionaries for specific work areas http://www.proz.com/forum/sdl_trados_support/204001-getting_the_most_out_of_autotext.html. For example, I'm now translating a medical document, so I pasted the contents of a massive MS Word medical dictionary into an exported autoTxt file, which works fine. The inconvenience seems to be that I can't get rid of all those thousands of words in one go when I finish the project so as to import the next project-specific autoTxt file; instead I have to hit delete thousands of times to get rid of them one by one. That makes it more difficult than I had hoped to set up project-specific AutoText dictionaries rather than the massive AutoSuggest dictionaries.

thanks Jerzy



Emma Goldsmith  Identity Verified
Spain
Local time: 06:43
Member (2010)
Spanish to English
Oxford likes a space Jul 26, 2011

Jerzy Czopik wrote:

rich. wrote:
Also, is there a way to force Studio to recognize numbers as placeables if directly preceded by a currency symbol?


Unfortunately no, as it is supposed to be written separated by a space.

... the text is breaching all language-specific typographic rules and types dates, numbers and so on in the wrong format, no software can recognize that properly.


Unfortunately in English the pound sign should come immediately before the numbers, without a space, or at least the Oxford Style Manual says so:

"Amounts in whole pounds should be printed with the £ symbol, numerals
and unit abbreviation close up: £2,542, £3m., £7.47m."
http://proxy.bookfi.org/genesis/306000/5ee387f04cfef5bdd5fbb32bcff81695/_as/[R._M._Ritter]_The_Oxford_Guide_to_Style(BookFi.org).pdf



Emma Goldsmith  Identity Verified
Spain
Local time: 06:43
Member (2010)
Spanish to English
Numbers in your TM Jul 26, 2011

rich. wrote:

Anyway, I do get a lot of documents with numbers in simple text


I wouldn't bother changing segmentation rules or merging segments with 1. etc. You only have to "translate" them once to get them into your TM and then you won't be bothered by them again in future files.

In my case (like most people I imagine) all the variations of
1.
1.2.
2.
2.1.
2.2.

are in my TM and get automatically transferred to the target text when I prepare a new project.

HTH,
Emma



Richard Hill  Identity Verified
Mexico
Local time: 00:43
Member (2011)
Spanish to English
TOPIC STARTER
deeerrr! Jul 26, 2011

Emma Goldsmith wrote:

I wouldn't bother changing segmentation rules or merging segments with 1. etc. You only have to "translate" them once to get them into your TM and then you won't be bothered by them again in future files.

Emma


Of course! I hadn't thought of that.
I need to make an effort to think logically while learning Trados.
I'm used to opening a new program and tweaking a couple of settings and that's it. And only a couple of weeks ago I was really skeptical about Trados, but I'm turning around. It's getting addictive. Now that I mention it, it's 3am here and here I am again. ay ay ay!

thanks Emma



Richard Hill  Identity Verified
Mexico
Local time: 00:43
Member (2011)
Spanish to English
TOPIC STARTER
I see, I think! Jul 26, 2011

SDL Support wrote:
You could probably edit the segmentation rules to not break after a number followed by a full stop but unless you have lots of documents that are always created in this way then I would not do this.
Paul


Emma Goldsmith wrote:
I wouldn't bother changing segmentation rules or merging segments with 1. etc. You only have to "translate" them once to get them into your TM and then you won't be bothered by them again in future files
Emma


Hi Paul.
It all makes perfect sense. I need to learn to think more logically.

thanks again.
rich



