How do various CAT tools handle text language settings?
Thread poster: Samuel Murray

Samuel Murray  Identity Verified
Netherlands
Member (2006)
English to Afrikaans
Nov 13, 2013

EDITED: See my next post in this thread for an update and an updated test file.

==

Hello everyone

The question is whether an MS Word file's language settings are carried over into the translated version of the file. I tested this in Wordfast Pro, Trados 2011 and OmegaT, and I'm hoping some of you will tell me what happens in your CAT tool.

For this I created a test document consisting of three sentences, each separated by a single line break. My MS Word's default language is Afrikaans. I selected the middle sentence and marked it as "English", then selected one word in that sentence and marked it as Zulu. Then I translated the file in all three CAT tools. You can get the test document here:

http://wikisend.com/download/135898/test.docx
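If you want to see which language each piece of text carries without clicking through Word's Language dialog, you can inspect the .docx itself. The Python sketch below is illustrative only: it assumes the standard package layout (body text in word/document.xml) and that a language applied directly to a run is stored as a `<w:lang w:val="..."/>` element in that run's properties. The function names are hypothetical, and a run reported as `None` simply inherits its language from a style or the document defaults, which this sketch does not resolve.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml in a .docx package
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def run_languages(document_xml: str | bytes) -> list[tuple[str, str | None]]:
    """Return (text, language) for each text run in a document.xml string.

    The language comes from the run's own <w:rPr><w:lang w:val="..."/>.
    None means no language is set on the run itself, so it is inherited
    from a style or the document defaults (not resolved here).
    """
    root = ET.fromstring(document_xml)
    result = []
    for run in root.iter(f"{{{W}}}r"):
        # Concatenate the run's <w:t> children; skip runs with no visible text
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        if not text:
            continue
        lang = run.find("w:rPr/w:lang", NS)
        result.append((text, lang.get(f"{{{W}}}val") if lang is not None else None))
    return result


def docx_run_languages(path: str) -> list[tuple[str, str | None]]:
    """Open a .docx (a ZIP package) and list the languages of its body runs."""
    with zipfile.ZipFile(path) as pkg:
        return run_languages(pkg.read("word/document.xml"))
```

Running `docx_run_languages("test.docx")` on a downloaded copy of the source file and on each tool's target file would make differences in language marking easy to compare side by side.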

Wordfast Pro: When I opened the file in WFP, there were no tags to indicate that WFP intended to retain the Zulu-ness of the one word marked as Zulu. The project's languages in WFP were Afrikaans to English. In the final target file, the entire text was marked as Afrikaans (all three sentences, all words).

Trados 2011: As in WFP, the file did not contain any tags in Trados, so I expected my lone Zulu word to lose its Zulu marking. And it did. The project's languages in Trados were Dutch to Hebrew (the only other languages I could select, due to the 5-language limit in Trados). In the final target file, all the text was marked as Dutch, except that when I moved my cursor to the very end of the line, the language became Hebrew. So Trados set the document's default language to Hebrew, but marked all of the text as Dutch.

OmegaT: OmegaT was the only tool that retained the Zulu-ness of the lone Zulu word, using tags. The text in the translated file was marked with exactly the same languages as in the source file, even though I had translated the file. So the project's language setting in OmegaT does not affect the target file, but OmegaT does retain the language settings of all the various pieces of text in the document.

What do your CAT tools do?

Samuel



[Edited at 2013-11-14 13:12 GMT]



Dominique Pivard  Identity Verified
Finnish to French
Practical benefits? Nov 13, 2013

FWIW, memoQ doesn't treat the word specified as being isiZulu in any particular way. I don't see what potential benefits there would be in any CAT tool treating that word in a special way, for instance by enclosing it into tags. Maybe if it used a different character set?

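Compare this source segment (as displayed in memoQ's editor):

[screenshot: the segment with спасибо marked as Russian]

with this one:

[screenshot: the same segment, all marked as English UK]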
In the first case, the word спасибо was specified as being Russian (while the rest of the segment was English UK). In the second case, the whole segment was English UK. I don't think there's any difference between the two when translating the segment into the target language. As a tag hater, I'd prefer the tagless version.


Oiseau noir
Transit NXT + unclear reason Nov 14, 2013

Same results with Transit NXT SP7 translation from Afrikaans to English (US):
1) No tags for "zuluness"
2) After the export, English (US) is assigned to the entire text.

I wonder what you expected the CAT tool to do.
If the Zulu part shall be left untranslated, I'd rather assign a special character format. The CAT tool can then be configured to treat Zulu-formatted text as a write-protected inline tag. That way, you make sure that the Zulu content remains unchanged.
Even if you don't configure the tool, you will see Zulu tags in the editor and can easily reassign the Zulu language in Word after the export.

And I agree with Dominique: Tagging the language settings would create a huge number of usually unneeded/unwanted tags.



Samuel Murray  Identity Verified
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
@Dominique and @Oiseau Nov 14, 2013

Dominique Pivard wrote:
MemoQ doesn't treat the word specified as being isiZulu in any particular way.


Erm... it does. It tags it. Doesn't it?

I don't see what potential benefits there would be in any CAT tool treating that word in a special way, for instance by enclosing it into tags.


No, but it is interesting to see how differently the various CAT tools handle this. It reflects the assumptions of the tools' designers.

Suppose I'm translating from Xhosa to Afrikaans. I would welcome it if, at the very least, all text marked as "Xhosa" in the source document were marked as "Afrikaans" in the target document. But not all CAT tools do that; each one does something different.

Otherwise I would have to mark the text as Afrikaans myself afterwards, which is unreliable because I'd have to be really sure that I've marked all the text with the right language. If a footnote or text box contains Afrikaans but is still marked as Xhosa, MS Word will not flag spelling errors in it when I spell-check the target text (unless I also have a Xhosa spell-checker installed).
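As a rough illustration of what "mark all Xhosa text as Afrikaans afterwards" involves at the file level, here is a hedged Python sketch that rewrites every `w:lang` element with a given value in an already-parsed word/document.xml tree. The function name and the language codes (`xh-ZA`, `af-ZA`) are assumptions for the example. It only touches the XML it is given: language set in styles (word/styles.xml) is not covered, and saving the result back into a real .docx needs care, because ElementTree may rename namespace prefixes when it serialises.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace and the qualified names this sketch needs
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
LANG = f"{{{W}}}lang"
VAL = f"{{{W}}}val"


def retag_language(root: ET.Element, source: str, target: str) -> int:
    """Set w:val=target on every <w:lang> whose w:val equals source.

    Only w:val (the language used for Latin-script text) is changed;
    w:eastAsia and w:bidi, if present, are left alone. Returns the
    number of <w:lang> elements changed.
    """
    changed = 0
    for lang in root.iter(LANG):
        if lang.get(VAL) == source:
            lang.set(VAL, target)
            changed += 1
    return changed
```

Typical use would be `retag_language(ET.fromstring(xml_bytes), "xh-ZA", "af-ZA")`. Doing it in code rather than by hand is the point: every run is checked, including those in places that are easy to miss in Word's UI.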

The only reason why I marked one word as "Zulu" in my test file was to see what the CAT tool would do to a single word in mid-sentence. I rarely encounter that in real life, and I think there may be several valid ways of handling it in a CAT tool which might please some but not others.

Oiseau noir wrote:
Same results with Transit NXT SP7 translation from Afrikaans to English (US):
1) No tags for "zuluness"
2) After the export, English (US) is assigned to the entire text.


Okay, thanks.

If the Zulu part shall be left untranslated, I'd rather assign a special character format.


I'm working from the assumption of a regular user, not a power user. Pre-editing the file before feeding it to your CAT tool to e.g. assign character formatting is something that CAT hackers do.

Here's an example of how changing the entire file's language may be less than useful for the translator. Suppose you're an ordinary CAT user, and the file you have to translate contains sections that must stay in the source language and sections that must be translated. The sections that should not be translated may be marked as "do not spell-check", and in the CAT tool you'd simply skip over them. But if the CAT tool changes all text to "spell-check in English", it becomes very cumbersome to spell-check the final file, since all the text that used to be marked as "do not spell-check" is now not only marked as "do spell-check" but also marked with a language different from its actual language.

However, I did not "expect" the CAT tool to do anything, because just as I came up with the example above, you can come up with an example in which your tool's behaviour is more useful.



Samuel Murray  Identity Verified
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
2nd test Nov 14, 2013

Samuel Murray wrote:
EDITED: See my next post in this thread for an update and an updated test file.


All right, based on the replies so far I have discovered that my original test doesn't mean much, so I created a second test (which you're welcome to try out in your CAT tool) to see how the tools handle text that is marked with a certain language.

http://wikisend.com/download/772030/2ndtest.docx

This new file has more sentences, more sentences per paragraph and more languages, and some sentences are marked as "do not spell-check" while others aren't.

The following applies to my three CAT tools:

OmegaT: The target file retains exactly the same language marking structure as the original file. Every language change is a pair of tags. Oddly, some language tags prevented OmegaT from splitting the paragraph into sentences at that point.

Wordfast Pro: The entire target file is set to the source document's default language and to "do spell-check". There is a single tag at the start of paragraph 3 (and of paragraph 1). Omitting the tag causes the immediately preceding line break to keep the language it was marked with in the source file, but does not otherwise affect the language settings.

Trados 2011: The entire target file is set to the project's target language and is set to "do spell-check". Not a single tag in sight.

Of these three, Trados 2011's approach seems the most useful to the garden variety translator.

Added: Wordfast Classic: The individual segments that were translated are changed to the TM's target language, and all pieces of text (e.g. spaces between sentences) retain their language setting from the source text. All text is marked as "do spell-check". I include this, but WFC works on the original file, so it doesn't really count in this test, does it?

Samuel


[Edited at 2013-11-14 13:25 GMT]


Oiseau noir
Transit keeps "do not spellcheck" setting Nov 15, 2013

Samuel Murray wrote:
I'm working from the assumption of a regular user, not a power user. Pre-editing the file before feeding it to your CAT tool to e.g. assign character formatting is something that CAT hackers do.


IMHO, using character formats in Word is not pre-CAT editing; it's using Word as it is intended to be used. That does not require "Word hackers", just very basic Word knowledge.
But I agree: A lot of Word users prefer to work inefficiently and expect other tools to correct their mess.

The sections that should not be translated may be marked as "do not spell-check"...


That's a different question from the one in your original post and test file.

Good news: Yes, Transit NXT keeps the "do not spellcheck" setting.

I opened your test file in Word and assigned "Do not spellcheck" to the word "two".
- After the import, Transit displays tags for the word "two".
- After the export, "two" is still "Do not spellcheck".
Conclusion: The "Do not spellcheck" property is preserved when translating the Word document in Transit.

I wonder how the other CAT tools handle the "Do not spellcheck" property...



Samuel Murray  Identity Verified
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
do-not-spellcheck combined with no language change Nov 15, 2013

Oiseau noir wrote:
I opened your test file in Word and assigned "Do not spellcheck" to the word "two".
- After the import, Transit displays tags for the word "two".
- After the export, "two" is still "Do not spellcheck".
Conclusion: The "Do not spellcheck" property is preserved when translating the Word document in Transit.


Interesting... and unexpected. I had assumed that the code for a language change and the code for enabling or disabling spell-checking would be the same, so that a tool that doesn't preserve language changes wouldn't preserve the do-not-spellcheck option either... but Transit does. This means I have to retest my three tools with this new assumption...

OmegaT: retains the do-not-spellcheck setting
WFP and Trados 2011: do not retain the do-not-spellcheck setting
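Transit's behaviour would make sense if the two settings are stored separately. As far as I know, in a .docx file the language of a run is a `<w:lang>` element, while Word's "Do not check spelling or grammar" option is a separate `<w:noProof/>` element, both inside the run properties (`<w:rPr>`). A tool can therefore carry one over and drop the other. The Python sketch below (hypothetical function name, hand-made sample XML) reports both settings per run; it does not resolve settings inherited from styles.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# WordprocessingML namespace and the qualified attribute name w:val
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"


def proofing_flags(document_xml: str | bytes) -> list[dict]:
    """Report, for each text run, its language and whether proofing is off.

    <w:lang> and <w:noProof/> are separate children of <w:rPr>, so a tool
    can keep one in the target file and drop the other.
    Settings inherited from styles are not resolved.
    """
    root = ET.fromstring(document_xml)
    runs = []
    for run in root.iter(f"{{{W}}}r"):
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        if not text:
            continue
        lang = run.find("w:rPr/w:lang", NS)
        no_proof = run.find("w:rPr/w:noProof", NS)
        # noProof is an on/off toggle: present without w:val means "on";
        # a w:val of "0", "false" or "off" switches it off explicitly
        proofing_off = no_proof is not None and no_proof.get(VAL, "true") not in (
            "0",
            "false",
            "off",
        )
        runs.append({
            "text": text,
            "lang": lang.get(VAL) if lang is not None else None,
            "no_proof": proofing_off,
        })
    return runs
```

Comparing this output for the source and target files would show at a glance whether a tool kept the language, the do-not-spellcheck flag, both, or neither.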

