TM containing sentences in another language
Thread poster: alessandra bocco

alessandra bocco  Identity Verified
Local time: 10:20
Member (2006)
English to Italian
+ ...
Dec 15, 2010

Hi all,
one of my best clients (an agency) sent me a big translation project (about 150,000 words) together with its TM, provided by the end client. The quality of this TM is good, but the problem is that some of its sentences were translated from English into German instead of from English into Italian. Of course the TM target language is Italian. This is rather annoying for two reasons: first, the German sentences appear first in the concordance window, and I waste a lot of time scrolling down the list until the Italian sentences appear; second, the German sentences are also included in the file analysis, so the number of repetitions and 100% matches is not correct.
I told the agency about it, but they said they can't do anything since they didn't create the TM and the client does not have other TMs for that job. The only thing they can do is count all the words as new words, but I don't think that's fair (although really good for me!).
Any suggestions on how to delete those German sentences?
Thanks a lot!
Alessandra

P.S. I'm working on Trados 8.0


FarkasAndras
Local time: 10:20
English to Hungarian
+ ...
Hope they were dropped in together Dec 15, 2010

Hopefully, it was not a systematic mixing of various languages, just a one-time mix-up.
If so, you should do a TMX export and have a look inside. Hopefully, the German sentences will be in one block and you can just delete those segments from the TMX. Reimport into a new TM and you're done. You could also look to see if the German segments have a differentiating feature (dates, creator name etc.)
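A minimal sketch of that "have a look inside" step, assuming the export is a TMX 1.4 file whose `<tu>` elements carry `creationid` and `creationdate` attributes (some exports put them elsewhere or omit them, so check your own file first). The function name `tally_creators` and the sample data are made up for illustration. It counts translation units per user and per day, so an unusual user or a one-day spike stands out:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tally_creators(tmx_text):
    """Count translation units per (creationid, creation day)."""
    root = ET.fromstring(tmx_text)
    counts = Counter()
    for tu in root.iter("tu"):
        user = tu.get("creationid", "?")
        # Keep only the YYYYMMDD part of a stamp such as 20100301T090000Z.
        day = tu.get("creationdate", "?")[:8]
        counts[(user, day)] += 1
    return counts


# Invented sample data, not taken from any real TM.
SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu creationid="ANNA" creationdate="20100301T090000Z">
  <tuv xml:lang="EN-US"><seg>Press the button.</seg></tuv>
  <tuv xml:lang="IT-IT"><seg>Premere il pulsante.</seg></tuv>
</tu>
<tu creationid="ANNA" creationdate="20100301T090500Z">
  <tuv xml:lang="EN-US"><seg>Close the cover.</seg></tuv>
  <tuv xml:lang="IT-IT"><seg>Chiudere il coperchio.</seg></tuv>
</tu>
<tu creationid="BATCH" creationdate="20100714T120000Z">
  <tuv xml:lang="EN-US"><seg>Press the button.</seg></tuv>
  <tuv xml:lang="IT-IT"><seg>Taste drücken.</seg></tuv>
</tu>
</body></tmx>"""

for key, count in tally_creators(SAMPLE_TMX).most_common():
    print(key, count)
```

For a real export you would read the file with `ET.parse(path).getroot()` instead of parsing a string. If the tally shows nothing distinctive, this route is a dead end.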



Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 10:20
Member (2005)
English to Spanish
+ ...
Filter by date Dec 15, 2010

Indeed things are much simpler if it was a one-time mishap. In that case, you could perhaps export the memory as recommended by the colleague, see on what date the mishap took place, and then use File > Maintenance and the Filter option to filter all segments from that date. This way you can easily delete them.


alessandra bocco  Identity Verified
Local time: 10:20
Member (2006)
English to Italian
+ ...
TOPIC STARTER
Thanks a lot but... Dec 15, 2010

I have exported the TM, but it looks like the segments in German (there are even some in French!) were mixed randomly with the Italian segments: no particular date or user. The TM is about 15,000 TUs long, and deleting each segment separately would take a very long time...
No other suggestions?
Alessandra



Luisa Ramos, CT  Identity Verified
United States
Local time: 04:20
Member (2004)
English to Spanish
Reorganize Dec 15, 2010

Would reorganizing the TM help in some way? Could it maybe group together by language or by date?

This is only a suggestion. I have not tested this procedure, and I do not know whether this is something that the reorganization would do. I guess there is no harm in trying if that means you would be able to apply some of the ideas previously suggested, and perhaps solve your problem.


FarkasAndras
Local time: 10:20
English to Hungarian
+ ...
Solution... Dec 15, 2010

alessandra bocco wrote:

I have exported the TM, but it looks like the segments in German (there are even some in French!) were mixed randomly with the Italian segments: no particular date or user. The TM is about 15,000 TUs long

Print out a paper copy, have it bound and hit the client over the head with it. That's about your best option at this point.
In all seriousness, you could ask the client for a proper TM or the source files this was made from. If you're lucky, they may have all the Italian TMXes saved in one place... but something tells me this is not the case.
The TM you have now is probably beyond hope. If the intruding segments don't have any differentiating feature (note added to them, different username, a couple of specific dates or date ranges) then there is no way of filtering them out automatically.*


*Well, there are tools for automatic language identification but that's overkill here and it probably won't work well for individual sentences. Those things are designed for texts, not tiny text fragments.



Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 10:20
Member (2005)
English to Spanish
+ ...
Another potential risk Dec 16, 2010

Indeed, automatic language detection would be a bit of overkill here unless some tool (for instance Microsoft Word) could identify the language on a per-sentence basis. I tried, but it did not work, at least on my machine.

There is also an additional risk here: that the customer has updated and/or cleaned the same segments twice, once with Italian files and once with German files, for instance. If the customer has enabled the Merge function when cleaning files, the result is a memory which contains several translations for the same source segment, but in several languages.

This multilingual memory is supported by the TMX format, but is not supported by Trados 2007, which will mark all target sentences as being in the target language of the memory.

The only solution I see here is to try to find the segments by spellchecking an export of the memory (after hiding all tags and the source segments), manually tagging the alien segments with some marker so that they can be filtered out once a new memory has been created from the export, and/or trying to identify blocks of segments that were added to the memory in one go, via a cleanup on one particular day and at one particular time.



alessandra bocco  Identity Verified
Local time: 10:20
Member (2006)
English to Italian
+ ...
TOPIC STARTER
Thanks everybody Dec 16, 2010

Things are exactly as explained by Tomás: the memory contains several translations for the same sentence but in several languages...
My client is in Israel so it's not so easy for me to hit the client with a printed copy of the TM but it would be a great idea!!


FarkasAndras
Local time: 10:20
English to Hungarian
+ ...
Not Word Dec 16, 2010

Tomás Cano Binder, CT wrote:

Indeed, automatic language detection would be a bit of overkill here unless some tool (for instance Microsoft Word) could identify the language on a per-sentence basis. I tried, but it did not work, at least on my machine.

I don't think MS Word ever tries to identify languages. That's not in its job description.
E.g. when you translate with Trados Workbench, target language sentences are marked as the target language by Trados, not Word.

I really don't think it's worth the trouble in this case, but this is the tool one would use for the job:
http://software.wise-guys.nl/libtextcat/
Basically, it builds up a "fingerprint" i.e. a list of distinguishing features of a language from a large corpus. Then you pass it any text and it tells you if that text matches the fingerprint. There are premade fingerprints for many languages on the site.
So, in principle, you could install libtextcat, grab the Italian fingerprint, export the TM to TMX, convert the TMX to a tab delimited file, grab the target language column from it and feed it to libtextcat in a loop, one segment at a time. Then record the Italian/Not Italian labels it spits out and filter the tab delimited file based on that information, convert it back to TMX and you're done... yeah, it's probably better to ask the client for the originals:)
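To make the loop concrete, here is a rough sketch of the "label each segment, then filter" step. It does not call libtextcat at all: it swaps in a much cruder stopword-count heuristic so that the example stays self-contained. The word lists, `guess_language` and `split_rows` are all invented for illustration, and a real fingerprint-based tool should do considerably better than this on longer text.

```python
import re

# Tiny hand-picked stopword lists: an assumption, not a real language model.
STOPWORDS = {
    "it": {"il", "la", "le", "un", "una", "di", "che", "per", "non", "con", "del", "sono"},
    "de": {"der", "die", "das", "und", "nicht", "mit", "ist", "ein", "eine", "sie", "den", "zu", "auf"},
    "fr": {"le", "la", "les", "des", "est", "une", "pas", "pour", "avec", "vous", "dans", "et", "sur"},
}


def guess_language(text):
    """Return the language with the most stopword hits, or None if unclear."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    top = scores[best]
    # No hits at all, or a tie for first place: refuse to guess.
    if top == 0 or list(scores.values()).count(top) > 1:
        return None
    return best


def split_rows(rows, wanted="it"):
    """Split (source, target) pairs into kept rows and rows needing review."""
    kept, review = [], []
    for source, target in rows:
        if guess_language(target) == wanted:
            kept.append((source, target))
        else:
            review.append((source, target))
    return kept, review


# Invented rows standing in for the tab-delimited export.
ROWS = [
    ("Press the button to start the machine.", "Premere il pulsante per avviare la macchina."),
    ("Press the button to start the machine.", "Drücken Sie die Taste, um die Maschine zu starten."),
    ("Press the button to start the machine.", "Appuyez sur le bouton pour démarrer la machine."),
    ("OK", "OK"),
]

kept, review = split_rows(ROWS)
print(len(kept), "kept,", len(review), "to review")
```

Note that very short segments such as "OK" give the heuristic nothing to work with and land in the review pile, which is exactly the weakness with tiny text fragments mentioned earlier in the thread.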

[Edited at 2010-12-16 09:06 GMT]


FarkasAndras
Local time: 10:20
English to Hungarian
+ ...
Multilingual TMX Dec 16, 2010

Tomás Cano Binder, CT wrote:

There is also an additional risk here: that the customer has updated and/or cleaned the same segments twice, once with Italian files and once with German files, for instance. If the customer has enabled the Merge function when cleaning files, the result is a memory which contains several translations for the same source segment, but in several languages.

That's almost certainly what happened.

Tomás Cano Binder, CT wrote:
This multilingual memory is supported by the TMX format, but is not supported by Trados 2007, which will mark all target sentences as being in the target language of the memory.

Not quite. The TMX format supports having various languages in the same file, each labeled with its own language code. If that was what the OP had here, I'm pretty sure Trados 2007 would have automatically picked out the two correct languages (the languages of the Trados TM you import into) and the extra languages wouldn't have been noticed at all. Even if T2007 had no support for multilingual TMX files, it would be trivially easy to generate an Italian-only TMX with Studio or other tools.
What must have happened here is that the clueless client cleaned up a bunch of bilingual Word documents into the same TM, regardless of what languages they contained. Thus, all languages got (mis)labeled as Italian, which is not supported by any tool.
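For contrast, this is roughly what the easy case looks like, i.e. a multilingual TMX in which every `<tuv>` is correctly labeled. It is only a sketch under that assumption (TMX 1.4-style `xml:lang` attributes, invented sample data, hypothetical function name `keep_language_pair`), and it would not help with a TM where German text is labeled as Italian:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def keep_language_pair(tmx_text, source="en", target="it"):
    """Drop every <tuv> outside the language pair, and every <tu>
    that ends up without a target-language <tuv>."""
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    for tu in list(body.findall("tu")):
        for tuv in list(tu.findall("tuv")):
            lang = tuv.get(XML_LANG, "").lower()
            if not lang.startswith((source, target)):
                tu.remove(tuv)
        remaining = [t.get(XML_LANG, "").lower() for t in tu.findall("tuv")]
        if not any(lang.startswith(target) for lang in remaining):
            body.remove(tu)
    return ET.tostring(root, encoding="unicode")


# Invented sample: one unit with three languages, one with no Italian.
MULTI_TMX = """<tmx version="1.4"><header/><body>
<tu>
  <tuv xml:lang="EN-US"><seg>Press the button.</seg></tuv>
  <tuv xml:lang="IT-IT"><seg>Premere il pulsante.</seg></tuv>
  <tuv xml:lang="DE-DE"><seg>Drücken Sie die Taste.</seg></tuv>
</tu>
<tu>
  <tuv xml:lang="EN-US"><seg>Close the cover.</seg></tuv>
  <tuv xml:lang="DE-DE"><seg>Schließen Sie die Abdeckung.</seg></tuv>
</tu>
</body></tmx>"""

print(keep_language_pair(MULTI_TMX))
```

The filter relies entirely on the language codes, which is why the mislabeled TM discussed here cannot be cleaned this way.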



Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 10:20
Member (2005)
English to Spanish
+ ...
Indeed Dec 16, 2010

FarkasAndras wrote:
What must have happened here is that the clueless client cleaned up a bunch of bilingual Word documents into the same TM, regardless of what languages they contained. Thus, all languages got (mis)labeled as Italian, which is not supported by any tool.

Indeed this is what it looks like: that there was a high number of segments containing German and French text but identified as Italian...

