source text in 2 languages: how to create separate TMs?
Thread poster: KuaLanx

KuaLanx  Identity Verified
Netherlands
Local time: 09:22
Chinese to Dutch
+ ...
Dec 13, 2013

I couldn't find a topic on this. If there is one already, I'd be happy to read that and have this one deleted.

I'm preparing for a big project, and according to the information I have now, the source file will contain 2 languages: Dutch and Chinese. The majority of the text will be Dutch, with Chinese example sentences. I haven't decided whether I'm going to do this in Trados yet, I'm looking at my options.

I'd like to know if there is a way (ideally: an easy way) to save the two source languages in 2 separate TMs so that the Dutch-English as well as the Chinese-English TMs will both be available for projects in the future.

If the above is impossible (i.e. they are and will always be 1 TM unless I separate them manually), then I'd like to know whether a mixed TM will actually be workable. Has anyone done this? What are your experiences? Will Trados refuse to search or use a mixed TM? Will Trados only find one of the languages in the TM and ignore the other?

I realize that when you create a TM, you have to choose the languages. I would initially save it as a Dutch-English TM, because 80% of the source is going to be Dutch. Could I save a copy of the same TM as a Chinese-English TM? (How?)

This is quite a complicated and lengthy question, I hope the situation I describe is clear and I'd be thankful for any thoughts on this.

edited (typo)

[Edited at 2013-12-13 11:57 GMT]


 

FarkasAndras
Local time: 09:22
English to Hungarian
+ ...
Where there is a will... Dec 13, 2013

...there is a way.
You can:
- Create an nl-en project, add the file, and set up an nl-en TM for it. Translate the file, but confirm only the nl-en segments. You can translate the zh-en segments too; just don't confirm them.
- Set up a different TM in the same project (so that only the new TM is updated, not the nl-en one). Go through the file and confirm the zh-en segments. Export the second TM to TMX. Open the TMX file and replace the Dutch language code (NL-NL) with the language code for Chinese (ZH-CN, I think). Create a new zh-en TM and import the modified TMX.

Alternatively, if you want to mix segments in the same TM, you can relabel the whole TM as described above.
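The relabeling step can also be scripted instead of done by hand in a text editor. Below is a minimal Python sketch, assuming the exported TMX marks each `<tuv>` with the standard `xml:lang` attribute and that the codes are `nl-NL` and `zh-CN` (check your own export first, the codes may differ). The function name `relabel_tmx` and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def relabel_tmx(tmx_text, old_code, new_code):
    """Replace one language code with another throughout a TMX document."""
    root = ET.fromstring(tmx_text)

    # The header usually repeats the source language in srclang.
    header = root.find("header")
    if header is not None and header.get("srclang", "").lower() == old_code.lower():
        header.set("srclang", new_code)

    # Relabel every translation unit variant that carries the old code.
    for tuv in root.iter("tuv"):
        if tuv.get(XML_LANG, "").lower() == old_code.lower():
            tuv.set(XML_LANG, new_code)

    # The result has no XML declaration; add one when writing to disk.
    return ET.tostring(root, encoding="unicode")


# Hypothetical export: a Chinese segment that was saved in an nl-en TM.
SAMPLE = (
    '<tmx version="1.4"><header srclang="nl-NL"/><body>'
    '<tu><tuv xml:lang="nl-NL"><seg>你好</seg></tuv>'
    '<tuv xml:lang="en-US"><seg>Hello</seg></tuv></tu>'
    '</body></tmx>'
)

RELABELED = relabel_tmx(SAMPLE, "nl-NL", "zh-CN")
```

Only the labels change; the segment text is left untouched, so the result can be imported into a new zh-en TM. As always, work on a copy of the export.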

[Edited at 2013-12-13 11:41 GMT]


 

xxxnrichy
France
Local time: 09:22
French to Dutch
+ ...
No it is very common Dec 13, 2013

KuaLanx wrote:

This is quite a complicated and lengthy question,


No, this is a very common problem. I have had it since 2001, when I started with Wordfast, nearly every day. Source files for some of my clients (a bilingual team doing marketing brainstorming) are mixed French and English, and the target has to be Dutch. I did not separate the languages, I combined TMs here and there, and the TMs are "polluted" from a technical point of view. For Wordfast Classic this is not a problem.

For Studio, and for Wordfast Pro too, the source segments have to be labeled in the same way, with the same language codes, so one cannot mix FR-FR and EN-US segments in the same TM. However, Studio doesn't care what is actually in the segments, so in my case the English source segments are also labeled FR-FR. I tried separating them into two TMs, but then I cannot attach both TMs at the same time for the same translation.

It seems to me that the only way to solve this question is to use one TM for both source languages. If others have another solution, I would be happy to know about it.


 

MikeTrans
Germany
Local time: 09:22
Italian to German
+ ...
TM manager Dec 13, 2013

Hello KuaLanx,

Generally, the TM management inside CAT tools is limited. Every time you want to do major corrections or operations like the ones you mention, you should use a TM management tool.

An excellent freeware tool I can refer you to is Olifant from the Okapi Framework. Its features include:

- Filter on segments based on SQL or RegEx
- Many template filters for the most common operations relevant for translators
- Elaborated search / replace operations with SQL or RegEx
- Import from tab-text, from TMX; MULTI-LANGUAGES SUPPORTED
- Export to TMX or tab-text (after a tab-text export you need to replace leftover TMX escapes such as &amp;, &gt;, etc.)
- Join, split, segments based on markers (you can use search / replace to batch-insert markers)
- many additional features.

Here some hints:

Remove all TMX codes in segments:
Entries > Remove codes

To import:
File > Import ... (txt UTF-8, TMX)

To export:
TMX:
File > Export > Default
Tab-Text:
Select all segments, CTRL-C, paste into Windows editor, save as UTF-8, replace remaining TMX codes
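
The clean-up of a tab-text export can be scripted as well. This is a small Python sketch, assuming each line holds a source and a target segment separated by a tab and that the leftover codes are ordinary XML escapes such as `&amp;` and `&gt;`. The actual column layout of text copied out of Olifant has not been verified here, and `parse_tab_text` is a made-up helper name.

```python
import html


def parse_tab_text(raw):
    """Turn tab-delimited 'source<TAB>target' lines into unescaped pairs."""
    pairs = []
    for line in raw.splitlines():
        cols = line.split("\t")
        if len(cols) < 2:
            continue  # skip blank or malformed lines
        # html.unescape turns &amp; into &, &gt; into >, and so on.
        pairs.append((html.unescape(cols[0]), html.unescape(cols[1])))
    return pairs


# Hypothetical pasted export with two entries.
RAW = "Zout &amp; peper\tSalt &amp; pepper\nA &gt; B\tA &gt; B\n"
PAIRS = parse_tab_text(RAW)
```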



Please google Olifant, Okapi Framework.
I think the latest version of Olifant is R22.

I hope this helps you,

Greets,
Mike



[Edited at 2013-12-13 13:17 GMT]


 

KuaLanx  Identity Verified
Netherlands
Local time: 09:22
Chinese to Dutch
+ ...
TOPIC STARTER
sounds like great advice, thanks! Dec 13, 2013

That sounds like great advice, Mike, thank you!

I'll try it out and maybe come back to this topic with questions if I encounter any problems.


 

MikeTrans
Germany
Local time: 09:22
Italian to German
+ ...
I have edited my post above... Dec 13, 2013

... some hints to get you started.

Mike


 

SDL Community  Identity Verified
United Kingdom
Local time: 09:22
English
Or you could tackle it the easy way.... Dec 13, 2013

... and just upgrade it in Studio. There’s an article on this here and doing it this way will allow you to group your languages as you see fit and create the TMs in one go : http://wp.me/p2xDjK-bR

Regards

Paul


 

KuaLanx  Identity Verified
Netherlands
Local time: 09:22
Chinese to Dutch
+ ...
TOPIC STARTER
Thank you! Dec 13, 2013

Thanks Paul, I'll read the article and see if that's easier.

Looks like I can download Olifant even if there is a solution in Studio for this particular situation. I see some other features that Mike lists that look useful.

Thank you for the extra info you added Mike, I'm pretty sure that the 2 source languages will work out fine either your way or the other way. I'll post my experiences in this topic for whoever stumbles on this and wants to know how it went.


 

KuaLanx  Identity Verified
Netherlands
Local time: 09:22
Chinese to Dutch
+ ...
TOPIC STARTER
Paul's link is great but not for my situation? Dec 13, 2013

I read Paul's article. It's a great feature, but reading it through I don't think it'll work for my situation, because I have a mixed source in the TM. (Mind you I haven't tried it out yet.)

My TM is not English - Chinese and English - Dutch, but I have 1 mixed Chinese/Dutch source document to be translated in its entirety into 1 English document (1 source: Chinese/Dutch, 1 target: English, containing translation of both Dutch and Chinese).

But it seems Mike's software will address this.


 

SDL Community  Identity Verified
United Kingdom
Local time: 09:22
English
If you are using Studio as the end tool... Dec 13, 2013

.... then you would have to do this:

1. Create TMs in the flavours you like
2. Import the TMX

This way the appropriate languages would be imported I think. I’d have to test it with your TMX but I think it should work. So I’d create two TMs, nl-en and zh-en and then import the same TMX into each.

Worth a try... it works with the ECDC multilingual TMs.

Regards

Paul

[Edited at 2013-12-13 14:42 GMT]


 

KuaLanx  Identity Verified
Netherlands
Local time: 09:22
Chinese to Dutch
+ ...
TOPIC STARTER
I think this will work Dec 13, 2013

I think I can conclude that many people would use a workaround without the Olifant software (but I'm curious and will download it to see what it does for me, after all it's freeware..).

I like FarkasAndras's solution (translating but not confirming the Chinese sentences, then confirming those with another TM set for the project, so that they end up in the ZH-EN TM). It's the kind of workaround I like :) Maybe this is also an idea for nrichy? It seems better than having a TM with a mixed source text. Mixed source is probably best if you always work with mixed source, but it limits the ways in which you can use the TM, doesn't it? For example, if you'd want to turn your TM into a NL-EN one for some reason, you'd end up with a lot of French cluttering your English target segments.

Other than that, I don't have a problem with a mixed source text TM, as long as I can be sure it won't lead to problems when Trados searches the thing. But I think I can conclude that I would even be able to search a mixed TM for Chinese, as long as I set it to Chinese when I use it in a Chinese project.

Thank you again everyone who has responded, you've helped me a lot.


 

SDL Community  Identity Verified
United Kingdom
Local time: 09:22
English
If you use... Dec 13, 2013

KuaLanx wrote:

But I think I can conclude that I would even be able to search a mixed TM for Chinese, as long as I set it to Chinese when I use it in a Chinese project.



... Any TM from the SDL OpenExchange as a TM provider in Studio then you can add any TM you like and be able to read/write as you see fit. This is quite handy because you can then keep your TMs in the right languages but still look up the others if there is something potentially useful to help you with your translation.

Agreed on Olifant... I’ve got it too ;)

Regards

Paul


 

FarkasAndras
Local time: 09:22
English to Hungarian
+ ...
Label segments Dec 13, 2013

SDL Support wrote:

.... then you would have to do this:

1. Create TMs in the flavours you like
2. Import the TMX

This way the appropriate languages would be imported I think. I’d have to test it with your TMX but I think it should work. So I’d create two TMs, nl-en and zh-en and then import the same TMX into each.

Worth a try... it works with the ECDC multilingual TMs.

Regards

Paul

[Edited at 2013-12-13 14:42 GMT]

That will work, but only if each segment is labelled correctly in the TMX file (ZH-CN or NL-NL). If you just translate the mixed source text and confirm all the segments, then export the TM, then obviously they will all be labelled with whatever language you specified when setting up the project.
You can of course open that mixed TMX and fix the language code of the Chinese segments with some selective manual search and replace, but that's probably going to take more time than separating them off during translation as I recommended above.
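
For anyone who does end up with a mixed TMX, the selective relabeling could be automated along these lines. This is a minimal Python sketch, assuming every unit was exported with the `nl-NL` label and treating a source segment as Chinese if it contains at least one character from the basic CJK Unified Ideographs block. A Dutch sentence that merely quotes a Chinese word would be relabeled too, so the result needs a manual check. The function name and sample data are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Basic CJK Unified Ideographs block; rarer characters live in other blocks.
CJK = re.compile(r"[\u4e00-\u9fff]")


def relabel_chinese_units(tmx_text, source_code="nl-NL", chinese_code="zh-CN"):
    """Relabel source segments that contain Chinese characters.

    Returns the modified TMX text and the number of relabeled segments.
    """
    root = ET.fromstring(tmx_text)
    changed = 0
    for tuv in root.iter("tuv"):
        if tuv.get(XML_LANG, "").lower() != source_code.lower():
            continue  # leave target-language variants alone
        seg = tuv.find("seg")
        # itertext() also collects text nested inside inline tags.
        text = "".join(seg.itertext()) if seg is not None else ""
        if CJK.search(text):
            tuv.set(XML_LANG, chinese_code)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Hypothetical mixed export: both units carry the Dutch label.
SAMPLE = (
    '<tmx version="1.4"><header srclang="nl-NL"/><body>'
    '<tu><tuv xml:lang="nl-NL"><seg>Dit is een zin.</seg></tuv>'
    '<tuv xml:lang="en-US"><seg>This is a sentence.</seg></tuv></tu>'
    '<tu><tuv xml:lang="nl-NL"><seg>这是一个句子。</seg></tuv>'
    '<tuv xml:lang="en-US"><seg>This is a sentence.</seg></tuv></tu>'
    '</body></tmx>'
)

FIXED, COUNT = relabel_chinese_units(SAMPLE)
```

The header's `srclang` is deliberately left unchanged here, since the file still contains both source languages after relabeling.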


 

MikeTrans
Germany
Local time: 09:22
Italian to German
+ ...
Some simple RegEx examples... Dec 13, 2013

... with Olifant:


Always work with a copy of your file.

a)
Flag specific entries (CTRL-E);

non-Chinese (hello KuaLanx!):

([ñõãöäúáíòó]) + ([a-z])

Export the flagged entries and you have the non-Chinese TM.
Delete all flagged entries and export; this is your Chinese TM.

b)
Flag specific entries (CTRL-E);

non-French (hello nrichy!):

[ñõãöäúáíòó]

or:

Almost all German text in a non-German project:

( für )|( und )|( der )|( die )|( ein )|([ö])|([ü])|([ß])

Garbage out:

(\A(\()([^a-z])+(\))\Z)|(\A([^a-z])+\Z)
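
The same flag-and-export idea can be tried outside Olifant. The Python sketch below is only an illustration: it uses Python's `re` module rather than Olifant's own regex engine, the pattern is a simplified variant (any plain or accented Latin letter marks an entry as non-Chinese) and not one of the exact expressions above, and `split_flagged` and the sample entries are made up.

```python
import re

# Any Latin letter, plain or accented, marks an entry as non-Chinese.
NON_CHINESE = re.compile(r"[a-zñõãöäúáíòó]", re.IGNORECASE)


def split_flagged(entries, pattern):
    """Return (flagged, unflagged) lists of (source, target) pairs."""
    flagged, unflagged = [], []
    for source, target in entries:
        # Only the source text decides which list the pair goes to.
        bucket = flagged if pattern.search(source) else unflagged
        bucket.append((source, target))
    return flagged, unflagged


# Hypothetical TM entries as (source, target) pairs.
ENTRIES = [
    ("Dit is een voorbeeld.", "This is an example."),
    ("这是一个例子。", "This is an example."),
    ("für und der", "for and the"),
]

non_chinese, chinese = split_flagged(ENTRIES, NON_CHINESE)
```

Passing a different compiled pattern to `split_flagged` flags a different set of entries, which corresponds to the flag, export, delete and export again sequence described in a).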


No need to do acrobatics in Studio: it may get sick, it's already very delicate :)

Greets,
Mike


 

KuaLanx  Identity Verified
Netherlands
Local time: 09:22
Chinese to Dutch
+ ...
TOPIC STARTER
convinced Dec 13, 2013

MikeTrans wrote:

No need to do acrobatics in Studio: it may get sick, it's already very delicate :)



You know you've already convinced me to download Olifant and I'll be sure to return to this thread to read what to do with it :)
I think I'll try all the methods mentioned here sometime at the end of December and see which one saves me the most time.


FarkasAndras wrote:

That will work, but only if each segment is labelled correctly in the TMX file (ZH-CN or NL-NL). If you just translate the mixed source text and confirm all the segments, then export the TM, then obviously they will all be labelled with whatever language you specified when setting up the project.
You can of course open that mixed TMX and fix the language code of the Chinese segments with some selective manual search and replace, but that's probably going to take more time than separating them off during translation as I recommended above.


I think 'delicate' Studio can handle this procedure alright without getting sick :)


 