Is there a simple way to strip TMs of tags?
Thread poster: Kroz Wado

Kroz Wado
Japan
Local time: 09:53
Japanese to English
Nov 18, 2009

I've received some really untidy TMs, with a very large number of needless tags scattered across each of 30,000 or so segments. The first thing I'd like to do is strip all the tags from them. What's the simplest and fastest way to do this?

(BTW, I have both Trados 2007 and 2009.)

[Edited at 2009-11-18 04:44 GMT]


 

Giuliana Buscaglione  Identity Verified
Austria
Local time: 02:53
Member (2001)
German to Italian
Why would you want to do that? Nov 18, 2009

Hello Kroz Wado,

If a segment is an exact match word for word, the TM will still insert the translation without the tags (though it will show a lower match value). Am I missing something?

Giuliana

PS: Conversely, should you get a text full of those tags in future, you might not have any translation inserted automatically and might be able to find a matching translation only via concordance search. I'm not sure it's worth the step and the hassle. I have never removed any tags from my units.

[Edited at 2009-11-18 07:58 GMT]


 

Adam Łobatiuk  Identity Verified
Poland
Local time: 02:53
Member (2009)
English to Polish
Try Olifant Nov 18, 2009

There are .NET and newer Java versions (which don't work for me, but do work for others). For more info, please see here: http://okapi.sourceforge.net/

If you are sure those tags get in the way, go ahead and remove them, of course. But that could also affect the matching. You could also export the TM, strip the tags, and import the result back into the old TM, so that you have both versions in one TM. That could actually increase the number of matches.


 

FarkasAndras
Local time: 02:53
English to Hungarian
Thinking out loud Nov 18, 2009

Kroz Wado wrote:

I've received some really untidy TMs, with a very large number of needless tags scattered across each of 30,000 or so segments. The first thing I'd like to do is strip all the tags from them. What's the simplest and fastest way to do this?

(BTW, I have both Trados 2007 and 2009.)


If Trados doesn't have a solution for that, you could just take a txt export, strip the tags and reimport. I'm assuming these tags are enclosed in the usual < > brackets. There's a sed script floating around on the net that can do this on a file of any size in seconds. All you need to do is convert the basic Trados tags (TrU, CrD, Seg L= etc.) into something else first in order to preserve them, then run the tag stripper script, and then convert the preserved tags back.
Of course, if Olifant can do what you need, then that's probably more convenient for most people than messing around with command lines (after installing GnuWin32 if you're on Windows...)
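
The convert, strip, convert-back procedure described above can also be collapsed into a single pass: match every < > tag and drop it unless its name is on a keep list. What follows is a minimal Python sketch of that idea, not the sed script mentioned in this post. It assumes, as the post does, that all tags are enclosed in < > brackets. The keep list holds only the tag names given in the post (TrU, CrD, Seg), so it is almost certainly incomplete for a real export and would need extending. The function name strip_inline_tags is made up for illustration.

```python
import re

# Structural tag names taken from the post (TrU, CrD, "Seg L=").
# ASSUMPTION: the post says "etc.", so a real export has more of these;
# add the others you see in your own file before running this on it.
STRUCTURAL = ("TrU", "CrD", "Seg")

# Anything between < and > that contains no further angle brackets.
TAG = re.compile(r"<[^<>]*>")


def strip_inline_tags(line, keep=STRUCTURAL):
    """Remove every <...> tag from `line` except those named in `keep`."""

    def replace(match):
        tag = match.group(0)
        # "<Seg L=EN-US>" -> "Seg", "</TrU>" -> "TrU", "<b>" -> "b"
        name = tag.strip("</>").split(" ")[0]
        return tag if name in keep else ""

    return TAG.sub(replace, line)


if __name__ == "__main__":
    sample = '<Seg L=EN-US><cf bold="on">Hello</cf> <b>world</b>'
    print(strip_inline_tags(sample))
```

To process a whole export you would read it line by line, pass each line through strip_inline_tags, and write the result to a new file, opening both files with whatever encoding your export actually uses. Work on a copy and spot-check the output before reimporting, since any structural tag missing from the keep list will be deleted along with the formatting tags.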

[Edited at 2009-11-18 10:35 GMT]


 

Kroz Wado
Japan
Local time: 09:53
Japanese to English
TOPIC STARTER
Better to get rid of them... Nov 18, 2009

At the moment they're more of a hindrance to smooth operation than a benefit and it seems unlikely that that will change.

I can't emphasize enough how many tags there are or how messily they're scattered across the segments.

I managed to do a really nice job with Olifant and the regular expression ... Thank you!

[Edited at 2009-11-18 11:09 GMT]


 

