Is there a simple way to strip TMs of tags?
Thread poster: Kroz Wado

Kroz Wado
Japan
Local time: 03:48
Japanese to English
Nov 18, 2009

I've received some really untidy TMs, with a very large number of needless tags scattered across each of 30,000 or so segments. The first thing I'd like to do is strip all the tags from them. What's the simplest and fastest way to do this?

(btw... have both Trados 2007 and 2009)

[Edited at 2009-11-18 04:44 GMT]


 

Giuliana Buscaglione  Identity Verified
United States
Local time: 11:48
Member (2001)
German to Italian
Why would you want to do that? Nov 18, 2009

Hello Kroz Wado,

If a segment is an exact match word for word, the TM will still insert the translation without the tags (showing a lower match value, though). Am I missing something?

Giuliana

PS: The other way round, should you get a text full of those tags in the future, you might not get any translation inserted and might be able to find a matching translation only via concordance search. I'm not sure it's worth the step and the hassle. I have never removed any tags from my units.

[Edited at 2009-11-18 07:58 GMT]


 

Adam Łobatiuk  Identity Verified
Poland
Local time: 19:48
Member (2009)
English to Polish
Try Olifant Nov 18, 2009

There are .NET and newer Java versions (which don't work for me, but do work for others). For more info, please see: http://okapi.sourceforge.net/

If you are sure those tags get in the way, go ahead and remove them, of course. But that could also affect the matching. You could also export the TM, strip the tags, and import the result back into the old TM, so that you have both versions in one TM. That could actually increase the number of matches.
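
A rough sketch of the "both versions in one TM" idea, in Python rather than anything Olifant or Trados provides. It assumes the export has already been parsed into (source, target) pairs and that the tags are plain < > tags; the function name with_stripped_copies is made up for illustration.

```python
import re

# Assumption: inline tags are anything enclosed in < > brackets.
TAG = re.compile(r"<[^<>]*>")


def with_stripped_copies(pairs):
    """Return the original units plus a tag-free copy of each one.

    `pairs` is an iterable of (source, target) tuples. Units that are
    already present (for example, units that had no tags to begin with)
    are not added a second time.
    """
    seen = set()
    result = []
    for source, target in pairs:
        stripped = (TAG.sub("", source), TAG.sub("", target))
        for unit in ((source, target), stripped):
            if unit not in seen:
                seen.add(unit)
                result.append(unit)
    return result


if __name__ == "__main__":
    units = [("<b>Hi</b>", "<b>Salut</b>"), ("Plain", "Simple")]
    for unit in with_stripped_copies(units):
        print(unit)
```

Getting the pairs out of the export and back into the TM is left out here, since that depends on the export format and the tool.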


 

FarkasAndras  Identity Verified
Local time: 19:48
English to Hungarian
Thinking out loud Nov 18, 2009

Kroz Wado wrote:

I've received some really untidy TMs, with a very large number of needless tags scattered across each of 30,000 or so segments. The first thing I'd like to do is strip all the tags from them. What's the simplest and fastest way to do this?

(btw... have both Trados 2007 and 2009)


If Trados doesn't have a solution for that, you could just take a txt export, strip the tags and reimport. I'm assuming these tags are enclosed in the usual < > brackets. There's a sed script floating around on the net that can do this on a file of any size in seconds. All you need to do is convert the basic Trados tags (TrU, CrD, Seg L= etc.) into something the script won't touch so that they are preserved, then run the tag stripper script, and finally convert the preserved tags back.
Of course, if Olifant can do what you need, then that's probably more convenient for most people than screwing around with command lines (after installing GnuWin32 if you're on Windows...).
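
For anyone who would rather not hunt down the sed script, the protect / strip / restore steps above can be sketched in Python. This is not the sed script itself, just a minimal illustration under assumptions: inline tags are anything in < > brackets, the structural tag list contains only the names mentioned above (a real export will most likely need more names added to it), and the two placeholder characters never occur in the export.

```python
import re

# Assumption: only the structural tag names mentioned in the post.
# A real export probably contains more; add them here before running.
STRUCTURAL_TAGS = ("TrU", "CrD", "Seg")

# Assumption: these control characters never occur in the export itself.
OPEN_MARK, CLOSE_MARK = "\x00", "\x01"


def strip_inline_tags(text, structural=STRUCTURAL_TAGS):
    """Remove every <...> tag except the structural ones listed."""
    names = "|".join(re.escape(name) for name in structural)
    protected = re.compile(r"<(/?(?:%s)\b[^<>]*)>" % names)

    # 1. Protect: swap the brackets of structural tags for placeholders.
    text = protected.sub(lambda m: OPEN_MARK + m.group(1) + CLOSE_MARK, text)
    # 2. Strip: delete everything that still looks like a tag.
    text = re.sub(r"<[^<>]*>", "", text)
    # 3. Restore: turn the placeholders back into brackets.
    return text.replace(OPEN_MARK, "<").replace(CLOSE_MARK, ">")


if __name__ == "__main__":
    sample = (
        "<TrU>\n"
        "<CrD>18112009\n"
        "<Seg L=JA><b>こんにちは</b>\n"
        "<Seg L=EN-US><cf size=10>Hello</cf> <i>world</i>\n"
        "</TrU>\n"
    )
    print(strip_inline_tags(sample))
```

One caveat: any literal < ... > pair in the segment text itself would be removed as well, so it's worth checking a few segments by eye before reimporting. Reading and writing the export file (and its encoding) is left out on purpose.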

[Edited at 2009-11-18 10:35 GMT]


 

Kroz Wado
Japan
Local time: 03:48
Japanese to English
TOPIC STARTER
Better to get rid of them... Nov 18, 2009

At the moment they're more of a hindrance to smooth operation than a benefit and it seems unlikely that that will change.

I can't emphasize enough how many tags there are or how messily they're scattered across the segments.

I managed to do a really nice job with Olifant and the regular expression ... Thank you!

[Edited at 2009-11-18 11:09 GMT]


 

