Tool for checking for word repetition in TUs of TM
Thread poster: Samuel Murray
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
May 13, 2011

G'day everyone

Do you know of a standalone tool (or any tool) that can do a QA on a TMX (or other TM format) to check whether any segments have duplicated words in them? I don't mean words that occur right next to each other, but words that occur more than once in the segment. Ideally the matching should be case insensitive.

So if my TM has this:

SL: I wonder if this can be done.
TL: Ek wonder of dit kan gedoen kan word.

Then I would like that segment to be flagged because the word "kan" is repeated.

Any ideas? My main CAT format is uncleaned RTF, so a macro that could check such files would actually be the best option for me, but... a TMX checker would also do nicely.

Thanks
Samuel


 
Oliver Pekelharing
Netherlands
Dutch to English
xbench
May 13, 2011

Hi Samuel,

I'm playing around with xbench as an alternative concordance app, and it looks like you can customize the searches pretty easily if you know what you're doing (I don't!). Have you had a look at it yet?

Olly


 
Oscar Martin
Spain
English to Spanish
Xbench
May 13, 2011

Hi Samuel,

You can download Xbench from the ApSIC website. Install the software and add the TMX file. You can then use a regular expression to find duplicate words anywhere in the segment.

The following expression will find duplicate words in source or target.

"(<[:alpha:]+>)=1.*<@1>"


Change the search mode from simple to regular expression, and press Ctrl+P when searching (PowerSearch).
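
For readers without Xbench, the same idea can be sketched with Python's `re` module. This is not Xbench syntax, only an assumed rough equivalent: `\b` stands in for the word-boundary markers, `\1` for the backreference, and `[^\W\d_]+` for a run of letters. The function name `repeated_word` is hypothetical.

```python
import re

# Assumed Python analogue of the Xbench expression above (not Xbench syntax):
# capture a whole word, then look for the same whole word later in the segment.
# re.IGNORECASE makes the backreference match case-insensitively as well.
REPEAT_RE = re.compile(r"\b([^\W\d_]+)\b.*\b\1\b", re.IGNORECASE)


def repeated_word(segment):
    """Return the first word that occurs again later in the segment, or None."""
    match = REPEAT_RE.search(segment)
    return match.group(1) if match else None


print(repeated_word("Ek wonder of dit kan gedoen kan word."))  # kan
print(repeated_word("I wonder if this can be done."))          # None
```

Because the words do not have to be adjacent, the `.*` in the middle is what lets the match span the rest of the segment.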

Regards,

Oscar Martin

[Edited at 2011-05-13 14:40 GMT]

[Edited at 2011-05-13 14:44 GMT]

[Edited at 2011-05-13 14:56 GMT]
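
As a standalone alternative, a TMX file could also be scanned with a short script. The following is a minimal sketch using only the Python standard library, not a tested QA tool. It assumes a TMX file without XML namespaces, with the language in `xml:lang` (or the older `lang`) on each `<tuv>`. The names `repeated_words` and `check_tmx` are hypothetical, and the text of inline tags is simply flattened into the segment, which may cause false positives.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, Unicode-aware
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def repeated_words(text):
    """Return the words that occur more than once, compared case-insensitively."""
    counts = Counter(word.casefold() for word in WORD_RE.findall(text))
    return sorted(word for word, n in counts.items() if n > 1)


def check_tmx(source):
    """Yield (language, segment text, repeated words) for each flagged <seg>.

    `source` is a TMX file path or an open file object.
    """
    for tuv in ET.parse(source).getroot().iter("tuv"):
        seg = tuv.find("seg")
        if seg is None:
            continue
        # Simplification: text inside inline tags is kept as part of the segment.
        text = "".join(seg.itertext())
        repeats = repeated_words(text)
        if repeats:
            yield tuv.get(XML_LANG) or tuv.get("lang"), text, repeats
```

Calling `list(check_tmx("memory.tmx"))` (the file name is only an example) would return one tuple per flagged source or target segment. For the sample pair in the question, only the Afrikaans segment would be reported, with `kan` as the repeated word.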


 

