
Identifying/highlighting internal repetitions in a Word document
Thread poster: RobinB
RobinB  Identity Verified
Germany
Local time: 17:06
German to English
Sep 1, 2006

Hi,

CAT tools (e.g. Trados) generally have an analysis function that indicates how many segments in a Word document repeat internally (absolute and percentage segment repetitions). However, all you get here is an indication that segments repeat, not *where* they repeat.

We haven't been able to find a way to indicate where segments repeat in a large document, e.g. by highlighting or similar, either in CAT tools or other utilities such as DeltaView.

Does anybody have an idea how to solve this problem? Are there any add-ons or utilities out there that can mark repeated segments or sentences in a document?
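The task itself can be sketched in a few lines of Python. This is a minimal sketch, not an existing tool. It assumes the Word document has first been saved as plain text (UTF-8, one paragraph per line) and uses a naive segmentation rule: a segment ends at ".", "!" or "?" followed by whitespace. CAT tools apply their own configurable segmentation rules, so the figures will not match a Trados analysis exactly. `segments`, `find_repetitions` and `report` are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict

# Naive segmentation rule (assumption): a segment ends at ".", "!" or "?"
# followed by whitespace. Real CAT tools use configurable rules.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segments(text):
    """Yield (line_no, sentence_no, segment) for every non-empty segment.

    Assumes each paragraph of the Word document is one line of the .txt file.
    """
    for line_no, line in enumerate(text.splitlines(), start=1):
        parts = SENTENCE_END.split(line.strip())
        for sent_no, seg in enumerate(parts, start=1):
            if seg:
                yield line_no, sent_no, seg


def find_repetitions(text):
    """Map each segment that occurs more than once to all of its locations."""
    where = defaultdict(list)
    for line_no, sent_no, seg in segments(text):
        key = " ".join(seg.split())  # ignore differences in internal spacing
        where[key].append((line_no, sent_no))
    return {seg: locs for seg, locs in where.items() if len(locs) > 1}


def report(text):
    """Format the repetitions as a plain-text report, one entry per segment."""
    entries = []
    for seg, locs in find_repetitions(text).items():
        where = ", ".join(f"line {ln}, sentence {sn}" for ln, sn in locs)
        entries.append(f"{len(locs)}x {seg}\n    at {where}")
    return "\n".join(entries)
```

For example, `find_repetitions("Press the button. Wait.\nPress the button.")` returns `{'Press the button.': [(1, 1), (2, 1)]}`. For a real document, read the saved .txt file (e.g. with `open(path, encoding="utf-8").read()`), pass its contents to `report` and print the result.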

TIA,
Robin



Endre Both  Identity Verified
Germany
Local time: 17:06
Member (2002)
English to German
DVX allows for highlighting and/or text colouring in Word Sep 1, 2006

With Atril's DéjàVu X, you have the possibility of exporting a document with colour indication (background highlight or font colour) of different segment types: duplicates, 100% matches, fuzzy matches, etc.

I think this is limited to Word or RTF documents, though.

Endre



Christel Zipfel  Identity Verified
Partial member (2004)
Italian to German
Maybe with the Word function "replace"? Sep 1, 2006

Hi Robin,
I came across this possibility myself quite recently.

You type the word, sentence, segment or whatever you are looking for, and then under "Replace with" (Tools > Find > Replace > Replace with) you enter the same word or sentence again and add, say, an asterisk. Searching for the asterisk then takes you to the repetitions themselves.

But you will still have to highlight them yourself...
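The Find/Replace trick above can be imitated outside Word as a quick check. This is a minimal Python sketch, assuming the text is available as a plain string and that the marker character (here an asterisk) does not otherwise occur in the text; like the trick itself, it only helps when you already know which string to look for. `mark_and_locate` is a hypothetical name.

```python
def mark_and_locate(text, needle, marker="*"):
    """Imitate the Find/Replace trick: replace every occurrence of `needle`
    with `needle + marker`, and report the line of each occurrence."""
    if not needle:
        raise ValueError("needle must not be empty")
    line_numbers = []
    start = 0
    while True:
        idx = text.find(needle, start)
        if idx == -1:
            break
        # Line number = number of newlines before the match, plus one
        line_numbers.append(text.count("\n", 0, idx) + 1)
        start = idx + len(needle)  # non-overlapping, like str.replace
    return text.replace(needle, needle + marker), line_numbers
```

For example, `mark_and_locate("Press the button.\nCheck.\nPress the button.", "Press the button.")` returns the text with an asterisk after both occurrences, together with the line numbers `[1, 3]`.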



Victor Dewsbery  Identity Verified
Germany
Local time: 17:06
German to English
At least within DVX they are marked Sep 1, 2006

When you have imported the job into DVX and run the word count (with "Count duplicate rows" activated), all of the duplicated rows are marked by a colour bar on the left. (I think the default colour is grey, but I have changed mine to magenta red.)

I know that it is possible to select just the duplicate rows in the DVX interface (so I can see instantly the 504 duplicate rows in a 682 row project I finished earlier today).
It would therefore be possible to show the duplicates in the "External view", which is a table export format showing the source and target text in two columns. I'm not sure offhand whether they are colour-coded by default in this export format, but if not, there are tricks to achieve this (e.g. select just the duplicates and mark them all as "locked"; they are then colour-coded as such).
This is not, of course, a colour-coded version of the original file in the original layout, but it does show the duplicates in a format that can be examined in Word.

To handle the "External view" format comfortably, it is easiest if you splash out on the "Workgroup" edition, which will set you back about 2500 euros. It is normally possible to get just about the same result with the "Professional" edition (about 900 euros), but it sometimes needs a couple of extra steps to do so.

Of course, you could go for the 30-day free demo (no functional limitations; you just have to get an activation code from Atril, which may take a couple of days if you catch them at a busy period).


RobinB  Identity Verified
Germany
Local time: 17:06
German to English
TOPIC STARTER
Word replace? Sep 1, 2006

Hi Christel,

Thanks for your suggestion, but it's predicated on your knowing what you're looking for. What I'm confronted with is a 180-page document that Trados tells me has around 10% internal repetitions. I don't know what's repeated, or where, and that's what I need to know...

But thanks again.
Robin


RobinB  Identity Verified
Germany
Local time: 17:06
German to English
TOPIC STARTER
DVX Sep 1, 2006

Endre,

Thanks. So what you're saying is that DVX will highlight repetitions in a "virgin" document, i.e. one for which there are otherwise zero TM hits (because there are no 100% or fuzzy matches in any memory). Is that right?

Robin



Endre Both  Identity Verified
Germany
Local time: 17:06
Member (2002)
English to German
Virgin or (pre)translated documents Sep 1, 2006

RobinB wrote:
So what you're saying is that DVX will highlight repetitions in a "virgin" document, i.e. one for which there are otherwise zero TM hits (because there are no 100% or fuzzy matches in any memory). Is that right?


That's possible, yes – and it is the easiest task of all; you can get DVX to export the text with colour information after pretranslation or translation as well. Depending on your goals, you may need to do some SQL tweaking.

But just marking and exporting internal repetitions is very easy, and by "exporting" I mean exporting the source document with its original formatting, not DVX's "External View" feature (a two-column export without formatting), which Victor correctly pointed out as another alternative.

Endre



[Edited at 2006-09-01 14:06]



Christel Zipfel  Identity Verified
Partial member (2004)
Italian to German
Ok... Sep 1, 2006

RobinB wrote:

Hi Christel,

Thanks for your suggestion, but it's predicated on your knowing what you're looking for. What I'm confronted with is a 180-page document that Trados tells me has around 10% internal repetitions. I don't know what's repeated, or where, and that's what I need to know...

But thanks again.
Robin


I had understood that you knew what the repetitions were and just wanted to know where exactly they are...



Heinrich Pesch  Identity Verified
Finland
Local time: 18:06
Member (2003)
Finnish to German
Write a macro Sep 1, 2006

In Word, a macro could search the document for the strings listed in a text file (exported from a TM) and mark the strings it finds in some way.
Some agencies have such tools; they send Word files in which known sentences are marked with strike-through.
But I have no idea what tool they use.
Such a macro would be easy to write if one knew the technique.

Regards
Heinrich
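The idea of matching a list of known sentences against the document can be sketched in Python instead of as a Word macro. This is a minimal sketch under stated assumptions: the TM export is taken to be a text file with one sentence per line, and found sentences are wrapped in `~~...~~` markers (strike-through in GitHub-flavoured Markdown) rather than formatted as strike-through in Word. `load_known` and `mark_known` are hypothetical names.

```python
import re


def load_known(lines):
    """Read known sentences, one per line (e.g. from an open text file)."""
    return {line.strip() for line in lines if line.strip()}


def mark_known(text, known, open_mark="~~", close_mark="~~"):
    """Wrap every occurrence of a known sentence in the given markers."""
    if not known:
        return text  # an empty pattern would match everywhere
    # Try longer sentences first, so a long sentence wins over a shorter
    # sentence that it happens to contain
    alternatives = sorted(known, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in alternatives))
    return pattern.sub(lambda m: open_mark + m.group(0) + close_mark, text)
```

Typical use would be `with open("tm_export.txt", encoding="utf-8") as f: known = load_known(f)` (hypothetical file name), followed by `mark_known(document_text, known)`. Note that this matches known sentences anywhere in the text, not only as whole segments.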



Cecilia Falk  Identity Verified
Local time: 17:06
English to Swedish
Pre-translate with different color Sep 1, 2006

If you use Trados you could do the following:

* Export all repetitions.
* Create a new empty TM and import the repetition file.
* Pre-translate a copy of the document with that TM, using a specific color for 100% matches.

This will give you a color overview of the repetitions.
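Outside Trados, the same three steps can be imitated on a plain-text copy of the document: collect the segments that occur more than once (the "exported repetitions"), treat that set as the new TM, and give every exact (100%) match a background colour. This is a minimal Python sketch, not what Trados does internally; the naive sentence-splitting rule, the one-paragraph-per-line input and the HTML output are all assumptions, and `colour_repetitions` is a hypothetical name. The result can be saved as an .html file and opened in a browser.

```python
import html
import re
from collections import Counter

# Naive segmentation rule (assumption): a segment ends at ".", "!" or "?"
# followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def colour_repetitions(text, colour="yellow"):
    """Return HTML in which every segment occurring more than once is highlighted."""
    paragraphs = [SENTENCE_END.split(line.strip())
                  for line in text.splitlines() if line.strip()]
    # Step 1: "export all repetitions", i.e. segments seen more than once
    counts = Counter(seg for para in paragraphs for seg in para)
    repetition_tm = {seg for seg, n in counts.items() if n > 1}
    # Steps 2 and 3: look each segment up in that "TM" and colour exact matches
    html_paras = []
    for para in paragraphs:
        parts = []
        for seg in para:
            escaped = html.escape(seg)
            if seg in repetition_tm:
                escaped = f'<span style="background-color:{colour}">{escaped}</span>'
            parts.append(escaped)
        html_paras.append("<p>" + " ".join(parts) + "</p>")
    return "\n".join(html_paras)
```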

Hope this helps.

Best regards,
Cecilia




[Edited at 2006-09-01 21:07]



Lucica Abil  Identity Verified
Romania
Local time: 18:06
Member (2005)
Italian to Romanian
TextSTAT Sep 1, 2006

Try this programme for the analysis of texts (freeware)
www.niederlandistik.fu-berlin.de/textstat


RobinB  Identity Verified
Germany
Local time: 17:06
German to English
TOPIC STARTER
Trados workaround Sep 1, 2006

Cecilia,

Many thanks for the tip. It's a workaround, certainly, but it does seem to do the job reasonably effectively.

I'll keep looking for a non-CAT solution, though (Word add-in or separate tool), preferably something that produces a report similar to a DeltaView file comparison.

Thanks again,
Robin


RobinB  Identity Verified
Germany
Local time: 17:06
German to English
TOPIC STARTER
DVX #2 Sep 1, 2006

Victor, Endre,

Thanks for the info on DVX. It looks interesting, but I think it would be an excessive investment of time (and consequently money) for what should, at least in theory, be a relatively simple routine. I'll keep looking for a Word add-in or other tool that will do the job.

Robin


RobinB  Identity Verified
Germany
Local time: 17:06
German to English
TOPIC STARTER
Word macros Sep 1, 2006

Heinrich,

Thanks for the suggestion, though I'm not sure that Word macros are *ever* easy. And the problem with the file I'm dealing with at the moment is that it is entirely virgin as far as TM is concerned, i.e. there *is* no TM.

Robin


RobinB  Identity Verified
Germany
Local time: 17:06
German to English
TOPIC STARTER
TextSTAT Sep 1, 2006

Lucica,

I've tried out TextSTAT in the past, and it's certainly not useful for what I need. As a statistical analysis tool, I'd classify it as a toy system, as it doesn't even come anywhere close to what much older systems (such as System Quirk from the mid-1990s) have to offer in terms of functionality and granularity of analysis.

But thanks anyway.

Robin

