Document comparison wordcount
Thread poster: Locafix S.P.R.L

Locafix S.P.R.L  Identity Verified
Belgium
Local time: 13:00
Member (2016)
Jul 6, 2016

Hello,

Does anyone know how to calculate the word count of a document comparison and of the changes that need to be implemented?

Is there good software or a trick for this?

Thanks in advance!

Kindest regards,


 

Philippe Etienne  Identity Verified
Spain
Local time: 13:00
Member
English to French
Using a CAT tool Jul 6, 2016

I assume that you have the original document and the updated document as separate files.

If I were asked what to charge for this, I would:

Make a "fake" TM off the original document,
Run the above TM on the changed document,
Charge 1 hour for the bother,
Using the CAT analysis, charge 60% of my full rate on the 85-99% bracket, 100% of my full rate for anything below 85%, and zilch for 100% matches (which I would not even read).
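
As a rough illustration of the arithmetic in the steps above, here is a minimal Python sketch. The bracket percentages (0%, 60%, 100%) and the one-hour surcharge come from the post. The word counts, the per-word rate and the hourly rate are made-up figures, and the names `analysis`, `weights` and `quote` are hypothetical.

```python
# Hypothetical word counts per match bracket, as a CAT analysis might report them.
analysis = {"100%": 1200, "85-99%": 400, "below 85%": 300}

# Share of the full rate charged per bracket, taken from the post above.
weights = {"100%": 0.0, "85-99%": 0.6, "below 85%": 1.0}


def quote(analysis, full_rate_per_word, hourly_rate):
    """Return (weighted word count, total price) for an update job."""
    weighted_words = sum(analysis[b] * weights[b] for b in analysis)
    # One hour is added on top for the setup work ("the bother").
    total = weighted_words * full_rate_per_word + hourly_rate
    return weighted_words, total


# Assumed rates, for illustration only: 0.10 per word and 40.00 per hour.
weighted, total = quote(analysis, full_rate_per_word=0.10, hourly_rate=40.0)
print(f"weighted words: {weighted:.0f}, total: {total:.2f}")
```

With these invented figures, 1,900 raw words shrink to 540 weighted words, which is the number the per-word rate is applied to.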

The actual purpose of a CAT tool is precisely your scenario: estimate changes made and efficiencies gained on the update of a text.
Any other use is genuine perversion.

Philippe

[Edited at 2016-07-06 10:38 GMT]


 

Manuela Ribecai  Identity Verified
Italy
Local time: 13:00
English to French
+ ...
What about alignment? Jul 6, 2016

I agree with Philippe, but I don't understand the "fake TM".
To my thinking, you need to align the segments… that could take more than an hour (depending on the file).

Philippe, could you please explain how to manage without alignment?

Thanks.

Manuela


 

Philippe Etienne  Identity Verified
Spain
Local time: 13:00
Member
English to French
Alignment Jul 6, 2016

My attempt was to work out a sensible "wordcount" using a CAT tool, to get a basis for the effort needed to integrate changes in the original translation.

The idea was then to work manually (without a TM) in the original translation and make changes where needed. Admittedly, this is a time-consuming process that is more prone to errors and omissions.
To work in a CAT tool instead, there would indeed be the additional (chargeable) step of aligning the original document with the original translation to get a proper TM to use for the update.

By "fake TM" I meant copying source to target and generating a TM that is not really one, which is then used to figure out the amount of change between the original and updated documents. This can be expressed in (weighted) "words" using a leverage grid (aka Trados grid), which can be seen as an estimate of the time needed to make the changes in the original translation.

In fact, there is no actual wordcount to speak of on a document comparison, because an update can involve:
words replaced with other words
words moved around but not changed
words deleted
words added
Each of these categories requires more or less work at translation-updating time, hence a "word" is inadequate as a unit of the effort needed.
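
To make these categories concrete, here is a small Python sketch that counts words per change type using the standard library's `difflib.SequenceMatcher`. It is only an approximation and not what any CAT tool does: the function name `change_counts` and the sample sentences are invented, and `difflib` has no notion of moved words, so a moved passage shows up as a deletion in one place and an addition in another.

```python
from difflib import SequenceMatcher


def change_counts(old_text, new_text):
    """Count words by change type between two versions of a text."""
    old_words, new_words = old_text.split(), new_text.split()
    counts = {"unchanged": 0, "replaced": 0, "deleted": 0, "added": 0}
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            counts["unchanged"] += i2 - i1
        elif tag == "replace":
            # Count the new wording, since that is what has to be translated.
            counts["replaced"] += j2 - j1
        elif tag == "delete":
            counts["deleted"] += i2 - i1
        elif tag == "insert":
            counts["added"] += j2 - j1
    return counts


# Invented sample sentences, for illustration only.
old = "The pump must be switched off before cleaning"
new = "The pump must be turned off before any cleaning"
print(change_counts(old, new))
```

The four totals cannot sensibly be added up into one figure, which is the point made above: each category costs a different amount of effort per word.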

But I realise that the OP is not a translation agency, so it all may sound very obscure to a company not actually involved in translation... Maybe they should clarify what they want to do with such a "wordcount".

Philippe


 

Manuela Ribecai  Identity Verified
Italy
Local time: 13:00
English to French
+ ...
You're right Jul 6, 2016

You're right, it was "only" about wordcount.

Each of these categories requires more or less work at translation-updating time, hence a "word" is inadequate as a unit of the effort needed.


I totally agree.


 

Samuel Murray  Identity Verified
Netherlands
Local time: 13:00
Member (2006)
English to Afrikaans
+ ...
Same as Philippe Jul 7, 2016

Philippe Etienne wrote:
* Make a "fake" TM off the original document,
* Run the above TM on the changed document.


Yes, if it were a new client, and if it appeared to me that the job would take more than 1 hour (my minimum fee is 1 hour), I would do pretty much the same thing. Here's my description of it:

1. Create two versions of the source file, named e.g. all-accepted.doc and all-rejected.doc.

2. Create a source=target TM from the all-rejected.doc file.

3. Analyse the all-accepted.doc file against that TM, to get a weighted word count.

There are at least two ways of creating a source=target TM.

If you have an aligner (or if your CAT tool can't copy source to target for all segments):

(a) Load both all-rejected.doc and all-accepted.doc into your CAT tool, and perform an extraction on each of them (i.e. export two files that contain all the segments from those two files). Do not use the aligner directly, because the aligner might segment the files differently from your CAT tool, and the aligner might not add the same formatting tags as your CAT tool does. So you end up with two files named e.g. all-rejected.txt and all-accepted.txt.

(b) Make a copy of the all-rejected.txt file (named e.g. all-rejected-2.txt), and then align those two files. Do not align all-rejected.txt with all-accepted.txt. The alignment process should be instantaneous and error-free, since it's the same file.

If you do not have an aligner (and if your CAT tool can copy source to target for all segments):

(a) Load all-rejected.doc into your CAT tool.
(b) Create a new, blank TM (and connect the new TM to the project, if your CAT tool requires this bothersome extra step).
(c) Copy source to target for all segments.
(d) Add all those segments to the TM (e.g. by "cleaning up" or by marking the segments as "finalised", or whatever method is used by your CAT tool).
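
For readers without a CAT tool at hand, the analysis in step 3 can be imitated very roughly in Python. This is a sketch under loud assumptions: segmentation is one segment per line, the fuzzy score is `difflib`'s word-level similarity ratio rather than any CAT tool's own match algorithm, the brackets are the three used earlier in this thread, and the names `segments`, `best_match` and `analyse` as well as the sample texts are invented.

```python
from difflib import SequenceMatcher


def segments(text):
    """Naive segmentation (an assumption): one segment per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def best_match(segment, tm):
    """Best word-level similarity (0-100) of a segment against the TM."""
    words = segment.split()
    scores = (
        100 * SequenceMatcher(a=old.split(), b=words, autojunk=False).ratio()
        for old in tm
    )
    return max(scores, default=0.0)


def analyse(rejected_text, accepted_text):
    """Word counts of the updated text, grouped by match bracket."""
    tm = segments(rejected_text)  # stands in for the source=target TM
    report = {"100%": 0, "85-99%": 0, "below 85%": 0}
    for seg in segments(accepted_text):
        score = best_match(seg, tm)
        n_words = len(seg.split())
        if score >= 100:
            report["100%"] += n_words
        elif score >= 85:
            report["85-99%"] += n_words
        else:
            report["below 85%"] += n_words
    return report


# Invented sample texts standing in for all-rejected.doc and all-accepted.doc.
rejected = """Switch off the pump before cleaning.
Store the device in a dry place.
Do not open the housing under any circumstances while the device is connected."""
accepted = """Switch off the pump before cleaning.
Store the device in a dry and cool place at all times.
Do not open the housing under any circumstances while the unit is connected.
Contact the manufacturer."""
print(analyse(rejected, accepted))
```

A real CAT tool will segment and score differently, so the figures from a sketch like this are only indicative and should not be quoted to a client as a CAT analysis.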

The actual purpose of a CAT tool is precisely your scenario: estimate changes made and efficiencies gained on the update of a text. ... Any other use is genuine perversion.


I don't quite agree that this method is the best in all scenarios, because one's productivity in updating the target text may depend on many factors, e.g. how closely related the two languages are, how literal the original translation was, how similar the original translator's style is to your own, and document-related aspects such as whether you would have to fix formatting, your ability to use find/replace, etc. But I agree that for a new client, and for a longer file, this is probably the best simple method to determine a weighted word count.


 

