Document comparison wordcount
Thread poster: Locafix S.P.R.L
Does anyone know how to calculate the word count for a document comparison and for the changes that need to be implemented?
Is there good software or a trick for this?
Thanks in advance !
| Using a CAT tool || Jul 6, 2016 |
I assume that you can have both documents in original form and updated form separately.
If I were asked what to charge for this, I would:
Make a "fake" TM off the original document,
Run the above TM on the changed document,
Charge 1 hour for the bother,
Using the CAT analysis, charge 60% of my full rate on the 85-99% bracket, 100% of my full rate for anything below 85% and zilch for 100% (which I would not even read).
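To make the arithmetic concrete, here is a minimal sketch of that fee calculation in Python. The band names, the per-word rate, the hourly rate and the word counts are all made-up placeholders; only the 0% / 60% / 100% split comes from the list above.

```python
# Share of the full per-word rate charged in each match band
# (0% / 60% / 100%, as in the list above).
SHARE_OF_FULL_RATE = {"100": 0.0, "85-99": 0.6, "below_85": 1.0}


def update_fee(words_by_band, full_rate, hourly_rate):
    """Return one hour "for the bother" plus the weighted per-word fee.

    words_by_band maps a band name to the word count reported by the
    CAT analysis for that band (hypothetical input format).
    """
    word_fee = sum(
        words_by_band.get(band, 0) * share * full_rate
        for band, share in SHARE_OF_FULL_RATE.items()
    )
    return hourly_rate + word_fee


# Made-up analysis figures, purely for illustration.
analysis = {"100": 4000, "85-99": 500, "below_85": 300}
print(round(update_fee(analysis, full_rate=0.10, hourly_rate=40.0), 2))
```

With these placeholder numbers, the 4000 unchanged words cost nothing and the fee comes from the 800 words that were touched, plus the flat hour.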
The actual purpose of a CAT tool is precisely your scenario: estimate changes made and efficiencies gained on the update of a text.
Any other use is genuine perversion.
[Edited at 2016-07-06 10:38 GMT]
My attempt was to work out a sensible "wordcount" using a CAT tool, to get a basis for the effort needed to integrate changes in the original translation.
The idea was then to work manually (without a TM) in the original translation and change it where needed. A really time-consuming process, and more prone to errors and omissions, I admit.
To work in a CAT tool, there will indeed be the additional (chargeable) step of aligning the original document with the original translation to get a proper TM, and then working with it on the update.
By "fake TM" I meant copying source to target and generating a TM that is not really one, which is then used to figure out the amount of change between the original and updated documents. This can be expressed in (weighted) "words" using a leverage grid (aka Trados grid), which can be seen as an estimate of the time needed to perform the changes in the original translation.
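For readers without a CAT tool at hand, the idea can be roughly imitated in Python. This is only a sketch: `difflib.SequenceMatcher` is a stand-in for a CAT tool's fuzzy-match scoring (real tools use their own algorithms and will give different percentages), the band limits and grid weights are assumptions, and the sample segments are invented.

```python
from difflib import SequenceMatcher

# Hypothetical leverage grid: weight applied to the words in each band.
GRID = {"100": 0.0, "85-99": 0.6, "below_85": 1.0}


def best_match(segment, tm_segments):
    """Highest word-level similarity (0-100) against any "fake TM" entry."""
    words = segment.split()
    return max(
        (SequenceMatcher(None, words, old.split()).ratio() * 100
         for old in tm_segments),
        default=0.0,
    )


def band(score):
    """Map a similarity score to a match band."""
    if score >= 100:
        return "100"
    if score >= 85:
        return "85-99"
    return "below_85"


def analyse(original_segments, updated_segments):
    """Count the words of the updated document that fall in each band."""
    counts = {name: 0 for name in GRID}
    for seg in updated_segments:
        counts[band(best_match(seg, original_segments))] += len(seg.split())
    return counts


original = [
    "Press the red button to start the machine.",
    "Keep the cover closed at all times during operation of the unit.",
    "Contact support if the problem persists.",
]
updated = [
    "Press the red button to start the machine.",
    "Keep the cover closed at all times during operation of the device.",
    "Dispose of used filters according to local regulations.",
]

counts = analyse(original, updated)
weighted = sum(counts[name] * GRID[name] for name in GRID)
print(counts, round(weighted, 1))
```

The original segments play the role of the source=target TM; the weighted total is the "word count" that would then be multiplied by a rate.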
In fact, there is no actual wordcount to speak of on a document comparison, because an update can involve:
words replaced with other words
words moved around but not changed
Each of these categories requires more or less work at translation-updating time, hence a "word" as a unit of effort needed is inadequate.
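A small Python sketch shows why a raw count of changed words is misleading. It uses `difflib.SequenceMatcher` from the standard library on word lists; the sample sentences and the function name are invented for illustration, and a real comparison tool may report changes differently.

```python
from difflib import SequenceMatcher


def classify_changes(old_text, new_text):
    """Count words per diff operation between two versions of a sentence."""
    old, new = old_text.split(), new_text.split()
    counts = {"equal": 0, "replace": 0, "delete": 0, "insert": 0}
    matcher = SequenceMatcher(None, old, new)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Use the longer side so a replacement of 1 word by 3 counts as 3.
        counts[tag] += max(i2 - i1, j2 - j1)
    return counts


# A word replaced with another word.
replaced = classify_changes("Tighten the four screws", "Tighten the six screws")

# A word moved but not changed: the diff sees one deletion and one insertion.
moved = classify_changes("first check the oil level", "check the oil level first")

print(replaced)
print(moved)
```

The moved word shows up as one deleted and one inserted word although no word was actually rewritten, which is exactly the kind of case where the effort in the translation does not follow the count.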
But I realise that the OP is not a translation agency, so it all may sound very obscure to a company not actually involved in translation... Maybe they should clarify what they want to do with such a "wordcount".
Samuel Murray
English to Afrikaans
| Same as Philippe || Jul 7, 2016 |
Philippe Etienne wrote:
* Make a "fake" TM off the original document,
* Run the above TM on the changed document.
Yes, if it was a new client, and if it appeared to me that the job would take more than 1 hour (my minimum fee is 1 hour), I would have done pretty much the same thing. Here's my description of it:
1. Create two versions of the source file, named e.g. all-accepted.doc and all-rejected.doc.
2. Create a source=target TM from the all-rejected.doc file.
3. Analyse the all-accepted.doc file against that TM, to get a weighted word count.
There are at least two ways of creating a source=target TM.
If you have an aligner (or if your CAT tool can't copy source to target for all segments):
(a) Load both all-rejected.doc and all-accepted.doc into your CAT tool, and perform an extraction on each of them (i.e. export two files that contain all the segments from those two files). Do not use the aligner directly, because the aligner might segment the files differently from your CAT tool, and the aligner might not add the same formatting tags as your CAT tool does. So you end up with two files named e.g. all-rejected.txt and all-accepted.txt.
(b) Make a copy of the all-rejected.txt file (named e.g. all-rejected-2.txt), and then align those two files. Do not align all-rejected.txt with all-accepted.txt. The alignment process should be instantaneous and error-free, since it's the same file.
If you do not have an aligner (and if your CAT tool can copy source to target for all segments):
(a) Load all-rejected.doc into your CAT tool.
(b) Create a new, blank TM (and connect the new TM to the project, if your CAT tool requires this bothersome extra step).
(c) Copy source to target for all segments.
(d) Add all those segments to the TM (e.g. by "cleaning up" or by marking the segments as "finalised", or whatever method is used by your CAT tool).
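If neither route is convenient, a source=target TM could in principle also be generated outside the CAT tool as a TMX file. The following is a minimal Python sketch, not a tested recipe: it assumes you already have the extracted segments as plain strings, the language codes and the tool name in the header are placeholders, the header attributes follow TMX 1.4b as I understand it, and whether a given CAT tool imports the result cleanly (especially with formatting tags) is untested.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_source_equals_target_tmx(segments, src_lang="en-GB", tgt_lang="fr-FR"):
    """Build a TMX tree in which every target segment is a copy of its source."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "fake-tm-sketch",   # placeholder name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for text in segments:
        tu = ET.SubElement(body, "tu")
        # Same text under both language codes: source = target.
        for lang in (src_lang, tgt_lang):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


tree = build_source_equals_target_tmx(["First segment.", "Second segment."])
print(ET.tostring(tree.getroot(), encoding="unicode"))
# To save it: tree.write("all-rejected.tmx", encoding="utf-8", xml_declaration=True)
```

The segments should come from the CAT tool's own extraction, for the same segmentation reasons given above.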
Philippe Etienne wrote:
The actual purpose of a CAT tool is precisely your scenario: estimate changes made and efficiencies gained on the update of a text. ... Any other use is genuine perversion.
I don't quite agree that this method is the best in all scenarios, because one's productivity in updating the target text may depend on many factors, e.g. how related the two languages are, how literal the original translation was, how similar the original translator's style is to yours, and document-related aspects such as whether you would have to fix formatting, your ability to use find/replace, etc. But I agree that for a new client, and for a longer file, this is probably the best simple method to determine a weighted word count.