word count for scanned PDF files
Thread poster: Tai Fu

Tai Fu  Identity Verified
United States
Local time: 18:26
Chinese to English
Jun 6, 2010

The agency I am now dealing with says its clients may submit PDF files. I have looked at such files before, and they turned out to be scanned pages, so a word count is impossible. The problem is not only the word count: my usual trick of converting Simplified Chinese into Traditional Chinese for easier reading won't work, and CAT tools won't work either. The agency said they wanted to set a target-word rate. I said it is impossible to bill at a target-word rate because there is no way to predict the number of target words until the job is complete. Also, on the last job I did for them, the number of target words was less than half the number of source characters, so I would get paid next to nothing under that scheme. So I told them I would prefer to bill at a per-line or per-page rate.

Has anyone received this type of file, and how do you usually bill your client? Is there an easy way to OCR those scanned pages so that I can convert between variants of Chinese and use CAT tools?


 
Christiane Allen  Identity Verified
United States
Local time: 18:26
Member (2007)
English to French
What I would do Jun 6, 2010

I would charge the client by the hour, giving a minimum/maximum range of hours in the quote.

 

imatahan  Identity Verified
Brazil
Local time: 23:26
English to Portuguese
You can do per hour/ per page Jun 6, 2010

You can charge per hour or per page.

A page has roughly X words, depending on the character width. You can calculate an average value.

I work a lot with this type of file because I do legal translations, where you usually get copies of parts of a court case and have no digital form other than PDF.

We work out an average per page, count the pages, and charge accordingly.
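
As a rough illustration of the per-page approach described above, the sketch below averages hand counts from a few sample pages and turns a per-word rate into a per-page rate. The function name, the sample counts and the rate are made-up values for illustration, not figures from this thread.

```python
from statistics import mean

def per_page_quote(sample_word_counts, total_pages, rate_per_word):
    """Estimate a per-page rate and a job total from a few hand-counted pages.

    All inputs are hypothetical: sample_word_counts holds manual counts
    for a handful of representative pages of the scan.
    """
    avg_words = mean(sample_word_counts)
    rate_per_page = avg_words * rate_per_word
    total = rate_per_page * total_pages
    return round(avg_words), round(rate_per_page, 2), round(total, 2)

# Three sampled pages of a 40-page scan at an assumed 0.10 per word.
avg_words, rate_per_page, total = per_page_quote([310, 290, 300], 40, 0.10)
print(avg_words, rate_per_page, total)
```

Sampling more pages, or pages of different density, should make the average more reliable.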


 

Veronica Lupascu  Identity Verified
Netherlands
Local time: 03:26
Dutch to Romanian
OCR Jun 6, 2010

You should try OCR software such as ABBYY FineReader. I don't know if it works for Chinese, but it converts scanned PDF files into Word (.doc) documents. It might be helpful in your case.

Then you can count the words and translate the document with any CAT tool.
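
Once OCR has produced plain text, the count itself is simple. Here is a minimal sketch, assuming the OCR output is already available as a Python string. It counts CJK ideographs one by one (the original poster talks about source characters) and counts runs of Latin letters or digits as words. The function name and the sample string are illustrative only, and the character range covers just the basic CJK Unified Ideographs block.

```python
import re

# Basic CJK Unified Ideographs block only; extension blocks are ignored here.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
# Runs of ASCII letters/digits, e.g. company names or years inside Chinese text.
LATIN_WORD = re.compile(r"[A-Za-z0-9]+")

def count_units(ocr_text):
    """Return (cjk_characters, latin_words) for a piece of OCR output."""
    return len(CJK_CHAR.findall(ocr_text)), len(LATIN_WORD.findall(ocr_text))

# Made-up sample sentence standing in for real OCR output.
sample = "本合同由 ABC 公司于 2010 年签订。"
chars, words = count_units(sample)
print(chars, words)
```

OCR errors will carry over into the count, so the result is only as good as the recognition.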


 

Veronica Lupascu  Identity Verified
Netherlands
Local time: 03:26
Dutch to Romanian
ABBYY - trial versions Jun 6, 2010

You can download a trial version from the link below. Read the page carefully and pick the version that supports Chinese.

http://www.abbyy.com/Default.aspx?DN=6d4713f2-965c-4e8c-96c8-00158b569120&l=English&adxSearchText=trial%20version&Submit.x=0&Submit.y=0


 

William [Bill] Gray  Identity Verified
Norway
Local time: 03:26
Member (2006)
English
PDF files... Jun 6, 2010

... are always a pest when it comes to making use of good TMs built up over the years. I always try to OCR them (OmniPage 17 is my current tool), but the result is not always good copy, and proofing is often just as time-consuming as typing out the whole thing. Without usable text, of course, you don't get the benefit of an increased TM, which would be helpful in the future.

An OCR scan does, however, let you get a word count. Otherwise, I always charge per target word. I tell the client they will just have to wait until the job is done, and that I will allow a 10% reduction in the number of words, since Norwegian has many more compound words than English, so the word count tends to be higher after translation. Most clients have been happy with this arrangement so far. I do have a few regular customers for whom I sometimes do "guesstimates".
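
The arrangement above comes down to simple arithmetic. Here is a minimal sketch in which the function name, the word count and the rate are invented for illustration; only the 10% reduction comes from the post.

```python
def invoice_amount(target_words, rate_per_word, reduction=0.10):
    """Bill per target word after discounting the word count by `reduction`."""
    billable_words = round(target_words * (1 - reduction))
    return billable_words, round(billable_words * rate_per_word, 2)

# Hypothetical job: 5,000 target words at an assumed rate of 0.12 per word.
print(invoice_amount(5000, 0.12))
```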

Good luck with the dreaded PDF world!!



 
xxxMaren Paetzo  Identity Verified
Germany
Local time: 03:26
Italian to German
word count in graphics Jun 6, 2010

If you are looking for a tool to count words in graphics... I have never tested it, but the new version of AnyCount was announced as being able to count words even in graphics such as JPGs and scanned PDFs.
I'm still using an older version, which is useful and works fine, so I think I will soon upgrade to AnyCount 7.
You can find more information here: http://www.translationmanagementsystem.com/word_character_line_count_software.html


 

LEXpert  Identity Verified
United States
Local time: 20:26
Member (2008)
Croatian to English
Set target rate based on expansion % Jun 6, 2010

Over the years, you have probably built up a decent notion of the average source-to-target expansion or contraction rate in your pair(s). If you know what you would charge per source unit, just use the expansion rate to extrapolate a target rate.

Apologizing in advance if Chinese does not work that way for some reason...
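
One way to read the suggestion above, sketched under stated assumptions: if the expansion ratio is target words divided by source units, then dividing the source rate by that ratio gives a target rate that yields the same total. The function name and the figures are hypothetical.

```python
def target_rate(source_rate, expansion_ratio):
    """Per-target-word rate that yields the same total as the source rate.

    expansion_ratio = target words / source units (words or characters),
    estimated from past jobs in the same language pair.
    """
    if expansion_ratio <= 0:
        raise ValueError("expansion_ratio must be positive")
    return source_rate / expansion_ratio

# Assumed figures: 0.05 per source character, and half as many target
# words as source characters.
print(target_rate(0.05, 0.5))
```

With a ratio below 1 (contraction), the target rate comes out higher than the source rate, which is the situation the original poster describes for Chinese to English.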


 

Shouguang Cao
China
Local time: 10:26
Member (2007)
English to Chinese
Manual counting Jun 7, 2010

Wrestling with OCR software just to get the word count can be even more time-consuming.

What you can do is a rough manual count. Count the lines and the words in each line, and then do some simple arithmetic like this:

Words per line × lines per page × number of pages.

Always works for me!
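
The manual count above can be written out as a short sketch. It assumes you count the words on a few sample lines and average them; the function name and all numbers are made up for illustration.

```python
from statistics import mean

def rough_word_count(sample_line_counts, lines_per_page, pages):
    """Rough estimate: average words per line x lines per page x pages."""
    return round(mean(sample_line_counts) * lines_per_page * pages)

# Hypothetical scan: sampled lines of 11, 13 and 12 words,
# about 35 lines per page, 20 pages.
print(rough_word_count([11, 13, 12], 35, 20))  # 8400
```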


[Edited at 2010-06-07 04:44 GMT]


 

