Translation from scanned images
Thread poster: Tatsu02

Tatsu02  Identity Verified
United States
Local time: 22:52
Member (2013)
English to Japanese
Aug 15, 2013

Hi guys,

So I have been asked to quote for a Japanese-to-English translation of a scanned copy of a manual-type document.
I have given my rate quote, but they seem to be asking for the total amount.

I have tried to convert the original Japanese PDF file into a Word file using Nitro Pro 8 (which has a feature to convert PDF files into other file formats), but was unsuccessful. (I think it doesn't recognize Japanese.)
It seems that a manual count, or some kind of formula-based estimate, will be required.

So, assuming the translation rate is USD 0.10 per Japanese source character, how much should I charge?
Also, assume that I might ultimately have to type out the Japanese text from the scanned PDF files first, and then translate it into English.
Additionally, the original file includes different font sizes, different text orientations, charts containing text, etc.

Please tell me your opinion, or about your experience working on a similar assignment.

Thank you for any comments in advance.

-Tatsu



Mårten Engelberg  Identity Verified
Switzerland
Local time: 07:52
Member (2003)
English to Swedish
You shouldn't have to type... Aug 16, 2013

...all of it. For optical character recognition (OCR) I used the free http://en.wikipedia.org/wiki/Nuance_PDF_Reader for a number of years: PDFs opened in it are (were?) uploaded to their website and then emailed back as .doc files etc. Now I'm using their PDF Converter 8 (USD 50), which works offline and is therefore maybe a safer alternative. The free route is (was?) also sometimes slower when a lot of people are using it or when one file contains a lot of volume.

I can't remember how good the results for Japanese were in the free Reader, but in the paid version they've been good most of the time, and only really bad when the PDF was almost illegible (which is too often the case with stuff from Japanese clients, bless their hearts...).

Or you can set a price per PDF page, of course. But that still gives a less accurate estimate than OCR plus cleanup and counting.

Either way, best of luck, ganbare!
Mårten

[Edited at 2013-08-16 00:23 GMT]



Srini Venkataraman
United States
Local time: 00:52
Member (2012)
Tamil to English
exception Aug 16, 2013

I think that if the PDF was made from JPG images, OCR may not work.


Tatsu02  Identity Verified
United States
Local time: 22:52
Member (2013)
English to Japanese
TOPIC STARTER
Trying OCR, so far no success Aug 16, 2013

Marten,

Thank you for mentioning the OCR process.
I didn't know about it and am trying it right now.
So far I have tried OCR in a few programs without success.


Srini,

Yes, you are kind of right about that.



Actually, I'm not quite sure why OCR isn't working well. It could be because:
-the file is in Japanese (a non-alphabetic language), or
-the PDF is based on scanned images.

I'm downloading an OCR tool that specializes (or at least claims to) in East Asian languages.
If this doesn't work... then I might have to go the manual way... =/



Elina Sellgren  Identity Verified
Finland
Local time: 08:52
Member (2013)
Finnish to English
Copy+paste? Aug 16, 2013

I have highlighted text in PDF files with the mouse, then copied and pasted it into Word. All the formatting disappears, but you should be able to get the word count that way, if that's the main thing you need. I'm not sure how well it works with Japanese characters, though.
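Since Tatsu's rate is per Japanese source character, and Japanese does not separate words with spaces, a character count of the pasted (or OCR'd) text is probably the more useful number. Below is a minimal Python sketch, assuming the text has already been pasted into a string; the function name `count_source_chars` and the sample sentence are illustrative only, and whether punctuation should count is something to agree with the client:

```python
import unicodedata


def count_source_chars(text: str, skip_punctuation: bool = False) -> int:
    """Count the characters in pasted or OCR'd source text.

    Whitespace (including the full-width ideographic space U+3000) is never
    counted; punctuation is skipped only when skip_punctuation is True.
    """
    count = 0
    for ch in text:
        if ch.isspace():
            continue
        # Unicode general categories starting with "P" are punctuation, e.g. "。" and "、".
        if skip_punctuation and unicodedata.category(ch).startswith("P"):
            continue
        count += 1
    return count


sample = "取扱説明書　第1章\n安全上の注意。"
print(count_source_chars(sample))                         # 15
print(count_source_chars(sample, skip_punctuation=True))  # 14
```

Of course, this only helps if the PDF actually contains selectable text; a pure image scan gives you nothing to paste.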


Branka Ramadanovic  Identity Verified
Bosnia and Herzegovina
Local time: 07:52
Member
English to Croatian
I normally Aug 16, 2013

try to charge more for PDF originals, although I do not work in Chinese, because they usually require additional work of one kind or another. Alternatively, I ask the client to supply an editable version.

Best,
Branka



Sandra Peters-Schöbel
Germany
Local time: 07:52
Member (2007)
English to German
the worst... Aug 16, 2013

Hi,
I often get this kind of document as well (certificates faxed to the agency and then emailed to me). Normally I use ABBYY FineReader Professional to convert PDF to Word, which works quite well.
But none of the methods mentioned work if you have a scanned document, because the whole text is saved as one image.
You cannot copy any part of it, or use the 'extract text' function or anything similar.
So you can neither use a CAT tool nor give an exact quote. I don't think it has anything to do with the Japanese characters.

And if the formatting is complex, converting is usually useless anyway. Fixing the layout afterwards is so much work that it is faster to translate into a new Word document and format it afterwards.

As for how to quote:
I simply count the words on a full page and assume the same count for the other pages, plus an additional fee for all the layout work (because this can take quite some time...)
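Sandra's extrapolation can be written out as a small calculation. This is a sketch with hypothetical figures (page count, layout hours and hourly rate are all made up for illustration); for Japanese, the count on the sample page would be characters rather than words:

```python
def estimate_from_sample_page(units_on_full_page: int, total_pages: int,
                              rate_per_unit: float, layout_hours: float,
                              hourly_rate: float) -> float:
    """Extrapolate one hand-counted page to the whole document, then add a layout fee.

    A "unit" is whatever the rate is based on: words, or source characters for Japanese.
    """
    estimated_units = units_on_full_page * total_pages
    translation_fee = estimated_units * rate_per_unit
    layout_fee = layout_hours * hourly_rate
    return round(translation_fee + layout_fee, 2)


# Hypothetical job: 250 units on a full page, 40 pages, USD 0.10 per unit,
# plus an estimated 10 hours of layout work at USD 30/hour.
print(estimate_from_sample_page(250, 40, 0.10, 10, 30))  # 1300.0
```

Keeping the layout fee as a separate term makes it easy to show the client exactly what the formatting costs.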

When quoting this way (which works in my favor), I always tell the client that with a Word document I could give an exact quote, perhaps give a small discount on repetitions, and work much faster... They simply have to learn that we cannot work with a badly scanned PDF or fax but need the original document (which, in the case of a PDF, is often a PowerPoint or Word file).
You could also offer to invoice on the target word count, but remember to add your layout work to the price...

Kind regards
Sandra



Samuel Murray  Identity Verified
Netherlands
Local time: 07:52
Member (2006)
English to Afrikaans
Use the old-fashioned count method Aug 16, 2013

Tatsu02 wrote:
I have given my rate quote, but they seem to be asking for the total amount.


Take a few random lines, count the number of characters in them, then multiply the average number of characters per line by the total number of lines. The price will be slightly inflated, but you can always offer a discount at the end if you feel guilty about it.
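Samuel's method, applied to Tatsu's USD 0.10-per-character rate, might look like this in Python. The helper names and the hand counts are hypothetical; `random.sample` is only used to choose which lines to count, so the counting itself is still done by eye on the scanned page:

```python
import random
from statistics import mean


def pick_sample_lines(total_lines: int, k: int, seed: int = 0) -> list:
    """Choose k distinct line numbers (1-based) to count by hand."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, total_lines + 1), k))


def estimate_quote(sampled_counts: list, total_lines: int,
                   rate_per_char: float) -> tuple:
    """Average characters per sampled line x total lines x per-character rate."""
    estimated_chars = round(mean(sampled_counts) * total_lines)
    return estimated_chars, round(estimated_chars * rate_per_char, 2)


print(pick_sample_lines(1200, 5, seed=1))  # which lines to count by hand
# Hypothetical hand counts for those five lines, 1,200 lines in total, USD 0.10 per character:
print(estimate_quote([32, 28, 35, 30, 25], 1200, 0.10))  # (36000, 3600.0)
```

Sampling more lines, spread across different pages and chart areas, should bring the estimate closer to the real count.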



Sheila Wilson  Identity Verified
Spain
Local time: 06:52
Member (2007)
English
Just make sure you're paid for your time Aug 16, 2013

Tatsu02 wrote:
ultimately I might have to type Japanese text first from viewing scanned pdf files first, then translate the text into English.

Surely, you'll just read the phrases in Japanese and type them in the target language if you can't convert it, won't you? As someone else mentioned, CAT tools are probably going to be useless on this job even if the file can be converted. So I can't personally see any circumstances where this would be necessary, or even advisable, but if you do it, make sure the client knows about it and pays for the time taken.
Additionally, the original file includes, different font sizes, different text orientation, charts with words and etc etc...

Does the client need all that formatting? That's going to take time if you're knowledgeable about these things, and lots of time if you aren't. Ask the client what their requirements are, bearing in mind that perfection will cost extra. If it's an agency, you can bet your bottom dollar that they are charging their client more for that formatting!

Remember, there's absolutely no point in your working for half your normal hourly rate simply because the client wants something complicated that you can't deliver. The client must pay you correctly or go elsewhere. You're better off refusing the job and spending the free time researching how to deal with the next similar request (as you're doing here), so that next time you can approach the job differently. Sometimes, we just have to say "No". No client would ever pay me what it would take for me to deal with this type of job (in my pair, of course!), so I just politely refuse such jobs. Somewhere out there, there will be someone who can reconstruct that document in a flash, using all sorts of IT tricks that I know nothing about, and will charge their normal per-word rate plus 5% or so for formatting. They're welcome to the job.



Simin Tan  Identity Verified
Local time: 13:52
Chinese to English
Target word rate would also work Aug 16, 2013

In cases like this, I use a target word rate (typically 1.5x the source word rate for ZH-->EN) and add a "premium" for the extra work of OCR-ing, etc.
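A sketch of the target-word approach Simin describes. The 1.5 factor is her figure for Chinese to English; whether it holds for Japanese is an assumption, and the 20% OCR premium in the example is purely hypothetical:

```python
def target_based_fee(target_words: int, source_rate: float,
                     factor: float = 1.5, premium_pct: float = 0.0) -> float:
    """Fee billed on the target (English) word count.

    The target-word rate is the source rate times `factor` (about 1.5 for ZH->EN,
    per Simin); `premium_pct` is a surcharge for OCR and other extra work.
    """
    target_rate = source_rate * factor
    base_fee = target_words * target_rate
    return round(base_fee * (1 + premium_pct), 2)


# Hypothetical: 5,000 English words, USD 0.10 source rate, 20% premium for OCR work.
print(target_based_fee(5000, 0.10, premium_pct=0.20))  # 900.0
```

The catch is that the total is only known after delivery, so the client has to accept an estimated range up front.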


Tatsu02  Identity Verified
United States
Local time: 22:52
Member (2013)
English to Japanese
TOPIC STARTER
Tried all these OCR programs Aug 16, 2013

After trying all these OCR programs, ABBYY FineReader came closest to success.
(It's on discount right now too. =D)
Its OCR accuracy is around 60-70%, I think,
and I don't really like going back and forth to check whether its OCR output is correct.

The reason I would like to convert the PDF into a proper Word file first is that a CAT tool might be beneficial on this project, since it's technical (with repeated terms) and fairly large in volume.
I'm thinking of the benefit to the client of accurate and consistent terminology.
Another benefit would be reuse for any future translation review (also for the client).

Well, I decided to quote the total project fee based on a per-page rate, and I also explained what work will be done and what the final product will be.

Thanks guys for all your posts! =]



Łukasz Gos-Furmankiewicz  Identity Verified
Poland
Local time: 07:52
English to Polish
Manual and semi-manual solutions Aug 16, 2013

You can always just type, whether or not you tell the client. Obviously, you can't type a long text in time to offer a reasonably quick quotation.

Samuel's solution based on average counts per line, page etc. is also good, especially if you can find some standard formula to rely on for credibility. Alternatively, you can simply tell the client that the only other option is counting the words manually, so you suggest this or that method of approximation. Almost all clients should be reasonable and understand, and you really don't need to go to great lengths to avoid any remote possibility of charging a cent or two too much. Remember the approximate solution is just that, an approximation, and one that aims to make lives easier by skipping the full manual count. So don't make it difficult.

Also, Sandra may be right in that just simply counting the stuff manually may be less time consuming than finding sophisticated ways around the problem. Sometimes it really takes less time to do the footwork than to avoid it.

Also, yeah, target count. I use target count in such situations. So do my agencies. There are some people who don't really understand this, but they'd normally realise that they aren't experts, so they shouldn't be too difficult to deal with. If they are, well, just put your foot down. You're the pro there.

Oh, and avoid the kind of OCR that's more trouble than it's worth. If OCR increases your workload instead of reducing it, dump the OCR.

Also, you could probably hire a student for typing if you need to. Get yourself a walk in the sunshine in the meantime.

