Scanned PDF Files
Thread poster: Trevor Chichester

Trevor Chichester  Identity Verified
United States
Local time: 02:50
Member (2012)
German to English
May 17, 2012

Good Afternoon All!

So... I was wondering: what percentage of your work each year comes from scanned PDFs?

Strangely, more and more of my translations have been from dead PDFs. Right now, I'm working on 13K worth of dead PDFs, and to be honest, this file format is QUITE the headache to deal with.

How do you guys combat this? Do you rewrite the PDF? Or do you have an OCR converter?

I personally have a great OCR converter, but I still have to wade through the entire file looking for errors before putting it into Trados.

How do you guys deal with these files?




Paulo Eduardo - Pro Knowledge  Identity Verified
Local time: 05:50
Portuguese to English
have fun! May 17, 2012


Giles Watson  Identity Verified
Local time: 08:50
Italian to English
Money talks May 17, 2012

Trevor Chichester wrote:

How do you guys deal with these files?

By quoting a hefty (at least 30%) premium for working with them.

In practice, though, I don't do any. The client either comes up with a viable file format or goes elsewhere. I know plenty of translators who are quite happy to deal with scanned images but I'm not one of them.


Nikita Kobrin  Identity Verified
Local time: 09:50
English to Russian
* May 17, 2012

Trevor Chichester wrote:
How do you guys deal with these files?

1) I ask the client to convert the PDF file into an editable format (MS Word) and send it to me for translation (I accept only converted files that are 100% identical to the PDF files from which they were converted).

2) If the client is not able to do a 100% identical conversion himself, I ask my DTP operator to do the conversion. To compensate him for his work, I charge the client extra. It's not cheap: in difficult cases, the cost of conversion may be equal to the cost of translation.


[Edited at 2012-05-17 20:26 GMT]


Anton Konashenok  Identity Verified
Czech Republic
Local time: 08:50
English to Russian
Just OCR it, but do it properly May 17, 2012

Nikita, your DTP operator seems to be overcharging you by a huge factor. In my own experience, OCRing a scanned text of decent quality (maybe even a good fax) has never taken me more than 10% of the time needed for translation, and I consider it good customer relations to offer it free of charge if a steady client sends me an occasional scanned document.
There is, however, an important point to remember: never run your OCR in fully automatic mode, and never let it format the paragraphs for you. I use FineReader, defining the recognition areas by hand (selecting text or table as appropriate) and saving the results as plain text. For very clear originals, I may decide to save as formatted text instead, but I delete all paragraph styles created by FineReader before doing any further work. This way, I keep only character-level formatting (font size and bold/italic/underline). Recreating the necessary paragraph formatting by hand takes a small fraction of the time needed to straighten out the automatically generated formatting.
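As a rough illustration of the plain-text route: once the OCR result is saved as plain text, the hard line breaks and end-of-line hyphens can be tidied with a short script before the text goes into a CAT tool. This is only a sketch in Python. The function name `reflow_ocr_text` and the rule that a blank line separates paragraphs are assumptions made for the example, not features of FineReader or of any other tool mentioned in this thread.

```python
import re


def reflow_ocr_text(raw: str) -> str:
    """Join hard-wrapped OCR lines into paragraphs.

    Assumption: paragraphs in the OCR output are separated by blank lines.
    """
    paragraphs = []
    # Split on blank lines (possibly containing stray whitespace).
    for block in re.split(r"\n\s*\n", raw.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        text = ""
        for ln in lines:
            if text.endswith("-") and ln[:1].islower():
                # Probably a word hyphenated across a line break: rejoin it.
                text = text[:-1] + ln
            elif text:
                text += " " + ln
            else:
                text = ln
        paragraphs.append(text)
    return "\n\n".join(paragraphs)


sample = "The trans-\nlation was\ndone.\n\nSecond para-\ngraph."
print(reflow_ocr_text(sample))
```

The de-hyphenation rule is deliberately naive: it will also merge genuine compounds that happen to break at a hyphen, so the result still needs a read-through against the scan.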


Nadezhda & Vatslav Yehurnovy  Identity Verified
Local time: 09:50
Member (2008)
English to Russian
Pricing is often NOT meant to cover OCRing May 18, 2012

We also have a friend who sometimes helps with OCRing and deep DTP wizardry, but we completely agree with Nikita about charging extra per hour. And then the originals in Word or other editable, non-pre-OCRed formats really do start to appear as if by magic :)

Well, sometimes miracles do not happen, and then the client pays per hour for re-creating the document versions from a scanned, all-tables PDF with several consecutive changes to the numbers in the cells.

Anton, how about a scanned 15-page document with numerous barely legible handwritten memos with arrows etc., full of tables and block diagrams? ;)

We just gave a quote for OCRing, drawing and typing, and received back a great Word file with everything intact, in just 3 hours.


Rolf Keller
Local time: 08:50
English to German
Online services vs. confidentiality May 18, 2012

Using such online services might compromise confidentiality.

