Help needed
Thread poster: Dijana Evans
Dijana Evans
United Kingdom
English to Croatian
Oct 17, 2014

We are taking on a project for the translation of bank statements, which are in scanned PDF format.

It would be helpful if we could have them converted to DOC with the formatting matching the originals as closely as possible, but I'm a bit wary of advertising the job due to privacy concerns.

Can anyone provide any tips, or point me in the direction of a reliable DTP expert (by PM if necessary)?

Any help would be enormously appreciated.


 
Henry Hinds
United States
English to Spanish
In memoriam
OCR Oct 17, 2014

You can use an OCR (Optical Character Recognition) program, but often you end up with a mess that is more trouble than it's worth. The program must support the source language. The alternative is to re-create the original format as closely as is practical, or just well enough for it to be understood. You should charge extra for working with difficult formats because they can be very time-consuming.

 
Dijana Evans
United Kingdom
English to Croatian
TOPIC STARTER
OCR Oct 17, 2014

To be honest, I would rather steer clear of OCR. The scan quality on some of these documents isn't that crisp. This is a legal client and we cannot afford a single mistake. What I would ideally like is someone with a bit of time on their hands who is willing to take some money to manually recreate all the pages, copying and pasting the parts that are the same (headers, etc.) where possible.

 
Joakim Braun
Sweden
German to Swedish
Acrobat Oct 17, 2014

The built-in OCR in Acrobat Pro is quite good.

If these statements are all structured in the same way, setting up a couple of Word stylesheets (and headers/footers for repeated text) won't be too laborious. You could even use InDesign, which gives you much more sophisticated control.

And yes, if you use OCR you need to manually proofread every word and number...
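
As a small aid to that proofreading pass (not a replacement for it), here is a minimal Python sketch that flags tokens in OCR output where digits are mixed with letters that OCR tends to confuse with them, such as O for 0 or l for 1. The function name `flag_suspect_tokens` and the look-alike list are illustrative assumptions, not part of Acrobat or any OCR product.

```python
# Letters that OCR output often confuses with digits.
# This list is an illustrative assumption and is certainly incomplete.
LOOKALIKES = set("OoIlSBZ")


def flag_suspect_tokens(text):
    """Return (line_number, token) pairs for tokens mixing digits and look-alike letters.

    The result is only a list of places to look at first; every word and
    number still has to be checked against the scan by a human.
    """
    suspects = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            has_digit = any(ch.isdigit() for ch in token)
            has_lookalike = any(ch in LOOKALIKES for ch in token)
            if has_digit and has_lookalike:
                suspects.append((line_no, token))
    return suspects
```

Calling `flag_suspect_tokens` on the plain-text OCR export returns the line numbers and tokens worth checking first. Legitimate mixed tokens such as reference codes will be flagged too, which is acceptable for a review list.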


 
Rodrigo Castillo H.
Chile
English to Spanish
A few tips Oct 17, 2014

As a DTP specialist, I can tell you that there's no reliable way to convert from PDF to Word, especially if the PDF contains a lot of tables or complex formatting (indents, bullets...). If you need to convert to Word, I'd just run OCR on the original PDF but export the result as plain text. It's usually easier and faster to reapply formatting to a whole plain-text document than to fix badly formatted OCR files.
Another workflow would be to use OCR to generate a plain text file, translate that file, and then DTP the final translation (in Word, InDesign, or what have you).
If you need any assistance, don't hesitate to contact me.


 

