Help requested: PDF to Word: How to copy tons of numbers
Thread poster: David Jessop

David Jessop  Identity Verified
Spain
Member
Spanish to English
Apr 6, 2009

Hello,

I am working on a 10,000 word Spanish to English translation in which perhaps 4,000 of the “words” are actually numbers and statistics from a whole mess of tables dispersed throughout the document. To make matters even more challenging, they don't copy or end up in a text dump with Acrobat Reader. Does anyone have an idea of how to semi-automate this process so I do not spend all my time in Word typing out numbers into the translated document? Is the only option to open Illustrator and select the text and copy it as raw text, or is there a better way? Any feedback is appreciated.

Best,
David



Vladislav Badalov  Identity Verified
Russian Federation
Local time: 01:55
Russian to English
FineReader is the answer Apr 6, 2009

Hi, David,

FineReader will surely recognise all your numbers and even tables!



José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 19:55
English to Portuguese
OCR or InFix Apr 6, 2009

David Jessop wrote:
... they don't copy or end up in a text dump with Acrobat Reader...


There are two major kinds of PDF files: distilled and scanned.

From what you said above, I guess yours has been scanned. In this case you'll need OCR (Optical Character Recognition) software, such as OmniPage, ABBYY, ReadIris or some other, to convert those "pictures" back into text.

If your file has been distilled, i.e. converted into PDF by, e.g., "printing" from MS Word (or any other program) to Acrobat Distiller, you can use InFix from http://www.iceni.com, which is a PDF editor. You may keep the tables as they are and edit the text alone in the PDF itself.
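Whether a file is distilled or scanned can be roughly checked before choosing a tool. The sketch below is a crude heuristic, not a PDF parser: it assumes that a distilled PDF has content streams (possibly Flate-compressed) containing text-showing operators (`BT` ... `Tj`/`TJ`), while a scanned PDF typically contains only image objects (`/Subtype /Image`). The name `classify_pdf` is hypothetical. It can be fooled by scans that already carry an invisible OCR text layer, encrypted files, object streams using other filters, or binary data that happens to resemble text operators.

```python
import re
import zlib

# Raw stream bodies: everything between "stream" + EOL and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text object that shows at least one string: BT ... Tj or BT ... TJ.
TEXT_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b", re.DOTALL)
# An image XObject declaration, as produced by most scanners.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def stream_bodies(data: bytes):
    """Yield each stream body, inflated if it is Flate-compressed (heuristic)."""
    for match in STREAM_RE.finditer(data):
        body = match.group(1)
        try:
            # decompressobj keeps trailing bytes (e.g. the EOL) in unused_data
            yield zlib.decompressobj().decompress(body)
        except zlib.error:
            if body.isascii():  # probably an uncompressed content stream
                yield body
            # otherwise binary data in another encoding (e.g. JPEG): skipped


def classify_pdf(data: bytes) -> str:
    """Return "text", "scanned" or "unknown" for the raw bytes of a PDF file."""
    bodies = list(stream_bodies(data))
    has_text = any(TEXT_RE.search(body) for body in bodies)
    has_image = any(IMAGE_RE.search(chunk) for chunk in [data, *bodies])
    if has_text:
        return "text"  # selectable text exists: converters or PDF editors may work
    if has_image:
        return "scanned"  # only page images found: OCR is needed
    return "unknown"
```

For example, `classify_pdf(Path("tables.pdf").read_bytes())` (with `from pathlib import Path`; the file name is a placeholder) returning `"scanned"` points towards OCR software, while `"text"` suggests a converter or a PDF editor such as InFix is worth trying first.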



Mihaela BUFNILA  Identity Verified
Romania
Local time: 01:55
English to Romanian
Overwriting Apr 6, 2009

If you have the original text in Word, you might just overwrite the original with the target text and therefore leave the numbers just as they are.
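When overwriting the source file this way, it is worth confirming afterwards that no figure was accidentally changed or deleted. A minimal sketch, assuming both versions are available as plain text (for example, saved from Word as .txt); the names `number_tokens` and `compare_numbers` are illustrative. The pattern deliberately treats `3,5` and `3.5` as different tokens, so any figure that was localized or retyped shows up in the report.

```python
import re
from collections import Counter

# Digit runs with optional "." or "," separators, e.g. 2008, 3,5, 1.234,56
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def number_tokens(text: str) -> Counter:
    """Count every number-like token exactly as written."""
    return Counter(NUMBER_RE.findall(text))


def compare_numbers(source: str, target: str) -> tuple[dict, dict]:
    """Return (missing_from_target, extra_in_target) as token -> count dicts."""
    src, tgt = number_tokens(source), number_tokens(target)
    return dict(src - tgt), dict(tgt - src)


if __name__ == "__main__":
    source = "Ventas: 1.234,5 en 2008; crecimiento del 3,5 %"
    target = "Sales: 1.234,5 in 2008; growth of 3.5%"
    missing, extra = compare_numbers(source, target)
    print("missing:", missing)  # missing: {'3,5': 1}
    print("extra:", extra)  # extra: {'3.5': 1}
```

Two empty dictionaries mean every figure in the original survived the overwrite unchanged.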

If you have the original text in a picture format or on paper, you might use ABBYY FineReader (http://www.abbyy.com) to solve this.

HTH



Sergei Leshchinsky  Identity Verified
Ukraine
Local time: 01:55
Member (2008)
English to Russian
FineReader Apr 6, 2009

Use FineReader, but add "Numbers" to the list of OCR recognition languages for better results.
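Even with "Numbers" among the recognition languages, OCR output for tables may still contain letters where digits belong. A small post-processing pass can repair the most common cases in cells that are otherwise numeric. This is a sketch of a generic clean-up step, not a FineReader feature; the confusion table (O/o → 0, l/I → 1) is an assumption about typical errors, and a cell is only changed if it already contains at least one real digit, so that short words such as "lo" are left alone.

```python
import re

# Assumed letter-for-digit confusions; extend after checking your own OCR output.
CONFUSABLES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})

# A cell made only of digits or confusable letters, with optional "." or ","
# separators, an optional leading minus sign and an optional trailing percent sign.
NUMERIC_CELL_RE = re.compile(r"-?[\dOolI]+(?:[.,][\dOolI]+)*%?")


def fix_numeric_cell(cell: str) -> str:
    """Replace confusable letters with digits in cells that look numeric."""
    stripped = cell.strip()
    if any(ch.isdigit() for ch in stripped) and NUMERIC_CELL_RE.fullmatch(stripped):
        return stripped.translate(CONFUSABLES)
    return cell  # text cells, and ambiguous ones without any digit, stay untouched


def fix_table(rows: list[list[str]]) -> list[list[str]]:
    """Apply fix_numeric_cell to every cell of an OCR'd table."""
    return [[fix_numeric_cell(cell) for cell in row] for row in rows]


if __name__ == "__main__":
    table = [["Región", "2OO7", "2008"], ["Norte", "1O4,5", "l.2OO"], ["Total", "lo", "9%"]]
    print(fix_table(table))
```

Cells without a single real digit are deliberately skipped, so a cell like "IO" still needs a human check against the PDF.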


Sangeeta Joshi  Identity Verified
India
Local time: 04:25
Member (2009)
German to English
Try Select Table and Copy Table Option Apr 6, 2009

Unless the PDF file is a scanned document, I usually use the Select Table and Copy Table options in Acrobat Reader itself when I have to work on tables with lots of numerical figures.


Heinrich Pesch  Identity Verified
Finland
Local time: 01:55
Member (2003)
Finnish to German
Communicate with the customer Apr 6, 2009

Tell them that if they send you the original file (PDF is never an original file format), they can save a lot of money, as you will only have to translate the text and can leave the numbers alone.

If they are not reasonable: FineReader is a possibility, but the formatting will change and may look strange. But if the content of the tables must be translated, there is no other way.

If you do not need to change anything in the tables, you might use snapshot software to take a picture of each table and insert it into a Word file.

Regards
Heinrich



ViktoriaG  Identity Verified
Canada
Local time: 18:55
English to French
Use AutoUnbreak Apr 6, 2009

AutoUnbreak lets you copy content from within a PDF and retain the formatting, too. AutoUnbreak copies tables, no problem. The output is an RTF file.

http://digital.hollmen.dk/products/autounbreak/index.htm

Make sure you read the instructions on the website, though, to ensure you don't run into any trouble.



Mikhail Popov
Singapore
Local time: 06:55
Member (2015)
English to Russian
Solid Converter PDF Apr 6, 2009

Solid Converter is a nice program if your PDF document was distilled, so that it contains actual text characters and numbers rather than pictures.
If your document is just a package of scanned papers, use ABBYY FineReader; it's the best program in that case.


Marcus Geibel
Germany
Local time: 00:55
English to German
Copy and paste Apr 6, 2009

Do you need to edit these numbers?
If not, you can use an Acrobat Reader tool to copy them and paste them as pictures into your Word file. Here's how:

(I only have the German version, so I do not know whether the menu items are named exactly as I translate them.)

Go to the "Tools" (German: Werkzeuge) menu, choose "Select and zoom" (Auswählen und zoomen; it should be the first option in the pull-down menu) and then "Snapshot tool" (Schnappschuss-Werkzeug; it should be the bottom item).

Then go to the text you want to copy and draw a selection frame around it by placing the selection tool in one corner and, with the mouse button pressed, moving the cursor over the entire text. Release the mouse button when all the text to be copied is within the frame; it will then be copied to the clipboard.

From there, you can simply paste it into your Word document, where you can edit it like any other graphic (right-click and choose from the context menu).

Hope this helps.



trebla
Canada
Local time: 18:55
French to English
PDF Problems Apr 7, 2009

Whoever invented PDF should be taken out and shot!

I spend more of my valuable time massaging these &^%$# PDF files than I can count - all of it unpaid for, of course.

When a file is a real mess (i.e. it won't convert to Word without a struggle), I either use PDF Converter from Nuance or print the pages out, mask what I don't want, and scan them in with OmniPage. Then I select everything and press Ctrl+Shift+N (Word's shortcut for the Normal style) to get rid of all the formatting.

Finally, I go through and reconstruct the original in Word, putting the original formatting back in.



ViktoriaG  Identity Verified
Canada
Local time: 18:55
English to French
Questions to trebla Apr 7, 2009

I am wondering about two things:

1. Why don't you charge for the extra time required to massage those PDFs?
2. Why don't you simply process your original PDF file with OmniPage? OmniPage is really efficient and can produce great results, although you really have to invest some time in learning to use it properly. You seem to be going to a lot of unnecessary trouble to create your editable versions...

[Edited at 2009-04-07 15:36 GMT]


FarkasAndras
Local time: 00:55
English to Hungarian
burn at the stake Apr 7, 2009

trebla wrote:

Whoever invented PDF should be taken out and shot!

I spend more of my valuable time massaging these &^%$# PDF files than I can count - all of it unpaid for, of course.

When a file is a real mess, (i.e. it won't convert to Word without a struggle), I either use PDF Converter from Nuance or print the pages out, mask what I don't want, and scan them in with OmniPage. Then I block everything on, and press CTRL/SHIFT N to get rid of all the formatting.

Finally, I go through and reconstruct the original in word, putting the original formatting back in.


Fully agree with the sentiment. PDF is my worst enemy.
If I can't copy and paste (scanned doc), I usually just resign myself to working from the PDF itself. I'm not really a fan of OCR in general, although that's starting to change.



David Jessop  Identity Verified
Spain
Member
Spanish to English
TOPIC STARTER
Thanks! Apr 16, 2009

Thank you for everyone's tips! I ended up using ABBYY FineReader. This did the trick for some of the tables, but others came out really badly when transferred to Word, and I had to spend a lot of time recreating them.

Best,
David

