Exporting Tables from PDF files
Thread poster: bharg

bharg  Identity Verified
Local time: 19:13
French to English
Nov 1, 2003

Hi all,

One of my clients has given me a bilingual glossary of about 400 pages as a PDF file. The terms are arranged in a table with two distinct columns. I was wondering whether I could convert this into a two-column Excel worksheet, which I could then import into MultiTerm. I tried saving the PDF file as RTF, but that doesn't retain the table format: all the terms are just listed one after the other. The PDF contains editable text, not a scanned image, so I guess there must be some way to extract the table. All help will be appreciated.


Natalie  Identity Verified
Local time: 14:43
Member (2002)
English to Russian

Try using good OCR software Nov 1, 2003

For example, FineReader Pro version 6 or higher. If your file is large, first divide it into smaller parts using the full version of Acrobat; otherwise, opening the file in FineReader will take ages.

After opening the file, recognize the text as usual and then choose "Send to Word". 99% of the formatting will be preserved.


Harry Bornemann  Identity Verified
English to German
Write a macro Nov 1, 2003

I would write a macro in Word VBA or Perl.
First you could insert a marker such as # after every second end-of-paragraph mark, and then search and replace until you have a tab-separated table.

400 pages might be too much for FineReader, and even too much for Word. That's where Perl becomes interesting: it would do the job within a few seconds.
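The pairing idea above can be sketched in a few lines of script. This is only an illustration, written in Python rather than the Word VBA or Perl mentioned in the post, and it assumes the exported text holds one term per line, strictly alternating source term and target term. The function names `pair_lines` and `to_tsv` and the sample terms are made up for the example.

```python
import csv
import io


def pair_lines(lines):
    """Pair alternating non-empty lines into (source, target) rows.

    Assumption: the export lists terms one per line in the order
    source, target, source, target, ...
    """
    terms = [line.strip() for line in lines if line.strip()]
    if len(terms) % 2:
        # An odd count means a term is missing or was split across lines.
        raise ValueError("odd number of terms; the two columns are out of step")
    return list(zip(terms[0::2], terms[1::2]))


def to_tsv(rows):
    """Render rows as tab-separated text, which Excel can open as two columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample standing in for the text exported from the PDF.
sample = ["maison", "house", "", "chat", "cat"]
rows = pair_lines(sample)
print(to_tsv(rows), end="")
```

If a glossary entry wraps onto a second line in the export, the strict alternation breaks, so it is worth spot-checking a few rows in the result before importing it.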

[Edited at 2003-11-01 12:04]


Mónica Machado
United Kingdom
Local time: 13:43
English to Portuguese
FineReader 7 could be useful Nov 1, 2003


FineReader 7 could be useful. You can download a 15-day trial version (search for ABBYY). If 400 pages is too much for it, split the document in two. FineReader 7 works fine with 270 pages (I have never tried more than that per document).
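For working out where to split, here is a rough sketch of the arithmetic. It only computes page ranges; the splitting itself would still be done in Acrobat or a similar tool. The function name `page_chunks` is hypothetical, and the 270-page limit is simply the figure quoted above, not a documented FineReader limit.

```python
def page_chunks(total_pages, max_pages):
    """Split pages 1..total_pages into the fewest near-equal ranges,
    each holding at most max_pages pages."""
    if total_pages < 1 or max_pages < 1:
        raise ValueError("page counts must be positive")
    n_chunks = -(-total_pages // max_pages)  # ceiling division
    base, extra = divmod(total_pages, n_chunks)
    chunks = []
    start = 1
    for i in range(n_chunks):
        # The first `extra` chunks take one additional page each.
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size - 1))
        start += size
    return chunks


print(page_chunks(400, 270))  # [(1, 200), (201, 400)]
```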

Hope this helps


