Exporting Tables from PDF files
Thread poster: Bharg Shah

Bharg Shah
India
French to English
Nov 1, 2003

Hi all,

One of my clients has given me a bilingual glossary of about 400 pages as a PDF file. The terms are arranged in a table with 2 distinct columns. I was wondering whether I could convert this into a 2-column Excel worksheet, which I could then import into MultiTerm. I tried saving the PDF file as RTF, but that doesn't retain the table format: all terms are just listed one after the other. The PDF contains editable text, not a scanned image, so I guess there must be some way to extract the table. Any help would be appreciated.


Natalie
Poland
Member (2002)
English to Russian

MODERATOR
Try using good OCR software Nov 1, 2003

For example, FineReader Pro version 6 or higher. If your file is large, first split it into smaller parts using the full version of Acrobat; otherwise opening the file in FineReader will take ages.

Once the file is open, recognize the text as usual and then choose "Send to Word". 99% of the formatting will be preserved.


Harry Bornemann
Mexico
English to German
Write a macro Nov 1, 2003

I would write a macro in Word VBA or a script in Perl.
First, insert a marker such as # after every second paragraph mark, so that the end of each row is marked; then search and replace until you have a tab-separated table.

400 pages might be too much for FineReader, and even for Word. That's where Perl becomes interesting: it would do the job within a few seconds.
HTH,
Harry

[Edited at 2003-11-01 12:04]
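The pairing step Harry describes can be sketched in a few lines of script. The following is a minimal Python sketch, not his macro, assuming the RTF/text export lists each source term and its translation on consecutive lines, as described in the original question; the function name `pair_lines_to_tsv` and the sample terms are hypothetical.

```python
def pair_lines_to_tsv(text: str) -> str:
    """Turn a flat 'term, translation, term, translation, ...' list into TSV rows.

    Assumption: the PDF-to-text export put exactly one term per line,
    alternating source term and target term.
    """
    # Strip surrounding whitespace and drop blank lines left over from the export.
    terms = [line.strip() for line in text.splitlines() if line.strip()]
    if len(terms) % 2 != 0:
        # An odd count usually means a term wrapped onto two lines in the PDF.
        raise ValueError(f"odd number of terms ({len(terms)}); check for wrapped entries")
    # Pair item 0 with item 1, item 2 with item 3, and so on: one row per pair.
    return "".join(f"{source}\t{target}\n" for source, target in zip(terms[0::2], terms[1::2]))


if __name__ == "__main__":
    # Hypothetical sample of the "one term after the other" export.
    sample = "maison\nhouse\n\nvoiture\ncar\n"
    print(pair_lines_to_tsv(sample), end="")
```

Saved as a tab-separated text file, the result can be opened in Excel as a two-column sheet. Pairing lines directly, rather than inserting a # marker first, avoids clashes with any # that might already occur inside a term; the odd-count check flags entries that wrapped across lines in the PDF, which would otherwise shift every following row.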


Mónica Machado
United Kingdom
English to Portuguese
FineReader 7 could be useful Nov 1, 2003

Hello,

FineReader 7 could be useful. You can download a 15-day trial version (search under ABBYY). If 400 pages is too much for it, split the document in two. FineReader 7 works fine with 270 pages (I have never tried more than that in a single document).

Hope this helps

Regards,
Mónica
