pdf-word conversion
Thread poster: eva75

eva75
English
+ ...
May 7, 2005

I need to translate a PDF document in Trados, so I need to convert it into Word. Has anyone had hands-on experience with a PDF-to-Word conversion programme? The PDFs I need to convert are heavily laden with images and text boxes. It would be useful if I could remove the latter during conversion.

I have looked at the pdf file in the "How to" section, but unfortunately the answer to my question cannot be found there!

Any help on this matter would be greatly appreciated.




[Edited at 2005-05-07 10:33]


 
remyrosf
Local time: 18:51
English to French
Solid PDF ! May 7, 2005

http://www.solidpdf.com/


 
xxxTekkie
English to German
+ ...
The best solution is OCR May 7, 2005

remyrosf wrote:

http://www.solidpdf.com/

The problem with PDF-to-text converters like Solid Converter PDF (which I have purchased) is that they extract the PostScript code from PDF files. If the original file from which the PDF file was generated has been formatted miserably (which is most often the case), the resulting text (e.g., Word) file will have the same lousy formatting.

The best solution is to use an OCR program capable of reading PDF files. I have used OmniPage 14 and ABBYY FineReader Professional Edition 7 to convert PDF files. The best results have been achieved with ABBYY FineReader.

OCR programs will not place a hard return at the end of every line, and are also capable of recognizing and removing hyphenations.

Get ABBYY FineReader Professional Edition 7, and you will be able to minimize problems when converting PDF files to text.

 

Nick Lingris  Identity Verified
United Kingdom
Local time: 17:51
Member (2006)
English to Greek
+ ...
Quick Conversion May 7, 2005

Try this before you try any commercial products (most of them quite a headache).

In Adobe Reader (version 7, preferably), go to Edit and Copy File to Clipboard.

In a new Word document, do Paste or Paste Special (RTF).

Remove images using Find and Replace, with ^g (for Graphic) in the Find what field.

Depending on the complexity of your PDF, this might work for you.


 
xxxTekkie
English to German
+ ...
Adobe Reader is completely useless for this purpose May 7, 2005

Santo Subito wrote:

In Adobe Reader (version 7, preferably), go to Edit and Copy File to Clipboard.

In a new Word document, do Paste or Paste Special (RTF).

Remove images using Find and Replace, with ^g (for Graphic) in the Find what field.

Depending on the complexity of your PDF, this might work for you.

This doesn't work at all, because you'll have hard returns at the end of each line that then have to be removed manually, which can be extremely time-consuming for larger documents.

Furthermore, the sequence of paragraphs becomes garbled.

The only useful solution is to use OCR software.
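
For anyone stuck with the hard returns described above, the cleanup can be scripted rather than done by hand. What follows is only a minimal Python sketch, assuming the pasted text has been saved as plain text and that real paragraphs are separated by a blank line; the function name `unwrap_text` is made up for illustration and is not part of any of the tools mentioned in this thread.

```python
import re


def unwrap_text(raw: str) -> str:
    """Join hard-wrapped lines into paragraphs and undo end-of-line hyphenation.

    Assumption: paragraphs are separated by at least one blank line.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    cleaned = []
    for para in re.split(r"\n\s*\n", text):
        # Rejoin words split across lines: "conver-\nsion" -> "conversion".
        # Caveat: a genuine hyphenated compound broken at the line end
        # is merged as well, so the result needs a spot check.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Turn every remaining single line break into one space.
        para = re.sub(r"\s*\n\s*", " ", para).strip()
        if para:
            cleaned.append(para)

    # One blank line between paragraphs, no hard returns inside them.
    return "\n\n".join(cleaned)
```

The result can be pasted back into Word, where each remaining blank line marks a real paragraph break. It does nothing about garbled paragraph order, so that still has to be checked against the PDF.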


 
Suzanne Blangsted  Identity Verified
Local time: 09:51
Danish to English
+ ...
PDF conversion May 7, 2005

I use PDF converter and it works great. After I store the PDF file in "my documents", all I have to do when I have MS Word open is to click on the PDF and the converter automatically converts it and opens it in the Word program.

I can also open a PDF in Omni-Pro (OCR program). This program installs in MS Word's task bar under "file" and will open when clicked, then ask for the program I want to open, which I then locate and it opens it up in Word.

I like both programs, but I prefer PDF converter, which I downloaded from Adobe. (www.adobe.com)


 

Can Altinbay  Identity Verified
Local time: 12:51
Japanese to English
+ ...
I did this on a file with no graphics May 7, 2005

Santo Subito wrote:

Try this before you try any commercial products (most of them quite a headache).

In Adobe Reader (version 7, preferably), go to Edit and Copy File to Clipboard.

In a new Word document, do Paste or Paste Special (RTF).

Remove images using Find and Replace, with ^g (for Graphic) in the Find what field.

Depending on the complexity of your PDF, this might work for you.


Just Paste worked fine. The only problem was that the footnotes appeared as regular text at the end of the text for the original page from the PDF. I reinserted them as footnotes.


 

Nick Lingris  Identity Verified
United Kingdom
Local time: 17:51
Member (2006)
English to Greek
+ ...
PDF converter? May 7, 2005

BLANGSTED wrote:

I use PDF converter and it works great. After I store the PDF file in "my documents", all I have to do when I have MS Word open is to click on the PDF and the converter automatically converts it and opens it in the Word program.

I can also open a PDF in Omni-Pro (OCR program). This program installs in MS Word's task bar under "file" and will open when clicked, then ask for the program I want to open, which I then locate and it opens it up in Word.

I like both programs, but I prefer PDF converter, which I downloaded from Adobe. (www.adobe.com)


I'm not familiar with a PDF converter from Adobe (apart from their web utility). I use Adobe Acrobat (not a free program), which does allow you to save a PDF as a Word document, but I have found the above-described Copy/Paste method to be much faster and more reliable than Acrobat, at least with simple documents.

There's a shareware program called PDF converter which will convert a PDF to txt, but it is nothing to write home about. Could our dear colleague BLANGSTED provide more information about the provenance of her program?


 

Natalie  Identity Verified
Poland
Local time: 18:51
Member (2002)
English to Russian
+ ...

MODERATOR
Hi May 7, 2005

PDF files do NOT contain anything like "textboxes". They contain only text and graphics. There is no need to delete anything from the PDF before converting to Word.

The best conversion results can be obtained with FineReader 7 (an OCR program). Please see the HOWTO for more information. The recognition options in FineReader are very flexible, so you can fine-tune them exactly to your needs (for example, not pasting graphics into the recognized text, or whatever).


 
xxxLia Fail  Identity Verified
Spain
Local time: 18:51
Spanish to English
+ ...
Solid PDF Converter May 7, 2005

remyrosf wrote:

http://www.solidpdf.com/


I have used this and it's great, BUT... it leaves 'markings' (I don't know what else to call them) in the Word document. An example:

I converted a couple of PDFs I had received using this software. I found things underlined that hadn't been underlined in the original, also oddly (slightly) formatted tables, and lines in various parts of the text. I couldn't remove the lines or correct these problems, try as I might.

So now what I do is convert, then save as TXT, copy back into Word, then reformat in Word. The work is proportional to the complexity of the formatting in the original, and simply isn't worth it sometimes. But it does mean being able to use a translation memory, and sometimes clients will accept that they won't get a text back exactly as they delivered it, i.e. mimicking the PDF exactly.


 

Marina Khonina  Identity Verified
Turkey
Local time: 19:51
Russian to English
+ ...
PDF conversion May 8, 2005

I have a similar problem with a document I just received from a client. It's a nearly 300-page manual in PDF. The client told me not to worry about the formatting, but the problem is that parts of the text are embedded as graphics (maybe that's what eva75 means by text boxes).

Saving the text as a Microsoft Word document from Acrobat didn't help. I agree that ABBYY FineReader would probably be the best option. I used to have it installed on my computer, but it somehow got damaged after I reinstalled Windows XP, and I have the FineReader installation CD in my other home in another country.

I saw a new product of ABBYY called PDF Transformer ( http://www.abbyy.com/pdftransformer/ ) in stores recently, so I might just go ahead and buy it. If this happens, I'll share my experience of using it to convert PDF files to MS Word documents.


 

Rina LS  Identity Verified
Serbia
Local time: 18:51
English to Serbian
+ ...
Maybe this could be helpful May 8, 2005

Well, I have a PDFtoWord converter and it works just fine. Anyway, it depends on the volume of your document. If you are interested, I could e-mail it to you. However, you would have to send me your e-mail address.
Regards,
Katarina


 
Suzanne Blangsted  Identity Verified
Local time: 09:51
Danish to English
+ ...
Correction to PDF converter May 9, 2005

I had previously mentioned that I used the PDF converter from Adobe, and mentioned www.adobe.com. The PDF Converter is NOT from Adobe but from SCANSOFT. I downloaded it a long time ago and forgot where I got it from. When I read through these comments, I realized my error. It is a great program though, even if it is not from Adobe. See www.scansoft.com - PDF converter/PDF creator is listed for about US$50.00

Scansoft writes:
PDF Converter is the world's #1 selling solution for instantly turning PDF files into Microsoft Word documents and forms that you can easily edit - complete with text, columns, tables, and graphics.* PDF Converter 2 contains powerful new features that allow you to quickly convert proposals, contracts, letters and more into Microsoft Word documents for editing, saving valuable time and money. PDF Converter can even be used to extract charts and graphs from PDF files so they can be reused in Microsoft PowerPoint® and other applications with cut-and-paste ease!

And there's even more! PDF Converter lets you turn static PDF forms into editable Microsoft Word forms with a single mouse click. Release information trapped in PDF files and eliminate the time spent re-keying and laying out documents. PDF Converter lets everyone in your organization access, edit and share information more productively and efficiently than ever before!


 

eva75
English
+ ...
TOPIC STARTER
Thank you! May 18, 2005

Thanks to all of you for your help. It seems there is no perfect solution, but ABBYY and SolidPdf have proved the most efficient.

 

Piotr Sawiec  Identity Verified
Local time: 18:51
English to Polish
+ ...
security May 19, 2005

I have been using FineReader to convert PDFs to Word files and it worked fine until I came across a well-secured document. As you are aware, security options in PDF files are quite extensive and can be modified in a variety of ways. A file can even be secured against OCR; in such a case you can neither copy/paste any part of the document nor OCR it. Has anybody learned how to deal with this (of course you can print the document, scan it and OCR the scan, but that seems rather cumbersome in the long term)?

Piotr


 