https://www.proz.com/forum/translation_theory_and_practice/50616-no_usable_doc_after_pdf_conversion.html

No usable .doc after .pdf conversion
Thread poster: Evelyne Morel
Evelyne Morel  Identity Verified
France
Local time: 13:07
English to French
Jul 3, 2006

Hi guys,

Could someone help me with this? I received a PDF document to be translated and returned as a PDF. I have already managed to convert PDFs into Word docs successfully, but for this one I cannot obtain anything workable to translate. I tried:
- using Microsoft Office Document Imaging (scanning and then exporting to .doc)
- downloading the trial version of a free conversion program from verypdf.com

In both cases I get something far from the original documents, which are diplomas.
Does that mean I would need a more powerful tool like QuarkXPress?

Thanks for your answers.

[Edited at 2006-07-03 20:06]


 
Vito Smolej
Germany
Local time: 13:07
Member (2004)
English to Slovenian
+ ...
SITE LOCALIZER
there's pdfs and pdfs... Jul 3, 2006

QuarkXPress would do no good...

I'm surprised that Microsoft's imaging tool did not produce anything useful. I have NitroPDF, which is a rather good Adobe clone with DOC export. Why don't you send the file over and I'll see what I can squeeze out of the document.

Regards

Vito


 
Evelyne Morel  Identity Verified
France
Local time: 13:07
English to French
TOPIC STARTER
Jul 3, 2006



[Edited at 2006-07-03 19:57]


 
Kevin Fulton  Identity Verified
United States
Local time: 07:07
German to English
Image files hard to convert Jul 3, 2006

Generally, the only files that are immediately usable after exporting from PDF are those created from a word-processing program. Your diplomas were read into PDF format as images.

Possibly an OCR program like OmniPage or FineReader may help, but the amount of effort required for a diploma would exceed what you would have to do to translate the document without such aids.

QuarkXPress is a desktop publishing program and would be of no use to you in converting to a usable word-processing document.
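
To illustrate the distinction above between image-based PDFs and PDFs created from a word processor, here is a rough Python sketch. It is only a crude heuristic and not how any of the converters mentioned in this thread work: it assumes that a scanned PDF mentions image objects but no fonts in its raw bytes. The function name is made up for illustration. It can guess wrong when the PDF stores its object dictionaries in compressed object streams, or when a scan already carries a hidden OCR text layer.

```python
import re


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Crude guess: True when the raw PDF mentions image objects but no fonts.

    Assumption: a page scanned into a PDF contains image XObjects and no
    font resources, while a PDF exported from a word processor names fonts.
    """
    # "/Font" followed by a word boundary, so "/FontDescriptor" alone does not count.
    has_font = re.search(rb"/Font\b", pdf_bytes) is not None
    # Image XObjects are marked with "/Subtype /Image" (whitespace is optional).
    has_image = re.search(rb"/Subtype\s*/Image\b", pdf_bytes) is not None
    return has_image and not has_font
```

With a hypothetical file name, `looks_image_only(open("diploma.pdf", "rb").read())` returning True would suggest that only OCR or retyping will get the text out.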


 
Vito Smolej
Germany
Local time: 13:07
Member (2004)
English to Slovenian
+ ...
SITE LOCALIZER
well, as I said - "there's pdfs and there's pdfs" Jul 3, 2006

The file you sent me - thank you - consists of two pictures (... of text etc., sent by fax ...). I have had even worse cases of "this kind of PDF" - hand-annotated faxes of copies etc.

The PDF format can handle a lot - pictures, tags, graphics. And text. But if the text is in pictures, there's nothing one can do except what you have already done, namely try OCR.

So my NitroPDF faithfully produced a doc file with two pictures.

Sorry, but you will have to cut and paste and write.


 
Anna Fitzgerald  Identity Verified
France
Local time: 13:07
Member
French to English
A good question Jul 3, 2006

Your question is a good one - never mind the errors, we all make them.

I recently had the same problem and ended up buying software called Able2Extract (I tried verypdf but it created a bunch of text boxes). The formatting was strange and I had to spend a lot of time fixing it up, so I'll have to look into NitroPDF and MS imaging, as well as any other suggested solutions.


 
Fernando Toledo  Identity Verified
Spain
Local time: 13:07
German to Spanish
You can not extract Jul 3, 2006

Anna Fitzgerald wrote:

Your question is a good one - never mind the errors, we all make them.

I recently had the same problem and ended up buying software called Able2Extract (I tried verypdf but it created a bunch of text boxes). The formatting was strange and I had to spend a lot of time fixing it up, so I'll have to look into NitroPDF and MS imaging, as well as any other suggested solutions.


text from an image with PDF converters.

You need an OCR application.

Use the trial of ABBYY FineReader (the best):
http://download.abbyy.com/content/default.aspx

you have 30 days


if the image is OK you'll do it in a few seconds




Regards

P.S.: As for converters (I mean from tagged docs in PDF format to Word docs), my favorite is:

http://www.solidpdf.com/





[Edited at 2006-07-03 20:01]


 
Evelyne Morel  Identity Verified
France
Local time: 13:07
English to French
TOPIC STARTER
OUOUOUOUPPPPS Jul 3, 2006

Have to apologize for the MISTAKES (you could not have missed).....

Well until now "convertion" has always been written "conversion" (at least in English)
Hi guys ! would be more appropriate I suppose.

2 mistakes (maybe more????) in a few words.... Too much translating lately I guess...:)


 
Vito Smolej
Germany
Local time: 13:07
Member (2004)
English to Slovenian
+ ...
SITE LOCALIZER
hint - Microsoft office imaging does OCR Jul 3, 2006

The same old story - wherever there's a piece of the action, Microsoft will move in on it.

It does a respectable job. I prefer ABBYY FineReader though.


 
Giles Watson  Identity Verified
Italy
Local time: 13:07
Italian to English
In memoriam
I wish I could write French as well as you write English ; -) Jul 3, 2006

Evelyne Morel wrote:

Have to apologize for the MISTAKES (you could not have missed).....

Well until now "convertion" has always been written "conversion" (at least in English)
Hi guys ! would be more appropriate I suppose.

2 mistakes (maybe more????) in a few words.... Too much translating lately I guess...:)



Dear Evelyne,

Why apologise when everyone understands you?

You make no claim to translate professionally into English so who cares if your spelling is not quite perfect (it's still excellent, though)? If you have found out something about PDFs from our wonderful colleagues, then everyone's happy.

There's a difference between a "communication language" (English in your case) and a "language of culture" (French for you), in other words the "music with which we charm the serpents guarding another's treasure" (Ambrose Bierce, The Devil's Dictionary).

Bonne chance,

Giles


 
Evelyne Morel  Identity Verified
France
Local time: 13:07
English to French
TOPIC STARTER
The end of the story.... Jul 3, 2006

First of all, thanks for the tips, help and support.

After trying different ways of converting my .pdf document, downloading some of the software that was suggested, and realizing, as you said, that there would be no way of properly converting the document into a .doc, I finally asked my contact at the agency if we could find another, quicker solution.
I got an answer along these lines: "Thank you for your help... it's a scanned image and even Adobe Acrobat Professional is having problems recognizing the text, so I will send you the doc in .rtf format".

So in the end, it was really worth the trouble of striving to find a solution (thanks to the precious help of my ProZ colleagues) - I now have plenty of PDF converter programs and a good first contact with my agency.

Welcome to the tricky PDF world!






[Edited at 2006-07-03 22:14]

[Edited at 2006-07-03 22:47]

[Edited at 2006-07-03 22:49]


 
Viktoria Gimbe  Identity Verified
Canada
Local time: 07:07
English to French
+ ...
Hi Évelyne Jul 5, 2006

Nice to know you've been doing too much translation lately

I can't help you fix your problem, but while we're at it, as this has to do with the original subject, here is a link to a nice little piece of freeware:

http://digital.hollmen.dk/products/autounbreak/index.htm

What this does is the following: when you take a PDF - one that's not a scan of something but rather has text that's recognizable by the computer - once you have pasted it into a Word doc, you turn it into RTF and then use this to delete all the unnecessary carriage returns. It can take about 65,000 characters at a time. It basically makes any PDF-to-RTF document instantly editable using a CAT tool - and it preserves all other formatting!

You may want to give it a try... It has saved me lots of time and many many headaches already. Can't live without it anymore...

Good luck!
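
For readers who want to see the idea behind removing unnecessary carriage returns, here is a minimal Python sketch. It is not the AutoUnbreak tool linked above and makes no claim about how that tool works: it assumes plain text copied out of a text-based PDF, treats blank lines as the real paragraph breaks, and does not handle RTF or preserve formatting. The function name `unbreak` is made up for illustration.

```python
import re


def unbreak(text: str) -> str:
    """Join hard-wrapped lines into paragraphs; blank lines stay as paragraph breaks."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    joined = [
        " ".join(line.strip() for line in paragraph.split("\n") if line.strip())
        for paragraph in paragraphs
    ]
    # Drop empty paragraphs left over from leading or trailing blank lines.
    return "\n\n".join(paragraph for paragraph in joined if paragraph)
```

Each paragraph then comes out as a single line, which is what a CAT tool needs to segment sentences properly. Words hyphenated across a line break are deliberately left alone, since joining them automatically could also merge genuine hyphens.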


 
Sabine Knorr
Germany
Local time: 13:07
Spanish to German
+ ...
Virus found in "autounbreak" software Jul 9, 2006

Warning!

I just downloaded the freeware program from the site Viktoria had suggested.
When extracting the ZIP file, my antivirus program found and neutralized a virus... I don't think I will install this nice little program.


 
neilmac
Spain
Local time: 13:07
Spanish to English
+ ...
My final solution Jul 13, 2006

I eventually gave up trying; it takes more time and bother than it's worth, IMO. I now charge 50% extra for PDF docs (or anything on paper/fax) and try to convince regular clients to factor in translation when planning ahead, in order to avoid these PDF hassles...

 

