pdf to doc conversion of document with large page size
Thread poster: Peter Nicholson

Peter Nicholson  Identity Verified
Poland
Local time: 19:45
Polish to English
May 2, 2011

I have a pdf file with a page size of 850 x 1050 mm. I want to convert it to doc format for translation. ABBYY cannot handle such a large page size. Studio 2009 opens it, but adds thousands of tags, splitting many of the words into individual letters. I can export the source file from Studio in doc format, and this version is workable, but the formatting is far from perfect. Does anyone have any suggestions as to which programme might convert my pdf to doc more successfully?

TIA


 

José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 14:45
English to Portuguese
In memoriam
Check if InFix will work for you May 2, 2011

Have a look at http://www.iceni.com/infix-Translate.htm
I'm not sure whether you can download and use a demo of the Pro version; if not, the standard one should still let you edit the PDF.

If Studio (I have never used or even seen it!) generates so many tags, it may be because they exist. If the document is written like this, chances are that the tags have a reason to be there. If this is the case, InFix will generate just as many tags, and it will be up to you to follow that madness.

Good luck!


 

Peter Nicholson  Identity Verified
Poland
Local time: 19:45
Polish to English
TOPIC STARTER
Thank you May 3, 2011

Thank you José. I am looking into this. I will post the outcome later.

 

Dragomir Kovacevic  Identity Verified
Italy
Local time: 19:45
Italian to Serbian
pdf to image file format by means of PDFCreator, and then May 3, 2011

You might try PDFCreator. It is an open-source tool that installs as a virtual printer and is useful for several kinds of task.

One of them is opening an existing PDF file and printing (saving) it as a TIFF image.

Then you run OCR on the image.
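
To gauge what the TIFF route involves for a page this large, here is a rough sketch of the pixel dimensions the image would have. The 300 dpi resolution is an assumption (a common choice for OCR, not a figure from this thread), and the function name is purely illustrative.

```python
MM_PER_INCH = 25.4

def raster_size_px(width_mm, height_mm, dpi):
    """Return (width, height) in pixels for a page rendered at the given dpi."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))

# The 850 x 1050 mm page from the original question, at an assumed 300 dpi.
width_px, height_px = raster_size_px(850, 1050, 300)
print(width_px, height_px)                            # 10039 12402
print(round(width_px * height_px / 1e6), "megapixels")
```

An image of that size is well over 100 megapixels, so it is worth checking that the OCR program accepts it before committing to this route.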


Dragomir

Peter Nicholson wrote:

I have a pdf file with a page size of 850 x 1050 mm. I want to convert it to doc format for translation. ABBYY cannot handle such a large page size. Studio 2009 opens it, but adds thousands of tags, splitting many of the words into individual letters. I can export the source file from Studio in doc format, and this version is workable, but the formatting is far from perfect. Does anyone have any suggestions as to which programme might convert my pdf to doc more successfully?

TIA


 

Peter Nicholson  Identity Verified
Poland
Local time: 19:45
Polish to English
TOPIC STARTER
PDFCreator didn't work May 3, 2011

Thank you, Dragomir, for the idea. We tried PDFCreator, but it didn't install properly (not on my main computer, of course). Besides, I am not sure that converting my pdf to TIFF would help, because it would still require further OCR/editing. FineReader 10 opens and OCRs my pdf very efficiently, but it can only save it in doc format at sizes up to 558.8 x 558.8 mm. So it saves all the text boxes piled up on top of each other and spread out over more than one page. I want to make an editable document (for translation in Studio) with the same page size and layout as the original, and I want to be able to save the final version (the translation) as a pdf. I think I am right in saying that Studio will not export a TIFF to doc or pdf.


 

Dragomir Kovacevic  Identity Verified
Italy
Local time: 19:45
Italian to Serbian
resize image to an editable format May 3, 2011

Resize it with a good image editor; this doesn't mean degrading the image, but resizing it without losing quality.
Once you've done that, OCR in ABBYY, at the maximum page size it allows, will do its job just as well on an image as on a PDF.
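
As a rough illustration of "resizing without losing quality": the physical page size is scaled down to fit the limit while the pixel count stays the same, so the resolution goes up by the same factor. The sketch below assumes the 558.8 mm (22 inch) limit quoted earlier in the thread and a starting resolution of 300 dpi; the function name and the 300 dpi figure are illustrative assumptions only.

```python
MAX_SIDE_MM = 558.8  # 22 inches, the doc page-size limit quoted in the thread

def fit_to_limit(width_mm, height_mm, dpi, max_side_mm=MAX_SIDE_MM):
    """Shrink the physical page to fit the limit, keeping every pixel.

    Returns (new_width_mm, new_height_mm, new_dpi). The dpi rises by the
    same factor the page shrinks, so no image data is thrown away.
    """
    scale = min(1.0, max_side_mm / width_mm, max_side_mm / height_mm)
    return width_mm * scale, height_mm * scale, dpi / scale

new_w, new_h, new_dpi = fit_to_limit(850, 1050, 300)
print(f"{new_w:.1f} x {new_h:.1f} mm at {new_dpi:.0f} dpi")
```

For the 850 x 1050 mm page the scale factor is 558.8 / 1050, roughly 0.53, so the finished document would have to be enlarged again by about 1.88 to get back to the original dimensions.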

Peter Nicholson wrote:

Thank you, Dragomir, for the idea. We tried PDFCreator, but it didn't install properly (not on my main computer, of course). Besides, I am not sure that converting my pdf to TIFF would help, because it would still require further OCR/editing. FineReader 10 opens and OCRs my pdf very efficiently, but it can only save it in doc format at sizes up to 558.8 x 558.8 mm. So it saves all the text boxes piled up on top of each other and spread out over more than one page. I want to make an editable document (for translation in Studio) with the same page size and layout as the original, and I want to be able to save the final version (the translation) as a pdf. I think I am right in saying that Studio will not export a TIFF to doc or pdf.


 

Peter Nicholson  Identity Verified
Poland
Local time: 19:45
Polish to English
TOPIC STARTER
I need an exact copy May 3, 2011

I do not want to reduce the page size. I need an exact copy, in editable form, so that the final version (the translation) will be an exact reproduction, including the page dimensions.

 

Germaine  Identity Verified
Canada
Local time: 13:45
English to French
What is it exactly? May 3, 2011

Peter Nicholson wrote:
I do not want to reduce the page size. I need an exact copy, in editable form, so that the final version (the translation) will be an exact reproduction, including the page dimensions.


If it is a regular pdf, I would suggest that you do CTRL-A, CTRL-C, open Word, CTRL-V. Do the translation, then format (Page layout, paper size, margin...).

If it is a scan, and you have the full version of Adobe Acrobat, right-click, choose the language option and do the OCR, then copy to Word, etc.


 

Peter Nicholson  Identity Verified
Poland
Local time: 19:45
Polish to English
TOPIC STARTER
Far too simple May 3, 2011

Thank you, Germaine, for your suggestion, but it is too simple by half. Word cannot create a document with such large page dimensions. It is a regular pdf, but Ctrl+A does not select the photographs or other graphic elements.

[Edited at 2011-05-03 14:08 GMT]

[Edited at 2011-05-03 14:27 GMT]


 

Antonín Otáhal
Local time: 19:45
Member (2005)
English to Czech
two basic options May 3, 2011

In my opinion, you have two options, which I list below in the order I would prefer them if I were you:

1. Ask your customer for the source file (InDesign or whatever it was) and translate it.

2. "Emulate" this process by converting to an xml format using Infix, translate the xml, re-generate the pdf.

For the sake of completeness, you can probably work "directly" in any pdf-editing software, but I would not go that way.

Antonin


 

Peter Nicholson  Identity Verified
Poland
Local time: 19:45
Polish to English
TOPIC STARTER
Customer did not create this pdf May 3, 2011

Thanks Antonín. I suspect that the customer did not create the pdf and that it would not be diplomatic to ask. I have tried saving in several formats from InFix, but with no success. Studio has great difficulty with the document no matter what format I use, and besides, as I said before, Studio cannot export to doc/pdf from all formats.

 

Antonín Otáhal
Local time: 19:45
Member (2005)
English to Czech
infix May 3, 2011

Have you tried

Document - Translate - Export XML

option?

Antonin


 

José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 14:45
English to Portuguese
In memoriam
InFix vs. InFix Pro May 3, 2011

Antonín Otáhal wrote:

Have you tried
Document - Translate - Export XML
option?

Antonin



This is the key difference between InFix and InFix Pro.
I wonder if they release the Pro version as a free demo.


 

Peter Nicholson  Identity Verified
Poland
Local time: 19:45
Polish to English
TOPIC STARTER
I have now May 3, 2011

I only downloaded the demo version of InFix today, and didn't know about the Document - Translate - Export XML option. I have just tried it. However, the XML I created suffers from the same basic problems with recognition of words - a great many of the words have been cut in half or otherwise divided, and in many cases split into their individual component letters. It is not practicable to edit/translate it in this form. FineReader would be ideal for this task if only it could handle the large page dimensions.
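
In principle, splitting of this kind could be repaired in a preprocessing step by rejoining pieces that sit right next to each other on the page. The following is only a minimal sketch under assumed conditions: it supposes each fragment is available with its horizontal start and end positions, and the tuple layout, the function name and the gap threshold are all hypothetical, not something InFix or Studio is known to provide.

```python
def merge_fragments(fragments, max_gap=1.0):
    """Join text fragments on one line whose horizontal gap is at most max_gap.

    fragments: list of (x_start, x_end, text) tuples (hypothetical layout).
    Returns a list of merged strings, one per run of touching fragments.
    """
    merged = []
    prev_end = None
    for x_start, x_end, text in sorted(fragments):
        if prev_end is not None and x_start - prev_end <= max_gap:
            merged[-1] += text    # close enough: part of the same word
        else:
            merged.append(text)   # a real gap: start a new word
        prev_end = x_end
    return merged

# Two words that arrived as eight single-letter fragments.
letters = [(0, 5, "p"), (5, 10, "a"), (10, 15, "g"), (15, 20, "e"),
           (24, 29, "s"), (29, 34, "i"), (34, 39, "z"), (39, 44, "e")]
print(merge_fragments(letters))   # ['page', 'size']
```

A sensible threshold would depend on the font size and letter spacing of the actual document, so it would need tuning case by case.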

I have a demo of InFix Pro.

I have almost finished the translation, using the doc which I made using Studio (import pdf, save source as doc), but I would like to be better prepared for the next time.


 

Antonín Otáhal
Local time: 19:45
Member (2005)
English to Czech
not sure May 3, 2011

I dimly recall that I was able to test the process before buying the full version, but there may have been a volume restriction...

Anyway, 160 USD or so is not too expensive as it saves you a lot of headaches.

Antonin


 