Is it advisable to use DTP software to replicate a PDF's layout/look?
Thread poster: Alvaro Pavié

Alvaro Pavié
Chile
Local time: 03:56
English to Spanish
+ ...
Sep 11

Greetings,

On Monday I got a PDF from a client who wasn't able to convert it into a Word file. I'm only able to translate manually, and replicating the layout myself is the only option available, as I have no money to hire someone else. Is it viable to use DTP software to recreate the layout? Is there a good DTP program out there that's also free?

Thanks!

Update: Unable to extract text, translating manually.

[Edited at 2019-09-11 19:38 GMT]


 

Kevin Fulton  Identity Verified
United States
Local time: 02:56
German to English
DTP separate task and charged separately Sep 11

Word has a number of useful features that lend themselves to doing layout work. It's not unreasonable to try to replicate simple formatting such as columns, bullet points, etc. using Word (or another word-processing program) after extracting text from a PDF file. Anything beyond that is considered a separate task and should be charged accordingly – by the hour. Publisher, which comes with some versions of MS Office, might be useful for this. Full-featured DTP programs tend to be expensive and difficult to learn, which is why DTP costs extra when provided as part of a translation job.


 

Dimmo Petrov
Local time: 09:56
English to Bulgarian
Only if the client is paying for this Sep 11

Converting a PDF file to an editable format should be done by the linguist only if it's a paid task. Recreating the exact formatting is time-consuming and annoying, especially in the common case of a scanned PDF file.
You must always ask the client if they can find the original file used for creating the PDF file.
If the client agrees that you perform the conversion task, you must clarify whether they prefer plain text only or full recreation of the formatting.
For me, the best software for handling PDF files is Abbyy FineReader; the second is Adobe Acrobat Pro.


 

Philippe Etienne  Identity Verified
Spain
Local time: 08:56
Member
English to French
I experienced the situation once Sep 11

It was an end client, the translation consumer, who wanted the same layout as the PDF, but of course didn't have the underlying InDesign file. I didn't want to spend any time learning about DTP programs or struggling with the layout for ages in Word.
After informing the client about this, they required me to handle that DTP part too, so I assigned the task to a freelance DTP specialist, transferring the costs to the client.
Somebody who masters DTP is much quicker, has the right tools to get to optimal results, and can even extract the text for optimal use in your preferred CAT tool, then incorporate the translation back into the DTP file.

It's not cheap, but headaches are costlier.

If you're constrained in terms of costs, you may have a look at Infix from Iceni. I've never tried it, but it's supposed to do exactly what you ask.

Philippe


 

John Fossey  Identity Verified
Canada
Local time: 02:56
Member (2008)
French to English
Infix Sep 11

It's not free after the third page, but I have sometimes successfully used Infix. It exports the text from the PDF in XML format, which can be translated in any CAT tool. The translated text is then reimported into Infix which then recreates the PDF with the translated text.

Potential pitfalls:
- If you are working in a language pair where the target text is more voluminous than the source, you will have problems with the target text not fitting the allotted space.
- There are often font issues, where some unusual font is embedded in the PDF. Only characters actually used in the document are embedded and if the target text contains characters that were not in the source they will be skipped or replaced with a different font. Sometimes you can find the missing font online and install it on your computer to resolve this problem.
- Text that is actually part of an image will not be exported.
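
The font pitfall above can be checked before starting the layout work. Here is a minimal Python sketch that lists the characters appearing in the translation but not in the source text. It assumes, as a simplification, that the embedded font subset contains exactly the characters used in the source; the function name `missing_glyphs` and the sample strings are made up for illustration and have nothing to do with Infix itself.

```python
def missing_glyphs(source_text: str, target_text: str) -> list[str]:
    """Return characters used in the translation but absent from the source.

    Assumption: the embedded font subset holds exactly the characters
    that occur in the source text. Whitespace is ignored.
    """
    available = set(source_text)
    return sorted(
        {ch for ch in target_text if ch not in available and not ch.isspace()}
    )


# Hypothetical sample strings, for illustration only.
source = "Annual report and financial summary"
target = "Informe anual y resumen financiero: año 2019"
print(missing_glyphs(source, target))
```

Any character this reports is a candidate for being skipped or shown in a substitute font. A real PDF may use several fonts, each with its own subset, so the check is only a rough first warning.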


 

Alvaro Pavié
Chile
Local time: 03:56
English to Spanish
+ ...
TOPIC STARTER
Can't use OCR software. Sep 11

I should have pointed out that I can't use OCR software as the PDF is protected and my client couldn't do the conversion herself because of that.

I just found out that the text extracted by Calibre is all messed up, so it wasn't a real solution after all. Don't want to use web-based solutions for extracting the text since I'm not sure if the material is confidential or not (my client never told me so.)

Also, the PDF doesn't have a simple layout: it contains different-colored headers and subheaders, columns divided by straight lines, and some large and small images and logos. It doesn't look like the type of document I could replicate using only Word.

Money is definitely a constraint: I'm just starting my professional career and barely make enough to afford basic stuff such as transportation, food and clothing. Hiring someone else to do the job is out of the question. Besides, I'd like to take this opportunity to learn to use DTP software, as I already learned Inkscape (similar to Illustrator, but free) and my client has been pretty happy with the results. The company she works for will not pay more for all this work, but I need it nonetheless.

Lastly, I solely need advice on how to proceed with the limited means at my disposal, so please keep that in mind when replying.

Thanks.

[Edited at 2019-09-11 16:02 GMT]

[Edited at 2019-09-11 16:03 GMT]


 

Patricia Fierro, M. Sc.  Identity Verified
Ecuador
Local time: 01:56
Member (2004)
Spanish to English
+ ...
Abbyy FineReader Sep 11

Hi,

I have Abbyy FineReader version 14 and it exports PDFs to Word files. The format usually matches the PDF format.

Maybe you can take screenshots and convert the files. This works with protected PDF files. Abbyy FineReader can work with image files, such as what you would get when storing the screenshots by using MS Paint or similar apps.

Good luck!
Patricia

[Edited at 2019-09-11 16:24 GMT]


 

Jorge Payan  Identity Verified
Colombia
Local time: 01:56
Member (2002)
German to Spanish
+ ...
Print and scan Sep 11

Patricia Fierro, M. Sc. wrote:

Maybe you can take screenshots and convert the files. This works with protected PDF files. Abbyy FineReader can work with image files, such as what you would get when storing the screenshots by using MS Paint or similar apps.



My approach would be to print the file and then scan it in color. It will remove the problem with password protection and you could then use OCR software.

Customarily, I convert the text in the image to plain text and not to Word. It saves a lot of time in the DTP process.

Saludos


 

Alvaro Pavié
Chile
Local time: 03:56
English to Spanish
+ ...
TOPIC STARTER
Already took screenshots. Don't know how to use Transtools properly. Sep 11

Jorge Payan wrote:

My approach would be to print the file and then scan it in color. It will remove the problem with password protection and you could then use OCR software.

Customarily, I convert the text in the image to plain text and not to Word. It saves a lot of time in the DTP process.

Saludos



I already took screenshots of the pages and saved them in 24-bit BMP format. The quality is not the same as that of the original, but it does seem to work somewhat if I export it as a Word file. I'm still having issues with Transtools, though. I can't seem to clean the tags properly, as everything turns out even messier after cleaning than it was before.


 

Alvaro Pavié
Chile
Local time: 03:56
English to Spanish
+ ...
TOPIC STARTER
Update. Sep 11

Finally gave up on trying to extract the text from the PDF so I'm just translating on a blank .docx file and will attempt to replicate the layout once I finish translating. Is Scribus a good choice for DTP?

 

Samuel Murray  Identity Verified
Netherlands
Local time: 08:56
Member (2006)
English to Afrikaans
+ ...
Free DTP Sep 13

Alvaro Pavié wrote:
Finally gave up on trying to extract the text from the PDF...


Even if you could use an OCR program to extract the text and save it as a Word file, the layout would not be translator-friendly, particularly with the type of document that you've been describing. When an OCR program tries to mimic the layout, it uses all kinds of tricks that make the text look good on screen but make the document a nightmare to edit. For example, it might put every single line of a paragraph in its own little floating text box. It looks great on the screen and on paper, but it is practically untranslatable. Or you can set the OCR program to create an edit-friendly document, but that just means that the most difficult parts of the layout aren't done by the OCR program, but left for you to do.

...so I'm just translating on a blank .docx file and will attempt to replicate the layout once I finish translating.


What you should do is to type/get the source text in plain text, then create a formatted version of the file (with the source text), and then translate that file (e.g. in a CAT tool), and then afterwards fix minor layout inconsistencies that were introduced by the process.

Is Scribus a good choice for DTP?


Look, I'm sure Scribus, Canva, MS Publisher and OpenOffice Draw etc are fine to use, but learning to use DTP isn't quick either. In most cases, however, if you use a DTP program, you would do the translation in plain text first, then create the layout, and then copy/paste the content into the DTP program. It's a lot of work.

There may be some DTP programs that allow you to create the layout first, with the source text, and then translate the DTP file (either directly or by text export/import). OmegaT can translate OpenOffice Draw files directly. Scribus files are XML-like files (though not actual XML) with the translatable content as values of the CH attribute of the ITEXT tag, so you may be able to convince some CAT tool to translate it.
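
To illustrate the Scribus point, here is a minimal Python sketch that pulls the values of the CH attribute out of ITEXT tags. It assumes the file happens to parse with a standard XML parser, which may not hold for every Scribus file given the caveat above; the function name `extract_scribus_text` and the sample markup are hypothetical and much simpler than a real .sla file.

```python
import xml.etree.ElementTree as ET


def extract_scribus_text(sla_content: str) -> list[str]:
    """Collect the CH attribute of every ITEXT element, in document order.

    Assumption: the content is well-formed enough for ElementTree to parse.
    """
    root = ET.fromstring(sla_content)
    return [el.get("CH") for el in root.iter("ITEXT") if el.get("CH")]


# Hypothetical, heavily simplified stand-in for a Scribus file.
sample = """<SCRIBUSUTF8NEW Version="1.5.5">
  <DOCUMENT>
    <PAGEOBJECT>
      <StoryText>
        <ITEXT CH="Annual report"/>
        <para/>
        <ITEXT CH="Financial summary"/>
      </StoryText>
    </PAGEOBJECT>
  </DOCUMENT>
</SCRIBUSUTF8NEW>"""

for segment in extract_scribus_text(sample):
    print(segment)
```

This only reads the text out. Writing translations back means setting the same attributes and saving the file, and the layout would still need fixing in Scribus afterwards.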

But don't forget that you'd still have to fix formatting and layout problems in the DTP program afterwards that are caused by e.g. the source text and target text being of different lengths, so you still need to be an expert at fixing formatting in the DTP program.

[Edited at 2019-09-13 08:20 GMT]


 

Alvaro Pavié
Chile
Local time: 03:56
English to Spanish
+ ...
TOPIC STARTER
Clarification, please. Sep 13

Samuel Murray wrote:

What you should do is to type/get the source text in plain text, then create a formatted version of the file (with the source text), and then translate that file (e.g. in a CAT tool), and then afterwards fix minor layout inconsistencies that were introduced by the process.


Could you elaborate more on this please? I don't quite get what you mean by saying "create a formatted version of the file (with the source text)". My translation is ready, so all I need to do now is recreate the layout.


 

