Translation and DTP of a PDF File

By Alexey Ivanov | Published 12/4/2005 | Translation Techniques
Of course, DTP, like any job requiring a very high level of skill and expertise, is best left to the professionals. The trouble is that to do a good DTP job on a publication in a foreign language, you must, first of all, know the language. Secondly, your DTP application must support the target language as one of its system languages and include hyphenation and spell-checking for it. Thirdly, you must have a selection of PostScript fonts for the target language so that you can use the same fonts as the original publication, and some of them cost as much as $700 a set. If you do not know the language, you are liable to make hyphenation mistakes while formatting the text. If you do not have the PostScript fonts, you will not be able to create a printable PDF or EPS (Encapsulated PostScript) file from your DTP application. So agencies often turn to the translator for help.

Very few translators, if any, given only the source text and an idea of the intended design, could produce a print-ready publication of acceptable quality, especially one with a lot of color graphics. That work really should be left to the professionals. However, replacing the source text with the target text in a PDF file using a DTP application is possible if you have the tools and the expertise.

To do this you need:
1. The full version of Adobe Acrobat (not Adobe Reader).
2. A DTP application such as QuarkXPress, Adobe InDesign or PageMaker. I recommend Adobe InDesign, which has the best compatibility with Acrobat and into which you can import PDF files without loss of color.
3. An image-editing application such as Photoshop or Adobe Illustrator for editing scanned images (e.g. a drawing with dimensions or an image with text printed over it).

There is one prerequisite: the security restrictions on the original PDF file must be removed. If they have not been, ask the client to remove them or, if the file is password-protected, to give you the password.
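Before opening a batch of files, you may want a quick check of which ones carry security settings at all. The Python sketch below is an illustrative heuristic, not part of the workflow described in this article: it only looks for the /Encrypt key that the PDF specification places in the trailer (or cross-reference stream) of an encrypted file. It does not remove restrictions or tell you which ones apply, and the function names are hypothetical.

```python
import re
from pathlib import Path

# An encrypted PDF references its encryption dictionary through the /Encrypt key,
# either indirectly ("/Encrypt 12 0 R") or inline ("/Encrypt << ... >>").
ENCRYPT_RE = re.compile(rb"/Encrypt\s*(?:\d+\s+\d+\s+R|<<)")


def pdf_has_security(data: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes contain an /Encrypt reference."""
    return ENCRYPT_RE.search(data) is not None


def check_file(path: str) -> bool:
    """Read a PDF from disk and apply the heuristic (hypothetical helper)."""
    return pdf_has_security(Path(path).read_bytes())
```

A True result covers both files that need a password to open and files that open freely but restrict editing; in either case you would still ask the client for the password or an unrestricted copy.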
The process comprises several stages:

Stage A: Converting the PDF text into DOC/RTF format

The best and easiest way is to use PDF Transformer, an inexpensive application from ABBYY Software House, which you can download from their site www. Unfortunately, the site is in Russian, but the application is sold with both English and Russian interfaces. If you do not have it, do the following:

1. Open the original PDF file in Adobe Acrobat and remove the security restrictions using the password received from the client (menu path: "File"/"Document Properties"/"Security"). The security method must be set to "No Security".
2. After you have removed the security restrictions, or if the file has none, choose "Save As" from the "File" menu and save the file in RTF or DOC format.
3. Now you can translate the text using a CAT tool.

Bear in mind that in many cases you will not be able to convert all of the text in the original file into editable, translatable text, because there are two types of PDF files: application-generated and scanned. Often a PDF file is a combination of both: application-generated text in the text boxes and scanned images in the picture boxes. So when you save such a file in RTF or DOC format, you get the text from the text boxes as editable text (often with minor formatting losses), while the picture boxes, including any text inside them, come through as images.
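To get a rough idea in advance of how much of a file will convert to editable text, a small script can look for the two ingredients described above: text-drawing operators in the page content and embedded image objects. The Python sketch below is an illustrative heuristic, not a full PDF parser. It assumes the file is unprotected, only inflates Flate (zlib) compressed streams, and uses hypothetical function names. A scanned page that has been run through OCR usually carries an invisible text layer and will be reported as "mixed".

```python
import re
import zlib

# Rough patterns; a real tool would parse the PDF object structure instead.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")  # image XObject dictionaries
TEXT_BLOCK_RE = re.compile(rb"\bBT\b")          # "begin text object" operator
TEXT_SHOW_RE = re.compile(rb"\bT[jJ]\b")        # text-showing operators Tj / TJ


def stream_payloads(data: bytes):
    """Yield the bytes of every stream, inflated when it is zlib (Flate) data."""
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj ignores trailing bytes such as the EOL before "endstream"
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            yield raw  # uncompressed stream, or a filter other than Flate


def classify_pdf(data: bytes) -> str:
    """Return "application-generated", "scanned", "mixed" or "unknown"."""
    has_images = IMAGE_RE.search(data) is not None
    has_text = any(
        TEXT_BLOCK_RE.search(payload) and TEXT_SHOW_RE.search(payload)
        for payload in stream_payloads(data)
    )
    if has_text and has_images:
        return "mixed"
    if has_text:
        return "application-generated"
    if has_images:
        return "scanned"
    return "unknown"
```

Files reported as "scanned" or "mixed" are the ones where part of the work will fall into the image-editing stage rather than the text boxes.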

Stage B: Preparation of the background for the translated text

Typically, a DTP publication such as an advertising leaflet includes text set over background graphics (company logo, images of the advertised equipment, decorative elements, etc.), and there can be two or more layers of images. You no longer need the text, as you have already saved all you could in RTF or DOC format. But you do need the background graphics, because they remain unchanged in the translation. So you need to remove the text and keep the background graphics.

1. Open the original PDF file in Acrobat.
2. Using the "Select Object" and "Select Text" tools, select and delete all the text, leaving only the graphics.
3. Save each page, now containing only the graphics, as a TIFF or PDF file. If you use Adobe InDesign or PageMaker, PDF files are fine. But if you use QuarkXPress, save the pages in TIFF format, because importing PDF files into Quark may cause some loss of color in the graphics.

Stage C: DTP of the translation in the text boxes

1. Open a new project/file in your DTP application.
2. Using the "Rectangle Picture Box" tool, create a picture box covering the whole page. Then, using the "Get Picture" (in Quark) or "Place" (in InDesign and PageMaker) function, import the TIFF/PDF file containing the background graphics of the corresponding page, using the original PDF file for reference.
3. Using the "Text Box Tool", create text boxes in exactly the positions they occupy in the original PDF file. These text boxes will form the second layer of your DTP file.
4. Import the translations into the respective text boxes. Note: you may have problems importing the translation in DOC or RTF format. If this happens and you get unreadable text, save the translation in TXT format before importing. You will lose all the formatting, but you will be able to import the text into the text box. Don't worry too much about that: you will have to do the final formatting in the DTP file in any case.
5. Format the text in the text box using the tools and the paragraph and character style panels of your DTP application.

Stage D: Editing the text in the scanned image boxes

As pointed out above, there are two types of PDF files, and of parts of PDF files: those generated by applications and those that were scanned. The former are usually text boxes and are easy to deal with using the method described above. The latter, usually drawings or pictures of equipment with dimensions or text printed over the image, are more difficult and require extra effort. There is usually not much to translate; the problem is accessing that small amount of text in the scanned image. The only way is to save the scanned image as a TIFF file, open it in image-editing software such as Photoshop, and:

1. Using the "Eraser Tool", erase the original text.
2. Using the "Type Tool", replace it with the translation.
3. Save the edited file as a new TIFF file.
4. Create a picture box in the DTP file.
5. Import the new TIFF file into the appropriate picture box in the final DTP file.

After you have created and edited the DTP file of the translation, you can save it as a PDF or EPS file, or use the "Collect for Output" function in the "File" menu to create a high-quality, print-ready package that includes all the fonts used in the translation.

Comments on this article

    can simplify and speed up the process. This program allows direct editing of entire paragraphs. The early versions of InFix were full of bugs; however, versions 5.00 and later are fairly reliable. This method doesn't work if you need to translate large files (many pages) for big projects. In that case, you have to extract the text into a two-column table and translate the resulting document with regular CAT tools.
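The two-column approach mentioned in the comment can be sketched in a few lines of Python: split the extracted text into paragraph-level segments and write them into a table with an empty target column that a spreadsheet or CAT tool can open. The CSV format, the paragraph-level segmentation and the function names are assumptions for illustration; CAT tools differ in which table formats they can import.

```python
import csv
import io
import re


def split_segments(text: str) -> list[str]:
    """Split extracted text at blank lines and join the lines inside each paragraph."""
    paragraphs = re.split(r"\n\s*\n", text)
    return [" ".join(p.split()) for p in paragraphs if p.strip()]


def to_two_column_csv(text: str) -> str:
    """Return CSV text with source segments in column 1 and an empty target column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for segment in split_segments(text):
        writer.writerow([segment, ""])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "Line one\ncontinues.\n\nSecond, paragraph.\n"
    print(to_two_column_csv(sample))
```

Paragraph-level segments keep line breaks introduced by the PDF layout from splitting sentences; a CAT tool will normally apply its own sentence segmentation on top of this.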

