Infix and similar PDF tools: are there situations when the approach itself doesn't work?
Thread poster: Artem Vakhitov

Artem Vakhitov  Identity Verified
Estonia
English to Russian
Nov 15, 2016

There are software tools on the market that allow working with PDF files by exporting their content to CAT software and then re-importing back with possible subsequent corrections to the layout. Iceni Infix is an example of such software. The recently released FlexiPDF Pro from Softmaker GmbH also features that functionality.

What I'm curious to know is whether there are situations when the approach itself doesn't work for reasons like the nature of the content, font availability etc. even though the software itself works as designed.

Any insights based on personal experience?



Rolf Keller
Germany
Local time: 15:26
English to German
PDF is a one-way street Nov 15, 2016

Artem Vakhitov wrote:

What I'm curious to know is whether there are situations when the approach itself doesn't work for reasons like the nature of the content, font availability etc.


Surely. PDF is a one-way street, designed to produce a printout. There is no real way back to the underlying original, because PDF files lack some formatting info that is unnecessary for printouts but needed for converting.

Fonts are no big problem, BTW, because you can embed them into the PDF. This is necessary because your print service provider might need them in order to produce paper copies that look like the original on your PC.
--> https://en.wikipedia.org/wiki/PDF/A.



Philippe Etienne  Identity Verified
Spain
Local time: 15:26
Member
English to French
Currently discovering Infix Nov 18, 2016

Artem Vakhitov wrote:
...Any insights based on personal experience?

I'm currently on thick editable pdf files with large tables, images embedded within, 15-20 fonts (including non-alphabetical), the works. InDesign underlying files nowhere to be found.

Trados (2009) got me a bilingual file, but it was fairly useless (tags, segmenting...).
Omnipage Pro required too much work to manually draw countless area types (text/images/tables/etc.) and approach usability in a CAT tool. And my machine would melt.
Since the deadline for these pdfs is very generous, I have ample time to try and find a more elegant approach than brute force, like overwriting the frigging pdfs and sod it with automated repeats and similarities. So I've been toying with Infix for the past day or two, with the aim of using MemoQ to translate. Thanks people (José, Tomas...) for mentioning it in these forums.

In these specific files, I find there is still a bit of work upstream to properly reformat "stories" in tables: two or more adjacent cells in a row/column are often in the same story, font replacements can seriously disrupt the table layout, line breaks missing... But then I am clueless about DTP, so I am learning on the job the hard way.

My feeling is that it looks more promising than a raw Omnipage OCR, but raw Infix output is not usable either without faffing about with the source pdf files.
So the approach does have benefits, but it's still not as "clean" a workflow as I hoped with these particular files. However I think that with "standard" editable pdfs, the Infix route may be less time-consuming compared to the OCR/Word process.

Rolf Keller wrote:
...Fonts are no big problem, BTW, because you can embed them into the PDF...

They are a problem for me, and I wish only Arial, Courier New and my own handwriting existed. I replaced fonts because only the characters used in the EN pdf are available, but I get text like SQODUF Q JOC MC QOSOK instead of legible text in some instances (Asian 2-byte fonts I guess), or weird character spacing.

Philippe



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 14:26
Member (2009)
Dutch to English
curious what ABBYY FineReader 12 would make of your files Nov 18, 2016

Philippe Etienne wrote:

...Omnipage Pro required too much work to manually draw countless area types (text/images/tables/etc.) and approach usability in a CAT tool. And my machine would melt...

I don't know how good Omnipage Pro is, but I pretty much always get great results with ABBYY FineReader 12.

Michael



Rolf Keller
Germany
Local time: 15:26
English to German
Fonts in PDF Nov 19, 2016

Philippe Etienne wrote:

Fonts (...) are a problem for me, and I wish only Arial, Courier New and my own handwriting existed.


The question was about the nature of PDF content. If the creator of the .pdf embeds the fonts (e.g. by using the PDF/A option), you can view & OCR the file correctly, provided that your OCR "knows" that language and its letters.

Of course you will run into problems if you replace fonts later on.



José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 12:26
English to Portuguese
Some tricky situations - typical issues Nov 20, 2016

Rolf Keller wrote:

Philippe Etienne wrote:

Fonts (...) are a problem for me, and I wish only Arial, Courier New and my own handwriting existed.


The question was about the nature of PDF content. If the creator of the .pdf embeds the fonts (e.g. by using the PDF/A option), you can view & OCR the file correctly, provided that your OCR "knows" that language and its letters.

Of course you will run into problems if you replace fonts later on.


1. Font embedding

Font embedding is often an issue, especially if you are translating from a language that seldom uses diacritics (English) to another that uses them.

One reason PDF files stay small is that fonts are embedded only partially, as subsets. This already mattered in the days of 256-glyph (character) TrueType fonts: even then, if so configured, Acrobat would embed only the characters actually used, to save space. Now, with OpenType fonts holding up to thousands of glyphs, it is extremely important. Why waste file space storing Cyrillic, Greek, Hebrew, Arabic, Symbol and other chars you don't use, for each and every font your PDF contains?

The problem arises in translation. If your PDF contains, say "republican" (EN) and nothing else in a specific font, it will embed 10 chars from this font into your PDF. Translating it into PT or ES means adding an "O" at the end. Translating it to IT implies doing it PLUS doubling the "B".

The "B" in IT causes no trouble whatsoever, because it's already embedded in there. However the "O" isn't! If that font is a plain-vanilla Arial, Courier, or Times that everyone has, you can draw it from your system-installed font.
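
To make the subset problem concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions: it treats the embedded subset as the set of characters found in the source text, is case-sensitive, and ignores the fact that real fonts map glyphs rather than plain characters. The function name `missing_glyphs` is made up for this example.

```python
def missing_glyphs(embedded_text, translated_text):
    """Return the characters the translation needs but the subset lacks.

    Assumption: the embedded subset contains exactly the characters
    that occur in the source text set in that font.
    """
    embedded = set(embedded_text)
    return {ch for ch in translated_text if ch not in embedded}


source = "republican"
for lang, target in [("PT/ES", "republicano"), ("IT", "repubblicano")]:
    # The doubled "b" is already embedded; only the final "o" is missing.
    print(lang, sorted(missing_glyphs(source, target)))
```

Both translations report only the "o" as missing, which matches the reasoning above: doubling an embedded letter costs nothing, while a single new letter forces you to find the full font or a replacement.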

However if it's a special font, it will be a problem. Options are:
a) Replace that entire font with one you have in the publication;
b) Get and install that font, if it's free;
c) Buy the license for that font, which may be quite expensive at times;
d) If it's a company-proprietary font (two examples that come to my mind are General Electric and Rakuten), check if the employees are allowed to release them to outsiders; and
e) Find a lookalike font, in which some characters will probably have a different width, messing up the entire layout.


2. Underscored text

Though the underscore is placed together with bold, italic and other char settings in most apps, in a PDF file it is a loose line - i.e. a separate object - placed under a certain part of the source text.

After that text has been translated, the former underscoring line - possibly now in the wrong place and of the wrong length - must be found and deleted manually. You then have to select the text to be underscored and apply underscoring as a font attribute.


3. Text alignment

Without delving into how a PDF file is structured, each block of text is placed "there". If it should be centered, right-aligned, or whatever, including soft or hard line breaks at certain spots and possible tabs and delimiters to position it "there", quite often a totally different strategy will have to be used on the translation to achieve the desired effect.


4. Font kerning, tracking, font width, letter spacing, word spacing

These will have been implemented with the original DTP app used to create the publication. It is often a matter of resetting them all and starting over - if needed to fit the text in the allotted space.


These are just some of the many issues one must face upon translating a PDF file with PDF editing tools.

Is it tough? Quite possibly, for someone who has no DTP experience.

The truth is that most translators are used to MS Word text formatting and layout tools which, if compared to the lamest DTP app, are simply horrible. Word is a feature-bloated typewriter, not a DTP app.

So, what are the options?

One is the old way, investing heavily in buying, learning to use, and having numerous versions (to cope with "old" publications) of the required DTP apps.

The other is having and learning to use ONE of these PDF editors - usually cheaper than any pro-level DTP app alone - and coping with the unavoidable PDF-intrinsic issues.



Artem Vakhitov  Identity Verified
Estonia
English to Russian
TOPIC STARTER
Thank you Dec 1, 2016

Thanks everybody who's chimed in. Special thanks to Philippe Etienne and José Henrique Lamensdorf for the detailed descriptions of their experience.

MikeTrans
Germany
Local time: 15:26
Member (2005)
Italian to German
Yes, this approach can be very time-consuming Dec 2, 2016

I tried Infix PDF Editor on a technical manual to be translated from English into German, with only the company's PDF file to work from. My objective was to deliver the translation in a perfect 1:1 PDF format.
The result: despite Infix's innovative features, it took me considerably longer than a usual delivery in another common format.

My workflow was a) preparing the PDF document to be sent to the CAT tool; b) working in the CAT tool as usual; c) re-exporting to Infix with, hopefully, little or no post-editing.
IIRC, I had 2 major problems:

- A significant difference in sentence length between German and English (English is generally shorter)
- The handling of text in tables; here too, because of the differences in text length, all the tables needed rearranging

I ended up having to add completely new pages, change the formatting, carefully adjust font sizes, etc. In short, Infix did not have enough editing features to do all this *quickly*, especially for tables that had to be split across pages.
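
One rough way to anticipate this kind of rework is to compare source and target lengths per segment before re-importing. The sketch below is only an illustration: the 1.2 threshold, the sample segments and the function names are assumptions, and character counts are merely a proxy for the space a text really occupies on the page.

```python
def expansion_ratio(source, target):
    """Length of the target relative to the source, in characters."""
    if not source:
        return 0.0  # nothing to compare against
    return len(target) / len(source)


def flag_overflow(pairs, threshold=1.2):
    """Return (source, target, ratio) for segments that grew past the threshold.

    The threshold is an arbitrary illustrative value, not a rule.
    """
    flagged = []
    for source, target in pairs:
        ratio = expansion_ratio(source, target)
        if ratio > threshold:
            flagged.append((source, target, round(ratio, 2)))
    return flagged


# Hypothetical EN > DE segments, e.g. exported from a CAT tool.
pairs = [
    ("Switch off the device.", "Schalten Sie das Gerät aus."),
    ("Menu", "Menü"),
]
for source, target, ratio in flag_overflow(pairs):
    print(f"{ratio}x: {source!r} -> {target!r}")
```

Segments flagged this way, particularly those sitting in table cells, are the likeliest candidates for manual layout work after the round trip.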
All in all, all these editing operations had nothing to do with my job, which is: translating. I was too busy playing the technical editor (a job which I did before becoming a translator more than 25 years ago).
The transfer from Infix to MemoQ, however, was perfect: only the minimum of necessary formatting tags was present, with no excessive 'noise'. I could start translating immediately, and the result exported very well. The problem was getting to the point of working in MemoQ, and getting there *quickly*.
I had the feeling that Infix was not designed primarily to help the translators, but to work with PDFs in the first place, a job which the tool does very well.

Anyway, don't let my feedback discourage you: a PDF tool with plugins for CAT tools is a wonderful thing as long as clients expect you to handle PDF files as well. Because my clients do not ask me to *deliver* translated PDFs, the question of whether Infix was useful to me faded into the background.

Greetings,
Mike



Rolf Keller
Germany
Local time: 15:26
English to German
Workflows and desired results Dec 2, 2016

MikeTrans wrote:

My objective was to deliver the translation in a perfect 1:1 PDF format.


If a client wants this, he has to hire a DTP expert. Otherwise he can be happy with a PDF file that looks (in the eyes of an office worker!) like the original PDF. Many PDFs coming from clients aren't designed by a DTP expert, anyway; instead they were created using Word and then processed by Adobe Acrobat in order to get a file that doesn't depend on the reader's software.

I convert .pdf into .docx, translate the .docx, re-format it roughly (if need be) and then export it into a PDF creator. So, I perform any formatting within Word.


MikeTrans
Germany
Local time: 15:26
Member (2005)
Italian to German
The Infix plugin to CATs clearly beats any conversion tool Dec 2, 2016

Rolf Keller wrote:

If a client wants this, he has to hire a DTP expert. Otherwise he can be happy with a PDF file that looks (in the eyes of an office worker!) like the original PDF. Many PDFs coming from clients aren't designed by a DTP expert, anyway; instead they were created using Word and then processed by Adobe Acrobat in order to get a file that doesn't depend on the reader's software.

I convert .pdf into .docx, translate the .docx, re-format it roughly (if need be) and then export it into a PDF creator. So, I perform any formatting within Word.


Because agencies want easy post-editing and distribution, they seldom want PDFs to be returned. PDFs are, however, very useful as a reference for technical data etc.
If you are concerned about converting PDFs, then Infix is a very good tool, much better than any conversion tool because of its plugins for CAT tools. I remember that the transfer of the PDF content to MemoQ was nearly perfect, which means that in any scenario that follows you will have your whole work as TMX content that you can use to create output documents in other formats, just by applying a TM. For example, you can use the Heartsome TMX Editor (now free) to convert a TMX file to a Word document or to tab-separated plain text; with it you can also delete any tags and some extras. In Trados Studio I would use a plugin to convert sdlxliff files to Word files and get a very neat table to present as the translation work. It all depends on how free you are in handling the source documents to be translated, and if you are not free, you can still just apply a TM (your work sent by Infix to your CAT tool) to pre-translate the documents sent by your clients.
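
The TMX-to-tab-separated conversion mentioned above does not depend on any particular tool. As a rough sketch using only the Python standard library, and assuming a simple TMX file in which inline formatting is stored in `bpt`/`ept`-style elements (the sample TMX content and the function names are invented for illustration):

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample data standing in for a TMX export from a CAT tool.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Press the <bpt i="1">&lt;b&gt;</bpt>Start<ept i="1">&lt;/b&gt;</ept> button.</seg></tuv>
      <tuv xml:lang="de"><seg>Drücken Sie die <bpt i="1">&lt;b&gt;</bpt>Start<ept i="1">&lt;/b&gt;</ept>-Taste.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def plain_text(seg):
    """Keep the segment's own text and drop the content of inline tag elements."""
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")  # text that follows each inline element
    return "".join(parts)


def tmx_to_rows(tmx_text, src_lang, tgt_lang):
    """Return (source, target) pairs for translation units that have both languages."""
    rows = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        by_lang = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                by_lang[tuv.get(XML_LANG)] = plain_text(seg)
        if src_lang in by_lang and tgt_lang in by_lang:
            rows.append((by_lang[src_lang], by_lang[tgt_lang]))
    return rows


for source, target in tmx_to_rows(SAMPLE_TMX, "en", "de"):
    print(f"{source}\t{target}")
```

Real TMX exports vary between tools (language codes such as `en-US`, nested inline elements, other tag types), so a script like this would need adjusting to the files at hand.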

With this method, Infix simply bypasses the problems you may have after a normal conversion from PDF to some other format, which can give you one headache after another...
BTW, I think conversions can also be performed by the tool, in txt format for sure; I don't know about Office formats.

Mike

[Edited at 2016-12-02 13:56 GMT]



