Can a good PDF creating software help the translator?
Thread poster: MikeTrans
MikeTrans
Germany
Local time: 01:41
Member (2005)
Italian to German
Dec 3, 2014

Hi,

PDFs are on the run: I see them more and more in my translation requests, but I hesitate to accept any larger project involving PDF files. That's a pity if software exists that could turn bad PDFs into good ones.

My point is: ABBYY FineReader, Nuance PDF Converter Professional and other such converters give very good results when converting to MS Office formats if the PDFs are native ones, created with licensed Adobe technology; in short, 'true' PDFs.
But give them scanned PDF files, even very nice-looking ones (but with exotic formatting), and all these converters are rather useless: there is absolutely no way to import the output into a CAT tool and start working without trouble unless you apply clever tricks.

So, my question is:
It won't be cheap, but what about Adobe Acrobat or similar professional PDF-creation software? How can it help in the translation process? Does it have special import/export functions that can turn a 'bad' PDF into a 'good' one? Would it be easy, and worth the time, to re-create a PDF after translation so as to deliver a PDF rather than a Word/Excel file that doesn't quite meet professional DTP standards?

I know that 'smaller' PDF editors like Iceni InFix Professional can export PDFs to XML files, which can easily be imported into a CAT tool and, after translation, re-imported into InFix to produce a PDF.
From a big, expensive package like Adobe Acrobat, I'd expect more. Does it have such features that are useful for translators?

Thanks very much for any of your comments!
Mike



Gregory Flanders  Identity Verified
France
Local time: 01:41
French to English
PDF Transformer works for me Dec 3, 2014

Hi Mike, I'm a very satisfied user of an older version of PDF Transformer from ABBYY. I have used it on all sorts of scanned documents and generally get good results. Keep in mind that no OCR software will give you a perfect match in a Word document: you have to be prepared to put in some work reformatting documents (and make sure you charge the client for the extra work).

Occasionally PDF Transformer "over-tags" my documents, which for some reason is more of a problem in Trados than in Wordfast. There are various tools out there that help clean up poorly tagged Word documents created by OCR software.

[Edited at 2014-12-03 13:58 GMT]



Michael J.H. Davies  Identity Verified
Denmark
Local time: 01:41
Member (2009)
English to Danish
Foxit Phantom PDF Dec 3, 2014

I have used Foxit Phantom PDF for several years now. It is available in a professional version with many useful functions at a much lower price than Adobe Acrobat.

Among its many functions is, for example, the ability to convert PDF documents to MS Office formats (Word, Excel, PowerPoint).

Acrobat probably has even more features but I have never missed them so am unable to say what they are. I can definitely recommend the Foxit product.

Check http://www.foxitsoftware.com/ for further information.


finnword1
United States
Local time: 19:41
English to Finnish
creating PDF? Dec 3, 2014

I don't know about creating PDFs, as nobody has asked me to, but I use OmniPage to convert PDFs to text, Word, or preferably RTF. It does everything I need.
If you have a Word document, MS Word 2007 can save it as a live PDF.



Anton Konashenok  Identity Verified
Czech Republic
Local time: 01:41
English to Russian
No, you don't have to have Acrobat Dec 3, 2014

Firstly, there are no "true" and "false" PDFs. A PDF file may contain text, graphics, or a mixture of the two. The textual part can be extracted by numerous free tools (unless the file is secured, but that's a different story: nothing short of cracking it will help you). The graphic part may, in particular, be a scanned image of text; that's what some people call a "dead" PDF (as opposed to a "live" one), and to convert it to text you need an OCR program. There is an OCR function built into Acrobat, but it's quite poor; better stick with a dedicated OCR tool. Personally, I recommend ABBYY FineReader. ABBYY PDF Transformer uses the same OCR engine, but with a simplified user interface and fewer options.

philgoddard
United States
Member (2009)
German to English
I don't accept that they're "on the run". Dec 3, 2014

MikeTrans wrote:

PDFs are on the run, I see them more and more in my translation requests, but: I hesitate to accept any larger project related to PDF files.



They're the global standard, and it's only CAT-using translators that don't like them.



Dan Lucas  Identity Verified
United Kingdom
Local time: 00:41
Member (2014)
Japanese to English
Another vote here for Phantom Dec 4, 2014

Michael J.H. Davies wrote:
Acrobat probably has even more features but I have never missed them so am unable to say what they are. I can definitely recommend the Foxit product.

I also use Phantom and find it capable and useful.

Dan



José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 21:41
English to Portuguese
Live vs. Dead PDFs Dec 4, 2014

Anton Konashenok wrote:

The graphic part may, in particular, be a scanned image of text; that's what some people call a "dead" PDF (as opposed to a "live" one), and to convert it to text you need an OCR program.


Excellent taxonomy, Anton. It's much clearer than using "scanned" for "dead" and especially "distilled/software generated" for "live".

I translate live PDFs using Infix Pro. An illustrated walk-through of one such project may be found on this page.

For dead PDFs, I use OmniPage to OCR the text.
If there are illustrations, I'll export the pages to JPG/BMP/whatever using Infix, and then crop/translate/edit them as necessary using my favorite, long-deceased Ulead PhotoImpact (Photoshop and many others work, too).
Then I use PageMaker, InDesign's father; any good DTP app will do. I recolor all those exported pages to something that will stand out, e.g. replacing all black with blue, and place each one on its page in PM as a template. In PM, I quickly rebuild each page, placing all the elements where they belong and adjusting them over the "template" in the background. After that, I delete the templates from the background and have a fresh translated publication to distill into a live PDF.

People should accept that Microsoft Word is a word processor, as its name implies, and NOT a DTP app. It is a pretty bad word processor, in spite of the gazillion bells and whistles added to it over the years. Otherwise, why hasn't Microsoft buried the horrible MS Publisher already?

DTP should be done with a DTP app, and this work should have a price tag. I've heard of too many desperate translators who burned the midnight oil attempting to do complex DTP work with MS Word... for free!



neilmac  Identity Verified
Spain
Local time: 01:41
Spanish to English
Radical solution Dec 4, 2014

Educate your clients. Explain to them that scanned PDFs are not the way to go. Tell them to smarten up or "find another monkey". It works for me.

Joakim Braun  Identity Verified
Sweden
Local time: 01:41
German to Swedish
Yes and no Dec 4, 2014

José Henrique Lamensdorf wrote:
I translate live PDFs using Infix Pro. An illustrated walk-through of one such project may be found on this page.


An excellent tutorial, and for that very reason a good argument against workflows in which PDFs are recreated by the translator, at least where document design and typography matter in the slightest.

Infix is an impressive solution to a poorly defined problem.



José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 21:41
English to Portuguese
Again, a memorable statement Dec 4, 2014

Joakim Braun wrote:

Infix is an impressive solution to a poorly defined problem.


PDF files are an undefined (series of) problem(s) every time!

I get crooked PDFs all the time. A couple of hours ago I got one from a client. Acrobat Reader couldn't open it and merely said "An error occurred upon opening this document. Could not repair file." It was only reference material for a DOC file, so there was no major harm. Anyway, I fixed it as described below.

The worst case is when I get a PDF, open it with Acrobat Reader, and it looks OK. So, let's do the OCR! OmniPage either says it can't be opened, or opens a series of blank pages.

I've learned to use a drag-and-drop utility called PDF Unlocker by SMTguru, which apparently re-runs the PDF through Ghostscript and usually fixes most of the glitches.
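PDF Unlocker's internals aren't documented here (José only says it "apparently" uses Ghostscript), but for anyone who wants to try a similar re-run by hand, here is a minimal Python sketch. It assumes Ghostscript is installed and on the PATH. The function names are hypothetical, while `-o` and `-sDEVICE=pdfwrite` are standard Ghostscript switches. There is no guarantee it repairs any particular damaged file.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(src: Path, dst: Path, gs_exe: str = "gs") -> list[str]:
    """Build a Ghostscript command that rewrites `src` through the pdfwrite device.

    -o sets the output file and also implies -dBATCH -dNOPAUSE,
    so Ghostscript exits when finished instead of waiting for input.
    """
    return [gs_exe, "-o", str(dst), "-sDEVICE=pdfwrite", str(src)]


def redistill_pdf(src: Path, dst: Path) -> None:
    """Re-run a (possibly crooked) PDF through Ghostscript; raises if gs fails."""
    # On Windows the console executable is usually gswin64c rather than gs.
    gs_exe = shutil.which("gs") or shutil.which("gswin64c")
    if gs_exe is None:
        raise FileNotFoundError("Ghostscript executable not found on PATH")
    subprocess.run(build_gs_command(src, dst, gs_exe), check=True)
```

For example, `redistill_pdf(Path("crooked.pdf"), Path("fixed.pdf"))` writes a freshly regenerated file that can then be opened in the OCR program.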

My Infix walk-through tries, to some extent, to point out a few of the issues that can make it difficult to re-DTP a PDF: partially embedded (and often proprietary) fonts, text alignment, crazy hyphenation (from the software that created the PDF), the underscored font causing stray lines on the page... the list is endless.

However, not all translators do DTP, and not all DTP operators translate. A translator who does both is like, say, a clarinet player who can also play the trumpet or the trombone. Either way, he's got to be paid for all the time he spends playing music, regardless of the instrument. So if a translator does DTP work, it shouldn't be for free.

A longtime friend of mine is an insurance broker, but he plays music too. He's not a great musician, but he can always play recognizable music on any instrument he gets his hands on. It's simply amazing! Though his main instrument is the accordion, followed by the organ, he can play strings, winds, drums, and more. A mutual friend once said, "Give him a toothbrush and a curtain, and he'll make music with those too." My take is that music made with a toothbrush and a curtain would sound about as good as DTP done with MS Word.


MikeTrans
Germany
Local time: 01:41
Member (2005)
Italian to German
TOPIC STARTER
Thank you very much for your tips... Dec 5, 2014

Dear Colleagues,

Sorry for the delayed response, but I was busy following the advice about Iceni InFix in the memoQ Yahoo group...

https://groups.yahoo.com/neo/groups/memoQ/conversations/topics/38705

...where this question was discussed, but I also wanted to hear your opinions.

@gflan, Dan, Anton, Michael,
Thank you for your software suggestion.
I'll try to go through them with trial versions where possible, although I'm now very enthusiastic after trying the InFix workflow shown by José (and discussed in the Yahoo group above). Clients rarely want PDFs returned and prefer MS Word (even when they send PDFs), but I could still convert the final translated InFix PDF into MS Word, Excel or PowerPoint with ABBYY FineReader. I have the older version 9, and I never went through its Help documentation in any detail; instead, I used the Quick Tasks shown right after launching the program. They worked rather well, but the translated file still needed more or less extra work, enough to make me reluctant to handle larger PDFs.

@neilmac,
Yeah, that's a very good point. I'm actually doing it, and if necessary I also explain to them why a translator's job has nothing to do with a secretary's job. I'm tolerant here, because I think some clients (especially direct clients) can't be expected to know about these finer points in the first place.
But you have to be careful: telling others what they ought to do is not always welcome. It's a delicate matter.

@Philgoddard,
Sorry for my English: by 'on the run' I meant "popular"...

@José Henrique,
Thanks for taking the time; your in-depth InFix tutorial is a real gold mine!

I wish you all a very nice time up to Christmas and the end of the year,
Mike



José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 21:41
English to Portuguese
This makes no sense to me Dec 5, 2014

MikeTrans wrote:

Clients rarely want PDFs returned and prefer MS Word (even when they send PDFs), but I could still convert the final translated InFix PDF into MS Word, Excel or PowerPoint with ABBYY FineReader.


Maybe clients who request this are in some way less sensible than others.

I would compare this conversion of a translated PDF into a DOC to someone buying a neatly bound hardcover book and asking the bookstore to cut the pages out of the binding, punch holes in them, and deliver them in a three-ring binder that's falling apart.

A PDF file is, to some extent, sealed and complete. It will open and print correctly, and always identically, on any system that has the free Acrobat Reader devised for it.

Meanwhile, a DOC file (or worse, a DOCX) holds all kinds of surprises in store, especially when opened across different versions, platforms, hardware, etc.

Taking my point one step further, many years ago some big decision-making moron at Microsoft chose to have all the Excel commands translated! For instance, @SUM became @SOMA in the Brazilian version of MS Office, and so on. The bottom line is that a spreadsheet devised in Excel for the USA will NOT recalculate in the Brazilian version of (supposedly) the same Excel. You get "#ERRO" all over.

A PDF is safe. A proper PDF file in, say, both Hebrew and Arabic should come out exactly the same on a computer running in Japanese, Chinese, or Cyrillic... and vice versa.



Christine Andersen  Identity Verified
Denmark
Local time: 01:41
Member (2003)
Danish to English
You said it's not cheap... Dec 5, 2014

... but time is also precious, and for someone with limited IT comprehension like me, the combination of Acrobat and Studio 2014 actually works quite well.

To judge from files I have received from agencies, other OCR software often seems to have trouble with the extra Danish vowels æ, ø, å apart from anything else. I don't know about other languages.

The Acrobat-Studio combination is not perfect, but I can often get text that is worth running a spell-check on and touching up before feeding it into Studio.

Studio can often open 'live' PDFs without further ado.

It is not always an option for me to tell clients to 'find another monkey'. The Danish health services only seem to provide medical records as scans and PDFs, and the client is not always able to get hold of any 'live' version. This is despite the millions spent on electronic medical records used in hospitals and by GPs (I have even translated some of the bumph for them...). Researchers, insurers, translators and anyone else have to do what they can with more or less dead scans.

As there are considerable advantages in using a CAT for terminology and so on with these texts, I go to quite a lot of effort sometimes to get a source I can feed into my CAT...

And that is the best solution I have found.



José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 21:41
English to Portuguese
OCR is tough Dec 5, 2014

Christine Andersen wrote:

To judge from files I have received from agencies, other OCR software often seems to have trouble with the extra Danish vowels æ, ø, å apart from anything else. I don't know about other languages.


In any language, I find that OCR has the most trouble telling "rn" (lowercase R + N) from "m" (lowercase M). Detecting this calls for spellchecking AND LUCK.

Portuguese and other languages use ó (lowercase O + acute accent), and a telltale sign that an OCR program is set for English is when it gets converted into a 6 (the digit six).

I don't know about other programs, but OmniPage works best when not only the actual language is turned ON, but ALL OTHER languages are also turned OFF! This can be tricky with bilingual docs; in that case, perhaps the most effective approach is to do the OCR twice, on different selected zones.
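The "rn"/"m" and "ó"/"6" confusions described above lend themselves to a quick post-OCR check before the text goes into a CAT tool. Below is a minimal Python sketch. It assumes that a plain set of known-good lowercase words stands in for a real spellchecker. The function names and the tiny word list are hypothetical and are not part of OmniPage or any other OCR product.

```python
import re

# A digit wedged between two letters, e.g. "n6s" where OCR read "ó" as "6".
DIGIT_IN_WORD = re.compile(r"(?<=[^\W\d_])\d(?=[^\W\d_])")


def rn_m_variants(word: str) -> set[str]:
    """Return spellings obtained by swapping a single 'rn' <-> 'm' occurrence."""
    variants = set()
    for i in range(len(word)):
        if word.startswith("rn", i):
            variants.add(word[:i] + "m" + word[i + 2:])
        if word[i] == "m":
            variants.add(word[:i] + "rn" + word[i + 1:])
    return variants


def flag_ocr_suspects(text: str, dictionary: set[str]) -> list[tuple[str, str]]:
    """List (word, reason) pairs worth a second look after OCR."""
    suspects = []
    for word in re.findall(r"\w+", text):
        lower = word.lower()
        if DIGIT_IN_WORD.search(lower):
            suspects.append((word, "digit inside word (e.g. 6 for ó)"))
        elif lower not in dictionary:
            # Unknown word: does swapping rn/m turn it into a known one?
            fixes = sorted(v for v in rn_m_variants(lower) if v in dictionary)
            if fixes:
                suspects.append((word, "rn/m confusion? try " + ", ".join(fixes)))
    return suspects


words = {"the", "modem", "modern", "is", "and", "nós"}
print(flag_ocr_suspects("The modem is rnodern and n6s", words))
```

The example flags "rnodern" and "n6s", but "modem" passes silently because it is a real word. In other words, a dictionary check cannot catch a confusion that produces another valid word, which is where the luck comes in.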

