Translated text not seen in Infix after import
Thread poster: Andrey Korobeinikov

Andrey Korobeinikov  Identity Verified
Russian Federation
Local time: 02:33
English to Russian
Jan 20, 2016

Hi

I am trying to master Infix at the moment, and it looks like something is escaping my attention.

I have a scanned PDF to translate. I mark it up manually in FineReader 11, specifying all the images (background and other), texts and tables. Then I save it as PDF and open it in Infix to check whether the markup worked well, correcting it if needed. After that, I export it as an XML file, translate it in SDL Studio 2015, generate the target file and try to import it back into Infix. The process appears to complete, but nothing happens: I still see only the original text, not my translation. However, after clicking through various menu options in Infix, I found OCR Corrections. When I start it, I can see my translation, but as soon as I finish this mode everything reverts to the original.

I am quite desperate, as the process should be quite simple and I can't get through it. Moreover, there is more PDF work to come, and Infix could be a real help.

Regards



Dan Lucas  Identity Verified
United Kingdom
Local time: 18:33
Member (2014)
Japanese to English
Why PDF? Jan 20, 2016

Andrey Korobeinikov wrote:
Then save it as PDF and open it with Infix to check if the markup worked well. Correct if needed. After that, I export it as an XML file, translate using SDL Studio 2015, generate target file and try to import back into Infix.

I think I must be missing something here. If you have to OCR the original PDF file anyway, I don't understand why you need to keep it as PDF all the way through instead of using other formats and converting to PDF at the end.

My default approach would be: Client -> PDF -> OCR -> Word/.docx source -> Studio -> Word/.docx target -> PDF -> Client

Regards
Dan



Andrey Korobeinikov  Identity Verified
Russian Federation
Local time: 02:33
English to Russian
TOPIC STARTER
Drawing saved as PDF Jan 20, 2016

This is a drawing saved as a PDF, so I can't work with it in Word. Since it is a scan, OCR lets me retain the background and recognize the text in it. I save it as PDF because Infix works only with PDFs.
I wouldn't try Infix if it weren't for the complicated document layout. Right now I am using Photoshop for this purpose, but Infix looks better suited to it. The Infix export/import feature seems just fine to me, but I can't get the translation to replace the original text. I have seen other people do it easily, and that is what I can't achieve.



Stepan Konev  Identity Verified
Russian Federation
Local time: 21:33
English to Russian
Searchable pdf Jan 20, 2016

This happens because FR11 saves the PDF as an "Exact copy" (Точная копия in the Russian interface), offering no other choice.
When you open it in Infix, its PDF style is "Searchable Image". So when you save your final PDF, you can search for the Russian words. Of course, you don't need this. It is useful only when the location of the hidden (searchable) text coincides with the text in the image: together, they give the effect of searching through the text even though it is an image. For this effect, the hidden/searchable text must be in the same language as the text in your image PDF.
However, since your PDF style is now Searchable Image and your hidden/searchable text is in Russian, after saving the final file you can search for Russian words. This is totally useless for your purposes, because a translated Russian word (which is hidden) will never be in the same place as the English source word (which is part of the visible image).

You need to open the 'Recognise Text (OCR)' window in Infix and choose the 'Editable Text' option for PDF style.



José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 16:33
English to Portuguese
Infix manual Jan 20, 2016

Andrey,

Stepan has accurately pinpointed the cause. The effect, as you discovered, can be controlled with the OCR Corrections feature, as explained in the Infix manual (illustrations omitted here, of course; my emphasis on a few relevant aspects):
OCR corrections (Standard, Pro)

Infix can be used to adjust the hidden text associated with a scanned document. This text is generated during OCR (Optical Character Recognition) of a scanned image or printed page. See “Inserting pages from a scanner” on page 142 for details of how to perform OCR on a document.

The OCR text is hidden in the PDF so that it can be searched. Often there are errors in the hidden text that can be difficult to fix because it is hidden.
Open the PDF to be corrected, then choose Document->OCR Corrections->Start.

The example shows a scanned page in which the text added by the OCR process is hidden.

Since OCR mode could cause a PDF to be substantially changed, you will be asked to confirm your choice.
Always make sure you have a backup of the PDF before you start this mode.
Choose the “Start OCR mode” option to begin.

The hidden text becomes visible, and the scanned image is faded and locked to make editing easier. You can now edit the text whilst making reference to the original content in the image.

After all corrections have been done, choose: Document->OCR Corrections->Finish
The OCR text, including any edits you made, will become invisible, and the scanned image will be restored to its normal density.

If you find some unwanted text remains visible in your document, choose Document->OCR Corrections->Hide all text, which will make all text invisible even if it wasn't originally invisible. Since this operation cannot be undone, please ensure you save a copy of your document first.


Notes
• If your document happened to contain any non-OCR text added after the scanning process, this too will be hidden at the end of the correction process.
• Choose View->Text Boundaries to see the boundaries between different blocks of text.
• Changing the colour of the OCR text can make it easier to distinguish from the background image. This will not affect the finished PDF.
• Some OCR packages create many small text blocks that are difficult to edit. Use the "Rebuilding text boxes" feature (page 94) in Infix Pro to merge disjoint blocks of text into a single, editable text block.


The "non-OCR text added" in the first note is obviously your translation, which clinches it.
This should solve your problem.



José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 16:33
English to Portuguese
Why PDF Jan 20, 2016

Dan Lucas wrote:

I think I must be missing something here. If you have to OCR the original PDF file anyway, I don't understand why you need to keep it as PDF all the way through instead of using other formats and converting to PDF at the end.

My default approach would be: Client -> PDF -> OCR -> Word/.docx source -> Studio -> Word/.docx target -> PDF -> Client


Dan, you are missing DTP here.

MS Word, as its name implies, is a word processor. You start on page 1 and keep filling pages with text, the way a sausage is made by filling the casing from one end. Though Word has gained several million bells and whistles over time, its original paradigm is still the typewriter using a paper roll, like those formerly used in telex machines. The computer has allowed cutting & splicing the paper.

Evidence is that you can't create a Word file starting on page 26, then add a few things to page 60, then make page 1, and so on.

The first widespread DTP app was PageMaker, now superseded by InDesign. Its paradigm/predecessor is the paste-up art studio. DTP is a full-featured paste-up studio, including all the usual external vendors of those days... in virtual reality. Among these external vendors is a word processor, which in PageMaker is clearly separated from the art studio and invoked with Ctrl+E.

DTP can handle complex layouts, which a word processor was never intended to do. Some translators insist on doing DTP with Word, which is as easy as tearing down a brick wall with a plastic spoon.

Evidence is that if Word were a DTP app, Microsoft would have phased out their lame MS Publisher (a very bad one, but still a DTP app) many years ago.

Most word processors can exchange standard files. DTP apps can't. For instance, InDesign, QuarkXPress, and FrameMaker can't use each other's files. I have seen a few attempted "converters" for these proprietary-format files now and then, but none of them did the job.

PDF is the ultimate common ground of all programs capable of printing to a PostScript printer. Otherwise every translator would need to own, and know how to operate, a bunch of DTP apps. Each of them (I mean only the good ones) has a high three-digit price tag and a steep learning curve.

I see that several CAT tools can trespass into DTP files for translation, without the translator having to own/use the corresponding DTP app. However, we all know that text often swells or shrinks in translation, so a DTP operator will have to do some post-translation "surgery" to fix the layout. Quite often, this operator will be unfamiliar with the target language, and occasionally with the character set it uses as well.

On the other hand, a translator is not a graphic artist. S/he'll be asked to translate a finished publication with as little layout disruption as possible.

When I first saw Infix (the original, non-Pro version), it was a PDF editor: a good and quite affordable solution for translators relatively skilled in DTP to fix the layout in the PDF after translating with CAT tools that can trespass into such files. At least these translators are fully familiar with the target language and the character set used.

So I (and possibly other translators; I wouldn't know) gave Iceni the idea of developing a translation solution within Infix. They did it and named it Infix Pro. I eventually became a beta tester, so I know they test it with Trados and DejaVu (though I use WordFast Classic).

A while ago I developed a walk-through of the PDF translation process (using WFC), and published it at http://www.lamensdorf.com.br/translating-a-pdf.html .



Dan Lucas  Identity Verified
United Kingdom
Local time: 18:33
Member (2014)
Japanese to English
Nostalgic Jan 20, 2016

José Henrique Lamensdorf wrote:
Dan, you are missing DTP here.

Indeed, that was not stated by the OP.

Like you, I cut my DTP teeth on PageMaker, which I first used on my university department's brand-new and screamingly fast Apple IIx; I guess this would have been back in late 1988.

Back then it was quite exciting: the whole idea that something useful could be done on these little boxes. I used to proof little newsletters on the LaserWriter and use the university's gigantic film typesetter to print them! Amazing at the time.

I even dabbled in writing raw PostScript, but it's a very verbose language and I didn't enjoy it much. Now DTP is all rather boring and mundane, but fortunately I don't have to get involved any more.

Interesting comments about Infix, though. If I ever need industrial-strength PDF processing, that is where I will take myself.

Dan



José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 16:33
English to Portuguese
More flashbacks Jan 20, 2016

Dan, PageMaker didn't run on the Apple II. On that machine I used MultiScribe, which achieved hi-res output on paper with dot-matrix printers by running the print head four times per line in graphics mode. I completely wore out one Epson FX-286 DMP print head in less than a year using it.

Then I bought a Brazilian Rima DMP, which had three half-inch-diameter stainless steel rods where the Epson had a couple of plastic strips. Indestructible! That Rima printer is still here, stashed away, and was working as new the last time I tried it. It's as noisy as eviscerating live geese.

PageMaker started out on the Macintosh. Its version 3 was developed for the PC too, running under the iconless Windows 2.01. I have been using it ever since, one upgrade after another. I stopped at v6.52, which I consider better than v7. It still enables me to rebuild publications from properly scanned PDFs very quickly.

I tried its "son", InDesign, and it felt like using a B-737 jet to drive the kids to school or to go shopping at the grocery store. Massive overkill for DTP-enabled translators, though a superb tool for graphic designers.

Infix is excellent for translating & layout-adjusting live/distilled/editable PDFs.

When I have to translate a scanned PDF, for me specifically, it's faster to rebuild the entire pub using PageMaker. (Of course, OCR with OmniPage, translation using WordFast, and only then doing DTP.)



Dan Lucas  Identity Verified
United Kingdom
Local time: 18:33
Member (2014)
Japanese to English
Whoops Jan 20, 2016

José Henrique Lamensdorf wrote:
Dan, PageMaker didn't run on the Apple II.

Sorry, that was badly phrased - I was using an Apple Macintosh IIx. That was a nice machine.

A year or so later the department got a IIfx, which was even faster. The student computer room only had the little 9" Macintoshes. At home I used an Amiga, which was an interesting piece of kit with a really nice multitasking OS, and a Star Micronics dot-matrix.

Dan



Andrey Korobeinikov  Identity Verified
Russian Federation
Local time: 02:33
English to Russian
TOPIC STARTER
Doc gets messed up when choosing OCR - Editable Text Jan 20, 2016

Stepan, José Henrique, thanks for the valuable comments.

It is surely of no practical use to have the Russian invisible and only searchable. I need the English hidden and the Russian visible in the final file.

Stepan Konev wrote:
You need to invoke 'Recognise Text (OCR)' window in Infix and choose 'Editable Text' option for PDF style.


As far as I understand, when I do so Infix tries to OCR my document. When I choose this option in Infix, my whole document gets completely messed up. Well, this is to be expected, since Infix tries to find text where there is none...

What am I missing? How can I make the hidden text visible in the final PDF and the original hidden or removed? OCR Corrections doesn't help, since the translation becomes invisible again once I finish.

Regards



Andrey Korobeinikov  Identity Verified
Russian Federation
Local time: 02:33
English to Russian
TOPIC STARTER
It is all about Finereader Feb 22, 2016

The solution turned out to be quite simple. It was FineReader that placed the text under the image. Once I changed that setting, everything became normal. The setting is at Options -> Save -> PDF -> Save mode: select "Text over the page image" (I am not sure about the exact tab names, because I am using the Russian version of the software).
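For anyone wondering what "hidden text" means inside the file: text on a PDF page can be present but not visible in two ways. It can be drawn with text render mode 3 (the `3 Tr` operator, which neither fills nor strokes the glyphs), or it can be painted before an opaque image that then covers it, because PDF paints page content in order and later marks cover earlier ones. This thread doesn't say which of these FineReader's "under the page image" mode uses. The toy sketch below is only an illustration under loudly stated assumptions: the content stream is already decompressed, tokens are separated by whitespace, and every `Do` paints an opaque full-page scan. `classify_text_objects` is a hypothetical helper, not part of FineReader, Infix or any library.

```python
def classify_text_objects(content: str) -> list[str]:
    """Label each text-showing operation in a toy PDF content stream.

    Hypothetical assumptions: the stream is decompressed, tokens are
    whitespace-separated (no spaces inside strings), and every `Do`
    paints an opaque, full-page scanned image.
    """
    tokens = content.split()
    mode_stack = [0]  # text render mode (Tr) is part of the graphics state; 0 = fill
    events = []       # ("image", None) or ("text", render_mode), in painting order

    for i, tok in enumerate(tokens):
        if tok == "q":                               # save graphics state
            mode_stack.append(mode_stack[-1])
        elif tok == "Q" and len(mode_stack) > 1:     # restore graphics state
            mode_stack.pop()
        elif tok == "Tr" and i > 0:                  # operand precedes operator: "3 Tr"
            mode_stack[-1] = int(tokens[i - 1])
        elif tok == "Do":                            # paint an XObject (assumed: the scan)
            events.append(("image", None))
        elif tok in ("Tj", "TJ"):                    # show text using the current mode
            events.append(("text", mode_stack[-1]))

    labels = []
    for idx, (kind, mode) in enumerate(events):
        if kind != "text":
            continue
        if mode == 3:
            labels.append("invisible")               # searchable, but never painted
        elif any(k == "image" for k, _ in events[idx + 1:]):
            labels.append("under image")             # painted, then covered by the scan
        else:
            labels.append("visible")                 # painted on top of the scan
    return labels


if __name__ == "__main__":
    scan = "q 595 0 0 842 0 0 cm /Im0 Do Q"
    text = "BT /F1 12 Tf 72 720 Td (Privet) Tj ET"
    print(classify_text_objects(scan + " BT 3 Tr 72 720 Td (Hello) Tj ET"))  # ['invisible']
    print(classify_text_objects(text + " " + scan))                          # ['under image']
    print(classify_text_objects(scan + " " + text))                          # ['visible']
```

Conceptually, switching to "Text over the page image" moves the translated text from one of the first two cases into the third, which is why Infix then shows it. Real files are usually Flate-compressed and may nest form XObjects, which this sketch ignores.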
