MemoQ scrambling texts?
Thread poster: psicutrinius

Feb 10

Hello all.

I have been doing quite a long translation (over 10 MB, so no preview was available), first in Word 2007 (I asked the client to try that in the first place). I found that, whenever there was a table, memoQ did not seem to notice the cells: it went horizontally through the first line of every cell instead of finishing one cell, line by line, and then moving on to the next. Also, the page layout went at least slightly awry from the beginning (and, the drift being cumulative over 200 pages, you can imagine how the last 30 translated pages looked).

There were quite a number of these tables, so the additional work was nothing short of enormous (and, which was the real problem, meant missing the agreed delivery time). I therefore asked for the PDF (a PDF/A, thus editable) and tried to use the TM on it, but, first, the segmentation had changed completely and, second, the problem with the cells persisted.

Any solutions? At the very least, any mitigations? Plus, of course, I would love to know the reasons. If this is due to (for instance) the original being a series of cut-and-paste Word texts from different authors, as is the case for most technical manuals, possibly in different versions of the software, or to updates "grafted in" the same way, which are then converted to PDF, I would like to know how to spot it.

Thanks in advance.


DTP issue Feb 10

I had a similar problem a few months ago. I received a PDF of a brochure from which I easily (or so I thought) extracted the text to MS Word. There were tables and bullet lists that got all mixed up. Basically, I had to cut and paste to reassemble the text. I've encountered this before on a much smaller scale (2 pages). A colleague suggested that the PDF was created from a document that had been assembled using a desktop publishing program rather than from an MS Word document.


Incorrect preparation Feb 11

This sounds like a Word file made by converting a PDF file that came from some other application, not a native Word file.

If you examine the source Word 2007 file you have, can you use the arrow keys on the keyboard to continue from the end of a line in a table cell to the next line, or is each line an individual object? The latter would be the case if the file was saved from a PDF without any proper OCR process. You will get a clearer picture of the true structure of the Word file if you enable the Show All (¶) button on the ribbon to reveal hidden characters.
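If the file is too large to check comfortably by hand, the same inspection can be roughly automated. The following is only a sketch, assuming the file is a .docx (not an old .doc) and using nothing but the Python standard library; the function name inspect_docx and the choice of what to count are my own assumptions, not anything memoQ or Word provides. It counts how many table cells hold more than one paragraph, plus text boxes and framed paragraphs, which often appear in files converted from PDF. A cell with several paragraphs is not proof of a bad conversion, since genuine tables can have them too, so treat high numbers as a hint to look closer.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx files (Word 2007 and later).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def inspect_docx(path_or_file):
    """Return rough structural counts for a .docx file (heuristic only)."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    cells = list(root.iter(W + "tc"))
    # Count only paragraphs that are direct children of each cell.
    paragraphs_per_cell = [len(cell.findall(W + "p")) for cell in cells]

    return {
        "cells": len(cells),
        "multi_paragraph_cells": sum(1 for n in paragraphs_per_cell if n > 1),
        "max_paragraphs_per_cell": max(paragraphs_per_cell, default=0),
        # Text boxes and framed paragraphs often appear in converted PDFs.
        "text_boxes": sum(1 for _ in root.iter(W + "txbxContent")),
        "framed_paragraphs": sum(1 for _ in root.iter(W + "framePr")),
    }


# Hypothetical usage: python inspect_docx.py source.docx
if __name__ == "__main__" and len(sys.argv) > 1:
    for name, value in inspect_docx(sys.argv[1]).items():
        print(f"{name}: {value}")
```

If nearly every cell turns out to contain many short paragraphs, or the document is full of text boxes, that points to each visual line having been saved as its own object, which would explain why the CAT tool segments across the cells.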

Possible solutions are, if the actual source file is indeed a PDF:
- If the PDF is an object-based file, i.e. you can select the text using Acrobat Reader: Use software like Iceni Infix to produce a proper interpretation of the contents of the PDF. Infix detects the tables and creates full sentences in each cell, as it should. Once Infix has processed the PDF file, you can produce an XML file that you import into memoQ, translate, and then import back into the PDF with Infix. With some additional formatting work in Infix, the result is relatively OK.

- If the PDF is a scanned document: Use an OCR tool like ABBYY FineReader to produce a Word file that resembles the PDF document more closely, although this is still the less desirable situation and will not produce a clean final document anyway.

I hope to have helped a little bit!


