MemoQ scrambling texts?
Thread poster: psicutrinius

psicutrinius  Identity Verified
Spain
Local time: 20:42
Member (2008)
English to Spanish
+ ...
Feb 10

Hello all.

I have been working on quite a long translation (over 10 MB, so no preview was available), first in Word 2007 (I had asked the client for that format in the first place). I found that, whenever there was a table, memoQ did not seem to recognize the cells: it segmented horizontally, taking the first line of every cell across the row, instead of finishing one cell line by line and then moving on to the next. Also, the page layout went at least slightly awry from the beginning (and, since the drift is cumulative over 200 pages, you can imagine what the last 30 translated pages looked like).

There were quite a number of these tables, so the additional work was nothing short of enormous (and, which was the real problem, meant overrunning the agreed delivery time). I therefore asked for the PDF (a PDF/A, thus editable) and tried to use the TM on it, but, first, the segmentation had changed completely and, second, the problem with the cells persisted.

Any solutions? Or, at the very least, any mitigations? Plus, of course, I would love to know the reasons. If this is due to (for instance) the original being a series of cut-and-pasted Word texts from different authors, as is the case for most technical manuals, possibly in different versions of the software, or with updates "grafted in" the same way, which are then converted to PDF, I would like to know how to spot it.

Thanks in advance.



Kevin Fulton  Identity Verified
United States
Local time: 14:42
German to English
DTP issue Feb 10

I had a similar problem a few months ago. I received a PDF of a brochure from which I easily (or so I thought) extracted the text to MS Word. There were tables and bullet lists that got all mixed up. Basically, I had to cut and paste to reassemble the text. I've encountered this before on a much smaller scale (2 pages). A colleague suggested that the PDF was created from a document that had been assembled using a desktop publishing program rather than from an MS Word document.


Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 20:42
Member (2005)
English to Spanish
+ ...
Incorrect preparation Feb 11

This sounds like a Word file made by saving a PDF file coming from some other application, not a native Word file.

If you examine the source Word 2007 file you have, can you use the arrow keys on the keyboard to continue from the end of a line in a table cell to the next line, or is each line an individual object? The latter would be the case if the file was saved from a PDF without any proper OCR process. You will get a clearer picture of the true structure of the Word file if you enable the Show All (¶) button in the ribbon to reveal hidden characters.
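For a 200-page file, checking by hand can be slow, so here is a rough sketch of the same check done automatically. It assumes the file is a .docx (Word 2007 format), which is a ZIP archive containing `word/document.xml`; the function name `summarize_docx_structure` is made up for this illustration. It only counts structural elements, so the interpretation is a rule of thumb, not a diagnosis: very few real tables or cells combined with many text boxes or framed paragraphs suggests a file converted from PDF.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def summarize_docx_structure(docx):
    """Count structural elements in a .docx (file path or file-like object).

    Hypothetical helper for illustration; it reads only the main
    document part and ignores headers, footers, and footnotes.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    def count(tag):
        # Count every element with this local name in the w: namespace
        return sum(1 for _ in root.iter(f"{{{W}}}{tag}"))

    return {
        "paragraphs": count("p"),
        "tables": count("tbl"),              # real Word tables
        "table_cells": count("tc"),          # cells inside those tables
        "text_boxes": count("txbxContent"),  # floating text boxes
        "framed_paragraphs": count("framePr"),  # paragraphs placed in frames
    }
```

Calling `summarize_docx_structure("manual.docx")` (the file name is just an example) returns a dictionary of counts. If the document visibly contains dozens of tables but `tables` comes back as 0 or close to it, the "tables" are probably just positioned text, which would explain the row-by-row segmentation.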

Possible solutions are, if the actual source file is indeed a PDF:
- If the PDF is an object-based file, i.e. you can select the text using Acrobat Reader: Use software like Iceni Infix to produce a proper interpretation of the contents of the PDF. Infix detects the tables and creates full sentences in each cell, as it should be. Once Infix has processed the PDF file, you can produce an XML file you can import into memoQ, translate, and then reimport with Infix into the PDF again. With some additional formatting work in Infix, the result is relatively OK.

- If the PDF is a scanned document: Use an OCR tool like ABBYY FineReader to produce a Word file that resembles the PDF document more closely, although this is still the less desirable situation and will not produce a clean final document anyway.

I hope this helps a little bit!



