Text from source Word document missing in sdlxliff file
Thread poster: Katarzyna Slowikova

Katarzyna Slowikova
Germany
English to Czech
Jun 20, 2014

Hi all Trados victims,
I have Trados Studio 2011.
I've just had a situation where multiple chunks of text in the source Word document weren't incorporated into the sdlxliff file and were therefore also missing from the exported target Word document.
This occurred twice with the same file: I had to create 2 projects with the same source file (because of the famous hyperlinks bug in the first one, which made the export impossible) and I can see the same parts missing in both sdlxliff files.
The source Word document was generated by OCR from a pdf file.
The missing parts didn't have any special format or tags or anything that would visually distinguish them in the source doc file or pdf.
Has anybody had a similar problem? I searched the forum and haven't found anything.
Does anybody know the cause and a way to prevent it in future?
Thanks,
Katarzyna


 

Natalie
Poland
Member (2002)
English to Russian

Moderator of this forum
Text boxes? Jun 21, 2014

OCR software "loves" creating text boxes. Maybe this is the case? Text boxes from OCR are often omitted by CAT tools.
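
For anyone who wants to check this before creating a project, here is a minimal sketch. It assumes the file is saved as .docx (not the older .doc), and the function name is made up for illustration. A .docx is a zip archive whose main text sits in word/document.xml, and text box content is stored there inside w:txbxContent elements. Headers and footers are not checked.

```python
import zipfile

def has_text_boxes(docx_file):
    """Return True if the main document part contains any text box content.

    docx_file may be a path or a file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # Word may store the same text box twice (a drawing plus a fallback),
    # so this reports presence only rather than a count.
    return "<w:txbxContent" in xml
```

If this returns True, it is worth looking through the file in Word for framed or floating text before importing it into the CAT tool.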

 

Katarzyna Slowikova
Germany
English to Czech
TOPIC STARTER
You're most likely right Jun 21, 2014

I'm not sure what exactly "text boxes" are, but if, as I suppose, a table of contents is one kind of them, you have hit the nail on the head.
I had the files checked yesterday by a translator who knows a lot about all those SW tricks and that's what he found out.
The missing parts of the text had been formatted as a table of contents by the OCR software, referring to some accidental bits of text in the document. That's why there was only "TOC" in those places, which I hadn't noticed before. Trados doesn't load those parts into the sdlxliff, assuming you'll update the field in the exported document. After I did this, some nonsense bits of translated text appeared.
So actually the OCR-generated Word file was translated correctly; it just wasn't the best idea on my part to use this file to set up the project (when I had a pdf and DPT to choose from).

You probably know all this but maybe someone else with a similar problem will find it useful.

Still, you have all my respect for guessing it just like this! :)

Have a nice weekend,
Katarina

PS. To be sure, it wasn't Trados's fault here, so I take back those "Trados victims" from my initial post. ;)

[Edited at 2014-06-21 14:25 GMT]


 

Natalie
Poland
Member (2002)
English to Russian

Moderator of this forum
TOC Jun 21, 2014

should not be translated in Trados at all: it is an automatically generated table of contents. After you translate the entire file and convert it back to Word, you should simply update the TOC there. In most newer Word versions this option is readily available via the right mouse button: right-click on the "TOC" field and choose the "Update Field" option.

 

Katarzyna Slowikova
Germany
English to Czech
TOPIC STARTER
Exactly Jun 21, 2014

However, the problem was that those bits of text weren't in fact tables of contents (and therefore should have been translated).
So OCR messed it up here.
It was a file sent by the client, which they told me I was free to use, so I thought I could trust it.
I knew those files are annoying due to all those tags in each segment, but didn't know the same tags could have such dire consequences.

I'll never again use an OCR-generated doc file with Trados!


 

Miguel Carmona
United States
English to Spanish
... Jun 21, 2014

It is incredible how some programs, in this case Trados and the OCR software, seem to conspire and make criminal partnerships to kill our productivity when they should be doing exactly the opposite.

 

LEXpert
United States
Member (2008)
Croatian to English
CONTROL+SHIFT+F9 to fix in Word before Trados. Jun 22, 2014

Katarzyna Slowikova wrote:

I'll never again use an OCR-generated doc file with Trados!


Don't know if I'd go that far... What you describe is not a terribly uncommon occurrence/annoyance with OCR'd files. If you expand the TOC (if necessary), select the text, and press CONTROL+SHIFT+F9, this will unlink the TOC field and leave the plain text, which CAT tools will then handle normally. Of course, you have to inspect the Word file beforehand, but that should be a habit with any customer-provided files, especially if they may have been OCR'd.
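
For that inspection step, a rough sketch of how one could scan a file for TOC fields without opening Word. It assumes the file is saved as .docx (not the older .doc); the function names are made up for illustration. A .docx is a zip archive whose main text sits in word/document.xml, and field codes show up there either as w:instrText elements or as the w:instr attribute of w:fldSimple. The sketch only looks at the main document part and could miss an instruction that Word has split across several runs.

```python
import re
import zipfile

# Complex fields keep their instruction in <w:instrText>;
# simple fields keep it in the w:instr attribute of <w:fldSimple>.
INSTR_TEXT = re.compile(r"<w:instrText[^>]*>([^<]*)</w:instrText>")
SIMPLE_FIELD = re.compile(r'<w:fldSimple[^>]*\bw:instr="([^"]*)"')

def find_field_instructions(docx_file):
    """Return the field instruction strings found in word/document.xml."""
    with zipfile.ZipFile(docx_file) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    found = INSTR_TEXT.findall(xml) + SIMPLE_FIELD.findall(xml)
    return [instr.strip() for instr in found]

def has_toc_field(docx_file):
    """True if any field instruction starts with the TOC keyword."""
    return any(
        instr.upper().startswith("TOC")
        for instr in find_field_instructions(docx_file)
    )
```

If has_toc_field("source.docx") (a placeholder file name) returns True for a document that should not have a table of contents, that is the cue to open it in Word and unlink the field with CONTROL+SHIFT+F9 before creating the project.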


[Edited at 2014-06-22 13:32 GMT]


 

Katarzyna Slowikova
Germany
English to Czech
TOPIC STARTER
no idea how "to summarize my point" Jun 22, 2014

@Rudolf: Now I know this one trick, but who knows what it will do next time... As we say in Czech, "a joke said twice is not a joke anymore" ;)

If I have the DPT file, I'll always use that, though I must admit I've never used them so far, so I don't know whether there are some other specific tricks involved.

I'm somewhat in doubt about what's worse: making a project with a pdf (and therefore using Trados to OCR the file) or using an OCR'd Word file (though I'm sure I wouldn't use my ABBYY OCR, it always returns more or less salad). I get those files quite often (and Trados is a requirement), so it's not just a hypothetical dilemma for me.

@Miguel: Yes, it sucks that we have to do so many things when we're paid for translating only. If the work comes through an agency (as it did in this case), I'd at least expect them to deliver a workable file, e.g. an already cleaned OCR-ed doc. Seems like there's still an awful lot of them who make a living just from forwarding our work to clients. :///
But that's a never-ending discussion, let's leave it here...

Thanks for your thoughts, I was initially afraid this thread would become one of those with xxxx views and 0 answers. :)

[Edited at 2014-06-22 12:07 GMT]


 

