Tag Overload! Word document returns a great amount of tags...
Thread poster: Kjersti Farrier

Kjersti Farrier  Identity Verified
United Kingdom
Local time: 04:36
English to Norwegian (Bokmal)
+ ...
May 22, 2010

... when I try to translate it in TRADOS.
The customer wants a bilingual .ttx and the translated Word document in return. Since I use TRADOS Studio 2009, I convert the Word doc to .ttx (in Trados Workbench) and then open it in Studio 2009. I have done this lots of times without any issues.
However, this one Word document is rather curious. There are literally tags between each letter, showing the font, size, etc. Even if I just open the file with TagEditor (which I am not all that familiar with, but I know that this is not right), it shows all these tags.


This document

- This is an example from TagEditor. If you cannot quite make it out, it actually only says: This document...

So a 3,000-word document turns into over 10,000 words in Trados... Not quite as it should be, in other words. I have just tried to open other documents the same way (to make sure that it is not some setting which I have changed), but there is no such problem. I even opened other documents from the same customer (in case there was something odd with the characters, since the client is in China), but those documents are fine. Just this one...

One more thing: I even copied a short part of the document into a new Word document, changed the font type, saved it and opened it in TagEditor. Tag overload again!

Any sort of help would be highly useful, this is a real showstopper for me, I can hardly read what the segments say.

Many many thanks!
Kjersti


 

Jerzy Czopik  Identity Verified
Germany
Local time: 05:36
Member (2003)
Polish to German
+ ...
Word formatting is very bad May 22, 2010

Here is the text you wanted to show:
<paragraph style="Normal" font="Calibri" size="11" asiantextfont="Calibri" complexscriptsfont="Calibri" complexscriptssize="11">
<cf font="Verdana" size="8" complexscriptsfont="Verdana" complexscriptssize="8" spacing="-0.05">Thi</cf><cf font="Verdana" size="8" complexscriptsfont="Verdana" complexscriptssize="8">s</cf><cf font="Verdana" size="8" complexscriptsfont="Verdana" complexscriptssize="8" spacing="0.1"> </cf><cf font="Verdana" size="8" complexscriptsfont="Verdana" complexscriptssize="8">d</cf><cf font="Verdana" size="8" complexscriptsfont="Verdana" complexscriptssize="8" spacing="-0.1">o</cf><cf font="Verdana" size="8" complexscriptsfont="Verdana" complexscriptssize="8">c</cf><cf font="Verdana" size="8" complexscriptsfont="Verdana" complexscriptssize="8" spacing="-0.05">u</cf><cf font="Verdana" size="8" complexscriptsfont="Verdana" complexscriptssize="8">me</cf><cf font="Verdana" size="8" complexscriptsfont="Verdana" complexscriptssize="8" spacing="-0.05">n</cf><cf font="Verdana" size="8" complexscriptsfont="Verdana" complexscriptssize="8">t </cf>


This forum interprets characters like < and > as tags and hides them, instead of showing what you see here. If you want to use them, use the entities & lt ; and & gt ; (without the spaces).
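To see how little text sits inside all that markup, here is a minimal Python sketch that strips the `cf` tags from an abbreviated copy of the sample above and counts the separately formatted runs. The helper names `strip_cf_tags` and `count_cf_runs` are made up for this illustration, and the `complexscripts...` attributes are left out of the sample only to keep it short.

```python
import re

# Matches opening <cf ...> and closing </cf> tags as they appear in the sample.
CF_TAG = re.compile(r"</?cf\b[^>]*>")
# Matches opening <cf ...> tags only, one per formatted run.
CF_OPEN = re.compile(r"<cf\b[^>]*>")


def strip_cf_tags(markup: str) -> str:
    """Return the text with every <cf> formatting tag removed."""
    return CF_TAG.sub("", markup)


def count_cf_runs(markup: str) -> int:
    """Count the opening <cf> tags, i.e. the separately formatted runs."""
    return len(CF_OPEN.findall(markup))


# Abbreviated version of the sample from the thread.
sample = (
    '<cf font="Verdana" size="8" spacing="-0.05">Thi</cf>'
    '<cf font="Verdana" size="8">s</cf>'
    '<cf font="Verdana" size="8" spacing="0.1"> </cf>'
    '<cf font="Verdana" size="8">d</cf>'
    '<cf font="Verdana" size="8" spacing="-0.1">o</cf>'
    '<cf font="Verdana" size="8">c</cf>'
    '<cf font="Verdana" size="8" spacing="-0.05">u</cf>'
    '<cf font="Verdana" size="8">me</cf>'
    '<cf font="Verdana" size="8" spacing="-0.05">n</cf>'
    '<cf font="Verdana" size="8">t </cf>'
)

print(repr(strip_cf_tags(sample)))  # the two words, plus a trailing space
print(count_cf_runs(sample))        # ten runs for two words
```

The only attribute that changes from run to run is `spacing`, which is why resetting the character spacing in Word collapses most of the tags.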

Before you open the doc in TagEditor, please open it in Word.
Select the whole text with CTRL+A, then press CTRL+D. Select Verdana as the font and do not change any other setting there. Now switch to "Character spacing" (I think that is what it is called; it is the second tab in the Font dialog), set the scale to 100% and set the spacing to "Normal".
Press OK. Your document should now have only a few tags.


 

Jerzy Czopik  Identity Verified
Germany
Local time: 05:36
Member (2003)
Polish to German
+ ...
Do not change font type May 22, 2010

Now I see the document uses different fonts, so do not set any font type. Just go to the font properties and set the scale to 100% and the spacing to Normal.
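For anyone who would rather script this than click through the dialog, here is a rough sketch. It assumes the file is saved as .docx (the thread does not say whether it was .doc or .docx). In that format the character spacing and the scale from the Font dialog are stored as `w:spacing` and `w:w` elements inside each run's `w:rPr`. The function name `reset_run_spacing` is made up for this example, and it works on an XML string rather than on a real file.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

# Run properties for character spacing and for width scaling.
RUN_SPACING_TAGS = {f"{{{W}}}spacing", f"{{{W}}}w"}


def reset_run_spacing(xml_text: str) -> str:
    """Remove character spacing and scale overrides from every w:rPr.

    Paragraph spacing (w:spacing inside w:pPr) is left alone, because
    only children of w:rPr are touched.
    """
    root = ET.fromstring(xml_text)
    for rpr in list(root.iter(f"{{{W}}}rPr")):
        for child in list(rpr):
            if child.tag in RUN_SPACING_TAGS:
                rpr.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-run paragraph, similar in spirit to the sample in the thread.
sample = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:pPr><w:spacing w:after="200"/></w:pPr>'
    '<w:r><w:rPr><w:spacing w:val="-1"/><w:sz w:val="16"/></w:rPr><w:t>Thi</w:t></w:r>'
    '<w:r><w:rPr><w:w w:val="90"/><w:sz w:val="16"/></w:rPr><w:t>s</w:t></w:r>'
    '</w:p></w:body></w:document>'
)

cleaned = reset_run_spacing(sample)
print(cleaned)
```

Treat this as an illustration of where the settings live, not as a finished tool. A real `word/document.xml` declares many more namespaces, and ElementTree may rename their prefixes when it writes the XML back, which Word can object to. Doing it in the Font dialog as described above is the safer route.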

 

Kjersti Farrier  Identity Verified
United Kingdom
Local time: 04:36
English to Norwegian (Bokmal)
+ ...
TOPIC STARTER
Thank you - it worked wonders! May 22, 2010

Jarzy,
thank you very much - I appreciate your time
So it was the settings of the document then, not my Word settings?
TagEditor still shows it all very strangely. There seems to be some hidden text which shows up in TagEditor but not in Word. Is that possible?

After some further research I found that it is the text boxes in the document which TRADOS picks up, but I cannot access them in any other way. I am quite impressed that TRADOS can... am I missing something here? Clearly I am...


 

Kjersti Farrier  Identity Verified
United Kingdom
Local time: 04:36
English to Norwegian (Bokmal)
+ ...
TOPIC STARTER
It was headers... May 22, 2010

The headers in the document tricked me. For some reason (which I am trying to clarify with the customer at the moment), some of the text is in headers, although it should clearly be part of the normal text. For example, there is a list of points from 1 to 6 where number 5 is missing (because it has ended up as a header...). I didn't see them originally, and they are not part of the selected text when I correct the Character Spacing as suggested by Jerzy (sorry, I misspelled your name the first time, Jerzy), so those segments are still flooded with tags in TRADOS.

So not yet a solution, but at least the problem has been found and I am now waiting for a new document from the client.

Phew!
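
A side note on why CTRL+A in the body misses that text: assuming the file is a .docx, headers and footers are stored in their own parts inside the package (for example `word/header1.xml`), separate from `word/document.xml`. Here is a small sketch, using only the Python standard library, that lists those parts. The function names are made up for the example, and the demo package is a stand-in with placeholder content, not a valid Word file.

```python
import io
import re
import zipfile

# Header and footer parts sit next to word/document.xml in the package.
HEADER_FOOTER_PART = re.compile(r"^word/(header|footer)\d*\.xml$")


def header_footer_parts(docx_bytes: bytes) -> list[str]:
    """Return the names of the header and footer parts in a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return sorted(
            name for name in archive.namelist() if HEADER_FOOTER_PART.match(name)
        )


def build_demo_package() -> bytes:
    """Build an in-memory zip that only mimics the part layout of a .docx."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("word/document.xml", "word/header1.xml", "word/footer1.xml"):
            archive.writestr(name, "<placeholder/>")
    return buffer.getvalue()


demo = build_demo_package()
print(header_footer_parts(demo))
```

If a real document lists header parts here, their formatting has to be cleaned separately, for instance by double-clicking into the header in Word and repeating the CTRL+A, CTRL+D steps there.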


 

Jerzy Czopik  Identity Verified
Germany
Local time: 05:36
Member (2003)
Polish to German
+ ...
Same old story May 22, 2010

That's all down to Word, or rather to the Word user.
If you install Word out of the box, in its wisdom it guesses what the user wants to do and assigns styles without asking. The user then sees that his formatting has changed for no obvious reason, so he calls Word the most stupid software in the world and reformats the text. But he leaves the style...
It's mostly as simple as that: 90% of the Word documents we get are poorly formatted.
If you have access to the source file, you can still reformat it before you start. Otherwise you're left with all those problems in the CAT tool.
I hope CAT producers will follow SDL's example and write filters for all formats which remove such ridiculous, superfluous formatting codes. For the time being, the InDesign INX filter in Studio does that.


 

Paul Tindall  Identity Verified
Local time: 04:36
French to English
Unnecessary tags in converted Word document: complexscriptsfont Dec 17, 2011

Hi, I have a similar problem. I was able to clear up the spacing tags easily enough because I could see from the tag text what they referred to, but I also have a lot of tags relating to "complexscriptsfont" and "complexscriptssize". I don't understand what these are or how to get rid of them, so I would be grateful for any help!

Thanks


 

Jerzy Czopik  Identity Verified
Germany
Local time: 05:36
Member (2003)
Polish to German
+ ...
Tried to tidy up the formatting in Word? Dec 17, 2011

First make sure the styles match the local formatting.
Check the Normal style and make sure the font there is the same as the one chosen in the document.
Then press CTRL+A, followed by CTRL+D.
Select the font to be used in the document and press OK.
Press CTRL+A and CTRL+D once again. Switch to the Character spacing tab and set Scale to 100% and Spacing to Normal. Then deselect "Kerning" and press OK.
Save the file. This should already remove 80% of the unnecessary tags.
To get rid of complexscriptsfont completely, you can try opening the file in OpenOffice and saving it back as .doc.
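To check which setting is still splitting the text after these steps, it can help to see which attributes actually differ from one `cf` tag to the next. Here is a minimal Python sketch, assuming the markup looks like the TagEditor sample posted earlier in the thread. The function name `varying_attributes` and the sample string are hypothetical.

```python
import re

# Opening <cf ...> tags and the name="value" pairs inside them.
CF_OPEN = re.compile(r"<cf\b([^>]*)>")
ATTRIBUTE = re.compile(r'([\w-]+)="([^"]*)"')


def varying_attributes(markup: str) -> dict[str, set[str]]:
    """Map each attribute whose value differs between <cf> runs to the values seen.

    A run that lacks an attribute is recorded as "(absent)" for that attribute.
    """
    runs = [dict(ATTRIBUTE.findall(m.group(1))) for m in CF_OPEN.finditer(markup)]
    names = {name for run in runs for name in run}
    seen = {name: {run.get(name, "(absent)") for run in runs} for name in names}
    return {name: values for name, values in seen.items() if len(values) > 1}


# Hypothetical markup: font and size are uniform, complexscriptsfont is not.
sample = (
    '<cf font="Arial" size="10" complexscriptsfont="Arial">Un</cf>'
    '<cf font="Arial" size="10" complexscriptsfont="Times New Roman">necessary </cf>'
    '<cf font="Arial" size="10" complexscriptsfont="Arial">tags</cf>'
)

print(varying_attributes(sample))
```

Any attribute that shows up in the result is one that still varies between runs, so it is a candidate for the clean-up steps above or for the OpenOffice round trip.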


 

Paul Tindall  Identity Verified
Local time: 04:36
French to English
Thanks! Dec 17, 2011

Many thanks Jerzy

 

