Word doc in Tag Editor - rash of cf tags
Thread poster: Robin Thomson

Robin Thomson  Identity Verified
United Kingdom
Local time: 02:20
German to English
+ ...
Mar 10, 2011

Hello,

Currently I have a batch of Word documents to translate using Tag Editor (version 8, build 8.3.0.863, as issued with Trados 9 in early 2010).

The problem is that when I import the Word docs (Word 2007 .docx) into Tag Editor, there is a huge rash of 'cf' tags - often several in each word, almost between each letter. It's impossible to type around these and they make using the TM impossible.

Viewing the tags in full form, they are almost all to do with font settings - and each sprawls almost a whole line in length. Example: "cf font="Frutiger 45 Light" size="10" complexscriptsfont="Frutiger 45 Light" complexscriptssize="10" asiantextfont="Frutiger 45 Light" fontcolor="231F20" scale="108" spacing="-13""
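
For anyone wanting to see what such a tag actually carries, here is a minimal Python sketch that splits the example tag into its attributes. The helper name parse_cf_tag is made up for illustration and is not part of Trados or Tag Editor; it only assumes the name="value" layout shown in the example.

```python
import re

def parse_cf_tag(tag: str) -> dict:
    """Return the name/value pairs of a 'cf' tag as a dict (hypothetical helper)."""
    # Each attribute is assumed to look like name="value", as in the example tag.
    return dict(re.findall(r'(\w+)="([^"]*)"', tag))

tag = ('cf font="Frutiger 45 Light" size="10" '
       'complexscriptsfont="Frutiger 45 Light" complexscriptssize="10" '
       'asiantextfont="Frutiger 45 Light" fontcolor="231F20" '
       'scale="108" spacing="-13"')

attrs = parse_cf_tag(tag)
for name, value in attrs.items():
    print(f"{name:20} {value}")
```

Laid out this way, the font name, size and colour look like ordinary settings, while scale and spacing look like fine-tuning. That these two are what changes from one tag to the next is a guess, not something established in this thread.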

The font in the original is indeed Frutiger 45 Light. However, the Word doc appears simple, with no obvious font changes. I admit to knowing little about tags, except that these particular ones seem to be spurious and irrelevant to any formatting etc.

So, does anybody know why these tags have appeared?
Are they necessary?
Can I get rid of them? - if so, how?

I tried translating the documents using Workbench (8.3.0.863) direct in Word 2010; the translation worked, but the cleanup failed as the 'Tag Editor tag structure' was wrong. So no further forward.

Operating system is Windows XP.

Would be grateful for any help. Hope this does not repeat other people's questions.

Thanks!
Robin Thomson



Emma Goldsmith  Identity Verified
Spain
Local time: 03:20
Member (2010)
Spanish to English
OCR Mar 10, 2011

These files are almost certainly OCR conversions of PDFs.
If you can, remove all formatting from the original doc file and then process with Tag Editor.
If not, use CodeZapper to do the best job possible.
HTH



Michal Glowacki  Identity Verified
Poland
Local time: 03:20
Member (2010)
English to Polish
+ ...
clean formatting Mar 10, 2011

As Emma rightly suggested, these tags can be removed either by removing formatting (you can select the text and use the "Clear All Formatting" function in Word) or with Code Zapper. Either way, the number of tags should decrease significantly.


Heinrich Pesch  Identity Verified
Finland
Local time: 04:20
Member (2003)
Finnish to German
+ ...
Save as txt Mar 10, 2011

You lose all formatting, but in some cases it is the easiest way.
If you translate in Word using Trados Workbench or Wordfast Classic, these tags don't matter; you don't even notice them.



avsie  Identity Verified
Local time: 03:20
English to French
+ ...
OCR (2) Mar 11, 2011

When I use OCR software to obtain a Word document from a PDF, this kind of tag soup is very common. My usual workaround is to save the file as RTF, then reopen it and save it as DOCX again. I have found this to work quite well.


Robin Thomson  Identity Verified
United Kingdom
Local time: 02:20
German to English
+ ...
TOPIC STARTER
Thank you everybody for your help Mar 11, 2011

Many thanks to the people who kindly contributed to this post.

Yes, on looking closer I think these were OCR or converted PDF files, and their initial state in Word was a ghastly soup. I tried converting to RTF as Marie-Claude suggested (I notice that Workbench does this when cleaning files up) but this didn't get rid of the rogue code, or at least not enough of it. Using the 'remove formatting' function in Word was perhaps too literal and, while it did get rid of a lot of the bad code, it made the files so chaotic that I had to think again. As for Heinrich Pesch's suggestion, I did try using Workbench direct in Word, and there was no immediate problem (except that it was all very slow and laboured), but Workbench couldn't clean the files up and skipped them.

Finally I acted on Emma's and Michal's advice and went for Code Zapper - a set of macros that work within Word and are specifically designed for this kind of situation. It's the best I could have hoped for - thanks to David Turner for creating it - and has brought the files to a condition in which it will be possible to work on them, although they are still a bit delicate.

All your advice gratefully received!

Robin
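
For readers who prefer a script to a macro, the kind of clean-up discussed in this thread can be roughed out in Python. This is only a sketch under stated assumptions, and it is not how Code Zapper works (Code Zapper is a set of Word macros). It assumes that the scale and spacing values in the cf tags correspond to the w:w and w:spacing run properties in the .docx's word/document.xml, and that removing them makes neighbouring runs similar enough for fewer tags to appear. The function name strip_noisy_run_properties is invented for this example.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

# Run properties assumed to be OCR noise: character spacing and character scale.
NOISY = {f"{{{W}}}spacing", f"{{{W}}}w"}

def strip_noisy_run_properties(document_xml: str) -> str:
    """Remove w:spacing and w:w from every w:rPr element (hypothetical helper).

    Paragraph-level w:spacing (inside w:pPr) is left untouched.
    """
    root = ET.fromstring(document_xml)
    for rpr in root.iter(f"{{{W}}}rPr"):
        for child in list(rpr):  # copy the list so removal is safe while looping
            if child.tag in NOISY:
                rpr.remove(child)
    return ET.tostring(root, encoding="unicode")
```

To try it on a real file you would unzip the .docx, pass the contents of word/document.xml through the function, and zip the result back up, always working on a copy. Font, size and colour are deliberately kept, so genuine formatting survives.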



