Word doc in Tag Editor - rash of cf tags
Thread poster: Robin Thomson

Robin Thomson  Identity Verified
United Kingdom
Local time: 07:22
German to English
+ ...
Mar 10, 2011

Hello,

Currently I have a batch of Word documents to translate using Tag Editor (version 8, build 8.3.0.863, as issued with Trados 9 in early 2010).

The problem is that when I import the Word docs (Word 2007 docx) into Tag Editor, there is a huge rash of 'cf' tags - often several in each word, almost between each letter. It's impossible to type around these, and they make the TM unusable.

Viewing the tags in full form, they are almost all to do with font settings - and each sprawls almost a whole line in length. Example: "cf font="Frutiger 45 Light" size="10" complexscriptsfont="Frutiger 45 Light" complexscriptssize="10" asiantextfont="Frutiger 45 Light" fontcolor="231F20" scale="108" spacing="-13""
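
To make the example tag easier to read, here is a minimal Python sketch that splits it into a name and its attributes. The function name `parse_tag` is hypothetical and the regular expression assumes every attribute is written as name="value", as in the tag quoted above; it is not part of Tag Editor.

```python
import re

# The tag text quoted in the post above.
TAG = ('cf font="Frutiger 45 Light" size="10" '
       'complexscriptsfont="Frutiger 45 Light" complexscriptssize="10" '
       'asiantextfont="Frutiger 45 Light" fontcolor="231F20" '
       'scale="108" spacing="-13"')


def parse_tag(tag):
    """Split a tag into its name and a dict of attribute values.

    Hypothetical helper: assumes attributes look like name="value".
    """
    name, _, rest = tag.partition(" ")
    attrs = dict(re.findall(r'(\w+)="([^"]*)"', rest))
    return name, attrs


name, attrs = parse_tag(TAG)
for key, value in attrs.items():
    print(f"{key:20} {value}")
```

Listed this way, the font, size and colour values are the same ones the document uses throughout, while `scale` and `spacing` are the fine-tuning values most likely to vary from one tag to the next.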

The font in the original is indeed Frutiger 45 Light. However, the Word doc appears simple, with no obvious font changes. I admit to knowing little about tags, except that these particular ones seem to be spurious and irrelevant to any formatting etc.

So, does anybody know why these tags have appeared?
Are they necessary?
Can I get rid of them? - if so, how?

I tried translating the documents using Workbench (8.3.0.863) direct in Word 2010; the translation worked, but the cleanup failed as the 'Tag Editor tag structure' was wrong. So no further forward.

Operating system is Windows XP.

Would be grateful for any help. Hope this does not repeat other people's questions.

Thanks!
Robin Thomson


 

Emma Goldsmith  Identity Verified
Spain
Local time: 08:22
Member (2010)
Spanish to English
OCR Mar 10, 2011

These files are almost certainly OCR conversions of PDFs.
If you can, remove all formatting from the original doc file and then process with Tag Editor.
If not, use CodeZapper to do the best job possible.
HTH


 

Michal Glowacki  Identity Verified
Poland
Local time: 08:22
Member (2010)
English to Polish
+ ...
clean formatting Mar 10, 2011

As Emma rightly suggested, these tags can be removed either by removing the formatting (select the text and use the Clear All Formatting function in Word) or by running CodeZapper. Either way, the number of tags should decrease significantly.
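
As a rough illustration of why cleaning reduces the tag count, the Python sketch below models text as a list of (text, formatting) runs and merges neighbours whose formatting matches once selected attributes are ignored. This is only a sketch of the general idea, not how CodeZapper or Word actually work; `merge_runs`, the `NOISE` set and the sample runs are all assumptions made up for the example.

```python
# Attributes treated as OCR noise; this choice is an assumption.
NOISE = {"scale", "spacing"}


def merge_runs(runs):
    """Merge adjacent (text, format) runs whose formatting is
    identical once the NOISE attributes are dropped."""
    merged = []
    for text, fmt in runs:
        clean = {k: v for k, v in fmt.items() if k not in NOISE}
        if merged and merged[-1][1] == clean:
            # Same formatting as the previous run: join the text.
            merged[-1] = (merged[-1][0] + text, clean)
        else:
            merged.append((text, clean))
    return merged


# Invented sample: one word split into three runs by tiny spacing changes.
runs = [
    ("Tr", {"font": "Frutiger 45 Light", "size": "10", "spacing": "-13"}),
    ("an", {"font": "Frutiger 45 Light", "size": "10", "spacing": "-12"}),
    ("slate", {"font": "Frutiger 45 Light", "size": "10", "scale": "108"}),
]
result = merge_runs(runs)
print(len(runs), "runs before,", len(result), "after")
```

Each boundary between runs is a place where a tag would otherwise appear, so fewer runs means fewer tags inside words.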

 

Heinrich Pesch  Identity Verified
Finland
Local time: 09:22
Member (2003)
Finnish to German
+ ...
Save as txt Mar 10, 2011

You lose all formatting, but in some cases it is the easiest way.
If you translate in Word using Trados Workbench or Wordfast Classic, these tags don't matter; you don't even notice them.


 

xxxavsie  Identity Verified
Local time: 08:22
English to French
+ ...
OCR (2) Mar 11, 2011

When I use OCR software to obtain a Word document from a PDF, this kind of tag soup is very common. My usual workaround is to save the file as RTF, then re-open it and save it as DOCX again. I have found this to work quite well.
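
One way to check whether a workaround like this has helped is to count the formatting runs in each paragraph before and after. The Python sketch below does that with the standard library only. It assumes the main document part sits at word/document.xml inside the .docx, as is usual for files saved by Word; the function names and the sample XML are made up for illustration, and the count is only a rough indicator.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx files.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def runs_per_paragraph(document_xml):
    """Return the number of w:r (run) elements in each w:p paragraph."""
    root = ET.fromstring(document_xml)
    return [len(list(p.iter(W + "r"))) for p in root.iter(W + "p")]


def runs_in_docx(path):
    """Count runs per paragraph in a .docx file.

    Assumes the main part is stored as word/document.xml.
    """
    with zipfile.ZipFile(path) as z:
        return runs_per_paragraph(z.read("word/document.xml"))


# Invented sample: a fragmented paragraph followed by a clean one.
SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/'
    'wordprocessingml/2006/main"><w:body>'
    '<w:p><w:r><w:t>Tr</w:t></w:r><w:r><w:t>an</w:t></w:r>'
    '<w:r><w:t>slate</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Clean</w:t></w:r></w:p>'
    '</w:body></w:document>'
)
counts = runs_per_paragraph(SAMPLE)
print(counts)
```

Running `runs_in_docx` on the file before and after the RTF round trip, and comparing the two lists, gives a quick sense of whether the number of runs (and so of likely tags) has dropped.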

 

Robin Thomson  Identity Verified
United Kingdom
Local time: 07:22
German to English
+ ...
TOPIC STARTER
Thank you everybody for your help Mar 11, 2011

Many thanks to the people who kindly contributed to this post.

Yes, on looking closer I think these were OCR or converted PDF files, and their initial state in Word was a ghastly soup. I tried converting to RTF as Marie-Claude suggested (I notice that Workbench does this when cleaning files up), but this didn't get rid of the rogue code, or at least not enough of it. Using the 'remove formatting' function in Word was perhaps too literal: while it did get rid of a lot of the bad code, it made the files so chaotic that I had to think again. As for Heinrich Pesch's suggestion, I did try using Workbench direct in Word, and there was no immediate problem (except that it was all very slow and laboured), but Workbench couldn't clean the files up and skipped them.

Finally I acted on Emma's and Michal's advice and went for CodeZapper - a set of macros that work within Word and are specifically designed for this kind of situation. It's the best I could have hoped for - thanks to David Turner for creating it - and has brought the files to a condition in which it will be possible to work on them, although they are still a bit delicate.

All your advice gratefully received!

Robin


 

