how to analyse Word documents from web pages with lots of unnecessary text
Thread poster: Jenny Duthie

Jenny Duthie  Identity Verified
France
Local time: 02:38
French to English
Oct 22, 2009

Help! A customer has sent me 41 Word documents to translate for his customer's website. The problem is that the Trados (2007) analysis I carried out came up with a much higher word count than it should, because the text to be translated is only part of the text in the documents. To explain, here is a sample:

Composition



GAOS® *


   Vitamine C naturelle :
30 mg

In this sample, the only words to be translated are "Composition" in the 2nd line and "Vitamine C naturelle" in the 2nd last line. In the original Word documents the text to be translated is in red, which does help a bit! But all of the documents look like this, they're all several pages long, and it looks like I'll have to delete all of the unnecessary text in order to come up with the correct word count for the analysis. Is there any way I can carry out the Trados analysis on the red text only (the text to be translated), or do I have to do it the slow way?! He wants the analysis this afternoon. I've explained the problem, but he can't send the docs in another format!!

If anyone can think of a quick way to either copy and paste the red text into a new doc, or open the docs in some other way that would help, please let me know. Thanks very much!! I tried opening the docs as HTML or XML but it hasn't made a difference. Maybe in Excel?



Marinus Vesseur  Identity Verified
Canada
Local time: 17:38
English to Dutch
+ ...
Try TagEditor Oct 22, 2009

Have you tried TagEditor? It may recognize the tags for what they are and protect them. Once you have opened the Word file in TagEditor, it will create a TTX file. Try to analyse that file. If that doesn't work, you may have to change the tag settings in TagEditor, but that is for someone else to explain.


Hynek Palatin  Identity Verified
Czech Republic
Local time: 02:38
English to Czech
+ ...
Non-translatable style Oct 22, 2009

First, your customer should send you the file(s) in the original HTML format, not just copied to Word. Then you would be able to analyze them and translate easily in TagEditor.

Marinus's advice will not work, I am afraid. TagEditor will not recognize the tags and will treat them as regular text.

What you can do is apply the non-translatable style to the non-translatable text. Trados will then ignore it and count (and translate) only the translatable part.

Please contact me privately if you need help with this. Unfortunately, I don't have time to explain it right now, but I can do it very quickly.



tectranslate ITS GmbH
Local time: 02:38
German
+ ...
Save the Word files as HTML Oct 22, 2009

Why don't you just save the Word files as text files and rename them so they have an .HTML suffix? That should do the trick just fine, because then you can translate them in TagEditor with the standard HTML settings that come with Trados.

Or do the files actually contain regular Word formatting as well?
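
With 41 documents, the renaming step is tedious to do by hand. Below is a minimal Python sketch of it, assuming the Word files have already been saved as plain text (.txt) into a single folder. The function name `rename_txt_to_html` and the folder layout are illustrative assumptions; nothing here is part of Trados or Word.

```python
from pathlib import Path


def rename_txt_to_html(folder):
    """Rename every .txt file in `folder` to .html and return the new paths.

    Hypothetical helper: it only changes the file extension, it does not
    convert or inspect the file content in any way.
    """
    renamed = []
    for txt in sorted(Path(folder).glob("*.txt")):
        target = txt.with_suffix(".html")
        if target.exists():
            # Never overwrite an existing file; leave the .txt untouched.
            continue
        txt.rename(target)
        renamed.append(target)
    return renamed
```

Calling `rename_txt_to_html("exported")` on a folder of exported text files (the folder name is made up) would leave any other files in that folder alone. Work on a copy of the files, since the rename happens in place.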

HTH,

Benjamin



Andreas Nieckele  Identity Verified
Brazil
Local time: 22:38
English to Portuguese
Broken layout Oct 22, 2009

Dear Jenny,

Unfortunately I cannot provide any advice on this situation, but I think the code you pasted has seriously affected the layout of this page. I recommend editing your post. Besides, the format of the tags is not entirely clear since they probably got messed up.

Well, maybe I can offer some advice after all: isn't it possible to perform a search and replace in Microsoft Word, using wildcards (*, ?, etc.) or regular expressions, to get rid of all the tags? Since they all probably start with "", it should be easy to replace "" with nothing.
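
The same idea can be tried outside Word on a plain-text export of a document. The sketch below is only an illustration and rests on one assumption the thread cannot confirm, because the forum stripped the pasted code: that the tags are HTML-style and delimited by angle brackets. The sample string and the names `TAG`, `strip_tags` and `count_words` are invented for the example.

```python
import re

# Assumption: tags are HTML-style, i.e. they start with "<" and end with ">".
TAG = re.compile(r"<[^<>]*>")


def strip_tags(text):
    """Replace every tag with a space so neighbouring words stay separate."""
    return TAG.sub(" ", text)


def count_words(text):
    """Count the whitespace-separated tokens left once the tags are gone."""
    return len(strip_tags(text).split())


# Invented sample, only to show the idea; the real files may look different.
sample = "<p><b>Composition</b></p><p>Vitamine C naturelle</p>"
print(strip_tags(sample).split())
print(count_words(sample))
```

This gives a rough figure only. It counts every token that is not a tag, including text that does not need translating, it splits a word if a tag sits in the middle of it, and Trados applies its own counting rules.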

[Edited at 2009-10-22 11:44 GMT]



Jenny Duthie  Identity Verified
France
Local time: 02:38
French to English
TOPIC STARTER
reply to Benjamin's suggestion Oct 22, 2009

tectranslate wrote:

Why don't you just save the Word files as text files and rename them so they have an .HTML suffix? That should do the trick just fine, because then you can translate them in TagEditor with the standard HTML settings that come with Trados.

Or do the files actually contain regular Word formatting as well?

HTH,

Benjamin


Thanks Benjamin, I did try that but it didn't seem to work. And no, they don't have regular Word formatting; it is obvious they have been converted from HTML, as they look rather odd!
I will try this again...



Hynek Palatin  Identity Verified
Czech Republic
Local time: 02:38
English to Czech
+ ...
How to apply the non-translatable style Oct 22, 2009

1. Create a backup copy of the file.
2. Open the file in Word.
3. Run the sAddTagStyles macro (Alt-F8) to add the Trados styles to the document.
4. Use Find and Replace to search for all black text (i.e. the tags that should be ignored) and apply the tw4winInternal style to it.

This is just a quick and dirty how-to; I would have to see the file to make sure everything goes well.



Roy OConnor
Local time: 02:38
Member (2009)
German to English
An easy straightforward way... Oct 22, 2009

..is to replace the "garbage" with nothing. Move the cursor to a bit of "garbage", say the red text. Then click the font style menu to find out the exact colour. Then find/replace "Any character" in this colour with nothing. You will have to do it for each type of garbage text. Then save the remaining text under another name for the trados analysis.

Save the procedure as a macro, then you just call the macro up for each file.

It sounds involved, but it isn't really!
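
To find out which colours a document actually uses, and roughly how many words sit in each, the files can also be inspected outside Word. The following is a minimal sketch, assuming the documents have been saved in .docx format (older .doc files cannot be read this way) and that the colour is set directly on the text rather than inherited from a style. The function names are hypothetical; only the Python standard library is used.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def words_per_colour(document_xml):
    """Return a Counter mapping each run colour to a rough word count."""
    counts = Counter()
    for run in ET.fromstring(document_xml).iter(W + "r"):
        colour = run.find(W + "rPr/" + W + "color")
        # Runs without an explicit colour (often default black) go to "(none)".
        key = "(none)" if colour is None else colour.get(W + "val", "").upper()
        text = "".join(t.text or "" for t in run.iter(W + "t"))
        counts[key] += len(text.split())
    return counts


def words_per_colour_in_folder(folder):
    """Add up the per-colour counts of every .docx file in `folder`."""
    total = Counter()
    for path in sorted(Path(folder).glob("*.docx")):
        with zipfile.ZipFile(path) as docx:
            total += words_per_colour(docx.read("word/document.xml"))
    return total
```

Pure red normally shows up as `FF0000`, but the value depends on how the documents were produced, so check the output before relying on it. A word that Word has split across two runs is counted twice, so treat the figures as an estimate and not as a replacement for the Trados analysis.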



Jenny Duthie  Identity Verified
France
Local time: 02:38
French to English
TOPIC STARTER
Roy's suggestion Oct 22, 2009

Roy O´Connor wrote:

..is to replace the "garbage" with nothing. Move the cursor to a bit of "garbage", say the red text. Then click the font style menu to find out the exact colour. Then find/replace "Any character" in this colour with nothing. You will have to do it for each type of garbage text. Then save the remaining text under another name for the trados analysis.

Save the procedure as a macro, then you just call the macro up for each file.

It sounds involved, but it isn't really!




Thanks Roy, this does sound quite simple, but how do I "find/replace" "any character" in this colour with nothing? I can't see anything in the "find/replace" option that enables you to search text by colour... All the offending text is black, btw, and the text I need to keep is red!


Kate Chaffer
Italy
Local time: 02:38
Italian to English
How to delete black text Oct 22, 2009

Use 'find and replace' in Word.

Check 'Use wildcards'

In the 'find' part, type
Click Format
Click Character
Under 'Character colour' change the colour to the colour of the text to remove.

Leave the replace box empty (no formatting).

Click 'Replace all'.

Should work! Sorry if the terms are not exact but I'm using the Italian version of Word!
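
The mirror image of these steps, keeping the red text instead of deleting the black, can be sketched in Python. This is an illustration under stated assumptions, not a tested workflow for these particular files: it assumes the documents are saved as .docx, that the red is applied directly to the text with the value `FF0000`, and the names `text_in_colour` and `export_colour_text` are made up for the example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def text_in_colour(document_xml, keep="FF0000"):
    """Return one string per paragraph, built only from runs coloured `keep`."""
    lines = []
    for para in ET.fromstring(document_xml).iter(W + "p"):
        kept = []
        for run in para.iter(W + "r"):
            colour = run.find(W + "rPr/" + W + "color")
            if colour is not None and colour.get(W + "val", "").upper() == keep.upper():
                kept.append("".join(t.text or "" for t in run.iter(W + "t")))
        line = "".join(kept).strip()
        if line:  # skip paragraphs that contain no text in the wanted colour
            lines.append(line)
    return lines


def export_colour_text(docx_path, txt_path, keep="FF0000"):
    """Write the kept text to a UTF-8 text file and return a rough word count."""
    with zipfile.ZipFile(docx_path) as docx:
        lines = text_in_colour(docx.read("word/document.xml"), keep)
    with open(txt_path, "w", encoding="utf-8") as out:
        out.write("\n".join(lines) + "\n")
    return sum(len(line.split()) for line in lines)
```

For instance, `export_colour_text("doc1.docx", "doc1_red.txt")` (both file names are placeholders) would write only the red text to a new file, which could then be analysed in Trados. The original document is only read, never modified.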



Roy OConnor
Local time: 02:38
Member (2009)
German to English
Find/replace in Word Oct 22, 2009

Hi, Jenny,

In Word, in the extended "find/replace" dialog, you enter the "any character" wildcard ("Beliebiges Zeichen" in my German version) via the Special (Sonderformat) button, and set the colour of the garbage text via Format/Font (Format/Zeichen), also in the Find/Replace dialog. Leave the Replace section blank. Word looks for all characters in the specified colour and replaces them with nothing, i.e. it deletes them. Magic! The garbage disappears!

Rgds,
Roy

