How to force Trados 2006 to always save HTML files in UTF-8?
Thread poster: Atso Puronen
Atso Puronen
English to Finnish
Jul 25, 2008

Hi all,

We often receive HTML code snippets from our client, and they seldom have a META charset declaration. We know the customer wants the translations back in UTF-8, but telling this to TagEditor is another thing.

TagEditor must be interpreting the English source files as US-ASCII, and when they are translated into Finnish, TagEditor changes the encoding to windows-1252. How can I change this default to UTF-8?

I have set TagEditor not to add a META charset tag or change its value, but of course that doesn't prevent TagEditor from changing the actual encoding to a charset that can handle all the characters of the translated file.

We know the charset tag could be added prior to translation, but in this case the client DOES NOT WANT any tags added. We would therefore have to remove the added charset tag afterwards every time.

Is there any way to force TagEditor to save HTML and similar files in UTF-8?


Piotr Bienkowski
Poland
English to Polish
Make it utf-8 before translation Jul 25, 2008

Atso Puronen wrote:

Is there any way to force TagEditor to save HTML etc. files with UTF-8?


Are these files in UTF-8 before translation?

If not, convert them to UTF-8 before translation. It won't hurt their contents, provided that you don't use MS Word for it.

UltraEdit can handle the conversion safely, but I'm afraid it can only do it one file at a time. I'm sure there are other good (and even free) tools for encoding conversion.
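As an illustration of what such a tool does, here is a minimal Python sketch, assuming the input is windows-1252 (cp1252). Because windows-1252, US-ASCII and UTF-8 all agree on bytes 0x00 to 0x7F, a pure-ASCII file comes out byte-for-byte identical; only non-ASCII characters change. The helper name `convert_to_utf8` is hypothetical, and reading/writing raw bytes is a deliberate choice so that line endings are left alone:

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Re-encode a text file from `source_encoding` to UTF-8 without a BOM.

    Hypothetical helper; the default assumes windows-1252 (cp1252) input.
    """
    # Decode the raw bytes using the encoding the file is assumed to be in.
    # Working on bytes avoids Python's newline translation, so CRLF stays CRLF.
    text = src.read_bytes().decode(source_encoding)
    # Write the same characters back out as UTF-8; the markup itself is unchanged.
    dst.write_bytes(text.encode("utf-8"))
```

Usage would be something like `convert_to_utf8(Path("page.htm"), Path("page_utf8.htm"))`, with hypothetical file names. If the guessed source encoding is wrong, the output will contain wrong characters (or decoding will fail), so it is worth checking a sample first.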

HTH

Piotr


Daniel García
English to Spanish
Yes convert in advance but beware of the BOM Jul 25, 2008

You can use MS Word to open any text file in any text encoding and convert it to any other text encoding.

Be careful: do not open HTML files in Word as HTML. Open them as plain text.

I guess this is what Piotr meant when he recommended not using MS Word, but you should be OK if you open the files as US-ASCII.

Of course, UltraEdit is another good option.

Keep in mind the following: to make sure that TagEditor opens the source file as UTF-8, the UTF-8 file must have either:

a) A BOM

b) A tag specifying the UTF-8 encoding.

You can convert these text files from US-ASCII to UTF-8 with MS Word or UltraEdit. This will add the BOM, which ensures that TagEditor opens the files as UTF-8.
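If you would rather script this step, Python's built-in `utf-8-sig` codec writes the UTF-8 BOM (the three bytes EF BB BF) automatically. A minimal sketch, assuming US-ASCII input as described in this thread; `convert_to_utf8_with_bom` is a hypothetical name. Decoding strictly as ASCII means a file with unexpected non-ASCII bytes raises an error instead of being silently mis-converted:

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def convert_to_utf8_with_bom(src: Path, dst: Path, source_encoding: str = "ascii") -> None:
    """Re-encode `src` as UTF-8 and prefix it with a BOM so editors detect UTF-8."""
    # Strict decode: a non-ASCII byte raises UnicodeDecodeError rather than being guessed.
    text = src.read_bytes().decode(source_encoding)
    # The "utf-8-sig" codec prepends the UTF-8 BOM (EF BB BF) when encoding.
    dst.write_bytes(text.encode("utf-8-sig"))
```

For a pure-ASCII file, the output is the original bytes with those three BOM bytes in front; nothing else changes.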

Important: the target files will be in UTF-8, but TagEditor removes the BOM.

The translated UTF-8 files will have no BOM and, from what you say, no tag marking them as UTF-8. This might cause problems, depending on how they are used.
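If whatever consumes the files does turn out to need the BOM, one option is to restore it after translation. A minimal sketch with a hypothetical helper `ensure_utf8_bom`; it first checks that the file really is valid UTF-8, because a BOM in front of windows-1252 bytes would only mislabel the file:

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def ensure_utf8_bom(path: Path) -> bool:
    """Prepend a UTF-8 BOM to `path` if missing. Returns True if the file was changed."""
    data = path.read_bytes()
    if data.startswith(UTF8_BOM):
        return False  # already marked, leave the file untouched
    # Raises UnicodeDecodeError if the content is not actually UTF-8,
    # so a non-UTF-8 file is never given a misleading BOM.
    data.decode("utf-8")
    path.write_bytes(UTF8_BOM + data)
    return True
```

In Atso's case the client apparently wants no additions at all, so a BOM-less UTF-8 target may be exactly right; this is only for setups where the BOM is expected.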

TRADOS 7.5 (or TRADOS 8, I am not sure) handles this much better. There is an option in the INI files where you can specify that files should be opened as UTF-8 and saved as UTF-8, with or without the BOM.

Good luck!

Daniel


Atso Puronen
English to Finnish
TOPIC STARTER
Thanks Jul 28, 2008

OK,

so it seems I cannot do this without manipulating the files. Trados 7.5 doesn't have an option to force UTF-8 encoding in its INI settings.

We are used to using the "iconv" command-line tool instead of UltraEdit or Word. We can batch-convert files with a simple "for" loop, and it speeds things up considerably.
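The thread does not show the exact loop. Assuming it was something along the lines of `for f in *.htm; do iconv -f WINDOWS-1252 -t UTF-8 "$f" > "utf8/$f"; done`, a rough Python equivalent for machines without iconv might look like this, for example to re-encode TagEditor's windows-1252 target files. `batch_convert`, the glob pattern and the output folder are all hypothetical:

```python
from pathlib import Path
from typing import List


def batch_convert(src_dir: Path, out_dir: Path, pattern: str = "*.htm*",
                  source_encoding: str = "cp1252") -> List[Path]:
    """Write a UTF-8 copy of every file in `src_dir` matching `pattern` to `out_dir`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # "*.htm*" matches both .htm and .html; sorted() gives a stable order.
    for src in sorted(src_dir.glob(pattern)):
        if not src.is_file():
            continue
        # Decode with the assumed source encoding, re-encode as UTF-8 (no BOM).
        text = src.read_bytes().decode(source_encoding)
        dst = out_dir / src.name
        dst.write_bytes(text.encode("utf-8"))
        written.append(dst)
    return written
```

Writing to a separate folder keeps the originals intact, which mirrors redirecting iconv's output to a new file rather than overwriting the input.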

Thanks a lot for your answers,
Atso Puronen


Celine Courcy (X)
French to English
How to change the ini file to force UTF8 encoding? Oct 20, 2008

Atso Puronen wrote:

Trados 7.5 doesn't have the option to force UTF-8 encoding in INI settings.

...
Atso Puronen


Hi,

I'm using Trados 8, but the only reference to UTF-8 I can find in the .ini file is:

"[ReaderSettings]
KeepLineBreaks=No
AllowTranslatableScripts=Yes
MetaCharsetTreatment=Add
KeepLineBreaksAfterPunctuation=No
XmlLangEmitMode=Leave
Utf8BomHandling=Preserve"

Are you referring to the tag settings .ini file or some other .ini file?


I desperately need to force Trados to always produce output files in UTF-8 encoding, and adding a BOM or a tag declaring UTF-8 to the source file is just not an option for my client.

Does anyone know how to do this? I doubt this has any bearing on it, but just in case, the files are XHTML 1.0 Transitional.


Grzegorz Gryc
French to Polish
External tools... Oct 20, 2008

Celine Courcy wrote:


I desperately need to force Trados to always produce output files that have UTF8 encoding and adding the BOM or a tag in the source file declaring UTF8 is just not an option for my Client.

Does anyone know how to do this? I doubt this has any bearing on it, but just in case, the files are in XHTML1 transitional.

No way in TagEditor.
1) You can convert your files to UTF-8 before you start the translation in TE.
For the BOM, see the advanced settings for HTML:
In TE, open the Tools menu and choose Tag Settings; in the dialog box, select HTML, then click Properties.
In the next dialog box, select the Elements tab and click Advanced.
2) You can use SDLX instead of TE.

PS.
I used to do crazy conversions with DVX.
Create a project with empty databases and with the same source and target language, import your files, specify the output encoding, pretranslate (select the option 'Insert source text for failed portions') and export.
It sounds strange, but it works.
A 30-day demo is available.

Cheers
GG


Michael Watson
United Kingdom
Member (2006)
German to English
Thanks very much Daniel (Trados / TE 8) Nov 27, 2009

Excellent info about opening the files in Word as UTF-8 and then saving them in the same encoding. It certainly worked for me. I have Trados TE 8 but can't see any option for forcing it to open HTML files with UTF-8 encoding.

There are options in TE, as discussed, for: preserving the BOM (byte order mark) but not adding it; preserving it and adding it if not originally present; and removing it if present.

I'm not sure about TE actually converting a file's encoding just by removing or adding the BOM. Surely the file has to be in that encoding already, or am I missing the point somewhere?
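In general terms, the UTF-8 BOM is just three marker bytes (EF BB BF) at the start of a file; adding or removing it does not re-encode the rest of the content. A small Python illustration using the Finnish word "Käännös"; the helper `is_valid_utf8` is hypothetical and only shows the byte-level behaviour, not how TagEditor works internally:

```python
UTF8_BOM = b"\xef\xbb\xbf"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


word = "K\u00e4\u00e4nn\u00f6s"          # "Käännös"
cp1252_bytes = word.encode("cp1252")      # each "ä" is the single byte E4
utf8_bytes = word.encode("utf-8")         # each "ä" is the two bytes C3 A4

# A BOM in front of windows-1252 bytes does not turn them into UTF-8:
print(is_valid_utf8(UTF8_BOM + cp1252_bytes))  # False
# Bytes that are already UTF-8 stay valid, with or without the BOM:
print(is_valid_utf8(UTF8_BOM + utf8_bytes))    # True
print(is_valid_utf8(utf8_bytes))               # True
```

So a BOM setting can only label a file correctly if its body is already UTF-8; an actual conversion step (Word, UltraEdit, iconv) is still needed for windows-1252 content.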