weird marks in tageditor with HTML files
Thread poster: Gilbert Lin

Gilbert Lin  Identity Verified
China
Local time: 11:07
Member (2009)
English to Chinese
+ ...
Jun 3, 2010

Dear all:

While translating some HTML files with TagEditor (I simply dragged them into TagEditor and selected the default HTML filter), characters such as quotation marks, dashes, and plus and minus signs were displayed as weird symbols.

Examples:

"The patient’s name is displayed in red."

"Select the image number (or time, b–value, or ppm) by simply moving the mouse cursor to the left or right on the graph view."

and "Correlation coefficient in range –1.0 to 1.0"

These symbols make the source text really hard to understand, and I have to keep opening the original HTML files.

Any advice is greatly appreciated.

Thank you in advance!

Yun Lin


[Edited at 2010-06-03 13:20 GMT]


Adam Łobatiuk  Identity Verified
Poland
Local time: 04:07
Member (2009)
English to Polish
+ ...
Encoding mismatch Jun 4, 2010

The file encoding might be different from the encoding declared in the HTML files. For example, the HTML files might be ANSI text files while the encoding declaration in the HTML header says UTF-8. To check (and correct) that, open such an HTML file in Notepad, choose Save As, and look at the encoding shown in one of the drop-down lists.
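
A minimal Python sketch of how this kind of mismatch produces the symbols quoted in the first post. It assumes the text was stored as UTF-8 and read with the Western "ANSI" code page cp1252. That code page is an assumption; with a different Windows code page the garbage would look different. The helper name `misread` is made up for illustration.

```python
# Reproduce the symptom: UTF-8 bytes decoded with the wrong (ANSI) code page.
def misread(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with another code page."""
    return text.encode("utf-8").decode(wrong_codec, errors="replace")


print(misread("The patient’s name"))  # prints: The patient’s name
print(misread("b–value"))             # prints: b–value
```

The curly apostrophe and the en dash each take three bytes in UTF-8, which is why each shows up as three separate characters once the bytes are read one at a time.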


Gilbert Lin  Identity Verified
China
Local time: 11:07
Member (2009)
English to Chinese
+ ...
TOPIC STARTER
Thanks, it's UTF-8 indeed, how should I fix it? Jun 4, 2010

Adam Łobatiuk wrote:

The file encoding might be different than the encoding declared in the HTML files. For example, the HTML files might be ANSI text files, but the encoding declaration in the HTML header might say UTF-8. To check (and correct that), you could open such an HTML file in Notepad, choose Save as and see the encoding in one of the drop-down lists.


You are right: I opened one HTML file in Notepad and the first line showed "UTF-8". The problem is that the project is relatively large, with over 100 HTML files. I am wondering whether I could change the filter settings rather than all 100 HTML files.

Could Mr. Lobatiuk or anybody else kindly point me to a way to do this?

Thanks in advance

Yun Lin



Antonín Otáhal
Local time: 04:07
Member (2005)
English to Czech
+ ...
replace strings in many files Jun 4, 2010

I use UltraEdit, but I am pretty sure there are other "Notepad extensions" which enable you to replace a particular string with another in many files simultaneously.

Antonin



Gilbert Lin  Identity Verified
China
Local time: 11:07
Member (2009)
English to Chinese
+ ...
TOPIC STARTER
Follow up questions Jun 4, 2010

Antonín Otáhal wrote:

I use UltraEdit, but I am pretty sure there are other "Notepad extensions" which enable you to replace a particular string with another in many files simultaneously.

Antonin



Great, thanks! I have two follow-up questions.

1. Could you give me step-by-step instructions for some available software?

2. Will changing the encoding declaration affect compatibility with the client?


opolt  Identity Verified
Germany
Local time: 04:07
English to German
+ ...
Pls. forget Notepad Jun 4, 2010

Pls. forgive me, but for some reason I do not quite understand, Windows users seem to insist on using Notepad, which hasn't changed much since Windows 3.1 or 3.0 (!), even though there are countless other (free) editors out there with much better features, including support for virtually all existing encodings and unlimited file sizes (for all practical purposes). Please do yourself a favour and get a decent editor, such as jEdit (Java-based, www.jedit.org). Notepad is not much more than a typewriter in software.

A good text editor should be in the toolbox of every translator out there, IMHO. You will need it once in a while to handle all those weird formats and the bugs introduced by CAT tools, etc.

Yun, if the original files declare UTF-8 as their encoding, you would normally make sure that your editor saves them in exactly that encoding. Only a proper text editor can give you full control over that. This issue is of particular importance when handling complex scripts, such as those of East Asian languages.

Cheers and good luck!



Grzegorz Gryc  Identity Verified
Local time: 04:07
French to Polish
+ ...
Metainformation error... Jun 4, 2010

Yun Lin wrote:

2. Will changing the encoding declaration affect the compatibility with client?

In your case, probably yes.
Your files are already UTF-8, so converting them to ANSI will be lossy.
You should rather force Trados to recognize UTF-8 properly.

Trados has a bug (well, a feature...) related to the charset metainformation.
E.g., if a line such as
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
is malformed or missing, Trados will default to your Windows code page if no BOM (Byte Order Mark) is present (that's your case, 100% sure) and, in some cases, will be unable to interpret the tags.
You need some basic HTML/XML knowledge.
You should check the headers and, if necessary, rewrite the files with a BOM.
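
A rough Python sketch of the header check, in case it helps. It only looks at the first kilobyte of a file and reports whether a UTF-8 BOM is present and which encoding, if any, is declared in an XML declaration or a meta charset. The function name `inspect_file` and the regular expression are illustrative assumptions, not anything Trados itself uses.

```python
import codecs
import re
from pathlib import Path

# Matches encoding="utf-8" (XML declaration) or charset=utf-8 (HTML meta tag).
DECLARATION = re.compile(
    rb"""(?:encoding|charset)\s*=\s*["']?([A-Za-z0-9._-]+)""", re.IGNORECASE
)


def inspect_file(path):
    """Return (has_utf8_bom, declared_encoding_or_None) for one file."""
    head = Path(path).read_bytes()[:1024]
    has_bom = head.startswith(codecs.BOM_UTF8)
    match = DECLARATION.search(head)
    declared = match.group(1).decode("ascii").lower() if match else None
    return has_bom, declared
```

A result of `(False, "utf-8")` would correspond to the situation described here: the file declares UTF-8 but carries no BOM.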

Can you post the header?

BTW.
Most tools will have problems with the encoding of such malformed files, but some of them, e.g. memoQ or Wordfast Pro, will recognize the tags correctly.

Cheers
GG

[Edited at 2010-06-04 11:49 GMT]



Gilbert Lin  Identity Verified
China
Local time: 11:07
Member (2009)
English to Chinese
+ ...
TOPIC STARTER
My header Jun 4, 2010

Grzegorz Gryc wrote:

BTW.
Most tools will have problems with the encoding of this kind of malformed files but some of them, e.g. memoQ or Wordfast Pro will recognize the tags correctly.

Cheers
GG

[Edited at 2010-06-04 11:49 GMT]


Hi GG:

Thanks for your kind reply. Here's my header:

<?xml version="1.0" encoding="utf-8"?>
<html xmlns:MadCap="http://www.madcapsoftware.com/Schemas/MadCap.xsd" MadCap:conditions="Primary.Translated_operator_manual" MadCap:check_list="First Draft,Added To TOC,Links,Browse Sequences" MadCap:comment="MZ edit 2.16.08" MadCap:priority="90" MadCap:lastBlockDepth="5" MadCap:lastHeight="664" MadCap:lastWidth="523" MadCap:conditionTagExpression="include["Primary.In MR450 and not in MR750"] exclude["Primary.In_MR750 and not in MR450 1.5T"] " class="">

Since Trados is mandatory on this project, could you give me a hand with forcing TagEditor to recognize the UTF-8 encoding?

Any advice is greatly appreciated.
Thank you in advance!

Yun Lin

[Edited at 2010-06-04 14:28 GMT]

[Edited at 2010-06-04 14:29 GMT]

[Edited at 2010-06-04 14:30 GMT]



Grzegorz Gryc  Identity Verified
Local time: 04:07
French to Polish
+ ...
MadCap Flare... BOM... Jun 4, 2010

Yun Lin wrote:

Grzegorz Gryc wrote:

BTW.
Most tools will have problems with the encoding of this kind of malformed files but some of them, e.g. memoQ or Wordfast Pro will recognize the tags correctly.


Thanks for you kind reply, here's my header:

<?xml version="1.0" encoding="utf-8"?>
<html xmlns:MadCap="http://www.madcapsoftware.com/Schemas/MadCap.xsd" (...) >

I see.
It's the MadCap Flare HTML-like format.
Make sure you have declared the MadCap Flare ini in Trados.
Otherwise you and your customer may be surprised.

Since the Trados is mandatory on this project, could you give me a hand on how I should force my trados tageditor to recognize the utf-8 encoding?

It will be sufficient to write a UTF-8 BOM to all the files.
Normally I use a small Polish tool for this (which is not a good option for you), but you may use the 30-day UltraEdit trial, as Antonin suggested.
It's a monster tool.
For more details, see e.g.:
http://www.ultraedit.com/support/tutorials_power_tips/ultraedit/unicode.html
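
For anyone without UltraEdit, the same batch step can be sketched in a few lines of Python. This is only an illustration under some assumptions: the files really are valid UTF-8, they end in .htm or .html, and you run it on a copy of the project folder. The function names and the `*.htm*` pattern are made up for the example.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path: Path) -> bool:
    """Prepend the UTF-8 BOM if it is missing. Return True if the file changed."""
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False
    data.decode("utf-8")  # raises UnicodeDecodeError if the file is not valid UTF-8
    path.write_bytes(codecs.BOM_UTF8 + data)
    return True


def add_bom_to_folder(folder: str, pattern: str = "*.htm*") -> list:
    """Add the BOM to every matching file under folder; return the changed paths."""
    return [p for p in sorted(Path(folder).rglob(pattern)) if add_utf8_bom(p)]
```

Calling `add_bom_to_folder("path/to/copy")` (a placeholder path) returns the list of files that were rewritten; running it a second time returns an empty list because every file already starts with the BOM.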

Cheers
GG

[Edited at 2010-06-04 14:56 GMT]



Gilbert Lin  Identity Verified
China
Local time: 11:07
Member (2009)
English to Chinese
+ ...
TOPIC STARTER
I don't quite understand Jun 4, 2010

Sorry, I need some more details:

Grzegorz Gryc wrote:

Make sure you declared the MadCap Flare ini in Trados.
Otherwise you and your customer may be surprised.

Cheers
GG

[Edited at 2010-06-04 14:56 GMT]


1. How do I declare it in Trados?

2. After declaring the MadCap Flare ini, do I still need to add the UTF-8 BOM at the head of all the files?

Thank you!

YL



Grzegorz Gryc  Identity Verified
Local time: 04:07
French to Polish
+ ...
Declaring ini... Jun 4, 2010

Yun Lin wrote:

Sorry that I need some more details:

Grzegorz Gryc wrote:

Make sure you declared the MadCap Flare ini in Trados.
Otherwise you and your customer may be surprised.

1. How do I declare it in Trados?

First of all, ask your customer.
He should give you the file.

Then see the Trados help (quoted below)...

Use the Tag Settings Manager to manage tag settings files for XML/HTML/SGML files in your translation projects. The Tag Settings Manager allows you to add and remove tag settings files. Use the Tag Settings Wizard to create new or edit existing tag settings files.

You can access both the manager and the wizard directly from Translator's Workbench and TagEditor.

In TagEditor, select Tag Settings from the Tools menu.

In Translator's Workbench, select Translation Memory Options from the Options menu, select the Tools tab, and click Tag Settings.


2. After declaring the MadCap Flare, I still need to write utf-8 into UTF-8 BOM in the header of all the files?

Yes, the BOM must be present: the MadCap Flare "HTML" doesn't contain a valid HTML charset declaration, which is why the default code page is used.
In fact, it's an XML file, not HTML.

BTW.
Which Trados version exactly are you using?
Older versions will be unable to handle UTF-8 files, AFAIR.

Cheers
GG



Gilbert Lin  Identity Verified
China
Local time: 11:07
Member (2009)
English to Chinese
+ ...
TOPIC STARTER
Trados 2007 Jun 5, 2010

Thanks GG:

I am currently using Trados 2007 Freelance. I'll see whether I can fix the problem and get back to you.

Many thanks for your kind help and great patience.

Best

YL



Grzegorz Gryc  Identity Verified
Local time: 04:07
French to Polish
+ ...
T2007 should be OK... Jun 5, 2010

Yun Lin wrote:

I am currently using Trados 2007 freelance.


So, it should work.

Cheers
GG



Gilbert Lin  Identity Verified
China
Local time: 11:07
Member (2009)
English to Chinese
+ ...
TOPIC STARTER
Solution Jun 8, 2010

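In case anyone encounters the same problem, here is a solution (originally in Chinese) that I got from a colleague working for Boffin:

Cause: Neither Trados 2007 (7.x) nor 2008 (8.x) supports HTML files whose encoding is UTF-8 (note that this has nothing to do with the charset). If you drag a UTF-8 HTML file straight into TagEditor, TagEditor forcibly converts the file to ANSI encoding and leaves the UTF-8 BOM (Byte Order Mark, which shows up as garbage on a Chinese system) at the head of the file. In addition, because we work on a Simplified Chinese system, whose default character set is GB2312, some special characters in the ANSI-encoded text are turned into Chinese characters when displayed (that is the garbage we see).

Solution: Convert the encoding of the HTML files from UTF-8 to Unicode (i.e. UTF-16), then open and translate them in Trados TagEditor. Once the translation is finished, convert the encoding of the final HTML files back to UTF-8. Don't tell me you don't even know how to convert an encoding. :-)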
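
For readers who would rather script the conversion, here is a minimal Python sketch of the round trip. It is an illustration only: the function names are made up, it works on bytes so that line endings are left untouched, and it does not edit the `encoding="utf-8"` text inside the XML declaration, which therefore disagrees with the real encoding while the file is UTF-16. Whether TagEditor minds that is not something this sketch can confirm.

```python
from pathlib import Path


def utf8_to_utf16(path: Path) -> None:
    """Rewrite a UTF-8 file (with or without BOM) as UTF-16 with a BOM."""
    text = path.read_bytes().decode("utf-8-sig")
    path.write_bytes(text.encode("utf-16"))


def utf16_to_utf8(path: Path, with_bom: bool = False) -> None:
    """Rewrite a UTF-16 file (with BOM) as UTF-8, optionally with a BOM."""
    text = path.read_bytes().decode("utf-16")
    path.write_bytes(text.encode("utf-8-sig" if with_bom else "utf-8"))
```

Run `utf8_to_utf16` on each source file before opening it in TagEditor, and `utf16_to_utf8` on each translated file before delivery. The default of no BOM on the way back is an assumption that the client expects the files as they were originally delivered.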

Many thanks to GG and Fei Yang for helping me.

[Edited at 2010-06-08 10:09 GMT]



Grzegorz Gryc  Identity Verified
Local time: 04:07
French to Polish
+ ...
Huh... Jun 8, 2010

Yun Lin wrote:

Just in case if any one encounter the same problem,I put a Chinese solution I get from colleague working for Boffin as follows:

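Cause: Neither Trados 2007 (7.x) nor 2008 (8.x) supports HTML files whose encoding is UTF-8 (note that this has nothing to do with the charset). If you drag a UTF-8 HTML file straight into TagEditor, TagEditor forcibly converts the file to ANSI encoding and leaves the UTF-8 BOM (Byte Order Mark, which shows up as garbage on a Chinese system) at the head of the file. In addition, because we work on a Simplified Chinese system, whose default character set is GB2312, some special characters in the ANSI-encoded text are turned into Chinese characters when displayed (that is the garbage we see).

Solution: Convert the encoding of the HTML files from UTF-8 to Unicode (i.e. UTF-16), then open and translate them in Trados TagEditor. Once the translation is finished, convert the encoding of the final HTML files back to UTF-8. Don't tell me you don't even know how to convert an encoding. :-)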

Great thanks to GG and Fei Yang for helping me.


Well.
We would appreciate an English version here...

Cheers
GG

