https://www.proz.com/forum/sdl_trados_support/93830-accents_not_displayed_when_cleaning_xml_in_tageditor.html

Accents not displayed when cleaning xml in TagEditor
Thread poster: iamsara
iamsara
Local time: 20:59
English to Spanish
Jan 10, 2008

Hi, I have an XML file and the client didn't provide the DTD file.

My problem is that when I clean the document, accents are shown as &acute;. I want them to remain as accented characters (as they would if I edited the document in Dreamweaver) because the character is not recognized when I try to open the translated XML file. I've been trying different options in the default DTD (under the Entities tab) but no luck.

Could anyone help? Thanks in advance.


 
Viktoria Gimbe  Identity Verified
Canada
Local time: 14:59
English to French
Off the top of my head Jan 11, 2008

I may be completely off track here, but let's try this anyway.

Maybe it isn't the DTD that's causing you trouble. It is possible that the encoding of your document is simply not the correct one. If you change the encoding of the XML document and translate it again using your existing TM, that may fix it.
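
To show why the encoding matters, here is a minimal Python sketch. It is only an illustration with a made-up sample word, and it assumes a file written as UTF-8 but read by a tool that expects windows-1252; it says nothing about what TagEditor does internally.

```python
# Hypothetical sample text; any accented string behaves the same way.
text = "traducción"

# Bytes as a UTF-8 editor would write them to disk.
utf8_bytes = text.encode("utf-8")

# The same bytes read by a tool that assumes windows-1252 (cp1252).
misread = utf8_bytes.decode("cp1252")

# No information is lost: reversing the wrong decoding restores the text.
recovered = misread.encode("cp1252").decode("utf-8")

print(misread)
print(recovered)
```

The bytes on disk never change here; only the assumption about what they mean does, which is why picking the right encoding can be enough to fix a display problem.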

All the best!


 
iamsara
Local time: 20:59
English to Spanish
TOPIC STARTER
Not working :( Jan 11, 2008

Thanks Viktoria, but I've tried that. There's no heading in the XML nor anything referring to the encoding. I tried opening the XML in Notepad and saving it in each of the available encodings, but without any luck.

 
Marek Buchtel  Identity Verified
Czech Republic
Local time: 20:59
Member (2005)
English to Czech
SITE LOCALIZER
Fiddle with DTD Jan 11, 2008

Hello,

You have probably already tried to change the DTD, but maybe you skipped a step in the procedure.

So here is what I would do:

When the file is open in TE, go to Tools->Tag Settings and check which setting is ticked (has a red mark on it). That's the one used for the file.
Then CLOSE all files in TE, go to Tag Settings, select that setting, go to Properties->Entities, and:
a) uncheck the box "Convert entities". No entities will be converted (not even the non-breaking space). If you find that you need some entities to be converted, check "Convert entities" and:
b) in "Entity sets", go to Added Latin 1 and, in the right-hand panel, uncheck all the entities that you don't want to be converted.

Press OK, then OK again in the next window.

Now open the XML FILE, i.e. NOT the ttx file you created before, but the original XML.
Translate a few segments, then save as target to see if it works.

HTH
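
A possible reason the translated file will not open, sketched in Python. This is an illustration with made-up sample strings, not something taken from TagEditor: XML itself predefines only five entities (&amp;, &lt;, &gt;, &quot; and &apos;), so a named entity such as &oacute; is undefined unless a DTD declares it. Python's standard-library parser shows the difference.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: one keeps the literal accented character,
# the other uses an HTML-style named entity with no DTD to declare it.
literal = "<p>traducción</p>"
entity = "<p>traducci&oacute;n</p>"


def parses(xml_text: str) -> bool:
    """Return True if the standard-library XML parser accepts the fragment."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses(literal))  # literal accented character
print(parses(entity))   # named entity, no DTD
```

With no DTD supplied, switching off the conversion for the accented characters keeps them as literal characters, which any XML parser accepts as long as the encoding is right.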


 
iamsara
Local time: 20:59
English to Spanish
TOPIC STARTER
UTF-8 Jan 13, 2008

Hi, Marek, thanks for your help. I'm trying that but still no luck.

The thing is, the original document's encoding is UTF-8. Therefore, if I edit it in Dreamweaver or Notepad and enter special characters, there's no problem.

The problem arises when I edit it in TagEditor: it saves the cleaned document in Windows Western European (Windows). If I then open the document in Dreamweaver and try to change the encoding back to UTF-8, it does not work, since TagEditor has changed the accents to "&acute;" and they stay like that.

BTW, I've checked in Dreamweaver and there's no DTD in the document.


 
Wojciech Froelich  Identity Verified
Poland
Local time: 20:59
English to Polish
It's not a change of encoding Jan 14, 2008

Dubloc wrote:

The problem arises when I edit it in TagEditor: it saves the cleaned document in Windows Western European (Windows). If I then open the document in Dreamweaver and try to change the encoding back to UTF-8, it does not work, since TagEditor has changed the accents to "&acute;" and they stay like that.


TagEditor is not that stupid
It checks the encoding (it should be defined in the header of the file) and it will keep it or adjust it in the target file (it will surely keep UTF-8).

I guess the problem is the conversion of the entities. TagEditor strictly follows the settings in the INI file (you can access these settings from TagEditor; try modifying the INI file with no document open). All you have to do is check the entity conversion settings and leave conversion switched on only for the XML-specific characters (it is usually switched on for the Latin-1 accented characters as well). Then you simply save the target or clean the document.
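
To illustrate the difference between the two settings, here is a rough Python sketch. The function names are hypothetical and this is not how TagEditor is implemented; it only mimics "convert the XML-specific characters only" versus "also convert the Latin-1 accented characters".

```python
from html.entities import codepoint2name
from xml.sax.saxutils import escape


def escape_xml_only(text: str) -> str:
    """Escape only the characters XML itself treats as markup: &, < and >."""
    return escape(text)


def escape_latin1_too(text: str) -> str:
    """Additionally replace Latin-1 characters (U+00A0..U+00FF) with named entities."""
    out = []
    for ch in escape(text):
        cp = ord(ch)
        if 0xA0 <= cp <= 0xFF and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(ch)
    return "".join(out)


sample = "año & más"  # made-up sample text
print(escape_xml_only(sample))
print(escape_latin1_too(sample))
```

The first form stays valid XML without any DTD; the second depends on the named entities being declared somewhere.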


 
Wojciech Froelich  Identity Verified
Poland
Local time: 20:59
English to Polish
No heading? Jan 14, 2008

Dubloc wrote:

There's no heading in the XML nor anything referring to the encoding.


To get the target file in the same encoding (UTF-8 in this case), you have to declare the encoding in the source file and then open it again in TE. Of course, you also have to take care of the appropriate entity conversion (INI file settings).
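
For a file with no header, the declaration can simply be typed in with a text editor. As a rough sketch of the same step in Python (the function name is hypothetical, and it assumes the content really is UTF-8 text that has already been read into a string):

```python
import re

XML_DECLARATION = '<?xml version="1.0" encoding="utf-8"?>\n'


def ensure_utf8_declaration(xml_text: str) -> str:
    """Prepend an XML declaration naming UTF-8 unless a declaration already exists."""
    # Drop a leading byte order mark so the declaration ends up first in the file.
    body = xml_text.lstrip("\ufeff")
    # "<?xml" followed by whitespace is a declaration; "<?xml-stylesheet" is not.
    if re.match(r"<\?xml\s", body):
        return body
    return XML_DECLARATION + body


print(ensure_utf8_declaration("<doc>traducción</doc>"))
```

The result still has to be written back to disk as UTF-8, otherwise the declaration and the actual bytes disagree.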


 
iamsara
Local time: 20:59
English to Spanish
TOPIC STARTER
got it Jan 14, 2008

Hi Wojciech



Wojciech Froelich wrote:

TagEditor is not that stupid
It checks the encoding (it should be defined in the header of the file) and it will keep it or adjust it in the target file (it will surely keep UTF-8).


The file I'm translating does not have a heading of the type <?xml version="1.0"?>.

If I open it in Notepad and choose Save As to change the encoding to UTF-8, Notepad says the document is in ANSI. However, if I open it in DW, it says the encoding is UTF-8. But if I open it in TagEditor, translate it and then check the document properties, it shows windows-1252 as the original encoding.

What I've done is change the encoding by opening the document in Notepad and saving it as UTF-8, so that TE reports the original encoding of the translated file as UTF-8 and not windows-1252, and uncheck entity conversion. It seems to work OK.
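
A likely explanation for the three different answers, offered as an assumption rather than documented TagEditor behavior: a UTF-8 file with no header and no byte order mark gives a tool nothing to go on, so each program guesses. Notepad's Save As UTF-8 (in the Windows versions of that time) writes a byte order mark at the start of the file, which removes the guesswork. A small Python sketch (function name hypothetical) that checks for that mark:

```python
import codecs


def sniff_encoding(raw: bytes) -> str:
    """Make a rough guess at the encoding of raw file bytes."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8 (BOM)"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8 (possibly windows-1252)"
    return "utf-8 without BOM, or plain ASCII"


sample = "<p>traducción</p>"  # made-up sample content
print(sniff_encoding(sample.encode("utf-8")))      # what a plain UTF-8 save looks like
print(sniff_encoding(sample.encode("utf-8-sig")))  # UTF-8 with a byte order mark
print(sniff_encoding(sample.encode("cp1252")))     # saved as windows-1252
```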

Thanks all!


 

