Accents not displayed when cleaning xml in TagEditor
Thread poster: iamsara

iamsara
Local time: 16:10
English to Spanish
Jan 10, 2008

Hi, I have an XML file and the client didn't provide the DTD file.

My problem is that when I clean the document, accents are shown as &acute. I want them to remain as accents (as they would if I edited the document in Dreamweaver), because the character is not recognized when I try to open the translated XML file. I've been trying different options in the default DTD settings (under the Entities tab), but no luck.

Could anyone help? Thanks in advance


 

ViktoriaG  Identity Verified
Canada
Local time: 10:10
English to French
Off the top of my head Jan 11, 2008

I may be completely off the track here, but let's try this anyway.

Maybe it isn't the DTD that's causing you trouble. It is possible that the encoding of your document is simply not the correct one. If you change the encoding of the XML document and translate it again using your existing TM, that may fix it.

All the best!


 

iamsara
Local time: 16:10
English to Spanish
TOPIC STARTER
Not working :( Jan 11, 2008

Thanks Viktoria, but I've tried that. There's no heading in the XML nor anything referring to the encoding. I tried opening the XML in Notepad and saving it in each of the available encodings, but without any luck.

 

Marek Buchtel  Identity Verified
Czech Republic
Local time: 16:10
Member (2005)
English to Czech
Fiddle with DTD Jan 11, 2008

Hello,

You have probably already tried to change the DTD, but maybe you skipped a step in the procedure.

So here is what I would do:

When the file is open in TE, go to Tools->Tag Settings and check which setting is ticked (has a red mark on it). That's the one used for the file.
Then CLOSE all files in TE, go to Tag Settings, select the respective settings file, go to Properties->Entities, and:
a) uncheck the box "Convert entities", so that no entities will be converted (not even the non-breaking space). If you find that you need some entities to be converted, check "Convert entities" and:
b) in "Entity sets", go to Added Latin 1 and, in the right-hand panel, uncheck all entities that you don't want to be converted.

Press OK, then again OK in the next window.

Now open the XML FILE - i.e. NOT the ttx file you have created before, but the original XML.
Translate a few segments, then save as target to see if it works :)

HTH


 

iamsara
Local time: 16:10
English to Spanish
TOPIC STARTER
UTF-8 Jan 13, 2008

Hi, Marek, thanks for your help. I'm trying that but still no luck.

The thing is, the original document's encoding is UTF-8. Therefore, if I edit it in Dreamweaver or Notepad and enter special characters, there's no problem.

The problem arises when I edit it in TagEditor: it saves the cleaned document in Windows Western European (Windows). If I then open the document in Dreamweaver and try to change the encoding back to UTF-8, it does not work, since TagEditor has changed the accents to "acute;" and they stay like that.


BTW, I've checked in Dreamweaver and there's no DTD in the document.


 

Wojciech Froelich  Identity Verified
Poland
Local time: 16:10
English to Polish
It's not a change of encoding Jan 14, 2008

Dubloc wrote:

The problem arises when I edit it in TagEditor: it saves the cleaned document in Windows Western European (Windows). If I then open the document in Dreamweaver and try to change the encoding back to UTF-8, it does not work, since TagEditor has changed the accents to "acute;" and they stay like that.


TagEditor is not that stupid ;)
It checks the encoding (it should be defined in the header of the file) and it will keep it or adjust it in the target file (it will surely keep UTF-8).

I guess the problem is the conversion of the entities. TagEditor will strictly follow the settings from the INI file (you can access these settings from TagEditor; try modifying the INI file with no document open). All you have to do is check the settings for entity conversion and leave it switched on only for the XML-specific characters (it's usually also switched on for Latin-1 accented characters). Then you simply save the target or clean the document.
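
If a target file has already been saved with the accents converted, the effect of "convert only the XML-specific characters" can be approximated after the fact outside TagEditor. The following is only a rough Python sketch, not anything TagEditor does itself: the function name `expand_latin_entities` is made up for illustration, and it assumes the unwanted entities are standard HTML-style names such as `&aacute;`. The five entities predefined by XML are left alone so the file stays well-formed.

```python
import re
from html.entities import name2codepoint

# Entities predefined by XML; these must stay escaped to keep the file well-formed.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entity references such as &aacute; or &ntilde;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_latin_entities(text: str) -> str:
    """Replace named entities (e.g. &aacute;) with the literal character.

    Hypothetical helper for illustration: XML-specific and unknown
    entities are returned unchanged.
    """
    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    return ENTITY_RE.sub(_sub, text)


if __name__ == "__main__":
    sample = "<p>traducci&oacute;n &amp; edici&oacute;n</p>"
    print(expand_latin_entities(sample))
```

The result should then be written out as UTF-8, since the literal accented characters need an encoding that can represent them.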


 

Wojciech Froelich  Identity Verified
Poland
Local time: 16:10
English to Polish
No heading? Jan 14, 2008

Dubloc wrote:

There's no heading in the XML nor anything referring to the encoding.


To get the target file in the same encoding (utf-8 in this case), you have to define the encoding in the source file and then open it again in TE. Of course you also have to take care of the appropriate entities conversion (INI file settings).
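
Defining the encoding in the source file means giving it an XML declaration with an `encoding` attribute. As a minimal sketch (the helper name `ensure_xml_declaration` is hypothetical, and it assumes the text really will be saved in the encoding it declares), this could be done with a few lines of Python instead of by hand:

```python
def ensure_xml_declaration(text: str, encoding: str = "UTF-8") -> str:
    """Prepend an XML declaration if the text does not already start with one.

    Hypothetical helper for illustration; the declared encoding must match
    the encoding the file is actually written in.
    """
    stripped = text.lstrip("\ufeff")  # ignore a leading byte-order mark, if any
    if stripped.lstrip().startswith("<?xml"):
        return text  # a declaration is already present, leave the text as-is
    return f'<?xml version="1.0" encoding="{encoding}"?>\n{stripped}'


if __name__ == "__main__":
    print(ensure_xml_declaration("<root><item>texto</item></root>"))
```

After adding the declaration, the file would be saved as UTF-8 and the original XML (not an earlier TTX) reopened in TE.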


 

iamsara
Local time: 16:10
English to Spanish
TOPIC STARTER
got it Jan 14, 2008

Hi Wojciech



TagEditor is not that stupid ;)
It checks the encoding (it should be defined in the header of the file) and it will keep it or adjust it in the target file (it will surely keep UTF-8).



The file I'm translating does not have a heading of the type: <?xml version="1.0"?>.

If I open it with Notepad and use Save As to change the encoding to UTF-8, it says the document is in ANSI. However, if I open it in DW, it says the encoding is UTF-8. But if I open it in TagEditor, translate it and then check the document properties, it shows windows-1252 as the original encoding.

What I've done is change the encoding by opening the document in Notepad and saving it as UTF-8, so that once it's translated TE reports the original encoding as UTF-8 rather than windows-1252, and then uncheck entity conversion. It seems to work OK :)
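
For anyone who wants to do the Notepad step on several files, here is a rough Python sketch of the same idea. It rests on assumptions: the function names are made up, the fallback encoding is taken to be windows-1252 as TE reported, and as far as I know Notepad's "UTF-8" option writes a byte-order mark at the start of the file, which is what the `utf-8-sig` codec reproduces. A file containing only plain ASCII is valid in both encodings, which may be why different programs report different encodings for it.

```python
import codecs


def sniff_encoding(raw: bytes) -> str:
    """Guess the encoding: UTF-8 with BOM, plain UTF-8, else windows-1252."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"  # assumption: the only other candidate is windows-1252


def to_utf8_with_bom(raw: bytes) -> bytes:
    """Re-encode the bytes as UTF-8 with a leading byte-order mark."""
    text = raw.decode(sniff_encoding(raw))
    return text.encode("utf-8-sig")


if __name__ == "__main__":
    ansi_bytes = "traducción".encode("cp1252")
    print(sniff_encoding(ansi_bytes))
    print(to_utf8_with_bom(ansi_bytes))
```

This is a guess-based check, not a reliable detector; it only distinguishes the two encodings mentioned in this thread.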

Thanks all!


 

