Can we convert a web page from ISO-8859-1 to UTF-8?
Thread poster: Medworks
Medworks
United States
Local time: 14:28
Italian to English
+ ...
Sep 27, 2007

Hello!

An agency that gives me a large volume of localization work is no longer accepting RTF files and is asking me to use Trados, so I am.

I actually like Trados better now. I have been working fine with their TM and DTD in HTML format, and I am able to upload the completed translation files in .ttx format.

Yet they also asked if I know how to convert a web page from ISO-8859-1 to UTF-8.

So far, I have noticed that the pages I've received say UTF-8 in Trados TagEditor.

Can I convert the encoding with Trados TagEditor?



Piotr Bienkowski  Identity Verified
Poland
Local time: 23:28
Member (2005)
English to Polish
+ ...
UltraEdit Sep 27, 2007

Open the web page in UltraEdit, and save as UTF-8.

You will also have to change the meta tag from:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

to:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

This tag can be found near the beginning of the file.
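
For anyone who prefers a script to an editor, here is a minimal Python sketch of the same two steps: re-encode the bytes, then update the charset in the meta tag. The function name `convert_html` and the file names in the usage note are made up for illustration, and it assumes the source file really is ISO-8859-1 and declares its charset in a meta tag like the one above.

```python
import re
from pathlib import Path

# Captures everything in a <meta ...> tag up to and including "charset="
# (plus an optional quote); the charset name that follows is what we replace.
CHARSET_RE = re.compile(r'(<meta[^>]*charset=["\']?)[\w-]+', re.IGNORECASE)


def convert_html(src, dst, source_encoding="iso-8859-1"):
    """Re-encode one HTML file as UTF-8 and update its declared charset."""
    # Read raw bytes and decode them with the encoding the file was saved in.
    text = Path(src).read_bytes().decode(source_encoding)
    # Point the first meta charset declaration at utf-8.
    text = CHARSET_RE.sub(r"\g<1>utf-8", text, count=1)
    # Write the result back out as UTF-8 bytes (no BOM).
    Path(dst).write_bytes(text.encode("utf-8"))
```

Usage would look like `convert_html("page.html", "page-utf8.html")`. Writing to a separate output file keeps the original untouched in case the source encoding turns out to be something else.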

There are surely automatic converters for HTML files, but I can't think of any off the top of my head right now.

HTH

Piotr


Daniel García
English to Spanish
+ ...
Microsoft Word should also do the trick Sep 27, 2007

There are surely automatic converters for HTML files, but I can't think of any off the top of my head right now.


Although using UltraEdit is an excellent tip, I would do it with MS Word, for the very simple reason that I feel more comfortable with Word's macro language than with UltraEdit's macros.

Because the job is likely to involve a large number of files, I would try to use some automated solution.

Daniel


Medworks
United States
Local time: 14:28
Italian to English
+ ...
TOPIC STARTER
Brilliant!! Sep 27, 2007

Thank you Daniel!!

Your response was brilliant in its simplicity!
I opened up the web files with Word and followed these steps:

-----------------------------------
Tools > Options > General > Web Options > Encoding
------------------------------------

It asked for the target encoding to reload and save with!
That's it! Very simple.

To test, I previewed the different settings: the characters appeared correctly with the Western European and Unicode (UTF-8) encodings, but went funny on me with other encodings such as big-endian Unicode, so it works!

Thank you Piotr, I will also open the file with Notepad to verify the tag.

Loved it... Thanks again


esperantisto  Identity Verified
Local time: 01:28
Member (2006)
English to Russian
+ ...
Do not do that! Sep 27, 2007

I would do it with MS Word


For the very simple reason that Microsoft Word notoriously inserts lots of junk into HTML code, and your client may not be happy with it. Besides UltraEdit, there are plenty of text/HTML editors capable of performing the task (jEdit and UniRed are my favorites) without harming any tag.


Medworks
United States
Local time: 14:28
Italian to English
+ ...
TOPIC STARTER
Even better!! :) Sep 27, 2007

Hehe... you're right! When I opened the file with Notepad earlier, I immediately saw the extra code. It even included my name, the time and date of last modification, etc. No, that's not good at all! Too bad, it seemed so nice and simple!

I looked at www.jedit.org and saw the program. It's even free and open source! Thanks!

I'll download it and give it a try (and compare the code)

All my Italian files have had the right UTF-8 tag, but it's good to know what to do, just in case. Thanks again!...


Medworks
United States
Local time: 14:28
Italian to English
+ ...
TOPIC STARTER
It couldn't open Trados file.. Sep 27, 2007

I downloaded and ran jEdit, and it looks like a good program.

I went to Utilities > Global Options > Encoding.

Though it opened my sample HTML files, it could not open the files I receive from the agency, which are formatted for Trados TagEditor...

In Trados TagEditor, I noticed that at the very top of the file when I'm translating it says:

----------------------------------------------------
meta...content= text/html;charset=utf-8
-----------------------------------------------------

If a file declared some other character encoding there, couldn't I just write utf-8 when needed, and everything would be all right?

Of course, preview to double-check that the writing appears correctly.
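
On the question of whether changing only the declared charset is enough, a small Python sketch may help. The label and the actual bytes have to agree, so relabeling alone only works when the text is plain ASCII. The function names `relabel_only` and `convert_properly` are invented for this illustration.

```python
def relabel_only(raw: bytes) -> str:
    """Simulate a reader that trusts a 'utf-8' label on bytes that were never re-encoded.

    Bytes that are not valid UTF-8 come out as U+FFFD replacement characters.
    """
    return raw.decode("utf-8", errors="replace")


def convert_properly(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode with the real source encoding, then re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")
```

With a sample such as `"città è".encode("iso-8859-1")`, `relabel_only` returns text containing replacement characters in place of the accented letters, while `convert_properly` gives bytes that decode back to the original string. Pure ASCII input such as `b"hello"` is identical in both encodings, which is why relabeling can appear to work on pages without accented characters.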



Jabberwock  Identity Verified
Poland
Local time: 23:28
Member (2004)
English to Polish
Unifier? Sep 27, 2007

This program, Unifier, seems to do exactly what you want:

http://www.melody-soft.com/html/unifier.html

It converts batches, adds the appropriate tags, etc. Note that I haven't tried it, so I cannot tell how good it is (better back up your files first...).

Also, there might be some freeware tools, but I haven't found any yet...

Edit:

I have just noticed that you are trying to convert the ttx files themselves instead of the HTML files. I am not sure that is a good idea, as the encoding of the ttx file itself (i.e. XML) might be different from that of the input/output HTML files.
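
To check what encoding a given file actually uses before touching it, something like the following Python sketch could be used. It only looks at a byte order mark (BOM) or an XML declaration at the start of the file; the function name `sniff_encoding` is made up, and I am assuming (not claiming) that a ttx file carries one of those two hints.

```python
import codecs
import re

# BOMs to check, longest first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(path):
    """Guess a file's encoding from its BOM or XML declaration; None if neither is found."""
    with open(path, "rb") as f:
        head = f.read(200)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    # No BOM: look for encoding="..." in an ASCII-compatible XML declaration.
    match = re.search(rb'encoding=["\']([\w-]+)["\']', head)
    return match.group(1).decode("ascii").lower() if match else None
```

A result of `None` does not mean the file is ASCII or UTF-8, only that this simple check found no hint, so it is a starting point rather than a reliable detector.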

[Edited at 2007-09-27 11:38]


Daniel García
English to Spanish
+ ...
Sorry! I meant "open as encoded text" with MS-Word! :-( Sep 27, 2007

Sorry! I meant open the HTML files from MS Word as "encoded text" to do the conversion, not as HTML. I should have explained more clearly...

Of course, esperantisto is right, and you have seen it yourself. Opening an HTML file as HTML will insert a lot of code...

Apologies for the confusion again...

Daniel


esperantisto  Identity Verified
Local time: 01:28
Member (2006)
English to Russian
+ ...
Glad that you like jEdit Sep 27, 2007

Forgot to say, download and use the latest 4.3 pre10 version. Although it's said to be unstable, that seems to be just developers' caution: I find it rock stable and fairly improved compared to 4.2. Note also that there are lots of plugins that can make your life a bit more comfortable. Just explore the respective section of their site.


Robert Tucker
United Kingdom
Local time: 22:28
German to English
+ ...
iconv Sep 27, 2007

On Linux I would probably change the encoding with the Bluefish HTML editor, but I could also do it with iconv, which normally comes with Linux. It is, however, also available online:

http://www.iconv.com/iconv.htm

and can be installed on Windows:

http://gnuwin32.sourceforge.net/packages/libiconv.htm



Piotr Bienkowski  Identity Verified
Poland
Local time: 23:28
Member (2005)
English to Polish
+ ...
No macros needed for that in UltraEdit Sep 27, 2007

dgmaga wrote:

There are surely automatic converters for HTML files, but I can't think of any off the top of my head right now.


Although using UltraEdit is an excellent tip, I would do it with MS Word, for the very simple reason that I feel more comfortable with Word's macro language than with UltraEdit's macros.

Because the job is likely to involve a large number of files, I would try to use some automated solution.

Daniel


If Word does that without making any additional problems, then it's alright with me, but Word is known for causing problems with HTML files and introducing its own peculiar markup.

In UltraEdit it's a simple open and Save As (F12) operation with changing a few characters in the meta tag. No macros/scripts required.

Of course if you want to convert a lot of files, you need a batch converter.
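
If no ready-made batch converter is at hand, a short script can do the job. Below is a rough Python sketch, assuming all files under one folder are ISO-8859-1 HTML files; `batch_convert` and `is_valid_utf8` are hypothetical names. It only re-encodes the bytes and keeps a `.bak` copy of each file it changes, so the charset in the meta tag still has to be updated separately.

```python
import shutil
from pathlib import Path


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (pure ASCII also passes)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def batch_convert(folder, source_encoding="iso-8859-1", pattern="*.html"):
    """Re-encode every matching file under folder to UTF-8; return the changed paths."""
    changed = []
    for path in sorted(Path(folder).rglob(pattern)):
        raw = path.read_bytes()
        if is_valid_utf8(raw):
            # Heuristic: pure ASCII or already UTF-8, so there is nothing to re-encode.
            continue
        # Keep a backup next to the original before overwriting it.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_bytes(raw.decode(source_encoding).encode("utf-8"))
        changed.append(path)
    return changed
```

The "already valid UTF-8" check makes the script safe to run twice on the same folder, at the cost of skipping the rare ISO-8859-1 file whose bytes happen to form valid UTF-8.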

Regards, Piotr

[Edited at 2007-09-27 19:57]

