Can we convert a web page from ISO-8859-1 to UTF-8?
Thread poster: Medworks
Medworks
Medworks
United States
Local time: 14:24
Italian to English
+ ...
Sep 27, 2007

Hello!

An agency that gives me a large volume of localization work is no longer accepting RTF files and is asking me to use Trados, so I am.

I actually like Trados better now. I have been working fine with their TM and DTD in HTML format, and I am able to upload the completed translation files in .ttx format.

Yet, they also asked if I know how to convert a web page from ISO-8859-1 to UTF-8.

So far, I have noticed that the pages I've received show UTF-8 in Trados TagEditor.

Can I convert the encoding with Trados TagEditor?


 
Piotr Bienkowski
Piotr Bienkowski  Identity Verified
Poland
Local time: 23:24
English to Polish
+ ...
UltraEdit Sep 27, 2007

Martha wrote:

Hello!

An agency that gives me a large volume of localization work is no longer accepting RTF files and is asking me to use Trados, so I am.

I actually like Trados better now. I have been working fine with their TM and DTD in HTML format, and I am able to upload the completed translation files in .ttx format.

Yet, they also asked if I know how to convert a web page from ISO-8859-1 to UTF-8.

So far, I have noticed that the pages I've received show UTF-8 in Trados TagEditor.

Can I convert the encoding with Trados TagEditor?


Open the web page in UltraEdit and save it as UTF-8.

You will also have to change the meta tag from:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

to:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

This tag can be found near the beginning of the file.
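
For anyone without UltraEdit, the same two steps (re-encode the file, then update the charset in the meta tag) can be sketched in a few lines of Python. This is only a rough illustration, not a tested tool: the function name `convert_html_to_utf8` and the regular expression are my own assumptions, and it assumes the file really is in the encoding you pass in.

```python
import re
from pathlib import Path

# Matches the charset value inside a <meta ... charset=...> declaration.
# Hypothetical pattern for illustration; unusual markup may need adjusting.
META_CHARSET = re.compile(
    r'(<meta\b[^>]*?charset\s*=\s*["\']?)([\w.:-]+)', re.IGNORECASE
)


def convert_html_to_utf8(src: Path, dst: Path,
                         source_encoding: str = "iso-8859-1") -> None:
    """Re-encode one HTML file as UTF-8 and update its charset declaration."""
    # Step 1: decode the bytes using the encoding the file is really in.
    text = src.read_bytes().decode(source_encoding)
    # Step 2: rewrite only the first charset declaration to say utf-8.
    text = META_CHARSET.sub(lambda m: m.group(1) + "utf-8", text, count=1)
    # Step 3: write the text back out as UTF-8 bytes.
    dst.write_bytes(text.encode("utf-8"))
```

Calling `convert_html_to_utf8(Path("page.html"), Path("page_utf8.html"))` writes a new file and leaves the original untouched, so the two can be compared in a browser before anything is delivered.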

For sure there are automatic converters for html files but I can't think of any off the top of my head right now.

HTH

Piotr


 
Daniel García
Daniel García
English to Spanish
+ ...
Microsoft Word should also do the trick Sep 27, 2007

For sure there are automatic converters for html files but I can't think of any off the top of my head right now.


Although using UltraEdit is an excellent tip, I would do it with MS Word for the very simple reason that I feel more comfortable with Word's macro language than with UltraEdit's macros.

Because the job is likely to involve a large number of files, I would try to use some automated solution.

Daniel


 
Medworks
Medworks
United States
Local time: 14:24
Italian to English
+ ...
TOPIC STARTER
Brilliant!! Sep 27, 2007

Thank you Daniel!!

Your response was brilliant in its simplicity!
I opened up the web files with Word and followed these steps:

-----------------------------------
Tools> Options> General> Web Options> Encoding
------------------------------------

It asked for the target encoding to reload and save!
That's it! Very simple.

To test, I previewed the different settings: the characters appeared correctly with Western European or Unicode (UTF-8) encoding but went funny on me with other encodings such as big-endian, so it works!

Thank you Piotr, I also opened the file with Notepad to verify the tag.

Loved it... Thanks again


 
esperantisto
esperantisto  Identity Verified
Local time: 00:24
Member (2006)
English to Russian
+ ...
SITE LOCALIZER
Do not do that! Sep 27, 2007

I would do it with MS Word


For the very simple reason that Microsoft Word notoriously inserts lots of shit into HTML code, and your client may not be happy with it. Besides UltraEdit, there are plenty of text/HTML editors capable of performing the task (jEdit and UniRed are my favorites) without harming any tag.


 
Medworks
Medworks
United States
Local time: 14:24
Italian to English
+ ...
TOPIC STARTER
Even better!! :) Sep 27, 2007

esperantisto wrote:

I would do it with MS Word


For the very simple reason that Microsoft Word notoriously inserts lots of shit into HTML code, and your client may not be happy with it. Besides UltraEdit, there are plenty of text/HTML editors capable of performing the task (jEdit and UniRed are my favorites) without harming any tag.


Hehe... you're right! When I opened the file with notepad earlier, I immediately saw the extra code. It even included my name, the time and date of last modification, etc. No, that's not good at all! Too bad, it seemed so nice and simple!

I looked at www.jedit.org and saw the program. It's even a free open-source program! Thanks!

I'll download it and give it a try (and compare the code)

All my Italian files have had the right UTF-8 tag, but it's good to know what to do, just in case. Thanks again!...


 
Medworks
Medworks
United States
Local time: 14:24
Italian to English
+ ...
TOPIC STARTER
It couldn't open Trados file.. Sep 27, 2007

I downloaded and ran jEdit, and it looks like a good program.

I went to Utilities > Global Options > Encoding.

Though it opened my sample HTML files, it could not open the files I receive from the agency, which are formatted for Trados TagEditor...

In Trados TagEditor, I noticed that at the very top of the file when I'm translating it says:

----------------------------------------------------
meta...content= text/html;charset=utf-8
-----------------------------------------------------

If a file had some other character encoding, couldn't I just write utf-8 there when needed and everything would be all right?

Of course, preview to double-check that the writing appears correctly.
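
On the question of just writing utf-8 in the tag: a small Python sketch may help show why the label and the actual bytes have to agree. It is a toy example with a made-up sample word, not anything taken from the agency's files. Changing only the label leaves the bytes as they were, so accented characters break, while plain ASCII text looks fine either way.

```python
# "é" stored as ISO-8859-1 is the single byte 0xE9.
latin1_bytes = "perché".encode("iso-8859-1")

# Relabelling only: the bytes stay the same, but a reader now treats them
# as UTF-8. A lone 0xE9 is not a valid UTF-8 sequence, so a lenient reader
# (such as a browser) substitutes a replacement character.
relabelled_only = latin1_bytes.decode("utf-8", errors="replace")

# Real conversion: decode with the true source encoding, then re-encode.
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")
converted = utf8_bytes.decode("utf-8")

print(repr(relabelled_only))  # the accented letter has been lost
print(repr(converted))        # the accented letter survives
```

So the tag should only say utf-8 once the file itself has been saved as UTF-8; previewing, as suggested above, is a sensible check.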


 
Jaroslaw Michalak
Jaroslaw Michalak  Identity Verified
Poland
Local time: 23:24
Member (2004)
English to Polish
SITE LOCALIZER
Unifier? Sep 27, 2007

This program, Unifier, seems to do exactly what you want:

http://www.melody-soft.com/html/unifier.html

It converts batches, adds the appropriate tags, etc. Note that I haven't tried it, so I cannot tell how good it is (better back up your files first...).

Also, there might be some freeware tools, but I haven't found any yet...

Edit:

I have just noticed that you are trying to convert the ttx files themselves instead of the HTML files. I am not sure that is a good idea, as the encoding of the ttx file itself (i.e. XML) might differ from that of the input/output HTML files.
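
One way to check what encoding an XML-based file actually uses, before touching it, is to look at its byte-order mark and XML declaration. The Python sketch below does only that. The function name `sniff_xml_encoding` is hypothetical, I am not assuming anything about what the agency's ttx files contain, and it does not handle UTF-16 files that lack a byte-order mark.

```python
import codecs
import re
from typing import Optional, Tuple

# Byte-order marks, UTF-32 first so it is not mistaken for UTF-16
# (the UTF-32 LE mark starts with the same two bytes as the UTF-16 LE mark).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Captures the value of encoding="..." in an <?xml ... ?> declaration.
XML_DECL_ENCODING = re.compile(
    r'<\?xml[^>]*encoding\s*=\s*["\']([\w.:-]+)["\']'
)


def sniff_xml_encoding(data: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (encoding implied by the BOM, encoding named in the declaration)."""
    bom_encoding = next(
        (name for bom, name in BOMS if data.startswith(bom)), None
    )
    # Decode just the start of the file; fall back to ASCII when no BOM exists.
    head = data[:200].decode(bom_encoding or "ascii", errors="replace")
    match = XML_DECL_ENCODING.search(head)
    return bom_encoding, (match.group(1) if match else None)
```

Reading the first few hundred bytes of a file with `open(path, "rb")` and passing them to this function shows whether the file's own encoding matches the charset named in the HTML it wraps.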

[Edited at 2007-09-27 11:38]


 
Daniel García
Daniel García
English to Spanish
+ ...
Sorry! I meant "open as encoded text" with MS-Word! :-( Sep 27, 2007

Martha wrote:

esperantisto wrote:

I would do it with MS Word


For the very simple reason that Microsoft Word notoriously inserts lots of shit into HTML code, and your client may not be happy with it. Besides UltraEdit, there are plenty of text/HTML editors capable of performing the task (jEdit and UniRed are my favorites) without harming any tag.


Hehe... you're right! When I opened the file with notepad earlier, I immediately saw the extra code. It even included my name, the time and date of last modification, etc. No, that's not good at all! Too bad, it seemed so nice and simple!

I looked at www.jedit.org and saw the program. It's even a free open-source program! Thanks!

I'll download it and give it a try (and compare the code)

All my Italian files have had the right UTF-8 tag, but it's good to know what to do, just in case. Thanks again!...


Sorry! I meant opening the HTML files in MS Word as "encoded text" to do the conversion, not as HTML. I should have explained more clearly...

Of course, esperantisto is right, and you have seen it yourself: opening an HTML file as HTML will insert a lot of code...

Apologies for the confusion again...

Daniel


 
esperantisto
esperantisto  Identity Verified
Local time: 00:24
Member (2006)
English to Russian
+ ...
SITE LOCALIZER
Glad that you like jEdit Sep 27, 2007

Forgot to say, download and use the latest 4.3 pre10 version. Although it's said to be unstable, that seems to be just developers' caution: I find it rock stable and fairly improved compared to 4.2. Note also that there are lots of plugins that can make your life a bit more comfortable. Just explore the respective section of their site.

 
Robert Tucker (X)
Robert Tucker (X)
United Kingdom
Local time: 22:24
German to English
+ ...
iconv Sep 27, 2007

On Linux I would probably change the encoding with the Bluefish HTML editor, but I could also do it with iconv, which normally comes with Linux. It is, however, also available online:

http://www.iconv.com/iconv.htm

and can be installed on Windows:

http://gnuwin32.sourceforge.net/packages/libiconv.htm


 
Piotr Bienkowski
Piotr Bienkowski  Identity Verified
Poland
Local time: 23:24
English to Polish
+ ...
No macros needed for that in UltraEdit Sep 27, 2007

dgmaga wrote:

For sure there are automatic converters for html files but I can't think of any off the top of my head right now.


Although using UltraEdit is an excellent tip, I would do it with MS Word for the very simple reason that I feel more comfortable with Word's macro language than with UltraEdit's macros.

Because the job is likely to involve a large number of files, I would try to use some automated solution.

Daniel


If Word does that without making any additional problems, then it's alright with me, but Word is known for causing problems with HTML files and introducing its own peculiar markup.

In UltraEdit it's a simple open and Save As (F12) operation with changing a few characters in the meta tag. No macros/scripts required.

Of course if you want to convert a lot of files, you need a batch converter.
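
A batch converter does not have to be a dedicated product. As a rough sketch in Python (the function names `is_valid_utf8` and `batch_convert` are my own, and the "already UTF-8" check is only a heuristic), something like the following writes UTF-8 copies of a whole folder into a separate output folder. It handles the bytes only; the charset in each file's meta tag still needs to be changed as described above.

```python
from pathlib import Path
from typing import List


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes already decode cleanly as UTF-8 (pure ASCII included)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def batch_convert(src_dir: Path, dst_dir: Path,
                  source_encoding: str = "iso-8859-1") -> List[Path]:
    """Write a UTF-8 copy of every .htm/.html file under src_dir into dst_dir."""
    written = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in {".htm", ".html"}:
            continue
        data = src.read_bytes()
        if not is_valid_utf8(data):
            # Assumed to be in source_encoding, so transcode the bytes.
            data = data.decode(source_encoding).encode("utf-8")
        # Files that are already valid UTF-8 are copied unchanged,
        # which avoids encoding them a second time.
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_bytes(data)
        written.append(dst)
    return written
```

Keeping the output in a separate folder means the originals stay intact, so a bad guess about the source encoding can be undone by simply running it again with a different `source_encoding`.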

Regards, Piotr

[Edited at 2007-09-27 19:57]


 

