Converting Japanese to Unicode/HTML?
Thread poster: Paul Cohen

Paul Cohen
Greenland
German to English
Jun 22, 2010

I have a question concerning placing Japanese texts on websites.

Recently a longstanding client asked my wife to add Japanese texts to an existing website (which my wife programmed herself using an HTML editor). The client said that he would have the translations done by a qualified translator and then forwarded to her.

Well, now she has received a Japanese text in a Word file, but she can't figure out how to convert it into HTML code and put it on the website. She has found a website listing the Unicode code points for Hiragana and Katakana characters, but that seems to cover just a small proportion of the characters.

Does anyone out there have any experience in this area?

Does an application exist that can convert Japanese characters into Unicode/HTML?

Any comments or ideas would be greatly appreciated.

Thanks in advance for your help!

Paul



Katalin Horváth McClure
United States
Member (2002)
English to Hungarian
What is the problem exactly? Jun 22, 2010

If she received a Word file, the Japanese text itself is most likely in Unicode.
I mean - does it display on her computer correctly?
As to how to put it into HTML - well, you just have to use Unicode for the encoding, and specify the language as Japanese. html lang="ja"
Use meta-tags for Unicode: content="text/html; charset=utf-8"

Or, if you want/need to use Shift-JIS encoding, then charset=Shift_JIS

So, there is no need to replace every character with a code number, if that's what you are thinking. That's not the way to go.
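
A minimal sketch of that advice, written in Python only so that the encoding of the saved file is explicit. The function name `build_page`, the file name `sample_ja.html` and the sample text are made up for illustration, and the sketch assumes the inserted text contains no `<`, `>` or `&` that would need escaping.

```python
from pathlib import Path

def build_page(title: str, body_text: str) -> str:
    """Return a minimal HTML page that declares Japanese content in UTF-8."""
    return (
        "<!DOCTYPE html>\n"
        '<html lang="ja">\n'
        "<head>\n"
        '<meta http-equiv="content-type" content="text/html; charset=utf-8">\n'
        f"<title>{title}</title>\n"
        "</head>\n"
        "<body>\n"
        f"<p>{body_text}</p>\n"
        "</body>\n"
        "</html>\n"
    )

# The Japanese characters go into the page as they are, with no code numbers.
page = build_page("日本語のページ", "こんにちは、世界。")

# The file has to be saved as UTF-8 so that it matches the declared charset.
Path("sample_ja.html").write_text(page, encoding="utf-8")
```

For Shift-JIS, the same sketch should work with `charset=Shift_JIS` in the meta tag and `encoding="shift_jis"` when writing, as long as both are changed together.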

Maybe I am not clear about the question, but perhaps it would help to take a look at the source code of any Japanese webpage.
http://www.nikon.co.jp/
http://www.toyota.co.jp/
http://www.nissan.co.jp/

Katalin



Madeleine MacRae Klintebo
United Kingdom
Swedish to English
From an amateur with an interest in character issues and web design Jun 22, 2010

Might be a silly question, but has the web designer/developer remembered to add support for Unicode in the header? Like:



If not, the best thing would probably be to do so. The next alternative is to use a converter; there are many free ones on the net. A converter replaces the actual characters with HTML numeric character references (NCRs). Just remember that if you're using a free online tool, you have to be careful with confidential information. Also keep this in mind:

"It's generally better, however, to use the characters themselves rather than their Unicode NCRs in cases where a Web page has a lot of Chinese text, because Chinese characters take up less file space than their NCRs."

Reference: http://pinyin.info/tools/converter/chars2uninumbers.html
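
What such a converter does can be sketched in a few lines of Python; this is an illustration, not the code behind the tool linked above. The helper name `to_ncr` and the sample word are assumptions. It also shows the size difference the quote refers to.

```python
def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

sample = "日本語"
ncr = to_ncr(sample)

print(ncr)                          # &#26085;&#26412;&#35486;
print(len(sample.encode("utf-8")))  # 9 bytes when stored as UTF-8 characters
print(len(ncr.encode("ascii")))     # 24 bytes when stored as NCRs
```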

If none of this is helpful, maybe your site developer might want to read this:

http://www.joelonsoftware.com/articles/Unicode.html

BTW - it's usually safer to throw text into Notepad or similar before adding to a CMS to remove Word's (unnecessary) formatting.

Edited to add that the missing "bit" above (forgot to add spaces around tags) is the same as mentioned by Katalin.



[Edited at 2010-06-22 21:54 GMT]



RieM
United States
English to Japanese
good ol' native2ascii Jun 22, 2010

I still use it. It's part of the Java SDK.

Of course, there are text editors that support such conversion. But then, the file has to be in plain-text format first.
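
For readers without a Java SDK at hand, here is a rough Python sketch of the same kind of conversion: non-ASCII characters become `\uXXXX` escapes. It is an approximation of what native2ascii produces, not a replacement for it, and the function name `to_java_escapes` is made up.

```python
def to_java_escapes(text: str) -> str:
    """Write non-ASCII characters as \\uXXXX escapes, one per UTF-16 code unit."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        else:
            # Characters outside the Basic Multilingual Plane become two
            # escapes (a surrogate pair), as Java strings are UTF-16 based.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)

print(to_java_escapes("Tokyo 東京"))  # Tokyo \u6771\u4eac
```

Note that these escapes are meant for Java source and properties files; a browser will not interpret them in HTML text.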

I will be happy to take a look and convert it as you like. Just send the file via my profile page.

Rie



Paul Cohen
Greenland
German to English
TOPIC STARTER
Excellent advice Jun 23, 2010

Thanks, Katalin, Madeleine and Rie.

Excellent advice! We'll look into it and let you know how things turn out.

Thanks again,

Paul (& Monika)


esperantisto
Member (2006)
English to Russian
Some remarks Jun 23, 2010

Katalin Horvath McClure wrote:

If she received a Word file, the Japanese text itself is most likely in Unicode.


Theoretically, it could be in the so-called Far Eastern Word 6.0/95 format. But if it is opened in Word 97 to 2010, it is converted to Unicode on the fly.

As to how to put it into HTML - well, you just have to use Unicode for the encoding, and specify the language as Japanese. html lang="ja"
Use meta-tags for Unicode: content="text/html; charset=utf-8"


…and make sure you actually save your HTML file as Unicode UTF-8 (with or without BOM, that’s immaterial), so that the file matches the declared charset. I would suggest using a text/HTML editor with explicit encoding control, such as jEdit.
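
If the editor gives no clear indication of how a file was saved, a quick check along the following lines may help. It is only a sketch: the function name `check_utf8` and the demo file names are hypothetical, and passing the check shows that the bytes are valid UTF-8, not that the text is what you intended.

```python
import codecs
from pathlib import Path

def check_utf8(path: str) -> str:
    """Report whether a file decodes as UTF-8 and whether it starts with a BOM."""
    data = Path(path).read_bytes()
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    return "UTF-8 with BOM" if has_bom else "UTF-8 without BOM"

# Hypothetical demo files, written here only so that the sketch is runnable.
Path("demo_utf8.html").write_text("<p>日本語</p>", encoding="utf-8")
Path("demo_sjis.html").write_text("<p>日本語</p>", encoding="shift_jis")

print(check_utf8("demo_utf8.html"))  # UTF-8 without BOM
print(check_utf8("demo_sjis.html"))  # not valid UTF-8
```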



Paul Cohen
Greenland
German to English
TOPIC STARTER
Converting Japanese characters into Unicode Sep 30, 2010

Sorry that it took so long to get back to all of you.

This is the solution that we found:

http://www.cse.iitb.ac.in/~pratik/downloads/ConvertCharactersToUnicode.html

Just copy in the characters and you get Unicode. It works!

It also appears to work for Hindi, Sanskrit, Malayalam and Chinese characters.
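
For anyone who wants to double-check converted text before publishing it, the codes can be decoded back into characters. This sketch assumes the converter's output consists of HTML numeric character references (decimal or hexadecimal); the sample strings are made up and were not produced by the tool linked above.

```python
import html

ncr_text = "&#26085;&#26412;&#35486;"   # decimal references
hex_text = "&#x65E5;&#x672C;&#x8A9E;"   # the same characters as hexadecimal references

print(html.unescape(ncr_text))  # 日本語
print(html.unescape(hex_text))  # 日本語
```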

Best regards,

Paul


FarkasAndras
English to Hungarian
Tags Sep 30, 2010

Not that it matters much at this point, but if anyone wants to post tags in the forum, remember to use character entities, not actual angle brackets. That is, write &lt; instead of <, because otherwise the forum engine misinterprets your tags as, well, tags.

This post shows how things go wrong:

Madeleine MacRae Klintebo wrote:

Might be a silly question, but has the web designer/developer remembered to add support for Unicode in the header? Like:





This is how it looks, as intended, if you use &lt; and &gt;:
Madeleine MacRae Klintebo wrote:

Might be a silly question, but has the web designer/developer remembered to add support for Unicode in the header? Like:

<meta http-equiv="content-type" content="text/html;charset=utf-8" />
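
The escaping described in this post does not have to be done by hand. As a small sketch, Python's standard `html.escape` produces the entity form of a tag so that it can be pasted into a forum post; the variable name `tag` is just for illustration.

```python
import html

tag = '<meta http-equiv="content-type" content="text/html;charset=utf-8" />'

# quote=False leaves the double quotes alone and escapes only &, < and >.
print(html.escape(tag, quote=False))
# &lt;meta http-equiv="content-type" content="text/html;charset=utf-8" /&gt;
```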




Soonthon LUPKITARO(Ph.D.)
Thailand
Member (2004)
English to Thai
MS Word text format Oct 1, 2010

Since the text in question is in MS Word, one of the easiest ways to convert it is to save the file as plain text and select a Unicode encoding option (such as UTF-8). The characters are then displayed correctly in an HTML file that declares a Unicode encoding in its header.
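
If the plain-text export turns out not to be UTF-8, it can be re-encoded afterwards. The sketch below assumes the export is UTF-16 with a byte order mark, which I believe is what Word's plain "Unicode" option writes, but please check your own file. The function name and the file names are hypothetical.

```python
from pathlib import Path

def utf16_to_utf8(src: str, dst: str) -> None:
    """Re-encode a UTF-16 text file (with BOM) as UTF-8 without BOM."""
    text = Path(src).read_text(encoding="utf-16")
    Path(dst).write_text(text, encoding="utf-8")

# Hypothetical demo file that stands in for the plain-text export from Word.
Path("export_utf16.txt").write_text("日本語のテキスト", encoding="utf-16")

utf16_to_utf8("export_utf16.txt", "export_utf8.txt")
```

The resulting text can then be pasted into an HTML file that is itself saved as UTF-8.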

Soonthon Lupkitaro



