Final html files show corrupted characters Studio 2011
Thread poster: Jan Hinrichs

Jan Hinrichs  Identity Verified
Spain
Local time: 09:40
Oct 12, 2012

Hi there,

I am having serious trouble with a Japanese project I am managing in Studio 2011. The client sent us EN HTML files, which we processed as Studio packages for FR and JA. The translators worked with 2011 packages, and then we used the "Export for external review" feature so that reviewers without Trados could proofread the bilingual DOCX files. For FR, the re-import worked fine, and we got clean HTML files that showed no character problems when opened in different browsers.

However, in Japanese the picture was quite different. First, the exported DOCX files couldn't be re-imported. The error message was just "First segment couldn't be imported", and I had to use a trick mentioned in another thread to get the reviewed strings back into the system: I copied the target text column (without the header) into a newly exported file and accepted all changes. Then I could import the reviewed files without problems, and everything looked fine in the editor view.

Afterwards I signed off all translations and ran the Finalize batch operation. However, when I opened the generated target files from the target folder in a browser, every file showed weird characters all over. Not a single Japanese character. From another thread I learned to change the "Active document settings" from Auto to Japanese, but even that didn't help.

This is what the output looks like:
----
ã∆äEç≈çÇÇÃêªïi

Lookout ÇÃóDÇÍÇΩÉÇÉoÉCÉãÉZÉLÉÖÉäÉeÉBÇÕÅA†PC WorldÅA†CNETÅA†Business InsiderÅATechCrunch Ç®ÇÊÇ— LAPTOP Magazine ǻǫÇÃéÂóvÇ»éGéèÇ‚ÉIÉìÉâÉCÉìèoî≈ï®Ç≈çÇÇ≠ï]âøÇ≥ÇÍǃǢNjÇ∑ÅB

----
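The garbled sample is consistent with Shift_JIS bytes being read back as Mac OS Roman: "ÇÃ" is the byte pair 0x82 0xCC, which Shift_JIS maps to の, and "ÅA" is 0x81 0x41, the Japanese comma 、. That would also fit the translator being able to read the page after switching to Shift_JIS. A minimal Python sketch under that assumption (a different mix-up would need a different codec pair); the function names garble and repair are made up for illustration:

```python
# Assumption: the finalized HTML was written as Shift_JIS, but the viewer
# decoded those bytes as Mac OS Roman. Both codecs ship with Python.


def garble(text: str) -> str:
    """Reproduce the damage: Shift_JIS bytes misread as Mac OS Roman."""
    return text.encode("shift_jis").decode("mac_roman")


def repair(garbled: str) -> str:
    """Undo it: get the original bytes back, then decode them as Shift_JIS."""
    return garbled.encode("mac_roman").decode("shift_jis")


print(repair("Lookout ÇÃ"))  # Lookout の
print(repair("ÅA"))          # 、 (Japanese comma)
print(garble("の"))          # ÇÃ
```

If this round trip produces readable Japanese, the file content itself is intact and only the encoding used to display it is wrong.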

It would be great to get some help if anyone knows a solution to this problem.

Thanks,
Jan

PS: Studio Service Pack 1 is already installed, and the reviewers worked with 2010 DOCX versions, so I don't think that is the cause.


Lorenzo Cordini
Local time: 09:40
English
Have you checked the encoding of the original HTML files? Oct 12, 2012

Hi Jan,

it may be that the original HTML files have an encoding that does not support Japanese characters.

I would try opening one of the original HTML files in Notepad and checking the encoding (via "Save As").
If it's ANSI then you should change it to either UTF-8 or Unicode.

Reprocess this sample file and check the final result with Japanese text.
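Notepad's "Save As" dialog is one way to see the current encoding. For checking many files at once, a rough Python sketch that looks for a byte order mark and otherwise returns the first candidate encoding that decodes without errors. The candidate list and the guess_encoding name are assumptions for illustration, and a clean decode does not prove the guess is right (pure ASCII decodes under all of them), so treat the result as a hint:

```python
import codecs
from typing import Optional

# Hypothetical order of guesses; adjust for the languages you handle.
FALLBACKS = ("utf-8", "shift_jis", "cp1252")


def guess_encoding(raw: bytes) -> Optional[str]:
    """Best-effort guess: BOM first, then the first codec that decodes cleanly."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # What Notepad calls "Unicode" is UTF-16 LE with a BOM.
        return "utf-16"
    for enc in FALLBACKS:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None
```

Usage: guess_encoding(open("page.htm", "rb").read()) returns a codec name such as "utf-8" or "cp1252", or None if nothing in the list fits.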

Lorenzo



Jan Hinrichs  Identity Verified
Spain
Local time: 09:40
TOPIC STARTER
txt vs html Oct 15, 2012

Hi Lorenzo,

Many thanks for your quick response. As you suggested, I opened the .htm file in Notepad and saved it as Unicode / UTF-8 (it was set to ANSI by default), but I could only save it as a .txt file (no other options). Unfortunately, importing the .txt file into a new Trados project didn't work: the segmentation was quite bad (a lot of the code at the beginning was imported as if it were text).

The strange thing was that our translator could view the page correctly after changing the encoding to Japanese (Shift_JIS). However, I couldn't replicate this on my computer. In the end I sent the files to the client, and he was actually able to put them into their corporate WordPress blog.

Now I am a bit confused and would love to understand what is happening here and how I can avoid this trouble next time.

Any idea?
Best,
Jan


Lorenzo Cordini
Local time: 09:40
English
Choose "All files" under "Save as type" to save files as html Oct 16, 2012

Hi Jan,

In Notepad, when saving the HTML file as Unicode, also choose the "All files" option in the "Save as type" drop-down menu. This keeps it as an HTML file. This is quite important; otherwise Trados will mishandle the file, as happened in your attempt.

I am pretty sure that will solve the issue.
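The same re-save can be scripted for a batch of files. A minimal Python sketch, assuming you already know each file's current encoding; convert_to_utf8 is a hypothetical helper name. It refuses any target that does not end in .html or .htm, the scripted counterpart of the "All files" tip. It does not touch a charset declaration inside the file (such as a meta charset tag); if one exists, it should be changed to UTF-8 as well:

```python
from pathlib import Path

HTML_SUFFIXES = (".html", ".htm")


def convert_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Re-save an HTML file as UTF-8 (no BOM), keeping an HTML extension.

    Hypothetical helper: source_encoding must be known in advance,
    e.g. "cp1252" for Notepad's ANSI on Western Windows, or "shift_jis".
    """
    if dst.suffix.lower() not in HTML_SUFFIXES:
        raise ValueError(f"target must end in .html or .htm, got {dst.name!r}")
    # Strict decoding: bytes invalid in source_encoding raise an error,
    # though a wrong guess can still decode "successfully".
    # newline="" keeps the original line endings untouched.
    with open(src, encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

Writing to a separate dst path keeps the original file available in case the source encoding was guessed wrong.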

HTML files need to be saved as Unicode (for example UTF-8) to reliably support scripts outside the Western Latin character set.
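To see why "ANSI" is the problem, a small Python check of which encodings can hold Japanese text. It assumes ANSI means Windows-1252, the usual default on Western European Windows; can_encode is a made-up helper name. It also shows that a legacy Japanese encoding such as Shift_JIS can hold the text; UTF-8 is simply the option that covers every script at once:

```python
# Assumption: Notepad's "ANSI" is Windows-1252 (cp1252), the default code page
# on Western European Windows; on a Japanese system it would be cp932 instead.


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text fits in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for enc in ("cp1252", "shift_jis", "utf-8"):
    print(enc, can_encode("日本語", enc))
# cp1252 False
# shift_jis True
# utf-8 True
```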

Regards,

Lorenzo

