Final html files show corrupted characters Studio 2011
Thread poster: Jan Hinrichs

Jan Hinrichs  Identity Verified
Spain
Local time: 20:47
Member (2012)
Oct 12, 2012

Hi there,

I am having serious trouble with a Japanese project I am managing through Studio 2011. The client sent us EN html files, which we processed in Studio packages for FR and JA. The translators worked with the 2011 packages, and then we used the "Export for external review" feature so that non-Trados reviewers could proofread the bilingual docx files. The import worked fine for FR, and we got clean FR html files that showed no character problems when opened in different browsers.

In Japanese, however, the picture was quite different. First, the exported docx files couldn't be reimported. The error message was just "First segment couldn't be imported", and I had to use a trick mentioned in another thread to get the reviewed strings back into the system: I copied the target text column, without the header, into a newly exported file and accepted all changes. After that I could import the reviewed files without problems, and everything looked fine in the editor view. I then signed off all translations and ran the Finalize batch operation. But when I opened the generated target files from the target folder in a browser, all of them showed weird characters all over, without a single Japanese character. From another thread I learned to set the "Active document settings" from Auto to Japanese, but even this didn't help.

This is what the output looks like:
----
ã∆äEç≈çÇÇÃêªïi

Lookout ÇÃóDÇÍÇΩÉÇÉoÉCÉãÉZÉLÉÖÉäÉeÉBÇÕÅA†PC WorldÅA†CNETÅA†Business InsiderÅATechCrunch Ç®ÇÊÇ— LAPTOP Magazine ǻǫÇÃéÂóvÇ»éGéèÇ‚ÉIÉìÉâÉCÉìèoî≈ï®Ç≈çÇÇ≠ï]âøÇ≥ÇÍǃǢNjÇ∑ÅB

----

Would be great to get help if anyone knows a solution for this problem.

Thanks,
Jan

PS: Studio Service Pack 1 is already installed, and the reviewers worked with Word 2010 docx files, so I don't think that is the reason.


 

Lorenzo Cordini
Local time: 20:47
English
Have you checked the encoding of original HTML files? Oct 12, 2012

Hi Jan,

it may be that the original HTML files have an encoding that does not support Japanese characters.

I would try opening one of the original HTML files in Notepad and checking the encoding (via "Save As").
If it's ANSI, then you should change it to either UTF-8 or Unicode.

Reprocess this sample file and check the final result with Japanese text.
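
The Notepad check can also be scripted. What follows is only a minimal Python sketch, not part of the advice above: the helper name `inspect_html_bytes` is made up for illustration, and it looks at just three things: a byte-order mark, a charset declaration within the first 4 KB, and whether the bytes are valid UTF-8. Note that a pure-ASCII file counts as valid UTF-8 too.

```python
import codecs
import re

# Byte-order marks written by editors for UTF-8 and UTF-16 ("Unicode") files.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE
)


def inspect_html_bytes(data: bytes) -> dict:
    """Report BOM, declared charset and UTF-8 validity for raw HTML bytes."""
    bom = next((name for mark, name in BOMS if data.startswith(mark)), None)

    match = META_CHARSET.search(data[:4096])
    declared = match.group(1).decode("ascii").lower() if match else None

    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    return {"bom": bom, "declared": declared, "valid_utf8": valid_utf8}
```

To use it on a real file, pass the result of `pathlib.Path("your-file.html").read_bytes()` (the file name is a placeholder). If `declared` disagrees with how the bytes are actually encoded, browsers may pick the wrong encoding.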

Lorenzo


 

Jan Hinrichs  Identity Verified
Spain
Local time: 20:47
Member (2012)
TOPIC STARTER
txt vs html Oct 15, 2012

Hi Lorenzo,

Many thanks for your quick response. I opened the htm file in Notepad as you said and saved it as Unicode / UTF-8 (the default was ANSI), but I could only save it as a txt file (there were no other options). Unfortunately, importing the txt file into a new Trados project didn't work out. The segmentation was quite bad (a lot of the code at the beginning was imported as if it were text).

The strange thing was that our translator could see the page fine after changing the browser encoding to Japanese Shift_JIS. However, I couldn't replicate this on my computer. In the end I sent the files to the client, who was actually able to load them into their corporate WordPress blog.
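
One plausible reading of the sample output in the first post (an assumption based on the visible characters, not something confirmed in this thread): the target file contains Shift_JIS bytes, and the viewer decodes them with a Western single-byte code page, apparently Mac OS Roman. That would explain why switching the browser to Japanese Shift_JIS makes the page readable. A small Python sketch of the effect, with hypothetical helper names `garble` and `ungarble` and a made-up sample string:

```python
def garble(text: str, real: str = "shift_jis", wrong: str = "mac_roman") -> str:
    """Encode text in its real encoding, then decode the bytes with the wrong one."""
    return text.encode(real).decode(wrong)


def ungarble(text: str, real: str = "shift_jis", wrong: str = "mac_roman") -> str:
    """Reverse the mistake: recover the original bytes, then decode them correctly."""
    return text.encode(wrong).decode(real)


# Hypothetical sample text, not taken from the project files.
sample = "製品の評価"
broken = garble(sample)
print(broken)            # accented Latin letters and symbols, no Japanese
print(ungarble(broken))  # the original Japanese text again

# Two short sequences that recur in the corrupted output quoted in the thread:
print(ungarble("ÇÃ"))  # の
print(ungarble("ÅA"))  # 、
```

This only works on text that is purely Shift_JIS underneath. If the file also contains bytes that are not valid Shift_JIS, the strict decode raises `UnicodeDecodeError`, so this is a diagnostic aid rather than a repair tool.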

Now I am a bit confused and would love to understand what is happening here and how I can avoid the trouble next time.

Any idea?
Best,
Jan


 

Lorenzo Cordini
Local time: 20:47
English
Choose "All files" under "Save as type" to save files as html Oct 16, 2012

Hi Jan,

In Notepad, when saving the html file as Unicode, also choose the "All files" option in the "Save as type" drop-down menu. This keeps the html extension. It is quite important, because otherwise Trados will mishandle the file, as happened in your attempt.

I am pretty sure that will solve the issue.

HTML files need to be saved as Unicode in order to support scripts other than Western Latin character sets.
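
For more than a handful of files, the re-save step could be scripted instead of done by hand in Notepad. The sketch below is an illustration under stated assumptions, not a Trados feature: `to_utf8_html` is a hypothetical helper, you must already know the source encoding (for example `"cp1252"`, which is what Notepad's ANSI usually means on a Western European Windows installation), and only the first `charset=` declaration found in the file is rewritten so that it matches the new encoding.

```python
import re

# First charset declaration, e.g. charset=iso-8859-1 or charset="Shift_JIS".
CHARSET_ATTR = re.compile(
    r"(charset\s*=\s*[\"']?)[\w.:-]+", re.IGNORECASE | re.ASCII
)


def to_utf8_html(data: bytes, source_encoding: str) -> bytes:
    """Decode HTML bytes from a known encoding and re-encode them as UTF-8.

    The first charset declaration, if any, is updated to say utf-8 so the
    declaration and the actual bytes agree.
    """
    text = data.decode(source_encoding)
    text = CHARSET_ATTR.sub(lambda m: m.group(1) + "utf-8", text, count=1)
    return text.encode("utf-8")
```

A typical call would read the bytes of the original file, pass them through `to_utf8_html(raw, "cp1252")`, and write the result to a new file that keeps the `.html` extension. Working on a copy first is safer, since a wrong guess for `source_encoding` silently produces the wrong characters.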

Regards,

Lorenzo


 

