Translating HTML within an XLF file
Thread poster: NicBathgate
Mar 6, 2013

Hi Proz community,

I am looking for a CAT tool that can effectively segment the HTML content within an XLF file, so the content can be translated easily without having to deal with the HTML tags. I will illustrate with screenshots below. The HTML content is bulk-exported from a very large website in this XLF format, so it is not possible to simply save the pages as HTML and translate them that way (the content could not then be imported back into the website). We need to translate our content using the XLF files.

So far I have tried Deja Vu, Memsource Cloud, MemoQ, Wordfast Anywhere and SDL Trados Studio 2011, and they all behave the same way.

If I copy the HTML content out of the XLF file into a plain HTML file (without the XLF trans-unit/source/target/group tags) and import that into the CAT tool (Trados in this example), everything works as I expect: there are no HTML tags in sight; paragraphs, table cells, list items and headings are each split into their own segments; and bold/italic/underline formatting is handled by Trados's WYSIWYG feature. Perfect:
http://postimage.org/image/bvkb0ti5l/full/

The trouble begins when I import the original XLF file, which has the HTML content organized in trans-unit tags. Instead of creating segments from the HTML tags, Trados creates segments from the XLF trans-unit tags, and all of the HTML is left in the text for the translator to painstakingly wade through:
http://postimage.org/image/ah34svm0l/full/

The closest I have come to a solution is the "Format > Run Regex Tagger > Filter configuration: Tags and entities" option in MemoQ, but this still doesn't start a new segment at each HTML block tag; it just replaces the HTML tags with tag icons.
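Conceptually, a regex tagger protects markup by swapping each tag for an inline placeholder the translator cannot edit, then putting the original tags back on export. A minimal Python sketch of that round trip follows, assuming each trans-unit's HTML is available as a plain string; the `{n}` placeholder syntax and the function names are hypothetical and are not memoQ's internal format. As described above, this only hides the tags; it adds no segment breaks.

```python
import re

# Matches any HTML start, end or self-closing tag, e.g. <b>, </b>, <br/>.
# Deliberately simple (an assumption): it ignores '>' inside attribute
# values and does not treat HTML comments specially.
TAG_RE = re.compile(r"<[^<>]+>")


def protect_tags(text: str) -> tuple[str, list[str]]:
    """Replace each HTML tag with a numbered placeholder such as {1}.

    Returns the protected text plus the original tags, in order, so the
    translated text can be restored later.
    """
    tags: list[str] = []

    def _swap(match: re.Match) -> str:
        tags.append(match.group(0))
        return "{%d}" % len(tags)

    return TAG_RE.sub(_swap, text), tags


def restore_tags(text: str, tags: list[str]) -> str:
    """Put the original tags back in place of the {n} placeholders.

    Assumes the text itself never contains a literal '{digits}' sequence.
    """
    return re.sub(r"\{(\d+)\}", lambda m: tags[int(m.group(1)) - 1], text)
```

For example, `protect_tags("<p>Hello <b>world</b></p>")` yields `"{1}Hello {2}world{3}{4}"` and the list `["<p>", "<b>", "</b>", "</p>"]`. The translator sees one segment with four protected markers, which matches the "tag icons" behaviour rather than the paragraph-by-paragraph segmentation wanted here.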

I have uploaded a copy of the XLF file used in this example in case anybody would like to test it in their own CAT tool; you can download it HERE.

Can anybody suggest a CAT tool or a way I can use one of the CAT tools mentioned above to achieve my goal?

Thank you,
Nic Bathgate


 

Piotr Bienkowski  Identity Verified
Poland
Member (2005)
English to Polish
MemoQ has cascading filters Mar 6, 2013

During import of the file you can apply a cascading filter to hide the markup that you don't want to see.

There is also an option to do that after import, but I can't remember its name at the moment.

HTH


Piotr

P.S. I see that you tried memoQ too, but could you explain why you want the HTML tags as separate segments?


[Edited at 2013-03-06 09:19 GMT]

P.S.2 I see what you mean now. I'll ask in a different forum. Meanwhile, you can always split the large segments manually.

[Edited at 2013-03-06 09:22 GMT]


 

SDL Community  Identity Verified
United Kingdom
English
Just in case you are interested... Mar 6, 2013

... I did speak to a developer I know who is working on an OpenExchange application to "beautify" an XLIFF file like this. We tested the file you provided, and it opened like this:


So this is probably what you're looking for. It's not quite ready for release yet, as he's working on a few additional features, but if you would like to help, he'd be very happy to see more sample files.

Just a thought. Drop me an email if you're interested - pfilkin@sdl.com

Regards

Paul


 

István Lengyel
Hungary
English to Hungarian
how you can do this in memoQ now Mar 8, 2013

Hi Nic,

I checked the file in memoQ, and it is possible to define a cascading filter for your file format. A cascading filter is basically a parser for an embedded file format, e.g. HTML in Excel, or regex tagging in PHP. I see that your file is XLIFF; however, nothing in it is translated, the source is only copied to the target. Is this a priority for you? If not, and you can just take the HTML file, you can create a similar XLIFF very easily: click Import with options, select the text filter, click Change filter and configuration, and add a cascading filter for HTML. Then it will appear fine, and you can export it into XLIFF.
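To make the idea of a cascading filter concrete, here is a minimal two-stage sketch using only the Python standard library: an XML parser reads each trans-unit's `<source>`, and an HTML parser then splits that content into one segment per block-level element. It assumes the HTML is stored escaped (`&lt;p&gt;`) or in CDATA inside `<source>`; the `BLOCK_TAGS` list and all function names are illustrative assumptions, and this is not how memoQ implements its filters. Inline tags such as `<b>` are kept as raw text here, where a real CAT tool would turn them into protected inline tags.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Tags that start or end a segment; an assumed, non-exhaustive list.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
              "ul", "ol", "li", "table", "tr", "td", "th", "br"}


class BlockSplitter(HTMLParser):
    """Second stage: split one HTML fragment into block-level segments.

    Inline tags (e.g. <b>, <i>) are copied into the segment text unchanged.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.segments: list[str] = []
        self._buf: list[str] = []

    def _flush(self) -> None:
        # Collapse runs of whitespace and keep only non-empty segments.
        text = " ".join("".join(self._buf).split())
        if text:
            self.segments.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        else:
            self._buf.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> or <img .../>.
        if tag in BLOCK_TAGS:
            self._flush()
        else:
            self._buf.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        else:
            self._buf.append(f"</{tag}>")

    def handle_data(self, data):
        self._buf.append(data)

    def close(self) -> None:
        super().close()
        self._flush()


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so XLIFF 1.2 and un-namespaced files both match."""
    return tag.rsplit("}", 1)[-1]


def segment_xliff(xml_text: str) -> dict[str, list[str]]:
    """First stage: read each trans-unit's <source> with the XML parser,
    then pass its HTML text to the HTML stage."""
    root = ET.fromstring(xml_text)
    result: dict[str, list[str]] = {}
    for unit in root.iter():
        if local_name(unit.tag) != "trans-unit":
            continue
        source = next((c for c in unit if local_name(c.tag) == "source"), None)
        html = (source.text or "") if source is not None else ""
        splitter = BlockSplitter()
        splitter.feed(html)
        splitter.close()
        result[unit.get("id", "")] = splitter.segments
    return result
```

Each key in the returned dict is a trans-unit `id`, and each value is that unit's list of segments, so a unit holding `<h1>Welcome</h1><p>Hello <b>world</b></p>` comes back as `["Welcome", "Hello <b>world</b>"]`. Using a real HTML parser for the second stage, rather than string matching, is what lets it cope with attributes and self-closing tags.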

XLIFF is normally a prepared format, and I see this one comes from Cloudwords. I have no experience with their tool, but I believe the tagging should be done by the original filter that produced the XLIFF.

It is possible to tag up the XLIFF, but for that you need to write some regular expression rules. The way to emulate HTML with regex is described in the memoQ help.
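A regex-only approximation of the segmentation part might look like the following Python sketch: it splits an HTML fragment wherever a block-level tag occurs and leaves inline tags in the text. The block-tag list, the pattern and the function name are assumptions for illustration and are not memoQ's rule syntax; the pattern also does not cope with `>` inside attribute values, which is one reason a real HTML parser is more robust.

```python
import re

# Hypothetical rule: any opening, closing or self-closing block-level tag
# ends the current segment. '\b' stops <b> matching 'br' and <pre> matching 'p'.
BLOCK_TAG_RE = re.compile(
    r"</?(?:p|div|h[1-6]|li|ul|ol|table|tr|td|th|br)\b[^>]*>",
    re.IGNORECASE,
)


def split_on_block_tags(html: str) -> list[str]:
    """Split an HTML fragment at block-level tags and return the non-empty
    pieces; inline tags such as <b> stay inside the pieces."""
    pieces = BLOCK_TAG_RE.split(html)
    return [p.strip() for p in pieces if p.strip()]
```

For instance, `split_on_block_tags("<p>Line one<br/>Line two</p>")` gives `["Line one", "Line two"]`, while `<b>` and `</b>` survive inside their segment for a separate tagging rule to protect.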

If I can help you further, please don't hesitate to contact us (my address is istvan and then dot and then lengyel at kilgray dot com; I just don't want to receive a hefty amount of spam as a penalty for sharing my email address) :-)

István


 

