SDLX, XML and CDATA
Thread poster: Matulal
Matulal
Spanish to English
May 28, 2008

Greetings All,

I work at a translation firm that uses SDLX 2007, and we've been having some problems with XML files containing CDATA. We need to translate only the CDATA text, but we haven't figured out a way to select only that element. The CDATA heading (I don't know much about XML, so forgive me if I'm using the wrong term) doesn't show up in the XML import profile; only the elements above it do. We've analyzed all the XML files to create a profile, but the CDATA sections still don't show up.

Here's an example:

Code:





Before you begin the Global Nutrition Training program, you must complete the pre-assessment. There is no minimum pass score, and results will only be used to establish a baseline measurement of your organization’s current nutrition knowledge.<br/>This assessment is comprised of challenging questions from each of the 7 courses developed. Integrating the results of this test with those of future post-training assessments will allow us to measure the degree of learning that has occurred through the program<br/>Please read the following questions and click on the best answer for each. Good luck!<br/>Click on the Next button to begin.]]>



The "core", "outline" and "primary text" tags will show up in our settings, but not CDATA. If we select those, we still get the CDATA text, but we also get a lot of other junk. Plus, the CDATA sections come under many different headings, so we pretty much have to select all of them, and then we get 40% junk, so our word count is off the charts.

I know that elements can be added to the XML profile but we haven't had any luck trying to write them. And the SDLX manuals are sorely lacking.

If anyone knows how to properly set up a profile for these files, we would greatly appreciate some help.

ML

[Edited at 2008-05-28 12:32]



Ralf Lemster  Identity Verified
Germany
Local time: 02:44
English to German
BB code May 28, 2008

The forum software does not properly recognise the XML coding you posted; you may want to try using "code" tags. See the explanations given when posting.

Best regards,
Ralf



Stefan Gentz
Local time: 02:44
English to German
Check for SDL TRADOS 2007 or have the XML prepped properly May 30, 2008

Matulal wrote:
I work at a translation firm that uses SDLX 2007, and we've been having some problems with XML files containing CDATA.
Here's an example:

Code:
<gene engine="presentation" id="xa010">
<core>
<outline>
<title><![CDATA[Introduction to Assessment]]></title>
</outline>
<primary_text><![CDATA[
<p>Before you begin the Global Nutrition Training program, you must complete the pre-assessment. There is no minimum pass score, and results will only be used to establish a baseline measurement of your organization’s current nutrition knowledge.</p>
<br/>
<p>This assessment is comprised of challenging questions from each of the 7 courses developed. Integrating the results of this test with those of future post-training assessments will allow us to measure the degree of learning that has occurred through the program</p>
<br/>
<p>Please read the following questions and click on the best answer for each. Good luck!</p>
<br/>
<p>Click on the Next button to begin.</p>
]]>
</primary_text>



The "core", "outline" and "primary text" tags will show up in our settings, but not CDATA. If we select those, we still get the CDATA text, but we also get a lot of other junk. Plus, the CDATA sections come under many different headings, so we pretty much have to select all of them, and then we get 40% junk, so our word count is off the charts.


In an XML document, a CDATA section is a part of element content that the parser is told to treat purely as character data, not markup. That is, the parser does not interpret anything inside the CDATA section and passes it through "as is". Consequently, the tags in the CDATA section (HTML tags, in your example) are treated as plain text and not as tags/elements.
Check out the solution provided in SDL TRADOS SP2, or make sure that the client's XML is properly handled and prepared by an XML and Trados specialist.
Please also note that, generally speaking, mixing "languages" (in this case the client's XML database structure and HTML) by means of CDATA sections is not really the best approach. It should only be a stopgap solution. It is much cleaner to separate the languages with namespaces.
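To illustrate the difference, here is a minimal Python sketch using only the standard library. It is not tied to SDLX or Trados; it only shows how a generic XML parser sees the two variants. The namespaced document and the `h` prefix are a hypothetical rewrite of the example, not something taken from the client's files.

```python
import xml.etree.ElementTree as ET

# CDATA variant: the <p> and <br/> inside the section are character data,
# so the parser creates no child elements for them.
cdata_doc = "<primary_text><![CDATA[<p>Good luck!</p><br/>]]></primary_text>"
cdata_root = ET.fromstring(cdata_doc)
cdata_children = len(cdata_root)   # number of child elements
cdata_text = cdata_root.text       # the raw string, tags included

# Namespaced variant (hypothetical rewrite): the HTML is real markup,
# kept apart from the client's own elements by the XHTML namespace.
ns_doc = (
    '<primary_text xmlns:h="http://www.w3.org/1999/xhtml">'
    "<h:p>Good luck!</h:p><h:br/>"
    "</primary_text>"
)
ns_root = ET.fromstring(ns_doc)
ns_tags = [child.tag for child in ns_root]
ns_text = ns_root[0].text

print(cdata_children, repr(cdata_text))
print(ns_tags, repr(ns_text))
```

In the first case a filter can only offer "primary_text" as a whole; in the second, each HTML element is visible to the parser as an element of its own.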


Kind regards,
Stefan Gentz
TRACOM OHG


Matulal
Spanish to English
TOPIC STARTER
CDATA itself is not the problem May 30, 2008

First of all, Stefan, thanks for getting my code to show up when I couldn't. And yes, I've read the posts about CDATA and Trados. The problem at this point is not the characters within the CDATA tags, which we don't want but which are few. Rather, it is all the junk outside them, in the hierarchies that contain the CDATA. I think you're right that this kind of HTML code within XML is not the best way to go, and maybe that's why the current SDLX XML filter does not pick the CDATA sections up. That's why I was wondering if anyone knows how to add the correct elements to the filter so that it will pick them up. Now that my code can be seen, maybe someone will.

Thanks again.



Stefan Gentz
Local time: 02:44
English to German
Handling XML and CDATA May 30, 2008

It's difficult to understand your problem if you do not describe it with the correct terminology.

Do I understand correctly that none of the content in the CDATA sections is shown for translation? That is, the text is simply missing?

<![CDATA[this text is missing in my translation file...]]>

Or do you get the content of the CDATA section for translation, but SDLX fails to "interpret" the tags like … as real "tags" and exposes them as "normal", editable text?
Please note that this is expected behavior for *any* XML parser (not just the SDLX/TRADOS XML parser): "tags" ("elements") in CDATA sections are not parsed ("interpreted"). This is by design and is part of the XML specification.

I suggest translating the XML in TagEditor. It might be a good idea to give the XML to an engineer who can set up an ini file and prepare the XML file so that the CDATA sections are parsed as expected for translation. Please feel free to send me the files for analysis.
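As a rough idea of what such a preparation step could look like, here is a minimal Python sketch using only the standard library. It is not an SDLX or TagEditor feature, and the function name `unwrap_cdata` is made up for this illustration. It assumes the content of each CDATA section is well-formed XML (as in the quoted example); sections that do not parse are left untouched.

```python
from xml.dom import minidom
from xml.dom.minidom import Node
from xml.parsers.expat import ExpatError


def unwrap_cdata(node, doc):
    """Replace each CDATA section under node with the markup it contains."""
    for child in list(node.childNodes):
        if child.nodeType == Node.CDATA_SECTION_NODE:
            try:
                # A dummy root lets fragments with several top-level tags parse.
                fragment = minidom.parseString("<wrapper>" + child.data + "</wrapper>")
            except ExpatError:
                continue  # not well-formed XML: keep the CDATA section as it is
            for part in list(fragment.documentElement.childNodes):
                node.insertBefore(doc.importNode(part, True), child)
            node.removeChild(child)
        else:
            unwrap_cdata(child, doc)


doc = minidom.parseString(
    "<primary_text><![CDATA[<p>Good luck!</p><br/>]]></primary_text>"
)
unwrap_cdata(doc.documentElement, doc)
tags = [c.nodeName for c in doc.documentElement.childNodes]
first_text = doc.documentElement.firstChild.firstChild.data
print(tags, first_text)
```

After this step the former CDATA content consists of ordinary elements, so a filter can treat "p" and "br" as tags. The reverse conversion would be needed before delivering the translated file, and it is not shown here.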

Kind regards,
Stefan Gentz
TRACOM OHG


[Edited at 2008-05-30 13:42]


Matulal
Spanish to English
TOPIC STARTER
To clarify... May 30, 2008

I knew my not knowing proper XML terminology would get me into trouble.

No, we are getting all the text we want to translate, including the tags within the CDATA text ("p", "br", etc.). I know that those will be picked up as text and that's OK for the time being. What I'm talking about is a lot of junk outside of the CDATA text, which we don't want.

When I said CDATA is not showing up, I meant that it's not showing up in the SDLX XML filter. In case you're not familiar with the filter, you analyze the xml files you want to import and then it gives you a list of the "elements", "attributes" or "expressions" that it finds. Then you select the ones you want to translate and it should ignore the rest. Since CDATA is not showing up in the filter, we have to select the element that contains it (usually whatever is above it in the hierarchy). For example, if you look at the code in Stefan's quote, the first line to translate is "Introduction to Assessment." So there we have to select "title" and we will get the CDATA text. But then the next line to translate falls under "primary text". And there are many others containing the CDATA we need, but many of those same tags will also contain stuff we don't want to translate, so we get that too. Is this making sense now?

It all boils down to needing to be able to select ONLY the CDATA text and not being able to. It simply doesn't appear in the filter. We may end up going with Trados and TagEditor but since we work almost exclusively with SDLX, we'd still like to try that route if we can.
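To make the goal concrete, here is a minimal Python sketch (standard library only, outside SDLX) of "select only the CDATA text". It walks the document and reports each CDATA section together with the element that contains it. The sample is shortened from the quoted code, and the `layout` element is a made-up stand-in for the junk we don't want.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Shortened sample; <layout> is a hypothetical non-translatable element.
SAMPLE = """<gene engine="presentation" id="xa010">
<core>
<outline>
<title><![CDATA[Introduction to Assessment]]></title>
<layout>two_column</layout>
</outline>
<primary_text><![CDATA[<p>Click on the Next button to begin.</p>]]></primary_text>
</core>
</gene>"""


def cdata_sections(node):
    """Yield (parent element name, text) for every CDATA section under node."""
    for child in node.childNodes:
        if child.nodeType == Node.CDATA_SECTION_NODE:
            yield node.nodeName, child.data
        else:
            yield from cdata_sections(child)


doc = minidom.parseString(SAMPLE)
found = list(cdata_sections(doc.documentElement))
parents = sorted({name for name, _ in found})

for name, text in found:
    print(name, "->", text)
```

The `parents` list gives the element names that actually hold CDATA, which is the list we would have liked the filter to offer. The text in `layout` is skipped because it is not inside a CDATA section.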

Thanks again for the help.

[Edited at 2008-05-30 20:44]



