Changing the DOCTYPE declaration in XML possible? (Trados Studio 2009)
Thread poster: kirameister
Sep 30, 2010

Hi,

I was wondering if it was possible to modify the DOCTYPE declaration of XML files in Trados Studio 2009 (SP2). This is necessary in order to change the actual representation of the entities defined in the DTD.

I.e., I'd like to change

<!DOCTYPE main SYSTEM "../schema.dtd"[]>

to

<!DOCTYPE main SYSTEM "../schema.dtd" [<!ENTITY % LANGUAGE_ENGLISH 'IGNORE' ><!ENTITY % LANGUAGE_JAPANESE 'INCLUDE' >]>

Many thanks in advance!
Akira K.



SDL Community  Identity Verified
United Kingdom
Local time: 12:13
English
Change it in the source/target Oct 2, 2010

Hi Akira,

Maybe it would just be easier to replace it in the document itself? You should only see this once at the start of the file so it shouldn't be hard to do.

Regards

Paul



kirameister
TOPIC STARTER
Maintenance also needed for each and every XML file Oct 4, 2010

Hello Paul
Thank you for your reply.

My situation is as follows:

- We have more than 800 XML files to translate (each of them has the same DOCTYPE prologue, which needs to be changed).
- Those XML files are constantly updated, and my task is to keep those files up to date.

To keep the files updated, I'm taking the following approach:

(1) Use Trados to translate those XML files, storing the translations in the TM.
(2) For later updates, simply open the XML file directly (not via a Project, which requires you to create an sdlxliff file) and go over each segment. Because the same segmentation rules are applied to the file, I only have to update the segments that were changed in the original language.
(3) However, because the DOCTYPE prologue doesn't seem to be shown in the translation table (so no TM can be applied to it), I have to make the substitution manually each time I update a translation, which is easy to forget when those XML files have to be updated on a daily basis.

I hope you've got the picture.

So far, I have failed to find a way to translate only the updated segments, given an XML file and its corresponding sdlxliff file (which is why I took the approach above). But even if this were feasible, it would still not solve the DOCTYPE issue.

Best regards,
Akira K.


FarkasAndras
Local time: 12:13
English to Hungarian
+ ...
Script Oct 4, 2010

This probably won't be possible/easy to do in any CAT.
However, if I understand the issue correctly, this should be pretty easy to do with any scripting language. Python, Perl, Bash, a Windows batch script... any of these should do.
If my life depended on it, I could probably get a Perl script together in an hour.
Obviously, this is easiest to do if the files are all in the same folder and have the same encoding. With Perl on Windows, it helps a lot if they all have ASCII names (no Japanese characters in the file names).
If they meet all these criteria you can easily write a simple little script that loops through the files one by one and does the replacement. Given that what you have to change is a tag (i.e. uniquely identifiable), it's trivially easy to write a search and replace expression that does a clean job. Add some reporting (X files found, X replacements successfully completed) and a log and you're done with this once and for all.
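To give an idea of what such a script could look like, here is a rough Python sketch (not tested on the actual files). It assumes all the files sit in one folder, that the declaration appears exactly as quoted in the first post, and that the files use an ASCII-compatible encoding such as UTF-8. The function name replace_doctype and the .bak backup naming are just my choices for the illustration.

```python
import shutil
from pathlib import Path

# The two declarations from the first post. Both are pure ASCII, so they are
# handled as bytes; line endings and other file content stay untouched.
OLD = b'<!DOCTYPE main SYSTEM "../schema.dtd"[]>'
NEW = (b'<!DOCTYPE main SYSTEM "../schema.dtd" '
       b"[<!ENTITY % LANGUAGE_ENGLISH 'IGNORE' >"
       b"<!ENTITY % LANGUAGE_JAPANESE 'INCLUDE' >]>")


def replace_doctype(folder):
    """Replace OLD with NEW in every .xml file directly inside folder.

    Returns (files_found, files_changed). A .bak copy is written next to
    each file before it is modified; files without OLD are left alone.
    """
    found = changed = 0
    for path in sorted(Path(folder).glob("*.xml")):
        found += 1
        data = path.read_bytes()
        if OLD not in data:
            continue
        # Back up the original before overwriting it.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_bytes(data.replace(OLD, NEW))
        changed += 1
    return found, changed
```

Calling replace_doctype(".") from the folder that holds the XML files and printing the two returned numbers gives the "X files found, X replacements completed" report. Running it a second time changes nothing, because the old declaration is no longer there.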


FarkasAndras
Local time: 12:13
English to Hungarian
+ ...
Even simpler Oct 4, 2010

Scratch that, I found a one-liner with 2 minutes of googling:

Code:
perl -p -i.bak -e "BEGIN{@ARGV=<*.xml>} s/replace this/with this/g"



This loops through every .xml file in the current directory and changes "replace this" to "with this", while creating a .bak backup of all the source files. Code doesn't get much more concise than this - although the price of the simplicity is that you get no report on how many replacements it did.

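Your search and replace expressions contain a bunch of special characters, so you should escape them. The search text also contains a slash, so use a delimiter other than / for the substitution (braces below), and write the double quotes and percent signs as \x22 and \x25 so that the Windows command prompt leaves them alone. Don't wrap the replacement text in \Q...\E: that would write the escaping backslashes into your files.
Code:
perl -p -i.bak -e "BEGIN{@ARGV=<*.xml>} s{<!DOCTYPE main SYSTEM \x22\.\./schema\.dtd\x22\[\]>}{<!DOCTYPE main SYSTEM \x22../schema.dtd\x22 [<!ENTITY \x25 LANGUAGE_ENGLISH 'IGNORE' ><!ENTITY \x25 LANGUAGE_JAPANESE 'INCLUDE' >]>}g"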



Obviously, only start this in the directory where your xml files are located! It does create backups, but it's best to do a couple of tests on copies of the files.
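For those tests, a read-only check can help. The Python sketch below only lists the files that still contain the old declaration and modifies nothing; the function name files_with_old_doctype is made up for this example, and it assumes the declaration appears exactly as quoted in the first post.

```python
from pathlib import Path

# Old declaration as quoted in the first post (pure ASCII, compared as bytes).
OLD = b'<!DOCTYPE main SYSTEM "../schema.dtd"[]>'


def files_with_old_doctype(folder):
    """Return the names of .xml files in folder that still contain OLD."""
    return [p.name
            for p in sorted(Path(folder).glob("*.xml"))
            if OLD in p.read_bytes()]
```

An empty list after a run suggests every file was converted; any names that remain are the files to look at by hand.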
It needs Perl to be installed on the computer, so if you don't have a Perl interpreter, install ActivePerl from ActiveState to use this.

[Edited at 2010-10-04 16:42 GMT]



kirameister
TOPIC STARTER
Seems like employing an external editor (or tool) is the only way... Oct 5, 2010

Hello Farkas
Thank you for your comment and the one-liners.

Using a script or an external editor with a similar feature was my first idea. This would mean, however, that an extra step outside Trados would be necessary. The motivation for using Trados was to eliminate human error (i.e., forgetting to run the script), which would be avoided automatically if Trados could handle the DOCTYPE like the other parts of the XML file content.

Thank you anyhow. As the title of the initial post in this thread says, I was wondering if it was possible to accomplish such a task within Trados. The answer seems to be "No" so far, which is fine as well.

Best regards,
Akira K.

