ProZ.com global directory of translation services
Thread poster: Freigeist
Need help - XML (TMX) Editor
Freigeist
Ireland
Oct 10, 2011

Hello everyone,

I have the following problem: I have a set of TMX files with three languages, and I want to remove one language completely and swap the order of the two remaining ones. As there are thousands of entries to edit, I need an automated solution (for Windows). Each translation unit looks like this:

...
(tu tuid="STRINGID")
(note)STRING NOTE(/note)
(tuv xml:lang="de")
(seg)DE TEXT(/seg)
(/tuv)
(tuv xml:lang="en")
(seg)EN TEXT(/seg)
(/tuv)
(tuv xml:lang="fr")
(seg)FR TEXT(/seg)
(/tuv)
(/tu)
...

(The brackets are of course angle brackets, but they are not displayed here, so I changed them.)

What I want to do now is get rid of the "de" segment and all its content, and swap the order of the remaining two entries so that fr is on top and en follows. I tried to use regular expressions in Notepad++ to achieve this, but I just can't figure out how to do it. Does anyone have an idea how to do this? I tried using Olifant, but it can't change the order and it removes other information from the XML file (like ).

Thanks in advance!

[Edited at 2011-10-10 10:42 GMT]

[Edited at 2011-10-10 10:46 GMT]
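
For readers who prefer a script, the transformation asked for above (drop the "de" variant, then put "fr" before "en" in every translation unit) could be sketched in Python with the standard-library XML parser. This is only a sketch under assumptions: the function names `reorder_tu` and `convert` are made up for illustration, and the element names are taken from the sample in the post. Note that ElementTree does not write back a DOCTYPE declaration or comments, and whitespace between elements may shift, so compare the output against the original before relying on it.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def reorder_tu(tu, keep_order=("fr", "en")):
    """Keep only the listed languages in one <tu>, in the listed order.

    Hypothetical helper: other children (such as <note>) are left untouched.
    """
    tuvs = [child for child in tu if child.tag == "tuv"]
    by_lang = {t.get(XML_LANG): t for t in tuvs}
    for t in tuvs:
        tu.remove(t)                      # drops every <tuv>, including "de"
    for lang in keep_order:
        if lang in by_lang:
            tu.append(by_lang[lang])      # re-add the wanted ones in order


def convert(src_path, dst_path, keep_order=("fr", "en")):
    """Apply reorder_tu to every <tu> in a TMX file (paths are placeholders)."""
    tree = ET.parse(src_path)
    # Materialize the list first: the tree is modified while we loop.
    for tu in list(tree.iter("tu")):
        reorder_tu(tu, keep_order)
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)
```

Usage would be something like `convert("input.tmx", "output.tmx")`, run once per file; both file names are placeholders.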



Piotr Bienkowski  Identity Verified
Poland
Local time: 22:48
Member (2005)
English to Polish
+ ...
Try Olifant Oct 10, 2011

It is free, you will find the download link here:
http://okapi.sourceforge.net/downloads.html

Once you load the file into Olifant, you will be able to delete the unwanted language in the file's properties.

Hope this helps.

Regards,

Piotr


Adam Łobatiuk  Identity Verified
Poland
Local time: 22:48
Member (2009)
English to Polish
+ ...
Use a CAT tool Oct 10, 2011

Most CAT tools support TMX, so you should be able to create a new FR-EN TM, import the TMX as is, and then export it again.

FarkasAndras
Hungary
Local time: 22:48
English to Hungarian
+ ...
no point Oct 11, 2011

You could do this in Notepad++, but there is no need to if you want to use the TMX in a CAT tool. CAT tools are smart enough to ignore the superfluous third language and to import the languages correctly.

Just create a fr-en TM and import.

If you really need to do this for some odd reason, I can try to help put a regex together for you.

[Edited at 2011-10-11 08:15 GMT]



István Hirsch  Identity Verified
Hungary
Local time: 22:48
English to Hungarian
In Word Oct 11, 2011

I am sure the problem can be solved easily and safely without creating a regex. However, if you are still interested, here is a regex which seems to work. It is for Word, because I am a bit more familiar with regex in Word than in Notepad++.

Go to Find/Replace and check the "Use wildcards" checkbox.

To delete the "de" part:

Find:
\<tuv xml:lang="de"\>^13\<seg\>*\</seg\>^13\</tuv\>^13
Replace with:
nothing

To change the order of "en" and "fr":

Find:
(\<tuv xml:lang="en"\>^13\<seg\>*\</seg\>^13\</tuv\>^13)(\<tuv xml:lang="fr"\>^13\<seg\>*\</seg\>^13\</tuv\>^13)
Replace with:
\2\1
Note that there is a space between tuv and xml in each tag (tuv xml, i.e. tuv_xml).
If there are soft breaks between the lines, use ^10 instead of ^13.

After the Find/Replace operations I got this - hope this is what you need:

<tu tuid="STRINGID">
<note>STRING NOTE</note>
<tuv xml:lang="fr">
<seg>FR TEXT</seg>
</tuv>
<tuv xml:lang="en">
<seg>EN TEXT</seg>
</tuv>
</tu>
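
The same two find/replace steps can also be tried outside Word. Below is a minimal Python sketch of the idea using the standard `re` module; it is not the Word syntax above, and the helper names `tuv` and `drop_and_swap` are made up for illustration. It assumes each language variant is written exactly as `<tuv xml:lang="..">` with double quotes and no extra attributes, as in the sample. Because it edits the text directly, everything else in the file is left as it is.

```python
import re


def tuv(lang: str) -> str:
    """Regex source for one complete <tuv> element of the given language.

    The (?!</tuv>) guard stops the match at the element's own closing tag.
    """
    return rf'<tuv xml:lang="{re.escape(lang)}">(?:(?!</tuv>).)*</tuv>'


def drop_and_swap(text: str, drop: str = "de",
                  first: str = "fr", second: str = "en") -> str:
    # Step 1: delete the unwanted language and the whitespace after it.
    text = re.sub(tuv(drop) + r"\s*", "", text, flags=re.DOTALL)
    # Step 2: where `second` is directly followed by `first`, swap the two
    # elements and keep the whitespace between them in place.
    swap = re.compile(rf"({tuv(second)})(\s*)({tuv(first)})", flags=re.DOTALL)
    return swap.sub(r"\3\2\1", text)


SAMPLE = """<tu tuid="STRINGID">
<note>STRING NOTE</note>
<tuv xml:lang="de">
<seg>DE TEXT</seg>
</tuv>
<tuv xml:lang="en">
<seg>EN TEXT</seg>
</tuv>
<tuv xml:lang="fr">
<seg>FR TEXT</seg>
</tuv>
</tu>
"""

print(drop_and_swap(SAMPLE))
```

For a real file, the text would be read and written with the file's own encoding (TMX files are often UTF-8 or UTF-16), and it is worth testing on a copy first.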


aaron9l8i
United Kingdom
XML Editor Oct 11, 2011

If it helps, I have used Liquid XML Editor (http://www.liquid-technologies.com/xml-editor.aspx) for working with my files without any problems. You should be able to make the necessary alterations in the preferences tab.

Good luck.



