Exporting XML to an easily editable format
Thread poster: Mike Taylor

Mike Taylor
Local time: 22:21
English
Sep 21, 2010

I have searched through the various forums regarding XML editors (some threads going back quite a few years). Does anyone know whether anyone has come up with an application that can export/convert XML into a simple editable format without the use of CAT or TM tools?

As an agency, we use SDL Studio 2009, which can handle XML files. However, we have many excellent freelance translators that do not have TM tools and still prefer to use MS Word (or similar).

We appear to be receiving more and more XML files and we do not want to have to switch to using translators that can only receive and work in TM tools.

Ideally we need software that can export the XML file into, say, Word or Excel. The translator translates the file and sends it back, and we then convert the file back into XML.

Does such software exist yet?


 

Jerzy Czopik  Identity Verified
Germany
Local time: 23:21
Member (2003)
Polish to German
+ ...
Even in the year 2010? Sep 21, 2010

Mike Taylor wrote:

As an agency, we use SDL Studio 2009, which can handle XML files. However, we have many excellent freelance translators that do not have TM tools and still prefer to use MS Word (or similar).


Do you really mean that, even though we are already reaching the second decade of the twenty-first century? Back in 1990, when I started, this time seemed a very distant future, I must admit. The word "excellent" means really better than others, outstanding - I would be reluctant to call a company excellent today if it were still using hot metal typesetting. You are asking for a digicam to be produced in a manufactory.
Sorry for this digression, but seeing all the recent discussions about our profession I really wonder about this strong movement to stick so hard to the past.

Coming back to your question - you could for example look for any (free) HTML/XML editor. A quick Google search turned up this: http://felix-cat.com/tagassist/


 

FarkasAndras  Identity Verified
Local time: 23:21
English to Hungarian
+ ...
Agree with Jerzy Sep 21, 2010

Excellent translators shouldn't be too hard to convince of the merits of a CAT tool.
Still, if you're stuck with this situation, tagassist looks pretty good in principle.
If it fails to deliver for some reason, you could possibly try the following primitive solution:
reformat the XML into a table of sorts. Remove all line breaks, make sure the XML contains no tab characters, insert a tab after every > that is not followed by a <, and a line break before the next <. In sed or perl the replacement regex should be something like "s/\n//g", then "s/>([^<][^<]*)/>\t\1\n/g".
If you then copy the resulting mess into Excel, you should get a two-column table with translatable text in the second column and tags in the first - this sed or perl script would basically be just what you mentioned: primitive software that converts XML to a table you can drop in Excel. Make the first column write-protected, tell the translator to translate the second column, copy-paste their translation to Notepad, remove the tab characters and line breaks, and away you go.
Of course this is just a random idea I never tested, and it may be very prone to breaking in all sorts of interesting ways. I would only recommend doing stuff like this if you know what you're doing and know the structure of the XML files won't play dirty tricks on you. But if your translators are complete luddites and you can't find any software that's simple enough for them to use and smart enough to be useful, something along these lines could be an option.
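For illustration, the two substitutions described above could be sketched in Python roughly as follows. This is only a sketch under the same caveats as the idea itself: the function names xml_to_table and table_to_xml are made up for this example, tabs in the source are replaced by spaces (an assumption, so that the only tabs left are column separators), and indentation between tags will show up as whitespace-only cells in the second column.

```python
import re


def xml_to_table(xml_text: str) -> str:
    """Turn XML into tab-separated lines: tags in column 1, text in column 2."""
    # Remove line breaks and make sure no tab characters remain.
    flat = xml_text.replace("\r", "").replace("\n", "").replace("\t", " ")
    # Insert a tab after every '>' that is not followed by '<',
    # and a line break before the next '<'.
    return re.sub(r">([^<]+)", ">\t\\1\n", flat)


def table_to_xml(table_text: str) -> str:
    """Undo xml_to_table by removing the tabs and line breaks again."""
    return table_text.replace("\t", "").replace("\r", "").replace("\n", "")


if __name__ == "__main__":
    sample = "<a><b>Hello</b><c>World</c></a>"
    print(xml_to_table(sample))
```

The output can be pasted into Excel as two columns. Running table_to_xml on the translated table gives back a single-line XML file, so any line breaks that mattered in the original are lost.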

Perhaps it would be even better to just make tags uneditable and make them show in grey in MS Word somehow.

[Edited at 2010-09-21 19:54 GMT]


 

Robert Tucker (X)
United Kingdom
Local time: 22:21
German to English
+ ...
iReport Sep 21, 2010

iReport may be able to do what you want.

 

opolt  Identity Verified
Germany
Local time: 23:21
English to German
+ ...
Which type of XML? Sep 21, 2010

Mike, forgive me but that doesn't make too much sense. XML is not "one format"; there are many many XML variants out there, and there's never going to be the one tool that converts all types of XML to simple text files. (BTW new XML dialects are being created all the time.)

If you're dealing with one specific XML dialect, there might be a program available -- you could even hire a programmer to write a conversion tool for you. It shouldn't be too expensive.
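As a rough idea of what such a conversion tool might look like for one simple dialect, here is a minimal Python sketch using only the standard library. Everything in it is an assumption for illustration: the function names export_texts and import_texts, the three-column CSV layout, and the premise that all translatable text sits directly at the start of an element. Text that follows a child element (mixed content), attributes, comments and the XML declaration are not handled.

```python
import csv
import io
import xml.etree.ElementTree as ET


def export_texts(xml_text: str) -> str:
    """Write one CSV row (id, source, empty target) per element with text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["id", "source", "target"])
    # The id is simply the element's position in document order.
    for i, elem in enumerate(root.iter()):
        if elem.text and elem.text.strip():
            writer.writerow([i, elem.text.strip(), ""])
    return out.getvalue()


def import_texts(xml_text: str, csv_text: str) -> str:
    """Put the translated 'target' cells back into the original XML."""
    root = ET.fromstring(xml_text)
    elems = list(root.iter())
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["target"]:
            elems[int(row["id"])].text = row["target"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    source = "<doc><title>Hello</title><p>World</p></doc>"
    print(export_texts(source))
```

The CSV opens directly in Excel. The translator fills in the target column, and import_texts merges it back, provided the rows and ids are left untouched.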

But overall XML was not made for exporting to ASCII or MS Word, or any other simple editable format. Most of the time it can be displayed (and printed) in a simple/editable format, e.g. via XLIFF in a CAT tool, but that's about it. By converting to text files, you would lose all the inherent complexity and thus all the nice functions that the various XML dialects, namely the document systems (DocBook, DITA) and XHTML, support. Ditto for just removing the tags (you'd have to know how to distinguish the mere contents from the rest, so you need to study the corresponding XML standard ...). Even if you convert to text files successfully, how are you going to convert back to XML in order to satisfy the client? You don't.

XML is sometimes used by programmers out of laziness, but most of the time, its advantages are so huge, especially for big companies with lots of documentation iterations (software industry...), that this completely outweighs its disadvantages. There is a reason that the use of XML is on the increase.

So I'd venture to say that in the long run, all translation companies need to be well versed in XML and, if possible, have their own in-house staff and support for XML issues (or hire translators who are "in the know" -- wink wink). They also need to standardize on XLIFF and all the other XML based formats, and the tools that support them, sooner rather than later, or else risk losing their bigger clients.

It's long been known that MS Word and similar apps are not good enough as long term documentation solutions, and more and more companies are trying to escape from them. MS has tried to counteract by basing its office formats on XML itself, but it's doubtful whether "MS XML" will be a success outside the regular office scenario, given that the corresponding standard comprises more than 6000 (!) pages.

A temporary solution might be to ask your customers for PDF files (which should be possible in many cases) or files converted to XHTML/HTML which can be displayed by web browsers. But you (and your clients) would still lose many of the advantages.

BTW it's also perfectly possible to implement XML to PDF/XHTML conversion in-house, but you need the right people with the right tools.

In short, XML compatibility is a must.

[Edited at 2010-09-21 21:23 GMT]

[Edited at 2010-09-21 21:26 GMT]


 

FarkasAndras  Identity Verified
Local time: 23:21
English to Hungarian
+ ...
Whew Sep 21, 2010

There was a lot that made a lot of sense in your wall of text but...

opolt wrote:

Mike, forgive me but that doesn't make too much sense. XML is not "one format"; there are many many XML variants out there, and there's never going to be the one tool that converts all types of XML to simple text files. (BTW new XML dialects are being created all the time.)

Yes, but that doesn't necessarily matter much for our purposes.

opolt wrote:
Ditto for just removing the tags (you'd have to know how to distinguish the mere contents from the rest, so you need to study the corresponding XML standard ...)

Not really, I can give you a tag stripper that removes tags from any HTML, XML, TBX, TMX etc. file. A tagged format is a tagged format... but of course you're right that simply stripping the tags is of no use in this situation as you have no way of putting them back in correctly.
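To show how little a format-agnostic tag stripper needs, here is a naive Python sketch. It is not the stripper mentioned above, just an illustration; the name strip_tags is made up, and the regex assumes no ">" appears inside attribute values, comments or CDATA sections. Entities such as &amp; are left as they are.

```python
import re

# Anything from '<' up to the next '>' is treated as a tag.
TAG = re.compile(r"<[^>]+>")


def strip_tags(markup: str) -> str:
    """Return the text with everything that looks like a tag removed."""
    return TAG.sub("", markup)


if __name__ == "__main__":
    print(strip_tags("<seg>Hello <b>world</b></seg>"))
```

As said, the result is fine for reading or word counts, but there is no way to put the tags back afterwards.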

opolt wrote:
It's long been known that MS Word and similar apps are not good enough as long term documentation solutions, and more and more companies are trying to escape from them. MS has tried to counteract by basing its office formats on XML itself, but it's doubtful whether "MS XML" will be a success outside the regular office scenario, given that the corresponding standard comprises more than 6000 (!) pages.

Well, it was obviously never intended to replace other flavours of XML for uses other than the regular office scenario, so I have no idea why you brought it up. The documentation may be long but you don't need to read any of it unless you develop applications that rummage around in these files... and if you do, you'll be glad there's a lot of documentation. Anyway, it's an open standard, which makes it really convenient and powerful. I'm chuffed to bits about docx & co.

opolt wrote:
A temporary solution might be to ask your customers for PDF files (which should be possible in many cases)

Whoa. What? Why? How?
PDF is the translator's worst enemy. Why ask for PDF files? How are they easier to translate than XML? In what format do you send the translation back? What does the client do with your translation?

opolt wrote:
or files converted to XHTML/HTML which can be displayed by web browsers. But you (and your clients) would still lose many of the advantages.

I'm not sure if that's feasible, but if it is, you still have a tagged format that's no easier to edit/translate than the original XML files.


 

mediamatrix (X)
Local time: 19:21
Spanish to English
+ ...
OT: Don’t know Sep 21, 2010

Mike, I can’t answer your question, simply because I have no idea.

I do nonetheless wish to applaud your business philosophy which, as you have explained, favours the adaptation of technology to suit the skills of your greatest resource: your experienced translators.

Jerzy Czopik wrote:
Do you really mean that, even though we are already reaching the second decade of the twenty-first century? Back in 1990, when I started, this time seemed a very distant future, I must admit. The word "excellent" means really better than others, outstanding - I would be reluctant to call a company excellent today if it were still using hot metal typesetting.


FarkasAndras wrote:
Excellent translators shouldn't be too hard to convince about the merits of a CAT tool.
Still, if you're stuck with this situation…

But if your translators are complete luddites and you can't find any software that's simple enough for them to use and smart enough to be useful, …


As a self-confessed ‘excellent translator’ who has never used a CAT in the 35+ years since I began translating for a living, and who has not the slightest intention of starting now, even if XML is creeping into clients’ documents, I find the comments from both those contributors utterly despicable.

The suggestion that the absence of CAT tools on the computers of Mike’s translators makes those translators – or, indeed, Mike’s own business - anything less than ‘excellent’ is totally absurd.

The only ‘Luddites’ here are those who pretend that top-quality translation is impossible with only a pencil and paper.

MediaMatrix


 

Jaroslaw Michalak  Identity Verified
Poland
Local time: 23:21
Member (2004)
English to Polish
External export Sep 22, 2010

Doesn't Studio have an option to export the grid into an editable format? It is mostly used for reviews, but of course a translator can use that, too.

If Studio does not have that option - although I am pretty sure I have read about it somewhere - then MemoQ or DejaVuX make it quite easy. In fact, for one of my clients I have used MemoQ to convert Trados ttx made from InDesign files into editable Word table files for a translator who does not have Trados and then back into ttx.

However, I agree with Jerzy and others that using CATs throughout the whole workflow makes more sense - you will get more consistency, can unify the glossary etc. The difference might depend on the source files, though (and the translators, of course - if they are good enough, they will be consistent anyway).

opolt: I think that the title might have misled you - the problem is not to convert XML into an editable format - CATs will take care of that. It is rather how to convert the CAT's internal format into Word/Excel, and that is perfectly possible.

[Edited at 2010-09-22 03:39 GMT]


 

opolt  Identity Verified
Germany
Local time: 23:21
English to German
+ ...
@FarkasAndras Sep 22, 2010

I admit my talking about Office XML was a digression, but otherwise, just in case I wasn't clear enough:

What Mike said showed, IMHO, that he doesn't seem to be fully aware of what XML is (it's not "another file format"), why businesses are using XML, what are its benefits etc., and how all this impacts the translation business. Clearly, if you're running a translation company, you should know at least something about these things -- suggesting to convert to MS Office files implies that he doesn't have a very clear idea about what is going on behind the scenes. Because in general, it's not a solution at all. But with the agency being higher up in the "food chain", it ought to know much more about this than the regular translator -- if not, how to convince the latter of the need to use CAT tools?

Clearly, as an "agency", you need a full XML tool chain these days (in most specializations): either your translators use CAT tools, or they translate XML files directly, in text editors or WYSIWYG XML editors (personally I have been doing the latter, BTW, and resisted CAT tools for quite some time).

Mike was asking for ways to make XML files readable no matter what. So that's why PDF is one of the "solutions" I suggested. BTW for those who insist on using MS Office only, a printed PDF file is no obstacle at all -- it's just on the same level of doing things the old-fashioned way.

But in fact, trying to circumvent XML may be one of the reasons translators so often receive PDF files rather than other formats, along with all those strange Excel tables and other weird stuff. All this is far from ideal, of course. But it's not me who is suggesting that we could go back to the eighties.

Cheers.


[Edited at 2010-09-22 11:10 GMT]


 

FarkasAndras  Identity Verified
Local time: 23:21
English to Hungarian
+ ...
Ludditism, CATism Sep 22, 2010

mediamatrix wrote:

As a self-confessed ‘excellent translator’ who has never used a CAT in the 35+ years since I began translating for a living, and who has not the slightest intention of starting now, even if XML is creeping into clients’ documents, I find the comments from both those contributors utterly despicable.

You seem to be easily offended. I think that using advanced software (well, user friendly software that has been readily available and in widespread use for the better part of a decade or more) is part and parcel of the job, at least for translators who work on technically complex projects. (I.e. projects involving multiple translators, non-human-readable file formats, previously translated material and client-supplied terminology etc.) If an excellent translator decides to take on such projects, I would expect them to be open to the thought of using CATs.
I don't see how that's a despicable position.

mediamatrix wrote:
The suggestion that the absence of CAT tools on the computers of Mike’s translators makes those translators – or, indeed, Mike’s own business - anything less than ‘excellent’ is totally absurd.

I tend to agree, but then nobody said otherwise.

mediamatrix wrote:
The only ‘Luddites’ here are those who pretend that top-quality translation is impossible with only a pencil and paper.

No, Luddites are those who view technological change with suspicion because the new way of doing things is unfamiliar to them and it would require them to adapt.

Simple, really: translators can adopt CATs and learn how to use them, or decide to resist the change and pass on the projects that require CAT use. Both are valid positions and I see no reason to get all emotional about the issue, whichever decision one makes.

[Edited at 2010-09-22 18:17 GMT]


 

FarkasAndras  Identity Verified
Local time: 23:21
English to Hungarian
+ ...
Tags? Sep 22, 2010

Jabberwock wrote:

Doesn't Studio have an option to export the grid into an editable format? It is mostly used for reviews, but of course a translator can use that, too.

What does it do with the tags when it generates a table like that? Leave them out completely? I would think that there could be many situations where you'd get relevant, even indispensable information from the tags. I would want them to be available, just not inline with the translatable text with the same font, so that they don't get in the way.


 

Jaroslaw Michalak  Identity Verified
Poland
Local time: 23:21
Member (2004)
English to Polish
It depends... Sep 22, 2010

FarkasAndras wrote:
What does it do with the tags when it generates a table like that? Leave them out completely? I would think that there could be many situations where you'd get relevant, even indispensable information from the tags. I would want them to be available, just not inline with the translatable text with the same font, so that they don't get in the way.


It depends on the software, I suppose - I cannot remember how it was in DVX...

In MemoQ you have options aplenty! You can


  • leave out tags completely
  • export them as simple placeholders
  • export as placeholders with special Word style - helpful to work around them
  • export inline tags with their full text - helpful if their content is significant to translation, but sure does not look pretty...


Besides, you can export segment status (useful) and comment fields (and import them back - very useful!). Moreover, you can export the translation in three columns - one is source, one is target as exported and the third one contains the target changed by the reviewer.

Naturally, to import the file back one needs not to mess around with it too much - removing some codes, deleting rows etc. might make the import fail. However, if the translator/reviewer is reasonably careful about that (and properly instructed), there should be no problems.


[Edited at 2010-09-22 19:34 GMT]


 

