ProZ.com global directory of translation services
Thread poster: barryw
extract text from xml files
barryw
Hong Kong
Local time: 03:06
Member (2007)
English to Chinese
Nov 2, 2010

Dear all,
Does anyone know how to extract plain text from XML files? Is there any handy software for this?
Thank you!



Arabic Translation Team  Identity Verified
Egypt
Local time: 21:06
Member (2009)
German to Arabic
There is a workaround :) Nov 2, 2010

Hi Barryw,

The following is very useful in most cases:

1- Right-click the XML file.
2- Choose Edit (with Notepad, for example).
3- When the file is open, choose "File" | "Save as".
4- In the File name field, change the file extension to ".html".
5- Save the file (it will be saved as a web page).
6- Open the web page in your browser. You'll find the pure text without the XML tags and, moreover, with formatting.
7- You can then copy this text and paste it into a Word file, for example, to work with the text alone.
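
For anyone who prefers a script to the manual steps above, here is a minimal sketch in Python. It is not part of the workaround described in this post; it assumes the file is well-formed XML and uses the standard library's `xml.etree.ElementTree`. The function name `extract_text` and the sample string are made up for illustration.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_string):
    """Return the text content of an XML document, one non-empty chunk per line."""
    root = ET.fromstring(xml_string)
    # itertext() walks the tree in document order and yields only text,
    # never tag names or attributes.
    chunks = (chunk.strip() for chunk in root.itertext())
    return "\n".join(chunk for chunk in chunks if chunk)


# Hypothetical sample document, for illustration only.
sample = "<doc><title>Hello</title><p>First <b>bold</b> paragraph.</p></doc>"
print(extract_text(sample))
```

Note that inline elements such as `<b>` end up on their own line with this approach, so the output may need tidying before translation. A file that is not well-formed XML will raise a parse error instead of producing text.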

Best regards

Your Arabic Translation Team


FarkasAndras
Hungary
Local time: 21:06
English to Hungarian
Here's one Nov 2, 2010


barryw wrote:

Dear all,
Does anyone know how to extract plain text from XML files? Is there any handy software for this?
Thank you!



Here's a script of mine:
http://www.mediafire.com/?kq9yayc1hgt2kj9

Unzip the archive, move your file into the tag_stripper folder and rename it so that it has an .html extension. Double-click the .bat file and follow the instructions. It's a bit crude, but it should work. Check the results afterwards, of course.
The end result should be pretty much the same as opening the file in a browser and copying the content to a txt, but this solution will work with large files as well, while your browser definitely won't open a 50+ MB file for you.
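
The script behind the link is not reproduced here, so the following is only a rough Python sketch of the same general idea (strip the tags, keep the text, and read the file in pieces so that size is not a problem). It is an assumption about the approach, not the actual script. It uses the standard library's lenient `html.parser.HTMLParser`, which treats the input as HTML-like markup, much as a browser would after the rename to .html. The names `TagStripper` and `strip_tags` are made up for illustration.

```python
import io
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Writes the text found between tags to `out` and ignores the tags."""

    def __init__(self, out):
        super().__init__(convert_charrefs=True)
        self.out = out

    def handle_data(self, data):
        self.out.write(data)


def strip_tags(src, dst, chunk_size=1 << 20):
    """Copy text from file-like `src` to `dst`, dropping markup, chunk by chunk."""
    parser = TagStripper(dst)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # feed() buffers incomplete tags and entities until more data arrives.
        parser.feed(chunk)
    parser.close()  # flush whatever is still buffered


# Tiny in-memory demo with a deliberately small chunk size.
src = io.StringIO("<doc><p>First line</p>\n<p>Second &amp; last</p></doc>")
dst = io.StringIO()
strip_tags(src, dst, chunk_size=8)
result = dst.getvalue()
print(result)
```

For a real file, `src` and `dst` would be opened with `open(...)` in text mode with the right encoding (the file names are up to you). As with the original script, the output should be checked by hand.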

Also, I have no idea why Arabic Translation Team posted such a convoluted solution. If you want to open the file in your browser, right-click it, choose "Open with..." and pick the browser from the list. There is no need to change the extension, especially not by opening the file in another program first. If the browser is not on the "Open with" list, choose "other program" and pick the browser from there.

Changing the file extension doesn't change the type of the file (it does not "save it as a webpage"). The extension is just a hint to the OS about which program should open the file by default. You can easily override that default through the right-click context menu.
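
A small Python sketch of this point: renaming a file from .xml to .html leaves its bytes untouched. The file names and contents are made up for illustration. What the sketch does not show is how a given program interprets those bytes; a program may still decide that based on the file name.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    xml_path = os.path.join(tmp, "sample.xml")    # hypothetical file name
    html_path = os.path.join(tmp, "sample.html")  # same file after the rename

    with open(xml_path, "wb") as f:
        f.write(b"<doc><p>Hello</p></doc>")

    with open(xml_path, "rb") as f:
        before = f.read()

    # Renaming changes only the name, not the content.
    os.rename(xml_path, html_path)

    with open(html_path, "rb") as f:
        after = f.read()

    same_bytes = before == after
    print(same_bytes)
```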

[Edited at 2010-11-02 20:48 GMT]


FarkasAndras
Hungary
Local time: 21:06
English to Hungarian
OS Nov 2, 2010

Note: the above solution only works on Windows computers. Barryw didn't specify which OS he uses, so I assume it's some flavour of Windows.

barryw
Hong Kong
Local time: 03:06
Member (2007)
English to Chinese
TOPIC STARTER
Thanks very much for your suggestions. Nov 3, 2010

Dear Arabic Translation Team and FarkasAndras,

Thanks very much for your suggestions.

Arabic Translation Team's solution works well in my case! It's quite a simple solution.

Thanks, FarkasAndras, for the detailed suggestion. I haven't had time to try your link yet, but I believe it will be a good fix for dealing with large files. However, your second suggestion, opening the XML files directly via the "Open with > browser" command, doesn't seem to work in my case. Firefox just shows all the tags, while IE simply cannot open the XML file. Maybe I messed something up?

Anyway, thank you all for your contributions.


Dawid Wietrzyk  Identity Verified
Poland
Local time: 21:06
Polish to English
It works.. Jan 23

FarkasAndras - I know the post is kind of old, but your script worked perfectly for me, just what I needed. Thank you.

FarkasAndras
Hungary
Local time: 21:06
English to Hungarian
You're welcome Jan 24


Dawid Wietrzyk wrote:

FarkasAndras - I know the post is kind of old, but your script worked perfectly for me, just what I needed. Thank you.


Glad it worked. Now this script (probably a more refined version) and similar random bits and bobs are in the "grab bag" at http://sourceforge.net/projects/aligner/files/

Currently, the grab bag is at version 1.6. You'll always find the most recent version at the sourceforge url above.

[Edited at 2012-01-24 11:29 GMT]

