Looking for a "site hoover" to extract text from web pages
Thread poster: David BUICK

David BUICK Local time: 00:35 ProZ.com member since 2006 French => English + ...
Does anyone have any experience of what, in French, is termed an "aspirateur de site"? I'm looking at a huge contract to translate a website and wondering what the best way to deal with it is, so I don't waste hours manually indexing and copy/pasting text.

KathyT Australia Local time: 08:35 Japanese => English
HTTrack Website Copier 3.41 RC1 | Sep 4, 2007
Try this one, downloadable free from: http://www.download.com/3000-12779_4-10634972.html. It had pretty good reviews and has supposedly been tested as spyware-free. Has been working well for me. Even large-ish websites can be downloaded in entirety in around 5 mins. P.S. I love your "site hoover" expression!

Oliver Walter United Kingdom Local time: 23:35 German => English + ...
I don't think HTTrack does what you want | Sep 4, 2007
KathyT wrote: Try this one, downloadable free from: http://www.download.com/3000-12779_4-10634972.html. It had pretty good reviews and has supposedly been tested as spyware-free. Has been working well for me. Even large-ish websites can be downloaded in entirety in around 5 mins.

All this is true, and I use WinHTTrack (the Windows version) sometimes. But it doesn't extract the text into a text or word-processor file; it only copies the web site (the parts of it defined by the limits that you configure) onto your computer so that you can browse it offline. HTTrack is free because it is an Open Source project. Oliver

David BUICK Local time: 00:35 ProZ.com member since 2006 French => English + ... Thread poster
Maybe it does enough...? | Sep 4, 2007
Hmm. If it downloads in HTML format I think maybe my CAT app (Déjà Vu X) can extract from there and hopefully spit it back into the same format. When I get a moment I'll do a few trial runs. Thanks for the speedy responses. Anyone else got any experience or suggestions?
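For anyone who wants to see what a downloaded mirror contains before loading it into a CAT tool, the idea of pulling the text out of the saved HTML files can be sketched in a few lines of Python. This is only an illustration, not something HTTrack or Déjà Vu X provides: the names `TextExtractor`, `extract_text` and `extract_site` are made up for the example, the files are assumed to be UTF-8, and only `<script>` and `<style>` contents are skipped.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text, skipping <script>/<style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.segments = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace and drop empty fragments.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.segments.append(text)


def extract_text(html):
    """Return the text fragments found in one HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments


def extract_site(mirror_dir):
    """Map each .htm/.html file under mirror_dir to its text fragments.

    Assumption: files are UTF-8; undecodable bytes are replaced.
    """
    results = {}
    for path in sorted(Path(mirror_dir).rglob("*.htm*")):
        html = path.read_text(encoding="utf-8", errors="replace")
        results[str(path)] = extract_text(html)
    return results
```

Calling `extract_site("mirror")` on a hypothetical folder named `mirror` would return a dictionary mapping each file path to its list of text fragments. The fragments break at every tag, so a sentence containing bold or linked words comes out in pieces; a CAT tool's HTML filter handles that far better, and text produced by scripts or pulled from a database at request time will only appear as it was when the page was saved.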
Philippe Etienne Spain Local time: 00:35 ProZ.com member English => French
Eutychus wrote: ...Hmm. If it downloads in html format I think maybe my CAT app (Déjà Vu X) can extract from there and hopefully spit it back into the same format.

WinHTTrack downloads all the files that make up the website, exactly as they are. So if you use a CAT tool that handles web files, the process should be a breeze. Good luck, Philippe

Heinrich Pesch Finland Local time: 01:35 ProZ.com member since 2003 Finnish => German + ...
Always ask for the files as a zip | Sep 4, 2007
In the old days you could simply download all the files from a site and translate them, but nowadays most professional sites use databases, where the content is created on the fly. So translating what you get by downloading may not be the right procedure. Just my 2 c. Heinrich

megane_wang Spain Local time: 00:35 ProZ.com member since 2007 English => Spanish + ...
I agree with Heinrich | Sep 4, 2007
Clearly, you have no experience with that... If it's such a big web site, you NEED to talk to the customer and make a detailed project analysis, and see how you will get, process and deliver the contents. I've been on both the developer and the translator side, and I can assure you that with a BIG site it's extremely rare that you can go ahead without that... at least if you want to do it right. Ruth @ MW
[Edited at 2007-09-04 13:10]

David BUICK Local time: 00:35 ProZ.com member since 2006 French => English + ... Thread poster
Thanks for replies so far | Sep 4, 2007
I have translated a number of sites and no two have been the same. In most cases I have had access to the source files as explained by Heinrich, but in more than one case the text has had to be input online (for example in Flash-based sites for which the original copy is no longer available). In several other cases where I have had the files, these have not included the various menu items and headlines, which are added afterwards and often get translated by some non-specialist after the project, when the client suddenly realises they forgot to ask for that part to be done, thus destroying the effect of the whole thing. I hope to be able to go down the route suggested by Heinrich and Ruth, but I would like to cover my options.
Samuel Murray Netherlands Local time: 00:35 ProZ.com member since 2006 English => Afrikaans + ...
Get the text from the client | Sep 5, 2007
Eutychus wrote: I'm looking at a huge contract to translate a website and wondering what the best way to deal with it is, so I don't waste hours manually indexing and copy/pasting text.

The client should provide the text for you, either in HTML or in some word processing format. Otherwise you can't know for certain if you got all the pages, or if you perhaps got more pages than the client thought he had. By requiring the client to provide the files, you avoid both nasties. I use Oleg Chernavin's Web Downloader 2.2, but it is abandonware and you may need to Google hard for it. What I like about Oleg's tool is that it recreates the folder tree for all the objects on the web site.

Cat's cradle | Sep 10, 2007
Bonjour, I'm not sure whether Cat's cradle is what you are looking for, but at least it's a very handy little application (free for 30 days): http://www.stormdance.net/software/catscradle/overview.htm

"CatsCradle grabs all the text that requires translating from a web page, puts it into a built in editor for you to translate alongside, then automatically integrates your translated localized text back into the web page - leaving all the sensitive HTML code untouched. ..."

Moreover, Julian Spencer always has an open ear for questions. But Samuel is right, of course: the original files, provided by the client with exact specifications of what to translate and what not, are always the best solution... Charlotte