Extraction of text from Web site
Thread poster: luka
luka
Spain
Local time: 05:48
English to Spanish
+ ...
Jan 16, 2007

I have been asked to translate a big web site for a company. They do not have the source text for the site. I am looking for a tool which is capable of looking through all the pages, stripping out the code, leaving me with only the text to translate. Or any other suggestions on ways to do this. Thanks.

Rod Darby  Identity Verified
Ghana
Local time: 04:48
German to English
+ ...
possible solution Jan 16, 2007

luka,
There's a shareware program called Trellian which I believe will download the code of a site for you. I haven't tried it, but you might have a look.
Rod


[Edited at 2007-01-16 11:26]


Jerónimo Fernández  Identity Verified
English to Spanish
+ ...
WinHTTrack Jan 16, 2007

Hello.

I use WinHTTrack (http://www.httrack.com). It's free and it works wonders. It mirrors the website you want to work with onto your hard drive.

Good luck,
Jerónimo


xxxMarc P  Identity Verified
Local time: 05:48
German to English
+ ...
Extraction of text from Web site Jan 16, 2007

Why strip out the code from the pages? The customer will then have the job of putting it all back in again.

Tools are available with which you can download entire web sites, retaining the directory structure. wget is an example: www.gnu.org/software/wget
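
To give an idea of what such a download tool does at each step, here is a minimal Python sketch (not how wget itself is implemented) of the core decision: which links on a page belong to the same site and should be followed. The names `LinkCollector` and `same_site_links` and the example.com URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html, page_url):
    """Return absolute http(s) URLs on the same host as page_url, fragments removed."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = set()
    for href in collector.hrefs:
        # Resolve relative links against the page, then drop any "#fragment".
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            links.add(absolute)
    return sorted(links)
```

A real mirroring tool repeats this for every page it fetches, remembers which URLs it has already visited, and saves each file under a matching directory path. That is exactly the work wget or HTTrack saves you.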

Once you have downloaded the site, you can translate the pages in a CAT tool which is capable of handling HTML. OmegaT, for example, will present you with the text for translation whilst keeping the entire web site structure - directories, images, the works - intact.
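
For anyone who still wants just the text, for a quick look or a rough count, the separation of text from markup can be sketched in a few lines of Python with the standard library's `html.parser`. This is only an illustration of the idea, not what OmegaT does internally; `TextExtractor` and `extract_text` are names I made up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()  # character references such as &amp; are decoded by default
        self.segments = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and self._skip_depth == 0:
            self.segments.append(text)


def extract_text(html):
    """Return the text segments of an HTML document in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments
```

Note that this sketch ignores translatable attributes such as `alt` and `title`, and it throws the structure away, which is the very problem described above. A CAT tool with an HTML filter keeps the tags in place for you.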

It is possible, however, that your customer's web pages are created dynamically with data from a database. In this case, you will probably have to get the customer to deliver the data to you.

Marc


franksf
Chinese to English
try webstripper to download the site you need Jan 17, 2007

http://webstripper.net/reghelp.html

Michael Bastin  Identity Verified
Spain
Local time: 05:48
English to French
+ ...
big website Jan 19, 2007

If the site is that big, chances are it is database-driven. Using software to download the pages may leave you waiting ages for the download to complete.

In any case, you should only use the downloaded pages for quoting purposes. The customer should send you the pages they would like to have translated, or an export of the database if the content is generated that way.

My 2 cents


luka
Spain
Local time: 05:48
English to Spanish
+ ...
TOPIC STARTER
Thank you very much Jan 19, 2007

I want to thank all of you for your help.
In the end I gave up because the site is huge. I have told the client that I can't work out the number of words and that they should try to find the source files.

Have a great weekend


mlconnections
United States
Local time: 22:48
English to Spanish
+ ...
extract web text and get summary? Jan 23, 2007

hi all:

i just posted this question yesterday, and someone kindly pointed me to this discussion. i'm in the process of trying out the software recommended, so thank you. however, has anyone found a program that can then generate a report similar to this one: http://www.apex-translations.com/en/cost_estimate/website_summary.html
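
in case it helps anyone with the same question, a rough per-page word count can be sketched in Python once the site has been downloaded to a folder. this is only an approximation under my own assumptions: it counts runs of letters and digits as words (so "don't" counts as two), it only looks at .html/.htm files, and the names `count_words`, `site_summary` and `print_report` are made up. it does not reproduce the report at the link above.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class _Text(HTMLParser):
    """Gathers text outside <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def count_words(html):
    """Count words in the visible text of one HTML document."""
    parser = _Text()
    parser.feed(html)
    parser.close()
    return len(re.findall(r"\w+", " ".join(parser.parts)))


def site_summary(root):
    """Map each .html/.htm file under root (relative path) to its word count."""
    root = Path(root)
    summary = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in (".html", ".htm"):
            html = path.read_text(encoding="utf-8", errors="replace")
            summary[path.relative_to(root).as_posix()] = count_words(html)
    return summary


def print_report(summary):
    """Print one line per page, then the total."""
    for page, words in summary.items():
        print(f"{words:8d}  {page}")
    print(f"{sum(summary.values()):8d}  TOTAL")
```

calling `print_report(site_summary("path/to/mirror"))` on the downloaded folder gives one line per page plus a total. repeated text such as menus and footers is counted on every page, so treat the total as an upper bound when quoting.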

thank you.


Paul Betts  Identity Verified
Local time: 05:48
French to English
Prior declarations or content management set-ups... Apr 20, 2007

I have found that if a client is considering the translation of a relatively small static HTML web site, they often prefer a complete price for the site (including graphical elements).

My work comes with the attached condition that all web page URLs requiring translation are declared in advance, with the pages I have already seen listed by me in the quote.

On much larger dynamic (database-driven) sites, the clients often have domestic-language web programmers/developers on their team. Well, this is the ideal, anyway.

If this is the case, I find it makes sense to ask their developer to add a column to their database for the language I am offering (as they will have to eventually) and to create a simple password-protected content administration page that I can access. This way I can see the text to be translated and, below it, a blank field in which to enter, save and revise the equivalent new-language version.

When all is done, it takes no time for the developer to change the content reference variable in their web page template to the new language variable. It's all rather simple really.
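
To make the idea concrete, here is a minimal sketch in Python with SQLite. Everything in it is an assumption for illustration: the table `content`, the columns `text_en` and `text_es`, and the function names are hypothetical, and a client's real database and admin page will look different.

```python
import sqlite3

# Hypothetical mapping from language code to database column.
COLUMNS = {"en": "text_en", "es": "text_es"}


def make_demo_db():
    """Build a tiny in-memory stand-in for the client's content table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content (id INTEGER PRIMARY KEY, text_en TEXT NOT NULL)")
    conn.executemany("INSERT INTO content (text_en) VALUES (?)",
                     [("Welcome",), ("Contact us",)])
    # The developer's one-off change: a column for the new language.
    conn.execute("ALTER TABLE content ADD COLUMN text_es TEXT")
    return conn


def untranslated(conn):
    """What the admin page would list: source texts with no translation yet."""
    return conn.execute(
        "SELECT id, text_en FROM content WHERE text_es IS NULL ORDER BY id"
    ).fetchall()


def save_translation(conn, row_id, text):
    """What the admin page's Save button would do."""
    conn.execute("UPDATE content SET text_es = ? WHERE id = ?", (text, row_id))
    conn.commit()


def page_text(conn, row_id, lang):
    """What the page template would do: read the column for the chosen language."""
    column = COLUMNS[lang]  # looked up in a fixed mapping, never taken from user input
    row = conn.execute(
        f"SELECT {column} FROM content WHERE id = ?", (row_id,)).fetchone()
    return row[0]
```

Switching the site to the new language then amounts to the template asking for `"es"` instead of `"en"`, which is the one-variable change described above.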

[Edited at 2007-04-20 14:59]

