Website text extractor
Thread poster: Chopkins
Chopkins  Identity Verified
France
Local time: 13:14
Member (2016)
French to English
May 22

Hello everyone,

Over the past couple of months I've had a few requests for quotes on translating clients' websites. Although a few clients have been particularly forthcoming in providing the source texts, others simply request that I use the text(s) directly from their websites.

I've seen previous threads, but they are relatively old (dating back to 2007). Has anything changed in the past 10 years that would let me vacuum/extract text directly from the sites, so that I can get a more precise word analysis with my CAT tool?

Thanks for any help that you may be able to provide me,

Chopkins



José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 09:14
English to Portuguese
Try these... May 22

HTTrack - http://www.httrack.com - freeware - to download entire web sites
CatsCradle - https://www.stormdance.net/software/catscradle/overview.htm - 30-day demo - to count words and translate; it has its own built-in CAT tool
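For a rough word count on a site that has already been mirrored to disk (for example with HTTrack), a short script can strip the markup and count the words in each page. The following is a minimal sketch, not a replacement for a CAT tool analysis: it assumes a local folder of UTF-8 `.htm`/`.html` files, the names `extract_text`, `count_words` and `count_site` are hypothetical, and it ignores any text that the site generates with JavaScript.

```python
from html.parser import HTMLParser
from pathlib import Path
import re


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def extract_text(html):
    """Return the page text with markup removed and whitespace collapsed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def count_words(text):
    """Count words; "don't" and "test-page" each count as one word."""
    return len(re.findall(r"\w+(?:['’-]\w+)*", text))


def count_site(folder):
    """Map each .htm/.html file under `folder` to its word count."""
    counts = {}
    for path in sorted(Path(folder).rglob("*.htm*")):
        html = path.read_text(encoding="utf-8", errors="replace")
        counts[str(path)] = count_words(extract_text(html))
    return counts
```

Calling `count_site("mirror")` on a hypothetical `mirror` folder returns a per-file dictionary, and `sum(count_site("mirror").values())` gives a total. Menus and footers repeated on every page are counted every time, so the total will overstate the translatable volume.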



Elif Baykara  Identity Verified
Turkey
Local time: 14:14
Member (2015)
German to Turkish
Hi! May 22

Did you check the more recent post below?

http://deu.proz.com/forum/general_technical_issues/241763-how_to_count_the_number_of_words_on_a_website_suggestions_needed.html?print=1

The last post is dated 15 Feb 2013.



Maija Cirule  Identity Verified
Latvia
Member (2014)
German to English
I would recommend May 22

CatsCradle
It is a rather sophisticated program, but for large volumes it is one of the best aids, with an embedded CAT tool.

[Edited at 2017-05-22 16:18 GMT]


Chopkins  Identity Verified
France
Local time: 13:14
Member (2016)
French to English
TOPIC STARTER
Have CatsCradle May 22

José Henrique Lamensdorf wrote:

HTTrack - http://www.httrack.com - freeware - to download entire web sites
CatsCradle - https://www.stormdance.net/software/catscradle/overview.htm - 30-day demo - to count words and translate; it has its own built-in CAT tool


Hi José,

Thank you very much for your reply.

I remember coming across your recommendation a few months ago and did wind up getting CatsCradle.

I like the program but have issues with subfolders and unnecessary content, so I would like to switch to a program that just extracts text.

I'm considering HTTrack if no other suggestion comes up.

Thank you for your suggestions,

Chopkins


Chopkins  Identity Verified
France
Local time: 13:14
Member (2016)
French to English
TOPIC STARTER
Thank you!!! May 22

Elif Baykara wrote:

Did you check the more recent post below?

http://deu.proz.com/forum/general_technical_issues/241763-how_to_count_the_number_of_words_on_a_website_suggestions_needed.html?print=1

The last post is dated 15 Feb 2013.


I think I may go with the last suggestion from the thread. Thank you for digging this one up from the forum archives.

Thanks again,

Chopkins


Chopkins  Identity Verified
France
Local time: 13:14
Member (2016)
French to English
TOPIC STARTER
Curious to know if newer or better programs exist... May 22

Maija Cirule wrote:

CatsCradle
It is a rather sophisticated program, but for large volumes it is one of the best aids, with an embedded CAT tool.

[Edited at 2017-05-22 16:18 GMT]


Maija,

I have CatsCradle and was somewhat happy with it on a couple of projects; however, given its age, I wanted to know if anything else is worthwhile.

Thanks again for your suggestion!

-Chopkins



neilmac  Identity Verified
Spain
Local time: 13:14
Spanish to English
Charge them extra May 24

Chopkins wrote:

Hello everyone,

... others simply request that I use the text(s) directly from their websites.


With this type of client, I just tell them the fee will be roughly twice what it would be for normal text in a normal Word-compatible format. If that doesn't get them rooting about their back office to find a workable copy for you, they must have more money than sense.



DZiW
Ukraine
English to Russian
sure May 24

Translating just the text and the diagrams/pictures (yikes!) won't do: a website requires a proper layout, and the client should consider culture-related peculiarities and how to redirect changeable/dynamic content for each language.

If there are just two or three languages, a straightforward sub-domain approach may be OK, but each one is a separate copy to handle and process.

Anyway, extra work requires extra payment :)



John Fossey  Identity Verified
Canada
Local time: 07:14
Member (2008)
French to English
Get client to extract May 24

Don't forget that the text in a website is often much more than what's visible when you view the page. There is internal text such as the content of menus, drop-down lists, tool tips, etc. Sometimes there is text hidden in JavaScript, which needs a programmer to extract. And sometimes changing such text can damage the page so that it no longer works.

I will usually insist that the client get their webmaster to extract the text into a Word document, because it will usually take the webmaster's expertise to put the translation into the right places.
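To illustrate how much translatable text can sit outside the visible page body, here is a minimal sketch that lists strings found in attributes and drop-down options. It is only an illustration under stated assumptions: the attribute list in `TEXT_ATTRS` is an assumed, non-exhaustive selection, the names `HiddenText` and `find_hidden_text` are hypothetical, and text embedded in JavaScript is not handled at all.

```python
from html.parser import HTMLParser

# Attributes that commonly carry user-facing text (assumed, non-exhaustive).
TEXT_ATTRS = ("alt", "title", "placeholder", "aria-label")


class HiddenText(HTMLParser):
    """Collects text from attributes, meta tags and <option> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.found = []  # list of (source, text) pairs
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name in TEXT_ATTRS:
            value = attrs.get(name)
            if value and value.strip():
                self.found.append((f"{tag}@{name}", value.strip()))
        if tag == "meta" and attrs.get("name") in ("description", "keywords"):
            content = attrs.get("content")
            if content and content.strip():
                self.found.append((f"meta:{attrs['name']}", content.strip()))
        if tag == "option":
            self._in_option = True

    def handle_endtag(self, tag):
        # </select> also ends an <option> whose closing tag was omitted.
        if tag in ("option", "select"):
            self._in_option = False

    def handle_data(self, data):
        if self._in_option and data.strip():
            self.found.append(("option", data.strip()))


def find_hidden_text(html):
    """Return (source, text) pairs for text that is easy to miss on screen."""
    parser = HiddenText()
    parser.feed(html)
    parser.close()
    return parser.found
```

Each pair records where the string came from (for example `img@alt`), which is the kind of information a webmaster needs in order to put the translation back in the right place.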



Volodymyr Pedchenko
Local time: 14:14
English to Ukrainian
AnyCount 3D downloads websites and counts words, characters and lines May 24

Hello Chopkins,

We released AnyCount 3D on New Year's Eve. Its main difference from previous versions is the ability to download websites and count their words; that is what the third dimension in its name stands for.

Add from web and get word count

The website copy is stored for your use.

You are welcome to download it at http://www.wordcountsoftware.com/

As it is a novelty for the translation world, we are open to suggestions on how to improve it.

Kind regards,
Vladimir.

[Edited at 2017-05-24 19:05 GMT]

