Counting words in websites
Thread poster: Catarina Aleixo
Catarina Aleixo  Identity Verified
Portugal
Local time: 09:00
Portuguese to English
Aug 21, 2009

I have in the past received several requests for quotes to translate websites with nothing more than a URL for reference.

I used a piece of software (called a webcrawler, I think) that purported to count the words, but on every occasion my customers got scared off by the number of words. This made me think that either the software was not counting correctly or that the customers were not aware of quite how big the site was. Either way, I have doubts about the reliability of this kind of software, and I have just received another similar quote request.

Does anyone have any tips about how to reliably count words in a website using only the basic URL (e.g. http://www.proz.com) for reference, preferably without copying and pasting every page (this is a big website I have to quote for)? Thanks.



Ulf Norlinger  Identity Verified
Sweden
Local time: 10:00
English to Swedish
+ ...
Use MS Word Aug 22, 2009

You can copy and paste the content of the web pages into a Word document and get the word count there. Be sure to filter out the hyperlink fields, though (images and other embedded objects are not counted at all), e.g. with pattern matching in the Replace dialogue (i.e. { HYPERLINK ... } --> "").

NOTE: I actually believe WebCrawler counts the words in the web links, so maybe your agency is right. This can have a huge impact on the number of words.
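
If you are comfortable running a small script, the same idea (count the text, not the link addresses) can be sketched in Python. This is only a rough illustration using the standard library; the names VisibleTextCounter and count_visible_words are made up for this example, and it makes no claim about how WebCrawler or Word count internally.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts words in page text, ignoring tags, attributes, scripts and styles."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.words = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only text between tags arrives here, so href URLs are never counted.
        if self._skip_depth == 0:
            self.words += len(data.split())


def count_visible_words(html):
    parser = VisibleTextCounter()
    parser.feed(html)
    parser.close()
    return parser.words


sample = '<p>Read our <a href="http://www.proz.com/forum/some/long/path">forum rules</a> first.</p>'
print(count_visible_words(sample))  # 5: the link text is counted, the URL is not
```

The link text "forum rules" still counts as two words, which is usually what you want for a quote, since that text has to be translated.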

[Edited at 2009-08-22 06:34 GMT]


Catarina Aleixo  Identity Verified
Portugal
Local time: 09:00
Portuguese to English
TOPIC STARTER
Thanks but wanted to avoid Word as site is huge Aug 22, 2009

Thanks Ulf, especially for the tip about the hyperlinks. I wanted to avoid copying and pasting into Word, as this will take a very long time (especially when having to remove hyperlinks one by one), and I am concerned I may miss pages due to the complexity and size of the site.


Ulf Norlinger  Identity Verified
Sweden
Local time: 10:00
English to Swedish
+ ...
I see Catarina... Aug 22, 2009

You can try this free tool on the Internet:

http://www.kwintessential.co.uk/translation/website-wordcount-tool.php



José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 07:00
English to Portuguese
+ ...
Try CatsCradle... Aug 22, 2009

... from http://www.stormdance.net .

I think you can download a demo. Btw, I think you have to download the whole website to your hard disk to do the count. On the brighter side, it has its own internal CAT tool for translating, and it preserves the HTML, so you get the web pages translated, and not only the text therein.


Catarina Aleixo  Identity Verified
Portugal
Local time: 09:00
Portuguese to English
TOPIC STARTER
Great suggestions I'll give them a try Aug 22, 2009

Thanks, I'll try out both and give feedback when I know more.

Thanks again.



Samuel Murray  Identity Verified
Netherlands
Local time: 10:00
Member (2006)
English to Afrikaans
+ ...
You have to download it... Aug 22, 2009

Catarina Aleixo wrote:
Does anyone have any tips about how to reliably count words in a website using only the basic URL (e.g. http://www.proz.com) for reference, preferably without copying and pasting every page (this is a big website I have to quote for)?


Well, you're gonna have to download it anyway (using your webstripper type of program). You can't count it unless you download it. Personally I'd use OmegaT for the counting, but that may be overkill (using a hammer to swat a fly). You can put several directories and subdirectories full of files in the /source/ folder of OmegaT and when you reload the project, OmegaT stores the word counts of all the files in a stats.txt file somewhere. Quite useful.
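
If you'd rather not install anything, the "count every file in a folder tree" idea can also be sketched in a few lines of Python. This is not what OmegaT does internally, just a rough equivalent: the function names are invented for this example, the tag stripping is deliberately crude, and the demo builds a tiny fake site in a temporary folder instead of using a real download.

```python
import re
import tempfile
from pathlib import Path

TAG = re.compile(r"<[^>]+>")
SCRIPT_OR_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)


def rough_word_count(html):
    """Crude count: drop script/style blocks, then tags, then split on whitespace."""
    text = SCRIPT_OR_STYLE.sub(" ", html)
    text = TAG.sub(" ", text)
    return len(text.split())


def count_folder(root, suffixes=(".html", ".htm")):
    """Return {relative path: word count} for every HTML file under root."""
    root = Path(root)
    counts = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            html = path.read_text(encoding="utf-8", errors="replace")
            counts[path.relative_to(root).as_posix()] = rough_word_count(html)
    return counts


# Demo on a made-up two-page site; point count_folder at your download folder instead.
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    (site / "about").mkdir()
    (site / "index.html").write_text("<h1>Welcome</h1><p>Two words</p>", encoding="utf-8")
    (site / "about" / "team.htm").write_text(
        "<p>Meet the team</p><style>p {color: red}</style>", encoding="utf-8"
    )
    per_file = count_folder(site)

for name, words in per_file.items():
    print(f"{words:6d}  {name}")
print(f"{sum(per_file.values()):6d}  TOTAL")
```

A per-file listing like this also makes it easier to show a customer where the big numbers come from.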



Pablo Bouvier  Identity Verified
Local time: 10:00
German to Spanish
+ ...
Webcounting Aug 22, 2009

I agree with Samuel. You will have to download the whole website anyway and, depending on the kind of pages, word counting may be a serious headache (especially with dynamic pages). I recommend trying the two-week free trial of AquinoWebbudget XT: www.webbudget.com

This tool allows you to map the site. Unlike httrack and others, it does not rewrite the links to make the website work on your PC, so you can upload your translation straight to the server once it is finished. It also counts the words you will need to translate, taking into account words that may not be visible on the page (JavaScript, etc.) and leaving out tags and other non-translatable elements. Good luck!
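
To give an idea of what "words that are not visible on the page" means, here is a rough Python sketch that counts visible text separately from text tucked away in attributes and meta tags. It is only an illustration and not how Webbudget works: the attribute lists and the name count_translatable are my own assumptions, and strings inside JavaScript are not handled at all.

```python
from html.parser import HTMLParser

# Assumption: these attributes and meta names usually hold translatable text.
TRANSLATABLE_ATTRS = {"alt", "title", "placeholder"}
TRANSLATABLE_META = {"description", "keywords"}


class TranslatableCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.visible = 0  # words in ordinary page text
        self.hidden = 0   # words in attributes and meta tags
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        attr_map = {name: value or "" for name, value in attrs}
        for name in TRANSLATABLE_ATTRS:
            self.hidden += len(attr_map.get(name, "").split())
        if tag == "meta" and attr_map.get("name", "").lower() in TRANSLATABLE_META:
            self.hidden += len(attr_map.get("content", "").split())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.visible += len(data.split())


def count_translatable(html):
    """Return (visible words, hidden words) for one page."""
    parser = TranslatableCounter()
    parser.feed(html)
    parser.close()
    return parser.visible, parser.hidden


page = """
<html><head>
<title>Lisbon hotels</title>
<meta name="description" content="Cheap hotels in Lisbon">
</head><body>
<img src="pool.jpg" alt="Rooftop pool at night">
<p>Book your room today.</p>
</body></html>
"""
print(count_translatable(page))  # (6, 8)
```

In this made-up page the hidden text is longer than the visible text, which is one reason why counts from different tools can differ so much.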

[Edited at 2009-08-22 17:42 GMT]


Catarina Aleixo  Identity Verified
Portugal
Local time: 09:00
Portuguese to English
TOPIC STARTER
Thanks for input but now I'm confused... Aug 23, 2009

Thanks for everyone's input, but now I'm confused. How would I even go about downloading a website? What do you mean by mapping the site? Sorry for my tech ignorance.

[Edited at 2009-08-23 16:19 GMT]


barreiro04
Local time: 06:00
English to Spanish
You can try TransAbacus Sep 8, 2009

TransAbacus counts websites online (so, you don't need to download the whole website to your PC). Also, it has an option to ignore repeated phrases (generally, web sites repeat the same key sentences quite often, or use the same meta tags, so if you count all repetitions you will get a huge number of words).
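
To show why repetitions matter so much, here is a small Python sketch of the general idea: split the text into segments and count each distinct segment only once. It is not TransAbacus's actual method; the sentence splitter is very rough, and the function names and sample pages are invented for this example.

```python
import re


def split_segments(text):
    """Very rough splitter: break after ., ! or ? and at line breaks."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [" ".join(p.split()) for p in parts if p.strip()]


def count_with_repetitions(pages):
    """Return (total words, words in first occurrences only) over a list of page texts."""
    seen = set()
    total = unique = 0
    for text in pages:
        for segment in split_segments(text):
            words = len(segment.split())
            total += words
            key = segment.casefold()  # treat "Contact us" and "CONTACT US" as the same
            if key not in seen:
                seen.add(key)
                unique += words
    return total, unique


pages = [
    "Contact us today.\nWe translate websites. All rights reserved.",
    "Contact us today.\nWe also count words. All rights reserved.",
]
print(count_with_repetitions(pages))  # (19, 13)
```

With only two tiny pages, the repeated header and footer already account for almost a third of the total, so on a big site the difference can be very large.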

You can get this software at www.transabacus.com

Hope this helps, regards



Pablo Bouvier  Identity Verified
Local time: 10:00
German to Spanish
+ ...
Counting words in websites Sep 9, 2009

Catarina Aleixo wrote:

Thanks for everyone's input, but now I'm confused. How would I even go about downloading a website? What do you mean by mapping the site? Sorry for my tech ignorance.

[Edited at 2009-08-23 16:19 GMT]


I am sorry about that, Catarina. Websites are text files hosted on a web server (a remote computer). What we usually call browsers are technically web clients (programs on our PC that read and interpret websites). Web clients (browsers) let us see, on our PC screen, the web pages (interpreted text files) hosted on a web server (a remote computer).

If you need to translate a website, you need the files on the web server (the files hosted on the remote computer), not what the browser shows us, which is quite different. However, almost all website downloading programs change the website's structure so that the site can be viewed offline on your PC just as you see it online. Such programs are useful neither for translating nor for word counting, because they modify the whole website structure.

You need the original files from the web server (the text files on the remote computer), not what is shown on your computer's screen (the browser's interpretation of those text files). And that is where Webbudget comes in. It allows you to download all the files hosted on a web server to your computer, translate them, and upload them again afterwards, without any kind of conversion.

Forget about mapping. Webbudget will do it automatically (it records the website's structure on the web server, i.e. the remote computer, before downloading).
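
In case you are still curious, "mapping" in the general sense just means following the links on each page to build a list of the site's pages. A rough Python sketch of one step of that (finding the same-site links on a single page) could look like this. It says nothing about how Webbudget does it internally; the names LinkCollector and same_site_links and the example.com addresses are made up for the illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return the sorted absolute URLs on page_url's host that the page links to."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            found.add(absolute)
    return sorted(found)


html = (
    '<a href="/about.html">About</a> '
    '<a href="prices.html#top">Prices</a> '
    '<a href="http://other.example/">Partner</a> '
    '<a href="mailto:info@example.com">Mail</a>'
)
print(same_site_links("http://www.example.com/index.html", html))
```

A full site map would repeat this for every page found until no new addresses turn up, which is the part the tools do for you.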

[Edited at 2009-09-09 23:08 GMT]



