How to work out the exact source word count from a web page?
Thread poster: Federica Masante
Federica Masante
Local time: 00:46
Italian to English
May 11, 2005

Dear All,

I am trying to put together a quotation for a client for the translation of a web site, based on the source word count. I thought I could just copy and paste the text into a Word document and count it there. Unfortunately, the text is not in an editable format (it appears to be an image file), so I cannot copy and paste it. Is there any software that could help with this, or does anyone have other suggestions?
Thanks for your time,

Federica


 
Nick Lingris
United Kingdom
Local time: 23:46
Member (2006)
English to Greek
Counting the words of an image file May 11, 2005

I'm afraid, dear Federica, there is no way to count the words in an image file directly. You could, however, OCR the file (or a printout of it) and turn it into Word text. You can then count the words and reuse the source text in any other way you need.
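
Once OCR has produced plain text, the count itself is simple. Below is a minimal sketch, assuming the OCR output is already available as a string. The function name `count_ocr_words` is hypothetical, and the counting rule (whitespace-separated tokens, with words that were hyphenated across a line break rejoined) is only one reasonable convention, so the result may differ slightly from what Word reports.

```python
import re


def count_ocr_words(text: str) -> int:
    """Count words in OCR output (illustrative convention, not a standard)."""
    # Rejoin words that were split across a line break ("transla-\ntion").
    # Caveat: this also joins genuine hyphenated compounds that happen to
    # break at the end of a line.
    joined = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Keep only tokens containing at least one letter or digit, so stray
    # dashes and bullets picked up by OCR are not counted as words.
    tokens = [t for t in joined.split() if re.search(r"\w", t)]
    return len(tokens)


sample = "Benvenuti nel nostro sito.\nServizi di tradu-\nzione professionale - dal 1990"
print(count_ocr_words(sample))
```

OCR errors (merged or split words) will still skew the figure, so it is worth proofreading a sample of the output before quoting.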

 
Cristina Mazzucchelli
Italy
Local time: 00:46
English to Italian
print and scan May 11, 2005

Ciao Federica,

I don't know if my solution will help, but here's what I usually do when I have to count non-editable files:

I print the whole thing, scan it (with PDF files I can sometimes skip the printing step) and save the result as a Word document. Then I do my usual count...

Can you do this with this kind of file?

I hope this helps...


Good luck

Cristina


 
Sarah Brenchley
Local time: 00:46
Spanish to English
WebBudget May 11, 2005

This is the software I would recommend. It counts all the web pages, in a variety of formats, and presents the results in an easy-to-use format (e.g. HTML, Excel, etc.).
It saves a lot of time and has been well worth buying.
All the best,
Sarah.
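
For pages that do contain real text, the basic idea behind counting a web page can be sketched in a few lines. This is not how WebBudget works internally (that is not described here); it is only an illustration using Python's standard `html.parser`, and the names `VisibleText` and `html_word_count` are made up for the example. It counts the text between tags, including the page title, and ignores scripts and style sheets.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_word_count(html: str) -> int:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Chunks are joined with a space, so a word split by inline tags
    # (e.g. "<b>W</b>ord") would be counted as two words.
    return len(" ".join(parser.chunks).split())


page = (
    "<html><head><title>Home</title><style>p{color:red}</style></head>"
    "<body><h1>Welcome to our site</h1>"
    "<p>We translate <b>legal</b> documents.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_word_count(page))
```

Translatable text hidden in attributes (such as `alt` text or meta descriptions) is not counted by this sketch.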


 
Jana Teteris
United Kingdom
Local time: 23:46
Latvian to English
I could be wrong, but.... May 11, 2005

I think you can count the number of words by saving the web page as a text file in Unicode format.
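
A rough sketch of counting such a saved file follows. It assumes that "Unicode" in the save dialog means UTF-16, which is the usual meaning on Windows; if the file was saved differently, pass another encoding such as `"utf-8"`. The function name `count_words_in_text_file` is invented for the example. Note that text embedded in images will not appear in the saved file at all.

```python
from pathlib import Path


def count_words_in_text_file(path, encoding="utf-16"):
    """Count whitespace-separated words in a saved text file.

    The default encoding is an assumption: "Unicode" text files saved on
    Windows are normally UTF-16 with a byte-order mark, which Python's
    "utf-16" codec detects automatically.
    """
    text = Path(path).read_text(encoding=encoding)
    return len(text.split())
```

Usage would look like `count_words_in_text_file("homepage.txt")`, where `homepage.txt` is a placeholder for whatever file the browser saved.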

 
Brandis (X)
Local time: 00:46
English to German
I use Practicount May 11, 2005

Hi! I use PractiCount Business Edition, and also its toolbar, which processes various formats. But do you have the website on your local computer, or do you wish to count the source remotely? The tools required may vary accordingly. Normally the outsourcer tells you which pages need conversion; otherwise, if the website contains community forums, the count can be enormous.
Regds,
Bandi
I think I misunderstood you earlier. If this is a disk image file (such files end in .bin, .cue, .nrg, etc.), I use IsoBuster to extract the files and send them to a word-counting tool. Otherwise, ask your outsourcer to give you the file in a processable format, since you are not able to extract the words from the document yourself.


[Edited at 2005-05-11 16:51]


 
Samuel Murray
Netherlands
Local time: 00:46
Member (2006)
English to Afrikaans
OCR or ask the client May 11, 2005

Federica Masante wrote:
I thought I could just copy and paste the text into a word format and work it out like that. Unfortunately though, the text is not in an editable format (it would appear to be an image file) and it won't let me copy and paste it.


The client might have the original files from which they created the image files. If you're in luck, it may actually be something like Microsoft FrontPage Express, heh-heh. Alternatively, you can try to print the pages and OCR them.

For free OCR (but not very good OCR), try the GOCR module embedded in the Omniformat module of PDF995, or Google for SimpleOCR.

The alternative is the old method... spot-check and manual count.
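
The spot-check approach amounts to simple extrapolation: count a few representative pages by hand, take the average, and multiply by the total number of pages. A small sketch follows; the function name `estimate_total_words` and the figures are invented for illustration, and the estimate is only as good as the sample is representative.

```python
def estimate_total_words(sample_counts, total_pages):
    """Extrapolate a total word count from manually counted sample pages."""
    if not sample_counts:
        raise ValueError("need at least one sampled page")
    if total_pages < len(sample_counts):
        raise ValueError("total_pages cannot be smaller than the sample")
    mean_per_page = sum(sample_counts) / len(sample_counts)
    return round(mean_per_page * total_pages)


# Three pages counted by hand (made-up numbers), site of 40 pages.
print(estimate_total_words([310, 275, 342], 40))
```

Since the result is an estimate, it may be wise to quote it as approximate or to agree on a final count once the editable text is available.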


 
jemo
United States
Local time: 18:46
Member (2005)
English to French
Try this May 11, 2005

Have you tried CatsCradle? It might solve some of your problems. It's here:
http://www.stormdance.net/


 
Samuel Murray
Netherlands
Local time: 00:46
Member (2006)
English to Afrikaans
CatsCradle requires source text in electronic format May 11, 2005

jemo wrote:
Have you tried CatsCradle? It might solve some of your problems. It's here: http://www.stormdance.net/


CatsCradle is an amateur translator tool (and I mean that in a neutral sense)... but it can't grab text from images. It requires HTML pages with plaintext source.
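
Before choosing between a text-based tool and OCR, it can help to check whether a page carries its text as HTML or only as images. The following is a rough heuristic sketch, not part of CatsCradle or any other tool mentioned here: the names `PageProfile` and `looks_image_only` and the 20-word threshold are arbitrary assumptions.

```python
from html.parser import HTMLParser


class PageProfile(HTMLParser):
    """Tally <img> elements and words of plain text in a page."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.text_words = 0
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_words += len(data.split())


def looks_image_only(html: str, min_words: int = 20) -> bool:
    """Return True if the page has images but almost no plain text."""
    profile = PageProfile()
    profile.feed(html)
    profile.close()
    return profile.images > 0 and profile.text_words < min_words


print(looks_image_only("<html><body><img src='page1.gif'></body></html>"))
```

A page flagged this way probably needs OCR or the client's original files; a page with plenty of plain text can go straight to a word-counting tool.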


 