How to work out the exact source word count from a web page?
Thread poster: Federica Masante

Federica Masante  Identity Verified
Local time: 20:53
Member (2003)
Italian to English
+ ...
May 11, 2005

Dear All,

I am trying to put together a quotation for a client for the translation of a web site, based on the source word count. I thought I could just copy and paste the text into a Word document and work it out that way. Unfortunately, the text is not in an editable format (it appears to be an image file), so I can't copy and paste it. Is there any software that could help me with this, or does anyone have any other suggestions?
Thanks for your time,

Federica



Nick Lingris  Identity Verified
United Kingdom
Local time: 19:53
Member (2006)
English to Greek
+ ...
Counting the words of an image file May 11, 2005

I'm afraid, dear Federica, there is no way to count the words of an image file. You could, however, OCR the file or a printout thereof and turn it into Word text to count the words and use the source text in any other ways you can think of.


Cristina Mazzucchelli  Identity Verified
Italy
Local time: 20:53
English to Italian
+ ...
print and scan May 11, 2005

Ciao Federica,

I don't know if my solution can help you out, but here's what I usually do when I have to count uneditable files:

I print the whole thing, scan it (with PDF files I can sometimes skip the printing step) and save it in Word. Then I do my usual count...

Can you do this with this kind of file?

Hope to have given you a hand...


Good luck

Cristina



Sarah Brenchley  Identity Verified
Local time: 20:53
Spanish to English
+ ...
WebBudget May 11, 2005

This is the software I would recommend. It counts all the web pages, in a variety of formats, and presents the results in an easy-to-use format (e.g. HTML, Excel, etc.).
It saves a lot of time and has been well worth buying.
All the best,
Sarah.



Jana Teteris  Identity Verified
United Kingdom
Local time: 19:53
Latvian to English
+ ...
I could be wrong, but.... May 11, 2005

I think you can count the number of words by saving the web page as a text file in Unicode format.
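
For pages that do carry real text, the save-as-text idea can be approximated in a few lines of Python. This is only a rough sketch using the standard library: the names `TextExtractor` and `count_words` are made up for illustration, the count is whitespace-based (roughly what a word processor reports), and text baked into images is not counted at all, so it would not help with an image-only page.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text from a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def count_words(html):
    """Return a whitespace-based word count of the text in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with spaces keeps words from neighbouring tags apart;
    # the result is an approximation, not an exact count.
    return len(" ".join(parser.chunks).split())


page = "<html><body><p>Benvenuti nel nostro sito.</p></body></html>"
print(count_words(page))  # 4
```

The page title and similar text are counted too, which seems reasonable for a quotation since they usually need translating as well.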

xxxBrandis
Local time: 20:53
English to German
+ ...
I use Practicount May 11, 2005

Hi! I use the Practicount business edition, along with its toolbar, which processes various formats. But do you have the website on your local computer, or do you want to count the source remotely? The tools required may vary accordingly. Normally the outsourcer tells you which pages need converting; otherwise, if the website includes community forums, the count can be enormous.
Regards,
Bandi
I think I misunderstood you earlier. If this is an image file (such files end in .bin, .cue, .nrg, etc.), I use IsoBuster to extract the files and send them to a word-counting tool. Otherwise, ask your outsourcer to give you the file in a processable format, since you cannot extract the words from the document yourself.


[Edited at 2005-05-11 16:51]



Samuel Murray  Identity Verified
Netherlands
Local time: 20:53
Member (2006)
English to Afrikaans
+ ...
OCR or ask the client May 11, 2005

Federica Masante wrote:
I thought I could just copy and paste the text into a word format and work it out like that. Unfortunately though, the text is not in an editable format (it would appear to be an image file) and it won't let me copy and paste it.


The client might have the original files from which they created the image files. If you're in luck, it may actually be something like Microsoft FrontPage Express, heh-heh. Alternatively, you can try to print the pages and OCR them.

For free OCR (but not very good OCR), try the GOCR module embedded in the Omniformat module of PDF995, or Google for SimpleOCR.

The alternative is the old method... spot-check and manual count.
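
One way to read the spot-check approach is to count a few sample pages by hand and extrapolate to the whole site. A minimal sketch of that arithmetic follows; the function name and all figures are hypothetical, and the result is an estimate, not an exact count.

```python
def estimate_total_words(sample_counts, total_pages):
    """Extrapolate a site-wide word count from hand-counted sample pages."""
    if not sample_counts:
        raise ValueError("need at least one sampled page")
    average = sum(sample_counts) / len(sample_counts)
    return round(average * total_pages)


# Hypothetical figures: three pages counted by hand, 40 pages on the site.
print(estimate_total_words([310, 275, 345], 40))  # 12400
```

The more the pages differ in length, the more sample pages are needed before the estimate is worth quoting on.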



jemo  Identity Verified
United States
Local time: 14:53
Member (2005)
English to French
+ ...
Try this May 11, 2005

Have you tried CatsCradle? It might solve some of your problems. It's here:
http://www.stormdance.net/



Samuel Murray  Identity Verified
Netherlands
Local time: 20:53
Member (2006)
English to Afrikaans
+ ...
CatsCradle requires source text in electronic format May 11, 2005

jemo wrote:
Have you tried CatsCradle? It might solve some of your problems. It's here: http://www.stormdance.net/


CatsCradle is an amateur translator tool (and I mean that in a neutral sense)... but it can't grab text from images. It requires HTML pages with plaintext source.



