How to count the words of an e-book (HTML format)
Thread poster: Sandrine Rizzo (X)

Sandrine Rizzo (X)  Identity Verified
France
Local time: 11:27
Spanish to French
+ ...
Feb 12, 2014

Dear all,

This week I faced a technical issue: a prospect sent me an e-book as a link to an HTML address where every page had the very same address. It was therefore impossible to handle it like a usual website, i.e. by saving each page for a word count and CAT-tool translation.

Can any of you give me tips on how to do a reliable word count, and on how to translate this kind of document?

For your information, the customer refused to send me the content in another format, so it was a kind of "do or die" request....

Thank you in advance for your help,
Sandrine


 

Sergei Leshchinsky  Identity Verified
Ukraine
Local time: 12:27
Member (2008)
English to Russian
+ ...
In any CAT-tool Feb 12, 2014

If you have HTML -- it is a simple format supported by all CAT tools.

If you have a link to the web page, it is NOT the e-book.
Try using HTTrack or a similar tool to extract the text.

It is difficult to say anything without seeing the web-site...
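
Once a page has been saved as a real HTML file, its visible words can also be counted outside a CAT tool. Below is a minimal sketch in Python using only the standard library. The names VisibleTextParser and count_words are illustrative, and the word-splitting rule is an assumption: CAT tools apply their own counting rules, so the figure is approximate. It only helps when the saved file actually contains the text.

```python
import re
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text found outside <script> and <style> elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words(html_text):
    """Approximate word count of the text in an HTML string.

    Assumption: a word is a run of letters or digits, optionally joined
    by apostrophes or hyphens. The page title is counted as well.
    """
    parser = VisibleTextParser()
    parser.feed(html_text)
    parser.close()
    text = " ".join(parser.chunks)
    return len(re.findall(r"\w+(?:['’-]\w+)*", text))
```

For a saved page, read the file first, for example count_words(open("page.html", encoding="utf-8").read()), where page.html is a hypothetical file name. A result close to zero suggests that the text is loaded separately and is not in the saved file.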

[Edited at 2014-02-12 11:55 GMT]


 

Sandrine Rizzo (X)  Identity Verified
France
Local time: 11:27
Spanish to French
+ ...
TOPIC STARTER
unsuccessful extraction Feb 12, 2014

Thank you Sergei for your reply.

As recommended, I downloaded the HTTrack software and extracted the page, with no success.
The extracted file has no content except for the page titles, and the result is the same when it is imported into a CAT tool. Saving the page as text gave the same result. It seems this page is some kind of content "screenshot".

Actually, the link leads to this kind of address: http://customername.com

Any other idea?

For non-disclosure reasons, I do not want to give the real address here, but I could send it to you by private mail if you think you could work something out with it.

Sandrine


 

Joakim Braun  Identity Verified
Sweden
Local time: 11:27
German to Swedish
+ ...
Ajax Feb 12, 2014

This is probably an Ajax-based HTML interface where you never see the actual URLs.
The back-end data might be in all kinds of formats, including but not limited to HTML.

Ask the customer to provide the text.



[Edited at 2014-02-12 13:24 GMT]


 

Rolf Keller
Germany
Local time: 11:27
English to German
Copy and Paste? Feb 12, 2014

If all pages have the very same URL, they cannot be saved as files because no such files exist. The web server generates a new page on the fly when you click the Next Page or Previous Page button. As HTTrack doesn't click those buttons, it will not work.

You could try to copy page by page via Copy-and-Paste from the screen into an open Word document. Then reformat it if necessary.


 

Joakim Braun  Identity Verified
Sweden
Local time: 11:27
German to Swedish
+ ...
Don't think so Feb 12, 2014

Rolf Keller wrote:

You could try to copy page by page via Copy-and-Paste from the screen into an open Word document. Then reformat it if necessary.


For quoting on a book of perhaps 200 pages? It's not worth it.
Ask the customer to provide the text. If they're a customer worth having, they'll be happy to provide it.

[Edited at 2014-02-12 14:31 GMT]


 

Rolf Keller
Germany
Local time: 11:27
English to German
Quoting based on estimation Feb 12, 2014

Joakim Braun wrote:

For quoting on a book of perhaps 200 pages? It's not worth it.


Not for quoting but for translating. For quoting I'd estimate the quantity (based on 3 "typical" pages) and multiply the result by 1.2 (because of the uncertainty and because of the additional work). And I'd inform the client that not providing an editable file implies higher cost. Topic: "Customer Education" :)
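
The estimate described above can be written out as a small calculation. This is only a sketch in Python: the function name and the sample figures are hypothetical, and the 200 pages are borrowed from the "perhaps 200 pages" mentioned earlier in the thread.

```python
def estimate_word_count(sample_counts, total_pages, margin=1.2):
    """Estimate a quotable word count from a few sampled pages.

    sample_counts: word counts of the sampled "typical" pages
    total_pages:   number of pages in the whole e-book
    margin:        factor covering uncertainty and extra work (1.2 here)
    """
    if not sample_counts:
        raise ValueError("need at least one sampled page")
    average = sum(sample_counts) / len(sample_counts)
    return round(average * total_pages * margin)


# Hypothetical figures: three sampled pages of a 200-page book.
print(estimate_word_count([310, 280, 295], 200))
```

Sampling pages of different density (a text-heavy page, a page with headings, a short chapter end) should make the average more representative than three consecutive pages.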


 

Tony M
France
Local time: 11:27
Member
French to English
+ ...
Copy/paste or screen capture Feb 12, 2014

If the text can be selected with your mouse, then you can copy and paste it page by page into a document.

If the text only exists as an image, then you can use something like the Windows 'capture' tool to select just the text on the page and save it as an image file, which you can then OCR.

I have done this in the past, albeit only for a short document!

I can't see a glittering alternative — obviously e-book publishers aren't going to want you to copy their text easily into a format that could then be printed out...

If I were you, I'd be inclined to subcontract the donkey work out to someone whose time is less valuable than your own — maybe a computer-savvy kid who'd like to earn some pocket-money!

Just a thought, though — if your customer is unable to provide the original text, are you sure they actually have the right to have it translated? Depends what their intended use is, naturally.


 

Sandrine Rizzo (X)  Identity Verified
France
Local time: 11:27
Spanish to French
+ ...
TOPIC STARTER
thanks Feb 13, 2014

Thank you all for your useful contributions. Although an effective solution has not yet been found, I discovered new tools and new technical tips!

The idea of capturing the pages and doing an OCR conversion seems interesting, and of course this should be charged for, as it means extra work.

When I asked the customer for an editable format, he refused, and I in turn declined to work further on his quotation request because of his unethical behaviour: he is the kind of customer who squeezes agencies dry to get lower prices with no effort. And to be honest, this was a relief more than a loss!

Have a nice day!
Sandrine


 

