Translating a website: Tool for downloading hundreds of files and counting words
Thread poster: chopra_2002

chopra_2002  Identity Verified
India
Local time: 19:29
Member (2008)
English to Hindi
+ ...
Dec 5, 2010

Hi friends,

A translation agency wants me to translate a website that contains dozens of links, and every link leads to many more links and sub-links. If I download the files one by one, it will take a great deal of time. Is there any method to download all files available on a website (Word files, pdf files, html pages and scanned pages etc.) in a quick and convenient manner?

Secondly, there is no problem in counting the words in MS Word files, as I can go to Tools and check the word count, but is there any tool to count words in pdf files, html pages etc.? Counting them manually will take a lot of time.

Thanks in advance for your precious help.

Regards,

Chopra



Laurent KRAULAND  Identity Verified
France
Local time: 15:59
French to German
+ ...
Unprofessional way of dealing Dec 5, 2010

Hi langclinic,
there is just no way anybody (agency or direct client for that matter) could make me download a whole website, with links, downloadable documents and the like - it is just unprofessional to say the least; and even more given the structure you describe.

I would insist on manageable work, i.e. the agency contacts the client and asks them to send the content they actually want translated. For you, that is insurance against doing too little or too much work.

That said, I use Anycount to count the words in PDF files. But the file must be a genuine PDF (e.g. pages created in DTP software or exported from an office application), not a scan. A scanned file is just images wrapped in a PDF, so you would have to count the words manually there too.

Good luck!



Riadh Muslih  Identity Verified
Local time: 06:59
Arabic to English
+ ...
I concur Dec 5, 2010

Laurent KRAULAND wrote:

Hi langclinic,
there is just no way anybody (agency or direct client for that matter) could make me download a whole website, with links, downloadable documents and the like - it is just unprofessional to say the least; and even more given the structure you describe.



I fully agree with Krauland. Not only on the point of professionalism, and perhaps copyright, but also because I will not do the client's work. The client must send me what he/she wants me to translate; I should not have to go fishing for it, with or without pay.



jyuan_us  Identity Verified
United States
Local time: 09:59
Member (2005)
English to Chinese
+ ...
I think the question is still relevant and worth looking into Dec 5, 2010

Suppose you meet a direct client who doesn't have an IT department but just wants you to translate their entire website, and who doesn't know how to download the files either. In that case, you may have to figure out how to download all the files yourself.


Vadim Kadyrov  Identity Verified
Ukraine
Local time: 16:59
Member (2011)
English to Russian
+ ...
I have a piece of advice Dec 5, 2010

1. To download a site, you need Teleport Pro. You just give it the URL, and the program does not go beyond the limits you have set (it is very important that you do not download external pages that the website you need merely links to). Be aware that the program downloads everything (images and so on) and stores all these files in a single separate folder.

2. FineCount is a very powerful tool for counting words in HTML files, PDFs, etc. You just select the folder where your downloaded files are stored, and then select the HTML files only (to add them to the list).

3. You translate the files in TagEditor.

4. You then look through the online version of your translation to find any errors, slips of the pen, etc.

That is all. I have successfully translated and localized several sites using this method. Of course, only small websites can be translated this way. With a large one, you will get lost in the piles of pages, images, etc.


All of that takes time (which means money). And frankly speaking, only rather small sites, belonging to individuals or small companies, can be processed this way. Large companies will of course never ask a single freelancer to translate their whole website.
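
For readers curious what "not going beyond the limits you have indicated" in step 1 amounts to, here is a minimal Python sketch of such a scope check. It is not how Teleport Pro works internally (that is not documented in this thread); the function name in_scope and the example.com addresses are made up for illustration.

```python
from urllib.parse import urljoin, urlparse


def in_scope(link, page_url, start_url):
    """Return True if `link`, found on `page_url`, stays under `start_url`.

    Hypothetical helper for illustration only.
    """
    absolute = urljoin(page_url, link)  # resolve relative links
    target, start = urlparse(absolute), urlparse(start_url)
    if target.scheme not in ("http", "https"):
        return False  # skip mailto:, javascript:, etc.
    if target.netloc.lower() != start.netloc.lower():
        return False  # link points to an external site
    # Treat the starting folder as the limit: keep only paths at or below it.
    start_path = start.path
    if not start_path.endswith("/"):
        start_path = start_path.rsplit("/", 1)[0] + "/"
    return (target.path or "/").startswith(start_path)
```

With a start address of http://example.com/en/, a link to about.html on a page in that folder is kept, while ../de/index.html or a link to another domain is skipped.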

[Edited at 2010-12-05 06:52 GMT]



Laurent KRAULAND  Identity Verified
France
Local time: 15:59
French to German
+ ...
Obviously... Dec 5, 2010

jyuan_us wrote:

Suppose you meet a direct client who doesn't have an IT department but just wants you to translate their entire website, and who doesn't know how to download the files either. In that case, you may have to figure out how to download all the files yourself.


but a website does not appear ex nihilo somewhere on the Internet. Someone *must* be in possession of the original files.

It is like the plague some of us are dealing with when handling scanned PDFs - you'd be surprised how fast some clients manage to get the originals when you say that processing scanned PDFs comes at a surcharge of X%.

And how does one download Flash-generated content?



Christina Paiva  Identity Verified
Brazil
Local time: 10:59
Portuguese to English
+ ...
PDF word count Dec 5, 2010

Hi langclinic!

Lots of suggestions on PDF word count here:

http://www.proz.com/forum/dtp_desktop_publishing/131071-tips_for_pdf_translation.html



Samuel Murray  Identity Verified
Netherlands
Local time: 15:59
Member (2006)
English to Afrikaans
+ ...
Three sets of tools Dec 5, 2010

langclinic wrote:
Is there any method to download all files available on a website (Word files, pdf files, html pages and scanned pages etc.) in a quick and convenient manner?


Yes, you need an "offline browser". I recommend Oleg Chernavin's Web Downloader 2.2 (google for webdown.exe and look on abandonware sites).

Secondly, there is no problem in counting the words in MS Word files, as I can go to Tools and check the word count, but is there any tool to count words in pdf files, html pages etc.? Counting them manually will take a lot of time.


You can try Anycount:
http://www.anycount.com/download.html



Emma Goldsmith  Identity Verified
Spain
Local time: 15:59
Member (2010)
Spanish to English
forum topic Dec 5, 2010

Have you read this thread:
http://www.proz.com/forum/software_applications/132076-software_used_to_extract_html_files_from_websites.html
It looks helpful.



Joakim Braun  Identity Verified
Sweden
Local time: 15:59
German to Swedish
+ ...
Original files Dec 5, 2010

"Someone must be in possession of the original files".

Yes, but they may be server-side scripts querying databases and contain no actual HTML at all.

(That still doesn't make it the translator's problem, of course.)


FarkasAndras
Local time: 15:59
English to Hungarian
+ ...
Some info Dec 5, 2010

As stated by various previous posters, the right way to do this is to get the files (and instructions!) from the IT guy of the company. If the site is not exclusively made up of static HTML pages, you can't possibly translate it by just trying to download the site "from outside".
If you know that it's all static HTML, it's still better to get the files from the webmaster, but it's possible to grab them from the net. The "right" tools for that are httrack and wget. I use wget; I believe the command to download (mirror) a site is
wget -m -np -P outputfolder -p http://www.site/address.com
-m: mirror site, -np: no parent folders, -P: specify name of output folder, -p: get page dependencies such as images
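
If you would rather run this from a script (for example, to mirror several sections in a row), the same command can be wrapped in a few lines of Python. This is only a sketch: it assumes wget is installed and on your PATH, and the function names build_wget_command and mirror_site are made up for illustration.

```python
import subprocess


def build_wget_command(url, output_folder):
    """Build the argument list for the wget call quoted above."""
    return [
        "wget",
        "-m",                 # mirror the site
        "-np",                # do not climb to parent folders
        "-P", output_folder,  # folder to save into
        "-p",                 # also fetch page dependencies such as images
        url,
    ]


def mirror_site(url, output_folder):
    """Run wget; raises CalledProcessError if wget exits with an error."""
    subprocess.run(build_wget_command(url, output_folder), check=True)
```

Calling mirror_site("http://example.com/en/", "outputfolder") would then start the download, with example.com standing in for the real address.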

Word counts shouldn't be an issue with HTML. You should do HTML with a CAT anyway, and your CAT will give you a word count.
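
If you only need a rough figure for a quote before loading anything into a CAT tool, a short Python script can give a ballpark. The sketch below uses only the standard library and makes several simplifying assumptions: files are UTF-8, words are separated by whitespace (so it is no use for languages written without spaces), and translatable attribute text such as alt or title is ignored. A CAT tool's count will differ. The function names are made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect text between tags, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def count_words_in_html(html):
    """Rough whitespace-based word count of the text in one HTML page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks).split())


def count_words_in_folder(folder):
    """Map each .htm/.html file under `folder` to its rough word count."""
    counts = {}
    for path in sorted(Path(folder).rglob("*.htm*")):
        text = path.read_text(encoding="utf-8", errors="replace")
        counts[str(path)] = count_words_in_html(text)
    return counts
```

Running sum(count_words_in_folder("outputfolder").values()) gives a total for everything that was mirrored into that folder.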

BTW both downloading and translating these files take a fair bit of IT knowledge. I'm not sure I would take it on myself without the client's guidance and support.

[Edited at 2010-12-05 12:22 GMT]



Jack Doughty  Identity Verified
United Kingdom
Local time: 14:59
Member (2000)
Russian to English
+ ...
Translator's Abacus Dec 5, 2010

Looked at "Anycount" and wondered if there was anything similar but free. Came across "Translator's Abacus" at http://www.globalrendering.com/download.html and downloaded it. I've tried it and it seems quite useful.


ahmadwadan.com  Identity Verified
Kuwait
Local time: 16:59
English to Arabic
+ ...
Webreaper & Anycount Dec 5, 2010

langclinic wrote:

Hi friends,

Is there any method to download all files available on a website (Word files, pdf files, html pages and scanned pages etc.) in a quick and convenient manner?


WebReaper 10.0 (Freeware)


Secondly, there is no problem in counting the words in MS Word files, as I can go to Tools and check the word count, but is there any tool to count words in pdf files, html pages etc.? Counting them manually will take a lot of time.


Anycount



Samuel Murray  Identity Verified
Netherlands
Local time: 15:59
Member (2006)
English to Afrikaans
+ ...
Not free for us Dec 5, 2010

Ahmad Wadan wrote:
WebReaper 10.0 (Freeware)


Not free for us (unless you're a volunteer translator):
http://www.webreaper.net/licence.html

