Website wordcount
Thread poster: Paolo Dagonnier

Paolo Dagonnier  Identity Verified
Belgium
Local time: 17:29
Member (2017)
English to French
+ ...
Dec 4

Hi everyone,

A new client recently asked me how much I would charge them (per-word rate) for translating their website from Italian to French. In order to give them an accurate quote, I would need to know how many words their website has and I don't quite know how to do that. Plus, the website has many pages so going for the cut-and-paste option would take me too much time.

I've already worked on websites before, but I usually received HTML files which I would then upload in Trados to do the wordcount. I've seen that there are many requests similar to mine in the forum, but I wanted to know what methods translators are using in 2018 (a lot of the posts I found date back to 15 years ago).

I've downloaded HTTrack Website Copier and I'm currently running an analysis of the client's website. If there's any other easy, quick way to do it, I'd be more than happy to try it.

One last thing: my client's website (originally in Italian) has also been translated to English. How can I make sure that the wordcount is based on the Italian pages only (i.e. excluding the English version)?

Thanks a lot!


 

Vadim Kadyrov  Identity Verified
Ukraine
Local time: 18:29
Member (2011)
English to Russian
+ ...
The most favorable scenario is when you Dec 4

ask your client to pay you after the job is done. In this way, you can count the target text words and tell the client the price.

But this is an ideal case.

Or, your client can grant you FTP access to his website, or he can give you a copy of his website, and you can count the words in these files, if you know what to count and what not to count (PHP code, HTML, etc.).
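
To illustrate the "what to count and what not to count" point, here is a minimal Python sketch, assuming the files you receive are plain HTML. It counts only the text outside <script> and <style> elements. The class and function names are made up for this example, and it does not deal with PHP code, attribute text such as alt, or the counting rules of any particular CAT tool, so treat the result as a rough estimate.

```python
from html.parser import HTMLParser


class VisibleTextCollector(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words(html_text):
    """Rough count: whitespace-separated tokens in the collected text."""
    collector = VisibleTextCollector()
    collector.feed(html_text)
    collector.close()
    return len(" ".join(collector.chunks).split())


if __name__ == "__main__":
    page = "<body><h1>Benvenuti</h1><p>Questo è il nostro sito.</p></body>"
    print(count_words(page))
```

Run over every downloaded file and add up the totals; the figure will differ a little from what Trados or another CAT tool reports, because each tool has its own idea of what a word is.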

In all other situations, including the one you are in (i.e. when you download the files yourself), someone has to pay for this preparation process. I think your client has to compensate you for the time you spend on counting words.

[Edited at 2018-12-04 13:39 GMT]


 

Thomas Pfann  Identity Verified
United Kingdom
Local time: 16:29
Member (2006)
English to German
+ ...
Ask for the source files Dec 4

Paolo Dagonnier wrote:

I've already worked on websites before, but I usually received HTML files which I would then upload in Trados to do the wordcount.


Why don't you ask your client for the source files and analyse those? The problem with the translator extracting content from a live website is also that the content might change all the time. In my opinion, it is much safer (and probably a lot easier for all involved) to request the content from the client rather than finding it yourself.

However, if they asked for a quote of your per-word rate (and if you are happy to provide a per-word rate) then you won't need to know the exact wordcount anyway (which is probably why the client didn't send the files). You'll need that in the next step, though, in order to determine how much the total cost will be and how long it will take you.


 

Mikhail Popov
Singapore
Local time: 00:29
Member (2015)
English to Russian
+ ...
PO files Dec 5

Ask your client to give you the Italian "PO" files and you will be able to make an estimate.
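
As a rough illustration of how such an estimate could be made, here is a minimal Python sketch that pulls the strings out of a Gettext PO file and counts the words. The function names are hypothetical, and the parser is deliberately simple: it ignores plural forms (msgid_plural, msgstr[0], ...) and does not unescape sequences such as \n. Whether the Italian text sits in msgid or in msgstr depends on how the client's files are set up (in many PO files msgid holds the English source and msgstr the translation), so the field to count is left as a parameter.

```python
def parse_po(po_text):
    """Return a list of {'msgid': ..., 'msgstr': ...} dicts, header excluded."""
    entries = []
    entry = None
    field = None  # field currently being continued on following lines
    for raw in po_text.splitlines():
        line = raw.strip()
        if field and line.startswith('"'):
            # Continuation line: "more text"
            entry[field] += line[1:-1]
        elif line.startswith("msgid "):
            entry = {"msgid": line[6:].strip()[1:-1], "msgstr": ""}
            entries.append(entry)
            field = "msgid"
        elif line.startswith("msgstr ") and entry is not None:
            entry["msgstr"] = line[7:].strip()[1:-1]
            field = "msgstr"
        else:
            # Comments, blank lines, plural forms: not handled in this sketch
            field = None
    # The header entry has an empty msgid; leave it out of the count
    return [e for e in entries if e["msgid"]]


def count_po_words(po_text, field="msgstr"):
    """Count whitespace-separated words in the chosen field of every entry."""
    return sum(len(e[field].split()) for e in parse_po(po_text))
```

For example, count_po_words(text, field="msgstr") would count the words in the translated strings of an Italian PO file, and field="msgid" would count the source strings instead.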

 

Samuel Murray  Identity Verified
Netherlands
Local time: 17:29
Member (2006)
English to Afrikaans
+ ...
@Paolo and @Mikhail Dec 5

Paolo Dagonnier wrote:
A new client recently asked me how much I would charge them (per-word rate) for translating their website from Italian to French.


A client who has a web site may not be aware of how the web site is built. It could be a static HTML web site or it could be a web site whose content (or some of it) is dynamically created. You won't necessarily be able to tell which is which by simply looking at file extensions and the source code. Using a web site ripper such as HTTrack will only be helpful to you if the site is coded in static HTML. It's best to ask the client's web designer.

If the client is not the web designer, then liaising with the client himself is always going to be risky. Even if you were to insist that the client give you a list of URLs for the pages that he wants you to translate, the client-supplied list may not be comprehensive, and this will lead to disappointment down the line when you don't translate everything that the client thought that you would translate.

Even liaising with the web designer isn't always straightforward. She might not be aware that translators are perfectly capable of translating HTML code (and other code) without breaking it. It is not unknown for web designers to tell translators to "visit each page and copy the text to MS Word". Also be aware that discussions with the web designer might be unpaid time for her.

I've already worked on websites before, but I usually received HTML files which I would then upload in Trados to do the wordcount. I've seen that there are many requests similar to mine in the forum, but I wanted to know what methods translators are using in 2018 (a lot of the posts I found date back to 15 years ago).


If anything, things have become more complicated in 2018 compared to 2003, 1997, etc. A lot more content on web sites is served dynamically, so the content you see in your browser may not be the entire content that is translatable. Web sites also make more extensive use of imported files (e.g. CSS, JavaScript) than 15 years ago, so even if you saved the HTML pages to your computer, those pages won't look the same when you open them directly instead of on the web site. In the distant past, web sites used JavaScript mostly for functionality and not for content; that has changed, so it is no longer always safe for translators to simply ignore the presence of any JavaScript... which is what many CAT tools do.

I've downloaded HTTrack Website Copier and I'm currently running an analysis of the client's website.


There are many similar tools, yes. Unless a tool like that is regularly updated, it will eventually be unable to download web sites successfully, because of the way the web has changed over the years. But even recently updated tools may still not be able to download all links.

In addition, it may be that not all of a site's pages are reachable from the home page, which is another reason why it's better if the client can give you a list of URLs for pages that need to be translated (or better yet, as everyone here has said, give you the actual files to translate).

One last thing: my client's website (originally in Italian) has also been translated to English. How can I make sure that the word count is based on the Italian pages only (i.e. excluding the English version)?


You're not going to like the answer: if the English and Italian are on separate pages, then simply delete all English pages after you've downloaded the pages (or don't download them, if they have a predictable naming pattern).
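
If the English pages do follow a predictable naming pattern, the filtering can be scripted. The Python sketch below assumes, purely for illustration, that the English version lives in a folder called "en" somewhere in the path; the folder names and function names are hypothetical, so check the real layout of the downloaded copy first.

```python
from pathlib import PurePosixPath


def is_english_page(relative_path, english_folders=("en", "eng", "english")):
    """True if any folder in the path looks like an English-language folder."""
    folders = PurePosixPath(relative_path).parts[:-1]  # drop the file name
    return any(folder.lower() in english_folders for folder in folders)


def pages_to_count(paths):
    """Keep only the pages that are not in an English folder."""
    return [p for p in paths if not is_english_page(p)]


if __name__ == "__main__":
    downloaded = ["index.html", "en/index.html", "it/chi-siamo.html"]
    print(pages_to_count(downloaded))
```

Only the folder names are checked, so a file such as en.html in the root folder would still be counted. If the site marks its languages differently (for example with a suffix in the file name), the test in is_english_page would need to be adapted.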

I know of no CAT tool that performs language guessing on every segment during pre-processing and then locks the segments that appear to be in a certain language (although such a feature should be fairly simple to implement in CAT tools that can lock segments based on certain criteria).
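
To give an idea of what per-segment language guessing could look like outside a CAT tool, here is a very crude Python sketch. It is not a feature of any existing tool: it simply counts how many common Italian and English function words a segment contains. The word lists and the function name are assumptions made for this example, and short segments (menu items, single words) will often come out as "unknown".

```python
import re

# Small, hand-picked lists of common function words (illustrative only)
ITALIAN_HINTS = {"il", "la", "di", "che", "e", "è", "per", "un", "una",
                 "non", "con", "del", "della", "sono", "nostro"}
ENGLISH_HINTS = {"the", "of", "and", "to", "is", "for", "with", "our",
                 "are", "this", "that", "not"}


def guess_language(segment):
    """Return 'it', 'en' or 'unknown' based on function-word hits."""
    words = re.findall(r"\w+", segment.lower())
    italian_hits = sum(word in ITALIAN_HINTS for word in words)
    english_hits = sum(word in ENGLISH_HINTS for word in words)
    if italian_hits > english_hits:
        return "it"
    if english_hits > italian_hits:
        return "en"
    return "unknown"


if __name__ == "__main__":
    print(guess_language("Questo è il nostro sito per la vendita di vino"))
    print(guess_language("This is the website of our company"))
```

Segments guessed as "en" could then be left out of the count, with the "unknown" ones checked by hand.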

Mikhail Popov wrote:
Ask your client to give you the Italian "PO" files.


Very few programming languages used on web sites have support for PO (I assume you mean Gettext PO):
https://www.gnu.org/software/gettext/manual/html_node/Translators-for-other-Languages.html
Do you regularly get web sites to translate in PO format?



[Edited at 2018-12-05 07:43 GMT]


 

Mikhail Popov
Singapore
Local time: 00:29
Member (2015)
English to Russian
+ ...
Wordpress Dec 5

All WordPress themes and plugins are based on PO files. WordPress runs 32.5% of websites all over the world (as per W3Techs).
Drupal uses PO as well: another 2%.
Quite a significant share, I guess.

But website translation is a complicated matter, I would agree. You have to translate the theme, the database, plus some separate custom strings. It is nothing like translating a single Word file.

[Edited at 2018-12-05 08:12 GMT]


 

Samuel Murray  Identity Verified
Netherlands
Local time: 17:29
Member (2006)
English to Afrikaans
+ ...
Re: PO Dec 5

Mikhail Popov wrote:
All WordPress themes and plugins are based on PO files. Drupal uses PO as well.


While it is true that the web site creators' templates may have language files that can be downloaded and translated separately in e.g. PO format, the bulk of what one would want to translate in a web site for a client is the *content*, and the content of WordPress and Drupal sites can't be exported/imported in PO, as far as I know.

It would be fantastic if that were so, though. I recall helping a client translate their CMS-based web site. I was initially quite excited because I assumed that you should be able to export all content from the CMS and then import it again, but it turned out that the web designer didn't know how to set that up, and in the end I had to translate by copy/pasting text into the WYSIWYG text editor for each page. I don't understand what the problem is with e.g. WordPress etc.: since all the content is in a database, why would it not be possible to export and import the darned stuff? But... it ain't.


 

