Finding and translating all subdirectories in an HTML file
Thread poster: Paul Lambert

Paul Lambert  Identity Verified
Sweden
Local time: 03:53
Member (2006)
Swedish to English
Sep 25

I suspect this will be an obvious question to you younger tech-savvy types out there.
Lately, I have been getting plenty of jobs translating websites. Often, the client does not provide me with a nice set of folders containing all the HTML text, but rather just a link to the web page he wants translated. If it is just the one page that needs translating, it is simple enough to use "View page source" and copy the HTML. However, I have now been asked to translate all of a rather elaborate site containing many pages, including pages with links to pages with links to further pages, and so on. I could use brute force, map out each page and gather the "view page source" for each page individually, but that would be painstaking, and I could easily miss something. I must believe there is an easy way to go about it. So for instance if I went to a site called http://paulspage.com, I need to get all the page source for that page and all the subpages and the subpages of the subpages etc etc.

Any ideas?


 

Thomas T. Frost  Identity Verified
Member (2014)
Danish to English
Expression Web Sep 25

You could use Microsoft Expression Web 4 (the successor to FrontPage) to import the site, as described at https://www.expression-web-tutorials.com/import-site-wizard.html .

Expression Web is now free to download: https://answers.microsoft.com/en-us/windows/forum/all/microsoft-expression-web-4-download/e6a4eba5-2d7e-4eed-8fab-c945a83215c4 .


 

Paul Lambert  Identity Verified
Sweden
Local time: 03:53
Member (2006)
Swedish to English
TOPIC STARTER
Thanks Sep 25

Thanks, Thomas. I will check it out right now.

 

Thomas T. Frost  Identity Verified
Member (2014)
Danish to English
PS Sep 25

It's old software, but it still works, also on Windows 10.

Be sure to confirm with the client exactly which files you are to translate and how many words they contain. If the site uses more advanced techniques, such as generating pages from an SQL database, Expression Web may not find them all.
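To get a rough word count to agree on with the client, the visible text of the downloaded HTML files can be counted with a short script. The sketch below is only an illustration in Python using the standard library; the function names are made up for this example. It counts only text between tags (not `alt` or `title` attributes, and not text inserted by JavaScript), so treat the result as an estimate, not a billable figure.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects the text between tags, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def count_words(html: str) -> int:
    """Rough word count of the text in one HTML string (whitespace-separated tokens)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.parts).split())


def count_words_in_folder(folder: str) -> dict:
    """Word count per .html/.htm file under `folder`, searched recursively."""
    counts = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in {".html", ".htm"}:
            text = path.read_text(encoding="utf-8", errors="replace")
            counts[str(path)] = count_words(text)
    return counts
```

For example, `sum(count_words_in_folder("site_copy").values())` would give a total for a hypothetical local folder called `site_copy`.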


 

Paul Lambert  Identity Verified
Sweden
Local time: 03:53
Member (2006)
Swedish to English
TOPIC STARTER
Worked like a charm Sep 25

Thanks again. What great advice. This software is excellent.

And yes, I will confirm on Monday that everything is included. This is an enormous task. No point missing anything.

Have a great weekend.


 

Thomas T. Frost  Identity Verified
Member (2014)
Danish to English
Glad it worked Sep 25

Thanks, you too.

 

Sheila Wilson  Identity Verified
Spain
Local time: 02:53
Member (2007)
English
My experience has been 100% negative Sep 25

The first couple of times I tried to gather all the text to work on, the client complained that I'd missed some, and I had to do a rush job (unpaid) to complete it to their satisfaction. So then I insisted that the client (a communications agency) select the text. They grumbled but came up with it. A while after delivery, they came back with a hyper-urgent request for more text to be worked on. This time they'd missed it, and this time they had to pay my rush rate! Since then, I have always insisted on receiving the text in Word or Excel files.


 

Samuel Murray  Identity Verified
Netherlands
Local time: 03:53
Member (2006)
English to Afrikaans
A web site ripper, I imagine Sep 25

Paul Lambert wrote:
I must believe there is an easy way to go about it. So for instance if I went to a site called http://paulspage.com, I need to get all the page source for that page and all the subpages and the subpages of the subpages etc etc.


Yes, there are such utilities (web site rippers, strippers, or sometimes "offline browsers"), and 10-20 years ago when the web was younger, they were fairly reliable tools. However, web sites are no longer simple and web servers are no longer all the same, so many of these web site ripper programs no longer work as expected or promised.

A well-known free one is HTTrack, but I've never had good results with it. I've had reasonable results with VWget for ripping large archives (say, 10 000 HTML files in nested subfolders), but it's not easy to use (I've had most success with the command-line version).
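What such a ripper does at its core can be sketched in a few lines of Python: start at one page, collect its links, and keep following links that stay on the same host until no new pages turn up. This is a minimal illustration using only the standard library, not how HTTrack or VWget actually work; the function names and the `max_pages` limit are assumptions made for this example. It only follows ordinary `<a href>` links, so pages that are reached through JavaScript, forms or search, or that are not linked at all, will be missed, which is the same weakness the dedicated tools have.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url: str) -> str:
    """Download one page and decode it with the charset the server reports."""
    with urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url: str, fetch=fetch_url, max_pages: int = 500) -> dict:
    """Breadth-first crawl of the pages on start_url's host; returns {url: html}."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception as err:  # broken link, timeout, etc.: note it and move on
            print(f"skipped {url}: {err}")
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop "#section" fragments.
            absolute, _ = urldefrag(urljoin(url, href))
            parsed = urlparse(absolute)
            # Stay on the same site and only follow http(s) links (skips mailto: etc.).
            if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Calling `crawl("http://paulspage.com/")` would return a dictionary mapping each page address found to its HTML source. The `fetch` parameter is there so the download step can be swapped out, for instance to add a delay between requests.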

See also my post here where I recommend Web Downloader 2.2, which you can still find on some download sites if you look really hard. I just tried it again, and it still works for simple sites. I've uploaded it here for 7 days.

[Edited at 2020-09-25 17:29 GMT]


 

Endre Both  Identity Verified
Germany
Local time: 03:53
Member (2002)
English to German
Have the client send you the source files Sep 25

Approaching it from the public (Internet) side of things as web rippers do is absolutely the wrong way to go. Your client has access to all the source files (unless they want to translate a third party's site without their knowledge), even if they may not be aware of this.

So you need to get them to send you all source files.
For static websites, this is a matter of copying all files from an FTP server.
For dynamic websites, they have to export the strings from the database that is used to dynamically generate the site.
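As a rough illustration of what such an export can look like, the sketch below dumps page strings from a database into a CSV file that opens in Excel or a CAT tool, keeping the row id so the translations can be matched back later. It is a hypothetical example: the `pages` table and its `id`, `title` and `body` columns are invented, the real schema depends entirely on the client's CMS (often MySQL rather than SQLite), and `sqlite3` is used here only so the sketch is self-contained.

```python
import csv
import sqlite3


def export_strings(conn: sqlite3.Connection, out_path: str) -> int:
    """Write every row of the hypothetical `pages` table to a UTF-8 CSV file.

    Returns the number of rows exported. The id column is kept so that
    translated rows can later be written back to the right records.
    """
    rows = conn.execute("SELECT id, title, body FROM pages ORDER BY id").fetchall()
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "title", "body"])
        writer.writerows(rows)
    return len(rows)
```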

None of this is your business – you need to insist on being provided with all relevant files without ripping them from a website. As Sheila says, this also puts the onus on them to catch all content.

When you have got all files, you need to check what types they are and how to best translate them.
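A quick first step in checking what you have received is to count the files by extension. The sketch below is a minimal, hypothetical helper in Python (the name `file_types` is made up for this example); it only looks at file name extensions, not at the actual file contents.

```python
from collections import Counter
from pathlib import Path


def file_types(folder: str) -> Counter:
    """Count the files under `folder` (recursively) by lower-cased extension."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts
```

For example, `print(file_types("client_files").most_common())` on a hypothetical folder `client_files` would list the extensions from most to least common, which makes it easy to spot `.php`, `.json` or other files that need different handling from plain HTML.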


 

Paul Lambert  Identity Verified
Sweden
Local time: 03:53
Member (2006)
Swedish to English
TOPIC STARTER
Thanks. Forget the answer I just erased. Sep 25

It just made me seem like a jerk. I meant to say thank you.

So, yes, thank you. I will indeed try to get the HTML files in question from the client, and if that does not work, I will fall back on what I got with the software discussed above.

Take care!

[Edited at 2020-09-26 18:31 GMT]


 
