Looking for a tool
Thread poster: xxxBrandis
xxxBrandis
Local time: 07:26
English to German
+ ...
Sep 23, 2004

Hi all! I am searching for a tool with which complete website (source) content can be extracted; the format is of course .html. I have various websites here (automobiles, medical, etc.), and I thought a tool like this would be wonderful, especially for preparing pre-planned TMs and developing the target content in the course of time. I shall appreciate all help.
Regards,
Brandis



Judy Rojas  Identity Verified
Chile
Local time: 01:26
Spanish to English
+ ...
Try Webreaper Sep 23, 2004

Hi:
Try webreaper. You can download it at http://www.webreaper.net/download.html
Regards,
Ricardo


xxxBrandis
Local time: 07:26
English to German
+ ...
TOPIC STARTER
I know webreaper Sep 23, 2004

Ricardo Martinez de la Torre wrote:

Hi:
Try webreaper. You can download it at http://www.webreaper.net/download.html
Regards,
Ricardo
Hi, I know this tool already. I am using others, but what I am searching for is a tool with a source terminology extraction function for multiple webpages pertaining to one topic or product, with a view to building professional TMs. But thank you. A closer description is Trados TagEditor, where one can extract terminology from multiple bilingual files; I am in search of something similar, only as a separate tool.
brandis

[Edited at 2004-09-23 01:13]



Luciano Monteiro  Identity Verified
Brazil
Local time: 02:26
English to Portuguese
+ ...
Fusion Sep 23, 2004

Hello Brandis

You might like to try Fusion. It has a terminology feature that I think would suit your needs.

Best regards,

Luciano Monteiro


xxxMarc P  Identity Verified
Local time: 07:26
German to English
+ ...
Website retrieval and translation Sep 23, 2004

Here's one way of doing it:

First, retrieve the web site with wget. For example, if you want to retrieve the OmegaT web site at www.omegat.org/omegat/omegat.html, you enter:

wget http://www.omegat.org/omegat/omegat.html -r -p

on the command line. The -r option causes folders to be saved recursively (i.e. sub-folders will be saved); the -p option causes any files needed for complete display of the pages to be saved.

Then you create a new project in OmegaT and place all the files you have downloaded in the /source folder of that project exactly as you downloaded them, i.e. with the same folder structure. (You can of course create the empty project first, then on the command line, switch to the /source folder, and then download the web site into it directly.) When you have finished translating the html files in OmegaT, compiling the project in OmegaT will reproduce the structure with the translated files in the /target folder.
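The steps above can be scripted end to end. A minimal sketch in Python, assuming wget is installed and on the PATH; the project path and helper names are illustrative, not part of OmegaT itself:

```python
import pathlib
import subprocess

def build_wget_command(url):
    # -r: save folders recursively (sub-folders included)
    # -p: also fetch files needed for complete display of the pages
    return ["wget", "-r", "-p", url]

def fetch_into_project(url, project_dir):
    """Download a web site straight into an OmegaT project's /source folder."""
    source = pathlib.Path(project_dir) / "source"
    source.mkdir(parents=True, exist_ok=True)
    # Run wget from inside /source so the mirrored folder tree lands there
    subprocess.run(build_wget_command(url), cwd=source, check=True)

# Example (downloads over the network):
# fetch_into_project("http://www.omegat.org/omegat/omegat.html", "my-project")
```

After translating in OmegaT, compiling the project reproduces the same folder tree under /target, as Marc describes.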

Get wget from:

http://wget.sunsite.dk/

and OmegaT (latest version 1.4.3 is just out, September 2004) from:

http://sourceforge.net/projects/omegat

wget and OmegaT both run on Linux and Windows.

Marc


xxxBrandis
Local time: 07:26
English to German
+ ...
TOPIC STARTER
Thank you Sep 23, 2004

Luciano Monteiro wrote:

Hello Brandis

You might like to try Fusion. It has a terminology feature that I think would suit your needs.

Best regards,

Luciano Monteiro
But Fusion doesn't cover the website localisation aspect directly; one would need further instrumentation to reproduce a target website equal to the source. Additionally, Fusion limits term extraction to .doc files only. One could certainly convert .html files to .doc files and process them further, but the work involved is not feasible if one does it industrially. For large or multiple documents, Fusion in that sense is probably the best there is.
Rgds,
Brandis



Piotr Bienkowski  Identity Verified
Poland
Local time: 07:26
Member (2005)
English to Polish
+ ...
Try SDLX Sep 23, 2004

But Fusion doesn't cover the website localisation aspect directly; one would need further instrumentation to reproduce a target website equal to the source. Additionally, Fusion limits term extraction to .doc files only. One could certainly convert .html files to .doc files and process them further, but the work involved is not feasible if one does it industrially. For large or multiple documents, Fusion in that sense is probably the best there is.
Rgds,
Brandis


SDLX can do web formats: HTML and HTML-like files (this week I was translating chunks of HTML files {incomplete HTML code} which its web formats filter accepted happily), and many other formats, including XML and SGML, as well as RC and some programming-language files.

It will not download a web site for you but other than that it can handle translation of tagged files pretty well.

And until Sept. 30 it is available at half price.

For more information go to http://www.sdl.com/intltransday

HTH

Piotr


xxxBrandis
Local time: 07:26
English to German
+ ...
TOPIC STARTER
I have sdlx Sep 23, 2004

syntaxpb wrote:

But Fusion doesn't cover the website localisation aspect directly; one would need further instrumentation to reproduce a target website equal to the source. Additionally, Fusion limits term extraction to .doc files only. One could certainly convert .html files to .doc files and process them further, but the work involved is not feasible if one does it industrially. For large or multiple documents, Fusion in that sense is probably the best there is.
Rgds,
Brandis


SDLX can do web formats: HTML and HTML-like files (this week I was translating chunks of HTML files {incomplete HTML code} which its web formats filter accepted happily), and many other formats, including XML and SGML, as well as RC and some programming-language files.

It will not download a web site for you but other than that it can handle translation of tagged files pretty well.

And until Sept. 30 it is available at half price.

For more information go to http://www.sdl.com/intltransday

HTH

Piotr
I was probably not clear in my posting. I was in fact looking for a free/shareware tool, only for the purpose of extracting one-word terms from web content. If you know of any, I shall be thankful for all help. Regards, brandis



Piotr Bienkowski  Identity Verified
Poland
Local time: 07:26
Member (2005)
English to Polish
+ ...
Terminology lists? Sep 24, 2004

Brandis wrote:

Piotr
I was probably not clear in my posting. I was in fact looking for a free/shareware tool, only for the purpose of extracting one-word terms from web content. If you know of any, I shall be thankful for all help. Regards, brandis

Do you mean web sites that contain terminology lists from different areas? If so, I don't think there is a universal tool for this specific task, because these lists can be in different formats, e.g. an HTML table, separate paragraphs, or lists (ordered and unordered).

Piotr
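Piotr's point about the varying formats can be illustrated: the same terminology list might arrive as table cells, as list items, or as plain paragraphs, so an extractor has to handle each shape separately. A rough sketch using Python's standard html.parser; the choice of tags to capture is an illustrative assumption, not a universal solution:

```python
from html.parser import HTMLParser

class TermListParser(HTMLParser):
    """Collects text found in table cells (<td>) and list items (<li>)."""
    CAPTURE = {"td", "li"}

    def __init__(self):
        super().__init__()
        self.terms = []
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.CAPTURE:
            self._depth += 1
            self.terms.append("")  # start a new term

    def handle_endtag(self, tag):
        if tag in self.CAPTURE and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth and self.terms:
            self.terms[-1] += data  # accumulate text inside the current cell/item

def extract_terms(html):
    parser = TermListParser()
    parser.feed(html)
    return [t.strip() for t in parser.terms if t.strip()]
```

A page that keeps its terms in ordinary paragraphs would need a different handler again, which is exactly why a single universal tool is hard to build.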


xxxBrandis
Local time: 07:26
English to German
+ ...
TOPIC STARTER
I do not mean that Sep 24, 2004

syntaxpb wrote:

Brandis wrote:

Piotr
I was probably not clear in my posting. I was in fact looking for a free/shareware tool, only for the purpose of extracting one-word terms from web content. If you know of any, I shall be thankful for all help. Regards, brandis


Do you mean web sites that contain terminology lists from different areas? If so, I don't think there is a universal tool for this specific task, because these lists can be in different formats, e.g. an HTML table, separate paragraphs, or lists (ordered and unordered).

Piotr

Hi! Again a small correction: this could be any website. Take, for example, metalworking websites; there you may find anywhere from 100 to a few thousand sites, and all use some standard terminology in their product presentations or descriptions on the web. If one could extract that type of content so as to build a monolingual glossary initially, then switch to the target websites and compare, one would have a field-specific glossary, I guess. It is that kind of tool I am looking for. So far, in the case of Fusion (which doesn't process .html files), we have a wonderful term extraction facility based on the files fed to it, whereas other tools actually require you to do the translation in order to generate a TM. My search is hence two-fold: term extraction (monolingual) with a functionality as in Fusion, but extracting from websites. In my case, my outsourcer either indicates the website or sends me the website for local processing, and I start with Trados, as I cannot process these sites directly in Fusion, despite its term extraction ability. Sometimes my outsourcer gives me a TM covering 5 to 10% of the file and fights over the price. Another point is that most web content is a global publication (see KudoZ, where you mostly see web references), so the idea, I guess, is obvious now.
Regards,
Brandis
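The frequency-based approach Brandis describes can be sketched: strip the markup from the pages of one field, count how often each word occurs, and treat the most frequent content words as monolingual glossary candidates. A minimal sketch using only the Python standard library; the stop-word list and the four-letter minimum are illustrative assumptions, not a production heuristic:

```python
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible text of a page, skipping <script> and <style>."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

# Tiny illustrative stop-word list; a real one would be per-language
STOPWORDS = {"the", "and", "with", "from", "that", "this", "for", "are"}

def candidate_terms(pages, top=20):
    """Rank one-word glossary candidates by frequency across HTML pages."""
    counts = Counter()
    for html in pages:
        extractor = TextExtractor()
        extractor.feed(html)
        # Keep alphabetic words of four letters or more, lower-cased
        words = re.findall(r"[a-zA-Z]{4,}", " ".join(extractor.chunks).lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top)
```

Run over a wget mirror of, say, several metalworking sites, the top of this list would approximate the field-specific monolingual glossary Brandis has in mind; the target-language comparison step would still be manual.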

