Extract - who could loan it for a week or two?
Thread poster: Vito Smolej

Vito Smolej
Germany
Local time: 16:17
Member (2004)
English to Slovenian
+ ...
Sep 11, 2006

I have amassed a large TM (ca. 60,000 segments and a corresponding number of words), which I would like to process for terms, eventually creating a vocabulary for all kinds of uses - streamlining expressions, hunting for no-gos, etc.

Is it possible - technically speaking - to have somebody loan me his or her Extract software for a limited time, so that I could do the analysis? Sending in the material would also be possible; I just don't know if the person who would be ready to help (for money, of course) could do anything with the EN-Slovenian language pair.

TiA

Vito



Giles Watson  Identity Verified
Italy
Local time: 16:17
Italian to English
I'm afraid a loan is not possible... Sep 11, 2006

... because, unlike Workbench, Term Extract requires a unique licence generated for the hard disk on which the program is being used.

You could also look at Olifant:

http://www.translate.com/technology/tools/

It doesn't do term extraction as such but it is a useful tool for fine-tuning TMs.

And it's freeware ;-)

HTH

Giles



Cecilia Falk  Identity Verified
Local time: 16:17
English to Swedish
Online Text Analyzer Sep 11, 2006

I sometimes use the following online utility to get a list of how many times words and phrases are repeated in a text.

"Free text analysis tool that provides information on the readability and complexity of a text, as well as statistics on word frequency and character count. Just paste the text into the box and click the button."

http://textalyser.net/

Cheers,
Cecilia



Harry Bornemann  Identity Verified
Mexico
English to German
+ ...
DVX? Sep 11, 2006

It would take some time, but it is a systematic approach:

You could merge all TMs and TDBs of a project category into one, import the corresponding source files and use Menu/Lexicon/Build Lexicon... and resolve automatically with TMs and TDBs.

Of course, first you would have to play a little with this functionality to find the best settings (maybe even read the manual), and then your computer might be busy for hours or days, and in the end you would probably have to delete 90%, but at least you can edit the Lexicon like any file in DVX, and sort the terms of 1..N words length by frequency.
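The "sort the terms of 1..N words length by frequency" step can also be sketched outside DVX. The following is a minimal Python sketch, not DVX's Lexicon function: it assumes the source segments are already available as plain strings, and the function name `ngram_frequencies` and the sample sentences are made up for illustration.

```python
import re
from collections import Counter

def ngram_frequencies(segments, max_len=3):
    """Count every sequence of 1..max_len words across all segments."""
    counts = Counter()
    for segment in segments:
        # \w+ is Unicode-aware in Python 3, so accented letters are kept.
        words = re.findall(r"\w+", segment.lower())
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts

# Hypothetical sample segments, for illustration only.
segments = [
    "Select the cell range of data.",
    "The cell range is shown in the dialog.",
    "Enter a regular expression for the cell range.",
]
counts = ngram_frequencies(segments, max_len=3)

# Most frequent candidates first; ties are broken alphabetically.
for term, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:5]:
    print(freq, term)
```

As in the DVX case, most of the resulting candidates would have to be deleted by hand; the sketch only does the counting and sorting.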

[Edited at 2006-09-11 20:56]



Vito Smolej
Germany
Local time: 16:17
Member (2004)
English to Slovenian
+ ...
TOPIC STARTER
Olifant is a (YA) TM editor Sep 11, 2006

at least from my experience and from what they say it's supposed to do. What I need, however, is not TM handling (which I somehow, for all its deficiencies etc, delegate to TRADOS), it's TM crunching.

My opinion of free tools - based on experience! - is cautiously negative. If nothing else, they use Java to heat your room with CPU cycles, and you eventually end up with something that still needs another filter operation/transformation to get you where you wanted to be in the first place.

end of rant

And thanks anyhow!

Vito



Vito Smolej
Germany
Local time: 16:17
Member (2004)
English to Slovenian
+ ...
TOPIC STARTER
I use Word for this kind of exercise... Sep 11, 2006

for instance in the form of the (free) PlusTools. I may even have to go in this direction to get the bare-bones (or kickstart) vocabulary. This is good for catching spelling mistakes in batch mode (the Unix, K&R style), for instance. But it is just a start - one needs to establish the vocabulary of unquestionables ("I", "the", etc. in English), and this is such a drag that I do hope Extract can manage it...
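One hedged way to get a first draft of such a list of "unquestionables" is to flag words that turn up in a large share of all segments. This is only a heuristic sketch in Python, not what Extract or PlusTools do; the function name, the 50% threshold and the sample segments are assumptions, and the resulting list would still need a manual check.

```python
import re
from collections import Counter

def stopword_candidates(segments, min_share=0.5):
    """Return words that occur in at least min_share of all segments."""
    doc_freq = Counter()
    for segment in segments:
        # set() so that a word counts once per segment, however often it repeats
        doc_freq.update(set(re.findall(r"\w+", segment.lower())))
    threshold = min_share * len(segments)
    return sorted(word for word, n in doc_freq.items() if n >= threshold)

# Hypothetical sample segments, for illustration only.
segments = [
    "Select the cell range of the data.",
    "The result is the real coefficient.",
    "Enter the regular expression.",
    "I opened a file.",
]
print(stopword_candidates(segments, min_share=0.5))
```

On a real TM the threshold would probably need to be much lower than 0.5, since even very common words rarely appear in half of all segments.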

Will keep posting on the subject if things start to happen...

Thank you!

Vito



Vito Smolej
Germany
Local time: 16:17
Member (2004)
English to Slovenian
+ ...
TOPIC STARTER
...your computer might be busy for hours or days... Sep 11, 2006

which is absolutely one of the horrors I would like to avoid - I know myself well and have had my share of stupid code (written by myself or by others) that did nothing more than make the CPU twiddle its thumbs.


Giles Watson  Identity Verified
Italy
Local time: 16:17
Italian to English
Hi again Vito and Harry Sep 11, 2006

Vito Smolej wrote:

at least from my experience and from what they say it's supposed to do. What I need, however, is not TM handling (which I somehow, for all its deficiencies etc, delegate to TRADOS), it's TM crunching.



I was going to - tentatively - suggest DVX, for which I also have a licence and which gives you a 30-day evaluation option with full capabilities (www.atril.com). As Harry so wisely suggests, it's a good idea to download and read the PDF manual first to see if it's going to be any use, and also so as not to waste your free month.

The DVX lexicon function is rather more limited than Term Extract, though, and would probably involve a few false starts before you got anything resembling what you wanted.

Term Extract is distinctly over-priced, and involves a learning curve, but if you've got the spare cash and can write the expense off against tax, you might want to think about biting the proverbial bullet.

FWIW

Giles



Vito Smolej
Germany
Local time: 16:17
Member (2004)
English to Slovenian
+ ...
TOPIC STARTER
re DVX Sep 11, 2006

I happen to be in possession of a full DVX license as well - had I been born in Victorian times, my room would be full of rhino heads, turtle shells, tiger skins and other victims of the hunter-gatherer in me ("... yeah, that was a 6.5 Trados, a rather weak one as they go, just look at its antlers ...").

What I definitely do not need is one more barrage of learning curves that would leave me where I am anyhow, but with an extra load of frustration. It is a pro bono project, iow I give my time for free, but then again, I do not need to overdo it - Oktoberfest is on as of this weekend, my grandchildren need kite flying lessons...

Regards

Vito

[Edited at 2006-09-11 20:43]



Jabberwock  Identity Verified
Poland
Local time: 16:17
Member (2004)
English to Polish
Price vs efficiency Sep 11, 2006

I also wanted Term Extract badly, but one look at the price was enough for me to come up with another solution. It is difficult to set up and supposedly requires somewhat more technical expertise, but it has one significant advantage - it is completely free.

The process is quite complex and involves several different tools. I will give the exact details if anyone is still interested.

The basic procedure is as follows:

1. Use a free tool for text analysis and extract most common phrases and words (excluding those which are unwanted).
For that purpose I have used extphr

http://instruct.uwo.ca/gplis/677/extphr32/extphr33.exe

which is quite customizable (which makes it somewhat difficult to use). At my suggestion a similar tool has been added to Okapi Framework, the package that includes Olifant; it is still work in progress, but it is functional:

http://okapi.sourceforge.net/Release/Utilities/Help/termextraction.htm

2. From that list you have to select only those which are needed. This is tedious, but Extract requires that stage too; it is really hard to automate.

Then you can manually put the list into Excel, for example, and add translations of terms using the concordance search. However, I managed to make that task somewhat simpler:

3. I have used a text replacement tool to add two custom fields to the TM: one which holds the source terms from the list, and another one, empty, which will contain the translation of the term.

4. Load the modified TM into Olifant and sort it by the column which contains the custom field with source terms. This basically shows source terms in their context (source segment) as well as their translations (target segment). Then the translation needs to be copied from the target segment and possibly modified (case, word forms, etc.).

5. With the same text replacement tool I extract the custom fields from the TM, saving them into a table which can be imported into MultiTerm.

I realise that the procedure is somewhat advanced, but after doing it once or twice it becomes quite efficient.
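Steps 3 to 5 above could be approximated with a short script instead of a text replacement tool and Olifant. This is a rough Python sketch, assuming the TM has been exported to TMX with `xml:lang` attributes on the `tuv` elements. The sample TMX, the bracketed placeholder targets, the term list and the tab-delimited column layout are all made up for illustration and are not a confirmed MultiTerm import format.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical TMX export; the target segments are placeholders.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Select the cell range of data.</seg></tuv>
      <tuv xml:lang="sl"><seg>[target segment 1]</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Enter a regular expression.</seg></tuv>
      <tuv xml:lang="sl"><seg>[target segment 2]</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>The cell range is empty.</seg></tuv>
      <tuv xml:lang="sl"><seg>[target segment 3]</seg></tuv>
    </tu>
  </body>
</tmx>"""

# Hypothetical result of steps 1 and 2: the hand-picked source terms.
TERMS = ["cell range", "regular expression"]

def read_pairs(tmx_text, src="en", tgt="sl"):
    """Yield (source, target) text pairs from the TMX translation units."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # First two letters only, so "en-US" is treated as "en".
                texts[lang[:2]] = "".join(seg.itertext())
        if src in texts and tgt in texts:
            yield texts[src], texts[tgt]

def term_contexts(pairs, terms):
    """Return (term, source, target) rows, sorted by term (steps 3 and 4)."""
    rows = []
    for source, target in pairs:
        for term in terms:
            # Plain substring match; it ignores word boundaries and inflection.
            if term.lower() in source.lower():
                rows.append((term, source, target))
    return sorted(rows)

def to_tsv(rows):
    """Write the rows as a tab-delimited table with an empty translation column (step 5)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["term", "source", "target", "translation"])
    for term, source, target in rows:
        writer.writerow([term, source, target, ""])
    return out.getvalue()

rows = term_contexts(read_pairs(SAMPLE_TMX), TERMS)
print(to_tsv(rows))
```

The empty `translation` column still has to be filled in by hand from the target segment, as in step 4; the script only groups each term with its contexts.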

Yet another option might be to get a trial version of Fusion (if it is still available); its extraction module was quite nice (and I think it was not restricted in the trial version):

http://www.orcadev.com/



Harry Bornemann  Identity Verified
Mexico
English to German
+ ...
...your computer might be busy for hours or days... Sep 11, 2006

Vito Smolej wrote:
...Oktoberfest is on as of this weekend, my grandchildren need kite flying lessons...

In your place I would tend to make sure that my computer will be busy for days - to obtain a pretext for more spare time..

PS: In case you need some more time, you can schedule a Defrag, complete Backup to external disk and a Virus check..

[Edited at 2006-09-11 22:49]



Vito Smolej
Germany
Local time: 16:17
Member (2004)
English to Slovenian
+ ...
TOPIC STARTER
Harry - "I would tend to make sure..." Sep 12, 2006

Something of this sort nearly slipped in - in the sense I would have an excuse to buy a faster machine. But I'm not one of that GHz breed: I prefer smart software to brute force.

Re Virus check and Defrag, I prepared the clunker this way for my wife on several occasions, but it didn't leave any impression - she kept saying "you take me for an idiot? Tell me what we can't do with what we got?"

[Edited at 2006-09-12 04:53]



Vito Smolej
Germany
Local time: 16:17
Member (2004)
English to Slovenian
+ ...
TOPIC STARTER
Jabberwock - this is smart Sep 12, 2006

and close to my way of thinking. Thanks J, will very probably follow the trail suggested.

Vito

[Edited at 2006-09-12 04:50]



ViktoriaG  Identity Verified
Canada
Local time: 10:17
English to French
+ ...
Jabberwock and Vito Sep 12, 2006

It may be for different reasons - but you guys are just hilarious!




Vito Smolej
Germany
Local time: 16:17
Member (2004)
English to Slovenian
+ ...
TOPIC STARTER
The backdrop of the subject discussed Sep 12, 2006

I am involved in the Slovenian OpenOffice team

http://sl.openoffice.org

and the subject of the discussion above is the complete Help environment (in OpenText format), in the works for the coming release of 2.1.x in autumn. It's about 35,000 segments (more than half of them new) and I don't even know for sure how many words, all of which need to be cleaned and streamlined, then streamlined and cleaned again.

It is an honorable and enjoyable pro bono experience. Here are a few flowers from this garden that's in dire need of some loving trimming:

serif - sherif

binomial distribution - binary distribution

the cell range of data - the range of a data cell

the inverse of a ... distribution - the inversion of ~

the result is the real coefficient of a complex number - the result is the true coefficient of a complex number

regular expression - common template

You may wonder at the experience of the translator/translators. Well, the majority (but not all) of the bloopers above are the product of a team project of students involved in the CAT course ("TRADOS? Whatsat?") at the philosophical faculty. So it's up to us, who happen to know a little more about complex numbers, regular expressions and such, to play the catcher in the rye.

Rye? Call it a tropical forest - if we don't halfway succeed I can already hear all the jeering and catcalling ("...what a bunch of dodos ... common template my a*s...") exploding a day or two after the release.

But then again, there's always a next release waiting behind the curtains...

[Edited at 2006-09-12 11:50]

