Fuzzy word match in concordance search
Thread poster: TrM Hungarian Translations

TrM Hungarian Translations
Hungary
Local time: 14:45
Member (2007)
English to Hungarian
+ ...
Sep 19, 2008

Hello,

I am new to Across, and I can't figure out how to perform a fuzzy search in the concordance, i.e. searching for the term "concordance" should also find entries containing "concordances" and vice versa. Trados does this in its concordance search function. Right now all I get in Across is exact matches.

I saw that the across termbase system (crossterm) does allow for wildcards (*), and I'd need something similar for the concordance search.

Version: Personal Edition 4.00 SP1c_EN Package version 4

Thanks for the help.

Istvan FULOP
TRM Translations
Hungary



Katarzyna Slowikova  Identity Verified
Germany
Local time: 14:45
Polish to Czech
+ ...
Same here! Jun 20, 2014

I have the very same problem in the latest version of Across Language Server. I have been using it for a year or so now, and the problem has been there from the beginning (updates are downloaded automatically).
Only 100% matches are ever found, which makes the concordance very difficult to use in inflected languages.
To be sure, I have set the match percentage to the lowest value, 50%.
Is this normal behavior?
I really hope to get an answer here!
Katarzyna



Milan Condak  Identity Verified
Local time: 14:45
English to Czech
Old manual on fuzzy terminology recognition Jun 20, 2014

TrM Hungarian Translations wrote:

I saw that the across termbase system (crossterm) does allow for wildcards (*)


I am a Wordfast trainer, and I hope the Across feature is similar to the one in Wordfast Classic (WFC).
The old manual may be useful.

My first question: who enters the terminology into the glossary?
My second question: do the glossary entries end with an asterisk (*)?
My advice: put asterisks into the termbase.

Here is more info:

Wordfast Classic manual (2004, by Yves Champollion) on propagate, asterisks (wildcard), fuzzy terminology recognition, stripping and stemming:


PropagateWhole

If a recognised single term ends with a wildcard, the whole word is replaced, rather than just its root. Thus, if the glossary has affect* = affecter and the source text has affection, the final result will be affecter rather than affection.
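
To illustrate the difference between replacing the whole word and replacing only its root, here is a rough Python sketch. It is not part of the manual and not Wordfast's code; the function name `propagate` and the `whole` flag are hypothetical, and the "root only" behaviour is my reading of the sentence above.

```python
def propagate(word: str, source_term: str, target_term: str, whole: bool) -> str:
    """Replace a source word matched by a glossary entry (hypothetical helper).

    A trailing '*' on source_term marks it as a wildcard entry.
    """
    target = target_term.rstrip("*")
    if not source_term.endswith("*"):
        # No wildcard: only an exact (case-insensitive) match is replaced.
        return target if word.lower() == source_term.lower() else word
    root = source_term[:-1]
    if not word.lower().startswith(root.lower()):
        return word  # the entry does not apply to this word
    if whole:
        return target  # the whole word is replaced
    # Only the root is replaced; the source word's ending is kept.
    return target + word[len(root):]


print(propagate("affection", "affect*", "affecter", whole=True))
print(propagate("affection", "affect*", "affecter", whole=False))
```

With `whole=True` the word "affection" becomes "affecter"; with `whole=False` only the root "affect" is swapped and the ending "ion" survives.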

Terminology format

Terms can use upper and/or lower case. Avoid unnecessary characters like brackets, quotes, slashes, dashes etc. unless absolutely necessary. The * wildcard can be used at the end of a term, if different forms of a term are possible (this is called MFTR and described below). Here is a sample English-French glossary:

Maintenance* = Entretien*
Interview* = Entrevue*
minimum wage* = salaire* minim*

Do not place the * wildcard less than four characters from the beginning of an entry. So, pa* the bill* is not valid; use three entries like pay the bill*, pays the bill* and paid the bill*.
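
The wildcard rules above can be tried out with a small Python sketch. This is only an illustration under my own assumptions, not how Wordfast or Across work internally: the names `is_valid_entry`, `entry_to_regex` and `find_terms` are made up, and the four-character rule is checked literally against the first asterisk in the entry.

```python
import re


def is_valid_entry(entry: str) -> bool:
    """Apply the rule that '*' needs at least four characters before it."""
    first = entry.find("*")
    return first == -1 or first >= 4


def entry_to_regex(entry: str) -> re.Pattern:
    """Turn 'pay the bill*' into a pattern where each '*' matches any word ending."""
    parts = [re.escape(part) for part in entry.split("*")]
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)


def find_terms(text: str, glossary: dict) -> list:
    """Return the (source, target) pairs whose source entry occurs in text."""
    hits = []
    for source, target in glossary.items():
        if is_valid_entry(source) and entry_to_regex(source).search(text):
            hits.append((source, target))
    return hits


glossary = {"Interview*": "Entrevue*", "minimum wage*": "salaire* minim*"}
print(find_terms("Two interviews were held about minimum wages.", glossary))
print(is_valid_entry("pa* the bill*"))  # asterisk too close to the start
```

Note that `pay the bill*` does not match "pays the bill", because the wildcard only covers the ending of the last word. That is why the manual asks for three separate entries.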

During a translation session, press Shift+Ctrl+G to load glossaries into a toolbar drop-down list for better visibility. Outside sessions, use Ctrl+Alt+Left/Right to display/hide the glossary lists. Note that glossaries of more than 5,000 entries, or more than 200 Kbytes, cannot be loaded into a toolbar drop-down list. But when looking up terms, Wordfast will load the term, plus 50 terms before and after the found term, for reference. These large glossaries can nevertheless be used for all other operations: QC, terminology recognition, etc. They are fully opened and editable using the glossary editor (the icon after the glossary drop-down list).

This is where AFTR really helps, and yields best results. Once the job is completed, and you have a spare hour, you may consider integrating client terminology into one of your existing glossaries, and manually add asterisks like:

two-way multiplexed autoresponder*

double furnace boiler*

dichotomic search*

DOS-based application*

This way, your homegrown glossary runs on MFTR rather than AFTR.

Two PB (Pandora's Box) commands can be used to fine-tune AFTR: GloStemmingRule and GloStems.

The essence of AFTR is to determine what is a word's stem by gradually stripping letters from the word's end. Note that we deal here with statistics - there are exceptions to this rule, and every language has its requirements. The verb go, for example, will change into went in the past tense, thereby defeating any AFTR attempt. By chance, client terminology is primarily made of technical words and expressions, where nouns outnumber verbs by a clear margin, thereby minimizing the problem of verbs and their changing roots.
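
The stem-stripping idea can be sketched in a few lines of Python. Again, this is only my illustration of the principle, not Wordfast's algorithm: the function `fuzzy_match` and its limits `max_strip` and `min_stem` are assumptions of mine and are not the actual meaning of GloStemmingRule or GloStems.

```python
def fuzzy_match(word: str, term: str, max_strip: int = 3, min_stem: int = 4) -> bool:
    """Return True if word and term share a stem.

    The stem is found by stripping up to max_strip letters from the end of
    term; stems shorter than min_stem are rejected. Both limits are
    illustrative assumptions.
    """
    w, t = word.lower(), term.lower()
    for stripped in range(max_strip + 1):
        stem = t[: len(t) - stripped]
        if len(stem) < min_stem:
            break
        # The word must start with the stem and add at most max_strip letters.
        if w.startswith(stem) and len(w) - len(stem) <= max_strip:
            return True
    return False


print(fuzzy_match("concordances", "concordance"))
print(fuzzy_match("concordance", "concordances"))
print(fuzzy_match("went", "go"))
```

The first two calls match in both directions, which is the behaviour asked for at the top of this thread. The third fails, as the manual warns: go and went share no stem, so no amount of stripping helps.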


Milan



Katarzyna Slowikova  Identity Verified
Germany
Local time: 14:45
Polish to Czech
+ ...
Asterisks do not work for the concordance search Jun 23, 2014

or more precisely, they're ignored: "kužel*" will find only occurrences with "kužel", as if the asterisk was not there.
