ProZ.com global directory of translation services
Thread poster: Deborah Kolosova
snowball vs. lucene tokenizers

Deborah Kolosova  Identity Verified
United States
Local time: 13:03
Member (2010)
Russian to English
Nov 3, 2011

The instructions on the OmegaT site for installing tokenizers say you should select the appropriate tokenizer from the list. For my source language, Russian, two are listed: the SnowballRussianTokenizer and the LuceneRussianTokenizer. What is the difference between them, and which one is better to use? Or does each have its own advantages?


Susan Welsh  Identity Verified
United States
Local time: 16:03
Member (2008)
German to English
I use lucene
Nov 3, 2011

My recollection of past discussions is that lucene has a "stop word" function that snowball does not (meaning it ignores little irrelevant words like "and" and "the" when matching segments). Someone will probably correct me if I'm wrong. You can try them both and see what you like.
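Assuming that recollection is right, the effect a stop-word filter can have on segment matching is sketched below. This is a generic illustration, not OmegaT's or Lucene's actual code: the `STOP_WORDS` list, the `tokenize` and `similarity` functions, and the use of Jaccard overlap as the match score are all hypothetical stand-ins for whatever OmegaT really does internally.

```python
import re

# Hypothetical, tiny stop-word list for illustration only; a real tokenizer
# would ship its own, much longer, per-language list (e.g. for Russian).
STOP_WORDS = {"and", "the", "a", "of", "to", "in"}


def tokenize(segment, drop_stop_words=False):
    """Lowercase a segment and split it into word tokens, optionally dropping stop words."""
    tokens = re.findall(r"\w+", segment.lower())
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def similarity(a, b, drop_stop_words=False):
    """Jaccard overlap of the two segments' token sets, from 0.0 to 1.0 (hypothetical match score)."""
    ta = set(tokenize(a, drop_stop_words))
    tb = set(tokenize(b, drop_stop_words))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    source = "Open the file and save the changes."
    memory = "Open a file to save changes."
    print(similarity(source, memory))        # small words count against the match
    print(similarity(source, memory, True))  # small words ignored
```

With the small words counted, the two segments share only half of their distinct tokens (0.5); once "the", "and", "a" and "to" are ignored, they match completely (1.0). That is the kind of difference a stop-word function could make to fuzzy matches, which is why trying both tokenizers on your own texts is a reasonable way to decide.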

I translate from Russian, and lucene works great for me.

Susan

