Differences between versions of Lucene tokenizers
Thread poster: CafeTran Training

CafeTran Training
Netherlands
Local time: 09:25
Jul 10, 2016

When I create a new project in OmegaT 3.6, it'll use version 3.0 of the Lucene tokenizer for the source language German and version 3.6 for the target language Dutch.

Since I see that I can also manually select version 3.6 of the Lucene tokenizer for the source language German, I'd like to learn what the differences are between versions 3.0 and 3.6 of the Lucene tokenizer for German.



Didier Briel  Identity Verified
France
Local time: 09:25
Member (2007)
English to French
3.0 provides better stemming Jul 10, 2016

CafeTran Training wrote:
When I create a new project in OmegaT 3.6, it'll use version 3.0 of the Lucene tokenizer for the source language German and version 3.6 for the target language Dutch.

Since I see that I can also manually select version 3.6 of the Lucene tokenizer for the source language German, I'd like to learn what the differences are between versions 3.0 and 3.6 of the Lucene tokenizer for German.

According to translators working from German, 3.0 uses a better stemming algorithm than 3.1 and later versions.

You can read the thread that prompted the need to make this behaviour configurable here:
https://groups.yahoo.com/neo/groups/OmegaT/conversations/topics/28375

In OmegaT 4.0, selecting the behaviour won't be necessary. All the tokenizers perform correctly, except the German one, for which we found a way of replicating the behaviour of tokenizer 3.0.

Didier

