Differences between versions of Lucene tokenizers
Thread poster: CafeTran Training

CafeTran Training
Netherlands
Local time: 09:32
Jul 10, 2016

When I create a new project in OmegaT 3.6, it uses version 3.0 of the Lucene tokenizer for the source language (German) and version 3.6 for the target language (Dutch).

Since I can also manually select version 3.6 of the Lucene tokenizer for German as the source language, I'd like to know what the differences are between versions 3.0 and 3.6 of the Lucene tokenizer for German.


 

Didier Briel  Identity Verified
France
Local time: 09:32
Member (2007)
English to French
3.0 provides better stemming Jul 10, 2016

CafeTran Training wrote:
When I create a new project in OmegaT 3.6, it uses version 3.0 of the Lucene tokenizer for the source language (German) and version 3.6 for the target language (Dutch).

Since I can also manually select version 3.6 of the Lucene tokenizer for German as the source language, I'd like to know what the differences are between versions 3.0 and 3.6 of the Lucene tokenizer for German.

According to translators who work from German, 3.0 uses a better stemming algorithm than 3.1 and later.
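
To give an idea of why the choice of stemmer matters, here is a minimal sketch in Python. It is a toy illustration only: `aggressive_stem`, `light_stem` and `same_stem` are hypothetical names made up for this example, and neither function reproduces the actual Lucene 3.0 or 3.6 German stemming rules. It only shows that two different rule sets can disagree on whether inflected forms reduce to the same stem.

```python
# Toy illustration only: these are NOT the Lucene German stemmers.
# Both functions and their suffix lists are assumptions made for this example.

def aggressive_stem(word: str) -> str:
    """Lowercase, fold umlauts, then strip common endings repeatedly."""
    w = word.lower()
    for src, dst in (("ä", "a"), ("ö", "o"), ("ü", "u"), ("ß", "ss")):
        w = w.replace(src, dst)
    suffixes = ("ern", "en", "er", "es", "em", "e", "n", "s")
    stripped = True
    # Keep stripping while the word is longer than 4 characters.
    while stripped and len(w) > 4:
        stripped = False
        for suf in suffixes:
            # Never cut the word down to fewer than 3 characters.
            if w.endswith(suf) and len(w) - len(suf) >= 3:
                w = w[: -len(suf)]
                stripped = True
                break
    return w


def light_stem(word: str) -> str:
    """Lowercase and strip at most one short ending; umlauts are kept."""
    w = word.lower()
    for suf in ("en", "e", "n", "s"):
        if w.endswith(suf) and len(w) - len(suf) >= 4:
            return w[: -len(suf)]
    return w


def same_stem(a: str, b: str, stemmer) -> bool:
    """True if both words reduce to the same stem under the given stemmer."""
    return stemmer(a) == stemmer(b)


if __name__ == "__main__":
    for a, b in (("Übersetzungen", "Übersetzung"), ("Häuser", "Haus")):
        print(a, b,
              "aggressive:", same_stem(a, b, aggressive_stem),
              "light:", same_stem(a, b, light_stem))
```

In this toy setup both rule sets treat "Übersetzungen" and "Übersetzung" as the same term, but only the one that folds umlauts also pairs "Häuser" with "Haus". Differences of this kind are what users notice when switching tokenizer versions.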

You can read the thread that prompted making this behaviour configurable here:
https://groups.yahoo.com/neo/groups/OmegaT/conversations/topics/28375

In OmegaT 4.0, selecting the behaviour won't be necessary. All the tokenizers perform correctly except the German one, and for German we found a way of replicating the behaviour of tokenizer 3.0.

Didier


 

