Differences between versions of Lucene tokenizers
Thread poster: CafeTran Training (X)

CafeTran Training (X)
Netherlands
Jul 10, 2016

When I create a new project in OmegaT 3.6, it uses version 3.0 of the Lucene tokenizer for the source language (German) and version 3.6 for the target language (Dutch).

Since I can also manually select version 3.6 of the Lucene tokenizer for German as the source language, I'd like to know what the differences are between versions 3.0 and 3.6 of the Lucene tokenizer for German.


 

Didier Briel  Identity Verified
France
Member (2007)
English to French
3.0 provides better stemming
Jul 10, 2016

CafeTran Training wrote:
When I create a new project in OmegaT 3.6, it uses version 3.0 of the Lucene tokenizer for the source language (German) and version 3.6 for the target language (Dutch).

Since I can also manually select version 3.6 of the Lucene tokenizer for German as the source language, I'd like to know what the differences are between versions 3.0 and 3.6 of the Lucene tokenizer for German.

According to translators working from German, 3.0 uses a better stemming algorithm than 3.1 and later.
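To illustrate why the choice of stemmer can matter for glossary and fuzzy matching, here is a minimal toy sketch in Python. It is only an illustration: `stem_a` and `stem_b` are hypothetical stand-ins invented for this example, not the algorithms shipped with any Lucene version, and the suffix lists are assumptions. The point is simply that two stemmers can disagree on whether an inflected form reduces to the same stem as the glossary entry.

```python
# Toy illustration only: stem_a and stem_b are hypothetical stemmers,
# not the ones used by any Lucene tokenizer version.

def _strip_suffix(word, suffixes, min_stem=3):
    """Remove the first matching suffix, keeping at least min_stem letters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def stem_a(word):
    """Hypothetical stemmer A: lowercases and strips a few short endings."""
    return _strip_suffix(word.lower(), ("en", "e", "n", "s"))


def stem_b(word):
    """Hypothetical stemmer B: also folds umlauts and strips more endings."""
    w = word.lower()
    w = w.replace("ä", "a").replace("ö", "o").replace("ü", "u")
    return _strip_suffix(w, ("ern", "en", "er", "es", "e", "n", "s"))


def matches(glossary_term, text_word, stem):
    """A glossary hit occurs when both words reduce to the same stem."""
    return stem(glossary_term) == stem(text_word)


if __name__ == "__main__":
    pairs = [("Vertrag", "Verträge"), ("Buch", "Bücher"), ("Vertrag", "Vertrags")]
    for term, word in pairs:
        print(term, word,
              "A:", matches(term, word, stem_a),
              "B:", matches(term, word, stem_b))
```

With these toy rules, stemmer B finds "Verträge" for the glossary entry "Vertrag" while stemmer A does not, which is the kind of difference a translator would notice in the glossary pane.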

You can read the thread that prompted the need to make this behaviour configurable here:
https://groups.yahoo.com/neo/groups/OmegaT/conversations/topics/28375

In OmegaT 4.0, selecting the behaviour won't be necessary. All the tokenizers perform correctly except the German one, for which we found a way of replicating the behaviour of tokenizer 3.0.

Didier


 

