Looking for The Best Hunspell dictionary for British English
Thread poster: Michael Beijer

Michael Beijer  Identity Verified
United Kingdom
Local time: 07:48
Member (2009)
Dutch to English
+ ...
Dec 7, 2013

Hello everyone,

I am looking for the best Hunspell dictionary for British English. I have come across many of them, but there seems to be something different wrong with each of them.

The LibreOffice one, for example, is way too big, and contains tons of absolute garbage. It’s around 6 MB, instead of the 560 KB of the original OpenOffice one. In order to tackle the curly quotes problem, they have included all kinds of words such as ‘isn’, and ‘wouldn’ (to make words like ‘isn’t’ and ‘wouldn’t’ work).
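As an aside on the curly quotes problem: one alternative to adding fragments like ‘isn’ and ‘wouldn’ is to normalise the apostrophe before the text reaches the spell checker. Below is a minimal Python sketch, assuming you can pre-process the text yourself. The function name is made up for illustration and is not part of any tool mentioned here.

```python
# Map typographic apostrophes to the plain ASCII apostrophe so that
# "isn’t" is looked up as "isn't". Hypothetical helper, not a Hunspell API.
APOSTROPHES = str.maketrans({
    "\u2019": "'",  # right single quotation mark (curly apostrophe)
    "\u02bc": "'",  # modifier letter apostrophe
})

def normalize_apostrophes(text):
    """Return text with curly apostrophes replaced by straight ones."""
    return text.translate(APOSTROPHES)

print(normalize_apostrophes("isn\u2019t and wouldn\u2019t"))
```

Hunspell's .aff format also has an ICONV directive for input conversion, which can map ’ to ' inside the dictionary itself. Whether a given CAT tool's Hunspell build honours it is something to test rather than assume.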

Here are some others:

• OpenOffice (the original one)
• Apache OpenOffice
• Aspell
• and a number of alternative versions

I have an ‘en_GB.dic’ and an ‘en_GB.aff’ file that I more or less like, because they flag all -ize words as incorrect and don’t contain lots of garbage words like the LibreOffice lists, but they seem to be missing tons of words. This means that I constantly have to add these words to my user.dic file when translating in CafeTran (my CAT tool). For example, here are a few words it just underlined as incorrect because they are not in the ‘en_GB.dic’ file: every, acts, years, individuals, etc.
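A quick way to see which base words a .dic file lacks is to compare a word list against its stems. Here is a rough Python sketch, with several assumptions: the first line of the file is the entry count, flags follow a ‘/’, and affix rules are ignored altogether, so an inflected form such as ‘acts’ is reported as missing even when an entry like ‘act/S’ would cover it. The function names are illustrative only.

```python
def load_stems(dic_lines):
    """Collect the stems from the lines of a Hunspell .dic file."""
    stems = set()
    for i, line in enumerate(dic_lines):
        line = line.strip()
        if not line:
            continue
        if i == 0 and line.isdigit():
            continue  # first line is the (approximate) entry count
        # Drop morphological fields after a tab, then flags after "/".
        stem = line.split("\t", 1)[0].split("/", 1)[0]
        stems.add(stem)
    return stems

def missing_stems(words, stems):
    """Return the words that do not appear as a stem (case-insensitive fallback)."""
    return [w for w in words if w not in stems and w.lower() not in stems]

# Tiny in-memory sample instead of a real file, for illustration.
sample = ["3", "act/SM", "year/SM", "blog/SM"]
print(missing_stems(["every", "act", "year"], load_stems(sample)))

# With a real file (the encoding is an assumption; check the SET line in the .aff):
# with open("en_GB.dic", encoding="utf-8") as f:
#     stems = load_stems(f)
```

Because it only looks at stems, this is best used with a list of base forms (every, act, year, individual) rather than running text.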

Any suggestions or tips would be very welcome!

Michael


 

John Holland  Identity Verified
France
Local time: 08:48
Member (2012)
French to English
Not sure which is best... Dec 7, 2013

Here is a list of different Hunspell/Aspell English dictionaries, including several for en_GB:
http://misc.aspell.net/wiki/English_Dictionaries

Maybe there is something of interest there.


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 07:48
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
there are many different versions of the British Hunspell dictionary floating around Dec 7, 2013

Hi John,

Yes, I went through the list at http://misc.aspell.net/wiki/English_Dictionaries
and am currently trying to make sense of all the different versions floating around online.

[Attached image: Hunspell-dictionaries.png]


I read something about the SCOWL list possibly being the best one, so I emailed its author, Kevin Atkinson, who promptly sent me a British English version generated from the SCOWL database. He sent me two versions, one with -ise and one with -ize. I am testing them at the moment.

To avoid confusion with the official version, the British Hunspell dictionary is not included on the web page above but can be generated from SCOWL.

Since SCOWL was forked in David Bartlett's version, it has undergone numerous corrections and should now be fairly accurate. David Bartlett's en_GB dictionary includes both -ize and -ise forms, while with SCOWL it is possible to get one with just the -ise forms. It is also possible to include common variants or to leave them out to promote consistent spelling. Furthermore, SCOWL is a bit more up to date and includes words such as "blog" and "Google" (which were not in R 1.18). However, the SCOWL version still likely includes some American-only words and misses some British-only words, and it includes some words such as "alright", which David Bartlett made a point of leaving out. Also, David Bartlett's version includes hyphenated words, while SCOWL does not yet.

Since the official Aspell English dictionaries are generated from SCOWL, this is the version found in the official English dictionary for Aspell (which also includes American and Canadian options).
(http://misc.aspell.net/wiki/English_Dictionaries)

I am rather puzzled by the LibreOffice list, which has 674,039 entries, while all the other lists have only around 50,000.

It’s interesting to see that they are all quite different. Some of them, for example, lack new words, like 'blog', etc. LibreOffice, however, has:

blog
blog's
blogged
blogger
blogger's
bloggers
blogging
blogs


Kevin's en-GB -ise list has:

blog/SM
blogged
blogger/MS
blogging
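
The two lists above may well describe the same words: in a .dic file, the letters after the slash are flags that point to rules in the .aff file, so ‘blog/SM’ stands for several forms at once. Below is a small Python sketch of that expansion. The two rules are toy assumptions (S adds ‘s’, M adds ‘'s’) made up for this example. Real .aff files attach conditions to each rule, and the actual rules in any given en_GB.aff should be checked there.

```python
# Toy suffix rules, assumed for illustration only. Real .aff rules carry
# conditions (for example, -y becoming -ies) that are not modelled here.
TOY_RULES = {
    "S": "s",   # plural
    "M": "'s",  # possessive
}

def expand(entry):
    """Expand one .dic entry such as 'blog/SM' into its surface forms."""
    stem, _, flags = entry.partition("/")
    forms = {stem}
    for flag in flags:
        suffix = TOY_RULES.get(flag)
        if suffix is not None:
            forms.add(stem + suffix)
    return forms

def expand_all(entries):
    """Expand every entry and return the sorted set of forms."""
    forms = set()
    for entry in entries:
        forms |= expand(entry)
    return sorted(forms)

kevin = ["blog/SM", "blogged", "blogger/MS", "blogging"]
for form in expand_all(kevin):
    print(form)
```

Under these toy rules, the four flagged entries expand to the same eight forms that the LibreOffice list spells out one by one, which would go some way towards explaining the difference in entry counts.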


The ‘affix files’ (.aff) are also very different in each set. Quite a puzzle altogether!

Michael





[Edited at 2013-12-08 01:23 GMT]

[Edited at 2013-12-08 11:19 GMT]


 

