Do you extract a concordance/glossary/terminology list before projects? What's a good tool for Chinese?
Thread poster: Prof Projex

Prof Projex  Identity Verified
Local time: 13:30
English to Chinese
Aug 4, 2008

Do you extract a concordance, glossary, or terminology list before big projects? If so, what's a good tool that can accurately extract terms from Chinese source documents?

For example, take a 100,000-word Chinese document to be translated into English. A good prep method, which is easy enough for English, is to run the source through MultiTerm (MT) Extract (or PhraseFinder for European languages), which extracts the terminology and builds a monolingual glossary from it. You then translate these terms ahead of time and create a termbase, and translation goes faster once the project starts because you don't have to look up new terminology.

But what about Chinese? MT Extract gives all sorts of crazy results, adding commas, parentheses, numbers, dashes, and full sentences to its extracted findings, which makes the results useless. I heard AntConc was a good tool, but it only supports .txt files, and there are over 30 different Asian Unicode encoding options, none of which seem to work correctly (in my experience). Do you extract terminology from a Chinese source beforehand, and if so, how? Thanks for sharing!


Westbank Huang
Local time: 13:30
English to Chinese
You can create a stoplist_zh for MT Extract Aug 7, 2008

I think a stoplist should be developed for the tool.
Type the items that should be excluded, one after another, and save the list as a .txt file in Unicode encoding.
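
As a rough illustration of the idea above, here is a minimal Python sketch that writes such a stoplist. Everything specific in it is an assumption, not something confirmed in this thread or by the tool's documentation: the file name stoplist_zh.txt, the one-entry-per-line layout, the sample entries, and the reading of "UNICODE" as UTF-16 with a byte-order mark (which is what Windows Notepad calls "Unicode").

```python
from pathlib import Path

# Hypothetical sample entries: punctuation and very common function
# characters that should not show up as extracted "terms".
STOP_ENTRIES = [",", "。", "(", ")", "、", "-", "的", "了"]

def write_stoplist(path, entries):
    """Write one entry per line (assumed layout) as UTF-16 with a BOM."""
    Path(path).write_text("\n".join(entries) + "\n", encoding="utf-16")

# File name is an assumption taken from the post title.
write_stoplist("stoplist_zh.txt", STOP_ENTRIES)
```

Check the tool's own documentation for the stoplist layout it actually expects before relying on this.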





Prof Projex  Identity Verified
Local time: 13:30
English to Chinese
Many thanks! Aug 8, 2008

Ah, wow, that's a great idea. Haven't tried that yet. I'll give it a shot. Thanks so much. I just assumed no one did this, or at least no one was willing to share their tricks of the trade. Hehe. Thank you, Westbank Huang!


eng2chi  Identity Verified
United Kingdom
Local time: 13:30
English to Chinese
It is about word segmentation of Chinese Aug 18, 2008

Thanks to Prof Projex for raising this question, from which I learned that MultiTerm Extract exists for English terminology extraction. I should try it next time.

For Chinese, I have good reason to say that Trados is probably not capable of segmenting and extracting Chinese phrases satisfactorily.

I have read some comments saying that MultiTerm Extract simply treats each Chinese character as a word, which reminds me of the unsolved CHINESE WORD SEGMENTATION problem. Since Chinese text usually has no separators between words, while a Chinese word often comprises two to five characters, automatically segmenting a sentence into proper words has long been a hard problem. As far as I know, many top research teams in Mainland China are still working on it. Fortunately, current state-of-the-art techniques already reach a precision above 90%, which is quite good for industrial requirements and applications. I guess such precision is enough for terminology extraction too. However, I think SDL Trados hasn't noticed this issue; otherwise they could buy some of the patented techniques on the market for Chinese segmentation.

I have some scripts for this kind of segmentation and for frequency counting, but I use them for other purposes. For translation, I haven't met a project as large as yours, so I haven't needed to do automatic term extraction beforehand. If you really need such a function, you could ask a software house to implement it as a tool.
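
To make the segmentation problem concrete, here is a minimal sketch of forward maximum matching, one of the simplest dictionary-based approaches. It is not what any commercial tool or research system is confirmed to use, and the tiny lexicon and the five-character limit are assumptions made up for illustration.

```python
def segment(text, lexicon, max_len=5):
    """Forward maximum matching: at each position, take the longest
    lexicon entry that starts there; fall back to a single character."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical toy lexicon; a real one would hold many thousands of entries.
LEXICON = {"翻译", "记忆", "软件", "翻译记忆", "术语", "提取"}

print(segment("翻译记忆软件提取术语", LEXICON))
```

With this toy lexicon the sentence splits into 翻译记忆 / 软件 / 提取 / 术语. Anything missing from the lexicon comes out as single characters, which is roughly the "each character is a word" behaviour described above.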
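
For readers who want to experiment, a frequency count can be sketched without a segmenter at all by counting character n-grams inside runs of Chinese characters. This is only an assumed stand-in, not the scripts mentioned above; the function name, the n-gram range, the threshold, and the sample text are all made up for illustration. Because punctuation, digits, and Latin text act as separators, they never end up inside a candidate.

```python
import re
from collections import Counter

# Runs of characters from the basic CJK Unified Ideographs block only;
# everything else (punctuation, digits, Latin letters) splits the text.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")

def candidate_terms(text, min_n=2, max_n=4, min_count=2):
    """Count character n-grams within each run and keep the repeated ones."""
    counts = Counter()
    for run in HAN_RUN.findall(text):
        for n in range(min_n, max_n + 1):
            for i in range(len(run) - n + 1):
                counts[run[i:i + n]] += 1
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]

# Made-up sample text.
sample = "术语提取(2008)很有用。术语提取可以加快翻译,术语库也一样。"
for gram, count in candidate_terms(sample):
    print(gram, count)
```

The output still contains fragments such as 语提, so the list needs manual review or a further filter before it becomes a glossary.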

