Do you extract a concordance/glossary/terminology list before projects? What's a good tool for Chinese?
Thread poster: Prof Projex

Prof Projex  Identity Verified
China
English to Chinese
Aug 4, 2008

Do you extract a concordance, glossary, or terminology list before big projects? If so, what's a good tool that can accurately extract terms from Chinese source documents?

For example, take a 100,000-word Chinese document to be translated into English. A good prep method, and one that is easy enough for English, is to run the source through MultiTerm (MT) Extract (or PhraseFinder for European languages); it extracts the terminology and builds a monolingual glossary from it. You then translate these terms ahead of time and create a term base, so translating goes faster once you start the project because you don't have to look up new terminology.
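As a rough illustration of what a monolingual candidate list is, here is a minimal frequency-based sketch for English. It is not how MultiTerm Extract works internally; the stopword set, the `candidate_terms` name and the thresholds are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real tools ship much larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "for", "on", "with"}

def candidate_terms(text, max_len=3, min_count=2):
    """Hypothetical helper: count word n-grams (1..max_len) that neither start
    nor end with a stopword, keeping those seen at least min_count times."""
    words = re.findall(r"[a-z][a-z-]*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip n-grams bounded by function words ("the pressure", "check the").
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]
```

On `"The pressure valve controls the flow. Check the pressure valve daily."` with the defaults, only `pressure`, `valve` and `pressure valve` occur at least twice, which is the kind of list you would then translate into a term base.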

But what about Chinese? MT Extract gives all sorts of crazy results, including commas, parentheses, numbers, dashes, and even full sentences in its extracted findings, which makes the results useless. I heard AntConc was a good tool, but it only supports .txt files, and there are over 30 different Asian encoding options, none of which seem to work correctly (in my experience). Do you extract terminology from a Chinese source beforehand, and if so, how? Thanks for sharing!
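One common workaround for encoding trouble in plain-text tools such as AntConc is to convert every input file to a single encoding, such as UTF-8, before loading it. A minimal sketch, assuming the source .txt files are in GB18030 (a superset of GBK/GB2312, the legacy encodings used for Simplified Chinese on Windows); the function names and file paths are hypothetical:

```python
from pathlib import Path

def to_utf8_bytes(data: bytes, source_encoding: str = "gb18030") -> bytes:
    """Hypothetical helper: decode legacy Chinese bytes, re-encode as UTF-8 (no BOM)."""
    # Strict decoding raises UnicodeDecodeError if the assumed encoding is wrong,
    # which is safer than silently producing garbled text.
    return data.decode(source_encoding).encode("utf-8")

def convert_file(src: str, dst: str, source_encoding: str = "gb18030") -> None:
    """Hypothetical helper: write a UTF-8 copy of src to dst; src is left untouched."""
    Path(dst).write_bytes(to_utf8_bytes(Path(src).read_bytes(), source_encoding))

# Usage (hypothetical paths):
# convert_file("source_zh.txt", "source_zh_utf8.txt")
```

Working on bytes keeps the original line endings; if a file turns out to be UTF-16 instead, pass `source_encoding="utf-16"`.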



Westbank Huang
China
English to Chinese
You can create a stoplist_zh for MT Extract Aug 7, 2008

I think a stoplist should be developed for the tool. Type in everything that should be excluded, one entry per line, and save it as a .txt file with Unicode encoding. For example:

——
……
%
1
2
3
4
5
6
7
9
0

[not_start]
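The stoplist idea can also be applied outside the tool, as a post-filter on whatever candidate list it exports. A minimal sketch, assuming one entry per line and dropping any candidate that contains a stoplisted string; this is not necessarily how MT Extract applies its own stoplists, and the function names and file name are hypothetical. Python's `utf-16` codec writes a byte-order mark, and on typical little-endian machines the result matches what Windows Notepad calls "Unicode".

```python
from pathlib import Path

def save_stoplist(entries, path):
    """Hypothetical helper: write one entry per line as UTF-16 with a byte-order mark."""
    Path(path).write_text("\n".join(entries) + "\n", encoding="utf-16")

def load_stoplist(path):
    """Hypothetical helper: read entries back; the utf-16 codec consumes the BOM."""
    lines = Path(path).read_text(encoding="utf-16").splitlines()
    return [line.strip() for line in lines if line.strip()]

def filter_candidates(candidates, stoplist):
    """Keep only candidates that contain none of the stoplisted strings."""
    return [c for c in candidates if not any(stop in c for stop in stoplist)]

# Usage: entries like those in the list above plus a full-width comma
# (the file name "stoplist_zh.txt" is hypothetical).
save_stoplist(["——", "……", "%", "，", "1"], "stoplist_zh.txt")
stops = load_stoplist("stoplist_zh.txt")
print(filter_candidates(["术语提取", "第1章", "压力，阀门", "阀门"], stops))
# prints ['术语提取', '阀门']
```

Substring matching is deliberately aggressive: a stoplisted digit removes every candidate containing it, which suits the punctuation-and-numbers noise described in the first post.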




Prof Projex  Identity Verified
China
English to Chinese
TOPIC STARTER
Many thanks! Aug 8, 2008

Ah, wow, that's a great idea. I haven't tried that yet, but I'll give it a shot. Thanks so much. I just assumed no one did this, or at least that no one was willing to share their tricks of the trade. Hehe. Thank you, Westbank Huang!

eng2chi  Identity Verified
United Kingdom
English to Chinese
It is about Chinese word segmentation Aug 18, 2008

Thanks to Prof Projex for raising this question, from which I learned that there is a MultiTerm Extract for English terminology extraction. I should try it next time.

For Chinese, I have good reason to believe that Trados cannot segment and extract Chinese phrases satisfactorily.

I have read comments saying that MultiTerm Extract simply treats each Chinese character as a word, which reminds me of the unsolved CHINESE WORD SEGMENTATION problem. Chinese text has no separators (such as spaces) between words, while a Chinese word often comprises two to five characters, so automatically segmenting a sentence into proper words has long been a hard problem. As far as I know, many top research teams in Mainland China are still working on it. Fortunately, state-of-the-art techniques already achieve a precision above 90%, which is quite good for industrial requirements and applications, and I guess such precision is sufficient for terminology extraction too. However, I think SDL Trados hasn't addressed this issue; otherwise they could buy one of the patented Chinese segmentation techniques available on the market.
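To make the segmentation problem concrete, here is forward maximum matching, a classic dictionary-based baseline. It is much simpler than the statistical methods behind the 90%+ figures mentioned above, and it fails on ambiguous strings and on words missing from its dictionary. The toy dictionary and the `fmm_segment` name are hypothetical; `max_word_len=5` follows the two-to-five-character word length mentioned in the post.

```python
def fmm_segment(sentence, dictionary, max_word_len=5):
    """Forward maximum matching: at each position take the longest dictionary
    word; fall back to a single character when nothing matches."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest window first and shrink it until a dictionary hit;
        # a single character is always accepted, so the loop always advances.
        for length in range(min(max_word_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + length]
            if length == 1 or piece in dictionary:
                words.append(piece)
                i += length
                break
    return words

# Hypothetical toy dictionary for illustration only.
toy_dict = {"中文", "术语", "提取", "术语提取", "工具"}
print(fmm_segment("中文术语提取工具", toy_dict))  # ['中文', '术语提取', '工具']
```

Because "术语提取" is in the dictionary, the longest match wins over splitting it into "术语" + "提取"; with a real term base as the dictionary, that bias toward longer matches is usually what a terminologist wants.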

I have some scripts for this kind of segmentation and also for frequency counting, but I use them for other purposes. For translation, I haven't yet had a project as large as yours, so I haven't needed to run automatic term extraction beforehand. If you really need such a function, you could ask a software house to implement it.
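For the frequency-counting side, a dictionary-free alternative is to count repeated character n-grams inside runs of Chinese characters only. Because punctuation, digits and Latin letters end a run, they can never end up inside a candidate, which is exactly the noise described in the first post. A minimal sketch; the function name, n-gram lengths and threshold are hypothetical, and the pattern covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
import re
from collections import Counter

# Runs of basic-block CJK ideographs; anything else acts as a boundary.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")

def ngram_candidates(text, min_n=2, max_n=4, min_count=2):
    """Hypothetical helper: return {ngram: count} for Han character n-grams
    (min_n..max_n characters) that occur at least min_count times."""
    counts = Counter()
    for run in HAN_RUN.findall(text):
        for n in range(min_n, max_n + 1):
            for i in range(len(run) - n + 1):
                counts[run[i:i + n]] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}
```

On `"压力阀门，2008年压力阀门。"` it returns `压力阀门` with a count of 2, but also fragments such as `力阀`, so the output still needs segmentation or manual review before it becomes a glossary.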

