
Do you extract a concordance/glossary/terminology list before projects? What's a good tool for Chinese?
Thread poster: Prof Projex

Prof Projex  Identity Verified
China
Local time: 04:06
English to Chinese
+ ...
Aug 4, 2008

Do you extract a concordance, glossary, or terminology list before big projects? If so, what's a good tool that can accurately extract from Chinese source documents?

For example, take a 100,000-word Chinese document to be translated into English. A good prep method, which is easy enough for English, is to run the source through MultiTerm (MT) Extract (or PhraseFinder for European languages), which extracts the terminology and builds a monolingual glossary from it. You then translate those terms ahead of time and create a termbase, so translating goes faster once the project starts because you don't have to look up new terminology.

But what about Chinese? MT Extract gives all sorts of crazy results, adding commas, parentheses, numbers, dashes, and full sentences to its extracted findings, which makes the results useless. I heard AntConc was a good tool, but it only supports .txt files, and there are over 30 different Asian Unicode options, none of which seem to work correctly (in my experience). Do you extract terminology from a Chinese source beforehand, and if so, how? Thanks for sharing!


 

Westbank Huang
China
Local time: 04:06
English to Chinese
You can create a stoplist_zh for MT Extract Aug 7, 2008

I think a stoplist should be developed for the tool.
Type in everything that should be excluded, one entry per line, and save it as a .txt file in Unicode encoding.

——
……
%
1
2
3
4
5
6
7
9
0
[not_start]
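
For anyone who wants to apply the same idea outside MT Extract, here is a minimal Python sketch that post-filters a list of extracted candidates against a stoplist like the one above. It is not how MT Extract itself processes the file. The function names are hypothetical, `utf-16` is an assumption about what "Unicode" means when saving from a Windows editor, and lines starting with `[` are simply skipped because their meaning to MT Extract is not covered here.

```python
# Entries copied from the stoplist in this post; in practice they would be
# loaded from the saved text file with load_stoplist().
STOPLIST = {"——", "……", "%", "1", "2", "3", "4", "5", "6", "7", "9", "0"}

def load_stoplist(path, encoding="utf-16"):
    """Read one entry per line, skipping blank lines and [section] lines.

    The utf-16 default is an assumption; pass another encoding if needed.
    """
    with open(path, encoding=encoding) as handle:
        lines = (line.strip() for line in handle)
        return {line for line in lines if line and not line.startswith("[")}

def is_clean(candidate, stoplist):
    """A candidate is kept only if it is non-empty and contains no stoplist entry."""
    return bool(candidate) and not any(stop in candidate for stop in stoplist)

def filter_candidates(candidates, stoplist=STOPLIST):
    """Return the candidates that survive the stoplist check, in original order."""
    return [c for c in candidates if is_clean(c.strip(), stoplist)]

print(filter_candidates(["术语提取", "第3章", "翻译——记忆", "词汇表"]))
```

Matching on substrings is deliberately strict: any candidate containing a listed digit or punctuation mark is dropped, which may also remove legitimate terms that contain numbers.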



 

Prof Projex  Identity Verified
China
Local time: 04:06
English to Chinese
+ ...
TOPIC STARTER
Many thanks! Aug 8, 2008

Ah, wow, that's a great idea. I haven't tried that yet, but I'll give it a shot. Thanks so much. I just assumed no one did this, or at least that no one was willing to share their tricks of the trade. Hehe. Thank you, Westbank Huang.

 
eng2chi  Identity Verified
United Kingdom
Local time: 04:06
English to Chinese
+ ...
It is about Chinese word segmentation Aug 18, 2008

Thanks to Prof Projex for raising this question; from it I learned that MultiTerm Extract exists for English terminology extraction. I should try it next time.

For Chinese, I have good reason to believe Trados is not capable of segmenting and extracting Chinese phrases satisfactorily.

I have read comments saying that MultiTerm Extract simply treats each Chinese character as a word, which reminds me of the unsolved Chinese word segmentation problem. Chinese text usually has no separators between words, while a Chinese word often comprises two to five characters, so automatically segmenting a sentence into proper words has long been a hard problem. As far as I know, many top research teams in mainland China are still working on it. Fortunately, state-of-the-art techniques already reach a precision above 90%, which is quite good for industrial requirements and applications. I guess such precision is enough for terminology extraction too. However, I think SDL Trados hasn't noticed this issue; otherwise they could buy one of the patented Chinese segmentation techniques on the market.
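
To make the segmentation problem concrete, here is a rough Python sketch of forward maximum matching, a simple dictionary-based baseline. It is not one of the state-of-the-art techniques mentioned above and will not reach their precision. The toy lexicon is made up for illustration, and the five-character window just follows the word length range given above.

```python
def segment(text, lexicon, max_len=5):
    """Greedy forward maximum matching.

    At each position, take the longest substring (up to max_len characters)
    found in the lexicon; fall back to a single character if nothing matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical toy lexicon; a real one would hold many thousands of words.
LEXICON = {"翻译", "术语", "提取", "工具", "翻译记忆"}

print(segment("翻译记忆和术语提取工具", LEXICON))
# -> ['翻译记忆', '和', '术语', '提取', '工具']
```

A tool that treats each character as a word behaves like this function with an empty lexicon, which is why its output is of little use for terminology work.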

I have some scripts that do this kind of segmentation along with frequency counting, but I use them for other purposes. In my translation work I haven't had a project as large as yours, so I haven't needed to run automatic term extraction beforehand. If you really need such a function, you could ask a software house to implement it as a tool.
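
These are not my actual scripts, but the frequency counting idea can be sketched in a few lines of Python. The sketch below skips real segmentation: it counts character n-grams inside runs of Chinese characters, so punctuation, digits, and Latin text never enter a candidate. The function names, the 2 to 5 character window, and the `min_count` threshold are all assumptions for illustration.

```python
import re
from collections import Counter

# Runs of characters from the CJK Unified Ideographs block only, so commas,
# parentheses, digits, and dashes act as boundaries between runs.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")

def ngram_counts(text, min_n=2, max_n=5):
    """Count every character n-gram (min_n..max_n) inside each Chinese run."""
    counts = Counter()
    for run in HAN_RUN.findall(text):
        for n in range(min_n, max_n + 1):
            for start in range(len(run) - n + 1):
                counts[run[start:start + n]] += 1
    return counts

def frequent_candidates(text, min_count=2):
    """Return (n-gram, count) pairs seen at least min_count times, most frequent first."""
    return [(gram, c) for gram, c in ngram_counts(text).most_common() if c >= min_count]

sample = "术语提取很重要,术语提取需要工具。"
for gram, count in frequent_candidates(sample):
    print(gram, count)
```

The output still contains fragments of longer terms (for example "语提" alongside "术语提取"), so the list needs manual review or a proper segmenter before it is usable as a glossary.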


 

