Defining your own segmentation rules for Chinese source text in CAT tools?
Thread poster: PatentTrans

United States
Chinese to English
Oct 31, 2013

Has anyone tried defining their own segmentation rules for CAT software with Chinese as the source language? I'm splitting paragraphs at punctuation marks, and the results aren't bad. For one of my documents (a patent), about 1/3 to 1/2 of the segments came out as unique. Is it possible to optimize this further? Of course, if the segments get too short I'll run into readability issues. Chinese grammar is kind of chaotic, and I'm having a tough time finding a reliable pattern to split on.
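To make the trade-off concrete, here is a minimal Python sketch of punctuation-based splitting, run outside any CAT tool. Everything in it is an assumption made for illustration: the two punctuation sets, the function names `segment` and `unique_ratio`, the sample text, and the definition of "unique" as distinct segments divided by total segments (a given CAT tool may count this differently).

```python
import re
from collections import Counter

# Hypothetical rule sets: sentence-final marks, and clause-level marks
# that could optionally be added for a finer split.
SENTENCE_END = "。！？；"
CLAUSE_BREAK = "，、："


def segment(text, break_chars=SENTENCE_END):
    """Split text after any character in break_chars, keeping the punctuation.

    break_chars is placed inside a regex character class, so it must not
    contain ']', '^', '-' or a backslash.
    """
    pattern = f"[^{break_chars}]+[{break_chars}]*"
    return [s.strip() for s in re.findall(pattern, text) if s.strip()]


def unique_ratio(segments):
    """Distinct segments divided by total segments (assumed definition)."""
    if not segments:
        return 0.0
    return len(Counter(segments)) / len(segments)


# Made-up patent-style sample text.
text = "本发明涉及一种装置。所述装置包括壳体，所述壳体具有开口。所述装置包括壳体，所述壳体具有盖。"

coarse = segment(text)                             # sentence-level split
fine = segment(text, SENTENCE_END + CLAUSE_BREAK)  # clause-level split

print(len(coarse), unique_ratio(coarse))
print(len(fine), unique_ratio(fine))
```

In this sample the clause-level split exposes a repeated clause that the sentence-level split hides, so the unique ratio drops, but the segments also get shorter, which is the readability cost mentioned above.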


Lawrence Lam
English to Chinese
Be careful doing this
Nov 2, 2013

You can define some new splitting rules based on the 1/3 to 1/2 of the segments that are unique in your patent document.

But changing the type of segmentation considerably changes the way a CAT tool works and, among other things, may also affect the alignment of translations, pre-translation, etc. You should avoid repeatedly changing the type of segmentation for a document format, because doing so will degrade the quality of the translation memory.
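A rough way to see the translation memory effect is to build a toy TM under one rule and look it up under another. The Python sketch below is only an illustration under stated assumptions: the TM is modelled as a plain dictionary keyed by source segment, matching is exact-match only (real CAT tools also do fuzzy matching), and the names `segment`, `exact_hits`, `old_rule` and `new_rule` are hypothetical.

```python
import re


def segment(text, break_chars):
    """Split text after any character in break_chars, keeping the punctuation."""
    pattern = f"[^{break_chars}]+[{break_chars}]*"
    return [s for s in re.findall(pattern, text) if s.strip()]


def exact_hits(tm, segments):
    """Count segments that have an exact-match entry in the toy TM."""
    return sum(1 for s in segments if s in tm)


# Made-up sample text and two hypothetical segmentation rules.
source = "所述装置包括壳体，所述壳体具有开口。所述装置包括盖。"
old_rule = "。"    # split at full stops only
new_rule = "。，"  # also split at commas

# Toy TM built while the old rule was in effect: source segment -> translation.
tm = {seg: "<translation>" for seg in segment(source, old_rule)}

hits_old = exact_hits(tm, segment(source, old_rule))
hits_new = exact_hits(tm, segment(source, new_rule))

print(hits_old, "exact hits under the old rule")
print(hits_new, "exact hits under the new rule")
```

The same source text finds fewer exact matches once the rule changes, because the stored segments no longer line up with the new segment boundaries.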

