Defining your own segmentation rules for Chinese source text in CAT tools?
Thread poster: PatentTrans
United States
Local time: 08:59
Chinese to English
Oct 31, 2013

Has anyone tried defining their own segmentation rules for CAT software with Chinese as the source language? I'm using punctuation marks to break up paragraphs, and it's not bad. For one of my documents (a patent), about 1/3 to 1/2 of the segments came out as unique. Is it possible to optimize this further? Of course, if the segments are too short, I'll run into readability issues. Chinese grammar is kind of chaotic, and I'm having a tough time finding a reliable pattern.
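For readers who want to experiment outside the CAT tool, here is a minimal sketch of the punctuation-based approach described above. It is not any tool's actual rule engine (CAT tools usually express such rules in their own rule editors or in SRX files). The set of break characters, the `min_len` threshold, and the function names `segment` and `unique_ratio` are all assumptions made for illustration.

```python
import re

# Assumed rule set: break after sentence-final marks and after the
# full-width semicolon, which often separates clauses in patent claims.
BREAK_AFTER = "。！？；"

# Either "text followed by one or more break marks" or trailing text
# that has no break mark at all.
_PIECE = re.compile(f"[^{BREAK_AFTER}]*[{BREAK_AFTER}]+|[^{BREAK_AFTER}]+")


def segment(text, min_len=4):
    """Split text after each break mark.

    Pieces shorter than min_len characters are merged into the previous
    segment, a crude guard against segments too short to read in context.
    """
    segments = []
    for piece in _PIECE.findall(text):
        piece = piece.strip()
        if not piece:
            continue
        if segments and len(piece) < min_len:
            segments[-1] += piece
        else:
            segments.append(piece)
    return segments


def unique_ratio(segments):
    """Fraction of segments that are distinct (lower means more repetition)."""
    return len(set(segments)) / len(segments) if segments else 0.0


sample = "所述装置包括壳体；所述壳体设有开口。所述装置包括壳体；"
segs = segment(sample)
print(segs)
print(unique_ratio(segs))
```

Adding or removing characters in `BREAK_AFTER` and re-running `unique_ratio` on the same document gives a quick way to compare candidate rules before committing to one in the CAT tool.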


Frank Lin  Identity Verified
Local time: 21:59
English to Chinese
Be careful doing this
Nov 2, 2013

You can define some new splitting rules based on the 1/3 to 1/2 of the segments that show up as unique in your patent document.

But changing the type of segmentation considerably changes how a CAT tool works; among other things, it can affect the alignment of translations, pre-translation, and so on. Avoid repeatedly changing the segmentation type for a given document format, because doing so degrades the quality of the translation memory.
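A small sketch of why this matters, under simplifying assumptions: the translation memory is modeled as a plain set of source segments, only exact matches are counted, and `split_after` and `exact_hits` are hypothetical helper names rather than part of any CAT tool.

```python
import re


def split_after(text, marks):
    """Split text after any character in marks, keeping the mark with its segment."""
    cls = re.escape(marks)
    return re.findall(f"[^{cls}]*[{cls}]+|[^{cls}]+", text)


def exact_hits(segments, tm):
    """Count segments that have an exact match in the TM."""
    return sum(1 for s in segments if s in tm)


text = "所述装置包括壳体；所述壳体设有开口。所述开口为圆形。"

# Hypothetical TM, populated while segmenting at full stops only.
tm = set(split_after(text, "。"))

same_rule = split_after(text, "。")    # same rule as when the TM was built
new_rule = split_after(text, "。；")   # semicolon added as a break later

print(exact_hits(same_rule, tm), "of", len(same_rule))
print(exact_hits(new_rule, tm), "of", len(new_rule))
```

With the rule unchanged, every segment is found in the TM. Once the semicolon is added as a break, the first stored segment no longer lines up with either of the two shorter segments it now spans, so those exact matches are lost.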
