Please recommend CAT tool for translating patent PDF files from Chinese to English
Thread poster: PatentTrans
PatentTrans
United States
Local time: 21:36
Chinese to English
Oct 27, 2013

Hi all, I need to make a decision fast and would appreciate your thoughts. I just received a contract to translate a number of patents from Simplified Chinese to English, and they are in PDF format. I can run OCR on these. I have never used a CAT tool before, but given the number of deliverables I'm going to have to use one. In your opinion, what is the best TM software for my situation? The only language pair I need is Chinese/English, and I'm only translating patents; if the CAT tool can handle scanned PDFs, even better. Thanks so much in advance.

[Edited at 2013-10-27 19:55 GMT]



Dominique Pivard  Identity Verified
Local time: 05:36
Finnish to French
Consider PDF and CAT separately Oct 27, 2013

Those CAT tools that include support for PDF have licensed the technology from third parties: for instance, SDL uses technology from Solid, Wordfast from BCL (Wordfast Pro) and ABBYY (Wordfast Anywhere), Déjà Vu from BCL (like Wordfast Pro), while memoQ uses a freebie converter.

You would be better off selecting a CAT tool on its own merits as a CAT tool, and picking up a separate tool (or, better, separate tools) for PDF conversion. There are many different types of PDF and many different converters as well. No single converter is better than all others with all types of PDF: converter A may be the best for PDF X, while converter B will be better for PDF Y. This is why having several converters at your disposal could be a good idea.

Regarding patents and Chinese, most CAT tools should be able to handle both. Choosing the most suitable tool is a matter of personal preference, unless you're so eager to please your clients that you will pick up the tool they "require"...


PatentTrans
United States
Local time: 21:36
Chinese to English
TOPIC STARTER
Any CAT tools better suited for Chinese? Oct 27, 2013

Thanks. My client is not requiring me to use a CAT tool at all so this is for my own benefit. The output will be simple text. At this point I don't need anything fancy but just a reasonably priced tool that allows me to avoid doing repetitive work. Also I've heard some CATs are not very good at segmenting Asian languages.


Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 03:36
Member (2009)
Dutch to English
+ ...
agree with Dominique Oct 27, 2013

You said you are in a hurry, so here is my quick answer. I suggest that you get
(1) ABBYY FineReader to convert the PDFs to .docx, and
(2) CafeTran or memoQ to translate them with.

Michael



Dominique Pivard  Identity Verified
Local time: 05:36
Finnish to French
Five inexpensive CAT tools Oct 27, 2013

PatentTrans wrote:
At this point I don't need anything fancy but just a reasonably priced tool that allows me to avoid doing repetitive work.

About one year ago, I made a series of short videos about five free or inexpensive CAT tools:
OmegaT
MemSource
Wordfast Anywhere
CafeTran
MetaTexis

I also made one about Across Personal Edition, which is seemingly free, but which I would only recommend to my worst enemies.

PatentTrans wrote:
Also I've heard some CATs are not very good at segmenting Asian languages.

Just take a short sample Chinese text and see how each of them fares with it.

You may also want to have a look at Heartsome Translation Studio. It's made by a company based in Hong Kong (so you would think they know a thing or two about segmentation with Chinese) and their entry-level edition isn't among the most expensive on the market.



jyuan_us  Identity Verified
United States
Local time: 22:36
Member (2005)
English to Chinese
+ ...
What did you mean by that? Oct 27, 2013

PatentTrans wrote:
Also I've heard some CATs are not very good at segmenting Asian languages.


Segments too long? Too short? Segment breaks in the wrong place within a sentence?

I guess no CAT tool can segment perfectly, or always segment a text the way you want. You will just have to bear with it.



Phil Hand  Identity Verified
China
Local time: 10:36
Chinese to English
You can change segmenting rules Oct 28, 2013

I guess in any CAT tool you can edit segmenting rules to suit your needs. I use SDL, and I use the function quite regularly.
For patents, you need to make sure you've got a terminology manager working with your CAT, because maintaining consistent terminology will be useful. SDL is having compatibility issues with its terminology software MultiTerm right now, so it's probably not your best choice.
Don't know what you've already got in terms of OCR, but I use a little freebie called Hanwang, and it's OK.
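
On the terminology point above: if your CAT tool's terminology checks are not yet set up, the idea can be approximated with a few lines of Python. This is only a rough sketch, not a feature of any tool named in this thread. The glossary entries, the sample segment pairs and the function name `find_term_issues` are all made up for illustration, and plain substring matching will miss inflected forms such as plurals with changed stems.

```python
def find_term_issues(pairs, glossary):
    """Flag segments whose source contains a glossary term but whose
    translation lacks the required target term (case-insensitive)."""
    issues = []
    for index, (source, target) in enumerate(pairs):
        for zh_term, en_term in glossary.items():
            if zh_term in source and en_term.lower() not in target.lower():
                issues.append((index, zh_term, en_term))
    return issues


# Hypothetical glossary: Chinese term -> English term you want to enforce.
glossary = {"壳体": "housing", "权利要求": "claim"}

# Hypothetical (source, translation) segment pairs.
pairs = [
    ("所述壳体由金属制成。", "The housing is made of metal."),
    ("所述壳体包括盖体。", "The casing includes a cover."),
    ("根据权利要求1所述的装置", "The device according to claim 1"),
]

for index, zh_term, en_term in find_term_issues(pairs, glossary):
    print(f"segment {index}: expected '{en_term}' for '{zh_term}'")
```

Here the second pair is flagged because it renders 壳体 as "casing" instead of the glossary term "housing".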


PatentTrans
United States
Local time: 21:36
Chinese to English
TOPIC STARTER
I'm giving OmegaT a try Oct 28, 2013

Dominique Pivard wrote:

PatentTrans wrote:
At this point I don't need anything fancy but just a reasonably priced tool that allows me to avoid doing repetitive work.

About one year ago, I made a series of short videos about five free or inexpensive CAT tools:
OmegaT
MemSource
Wordfast Anywhere
CafeTran
MetaTexis

I also made one about Across Personal Edition, which is seemingly free, but which I would only recommend to my worst enemies.

PatentTrans wrote:
Also I've heard some CATs are not very good at segmenting Asian languages.

Just take a short sample Chinese text and see how each of them fares with it.

You may also want to have a look at Heartsome Translation Studio. It's made by a company based in Hong Kong (so you would think they know a thing or two about segmentation with Chinese) and their entry-level edition isn't among the most expensive on the market.



Thanks. I watched your video and installed OmegaT and did a short trial run. Seems good enough for what I need, which is basically text to text translation. Your video really helped to get me started.

Also thanks a lot to everyone else who replied to my request. I'm downloading Hanwang OCR right now and will give it a try in a little bit.



Dominique Pivard  Identity Verified
Local time: 05:36
Finnish to French
OmegaT resources Oct 28, 2013

PatentTrans wrote:
I watched your video and installed OmegaT and did a short trial run. Seems good enough for what I need, which is basically text to text translation. Your video really helped to get me started.

Glad to hear you found the video useful! The OmegaT community has a very active mailing list on Yahoo with lots of helpful people, much more active than the corresponding ProZ forum. I strongly recommend you subscribe to the list, should you need to ask further questions about the tool.



Grzegorz Gryc  Identity Verified
Local time: 04:36
French to Polish
+ ...
Splitting segments in the patent stuff... Oct 28, 2013

jyuan_us wrote:

PatentTrans wrote:
Also I've heard some CATs are not very good at segmenting Asian languages.


Segments too long? too short? Improper location of a sentence to segment with?

I guess no CAT tool can segment perfectly, or always segment a text the way you want. You will just have to bear with it.


In fact, automatic segmenting is not enough here: patent jobs need a LOT of manual segmenting.
That is, if you start to split segments according to their meaning and some repetitive patterns, you'll be able to leverage more chunks of text.
I doubt it can be easily automated in a sound way.

Cheers
GG



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 03:36
Member (2009)
Dutch to English
+ ...
following from what Grzegorz said... Oct 28, 2013

If you are going to use the CAT tool for patents, you might want to choose a tool in which you can easily split and join segments. Patents often contain very long sentences, and as Grzegorz said, you might want to split these into smaller chunks, either to increase leverage (get more hits from your translation memories) or simply to make them easier to handle with your poor human brain.
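
To illustrate the leverage point, here is a small sketch that compares a long claim sentence and its manually split chunks against a one-entry translation memory. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for fuzzy matching; real CAT tools use their own matching algorithms, and the sample sentence, the memory entry and the helper `best_match` are invented for the example.

```python
from difflib import SequenceMatcher


def best_match(segment, memory):
    """Return the highest similarity ratio (0.0 to 1.0) between a
    segment and any source entry in the translation memory."""
    return max(SequenceMatcher(None, segment, entry).ratio() for entry in memory)


# Hypothetical TM holding one boilerplate phrase from an earlier patent.
memory = ["其特征在于,所述方法包括以下步骤:"]

# One long claim sentence, and the same text split by hand into chunks.
long_sentence = "一种图像处理方法,其特征在于,所述方法包括以下步骤:获取待处理图像;"
chunks = [
    "一种图像处理方法,",
    "其特征在于,所述方法包括以下步骤:",
    "获取待处理图像;",
]

print(f"whole sentence: {best_match(long_sentence, memory):.2f}")
for chunk in chunks:
    print(f"chunk {chunk!r}: {best_match(chunk, memory):.2f}")
```

The unsplit sentence only gets a partial match, while the middle chunk matches the memory entry exactly, which is the kind of hit you gain by splitting on repetitive patterns.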

Michael



Heartsome Support
Local time: 10:36
A clean file will make your CAT work Oct 29, 2013

As suggested, you have plenty of choices for TM software. But when you translate a file converted by OCR, you may find lots of tags in your CAT tool, and these will slow your work down severely. In that case, using a CAT tool makes little difference compared with not using one. So I recommend cleaning your files first with TransTools or CodeZapper. This will make your files cleaner, so you can reuse the TM effectively.

PatentTrans
United States
Local time: 21:36
Chinese to English
TOPIC STARTER
Tags and segmenting Oct 29, 2013

I spent more time playing with OmegaT. It has a tag removal function which worked for my test document.

Regarding segmentation, I am having a hard time coming up with good rules for Chinese that can be expressed as regex, so I copied the Japanese rules and added a few. Basically I'm segmenting on punctuation: comma, period, semicolon, colon, question mark. It does not look too bad, and I'll test a few more docs before committing to using it. So far the process has been smooth.
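
For anyone who wants to try the same idea outside a CAT tool, here is a minimal Python sketch of punctuation-based segmentation. It is not OmegaT's rule format, just a way to preview how a text would be cut. It assumes the source uses full-width Chinese punctuation, and the names `BREAK_CHARS` and `segment` are made up for this example.

```python
import re

# Full-width punctuation treated as segment boundaries. This set is an
# assumption based on the marks listed above; adjust it for your documents.
BREAK_CHARS = ",。;:?"

# Either a run of text ending in one break character, or a trailing run
# that has no break character at all.
_SEGMENT = re.compile(f"[^{BREAK_CHARS}]*[{BREAK_CHARS}]|[^{BREAK_CHARS}]+")


def segment(text: str) -> list[str]:
    """Split text after each break character, keeping the punctuation."""
    pieces = (match.group().strip() for match in _SEGMENT.finditer(text))
    return [piece for piece in pieces if piece]


sample = "一种装置,包括壳体;以及盖体。"
for piece in segment(sample):
    print(piece)
```

Breaking on commas gives short segments, which suits repetitive claim language but can cut a clause too finely in descriptive passages, so it is worth checking the output on a few real documents first.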

[Edited at 2013-10-29 21:57 GMT]



