How to create a Trados TM using a .pdf file
Thread poster: Colin lee
Colin lee
Local time: 03:12
Chinese
Dec 8, 2008

Hello all

Do you know how to create a Trados TM from .pdf files? I have two .pdf files, one English and one Chinese, and I want to use them to create a TM for pre-translation. Any ideas are welcome.

Thanks


 

Jerzy Czopik  Identity Verified
Germany
Local time: 21:12
Member (2003)
Polish to German
+ ...
Please, use the search function Dec 8, 2008

The topic of converting PDFs pops up again nearly every week.
There is NO way to work directly with PDF files.
You will need to convert them to an editable format.

Thanks for your understanding
BR
Jerzy


 
FarkasAndras
Local time: 21:12
English to Hungarian
+ ...
some tips Dec 8, 2008

Most PDF files allow you to select and copy text.
If yours does, do that, paste it into a word processor, tidy it up and use your favourite aligner.

If it doesn't (say, the PDF is a scanned image), you can use the little-known OCR functionality of Microsoft Office Document Imaging, which comes with Microsoft Office on Win XP. It probably works on Vista too.
Take a screenshot, save it as .tiff and open it with Microsoft Office Document Imaging. There you can just copy and paste into a word processor.
It works reasonably well on English; no idea if it even does Chinese.
Fancier OCR software will certainly do a better job, though.

If you don't know what aligning is or how to do it, do a search. Proz probably has an article on it.


 

Ahmed Maher  Identity Verified
Local time: 21:12
English to Arabic
+ ...
Conversion software Dec 8, 2008

Hello,

I have experience with two such programs: Adobe Acrobat Pro and Readiris Pro.
You will need to try them to check whether they work properly with Chinese. You could also search for a local program that handles Chinese well.

All the best,
Ahmed Maher


 

Spiros Doikas  Identity Verified
Local time: 22:12
Member (2002)
English to Greek
+ ...
FineReader Dec 8, 2008

This is what I use.

 
Colin lee
Local time: 03:12
Chinese
TOPIC STARTER
Thanks all here Dec 8, 2008

Actually, I can convert them to Word, but I don't want to create the TM with WinAlign because it would take me so much time.

Maybe the conversion is only the first step, but I don't know how to get from there to a Trados TM.

Thanks


[Edited at 2008-12-08 15:20 GMT]


 

Jerzy Czopik  Identity Verified
Germany
Local time: 21:12
Member (2003)
Polish to German
+ ...
There is no way round alignment Dec 8, 2008

You have, as you stated, TWO documents.
Apart from printing one out and overwriting the other in a TM system, the only automated way to get their text into a TM is alignment.


 

Anthony Baldwin  Identity Verified
United States
Local time: 15:12
Member (2006)
Portuguese to English
+ ...
bitext2tmx Dec 8, 2008

If it is possible to extract the text or convert the PDF to a .txt file (Adobe Reader will do this, if the pages are not image files),
use Bitext2tmx to align the texts and create an industry-standard .tmx file.
Trados can import a TMX translation memory.
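
For readers curious what such a .tmx file contains, here is a minimal Python sketch that writes already-aligned sentence pairs as TMX 1.4. It is not how Bitext2tmx works internally. The function name, the tool name in the header and the language codes "en-US" and "zh-CN" are assumptions chosen for illustration; the codes would probably need to match the language pair of the target TM.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en-US", tgt_lang="zh-CN"):
    """Return an ElementTree with one <tu> per (source, target) pair."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "pairs2tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plaintext",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        # One <tuv> per language, each holding a single <seg>.
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)
```

To save the result, something like `build_tmx(pairs).write("aligned.tmx", encoding="utf-8", xml_declaration=True)` should do; the file name is only an example.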


 
FarkasAndras
Local time: 21:12
English to Hungarian
+ ...
alignment Dec 8, 2008

Colin lee wrote:

Actually, I can convert them to Word, but I don't want to create the TM with WinAlign because it would take me so much time.

Maybe the conversion is only the first step, but I don't know how to get from there to a Trados TM.

Thanks


I can identify with not wanting to use WinAlign... I share that feeling. So I don't.
Hunalign is a great aligner, probably the best out there. Do a Google site search here to see what I posted about it earlier.
Briefly: it is freeware with no GUI, so its use is not exactly intuitive. Its party trick is the use of dictionaries, so for best results hunt down a dictionary in your language combination. Glossaries for the field in question plus a basic 50000-word dictionary should be more than enough. Spreadsheet/tab-delimited format can be easily fed into hunalign. I don't know if it handles Chinese, but I think it uses UTF-8, so it should.

Hunalign takes the line break as a segment delimiter (but of course it rearranges segment boundaries based on length and vocab). So you'll probably want a sentence boundary detector as well.
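
A very naive sentence boundary detector can be sketched in a few lines of Python, just to show the idea of producing one segment per line. The function name and the rules are assumptions for illustration: it splits after ., ! and ? when whitespace follows, and after the full-width Chinese marks 。！？. It will wrongly split at abbreviations such as "e.g.", and collapsing whitespace leaves a stray space where Chinese text was wrapped mid-sentence, so a proper segmenter is the better choice for real work.

```python
import re


def split_sentences(text):
    """Return a list of sentences, ready to be written one per line."""
    # Undo hard wraps: collapse every run of whitespace to a single space.
    flat = " ".join(text.split())
    # Latin terminators count as a boundary only when whitespace follows.
    flat = re.sub(r"([.!?])\s+", r"\1\n", flat)
    # Full-width terminators end a sentence even with no space after them.
    flat = re.sub(r"([。！？])\s*", r"\1\n", flat)
    return [s.strip() for s in flat.split("\n") if s.strip()]
```

Writing `"\n".join(split_sentences(text))` to a UTF-8 text file for each language then gives the aligner one segment per line.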

If the source texts match reasonably well and you do this cleverly, you could get 95%+ correct automatic matches. Then it's your call: use it as is, revise partly, or revise fully. One of these days I'll write up an article on alignment... WinAlign is useless and people don't seem to know there is a better way.
There is a reason why megaprojects that align several million sentences use tools like Hunalign... they are far superior.
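
The "revise partly" option can be sketched in Python as a simple filter. This assumes the aligner's output is tab-delimited text with three columns per line: source segment, target segment and a numeric confidence score. That layout is an assumption about the tool's output and should be checked against your own files. The function name and the threshold are hypothetical; a sensible cut-off has to be found by inspecting real output.

```python
def split_by_score(lines, threshold):
    """Return (accepted, to_review) lists of (source, target) pairs."""
    accepted, to_review = [], []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 3:
            continue  # not a source/target/score line, skip it
        src, tgt, raw_score = (c.strip() for c in cols)
        try:
            score = float(raw_score)
        except ValueError:
            continue  # third column is not a number, skip it
        # Pairs with an empty side or a low score go to manual review.
        if src and tgt and score >= threshold:
            accepted.append((src, tgt))
        else:
            to_review.append((src, tgt))
    return accepted, to_review
```

The accepted pairs could go straight into the TM, while the rest are checked by hand.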

If your material is 10 pages and that is all you'll align in the foreseeable future, bite the bullet and use WinAlign. The learning curve of Hunalign is a bit steep. If you need to do hundreds or thousands of pages, make the switch.
Also, it is very likely that you will suffer a lot because of how useless the PDF format is: line breaks everywhere, the order of paragraphs mixed up, whatever you can think of.
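
Some of the line-break damage can be repaired automatically. The Python sketch below assumes that paragraphs in the extracted text are separated by blank lines and that a line ending in a hyphen followed by a lowercase letter is a word split across lines. Both are assumptions that will not hold for every PDF, and the function name is made up for illustration. Lines are joined with a space, which suits English; Chinese lines would need joining without one. It does nothing about paragraphs that come out in the wrong order.

```python
import re


def unwrap_paragraphs(text):
    """Return one string per paragraph with the hard line breaks removed."""
    result = []
    for para in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        if not lines:
            continue
        joined = lines[0]
        for ln in lines[1:]:
            if joined.endswith("-") and ln[0].islower():
                # Rejoin a word that was hyphenated across the line break.
                joined = joined[:-1] + ln
            else:
                joined += " " + ln
        result.append(joined)
    return result
```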

[Edited at 2008-12-08 19:23 GMT]


 

