Clean PDFs to TM solution
Thread poster: RafaelTiba

Local time: 14:52
English to Portuguese
Nov 18, 2015


I have some technical docs in PDF, with an English and a Portuguese version of each.
They are clean files and I don't have the bilingual files.
I need a "TM" where I can enter search strings in the source language and quickly get to the corresponding target text. It does not have to be precise, matching segment by segment; paragraph by paragraph is OK.
Is there an easy way to get this?

I hope I was clear enough hehe.



Kevin Dias
Local time: 01:52
One suggestion Nov 18, 2015

Hi Rafael,

There are probably hundreds of different ways you could go about this, so I'm sure other translators will have some other great suggestions. One way would be to use TM-Town. You can load your source and target documents and the text will be extracted (it has OCR as well, in case your PDF is an image). You can then easily align the two documents (see this video). After that, your aligned segments can be searched from your browser or through our CAT tool extension with CafeTran, or you could export the new translation memory (as a TMX, XLIFF, XLS, or CSV file) and use it in your CAT tool of choice.
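For anyone who wants to see the align-then-search idea in its simplest form, here is a minimal Python sketch. It is not how TM-Town or any CAT tool works internally. It assumes the text has already been extracted from both PDFs into plain text, that paragraphs are separated by blank lines, and that both versions have the same number of paragraphs in the same order. The function names are made up for illustration.

```python
import re


def split_paragraphs(text):
    """Split plain text on blank lines and collapse whitespace inside each paragraph."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(block.split()) for block in blocks if block.strip()]


def align_by_position(source_text, target_text):
    """Pair the n-th source paragraph with the n-th target paragraph.

    Assumption: both documents share the same paragraph structure.
    Real aligners are far more tolerant than this.
    """
    source = split_paragraphs(source_text)
    target = split_paragraphs(target_text)
    if len(source) != len(target):
        raise ValueError(
            f"Paragraph counts differ ({len(source)} vs {len(target)}); "
            "fix the extracted text by hand before pairing."
        )
    return list(zip(source, target))


def search(pairs, query):
    """Return every (source, target) pair whose source contains the query, ignoring case."""
    needle = query.casefold()
    return [(src, tgt) for src, tgt in pairs if needle in src.casefold()]


if __name__ == "__main__":
    english = "Remove the cover.\n\nTighten the four bolts\nto 12 Nm."
    portuguese = "Remova a tampa.\n\nAperte os quatro parafusos\ncom 12 Nm."
    pairs = align_by_position(english, portuguese)
    for src, tgt in search(pairs, "bolts"):
        print(src, "=>", tgt)
```

If the paragraph counts do not match, the sketch stops instead of guessing, since a single extra line break in one PDF would otherwise shift every pair after it.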

Full disclosure: I am a developer at TM-Town, so I'm biased... but your situation is exactly the kind of thing we have been building tools for :)


Edit: Here is a blog post explaining the process with screenshots.

[Edited at 2015-11-18 21:42 GMT]


Agnes Lenkey
German to Spanish
Depending on the CAT tool you use: memoQ Nov 19, 2015

Hi Rafael,

In memoQ, I would first convert the PDF files to Word (I do it with ABBYY PDF converter), tidy up the output a little so that both documents "look alike", go to the project I have open in memoQ, and run the aligner from the LiveDocs section. After checking the automatically aligned segments, I would confirm all segments and export them to my translation memory. The most annoying part is converting the PDF documents into Word documents; the rest is very fast.

I suppose it depends on the tool you use. As Kevin says, the important goal is to obtain TM entries that you can import and export afterwards.
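To make the "import and export" part concrete, here is a rough Python sketch that writes already-aligned paragraph pairs to a bare-bones TMX file using only the standard library. The element and attribute names follow TMX 1.4; the function name, the header values (tool name, version, "plain-text") and the default language codes are placeholders chosen for this example, so check the result against what your own CAT tool expects before relying on it.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name out as the attribute "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="pt"):
    """Build a minimal TMX 1.4 tree from (source, target) text pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "paragraph-align-sketch",  # placeholder tool name
        "creationtoolversion": "0.1",              # placeholder version
        "segtype": "paragraph",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


if __name__ == "__main__":
    pairs = [("Remove the cover.", "Remova a tampa.")]
    tree = build_tmx(pairs)
    tree.write("aligned.tmx", encoding="utf-8", xml_declaration=True)
```

Some tools expect region-specific codes such as "en-US" or "pt-BR" rather than bare "en" and "pt", so the two language parameters are the first thing to adjust if an import fails.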

I hope this gives you an idea of how it can be done in memoQ. Best regards,



Reed James
Local time: 13:52
Member (2005)
Spanish to English
Logiterm Nov 26, 2015

As long as the PDF is editable, you can align the text with Logiterm to produce a bitext. Then you can save the bitext as a TM through Logiterm. If the alignment is not exact, you can massage the segments yourself until you are satisfied. No PDF-to-Word conversion here.

