Clean PDFs to TM solution
Thread poster: RafaelTiba

English to Portuguese
Nov 18, 2015


I have some technical docs in PDF, with an English and a Portuguese version of each.
They are clean docs and I don't have the bilingual files.
I need a "TM" where I can enter search strings for the source and get quick access to the corresponding target. It does not have to be precise, like matching segment by segment; paragraph by paragraph is OK.
Is there any easy way to get this?

I hope I was clear enough hehe.



One suggestion Nov 18, 2015

Hi Rafael,

There are probably hundreds of different ways you could go about this, so I'm sure other translators will have some other great suggestions. One way would be to use TM-Town. You can load your source and target documents and the text will be extracted (it has OCR as well, in case your PDF is an image). You can then easily align the two documents (see this video). Once aligned, your segments can be searched from your browser or through our CAT tool extension for CafeTran, or you can export the new translation memory (as a TMX, XLIFF, XLS, or CSV file) and use it in your CAT tool of choice.
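For anyone curious what a bare-bones version of "align, then search" looks like outside any particular tool, here is a minimal Python sketch. It is not how TM-Town works internally. It assumes the text has already been extracted from both PDFs into plain strings, that paragraphs are separated by blank lines, and that the two versions have the same number of paragraphs in the same order. The function names are made up for illustration.

```python
import re


def split_paragraphs(text):
    """Split plain text on blank lines and collapse whitespace inside each paragraph."""
    blocks = re.split(r"\n\s*\n", text)
    return [" ".join(block.split()) for block in blocks if block.strip()]


def pair_paragraphs(source_text, target_text):
    """Pair paragraphs by position (assumes a one-to-one paragraph layout)."""
    source = split_paragraphs(source_text)
    target = split_paragraphs(target_text)
    if len(source) != len(target):
        # A real aligner would try to repair this; the sketch just refuses.
        raise ValueError(
            f"paragraph counts differ: {len(source)} source vs {len(target)} target"
        )
    return list(zip(source, target))


def search(pairs, query):
    """Return every (source, target) pair whose source contains the query, ignoring case."""
    needle = query.casefold()
    return [(src, tgt) for src, tgt in pairs if needle in src.casefold()]
```

If the paragraph counts differ, the sketch stops instead of guessing, which is exactly the situation where a proper alignment tool earns its keep.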

Full disclosure: I am a developer at TM-Town, so I'm biased... but your situation is exactly the kind we have been building tools for. :)


Edit: Here is a blog post explaining the process with screenshots.

[Edited at 2015-11-18 21:42 GMT]


Agnes Lenkey
German to Spanish
Depending on the CAT tool you use: memoQ  Nov 19, 2015

Hi Rafael,

In memoQ, I would first convert the PDF files into Word (I do it with ABBYY PDF converter) and tidy up the output a little so that both documents "look alike". Then I would go to the project I have open in memoQ and run the aligner from the LiveDocs section. After checking the automatically aligned segments, I would confirm all segments and export them to my translation memory. The most annoying part is converting the PDF documents into Word documents; the rest is very fast.

I suppose it depends on the tool you use. As Kevin says, the important goal is to obtain TM entries that you can import and export afterwards.
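To make the "TM entries you can import and export" idea concrete, here is a rough Python sketch that writes already-aligned pairs to a TMX file using only the standard library. It is not what memoQ does on export. The function name, the "example-script" tool name, and the default language codes are placeholders, and the pairs are assumed to have been checked by hand beforehand.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined, so ElementTree writes this out as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="pt"):
    """Build a TMX 1.4 tree from a list of (source, target) text pairs."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",   # placeholder tool name
        "creationtoolversion": "0.1",
        "segtype": "paragraph",             # pairs are whole paragraphs
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        unit = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = text
    return ET.ElementTree(tmx)
```

Calling `build_tmx(pairs).write("memory.tmx", encoding="utf-8", xml_declaration=True)` saves the file. Whether a given CAT tool imports it without complaint is worth testing on a small sample first.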

I hope this gives you an idea of how it can be done in memoQ. Best regards,



Reed James
Spanish to English
Logiterm Nov 26, 2015

As long as the PDF is editable, you can align the text with Logiterm to produce a bitext. Then you can save the bitext as a TM through Logiterm. If the alignment is not exact, you can massage the segments yourself until you are satisfied. No PDF-to-Word conversion is needed here.

