PDFs and new CAT tools
Thread poster: Miroslav Jeftic

Miroslav Jeftic  Identity Verified
Local time: 16:09
English to Serbian
May 10, 2010

Recently I've noticed that the latest versions of several CAT tools (from SDL, Alchemy, etc.) promise PDF support. I haven't tried any of them, but it doesn't sound very convincing to me. Has anyone tried them out? Is there any truth to it, or is OCR still the way to go?

I don't doubt that a text-only PDF will probably convert well, but I would really like to hear whether anyone has tried loading a complex PDF (text with a lot of pictures, tables, pictures in tables, etc.) and what kind of result it produced in the end. :)

[Edited at 2010-05-10 09:53 GMT]


 

Stanislav Pokorny  Identity Verified
Czech Republic
Local time: 16:09
English to Czech
Very limited May 10, 2010

Hi Miroslav,
in my experience, the PDF filter (in fact an OCR add-in) in SDL Studio works quite well in the following scenario:
- PDF with a text layer
- text only or a picture now and then
- no complex tables
- no tight layout
- small size

It won't work for scanned PDFs, PDFs with a tight layout or large (several MB) PDFs. Moreover, the converted file is usually full of tags, most of them unnecessary, of course. So I still prefer the traditional method:
1. Getting the editable source files, if possible.
2. If the client cannot provide them, I run OCR, "clean" the converted text by removing any redundant formatting and, finally, translate.
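
For anyone who wants to script this triage, the checklist and the two fallback steps above could be sketched roughly as follows in Python. This is only an illustration: `PdfProfile`, `suggest_workflow` and the 3 MB default for "several MB" are hypothetical names and values, not anything taken from SDL Studio, and the flags have to be filled in by hand after looking at the file.

```python
from dataclasses import dataclass


@dataclass
class PdfProfile:
    """Hand-filled description of a PDF; nothing here is read from the file."""
    has_text_layer: bool      # False for scanned, image-only PDFs
    image_heavy: bool         # more than "a picture now and then"
    has_complex_tables: bool
    has_tight_layout: bool
    size_mb: float


def suggest_workflow(pdf: PdfProfile,
                     source_available: bool = False,
                     max_size_mb: float = 3.0) -> str:
    """Pick a workflow label following the checklist in this post.

    max_size_mb is an assumed cut-off for "several MB", not a documented limit.
    """
    # Step 1: editable source files always win when the client has them.
    if source_available:
        return "use_source_files"

    # The CAT tool's PDF filter is only worth trying if every condition holds.
    suits_filter = (
        pdf.has_text_layer
        and not pdf.image_heavy
        and not pdf.has_complex_tables
        and not pdf.has_tight_layout
        and pdf.size_mb <= max_size_mb
    )
    # Step 2 fallback: run OCR separately, clean the formatting, then translate.
    return "cat_pdf_filter" if suits_filter else "ocr_then_clean"


simple = PdfProfile(True, False, False, False, 0.5)
scanned = PdfProfile(False, True, True, True, 12.0)
print(suggest_workflow(simple))
print(suggest_workflow(scanned))
```

The function only encodes the decision; it does not inspect or convert anything, so the actual filter or OCR run still happens in whatever tool you use.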


 

Sushan Harshe
India
Local time: 19:39
English to Hindi
In Studio2009, it works as follows May 10, 2010

Hi Miroslav,

It is very simple to open a .pdf file in Studio 2009.

[img]http://www.public.fotki.com/legalads/pdf-to-studio/1.html[/img]

[img]http://www.public.fotki.com/legalads/pdf-to-studio/2.html[/img]

[img]http://www.public.fotki.com/legalads/pdf-to-studio/3.html[/img]

[img]http://www.public.fotki.com/legalads/pdf-to-studio/5.html[/img]

[img]http://www.public.fotki.com/legalads/pdf-to-studio/6.html[/img]

The links to the snapshots of the process are above, but I don't know why the snapshots, which I took and uploaded especially for you, are not showing.

Anyway, it's a public album!

Regards,

Sushan








[Edited at 2010-05-10 11:20 GMT]

[Edited at 2010-05-10 11:27 GMT]


 

Miroslav Jeftic  Identity Verified
Local time: 16:09
English to Serbian
TOPIC STARTER
:) May 10, 2010

Thanks, Stanislav! I guess it is as I thought: we are still far from good support for 10 MB+ of scanned pages, unfortunately. :)

 

Kristyna Marrero  Identity Verified
United States
Local time: 10:09
Try the latest version of WORDFAST ANYWHERE with support for scanned PDFs Apr 12, 2011

Hi Miroslav,

Last week, we released a new version of Wordfast Anywhere which features support for scanned PDFs. Using server-side OCR technology, translators have the ability to upload and convert scanned PDFs to RTF for translation.

Wordfast Anywhere is the world's leading web-based translation memory tool. It is offered free to all translators. As always, all content that you upload remains completely confidential inside of your private, password-protected workspace. We invite you to try Wordfast Anywhere today at www.FreeTM.com.

Hope this helps,

Kristyna Marrero
Director, Sales & Marketing


 

Miroslav Jeftic  Identity Verified
Local time: 16:09
English to Serbian
TOPIC STARTER
:) Apr 12, 2011

Hi Kristyna,

Actually, I tried Wordfast Anywhere a few days ago, I think, and while it was OK with the simpler PDFs I uploaded, as soon as I tried one of my "difficult" ones it returned a conversion error. :)


 

Michal Glowacki  Identity Verified
Poland
Local time: 16:09
Member (2010)
English to Polish
CATs don't like PDFs Apr 13, 2011

As far as I know, even if a current CAT tool "handles" PDFs, the best you can get is the same result as when using your own OCR or a TXT copy of the text. I wouldn't expect this to change any time soon. And no wonder: we need to remember that PDF was designed as a final, fixed-layout format, not one meant to be edited. I think most boasting about PDF handling is just marketing and sales talk, which crumbles easily when put to real use.

 

Miroslav Jeftic  Identity Verified
Local time: 16:09
English to Serbian
TOPIC STARTER
:) Apr 13, 2011

Michal Glowacki wrote:

As far as I know, even if a current CAT tool "handles" PDFs, the best you can get is the same result as when using your own OCR or a TXT copy of the text. I wouldn't expect this to change any time soon. And no wonder: we need to remember that PDF was designed as a final, fixed-layout format, not one meant to be edited. I think most boasting about PDF handling is just marketing and sales talk, which crumbles easily when put to real use.


Fully agree. :)


 

