Problem with tags while translating a PDF with Studio 2009
Thread poster: Johanna von der Vring

Johanna von der Vring  Identity Verified
Germany
Local time: 09:38
Member (2006)
Italian to German
+ ...
May 6, 2013

Hi,

I'm translating a PDF that is full of photos with Studio 2009. I checked that the "without tags" option is switched on. Nevertheless, the text is full of tags and I cannot get rid of them. Translating it with the tags is too much work. Is there any way to remove them?

Best regards
Johanna von der Vring


 

SDL Community  Identity Verified
United Kingdom
Local time: 09:38
English
You need to correct the source file May 6, 2013

Hi,

The problem with PDF files is that, depending on how they were created, the OCR tool used to extract the text will often scatter tags all over the place to represent what it thinks it's looking at... sometimes nothing more than different kerning between the letters.

What you can try is this:

1. Save the source file from Studio; this will give you a Word file
2. Open the Word file in Word and press Ctrl+A, then Ctrl+spacebar
3. Save it and open this file in Studio instead

This often removes most of the offending and unnecessary tags without losing the formatting of the text.
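
As a rough illustration of why the steps above cut down the tags, here is a small Python sketch. It is a simplified model only, not how Word or Studio actually store text: the `Run` class, the `NOISE_PROPERTIES` set and the function names are all made up for this example. The idea is that every stretch of text with its own character formatting is a separate "run", and once the tiny differences (such as kerning) are gone, neighbouring runs become identical and no longer need tags between them.

```python
from dataclasses import dataclass, field

# Properties treated as visually insignificant in this sketch.
# This set is an assumption for illustration, not taken from Word or Studio.
NOISE_PROPERTIES = {"kerning", "spacing"}


@dataclass
class Run:
    """A stretch of text sharing one set of character formatting (hypothetical model)."""
    text: str
    props: dict = field(default_factory=dict)


def strip_noise(run):
    """Return a copy of the run without the insignificant properties."""
    kept = {k: v for k, v in run.props.items() if k not in NOISE_PROPERTIES}
    return Run(run.text, kept)


def merge_runs(runs):
    """Drop noise properties, then join neighbouring runs whose formatting is now equal."""
    merged = []
    for run in map(strip_noise, runs):
        if merged and merged[-1].props == run.props:
            merged[-1] = Run(merged[-1].text + run.text, run.props)
        else:
            merged.append(run)
    return merged


def count_formatted_runs(runs):
    """Count runs carrying any formatting; in this model each one would need a tag pair."""
    return sum(1 for run in runs if run.props)


if __name__ == "__main__":
    word = [Run("Trans", {"spacing": 2}), Run("la", {"spacing": 3}), Run("tion")]
    print(count_formatted_runs(word), "formatted runs before")
    print(count_formatted_runs(merge_runs(word)), "formatted runs after")
```

In this model, formatting that is not in the noise set (bold, for example) keeps its own run, so it would still show up as a tag.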

If that doesn't help there's a really good article here with some other ideas to try: http://goo.gl/yvvFP

Regards

Paul


 

Siegfried Armbruster  Identity Verified
Germany
Local time: 09:38
Member (2004)
English to German
+ ...
Translating PDFs in Studio or in any other CAT tool is not a good idea May 6, 2013

If you want to learn how to prepare PDFs, a good start might be to attend my webinar on OCR with FineReader 11 (for more info see: http://alexandria-library.com/2013/04/01/ocr-finereader-11/).

PDFs need to be converted to get optimal results, and depending on the type of PDF, either FineReader 11, Acrobat Pro XI or Infix PDF Editor will do the best job.


 

Johanna von der Vring  Identity Verified
Germany
Local time: 09:38
Member (2006)
Italian to German
+ ...
TOPIC STARTER
Is there a better solution in Studio 2011? May 6, 2013

Does this problem also exist in Studio 2011? If there is a better solution there, that might be a good reason to upgrade.

 

SDL Community  Identity Verified
United Kingdom
Local time: 09:38
English
I reckon... May 7, 2013

... any CAT tool, apart from translating directly in Word, will have this problem. The tags are there... it is not a fault in the tool. That is why the solution is to correct the source first. There is a small utility called Codezapper that can also remove the tags, and DVX (another CAT tool) runs this tool before the file is processed, so the file would be cleaner there. But this is still changing the source file to clean it up before you translate it.
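
To give an idea of what "cleaning the source file before you translate it" can mean at file level, here is a minimal Python sketch. It is not how Codezapper or DVX work. It only assumes that a .docx is a zip archive whose main text sits in `word/document.xml`, encoded as UTF-8, and that character spacing and kerning are stored as `w:spacing` and `w:kern` elements inside `w:rPr`. The function names are invented for this example, and it should only ever be run on a copy of the file.

```python
import re
import zipfile

# Run-properties blocks, and the spacing/kerning elements inside them.
# Restricting the search to <w:rPr> leaves paragraph spacing (in <w:pPr>) alone.
RPR_BLOCK = re.compile(r"<w:rPr>.*?</w:rPr>", re.DOTALL)
NOISE_ELEMENT = re.compile(r"<w:(?:spacing|kern)\b[^>]*/>")


def strip_spacing_from_xml(xml):
    """Remove character spacing and kerning elements from every <w:rPr> block."""
    return RPR_BLOCK.sub(lambda m: NOISE_ELEMENT.sub("", m.group(0)), xml)


def clean_docx(src_path, dst_path):
    """Copy a .docx, rewriting only word/document.xml (hypothetical helper)."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")  # assumption: UTF-8 encoded part
                data = strip_spacing_from_xml(text).encode("utf-8")
            dst.writestr(item, data)
```

A call such as `clean_docx("source.docx", "source_clean.docx")` (both file names are placeholders) writes a cleaned copy. The sketch only removes the differences between runs; it does not merge the runs themselves, so how many tags actually disappear will depend on the tool that opens the file afterwards.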

Studio 2011 will be no different.

Regards

Paul


 

