Can I ignore tags in Studio 2009 and save a target file?
Thread poster: Vincent Lemma

Vincent Lemma  Identity Verified
Italy
Member (2008)
Italian to English
Jul 22, 2013

Hello, I have some medium-sized files converted from PDF via OCR, and the conversion has filled them with tags that seem "useless". By this I mean that the PDF has no particular text formatting or styling, just standard Arial text at a single size.
Can I ignore the hundreds of tags to speed things up, or are there any shortcuts to streamline this work?
Sorry if this has been covered before, but I am unable to find the related posts.

Thanks so much,

Vince


 

Bernard Lieber  Identity Verified
English to French
CodeZapper Jul 22, 2013

Hi Vincent,

Use CodeZapper (http://asap-traduction.com/CodeZapper) to clean the file before translating it; it will make life a lot easier for you.

HTH,

Bernard

[Edited at 2013-07-22 10:40 GMT]


 

FarkasAndras
English to Hungarian
Alternative to CodeZapper Jul 22, 2013

You can also remove some or all of the formatting manually quite easily.

If the text is all same-size Arial, you can select the entire document and set the font and font size. OCR software often inserts extra-wide and extra-narrow spaces into documents for some unfathomable reason; these can also be fixed. If all you need is uniform running text without any formatting whatsoever, you can copy-paste your text into a plain-text (.txt) file, open a new Word file and paste the text back into it. This is the only way to remove all tags, but it also removes all formatting, including bold, italic, text alignment, etc.
Obviously, these things need to be done before processing the file with Trados.
Also, if you're the one doing the OCR, you can set the OCR software to produce cleaner, simpler text. Usually, there are several settings ranging from plain text (easy to process but looks nothing like the original visually) all the way to very faithfully rendered but horribly tag-ridden text.

[Edited at 2013-07-22 13:05 GMT]
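For anyone who would rather script the plain-text route described above than copy-paste by hand, here is a minimal Python sketch (standard library only). It assumes the OCR output has been saved as a .docx file. The script name `docx_to_txt.py` and the function `docx_to_plain_text` are hypothetical and not part of Trados, Studio or CodeZapper. Like the copy-paste trick, it throws away all formatting, including bold, italic and alignment. It also assumes the odd spacing appears as Unicode width spaces (U+2000 to U+200A); spacing applied as character formatting is simply dropped with everything else. Text boxes, headers, footers and footnotes are not handled.

```python
import re
import sys
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the main WordprocessingML part (word/document.xml).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Assumption: the "extra-wide and extra-narrow spaces" are Unicode width
# spaces (U+2000 EN QUAD through U+200A HAIR SPACE). Spacing applied as
# character formatting needs no handling: it is dropped with everything else.
ODD_SPACES = re.compile("[\u2000-\u200a]")


def docx_to_plain_text(path) -> str:
    """Return the body text of a .docx file, one line per paragraph.

    All formatting (fonts, bold, italic, alignment) is discarded, which is
    what gets rid of the tags. Text boxes, headers, footers and footnotes
    are not handled.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    lines = []
    for para in root.iter(W + "p"):
        parts = []
        for run in para.iter(W + "r"):
            for child in run:  # direct children of the run only
                if child.tag == W + "t" and child.text:
                    parts.append(child.text)
                elif child.tag == W + "tab":
                    parts.append("\t")
                elif child.tag in (W + "br", W + "cr"):
                    parts.append("\n")
        text = ODD_SPACES.sub(" ", "".join(parts))
        # Collapse the runs of ordinary spaces that OCR output often contains.
        lines.append(re.sub(" {2,}", " ", text).strip())
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical usage: python docx_to_txt.py ocr_output.docx clean.txt
    src, dst = sys.argv[1], sys.argv[2]
    with open(dst, "w", encoding="utf-8") as out:
        out.write(docx_to_plain_text(src))
```

The resulting .txt file can then be opened in Word and saved as a new document before it goes into Studio, which mirrors the manual copy-paste route.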


 

