‘Unbreaker’ (new tool in TransTools to remove spurious line endings from OCRd/converted text)!
Thread poster: Michael Beijer

Michael Beijer  Identity Verified
United Kingdom
Local time: 11:03
Member (2009)
Dutch to English
Jun 29, 2014

Do you often have a lot of incorrect line breaks in OCRd and/or converted texts?

You can now easily get rid of them with ‘Unbreaker’, a new tool in TransTools!

-> http://www.translatortools.net/word-unbreaker.html

I made a quick screencast showing how it works:

http://wordbook.nl/screencasts/Unbreaker-(new-tool-in-TransTools-to-remove-spurious-line-endings-from-e.g.-OCRd-documents).mp4


Unbreaker-screencast.png


Michael


 

Samuel Murray  Identity Verified
Netherlands
Local time: 12:03
Member (2006)
English to Afrikaans
How does it compare to PlusTools? Jun 29, 2014

Michael Beijer wrote:
Do you often have a lot of incorrect line breaks in OCRd and/or converted texts? You can now easily get rid of them with ‘Unbreaker’, a new tool in TransTools!


That's nice. PlusTools has had a similar function for years. In PlusTools, go to Tools > C[o]nv[ert] > Recreate paragraphs in current doc[ument]. How do the Unbreaker tool's results compare to PlusTools' results?


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 11:03
Member (2009)
Dutch to English
TOPIC STARTER
some more info Jun 29, 2014

Hi Samuel,

I’m not really sure. I only used PlusTools once or twice, a long time ago. I wouldn’t even know where to download it anymore, whereas Stanislav (the developer of Unbreaker/TransTools) is extremely active and enthusiastic about his tool.

Unbreaker also has quite a few useful settings:


Unbreaker-settings1.png


+

Unbreaker-settings2.png


We are also discussing it on the CafeTran mailing list:

https://groups.google.com/forum/?fromgroups=#!topic/cafetranslators/p3oMlbQeWaM

¬¬¬

TransTools is definitely a tool to keep an eye on. I think it has amazing potential.

I also just read the following in the ‘Translator Tools Newsletter’ this morning (for registered users of TransTools Professional Edition):


TransTools-newsletter1.png


+


TransTools-newsletter2.png


+


TransTools-newsletter3.png


Michael


 

Anton Konashenok  Identity Verified
Czech Republic
Local time: 12:03
English to Russian
OCR Jun 29, 2014

Speaking of extra line and page breaks in OCRed text, a good OCR program will have an option to remove them. For example, ABBYY FineReader does.

 

Michael Beijer  Identity Verified
United Kingdom
Local time: 11:03
Member (2009)
Dutch to English
TOPIC STARTER
ABBYY FineReader Jun 29, 2014

Hi Anton,

That’s interesting. Can you tell me how to do it in ABBYY FineReader 12?

Incidentally, the documents in my screenshots were just something I found online and was in the process of aligning.

Michael


 

Anton Konashenok  Identity Verified
Czech Republic
Local time: 12:03
English to Russian
Finereader Jul 7, 2014

Michael, all you need to do in FineReader is uncheck two boxes, "Keep page breaks" and "Keep line breaks", on the Save tab of the Options dialogue.

 

Michael Beijer  Identity Verified
United Kingdom
Local time: 11:03
Member (2009)
Dutch to English
TOPIC STARTER
¬¬¬ Jul 7, 2014

Hi Anton,

I'll have to do a little testing. If I do this, might I inadvertently remove valid page and line breaks, that is, ones I would like to keep?

Michael


 

Chunyi Chen
United States
Local time: 03:03
English to Chinese
Thanks for sharing this information Jul 9, 2014

Hi Michael,

I purchased the TransTools Professional Edition ($20 for installing the program on up to three machines) after reading your user feedback here and visiting the TransTools website. So far I have only tried two or three features and am very happy with this program. I thought I would need to go through the readme files to know how to use the features, but it turns out the onscreen instructions were quite easy to follow. I highly recommend this tool to those who need to tweak the source files for better importing results in their CAT environment, and to those who still work with Trados files and want to see better which segments are new, which are fuzzy, etc.
This is another nice addition after my ApSIC Xbench investment!

Chun-yi


 

Anton Konashenok  Identity Verified
Czech Republic
Local time: 12:03
English to Russian
"Valid" breaks Jul 9, 2014

Michael, if I remember correctly, line breaks between paragraphs (as opposed to breaks between lines within the same paragraph) will be retained anyway. As to page breaks, either you keep them or you don't; I don't see how a page break can be "valid" or "invalid".
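
To illustrate the distinction described above (line breaks inside a paragraph versus breaks between paragraphs), here is a minimal Python sketch. It is not how Unbreaker, PlusTools or FineReader work internally. It simply assumes that paragraphs are separated by at least one blank line and that a line ending in a hyphen followed by a lower-case letter is a word split across lines. The function name `unbreak` is made up for this example.

```python
import re


def unbreak(text: str) -> str:
    """Join lines inside each paragraph; keep blank-line paragraph breaks.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split on blank lines (which may contain stray whitespace).
    paragraphs = re.split(r"\n\s*\n", text)

    result = []
    for para in paragraphs:
        # Drop empty lines and trim whitespace around each line.
        lines = [line.strip() for line in para.split("\n") if line.strip()]
        if not lines:
            continue

        merged = lines[0]
        for line in lines[1:]:
            if merged.endswith("-") and line[:1].islower():
                # Heuristic: "docu-" + "ment" is treated as one split word.
                merged = merged[:-1] + line
            else:
                # Otherwise the line end is replaced by a single space.
                merged += " " + line
        result.append(merged)

    # Paragraphs are written back with one blank line between them.
    return "\n\n".join(result)


if __name__ == "__main__":
    sample = (
        "This is a docu-\nment that was\nscanned.\n\n"
        "Second para-\ngraph here."
    )
    print(unbreak(sample))
```

Note that the hyphen rule is deliberately crude: a genuinely hyphenated compound that happens to break at the hyphen (for example "well-" followed by "known") would lose its hyphen, so results need checking by eye. Documents without blank lines between paragraphs need a different rule, such as looking at sentence-final punctuation or line length.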

 

