‘Unbreaker’ (new tool in TransTools to remove spurious line endings from OCRd/converted text)!
Thread poster: Michael Beijer

Michael Beijer  Identity Verified
United Kingdom
Local time: 00:57
Member (2009)
Dutch to English
+ ...
Jun 29, 2014

Do you often have a lot of incorrect line breaks in OCRd and/or converted texts?

You can now easily get rid of them with ‘Unbreaker’, a new tool in TransTools!

-> http://www.translatortools.net/word-unbreaker.html

I made a quick screencast showing how it works:

http://wordbook.nl/screencasts/Unbreaker-(new-tool-in-TransTools-to-remove-spurious-line-endings-from-e.g.-OCRd-documents).mp4


Unbreaker-screencast.png


Michael


 

Samuel Murray  Identity Verified
Netherlands
Local time: 01:57
Member (2006)
English to Afrikaans
+ ...
How does it compare to PlusTools? Jun 29, 2014

Michael Beijer wrote:
Do you often have a lot of incorrect line breaks in OCRd and/or converted texts? You can now easily get rid of them with ‘Unbreaker’, a new tool in TransTools!


That's nice. PlusTools has had a similar function for years. In PlusTools, go to Tools > C[o]nv[ert] > Recreate paragraphs in current doc[ument]. How do the Unbreaker tool's results compare to PlusTools' results?


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 00:57
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
some more info Jun 29, 2014

Hi Samuel,

I’m not really sure. I only used PlusTools once or twice, a long time ago. I wouldn’t even know where to download it anymore, whereas Stanislav (the developer of Unbreaker/TransTools) is extremely active and enthusiastic about his tool.

Unbreaker also has quite a few useful settings:


Unbreaker-settings1.png


+

Unbreaker-settings2.png


We are also discussing it over in the CafeTran mailing list:

https://groups.google.com/forum/?fromgroups=#!topic/cafetranslators/p3oMlbQeWaM

¬¬¬

TransTools is definitely a tool to keep an eye on. I think it has amazing potential.

I also just read the following in the ‘Translator Tools Newsletter’ this morning (for registered users of TransTools Professional Edition):


TransTools-newsletter1.png


+


TransTools-newsletter2.png


+


TransTools-newsletter3.png


Michael


 

Anton Konashenok  Identity Verified
Czech Republic
Local time: 01:57
English to Russian
+ ...
OCR Jun 29, 2014

Speaking of extra line and page breaks in OCRed text, a good OCR program will have an option to remove them. For example, ABBYY FineReader does.

 

Michael Beijer  Identity Verified
United Kingdom
Local time: 00:57
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
ABBYY FineReader Jun 29, 2014

Hi Anton,

That’s interesting. Can you tell me how to do it in ABBYY FineReader 12?

Incidentally, the documents in my screenshots were just something I found online and was in the process of aligning.

Michael


 

Anton Konashenok  Identity Verified
Czech Republic
Local time: 01:57
English to Russian
+ ...
FineReader Jul 7, 2014

Michael, all you need to do in FineReader is uncheck two boxes, "Keep page breaks" and "Keep line breaks", on the Save tab of the Options dialogue.

 

Michael Beijer  Identity Verified
United Kingdom
Local time: 00:57
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
¬¬¬ Jul 7, 2014

Hi Anton,

I'll have to do a little testing. If I do this, might I inadvertently remove valid page and line breaks, that is, ones that I would like to keep?

Michael


 

Chunyi Chen
United States
Local time: 16:57
English to Chinese
Thanks for sharing this information Jul 9, 2014

Hi Michael,

I purchased the TransTools Professional Edition ($20 for installing the program on up to three machines) after reading your user feedback here and visiting the TransTools website. So far I have only tried two or three features and am very happy with this program. I thought I would need to go through the readme files to know how to use the features, but it turns out the onscreen instructions were quite easy to follow. I highly recommend this tool to those who need to tweak the source files for better importing results in their CAT environment, and to those who still work with Trados files and want to see better which segments are new, which are fuzzy, etc.
This is another nice addition after my ApSIC Xbench investment!

Chun-yi


 

Anton Konashenok  Identity Verified
Czech Republic
Local time: 01:57
English to Russian
+ ...
"Valid" breaks Jul 9, 2014

Michael, if I remember correctly, line breaks between paragraphs (as opposed to lines within the same paragraph) will be retained anyway. As to page breaks, either you keep them or you don't; I don't see how a page break may be "valid" or "invalid".
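
To illustrate the distinction between the two kinds of line break, here is a minimal Python sketch of the general idea. It is not how FineReader, Unbreaker or PlusTools actually work; it simply assumes that paragraphs are separated by one or more blank lines and that any single line break inside a paragraph is spurious. The function name `unbreak` is hypothetical.

```python
import re


def unbreak(text: str) -> str:
    """Join lines within each paragraph; keep paragraph breaks."""
    # Normalise Windows/old Mac line endings so the split below is reliable.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: one or more blank lines mark a real paragraph boundary.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        # Treat every remaining line break as spurious: join with one space.
        lines = [line.strip() for line in para.split("\n") if line.strip()]
        joined.append(" ".join(lines))
    return "\n\n".join(joined)


sample = "This sentence was\nbroken by OCR.\n\nA new paragraph\nstarts here."
print(unbreak(sample))
```

Real OCRd or converted documents often signal paragraphs by indentation or end-of-sentence punctuation rather than blank lines, so a rule this simple would need adjusting for them.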

 

