‘Unbreaker’ (new tool in TransTools to remove spurious line endings from OCRd/converted text)!
Thread poster: Michael Joseph Wdowiak Beijer

Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 18:10
Member (2009)
Dutch to English
+ ...
Jun 29, 2014

Do you often have a lot of incorrect line breaks in OCRd and/or converted texts?

You can now easily get rid of them with ‘Unbreaker’, a new tool in TransTools!

-> http://www.translatortools.net/word-unbreaker.html

I made a quick screencast showing how it works:

http://wordbook.nl/screencasts/Unbreaker-(new-tool-in-TransTools-to-remove-spurious-line-endings-from-e.g.-OCRd-documents).mp4




Michael



Samuel Murray  Identity Verified
Netherlands
Local time: 19:10
Member (2006)
English to Afrikaans
+ ...
How does it compare to PlusTools? Jun 29, 2014

Michael Beijer wrote:
Do you often have a lot of incorrect line breaks in OCRd and/or converted texts? You can now easily get rid of them with ‘Unbreaker’, a new tool in TransTools!


That's nice. PlusTools has had a similar function for years. In PlusTools, go to Tools > C[o]nv[ert] > Recreate paragraphs in current doc[ument]. How do the Unbreaker tool's results compare to PlusTools' results?



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 18:10
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
some more info Jun 29, 2014

Hi Samuel,

I’m not really sure. I only used PlusTools once or twice, a long time ago. I wouldn’t even know where to download it anymore, whereas Stanislav (the developer of Unbreaker/TransTools) is extremely active and enthusiastic about his tool.

Unbreaker also has quite a few useful settings:


[Two screenshots of the Unbreaker settings, hosted on Wordbook.nl]


We are also discussing it over in the CafeTran mailing list:

https://groups.google.com/forum/?fromgroups=#!topic/cafetranslators/p3oMlbQeWaM

¬¬¬

TransTools is definitely a tool to keep an eye on. I think it has amazing potential.

I also just read the following in the ‘Translator Tools Newsletter’ this morning (for registered users of TransTools Professional Edition):


[Three images from the newsletter, hosted on Wordbook.nl]


Michael



Anton Konashenok  Identity Verified
Czech Republic
Local time: 19:10
English to Russian
+ ...
OCR Jun 29, 2014

Speaking of extra line and page breaks in OCRed text, a good OCR program will have an option to remove them. ABBYY FineReader, for example, does.


Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 18:10
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
ABBYY FineReader Jun 29, 2014

Hi Anton,

That’s interesting. Can you tell me how to do it in ABBYY FineReader 12?

Incidentally, the documents in my screenshots were just something I found online and was in the process of aligning.

Michael



Anton Konashenok  Identity Verified
Czech Republic
Local time: 19:10
English to Russian
+ ...
FineReader Jul 7, 2014

Michael, all you need to do in FineReader is uncheck two boxes, "Keep page breaks" and "Keep line breaks", on the Save tab of the Options dialogue.


Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 18:10
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
¬¬¬ Jul 7, 2014

Hi Anton,

I'll have to do a little testing. If I do this, might I inadvertently remove valid page and line breaks, that is, ones that I would like to keep?

Michael



Chunyi Chen
United States
Local time: 10:10
English to Chinese
Thanks for sharing this information Jul 9, 2014

Hi Michael,

I purchased the TransTools Professional Edition ($20 for installing the program on up to three machines) after reading your user feedback here and visiting the TransTools website. So far I have only tried two or three features and am very happy with this program. I thought I would need to go through the readme files to know how to use the features, but it turns out the onscreen instructions were quite easy to follow. I highly recommend this tool to those who need to tweak the source files for better importing results in their CAT environment, and to those who still work with Trados files and want to see better which segments are new, which are fuzzy, etc.
This is another nice addition after my ApSIC Xbench investment!

Chun-yi



Anton Konashenok  Identity Verified
Czech Republic
Local time: 19:10
English to Russian
+ ...
"Valid" breaks Jul 9, 2014

Michael, if I remember correctly, line breaks between paragraphs (as opposed to breaks between lines within the same paragraph) will be retained anyway. As to page breaks, either you keep them or you don't; I don't see how a page break may be "valid" or "invalid".
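
For anyone who wants to try the same idea on plain text exported from an OCR program, here is a minimal Python sketch of the distinction described above: breaks inside a paragraph are joined, breaks between paragraphs are kept. It is not how Unbreaker, PlusTools or FineReader work internally. The function name `unbreak` is made up for illustration, and it assumes that paragraphs are separated by at least one blank line and that a trailing hyphen followed by a lowercase letter marks a word split across lines.

```python
import re


def unbreak(text: str) -> str:
    """Join lines inside each paragraph; keep blank-line paragraph breaks."""
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: one or more blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        lines = [line.strip() for line in para.split("\n") if line.strip()]
        out = ""
        for line in lines:
            if out.endswith("-") and line[:1].islower():
                # Assumption: hyphen + lowercase start = word split by a line break.
                out = out[:-1] + line
            elif out:
                out += " " + line
            else:
                out = line
        joined.append(out)
    return "\n\n".join(joined)


sample = "This is a para-\ngraph that was\nbroken by OCR.\n\nSecond paragraph\nhere."
print(unbreak(sample))
```

Note that the hyphen rule is crude: a genuine compound that happens to be split at its hyphen (e.g. "well-" / "known") would be glued together, so the result still needs a quick check by eye.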


