
Connecting Line Breaks or Paragraphs in MS Word
Thread poster: Ali Bayraktar
Ali Bayraktar  Identity Verified
Turkey
Member (2007)
English to Turkish
+ ...
Feb 4, 2009

Hi Colleagues,

I have a scanned book of 400 pages, converted to MS Word.
However, the converter kept the line breaks exactly as they were on the original pages,
so the lines do not reflow to fit the line length of the Word page.
Is there an easy way to join the lines so that each sentence runs continuously from one full stop (.) to the next?

Thanks in advance for your help,

M. Ali

[Edited at 2009-02-04 14:20 GMT]


Terry Richards
France
Local time: 01:06
French to English
+ ...
Use Find/replace Feb 4, 2009

Change all manual line breaks (put ^l in the "from" box) to spaces.

If that doesn't do it, the lines are probably terminated with paragraph marks (^p) so change them to spaces.

For extra credit, if your real paragraphs are separated by a blank line, change all double line breaks (^l^l) to paragraph marks (^p) and then change all remaining single line breaks (^l) to spaces.

You may have to experiment a bit to find out exactly how the lines are terminated, but some combination of these should do it for you.
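
For anyone who would rather do this outside Word, the "extra credit" idea can be sketched in Python on a plain-text export. This is only an illustration under assumptions: the text has been saved as plain text, a blank line separates real paragraphs, and every other line break is just wrapping. The function name `unwrap` is made up for this example.

```python
import re

def unwrap(text: str) -> str:
    """Join wrapped lines, keeping blank-line-separated paragraphs."""
    # Assumption: two or more newlines in a row mark a real paragraph gap.
    paragraphs = re.split(r"\n{2,}", text.strip())
    # Inside each paragraph, every remaining line break becomes one space.
    joined = [" ".join(line.strip() for line in p.splitlines()) for p in paragraphs]
    # One newline per paragraph, like a single paragraph mark in Word.
    return "\n".join(joined)

sample = "This sentence was\nbroken by the converter.\n\nThis is a new\nparagraph."
print(unwrap(sample))
```

If the converter used something other than a blank line between paragraphs, the split pattern would need to change accordingly.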

Terry.


आनंद  Identity Verified
Local time: 04:36
English to Hindi
Replace "Paragraph Mark" with single space Feb 4, 2009

You can use the Find and Replace feature of MS Word to replace each paragraph mark with a single space.

Go to Replace, click More, click Special, and select Paragraph Mark. The paragraph mark symbol (^p) will appear in the Find box; put just a space in the Replace box. Now you can replace all the marks in one go.

dubsur


Ali Bayraktar  Identity Verified
Turkey
Member (2007)
English to Turkish
+ ...
TOPIC STARTER
Thank you very much Feb 4, 2009

Dear Terry & Dubsur
Thank you very much for your kind help.
I have solved it with your help.

Thanks again.

Best Regards,

M. Ali



Christel Zipfel  Identity Verified
Partial member (2004)
Italian to German
+ ...
I use Autounbreak Feb 4, 2009

a smart little (free) tool that was suggested some time ago in another thread.

http://www.ghacks.net/2008/09/11/unbreak-copied-text-from-pdf-documents/

You will soon find it indispensable, like I do!



PAS  Identity Verified
Local time: 01:06
English to Polish
+ ...
Semi-primitive solution Feb 19, 2009

The problem with replacing all hard line breaks (^p) with a {space} is that the "real" hard line breaks, i.e. the ones which separate actual paragraphs in the text, also disappear.

The problem with MS Word is that you cannot use the special symbols (like ^p) simultaneously with wildcards.

So I finally stumbled on a semi-primitive solution which should not take very much time and which should be reasonably effective without upsetting the actual paragraph structure of the document extracted from the PDF.

The key is that the hard line break is NOT followed by a space before the word in the next line, so here's what you do:

in the search and replace dialogue you search for "^pa"
and replace it with " a" (i.e. {space}a ). Hit "replace all".
You then need to do this for every letter of the alphabet. "^pb" -> " b" and so on.

That's why this is "semi-primitive" - you need to repeat the step 20-odd times, depending on the alphabet you are using.
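
Outside Word, the same idea can be done in one pass with a regular expression instead of one replacement per letter. The sketch below is only an illustration and rests on the same assumption as above: a wrapped line starts directly with a letter, while a real paragraph starts with something else (an indent, for instance). It works on a plain-text export, and the function name `join_before_letters` is invented for this example.

```python
import re

def join_before_letters(text: str) -> str:
    # A newline directly followed by a letter is treated as a wrapped line.
    # [^\W\d_] matches any Unicode letter, so no per-letter loop is needed.
    # The lookahead (?=...) leaves the letter itself untouched.
    return re.sub(r"\n(?=[^\W\d_])", " ", text)

sample = "The quick brown\nfox jumps.\n  A new paragraph starts\nhere."
print(join_before_letters(sample))
```

The line break before the indented paragraph survives because it is followed by a space rather than a letter.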

Best,
Pawel Skalinski


Terry Richards
France
Local time: 01:06
French to English
+ ...
A slightly less-primitive solution Feb 19, 2009

1) Change all "^p " to a string that doesn't occur in the document (like "&^%$").

2) Change all (remaining) "^p" to spaces.

3) Change the string back to "^p".
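
The three steps above can also be sketched in Python for a plain-text export. This is a rough illustration, not a Word macro: `"\n "` stands in for "^p " and `"\n"` for "^p", and the function name `unwrap_with_placeholder` is made up for this example.

```python
PLACEHOLDER = "&^%$"  # the example string from the steps above

def unwrap_with_placeholder(text: str, real_break: str = "\n ") -> str:
    # The trick only works if the placeholder is not already in the text.
    if PLACEHOLDER in text:
        raise ValueError("placeholder already occurs in the text")
    text = text.replace(real_break, PLACEHOLDER)  # step 1: protect real breaks
    text = text.replace("\n", " ")                # step 2: join wrapped lines
    return text.replace(PLACEHOLDER, "\n")        # step 3: restore real breaks

sample = "Line one\ncontinues.\n Second para\ngoes on."
print(unwrap_with_placeholder(sample))
```

Passing a different `real_break` (for example `"\n\n"`) adapts the same steps to a converter that marks paragraph ends another way.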

T.



PAS  Identity Verified
Local time: 01:06
English to Polish
+ ...
But... Feb 19, 2009

Terry Richards wrote:

1) Change all "^p " to a string that doesn't occur in the document (like "&^%$").


But there is no space after the line break...? At least in the PDF-derived document I was testing this on.

P.A.S.


Terry Richards
France
Local time: 01:06
French to English
+ ...
In that case... Feb 19, 2009

...your solution wouldn't work either.

It all depends on how the PDF conversion formatted the line ends. Sometimes there's a space for real paragraph ends, or two line breaks, or two paragraph marks. You just have to experiment to get the combination you need. If there is *anything* different between line ends and paragraph ends, you can exploit it to get the formatting back.
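
One way to shorten the experimenting is to count how the line ends actually look before choosing a replacement. The snippet below is a small, hypothetical helper for a plain-text export; the function name `line_end_profile` and the labels it produces are invented for this example.

```python
import re
from collections import Counter

def line_end_profile(text: str) -> Counter:
    """Count each kind of line end: optional space, break(s), optional space."""
    counts = Counter()
    for match in re.finditer(r"( ?)(\n+)( ?)", text):
        before, breaks, after = match.groups()
        key = ("space+" if before else "") + f"{len(breaks)}xbreak" + ("+space" if after else "")
        counts[key] += 1
    return counts

sample = "one \ntwo\n\nthree\n four"
print(line_end_profile(sample))
```

A pattern that shows up only a few times compared with the rest is a good candidate for the real paragraph ends.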

T.



