Remove hard returns in pdf to word doc
Thread poster: Sonja Marks

Sonja Marks  Identity Verified
France
Local time: 00:52
Member (2006)
German to English
Apr 12, 2010

When I copy text from a PDF into a Word document, it of course has a hard return at the end of each line, and all the free converter programs I have found either do this too or use text boxes. Both of these approaches cause problems with CAT tools. Is there an easy way to remove all those irritating returns?

 

Tony M
France
Local time: 00:52
Member
French to English
Remove hard returns (in general) Apr 12, 2010

I've encountered the same problem under other circumstances.

What I do is this, using search-&-replace under Word:

To preserve genuine paragraph breaks, I first search for double paragraph breaks (^p^p) and replace them with a character that doesn't occur anywhere else in the text, such as §.

Then I replace all the remaining single paragraph breaks ^p with spaces.

Finally, I go back and replace each § with a single proper paragraph break (^p).

It takes longer to describe than to do!
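For anyone who prefers to do this outside Word, here is a minimal sketch of the same three-step search-and-replace in Python. It assumes the text has been saved as plain text with each paragraph mark (Word's ^p) as a single "\n"; the function name `remove_hard_returns` is just an illustrative choice, and § is used as the placeholder exactly as described above.

```python
PLACEHOLDER = "§"  # must not occur anywhere else in the text


def remove_hard_returns(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Join lines split by hard returns while keeping real paragraph breaks.

    Assumption: each Word paragraph mark (^p) is represented as "\n".
    """
    if placeholder in text:
        raise ValueError("placeholder character already occurs in the text")
    # Step 1: protect genuine paragraph breaks (^p^p -> §)
    text = text.replace("\n\n", placeholder)
    # Step 2: turn every remaining single break (^p) into a space
    text = text.replace("\n", " ")
    # Step 3: restore each protected break as one paragraph break (§ -> ^p)
    return text.replace(placeholder, "\n")


sample = "This line was\nbroken by the PDF.\n\nSecond paragraph\nhere."
print(remove_hard_returns(sample))
# This line was broken by the PDF.
# Second paragraph here.
```

The check for the placeholder at the start is there because, as noted above, the method only works if § (or whatever you choose) doesn't already appear in the text.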

[Edited at 2010-04-12 21:51 GMT]


 

Sergei Leshchinsky  Identity Verified
Ukraine
Local time: 01:52
Member (2008)
English to Russian
my way Apr 12, 2010

1) Replace double spaces with single spaces. This may take several passes.
2) Replace ^p with a space. This may take several passes.
3) Replace double spaces with ^p.
4) Replace ^L with a space.
Done.

(This always works in an ideal world... in practice, things may go wrong, so act accordingly and use logic.)

Sometimes, you need to...

5) Replace double spaces with single spaces again.
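The five steps above can be sketched in Python as follows, assuming plain-text input where each paragraph mark (^p) is "\n" and each manual line break (^L) is a vertical tab ("\v"), which is an assumption about how the text was exported. The function names `collapse_spaces` and `unbreak` are hypothetical.

```python
MANUAL_LINE_BREAK = "\v"  # assumption: a manual line break exported as a vertical tab


def collapse_spaces(text: str) -> str:
    # Repeat until no double spaces remain (the "several cycles")
    while "  " in text:
        text = text.replace("  ", " ")
    return text


def unbreak(text: str) -> str:
    text = collapse_spaces(text)                  # 1) double spaces -> single
    text = text.replace("\n", " ")                # 2) ^p -> space
    text = text.replace("  ", "\n")               # 3) double space -> ^p
    text = text.replace(MANUAL_LINE_BREAK, " ")   # 4) manual line break -> space
    return collapse_spaces(text)                  # 5) tidy up double spaces again


sample = "First line\nof text.\n\nNext\vparagraph  here."
print(repr(unbreak(sample)))  # 'First line of text.\nNext paragraph here.'
```

One of the "things that might happen": if a line ends with a trailing space before its hard return, step 2 produces a double space there, and step 3 turns it into a spurious paragraph break, so the result still needs a quick read-through.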

[Edited at 2010-04-12 19:51 GMT]


 

Pablo Bouvier  Identity Verified
Local time: 00:52
German to Spanish
Remove hard returns in pdf to word doc Apr 12, 2010

Sonja Marks wrote:

When I copy text from a PDF into a Word document, it of course has a hard return at the end of each line, and all the free converter programs I have found either do this too or use text boxes. Both of these approaches cause problems with CAT tools. Is there an easy way to remove all those irritating returns?


Try CodeZapper.


 

Hester Eymers  Identity Verified
Netherlands
Local time: 00:52
Member (2005)
English to Dutch
Autounbreak Apr 13, 2010

Or try Autounbreak: http://digital.hollmen.dk/products/autounbreak/index.htm
It's quite good (not perfect) and it's freeware.


 

Katherine Mérignac  Identity Verified
France
Local time: 00:52
Member (2004)
French to English
Thanks Apr 13, 2010

To all - these are really useful tips, so thank you!

K


 

Pablo Bouvier  Identity Verified
Local time: 00:52
German to Spanish
Remove hard returns in pdf to word doc Apr 13, 2010

Katherine Mérignac wrote:

To all - these are really useful tips, so thank you!

K



It was a pleasure, Katherine. Since I posted the shared link to Dave Turner's CodeZapper template, it has been downloaded more than 25 times. I am proud to belong to such a well-educated community. Thank you all.


 

