Search and replace in Word/PDF copied and pasted
Thread poster: Chris Lancaster

Chris Lancaster  Identity Verified
Spain
Local time: 16:58
Member
Spanish to English
Nov 7, 2007

Can anyone remind me how to use the S&R tool in Word to reformat text that has been copied and pasted from a PDF into Word and ends up as a 'single body' of text (i.e. not with the same format as the original)? I seem to recall it was a 3-part step involving a combination of search [space space space] replace with [^^^] or something like that... Thanks for any help.


GoodWords  Identity Verified
Mexico
Local time: 09:58
Spanish to English
+ ...
It depends what format deficiency you are trying to fix... Nov 7, 2007

.... Can you be more specific?

Typically, you have to tweak the paragraph breaks, and/or word divisions, but it may vary depending on how the original was formatted.



Chris Lancaster  Identity Verified
Spain
Local time: 16:58
Member
Spanish to English
TOPIC STARTER
Format Nov 7, 2007

Normally I go through the pasted doc 'tweaking' it manually, as you say, but I know there is a search and replace formula that can replicate the original format of the PDF. I just can't remember how... and it's a problem when you've got a huge document because of the time it takes...

xxxLia Fail  Identity Verified
Spain
Local time: 16:58
Spanish to English
+ ...
Not sure if you mean this? Nov 7, 2007

Christopher Lancaster wrote:

Can anyone remind me how to use the S&R tool in Word to reformat a PDF when the text is copied and pasted from the pdf into Word and ends up as a 'single body' of text (i.e. not with the same format as the original). I seem to recall it was a 3-part step involving a combination of search [space space space] replace with [^^^] or something like that... Thanks for any help.


First you have to identify what kind of reformatting has to be done.

E.g., if there is a paragraph mark at the end of every line (EOL), there will be two marks at the end of a paragraph (EOP), where a para mark is really meant. So you want to remove the EOL ones and keep the EOP ones.

S > para mark para mark
R > XXXXX
That will mark all the real paras

Now
S > para mark
R > space

Finally, to get back the paras
S > XXXXX
R > para mark

I hope I'm remembering that right:-)
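
For anyone who would rather run the same three passes outside Word, here is a minimal Python sketch. It assumes the pasted text has been saved as plain text, with "\n" standing in for Word's paragraph mark (entered as ^p in the Find and Replace dialog), and that the placeholder XXXXX does not already occur in the text. The function name is made up for illustration.

```python
def rebuild_paragraphs(text, placeholder="XXXXX"):
    """Three-pass replace: protect real paragraph breaks, join lines, restore breaks."""
    if placeholder in text:
        # The trick only works if the placeholder is unique in the document.
        raise ValueError("placeholder already occurs in the text; pick another one")
    # Pass 1: two line ends in a row mark a real paragraph break.
    text = text.replace("\n\n", placeholder)
    # Pass 2: every remaining line end is only a layout break, so make it a space.
    text = text.replace("\n", " ")
    # Pass 3: put the real paragraph breaks back.
    return text.replace(placeholder, "\n")


pasted = "First line\nof para one.\n\nSecond para\ncontinues here."
print(rebuild_paragraphs(pasted))
```

As in Word, the order of the passes matters: if the single marks were replaced first, the double marks would be gone before they could be protected.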

Another problem might be hyphens. There you just search for hyphens and replace with nothing, and a spellchecker will, hopefully, indicate the words where you need a hyphen.

You will still have to do a bit of manual tweaking and checking.



Gerard de Noord  Identity Verified
France
Local time: 16:58
Member (2003)
German to Dutch
+ ...
PlusTools Nov 7, 2007

You could download the free Word template PlusTools:
http://www.wordfast.net/index.php?whichpage=downloadpage&lang=engb

After installing it you can click on the PlusTools icon and open the +Tools/Cnv tab. There you'll find two buttons: one to import text from a PDF and one to "Recreate paragraphs in current doc". Have your document open in Word and click on Recreate.

PlusTools will intelligently try to remove all unnecessary line breaks (with superior advanced FR operations). A real time-saver.

Regards,
Gerard



Heinrich Pesch  Identity Verified
Finland
Local time: 17:58
Member (2003)
Finnish to German
+ ...
Text will often be messed up Nov 8, 2007

Very often the text copied from a PDF into Word will be in a different order from what you see in the PDF. Headers may appear after the body text, paragraphs may change order, page numbers may end up at the top instead of the bottom, and most image texts will just be thrown around.
When a PDF is created from a DTP file, the text is composed in the order of the stories, not in the order it appears on screen.
So it is generally not a good idea to copy from a PDF; use ABBYY FineReader instead. I have had really messy PDFs with three columns and all kinds of images and texts, and could convert them with FineReader so that it was possible to translate them in Word, and the customer could actually use the translation to update her original files.
The licence fee for FineReader is the best investment I ever made.

Only very simple PDFs can be copied in a reasonable way.

Cheers
Heinrich



Chris Lancaster  Identity Verified
Spain
Local time: 16:58
Member
Spanish to English
TOPIC STARTER
Thanks everyone Nov 8, 2007

Some very useful tips there... many thanks to you all.


GoodWords  Identity Verified
Mexico
Local time: 09:58
Spanish to English
+ ...
A tool to fix line breaks in a smart way Nov 16, 2007

I just learned about this freeware tool, AutoUnbreak, that fixes line breaks in a file converted from PDF. It will join up hyphenated words that were split by line breaks and will not put multiple items in a bulleted list onto the same line.
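
To give an idea of what that kind of clean-up involves, here is a rough Python sketch of the two behaviours described. It is not AutoUnbreak's actual method, which I have not seen. The bullet pattern and the function name are assumptions for illustration, and it treats every hyphen at the end of a line as a split word.

```python
import re

# Assumed bullet markers: "-", "*", "•" or a number followed by "." or ")".
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def unbreak(text):
    """Join wrapped lines, mend words split by a hyphen, keep list items apart."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            out.append("")                      # blank line: keep as a separator
        elif not out or out[-1] == "" or BULLET.match(line):
            out.append(stripped)                # new paragraph or new list item
        elif out[-1].endswith("-"):
            out[-1] = out[-1][:-1] + stripped   # word split across two lines
        else:
            out[-1] += " " + stripped           # ordinary wrapped line
    return "\n".join(out)


sample = "The trans-\nlation was\ndelivered.\n\n• first item\n• second item\nwraps here"
print(unbreak(sample))
```

Because genuine hyphens that happen to fall at the end of a line are also removed, a spellcheck afterwards is still worthwhile.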
