Search and replace in Word/PDF copied and pasted
Thread poster: Chris Lancaster
Chris Lancaster  Identity Verified
Spain
Member
Spanish to English
Nov 7, 2007

Can anyone remind me how to use the S&R tool in Word to reformat a PDF when the text is copied and pasted from the pdf into Word and ends up as a 'single body' of text (i.e. not with the same format as the original). I seem to recall it was a 3-part step involving a combination of search [space space space] replace with [^^^] or something like that... Thanks for any help.

 
Margaret Schroeder  Identity Verified
Mexico
Local time: 09:27
Spanish to English
It depends what format deficiency you are trying to fix... Nov 7, 2007

.... Can you be more specific?

Typically, you have to tweak the paragraph breaks, and/or word divisions, but it may vary depending on how the original was formatted.


 
Chris Lancaster  Identity Verified
Spain
Member
Spanish to English
TOPIC STARTER
Format Nov 7, 2007

Normally I go through the pasted doc 'tweaking' it manually, as you say, but I know there is a search and replace formula that can replicate the original format of the pdf. I just can't remember how... and on a huge document doing it by hand is a problem because of the time it takes.

 
Lia Fail (X)  Identity Verified
Spain
Local time: 17:27
Spanish to English
Not sure if you mean this? Nov 7, 2007

Christopher Lancaster wrote:

Can anyone remind me how to use the S&R tool in Word to reformat a PDF when the text is copied and pasted from the pdf into Word and ends up as a 'single body' of text (i.e. not with the same format as the original). I seem to recall it was a 3-part step involving a combination of search [space space space] replace with [^^^] or something like that... Thanks for any help.


First you have to identify what kind of reformatting has to be done.

E.g., if the pasted text has a paragraph mark at the end of every line (EOL), a real end of paragraph (EOP) will usually show up as two paragraph marks in a row. You want to remove the EOL marks and keep the EOP ones.

S > para mark para mark (^p^p in Word's Find and Replace)
R > XXXXX
That will mark all the real paras.

Now
S > para mark (^p)
R > space

Finally, to get back the paras
S > XXXXX
R > para mark (^p)

I hope I'm remembering that right:-)
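The three steps above can be sketched in Python on a plain string, assuming the pasted text uses "\n" where Word shows a paragraph mark and "\n\n" at real paragraph ends. The function name `recreate_paragraphs` is hypothetical, and the sketch only mimics the Word find/replace passes on a string; it does not touch a Word file. As in Word, the placeholder must be something that does not already occur in the text.

```python
def recreate_paragraphs(text: str, placeholder: str = "XXXXX") -> str:
    """Three-pass search and replace: mark real paragraphs, join lines, restore paragraphs."""
    if placeholder in text:
        # The marker would be confused with real text, so refuse to run.
        raise ValueError("placeholder already occurs in the text")
    # Pass 1: two paragraph marks in a row = a real paragraph end; mark it.
    text = text.replace("\n\n", placeholder)
    # Pass 2: every remaining single paragraph mark is just a line end; make it a space.
    text = text.replace("\n", " ")
    # Pass 3: turn the markers back into (single) paragraph marks.
    return text.replace(placeholder, "\n")


print(recreate_paragraphs("This is a line\nthat continues.\n\nNew para\nhere."))
```

The order matters: if the single paragraph marks were replaced first, the double ones would be destroyed too and the real paragraph ends could no longer be found.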

Another problem might be hyphens. There you just search for hyphens and replace them with nothing; the spellchecker will then, hopefully, flag the words where you actually need a hyphen.
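A minimal Python sketch of the hyphen step, assuming line breaks are "\n" and that it runs on the pasted text before the paragraph steps, while the line breaks are still there. It is a slightly narrower variant of the advice above: it deletes only a hyphen that sits immediately before a single line break, so hyphens in the middle of a line survive. A small word set stands in for the spellchecker; both function names and the word list are hypothetical.

```python
import re


def join_line_end_hyphens(text: str) -> str:
    # Delete a hyphen directly followed by one line break and then a non-space
    # character, joining the two halves of the word. "-\n\n" (end of a paragraph)
    # is left alone because the lookahead requires a non-space character.
    return re.sub(r"-\n(?=\S)", "", text)


def flag_unknown_words(text: str, known_words: set[str]) -> list[str]:
    # Stand-in for the spellchecker: list words not found in the word set.
    words = re.findall(r"[A-Za-z]+", text)
    return sorted({w for w in words if w.lower() not in known_words})


joined = join_line_end_hyphens("imple-\nmentation of a well-\nknown method")
print(joined)
print(flag_unknown_words(joined, {"implementation", "of", "a", "method", "well", "known"}))
```

As the post says, genuinely hyphenated words split at a line end ("well-known" here) lose their hyphen, which is exactly what the spellcheck pass is there to catch.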

You will still have to do a bit of manual tweaking and checking.


 
Gerard de Noord  Identity Verified
France
Local time: 17:27
Member (2003)
English to Dutch
PlusTools Nov 7, 2007

You could download the free Word template PlusTools:
http://www.wordfast.net/index.php?whichpage=downloadpage&lang=engb

After installing it you can click on the PlusTools icon and open the +Tools/Cnv tab. There you'll find two buttons: one to import text from a PDF and one to "Recreate paragraphs in current doc". Have your document open in Word and click on Recreate.

PlusTools will intelligently try to remove all unnecessary line breaks (using advanced find/replace operations). A real time-saver.

Regards,
Gerard


 
Heinrich Pesch  Identity Verified
Finland
Local time: 18:27
Member (2003)
Finnish to German
Text will often be messed up Nov 8, 2007

Very often the text copied from pdf to Word will be in a different order from what you see in the pdf. Headers may appear after the body text, paragraphs may change order, page numbers may end up at the top instead of the bottom, and most image texts will just be thrown around.
When a pdf is created from a DTP file, the text follows the order of the stories, not the order in which it appears on screen.
So it is generally better not to copy from the pdf but to use ABBYY FineReader instead. I have had really messy pdfs with three columns and all kinds of images and texts, and I could convert them with FineReader well enough to translate them in Word; the customer could actually use the translation to update her original files.
The license fee for FineReader is the best investment I ever made.

Only very simple pdfs can be copied in a reasonable way.

Cheers
Heinrich


 
Chris Lancaster  Identity Verified
Spain
Member
Spanish to English
TOPIC STARTER
Thanks everyone Nov 8, 2007

Some very useful tips there... many thanks to you all.

 
Margaret Schroeder  Identity Verified
Mexico
Local time: 09:27
Spanish to English
A tool to fix line breaks in a smart way Nov 16, 2007

I just learned about a freeware tool, AutoUnbreak, that fixes line breaks in a file converted from pdf. It joins up hyphenated words that were split by line breaks and does not put multiple items of a bulleted list onto the same line.
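How AutoUnbreak does this internally isn't described here, so the following is only a rough Python sketch of the two behaviours mentioned: joining words hyphenated across a line break and keeping each bullet item on its own line. It assumes blank lines mark real paragraph breaks and that bullets look like "-", "*", "•", "1." or "1)" followed by a space; the `unbreak` function and `BULLET` pattern are hypothetical.

```python
import re

# Hypothetical bullet pattern: "-", "*", "•", "1." or "1)" followed by whitespace.
BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")


def unbreak(text: str) -> str:
    paragraphs: list[str] = []
    for raw in text.split("\n"):
        line = raw.strip()
        starts_new = (
            not paragraphs                      # very first line
            or not line                         # blank line = real paragraph break
            or not paragraphs[-1]               # first line after a blank line
            or BULLET.match(raw) is not None    # each bullet item keeps its own line
        )
        if starts_new:
            paragraphs.append(line)
        elif paragraphs[-1].endswith("-"):
            # Word split by a hyphen at the line break: drop the hyphen and join.
            paragraphs[-1] = paragraphs[-1][:-1] + line
        else:
            # Ordinary line break inside a paragraph: join with a space.
            paragraphs[-1] += " " + line
    return "\n".join(paragraphs)


sample = "The tool joins hyphen-\nated words and\nwraps lines.\n\n- first item\n- second item\nthat wraps"
print(unbreak(sample))
```

Like the search-and-replace approach, this cannot tell a split word from a genuinely hyphenated one such as "well-known", so a spellcheck pass is still worthwhile.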

 

