
How to remove paragraph marks from a converted doc
Thread poster: Martin Wenzel

Martin Wenzel
Germany
Local time: 10:11
English to German
+ ...
Jun 22, 2007

The theory was easy: I wanted to convert some downloaded PDF files from the Guardian and get a few current catchy phrases into my Trados memory.

Well, well, this was the theory...

For this purpose I purchased a PDF converter that gives me the columns without text boxes.

However, I still have the paragraph marks in there from the headlines or columns.

I tried to use the find/replace command to replace all paragraph marks with spaces, but that doesn't seem to work.

So, what's the solution to that, I wonder...

Saving the text as a text file from within Adobe PDF leaves me with the same problem...



Vito Smolej
Germany
Local time: 10:11
Member (2004)
English to Slovenian
+ ...
Paragraph marks ... hm ... Jun 22, 2007

Try replacing not ^p, but ^l ... In some cases, if the end of line is interpreted as something other than a carriage return/line feed pair, the big P stays in the file but stands for something else, like a soft break.
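Outside Word, the same idea can be sketched in a few lines of Python: map every line-break variant to a single line feed before doing anything else. This is only a rough sketch. The function name `normalize_breaks` is made up for illustration, and treating the vertical tab and U+2028 as soft breaks is an assumption about what a converter might emit, not something confirmed for any particular tool.

```python
import re

# Break variants to normalize. CR LF, lone CR: classic hard returns.
# Vertical tab (\x0b) and U+2028: ASSUMED soft-break encodings.
_BREAKS = re.compile(r"\r\n|\r|\x0b|\u2028")

def normalize_breaks(text: str) -> str:
    """Replace every line-break variant in text with a single line feed."""
    # "\r\n" is listed first so a CR LF pair becomes one break, not two.
    return _BREAKS.sub("\n", text)
```

Once all breaks look the same, a single find/replace pass has a much better chance of catching them all.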

regards



Iza Szczypka
Spain
Local time: 10:11
English to Polish
+ ...
That IS a problem Jun 22, 2007

Martin Wenzel wrote:

I tried to use the find/replace command to replace all paragraph marks with spaces, but that doesn't seem to work.


Works a bit faster if you go paragraph by paragraph, marking the relevant portions of text first (comparing with the original). Still, not my favourite task either ... Hopefully someone has a better tip.



Martin Wenzel
Germany
Local time: 10:11
English to German
+ ...
TOPIC STARTER
Fiddling around takes ages... Jun 22, 2007

^p works for the paragraph marks, but replacing it with a simple space (^s) isn't foolproof and requires manual editing.


I am a bit disappointed with the results and am hoping somebody has developed a nifty macro...
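For what it's worth, here is a minimal Python sketch of the kind of macro being asked for, run on the exported text file rather than inside Word. It assumes that real paragraphs are separated by at least one blank line and that line endings are already plain line feeds; neither is guaranteed for converted PDFs, so it is no more foolproof than the find/replace route. The name `unwrap_paragraphs` is hypothetical.

```python
import re

def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines; keep blank lines as paragraph boundaries."""
    # ASSUMPTION: one or more blank (whitespace-only) lines mark a real paragraph break.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside a paragraph, collapse every run of whitespace,
    # including single line breaks, into one space.
    joined = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(joined)
```

For example, `unwrap_paragraphs("Catchy\nheadline here\n\nSecond para\nwraps too\n")` gives two paragraphs, each on a single line. Headlines that run straight into body text without a blank line will still need manual fixing.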

I think the other solution might be buying Adobe Acrobat, but it's always buying and upgrading...


Richard Walker
Local time: 17:11
Japanese to English
It's doable, but... Jul 2, 2007

I've never been satisfied with programs that claim to extract from PDF because they seem to be either reheated OCR software or dumb text dumps (which is what Adobe offers and it produces spectacularly bad results when you're dealing with columns and tables).

So, to prevent myself from going totally batty, I developed a voice-activated macro that I use with Dragon NaturallySpeaking to extract text from PDF to Word one paragraph at a time and eliminate unwanted hard returns as I go. The programming part isn't that hard; the difficult part is being able to trigger it in Acrobat, which doesn't have any native macro-triggering capability (and that's where Dragon comes in handy, though it does get rather old saying "next para" 517 times).

The basic strategy is for me to select a paragraph in Acrobat, copy the selection to a variable, eliminate all returns (I work from Japanese source text, so I just eliminate returns, though of course spaces could be added instead) and then append the result to the active document in Word, where the macro adds appropriate returns and, in some variations, styles ("next para heading 1"). It's reliable but tedious, though not as tedious as other strategies I've tried. I'd be glad to post the code if anyone is interested.
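Until the actual macro is posted, the strategy can be roughed out in Python. This is not the macro itself, just an illustration of the steps described: the clipboard, Acrobat, Word and Dragon parts are left out, the "document" is a plain list of (style, text) pairs, and the names `clean_selection`, `append_paragraph` and the default style "Normal" are all assumptions made for the example.

```python
def clean_selection(selection: str, joiner: str = "") -> str:
    """Remove hard returns from one copied paragraph.

    joiner="" suits Japanese source text; joiner=" " suits languages
    that separate words with spaces.
    """
    lines = [line.strip() for line in selection.splitlines()]
    # Drop empty lines so stray blank rows do not produce double joiners.
    return joiner.join(line for line in lines if line)

def append_paragraph(document: list, selection: str,
                     style: str = "Normal", joiner: str = "") -> None:
    """Append the cleaned paragraph to the document as a (style, text) pair."""
    document.append((style, clean_selection(selection, joiner)))

# Usage: one call per selected paragraph, like saying "next para".
doc = []
append_paragraph(doc, "翻訳メモ\nリの例", style="Heading 1")
append_paragraph(doc, "two\nlines", joiner=" ")
```

Each call corresponds to one selected paragraph, so the tedium of selecting stays, but the returns are handled the same way every time.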



