How to remove paragraph marks from a converted doc
Thread poster: Martin Wenzel

Martin Wenzel
Germany
Local time: 10:56
English to German
+ ...
Jun 22, 2007

The theory was easy: I wanted to convert some PDF files downloaded from the Guardian and get a few current catchy phrases into my Trados memory.

Well, well, this was the theory...

For this purpose I purchased a PDF converter that gives me the columns without text boxes.

However, I still have the paragraph marks in there from the headlines or columns.

I tried to use the find/replace command and to replace all para marks by spaces, but that doesn't seem to work.

So, what's the solution to that, I wonder...

Saving the text as a text file from within Adobe PDF leaves me with the same problem...



Vito Smolej
Germany
Local time: 10:56
Member (2004)
English to Slovenian
+ ...
Paragraph marks ... hm ... Jun 22, 2007

Try replacing not ^p, but ^l ... In some cases, if the end of line is interpreted as something other than a carriage return/line feed pair, the big P stays in the file but stands for something else, like a soft break.
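
A minimal Python sketch of the same idea, for anyone working on the exported text outside Word. It is not from the thread: the function name normalize_breaks is made up for illustration, and it assumes the converted text is already available as a string. It treats CR LF, bare CR, bare LF and the vertical tab (the character Word uses for a manual line break, which is what ^l matches) as the same kind of break, so that a later find/replace only has one thing to look for.

```python
import re

# Line-break variants that may show up in converted text:
# CR LF, bare CR, bare LF, and vertical tab (Word's manual line break).
# "\r\n" is listed first so a CR LF pair counts as one break, not two.
_BREAKS = re.compile(r"\r\n|\r|\n|\x0b")

def normalize_breaks(text):
    """Replace every line-break variant with a single newline."""
    return _BREAKS.sub("\n", text)

sample = "Head\x0bline\r\nFirst column line\rSecond column line\n"
print(repr(normalize_breaks(sample)))
```

Once everything is a plain newline, replacing breaks with spaces behaves the same regardless of how the converter wrote the line ends.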

regards



Iza Szczypka  Identity Verified
Spain
Local time: 10:56
English to Polish
+ ...
That IS a problem Jun 22, 2007

Martin Wenzel wrote:

I tried to use the find/replace command and to replace all para marks by spaces, but that doesn't seem to work.


Works a bit faster if you go paragraph by paragraph, marking the relevant portions of text first (comparing with the original). Still, not my favourite task either ... Hopefully someone has a better tip.



Martin Wenzel
Germany
Local time: 10:56
English to German
+ ...
TOPIC STARTER
Fiddling around takes ages... Jun 22, 2007

^p works for the paragraph marks, but replacing them with a simple space ^s isn't foolproof and requires manual editing.


I am a bit disappointed with the results and am hoping somebody has developed a nifty macro...

I think the other solution might be buying Adobe Acrobat, but it's always buying and upgrading...
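
One common workaround for the find/replace problem, sketched here in Python rather than as a Word macro. It is not from the thread and rests on an assumption that may not hold for every converter: real paragraph boundaries show up as an empty line (two or more breaks in a row), while a single break is just a wrapped line. The three steps are: protect the real boundaries with a placeholder, turn the remaining breaks into spaces, then restore the boundaries. The name join_wrapped_lines and the placeholder character are hypothetical choices.

```python
import re

# Assumption: this character never occurs in the converted text.
PLACEHOLDER = "\x00"

def join_wrapped_lines(text):
    """Join wrapped lines with spaces, keeping blank-line paragraph breaks."""
    # Normalize Windows and old Mac line ends first.
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip()
    # Step 1: protect real paragraph boundaries (one or more empty lines).
    text = re.sub(r"\n(?:[ \t]*\n)+", PLACEHOLDER, text)
    # Step 2: every remaining break is a wrapped line; make it one space.
    text = re.sub(r"[ \t]*\n[ \t]*", " ", text)
    # Step 3: restore the protected boundaries.
    return text.replace(PLACEHOLDER, "\n\n")

raw = "The theory\nwas easy.\n\nWell, well,\nthis was the theory..."
print(join_wrapped_lines(raw))
```

Headlines and column ends that the converter did not separate with an empty line would still be merged into the neighbouring text, so some manual checking remains.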


Richard Walker
Local time: 17:56
Japanese to English
It's doable, but... Jul 2, 2007

I've never been satisfied with programs that claim to extract from PDF because they seem to be either reheated OCR software or dumb text dumps (which is what Adobe offers and it produces spectacularly bad results when you're dealing with columns and tables).

So, to prevent myself from going totally batty, I developed a voice-activated macro that I use with Dragon NaturallySpeaking to extract text from PDF to Word one paragraph at a time and eliminate unwanted hard returns as I go. The programming part isn't that hard; the difficult part is being able to trigger it in Acrobat, which doesn't have any native macro-triggering capability (and that's where Dragon comes in handy, though it does get rather old saying "next para" 517 times).

The basic strategy is for me to select a paragraph in Acrobat, copy the selection to a variable, eliminate all returns (I work from Japanese source text, so I just eliminate returns, though of course spaces could be added instead) and then append the result to the active document in Word, where the macro adds appropriate returns and, in some variations, styles ("next para heading 1"). It's reliable but tedious, though not as tedious as other strategies I've tried. I'd be glad to post the code if anyone is interested.
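
The strategy described above can be roughed out in Python. This is only an illustration, not the macro mentioned in the post: the Acrobat selection and the Word document are stood in for by a plain string and a list, and the names strip_returns and append_paragraph are hypothetical. The joiner argument covers both cases from the post: an empty string for Japanese, a space for languages that separate words.

```python
def strip_returns(selection, joiner=""):
    """Remove all line breaks from one selected paragraph."""
    # splitlines() handles CR LF, CR, LF and vertical tab alike.
    parts = [part.strip() for part in selection.splitlines()]
    return joiner.join(part for part in parts if part)

def append_paragraph(document, selection, joiner=""):
    """Append the cleaned selection as one paragraph of the document."""
    document.append(strip_returns(selection, joiner))

doc = []  # stand-in for the active Word document
append_paragraph(doc, "これは\n日本語の\n段落です。")            # no spaces added
append_paragraph(doc, "An English\nparagraph.", joiner=" ")  # spaces added
print("\n".join(doc))  # one return per paragraph
```

Because each selection is cleaned on its own, the paragraph boundaries come from the person selecting the text, not from guessing at the converter's output.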

