Can't find/replace a character in Word
Thread poster: Kevin Fulton
Kevin Fulton  Identity Verified
United States
Local time: 03:41
German to English
Nov 3, 2004

(Word 2003, Windows XP Home)

Whenever I use OCR to extract a Word text file from a faxed/PDF document, I frequently get the character "¬" (ALT 0172) in place of hyphens. I've tried a "Find and replace" operation to find each occurrence and replace it with nothing. The result is a message indicating that this character is not found in the document. I tried performing this operation on the same document in Open Office, hoping to find a solution, but with largely the same result. The difference is that each instance of "¬" was highlighted in OO so I could find it quickly. In a multi-page document, however, this was still a major task.

I'd appreciate any thoughts on solving this problem.
Thanks, Kevin



Selçuk Budak  Identity Verified
Local time: 10:41
English to Turkish
+ ...
Did you try special Tab? Nov 3, 2004

This occurs frequently in scanned texts. Usually, such special characters are either optional hyphens or nonbreaking hyphens. Whenever this arises, I solve the problem by following the procedure outlined below.

Open the normal "Find & Replace" window
Expand the "More" tab if it is not visible
1. Press "Special"
2. Choose "Soft Hyphen"
3. Press "Find"

If Word finds nothing, or if the found character is not what you intend to find, repeat the above steps with "Nonbreaking Hyphen" (in step 2), but do not forget to clear the search box first.

Forgot to mention: "Manual Line Break" is another special character found in such OCRed documents, so you may consider repeating the above-mentioned procedure with "Manual Line Break".

h.i.h

[Edited at 2004-11-04 00:02]

[Edited at 2004-11-04 09:06]


xxxIanW
Local time: 09:41
German to English
+ ...
Search and Replace Nov 4, 2004

Hi Kevin,

In German, this is called a "Bedingter Trennstrich" - I've no idea how this translates into English. Here's how you get rid of them:

Call up Search and Replace (Ctrl-H). Enter "^-" (the two symbols inside the inverted commas) in the Search field and leave the Replace field empty. Press "Replace all" and this deletes all these Trennstriche.
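
If the OCR output is also available as plain text, the same cleanup can be sketched outside Word. This is only a minimal Python sketch, not something from the thread: it assumes the optional hyphens arrive in the exported text as U+00AD (SOFT HYPHEN) and the nonbreaking hyphens as U+2011, which may not hold for every export path. The function name `clean_ocr_hyphens` and the sample string are made up for illustration.

```python
# Assumption: the exported text encodes Word's optional hyphen as U+00AD
# and the nonbreaking hyphen as U+2011. Check your own export before relying on this.
SOFT_HYPHEN = "\u00ad"
NONBREAKING_HYPHEN = "\u2011"


def clean_ocr_hyphens(text: str) -> str:
    """Delete soft hyphens and turn nonbreaking hyphens into plain hyphens."""
    return text.replace(SOFT_HYPHEN, "").replace(NONBREAKING_HYPHEN, "-")


# Hypothetical sample with two soft hyphens and one nonbreaking hyphen.
sample = "Über\u00adsetzungs\u00adspeicher und E\u2011Mail"
print(clean_ocr_hyphens(sample))
```

Soft hyphens are deleted outright because they only mark possible break points, whereas nonbreaking hyphens are real hyphens and are kept as ordinary "-" characters.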

Hope this helps


Ian


Kevin Fulton  Identity Verified
United States
Local time: 03:41
German to English
TOPIC STARTER
Solution worked Nov 4, 2004

Thank you, Selçuk and Ian!
I was approaching the problem from the wrong direction, not realizing that "¬" represented a Word function instead of a character.
Once again, thanks to both of you.
Kevin



tectranslate ITS GmbH
Local time: 09:41
German
+ ...
Why bother? Nov 4, 2004

The soft hyphens, protected spaces etc. should not have any impact on pre- or postprocessing, analysis or translation. I don't think I'd even bother to filter these characters out if I were you. But maybe I just don't understand your reasons. These ¬ characters don't appear in a printout of the document, do they?

Benjamin


Kevin Fulton  Identity Verified
United States
Local time: 03:41
German to English
TOPIC STARTER
Screws up matches in DVX Nov 4, 2004

> The soft hyphens, protected spaces etc. should not have any impact on pre- or postprocessing<

The ¬ character appears in the source text when I import it into Deja Vu X and prevents matching. I recently had a document with passages that were identical to those in a doc I had translated previously. Since there was some new text, there had been some formatting changes (different line breaks resulting in different hyphenation: one doc had justified text and the document editor had used hyphenation, while the first document had used no hyphenation at the end of lines). When I processed the second document with OCR, I had to manually remove all the ¬ characters to get the text to match. Even though removing this character took only a few keystrokes each time, it added up over 20 pages. Plus I didn't want my TM to have the characters.
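
To illustrate why an invisible character can break an exact match, here is a small Python sketch. It does not reproduce how Deja Vu X actually compares segments; it only assumes the optional hyphen is present in the new segment as U+00AD. The helper `same_segment` and both sentences are hypothetical.

```python
def normalize(segment: str) -> str:
    """Remove soft hyphens (U+00AD, an assumed encoding) before comparing."""
    return segment.replace("\u00ad", "")


def same_segment(a: str, b: str) -> bool:
    """Compare two segments while ignoring soft hyphens."""
    return normalize(a) == normalize(b)


# Hypothetical segments: the second one carries OCR soft hyphens.
old = "Die Lieferung erfolgt innerhalb von vier Wochen."
new = "Die Liefe\u00adrung erfolgt inner\u00adhalb von vier Wochen."

print(old == new)              # False: the strings differ character by character
print(same_segment(old, new))  # True: identical once soft hyphens are ignored
```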
Cheers, Kevin

