Can't find/replace a character in Word
Thread poster: Kevin Fulton
Kevin Fulton
United States
German to English
Nov 3, 2004

(Word 2003, Windows XP Home)

Whenever I use OCR to extract a Word file from a faxed or PDF document, I frequently get the character "¬" (ALT 0172) in place of hyphens. I've tried using "Find and Replace" to find each occurrence and replace it with nothing, but Word reports that the character is not found in the document. I tried the same operation on the document in OpenOffice, hoping to find a solution, but with largely the same result. The only difference was that OpenOffice highlighted each "¬", so I could find it quickly. In a multi-page document, however, removing them one by one was still a major task.

I'd appreciate any thoughts on solving this problem.
Thanks, Kevin


 
Selçuk Budak
English to Turkish
Did you try special Tab? Nov 3, 2004

This occurs frequently in scanned texts. Usually, such special characters are either optional hyphens or nonbreaking hyphens. Whenever this arises, I solve the problem with the following procedure:

Open the normal "Find & Replace" window.
Click "More" to expand it, if the extra options are not visible.
1. Press "Special"
2. Choose "Optional Hyphen" (also called a soft hyphen)
3. Press "Find"

If Word finds nothing, or if the character it finds is not the one you intend to find, repeat the above steps with "Nonbreaking Hyphen" in step 2, but do not forget to clear the search box first.

Forgot to mention: "Manual Line Break" is another special character often found in such OCRed documents, so you may also want to repeat the above procedure with "Manual Line Break".
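The same clean-up can also be done outside Word when the OCR output is available as plain text. The minimal Python sketch below is an illustration, not a Word feature, and rests on assumptions: that the optional hyphen arrives either as the Unicode SOFT HYPHEN (U+00AD) or as character 31 (the code Word is assumed to use internally, e.g. in text read through VBA's Range.Text), the nonbreaking hyphen as U+2011 or character 30, and the manual line break as character 11. The function name `clean_ocr_text` and the choice to turn manual line breaks into spaces are hypothetical.

```python
# Hypothetical clean-up table (assumptions explained in the lead-in).
# str.maketrans() accepts a dict whose keys are single characters and whose
# values are replacement strings; "" deletes the character.
CLEANUP_TABLE = str.maketrans({
    "\u00ad": "",   # SOFT HYPHEN, Unicode form of Word's optional hyphen (^-)
    "\x1f": "",     # chr(31), assumed internal code of the optional hyphen
    "\u2011": "-",  # NON-BREAKING HYPHEN (^~) -> ordinary hyphen
    "\x1e": "-",    # chr(30), assumed internal code of the nonbreaking hyphen
    "\x0b": " ",    # chr(11), assumed internal code of a manual line break (^l)
})


def clean_ocr_text(text: str) -> str:
    """Apply all substitutions in a single pass over the text."""
    return text.translate(CLEANUP_TABLE)


if __name__ == "__main__":
    print(clean_ocr_text("Trenn\u00adstrich E\u2011Mail"))  # Trennstrich E-Mail
```

Using one translation table rather than several chained replace() calls keeps all the substitutions in one place, so adding another special character is a one-line change.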

h.i.h

[Edited at 2004-11-04 00:02]

[Edited at 2004-11-04 09:06]


 
IanW (X)
German to English
Search and Replace Nov 4, 2004

Hi Kevin,

In German, this is called a "bedingter Trennstrich" (in the English version of Word, an "optional hyphen"). Here's how you get rid of them:

Open Find and Replace (Ctrl+H). Enter "^-" (the two characters inside the quotation marks) in the "Find what" field and leave the "Replace with" field empty. Press "Replace All" to delete all the optional hyphens.
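Replace All with ^- is the simplest route inside Word. For files in the later .docx format (Word 2007 onward, so not the Word 2003 .doc files discussed here), a similar result can be scripted, because WordprocessingML stores each optional hyphen as an empty `<w:softHyphen/>` element. The minimal Python sketch below uses only the standard library: it copies a .docx and deletes those elements from the main document part, `word/document.xml`. Headers, footers and footnotes live in other parts and are left untouched. The function name `strip_soft_hyphens` is hypothetical, and the regular expression assumes the element is written in its usual self-closing form with the `w:` prefix.

```python
import re
import zipfile

# Assumption: optional hyphens appear as self-closing <w:softHyphen/> elements.
SOFT_HYPHEN_TAG = re.compile(rb"<w:softHyphen\s*/>")


def strip_soft_hyphens(src_path: str, dst_path: str) -> int:
    """Copy a .docx, deleting every optional hyphen from word/document.xml.

    Returns the number of optional hyphens removed.
    """
    removed = 0
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        # Copy every part in its original order; only the main body is edited.
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                data, removed = SOFT_HYPHEN_TAG.subn(b"", data)
            dst.writestr(item, data)
    return removed


# Example (hypothetical file names):
# n = strip_soft_hyphens("scan.docx", "scan_clean.docx")
```

Writing to a new file instead of modifying the original in place leaves the OCR output intact in case the result needs checking against it.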

Hope this helps


Ian


 
Kevin Fulton
United States
German to English
TOPIC STARTER
Solution worked Nov 4, 2004

Thank you, Selçuk and Ian!
I was approaching the problem from the wrong direction, not realizing that "¬" was Word's on-screen mark for an optional hyphen rather than an ordinary character.
Once again, thanks to both of you.
Kevin


 
tectranslate ITS GmbH
German
Why bother? Nov 4, 2004

The soft hyphens, nonbreaking spaces, etc. should not have any impact on pre- or post-processing, analysis or translation. I don't think I'd even bother to filter these characters out if I were you, but maybe I just don't understand your reasons. These ¬ characters don't appear in a printout of the document, do they?

Benjamin


 
Kevin Fulton
United States
German to English
TOPIC STARTER
Screws up matches in DVX Nov 4, 2004

> The soft hyphens, protected spaces etc. should not have any impact on pre- or postprocessing<


The ¬ character appears in the source text when I import it into Déjà Vu X and prevents matching. I recently had a document with passages identical to those in a document I had translated previously. Because there was some new text, there had also been some formatting changes: different line breaks resulted in different hyphenation, since the new document had justified text and its editor had used hyphenation, whereas the first document had no hyphenation at the ends of lines. When I processed the second document with OCR, I had to remove all the ¬ characters manually to get the text to match. Even though removing each one took only a few keystrokes, it added up over 20 pages. Plus, I didn't want those characters in my TM.
Cheers, Kevin