Extracting the source text
Thread poster: Noha Kamal, PhD.

Noha Kamal, PhD.  Identity Verified
Local time: 13:01
English to Arabic
Aug 30, 2007


I am currently working on a regular English-Arabic text. When I am done, I will clean up the file, which will of course delete the English source text and leave me with only the Arabic target. Now my question is: is there any way I could extract the English source text into a separate file? The client wants a clean version with a column on the left containing the English and another on the right containing the Arabic. So, how do I get the English (all clean and tidy) after the clean-up?



Shouguang Cao
Local time: 19:01
English to Chinese
A common problem. Aug 30, 2007

Noha, here's what I do.

1. Draw a table with two columns.
2. Copy the English source text into both columns (that is, two identical copies).
3. Translate the left column.
4. Clean up the file.


Noha Kamal, PhD.  Identity Verified
Local time: 13:01
English to Arabic
I am actually doing proofreading, Dallas Aug 30, 2007

Hi Dallas,

Thanks for your reply. Well, I guess it is a bit more complicated than that, because I am actually proofreading a text that has already been translated. So the problem is that I do not have the source text on its own. I only have the bilingual file with both languages intermingled. Any hope?

Thanks again,


Narcis Lozano Drago  Identity Verified
Local time: 13:01
Member (2007)
English to Spanish
Try this, please Aug 30, 2007

Have you tried opening the file (I assume it's a .doc with all the tags from Workbench) with SDLX? In the test I ran, it correctly separated the text into two columns: source and translation. You will only have to copy the source text.
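
For anyone without SDLX, the same two-column split can be sketched in a few lines of Python. This is only a rough illustration, not what SDLX does internally. It assumes the bilingual document has been saved as plain text and that each segment looks like {0>source<}NN{>target<0}, where NN is the match percentage. The function names and the sample text are made up for the example.

```python
import re

# Assumed segment layout: {0>source<}NN{>target<0}, NN = match percentage.
SEGMENT = re.compile(r"\{0>(.*?)<\}\d+\{>(.*?)<0\}", re.DOTALL)

def split_segments(bilingual_text):
    """Return a list of (source, target) pairs found in the text."""
    return [(src.strip(), tgt.strip()) for src, tgt in SEGMENT.findall(bilingual_text)]

def to_tsv(pairs):
    """Join the pairs as tab-separated lines: source on the left, target on the right."""
    return "\n".join(f"{src}\t{tgt}" for src, tgt in pairs)

# Hypothetical sample with two segments.
sample = (
    "{0>Hello.<}100{>مرحبا.<0} "
    "{0>Good morning.<}87{>صباح الخير.<0}"
)

pairs = split_segments(sample)
print(to_tsv(pairs))
```

The tab-separated output can be pasted into Word and converted to a table (or opened in a spreadsheet) to get the English column next to the Arabic one.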

Hope it also works for you,


Noha Kamal, PhD.  Identity Verified
Local time: 13:01
English to Arabic
Do not have SDLX at the moment Aug 30, 2007

Unfortunately, I do not have SDLX at the moment. Is there any way I could do that using Trados?


Eugene Gulak  Identity Verified
Local time: 14:01
Member (2007)
English to Russian
Use "Replace" command Aug 30, 2007

Hi, Noha!

If you are working with a .doc file, try using the following procedure.

The trick is to eliminate the target text, which is always enclosed in Trados tags and is not hidden.

1. Save a copy of the file.
2. Select Edit - Replace (the option names may differ slightly, as I use the Russian version of Word) or press Ctrl+H.
3. Click More to expand the Find/Replace dialog.
4. Be sure to check the box Use wildcards.
5. Enter the following string into the "Find" field, without the commas and spaces (I can't put the string directly here because it gets confused with HTML tags, and I am not a great specialist in HTML):

back slash, left curly bracket, back slash, greater-than sign, asterisk, back slash, less-than sign, 0, back slash, right curly bracket

(Typed out, the string reads: \{\>*\<0\} )

Leave the "Replace with" field empty.

The trick here is to replace the target text (the asterisk) between the opening and closing Trados tags with nothing. The back slash is needed before certain characters so that Word doesn't confuse them with wildcards. The back slash key is usually to the left of the backspace key on the keyboard.

6. Click Replace all.

This should delete your target text and some of the Trados tags. Next you'll have to eliminate the remaining Trados tags. They are always purple, so do the following.

7. Clear "Find" string.
8. Clear the check box "Use wildcards".
9. Click in the "Find" field.
10. In the lower part of the Find/Replace dialog, click Format - Character (or Font - I am not sure which it is in the English version).
11. In the Text color field (or Character color or Font color - not sure again), select the color of the Trados tags. It should be in the 3rd row, 7th column. To be sure of the exact tag color, just highlight any tag in the text and select Format - Character (Font) in the main menu: you'll see it in the color matrix.
12. Click OK. You return to the Find/Replace dialog, where you are now going to replace text of purple (or whatever it is) color with nothing.
13. Click Replace All.

Now you have got rid of all the tags and the target text. The last step is to make the text non-hidden.

14. Click Edit - Select All or press Ctrl+A.
15. Select Format - Character (Font).
16. Click the check box "hidden" twice so that it is empty (neither green nor checked).
17. Click OK.
18. Save the file.

Now you should have a nice source text. Actually the procedure is fairly simple. I hope it helps. Just be sure to enter the exact string at step 5.
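
For readers who prefer a script, the two Replace passes above can be imitated roughly in Python on a plain-text copy of the file. This is a sketch under assumptions: the tags left over after the first pass are taken to be {0> and <}NN (NN being the match percentage), and the sample text is invented. Plain text has no hidden formatting, so steps 14-17 have no counterpart here.

```python
import re

def extract_source(bilingual_text):
    """Strip target text and Trados tags, leaving only the source text."""
    # Steps 5-6: delete everything from "{>" up to the nearest closing "<0}".
    # The non-greedy ".*?" plays the role of the asterisk in Word's wildcard search.
    without_target = re.sub(r"\{>.*?<0\}", "", bilingual_text, flags=re.DOTALL)
    # Steps 7-13: delete the leftover tags (assumed to be "{0>" and "<}NN").
    return re.sub(r"\{0>|<\}\d+", "", without_target)

# Hypothetical sample with two segments.
sample = (
    "{0>First sentence.<}100{>الجملة الأولى.<0} "
    "{0>Second sentence.<}75{>الجملة الثانية.<0}"
)

source_only = extract_source(sample)
print(source_only)  # First sentence. Second sentence.
```

As with the Word procedure, it is safest to run this on a copy and compare the result with the original before sending anything to the client.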

Good luck!


Noha Kamal, PhD.  Identity Verified
Local time: 13:01
English to Arabic
You are incredible!!! Aug 30, 2007

Hi Eugene,

You are a life-saver! Has anyone ever told you that? Yes, the procedure, though long, makes perfect sense. I will follow it to the letter. Thanks a zillion times, friend :)

