Any way of extracting _source_ text from a bilingual document?
Thread poster: xxxOTMed

Local time: 20:10
English to Polish
+ ...
Jun 3, 2004

We need to re-create the source (English) text from a bilingual version (English>Polish) for alignment (the original English text is no longer available). Is there a way of doing a 'reverse clean-up', i.e. removing the translation segments and leaving the source text? The only way I can think of is to select and delete all styles but the source text. The problem is that this approach takes quite a lot of manual deleting.
Have you perhaps heard of a tool that would do this trick automatically? TIA


Pablo Roufogalis
Local time: 13:10
English to Spanish
Yahoo group Jun 3, 2004


There's a tool in the Yahoo Trados group that claims to convert a bilingual Trados doc into a two-column doc. It should then be easy to select and copy/paste the source text into another document.

Never used it but you may try it and report.


Harry Bornemann  Identity Verified
English to German
+ ...
Another CAT tool Jun 3, 2004

You could export the TM in Trados text format and import this to Déjà Vu or another tool which can export its TM in tab-separated text format.
Then it will be easy to split in Excel or Access.
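
If Excel or Access is not at hand, the split can also be scripted. The following is only a minimal Python sketch, assuming the export has one translation unit per line with the source in the first column and the target in the second; the function name `split_tm_columns` and the sample sentences are made up for illustration and do not come from any CAT tool.

```python
import csv
import io


def split_tm_columns(tsv_text):
    """Return (sources, targets) from tab-separated TM text, one pair per row."""
    sources, targets = [], []
    # QUOTE_NONE: treat quotation marks in segments as ordinary characters
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        sources.append(row[0])
        targets.append(row[1])
    return sources, targets


# Hypothetical two-line export (English source, Polish target)
sample = "Open the file.\tOtwórz plik.\nSave the file.\tZapisz plik.\n"
sources, targets = split_tm_columns(sample)
print("\n".join(sources))
```

The source column can then be written to its own file for alignment.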


Local time: 19:10
French to English
+ ...
Can you send a sample? Jun 3, 2004

Can you send me a sample of the text by email, about 100 words?
I'm sure I'll find a tool (hand-made).


Gerard de Noord  Identity Verified
Local time: 20:10
Member (2003)
German to Dutch
+ ...
Try this macro Jun 3, 2004

You could try running this Word macro on a copy of the document if the file has been segmented by Wordfast or Trados:

Sorry, the macro didn't survive posting.


[Edited at 2004-06-03 14:07]


Alison Schwitzgebel
Local time: 20:10
Member (2002)
German to English
+ ...
Bear with me on this one, but it ought to work... Jun 3, 2004

1. Save an extra copy of your bilingual document (just in case this all goes horribly wrong ;) )

2. Clean this up into a new memory.

3. Export this memory.


5. Import the memory.

6. Run the pre-translate option over your cleaned document.

7. Clean up the pre-translated document.

This is how you can generally get memories you have in the wrong language direction turned around.

Theoretically it ought to work. Let me know how you get on.




Jaroslaw Michalak  Identity Verified
Local time: 20:10
Member (2004)
English to Polish
Yet another way... Jun 3, 2004

It depends a little on the source text and its format...

That's what I would do:
1. Create a new TM.
2. Analyse the file in question.
3. Export unknown sequences to Word format.

This is assuming you're using Trados. With other tools it might be possible, too.

Several things to note:
The format of the file might not be 100% accurate. The formatting in TMs for Trados is still somewhat buggy.

The repeated phrases will show up only once - you can export repetitions separately and then insert them, if needed, but it's tedious. You can also export the repetitions and, using them as a guide, insert some markers in the bilingual source, so that they are all different. (E.g. ##01##, ##02##, etc.)
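
The marker idea can be sketched in Python. This is only an illustration on a plain list of segments, assuming the segments have already been pulled out as text; `mark_repetitions` and `strip_markers` are hypothetical helper names, and the ##NN## format simply follows the example above.

```python
import re
from collections import Counter


def mark_repetitions(segments):
    """Append ##NN## to every occurrence of a segment that appears more than once."""
    counts = Counter(segments)
    seen = Counter()
    marked = []
    for seg in segments:
        if counts[seg] > 1:
            seen[seg] += 1
            marked.append(f"{seg} ##{seen[seg]:02d}##")
        else:
            marked.append(seg)  # unique segments stay untouched
    return marked


def strip_markers(segment):
    """Remove a marker added by mark_repetitions once the export is done."""
    return re.sub(r" ##\d+##", "", segment)


marked = mark_repetitions(["Click OK.", "Open the file.", "Click OK."])
print(marked)
```

Once every repeated segment is unique, each occurrence survives the export, and the markers can be stripped afterwards.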

Note that this way (and most of the other mentioned above) allows you to recreate only the TEXT of the original, not the document itself. If you want that (with formatting, pictures, etc.), you need to work within Word itself (or whatever software you're using - might be useful if you specify that).

[Edited at 2004-06-03 16:44]


Local time: 20:10
English to Polish
+ ...
Thank you all for your input Jun 3, 2004

I do appreciate your assistance. What we were initially considering was aligning and switching source and target segments in some way. Having received all your suggestions, we have adopted the following strategy:
The text we have is a regular bilingual Trados-segmented Eng>Pol file (Trados 5.5 Freelance).
We have used the following strategy to 'reverse' the TM:
a. Translated the bilingual file with an empty Trados TM (en>pl)
b. Exported this TM as a .txt (en>pl)
c. Opened the txt format as a Wordfast 3.3 TM (en>pl)
d. 'Reversed' Wordfast TM to pl>eng using built-in Wordfast functionality

At this stage all seems to be working OK apart from the fact that Wordfast loses all Polish fonts. Or, better said, replaces Polish fonts with a rather dramatic '?'.
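
Question marks of this kind are typical of text being pushed through a code page that lacks the Polish letters. Whether that is what happens inside Wordfast here is only a guess; the Python sketch below just demonstrates the effect with the Western European (cp1252) and Central European (cp1250) Windows code pages.

```python
text = "Zażółć gęślą jaźń"

# Assumption for illustration: the text passes through a Western European
# code page, which has no slot for most Polish letters.
lossy = text.encode("cp1252", errors="replace").decode("cp1252")

# The Central European code page keeps every Polish letter intact.
safe = text.encode("cp1250").decode("cp1250")

print(lossy)
print(safe)
```

If this is the cause, saving the exported TM in an encoding that covers Polish before opening it in the other tool would be the thing to try.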

Next step is to try one of the options you have proposed. I will keep you all posted on the fascinating 'TM reversal' story.


Jaroslaw Michalak  Identity Verified
Local time: 20:10
Member (2004)
English to Polish
It is, indeed, fascinating... especially for a stubborn guy like me :) Jun 3, 2004

I have found a quick and quite elegant way to restore original document from a bilingual Trados Word document.

The trick is, essentially, to run the sequence Open Source and Restore Source over and over.

Three ways of doing it for now:

Method one:
1. Alt+Home
2. Alt+Del
Repeat ad nauseam.

Actually, with short documents it is pretty fast...

Method two:
Create a new macro, with the lines:

Application.Run "tw4winOpen.Main"
Application.Run "tw4winRestoreSource.Main"

Assign the macro to a button. Press repeatedly.

Method three (most effective, but not recommended):
Create a new macro, with the lines:

' Open each segment and restore its source until the last paragraph is reached
Do
Application.Run "tw4winOpen.Main"
Application.Run "tw4winRestoreSource.Main"
Loop Until Selection.InRange(ActiveDocument.Paragraphs.Last.Range)

This works ONLY if the segments are not separated by pictures, etc. (empty paragraphs are OK), i.e. by anything that would prevent Trados from opening the next segment. Also, there must not be empty paragraphs at the end of the doc (it must end with the last segment). Otherwise the macro does not stop and Winword has to be exited with Ctrl+Alt+Del. However, it is quick and worked quite nicely on several documents I've tried.

[Edited at 2004-06-03 21:06]

[Edited at 2004-06-03 21:11]


Jaroslaw Michalak  Identity Verified
Local time: 20:10
Member (2004)
English to Polish
A better version of the macro... Jun 4, 2004

Here:

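Do
If Selection.Characters(1).Style.NameLocal = "tw4winMark" Then
' At a Trados segment marker: open the segment and restore its source
Application.Run "tw4winOpen.Main"
Application.Run "tw4winRestoreSource.Main"
Selection.Collapse (wdCollapseEnd)
Else
' Assumption (this branch is not in the posted lines): step forward one
' character so the loop can get past text that is not a segment marker
Selection.MoveRight Unit:=wdCharacter, Count:=1
End If
Loop Until Selection.InRange(ActiveDocument.Paragraphs.Last.Range)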

It works with most bilingual Trados Word documents without the limitation specified above.
It does not include text boxes, frames or headers/footers - it would need to be much more complicated than that.
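
For anyone who would rather work on a plain-text copy outside Word, the same 'keep the source, drop the target' idea can be sketched in Python. This assumes the segments are delimited as {0>source<}NN{>target<0}, where NN is the match value; check a sample of your own file first, because the pattern below is an assumption and not taken from Trados documentation. The function name `keep_source` is made up for illustration.

```python
import re

# Assumed delimiter layout: {0>source<}NN{>target<0}
SEGMENT = re.compile(
    r"\{0>(?P<source>.*?)<\}\d+\{>(?P<target>.*?)<0\}",
    re.DOTALL,  # segments may contain line breaks
)


def keep_source(bilingual_text):
    """Replace every bilingual segment with its source half only."""
    return SEGMENT.sub(lambda m: m.group("source"), bilingual_text)


sample = "{0>Open the file.<}100{>Otwórz plik.<0} {0>Save it.<}0{>Zapisz.<0}"
print(keep_source(sample))
```

Like most of the approaches in this thread, this recovers only the text, not the formatting of the original document.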

BTW, is there a repository for macros in ProZ? I've got a few that might be useful...

[Edited at 2004-06-04 01:18]


Aleksandr Okunev
Local time: 21:10
English to Russian
There's such a tool Jun 4, 2004

Is there a way of a 'reverse clean-up', i.e. removing translation segments and leave source text?

This is done by Plustoys(2) utility which is available in the 'files' section of Wordfast Yahoo group. Join the group and download it. The utility does more useful things too.

Happy cleanup!


Local time: 20:10
English to Polish
+ ...
Thank you - lessons learned Jun 12, 2004

Our ambitiously labeled 'how on earth do I reverse Trados TM' project was paused for a while, so please forgive the slightly late reply.
The task of restoring source segments from a bilingual text has developed into creating a 'reverse' TM from a bilingual doc.
Both tools posted above (thanks Jabberwock and Aleksandr) did extract the source text.
As to creating the reverse TM, following some experiments and tests we concluded, with great help from the Fusion Team (thank you Alain!), that creating a reverse TM is as simple as exporting a TM for a language pair (x-y), creating a new TM for the reverse language pair (y-x) and importing the previously exported x-y TM!
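
Conceptually, the reversal is nothing more than swapping the two halves of every translation unit; the import described above does this on its own. Purely as an illustration of the idea, and assuming a simple two-column tab-separated export rather than the native Trados format, a Python sketch could look like this (`reverse_tm_lines` is a hypothetical name):

```python
def reverse_tm_lines(tsv_text):
    """Swap source and target in each tab-separated translation unit."""
    reversed_lines = []
    for line in tsv_text.splitlines():
        if "\t" not in line:
            continue  # skip lines that are not source/target pairs
        source, target = line.split("\t", 1)
        reversed_lines.append(f"{target}\t{source}")
    return "\n".join(reversed_lines)


# Hypothetical en>pl pairs turned into pl>en pairs
en_pl = "Open the file.\tOtwórz plik.\nSave the file.\tZapisz plik."
pl_en = reverse_tm_lines(en_pl)
print(pl_en)
```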

The above method may perhaps be obvious to some (most) of you, but as it was a groundbreaking discovery for us, I do hope someone may find our lessons learned helpful.

Nevertheless, let me thank you all for your valuable and helpful input. I am, as always, awed by the support I have received through Proz.

Best regards, Greg

