
Any way of extracting _source_ text from a bilingual document?
Thread poster: xxxOTMed
xxxOTMed
Poland
Local time: 01:34
English to Polish
+ ...
Jun 3, 2004

We need to re-create the source (English) text from a bilingual (English>Polish) document for alignment, because the original English text is no longer available. Is there a way to do a 'reverse clean-up', i.e. remove the translation segments and leave the source text? The only way I can think of is to select and delete everything except the source-text style. The problem is that this approach takes a lot of manual deleting.
Have you perhaps heard of a tool that would do this automatically? TIA


Pablo Roufogalis
Colombia
Local time: 19:34
English to Spanish
Yahoo group Jun 3, 2004

Hello.

There's a tool in the Yahoo Trados group that claims to convert a bilingual Trados doc into a two-column doc. It should then be easy to select the source text and copy/paste it into another document.

Never used it but you may try it and report.



Harry Bornemann  Identity Verified
Mexico
English to German
+ ...
Another CAT tool Jun 3, 2004

You could export the TM in Trados text format and import this to Déjà Vu or another tool which can export its TM in tab-separated text format.
Then it will be easy to split in Excel or Access.
BR
Harry
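
The split can also be scripted instead of done in Excel or Access. A minimal sketch, assuming the exported file is tab-separated with the source segment in the first column and the target in the second; that column layout, and the name source_column, are assumptions for illustration and depend on the exporting tool:

```python
import csv
import io

def source_column(tsv_text, source_index=0):
    """Return the source-language cell of every non-empty row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    return [row[source_index] for row in reader
            if len(row) > source_index and row[source_index].strip()]

# Hypothetical two-column export: English source, Polish target.
sample = "Open the file.\tOtwórz plik.\nSave the file.\tZapisz plik.\n"
print("\n".join(source_column(sample)))
```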


xxxLixus
Local time: 00:34
French to English
+ ...
Can you send a sample? Jun 3, 2004

Can you send me a sample of the text by email, say 100 words?
I'm sure I can come up with a (hand-made) tool.



Gerard de Noord  Identity Verified
France
Local time: 01:34
Member (2003)
German to Dutch
+ ...
Try this macro Jun 3, 2004

You could try running this Word macro on a copy of the document if the file has been segmented by Wordfast or Trados:

Sorry, the macro didn't survive posting.

Regards,
Gerard

[Edited at 2004-06-03 14:07]



Alison Schwitzgebel
France
Local time: 01:34
Member (2002)
German to English
+ ...
Bear with me on this one, but it ought to work... Jun 3, 2004

1. Save an extra copy of your bilingual document (just in case this all goes horribly wrong)

2. Clean this up into a new memory.

3. Export this memory.

4. Create a new memory WITH THE LANGUAGE PAIR REVERSED.

5. Import the memory.

6. Run the pre-translate option over your cleaned document.

7. Clean up the pre-translated document.

This is how you can generally get memories you have in the wrong language direction turned around.

Theoretically it ought to work. Let me know how you get on.

HTH

Alison



Jabberwock  Identity Verified
Poland
Local time: 01:34
Member (2004)
English to Polish
Yet another way... Jun 3, 2004

It depends a little on the source text and its format...

Here's what I would do:
1. Create a new TM.
2. Analyse the file in question.
3. Export the unknown segments to Word format.

This is assuming you're using Trados. With other tools it might be possible, too.

Several things to note:
The formatting of the file might not be 100% accurate. Formatting in Trados TMs is still somewhat buggy.

Repeated phrases will show up only once. You can export the repetitions separately and then insert them where needed, but it's tedious. You can also export the repetitions and, using them as a guide, insert markers into the bilingual source so that all segments are different (e.g. ##01##, ##02##, etc.).


Note that this way (and most of the others mentioned above) allows you to recreate only the TEXT of the original, not the document itself. If you want the document (with formatting, pictures, etc.), you need to work within Word itself (or whatever software you're using; it would help if you specified that).
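
The marker trick for repeated segments could be automated on a plain list of segments. A minimal sketch, assuming the segments are already available as text strings; the function names mark_repetitions and strip_markers are made up for illustration and nothing here touches the Word file or Trados:

```python
import re
from collections import Counter

def mark_repetitions(segments):
    """Append ##01##, ##02##, ... to segments that occur more than once."""
    totals = Counter(segments)
    seen = Counter()
    marked = []
    for seg in segments:
        if totals[seg] > 1:
            seen[seg] += 1
            marked.append(f"{seg} ##{seen[seg]:02d}##")
        else:
            marked.append(seg)
    return marked

def strip_markers(segments):
    """Remove the trailing markers again after the export."""
    return [re.sub(r" ##\d+##$", "", seg) for seg in segments]

segments = ["Click OK.", "Cancel", "Click OK."]
marked = mark_repetitions(segments)
print(marked)
```

After the export, strip_markers(marked) gives back the original list, repetitions included.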

[Edited at 2004-06-03 16:44]


xxxOTMed
Poland
Local time: 01:34
English to Polish
+ ...
TOPIC STARTER
Thank you all for your input Jun 3, 2004

I do appreciate your assistance. What we were initially considering was aligning and then somehow switching the source and target segments. Having received all your suggestions, we tried the following.
The text we have is a regular bilingual Trados-segmented Eng>Pol file (Trados 5.5 Freelance).
To 'reverse' the TM, we:
a. Translated the bilingual file with an empty Trados TM (en>pl)
b. Exported this TM as a .txt (en>pl)
c. Opened the .txt file as a Wordfast 3.3 TM (en>pl)
d. 'Reversed' the Wordfast TM to pl>en using built-in Wordfast functionality

At this stage all seems to be working OK, apart from the fact that Wordfast loses all Polish characters. Or, better said, it replaces them with a rather dramatic '?'.
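
One possible cause of the '?' characters, offered only as a guess and not as a diagnosis of this particular setup, is that the TM text passes through a Western European code page somewhere along the way. A minimal sketch of that effect, assuming Windows-1252 as the wrong code page and Windows-1250 as the right one for Polish (the helper name roundtrip is made up):

```python
def roundtrip(text, codec):
    """Encode and decode text; unencodable characters become '?'."""
    return text.encode(codec, errors="replace").decode(codec)

polish = "Zażółć gęślą jaźń"

lossy = roundtrip(polish, "cp1252")  # Western European code page
safe = roundtrip(polish, "cp1250")   # Central European code page

print(lossy)
print(safe)
```

If this is what happens, the fix is on the export/import side (choosing a Central European or Unicode encoding), not in the text itself.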

Next step is to try one of the options you have proposed. I will keep you all posted on the fascinating 'TM reversal' story.



Jabberwock  Identity Verified
Poland
Local time: 01:34
Member (2004)
English to Polish
It is, indeed, fascinating... especially for a stubborn guy like me :) Jun 3, 2004

I have found a quick and quite elegant way to restore the original document from a bilingual Trados Word document.

The trick is, essentially, to run the sequence Open, then Restore Source, once for every segment.

Three ways of doing it for now:

Method one:
1. Alt+Home
2. Alt+Del
Repeat ad nauseam.

Actually, with short documents it is pretty fast...

Method two:
Create a new macro, with the lines:

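' Open the next segment, then restore its source text
' (both macros come from the Trados Word template)
Application.Run "tw4winOpen.Main"
Application.Run "tw4winRestoreSource.Main"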

Assign the macro to a button. Press repeatedly.

Method three (most effective, but not recommended):
Create a new macro, with the lines:

' Repeat until the selection reaches the last paragraph
Do
    Application.Run "tw4winOpen.Main"
    Application.Run "tw4winRestoreSource.Main"
Loop Until Selection.InRange(ActiveDocument.Paragraphs.Last.Range)

This works ONLY if the segments are not separated by pictures or anything else that would prevent Trados from opening the next segment (empty paragraphs are OK). Also, there must not be any empty paragraphs at the end of the doc (it must end with the last segment). Otherwise the macro does not stop and Winword has to be killed with Ctrl+Alt+Del. However, it is quick and worked quite nicely on several documents I've tried.


[Edited at 2004-06-03 21:06]

[Edited at 2004-06-03 21:11]



Jabberwock  Identity Verified
Poland
Local time: 01:34
Member (2004)
English to Polish
A better version of the macro... Jun 4, 2004

...is here:

Do
    Selection.Expand
    ' Only call Trados when the selection starts on a segment delimiter
    If Selection.Characters(1).Style.NameLocal = "tw4winMark" Then
        Application.Run "tw4winOpen.Main"
        Application.Run "tw4winRestoreSource.Main"
    Else
        ' Otherwise skip past the current selection
        Selection.Collapse wdCollapseEnd
    End If
Loop Until Selection.InRange(ActiveDocument.Paragraphs.Last.Range)

It works with most bilingual Trados Word documents, without the limitations described above.
It does not handle text boxes, frames or headers/footers; for that it would need to be much more complicated.
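
For a bilingual file saved as plain text, the same restore could be sketched outside Word with a regular expression. This is a rough sketch, assuming the usual Trados segment delimiters of the form {0>source<}NN{>target<0}; it ignores formatting, hidden-text styles and anything the delimiters do not cover, and the name restore_source is made up:

```python
import re

# One bilingual segment: {0>source<}match-value{>target<0}
SEGMENT = re.compile(r"\{0>(.*?)<\}\d+\{>(.*?)<0\}", re.DOTALL)

def restore_source(bilingual_text):
    """Replace every bilingual segment with its source text only."""
    return SEGMENT.sub(lambda m: m.group(1), bilingual_text)

sample = ("Intro. {0>Hello world.<}100{>Witaj świecie.<0} "
          "{0>Bye.<}0{>Pa.<0}")
print(restore_source(sample))
```

Text outside the delimiters is left untouched, which mirrors the way the macro skips anything that is not a segment.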

BTW, is there a repository for macros in ProZ? I've got a few that might be useful...

[Edited at 2004-06-04 01:18]


Aleksandr Okunev
Local time: 03:34
English to Russian
There's such a tool Jun 4, 2004

Is there a way to do a 'reverse clean-up', i.e. remove the translation segments and leave the source text?


This is done by the Plustoys(2) utility, which is available in the 'files' section of the Wordfast Yahoo group. Join the group and download it. The utility does other useful things too.

Happy cleanup!


xxxOTMed
Poland
Local time: 01:34
English to Polish
+ ...
TOPIC STARTER
Thank you - lessons learned Jun 12, 2004

Our ambitiously labeled 'how on earth do I reverse a Trados TM' project was paused for a while, so please forgive the slightly late reply.
The task of restoring source segments from a bilingual text has developed into creating a 'reverse' TM from a bilingual doc.
Both tools posted above (thanks Jabberwock and Aleksandr) did extract the source text.
As to creating the reverse TM: after some experiments and tests, and with great help from the Fusion Team (thank you Alain!), we concluded that it is as simple as exporting a TM for a language pair (x-y), creating a new TM for the reverse language pair (y-x), and importing the previously exported x-y TM into it!

The above method may be obvious to some (most) of you, but as it was a groundbreaking discovery for us, I do hope someone finds our lessons learned helpful.

Nevertheless, let me thank you all for your valuable and helpful input. I am, as always, awed by the support I have received through ProZ.
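
Conceptually, the reversal amounts to swapping the two sides of every translation unit. Trados does this itself on import, so the following is only an illustration on a hypothetical tab-separated list of source/target pairs, not the Trados export format; the name reverse_pairs is made up:

```python
def reverse_pairs(tsv_text):
    """Swap source and target in every 'source<TAB>target' line."""
    reversed_lines = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Raises ValueError if a line has no tab, i.e. is not a pair.
        source, target = line.split("\t", 1)
        reversed_lines.append(f"{target}\t{source}\n")
    return "".join(reversed_lines)

en_pl = "Open the file.\tOtwórz plik.\nSave the file.\tZapisz plik.\n"
pl_en = reverse_pairs(en_pl)
print(pl_en, end="")
```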

Best regards, Greg



