Cleaning a large translation memory - WF Pro 3
Thread poster: romine

romine
Local time: 17:45
English to German
+ ...
Sep 26

We are trying to clean our main translation memories, starting with one of approx. 50,000 segments, to make things faster and more efficient.

What is the best way to go about this? In the past, I always just opened the txt in Excel and deleted or adjusted the redundant or faulty segments but I feel like this is neither efficient nor a particularly safe way to do it.

Is there a way to get the TM administration perspective to work properly? When set to the default of only retrieving 100 segments at a time, it doesn't seem to let me filter the segments in any practical way. When I try to set the segment limit to a higher number (whether 1,000 or 50,000), the TM won't load.

Assuming that I can't get the TM administration perspective to work, what is the best way to remove all tags/placeables? Just remove them manually using copy & paste in the txt file?

Also, I think I read somewhere that "invalid" segments (segments that were updated at some point) are marked with "xx" at the beginning in the txt. Am I right in assuming that Wordfast won't retrieve these segments for matches anyway and I can just delete them from the txt? Why are they even still in the txt? I always thought that, if I update a segment previously committed to the TM, only the updated version is kept in the TM?

I should also add that I cannot download any external tools to help me with this as my company has a very strict policy about that and it would probably take weeks for them to approve a new tool.

Does anyone have any recommendations how to go about this? We are on the most recent version of WF Pro 3 (3.4.14).

Thank you in advance!


 

Samuel Murray  Identity Verified
Netherlands
Local time: 17:45
Member (2006)
English to Afrikaans
+ ...
@Romine Sep 26

romine wrote:
We are trying to clean our main translation memories, starting with one of approx. 50,000 segments, to make things faster and more efficient. ...
I should also add that I cannot download any external tools to help me with this as my company has a very strict policy about that and it would probably take weeks for them to approve a new tool.


I have never had good experiences with Wordfast's TM editors.

If you were able to download other utilities, I would have recommended that you try Okapi Olifant and Xbench, both of which I have used in the past to edit WF TMs.

Assuming that I can't get the TM administration perspective to work, what is the best way to remove all tags/placeables? Just remove them manually using copy & paste in the txt file?


Yes, I think that for tags it should be fairly straightforward to just do find/replace on the text file itself. You can also open it in Excel, or you can open it in a text editor and then copy it to Excel.
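If a Python interpreter already happens to be installed (so that nothing new has to be downloaded), the find/replace could be scripted roughly as below. This is only a sketch under assumptions: the tag pattern (`<1>`, `</1>`, `{ut1}`, `{1}`) is a guess for illustration and must be adjusted to the tags that actually occur in your TM, and the source and target segments are assumed to be the fifth and seventh tab-separated fields. `strip_tags`, `TAG_PATTERN`, `SOURCE_FIELD` and `TARGET_FIELD` are hypothetical names.

```python
import re

# Hypothetical tag pattern: matches <1>, </1>, {ut1} and {1}.
# Adjust it to the tags that actually appear in your TM.
TAG_PATTERN = re.compile(r"</?\d+>|\{(?:ut)?\d+\}")

SOURCE_FIELD = 4  # fifth tab-separated field (0-based index), assumed source
TARGET_FIELD = 6  # seventh tab-separated field, assumed target


def strip_tags(line: str) -> str:
    """Remove tags from the source and target fields of one TM line.

    All other fields (date, user ID, language codes, ...) are left untouched.
    """
    fields = line.split("\t")
    for index in (SOURCE_FIELD, TARGET_FIELD):
        if index < len(fields):
            fields[index] = TAG_PATTERN.sub("", fields[index])
    return "\t".join(fields)
```

Restricting the replacement to the two segment fields avoids accidentally changing a date or language code. Removing a tag that sat between two spaces leaves a double space behind, so a second pass for that may be wanted. Work on a copy of the TM, not the original.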

Also, I think I read somewhere that "invalid" segments (segments that were updated at some point) are marked with "xx" at the beginning in the txt. Am I right in assuming that Wordfast won't retrieve these segments for matches anyway and I can just delete them from the txt? Why are they even still in the txt? I always thought that, if I update a segment previously committed to the TM, only the updated version is kept in the TM?


Yes, TUs (translation units) whose date starts with "xx" are marked for deletion. There are processes in Wordfast that will actually delete them, but until you use those processes, the TUs are simply marked for deletion.

As to why they exist at all: whether Wordfast updates an existing TU or creates a new one (and marks the old one for deletion) depends on the TM settings in Wordfast, for example on whether your current user ID is different from the user ID of the TU.


[Edited at 2019-09-26 19:33 GMT]


 

romine
Local time: 17:45
English to German
+ ...
TOPIC STARTER
@Samuel Sep 26

Thank you very much, and sorry, of course I meant to say find/replace and not copy/paste for the tags.

Regarding the units marked for deletion, how can I use the processes to delete them? If I find xx marked segments in the txt can I just delete them there directly or will that corrupt the TM?


 

Samuel Murray  Identity Verified
Netherlands
Local time: 17:45
Member (2006)
English to Afrikaans
+ ...
@Romine Sep 27

romine wrote:
1. Regarding the units marked for deletion, how can I use the processes to delete them?
2. If I find xx marked segments in the txt can I just delete them there directly or will that corrupt the TM?


I don't know what the process in WFP3 is to delete them. But if you view the file in a Unicode-aware text editor with word wrap disabled, you can delete any line (i.e. the whole line) that starts with "xx", and the rest of the TM will be safe. Opening and deleting lines in Excel is also safe, except that if Excel believes a cell contains a formula, it will corrupt that particular cell (and that will affect that particular TU, but it won't affect the rest of the TM).
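The same line-by-line deletion can be sketched in Python, if an interpreter is already available on the machine. The assumptions here: the date is the first tab-separated field of each line, the file is UTF-16 (check the encoding of your own TM and change the `encoding` argument if it differs), and all function names are made up for this example. The script writes a new file and never modifies the original.

```python
from pathlib import Path


def is_marked_for_deletion(line: str, prefix: str = "xx") -> bool:
    """True if the first field (assumed to be the date) starts with prefix."""
    return line.split("\t", 1)[0].startswith(prefix)


def drop_marked_units(lines):
    """Return (kept_lines, removed_count) with marked lines filtered out."""
    kept = [line for line in lines if not is_marked_for_deletion(line)]
    return kept, len(lines) - len(kept)


def clean_tm(src: Path, dst: Path, encoding: str = "utf-16") -> int:
    """Write a cleaned copy of src to dst and return the number of removed TUs.

    newline="" keeps the original line endings exactly as they are.
    """
    with open(src, encoding=encoding, newline="") as handle:
        lines = handle.readlines()
    kept, removed = drop_marked_units(lines)
    with open(dst, "w", encoding=encoding, newline="") as handle:
        handle.writelines(kept)
    return removed
```

A call such as `clean_tm(Path("main_tm.txt"), Path("main_tm_clean.txt"))` (hypothetical file names) would leave `main_tm.txt` untouched, so the result can be compared with the original before the cleaned file is put into use.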


 

romine
Local time: 17:45
English to German
+ ...
TOPIC STARTER
@Samuel Sep 27

Thanks again - we found 9,000 segments marked for deletion in the first TM today, wow. I can't believe we have been working with Wordfast for years and never knew about this.

 

kneyens
Belgium
Local time: 17:45
French to Dutch
delete TU's marked with "xx" Oct 1

Also, I think I read somewhere that "invalid" segments (segments that were updated at some point) are marked with "xx" at the beginning in the txt. Am I right in assuming that Wordfast won't retrieve these segments for matches anyway and I can just delete them from the txt? Why are they even still in the txt? I always thought that, if I update a segment previously committed to the TM, only the updated version is kept in the TM?


The TUs marked with "xx" will disappear from your TM when you reorganise it. I always ask for a reorganisation to be performed before cleaning up a large TM. For the moment I do the clean-up in Excel, but this has many flaws, as you already mentioned yourself. I have searched and asked for tips on a better way, but I haven't found anything so far. I think it is rather ridiculous of WF to provide a TM editor that actually doesn't do the job at all...

Kind regards,
Katleen


 

Samuel Murray  Identity Verified
Netherlands
Local time: 17:45
Member (2006)
English to Afrikaans
+ ...
The purpose of WF's TM editor Oct 1

kneyens wrote:
I think it is rather ridiculous of WF to provide a TM editor that actually doesn't do the job at all...


I think the main purpose of WF's TM editor is to be a TM previewer, i.e. to help users see generally what is contained in the TM.


 

Milan Condak  Identity Verified
Local time: 17:45
English to Czech
Erase of not valid TUs Oct 1

kneyens wrote:

Also, I think I read somewhere that "invalid" segments (segments that were updated at some point) are marked with "xx" at the beginning in the txt. ... I think it is rather ridiculous of WF to provide a TM editor that actually doesn't do the job at all...

Kind regard,
Katleen


I read the manuals, and the feature works.

https://www.wordfast.net/zip/WFC_7_manual.html

Find (Ctrl+F): Data Editor.

Results: 8 times

8/8 Remarks:

The date does not necessarily have a tilde (~) separating date and time. Any printable character can be used there, except a number. WFC uses the tilde (~), the equal (=) sign, and the star sign(*). The equal sign means the TU was "marked" (flagged) by WFC's data editor. This has no consequence on the TU's status: it remains fully valid. Although WFC always records the date and time when writing a TU, the date and time are optional and could be empty (or even made of an invalid date) in which case WFC would simply assume the current computer's date and time, or previous TU incremented by one second, if in a sequential loop. Dates and times are "local", taken from the local computer's clock.
If any optional field is left empty, its trailing tabulator should be present. For a TU to be valid, there must be at least six tabulators, with the fifth field (the source segment, located between the fourth and the fifth tabulator) made of at least one printable character.
The date's first character (a number from 0 to 9, usually, a number 2 if the TU was created in the current millenium) can be "x". It means that this TU is not valid anymore - WFC marked it for future deletion.

xxx The first full reorganisation of the TM by WFC will erase this TU. xxx

Do not remove the "x", or replace it with a number, unless you know what you are doing.
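The validity rule quoted above (at least six tabulators, and a fifth field with at least one printable character) can be checked with a few lines of Python before and after any manual clean-up. This is a sketch, not Wordfast's own check: `is_valid_tu` and `find_invalid` are hypothetical names, a source field containing only whitespace is treated as empty (my interpretation of "printable"), and the TM header line gets no special handling.

```python
def is_valid_tu(line: str) -> bool:
    """Check one TM line against the rule quoted from the manual."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 7:  # six tabulators yield at least seven fields
        return False
    source = fields[4]  # fifth field, between the fourth and fifth tabulator
    # Assumption: whitespace-only source segments count as empty.
    return any(ch.isprintable() and not ch.isspace() for ch in source)


def find_invalid(lines):
    """Return the 1-based line numbers of lines that fail the check."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if not is_valid_tu(line)
    ]
```

Running `find_invalid` on the lines of a TM edited in Excel or a text editor gives a quick list of rows where a tabulator was lost or a source segment was emptied by accident.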

--

Erasing TUs marked with "x" has worked in all previous versions of WFC.

--
https://www.wordfast.net/index.php?whichpage=downloadpage

Documentation:
Wordfast Classic manual, version 7.xx (English, rev. 1 Sep 2019)
Wordfast Classic manual, version 6.9 (English, rev. 10 Aug 2017)
Wordfast Classic manual, version 5.9x (English, rev. 06 Oct 2010)
Download reference manuals in other languages


I do not use this feature for text TMs myself. I convert the WTM to TMX and manage my translation memories in TMX format in other tools.

Milan

[Edited at 2019-10-01 11:36 GMT]


 

