How to delete redundant entries in WF glossary?
Thread poster: euge bellini

euge bellini  Identity Verified
Uruguay
Local time: 09:37
Member (2009)
English to Spanish
Jan 25, 2010

Hi all,
I have quite a big glossary with many duplicate entries. Can I delete them using the Data Editor or something like that?
I haven't found out how.

Thanks!


 

Attila Piróth  Identity Verified
France
Local time: 14:37
Member
English to Hungarian
Excel? Jan 25, 2010

I would use Excel to do that

1.) Open the txt file in Word
2.) Convert it into a two-column table (using Table / Convert / Text to Table, and specifying "Tabs" as the separator)
3.) Copy the table to Excel
4.) Perform alphabetic ordering according to the source-language column
5.) Weed out the duplicates. Using some Excel functions this can be done in a quite straightforward way. If the file is relatively short, flagging the duplicate entries is probably enough; you can then remove duplicates manually. To flag duplicates, you can use the IF function. If A is the source-language column, and B the target language column, use column C, e.g., write the following into cell C2:
=IF(A1=A2;"DUPLICATE";"")

(Note that I used the semicolon as separator; it may be different in your Excel version, but when you start to type IF, Excel will help you with that).

If the file is longer, you can use a function to flag conflicting duplicate entries (same source term with different target terms) and remove identical duplicates (same source term and same target term).
Incidentally, this issue is addressed in this webinar.

6.) Once weeded out, you can copy the table back to Word, convert it into tab-delimited text, and save it as a txt file.
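
For a longer file, the weeding-out in step 5 can also be scripted. What follows is only a minimal sketch in Python, not anything Wordfast or Excel provides: it assumes the glossary is a tab-delimited text file with the source term in the first column and the target term in the second, and the function name `dedupe_glossary` is made up for illustration. It drops identical duplicates (same source and same target) and reports conflicting duplicates (same source, different targets) so they can be reviewed by hand. Comparison is exact and case-sensitive.

```python
def dedupe_glossary(lines):
    """Return (kept_rows, conflicts) for tab-delimited glossary lines.

    Assumption: column 1 is the source term, column 2 the target term.
    Identical duplicates are dropped; a source term that occurs with
    several different targets is kept but also listed in `conflicts`.
    """
    seen = set()    # (source, target) pairs already kept
    targets = {}    # source term -> list of distinct target terms
    kept = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key = (fields[0], fields[1])
        if key in seen:
            continue  # identical duplicate: drop it
        seen.add(key)
        kept.append(fields)
        targets.setdefault(fields[0], []).append(fields[1])
    conflicts = {src: tgts for src, tgts in targets.items() if len(tgts) > 1}
    return kept, conflicts


if __name__ == "__main__":
    # Hypothetical sample data, only to show the call.
    sample = ["house\tcasa\n", "house\tcasa\n", "house\thogar\n", "dog\tperro\n"]
    kept, conflicts = dedupe_glossary(sample)
    for fields in kept:
        print("\t".join(fields))
    print("Conflicts to review:", conflicts)
```

To run it on a real file, the lines can be read with `open(path, encoding=...)` and the kept rows written back joined with tabs; the right encoding depends on how the glossary was saved.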

Kind regards,
Attila

[Edited at 2010-01-25 17:49 GMT]


 

euge bellini  Identity Verified
Uruguay
Local time: 09:37
Member (2009)
English to Spanish
TOPIC STARTER
Will try that way... Jan 25, 2010

I had thought about it, but wanted to check if WF could do it.

Thanks!


 

Attila Piróth  Identity Verified
France
Local time: 14:37
Member
English to Hungarian
WF can do it, too Jan 25, 2010

Open the TM editor, select the glossary, click on Tools, and select among the special filters "Mark redundant entries (same source)" or "Mark redundant entries (same source+target)". You will be prompted to select whether you want to sort the glossary first (recommended).

Kind regards,
Attila


 

euge bellini  Identity Verified
Uruguay
Local time: 09:37
Member (2009)
English to Spanish
TOPIC STARTER
I did that Jan 25, 2010

But they are not deleted that way, just marked... How do I erase them afterwards from the TM editor? Is it possible?

Thank you again!


 

SPI-Trad
Local time: 14:37
Czech to French
Ctrl+Z Jan 25, 2010

(In WF Classic)
Once the redundant entries are marked, you erase them with Ctrl+Z (Ctrl+X then Ctrl+Del work too, methinks)
Also, when you're in the Glossary editor, you can press F1 to display all the command shortcuts


 

Attila Piróth  Identity Verified
France
Local time: 14:37
Member
English to Hungarian
Follow pop-up instructions Jan 26, 2010

Hi again,

After the glossary has been sorted and redundant entries marked, a pop-up panel appears:

XX redundant entries marked.

You can *cut* marked entries with Ctrl+X, then *delete* them with Ctrl+Delete.


Kind regards,
Attila


 

euge bellini  Identity Verified
Uruguay
Local time: 09:37
Member (2009)
English to Spanish
TOPIC STARTER
Thank you both! Jan 26, 2010

:)

 

Marina Aleyeva  Identity Verified
Ukraine
Local time: 15:37
English to Russian
Workaround to process glossaries in Olifant Jan 27, 2010

Although Olifant is originally for TMs only, you can easily use it to manage your glossaries, because the structure of Wordfast TMs and glossaries is so similar. I have just tried this workaround that I found on an Enlaso Yahoo group forum and it worked like a charm:


I finally opened the glo in Excel, cut it into 2 parts of less than 9000 entries, added 4 columns before the source, another column between source and target, and any old numbers on the last line to preserve the columns.

http://tech.groups.yahoo.com/group/enlasotools/message/563

Once you have prepared your glossary, just open it in Olifant and delete any redundant entries as usual.

Hope this helps.

P.S. Don't forget to save it as Unicode text after you have finished with columns in Excel! (Copy and paste into a text editor, e.g. Notepad, then save as text - Unicode.)

After you have finished with Olifant, you can convert your glossary back to the form that WF understands by copying and pasting the contents into an Excel sheet, deleting all the extra columns that you previously added, and also deleting the first row with attributes. Now you can save your glossary as Unicode text and open it in WF.
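
The column shuffling can also be done outside Excel. The following is only a rough Python sketch of the layout described in the quoted workaround (4 extra columns before the source, 1 extra column between source and target), not something tested against Olifant itself. The function names, the `filler` value and the `skip_header` flag are all made up for illustration; whether Olifant accepts empty filler fields is an assumption.

```python
def pad_for_olifant(rows, filler=""):
    """Pad (source, target) rows to the 7-column layout from the workaround.

    Assumed layout: 4 filler columns, source, 1 filler column, target.
    `filler` is a placeholder value, not a documented Olifant requirement.
    """
    return [[filler] * 4 + [row[0], filler, row[1]] for row in rows]


def unpad_from_olifant(rows, skip_header=True):
    """Recover (source, target) pairs from 7-column rows.

    With skip_header=True the first row (the attribute row mentioned
    above) is dropped before the extra columns are removed.
    """
    body = rows[1:] if skip_header else rows
    return [[row[4], row[6]] for row in body if len(row) >= 7]


if __name__ == "__main__":
    # Hypothetical sample data, only to show the round trip.
    glossary = [["house", "casa"], ["dog", "perro"]]
    padded = pad_for_olifant(glossary)
    restored = unpad_from_olifant(padded, skip_header=False)
    print(restored == glossary)
```

Each row would then be joined with tabs and saved as Unicode text, as described in the P.S. above.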

P.P.S. If you don't know what Olifant is, just check the Okapi website: http://okapi.sourceforge.net/downloads.html
Olifant is my tool of choice for all TM (and now glossary) management.

[Edited at 2010-01-27 02:34 GMT]


 

