Removing duplicates from a termbase
Thread poster: Jacques DP
| You may search for duplicate terms || Nov 10, 2006 |
Unfortunately, you cannot search for duplicate entries.
However, it might be useful to search for duplicate terms by opening the menu "Search" and clicking on "Search for duplicate terms". This command searches through your current source language and lists all terms that occur more than once. You will have to browse through that list but at least you don't have to work your way through your entire database.
I hope this helps.
| How I solved it || Nov 10, 2006 |
Thanks for your answer. I saw this, but there were too many duplicate entries, and deleting them manually, even having the list of duplicate terms, was not feasible.
Since I imported the termbase from an Excel file, I reasoned that the problem would be easier to solve within Excel. (I am surprised, though, that the MultiTerm importing process doesn't offer the option of removing duplicate entries, since they are generally useless.)
Having verified that it couldn't be done through the menus in Excel, and not feeling like coding the Visual Basic script myself, I googled for it and found it here: http://www.softplatz.com/Soft/Business/Office-Suites-Tools/Excel-Unique-Duplicate-Data-Remover.html
It's shareware, but the free version will do the trick (choose Duplicate > Duplicate Row Wizard).
The only price is the risk of installing something of unknown origin: it may contain a virus, spyware, or whatever (in fact, though it's just a macro, it comes as an executable to install the macro...). Use at your own risk.
(Since the search query I used in Google was not overly specific, the site where I found the macro must have a good Google PageRank, which in turn makes it likely that no viruses are posted there. But this is just a quick guess, not a guarantee.)
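For anyone who would rather not install shareware of unknown origin, a hedged alternative: if the termbase is exported from Excel as CSV, duplicate rows can be dropped with a few lines of Python using only the standard library. The file names and column layout here are assumptions for illustration, not part of the original workflow:

```python
import csv

def drop_duplicate_rows(in_path, out_path):
    """Copy a CSV file, keeping only the first occurrence of each row."""
    seen = set()
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = tuple(row)  # the whole row must match to count as a duplicate
            if key not in seen:
                seen.add(key)
                writer.writerow(row)

# Example usage (glossary.csv is a hypothetical source/target term list):
# drop_duplicate_rows("glossary.csv", "glossary_unique.csv")
```

The deduplicated CSV can then be re-imported into MultiTerm as usual.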
Vito Smolej
| What I would do... || Nov 10, 2006 |
is create a pivot table in Excel to remove duplicates. I know it borders on obscene, but then again...
BTW, how did you manage to create so many duplicates? Using the same XML file to import (that's my source of doubles)?
| Answering your question || Nov 11, 2006 |
How did I get the duplicates in the first place: As reported in my messages above, I downloaded the new MS glossary (see URL above). Then, I only kept English and French (it's a multilingual glossary). If you do that, you will find that there is an enormous number of duplicates. Common words can have up to 10 occurrences (with the same translation). It may be because the same term has sometimes been translated differently in other languages, so that these rows are really not duplicates in the complete glossary.
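The mechanism described above can be sketched in a few lines of Python. The data below is hypothetical: rows that differ only in a third language collapse into duplicates once that column is dropped, which is exactly how a multilingual glossary trimmed to two languages produces repeated pairs:

```python
# Hypothetical multilingual glossary rows: (English, French, German).
# The first two rows differ only in the German rendering.
rows = [
    ("file", "fichier", "Datei"),
    ("file", "fichier", "Akte"),      # same EN/FR pair, different German
    ("save", "enregistrer", "speichern"),
]

# Keep only English and French, then deduplicate while preserving order.
en_fr = [(en, fr) for en, fr, _de in rows]
unique = list(dict.fromkeys(en_fr))   # dicts preserve insertion order (Python 3.7+)
print(unique)  # [('file', 'fichier'), ('save', 'enregistrer')]
```

So the rows were genuine distinct entries in the full glossary, and only became duplicates after the other languages were discarded.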
Anyway, it's solved now.