Extract repetitions from an excel document
Thread poster: Jorge22112002
Mar 17, 2008


I have an Excel document that will be divided into four. These four documents contain more new words than the original, because some repetitions count as new words in the divided documents.

I'd like to extract all the repetitions from the original document to get two files: one with the new segments and another with the repetitions. Then I'll send the repetitions document for translation, and the new one will be divided.

Do you know how I can do it?

Thank you very much in advance.




Tjasa Kuerpick
Member (2006)
Slovenian to German
How to deal with duplicates in Excel Mar 17, 2008

Well, I don’t understand why you want to split a document by finding the repetitions first, because if you work with Trados, it will store every segment you have translated once in its TM and will not treat it as new the next time it appears.
Unless you actually want to know how many repetitions there are, but in that case you should note that Trados has its own way of counting.

Before attempting any changes to the original file, always make a copy and leave the original untouched for safety.

There is a way to eliminate the repetitions in the Excel file simply by filtering them out and selecting the option for no duplicates (in Excel's Advanced Filter dialog this is the "Unique records only" checkbox). Depending on how large your file is, it might take Excel some time to finish this job. Excel then marks all the cells (which should preferably be in one column), and you can now select the whole table and copy it to the new destination (a new Excel table).
When copying, the key combination ALT+C followed shortly by ALT+X deletes from the original file all cells that have no duplicates.
When you return to the original file, you can check the repetitions and store them in a second table.
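
For readers who would rather script this than filter by hand, here is a minimal Python sketch of the same split. It is not the Excel procedure above, only an illustration of the result it aims for. It assumes the cell texts have already been read into a plain list; `split_repetitions` and the sample data are hypothetical names made up for this example.

```python
from collections import Counter

def split_repetitions(segments):
    """Split cell texts into segments that occur once and segments that repeat.

    Returns (new_segments, repeated_segments). Each repeated segment is
    listed only once, in order of first appearance.
    """
    counts = Counter(segments)
    # Segments that appear exactly once go to the "new" file.
    new_segments = [s for s in segments if counts[s] == 1]
    # dict.fromkeys removes later copies while preserving order.
    repeated_segments = list(dict.fromkeys(s for s in segments if counts[s] > 1))
    return new_segments, repeated_segments

# Hypothetical sample data standing in for one Excel column.
cells = ["Customer service", "Open the file", "Customer service",
         "Save", "Open the file", "Print"]
new_segments, repeated_segments = split_repetitions(cells)
print(new_segments)
print(repeated_segments)
```

The comparison here is exact, so two texts that differ only by a space still count as different segments, just as in Excel.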
BUT you should note that Excel is sensitive to spaces. What does this mean?
Let’s say you have two cells containing the same words:
Cell C13: Customer service
Cell E24: Customer service
Obviously these two cells are duplicates, yet Excel does not recognize them as such, because the second cell (E24) has a space after the word "service" and cell C13 does not. Thus, Excel will not touch these two cells and will leave the duplicates in the original table.
As these spaces are often important, you cannot simply delete them: doing so might affect the final layout or some other feature, or cause two words to be merged together, and so on. If you want Excel to recognize such cells as duplicates regardless, you have to do some VBA programming in Excel, where you can define how Excel should treat such cases.
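
The idea of comparing cells while ignoring stray spaces, without deleting those spaces from the cells themselves, can be sketched as follows. This is Python rather than the VBA mentioned above, and only one possible approach: it assumes the cells are available as a mapping from cell address to text, and `normalize` and `find_duplicates_ignoring_spaces` are hypothetical helper names.

```python
def normalize(text):
    """Return a comparison key: trimmed, with inner runs of whitespace collapsed."""
    return " ".join(text.split())

def find_duplicates_ignoring_spaces(cells):
    """Group cell addresses whose texts match once extra spaces are ignored.

    `cells` maps a cell address to its text. The original texts are never
    modified; only the comparison key is normalized.
    """
    groups = {}
    for address, text in cells.items():
        groups.setdefault(normalize(text), []).append(address)
    # Keep only keys shared by two or more cells.
    return {key: addrs for key, addrs in groups.items() if len(addrs) > 1}

# E24 carries a trailing space, as in the example above.
sheet = {"C13": "Customer service", "E24": "Customer service ", "F2": "Invoice"}
duplicates = find_duplicates_ignoring_spaces(sheet)
print(duplicates)
```

Because only the key is normalized, the spacing in the original cells stays intact for layout purposes.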

All in all, this way of working is rather time-consuming, especially with a very big file, and is advisable only where good preparation is needed: to increase the quality of the translation, when two or more people have to work on the document because of a short deadline, or to ensure that they all use the same terms and wording. You should also note that, depending on the content, some repetitions might have a completely different meaning, for example if part of a sentence was put in the next cell or if the words occur in combination with other words (which, after all, also depends on the subject of your translation). It is therefore important to check, while translating, whether each repetition is correct and makes sense in its context.

Extract repetitions from an excel document Mar 18, 2008


Thank you for your answer. It is a good way to solve my problem.
The issue is that I have to send the divided files to different translators, and I don't want them to translate the repeated segments in different ways.

Thank you very much for your help


