Offline tool to compare two word lists
Thread poster: Hans Lenting

Hans Lenting  Identity Verified
Netherlands
Member (2006)
German to Dutch
Oct 26

I'm looking for an offline tool (script, macro, ...) to compare two word lists, either case-sensitive or case-insensitive, and create a third list, containing all words that are present in both compared lists.

Both lists contain exactly one word per line. Non-ASCII characters (ä, ß, etc.) should be supported.
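The request can be met with a short script. Below is a minimal sketch in Python 3. It assumes both files are UTF-8 text with one word per line. The function names (`read_words`, `common_words`, `compare_files`) are hypothetical and do not come from any existing tool. Case-insensitive mode uses Python's `str.casefold()`, which is one possible comparison key among several.

```python
from pathlib import Path


def read_words(path):
    """Read a UTF-8 file with one word per line; blank lines are skipped.

    "utf-8-sig" also removes a byte-order mark if an editor added one,
    splitlines() accepts both Windows (CRLF) and Unix (LF) line breaks,
    and surrounding spaces are trimmed from each word.
    """
    text = Path(path).read_text(encoding="utf-8-sig")
    return [line.strip() for line in text.splitlines() if line.strip()]


def common_words(list1, list2, case_sensitive=True):
    """Return the words present in both lists, once each, in list1's order."""
    # Comparison key: the word itself, or its case-folded form ("Tür" -> "tür").
    key = (lambda word: word) if case_sensitive else str.casefold
    keys2 = {key(word) for word in list2}
    seen = set()
    result = []
    for word in list1:
        k = key(word)
        if k in keys2 and k not in seen:
            seen.add(k)
            result.append(word)  # keep the spelling used in list1
    return result


def compare_files(file1, file2, out_file, case_sensitive=True):
    """Write the words common to file1 and file2 to out_file (UTF-8, one per line)."""
    words = common_words(read_words(file1), read_words(file2), case_sensitive)
    Path(out_file).write_text("".join(word + "\n" for word in words), encoding="utf-8")
    return words
```

To run it, call something like `compare_files("list1.txt", "list2.txt", "common.txt", case_sensitive=False)`. The file names are placeholders. The output keeps the spelling and order of the first file.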


 

Tony M
France
Local time: 02:29
Member
French to English
Excel? Oct 26

Could you do it going via Excel?

Something like IF (value in Column A) = (value in Column B), THEN Column C = (value in Column A), ELSE leave Column C empty (in Excel: =IF(A1=B1,A1,"") in C1, copied down).

Then, when you copy back to Word, it would be easy enough to sort the table on Column C, manually remove all the lines where C is empty, and finally re-sort alphabetically on (say) C if that's important.
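There is one caveat with this approach, sketched here in Python for illustration. A formula like =IF(A1=B1,A1,"") filled down compares each row of column A only with the same row of column B. A word shared by both lists is therefore caught only when it happens to sit on the same row in both. Looking the word up anywhere in column B avoids this; in Excel, for example, COUNTIF(B:B,A1)>0 does that lookup. The function names below are hypothetical. Note also that Python's `==` is case-sensitive, whereas Excel's `=` operator and COUNTIF ignore case.

```python
def rowwise_matches(col_a, col_b):
    """Like =IF(A1=B1,A1,"") filled down: row n of A is compared with row n of B only.

    zip() stops at the shorter column; Python's == is case-sensitive.
    """
    return [a if a == b else "" for a, b in zip(col_a, col_b)]


def lookup_matches(col_a, col_b):
    """Keep a word from column A if it occurs anywhere in column B."""
    in_b = set(col_b)
    return [a if a in in_b else "" for a in col_a]


col_a = ["Apfel", "Birne", "Kirsche"]
col_b = ["Birne", "Apfel", "Pflaume"]
print(rowwise_matches(col_a, col_b))  # ['', '', '']
print(lookup_matches(col_a, col_b))   # ['Apfel', 'Birne', '']
```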


 

Jean Dimitriadis  Identity Verified
France
Local time: 02:29
Member
English to French
Diff tool Oct 26

I'd look for a diff tool for text files/directories (Meld, Diffuse, Beyond Compare, etc.) or one that is specifically for Excel (ExcelMerge) if you prefer that route.
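Diff tools work for this because of a simple property. If both lists are first sorted and deduplicated, the lines a diff reports as unchanged are exactly the words the two lists share. The sketch below illustrates that idea with Python's standard `difflib` module as a stand-in for the GUI tools named above. The function name is hypothetical.

```python
import difflib


def common_via_diff(list1, list2):
    """Sort and deduplicate both lists, diff them, and keep the unchanged lines."""
    a = sorted(set(list1))
    b = sorted(set(list2))
    # ndiff() prefixes each line with "- " (only in a), "+ " (only in b),
    # "  " (in both) or "? " (intra-line hints); only "  " lines are kept.
    return [line[2:] for line in difflib.ndiff(a, b) if line.startswith("  ")]


print(common_via_diff(["Tür", "Haus", "Baum", "Haus"], ["Haus", "Auto", "Baum"]))
# ['Baum', 'Haus']
```

ndiff's intra-line matching can be slow on long lists. A plain set intersection gives the same words faster.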

 

esperantisto  Identity Verified
Local time: 04:29
Member (2006)
English to Russian
kdiff3 maybe Oct 26

https://stackoverflow.com/questions/12826132/from-a-kdiff3-file-comparison-can-i-generate-a-diff-in-unified-diff-format

 

Samuel Murray  Identity Verified
Netherlands
Local time: 02:29
Member (2006)
English to Afrikaans
Try my little glossary comparison scripts (AutoIt) Oct 26

Hans Lenting wrote:
I'm looking for an offline tool (script, macro, ...) to compare two word lists, either case-sensitive or case-insensitive, and create a third list, containing all words that are present in both compared lists. ... Both lists contain exactly one word per line. The higher ASCII range (ä, ß etc.) should be supported.


Oh, dear. Well, I may have something that you can use while you search for the perfect solution:
http://www.leuce.com/autoit/WFC%20Glossary%20Comparer.zip

Each of these two scripts attempts to compare two Wordfast Classic glossaries (which are tab-delimited files). I tried to quickly adapt one of them for comparing word lists that contain only 1 column (i.e. your scenario), but I'm afraid I'm too stoned right now.

So, what you need to do is temporarily replace any existing tabs in your files with a marker, e.g. "|||", then add a tab to the end of each line (i.e. replace \n with \t\n, or replace CRLF with TAB & CRLF, or whatever), and then use the "compare column 1" script. Also type "NONE" when prompted. The readme file is your friend.

The script outputs two additional files, named after the two original files. If an entry occurs in both files, it gets the word [BOTH] added in front of it. If an entry occurs in one file only, then, well, it just remains in that file.

Look, I used these scripts during a large translation project but did not develop them beyond the point where they were useful to me at the time. These scripts are SLOW with large files, though.


[Edited at 2019-10-26 13:00 GMT]
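For readers who would rather not install AutoIt, here is a rough Python sketch of the behaviour described above. Entries whose first tab-delimited column also occurs in the other file get a [BOTH] prefix. This is not Samuel's script, only an illustration. Three details are assumptions: the function names, the space after [BOTH], and matching on the first column only. In this sketch a line without a tab counts as a one-column entry, so plain word lists need no tab-marker workaround.

```python
def first_column(line):
    """Text before the first tab (the whole line if it contains no tab)."""
    return line.split("\t", 1)[0]


def mark_both(lines1, lines2, marker="[BOTH] "):
    """Return copies of both files' lines, prefixing entries found in the other file.

    Assumption: entries are matched on their first tab-delimited column only.
    """
    keys1 = {first_column(line) for line in lines1}
    keys2 = {first_column(line) for line in lines2}
    out1 = [marker + line if first_column(line) in keys2 else line for line in lines1]
    out2 = [marker + line if first_column(line) in keys1 else line for line in lines2]
    return out1, out2
```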


 

Jean Lachaud  Identity Verified
United States
Local time: 20:29
English to French
Excel Oct 26

Off the top of my head:

  • Add the content of one list to the other

  • import/copy into an Excel column

  • Sort the column (if required) ([Data Tab | Sort])

  • Remove Duplicates ([Data Tab | Remove Duplicates])
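The steps above can be sketched in Python for illustration; the function name is hypothetical. As Samuel points out below, merging and removing duplicates yields every distinct word from either list. That is the union, not the words common to both. Two differences from Excel are worth knowing. First, Python's `set` and `sorted()` are case-sensitive, whereas Excel's Remove Duplicates does not distinguish case. Second, `sorted()` orders by Unicode code point, so "Äpfel" ends up after "Zug".

```python
def merge_sort_dedupe(list1, list2):
    """Append list2 to list1, remove duplicates, and sort.

    The result is the union of both lists. set() and sorted() are case-sensitive,
    and sorted() orders by Unicode code point (so "Äpfel" comes after "Zug").
    """
    return sorted(set(list1 + list2))
```

For dictionary-style ordering, one option is `sorted(words, key=locale.strxfrm)` after setting a suitable locale. The result depends on the locales installed on the system.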



    Samuel Murray  Identity Verified
    Netherlands
    Local time: 02:29
    Member (2006)
    English to Afrikaans
    @Jean Oct 26

    Jean Lachaud wrote:
  • Add the content of one list to the other

  • Import/copy into an Excel column

  • Remove Duplicates ([Data | Data Tools | Remove Duplicates])


    If you do this, then you end up with a column that contains all terms.

    The way I understand it, Hans wants only terms that occur in both files. If a term occurs in only one file, then he doesn't want that term.

    In other words, if we assume that duplicates were already removed from each list individually (keeping one instance, of course), then step #3 should be something like "remove non-duplicates" (i.e. remove all terms that appear only once in the merged list).
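Samuel's "remove non-duplicates" step can be sketched in Python; the function name is hypothetical. The sketch deduplicates each list, counts how often each term occurs across the two lists, and keeps the terms counted twice. The per-list deduplication matters: without it, a term repeated within one list would also reach a count of two.

```python
from collections import Counter


def remove_non_duplicates(list1, list2):
    """Keep only terms that occur in both lists.

    Each list is deduplicated first (set), so a term can reach a count of 2
    only by appearing once in each list.
    """
    counts = Counter(set(list1)) + Counter(set(list2))
    return sorted(term for term, n in counts.items() if n == 2)
```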


     

    Jean Lachaud  Identity Verified
    United States
    Local time: 20:29
    English to French
    My bad Oct 26

    You are right.

    Still, I'm pretty sure there is a quick way to do that in Excel, but I don't have time today to research it.



    Samuel Murray wrote:

    If you do this, then you end up with a column that contains all terms.

    The way I understand it, Hans wants only terms that occur in both files. If a term occurs in only one file, then he doesn't want that term.

    In other words, if we assume that duplicates were already removed from each list individually (keeping one instance, of course), then step #3 should be something like "remove non-duplicates" (i.e. remove all terms that appear only once in the merged list).


     

    Samuel Murray  Identity Verified
    Netherlands
    Local time: 02:29
    Member (2006)
    English to Afrikaans
    @Hans, here's a superfast one Oct 26

    Hans Lenting wrote:
    I'm looking for an offline tool [etc.]


    I found an AutoIt script that can do this, cannibalized it a bit, and here you go: http://www.leuce.com/autoit/compare_two_lists.zip

    It's super, super fast. It doesn't sort the files. It creates three files: one with terms that occur only in file 1, one with terms that occur only in file 2, and one with the terms that occur in both files. Note that the script counts all instances of a term within a file as a single term: if a term occurs twice in the same file, it is counted once. Put differently, the script removes all duplicates from each file's content before comparing the two files. It leaves the original files intact.



    [Edited at 2019-10-26 15:13 GMT]
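The behaviour described above can be sketched in a few lines of Python: three outputs, no sorting, and duplicates removed per file. This is not the AutoIt script itself. The function name and the choice to list the common terms in file 1's order are assumptions.

```python
def compare_two_lists(list1, list2):
    """Return (only in list1, only in list2, in both) without sorting.

    dict.fromkeys() drops repeated terms while keeping the first occurrence's
    position, so each term is counted once per list.
    """
    unique1 = list(dict.fromkeys(list1))
    unique2 = list(dict.fromkeys(list2))
    set1, set2 = set(unique1), set(unique2)
    only1 = [term for term in unique1 if term not in set2]
    only2 = [term for term in unique2 if term not in set1]
    both = [term for term in unique1 if term in set2]
    return only1, only2, both
```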


     

    Luca Tutino  Identity Verified
    Italy
    Local time: 02:29
    Member (2002)
    English to Italian
    Just add a couple of variations to Jean's solution (case-sensitive) Oct 26

    Before merging the lists, you should eliminate any repetition from each list separately, using Excel's Remove Duplicates command.

    Then you merge them and sort the merged list as suggested by Jean.

    Now, you can add a formula like this in cell B2: =EXACT(A2,A1) (in the Italian version of Excel: =IDENTICO(A2;A1)).

    Copy cell B2 into the remaining rows of column B, and you automatically get =EXACT(A3,A2) in cell B3 and so on. The formula returns TRUE for the second occurrence of each term that appears twice (i.e. terms that originally appeared in both lists), and FALSE for all other terms, as well as for the first occurrence of each doubled term.

    Use the AutoFilter on column B to select the TRUE rows.

    Copy the filtered column A to a new worksheet, and you have your desired list.


    [Edited at 2019-10-26 16:22 GMT]

    [Edited at 2019-10-26 16:24 GMT]
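The logic of this method can be sketched in Python; the function name is hypothetical. After per-list deduplication, a term can appear at most twice in the merged list. Sorting puts the two copies on adjacent rows. Comparing each row with the one above (the EXACT step) therefore flags exactly one row per shared term.

```python
def common_by_adjacent_rows(list1, list2):
    """Dedupe each list, merge, sort, and keep each row equal to the row above it."""
    merged = sorted(list(set(list1)) + list(set(list2)))
    # Like =EXACT(A3,A2) filled down: compare each row with the row above.
    # Python's == is case-sensitive, as EXACT is.
    return [merged[i] for i in range(1, len(merged)) if merged[i] == merged[i - 1]]
```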


     

    Luca Tutino  Identity Verified
    Italy
    Local time: 02:29
    Member (2002)
    English to Italian
    Additional step for case insensitive Oct 26

    Just add the formula =UPPER(A1) in cell B1 and copy cell B1 into the remaining rows of column B.

    Then proceed as above, but point the comparison formula (EXACT) at column B rather than column A, and place it in column C rather than column B.


    [Edited at 2019-10-26 16:24 GMT]
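With German input, the case conversion the helper column uses decides which words count as equal, so it is worth checking. The small Python sketch below groups words by a comparison key; the function name is hypothetical. With Python's `str.upper()` and `str.casefold()`, "ß" becomes "SS" or "ss". That makes "Straße" and "STRASSE" match, but it also merges "Maße" and "Masse", which are different words. `str.lower()` keeps both pairs apart. Spreadsheet UPPER functions may treat ß differently, so testing a few such words on real data is advisable.

```python
from collections import defaultdict


def group_by_key(words, key):
    """Group words that a comparison on key(word) would treat as the same term."""
    groups = defaultdict(list)
    for word in words:
        groups[key(word)].append(word)
    return dict(groups)


words = ["Straße", "STRASSE", "Maße", "Masse"]
print(group_by_key(words, str.lower))
# {'straße': ['Straße'], 'strasse': ['STRASSE'], 'maße': ['Maße'], 'masse': ['Masse']}
print(group_by_key(words, str.casefold))
# {'strasse': ['Straße', 'STRASSE'], 'masse': ['Maße', 'Masse']}
```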


     

    Samuel Murray  Identity Verified
    Netherlands
    Local time: 02:29
    Member (2006)
    English to Afrikaans
    Note Oct 27

    Samuel Murray wrote:
    I found an AutoIt script that can do this, cannibalized it a bit, and here you go: http://www.leuce.com/autoit/compare_two_lists.zip


    The script assumes Windows line breaks (CRLF), so if your files have Unix line breaks, try changing CRLF to LF in the script.
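Anyone writing their own script can avoid the CRLF/LF issue by splitting on any line break. Here is a Python sketch, assuming UTF-8 input; the function name is hypothetical. `str.splitlines()` treats CRLF, LF and CR alike, and also a few other Unicode line separators. Splitting on "\r\n" alone would leave a Unix file as one long line.

```python
def split_words(raw_bytes):
    """Decode UTF-8 (dropping a byte-order mark if present) and split on any line break."""
    text = raw_bytes.decode("utf-8-sig")
    return [line for line in text.splitlines() if line]


print(split_words(b"Haus\r\nT\xc3\xbcr\nBaum\r\n"))  # ['Haus', 'Tür', 'Baum']
print("Haus\nTür".split("\r\n"))                     # ['Haus\nTür']
```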


     

    Hans Lenting  Identity Verified
    Netherlands
    Member (2006)
    German to Dutch
    TOPIC STARTER
    Thank you Oct 27

    Samuel Murray wrote:

    Samuel Murray wrote:
    I found an AutoIt script that can do this, cannibalized it a bit, and here you go: http://www.leuce.com/autoit/compare_two_lists.zip


    The script assumes Windows line breaks (CRLF), so if your files have Unix line breaks, try changing CRLF to LF in the script.


    Thank you all!

    I've used the second script that Samuel provided. @Samuel, if you can find a case-insensitive solution, I'd be much obliged. @Jean: I'll test the Mac version of Beyond Compare.


     

