merging Excel cells with same content for import into Multiterm
Thread poster: Lutz Molderings

Lutz Molderings  Identity Verified
Germany
Local time: 00:19
Member (2007)
German to English
+ ...
Feb 15, 2010

I have a fairly large bilingual Excel glossary I would like to import into Multiterm.

The problem I have is that cells with the same content will create separate entries in Multiterm.

The word "accelerator" for example (see below), will appear as four separate entries.
Is there any way I can import the glossary so that I only have one entry for "accelerator" with four possible German translations?
I know I can merge entries in Multiterm, but this would be far too time-consuming as the file is very large.
Thanks for your help.




Direct link Reply with quote
 
FarkasAndras
Local time: 00:19
English to Hungarian
+ ...
Excel formula Feb 15, 2010

I use Excel formulas for this. It's not elegant, but it works well even with large files and is a pretty flexible solution.
A rough idea of how to do it:

- Supposing that the two columns in your picture are A and B, sort the whole table alphabetically by column A. Write IF(a1=a2;b2;"notsyn") in colum C. (In excelspeak, this means: if the content of a1 and a2 is the same, put the content of b2 in this cell, otherwise, write "notsyn" here.)
- Copy the formula to all relevant rows of column C. Now you have (some of) your synonyms in separate columns in the same row, which is what you want.
- Repeat in column D with IF(a1=a3;b3;"notsyn") to add the third synonym and so on. Of course you need to repeat the formula in as many columns as there are synonyms for any single term. Now all German synonyms are in separate columns in the row where the corresponding English term first occurs.
- Remove unneeded rows with a filter (roughly: select column A, click Data/filter/special, check only show uniqe records, filter in place).

This is a just rough idea of what I do with term lists like this, I didn't test every step now... I may have screwed it up somewhere. If all went well, all you need to do is select the whole table and copy-paste to a text editor and then back to a different Excel worksheet. Then you can name columns B, C, D etc the same (i.e. German), remove "notsyn" with search and replace and import into Multiterm as normal. Multiterm will recognize all identically named columns as containing synonyms.

It takes a bit of fiddling, but the good thing is that the process takes pretty much the same amount of time regardless of whether your termbase has 100 entries or 50,000.
As long as it is rigorously formatted, Excel tricks like these can usually do what needs to be done.

If the table also has multiple English synonyms for the same German term, you'll have to repeat the process making sure you don't lose your hard-earned German columns.


Direct link Reply with quote
 

Drew MacFadyen
SITE STAFF
SUMIF formula Feb 16, 2010

You can also make a new column A then use the SUMIF formula and then copy/paste in to the entire column A (SUMIF b1=b2, if TRUE = SYN, if FALSE = OK) this will add a data set to column A that will show OK for all single entries and then SYN for duplicates. This will keep the first entry as it is is not a duplicate, but will add SYN to the 2nd, 3rd, 4th entry for a synonym. Then all you have to do is select Data and sort by OK. 2007 even has a data sort drop down. Once you have the data sorted (excluding your Synonyms) you can copy and paste the resultant data in to a new spreadsheet that is free from duplicates. There are other tutorials and methods on how to de-duplicate spreadsheet data - this one from microsoft http://office.microsoft.com/en-us/excel/HA010346261033.aspx is good, but permanently deletes the data so make a copy first

Drew


Direct link Reply with quote
 
FarkasAndras
Local time: 00:19
English to Hungarian
+ ...
Dupes vs synonyms Feb 16, 2010

Drew MacFadyen wrote:

You can also make a new column A then use the SUMIF formula and then copy/paste in to the entire column A (SUMIF b1=b2, if TRUE = SYN, if FALSE = OK) this will add a data set to column A that will show OK for all single entries and then SYN for duplicates. This will keep the first entry as it is is not a duplicate, but will add SYN to the 2nd, 3rd, 4th entry for a synonym. Then all you have to do is select Data and sort by OK. 2007 even has a data sort drop down. Once you have the data sorted (excluding your Synonyms) you can copy and paste the resultant data in to a new spreadsheet that is free from duplicates. There are other tutorials and methods on how to de-duplicate spreadsheet data - this one from microsoft http://office.microsoft.com/en-us/excel/HA010346261033.aspx is good, but permanently deletes the data so make a copy first

Drew


If I understand you correctly, this would remove all lines where the English entry is identical to a previous English term. This is not what we want here. The duplicated English terms are not superfluous dupes: they are paired with different German terms (i.e. there are several German synonyms for the same English term).

Anyway, if one just wanted to remove dupes, that would be easier to do in fewer steps with a filter (third step in my description, select both colums if you want to only remove the records where both entries are identical).

Edit: I just spotted an unclear bit in my explanation: of course you have to put IF(a1=a2;b2;"notsyn") in C1 and then copy down from there, that's how the cell references will be correct. Also, when I say copy to a text editor and then back to Excel, I mean copy to a text editor, select all text in the text editor, copy to clipboard and then paste to Excel. This gets rid of all wonky things we created, like formulas and hidden rows.

[Edited at 2010-02-16 17:06 GMT]


Direct link Reply with quote
 

Lutz Molderings  Identity Verified
Germany
Local time: 00:19
Member (2007)
German to English
+ ...
TOPIC STARTER
perfect Feb 17, 2010

Thank you very much Farkas! This is exactly the kind of explanation I was hoping for. I followed your instructions and everything worked out perfectly.

Thanks also for your suggestion Drew, but as Farkas said, the duplicated English terms are not superfluous. But thanks anyway.


Direct link Reply with quote
 

István Hirsch  Identity Verified
Local time: 00:19
English to Hungarian
Another method for later use Feb 17, 2010

Column = capital letter. You have source items in A, target items in B.
Copy A1 into C1, click on C2 and write in the fx line (formula bar): =IF(A1=A2;"";A2) then press enter. Click on C2 again, move the mouse to its right bottom corner and when the + sign appears, drag it down along C as long as there are data in B. Now in C you have every "source language" item only once. Select and copy C and paste it into A, by clicking on the little arrow next to the Paste symbol in the toolbar and choosing „Value” from the drop-down list. Delete C.
To produce the desired cell structure, copy/past A and B into Word, and convert the table into text (separator: tabulator). Then choose Editing/Replace and write in the Find box: ^p^t ( end of paragraph, tabulator) and in the Replace box: space (or anything that you want to put between the several target language items in the same line), and click on Replace All. Copy/paste this into Excel.


Direct link Reply with quote
 
FarkasAndras
Local time: 00:19
English to Hungarian
+ ...
Welcome Feb 17, 2010

Lutz Molderings wrote:

Thank you very much Farkas! This is exactly the kind of explanation I was hoping for. I followed your instructions and everything worked out perfectly.

Thanks also for your suggestion Drew, but as Farkas said, the duplicated English terms are not superfluous. But thanks anyway.



You're welcome, I'm glad it worked.


István Hirsch wrote:

Column = capital letter. You have source items in A, target items in B.
Copy A1 into C1, click on C2 and write in the fx line (formula bar): =IF(A1=A2;"";A2) then press enter. Click on C2 again, move the mouse to its right bottom corner and when the + sign appears, drag it down along C as long as there are data in B. Now in C you have every "source language" item only once. Select and copy C and paste it into A, by clicking on the little arrow next to the Paste symbol in the toolbar and choosing „Value” from the drop-down list. Delete C.
To produce the desired cell structure, copy/past A and B into Word, and convert the table into text (separator: tabulator). Then choose Editing/Replace and write in the Find box: ^p^t ( end of paragraph, tabulator) and in the Replace box: space (or anything that you want to put between the several target language items in the same line), and click on Replace All. Copy/paste this into Excel.


That's smart. In this case (the goal is to import to Multiterm) the most convenient solution would be to replace ^p^t with ^t in order to get the terms in separate columns.
(Note: you could also replace with any unique character string like xxxsynxxx, run Multiterm Convert and then open the resulting xml and replace xxxsynxxx with </term></termGrp><termGrp><term>. This is what I do with term lists that contain synonyms within the same cell separated by a unique character string)


Direct link Reply with quote
 


To report site rules violations or get help, contact a site moderator:


You can also contact site staff by submitting a support request »

merging Excel cells with same content for import into Multiterm

Advanced search







PerfectIt consistency checker
Faster Checking, Greater Accuracy

PerfectIt helps deliver error-free documents. It improves consistency, ensures quality and helps to enforce style guides. It’s a powerful tool for pro users, and comes with the assurance of a 30-day money back guarantee.

More info »
SDL Trados Studio 2017 Freelance
The leading translation software used by over 250,000 translators.

SDL Trados Studio 2017 helps translators increase translation productivity whilst ensuring quality. Combining translation memory, terminology management and machine translation in one simple and easy-to-use environment.

More info »



Forums
  • All of ProZ.com
  • Term search
  • Jobs
  • Forums
  • Multiple search