Creating a memory from tab delimited Word document
Thread poster: Madeleine Guerra
Madeleine Guerra  Identity Verified
Canada
Local time: 19:19
French to English
Jul 24, 2009

My client has provided me with a 290-page reference document to be used in translations I produce for them. The document is a tab-delimited glossary of sorts that contains corresponding translations such as:

- (lien) de parenté [tab] kinship

some have variations:

- actifs du savoir; fonds de connaissances; avoirs de connaissance [tab] knowledge assets

as well as department titles with their abbreviations and a certain number of phrases.

I would like either to use MultiTerm or to load the glossary into a Studio translation memory (perhaps by converting the contents to a table, saving each language in a separate document, aligning the two with WinAlign and importing the result into a Studio memory file) so that I can query it with concordance search.

I have only basic experience with CAT tools and have never used MultiTerm. Can someone suggest what is the best option for me? Hopefully, this could be done automatically -- entering each term would obviously be a huge job.

Thanks for any advice!



Marina Soldati  Identity Verified
Argentina
Local time: 21:19
Member (2005)
English to Spanish
Hi Madeleine Jul 24, 2009

I'm sure there's a better way to do this, but here's what I would do.

1. First, convert the text to a table in Word, using tabs as the column separator.
2. Copy the columns into an Excel file. In the first row, enter the language name for each column.
3. Use MultiTerm Convert to convert the Excel file into a MultiTerm termbase.
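
For anyone comfortable with a little scripting, steps 1 and 2 could also be done outside Word. Below is a minimal sketch in Python, assuming the glossary has first been saved as plain text with one `source [tab] target` pair per line. The function names and the "French"/"English" headers are illustrative choices, not anything MultiTerm prescribes. It produces CSV text that Excel can open and save as a workbook for MultiTerm Convert.

```python
import csv
import io

def glossary_to_rows(text, source_lang="French", target_lang="English"):
    """Turn tab-delimited glossary text into rows, with a language header row first."""
    rows = [[source_lang, target_lang]]
    for line in text.splitlines():
        line = line.strip()
        if not line or "\t" not in line:
            continue  # skip blank lines and lines that have no tab
        source, target = line.split("\t", 1)
        rows.append([source.strip(), target.strip()])
    return rows

def rows_to_csv(rows):
    """Serialize rows as CSV text (quoting is handled by the csv module)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

sample = (
    "(lien) de parenté\tkinship\n"
    "actifs du savoir; fonds de connaissances; avoirs de connaissance\tknowledge assets\n"
)
rows = glossary_to_rows(sample)
print(rows_to_csv(rows))
```

Lines without a tab are silently skipped here; with a 290-page glossary it may be worth printing them instead, to check nothing was lost.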

Read the following thread
http://www.proz.com/forum/sdl_trados_support/130847-from_scratch_excel_import_into_trados_07_multiterm.html

for clear instructions on the conversion process.

Hope this helps.

Marina



ViktoriaG  Identity Verified
Canada
Local time: 19:19
English to French
A bit less complicated Jul 24, 2009

You can open a tab-delimited text file directly in Excel. Just right-click on the text file, select Open with... and click on Excel. Then, you will have the choice to save this document in a large variety of file formats, including an Excel sheet. Once you have saved the file as an Excel sheet, you just need to add an extra row at the top, in which you enter the source language and the target language (MultiTerm will need this to figure out what the source and target languages are and which string goes where in the termbase).

For more information, please, visit this link: http://ecolotrain.uni-saarland.de/index.php?id=1714&L=1

Also explore the rest of the Ecolotrain site--you will learn lots about Trados and SDLX. It really is worth a visit, even for people who are already managing to use the software.


Madeleine Guerra  Identity Verified
Canada
Local time: 19:19
French to English
TOPIC STARTER
Additional column for variants? Jul 24, 2009

Grand merci, Viktoria

That should save me a bunch of work. I will go to the website you suggested but in the meantime, since I have some entries that contain variants such as:

abstention; refus; non-administration; non-utilisation; fait de ne pas recourir à
--> withholding; non-initiation of treatment

I'd like to clean up my Excel document before importing the whole thing into MultiTerm by cutting and pasting the variants into columns for alternative source terms and alternative target terms and (how ambitious am I!!!) creating new entries with the alternatives. What title should I give those new columns so MultiTerm knows what to do with them?

Yo! what a lovely weekend lies ahead of me!


FarkasAndras
Local time: 01:19
English to Hungarian
synonyms Jul 24, 2009

You can make MultiTerm handle the synonyms correctly in a couple of ways.
You can do this automatically if the synonyms have a unique separator, in this case the semicolon, i.e. if there are no semicolons anywhere else in the file. Don't fool around doing it by hand.


One solution is to rearrange the spreadsheet into several French columns, as you suggest (just give them all the same header). They will automatically be recognized and imported accordingly, as synonyms within the same record. To do this, select one column and copy it to Notepad to strip the formatting, then copy it from there to Word, which has better search and replace, and replace every ; with ^t (a tab). Then select all and copy back to Excel (note that the text will now take up several columns).

The other (my preferred) solution is to feed the xls into Multiterm Convert first, then open the resulting xml and replace semicolons with </term></termGrp><termGrp><term>

If there are more than a few dozen replacements to do, use MS Word. Notepad is slow... Double click on the xml to open it with Notepad, select all, copy to Word, do the replacement, copy back to Notepad and save. Then you can import to Multiterm as normal.
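
The first approach (one column per synonym) can also be sketched in a few lines of Python instead of going through Notepad and Word. This is only an illustration and rests on the same assumption as above: the semicolon appears nowhere except between synonyms. The function name `split_synonyms` is made up for this example.

```python
def split_synonyms(cell, separator=";"):
    """Split one glossary cell into a list of trimmed, non-empty synonyms."""
    return [part.strip() for part in cell.split(separator) if part.strip()]

french = split_synonyms(
    "abstention; refus; non-administration; non-utilisation; fait de ne pas recourir à"
)
english = split_synonyms("withholding; non-initiation of treatment")

# Joining with tabs gives one line ready to paste into Excel,
# the same effect as replacing ; with ^t in Word.
print("\t".join(french + english))
```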

[Edited at 2009-07-24 20:06 GMT]



ViktoriaG  Identity Verified
Canada
Local time: 19:19
English to French
A simple method Jul 24, 2009

First, you need to make sure that every single term on a line is separated by a tab. For instance, you would get (where -> is a tab):
abstention -> refus -> non-administration -> non-utilisation -> fait de ne pas recourir à -> withholding -> non-initiation of treatment

Next, you need to ensure that the first target term is separated from the beginning of the line by the same number of tabs throughout. So, if you have one line with a single term that has two translations and another line with three equivalent terms that have three translations, your file should look like this:

source1 -> nothing -> nothing -> target1 -> target2 -> nothing
source1 -> source2 -> source3 -> target1 -> target2 -> target3

This is because on the second line, the first target (English) column is the fourth column (three tabs), so on the first line, even if you would end up with some blank cells, the first target (English) term should also be in the fourth column (three tabs).

All you really need to do is ensure that the first target term on each line is in the same column. Then, ensure that the column header for each column for your source terms is "French" and that the column header for each target term column is "English". It doesn't matter if you have a column that has a header but doesn't contain any terms--MultiTerm Convert will simply skip empty columns and cells. Thus:

French -> French -> French -> English -> English -> English
source1 -> nothing -> nothing -> target1 -> target2 -> nothing
source1 -> source2 -> source3 -> target1 -> target2 -> target3

Once you are done preparing the file in this way, you just need to convert it using MultiTerm Convert. Everything that is on the same line will be part of a single MultiTerm entry, and when you use term recognition in Trados, all of the synonyms will be listed for you to choose from.
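
If the file is too long to align by hand, the padding described above could be scripted. The following is a rough sketch in Python, not a MultiTerm feature: it assumes each entry is a pair of cells whose synonyms are separated by semicolons, and the name `build_table` and the "French"/"English" headers are hypothetical. It pads every row so that the first target term always lands in the same column, and it repeats the language name in the header for each column.

```python
def split_cell(cell, separator=";"):
    """Split a cell into trimmed, non-empty terms."""
    return [part.strip() for part in cell.split(separator) if part.strip()]

def build_table(entries, source_lang="French", target_lang="English"):
    """entries: list of (source_cell, target_cell) pairs -> padded rows with a header."""
    parsed = [(split_cell(src), split_cell(tgt)) for src, tgt in entries]
    n_src = max((len(src) for src, _ in parsed), default=1)
    n_tgt = max((len(tgt) for _, tgt in parsed), default=1)
    rows = [[source_lang] * n_src + [target_lang] * n_tgt]
    for src, tgt in parsed:
        # Empty strings stand for the "nothing" cells in the layout above.
        rows.append(src + [""] * (n_src - len(src)) + tgt + [""] * (n_tgt - len(tgt)))
    return rows

table = build_table([
    ("source1", "target1; target2"),
    ("source1; source2; source3", "target1; target2; target3"),
])
for row in table:
    print("\t".join(row))
```

The tab-separated output can be pasted straight into Excel, one term per cell.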

Please, do post here again if you have any questions on the above.

Edited to add:
Please, don't be too ambitious. By creating a separate entry for each variant, you could end up with a much larger termbase, and that would be a LOT of work in Excel. With the volume you are talking about, a weekend would be far from enough. For the example you posted above alone, you would end up with ten lines in Excel, whereas a single line would do (and it is already there, so it needs practically no work--you only need to ensure the first target term is always in the same column on each line). By putting all variants in the same MultiTerm entry (on the same line in Excel), you ensure that Workbench displays all synonyms along with the source term without you having to scroll and search (you will still be able to choose which one you insert).
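
To see where the ten lines come from: the example has five French variants and two English ones, and one entry per combination means 5 x 2 pairs. A quick check in Python, shown only to illustrate the arithmetic:

```python
from itertools import product

french = ["abstention", "refus", "non-administration",
          "non-utilisation", "fait de ne pas recourir à"]
english = ["withholding", "non-initiation of treatment"]

# One separate entry per (French, English) combination.
pairs = list(product(french, english))
print(len(pairs))  # 10
```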

Edited again to address other information in the glossary:
If the other "fields" in your glossary are other types of content (e.g., context, name of department, etc.), then you can use descriptive fields for them. However, you will have to add these descriptive fields to your termbase when you create it. At some point, MultiTerm asks you to define descriptive fields. If you have department names, define a field called "Department". Later during termbase creation, you are asked to confirm the termbase structure--make sure to add the descriptive fields you defined earlier (such as "Department") at the term level. You will then need to add "Department" as a column header in your Excel document and ensure that all department names on all lines are in this column. Thus:

French -> French -> French -> English -> English -> English -> Department
source1 -> nothing -> nothing -> target1 -> target2 -> nothing -> Accounting Dept.
source1 -> source2 -> source3 -> target1 -> target2 -> target3 -> Purchasing Dept.

[Edited at 2009-07-24 21:10 GMT]


Madeleine Guerra  Identity Verified
Canada
Local time: 19:19
French to English
TOPIC STARTER
How does MultiTerm handle queries? Jul 26, 2009

As I mentioned in my very first post, I have never used MultiTerm before. I am wondering if I really need to separate the synonyms. If the record contains the following information, for example:

French
aide au suicide; suicide avec aide; cas d'aide au suicide [?]; suicide secondé [?]; suicide assisté [?]; faciliter le suicide de qqn
English
assisted suicide; suicide; aiding suicide

and I search for "faciliter le suicide," will MultiTerm bring up that record? If so, then I don't need to spend innumerable hours separating the entries.

What do you think?


FarkasAndras
Local time: 01:19
English to Hungarian
just do it Jul 26, 2009

Madeleine Guerra wrote:

As I mentioned in my very first post, I have never used MultiTerm before. I am wondering if I really need to separate the synonyms. If the record contains the following information, for example:

French
aide au suicide; suicide avec aide; cas d'aide au suicide [?]; suicide secondé [?]; suicide assisté [?]; faciliter le suicide de qqn
English
assisted suicide; suicide; aiding suicide

and I search for "faciliter le suicide," will MultiTerm bring up that record? If so, then I don't need to spend innumerable hours separating the entries.

What do you think?


Read my post: setting this up takes about 20 seconds, not hours.

MultiTerm has pretty good fuzzy search, which will find the right entry even from a tiny fraction of it if you look hard enough. But if you'll be using it for automatic lookups during translation, it is crucial to process synonyms correctly. Lookup works on the basis of a match %, and if there are 3 synonyms in one term, then obviously the match on any one of them isn't likely to be above about 30%, so MultiTerm won't identify it as a match and won't serve it up to you.
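
A rough way to see the effect, with a big caveat: the sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure. It is NOT MultiTerm's matching algorithm, and the scores it prints say nothing about MultiTerm's actual percentages. It only shows the general point that a short query scores far lower against one long semicolon-joined string than against the individual synonym it belongs to.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Generic string similarity between 0 and 1 (a stand-in, not MultiTerm's measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

joined = ("aide au suicide; suicide avec aide; cas d'aide au suicide [?]; "
          "suicide secondé [?]; suicide assisté [?]; faciliter le suicide de qqn")
query = "faciliter le suicide"

# Score against the whole unsplit cell, as if it were a single term.
score_joined = similarity(query, joined)

# Best score when each synonym is stored as a separate term.
score_split = max(similarity(query, term.strip()) for term in joined.split(";"))

print(f"unsplit: {score_joined:.2f}  split: {score_split:.2f}")
```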



