Converting 2-column list to glossary
Thread poster: Tony M

Tony M
France
Local time: 22:57
Member
French to English
Mar 10, 2016

I have a two-part problem:

1) I often receive two-column aligned SOURCE / TARGET lists of terms: basically, a client glossary.
Does anyone know of a convenient utility that can be used to convert this into a CAT tool glossary (in my case, for Wordfast Classic, but the actual tool isn't really the issue)? As far as I have been able to ascertain, my CAT tool doesn't have a built-in utility for doing this (though I may be wrong!)

I have a manual workaround, which involves looking at an existing glossary (basically a tab-separated text file) and counting the number of tabs, many of which only separate blank fields that I don't need to use. I then add enough blank columns to the right of my bilingual table to create the corresponding number of tabs, convert table to text, and then, to be on the safe side, copy and paste that text into an existing blank glossary. But it's a bit long-winded, and a little routine for doing it would certainly help!
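The padding step in this workaround could probably be scripted. What follows is only a minimal Python sketch, assuming the client list is already tab-separated text with one SOURCE / TARGET pair per line; the function names `count_fields` and `pad_rows` are made up for illustration and are not part of Wordfast or any other tool.

```python
def count_fields(glossary_line: str) -> int:
    """Count the tab-separated fields in one line of an existing glossary."""
    return glossary_line.rstrip("\r\n").count("\t") + 1


def pad_rows(two_column_text: str, field_count: int) -> str:
    """Pad every SOURCE<TAB>TARGET line with empty fields up to field_count."""
    out = []
    for line in two_column_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        # Add empty fields on the right; adds nothing if the line is already long enough.
        fields += [""] * (field_count - len(fields))
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"
```

To use it, pass one line copied from an existing glossary to `count_fields`, then pass the client list and that number to `pad_rows`, and paste or save the result as the new glossary.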

2) I currently have a particular glossary with a slightly different format: it has 2 lines for each entry, the first being the acronym and its translation, and the second being the expanded form of the acronym and its translation. What I need to do is get the two acronyms into the Source and Target fields of my glossary, and then the 2 expanded forms together into the 'Notes' field; anyone got any brilliant ideas how to do this? I suspect I am going to have to first manually combine the expanded text + translation into a 3rd column alongside the acronyms, and then proceed with my original system as above, albeit with one fewer 'extra' column.

All suggestions gratefully received!


 

Patrick Porter
United States
Local time: 16:57
Spanish to English
Regular expression find and replace could work Mar 11, 2016

For your second issue: if you have a text editor that allows find/replace with regex, you could use that to find every pair of lines and then take out the line ending in the middle.

If you have Notepad++ (or care to download it; it's free), open the file, press Ctrl+H (for find/replace) and put the following expressions in the corresponding boxes:

Find: (.+?)\t(.+?)\r\n(.+?)\t(.+?)\r\n
Replace: $1\t$2\t$3\t$4\r\n


Make sure to select at the bottom: Search Mode: Regular expression, and check the box ". matches newline".

This will make every second line appear as fields 3 and 4 of the previous line. Make sure that there is a line break after the last line (i.e. the file doesn't end at the very end of the last line but at the beginning of a new line). Also, if you are not on a Windows machine or the file was not created on a Windows machine, then the newline might be \n instead of the \r\n in the expressions above. You can tell by trying a quick find: if nothing comes up, try removing all the \r from the regexes (there are 3 in total).
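For anyone who would rather script this than use an editor, here is a rough Python equivalent of the find/replace above. It is only a sketch: the function names are invented, it accepts either \r\n or \n line endings and writes \n, and `pairs_to_notes` assumes that the third field of the target glossary is the notes field, which should be checked against the actual glossary before relying on it.

```python
import re

# Two consecutive "field<TAB>field" lines; the line ending may be \r\n or \n,
# and the final line ending is optional.
PAIR = re.compile(r"(.+?)\t(.+?)\r?\n(.+?)\t(.+?)(?:\r?\n|\Z)")


def merge_pairs(text: str) -> str:
    """Turn every second line into fields 3 and 4 of the line before it."""
    return PAIR.sub(lambda m: "\t".join(m.groups()) + "\n", text)


def pairs_to_notes(text: str) -> str:
    """Keep the acronyms as fields 1 and 2 and put both expanded forms in field 3.

    Assumption: field 3 is where the glossary expects notes.
    """
    return PAIR.sub(lambda m: f"{m[1]}\t{m[2]}\t{m[3]} = {m[4]}\n", text)
```

Reading the file, passing its text through one of these functions and writing the result back out should give the same layout as the Notepad++ replace, without having to worry about which newline style the file uses.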


 

CafeTran Training (X)
Netherlands
Local time: 22:57
Try another CAT tool? Mar 11, 2016

Tony M wrote:

but the actual tool isn't really the issue


If you're willing to try another tool: CafeTran's native glossary format is tab-delimited, so you can use your list right away.

It also allows source-side and target-side alternatives:

ACRONYM;long form source TAB ACRONYM;long form target

That way, you can keep together what belongs together. During translation, you can easily switch the automatic insertion to the alternative target via the right mouse button, so you can choose to have the acronym translated as an acronym or as a long form, either once or for the whole project.

A similar regex would be needed to prep the glossary.
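As an alternative to a regex, the same preparation could be done with a few lines of Python. This is only a sketch: the function name `to_alternatives` is made up, and it assumes the input has exactly two tab-separated lines per entry (acronym line first, long-form line second), as described in the original question.

```python
def to_alternatives(text: str) -> str:
    """Build 'ACRONYM;long form source<TAB>ACRONYM;long form target' lines."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 2:
        raise ValueError("expected an even number of non-blank lines")
    out = []
    # Pair each acronym line with the long-form line that follows it.
    for short, long_form in zip(lines[0::2], lines[1::2]):
        src_short, tgt_short = short.split("\t", 1)
        src_long, tgt_long = long_form.split("\t", 1)
        out.append(f"{src_short};{src_long}\t{tgt_short};{tgt_long}")
    return "\n".join(out) + "\n"
```

A line without a tab raises an error rather than being silently merged, which makes stray lines in the client list easier to spot.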

I've recorded a short video to demonstrate this: https://youtu.be/roX4yksMssk

[Edited at 2016-03-11 07:21 GMT]


 

Philippe Etienne
Spain
Local time: 22:57
Member
English to French
Glossary search without CAT tools Mar 11, 2016

Not sure how helpful it may be to your issue, but I've been using Search and Replace from Funduc (http://www.funduc.com/) since I started.
When I receive glossaries with two or more columns, I convert them to .csv or .txt; the app then searches all the files, and the results window shows all occurrences line by line.
It has many other features that I've never used, but for quickly searching many heterogeneous files at once without opening them, it's handy.

If I remember correctly, incorporating 2-column glossaries into MemoQ is also quite easy.

Philippe


 

Samuel Murray
Netherlands
Local time: 22:57
Member (2006)
English to Afrikaans
Two columns is enough for WFC Mar 11, 2016

Tony M wrote:
Does anyone know of a convenient utility that can be used to convert this into a CAT tool glossary (in my case, for Wordfast Classic, but the actual tool isn't really the issue)?


WFC does not care if different records have different numbers of fields, as long as the two required fields (source and target) are present. So you can safely add more terms to the WFC glossary, even if the terms that you add have only source and target, whereas the other entries have more tabs.

I currently have a particular glossary with a slightly different format — it has 2 lines for each entry, the first being the acronym and its translation, and then the second being the expanded form of the acronym and its translation. What I need to do is get the two acronyms in to the Source and Target fields of my glossary, and then the 2 expanded forms together into the 'Notes' field...


I see a long road of manual copying ahead.

Samuel


 

esperantisto
Local time: 23:57
Member (2006)
English to Russian
No need for any tool Mar 11, 2016

You don’t need any tool. As mentioned, some CAT programs can use tab-delimited text files as glossaries directly (just to add: OmegaT, Anaphraseus), others can import them by an established procedure using a built-in tool or feature. Just read the respective manual.

 

