Converting 2-column list to glossary
Thread poster: Tony M

Tony M  Identity Verified
France
Local time: 05:53
Member
French to English
Mar 10, 2016

I have a two-part problem:

1) I often receive 2 column aligned SOURCE / TARGET lists of terms — basically, a client glossary.
Does anyone know of a convenient utility that can be used to convert this into a CAT tool glossary (in my case, for Wordfast Classic, but the actual tool isn't really the issue)? As far as I have been able to ascertain, my CAT tool doesn't have a built-in utility for doing this (though I may be wrong!)

I have a manual workaround: I look at an existing glossary, which is basically a tab-separated text file, and count the number of tabs (many of which only separate blank fields that I don't need). I then add enough blank columns to the right of my bilingual table to create the corresponding number of tabs, convert table-to-text, and then, to be on the safe side, copy and paste that text into an existing blank glossary. But it's a bit long-winded, and a little routine for doing it would certainly help!

2) I currently have a particular glossary with a slightly different format: it has 2 lines for each entry, the first being the acronym and its translation, and the second being the expanded form of the acronym and its translation. What I need to do is get the two acronyms into the Source and Target fields of my glossary, and then the 2 expanded forms together into the 'Notes' field; has anyone got any brilliant ideas how to do this? I suspect I am going to have to first manually combine the expanded text + translation into a 3rd column alongside the acronyms, and then proceed with my original system as above, albeit with one fewer 'extra' column.
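
For part 2, a small script might do the combining automatically. The following is only a minimal Python sketch, not a tested Wordfast routine: the function name pairs_to_glossary, the " = " separator and the idea that the 'Notes' field is the third tab-separated column are all assumptions made for illustration, so check them against an existing glossary first.

```python
def pairs_to_glossary(text, notes_sep=" = "):
    """Turn alternating acronym / expanded-form lines into 3-field rows.

    Input: line 1 = acronym TAB translation, line 2 = expanded form TAB
    translation, and so on. Output: acronym TAB translation TAB notes.
    """
    # splitlines() copes with both \r\n and \n; blank lines are skipped.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 2:
        raise ValueError("expected an even number of non-blank lines")

    rows = []
    for acro_line, full_line in zip(lines[0::2], lines[1::2]):
        # A line without a tab raises ValueError here, which flags bad input.
        acro_src, acro_tgt = [f.strip() for f in acro_line.split("\t")[:2]]
        full_src, full_tgt = [f.strip() for f in full_line.split("\t")[:2]]
        # Assumption: the notes field is simply the third column.
        notes = full_src + notes_sep + full_tgt
        rows.append("\t".join([acro_src, acro_tgt, notes]))
    return "\n".join(rows) + "\n"
```

The text would be read from and written back to the glossary file in whatever encoding the CAT tool expects; that part is left out here.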

All suggestions gratefully received!


 

Patrick Porter
United States
Local time: 23:53
Spanish to English
Regular expression find and replace could work Mar 11, 2016

For your second issue: if you have a text editor that allows find/replace with regex, you could use that to find every pair of lines and then take out the line ending in the middle.

If you have Notepad++ (or care to download it; it's free), open the file, press Ctrl+H (for find/replace) and put the following expressions in the corresponding boxes:

Find: (.+?)\t(.+?)\r\n(.+?)\t(.+?)\r\n
Replace: $1\t$2\t$3\t$4\r\n


Make sure to select Search Mode: Regular expression at the bottom, and check the box ". matches newline".

This will make every second line appear as fields 3 and 4 of the previous line. Make sure that there is a line break after the last line (i.e. the file doesn't end at the very end of the last line but at the beginning of a new line). Also, if you are not on a Windows machine or the file was not created on a Windows machine, then the newline might be \n instead of the \r\n in the expressions above. You can tell by trying a quick find; if nothing comes up, try removing all the \r from the regexes (there are 3 in total).
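
If you'd rather not depend on the editor, the same find/replace can be sketched in Python. This is only an illustration of the idea above: the name join_pairs is made up, and the pattern is loosened slightly (\r?\n, plus an optional line break at the very end) so that it accepts both Windows and Unix line endings and a file without a final newline.

```python
import re

# Same four groups as the Find expression above. \r?\n accepts \r\n or \n,
# and (?:\r?\n|\Z) lets the last pair end at the end of the file.
PAIR = re.compile(r"(.+?)\t(.+?)\r?\n(.+?)\t(.+?)(?:\r?\n|\Z)")


def join_pairs(text):
    """Merge every second line into fields 3 and 4 of the line before it."""
    # The callable plays the role of the Replace box: $1 TAB $2 TAB $3 TAB $4.
    return PAIR.sub(lambda m: "\t".join(m.groups()) + "\n", text)
```

As with the editor version, this assumes every line contains a tab; a line without one will throw the pairing off, so it's worth checking the result by eye.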


 

CafeTran Training
Netherlands
Local time: 05:53
Try another CAT tool? Mar 11, 2016

Tony M wrote:

but the actual tool isn't really the issue


If you're willing to try another tool: CafeTran's native glossary format is tab-delimited, so you can use your list right away.

It also allows source-side and target-side alternatives:

ACRONYM;long form source TAB ACRONYM;long form target

That way, you can keep together what belongs together. During translation, you can easily switch to automatic insertion of the alternative target via the right mouse button, so you can choose to have the acronym translated as an acronym or as a long form, either once or in the whole project.

A similar regex would be needed to prep the glossary.
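
As a rough idea of what such a regex could look like, here is a minimal Python sketch that rewrites the two-lines-per-entry list into the semicolon layout shown above. The function name to_alternatives is invented for the example, and it assumes that no term contains a semicolon or a tab of its own.

```python
import re

# One entry = acronym line followed by long-form line, each "source TAB target".
PAIR = re.compile(r"(.+?)\t(.+?)\r?\n(.+?)\t(.+?)(?:\r?\n|\Z)")


def to_alternatives(text):
    """Rewrite each pair of lines as ACRONYM;long source TAB ACRONYM;long target."""
    # m[1]/m[2] = acronym source/target, m[3]/m[4] = long form source/target.
    return PAIR.sub(lambda m: f"{m[1]};{m[3]}\t{m[2]};{m[4]}\n", text)
```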

I've recorded a short video to demonstrate this: https://youtu.be/roX4yksMssk

[Edited at 2016-03-11 07:21 GMT]


 

Philippe Etienne  Identity Verified
Spain
Local time: 05:53
Member
English to French
Glossary search without CAT tools Mar 11, 2016

Not sure how helpful it may be to your issue, but I've been using Search and Replace from Funduc (http://www.funduc.com/) since I started.
When I receive glossaries with 2 or more columns, I adapt/convert them to .csv or .txt; the app then searches all files, and the results window shows all occurrences line by line.
It has many other features that I've never used, but for quickly searching many heterogeneous files at once without opening them, it's handy.

If I remember correctly, incorporating 2-column glossaries into MemoQ is also quite easy.

Philippe


 

Samuel Murray  Identity Verified
Netherlands
Local time: 05:53
Member (2006)
English to Afrikaans
Two columns is enough for WFC Mar 11, 2016

Tony M wrote:
Does anyone know of a convenient utility that can be used to convert this into a CAT tool glossary (in my case, for Wordfast Classic, but the actual tool isn't really the issue)?


WFC does not care if different records have different numbers of fields, as long as the two required fields (source and target) are present. So you can safely add more terms to the WFC glossary, even if the terms that you add have only source and target, whereas the other entries have more tabs.
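
If that holds, part 1 comes down to appending the two-column lines as they are. A minimal Python sketch of that, where the name append_terms and the choice to skip lines without both a source and a target are assumptions of the example, and file encoding and any header line in the glossary are not handled:

```python
def append_terms(glossary_text, two_column_text):
    """Return glossary_text with the source/target rows appended unchanged."""
    rows = []
    for line in two_column_text.splitlines():
        fields = [f.strip() for f in line.split("\t")]
        # Keep only rows that really have both required fields.
        if len(fields) >= 2 and fields[0] and fields[1]:
            rows.append(fields[0] + "\t" + fields[1])

    # Make sure the existing glossary ends with a line break before appending.
    if glossary_text and not glossary_text.endswith("\n"):
        glossary_text += "\n"
    return glossary_text + "".join(row + "\n" for row in rows)
```

No padding with empty tabs is done, on the strength of the point above that only source and target are required.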

I currently have a particular glossary with a slightly different format — it has 2 lines for each entry, the first being the acronym and its translation, and then the second being the expanded form of the acronym and its translation. What I need to do is get the two acronyms in to the Source and Target fields of my glossary, and then the 2 expanded forms together into the 'Notes' field...


I see a long road of manual copying ahead.

Samuel


 

esperantisto  Identity Verified
Local time: 06:53
Member (2006)
English to Russian
No need for any tool Mar 11, 2016

You don’t need any tool. As mentioned, some CAT programs can use tab-delimited text files as glossaries directly (just to add: OmegaT, Anaphraseus), others can import them by an established procedure using a built-in tool or feature. Just read the respective manual.

 

