Extracting glossary/TM from existing files • Mac
Thread poster: BabelOn-line

BabelOn-line
United Kingdom
Local time: 17:25
English to French
+ ...
Apr 4, 2012

Hello All

I am a fairly infrequent and not very advanced user of OmegaT, as most of my work is for magazines with few repeats (even though I am very, very happy to have OmegaT at hand for some jobs).

I have quite a number of Word files, both in source and target versions, with source and target structures that are almost identical.

For one particular client, I'd like to reuse this resource in the future for consistency.

I'd like to be able to "align" (not sure this is the right term here) my existing source and target files in order to do the following: I would like OmegaT to spot the expressions that I have already translated in the past and suggest the corresponding translation.

Most of the time, the "reusable wordings" won't be a whole segment, but rather short wordings such as "Fully Integrated management module", "central dashboard" or possibly acronyms like "MSP".

I am not even sure that a CAT or OmegaT is the right tool for this.

1/ Do you think this is achievable with OmegaT?

2/ What kind of tool could I use to create a translation memory based on my source/target Word files? I work on a Mac, but I can use a PC emulator (VMware) if the application you know only works on PC.

Thanks a lot for your help.


 

Joakim Braun  Identity Verified
Sweden
Local time: 18:25
German to Swedish
+ ...
I'm working on it Apr 4, 2012

I'm working on a TMX editor for Mac OS X with a built-in aligner. It's not even at beta stage, but you can already create multi-language TMX memories from text that you paste. (No formatting is preserved.)

E-mail me if you're interested in a pre-alpha version...


 

FarkasAndras
Local time: 18:25
English to Hungarian
+ ...
Aligner Apr 4, 2012

Alignment is indeed the name of the procedure, so you need an aligner.

There are various options. The best of the free ones that work on Mac are probably bitext2tmx and LF Aligner, and possibly PlusTools (if it works in Word for Mac). I'm the author of LF Aligner.
If you're willing to pay, perhaps have a look at ABBYY Aligner Online, which by nature should be platform-independent.

LF Aligner currently has no GUI - you've been warned.


 

lidija68  Identity Verified
Italy
Local time: 18:25
Italian to Serbian
+ ...
lf aligner Apr 4, 2012

Perhaps you should look here:
http://www.proz.com/forum/cat_tools_technical_help/184708-new_free_open_source_aligner_for_windows_os_x_and_linux.html

It has a command-line interface, but if you read the instructions carefully it works great (I've tried it on Windows XP and Windows 7).


 

Milan Condak  Identity Verified
Local time: 18:25
English to Czech
Free tools Apr 4, 2012

BabelOn-line wrote:

I am not even sure that a CAT or OmegaT is the right tool for this.

1/ Do you think this is achievable with OmegaT?

2/ What kind of tool could I use to create a translation memory based on my source/target Word files? I work on a Mac, but I can use a PC emulator (VMware) if the application you know only works on PC.

Thanks a lot for your help.


1/ OmegaT can look up terms in a glossary or in a dictionary. The user can create both files.

2/ Two free tools for Windows (or other platforms):


a) Tool for alignment - LF Aligner:

http://www.condak.net/tools/align-sentence/lf-aligner/cs/00.html

http://www.condak.net/cat_other/omegat/2011-08-10/cs/09.html

On my site, look for the camel logo (the LF Aligner logo).

b) Lexical extractor = Lexterm

http://www.condak.net/cat_other/omegat/2011-08-10/cs/10.html

http://www.condak.net/tools/align-word/lexterm/cs/00.html
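
As a rough illustration of point 1/: an OmegaT glossary can be a plain tab-separated text file (source term, target term, optional comment) placed in the project's glossary folder. The sketch below writes such rows. The function names and the file name are hypothetical, the French renderings are placeholders rather than anyone's actual translations, and the extension and encoding your OmegaT version expects should be checked in its manual.

```python
from pathlib import Path


def glossary_line(source, target, comment=""):
    """Return one tab-separated glossary row: source, target, optional comment."""
    fields = [source, target] + ([comment] if comment else [])
    for field in fields:
        # A tab or line break inside a field would corrupt the row layout.
        if "\t" in field or "\n" in field:
            raise ValueError(f"tab or line break not allowed in field: {field!r}")
    return "\t".join(fields)


def write_glossary(entries, path):
    """Write (source, target, comment) tuples to a UTF-8 text file, one row per line."""
    lines = [glossary_line(*entry) for entry in entries]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Placeholder entries; the target terms are illustrative only.
    entries = [
        ("central dashboard", "tableau de bord central", ""),
        ("MSP", "MSP", "acronym, kept as-is"),
    ]
    for entry in entries:
        print(glossary_line(*entry))
```

Calling `write_glossary(entries, "client_terms.txt")` (hypothetical file name) would save the same rows to disk, ready to be copied into the glossary folder.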

Milan Condak


 

Rodolfo Raya  Identity Verified
Local time: 13:25
English to Spanish
Stingray Document Aligner Apr 4, 2012

Take a look at Stingray (http://www.maxprograms.com/products/stingray.html). You can use it to align different types of documents.

Regards,
Rodolfo


 

Didier Briel  Identity Verified
France
Local time: 18:25
Member (2007)
English to French
+ ...
Look also at term extraction Apr 4, 2012

BabelOn-line wrote:
I'd like to be able to "align" (not sure this is the right term here) my existing source and target files in order to do the following: I would like OmegaT to spot the expressions that I have already translated in the past and suggest the corresponding translation.

Most of the time, the "reusable wordings" won't be a whole segment, but rather short wordings like e.g. "Fully Integrated management module", "central dashboard" or possibly acronyms like "MSP".

To extract short terms (rather than full segments), what you need is "term extraction", which is different from alignment (but is often based on already aligned translation memories).

Okapi Rainbow provides monolingual term extraction:
http://www.opentag.com/okapi/wiki/index.php?title=Term_Extraction_Step

One then has to provide the translation. If you have done a previous "classical" alignment, this is easier to find. (For instance, using OmegaT's search function.)
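
To make the two steps concrete, here is a minimal sketch, not a description of how Okapi Rainbow works internally: it counts repeated word sequences in a source text as term candidates, then looks a chosen term up in already aligned segment pairs. The stopword list, the function names and the sample sentences are all assumptions made for the illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "for", "with", "is", "on"}


def candidate_terms(text, max_len=3, min_count=2):
    """Return (term, count) pairs for word sequences that occur at least min_count times."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip sequences that start or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, count) for term, count in counts.most_common() if count >= min_count]


def find_in_pairs(term, pairs):
    """Return the aligned (source, target) pairs whose source contains the term."""
    return [(src, tgt) for src, tgt in pairs if term.lower() in src.lower()]


if __name__ == "__main__":
    sample = "The central dashboard shows alerts. Open the central dashboard."
    print(candidate_terms(sample))
    pairs = [("Open the central dashboard.", "Ouvrez le tableau de bord central.")]
    print(find_in_pairs("central dashboard", pairs))
```

Note that this naive version ignores sentence boundaries, so a sequence may span two sentences; the translator still picks the right target wording from the matching pairs by hand.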

I'm not aware of any free tool offering bilingual term extraction.
There are commercial tools offering that feature.

Didier


 

BabelOn-line
United Kingdom
Local time: 17:25
English to French
+ ...
TOPIC STARTER
A big thank you. Apr 5, 2012

Thanks all for your input. I have quite a lot of apps that I can try now.

I have tried Stingray (thanks Rodolfo), but the main pitfall quickly became apparent: as I translate "editorial" pieces, I quite often split a long sentence into two or change the structure of the sentence around to make it sound more French.

As a result, my source and target, while similar in terms of paragraphs and overall length, do not match sentence to sentence.

Thanks to Didier for pointing out that what I am after is actually "term extraction" and for suggesting a more targeted app for this.

Anyway, big thanks to all for your help. Have a great Easter!


 

Rodolfo Raya  Identity Verified
Local time: 13:25
English to Spanish
Align paragraphs Apr 5, 2012

BabelOn-line wrote:

I have tried Stingray (thanks Rodolfo), but the main pitfall quickly became apparent: as I translate "editorial" pieces, I quite often split a long sentence into two or change the structure of the sentence around to make it sound more French.


With Stingray you can align paragraphs. Simply check the "Paragraph Segmentation" box when creating a project. You can also join, split and reorder sentences if you work at sentence level.
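
The idea of paragraph-level alignment can be sketched independently of any tool. The following is not Stingray's method, only a minimal illustration that assumes both documents have been saved as plain text, that paragraphs are separated by blank lines, and that source and target contain the same number of paragraphs in the same order. The function names and the `creationtool` value are made up for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def paragraphs(text):
    """Split plain text into non-empty paragraphs separated by blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def build_tmx(source_text, target_text, src_lang="en", tgt_lang="fr"):
    """Pair paragraphs by position and return them as a TMX ElementTree."""
    src, tgt = paragraphs(source_text), paragraphs(target_text)
    if len(src) != len(tgt):
        raise ValueError(f"paragraph counts differ: {len(src)} vs {len(tgt)}")
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "paragraph-pairing-sketch",  # hypothetical name
        "creationtoolversion": "0.1",
        "segtype": "paragraph",
        "o-tmf": "plain text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src_par, tgt_par in zip(src, tgt):
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src_par), (tgt_lang, tgt_par)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


if __name__ == "__main__":
    tree = build_tmx("Hello.\n\nGoodbye.", "Bonjour.\n\nAu revoir.")
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

To save the result, `tree.write("memory.tmx", encoding="utf-8", xml_declaration=True)` would do, with the file name being an arbitrary choice. Pairing by position breaks as soon as one paragraph is merged or split, which is why interactive aligners let you correct the pairs by hand.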

You can use Anchovy (http://www.maxprograms.com/products/anchovy.html) for term extraction. It is free and lets you generate monolingual glossaries from existing documents and bilingual glossaries from TMX files.

Regards,
Rodolfo


 

Milan Condak  Identity Verified
Local time: 18:25
English to Czech
I made a short presentation of Lexterm Apr 7, 2012

Milan Condak wrote:

b) Lexical extractor = Lexterm



I made a short presentation of Lexterm and of creating glossaries and dictionaries for CAT tools.

http://www.condak.net/cat_other/omegat/2012-04-07/cs/00.html

The example shows a glossary in Wordfast Classic and a dictionary in OmegaT. Here is one screenshot of OmegaT.

HTH

Milan


[Edited at 2012-04-08 09:07 GMT]


 

BabelOn-line
United Kingdom
Local time: 17:25
English to French
+ ...
TOPIC STARTER
Thanks all for your help Apr 10, 2012

I have a lot of apps to try, but at least I know that I am after "term extraction" rather than alignment.

Thanks again


 

Milan Condak  Identity Verified
Local time: 18:25
English to Czech
Word alignment is OK, too Apr 11, 2012

BabelOn-line wrote:

I have a lot of apps to try, but at least I know that I am after "term extraction" rather than alignment.

Thanks again


The first step is paragraph or sentence alignment.

The second step is term extraction, or "word alignment". See:

http://en.wikipedia.org/wiki/Bitext_word_alignment

That page has links to some toolkits.

Or, you can google for "word alignment toolkit".
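
To give a feel for what the second step does, here is a toy sketch that scores word pairs by how often they occur in the same aligned segments, using the Dice coefficient. This is a deliberately simple stand-in, not what the toolkits implement (they rely on more elaborate statistical models), and the function names and sample segment pairs are invented for the illustration.

```python
import re
from collections import Counter
from itertools import product


def tokens(text):
    """Return the set of lower-cased words in a segment."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def dice_scores(pairs):
    """Score each (source word, target word) pair by segment co-occurrence."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = tokens(src), tokens(tgt)
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))
    # Dice coefficient: 2 * co-occurrences / (source count + target count).
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
    }


def top_candidates(word, scores, n=3):
    """Return the n best-scoring target words for one source word."""
    ranked = sorted(
        ((score, t) for (s, t), score in scores.items() if s == word),
        key=lambda item: (-item[0], item[1]),
    )
    return [(t, score) for score, t in ranked[:n]]


if __name__ == "__main__":
    # Invented sample segments, for illustration only.
    pairs = [
        ("central dashboard", "tableau de bord central"),
        ("central module", "module central"),
        ("new dashboard", "nouveau tableau de bord"),
    ]
    print(top_candidates("dashboard", dice_scores(pairs)))
```

With so little data several target words tie for the top score, which shows why this only works on a reasonably large aligned memory and why the candidates still need a translator's review.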

Milan


 

