Extracting glossary/TM from existing files • Mac
Thread poster: BabelOn-line

BabelOn-line
United Kingdom
Local time: 05:20
English to French
Apr 4, 2012

Hello All

I am a fairly infrequent and not-very-advanced user of OmegaT, as most of my work is for magazines, with few repeats (even though I am very, very happy to have OmegaT at hand for some jobs).

I have quite a number of Word files, both in source and target versions, with source and target structures that are almost identical.

For one given client, I'd like to reuse this resource in the future for consistency.

What I'd like is to be able to "align" (not sure this is the right term here) my existing source and target files in order to do the following: I would like OmegaT to spot the expressions that I have already translated in the past and suggest the corresponding translation.

Most of the time, the "reusable wordings" won't be a whole segment, but rather short wordings like e.g. "Fully Integrated management module", "central dashboard" or possibly acronyms like "MSP".

I am not even sure that a CAT or OmegaT is the right tool for this.

1/ Do you think this is achievable with OmegaT?

2/ What kind of tool could I use to create a translation memory from my source/target Word files? I work on a Mac, but I can use a PC emulator (VMware) if the application you know of only works on PC.

Thanks a lot for your help.


Joakim Braun  Identity Verified
Sweden
Local time: 06:20
German to Swedish
I'm working on it Apr 4, 2012

I'm working on a TMX editor for Mac OS X with a built-in aligner. It's not even at beta stage yet, but you can certainly create multi-language TMX memories from text that you paste. (No formatting is preserved.)

E-mail me if you're interested in a pre-alpha version...
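For readers wondering what such a tool produces: a TMX memory is plain XML with one translation unit (tu) per aligned pair. Below is a minimal Python sketch, assuming you already have a list of (source, target) strings; the function name pairs_to_tmx and the header values are illustrative assumptions, not taken from Joakim's editor.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace URI with the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, src_lang="en", tgt_lang="fr"):
    """Build a minimal TMX 1.4 document from (source, target) string pairs."""
    root = ET.Element("tmx", version="1.4")
    ET.SubElement(root, "header", {
        "creationtool": "tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "paragraph",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(root, encoding="unicode")


print(pairs_to_tmx([("central dashboard", "tableau de bord central")]))
```

To use the result, write the returned string to a UTF-8 .tmx file, for example in the tm folder of an OmegaT project.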


FarkasAndras
Local time: 06:20
English to Hungarian
Aligner Apr 4, 2012

Alignment is indeed the name of the procedure, so you need an aligner.

There are various options; the best of the free ones that work on Mac are probably bitext2tmx and LF Aligner, and possibly PlusTools (if it works in Word for Mac). I'm the author of LF Aligner.
If you're willing to pay, perhaps have a look at ABBYY Aligner Online, which by nature should be platform independent.

LF Aligner currently has no GUI - you've been warned.
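To show what an aligner does under the hood, here is a minimal Python sketch of length-based sentence alignment, loosely in the spirit of Gale & Church. It is not LF Aligner's actual algorithm; the bead penalties and the cost formula are simplifying assumptions. It allows 1-2 and 2-1 groupings ("beads"), which is what you need when a sentence was split or merged in translation.

```python
import math


def align_sentences(src, tgt):
    """Align two lists of sentences; return (src_group, tgt_group) beads.

    Simplified length-based dynamic programming in the spirit of Gale & Church:
    beads may be 1-1, 1-2, 2-1, 1-0 or 0-1, and a bead costs more the further
    its target length is from the length predicted from its source length.
    """
    # Expected target/source length ratio, estimated from the whole text.
    ratio = sum(map(len, tgt)) / max(1, sum(map(len, src)))
    # Extra penalty per bead type (arbitrary values chosen for this sketch).
    moves = {(1, 1): 0.0, (1, 2): 1.0, (2, 1): 1.0, (1, 0): 3.0, (0, 1): 3.0}

    def bead_cost(s_group, t_group, penalty):
        expected = ratio * sum(map(len, s_group))
        actual = sum(map(len, t_group))
        return abs(expected - actual) / math.sqrt(expected + actual + 1) + penalty

    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Cells are visited in row-major order, so every predecessor of (i, j)
    # has already been relaxed when (i, j) is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in moves.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(src[i:ni], tgt[j:nj], penalty)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the beads in order.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    return beads[::-1]
```

For two English sentences whose second one was split into two French sentences, the cheapest path is a 1-1 bead followed by a 1-2 bead, because a 1-2 bead whose lengths match costs less than forcing a badly mismatched 1-1 pairing.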



lidija68  Identity Verified
Italy
Local time: 06:20
Italian to Serbian
lf aligner Apr 4, 2012

Perhaps you should look here:
http://www.proz.com/forum/cat_tools_technical_help/184708-new_free_open_source_aligner_for_windows_os_x_and_linux.html

It has a command-line interface, but if you read the instructions carefully, it works great (I've tried it on Windows XP and Windows 7).



Milan Condak  Identity Verified
Local time: 06:20
English to Czech
Free tools Apr 4, 2012

BabelOn-line wrote:

I am not even sure that a CAT or OmegaT is the right tool for this.

1/ Do you think this is achievable with OmegaT?

2/ What kind of tool could I use to create a translation memory from my source/target Word files? I work on a Mac, but I can use a PC emulator (VMware) if the application you know of only works on PC.

Thanks a lot for your help.


1/ OmegaT can search in a glossary or in a dictionary, and users can create both kinds of file themselves.

2/ Two free tools for Windows (or others):


a) Tool for alignment - LF Aligner:

http://www.condak.net/tools/align-sentence/lf-aligner/cs/00.html

http://www.condak.net/cat_other/omegat/2011-08-10/cs/09.html

On my site, look for the camel logo (the LF Aligner logo).

b) Lexical extractor = Lexterm

http://www.condak.net/cat_other/omegat/2011-08-10/cs/10.html

http://www.condak.net/tools/align-word/lexterm/cs/00.html

Milan Condak



Rodolfo Raya  Identity Verified
Local time: 02:20
English to Spanish
Stingray Document Aligner Apr 4, 2012

Take a look at Stingray (http://www.maxprograms.com/products/stingray.html). You can use it to align different types of documents.

Regards,
Rodolfo



Didier Briel  Identity Verified
France
Local time: 06:20
Member (2007)
English to French
Look also at term extraction Apr 4, 2012

BabelOn-line wrote:
What I'd like is to be able to "align" (not sure this is the right term here) my existing source and target files in order to do the following: I would like OmegaT to spot the expressions that I have already translated in the past and suggest the corresponding translation.

Most of the time, the "reusable wordings" won't be a whole segment, but rather short wordings like e.g. "Fully Integrated management module", "central dashboard" or possibly acronyms like "MSP".

To extract short terms (rather than full segments), what you need is "term extraction", which is different from alignment (but is often based on already aligned translation memories).

Okapi Rainbow provides monolingual term extraction:
http://www.opentag.com/okapi/wiki/index.php?title=Term_Extraction_Step
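As a rough illustration of what monolingual term extraction does, here is a minimal Python sketch that counts recurring word n-grams in an English source text and drops those that begin or end with a stopword. It is not how Okapi Rainbow works; the stopword list, the thresholds and the function name candidate_terms are assumptions for illustration. Everything is lowercased, so an acronym such as "MSP" comes out as "msp".

```python
import re
from collections import Counter

# A tiny English stopword list (an assumption); a real extractor would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "with",
             "is", "are", "it", "its", "this", "that", "by", "or", "as", "be"}


def candidate_terms(text, max_n=3, min_count=2):
    """Return (term, count) for word n-grams (1..max_n) seen at least min_count times.

    N-grams that start or end with a stopword are skipped, so "central dashboard"
    survives but "the central dashboard" does not.
    """
    words = re.findall(r"[a-z][a-z-]*", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]
```

The output is only a candidate list: someone still has to discard noise and, as Didier notes next, supply the translations.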

One has then to provide the translation. If you have done a previous "classical" alignment, this is easier to find. (For instance, using OmegaT's search function.)
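The lookup step amounts to a concordance search over the aligned pairs. A minimal Python sketch, assuming the alignment is available as a list of (source, target) strings; the function name concordance is hypothetical and this is not OmegaT's search implementation:

```python
import re


def concordance(pairs, term):
    """Return the aligned (source, target) pairs whose source contains term as whole words."""
    # \b boundaries keep "MSP" from matching inside "MSPs"; matching ignores case.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in pairs if pattern.search(src)]


pairs = [("Open the central dashboard.", "Ouvrez le tableau de bord central."),
         ("The MSP agreed.", "Le MSP a accepté.")]
for src, tgt in concordance(pairs, "central dashboard"):
    print(src, "=>", tgt)
```

Reading the target side of each hit is then how you pick the translation to put in the glossary.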

I'm not aware of any free tool offering bilingual term extraction.
There are commercial tools offering that feature.

Didier



BabelOn-line
United Kingdom
Local time: 05:20
English to French
TOPIC STARTER
A big thank you. Apr 5, 2012

Thanks all for your input; I have quite a lot of apps that I can try now.

I have tried Stingray (thanks, Rodolfo), but the main pitfall quickly became apparent: as I translate "editorial" pieces, I quite often split a long sentence into two or restructure the sentence to make it sound more French.

As a result, my source and target, while similar in terms of paragraph and overall length, do not match sentence to sentence.

Thanks to Didier for pointing out that what I am after is actually "term extraction", and for suggesting a more targeted app for this.

Anyway, big thanks to all for your help. Have a great Easter!



Rodolfo Raya  Identity Verified
Local time: 02:20
English to Spanish
Align paragraphs Apr 5, 2012

BabelOn-line wrote:

I have tried Stingray (thanks, Rodolfo), but the main pitfall quickly became apparent: as I translate "editorial" pieces, I quite often split a long sentence into two or restructure the sentence to make it sound more French.


With Stingray you can align paragraphs. Simply check the "Paragraph Segmentation" box when creating a project. You can also join, split and reorder sentences if you work at sentence level.
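Paragraph-level alignment works here because the source and target match paragraph by paragraph even when the sentences do not. A minimal Python sketch of the idea, assuming both Word files have been saved as plain text with blank lines between paragraphs; the function names are hypothetical and this is not Stingray's implementation:

```python
import re


def split_paragraphs(text):
    """Split plain text on blank lines, dropping empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def align_paragraphs(source_text, target_text):
    """Pair paragraphs 1:1 and refuse to guess if the counts differ."""
    src = split_paragraphs(source_text)
    tgt = split_paragraphs(target_text)
    if len(src) != len(tgt):
        raise ValueError(f"paragraph count mismatch: {len(src)} vs {len(tgt)}")
    return list(zip(src, tgt))
```

Raising an error on a count mismatch is a deliberate choice: silently pairing the wrong paragraphs would pollute the memory, so a mismatch is a cue to fix the files or fall back to an interactive aligner.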

You can use Anchovy (http://www.maxprograms.com/products/anchovy.html) for term extraction. It is free and lets you generate monolingual glossaries from existing documents and bilingual glossaries from TMX files.
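To illustrate the TMX-to-glossary direction, here is a minimal Python sketch that pulls short segment pairs out of a TMX file and formats them as tab-separated lines. It assumes that short segments (here at most four source words, an arbitrary threshold) are good glossary candidates, and it assumes the tab-separated layout OmegaT uses for glossary files (source term, target term, optional comment); check the OmegaT manual for accepted file extensions and encoding. It is not how Anchovy works, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def short_pairs_from_tmx(tmx_text, src_lang, tgt_lang, max_words=4):
    """Yield (source, target) pairs whose source segment has at most max_words words."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                # Compare primary subtags only, so "en-GB" matches "en".
                segs[lang.split("-")[0].lower()] = "".join(seg.itertext()).strip()
        src = segs.get(src_lang.split("-")[0].lower())
        tgt = segs.get(tgt_lang.split("-")[0].lower())
        if src and tgt and len(src.split()) <= max_words:
            yield src, tgt


def to_glossary_lines(pairs):
    """Format pairs as tab-separated 'source<TAB>target' lines."""
    return "\n".join(f"{s}\t{t}" for s, t in pairs)
```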

Regards,
Rodolfo



Milan Condak  Identity Verified
Local time: 06:20
English to Czech
I made a short presentation of Lexterm Apr 7, 2012

Milan Condak wrote:

b) Lexical extractor = Lexterm



I made a short presentation of Lexterm and of creating glossaries and dictionaries for CAT tools.

http://www.condak.net/cat_other/omegat/2012-04-07/cs/00.html

The example shows a glossary in Wordfast Classic and a dictionary in OmegaT.


HTH

Milan


[Edited: 2012-04-08 09:07 GMT]



BabelOn-line
United Kingdom
Local time: 05:20
English to French
TOPIC STARTER
Thanks all for your help Apr 10, 2012

I have a lot of apps to try, but at least I know that I am after "term extraction" rather than alignment.

Thanks again



Milan Condak  Identity Verified
Local time: 06:20
English to Czech
Word alignment is OK, too Apr 11, 2012

BabelOn-line wrote:

I have a lot of apps to try, but at least I know that I am after "term extraction" rather than alignment.

Thanks again


The first step is paragraph or sentence alignment.

The second step is term extraction, or "word alignment". See:

http://en.wikipedia.org/wiki/Bitext_word_alignment

That page links to some toolkits.

Or, you can google for "word alignment toolkit".
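To give an idea of the simplest statistical approach to that second step, here is a minimal Python sketch that scores candidate term pairs by how often they co-occur in aligned segment pairs, using the Dice coefficient. It assumes the input is a list of aligned (source, target) segments, for example read from a TMX; the function names are hypothetical, and this is far cruder than the models used by dedicated word-alignment toolkits.

```python
from collections import Counter
from itertools import product


def ngrams(text, max_n=2):
    """Return the set of lowercased word n-grams (1..max_n) in a segment."""
    words = text.lower().split()
    return {" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}


def dice_candidates(pairs, max_n=2, min_count=2):
    """Score (source n-gram, target n-gram) pairs by Dice co-occurrence.

    Counts are per segment (each n-gram counted once per segment), and
    Dice = 2 * joint / (source_count + target_count).
    """
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_grams, t_grams = ngrams(src, max_n), ngrams(tgt, max_n)
        src_freq.update(s_grams)
        tgt_freq.update(t_grams)
        joint.update(product(s_grams, t_grams))
    scores = {
        (s, t): 2 * c / (src_freq[s] + tgt_freq[t])
        for (s, t), c in joint.items() if c >= min_count
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

With only a handful of segments, many unrelated words tie at the top score, so this needs a reasonably large aligned corpus and manual review before anything goes into a glossary.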

Milan

