Technical question about SDL Studio "events"
Thread poster: Mpoma

Mpoma  Identity Verified
United Kingdom
French to English
May 29, 2016

Dear all,

I am underwhelmed by the concordance-search and termbase-search facilities in SDL Studio 2014.

Over the years I have built up a French-to-English table in an Access database (Access 2000!), currently with about 30,000 French headword entries, some of which have quite long English definitions.

About a year ago I wrote a Java application that "reverse indexes" all the words in this table, including those on the definition side. By "reverse index" (more usually called an inverted index) I mean the kind of index built by Apache Lucene, a very powerful full-text search library based on the same sort of technology that lies behind web search engines such as Google.

Lucene supports "stemming", so that (in English) "approve" and "approval" would probably both be indexed as "approv". It also scores each document in its index (here, each table entry) against the query, so results come back ranked by relevance. It is "forgiving": a match does not have to be exact. Above all, it is exactly the technology you need when searching for multi-word terms.
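To make "stemming" and "reverse index" concrete, here is a minimal sketch in Python of what such an index stores. It is an illustration only: the suffix list, `MIN_STEM`, and the function names are hypothetical, and the stemmer is a crude toy, not Lucene's French analyzer. The idea is that each word is reduced to a stem, and the index maps each stem to the set of entries containing it.

```python
import re
import unicodedata
from collections import defaultdict

# Toy French suffixes, longest first. NOT Lucene's French stemmer; just enough
# to show why "juge" and "jugement" can end up under the same index key.
SUFFIXES = ("ements", "ement", "ences", "ence", "es", "e", "s")
MIN_STEM = 3  # never cut a word down to fewer than 3 letters


def strip_accents(text):
    """Turn 'référés' into 'referes' by dropping combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(word):
    """Lower-case, remove accents, then strip the first (longest) matching suffix."""
    word = strip_accents(word.lower())
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def build_index(entries):
    """Inverted ("reverse") index: stem -> set of ids of the entries containing it."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for token in re.findall(r"\w+", text):
            index[stem(token)].add(entry_id)
    return dict(index)


def lookup(index, query):
    """Ids of entries containing the stem of at least one query word."""
    hits = set()
    for token in re.findall(r"\w+", query):
        hits |= index.get(stem(token), set())
    return hits


if __name__ == "__main__":
    entries = {
        1: "juge des référés: urgent injunctions judge",
        2: "jugement: judgment",
        3: "référence: reference",
    }
    index = build_index(entries)
    print(sorted(index["jug"]))                  # [1, 2]
    print(sorted(lookup(index, "juge référé")))  # [1, 2, 3]
```

Both "juge" and "jugement" reduce to "jug", and "référés" and "référence" both reduce to "refer", so a query matches entries that contain neither query word verbatim.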

In French, for example, if I enter "juge référé", it lists many entries containing "juge", "jugement" or "référence", but it consistently ranks "juge des référés" ("urgent injunctions judge") at the TOP, because the index shows that this particular entry contains both query terms (or at least "juge" and a French stem of "référés", perhaps "refer").
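Why does "juge des référés" come out on top? A very simplified way to picture it is to reward entries that contain more of the distinct query terms. The Python sketch below assumes the entries have already been reduced to stems (the stems shown are assumed, not real Lucene output). Lucene's actual scoring formulas (classic TF-IDF, or BM25) are considerably more elaborate; this only illustrates the "more matched terms ranks higher" effect, with shorter entries winning ties as a crude stand-in for length normalisation. `rank` is a hypothetical name.

```python
def rank(query_stems, entries):
    """Order entry ids best-first.

    entries maps an entry id to the list of stems found in that entry.
    Score: how many *distinct* query stems the entry contains; ties go to
    the shorter entry. Entries matching nothing are left out.
    """
    scored = []
    for entry_id, stems in entries.items():
        matched = len(query_stems & set(stems))
        if matched:
            # Negate the match count so that an ascending sort puts best first.
            scored.append((-matched, len(stems), entry_id))
    scored.sort()
    return [entry_id for _, _, entry_id in scored]


if __name__ == "__main__":
    # Stems as a toy French stemmer might produce them (assumed for illustration).
    entries = {
        "jugement": ["jug"],
        "juge des référés": ["jug", "des", "refer"],
        "référence": ["refer"],
        "juge d'instruction": ["jug", "instruct"],
    }
    print(rank({"jug", "refer"}, entries))
    # ['juge des référés', 'jugement', 'référence', "juge d'instruction"]
```

Only "juge des référés" contains both stems, so it ranks first even though it is the longest of the matching entries.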

What I want to do now is run a comprehensive, automatic search of this Lucene index every time I move to a new segment in SDL Studio, i.e. a search for all the terms that appear in the source of the new segment. I am confident this would produce far higher-quality results than Studio's dull-witted concordance or termbase search.

I'd just like to know whether anyone knows a way of "trapping" SDL Studio events: I want something to "listen" for the event of moving to a new segment and, when it detects one, take all the source text of the new segment and run a series of queries against the Lucene index.
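What might "a series of queries" for one segment look like? One possible strategy, sketched in Python under stated assumptions: drop common function words (the tiny stopword list here is hypothetical and deliberately incomplete), then query each adjacent pair of remaining words first, since multi-word matches are the most valuable, followed by each single word. `queries_for_segment` is a hypothetical name; each returned string would be passed to the Lucene query code.

```python
import re

# Hypothetical, deliberately tiny French stopword list, for illustration only.
STOPWORDS = {"a", "à", "au", "aux", "d", "de", "des", "du", "en", "et",
             "l", "la", "le", "les", "un", "une"}


def queries_for_segment(source_text):
    """Build adjacent-pair and single-word queries from a segment's source text."""
    words = [w for w in re.findall(r"\w+", source_text.lower())
             if w not in STOPWORDS]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    pairs = list(dict.fromkeys(f"{a} {b}" for a, b in zip(words, words[1:])))
    singles = list(dict.fromkeys(words))
    return pairs + singles  # multi-word queries first


if __name__ == "__main__":
    for q in queries_for_segment("Le juge des référés a rendu une ordonnance."):
        print(q)
```

Because "des" is dropped as a stopword, the first query generated for this sentence is "juge référés", which is exactly the kind of two-term query that finds "juge des référés".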

PS Does anyone have a view on why SDL has been so slow to adopt Lucene-style "reverse index" technology? It seems inexplicable, since it is a perfect match for the intensive language searching that translators do all the time...

[Edited at 2016-05-29 16:12 GMT]


 

Ben Senior  Identity Verified
Germany
German to English
APIs will do that May 29, 2016

If you want to trap events in Studio, you need to download the SDK and the APIs. Using the APIs, you can then write a standalone app or a Studio plug-in that traps various events. You will, however, need to be able to program in C# to do this.

 

Mpoma  Identity Verified
United Kingdom
French to English
TOPIC STARTER
thanks May 29, 2016

Thanks... hmmm, unfortunately I know nothing about C#: I work strictly in Java and Python (and Jython).

Having said that, do you have any pointers on how to get started writing this kind of plug-in for SDL Studio? An example of a simple one, maybe?

[Edited at 2016-05-29 19:19 GMT]


 

Mpoma  Identity Verified
United Kingdom
French to English
TOPIC STARTER
AutoHotkey to the rescue May 30, 2016

Hmmm... rather than my spending the next six months learning about SDL plug-ins, this AutoHotkey script does pretty much what I want!

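^j::                ; Ctrl+J runs the steps below
Send ^{Enter}       ; confirm the segment and move to the next one
Send !{Insert}      ; Alt+Insert (presumably puts the source text into the target)
Send ^a             ; select all the text in the target
Clipboard := ""     ; empty the clipboard so that ClipWait can detect the new copy
Send ^c             ; copy the selection to the clipboard
ClipWait, 1         ; wait up to 1 second for the copy, instead of a fixed Sleep
Send ^1             ; insert the first match, if there is one
Send {Left}         ; press the Left arrow key once
return              ; end of the hotkey routine (needed for multi-line hotkeys)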

Pressing Ctrl+J then moves you to the next segment, copies its source text to the clipboard and inserts the first match, if there is one.

With a bit of luck, a Java app can then listen for clipboard changes and run intelligent Lucene queries on the new segment's source text, against any loaded TMs and external vocabulary sources...
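A clipboard "listener" is most simply done by polling: read the clipboard every fraction of a second and react when the text changes. Below is a minimal Python sketch of that loop, assuming the actual clipboard read is supplied as a function (`read_clipboard`), so the loop itself needs nothing beyond the standard library. All names and the 0.2-second default interval are hypothetical choices, not part of any SDL or Lucene API.

```python
import time
from typing import Callable, Optional


def watch_clipboard(read_clipboard: Callable[[], str],
                    on_change: Callable[[str], None],
                    poll_interval: float = 0.2,
                    max_polls: Optional[int] = None) -> None:
    """Poll the clipboard; call on_change(text) whenever new non-empty text appears.

    Empty reads are ignored, and text identical to the last reported text is not
    reported again (so two consecutive segments with identical source fire once).
    max_polls=None means poll forever.
    """
    last = None
    polls = 0
    while max_polls is None or polls < max_polls:
        current = read_clipboard()
        if current and current != last:
            on_change(current)  # e.g. hand the segment source to the Lucene queries
            last = current
        polls += 1
        time.sleep(poll_interval)
```

In a real session `read_clipboard` could be, for example, the `clipboard_get()` method of a hidden `tkinter.Tk()` window, wrapped in a `try` because it raises `tkinter.TclError` when the clipboard holds no text. In Java the same loop applies: AWT's `Clipboard.addFlavorListener` only reports changes in the available data flavours rather than every change of content, which is why polling is a common fallback there too.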

Incidentally, another example of SDL's dull-wittedness: it has occurred to me that, while the translator is working on segment S, you might reasonably expect the searching, sequence identification and Lucene querying for segment S+1 to be going on *in the background*, in anticipation of your moving to segment S+1 once you have finished with segment S!

The amount of analysis available per segment would then be potentially colossal: whole seconds of processing time, all spent while you are still working on the previous segment.
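The "work on S+1 in the background while the translator is on S" idea is essentially prefetching. A minimal Python sketch, assuming the segment source texts are available as a list and that `lookup` stands for the (hypothetical) expensive search function: opening a segment returns its results, starting them first if they are not already under way, and immediately queues the lookup for the next segment on a worker thread. The class and method names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class SegmentPrefetcher:
    """Run lookups for segment S+1 in the background while segment S is open."""

    def __init__(self, segments, lookup, workers=1):
        self.segments = segments  # source texts, in document order
        self.lookup = lookup      # hypothetical expensive search function
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.futures = {}         # segment index -> Future holding its results

    def _schedule(self, index):
        # Start a lookup once per segment; out-of-range indexes are ignored.
        if 0 <= index < len(self.segments) and index not in self.futures:
            self.futures[index] = self.pool.submit(self.lookup, self.segments[index])

    def open_segment(self, index):
        """Return results for segment `index` and start work on `index + 1`."""
        if not 0 <= index < len(self.segments):
            raise IndexError(index)
        self._schedule(index)      # no-op if it was already prefetched
        self._schedule(index + 1)  # anticipate the translator's next move
        return self.futures[index].result()  # blocks only if not finished yet

    def close(self):
        self.pool.shutdown(wait=True)
```

If the translator spends longer on segment S than the lookup for S+1 takes, the results for S+1 are already waiting when they move on, which is where the "seconds of processing time" per segment would come from.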

If I were a megacorporation rather than silly little me, you might think I would have thought of this at some point over the past 20 or more years!

