What’s the problem with sentence matching?
Thread poster: Els Eerdekens

Els Eerdekens
Belgium
French to Dutch
May 2, 2013

Dear all,

In her article from 2008, Carme Colominas writes the following:

(http://benjamins.com/#catalog/journals/babel.54.4.03col/details)

“Most of the current Translation Memory systems are based on segments determined by marks that in most cases correspond to a complete sentence. The problem of complete sentence matching is that examples are often excluded from the matching candidates even though they probably contain one or more useful sub-segments that could be helpful to the translation.”

I don’t completely understand what the problem with sentence matching is. I suppose that a concordance search solves part of the problem, but that noun phrases, or pre- and postmodified noun phrases, still cannot be found. How is it possible that some matching candidates are excluded?
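One way to picture the exclusion is the sketch below. It is only an illustration, not how any particular CAT tool scores matches: the word-level similarity from Python's `difflib`, the 70% threshold and both example sentences are assumptions made for the demonstration. The stored segment shares a long phrase with the new one, yet the score for the segment as a whole stays under the threshold, so the segment would not be offered as a match.

```python
from difflib import SequenceMatcher

# Hypothetical fuzzy-match threshold; real tools use their own scoring and limits.
THRESHOLD = 0.70

def sentence_score(a, b):
    """Similarity of two whole segments in [0, 1], compared word by word."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

def longest_shared_run(a, b):
    """Longest run of consecutive words that both segments have in common."""
    wa, wb = a.lower().split(), b.lower().split()
    m = SequenceMatcher(None, wa, wb).find_longest_match(0, len(wa), 0, len(wb))
    return " ".join(wa[m.a:m.a + m.size])

# Invented example segments.
new_segment = "Press the red button to restart the printer after a paper jam"
tm_segment = "If the display stays dark, press the red button to restart the printer"

score = sentence_score(new_segment, tm_segment)
if score < THRESHOLD:
    print("Excluded as a sentence match, score:", round(score, 2))
print("Reusable sub-segment:", longest_shared_run(new_segment, tm_segment))
```

The sub-segment is exactly what a translator would want to reuse, but a system that only compares complete segments never shows it unless the translator runs a concordance search by hand.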

“In view of these limitations, some proposals have been made in the literature regarding the possibility of building Translation Memory systems that operate “below” the sentence level, that is to say, at a sub-sentential level. Existing work demonstrates that sub-sentential segmentation of Translation Memories clearly shows a significantly best recall with respect to sentential segmentation.”

Are there already systems that work “below” the sentence level?

Thanks!

Els


 

Sergei Leshchinsky  Identity Verified
Ukraine
Local time: 00:56
Member (2008)
English to Russian
... May 2, 2013

Simply add a custom end-of-segment separator to break long sentences into smaller pieces.

E.g. if you set comma as a custom separator, it will break at each comma.


 

IrimiConsulting  Identity Verified
Sweden
Local time: 23:56
Member (2006)
English to Swedish
Below sentence level -> phrase level May 2, 2013

Matching on the phrase level would definitely be possible, but would require a lot more intelligence from the software, since it needs to analyse word classes and grammar rather than just text strings, which in turn requires the use of dictionaries. There will always be problems with words not found in the dictionary and with discontinuous phrases, and some languages will be less suitable for phrase-level matching.
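To make the contrast concrete, here is a minimal sketch of the "just text strings" approach: it looks for word sequences that two segments share, with no dictionary or grammar involved. The function names, the minimum phrase length and the example sentences are all invented for the illustration.

```python
def ngrams(words, n):
    """All runs of n consecutive words, as a set of tuples."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_phrases(a, b, min_len=2):
    """Maximal contiguous word sequences that occur in both segments."""
    wa, wb = a.lower().split(), b.lower().split()
    found = set()
    for n in range(min_len, min(len(wa), len(wb)) + 1):
        found |= ngrams(wa, n) & ngrams(wb, n)
    phrases = [" ".join(p) for p in found]
    # Drop phrases that are contained in a longer shared phrase.
    return sorted(p for p in phrases
                  if not any(p != q and f" {p} " in f" {q} " for q in phrases))

# Invented example: "switch ... off" is discontinuous in the first segment.
print(shared_phrases("Switch the device off before cleaning",
                     "Switch off the device before cleaning"))
```

The string-only search finds "the device" and "before cleaning", but it cannot see that "switch ... off" and "switch off" are the same verb. That is the kind of discontinuous phrase that would need real linguistic analysis.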

For "my" languages (English, Swedish, German and French), phrase-level matching would be fairly easy in English, Swedish and French. The German word order would complicate matters a bit, but it would still be quite doable.

In the end, the result would depend to a large extent on the quality of the source text. The GIGO principle (garbage in - garbage out) is very valid in all sorts of language automation.


 

Heinrich Pesch  Identity Verified
Finland
Local time: 00:56
Member (2003)
Finnish to German
It's real May 2, 2013

In SDL Studio it is called AutoSuggest, in DVX Deep Mining.
I haven't used those features yet, but they search for phrases within the text and in the TM, and should speed up the translation process.


 

Christine Andersen  Identity Verified
Denmark
Local time: 23:56
Member (2003)
Danish to English
I find AutoSuggest very useful May 2, 2013

Because it works purely on statistical analysis and is not 'intelligent', it does occasionally come up with a few ridiculous suggestions, but on the whole the benefit far outweighs these, and they are easily ignored.

It might be possible to avoid some of them by filtering or editing the TM before using it to create the AutoSuggest dictionary, if one was aware of what to avoid. I did not do that, but still only get a few 'impossible' suggestions.

It would be ideal if they could be edited out afterwards, but they are not a serious problem.


 

Els Eerdekens
Belgium
French to Dutch
TOPIC STARTER
@ Sergei May 3, 2013

Sergei Leshchinsky wrote:

Simply add a custom end-of-segment separator to break long sentences into smaller pieces.

E.g. if you set comma as a custom separator, it will break at each comma.


Dear Sergei,

Where can I do this? In WinAlign from Trados (segmentation rules) or in the source document?
If I have to change the segmentation rules, what do I have to do exactly?

Kind regards,

Els


 

David Turner  Identity Verified
Local time: 23:56
French to English
All CAT tools should be able to segment at a comma May 4, 2013

In TWB, for example, File/Setup/Segmentation rules, click "Add", add "Comma" and then in "Rule"/"Stop character", enter a comma (",").

TWB will then segment:

"Because it works purely on statistical analysis and is not 'intelligent', it does occasionally come up with a few ridiculous suggestions, but on the whole the benefit far outweighs these, and they are easily ignored"

as four segments:

Because it works purely on statistical analysis and is not 'intelligent',
it does occasionally come up with a few ridiculous suggestions,
but on the whole the benefit far outweighs these,
and they are easily ignored
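
The same idea can be sketched outside any CAT tool. The snippet below is not TWB's segmentation engine, only a rough imitation under simple assumptions: a segment ends at any stop character that is followed by whitespace, and the function name and default stop characters are made up for the example.

```python
import re

def segment(text, stop_chars=".!?"):
    """Split text after any stop character that is followed by whitespace."""
    pattern = "(?<=[" + re.escape(stop_chars) + r"])\s+"
    return [s for s in re.split(pattern, text.strip()) if s]

text = ("Because it works purely on statistical analysis and is not "
        "'intelligent', it does occasionally come up with a few ridiculous "
        "suggestions, but on the whole the benefit far outweighs these, "
        "and they are easily ignored")

# Adding the comma to the stop characters yields shorter segments.
for part in segment(text, stop_chars=".!?,"):
    print(part)
```

With the default stop characters the sentence stays in one piece; with the comma added it comes out as the four segments listed above. Shorter segments match more often, but they also carry less context, so commas inside numbers or lists would need extra rules in a real setup.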


 

