Term search is STILL broken!
Thread poster: Stuart Allsop
Stuart Allsop  Identity Verified
Chile
Local time: 03:48
Spanish to English
+ ...
Jul 4, 2006

I have reported various bugs in the "Term Search" engine over the past few months, but nobody seems to be interested. Am I the only one that ever uses this feature?

The latest bug that I found is in searching for accented words. I entered the word "papelón" for a Spanish to English search, and got the following results:

- KOG results: 327
- KudoZ archive results: 108
- Personal glossary results: 462

It would be very nice indeed to have a thousand results to choose from, except that NOT ONE of those answers actually contains the word "papelón". Instead, they are ALL references to "papel".

It looks as though the search engine only matched against the first part of my word, up to the accented letter, and ignored everything from there on.
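
To illustrate how that kind of cut-off could happen (this is only a guess at the cause, not ProZ.com's actual code): a tokenizer that treats only unaccented ASCII letters as word characters would split "papelón" at the "ó". A minimal Python sketch, where `ascii_only_tokens` and `unicode_tokens` are made-up names for illustration:

```python
import re

def ascii_only_tokens(term: str) -> list[str]:
    """Hypothetical buggy tokenizer: only unaccented ASCII letters count as word characters."""
    return re.findall(r"[A-Za-z]+", term)

def unicode_tokens(term: str) -> list[str]:
    """Unicode-aware tokenizer: in Python 3, \\w also matches accented letters such as 'ó'."""
    return re.findall(r"\w+", term)

print(ascii_only_tokens("papelón"))  # ['papel', 'n']
print(ascii_only_tokens("papelon"))  # ['papelon']
print(unicode_tokens("papelón"))     # ['papelón']
```

Under that assumption, the accented search would really be a search for "papel", while the unaccented one would stay a search for "papelon", which matches what I am seeing.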



Nahun Silva  Identity Verified
Local time: 03:48
zzz Test Language (1) to zzz Test Language (2)
It is not broken Jul 19, 2006

Hi

That is the correct behavior of the term search: if it doesn't find the exact word, it will try to get the most related word. That is why it returns the results for papel.


Stuart Allsop  Identity Verified
Chile
Local time: 03:48
Spanish to English
+ ...
TOPIC STARTER
Yes it is broken! Jul 19, 2006

Nahun Silva wrote:

That is the correct behavior of the term search: if it doesn't find the exact word, it will try to get the most related word. That is why it returns the results for papel.


No. I had "Whole words only" turned on.

What you say is NOT correct. This is demonstrated very easily: Search for "papelon" WITHOUT the accent, and you get zero matches. Search for "papelón" WITH the accent and you get those garbage matches.

If what you say is true, then it would return the exact same matches for "papelon" and "papelón". If it behaves as you say it does, and did not find anything for "papelon" without the accent, then it would return the exact same "papel" words, either way. It does not. Hence, it is broken.

The simple fact that it shows DIFFERENT behavior for the SAME word when it is spelled with or without an accent PROVES BEYOND DOUBT that it is broken.

If it were not broken, then the behavior would be the same for ANY word that starts with "papel...", but this is not the case.


In any event, I did all of these searches with "Match exact phrase" turned on AND ALSO "Whole words only" turned on. If what you say were true, then it would NOT have returned "papel" for "papelón". (Unless maybe the "Whole words only" function is also broken?)
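
For reference, this is roughly what I would expect "Match exact phrase" plus "Whole words only" to do. It is a minimal Python sketch only: `whole_word_matches` and the sample `glossary` are invented for illustration and have nothing to do with how the site is actually implemented.

```python
import re

def whole_word_matches(term: str, entries: list[str]) -> list[str]:
    """Return the entries that contain `term` as a whole word (case-insensitive)."""
    # The lookarounds (?<!\w) and (?!\w) are Unicode-aware in Python 3,
    # so an accented letter such as 'ó' is treated as part of the word.
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
    return [entry for entry in entries if pattern.search(entry)]

# Made-up sample data; neither "papelón" nor "papelon" appears in it.
glossary = ["papel", "papel higiénico", "papel de aluminio"]

print(whole_word_matches("papelón", glossary))  # []
print(whole_word_matches("papelon", glossary))  # []
print(whole_word_matches("papel", glossary))    # all three entries
```

With a matcher like this, the accented and unaccented searches give the same (empty) answer, and only a search for "papel" itself returns the "papel" entries.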

Either way, it is broken. If you don't believe me, try it yourself. It is cutting the word at accented characters, and searching for ONLY the part that comes before the accent.



Nahun Silva  Identity Verified
Local time: 03:48
zzz Test Language (1) to zzz Test Language (2)
I didn't get any garbage Jul 21, 2006

Stuart Allsop wrote:

Either way, it is broken. If you don't believe me, try it yourself. It is cutting the word at accented characters, and searching for ONLY the part that comes before the accent.


I've tried it myself, with and without the accent, and I didn't get any garbage. Maybe we are not looking at the same place, but KudoZ->ProZ.com term search works for me.

Regards


Stuart Allsop  Identity Verified
Chile
Local time: 03:48
Spanish to English
+ ...
TOPIC STARTER
Different planet, maybe? Jul 21, 2006

Nahun Silva wrote:

I've tried it myself, with and without the accent, and I didn't get any garbage. Maybe we are not looking at the same place, but KudoZ->ProZ.com term search works for me.


Nahun, this is very simple:

Go here:
http://www.proz.com/?sp=ksearch

Set the following fields as indicated:
- Match exact phrase: ON
- Languages: Spanish To: English
- Also search reverse language pair (target into source): ON
- Search for all likely character encodings: ON
- Field: Any
- Difficulty: Any
- KOG (KudoZ Open Glossary): ON
- KudoZ archive: ON
- Personal glossaries: ON
- All others: OFF

Now, in the field that says "Term or phrase:" type the word "papelón" with an accent on the "o". Hit "Search and save settings". You should get something like the following:

» KOG results: 328
» KudoZ archive results: 111
» Personal glossary results: 474

Take a very careful look at all of the answers that it returns. NOT ONE of them contains the word "papelón". Zero. Zilch. Nada. Zip. They all contain the word "papel", but NONE contain the word "papelón". Since we did specify “Match exact phrase: ON”, and it is returning items that do NOT match the exact phrase, the Term Search feature is BROKEN.



Now, go back to the field that says "Term or phrase:" type the word "papelon" WITHOUT an accent, and hit the "Search and save settings" button again. This time, you will get something like the following:

» KOG results: 0
» KudoZ archive results: 0
» Personal glossary results: 0



It is returning something DIFFERENT after ONE SINGLE change to the word, and that change is simply removing the accent. Hence, the Term Search feature is BROKEN!



It is NOT working the way you say it should. The word "papelón" is obviously not in the database, either with or without an accent. You said that if the search feature does not find a word, it returns the closest match. This is NOT the case. Since neither "papelón" nor "papelon" is in the database, the term search feature should give the exact same answer for BOTH searches. But it does NOT. It gives DIFFERENT answers.

Hence, the Term Search feature is BROKEN.

Please note that in BOTH cases you have turned on the "Match exact phrase" parameter, so that it should NOT go looking for partial matches if it does not find the exact word. However, it IS STILL finding entries that contain the word "papel" when you specify "papelón", while it is NOT finding those entries when you specify "papelon".

It is quite obvious that for some reason it only uses the part of the word up to the first accented "ó". It is considering THAT portion of the word to be the ENTIRE word, and therefore is coming up with the "papel" entries.

Hence, the Term Search feature is BROKEN.


Stuart Allsop  Identity Verified
Chile
Local time: 03:48
Spanish to English
+ ...
TOPIC STARTER
Another perfect example Jul 24, 2006

I just found another perfect example of the BROKENNESS of the term search feature when dealing with accents.

I searched for the term "Gestión de seguros de siniestros" with "Match exact phrase" turned ON.

It returns 21 "Personal glossary results". NOT ONE of them has anything at all to do with "seguros de siniestros". Not one of them matches. Zero. None. Zilch. They are ALL totally unrelated terms, even though I have "Match exact phrase" selected. NONE of them match.

So I changed my search term very simply, by replacing the accented "ó" with a normal "o", and searched for "Gestion de seguros de siniestros". I left ALL of the other parameters just the same.

I now get ZERO results, which is the CORRECT answer, since there are no matching terms in the database.

Hence, one more time, the term search feature is STILL broken, and does NOT work as advertised.



Nahun Silva  Identity Verified
Local time: 03:48
zzz Test Language (1) to zzz Test Language (2)
Instructions are sometimes useful Jul 24, 2006

Have you read the info box that opens when you click the info link next to the "Search for all likely character encodings" option? If not, here is what it says:


Because data entered at ProZ.com prior to March 2006 did not all use the same character encoding, it is possible that a term search will not return all possible results. This can happen when your search term is encoded using one character encoding, but the term in our database is encoded with a different encoding.

When you select "search for all likely character encodings", your search will automatically be performed in all character encodings that are likely to be used in the language pairs you specify. (Please note that this may also cause a few search results to appear that may not match your search term.)
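
To make the encoding point concrete, here is a minimal Python sketch of how the same term turns into different byte sequences under different encodings. The helper name `encoded_variants` and the list of candidate encodings are assumptions for illustration, not the site's actual implementation.

```python
def encoded_variants(term: str, encodings=("utf-8", "latin-1", "cp1252")) -> dict[str, bytes]:
    """Encode `term` in each candidate encoding, skipping any that cannot represent it."""
    variants = {}
    for enc in encodings:
        try:
            variants[enc] = term.encode(enc)
        except UnicodeEncodeError:
            continue  # this encoding has no byte sequence for one of the characters
    return variants

for enc, raw in encoded_variants("papelón").items():
    print(enc, raw)
# utf-8 b'papel\xc3\xb3n'
# latin-1 b'papel\xf3n'
# cp1252 b'papel\xf3n'

# Reading UTF-8 bytes as if they were Latin-1 garbles the accented letter.
print("papelón".encode("utf-8").decode("latin-1"))  # papelÃ³n
```

All variants share the bytes for "papel" and only differ from the accented character onward, so a search that tries several encodings has to compare each variant as a whole; "papelon" has no accented character and encodes identically in all three.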


Regards



