erratic Proz term search results
Thread poster: Ken Cox
Ken Cox  Identity Verified
Local time: 07:39
German to English
Mar 14, 2006

Searches for German terms containing umlaut characters are rather erratic lately, which seems to coincide with the switch to Unicode.

At the minimum, I have two questions:

1. How do I know (and how can I control) which encoding my browser uses for entries on the Proz site? (I'm using Opera under Windows XP Pro. Does it use the same encoding for posting as for viewing?)

2. How does the search engine handle diacritical characters, and how does it handle historical Kudoz postings that were presumably posted in a variety of encodings?

Just to give an example: the recent Kudoz question for Störkontur (http://www.proz.com/kudoz/1280964). A colleague pointed the asker to a previous Kudoz question for this term (http://www.proz.com/kudoz/243634), and the asker replied that she did a Kudoz term search but didn't find that entry. I'm not surprised: my search using 'Störkon' (German to English with full word match disabled) returns entries for several combinations of St and various umlauted vowels, as well as some terms with no evident relationship to the search string, but not the entry for the previous Kudoz question mentioned above. (Incidentally, that entry is the first hit returned by a Google search for Störkontur + proz.)

Maybe I've missed a thread on this topic, but a forum search didn't turn up anything dealing directly with this subject.



KathyT  Identity Verified
Australia
Local time: 17:39
Japanese to English
See http://www.proz.com/topic/43380 Mar 14, 2006

Hi Kenneth,

See the above link for a recent thread on this topic and problems encountered in various languages, including German.
Hopefully, you'll find answers to some of your questions there. If not, perhaps try posting any additional problems in that same forum(?). Jason has been monitoring it quite closely and doing a great job at trying to solve everyone's various problems.

Hope to help a little!

Kathy

P.S. Sorry about that, Kenneth - I see that you yourself posted in the exact same thread..... still no joy?

[Edited at 2006-03-14 22:21]


Ken Cox  Identity Verified
Local time: 07:39
German to English
TOPIC STARTER
thanks for the link Mar 14, 2006

and it turns out to be a forum with a posting by me (looks like I'm in the market for a new memory).


Jason Grimes
Local time: 01:39
SITE STAFF
Umlaut character search should be fixed Mar 15, 2006

Hi Kenneth,

We did some database maintenance last night that appears to have resolved the problem searching for umlauted characters (at least for your 'Störkon' example). Is it working properly for you now?

Kenneth Cox wrote:
1. How do I know (and how can I control) which encoding my browser uses for entries on the Proz site? (I'm using Opera under Windows XP Pro. Does it use the same encoding for posting as for viewing?)


The headers in our pages now tell your browser to both post and view everything in Unicode. In most browsers, you can see what encoding the browser is using by selecting View->Encoding from the browser's menu.
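As a rough illustration of what "the headers tell your browser" means, the sketch below reads the declared character set out of a Content-Type header value. The header string `text/html; charset=UTF-8` and the helper name `charset_from_content_type` are assumptions for illustration; the post only says "Unicode" and does not state the exact header the site sends.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset declared in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset parameter in lower case,
    # or None when the header does not declare one.
    return msg.get_content_charset()


# Hypothetical header value; the real one sent by the site is not shown in the thread.
declared = charset_from_content_type("text/html; charset=UTF-8")
undeclared = charset_from_content_type("text/html")
```

A page without a declared charset leaves the browser to guess, which is one way that posts can end up stored in differing encodings.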


Kenneth Cox wrote:
2. How does the search engine handle diacritical characters, and how does it handle historical Kudoz postings that were presumably posted in a variety of encodings?


The search terms you submit are in Unicode. By default, the search engine compares the search term against the database, and should only match terms that are also encoded in Unicode. When you select "Search for all likely character encodings", the search engine automatically converts your search terms into all common character encodings used for the languages you specify, and does multiple searches. This should find most matching terms no matter their character set, but may also cause some spurious matches to be found.
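The "all likely character encodings" idea can be sketched roughly as follows. This is not the site's actual implementation: the list of candidate encodings, the function names, and the use of a simple byte-substring test in place of a database query are all assumptions made for illustration.

```python
# Hypothetical candidate encodings; the site's real list is not given in the thread.
CANDIDATE_ENCODINGS = ["utf-8", "latin-1", "cp1252"]


def encoded_variants(term, encodings=CANDIDATE_ENCODINGS):
    """Encode a Unicode search term into each candidate encoding, without duplicates."""
    variants = []
    for enc in encodings:
        try:
            raw = term.encode(enc)
        except UnicodeEncodeError:
            continue  # the term cannot be represented in this encoding
        if raw not in variants:
            variants.append(raw)
    return variants


def search_all_encodings(term, stored_terms):
    """Return stored byte strings that contain any encoded variant of the term."""
    variants = encoded_variants(term)
    return [raw for raw in stored_terms if any(v in raw for v in variants)]


# Entries stored in different encodings, as older postings might be.
stored = [
    "Störkontur".encode("utf-8"),
    "Störkontur".encode("latin-1"),
    b"Staaten",
]
hits = search_all_encodings("Störkon", stored)
```

Here both stored copies of Störkontur are found, because the term is tried once as UTF-8 bytes and once as single-byte Latin-1 bytes. Running several byte-level searches like this is also a plausible source of the spurious matches mentioned above.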

The error you were experiencing appeared to be caused by the database not recognizing the umlauted characters, and effectively assuming they were spaces between words. The database changes made last night should have corrected this.
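To make that failure mode concrete, here is a small sketch of a word splitter that only recognizes ASCII letters and digits, next to one that is Unicode-aware. The splitting rules and function names are assumptions; the thread does not say how the database actually tokenizes text.

```python
import re


def ascii_only_tokens(text):
    """Split on anything that is not an ASCII letter or digit.

    Assumption: this mimics a database that does not recognize 'ö'
    and therefore treats it like a space between words.
    """
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def unicode_tokens(text):
    """Split into words using Unicode-aware word characters."""
    return re.findall(r"\w+", text)


broken = ascii_only_tokens("Störkontur")
fixed = unicode_tokens("Störkontur")
```

With the ASCII-only splitter the term falls apart into 'St' and 'rkontur', which would be consistent with searches returning unrelated entries that merely begin with 'St'.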

I hope this helps to clarify.

Thanks,

Jason


Ken Cox  Identity Verified
Local time: 07:39
German to English
TOPIC STARTER
still not quite a happy camper Mar 15, 2006

Hi Jason,

Thanks for your informative reply.

The situation for me is now:

A. Using 'Störkont' as the search term and searching for German to English:

1) Whole word & exact match disabled, 'search for all likely encodings' disabled: only 1 hit ('modifizierte Stärke' in the TTilch personal glossary).

2) Same as (1) but with 'search for all likely encodings' enabled: 4844 KOG hits, 3750 archive hits, & 6211 PG hits, all including *many* non-matches (STA, StA, Sta'in, Staaten, ...)

B. Using 'Störkontur' as the search term and searching for German to English:

1) Whole word enabled & exact match disabled, 'search for all likely encodings' disabled: same as case A.1 above: 1 hit ('modifizierte Stärke' in the TTilch personal glossary).

2) Same as (1) but with 'search for all likely encodings' enabled: 1 KOG hit (Störkontur entered by Judek), 1 archive hit (Störkontur asked by Heller), & 1 PG hit ('modifizierte Stärke' in the TTilch personal glossary).

I thus have the impression that the search engine effectively treats umlauted characters as wild cards if 'match whole word' is not enabled ('match exact phrase' appears to have the same effect as 'match whole word' if the search string is only a single 'word').

Naturally, the ability to search for partial words is particularly valuable in languages such as German that commonly form compound words. That's also why I would like to see non-whole-word searches also return results for words where the search string is not at the beginning of the word.
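The difference between the two behaviors can be sketched as follows. The function names and the sample term list are hypothetical and only illustrate prefix matching versus matching anywhere in a word.

```python
def prefix_matches(query, terms):
    """Return terms in which some word starts with the query (case-insensitive)."""
    q = query.casefold()
    return [t for t in terms if any(w.casefold().startswith(q) for w in t.split())]


def substring_matches(query, terms):
    """Return terms that contain the query anywhere (case-insensitive)."""
    q = query.casefold()
    return [t for t in terms if q in t.casefold()]


# Hypothetical glossary entries.
terms = ["Störkontur", "Konturlinie", "Staaten"]
by_prefix = prefix_matches("kontur", terms)
by_substring = substring_matches("kontur", terms)
```

A prefix-only search for 'kontur' misses the compound Störkontur, while a substring search finds it.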

Regards,
Ken



Jason Grimes
Local time: 01:39
SITE STAFF
Unable to reproduce this behavior Mar 15, 2006

Hi Kenneth,

The personal glossary database has not yet been corrected so personal glossary searches may return inaccurate results. We will update that database later this week. Sorry I was not clear about this.

Kenneth Cox wrote:
2) Same as (1) but with 'search for all likely encodings' enabled: 4844 KOG hits, 3750 archive hits, & 6211 PG hits, all including *many* non-matches (STA, StA, Sta'in, Staaten, ...)


I have not been able to reproduce this behavior in the KudoZ and KOG searches. Do you see these "wildcard" results consistently, or are they intermittent? What is the character encoding setting in your browser?

Thanks,

Jason



Lesley Clarke  Identity Verified
Mexico
Local time: 00:39
Spanish to English
Me too Mar 15, 2006

I had come to rely so heavily on the Proz glossaries, and now they are a disaster. There is no way of searching for partial words, which is also very important in Spanish, with its verbs and adjectives.
Also, lately I get very few answers for terms that used to turn up twenty or thirty.
Help!

