Google Translate API messing up Cyrillic
Thread poster: Susan Welsh

Susan Welsh  Identity Verified
United States
Local time: 01:30
Member (2008)
Russian to English
Dec 27, 2011

I have been translating a Russian document in a CAT tool, using the GT API with no problems, and all of a sudden it started throwing in garbage for Cyrillic segments (lots of % signs, Ds and Os). Universal Online Cyrillic converter identifies the encoding as KOI-7. Yesterday when this happened, it worked fine on other segments; but today it's happening again. Has anyone else had this problem?

esperantisto  Identity Verified
Local time: 09:30
Member (2006)
English to Russian
No problem with Anaphraseus Dec 28, 2011

Please provide more details on your environment. I experience no problems with the latest build of Anaphraseus in LibreOffice 3.4.4/OpenOffice.org 3.3.0 on Windows 7 and openSUSE 11.3/11.4 when translating ENG→RUS. Previously, Anaphraseus returned strings of incorrectly decoded UTF-8 characters, but Ole solved that problem. (BTW, your description makes me think your problem may be the same, but you need to provide a sample of what you get.)
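
For comparison, incorrectly decoded UTF-8 looks quite different from output full of % signs. Here is a minimal sketch, assuming the wrong decoder was Latin-1 (the thread does not say what Anaphraseus actually used); the sample word is only an illustration.

```python
# Illustrative sample word, not taken from the thread.
original = "Привет"

# Simulate the bug: UTF-8 bytes read back with the wrong decoder (assumed Latin-1).
mojibake = original.encode("utf-8").decode("latin-1")
print(repr(mojibake))

# The damage is reversible as long as no bytes were lost along the way.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired)
```

Each Cyrillic letter turns into two unrelated characters here, with no % signs involved, so a sample of the actual output is enough to tell the two problems apart.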


Didier Briel  Identity Verified
France
Local time: 07:30
Member (2007)
English to French
Are you translating long segments? Dec 28, 2011

Susan Welsh wrote:
I have been translating a Russian document in a CAT tool, using the GT API with no problems, and all of a sudden it started throwing in garbage for Cyrillic segments (lots of % signs, Ds and Os). Universal Online Cyrillic converter identifies the encoding as KOI-7. Yesterday when this happened, it worked fine on other segments; but today it's happening again. Has anyone else had this problem?

Do your "garbage" segments begin with the following?
Server returned HTTP response code: 414 for URL:

I can reproduce the issue in OmegaT if I try to translate long segments from Russian to English. Short segments are translated fine, but I get the 414 error (request URI too long) for long segments.

That's because Russian characters have to be percent-encoded in the URL: each Cyrillic letter becomes two UTF-8 bytes, written as six characters (for example, %D0%92), so the request strings are much longer than for ASCII-based languages.

E.g.,
googleapis.com/language/translate/v2?key=xxxxx&source=RU&target=EN&q=%D0%92+1526+%D0%B3%D0%BE%D0%B4%D1%83+%D0%BF%D0%B5%D1%80%D0%B5%D0%B1%D1%80%D0%B0%D0%BB%D1%81%D1%8F
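
A rough way to see the size effect, offered as a sketch and not as what OmegaT actually does internally: Python's standard `urllib.parse.quote_plus` percent-encodes the UTF-8 bytes of a string in the same way as the `q=` value above. The Russian string is the phrase decoded from that URL; the English string and the helper name `encoded_length` are only illustrations.

```python
from urllib.parse import quote_plus

def encoded_length(text: str) -> int:
    """Length of text once its UTF-8 bytes are percent-encoded for a query string."""
    return len(quote_plus(text))

russian = "В 1526 году перебрался"   # decoded from the q= value in the URL above
english = "In 1526 he moved"         # illustrative ASCII text of similar size

encoded_russian = quote_plus(russian)
print(encoded_russian)
print(len(russian), "characters ->", encoded_length(russian), "in the URL")
print(len(english), "characters ->", encoded_length(english), "in the URL")
```

Digits, ASCII letters and spaces keep their length (a space becomes "+"), while every Cyrillic letter grows from one character to six.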

I know there is another method, which allows sending slightly longer strings. I'll check with Alex (he's more concerned than I am), but ultimately the problem will always exist for lengthy segments.
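
The other method is presumably a POST request carrying the `X-HTTP-Method-Override: GET` header, which the Translate API v2 documentation described for sending more data than fits in a URL (still with a size limit on `q`). The sketch below only builds such a request and does not send it. The full endpoint URL is an assumption based on the URL quoted above, the key is a placeholder, and `build_translate_request` is a hypothetical helper, not OmegaT code.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint, based on the URL quoted earlier in this post.
ENDPOINT = "https://www.googleapis.com/language/translate/v2"

def build_translate_request(text: str, api_key: str,
                            source: str = "ru", target: str = "en") -> Request:
    """Hypothetical helper: put the parameters in a POST body, not in the URL."""
    body = urlencode(
        {"key": api_key, "source": source, "target": target, "q": text}
    ).encode("ascii")
    return Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Asks the server to treat this POST as if it were a GET.
            "X-HTTP-Method-Override": "GET",
        },
    )

# "xxxxx" is a placeholder key, as in the URL above; nothing is sent here.
request = build_translate_request("В 1526 году перебрался", api_key="xxxxx")
print(request.get_method(), request.full_url)
print(len(request.data), "bytes in the body")
```

The text is still percent-encoded in the body, so it is just as long as before; the difference is that it no longer counts against the URL length limit that triggers the 414 response.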

Didier



Susan Welsh  Identity Verified
United States
Local time: 01:30
Member (2008)
Russian to English
TOPIC STARTER
example Dec 28, 2011

Hi Didier and esperantisto,

Didier, you seem to have identified the problem (although I would not say that this segment is terribly long), because it does give that code (below). I am working with OmegaT 2.5.2 on Ubuntu Linux, OOo 3.2.0.

(Esperantisto, I'm not familiar with Anaphraseus -- not sure what it is. I'll check when I get a chance.)

Thanks,
Susan

PS - After some editing, the garbage no longer displays in this message the way I see it on my screen. It consists entirely of %DO%BE%DO%B4 and the like, with no Cyrillic words. I'm going to delete the example, except for the source text and the error code.

Выросши в холодной Сибири, постоянно с величайшим вниманием следя за описаниями полярных путешествий и многое узнав о них от покойного моего друга Норденшильда, совершившего ряд славных экспедиций в области льдов, я получил полное убеждение в возможности решительной победы над полярными льдами при помощи соответственных для того приспособлений и, главное, - ясного понимания сил, до сих пор препятствовавших кораблям проникнуть в неведомую околополюсную область, занимающую пространство около 4 млн кв.
Server returned HTTP response code: 414 ...
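
Strings of this kind are percent-encoded UTF-8, not a different Cyrillic encoding, so they can be turned back into readable text. A minimal sketch using Python's standard `urllib.parse.unquote_plus`; the first sample is a fragment of the URL quoted earlier in the thread, and the second assumes that the "O" seen on screen is really the digit zero, since percent-encoding uses hexadecimal digits.

```python
from urllib.parse import unquote_plus

# Fragment of the q= value from the URL quoted earlier in the thread.
garbage = "%D0%92+1526+%D0%B3%D0%BE%D0%B4%D1%83"
decoded = unquote_plus(garbage)
print(decoded)

# Assumption: the pattern described above, with zeros where an "O" appears on screen.
fragment = unquote_plus("%D0%BE%D0%B4")
print(fragment)
```

What comes back is the Russian source text that was sent to the server, which fits the 414 message: the error echoes the request URL, not a translation.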

[Edited at 2011-12-28 14:49 GMT]



Dominique Pivard  Identity Verified
Local time: 08:30
Finnish to French
Anaphraseus Dec 29, 2011

Susan Welsh wrote:
(Esperantisto, I'm not familiar with Anaphraseus -- not sure what it is. I'll check when I get a chance.)

Anaphraseus (http://anaphraseus.sourceforge.net/ ) is a Wordfast (Classic) "clone". It works in OpenOffice instead of MS Office, is considerably slower than Wordfast and has a much smaller feature set.

