Google Translate API messing up Cyrillic
Thread poster: Susan Welsh

Susan Welsh  Identity Verified
United States
Local time: 12:28
Member (2008)
Russian to English
+ ...
Dec 27, 2011

I have been translating a Russian document in a CAT tool, using the GT API with no problems, and all of a sudden it started throwing in garbage for Cyrillic segments (lots of % signs, Ds and Os). The Universal Online Cyrillic converter identifies the encoding as KOI-7. Yesterday when this happened, it worked fine on other segments; but today it's happening again. Has anyone else had this problem?

 

esperantisto  Identity Verified
Local time: 19:28
Member (2006)
English to Russian
+ ...
No problem with Anaphraseus Dec 28, 2011

Please provide more details on your environment. I experience no problems with the latest build of Anaphraseus in LibreOffice 3.4.4/OpenOffice.org 3.3.0 on Windows 7 and openSUSE 11.3/11.4 when translating ENG→RUS. Previously, Anaphraseus returned strings of incorrectly decoded UTF-8 characters (BTW, your description makes me think your problem may be the same, but you need to provide a sample of what you get), but Ole solved the problem.

 

Didier Briel  Identity Verified
France
Local time: 18:28
Member (2007)
English to French
+ ...
Are you translating long segments? Dec 28, 2011

Susan Welsh wrote:
I have been translating a Russian document in a CAT tool, using the GT API with no problems, and all of a sudden it started throwing in garbage for Cyrillic segments (lots of % signs, Ds and Os). The Universal Online Cyrillic converter identifies the encoding as KOI-7. Yesterday when this happened, it worked fine on other segments; but today it's happening again. Has anyone else had this problem?

Do your "garbage" segments begin with the following line?
Server returned HTTP response code: 414 for URL:

I can reproduce the issue in OmegaT if I try to translate long segments from Russian to English. Short segments are translated fine, but I get the 414 error (Request-URI Too Long) for long segments.

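That's because Russian characters have to be percent-encoded in the URL: each Cyrillic letter is two bytes in UTF-8, and each byte is written as three characters (e.g., %D0%92), so the request strings are much longer than for ASCII-based languages.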

E.g.,
googleapis.com/language/translate/v2?key=xxxxx&source=RU&target=EN&q=%D0%92+1526+%D0%B3%D0%BE%D0%B4%D1%83+%D0%BF%D0%B5%D1%80%D0%B5%D0%B1%D1%80%D0%B0%D0%BB%D1%81%D1%8F
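To make the size difference concrete, here is a minimal Python sketch that decodes the `q` parameter from the URL above and compares raw and encoded lengths. It assumes the tool encodes the query as UTF-8 with spaces sent as `+`, which is what the example URL suggests; the English string and the helper name `encoded_length` are made up for comparison. Note that the "garbage" sequences are `%D0`, `%D1` with the digit zero, not the letter O.

```python
from urllib.parse import quote_plus, unquote_plus

# The q parameter copied from the example URL above.
ENCODED_Q = (
    "%D0%92+1526+%D0%B3%D0%BE%D0%B4%D1%83+"
    "%D0%BF%D0%B5%D1%80%D0%B5%D0%B1%D1%80%D0%B0%D0%BB%D1%81%D1%8F"
)

def encoded_length(text: str) -> int:
    """Number of characters the text occupies once percent-encoded (UTF-8)."""
    return len(quote_plus(text))

# Decoding the "garbage" gives back ordinary Cyrillic text.
russian = unquote_plus(ENCODED_Q)

# Hypothetical English string of similar size, for comparison only.
english = "In 1526 he moved"

print(russian)
print(len(russian), "characters raw,", encoded_length(russian), "encoded")
print(len(english), "characters raw,", encoded_length(english), "encoded")
```

Each Cyrillic letter costs six characters in the URL, while ASCII letters and digits cost one, so a Russian segment hits a URL length limit several times sooner than an English one of the same size.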

I know there is another method, which allows sending slightly longer strings. I'll check with Alex (he's more concerned than I am), but eventually the problem will always exist for lengthy segments.
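The post does not say which other method is meant. One plausible candidate, offered here only as an assumption, is sending the same parameters in the body of a POST request with an `X-HTTP-Method-Override: GET` header, so that the encoded text no longer counts toward the URL length. The Python sketch below only builds such a request and does not send it; the endpoint is the one from the example URL, the key is a placeholder, and `build_post_request` is a hypothetical helper. Check the API documentation before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the example URL in this thread; the scheme and host prefix are assumed.
ENDPOINT = "https://www.googleapis.com/language/translate/v2"

def build_post_request(segment: str, key: str = "xxxxx") -> Request:
    """Build (but do not send) a POST request carrying the segment in the body."""
    body = urlencode(
        {"key": key, "source": "ru", "target": "en", "q": segment}
    ).encode("ascii")
    return Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Assumption: the service treats this POST as if it were a GET.
            "X-HTTP-Method-Override": "GET",
        },
        method="POST",
    )

req = build_post_request("В 1526 году перебрался")
# The URL stays short; the percent-encoded text travels in the body instead.
print(req.get_method(), len(req.full_url), len(req.data))
```

Even with this approach the body is still percent-encoded, so any limit on the request size will be reached sooner for Cyrillic than for ASCII text.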

Didier


 

Susan Welsh  Identity Verified
United States
Local time: 12:28
Member (2008)
Russian to English
+ ...
TOPIC STARTER
example Dec 28, 2011

Hi Didier and esperantisto,

Didier, you seem to have identified the problem (although I would not say that this segment is terribly long), because it does give that code (below). I am working with OmegaT 2.5.2 on Ubuntu Linux, OOo 3.2.0.

(Esperantisto, I'm not familiar with Anaphraseus -- not sure what it is. I'll check when I get a chance.)

Thanks,
Susan

PS - After some editing, the garbage no longer displays in this message the way I see it on my screen. It is exclusively full of %DO%BE%DO%B4 and stuff like that, with no Cyrillic words. I'm going to delete the example, except for the source text and the error code.

Выросши в холодной Сибири, постоянно с величайшим вниманием следя за описаниями полярных путешествий и многое узнав о них от покойного моего друга Норденшильда, совершившего ряд славных экспедиций в области льдов, я получил полное убеждение в возможности решительной победы над полярными льдами при помощи соответственных для того приспособлений и, главное, - ясного понимания сил, до сих пор препятствовавших кораблям проникнуть в неведомую околополюсную область, занимающую пространство около 4 млн кв.
Server returned HTTP response code: 414 ...
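A rough Python sketch of why a segment that does not look terribly long can still trigger the 414. The 2,000-character URL limit used here is an assumption for illustration, not a figure from this thread, and the base URL and key are placeholders modeled on the example URL earlier in the thread; `get_url_length` and `fits_in_get` are hypothetical helpers.

```python
from urllib.parse import urlencode

# Assumed values for illustration only.
BASE_URL = "https://www.googleapis.com/language/translate/v2"
ASSUMED_MAX_URL_LENGTH = 2000

def get_url_length(segment: str, key: str = "xxxxx") -> int:
    """Length of the full GET URL once the segment is percent-encoded."""
    query = urlencode({"key": key, "source": "ru", "target": "en", "q": segment})
    return len(BASE_URL) + 1 + len(query)  # +1 for the "?" separator

def fits_in_get(segment: str) -> bool:
    """True if the request URL stays within the assumed limit."""
    return get_url_length(segment) <= ASSUMED_MAX_URL_LENGTH

# Two synthetic 400-character segments, roughly the size of one long sentence.
cyrillic = "я" * 400
latin = "a" * 400

print(get_url_length(cyrillic), fits_in_get(cyrillic))
print(get_url_length(latin), fits_in_get(latin))
```

A CAT tool could run a check like this before calling the service and skip or split segments that would not fit, instead of showing the raw error text as the translation.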

[Edited at 2011-12-28 14:49 GMT]


 

Dominique Pivard  Identity Verified
Local time: 19:28
Finnish to French
Anaphraseus Dec 29, 2011

Susan Welsh wrote:
(Esperantisto, I'm not familiar with Anaphraseus -- not sure what it is. I'll check when I get a chance.)

Anaphraseus (http://anaphraseus.sourceforge.net/) is a Wordfast (Classic) "clone". It works in OpenOffice instead of MS Office, is considerably slower than Wordfast and has a much smaller feature set.


 

