Google Translate API messing up Cyrillic
Thread poster: Susan Welsh

Susan Welsh  Identity Verified
United States
Local time: 13:48
Member (2008)
Russian to English
Dec 27, 2011

I have been translating a Russian document in a CAT tool, using the GT API with no problems, and all of a sudden it starts throwing in garbage for Cyrillic segments (lots of % signs, Ds and Os). Universal Online Cyrillic converter identifies the encoding as KOI-7. Yesterday when this happened, it worked fine on other segments; but today it's happening again. Has anyone else had this problem?

esperantisto  Identity Verified
Local time: 20:48
Member (2006)
English to Russian
No problem with Anaphraseus Dec 28, 2011

Please provide more details on your environment. I experience no problems with the latest build of Anaphraseus in LibreOffice 3.4.4/OpenOffice.org 3.3.0 on Windows 7 and openSUSE 11.3/11.4 when translating ENG→RUS. Previously, Anaphraseus returned strings of incorrectly decoded UTF-8 characters (BTW, your description makes me think your problem may be the same, but you need to provide a sample of what you get), but Ole solved that problem.


Didier Briel  Identity Verified
France
Local time: 19:48
Member (2007)
English to French
Are you translating long segments? Dec 28, 2011

Susan Welsh wrote:
I have been translating a Russian document in a CAT tool, using the GT API with no problems, and all of a sudden it starts throwing in garbage for Cyrillic segments (lots of % signs, Ds and Os). Universal Online Cyrillic converter identifies the encoding as KOI-7. Yesterday when this happened, it worked fine on other segments; but today it's happening again. Has anyone else had this problem?

Do your "garbage" segments begin with
Server returned HTTP response code: 414 for URL:

I can reproduce the issue in OmegaT if I try and translate long segments from Russian to English. Short segments are translated fine, but I get the 414 error for long segments.

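That's because Russian characters have to be percent-encoded in the URL: each Cyrillic letter becomes two UTF-8 bytes, written as six characters (for example, В becomes %D0%92), so the strings are much longer than for "ASCII"-based languages. HTTP 414 means the request URL is too long.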

E.g.,
googleapis.com/language/translate/v2?key=xxxxx&source=RU&target=EN&q=%D0%92+1526+%D0%B3%D0%BE%D0%B4%D1%83+%D0%BF%D0%B5%D1%80%D0%B5%D0%B1%D1%80%D0%B0%D0%BB%D1%81%D1%8F
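
To illustrate how much the query grows, here is a minimal Python sketch (not OmegaT's actual code, which is written in Java). It assumes the source text behind the example URL above was "В 1526 году перебрался" and uses the standard library's `quote_plus` to produce the same kind of encoding.

```python
from urllib.parse import quote_plus

# Assumed source text for the q= parameter in the example URL above.
text = "В 1526 году перебрался"

# quote_plus encodes the text as UTF-8, writes each non-ASCII byte as %XX,
# and turns spaces into "+".
encoded = quote_plus(text)

print(encoded)
print(len(text), "characters in the segment,", len(encoded), "characters in the URL")
```

Each Cyrillic letter costs six characters in the URL, while digits and Latin letters still cost one, which is why a Russian segment hits the length limit far sooner than an English one of the same size.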

I know there is another method, which allows slightly longer strings to be sent. I'll check with Alex (he's more concerned than I am), but ultimately the problem will always exist for lengthy segments.
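
The post does not say which other method is meant. One possibility, assumed here and not confirmed by the thread, is sending the parameters in a POST body with an `X-HTTP-Method-Override: GET` header, so that the encoded text no longer counts toward the URL length. The sketch below only builds such a request and does not send it; the full endpoint URL, the helper name `build_post_request`, and the placeholder key are illustrative assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint, based on the URL fragment quoted earlier in this post.
ENDPOINT = "https://www.googleapis.com/language/translate/v2"

def build_post_request(text, api_key, source="ru", target="en"):
    """Hypothetical helper: put the query parameters in the body, not the URL."""
    body = urlencode(
        {"key": api_key, "source": source, "target": target, "q": text}
    ).encode("ascii")
    return Request(
        ENDPOINT,
        data=body,
        # Assumption: the service accepts POST when told to treat it as a GET.
        headers={"X-HTTP-Method-Override": "GET"},
        method="POST",
    )

req = build_post_request("В 1526 году перебрался", api_key="xxxxx")
print(req.get_method(), "URL length:", len(req.full_url), "body length:", len(req.data))
```

The body is still percent-encoded, so the text is no shorter; it has only moved out of the URL. Whatever limit the server puts on the body would still apply to very long segments.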

Didier



Susan Welsh  Identity Verified
United States
Local time: 13:48
Member (2008)
Russian to English
TOPIC STARTER
example Dec 28, 2011

Hi Didier and esperantisto,

Didier, you seem to have identified the problem (although I would not say that this segment is terribly long), because it does give that code (below). I am working with OmegaT 2.5.2 on Ubuntu Linux, OOo 3.2.0.

(Esperantisto, I'm not familiar with Anaphraseus -- not sure what it is. I'll check when I get a chance.)

Thanks,
Susan

PS - After some editing, the garbage no longer displays in this message the way I see it on my screen. It consists entirely of %D0%BE%D0%B4 and the like, with no Cyrillic words. I'm going to delete the example, except for the source text and the error code.

Выросши в холодной Сибири, постоянно с величайшим вниманием следя за описаниями полярных путешествий и многое узнав о них от покойного моего друга Норденшильда, совершившего ряд славных экспедиций в области льдов, я получил полное убеждение в возможности решительной победы над полярными льдами при помощи соответственных для того приспособлений и, главное, - ясного понимания сил, до сих пор препятствовавших кораблям проникнуть в неведомую околополюсную область, занимающую пространство около 4 млн кв.
Server returned HTTP response code: 414 ...
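
For anyone who wants to check what such "garbage" contains, here is a minimal Python sketch. It assumes the garbage is the percent-encoded request URL echoed back in the error message, as the 414 code suggests, rather than text in another Cyrillic encoding. The fragment and the opening words of the segment are taken from this post.

```python
from urllib.parse import quote_plus, unquote_plus

# A fragment of the kind that shows up in the error message.
fragment = "%D0%BE%D0%B4"
print(unquote_plus(fragment))  # decodes the UTF-8 bytes back into two Cyrillic letters

# Round trip on the opening words of the source segment.
segment = "Выросши в холодной Сибири"
encoded = quote_plus(segment)
print(encoded)
print(unquote_plus(encoded) == segment)
```

If the decoded text matches the source segment, the Cyrillic was not corrupted; the request was simply too long to be accepted.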

[Edited at 2011-12-28 14:49 GMT]



Dominique Pivard  Identity Verified
Local time: 20:48
Finnish to French
Anaphraseus Dec 29, 2011

Susan Welsh wrote:
(Esperantisto, I'm not familiar with Anaphraseus -- not sure what it is. I'll check when I get a chance.)

Anaphraseus (http://anaphraseus.sourceforge.net/) is a Wordfast (Classic) "clone". It works in OpenOffice instead of MS Office, is considerably slower than Wordfast and has a much smaller feature set.

