Google Translate API messing up Cyrillic
Thread poster: Susan Welsh

Susan Welsh
United States
Member (2008)
Russian to English
Dec 27, 2011

I have been translating a Russian document in a CAT tool, using the GT API with no problems, and all of a sudden it started throwing in garbage for Cyrillic segments (lots of % signs, Ds and Os). The Universal Online Cyrillic converter identifies the encoding as KOI-7. Yesterday, when this happened, it still worked fine on other segments; but today it's happening again. Has anyone else had this problem?

 

esperantisto
Member (2006)
English to Russian
No problem with Anaphraseus Dec 28, 2011

Please provide more details on your environment. I have no problems with the latest build of Anaphraseus in LibreOffice 3.4.4/OpenOffice.org 3.3.0 on Windows 7 and openSUSE 11.3/11.4 when translating ENG→RUS. Anaphraseus used to return strings of incorrectly decoded UTF-8 characters (by the way, your description makes me think your problem may be the same, but you need to provide a sample of what you get), but Ole solved the problem.
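The "incorrectly decoded UTF-8" symptom can be illustrated with a short Python sketch. This is not Anaphraseus code: the helper names `mojibake` and `repair` are hypothetical, and Latin-1 is just one example of a wrong codec. It shows that every Cyrillic letter becomes two junk characters, and that the damage is reversible as long as no bytes were lost.

```python
def mojibake(text: str, wrong_codec: str = "latin-1") -> str:
    """Encode as UTF-8, then (wrongly) decode with a single-byte codec."""
    return text.encode("utf-8").decode(wrong_codec)

def repair(garbled: str, wrong_codec: str = "latin-1") -> str:
    """Undo mojibake(): recover the original bytes and decode them as UTF-8."""
    return garbled.encode(wrong_codec).decode("utf-8")

word = "Выросши"
garbled = mojibake(word)
# 14 characters: each Cyrillic letter became 'Ð' or 'Ñ' followed by one more byte
print(repr(garbled))
print(repair(garbled))
```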

 

Didier Briel
France
Member (2007)
English to French
Are you translating long segments? Dec 28, 2011

Susan Welsh wrote:
I have been translating a Russian document in a CAT tool, using the GT API with no problems, and all of a sudden it starts throwing in garbage for Cyrillic segments (lots of % signs, Ds and Os). Universal Online Cyrillic converter identifies the encoding as KOI-7. Yesterday when this happened, it worked fine on other segments; but today it's happening again. Has anyone else had this problem?

Do your "garbage" segments begin with "Server returned HTTP response code: 414 for URL:"?

I can reproduce the issue in OmegaT if I try to translate long segments from Russian to English. Short segments are translated fine, but long segments return the 414 error.

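That's because Russian characters have to be percent-encoded in the request URL: each Cyrillic letter takes two bytes in UTF-8, and each byte is written as three characters (e.g. В becomes %D0%92). The URLs are therefore much longer than for ASCII-based languages, and HTTP 414 means the server rejected the request URI as too long.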

E.g.,
googleapis.com/language/translate/v2?key=xxxxx&source=RU&target=EN&q=%D0%92+1526+%D0%B3%D0%BE%D0%B4%D1%83+%D0%BF%D0%B5%D1%80%D0%B5%D0%B1%D1%80%D0%B0%D0%BB%D1%81%D1%8F
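The size of the expansion can be checked with Python's standard `urllib.parse`. In this sketch (the helper `encoded_length` is hypothetical), the `q` parameter of the URL above decodes to "В 1526 году перебрался" ("In 1526 [he] moved"); its 22 characters become 97 in the URL, while a comparable English phrase does not grow at all.

```python
from urllib.parse import quote_plus, unquote_plus

def encoded_length(text: str) -> int:
    """Length of `text` after form-encoding it for a URL query string."""
    # quote_plus() encodes to UTF-8 first, then writes every non-ASCII byte as %XX
    return len(quote_plus(text))

sample = "В 1526 году перебрался"   # the q parameter from the URL above, decoded
q = quote_plus(sample)
print(q)                             # matches the q=... value in the example URL
print(len(sample), "->", len(q))     # 22 characters become 97

english = "In 1526 he moved"
print(len(english), "->", encoded_length(english))  # ASCII text does not grow: 16 -> 16

assert unquote_plus(q) == sample     # decoding recovers the Russian text
```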

I know there is another method, which allows sending slightly longer strings. I'll check with Alex (he's more concerned than I am), but the problem will always exist for sufficiently long segments.

Didier
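Didier doesn't name the "other method". One likely candidate, assumed here from the v2 documentation of the time rather than from this thread, is sending the parameters in a POST body with the header `X-HTTP-Method-Override: GET`, so the URL stays short. The sketch below only builds the request with the standard library and does not send it; the helper name, the `https://www.` prefix on the endpoint and the header's effect on the server are all assumptions to verify against current Google documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint, based on the GET example above (scheme and "www." assumed).
ENDPOINT = "https://www.googleapis.com/language/translate/v2"

def build_post_request(key: str, text: str, source: str, target: str) -> Request:
    """Hypothetical helper: put the parameters in the request body, not the URL."""
    body = urlencode({"key": key, "source": source, "target": target, "q": text})
    return Request(
        ENDPOINT,
        data=body.encode("ascii"),  # urlencode() output is pure ASCII
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Assumption: the API treats this POST as a GET when this header is set.
            "X-HTTP-Method-Override": "GET",
        },
        method="POST",
    )

req = build_post_request("xxxxx", "В 1526 году перебрался", "RU", "EN")
print(req.get_method(), req.full_url)      # the URL no longer carries the text
print(len(req.data), "bytes in the body")  # the encoded text travels here instead
# Sending it would be urllib.request.urlopen(req); not done in this sketch.
```

The body still grows with the text, so this raises the ceiling rather than removing it, which is consistent with Didier's point about lengthy segments.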


 

Susan Welsh
United States
Member (2008)
Russian to English
TOPIC STARTER
example Dec 28, 2011

Hi Didier and esperantisto,

Didier, you seem to have identified the problem (although I would not say that this segment is terribly long): it does give that error code (see below). I am working with OmegaT 2.5.2 on Ubuntu Linux, with OOo 3.2.0.

(Esperantisto, I'm not familiar with Anaphraseus -- not sure what it is. I'll check when I get a chance.)

Thanks,
Susan

PS - After some editing, the garbage in this message no longer looks the way it does on my screen. There it consists entirely of sequences like %D0%BE%D0%B4, with no Cyrillic words. I'm going to delete the example, except for the source text and the error code.

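Выросши в холодной Сибири, постоянно с величайшим вниманием следя за описаниями полярных путешествий и многое узнав о них от покойного моего друга Норденшильда, совершившего ряд славных экспедиций в области льдов, я получил полное убеждение в возможности решительной победы над полярными льдами при помощи соответственных для того приспособлений и, главное, - ясного понимания сил, до сих пор препятствовавших кораблям проникнуть в неведомую околополюсную область, занимающую пространство около 4 млн кв.
(English: Having grown up in cold Siberia, having always followed accounts of polar voyages with the greatest attention, and having learned much about them from my late friend Nordenskiöld, who made a series of celebrated expeditions into the ice, I became fully convinced that a decisive victory over the polar ice is possible with suitable equipment and, above all, a clear understanding of the forces that have so far kept ships from penetrating the unknown circumpolar region, which covers an area of about 4 million sq.)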
Server returned HTTP response code: 414 ...

[Edited at 2011-12-28 14:49 GMT]
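Since any fixed limit will eventually be exceeded, one workaround is to split a long segment before sending it. The sketch below packs whole words into chunks whose encoded length stays under a budget. The function name and the `MAX_ENCODED_Q` value are hypothetical placeholders, because the thread does not state the server's actual limit. It uses the opening of the segment above with a deliberately small budget so that splitting happens.

```python
from urllib.parse import quote_plus

# Hypothetical budget for the encoded text; the thread does not give the
# server's actual limit, so this number is only a placeholder.
MAX_ENCODED_Q = 1500

def split_for_url(segment: str, limit: int = MAX_ENCODED_Q) -> list[str]:
    """Greedily pack whole words into chunks whose encoded length is <= limit."""
    chunks: list[str] = []
    current = ""
    for word in segment.split():
        candidate = f"{current} {word}" if current else word
        if len(quote_plus(candidate)) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single word longer than the limit still becomes its own chunk.
            current = word
    if current:
        chunks.append(current)
    return chunks

segment = ("Выросши в холодной Сибири, постоянно с величайшим вниманием "
           "следя за описаниями полярных путешествий")
chunks = split_for_url(segment, limit=200)
for chunk in chunks:
    print(len(quote_plus(chunk)), chunk)
```

Cutting a sentence mid-clause will usually hurt machine-translation quality, so splitting at punctuation such as commas, where possible, would likely give better results than splitting purely by length.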


 

Dominique Pivard
Finnish to French
Anaphraseus Dec 29, 2011

Susan Welsh wrote:
(Esperantisto, I'm not familiar with Anaphraseus -- not sure what it is. I'll check when I get a chance.)

Anaphraseus (http://anaphraseus.sourceforge.net/) is a Wordfast (Classic) "clone". It runs in OpenOffice instead of MS Office, is considerably slower than Wordfast and has a much smaller feature set.


 

