IRIS PDF OCR plug-in trouble
Thread poster: Pavel Slama

Pavel Slama  Identity Verified
United Kingdom
Local time: 17:44
Member (2014)
English to Czech
+ ...
Aug 23

Good afternoon. I was quite excited about the new OCR feature, with which Trados’s OCR will support my languages, such as Czech.

However, I am probably doing something wrong: I’ve installed the plug-in and enabled it in Options, but attempts to OCR a Czech document via Trados still result in illegible gibberish, as though the software had not recognized the language of the document (and perhaps defaulted to English).

I installed the plug-in first, but when I then ticked it in Options, a message told me that I should download and install it.

Thanks for any advice.


CafeTran Training
Netherlands
Local time: 18:44
Words glued together? Aug 23

Pavel Slama wrote:

Good afternoon. I was quite excited about the new OCR feature


I watched the video and I was wondering: are these words like "Itis rowthe mostLiked etc." really glued together?

Screen Shot 2017-08-23 at 18.43.06

If so, I'd say it's a rather poor result of Iris' OCR.



Pavel Slama  Identity Verified
United Kingdom
Local time: 17:44
Member (2014)
English to Czech
+ ...
TOPIC STARTER
Czech example Aug 23

OK, so to be more specific, I’ll give a very straightforward example.

Original:
Capture

Google Docs built-in OCR (0 mistakes in this paragraph):
Capture2

Trados with IRIS OCR:
Capture3

But I’m still hoping there may be a human factor on my part.



José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 15:44
English to Portuguese
+ ...
Butting in... Aug 23

Though my late parents were Polish, I don't speak any of it. Nor Czech, if that matters.

However, I see that the ž (CZ) was OCR'd as ż (PL).
Is there any chance your program was set up for Polish (too)?

I had a similar experience with an ancient OCR program (can't recall its name), where ó (PT) was OCR'd as 6, until I realized that it was still set for EN, despite my repeatedly setting it to PT.
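
The ž/ż observation can also be checked mechanically. Below is a minimal sketch in Python, assuming the OCR output has been copied out as plain text. The function name `polish_only_chars` and the sample sentence are made up for illustration; nothing here is part of Studio or the IRIS plug-in.

```python
import unicodedata

# Lowercase letters with diacritics used in Czech (ó occurs in loanwords).
CZECH_DIACRITICS = set("áčďéěíňóřšťúůýž")
# Lowercase letters with diacritics used in Polish.
POLISH_DIACRITICS = set("ąćęłńóśźż")
# Polish letters that do not occur in Czech spelling.
POLISH_ONLY = POLISH_DIACRITICS - CZECH_DIACRITICS


def polish_only_chars(text):
    """Count Polish-only letters in text (hypothetical helper for illustration)."""
    # Normalize first so that "z" + combining dot is treated the same as "ż".
    normalized = unicodedata.normalize("NFC", text).lower()
    counts = {}
    for ch in normalized:
        if ch in POLISH_ONLY:
            counts[ch] = counts.get(ch, 0) + 1
    return counts


# Made-up sample: a Czech phrase with one ž misread as ż.
print(polish_only_chars("Příliš żluťoučký kůň"))  # {'ż': 1}
```

A non-empty result for a document known to be Czech would suggest, though not prove, that the recognition language was something other than Czech.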



SDL Community  Identity Verified
United Kingdom
Local time: 18:44
English
You watched the video? Aug 23

CafeTran Training wrote:

If so, I'd say it's a rather poor result of Iris' OCR.


If that was IRIS I'd agree with you



Pavel Slama  Identity Verified
United Kingdom
Local time: 17:44
Member (2014)
English to Czech
+ ...
TOPIC STARTER
That’s what I’m wondering, whether my setup’s right Aug 23

José Henrique Lamensdorf wrote:
... any chance your program was set up for Polish (too)?


It’s not my programme; it’s the brand-new plug-in made by SDL themselves, I believe. It is entirely possible that the setup is not right, and that’s why I’m asking for help. I used it from a project set up as CS>EN.

Otherwise, well done, José, for recognizing Polish characters where Czech ones should be.



SDL Community  Identity Verified
United Kingdom
Local time: 18:44
English
I copied your image... Aug 23

Pavel Slama wrote:

OK, so to be more specific, I’ll give a very straightforward example.

Original:
Capture

But I’m still hoping there may be a human factor on my part.


... with a screen capture and saved as a PDF. Then opened the PDF in Studio using IRIS. Doesn't look as bad as your test to me. Are you sure you used IRIS?

https://www.dropbox.com/s/byi088wuew3wcfq/cz_iris.jpg?dl=0

Regards

Paul
Why not try the new SDL Community


[Edited at 2017-08-23 22:36 GMT]

