Off topic: If it doesn't say "OCR" then it ain't OCR
Thread poster: Samuel Murray

Samuel Murray  Identity Verified
Netherlands
Local time: 22:30
Member (2006)
English to Afrikaans
+ ...
May 8, 2009

To anyone who might want to buy a PDF converter: If the word "OCR" is not on the web site, then the converter can't do OCR.

A somewhat computer-illiterate colleague of mine just bought Nuance PDF Converter for $50 and could not understand why she was unable to edit the converted MS Word files. Well, the reason is that the converted Word file is full of little images, with no text. Nuance's PDF Converter faithfully "converted" the PDF, but not in any way that is useful.

My personal opinion (which is often not shared by software developers or their marketing departments) is that if a user may reasonably expect a certain feature in a product, and that feature is not available in that particular version, this should be clearly stated on the web site or in the product description. For PDF converters, I think "we don't do OCR" should be mandatory.


 

Angela Dickson  Identity Verified
United Kingdom
Local time: 21:30
French to English
+ ...
on the other hand... May 8, 2009

Samuel Murray wrote:

To anyone who might want to buy a PDF converter: If the word "OCR" is not on the web site, then the converter can't do OCR.

A somewhat computer-illiterate colleague of mine just bought Nuance PDF Converter for $50 and could not understand why she was unable to edit the converted MS Word files. Well, the reason is that the converted Word file is full of little images, with no text. Nuance's PDF Converter faithfully "converted" the PDF, but not in any way that is useful.

My personal opinion (which is often not shared by software developers or their marketing departments) is that if a user may reasonably expect a certain feature in a product, and that feature is not available in that particular version, this should be clearly stated on the web site or in the product description. For PDF converters, I think "we don't do OCR" should be mandatory.


I just checked the Abbyy website, and their PDF Transformer product--I'm still using version 1--is not described as OCR, but it does produce a perfectly usable Word document from a text or image PDF, provided the image is reasonably clear (in this respect it does not differ from any other OCR product). The trick is to select the option 'Retain font and font size' instead of 'Retain full page layout' - the latter option can result in the annoying box problem.

I'm not familiar with the Nuance product, though.

Edited to add link to Abbyy: http://www.pdftransformer.com/

[Edited at 2009-05-08 09:53 GMT]


 

Susan Welsh  Identity Verified
United States
Local time: 16:30
Member (2008)
Russian to English
+ ...
ABBYY PDF Transformer/OCR May 8, 2009

Angela Dickson wrote:

I just checked the Abbyy website, and their PDF Transformer product--I'm still using version 1--is not described as OCR, but it does produce a perfectly usable Word document from a text or image PDF, provided the image is reasonably clear (in this respect it does not differ from any other OCR product). The trick is to select the option 'Retain font and font size' instead of 'Retain full page layout' - the latter option can result in the annoying box problem.

Edited to add link to Abbyy: http://www.pdftransformer.com/


?? I have the ABBYY PDF Transformer, and it does say it has a limited OCR capability (compared to ABBYY Finereader, which is mainly intended for OCR, I believe). Perhaps it's not in the marketing material but in the user's manual. But it definitely does say that.


[Edited at 2009-05-08 10:22 GMT]


 

Miroslav Jeftic  Identity Verified
Local time: 22:30
English to Serbian
+ ...
Trial May 8, 2009

Well, before buying any software, you really ought to try it first and see whether it suits your needs. For example, Abbyy FineReader is so far the only solution that is useful for me, and anything else I have seen so far is not good enough, OCR or no.
Of course, FR is not perfect, but it is the best for now.


 

Angela Dickson  Identity Verified
United Kingdom
Local time: 21:30
French to English
+ ...
FAQ May 8, 2009

Yes, Susan, I just looked at the FAQ and OCR is mentioned. I was just suggesting that if you applied Samuel's metric to the Abbyy product, you might rule it out as a solution, when in fact it does the job pretty well.

Looking at the description of the Nuance product, I'm surprised it produced the result Samuel says it did, so have downloaded a trial version to see how it copes with the PDF I'm currently working on. Watch this space...


 

Julia_O_K  Identity Verified
United Kingdom
Local time: 21:30
English to Russian
+ ...
SolidPdfConverter May 8, 2009

SolidPdfConverter is quite good for documents with lots of graphics and tables. Sometimes it is even better than FineReader, if the text is clearly legible!

 

Magdalena Bergmann
Germany
Local time: 22:30
Romanian to German
+ ...
FR May 8, 2009

Angela Dickson wrote:

...
The trick is to select the option 'Retain font and font size' instead of 'Retain full page layout' - the latter option can result in the annoying box problem.



Thank you Angela for sharing the trick... it is very helpful.
I am also very happy with my FineReader from Abbyy (this is not an advertisement). None of the other programs I've used met my expectations.

Greetings, and have a nice weekend!
Magda


 

Kathryn Litherland  Identity Verified
United States
Local time: 16:30
Member (2007)
Spanish to English
+ ...
no trial version available? May 8, 2009

I'm not familiar with the Nuance product, but I'm hesitant to pay money for any piece of software that I haven't been able to try out on a trial version just to make sure it does what I need it to do, up to the standards that I need.

Specifically in the OCR realm, I've heard many good things about FineReader, but the price tag is a little steep. I'd been using Able2Extract Pro for about a year ($129) but frankly am not satisfied with the quality/resource usage. I've been trialing ReadIris and think it's far superior in many ways for the same price.


 

ViktoriaG  Identity Verified
Canada
Local time: 16:30
English to French
+ ...
I disagree May 8, 2009

Samuel Murray wrote:

For PDF converters, I think "we don't do OCR" should be mandatory.


I personally think that a person who wants to do OCR should know what the term means. They also should know the difference between text files and image files. It's as easy as trying to select a portion of the PDF document to see if you can select anything using the text selection tool. If one doesn't know how to use Acrobat Reader, then maybe they should read up on the subject before spending money on things they don't understand.
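The text-selection check described above can also be roughly approximated in a script. The following is only a sketch under stated assumptions, not a reliable detector: it scans the raw bytes of a PDF for font references and image objects, on the assumption that a scanned, image-only PDF usually embeds images but declares no fonts. The function name `looks_like_image_only_pdf` is made up for this illustration, and PDFs that store their objects in compressed streams can hide both markers, so the result is only a hint.

```python
import re


def looks_like_image_only_pdf(data: bytes) -> bool:
    """Crude heuristic (an assumption, not a guarantee).

    Returns True when the raw PDF bytes mention image objects but no
    font resources, which usually means there is no selectable text
    and OCR would be needed to get editable output.
    """
    # "/Font" appears as a resource key or as "/Type /Font" in PDFs with text.
    has_font = re.search(rb"/Font\b", data) is not None
    # Embedded pictures (for example scanned pages) are image XObjects.
    has_image = re.search(rb"/Subtype\s*/Image\b", data) is not None
    return has_image and not has_font
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example `looks_like_image_only_pdf(open("document.pdf", "rb").read())`, where `document.pdf` is a placeholder name. Opening the PDF and trying to select text remains the more dependable test.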

Another thing about the term OCR is that if you want OCR, you should look for the term in the product description. If the term isn't there, then you know the product isn't the right one for you. To me, forcing software developers to clearly state that their product doesn't do OCR is the same as forcing a pet food manufacturer to label their packaging with "not intended for human consumption"...


 

Miroslav Jeftic  Identity Verified
Local time: 22:30
English to Serbian
+ ...
Spot on May 8, 2009

ViktoriaG wrote:

I personally think that a person who wants to do OCR should know what the term means. They also should know the difference between text files and image files.


Agree 100%.


 

Samuel Murray  Identity Verified
Netherlands
Local time: 22:30
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
Disagree somewhat May 21, 2009

ViktoriaG wrote:
I personally think that a person who wants to do OCR should know what the term means.


Yes, but someone who wants to convert a file format should not need to know the technical terms for the specific type of conversion. Few people know that "converting graphics to text" is called "OCR". Also, few people realise that PDF often means graphics.

To them, their problem is called "I want to convert PDF to Word". You can't expect such people to know instinctively that there is an acronym "OCR", what it stands for, and that they should be looking for it.


 

Sergei Leshchinsky  Identity Verified
Ukraine
Local time: 23:30
Member (2008)
English to Russian
+ ...
They mention "OCR"... May 21, 2009

... only when there is "manual mode"... Otherwise they offer it as a "black box". No need to say it is OCR. It is some "tool converting input to output".

 

ViktoriaG  Identity Verified
Canada
Local time: 16:30
English to French
+ ...
Precisely Jun 3, 2009

Samuel Murray wrote:

To them, their problem is called "I want to convert PDF to Word".

That is precisely what I was hinting at. People who work with PDFs should know just what a PDF really is. It is not the term OCR that is misused, underused, overused, etc. It is the term PDF that is not well understood. Besides, OCR is used on an infinite variety of file formats, not just on PDFs. I sometimes wonder why some people think of PDFs as soon as they hear OCR...

Knowing what constitutes a PDF document should normally prompt a person to read the available documentation, looking for a clue as to what kind of PDF the software can convert. I agree that this isn't always clear, so when the information is ambiguous, I contact the software vendor and ask. This always works, and I've never bought any software that didn't do exactly what I wanted it to.

As most of us know, assuming serves only one goal: to make an ASS out of U and ME.


 

