https://www.proz.com/forum/software_applications/192842-cost_of_ocr.html

cost of OCR
Thread poster: Signe Golly
Signe Golly  Identity Verified
Denmark
Local time: 01:59
English to Danish
Feb 23, 2011

I know I've seen members talk about sending pdf files to a "professional" to have them OCR'ed. Of course I can't find the threads right now.
I'm curious how much something like that usually costs and if perhaps anybody would be willing to share their contacts with me for this kind of task (feel free to msg/email me privately)?
I have a pdf of a tax form as well as an OCR'ed version in Word (from the outsourcer agency) but I'm wondering if someone might be able to produce a cleaner version (it's kind of a mess and will take HOURS before I can even plug it into Trados for translation). What are the chances that one of these OCR wizards will be willing to look at both versions and honestly say whether they can do it better?


 
Samuel Murray  Identity Verified
Netherlands
Local time: 01:59
Member (2006)
English to Afrikaans
Better OCR Feb 23, 2011

Sgolly wrote:
I have a pdf of a tax form as well as an OCR'ed version in Word (from the outsourcer agency) but I'm wondering if someone might be able to produce a cleaner version (it's kind of a mess and will take HOURS before I can even plug it into Trados for translation).


My pragmatic opinion is that the quality of scan you'll get from the typical software that you get for free when you buy a scanner is the quality that you'll get from any OCR program, and that if that is not sufficient, then your next best bet is to use the services of a typist.

I have learnt my lesson when it comes to clients telling me that they'll "convert" the PDF to MS Word, and all I get in the end is a bad OCR job. If the client does not expect you to check the translation against the PDF, then it may be okay to accept the poor OCR job, but in reality most clients believe it is your job (not theirs) to check whether the OCR'ed source text is a flawless rendition of the original before you start translating it.


 
Nikita Kobrin  Identity Verified
Lithuania
Local time: 02:59
Member (2010)
English to Russian
I am one of those Feb 23, 2011

Hi Sgolly,

Sgolly wrote:

I know I've seen members talk about sending pdf files to a "professional" to have them OCR'ed. Of course I can't find the threads right now.

I am one of those who think that everyone should do their own work, and so I prefer to give PDF files to a DTP operator for processing. You can read more on the following page:

http://www.proz.com/forum/general_technical_issues/187073-convert_a_scanned_document_to_word.html#1640059


Sgolly wrote:

I'm curious how much something like that usually costs and if perhaps anybody would be willing to share their contacts with me for this kind of task (feel free to msg/email me privately)?

I will contact you privately.

Nikita Kobrin


 
Tony M
France
Local time: 01:59
Member
French to English
SITE LOCALIZER
Strip out formatting Feb 23, 2011

I usually find that the OCR, in attempting to reproduce the document layout as faithfully as possible, tends to apply loads of styles and other formatting that make life very difficult when translating.

I have taken to starting out by removing all formatting (under 'styles' in Word), and then checking the resulting plain text for hard line breaks, etc. that are likely to upset segmentation.

Once I've done this, it's then easy enough to translate (with or without CAT), and then last of all, put back at least some semblance of the original formatting.

I can't say I've ever tried paid bureau services, but have been very satisfied with the results I've had from ABBYY in various incarnations.
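
For anyone who would rather script the line-break check than do it by hand, here is a minimal sketch in Python. It assumes the OCR output has already been saved as plain text with the formatting stripped, and that paragraphs are separated by blank lines. The function name is made up for illustration, and the hyphen rule is only a heuristic: a genuinely hyphenated word that happens to break at a line end will be glued together, so the result still needs a read-through before it goes into a CAT tool.

```python
import re


def join_hard_line_breaks(text: str) -> str:
    """Merge lines that OCR broke mid-paragraph; keep blank-line paragraph breaks.

    Hypothetical helper for illustration only, not part of any OCR or CAT tool.
    """
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    cleaned = []
    # A blank line (possibly containing spaces) is treated as a paragraph break.
    for para in re.split(r"\n\s*\n", text):
        lines = [line.strip() for line in para.split("\n") if line.strip()]
        if not lines:
            continue
        joined = lines[0]
        for line in lines[1:]:
            if joined.endswith("-") and line[0].islower():
                # Heuristic: a word split across the line break, so glue the halves.
                joined = joined[:-1] + line
            else:
                # Ordinary hard line break inside a paragraph: replace with a space.
                joined += " " + line
        cleaned.append(joined)
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "The taxpayer must re-\nport all income\nearned abroad.\n\nSecond para-\ngraph here."
    print(join_hard_line_breaks(sample))
```

Each paragraph then comes out as a single line, which is usually what sentence-based segmentation expects.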


 
Ralf Lemster  Identity Verified
Germany
Local time: 01:59
English to German
HAPO Formatierungsdienste Feb 23, 2011

Hi "Sgolly",
(Sorry, still cannot get used to addressing people using funny names...)

Sgolly wrote:

I know I've seen members talk about sending pdf files to a "professional" to have them OCR'ed. Of course I can't find the threads right now.
I'm curious how much something like that usually costs and if perhaps anybody would be willing to share their contacts with me for this kind of task (feel free to msg/email me privately)?

The question about the cost of post-OCR formatting is as easy to answer as "how much a translation is". Seriously: you will need to provide the file in order to get a quote; there's no point trying to guess here.

You may want to contact HAPO Formatting Services; I have been working with them for some time, and found them to be extremely efficient.

HTH, Ralf


 
Nikita Kobrin  Identity Verified
Lithuania
Local time: 02:59
Member (2010)
English to Russian
It all depends Feb 24, 2011

Tony M wrote:

I can't say I've ever tried paid bureau services, but have been very satisfied with the results I've had from ABBYY in various incarnations.


It all depends upon the file you need to convert.

I have seen Sgolly's document: it is a poorly scanned PDF with lots of tables. No existing software is able to convert it automatically: this conversion will require a lot of manual work.

Nikita Kobrin


 
Signe Golly  Identity Verified
Denmark
Local time: 01:59
English to Danish
TOPIC STARTER
Learning Feb 24, 2011

Ralf Lemster wrote:

Hi "Sgolly",
(Sorry, still cannot get used to addressing people using funny names...)



It's actually just my first initial and my last name - so maybe not THAT funny?

As far as the uncertainties of pricing OCR/DTP services go, I've realized the folly of my ways and Nikita was kind enough to forward the document to his contact for a quote.

In the end, I've decided to pay a professional typist to retype the document from scratch.

Thanks to everyone for their help and responses!


 
Anna Villegas
Mexico
Local time: 17:59
English to Spanish
Next time, email me! Feb 25, 2011

I am an expert at converting PDF files into MS Word.



[Edited at 2011-02-25 16:20 GMT]


 
Kristyna Marrero  Identity Verified
United States
Local time: 19:59
TRY WORDFAST ANYWHERE Apr 4, 2011

Hi all,

The latest version of Wordfast Anywhere now offers support for scanned (dead) PDFs using server-side OCR technology. You can view the announcement here - http://www.proz.com/topic/195890

To create a free account and begin using Wordfast Anywhere go to http://www.FreeTM.com

Sincerely,

Kristyna Marrero
Director of Sales and Marketing

[Edited at 2011-04-04 18:43 GMT]


 

