Is there a CAT tool with integrated OCR?
Thread poster: xxx6764890385
xxx6764890385

Jun 25, 2013

I need this because I have a PowerPoint file that contains both regular text and images with text in them. The OCR tool I have (FineReader) won't format the target file properly the way Trados does, and Trados won't read the images and extract the text from them.
If I do this manually it will take a ridiculous amount of time.

Any suggestions?



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 20:49
Member (2009)
Dutch to English
+ ...
Hi Mert, Jun 25, 2013

Sadly, FineReader is pretty much the best there is in terms of converting a difficult PDF into a usable document. Some CAT tools have built-in PDF converters (some using OCR), but none of them is as good as FineReader.

One question: how well do you know FineReader (FR)? Often, with a little messing around, you can greatly improve your results. Fiddle with the little drop-down that selects whether the selected area will be read as text, as an image or as a table.

Michael

[Edited at 2013-06-25 21:14 GMT]



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 20:49
Member (2009)
Dutch to English
+ ...
try various different converters Jun 25, 2013

My second piece of advice would be to try as many converters as you can get your hands on. I find that sometimes a certain converter will just do a much better job on a particular file. For example, do you own Adobe Acrobat? Adobe Acrobat can sometimes do a great job, sometimes even better than FineReader. Or try the free Wordfast Anywhere converter. I have heard good things about that too.

Michael

[Edited at 2013-06-25 16:13 GMT]



Michael J.H. Davies  Identity Verified
Denmark
Local time: 21:49
Member (2009)
English to Danish
+ ...
OCR (powerpoint with text in images) Jun 25, 2013

I do not know of any CAT tool with built-in OCR, but there are a number of online OCR services. Some require payment (often quite a low price per page); others are free of charge, though usually with a limit on the number of pages or pages per hour.

One such service I have just 'googled' is able to convert PDF documents to powerpoint - so an alternative solution could be to first convert your powerpoint slides to PDF and then use this on-line OCR (or maybe your FR?) to convert back to powerpoint including the texts within images to editable texts, which Trados should be happier with.

You can read more about this online OCR and the conversion of PDF to PowerPoint at: http://www.verypdf.com/wordpress/201211/how-to-convert-image-pdf-to-editable-powerpoint-by-ocr-tech-33121.html

The OCR referred to in the text is http://www.verypdf.com/app/pdf-to-table-extractor-ocr/index.html.

I wish you luck!



Michael J.H. Davies  Identity Verified
Denmark
Local time: 21:49
Member (2009)
English to Danish
+ ...
Wordfast Anywhere Jun 25, 2013

I can definitely recommend Wordfast Anywhere as an excellent online translation tool, which I sometimes use as an alternative to Trados (I have Studio 2011), as it has some features that Trados does not. I do not, however, have any experience of using it on documents such as the ones you are having problems with.

According to http://www.wordfast.com/products_wordfast_anywhere.html it can work with both Powerpoint and PDF (including scanned PDFs - which suggests that it does have OCR capabilities).


xxx6764890385

TOPIC STARTER
Couldn't batch modify the resolutions of images within the pptx Jun 25, 2013

I don't know much about FR, but I wanted to increase the resolution of each image and couldn't find a way to do that in batch. You can change the resolution of images individually, or mess around as you said, showing the program where the text, tables and images are, but that just takes too much time.
I tried Wordfast Anywhere and it didn't even have a clue that there were images in the file.

Is there a way to batch modify the resolution of images in FR?

---

Michael Davies: thanks, but MS Office can already make a PDF from the presentation or save each slide as an image, and that's already what I'm giving FR to read (PNG images). I did hear that Wordfast Anywhere has OCR, but the file probably needs to be entirely image-based for it to apply that (see above): it recognized no images in my PowerPoint and only took the text.
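
One possible workaround for the batch problem, offered as a rough sketch rather than a tested recipe: a .pptx file is a ZIP archive, and embedded pictures are normally stored under ppt/media/ inside it. A short Python script can copy them all out in one go, so the whole folder can be handed to FR or another OCR tool instead of handling the images one by one. The function name extract_pptx_images and the output folder are made up for illustration, and the list of image extensions is an assumption.

```python
import zipfile
from pathlib import Path

# Assumed set of image types worth sending to OCR; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff"}


def extract_pptx_images(pptx_path, out_dir):
    """Copy every embedded image from a .pptx into out_dir; return the saved paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(pptx_path) as archive:
        for name in sorted(archive.namelist()):
            # Embedded pictures normally live under ppt/media/ in the archive.
            if not name.startswith("ppt/media/"):
                continue
            # Skip audio, video and other non-image media.
            if Path(name).suffix.lower() not in IMAGE_SUFFIXES:
                continue
            target = out / Path(name).name
            target.write_bytes(archive.read(name))
            saved.append(target)
    return saved
```

Usage would be something like extract_pptx_images("presentation.pptx", "images_for_ocr"). The images come out at their stored resolution; the script does not upscale them or run any OCR itself.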



ghislandi  Identity Verified
Local time: 20:49
English to Italian
OCR Software Jun 25, 2013

Hello,
I am from SDL, and SDL Trados Studio does not have OCR.
However, in our experience ABBYY has been one of the best OCR tools:

http://finereader.abbyy.com/professional/?adw=google_EU_UK&FR=n_search&gclid=CLHUn_nO_7cCFSfLtAod-BIAQg

Regards
Massimo

[Edited at 2013-06-25 16:23 GMT]


xxx6764890385

TOPIC STARTER
SDL Trados and OCR Jun 25, 2013

Hi Massimo, maybe Trados should have OCR. I don't know, maybe it could come with FineReader integrated in partnership with ABBYY, to save us from having to figure out ridiculous workarounds for difficult files like this.

Just a thought.

[Edited at 2013-06-25 16:49 GMT]


Bernard Lieber  Identity Verified
Local time: 21:49
English to French
+ ...
Fluency 2013 Jun 25, 2013

Hi Mert,

Fluency 2013 (Western Standard) has a built-in OCR app (under Tools). Download a trial version and give it a go.

HTH,

Bernard


xxx6764890385

TOPIC STARTER
Fluency Jun 25, 2013

I can't download the trial, it won't accept my phone number. I tried putting both a plus and two zeros at the beginning of the number to no avail.
"Please enter a valid phone number."

By the way, what I want is software that can handle both images and regular text. Wordfast Anywhere also has OCR, but it didn't read the images in my pptx, which also contains regular text. Do you know if Fluency can manage that?

[Edited at 2013-06-25 18:22 GMT]
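
To see how much of a mixed file a text-only import would miss, a rough Python sketch like the one below can count pictures and text runs per slide. It assumes the usual .pptx layout (slides stored as ppt/slides/slideN.xml, pictures as p:pic elements, text as a:t elements); the function name summarize_slides is made up for illustration. Pictures used as a shape fill or slide background are not counted.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by PowerPoint slide files.
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def summarize_slides(pptx_path):
    """Return {slide_number: (picture_count, text_run_count)} for a .pptx."""
    summary = {}
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            match = re.fullmatch(r"ppt/slides/slide(\d+)\.xml", name)
            if not match:
                continue
            root = ET.fromstring(archive.read(name))
            # Picture shapes, including those nested inside groups.
            pictures = sum(1 for _ in root.iter(f"{{{P_NS}}}pic"))
            # Text runs that contain something other than whitespace.
            text_runs = sum(
                1 for t in root.iter(f"{{{A_NS}}}t") if (t.text or "").strip()
            )
            summary[int(match.group(1))] = (pictures, text_runs)
    return dict(sorted(summary.items()))
```

Slides with a non-zero picture count are the ones that would still need a separate OCR pass after the editable text has been imported into a CAT tool.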


Bernard Lieber  Identity Verified
Local time: 21:49
English to French
+ ...
Tech Support Jun 25, 2013

Hi again,

Send an e-mail to support@westernstandard.com; you should get a reply within 5-10 minutes.

HTH,

Bernard

[Edited at 2013-06-25 19:23 GMT]


xxx6764890385

TOPIC STARTER
Did get the trial Jun 25, 2013

Sorry, I did get the trial. Apparently the problem wasn't my phone number (I had typed my email wrong), but the form complained about the phone number anyway.

Bernard Lieber  Identity Verified
Local time: 21:49
English to French
+ ...
Keep me posted Jun 25, 2013

... about the results as I've never tried that functionality and tech support will certainly help you out if necessary.

Thanks,

Bernard

[Edited at 2013-06-25 20:14 GMT]


xxx6764890385

TOPIC STARTER
Sure Jun 25, 2013

Will do

Walter Moura  Identity Verified
Brazil
Local time: 17:49
Member
English to Portuguese
Wordfast Anywhere is Good Jun 25, 2013

I have used the OCR function of Wordfast Anywhere several times, and I can say that it is very good. And you don't have to use the TM to make it work.

Go to the Wordfast Anywhere website, then go to File > Upload Document.

Select the document on your hard drive, then click Upload.

WFA will display a disclaimer, with the option to download the result as a .doc file or to load it and start using WFA.

If you choose to download it, just click Download as .Doc file. At the next prompt, click OK.

Choose where you want it downloaded to. It will be downloaded as a .zip file.

Just open the ZIP package with WinRAR, et voilà, you have your scanned file, with the same format as the original. Provided, of course, that your PDF file is of good quality.

I did convert a file while writing this for you. I am writing long after the scanning finished, and the file is open and ready for me to work on.


Hope to have helped.

Regards,

