Is there a CAT tool with integrated OCR?
Thread poster: 6764890385 (X)
6764890385 (X)

TOPIC STARTER
I'm assuming said feature of Wordfast only works on files that are purely image-based Jun 25, 2013

... because it didn't work on mine, which has regular text, text boxes, tables, etc., as well as images with text in them.

[Edited at 2013-06-25 20:36 GMT]


 
Walter Moura  Identity Verified
Brazil
Local time: 06:17
English to Portuguese
Convert into a Full PDF File Jun 25, 2013

Perhaps this would solve your problem.

If your document is a mix of text, text boxes and pictures, convert it into a full PDF file.

Use the Force... I mean, FineReader.

Open FineReader and click the folder icon to open your file.
Select your file. FineReader will bring the file in. On the left side, there is an option to save the file in the required format. It is a down arrow, which will present a drop-down menu with several file formats.

Choose PDF document, then click Save.

In the next window, choose where you want it saved.

Check the result to see whether it is a full PDF file.

Then use FineReader again to convert it to DOC, DOCX, etc. Or use Wordfast Anywhere.

Good luck

[Edited at 2013-06-25 21:23 GMT]


 
6764890385 (X)

TOPIC STARTER
lol at "use the force" Jun 25, 2013

I already mentioned doing that, but FR went all dark side on me and didn't give me an option to change all the images' resolutions at once. Not that I know for a fact that doing that would solve it, but with the current resolutions there are plenty of recognition errors.

 
Michael Beijer  Identity Verified
United Kingdom
Local time: 09:17
Member (2009)
Dutch to English
A word to the wise. Jun 25, 2013

Let me more or less repeat what I said earlier.

I have been converting difficult PDFs for aeons, as well as helping my friends and colleagues do the same, and none of these built-in converters in CAT tools is going to be better than ABBYY FineReader. I don't have the time to research this properly, but many of these CAT tools actually use the same semi-decent (cheap) third-party engines. None of them is going to be as good as ABBYY (in the hands of someone who knows what he or she is doing).

Yes, Fluency has a PDF converter. Big deal. So does memoQ now, and Wordfast Anywhere, and maybe a few others. It might look good in their press releases and on their websites, but if you really need to convert difficult PDFs in the long term (which most of us do, sadly), I would recommend you get yourself ABBYY FineReader (and Adobe Acrobat if at all possible, because it is the best at converting 'good PDFs', no matter how complex; after all, PDF is Adobe's format) and spend some time mastering the dark arts of PDF conversion.

Oh, and whatever you do, please do not buy a CAT tool based on its PDF converter!

Michael


 
6764890385 (X)

TOPIC STARTER
I don't want a CAT tool anyway Jun 25, 2013

What I tried at first was FR, but it doesn't seem to read certain parts and images in the PowerPoint file that I have (not a PDF), and the pages are full of yellow triangles with exclamation marks in them, most of them recommending increasing the resolution. Since I don't know how to do this in batch, it would take too long to change the resolutions individually. Do you know how to do that in batch?
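One way to triage a large set of exported images before running OCR is to check their pixel dimensions in batch rather than opening them one by one. The following is only a rough Python sketch, assuming the images are PNG files; the helper names `png_size` and `small_images` and the 1000-pixel threshold are hypothetical choices for illustration, not anything FineReader itself uses.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) in pixels from the first 24 bytes of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Width and height are stored as two big-endian 32-bit integers in IHDR.
    return struct.unpack(">II", data[16:24])


def small_images(sizes, min_width=1000):
    """Given {name: (width, height)}, list the names narrower than min_width.

    min_width is an arbitrary illustrative threshold, not a documented limit.
    """
    return sorted(name for name, (width, _height) in sizes.items() if width < min_width)
```

To use it, read the first 24 bytes of each exported file, pass them to `png_size`, collect the results in a dictionary keyed by file name, and hand that dictionary to `small_images` to see which pictures are likely to trigger a resolution warning.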

 
Bernard Lieber  Identity Verified
Local time: 10:17
English to French
Source Files Jun 25, 2013

Insisting on getting the source files is definitely the best approach!

Bernard


 
6764890385 (X)

TOPIC STARTER
- Jun 25, 2013

Bernard Lieber wrote:

Insisting on getting the source files is definitely the best approach!

Bernard


Not following.

[Edited at 2013-06-25 22:13 GMT]


 
6764890385 (X)

TOPIC STARTER
Changing the resolution didn't work Jun 25, 2013

I found the option to batch-change the images' resolutions, but it only made things worse: I set it to 400 and now it doesn't read anything. What else can I try?

 
Heartsome Support
Local time: 17:17
ABBYY is a good choice Jun 26, 2013

You can convert the PowerPoint file to PDF, and then scan the PDF with ABBYY.

Or save the pictures out of the PowerPoint file separately, and then scan those pictures with ABBYY. This is a good workaround.
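The second workaround can be done in one pass if the presentation is in the newer .pptx format, since a .pptx file is a ZIP archive that keeps its pictures under `ppt/media/`. Below is a minimal Python sketch under that assumption (it will not work on legacy .ppt files); the function name `extract_pptx_media` is made up for illustration.

```python
import zipfile
from pathlib import Path


def extract_pptx_media(pptx_path, out_dir):
    """Copy every file stored under ppt/media/ in a .pptx archive into out_dir.

    Returns the sorted list of paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            # Skip everything that is not a media file (slides, themes, etc.).
            if name.startswith("ppt/media/") and not name.endswith("/"):
                target = out / Path(name).name
                target.write_bytes(archive.read(name))
                saved.append(target)
    return sorted(saved)
```

Calling `extract_pptx_media("deck.pptx", "pictures")` (both names are placeholders) would leave the original picture files in a `pictures` folder, ready to be opened in an OCR program. The files come out at whatever pixel size they were stored in, so this does not by itself improve recognition.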


 
Bernard Lieber  Identity Verified
Local time: 10:17
English to French
Pics Jun 26, 2013

What I meant is that the pictures have been created with a graphics package (Photoshop, Illustrator, etc.) and then saved as .bmp, etc. If you get the source files, you won't have to OCR anything (tell your customer you'll charge more for the extra work; most of the time you'll get the source files). You can then use a CAT tool like Alchemy Publisher 3.0 (the graphics are shown in the tree structure), translate the graphics, save them as .bmp, etc. and replace them on the fly in the tree structure.

Hope that's clear enough,

Bernard


 
6764890385 (X)

TOPIC STARTER
FineReader Jun 26, 2013

To Heartsome:
Those are the two things I tried anyway. It's clearly not good enough: when I process them, most pictures get a warning about resolution. I don't know why FR's resolution system is based on DPI, but that's not much use, as images don't necessarily have to be scanned; they can be digital ones like the ones I extracted from this PPT.

To Bernard:
I already have the source file; it's a PowerPoint, and I exported the images from it myself.

[Edited at 2013-06-26 11:12 GMT]
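On the DPI point: a digital image has no inherent DPI, only a pixel count, so a DPI figure only means something relative to a physical size. For a picture placed on a slide, an effective value can be worked out from its pixel width and the width it is displayed at. The sketch below is plain arithmetic in Python; it relies on the fact that the PowerPoint file format measures sizes in EMU (914400 per inch), while the function names and the 300 dpi target are assumptions chosen for illustration. Nothing in this thread establishes how FineReader arrives at its own warning.

```python
# English Metric Units per inch, the unit Office Open XML uses for sizes.
EMU_PER_INCH = 914400


def effective_dpi(pixel_width, displayed_width_emu):
    """Pixels per inch of an image when shown at the given width on a slide."""
    displayed_inches = displayed_width_emu / EMU_PER_INCH
    return pixel_width / displayed_inches


def upscale_factor(pixel_width, displayed_width_emu, target_dpi=300):
    """Factor by which the image would need enlarging to reach target_dpi.

    Returns 1.0 when the image already meets the target.
    target_dpi=300 is an assumed illustrative value.
    """
    return max(1.0, target_dpi / effective_dpi(pixel_width, displayed_width_emu))
```

For example, an image 800 pixels wide shown 4 inches wide works out to 200 dpi, so it would need enlarging by a factor of 1.5 to reach 300. Simply declaring a higher DPI value without adding pixels leaves the image with exactly the same information as before.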


 
Wolfgang Jörissen  Identity Verified
Belize
Dutch to German
Using FineReader Jun 26, 2013

If the OCR output is not suitable for translation in CAT tools, don't necessarily blame FineReader, but learn how to use it more efficiently. It does take some time and preparation, but it pays off. So instead of blindly stuffing the whole PDF as is into OCR, maybe go through each page, eliminate the clutter that you don't need, define the paragraphs, etc. And if absolutely nothing helps: get a typist!

 
6764890385 (X)

TOPIC STARTER
FR Jun 26, 2013

That's what everyone here keeps talking about, knowing how to use FR more efficiently; on the other hand, no one seems able to answer the simplest of questions I ask about it. So why doesn't someone tell me how to solve this, if there's a way to solve it at all, instead of going on and on about how FR is actually an awesome but poor, misunderstood program?

So maybe instead of blindly rushing into a topic, you might think about reading what I said earlier, about how dealing with each page individually takes too much time and isn't an option.


 
Kevin Fulton  Identity Verified
United States
Local time: 05:17
German to English
Embedded images Jun 26, 2013

Mert Dirice wrote:

What I tried at first was FR, but it doesn't seem to read certain parts and images in the PowerPoint file that I have (not a PDF), and the pages are full of yellow triangles with exclamation marks in them, most of them recommending increasing the resolution.


I suspect you're dealing with embedded images in your PPT presentation. That is, the images were not created with the PowerPoint presentation you're working with; they have been copied from another source (possibly a PDF of another PPT file). Changing the resolution won't help.

If this is the case, your best hope is to copy these images to a graphics editing program and translate them there, then paste them back into the PowerPoint file. Normally this is billed separately on an hourly basis.


 
6764890385 (X)

TOPIC STARTER
Yup Jun 26, 2013

Yes, they're embedded. But doing that is no different from simply working on a new Word file and creating all the formatting manually, as the only difficult formatting I encounter is with the images.

 