Is there a CAT tool with integrated OCR? Thread poster: 6764890385 (X)
| 6764890385 (X) TOPIC STARTER I'm assuming said feature of Wordfast only works on files that're only image based | Jun 25, 2013 |
... because it didn't work on mine, which has both regular text, text boxes, tables, etc. and images that got text in em.
[Edited at 2013-06-25 20:36 GMT] | | | Convert into a Full PDF File | Jun 25, 2013 |
Perhaps this would solve your problem. If your document is a mix of text, text boxes and pictures, convert it into a full PDF file. Use the Force... I mean, FineReader. Open FineReader and click on the Folder icon to open your file. Select your file. FineReader will bring the file in. On the left side, there is an option to save the file in the required format. It is a down arrow, which will present a dropdown menu with several file formats. Choose PDF document and click Save. In the next window, choose where you want it saved. Check whether the result is a full PDF file. Then use FineReader again to convert it to DOC, DOCX, etc. Or use Wordfast Anywhere. Good luck
[Edited at 2013-06-25 21:23 GMT] | | | 6764890385 (X) TOPIC STARTER lol at "use the force" | Jun 25, 2013 |
I already mentioned doing that, but FR went all dark side on me and didn't give me an option to change all images' resolution at once. Not that I know for a fact that doing that would solve it, but with the current resolutions there's plenty of errors in recognition. | | | Michael Beijer United Kingdom Local time: 09:17 Member (2009) Dutch to English + ... A word to the wise. | Jun 25, 2013 |
Let me more or less repeat what I said earlier. I have been converting difficult PDFs for aeons, as well as helping my friends and colleagues do the same, and none of these built-in converters in CAT tools are going to be better than ABBYY FineReader. I don't have the time to research this properly, but many of these CAT tools actually use the same semi-decent (cheap) third-party engines. None of them are going to be as good as ABBYY (in the hands of someone who knows what he or she is doing). Yes, Fluency has a PDF converter. Big deal. So does memoQ now, and Wordfast Anywhere, and maybe a few others. It might look good on their press release and on their website, but if you really need to convert difficult PDFs in the long term (which most of us do, sadly) I would recommend you get yourself ABBYY FineReader (and Adobe Acrobat if at all possible, because it is the best at converting 'good PDFs', no matter how complex. After all, PDF is Adobe's format) and spend some time mastering the dark arts of PDF conversion. Oh, and whatever you do, please do not buy a CAT tool based on its PDF converter! Michael | |
6764890385 (X) TOPIC STARTER I don't want a CAT tool anyway | Jun 25, 2013 |
What I tried at first was FR, but it doesn't seem to read certain parts and images in the powerpoint file that I have (not a pdf), and the pages are full of yellow triangles with exclamation marks in them most of them recommending increasing the resolution. Since I don't know how to do this in batch, it would take too long to change the resolutions individually. Do you know how to do that in batch? | | | Source Files | Jun 25, 2013 |
Insisting on getting the source files is definitely the best approach! Bernard | | | 6764890385 (X) TOPIC STARTER
Bernard Lieber wrote: Insisting on getting the source files is definitely the best approach! Bernard
Not following.
[Edited at 2013-06-25 22:13 GMT] | | | 6764890385 (X) TOPIC STARTER Changing the resolution didn't work | Jun 25, 2013 |
I found the option to batch change the images' resolutions, but it only made things worse: I set it to 400 and now it doesn't read anything. What else can I try? | |
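One possible explanation for why a higher setting made things worse, sketched as plain arithmetic. It assumes (not confirmed anywhere in this thread) that the resolution setting only changes the DPI value the OCR engine is told, without resampling the image, so the pixel count stays the same. Since 1 pt is 1/72 inch, the same letters then appear physically smaller as the declared DPI goes up. The function names and the 14 px example are hypothetical.

```python
def apparent_point_size(glyph_height_px, declared_dpi):
    """Rough text size implied by a pixel height at a declared DPI (1 pt = 1/72 inch)."""
    return glyph_height_px / declared_dpi * 72


def upscale_factor(glyph_height_px, declared_dpi, target_pt=10.0):
    """Factor by which the image would need resampling so the text
    appears as roughly target_pt at the declared DPI."""
    return target_pt / apparent_point_size(glyph_height_px, declared_dpi)


# Hypothetical case: letters about 14 px tall in an image exported from a slide.
for dpi in (96, 300, 400):
    print(dpi, "DPI ->", round(apparent_point_size(14, dpi), 1), "pt")
```

Under that assumption, 14 px letters look like roughly 10.5 pt text at 96 DPI but only about 2.5 pt at 400 DPI, which would be consistent with nothing being read after the change. Enlarging the images themselves (more pixels) before OCR is a different operation from raising the declared DPI.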
ABBYY is a good choice | Jun 26, 2013 |
You can convert the PowerPoint files to PDF, and then scan the PDF with ABBYY. Or save the pictures out of the PowerPoint file separately, and then scan these pictures with ABBYY. This is a good workaround. | | |
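For the second route (saving the pictures out separately), a minimal sketch, assuming the presentation is in the newer .pptx format, which is a ZIP archive that keeps embedded pictures under ppt/media/. An older binary .ppt file would first have to be re-saved as .pptx. The function name extract_pptx_images and the file names in the usage comment are made up for illustration.

```python
import zipfile
from pathlib import Path


def extract_pptx_images(pptx_path, out_dir):
    """Copy every file stored under ppt/media/ in a .pptx archive into out_dir.

    Returns the list of paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            # Skip directory entries and anything outside the media folder.
            if name.startswith("ppt/media/") and not name.endswith("/"):
                target = out / Path(name).name
                target.write_bytes(archive.read(name))
                saved.append(target)
    return saved


# Usage (hypothetical file names):
#   extract_pptx_images("slides.pptx", "slides_images")
```

The extracted files keep their original pixel dimensions, so they can be loaded into the OCR program as a batch without any quality lost to re-export.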
What I meant is that the pictures have been created with a graphics package (Photoshop, Illustrator, etc.) and then saved as .bmp, etc. If you get the source files you won't have to OCR anything (tell your customer you'll charge more for the extra work, etc.; most of the time you'll get the source files). You can then use a CAT tool like Alchemy Publisher 3.0 (the graphics are shown in the tree structure), translate the graphics, save them as .bmp, etc. and replace them on-the-fly in the tree structure. Hope that's clear enough, Bernard | | | 6764890385 (X) TOPIC STARTER
To Heartsome: Those are the two things I tried anyway. It's clearly not good enough, as when I process em most pictures get a warning about resolution. I don't know why FR's resolution system is based on DPI, but that's pretty useless, as images don't necessarily have to be scanned; they can be digital ones like the ones I extracted out of this ppt. To Bernard: I already have the source file, it's a powerpoint, and I exported the images out of it myself.
[Edited at 2013-06-26 11:12 GMT] | | | Using FineReader | Jun 26, 2013 |
If an OCR output is not suitable for translation in CAT tools, don't necessarily blame it on FineReader but learn how to use it more efficiently. It does take some time and preparation, but it pays off. So instead of blindly stuffing the whole PDF as is into OCR, maybe go through each page, eliminate the clutter that you don't need, define the paragraphs etc. And if absolutely nothing helps: get a typist! | |
6764890385 (X) TOPIC STARTER
That's what everyone here keeps talking about, knowing how to use FR more efficiently; on the other hand, they can't seem to answer the simplest of questions I ask about it. So why doesn't someone tell me how to solve this, if there's a way to solve it anyway, instead of going on and on about how FR's actually awesome but a poor misunderstood program. So maybe instead of blindly rushing into a topic, you might think about reading what I said earlier, about how dealing with each page individually takes too much time and isn't an option. | | | Kevin Fulton United States Local time: 05:17 German to English Embedded images | Jun 26, 2013 |
Mert Dirice wrote: What I tried at first was FR, but it doesn't seem to read certain parts and images in the powerpoint file that I have (not a pdf), and the pages are full of yellow triangles with exclamation marks in them most of them recommending increasing the resolution.
I suspect you're dealing with embedded images in your PPT presentation. That is, the images were not created with the PowerPoint presentation you're working with; they have been copied from another source (possibly a PDF of another PPT file). Changing the resolution won't help. If this is the case, your best hope is to copy these images to a graphics editing program and translate them there, then paste them back into the PowerPoint file. Normally this is billed separately on an hourly basis. | | | 6764890385 (X) TOPIC STARTER
Yes, they're embedded. But doing that is no different than simply working on a new Word file and creating all the formats manually, as the only difficult formatting I encounter is with the images.