Be very careful with Able2Extract 7 and large docs Thread poster: Fausto Navarro de Vicente-Gella
Hello fellow translators, I recently lost around $400 because of a really bad PDF-to-DOC conversion with Able2Extract 7. It "forgot" to convert around 5,000-6,000 words. I had already sent the quote to my client and could not change the terms of the agreement, so I have to accept the loss. However, Able2Extract 6 does seem to work properly (with its limitations), so do not be lured by the nice UI of the new version; stick with the old and reliable version 6. Hope this helps.
Natalie, Poland. Member (2002), English to Russian + ... Moderator of this forum
Thank you for the warning. However, my question is: do you start translating converted files without checking them? Natalia P.S. To be honest, I have never heard of this tool (I use FineReader).
Checked, not thoroughly enough | Feb 18, 2011 |
I checked, but obviously not well enough. It won't happen again, of course. The lost data was all in tables and lists, huge tables and lists, which seemed to be OK.
Kevin Fulton, United States. German to English
Thanks for the heads-up | Feb 18, 2011 |
It's tough to learn a lesson the hard way. I've been an enthusiastic user of Able2Extract for several years. On large PDF files I tend to rely on PractiCount whenever possible, or I save the document as text in order to obtain a word count. My own verification of the extraction has been a page count. If the number of source and target pages is the same, I haven't bothered to do a closer comparison. In the future I'll be sure to make a page-by-page comparison.
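For anyone who wants to automate the check described above, here is a minimal sketch in Python. It assumes you have already saved both the source PDF and the converted document as plain text (and, for the page-by-page check, that you have split each into a list of page texts yourself). The function names and the 2% tolerance are illustrative choices, not part of any tool mentioned in this thread. The word count is a simple regular-expression count, so it will not match Word or PractiCount exactly, but it is applied the same way to both sides.

```python
import re

# Simple word pattern: runs of letters, digits, or underscores.
# Hyphenated terms count as several words, on both sides alike.
WORD_RE = re.compile(r"\w+")


def count_words(text):
    """Return the number of word-like tokens in a text."""
    return len(WORD_RE.findall(text))


def conversion_shortfall(source_text, converted_text, tolerance=0.02):
    """Compare word counts of the source and the converted text.

    Returns (source_words, converted_words, missing, suspicious), where
    suspicious is True if more than `tolerance` of the source words are
    missing. The 2% default is an arbitrary assumption.
    """
    src = count_words(source_text)
    out = count_words(converted_text)
    missing = src - out
    suspicious = src > 0 and missing > src * tolerance
    return src, out, missing, suspicious


def pages_with_losses(source_pages, converted_pages, tolerance=0.02):
    """Return 1-based numbers of pages that lost a noticeable share of words.

    Both arguments are lists of page texts in the same order; pages beyond
    the shorter list are not compared.
    """
    flagged = []
    pairs = zip(source_pages, converted_pages)
    for number, (src_page, out_page) in enumerate(pairs, start=1):
        suspicious = conversion_shortfall(src_page, out_page, tolerance)[3]
        if suspicious:
            flagged.append(number)
    return flagged
```

A matching page count can still hide missing table content, so running `pages_with_losses` on per-page text is closer to the page-by-page comparison than a total count alone.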
PDF extractors are generally inadequate | Feb 18, 2011 |
I've tried quite a few PDF extractors and have yet to find one that does the job properly in all cases. OCR (e.g. FineReader), on the other hand, works much better; recent versions of FineReader also extract text directly from the PDF whenever possible anyway.
Page count, OCR and another useful tip | Feb 19, 2011 |
Kevin, I agree. A2Ex has worked quite well for me with huge manuals full of tables, pictures, several columns on one page, etc., which OCR cannot recreate. With a bit of MS Word editing know-how and a lot of care, the final result is quite similar to the original and not so time-consuming to produce. The page count method was also my way of checking that the process had completed successfully, particularly in 300- to 400-page documents. And here's another tip. When the document has both vertical and horizontal page layouts, create a separate .doc file for each group. Otherwise, when you print the document, the upper and lower parts of the pages are lost after the first layout change.
When did you last try OCR? | Feb 19, 2011 |
Fausto, what do you mean by "tables, pictures, several columns in one page, etc. which OCR cannot recreate"? It's EXACTLY what modern OCR software does, and does it well.
Not in a long time | Feb 19, 2011 |
Anton Konashenok wrote:
Fausto, what do you mean by "tables, pictures, several columns in one page, etc. which OCR cannot recreate"? It's EXACTLY what modern OCR software does, and does it well.

Anton, thank you for the info. I am downloading the demo right now and really hope it suits my needs.
A couple of tips | Feb 19, 2011 |
Fausto, if it's FineReader that you are downloading, keep in mind that it gives you an easy opportunity to intervene manually in the OCR process. You can first let it recognize all the areas by itself, and if it does something you don't like, you can correct the areas by hand. Most often, I correct dividers within the tables. You can move them, delete them, insert new ones, merge cells, split cells, etc. This is rarely needed in a text-based PDF, but becomes almost necessary in a poor-quality scanned one. When saving the results, you can retain the original layout or just save the formatted text, with all pictures and tables, boldface and italics, but without 100 different paragraph styles, without multi-column layouts, etc. The latter is probably a better choice for translation purposes, because the translation hardly ever fits the same layout.
[Edited at 2011-02-19 13:59 GMT]
Did not work | Feb 19, 2011 |
Anton, thank you again for the tip and the advice. However, it did not work for me and my complex PDFs, at least not in the demo version, which only extracts the first two pages. The outcome was really bad, considering they were probably the two simplest pages. Able2Extract 6 is not perfect, but at least it saves formatting time.
Solid Converter® PDF | Feb 19, 2011 |
Sorry for your loss... That's why I would like to share my discovery after many years of searching for the best PDF converter: it is Solid Converter® PDF (http://www.soliddocuments.com/). I have never seen such an accurate converter! You can try the trial version.
Egidijus, thank you for the info and the condolences. I have downloaded the trial version (much better than the others, because it actually lets you convert the whole document), and although it handles formatting very well, it creates monstrously large files. What A2Extr converted into an 18 MB doc, Solid Converter converted into 145 MB. I am analysing the doc with Wordfast and it seems to have gone bonkers... Sadly, I have to stick with A2Extr for now.
Some advice | Feb 20, 2011 |
Yes, Solid Converter tries to convert everything (by the way, there are three reconstruction methods), so some work is needed after conversion. The problem is that Solid Converter preserves character spacing, which is not needed, so you have to reset it to normal in MS Word. I don't have this problem, because Star Transit can eliminate character spacing during the conversion process. Regards! Egidijus