Be very careful with Able2Extract 7 and large docs
Thread poster: Fausto Navarro de Vicente-Gella
Fausto Navarro de Vicente-Gella
Spain
English to Spanish
Feb 18, 2011

Hello fellow translators,

I recently lost around $400 because of a really bad PDF-to-DOC conversion with Able2Extract 7. It "forgot" to convert around 5,000-6,000 words. I had already sent the quote to my client and could not change the terms of the agreement, so I have to accept the loss.

However, Able2Extract 6 does seem to work properly (within its limitations), so do not be lured by the nice UI of the new version; stick with the old and reliable version 6.

Hope this helps.


 
Natalie
Poland
Member (2002)
English to Russian
Moderator of this forum
Hi Fausto Feb 18, 2011

Thank you for the warning. However, my question is: do you start translating converted files without checking them?

Natalia
P.S. To be honest, I have never heard of this tool (I use FineReader).


 
Fausto Navarro de Vicente-Gella
Spain
English to Spanish
TOPIC STARTER
Checked, not thoroughly enough Feb 18, 2011

I checked, obviously not well enough. It won't happen again, of course.

The lost data was all in tables and lists, huge tables and lists, which seemed to be OK.


 
Kevin Fulton
United States
German to English
Thanks for the heads-up Feb 18, 2011

It's tough to learn a lesson the hard way.

I've been an enthusiastic user of Able2Extract for several years. On large PDF files I tend to rely on PractiCount whenever possible, or I save the document as text in order to obtain a word count.

My own verification of the extraction has been a page count. If the number of source and target pages is the same, I haven't bothered to do a closer comparison.

In the future I'll be sure to make a page-by-page comparison.
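
A word-count check of the kind described above could be sketched as follows. This is only a minimal sketch, assuming that both the source PDF and the converted document have already been saved as plain text; the function names and the 2% tolerance are illustrative assumptions and are not part of Able2Extract, PractiCount, or any other tool mentioned in this thread.

```python
import re
from collections import Counter

# A "word" here is any run of letters, digits or underscores (an assumption;
# CAT tools and word counters each use their own rules).
WORD_RE = re.compile(r"\w+")


def words(text):
    """Return the lowercase word tokens found in text."""
    return [w.lower() for w in WORD_RE.findall(text)]


def compare_word_counts(source_text, converted_text, tolerance=0.02):
    """Return (source_count, converted_count, ok).

    ok is False when the converted text has lost more than `tolerance`
    (a hypothetical threshold) of the source word count.
    """
    src = len(words(source_text))
    conv = len(words(converted_text))
    lost = max(src - conv, 0)
    ok = src == 0 or lost / src <= tolerance
    return src, conv, ok


def missing_words(source_text, converted_text, top=10):
    """List words that occur more often in the source than in the conversion."""
    diff = Counter(words(source_text)) - Counter(words(converted_text))
    return diff.most_common(top)


# Example: two table rows were dropped during conversion.
source = "Part A 10 mm steel bolt\nPart B 12 mm steel nut\nSee the table above."
converted = "See the table above."

print(compare_word_counts(source, converted))  # (16, 4, False)
print(missing_words(source, converted, top=3))
```

A matching page count would not catch the loss in this example, while the word counts differ sharply. The list of missing words also hints at where to look, here in the table rows.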


 
Anton Konashenok
Czech Republic
French to English
PDF extractors are generally inadequate Feb 18, 2011

I've tried quite a few PDF extractors, and have yet to find one that does the job properly in all cases. On the other hand, OCR (e.g. FineReader) works much better; recent versions of FineReader also extract text directly from the PDF whenever possible anyway.

 
Fausto Navarro de Vicente-Gella
Spain
English to Spanish
TOPIC STARTER
Page count, OCR and another useful tip Feb 19, 2011

Kevin, I agree. A2Ex has worked quite well for me with huge manuals full of tables, pictures, several columns on one page, etc., which OCR cannot recreate. With a bit of MS Word editing know-how and a lot of care, the final result is quite similar to the original and not so time-consuming.

The page count method was also my way of checking that the process had completed successfully, particularly in 300- to 400-page documents.


And here's another tip. When the document has both vertical and horizontal page layouts, create a separate .doc file for each group. Otherwise, when printing the document, the upper and lower parts of the pages are lost after the first layout change.


 
Anton Konashenok
Czech Republic
French to English
When did you last try OCR? Feb 19, 2011

Fausto, what do you mean by "tables, pictures, several columns in one page, etc. which OCR cannot recreate"? It's EXACTLY what modern OCR software does, and does it well.

 
Fausto Navarro de Vicente-Gella
Spain
English to Spanish
TOPIC STARTER
Not in a long time Feb 19, 2011

Anton Konashenok wrote:

Fausto, what do you mean by "tables, pictures, several columns in one page, etc. which OCR cannot recreate"? It's EXACTLY what modern OCR software does, and does it well.


Anton, thank you for the info. I am downloading the demo right now and really hope it suits my needs.


 
Anton Konashenok
Czech Republic
French to English
A couple of tips Feb 19, 2011

Fausto, if it's FineReader that you are downloading, keep in mind that it gives you an easy opportunity to intervene manually in the OCR process. You can first let it recognize all the areas by itself, and if it does something you don't like, you can correct the areas by hand. Most often, I correct dividers within the tables. You can move them, delete them, insert new ones, merge cells, split cells, etc. This is rarely needed in a text-based PDF, but becomes almost necessary in a poor-quality scanned one.
When saving the results, you can retain the original layout or just save the formatted text, with all pictures and tables, boldface and italics, but without 100 different paragraph styles, multi-column layouts, etc. The latter is probably a better choice for translation purposes, because the translation hardly ever fits the same layout.

[Edited at 2011-02-19 13:59 GMT]


 
Fausto Navarro de Vicente-Gella
Spain
English to Spanish
TOPIC STARTER
Did not work Feb 19, 2011

Anton,

Thank you again for the tip and for the advice. However, it did not work for my complex PDFs, at least not in the demo version, which only extracts the first two pages. The outcome was really bad, considering that those were probably the two simplest pages.
Able2Extract 6 is not perfect, but at least it saves formatting time.


 
Egidijus Slepetys
German to Lithuanian
Solid Converter® PDF Feb 19, 2011

Sorry for your loss...
That's why I would like to share my discovery after many years of searching for the best PDF converter: Solid Converter® PDF (http://www.soliddocuments.com/).
I have never seen such an accurate converter! You can try the trial version.


 
Fausto Navarro de Vicente-Gella
Spain
English to Spanish
TOPIC STARTER
Solid Feb 20, 2011

Egidijus,

Thank you for the info and the condolences. I have downloaded the trial version (much better than the others, because it actually lets you convert the whole document), and although it is very good with formatting, it creates monster files in terms of size. What A2Extr converted into an 18 MB doc, Solid Converter converted into a 145 MB one. I am analysing the doc with Wordfast and it seems to have gone bonkers...

Sadly, I have to stick to A2Extr for now.


 
Egidijus Slepetys
German to Lithuanian
Some advice Feb 20, 2011

Yes, Solid Converter wants to convert everything
(by the way, there are 3 reconstruction methods).

Some work is needed after conversion. The problem is that Solid Converter recovers character spacing (which is not needed), so you have to reset it to normal in MS Word.
I don't have this problem, because Star Transit can eliminate character spacing in the conversion process.

Regards!
Egidijus


 

