Be very careful with Able2Extract 7 and large docs
Thread poster: Fausto2112

Fausto2112  Identity Verified
Spain
Local time: 06:24
English to Spanish
+ ...
Feb 18, 2011

Hello fellow translators,

I recently lost around $400 because of a really bad PDF-to-DOC conversion with Able2Extract 7. It "forgot" to convert around 5,000-6,000 words. I had already sent the quote to my client and could not change the terms of the agreement, so I have to accept the loss.

However, Able2Extract 6 does seem to work properly (with its limitations), so do not be lured by the nice UI of the new version; stick to the old, reliable version 6.

Hope this helps.


 

Natalie  Identity Verified
Poland
Local time: 06:24
Member (2002)
English to Russian
+ ...

Moderator of this forum
Hi Fausto Feb 18, 2011

Thank you for the warning. However, my question is: do you start translating converted files without checking them?

Natalia
P.S. To be honest, I have never heard of this tool (I use FineReader).


 

Fausto2112  Identity Verified
Spain
Local time: 06:24
English to Spanish
+ ...
TOPIC STARTER
Checked, not thoroughly enough Feb 18, 2011

I checked, obviously not well enough. It won't happen again, of course.

The lost data was all in tables and lists, huge tables and lists, which seemed to be OK.


 
Kevin Fulton  Identity Verified
United States
Local time: 00:24
German to English
Thanks for the heads-up Feb 18, 2011

It's tough to learn a lesson the hard way.

I've been an enthusiastic user of Able2Extract for several years. For large PDF files I tend to rely on PractiCount whenever possible, or I save the document as text in order to obtain a word count.

My own verification of the extraction has been a page count. If the number of source and target pages is the same, I haven't bothered to do a closer comparison.

In the future I'll be sure to make a page-by-page comparison.
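
A rough sketch of the word-count check described above, assuming both the source PDF and the converted document have already been saved as plain text. The function names and the 2% tolerance are illustrative assumptions, not part of Able2Extract or PractiCount:

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain at least one word character."""
    return sum(1 for token in text.split() if re.search(r"\w", token))


def missing_ratio(source_text: str, converted_text: str) -> float:
    """Fraction of the source word count that the converted text falls short by."""
    source = count_words(source_text)
    converted = count_words(converted_text)
    if source == 0:
        return 0.0
    return max(0, source - converted) / source


def conversion_looks_complete(source_text: str, converted_text: str,
                              tolerance: float = 0.02) -> bool:
    """True if the shortfall is within the tolerance (an assumed 2% by default)."""
    return missing_ratio(source_text, converted_text) <= tolerance
```

Running the same check on each page's text, rather than on the whole file, would point to where the words went missing. A matching count only shows that nothing large was dropped; it does not replace a visual check of tables and lists.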


 

Anton Konashenok  Identity Verified
Czech Republic
Local time: 06:24
English to Russian
+ ...
PDF extractors are generally inadequate Feb 18, 2011

I've tried quite a few PDF extractors and have yet to find one that does the job properly in all cases. On the other hand, OCR (e.g. FineReader) works much better; recent versions of FineReader also extract text directly from the PDF whenever possible anyway.

 

Fausto2112  Identity Verified
Spain
Local time: 06:24
English to Spanish
+ ...
TOPIC STARTER
Page count, OCR and another useful tip Feb 19, 2011

Kevin, I agree. A2Ex has worked quite well for me with huge manuals full of tables, pictures, several columns in one page, etc. which OCR cannot recreate. With a bit of MS Word editing know-how and a lot of care, the final result is quite similar to the original and not so time-consuming to produce.

The page count method was also my way of checking that the process had completed successfully, particularly in 300- to 400-page documents.


And here's another tip. When the document has both vertical (portrait) and horizontal (landscape) page layouts, create a separate .doc file for each group. Otherwise, when you print the document, the upper and lower parts of the pages are lost after the first layout change.


 

Anton Konashenok  Identity Verified
Czech Republic
Local time: 06:24
English to Russian
+ ...
When did you last try OCR? Feb 19, 2011

Fausto, what do you mean by "tables, pictures, several columns in one page, etc. which OCR cannot recreate"? It's EXACTLY what modern OCR software does, and does it well.

 

Fausto2112  Identity Verified
Spain
Local time: 06:24
English to Spanish
+ ...
TOPIC STARTER
Not in a long time Feb 19, 2011

Anton Konashenok wrote:

Fausto, what do you mean by "tables, pictures, several columns in one page, etc. which OCR cannot recreate"? It's EXACTLY what modern OCR software does, and does it well.


Anton, thank you for the info. I am downloading the demo right now and really hope it suits my needs.


 

Anton Konashenok  Identity Verified
Czech Republic
Local time: 06:24
English to Russian
+ ...
A couple of tips Feb 19, 2011

Fausto, if it's FineReader that you are downloading, keep in mind that it gives you an easy opportunity to intervene manually in the OCR process. You can first let it recognize all the areas by itself, and if it does something you don't like, you can correct the areas by hand. Most often, I correct dividers within tables. You can move them, delete them, insert new ones, merge cells, split cells, etc. This is rarely needed in a text-based PDF, but becomes almost necessary in a poor-quality scanned one.
When saving the results, you can retain the original layout or just save the formatted text - with all pictures and tables, boldface and italics, but without 100 different paragraph styles, without multi-column layouts, etc. The latter is probably the better choice for translation purposes, because the translation hardly ever fits the same layout.

[Edited at 2011-02-19 13:59 GMT]


 

Fausto2112  Identity Verified
Spain
Local time: 06:24
English to Spanish
+ ...
TOPIC STARTER
Did not work Feb 19, 2011

Anton,

Thank you again for the tip and the advice. However, it did not work for me and my complex PDFs, at least not in the demo version, which only extracts the first two pages. The outcome was really bad, considering they were probably the two simplest pages.
Able2Extract 6 is not perfect, but at least it saves formatting time.


 

Egidijus Slepetys  Identity Verified
Local time: 07:24
German to Lithuanian
Solid Converter® PDF Feb 19, 2011

Sorry for your loss...
That's why I would like to share my discovery after many years of searching for the best PDF converter: it is Solid Converter® PDF (http://www.soliddocuments.com/).
I have never seen such an accurate converter! You can try the trial version.


 

Fausto2112  Identity Verified
Spain
Local time: 06:24
English to Spanish
+ ...
TOPIC STARTER
Solid Feb 20, 2011

Egidijus,

Thank you for the info and the condolences. I have downloaded the trial version (much better than others, because it actually allows you to convert the whole document). Although it is very good with formatting, it creates monster files in terms of size: what A2Extr converted into an 18 MB doc, Solid Converter converted into a 145 MB one. I am analysing the doc with Wordfast and it seems to have gone bonkers...

Sadly, I have to stick to A2Extr for now.


 

Egidijus Slepetys  Identity Verified
Local time: 07:24
German to Lithuanian
Some advice Feb 20, 2011

Yes, Solid Converter tries to convert everything
(by the way, there are three reconstruction methods).

Some work is needed after conversion. The problem is that Solid Converter reproduces character spacing, which is not needed. You therefore have to reset it to normal in MS Word.
I don't have this problem, because Star Transit can eliminate character spacing in the conversion process.

Regards!
Egidijus


 

