Be very careful with Able2Extract 7 and large docs (Software applications)


Thread poster: Fausto Navarro de Vicente-Gella

Fausto Navarro de Vicente-Gella

Spain
Local time: 13:13
English to Spanish
+ ...

Feb 18, 2011

Hello fellow translators,

I recently lost around $400 because of a really bad PDF-to-DOC conversion with Able2Extract 7. It "forgot" to convert around 5,000-6,000 words. I had already sent the quote to my client and could not change the terms of the agreement, so I have to accept the loss.

However, Able2Extract 6 does seem to work properly (with its limitations), so do not be lured by the nice UI of the new version; stick to the old and reliable version 6.

Natalie

Poland
Local time: 13:13
Member (2002)
English to Russian
+ ...

Moderator of this forum

SITE LOCALIZER

Hi Fausto

Feb 18, 2011

Thank you for the warning. However, my question is: do you start translating converted files without checking them?

Natalia
P.S. Honestly, I have never heard of such a tool (I use FineReader).

Fausto Navarro de Vicente-Gella

Spain
Local time: 13:13
English to Spanish
+ ...

TOPIC STARTER

Checked, not thoroughly enough

Feb 18, 2011

I checked, obviously not well enough. It won't happen again, of course.

The lost data was all in tables and lists, huge tables and lists, which seemed to be OK.

Kevin Fulton

United States
Local time: 07:13
German to English

Thanks for the heads-up

Feb 18, 2011

It's tough to learn a lesson the hard way.

I've been an enthusiastic user of Able2Extract for several years. For large PDF files I tend to rely on PractiCount whenever possible, or I save the document as text in order to obtain a word count.

My own verification of the extraction has been by a page count. If the number of source and target pages is the same, I haven't bothered to do a closer comparison.

In the future I'll be sure to make a page-by-page com...
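The word-count check described in this post can be sketched in a few lines of Python. This is only an illustration and not a feature of Able2Extract or PractiCount: it assumes that both the source PDF and the converted document have already been saved as plain text with pages separated by form-feed characters, and the names `count_words`, `split_pages`, `compare_pages` and the 5% tolerance are hypothetical. Word counts from a simple whitespace split will not match a dedicated counting tool exactly, so treat flagged pages as candidates for a manual look.

```python
def count_words(text):
    """Count whitespace-separated tokens (a rough word count)."""
    return len(text.split())


def split_pages(text):
    """Split a plain-text export into pages.

    Assumption: pages are separated by form-feed characters.
    """
    return text.split("\f")


def compare_pages(source_pages, converted_pages, tolerance=0.05):
    """Return (page_number, source_count, converted_count) for every page
    whose converted word count falls short of the source count by more
    than `tolerance` (a fraction of the source count).
    """
    suspicious = []
    for number, source_text in enumerate(source_pages, start=1):
        # A page missing from the converted file counts as empty.
        if number <= len(converted_pages):
            converted_text = converted_pages[number - 1]
        else:
            converted_text = ""
        source_count = count_words(source_text)
        converted_count = count_words(converted_text)
        if source_count == 0:
            continue  # nothing to lose on an empty source page
        shortfall = (source_count - converted_count) / source_count
        if shortfall > tolerance:
            suspicious.append((number, source_count, converted_count))
    return suspicious
```

Called as `compare_pages(split_pages(source_text), split_pages(converted_text))`, it lists the pages to inspect by hand. A matching page count alone would not catch words dropped from inside tables, which is why the comparison is done per page.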

Anton Konashenok

Czech Republic
Local time: 13:13
French to English
+ ...

PDF extractors are generally inadequate

Feb 18, 2011

I've tried quite a few PDF extractors, and have yet to find one that does the job properly in all cases. On the other hand, OCR (e.g. FineReader) works much better; recent versions of FineReader also extract text directly from the PDF whenever possible.

Fausto Navarro de Vicente-Gella

Spain
Local time: 13:13
English to Spanish
+ ...

TOPIC STARTER

Page count, OCR and another useful tip

Feb 19, 2011

Kevin, I agree. A2Ex has worked quite well for me with huge manuals full of tables, pictures, several columns in one page, etc. which OCR cannot recreate. With a bit of MS Word editing know-how and a lot of care, the final result is quite similar to the original and not so time consuming.

The page count method was also my way of checking that the process had completed successfully, particularly in 300 to 400-page documents.

And here's another tip. When the document has both vertical and horizontal page layouts, create separate .doc files for each group. Otherwise, when printing the document, after the first layout change, the upper and lower parts of the pages are lost.

Anton Konashenok

Czech Republic
Local time: 13:13
French to English
+ ...

When did you last try OCR?

Feb 19, 2011

Fausto, what do you mean by "tables, pictures, several columns in one page, etc. which OCR cannot recreate"? It's EXACTLY what modern OCR software does, and does it well.

Fausto Navarro de Vicente-Gella

Spain
Local time: 13:13
English to Spanish
+ ...

TOPIC STARTER

Not in a long time

Feb 19, 2011

Anton Konashenok wrote:

Fausto, what do you mean by "tables, pictures, several columns in one page, etc. which OCR cannot recreate"? It's EXACTLY what modern OCR software does, and does it well.

Anton, thank you for the info. I am downloading the demo right now and really hope it adapts to my needs.

Anton Konashenok

Czech Republic
Local time: 13:13
French to English
+ ...

A couple of tips

Feb 19, 2011

Fausto, if it's FineReader that you are downloading, keep in mind that it gives you an easy opportunity to intervene manually in the OCR process. You can first let it recognize all the areas by itself, and if it does something you don't like, you can correct the areas by hand. Most often, I correct dividers within the tables. You can move them, delete them, insert new ones, merge cells, split cells, etc. It is rarely needed in a text-based PDF, but becomes almost necessary in a poor-quality scanned one.
When saving the results, you can retain the original layout or just save the formatted text - with all pictures and tables, boldface and italics, but without 100 different paragraph styles, without multi-column layouts, etc. The latter is probably a better choice for translation purposes, because the translation hardly ever fits the same layout.

[Edited at 2011-02-19 13:59 GMT]

Fausto Navarro de Vicente-Gella

Spain
Local time: 13:13
English to Spanish
+ ...

TOPIC STARTER

Did not work

Feb 19, 2011

Anton,

Thank you again for the tip and for the advice. However, it did not work for me and my complex PDFs, at least not in the demo version, which only extracts the first two pages. The outcome was really bad, considering they were probably the two simplest ones.
Able2Extract 6 is not perfect, but at least it saves formatting time.

Egidijus Slepetys

Local time: 14:13
German to Lithuanian

Solid Converter® PDF

Feb 19, 2011

Sorry for your loss...
That's why I would like to share with you my discovery after many years of searching for the best pdf converter: it is Solid Converter® PDF (http://www.soliddocuments.com/).
I have never seen such an accurate converter! You can try the trial version.

Fausto Navarro de Vicente-Gella

Spain
Local time: 13:13
English to Spanish
+ ...

TOPIC STARTER

Solid

Feb 20, 2011

Egidijus,

Thank you for the info and the condolences. I have downloaded the trial version (much better than others, because it actually allows you to convert the whole document) and, although it is very good with formatting, it creates monster files in terms of size. What A2Extr converted into an 18 MB doc, Solid Converter converted into 145 MB. I am analysing the doc with Wordfast and it seems to have gone bonkers...

Sadly, I have to stick to A2Extr for now.

Egidijus Slepetys

Local time: 14:13
German to Lithuanian

Some advice

Feb 20, 2011

Yes, Solid Converter tries to convert everything.

(By the way, there are 3 reconstruction methods.)

Some work after conversion is needed. The problem is that Solid Converter preserves character spacing (which is not needed), so you have to reset it to normal in MS Word.
I don't have this problem, because Star Transit can eliminate character spacing in the conversion process.

Regards!
Egidijus
