A good PDF2JPG converter for Windows, anyone?
Thread poster: Samuel Murray

Samuel Murray  Identity Verified
Netherlands
Member (2006)
English to Afrikaans
Sep 18, 2017

Hello everyone

Can you recommend a PDF-to-JPG converter for Windows 7 that can convert an editable PDF's individual pages to high-quality JPGs of around 4000 x 5500 pixels? BMP or a similar format is fine as well. My PDF is 400 pages long and weighs in at 1.4 MB. Free, if possible.

Thanks
Samuel


 

Jean Dimitriadis  Identity Verified
France
Member (2015)
English to French
PDFBox Sep 18, 2017

Hello Samuel,

Have you tried PDFBox?

https://pdfbox.apache.org/

PDFBox comes with a series of command-line utilities. They are available as standard Java applications.

Look for the PDFToImage utility on https://pdfbox.apache.org/2.0/commandline.html (JPG and PNG output are supported).
You can tweak the output resolution by setting the -dpi option. Try -dpi 600.
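For reference, PDF page sizes are measured in points (1/72 inch), so the pixel size of a rendered page is roughly its size in inches multiplied by the DPI. The sketch below assumes a US Letter page (612 x 792 points); the thread does not state the glossary's page size, and `rendered_size` and `min_dpi_for_height` are illustrative helpers, not part of PDFBox.

```python
import math

POINTS_PER_INCH = 72  # PDF page sizes are given in points (1/72 inch)


def rendered_size(width_pt: float, height_pt: float, dpi: float) -> tuple[int, int]:
    """Approximate pixel size (width, height) of a page rendered at `dpi`."""
    return (round(width_pt * dpi / POINTS_PER_INCH),
            round(height_pt * dpi / POINTS_PER_INCH))


def min_dpi_for_height(height_pt: float, target_px: int) -> int:
    """Smallest whole-number DPI at which the page is at least `target_px` tall."""
    return math.ceil(target_px * POINTS_PER_INCH / height_pt)


# Assumption: US Letter, 8.5 x 11 in = 612 x 792 pt.
letter_w, letter_h = 612, 792
for dpi in (96, 300, 500, 600):
    print(dpi, rendered_size(letter_w, letter_h, dpi))
print("DPI needed for ~5500 px tall:", min_dpi_for_height(letter_h, 5500))
```

Under that assumption, 500 DPI gives about 4250 x 5500 pixels, close to the requested size, while 600 DPI gives 5100 x 6600.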

It can also extract images from editable PDFs (see ExtractImages on the above link), among other things.

Note: I’m currently using it on GNU/Linux, but being a Java application, it should work just as well on Windows.

PS: May I ask what you intend to do with those images? Other utilities can help you further manipulate the resulting images, depending on your purpose.

Jean

[Edited at 2017-09-18 14:21 GMT]

[Edited at 2017-09-18 14:22 GMT]


 

esperantisto  Identity Verified
Member (2006)
English to Russian
IrfanView Sep 18, 2017

Try IrfanView. It is a graphics viewer with a lot of conversion options.

 

Samuel Murray  Identity Verified
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
@Jean Sep 18, 2017

Jean Dimitriadis wrote:
May I ask what you intend to do with those images?


I have a glossary in PDF format, and the page layout has two columns. My OCR program sometimes recognises such pages as a grid (i.e. converts them to a table), sometimes as two columns (i.e. writes column 2 under column 1 in the final file), and sometimes as having no columns at all (i.e. "merges" the two columns line by line, with multiple spaces between them).

I want to convert this to JPG and then use e.g. XnView to slice the JPGs vertically in half, to feed them to OCR again. For this to work, the JPGs must be high quality and they must be big (around 2000 x 5500 pixels per half); otherwise the OCR makes errors.

When the OCR function runs on an editable PDF, it actually extracts the text, but when it runs on JPGs, it does actual OCR.


 

Samuel Murray  Identity Verified
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
Re: IrfanView Sep 18, 2017

esperantisto wrote:
Try IrfanView. It is a graphics viewer with a lot of conversion options.


Thanks. IrfanView (with the PDF plugin installed) does convert individual pages of a PDF file to images, but the user has no control over the size of the images. It converts my PDF to images of 816 x 1056 pixels, which is woefully too small.
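The 816 x 1056 figure is consistent with a page rasterised at 96 DPI, if the glossary uses US Letter pages (an assumption; the thread does not give the page size). A quick check, with `implied_dpi` as an illustrative helper:

```python
POINTS_PER_INCH = 72  # 1 PDF point = 1/72 inch


def implied_dpi(pixels: int, points: float) -> float:
    """DPI at which a page dimension of `points` was rasterised to `pixels`."""
    return pixels * POINTS_PER_INCH / points


# Assumption: US Letter pages (612 x 792 pt); the actual page size is not stated.
dpi_w = implied_dpi(816, 612)
dpi_h = implied_dpi(1056, 792)
scale = 5500 / 1056  # enlargement needed to reach ~5500 px tall
print(dpi_w, dpi_h, round(scale, 2), round(dpi_h * scale))
```

Under that assumption it prints `96.0 96.0 5.21 500`: reaching the target height means rendering at about five times that resolution.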


 

Jean Dimitriadis  Identity Verified
France
Member (2015)
English to French
@Samuel Sep 18, 2017

I see, thank you for providing more details.

In this case, I would suggest you try Tabula directly on the PDF.

http://tabula.technology/

Tabula is a tool for liberating data tables locked inside PDF files.

If the columns are well defined, it can help you quickly select the table areas and then export them as CSV or other formats.

I have already used it to extract glossaries from PDFs.

It works fine, as long as the PDF is editable and the information is in table form.

Jean


 

neilmac  Identity Verified
Spain
Spanish to English
The culprit Sep 18, 2017

I hate getting sent PDFs which turn out to be JPGs. Now I know who's behind it all!

But seriously, I've often wondered why people do this and usually assume it's an oversight, especially when they are sending something which has to be modified/edited/translated... It would also be interesting to know if there is some kind of program that does the reverse, i.e. converts JPG back into PDF, or some other more amenable format.


 

esperantisto  Identity Verified
Member (2006)
English to Russian
OCR Sep 18, 2017

Samuel, I think you're about to take the wrong approach. You'd do better to explore the features of your OCR program. ABBYY FineReader has a feature for manually splitting pages or, even better, manually marking up tables.

 

Samuel Murray  Identity Verified
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
Re: ABBYY FineReader Sep 18, 2017

esperantisto wrote:
I think you're about to take the wrong approach. You'd do better to explore the features of your OCR program. ABBYY FineReader has a feature for manually splitting pages or, even better, manually marking up tables.


Manually marking up 400 pages is a non-starter, I'm afraid.

The "manually split" option is slightly faster than manually marking up whole pages, but only by a little bit. There is an option to automatically split images, but that is meant for cases where two pages were scanned onto a single PDF page. FineReader can't detect my pages' split point automatically.
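One common way to find a split point automatically is a vertical projection profile: count the dark pixels in each column near the page centre and split at the emptiest one, which for a two-column layout should fall in the gutter. This is a sketch of that general technique, not a FineReader feature; it assumes the page has already been decoded into rows of grayscale values (0 = black, 255 = white), and `find_gutter`, `search_fraction` and `ink_threshold` are hypothetical names.

```python
def find_gutter(pixels: list[list[int]], search_fraction: float = 0.2,
                ink_threshold: int = 128) -> int:
    """Return the x coordinate of the column with the least ink near the centre.

    pixels: rows of grayscale values (0 = black, 255 = white).
    search_fraction: how far either side of the centre to look, as a fraction of width.
    ink_threshold: values below this count as ink.
    """
    if not pixels or not pixels[0]:
        raise ValueError("empty image")
    height, width = len(pixels), len(pixels[0])
    centre = width // 2
    span = max(1, int(width * search_fraction))
    lo, hi = max(0, centre - span), min(width, centre + span + 1)

    best_x, best_ink = centre, None
    for x in range(lo, hi):
        ink = sum(1 for y in range(height) if pixels[y][x] < ink_threshold)
        # Prefer less ink; on a tie, prefer the column closer to the centre.
        if (best_ink is None or ink < best_ink
                or (ink == best_ink and abs(x - centre) < abs(best_x - centre))):
            best_x, best_ink = x, ink
    return best_x


# Synthetic 20 x 5 "page": all ink except a blank column at x = 8.
page = [[0] * 20 for _ in range(5)]
for row in page:
    row[8] = 255
print(find_gutter(page))
```

The returned x coordinate could then be used as the cut position instead of the exact middle of the page.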

You can see a sample page here (I've marked the left column in red, and two glossary entries with blue):
[Image: sample glossary page]


 

Samuel Murray  Identity Verified
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
@Jean and @Mac Sep 18, 2017

Jean Dimitriadis wrote:
In this case, I would suggest you try Tabula directly on the PDF.
http://tabula.technology/


Thanks, I can confirm that Tabula produces useful output. One has to select the two columns manually, but there is an option to repeat the selections on subsequent pages automatically. It doesn't appear to have an option to preserve font colours or formatting such as bold.

neilmac wrote:
I hate getting sent PDFs which turn out to be JPGs. ... I've often wondered why people do this and usually assume it's an oversight, especially when they are sending something which has to be modified/edited/translated...


Yes, but that is a different topic altogether. Converting an editable PDF to a PDF of embedded images is a one-way conversion. The only way to go in the other direction is to use an OCR program or a human typist.


 

Lincoln Hui  Identity Verified
Hong Kong
Member
Chinese to English
IrfanView Sep 19, 2017

Samuel Murray wrote:

esperantisto wrote:
Try IrfanView. It is a graphics viewer with a lot of conversion options.


Thanks. IrfanView (with the PDF plugin installed) does convert individual pages of a PDF file to images, but the user has no control over the size of the images. It converts my PDF to images of 816 x 1056 pixels, which is woefully too small.

As far as individual pages are concerned, it's basically a matter of resizing the image to whatever size you like and then exporting it to a graphics format, just as you would with any other image. I don't know if it has a way to deal with multiple pages.


 

