In an earlier article, I described how to generate images from a PDF using JavaScript (pdf.js). While looking for a better solution to this process, I decided to give it a try with Python.

In this article, we will see how to generate images from a PDF using Python. We need a few additional pieces to do this: the Python libraries Flask, Pillow, and Wand, plus ImageMagick, which Wand uses under the hood.

What is Flask?

Flask is a Python micro-framework. It is small but powerful, easy to learn, and lets us build a web app in a short amount of time.

So let's start with installing all the required libraries.

Install flask, pillow, and wand.

# Install flask
pip install flask

# Install pillow
pip install pillow

# Install wand
pip install wand

Install and link ImageMagick (the commands below use Homebrew).

# Steps to install ImageMagick
brew install imagemagick@6
brew unlink imagemagick
brew link imagemagick@6 --force
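
Before moving on, it can help to confirm that the Python packages installed above are visible to the interpreter. The snippet below is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of any of these libraries. Note that Pillow is imported under the name PIL. The check only looks for the packages themselves, so it does not tell you whether Wand can actually locate ImageMagick.

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    # 'PIL' is the import name of the pillow package
    missing = missing_packages(['flask', 'PIL', 'wand'])
    print('Missing packages:', missing or 'none')
```

If the list is not empty, re-run the matching pip install command in the same environment you will use to start the app.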

Now create a template under the templates directory of your project (the folder Flask looks in by default) and name it index.html. We will code our PDF upload form here.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Pdf To Image Service</title>
    <link href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-beta.2/css/bootstrap.min.css" rel="stylesheet"/>
    <link href="../static/css/index.css" rel="stylesheet"/>
</head>
<body>
    <div class="jumbotron">
        <div class="container">
            <img class="logo" src="../static/images/logo.png"/>
            <h2>PDF TO IMAGE SERVICE</h2>
            <div class="box">
                <form id="upload-form" action="{{ url_for('upload') }}" method="POST" enctype="multipart/form-data">
                    <div class="form-group">
                        <label for="pdfFile">Pdf File</label>
                        <input type="file" class="form-control" id="pdfFile" name="file"/>
                    </div>
                    <button class="btn btn-success">CREATE</button>
                </form>
            </div>
        </div>
    </div>
</body>
</html>

Create another template in the same directory, name it completed.html, and write your success message in it.

Here is our Python code to convert the PDF to images.

import os

from flask import Flask, render_template, request
from wand.image import Image

__author__ = 'EME'

app = Flask(__name__)

APP_ROOT = os.path.dirname(os.path.abspath(__file__))

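# Convert each page of the PDF into a PNG image
def prepare_images(pdf_path):
    # Output directory for the generated images
    output_dir = os.path.join(APP_ROOT, 'static/pdf_image/')

    # Create the output directory if it does not exist yet
    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)

    # Read the PDF at 300 DPI; each page becomes one entry in source.sequence
    with Image(filename=pdf_path, resolution=300, width=600) as source:
        for i, page in enumerate(source.sequence):
            # Wrap the page in its own Image and save it as <page index>.png
            with Image(page) as page_image:
                page_image.save(filename=os.path.join(output_dir, str(i) + '.png'))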

# App routing
@app.route('/')
def main():
    return render_template('index.html')

@app.route('/upload', methods=['POST'])
def upload():
    pdf_target = os.path.join(APP_ROOT, 'static/pdf')

    # Create the upload directory (and any missing parents) if needed
    if not os.path.isdir(pdf_target):
        os.makedirs(pdf_target)

    # Save each uploaded file
    for file in request.files.getlist('file'):
        filename = file.filename
        destination = os.path.join(pdf_target, filename)
        file.save(destination)

        # Creating images
        if os.path.isfile(destination):
            prepare_images(destination)

    return render_template('completed.html')


if __name__ == '__main__':
    app.run()

We have defined two routes in this script. One is the landing page, which shows the upload form; the other is “/upload”, which saves the uploaded PDF file to the upload destination. After a successful upload, the “prepare_images” function loops through the pages of the PDF file and converts each page to an image.
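
Two details of this flow are worth sketching. The script saves whatever file is uploaded, and it names the images 0.png, 1.png, and so on, so a second PDF would overwrite the images of the first. The helpers below are a rough, standard-library-only sketch of one way to handle both; is_pdf_upload and page_image_path are hypothetical names, not part of Flask or Wand, and the "%PDF-" check is only a simple heuristic that assumes the file starts with the usual PDF header.

```python
import os


def is_pdf_upload(filename, head):
    """Return True if the name ends in .pdf and the first bytes look like a PDF."""
    has_pdf_extension = filename.lower().endswith('.pdf')
    # PDF files normally begin with the header bytes '%PDF-'
    has_pdf_signature = head.startswith(b'%PDF-')
    return has_pdf_extension and has_pdf_signature


def page_image_path(output_dir, pdf_path, page_index):
    """Build an output path of the form <output_dir>/<pdf name>_<page index>.png."""
    # basename drops the directory part, splitext drops the '.pdf' extension
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return os.path.join(output_dir, '{0}_{1}.png'.format(stem, page_index))
```

In the upload route, the first helper could be called with the uploaded file's name and its first few bytes before saving, and the second could replace the str(i) + '.png' naming when writing each page. For the file name itself, Werkzeug (installed with Flask) provides secure_filename in werkzeug.utils, which is the usual way to sanitize a user-supplied name before joining it to a directory.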

I tried this code with a 24-page PDF file and it took around 5 minutes. The pdf.js method, however, is much quicker than this.
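
One likely contributor to the run time is the resolution=300 setting, since every page is rasterized at 300 DPI. The arithmetic below is only an illustration: it assumes US Letter pages (8.5 x 11 inches), which may not match the PDF I tested, and page_pixels is a hypothetical helper. I have not measured how conversion time changes with resolution, so treat this as a starting point for experiments and not as a benchmark.

```python
def page_pixels(width_in, height_in, dpi):
    """Number of pixels in one page rasterized at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


# Assumed page size in inches (US Letter)
LETTER = (8.5, 11.0)

high = page_pixels(*LETTER, dpi=300)  # 2550 x 3300 pixels
low = page_pixels(*LETTER, dpi=100)   # 850 x 1100 pixels

print(high, low, high / low)
```

Under these assumptions a page at 300 DPI has nine times as many pixels as the same page at 100 DPI, so lowering the resolution is a reasonable first thing to try if the image quality is still acceptable.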