Easily Generate PDFs in Python

A common task in any web application these days is generating files for user reports, the most common being PDFs.  I’ve been building a simple app to track assets and who’s been assigned what asset within a department, and I needed a PDF report of the main page that shows asset allocation.

Naturally, I went with xhtml2pdf since I wanted to take an HTML file, feed it some context data, and return the output as a PDF.  This is all fine for basic table PDFs, but the moment you add anything fancy like border lines and background colours, the layout gets a bit messed up.

After a few minutes of Googling, I came across WeasyPrint.  It renders PDFs much better, matching what is seen in the browser far more closely.  The recommended way to install it is through the CheeseShop (PyPI), as always:

pip install WeasyPrint

But for my Ubuntu 12.04 dev environment, I needed a few extra system libraries to get things installed:

sudo apt-get install libgdk-pixbuf2.0-0 libffi-dev

If these libraries are missing, you’ll get this error:

[Screenshot: WeasyPrint install error]

So after running through the setup, which should take a couple of seconds to download and compile all the dependencies, you can now generate your PDF. I’m developing the system in Django, so here’s what I did:

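from django.http import HttpResponse
from django.template import loader
from weasyprint import HTML

from .models import Item  # assumed import path for the asset model


def asset_pdf(request):  # view name is illustrative
    context_dict = {
        'assets': Item.objects.all(),
    }

    # Render the template to an HTML string. On Django older than 1.8, pass
    # RequestContext(request, context_dict) to render() instead.
    template = loader.get_template('asset/pdf.html')
    html = template.render(context_dict, request)

    # content_type replaces the mimetype argument, which Django 1.7 removed.
    response = HttpResponse(content_type='application/pdf')
    # HttpResponse is file-like, so WeasyPrint can write the PDF straight into it.
    HTML(string=html).write_pdf(response)

    return response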

In case you have inline images that use the static template tag, you will need a small edit: WeasyPrint has trouble determining the base URL on its own, so pass one in and it will make an HTTP request to the app to fetch the images. Note that this might cause a deadlock on a single-threaded server, as pointed out in this StackOverflow post:


HTML(string=html, base_url=request.build_absolute_uri()).write_pdf(response)
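
One way to sidestep the extra HTTP request (and with it the deadlock risk) is to hand WeasyPrint a custom url_fetcher that reads static files straight from disk. The following is only a rough sketch: STATIC_URL, STATIC_ROOT, static_path_for and local_static_fetcher are made-up names and values for illustration, and the fetcher's dict return format is the one documented for function-style fetchers, so check it against the WeasyPrint version you have installed. WeasyPrint is imported inside the fetcher so the path-mapping helper can be tried on its own.

```python
import mimetypes
import os
from urllib.parse import unquote, urlparse

# Assumed values for illustration; in Django these would come from settings.
STATIC_URL = "/static/"
STATIC_ROOT = "/srv/app/static"


def static_path_for(url, static_url=STATIC_URL, static_root=STATIC_ROOT):
    """Map an absolute static URL to a local file path, or return None."""
    path = unquote(urlparse(url).path)
    if not path.startswith(static_url):
        return None
    root = os.path.normpath(static_root)
    full = os.path.normpath(os.path.join(root, path[len(static_url):]))
    # Refuse anything that escapes the static directory (e.g. "../").
    if not full.startswith(root + os.sep):
        return None
    return full


def local_static_fetcher(url):
    """Serve static files from disk; defer everything else to WeasyPrint."""
    # Imported lazily so this module loads even without WeasyPrint installed.
    from weasyprint import default_url_fetcher

    path = static_path_for(url)
    if path is None:
        return default_url_fetcher(url)
    mime_type, _ = mimetypes.guess_type(path)
    return {"file_obj": open(path, "rb"), "mime_type": mime_type}


# Usage inside the view, keeping base_url so relative URLs still resolve:
# HTML(string=html, base_url=request.build_absolute_uri(),
#      url_fetcher=local_static_fetcher).write_pdf(response)
```

With this in place, only URLs outside the static prefix go over the network, so a single-threaded dev server no longer has to answer a request while it is still busy building the PDF.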

After that, the result should be a pleasing-to-look-at PDF that supports all sorts of CSS formatting options. For my deployment, I’m using inline CSS in my base template, but you can also feed in more CSS using WeasyPrint’s CSS class. More details on this: Python API and StackOverflow
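
As a rough sketch of feeding in extra CSS, write_pdf accepts a stylesheets list of CSS objects. The helper names (build_report_html, write_report_pdf), the sample table layout and the EXTRA_CSS rules are all invented for illustration and are not part of my app. WeasyPrint is imported inside the function so the HTML-building part can be tried without it installed.

```python
from html import escape

# Hypothetical extra stylesheet, applied on top of any CSS already in the HTML.
EXTRA_CSS = """
@page { size: A4; margin: 2cm; }
table { border-collapse: collapse; width: 100%; }
td, th { border: 1px solid #999; padding: 4px; }
"""


def build_report_html(rows):
    """Build a small HTML table from (asset, owner) pairs (sample data only)."""
    body = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(escape(asset), escape(owner))
        for asset, owner in rows
    )
    return (
        "<html><body><table>"
        "<tr><th>Asset</th><th>Assigned to</th></tr>"
        + body
        + "</table></body></html>"
    )


def write_report_pdf(html, target):
    """Render html to target (a filename or file-like object) with EXTRA_CSS."""
    # Imported lazily so this module loads even without WeasyPrint installed.
    from weasyprint import CSS, HTML

    HTML(string=html).write_pdf(target, stylesheets=[CSS(string=EXTRA_CSS)])


# Usage: write_report_pdf(build_report_html([("Laptop", "Ann")]), "report.pdf")
```

In a Django view, target can be the HttpResponse itself, exactly as in the earlier snippet.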

Of course, you can always skip the extra library, return the plain HTML, and have the user print the page as a PDF, but this means the user’s browser determines the output. Chrome and Firefox always do a good job here, but you never know what might happen down the line. If this is OK for you, here’s what you can do:


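from django.shortcuts import render

def asset_report(request):  # illustrative view name; Item is the model used above
    context_dict = {
        'assets': Item.objects.all(),
    }
    template_name = "asset/pdf.html"
    # render() stands in for render_to_response, which Django 3.0 removed.
    return render(request, template_name, context_dict)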
