Reportlab PDF Processing Python
Reportlab PDF Processing Python
Michael Driscoll
* * * * *
This is a Leanpub book. Leanpub empowers authors and publishers with the
Lean Publishing process. Lean Publishing is the act of publishing an in-progress
ebook using lightweight tools and many iterations to get reader feedback, pivot
until you have the right book and build traction once you do.
* * * * *
In this book, you will learn how to use Reportlab to create PDFs too. This book
will be split into three sections. We will be covering the following topics in the
first section:
The canvas
Drawing
Working with fonts
PLATYPUS
Paragraphs
Tables
Other Flowables
Graphics
and More!
In the second section, we will learn about data processing. The idea here is to
take in several different data formats and turn them into PDFs. For example, it is
quite common to receive data in XML or JSON. But learning how to take that
information and turn it into a report is something that isn’t covered very often.
You will learn how to do that here. In the process we will discover how to make
multipage documents with paragraphs and tables that flow across the pages
correctly.
The last section of the book will cover some of the other libraries you might
need when working with PDFs with Python. In this section we will learn about
the following:
PyPDF2
pdfminer
PyFPDF
I realized that one way for me to remember how to do certain things in Python
was to write about them and that’s how my Python blog came about:
http://www.blog.pythonlibrary.org/. As I wrote, I would receive feedback from
my readers and I ended up expanding the blog to include tips, tutorials, Python
news, and Python book reviews. I work regularly with Packt Publishing as a
technical reviewer, which means that I get to try to check for errors in the books
before they’re published. I also have written for the Developer Zone (DZone)
and i-programmer websites as well as the Python Software Foundation. In
November 2013, DZone published The Essential Core Python Cheat Sheet
that I co-authored. I have also self-published the following books:
The >>> is a Python prompt symbol. You will see this in the Python interpreter
and in IDLE. Other code examples will be shown in a similar manner, but
without the >>>. Most of the book will be done creating examples in regular
Python files, so you won’t be seeing the Python prompt symbol all that often.
When you are using a Python Virtual Environment, you will need to first activate
it. Activation of a virtual environment is like starting a virtual machine up in
VirtualBox or VMWare, except that in this case, it’s just a Python Virtual
Environment instead of an entire operating system.
Creating a virtual sandbox with the virtualenv package is quite easy. On Mac and
Linux, all you need to do is the following in your terminal or command prompt:
virtualenv FOLDER_NAME
You should still change directories into your new folder, but instead of bin, there
will be a Scripts folder that can run activate out of:
Scripts\activate
Once activated, you can install any other 3rd party Python package.
Note: It is recommended that you install all 3rd party packages, such as
ReportLab or Pillow, in a Python Virtual Environment or a user folder. This
prevents you from installing a lot of cruft in your system Python installation.
I would also like to mention that pip supports a –user flag that tells it to install
the package just for the current user if the platform supports it. There is also an –
update flag (or just -U) that you an use to update a package. You can use this
flag as follows:
python -m pip install PACKAGE_NAME --upgrade
While you can also use pip install PACKAGE_NAME, it is now becoming a
recommended practice to use the python -m approach. What this does differently
is that it uses whatever Python is on your path and installs to that Python version.
The -m flag tells Python to load or run a module which in this case is pip. This
can be important when you have multiple versions of Python installed and you
don’t know which version of Python pip itself will install to. Thus, by using the
python -m pip approach, you know that it will install to the Python that is
mapped to your “python” command.
Dependencies
You will need the Python language installed on your maching to use ReportLab.
Python is pre-installed on Mac OS and most Linux distribututions. Reportlab 3
works with both Python 2.7 and Python 3.3+. You can get Python at
https://www.python.org/. They have detailed instructions for installing and
configuring Python as well as building Python should you need to do so.
ReportLab depends on the Python Imaging Library for adding images to PDFs.
The Python Imaging Library itself hasn’t been maintained in years, but you can
use the Pillow (https://pillow.readthedocs.io/en/latest/) package instead. Pillow
is a fork of the Python Imaging Library that supports Python 2 and Python 3 and
has lots of new enhancements that the original package didn’t have. You can
install it with pip as well:
python -m pip install pillow
You may need to run pip as root or Administer depending on where your Python
is installed or if you are installing to a virtualenv. You may find that you enjoy
Pillow so much that you want to install it in your system Python in addition to
your virtual environment.
Installation
Reportlab 3 works with both Python 2.7 and Python 3.3+. This book will be
focusing on using Python 3 and ReportLab 3.x, but you can install ReportLab 3
the same way in both versions of Python using pip:
python -m pip install reportlab
If you are using an older version of Python such as Python 2.6 or less, then you
will need to use ReportLab 2.x. These older versions of ReportLab have *.exe
installers for Windows or a tarball for other operating systems. If you happen to
run a ReportLab exe installer, it will install to Python’s system environment and
not your virtual environment.
If you run into issues installing ReportLab, please go to their website and read
the documentation on the subject at https://www.reportlab.com/
As I said, there are a lot of other settings that you can modify in that Python
script. I highly recommend opening it up and reading through the various
options to see if there’s anything that you will need to modify for your
environment. In fact, you can do so in your Python interpreter by doing the
following:
>>> from reportlab import rl_settings
>>> rl_settings.verbose
0
>>> rl_settings.shapeChecking
1
You can now easily check out each of the settings in an interactive manner.
Reader Feedback
I welcome your feedback. If you’d like to let me know what you thought of this
book, you can send comments to the following email address:
[email protected]
Errata
I try my best not to publish errors in my writings, but it happens from time to
time. If you happen to see an error in this book, feel free to let me know by
emailing me at the following:
Code Examples
Code from the book can be downloaded from Github at the following address:
https://github.com/driscollis/reportlabbookcode
http://bit.ly/2nc7sbP
In this section of the book, you will learn how to create PDFs using ReportLab’s
low level canvas API. You will then move on to learn how to create PDFs using
ReportLab’s Flowable via their PLATYPUS sub-module. Then we will wrap up
Part I by learning about ReportLab’s graphics capabilities and the special
features you can apply to a PDF such as page transitions and encryption.
pretty much any layout that you can think of. I have used it to
replicate many complex page layouts over the years. In this chapter we
Draw text
Learn about fonts and text colors
Creating a text object
Draw lines
Draw various shapes
The pdfgen package is very low level. You will be drawing or “painting”
on a canvas to create your PDF. The canvas gets imported from the pdfgen
package. When you go to paint on your canvas, you will need to specify
X/Y coordinates that tell ReportLab where to start painting. The default
is (0,0) whose origin is at the lowest left corner of the page. Many
desktop user interface kits, such as wxPython, Tkinter, etc, also have
this concept. You can place buttons absolutely in many of these kits
using X/Y coordinates as well. This allows for very precise placement of
the elements that you are adding to the page.
The other item that I need to make mention of is that when you are
points you are from the origin. It’s points, not pixels or
millimeters or inches. Points! Let’s take a look at how many points are
>>> from reportlab.lib.pagesizes import letter
>>> letter
(612.0, 792.0)
Here we learn that a letter is 612 points wide and 792 points high.
Let’s find out how many points are in an inch and a millimeter,
respectively:
>>> from reportlab.lib.units import inch
>>> inch
72.0
>>> from reportlab.lib.units import mm
>>> mm
2.834645669291339
# hello_reportlab.py
c = canvas.Canvas("hello.pdf")
c.drawString(100, 100, "Welcome to Reportlab!")
c.showPage()
c.save()
Canvas object. You will note that the only requirement argument is a
tell it to start drawing the string 100 points to the right of the
origin and 100 points up. After that we call showPage() method. The
showPage() method will save the current page of the canvas. It’s
also ends the current page. If you draw another string or some other
page. Finally we call the canvas object’s save() method, which save
the document to disk. Now we can open it up and see what our PDF looks
like:
Fig. 1-1: Welcome to ReportLab
What you might notice is that our text is near the bottom of the
document. The reason for this is that the origin, (0,0), is the bottom
text, we were telling it to start painting 100 points from the left-hand
side and 100 points from the bottom. This is in contrast to creating a
whatever is in the ReportLab config, which is usually A4. There are some
arguments:
def __init__(self,filename,
pagesize=None,
bottomup = 1,
pageCompression=None,
invariant = None,
verbosity=0,
encrypt=None,
cropMarks=None,
pdfVersion=None,
enforceColorSpace=None,
):
Here we can see that we can pass in the pagesize as an argument. The
want to change the origin from the default of bottom left, then you can
want to keep the default of zero. However if speed isn’t a concern and
you’d like to use less disk space, then you can turn on page
primary use case for turning on page compression is when you have a huge
info (for regression testing). I have never seen anyone use this
argument in their code and since the source says it is for regression
zero (0), ReportLab will allow other applications to capture the PDF
will be printed out every time a PDF is created. There may be additional
levels added, but at the time of writing, these were the only two
documented.
encrypt, that string will be the password for the PDF. If you want
encrypt argument.
Crop marks are used by printing houses to know where to crop a page.
When you set cropMarks to True in ReportLab, the page will become 3 mm
larger than what you set the page size to and add some crop marks to the
corners. The object that you can pass to cropMarks contains the
The pdfVersion argument is used for ensuring that the PDF version is
appropriate color settings within the PDF. You can set it to one of the
following:
cmyk
rgb
sep
sep_black
sep_cmyk
be used to do the color enforcement. You can also pass in a callable for
color enforcement.
Let’s go back to our original example and update it just a bit. Now as I
images, etc) using points. But thinking in points is kind of hard when
(http://stackoverflow.com/questions/4726011/wrap-text-in-a-table-reportlab):
def coord(x, y, height, unit=1):
return x, y
the page. You can also pass in a unit size. This will allow you to do
the following:
# canvas_coords.py
c = canvas.Canvas("hello.pdf", pagesize=letter)
width, height = letter
from the left and 20 mm from the top of the page. Yes, you read that
right. When we use the coord function, it uses the height to swap
the origin’s y from the bottom to the top. If you had set your Canvas’s
following:
def coord(x, y, unit=1):
x, y = x * unit, y * unit
return x, y
# canvas_coords2.py
c = canvas.Canvas("hello.pdf", bottomup=0)
That seems pretty straight-forward. You should take a minute or two and
play around with both examples. Try changing the x and y coordinates
that you pass in. Then try changing the text too and see what happens!
Canvas Methods
The canvas object has many methods. Let’s learn how we can use some
of them to make our PDF documents more interesting. One of the easiest
methods to use setFont, which will let you use a PostScript font
name to specify what font you want to use. Here is a simple example:
# font_demo.py
if __name__ == '__main__':
my_canvas = canvas.Canvas("fonts.pdf",
pagesize=letter)
fonts = my_canvas.getAvailableFonts()
font_demo(my_canvas, fonts)
my_canvas.save()
we can use on the system that the code is ran on. Then we will pass the
canvas object and the list of font names to our font_demo function.
Here we loop over the font names, set the font and call the
drawString method to draw each font’s name to the page. You will
also note that we have set a variable for the starting Y position that
If you want to change the font color using a canvas method, then you
long as you call that before you draw the string, the color of the text
Another fun thing you can is use the canvas’s rotate method to draw
def rotate_demo():
my_canvas = canvas.Canvas("rotated.pdf",
pagesize=letter)
my_canvas.translate(inch, inch)
my_canvas.setFont('Helvetica', 14)
my_canvas.drawString(inch, inch, 'Normal')
my_canvas.line(inch, inch, inch+100, inch)
my_canvas.rotate(45)
my_canvas.drawString(inch, -inch, '45 degrees')
my_canvas.line(inch, inch, inch+100, inch)
my_canvas.rotate(45)
my_canvas.drawString(inch, -inch, '90 degrees')
my_canvas.line(inch, inch, inch+100, inch)
my_canvas.save()
if __name__ == '__main__':
rotate_demo()
Here we use the translate method to set our origin from the bottom
left to an inch from the bottom left and an inch up. Then we set out
font face and font size. Next write out some text normally and then we
According to the ReportLab user guide, you will want to specify the y
rotated state. If you don’t do that, your string will be drawn outside
the page’s boundary and you won’t see it. Finally we rotate the
the coordinate system. You can see that the origin of the last line
moved all the way to the very left-hand edge of the page.
String Alignment
The canvas supports more string methods than just the plain
draw your string right-aligned to the x-coordinate. You can also use
drawAlignedString, which will draw a string aligned to the first
# string_alignment.py
def string_alignment(my_canvas):
width, height = letter
my_canvas.showPage()
if __name__ == '__main__':
my_canvas = canvas.Canvas("string_alignment.pdf")
string_alignment(my_canvas)
my_canvas.save()
When you run this code, you will quickly see how each f these strings
the most interesting, but the others are certainly handy in their own
The next canvas methods we will learn about are how to draw lines,
it, you can actually create very complex drawings in your documents,
The method to draw a straight line is simply line. Let’s take a look
at a simple example:
# drawing_lines.py
def draw_lines(my_canvas):
my_canvas.setLineWidth(.3)
start_y = 710
my_canvas.line(30, start_y, 580, start_y)
for x in range(10):
start_y -= 10
my_canvas.line(30, start_y, 580, start_y)
if __name__ == '__main__':
my_canvas = canvas.Canvas("lines.pdf", pagesize=letter)
draw_lines(my_canvas)
my_canvas.save()
object as its sole parameter. Then we set the line’s width via the
notice that the line method accepts four arguments: x1, y1, x2, y2.
These are the beginning x and y coordinates as well as the ending x and
run this code, your output will look something like this:
The canvas supports several other drawing operations. For example, you
can also draw rectangles, wedges and circles. Here’s a simple demo:
# drawing_polygons.py
if __name__ == '__main__':
draw_shapes()
When you run this code, you should end up with a document that draws
Let’s take a few moments to go over the various arguments that each of
these polygon methods accept. The rect’s code signature looks like
this:
def rect(self, x, y, width, height, stroke=1, fill=0):
That means that you set the lower left-hand corner of the rectangle’s
position via its x/y parameters. Then you set its width and height. The
ReportLab to fill the interior of the polygon that I drew with a color.
def ellipse(self, x1, y1, x2, y2, stroke=1, fill=0):
the x1, y1, x2, y2 parameters are the corner points of the enclosing
rectangle. The stroke and fill parameters operate the same way as the
rect’s. Just for fun, we went ahead and set the ellipse’s fill to 1.
def wedge(self, x1,y1, x2,y2, startAng, extent, stroke=1, fill=0):
The x1,y1, x2,y2 parameters for the wedge actually correspond to the
360 degree circle version of the wedge. So you will need to imagine that
the full circle with a rectangle around it to help you position a wedge
extent parameter, which basically tells the wedge how far out to arc
Finally we reach the circle polygon. It’s method looks like this:
def circle(self, x_cen, y_cen, r, stroke=1, fill=0):
the polygons we have looked at. The x_cen and y_cen arguments
are the x/y coordinates of the center of the circle. The r argument
is the radius. The stroke and fill arguments are pretty obvious.
All the poloygons have the ability to set the stroke (or line) color via
its parameters. You can also set the stroke color by using the
demo code. The reason that wasn’t covered above is that we are going to
can add a color to a drawing using one of two methods: RGB or CMYK. In
By name or
By gray level
you’re really only specifying what level of gray you want, not any other
color. Let’s start with gray levels though and work our way up through
# gray_color_demo.py
def gray_color_demo():
my_canvas = canvas.Canvas("grays.pdf",
pagesize=letter)
my_canvas.setFont('Helvetica', 10)
x = 30
my_canvas.save()
if __name__ == '__main__':
gray_color_demo()
set the fill color using setFillGray. After that we draw a circle
each circle so we have the circles labeled with their gray value. When
you run this code, you should see something like this:
Fig. 1-7: ReportLab’s shades of gray
Let’s move on to learn how to add some color. The first method we will
# colors_demo.py
def color_demo():
my_canvas = canvas.Canvas("colors.pdf",
pagesize=letter)
my_canvas.setFont('Helvetica', 10)
x = 30
sample_colors = [colors.aliceblue,
colors.aquamarine,
colors.lavender,
colors.beige,
colors.chocolate]
my_canvas.save()
if __name__ == '__main__':
color_demo()
a list of sample colors to iterate over like we did with the gray demo
earlier. Then we do the loop and call setFillColor with the color’s
name. Now if you actually were to print the color to standard out, so
>>> print(colors.aliceblue)
Color(.941176,.972549,1,1)
So these aren’t exactly just names. In fact, they are ReportLab Color
objects with RGB values and an intensity level between 0 (dark) and 1
grab the color’s name via the _lookupName() method. The rest of the
Fig. 1-8: Example colors
If you want to use straight-up RGB or CMYK values, then you can edit the
respectively. The primary reason to use CMYK is for when you want more
control over how the ink in your printer is applied to the printer. Of
course you will need a printer that support CMYK for this to be really
useful.
Adding a Photo
ReportLab supports adding images to your PDFs via the Python Imaging
recommended that you download the Pillow project, a fork of PIL that
works with both Python 2 and Python 3, something that the original PIL
didn’t do. To install Pillow, you just need to issue the pip command in
your terminal, although you will only need to run this if it didn’t
automatically install when you installed ReportLab itself:
pip install pillow
Now that we have Pillow installed, let’s talk about how to insert a
photo into your PDF. The ReportLab canvas object supports two methods:
the newer drawImage method as it will cache the image and allow you
to draw it many times while only being stored once in the PDF. If you
use drawInlineImage, it will embed the image into the page stream
itself, which makes it much less efficient as the image will be added
multiple times to the document if you draw it more than once. While the
documentation doesn’t mention this, I would assume that this can also
example:
# image_on_canvas.py
def add_image(image_path):
my_canvas = canvas.Canvas("canvas_image.pdf",
pagesize=letter)
my_canvas.drawImage(image_path, 30, 600,
width=100, height=100)
my_canvas.save()
if __name__ == '__main__':
image_path = 'snakehead.jpg'
add_image(image_path)
As you can see, the drawImage method accepts the image’s file path
and its x / y position. These arguments are required. You can also
specify the width and height of the image. Note that this will not
automatically scale the image or keep its aspect ratio, so you may end
up stretching the image if you don’t know what you are doing. Finally
you can also supply a mask parameter which will allow you to create
proper usage.
Fig. 1-9: Adding an image
The textobject
For additional control of your text’s presentation, you can also use a
textobject. Frankly I have never had the need for one of these as
ReportLab’s Paragraph class gives you more than enough control over
the presentation of your text. But to be thorough, I will show you how
that it will make the PDF generation faster if you use it instead of
# textobject_demo.py
from reportlab.lib import colors
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
def textobject_demo():
my_canvas = canvas.Canvas("txt_obj.pdf",
pagesize=letter)
# Create textobject
textobject = my_canvas.beginText()
my_canvas.save()
if __name__ == '__main__':
textobject_demo()
setTextOrigin. Then we set the font face and size as we saw before.
The next new item is the call to textLine, which will allow you to
The docstring for this method states that it makes the “text cursor
out as well. If you want to control the location of the cursor, then you
In this example, we set the the next string of text to a red color. The
last step is to call drawText, which will actually draw whatever you
text won’t be written out and you may end up with an empty PDF document.
Fig. 1-10: Using a textobject
There are a lot of other methods you can call from your textobject. For
example, if you want to move your cursor’s position somewhere other than
the very next line, you can call moveCursor. Let’s take a look:
# cursor_moving.py
def textobject_cursor():
canvas_obj = canvas.Canvas("textobj_cursor.pdf", pagesize=letter)
# Create textobject
textobject = canvas_obj.beginText()
canvas_obj.drawText(textobject)
canvas_obj.save()
if __name__ == '__main__':
textobject_cursor()
Here we just set up a loop that will print out the same string four
times, but at four different positions. You will note that we move the
cursor 15 points to the right and 15 points down the page with each
Now, let’s say you would like to change the inter-character spacing; all
# char_spacing_demo.py
def textobject_char_spacing():
canvas_obj = canvas.Canvas("textobj_char_spacing.pdf",
pagesize=letter)
# Create textobject
textobject = canvas_obj.beginText()
spacing = 0
for indent in range(8):
textobject.setCharSpace(spacing)
line = '{} - ReportLab spacing demo'.format(spacing)
textobject.textLine(line)
spacing += 0.7
canvas_obj.drawText(textobject)
canvas_obj.save()
if __name__ == '__main__':
textobject_char_spacing()
spacing and then add 0.7 in each iteration. You can see the result here:
# wordspacing_demo.py
def wordspacer():
canvas_obj = canvas.Canvas("textobj_word_spacing.pdf",
pagesize=letter)
# Create textobject
textobject = canvas_obj.beginText()
word_spacing = 0
for indent in range(8):
textobject.setWordSpace(word_spacing)
line = '{} - ReportLab spacing demo'.format(word_spacing)
textobject.textLine(line)
word_spacing += 1.5
canvas_obj.drawText(textobject)
canvas_obj.save()
if __name__ == '__main__':
wordspacer()
This example is pretty much the same as the previous one, but you will
# canvas_rising.py
def main():
canvas_obj = canvas.Canvas("textobj_rising.pdf",
pagesize=letter)
# Create textobject
textobject = canvas_obj.beginText()
textobject.setFont("Helvetica-Oblique", 12)
# Set text location (x, y)
textobject.setTextOrigin(10, 730)
textobject.textOut('ReportLab ')
apply_scripting(textobject, 'superscript ', 7)
textobject.textOut('and ')
canvas_obj.drawText(textobject)
canvas_obj.save()
if __name__ == '__main__':
main()
The main function will create our canvas and all the other bits and
pieces we need. Then we write out some normal text. The next few lines
(negative). Note that we need to set the rise back to zero between the
superscript and subscript to make the word, “and”, appear in the right
from that point on. So you will want to reset it to zero to make sure
the text stays in a normal location. You will also note that we set the
font size for the super and subscripts to be smaller than the regular
Check out ReportLab’s user guide for more interesting things you can do
ReportLab was how to add a page break so I could have multipage PDF
documents. The canvas object allows you to do this via the showPage method.
Note however that for complex documents, you will almost
Flowables are kind of mind bending in their own right, but they are also
a lot nicer to use than trying to keep track of which page you are on
Canvas Orientation (Portrait vs. Landscape)
ReportLab defaults its page orientation to Portrait, which is what all
word processors do as well. But sometimes you will want to use a page in
from reportlab.lib.pagesizes import landscape, letter
from reportlab.pdfgen import canvas
c = canvas.Canvas('test.pdf', pagesize=letter)
c.setPageSize( landscape(letter) )
The other way to set landscape is just set the page size explicitly:
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
from reportlab.lib.units import inch
c = canvas.Canvas('test.pdf', pagesize=letter)
c.setPageSize( (11*inch, 8.5*inch) )
You could make this more generic by doing something like this though:
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
width, height = letter
c = canvas.Canvas('test.pdf', pagesize=letter)
c.setPageSize( (height, width) )
This might make more sense, especially if you wanted to use other
Other methods
There are a bunch of additional methods that I’m not even going to cover
in this chapter. For example, there are methods to set some metadata for
if you want to create bookmarks in your PDF though. There are also
user guide for a complete list or check out the canvas’s source.
if applied. So let’s take some of the methods we’ve learned about here
# sample_form_letter.py
my_canvas.save()
if __name__ == '__main__':
create_form('form.pdf', '01/23/2018',
'$1,999', 'Mike')
accepts the filename, the date we want for our form, the amount owed and
the person who receives the amount owed. Then we paint everything in the
desired locations and save the file. When you run this, you will see the
following:
Wrapping Up
We covered a lot of information in this chapter. You should now know how
to create a pretty basic PDF. I highly recommend trying out the examples
in this chapter and then going back and editing them a bit to see what
all you can accomplish on your own. Once you are done playing around the
years ago, ReportLab added support for Asian languages. They also
support TrueType fonts and Type-1 fonts. It’s also worth talking about
Python 3’s “strings” are Unicode by default, so you won’t even have to
think all that much about this topic if you just use the latest Python.
However if you are using an older version of Python AND your string is
The fix is to just encode your text as UTF8 or use a Unicode object.
Just keep that in mind if you run into these sorts of issues.
they will be there. You can get a list of the fonts available by calling
Courier
Courier-Bold
Courier-BoldOblique
Courier-Oblique
Helvetica
Helvetica-Bold
Helvetica-BoldOblique
Helvetica-Oblique
Symbol
Times-Bold
Times-BoldItalic
Times-Italic
Times-Roman
ZapfDingbats
quick demo:
# basic_font_demo.py
if __name__ == '__main__':
my_canvas = canvas.Canvas("basic_font_demo.pdf",
pagesize=letter)
fonts = my_canvas.getAvailableFonts()
font_demo(my_canvas, fonts)
my_canvas.save()
You will note that all we need to do to get a list of fonts is to call
the getAvailableFonts method. When you run this code, you will get
Fig. 2-1: The standard fonts
font description files. One needs to be in the Adobe AFM (Adobe Font
Metrics) format and the other needs to be in PFB (Printer Font Binary)
format. The Adobe AFM file is actually ASCII and tells ReportLab about
the glyph’s of the font. A font’s glyph describes the height, width,
bounding box information and other font metrics. The PFB describes the
shapes of the font and is in binary format, so you won’t be able to read
it without a hex editor or similar. I have had to use these files for
folder. Let’s write a little demo that shows how to embed this font in
our PDF:
# type1_font_demo.py
import os
import reportlab
def embedded_font_demo():
my_canvas = canvas.Canvas("type1_font_demo.pdf",
pagesize=letter)
reportlab_folder = os.path.dirname(reportlab.__file__)
fonts_folder = os.path.join(reportlab_folder, 'fonts')
print('ReportLab font folder is located at {}'.format(
fonts_folder))
face_name = 'DarkGardenMK'
font = pdfmetrics.Font('DarkGardenMK',
face_name,
'WinAnsiEncoding')
pdfmetrics.registerFont(font)
This is a fairly complex process. First we have our imports. Note that
where this folder is located in case you would like to browse through
it. Next we get the paths to the AFM and PFB files. Now we’re finally
it the AFM and PFB file paths. Next we register the font’s face via the
that if I don’t call that function, the code works just fine, so I am
actually not sure why this is needed other than possibly a sanity check.
the name of the font, the face name and the encoding. Then you can just
the font in our PDF. This is the result I got when I ran this code:
Fig. 2-2: Embedding a Type-1 Font
I don’t know when you would want to use this font other than possibly
You can also edit the T1 font search path in rl_settings.py, which
>>> from reportlab import rl_settings
>>> rl_settings.T1SearchPath
('c:/Program Files/Adobe/Acrobat 9.0/Resource/Font',
'c:/Program Files/Adobe/Acrobat 8.0/Resource/Font',
'c:/Program Files/Adobe/Acrobat 7.0/Resource/Font',
'c:/Program Files/Adobe/Acrobat 6.0/Resource/Font',
'c:/Program Files/Adobe/Acrobat 5.0/Resource/Font',
'c:/Program Files/Adobe/Acrobat 4.0/Resource/Font',
'%(disk)s/Applications/Python %(sys_version)s/reportlab/fonts',
'/usr/lib/Acrobat9/Resource/Font',
'/usr/lib/Acrobat8/Resource/Font',
'/usr/lib/Acrobat7/Resource/Font',
'/usr/lib/Acrobat6/Resource/Font',
'/usr/lib/Acrobat5/Resource/Font',
'/usr/lib/Acrobat4/Resource/Font',
'/usr/local/Acrobat9/Resource/Font',
'/usr/local/Acrobat8/Resource/Font',
'/usr/local/Acrobat7/Resource/Font',
'/usr/local/Acrobat6/Resource/Font',
'/usr/local/Acrobat5/Resource/Font',
'/usr/local/Acrobat4/Resource/Font',
'/usr/share/fonts/default/Type1',
'%(REPORTLAB_DIR)s/fonts',
'%(REPORTLAB_DIR)s/../fonts',
'%(REPORTLAB_DIR)s/../../fonts',
'%(CWD)s/fonts',
'~/fonts',
'~/.fonts',
'%(XDG_DATA_HOME)s/fonts',
'~/.local/share/fonts',
'/usr/share/fonts/type1/gsfonts')
TrueType Fonts
Adding TrueType fonts in your PDF using ReportLab is a bit less
complicated than embedding Type-1 Fonts. Let’s take a look and see how
# truetype_font_demo.py
import os
import reportlab
def embedded_font_demo():
my_canvas = canvas.Canvas("truetype_font_demo.pdf",
pagesize=letter)
reportlab_folder = os.path.dirname(reportlab.__file__)
fonts_folder = os.path.join(reportlab_folder, 'fonts')
print('ReportLab font folder is located at {}'.format(
fonts_folder))
# Usage: TTFont(name,filename)
vera_font = TTFont('Vera', vera_font_path)
pdfmetrics.registerFont(vera_font)
passing it the font’s name and the font’s file path. Then we call
I would also like to point out that there is a registerFontFamily method that you
should be aware of. This method will allow you to map
the bold, italic and bolditalic versions of the font to the same name.
Of course if you have all the different versions of the font, than you
can use this function to register those name too. Here’s the signature
registerFontFamily('Vera', normal='Vera',bold='VeraBd',italic='VeraIt',boldItali\
c='VeraBI')
As you can see, you just pass the name of the various flavors of the
search path. For TrueType fonts, you will want to set the
are as follows:
>>> from reportlab import rl_settings
>>> rl_settings.TTFSearchPath
('c:/winnt/fonts',
'c:/windows/fonts',
'/usr/lib/X11/fonts/TrueType/',
'/usr/share/fonts/truetype',
'/usr/share/fonts',
'/usr/share/fonts/dejavu',
'%(REPORTLAB_DIR)s/fonts',
'%(REPORTLAB_DIR)s/../fonts',
'%(REPORTLAB_DIR)s/../../fonts',
'%(CWD)s/fonts',
'~/fonts',
'~/.fonts',
'%(XDG_DATA_HOME)s/fonts',
'~/.local/share/fonts',
'~/Library/Fonts',
'/Library/Fonts',
'/Network/Library/Fonts',
'/System/Library/Fonts',
'/usr/share/fonts/truetype',
'/usr/share/fonts/truetype/kacst-one',
'/usr/share/fonts/truetype/freefont',
'/usr/share/fonts/truetype/nanum',
'/usr/share/fonts/truetype/ttf-khmeros-core',
'/usr/share/fonts/truetype/lohit-punjabi',
'/usr/share/fonts/truetype/takao-gothic',
'/usr/share/fonts/truetype/sinhala',
'/usr/share/fonts/truetype/ancient-scripts',
'/usr/share/fonts/truetype/tlwg',
'/usr/share/fonts/truetype/lyx',
'/usr/share/fonts/truetype/lao',
'/usr/share/fonts/truetype/fonts-guru-extra',
'/usr/share/fonts/truetype/abyssinica',
'/usr/share/fonts/truetype/dejavu',
'/usr/share/fonts/truetype/tibetan-machine',
'/usr/share/fonts/truetype/ubuntu-font-family',
'/usr/share/fonts/truetype/ttf-bitstream-vera',
'/usr/share/fonts/truetype/kacst',
'/usr/share/fonts/truetype/openoffice',
'/usr/share/fonts/truetype/liberation',
'/usr/share/fonts/truetype/noto',
'/usr/share/fonts/truetype/padauk')
If your font is in one of those paths, then you can simplify your code a
bit because you won’t need to include the entire path any longer:
vera_font = TTFont('Vera', 'Vera.ttf')
pdfmetrics.registerFont(vera_font)
Asian Fonts
A few years ago, ReportLab added support for Asian fonts. ReportLab
If you use one of these fonts in your PDF, then you will likely
Fig. 2-3: Font package warning
Let’s write some code to write the characters used for the word
“Nippon”, which means Japan. The characters we will use are Japanese
kanji that were converted to Unicode. For this example, I just looked up
# asian_font_demo.py
def asian_font_demo():
my_canvas = canvas.Canvas("asian_font_demo.pdf",
pagesize=letter)
if __name__ == '__main__':
asian_font_demo()
When I ran this and tried to open the resulting PDF with Adobe Reader, I
MB download. If you don’t install the font package, then the PDF will
appear to be blank when opened. After installing the fonts, I got the
following result:
Since Python 3 supports Unicode out of the box, you don’t have to use
Unicode characters directly like we did in the example above. You can
just use the actual Kanji! Just for fun, I went ahead and updated the
example to use some actual Kanji:
# asian_font_demo2.py
def asian_font_demo():
my_canvas = canvas.Canvas("asian_font_demo2.pdf",
pagesize=letter)
if __name__ == '__main__':
asian_font_demo()
You can also use TrueType fonts that have Asian characters. This is even
documentation that these kinds of fonts can take time to parse and that
large subsets of the font will need to be embedded in your PDF for the
.ttf.
There are also the Noto and Source Han (Sans/Serif) fonts, which are
both open source. Noto is from Google and aims to support all languages
Adobe.
make sure my readers were quite clear on how to change fonts while
demo:
# font_switching.py
import string
import sys
def standardFonts():
"""
Create a PDF with all the standard fonts
"""
for enc in ['MacRoman', 'WinAnsi']:
canv = canvas.Canvas(
'StandardFonts_%s.pdf' % enc,
)
canv.setPageCompression(0)
x = 0
y = 744
for faceName in pdfmetrics.standardFonts:
if faceName in ['Symbol', 'ZapfDingbats']:
encLabel = faceName+'Encoding'
else:
encLabel = enc + 'Encoding'
canv.setFont('Times-Bold', 18)
canv.drawString(80, y, fontName)
y -= 20
alpha = "abcdefghijklmnopqrstuvwxyz"
canv.setFont(fontName, 14)
canv.drawString(x+85, y, alpha)
y -= 20
canv.save()
if __name__ == "__main__":
standardFonts()
can think of them as standard or default fonts. The script above will
pair of for loops to extract the various fonts and register them with
Reportlab. Then we called setFont with the selected font. From that
point forward, ReportLab will use that font when it draws text. When you
call setFont again, that will change your currently selected font to
the one you specified and then that will be the font used. In other
words, ReportLab always uses the last set font or the default until you
By the way, the difference between MacRoman and WinAnsi is that one was
character set. While they are identical for some character (32 - 126 of
Wrapping Up
In this chapter we covered how to embed Type-1 and TrueType fonts in our
You should take some time and play around with these examples to make
sure you fully understand how to embed fonts in your PDF. This is quite
handy when you need to give your documents a unique look or you need to
document accordingly.
We will be learning about flowables for the next few chapters. But for now, let’s
start learning the basics so you can get going with PLATYPUS!
# hello_platypus.py
def hello():
doc = SimpleDocTemplate("hello_platypus.pdf",
pagesize=letter,
rightMargin=72,
leftMargin=72,
topMargin=72,
bottomMargin=18)
styles = getSampleStyleSheet()
flowables = []
doc.build(flowables)
if __name__ == '__main__':
hello()
Let’s start from the top. We have some new imports. The first two come from the
reportlab.platypus submodule. Here we get SimpleDocTemplate and
Paragraph. Then we import a style sheet from the styles submodule. Next we
create a function called hello and create a template by instantiating an instance
of SimpleDocTemplate. The only required argument here is the filename.
But I went ahead and set the page size to letter and then I set the
PLATYPUS. Next up, we get ReportLab’s sample style sheet and we create an
empty list called flowables. This list will contain all the flowable objects we
want to draw in our PDF. Finally we create a
Paragraph, which is a flowable that can format a block of text using the
specified style sheet. We append this flowable object to our
flowables list and then we call our template’s build method. The build method
will then construct our PDF.
# form_letter.py
import time
from reportlab.lib.enums import TA_JUSTIFY
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.units import inch
def form_letter():
doc = SimpleDocTemplate("form_letter.pdf",
pagesize=letter,
rightMargin=72,
leftMargin=72,
topMargin=72,
bottomMargin=18)
flowables = []
logo = "python_logo.png"
magName = "Pythonista"
issueNum = 12
subPrice = "99.00"
limitedDate = "03/05/2010"
freeGift = "tin foil hat"
formatted_time = time.ctime()
full_name = "Mike Driscoll"
address_parts = ["411 State St.", "Waterloo, IA 50158"]
styles = getSampleStyleSheet()
styles.add(ParagraphStyle(name='Justify', alignment=TA_JUSTIFY))
ptext = '<font size=12>%s</font>' % formatted_time
flowables.append(Paragraph(ptext, styles["Normal"]))
flowables.append(Spacer(1, 12))
flowables.append(Spacer(1, 12))
ptext = '<font size=12>Dear %s:</font>' % full_name.split()[0].strip()
flowables.append(Paragraph(ptext, styles["Normal"]))
flowables.append(Spacer(1, 12))
ptext = '''
<font size=12>We would like to welcome you to our subscriber
base for {magName} Magazine! You will receive {issueNum} issues at
the excellent introductory price of ${subPrice}. Please respond by
{limitedDate} to start receiving your subscription and get the
following free gift: {freeGift}.</font>
'''.format(magName=magName,
issueNum=issueNum,
subPrice=subPrice,
limitedDate=limitedDate,
freeGift=freeGift)
flowables.append(Paragraph(ptext, styles["Justify"]))
flowables.append(Spacer(1, 12))
flowables.append(Paragraph(ptext, styles["Justify"]))
flowables.append(Spacer(1, 12))
ptext = '<font size=12>Sincerely,</font>'
flowables.append(Paragraph(ptext, styles["Normal"]))
flowables.append(Spacer(1, 48))
ptext = '<font size=12>Ima Sucker</font>'
flowables.append(Paragraph(ptext, styles["Normal"]))
flowables.append(Spacer(1, 12))
doc.build(flowables)
if __name__ == '__main__':
form_letter()
Let’s break this down a bit as it’s a fairly lengthy example. You will note that
this time we import a few new items from platypus: Spacer and Image. The
Spacer class gives us a convenient way to add space between paragraphs or other
flowables, while the Image class gives us a nice way to insert images into our
document. We also import ParagraphStyle, which allows us to create a new
style and add it to our styles list.
def form_letter(): doc = SimpleDocTemplate("form_letter.pdf",
pagesize=letter,
rightMargin=72,
leftMargin=72,
topMargin=72,
bottomMargin=18)
flowables = []
logo = "python_logo.png"
magName = "Pythonista"
issueNum = 12
subPrice = "99.00"
limitedDate = "03/05/2010"
formatted_time = time.ctime()
This chunk of code sets up our template object and our flowables list as before.
We also set up a few variables that we will use later on in the code.
im = Image(logo, 2*inch, 2*inch) flowables.append(im)
styles = getSampleStyleSheet()
styles["Normal"].fontSize = 12
styles["Normal"].leading = 14
styles.add(ParagraphStyle(name='Justify', alignment=TA_JUSTIFY))
flowables.append(Paragraph(formatted_time, styles["Normal"]))
flowables.append(Spacer(1, 12))
flowables.append(Paragraph(full_name, styles["Normal"]))
for part in address_parts:
flowables.append(Paragraph(part.strip(), styles["Normal"]))
flowables.append(Spacer(1, 12))
flowables.append(Paragraph(ptext, styles["Normal"]))
flowables.append(Spacer(1, 12))
Here we create our Image flowable. The first argument that we pass to it is the
path to the image that we wish to insert. Then we tell the Image the what width
and height to use. If you don’t specify the width and height, then ReportLab will
get the size from the image itself and use that. Next we get our style sheet and
create a new style via
ParagraphStyle. Note that it’s name is Justify. After that we create some
Paragraphs and Spacer objects and append them to our
flowables list.
Note: If you dive into ReportLab’s source code, you will find that the Normal
style definition looks like this:
stylesheet.add(ParagraphStyle(name='Normal', fontName=_baseFontName,
fontSize=10,
leading=12)
The settings here mirror what is already the defaults in ParagraphStyle, so the
Normal style is basically the default you get when you create an instance of
ParagraphStyle. Thus when we create our Justify style, it is the same as Normal
except for its alignment.
Now let’s move on and look at the next part of the code:
ptext = '''
'''.format(magName=magName,
issueNum=issueNum,
subPrice=subPrice,
limitedDate=limitedDate,
freeGift=freeGift)
flowables.append(Paragraph(ptext, styles["Justify"]))
flowables.append(Spacer(1, 12))
This section of code is pretty similar to the previous one in that once again we
create a series of Paragraphs and Spacers and add them to our flowables list. The
part I want to highlight is that the last Paragraph that we create, we apply our
Justify style to. Everything else used the Normal style.
ptext = '''Thank you very much and we look forward to serving you.'''
flowables.append(Paragraph(ptext, styles["Justify"]))
flowables.append(Spacer(1, 12))
ptext = 'Sincerely,'
flowables.append(Paragraph(ptext, styles["Normal"]))
flowables.append(Spacer(1, 48))
flowables.append(Paragraph(ptext, styles["Normal"]))
flowables.append(Spacer(1, 12))
doc.build(flowables)
if __name__ == ‘__main__’:
form_letter()
This last chunk just finishes out the code by adding a few more flowables
and then building the document. When you run the code, you
Let’s move on and learn about a simple way to change static content across
pages.
One of the first examples in ReportLab’s user guide on PLATYPUS talks about
using some special parameters the template’s build method that I think is often
easily overlooked. When you call build, you can pass it 4 parameters: The
flowables list, onFirstPage, onLaterPages and canvasmaker. The only
required parameter is the flowables list. I won’t be covering canvasmaker here,
but we will look into that later on in the book.
The two that we want to focus on in this section are the onFirstPage and
onLaterPages parameters. These parameters are supposed to be functions that
the builder will call when creating your document. The onFirstPage function
will annotate the first page of your PDF while the onLaterPages function will
annotate all the pages after page one. You can use these functions to add page
numbers, headers, footers, logos, etc.
For this example, we will create a script that adds a title and a string of text that
will only appear on the first page. On all the subsequent pages, we will add a
page number to the bottom of each page. Let’s take a look:
# alternating_text.py
canvas.saveState()
canvas.setFont('Times-Bold', 18)
canvas.drawCentredString(PAGE_WIDTH/2.0, PAGE_HEIGHT-108, title)
canvas.setFont('Times-Roman', 10)
text = 'Welcome to the first PLATYPUS Page!'
canvas.drawString(inch, 10*inch, text)
canvas.restoreState()
def create_document():
doc = SimpleDocTemplate("platypus_first_later.pdf",
pagesize=letter,
rightMargin=72,
leftMargin=72,
topMargin=72,
bottomMargin=18)
styles = getSampleStyleSheet()
flowables = []
spacer = Spacer(1, 0.25*inch)
if __name__ == '__main__':
create_document()
In this example, we create three functions. The first function will only get
executed for the first page of the PDF. All it does is add a title to the page and
centers it by calling drawCentredString. Note that this method call does require
us to pass in and x and y coordinate. The x coordinate is used for positioning the
string horizontally on the page while the y coordinate determines the text’s
position vertically on the page. We also draw some other text on the page in the
first function to help differentiate it from function number two, later_pages.
The later_pages function gets called on all the pages after page one. So starting
on page two, it will draw a page number on the page at the specified location.
You will also note that in both of these
functions, we call saveState and restoreState. These will save the current
graphics state so that it will restore it back to what it was when the restoreState
is reached. You cannot save the state from one page and restore it to another, for
example.
Note: If you’d like to see the margins for debugging purposes, you can do so by
setting the document template’s showBoundary argument to 1
or True.
Flowable Methods
We’ve been talking about flowables since the beginning of this chapter and it’s
high time we actually define what a flowable is in ReportLab. A Flowable is
something that can be drawn and which has the following
methods:
draw
wrap
split (optionally)
ReportLab actually has a class named Flowable that is a Python abstract class
and that you would normally use as a base class for your own custom flowable.
Note that the Flowable class itself does not implelment draw. Instead the calling
code should call drawOn which will call draw internally. The calling code
should also make sure that the flowable object has a pdfgen.Canvas object on
which it can draw and that said Canvas is in an state that can be drawn on.
Here’s a fairly simply example that uses drawOn and a method related to wrap
called wrapOn:
# absolute_pos_flowable.py
def mixed():
my_canvas = canvas.Canvas("mixed_flowables.pdf", pagesize=letter)
styles = getSampleStyleSheet()
width, height = letter
if __name__ == '__main__':
mixed()
What this does it that it allows us to mix a flowable with the Canvas object itself.
The drawOn method will allow you to render your flowable object on the canvas
and translates the flowab’es position to the canvas coordinate. The wrapOn
method is useful in that it will take the available width and available height and
use that to calculate how to wrap the text in your paragraph based on that
information.
The split function also also takes the available width and available height and
use that information to know when to split the flowable
should the flowable reach a frame boundary. A good flowable will split
themselves and return a list of flowables. A stupid flowable will just return an
empty list because they are unable to split.
If you need to find out what your vertical spacing is, then you can call
getSpaceAfter or getSpaceBefore to find out. These methods will tell you how
much space should follow or precede a flowable. Also all flowables have an
hAlign property: (‘LEFT’, ‘RIGHT’, ‘CENTER’ or ‘CENTRE’). Paragraphs
ignore this property, but tables, images and other objects which are smaller than
the width of the frame will use this
Frame Basics
A Frame is a container that is itself contained within a PageTemplate. A Frame
object has a location and size and will also maintain a concept of remaining
drawable space. Here is the signature of the Frame class:
Frame(x1, y1, width, height, leftPadding=6, bottomPadding=6, rightPadding=6,
topPadding=6, id=None, showBoundary=0)
There are quite a few arguments here. The first two are the x/y coordinates for
the lower left-hand corner of the frame. Then you
specify the width and height of the frame, respectively. Those first
fours arguments are the only required ones. After that, you can add
padding to each of the four sides of the frame. These padding arguments are
positive values that are used to reduce the space available for
drawing. The id argument is the identifier for your frame. It’s actually usually a
string that you can use to help you differentiate
between the frames on your page, should you need to do so. Lastly, you have the
showBoundary parameter, which if non-zero, will show the boundary of the
frame. This can be useful for getting the padding right while you are creating
your document.
Let’s look at a quick demo that shows how you can use a Frame:
# two_column_demo.py
def frame_demo():
my_canvas = Canvas("frame_demo.pdf",
pagesize=letter)
styles = getSampleStyleSheet()
normal = styles['Normal']
heading = styles['Heading1']
flowables = []
flowables.append(Paragraph('Heading #1', heading))
flowables.append(Paragraph('Paragraph #1', normal))
right_flowables = []
right_flowables.append(Paragraph('Heading #2', heading))
right_flowables.append(Paragraph('ipsum lorem', normal))
left_frame.addFromList(flowables, my_canvas)
right_frame.addFromList(right_flowables, my_canvas)
my_canvas.save()
if __name__ == '__main__':
frame_demo()
Here we do our usual imports, but you will note that we also import the Frame
class too. Then we create our little demo function and set up a Canvas object.
Next we create a style object and extract a couple of different styles from said
object. Then we create some flowables that we will place in each of our frame
objects. You see, for this demo I wanted to create a page that had two columns of
text, side-by-side, kind of
like a newspaper. To do that, I need to create two lists of flowables and two
Frames. Each frame is the same size, but you will note that one frame is pushed
to the right by 4 inches. At the end of the code, we
actually add the flowables to the frames by calling the addFromList method,
which takes a list of flowables and the canvas that you want to
flowables had been too large for the frame in which I put it, this
method would raise an exception.
Anyway, then we save the document and the result should look something like
this:
Note that you don’t usually need to use a Frame directly as other Flowables will
do that work for you, unless you want to create a custom PageTemplate or other
custom Flowable.
Now let’s move on and learn a bit more about ReportLab’s templates.
Templates
We have been using the SimpleDocTemplate up to this point for creating our
multi-page PDFs. However the SimpleDocTemplate is actually a subclass of
BaseDocTemplate, which implements the magic behind your document’s
formatting. When you create an instance of this class, it will contain one or more
other PageTemplates that are a description of the layout of each of the pages of
your document. When you call the build method of a template class, it will
process the list of Flowables that we’ve been creating to create your PDF
document.
BaseDocTemplate(self, filename, pagesize=defaultPageSize,
pageTemplates=[],
showBoundary=0,
leftMargin=inch,
rightMargin=inch,
topMargin=inch,
bottomMargin=inch,
allowSplitting=1,
title=None,
author=None,
_pageBreakQuick=1,
encrypt=None)
The filename parameter is the only one required when creating an instance of
the template. You can pass a string in for this parameter that can be the PDF’s
file name or an absolute path to said file. It can also be an object that implements
a write method, such as StringIO, BytesIO, file or socket type.
The other methods are pretty obvious, but I will point out that there is a
showBoundary parameter that you can use to turn on the boundaries of the
frames in your document, should you have any.
The allowSplitting parameter tells ReportLab that it should try to split each
Flowable across the Frames. There is also an encrypt parameter that defaults to
None, that is unencrypted. If you pass in a string object, that will be the user’s
password to the document.
There are a series of other methods that you can use with your templates. I won’t
be covering these here as I have never needed any of them. However if you need
to do some serious custom layouts, then you will want to take a look at the
documentation. The templates have event handlers that you can use as well to
help you with complex or unusual layouts.
this case, you could do something as simple as the following to put the entire
document in landscape rather than portrait (the default):
from reportlab.lib.pagesizes import letter, landscape
from reportlab.platypus import SimpleDocTemplate
def landscape_orientation():
doc = SimpleDocTemplate("orientations.pdf",
pagesize=landscape(letter),
rightMargin=72,
leftMargin=72,
topMargin=72,
bottomMargin=18)
You will notice that all we had to do here was set the pagesize parameter to
equal landscape(letter). The landscape function just takes a page size and sets
it to landscape. Let’s try to do
something that’s a bit more interesting though. Let’s write some code that will
alternate between portrait and landscape in the same document:
# flowable_orientation.py
def alternate_orientations():
doc = SimpleDocTemplate("orientations.pdf",
pagesize=letter,
rightMargin=72,
leftMargin=72,
topMargin=72,
bottomMargin=18)
styles = getSampleStyleSheet()
normal = styles["Normal"]
story = []
story.append(Paragraph('This is a page in portrait orientation', normal))
doc.build(story)
if __name__ == '__main__':
alternate_orientations()
frame that almost matches the size of the page itself. Then we create two
PageTemplates, one that is in the portrait orientation and the other in landscape.
Then we call our main template’s
name.
Now we just need to create a list of flowables! Here we just add a simple
Paragraph and then call NextPageTemplate() to change to landscape on the
second page. You will note that we didn’t have to tell the first page what its
orientation was as it defaults to portrait. After setting the landscape orientation,
we insert a page break and add a spacer. The reason for the spacer is that now the
page is shorter in height, so when we add the next Paragraph instance, we need a
spacer to position it so that the text is on the page instead of off in the ether.
Finally we
call NextPageTemplate() again and set the orientation back to portrait, add a
page break and add one more paragraph. When you run this code, you should get
a 3-page document with the first and last pages in portrait and the middle page in
landscape.
Wrapping Up
At this point, you should have a good understanding of how PLATYPUS
works to layout your documents. In the next chapter, we will learn much more
about one of the flowables that you will use a lot if you happen to use ReportLab
for much of anything. The flowable that I am referring to is the Paragraph,
which is one of the most important flowables in ReportLab.
Chapter 4 - Paragraphs
ReportLab has a Paragraph Flowable that you can import from
flexible and useful of all the Flowables that ReportLab includes. You
will most likely use it more than any other part of their PLATYPUS
framework. The Paragraph can format text and allows the developer the
ability to change font style and color inline using XML tags. You can
also use the Paragraph to justify your text as well as set it to right,
p = Paragraph(text, style, bulletText=None)
The text argument is just the string that you want to draw on your
PDF. Any excess whitespace will be removed from the ends of the text.
While this can be annoying when you want the whitespace for formatting,
Paragraphs. Note that when you pass in your string, you can add some XML
markup to format parts of the string. We will look at this later on the
chapter though.
bullet for a paragraph. We will talk more about bullets soon, but first
let’s look at paragraph styles as you will be using those much more
frequently.
Paragraph Styles
The style argument to the Paragraph class has to be an instance of
from reportlab.lib.styles import ParagraphStyle
StyleSheet. In fact, you can get a sample style sheet from ReportLab
from reportlab.lib.styles import getSampleStyleSheet
What this allows you to do is to create a style sheet instance that you
normal_style = styles['Normal']
This just grabs the Normal style from ReportLab’s sample style
sheet. If you want to get a listing of all their styles, you can do the
following:
>>> from reportlab.lib.styles import getSampleStyleSheet
>>> styles = getSampleStyleSheet()
>>> styles.byName
{'BodyText': <ParagraphStyle 'BodyText'>,
'Bullet': <ParagraphStyle 'Bullet'>,
'Code': <ParagraphStyle 'Code'>,
'Definition': <ParagraphStyle 'Definition'>,
'Heading1': <ParagraphStyle 'Heading1'>,
'Heading2': <ParagraphStyle 'Heading2'>,
'Heading3': <ParagraphStyle 'Heading3'>,
'Heading4': <ParagraphStyle 'Heading4'>,
'Heading5': <ParagraphStyle 'Heading5'>,
'Heading6': <ParagraphStyle 'Heading6'>,
'Italic': <ParagraphStyle 'Italic'>,
'Normal': <ParagraphStyle 'Normal'>,
'OrderedList': <ListStyle 'OrderedList'>,
'Title': <ParagraphStyle 'Title'>,
'UnorderedList': <ListStyle 'UnorderedList'>}
You can call the list method on your style object to get a full
settings:
Normal None
name = Normal
parent = None
alignment = 0
allowOrphans = 0
allowWidows = 1
backColor = None
borderColor = None
borderPadding = 0
borderRadius = None
borderWidth = 0
bulletAnchor = start
bulletFontName = Helvetica
bulletFontSize = 10
bulletIndent = 0
endDots = None
firstLineIndent = 0
fontName = Helvetica
fontSize = 10
justifyBreaks = 0
justifyLastLine = 0
leading = 12
leftIndent = 0
rightIndent = 0
spaceAfter = 0
spaceBefore = 0
spaceShrinkage = 0.05
splitLongWords = 1
textColor = Color(0,0,0,1)
textTransform = None
underlineProportion = 0.0
wordWrap = None
You can get the any of these settings from any of the styles whenever
>>> from reportlab.lib.styles import getSampleStyleSheet
>>> styles = getSampleStyleSheet()
>>> styles['Normal']['alignment']
>>> styles['Normal'].alignment
0
Changing a style’s properties on the fly is also easy. All you need to
do is this:
>>> styles['Normal'].leading
12
>>> styles['Normal'].leading = 14
>>> styles['Normal'].leading
14
Note that these changes will only be in effect from the point at
which you change the style and also only if you keep a reference to that
change.
Now that we have that, we can easily see what the defaults are on a
fresh paragraph style. Let’s try creating our own paragraph style in
Python’s interpreter:
>>> from reportlab.lib.styles import ParagraphStyle
>>> p = ParagraphStyle('Test')
>>> p.name
'Test'
>>> p.leading
12
>>> p.leading = 14
Here we create a test style with the name of “Test”. Then we check its
default leading property, which turns out to be 12. So for fun, we set
# custom_style.py
class TestStyle(ParagraphStyle):
self.__dict__.update(self.defaults)
self.leading = 14
if __name__ == '__main__':
p = TestStyle('test')
print(p.name)
print(p.leading)
print(p.fontSize)
class, which is called PropertySet. If you take a deep dive into the
source, you will find that it uses Python’s magic to update the class’s
Paragraph Markup Tags
The Paragraph in ReportLab supports XML-like markup tags. You can use
formatting.
supports:
Attribute Synonyms
alignment alignment, align
allowOrphans alloworphans, allowOrphans
allowWidows allowWidows, allowwidows
autoLeading autoleading, autoLeading
backColor bgcolor, backColor, backcolor, bg
borderColor bordercolor, borderColor
borderRadius borderRadius, borderradius
borderWidth borderWidth, borderwidth
borderpadding borderpadding
bulletAnchor bulletanchor, banchor, bulletAnchor
bulletColor bulletColor, bcolor, bulletcolor
bulletFontName bulletFontName, bulletfontname, bfont
bulletFontSize bulletfontsize, bulletFontSize, bfontsize
bulletIndent bindent, bulletindent, bulletIndent
bulletOffsetY boffsety, bulletoffsety, bulletOffsetY
endDots endDots, enddots
firstLineIndent findent, firstLineIndent, firstlineindent
fontName font, fontname, fontName, face
fontSize size, fontsize, fontSize
justifyBreaks justifybreaks, justifyBreaks
justifyLastLine justifyLastLine, justifylastline
leading leading
leftIndent leftIndent, lindent, leftindent
rightIndent rightindent, rightIndent, rindent
spaceAfter spacea, spaceafter, spaceAfter
spaceBefore spaceb, spacebefore, spaceBefore
spaceShrinkage spaceShrinkage, spaceshrinkage
splitLongWords splitlongwords, splitLongWords
textColor color, textColor, textcolor, fg
textTransform texttransform, textTransform
underlineProportion underlineProportion, underlineproportion
wordWrap wordWrap, wordwrap
official tag name. The Synonyms column will also work and you will
also notice that where possible, the ReportLab developers use equivalent
Note that you must use these tags in conjunction with the
# para_tags.py
def paragraph_para_markup():
doc = SimpleDocTemplate("para_tags.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
flowables = []
text = "<para align=center>Hello, I'm a Paragraph</para>"
para = Paragraph(text, style=styles["Normal"])
flowables.append(para)
doc.build(flowables)
if __name__ == '__main__':
paragraph_para_markup()
Here we create a pretty simple demo using some code from earlier in the
book. The main line you should pay attention to is line 15 where we
actually use the <para> tag to center our text. If you run
this code, you should end up with a PDF that has a single line of text
Here’s an example:
# para_tags2.py
def paragraph_para_markup():
doc = SimpleDocTemplate("para_tags2.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
styles.add(ParagraphStyle(name='Centered', alignment=TA_CENTER))
print(styles)
flowables = []
text = "<para align=center>Hello, I'm a Paragraph</para>"
para = Paragraph(text, style=styles["Centered"])
flowables.append(para)
doc.build(flowables)
if __name__ == '__main__':
paragraph_para_markup()
Here we just add a ParagraphStyle object to our sample StyleSheet and set its
alignment to TA_CENTER.
Let’s take a look at how you might use these tags in your code:
# intra_paragraph_markup.py
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
def intra_tags():
doc = SimpleDocTemplate("intra_paragraph_markup.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
flowables = []
text = """
This <b>text</b> is important,
not <strong>strong</strong>.<br/><br/>
doc.build(flowables)
if __name__ == '__main__':
intra_tags()
text and use the <br/> tag to add some line breaks between
lines. If you do not add the <br> tag, then all the text will run
together as the Paragraph class removes excess white space. You will
for adding an anchor. If you run this code, you should get something that
If you need to control spacing between Paragraph flowables, you can edit the
example:
# paragraph_spacing.py
flowables = []
text = """
This <b>text</b> is important,
not <strong>strong</strong>.
"""
para = Paragraph(text, style=styles["Normal"])
flowables.append(para)
text = """
You can link to <a href="https://www.google.com" color="blue">Google</a>
like this.
"""
para = Paragraph(text, style=styles["Normal"])
flowables.append(para)
doc.build(flowables)
if __name__ == '__main__':
paragraph_spacing()
Of course, if you are dealing with just a single block of text, then you will
be forced to use <br> tags for spacing since that text will not be
Now we’re ready to learn about another intra-paragraph tag that controls
fonts.
formatting your text. It supports changing the font’s name, size and
color substring within your paragraph. You can use the following
The name (or face) attribute represents the font family name.
You do not need to specify “bold” or “italic” suffixes when setting the
font’s name. Let’s take a look:
# paragraph_fonts.py
def paragraph_fonts():
doc = SimpleDocTemplate("paragraph_fonts.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
flowables = []
'(helvetica)</font>'
para = Paragraph(ptext, style=styles["Normal"])
flowables.append(para)
"(courier)</font>"
para = Paragraph(ptext, style=styles["Normal"])
flowables.append(para)
"(times-roman)</font>"
para = Paragraph(ptext, style=styles["Normal"])
flowables.append(para)
doc.build(flowables)
if __name__ == '__main__':
paragraph_fonts()
Here we create the same string 3 times in 3 different fonts and 3
different sizes. When you run this code, you will get the following
You will note that in the code example above, we alternated specifying
the font name by using the name and face attributes. These attributes
I am no typography expert, however I want to point out that when you change
font family and font size, you will likely need to change its leading value
of text instead of single lines, then you would see the difference in the
When you want to change the color of your text, you can use the
previous example:
# paragraph_font_colors.py
def paragraph_font_colors():
doc = SimpleDocTemplate("paragraph_font_colors.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
flowables = []
doc.build(flowables)
if __name__ == '__main__':
paragraph_font_colors()
Here we modify the strings from the last demo to contain the color or fg
attribute, which allows us to specify the color of the text.
You will note that we use the HTML names red and blue for the
first two strings and a hex string (#777215) for the last string.
Here’s something fun that you can try on your own. See if you can modify the
code
follows:
<super> - superscript
<sub> - subscript
<greek> - Greek letter
The <super> and <sub> tags also have a rise attribute that can be supplied. For
example, you could do this:
text = "Use <super rise=12>Foobar</super> in your example"
While using rise gives you more control over the exact placement of your
Here is a quick demo that shows how to use all three tags:
# paragraph_scripting.py
def paragraph_scripting():
doc = SimpleDocTemplate("paragraph_scripting.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
flowables = []
flowables.append(Spacer(1, 15))
flowables.append(Spacer(1, 15))
ptext = "Reportlab Greek letter e: <greek>e</greek>"
para = Paragraph(ptext, style=styles["Normal"])
flowables.append(para)
doc.build(flowables)
if __name__ == '__main__':
paragraph_scripting()
applied. We also create a string that includes one Greek character. When
you run this code, you should end up with a PDF that contains the
following:
Personally, I think using the markup tags for the Paragraph to be a bit
has support for inline images via the <img> tag. You can apply the
src (required)
width
height
valign
file that you want to insert. You can also set the width or
height (or both) of the image. Note that if you just set the width
Finally you can set the alignment of the image to any of the following
values:
baseline
sub
super
top
text-top
middle
bottom
text-bottom
value. For all of these attributes, you will need to surround the value
in double quotes. Here is a very simple example:
# paragraph_inline_images.py
def paragraph_inline_images():
doc = SimpleDocTemplate("paragraph_inline_images.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
flowables = []
doc.build(flowables)
if __name__ == '__main__':
paragraph_inline_images()
In this code, we create a string that sets its src to a file name. Then
explicit valign value. Then we build our flowable and save the document.
If you have some free time, I recommend trying your hand at adding some
valign values to the code above and re-running the code with different
# paragraph_numbering.py
def paragraph_numbering():
doc = SimpleDocTemplate("paragraph_numbering.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
flowables = []
# templates
seq = getSequencer()
seq.setFormat('Section', '0')
seq.setFormat('FigureNo', 'A')
doc.build(flowables)
if __name__ == '__main__':
paragraph_numbering()
Here we create a pretty simple string in the first for loop. In each
create something new: a sequencer object! You can get this object by
starting with the number zero. Then we set a formatter for a string
called “FigureNo” and tell it to start with “A”. Next we use Python’s
formatters into our string. You will note that for the FigureNo string,
we added a “+” symbol. What this means is that it will increment each
time. If you don’t supply the plus symbol, then it won’t increment that
formatter’s value.
If you run this code, you should end up with the following:
Fig. 4-6:
Adding
numbers to
paragraphs
As you can see, the Section didn’t increment at all while the
FigureNo did.
Bullets
There are times when you will want to add bullet lists to your report.
<bullet> tag. Either way, the ReportLab guide says that you
should try using the Unicode bullet first, which you can write in any of
•
&\#x2022;
\xe2\x80\xa2
# paragraph_bullets.py
def paragraph_bullets():
doc = SimpleDocTemplate("paragraph_bullets.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
flowables = []
doc.build(flowables)
if __name__ == '__main__':
paragraph_bullets()
In our first example, we set the bullet to a simple dash (-) character.
For the second example, we use the Unicode bullet character byte to
<bullet> tag. You will note that when you use the bullet tag, you
need to tell it what to use for its bullet character as well. Also note
that when you use the <bullet> tag it overrides the use of the
though.
Note that depending on the operating system you run this example on and
the version of Python you use, you may have to experiment a bit to find
a bullet type that will work for you. For example, \xe2\x80\xa2 worked on
Linux but I had to use the Unicode bullet to make it work on
Mac OS.
Wrapping Up
I hope at this point you have a good idea of just how powerful and
useful the Paragraph class is. You will be using it a lot in your
reports and you’ll even find it useful for formatting other ReportLab
to create and style our paragraphs. Then we learned about the many
Report Markup Language or RML that goes much farther than the tags
that we can use in the open source version. However I have found the
creating simple and complex tables out of your data. There is actually a
LongTable class as well. The main difference between the two is that
column widths that makes the creation of long tables faster. Anyway, a
Table object’s cells can hold anything that Python can convert to a
Flowables.
They can contain anything you can turn into a Python string.
They may contain flowables, including other tables
A Table will calculate the row heights to fit your data if you
A Table can split across pages, just like a Paragraph can. See the
different ways even when you don’t know how many rows of data will
actually be displayed.
The Table’s style and its data are separate, so you can create
subset thereof.
Table(data, colWidths=None, rowHeights=None, style=None, splitByRow=1,
spaceAfter=None)
The only required argument is the data that you want to load. However
you can also pass in the column widths (colWidths), row heights
heights of rows. You can also pass None to these arguments, which
means that the table will calculate the widths and heights itself.
with the appropriate commands) that you want applied to the table.
need it for tables that are both too tall AND too wide to fit your
they want the table to be “tiled” down and across or if they’d like it
to go across and then down. This argument is a Boolean, so you can pass
True, you are telling the Table to split itself by row before attempting
to split by column when there isn’t enough space to draw the table.
NotImplementedError.
leading rows that should be repeated when the Table is told to split. If
you choose to pass in a tuple, then that tuple needs to specify which of
you might use this to prevent splitting from occurring too close to the
kind of built-in Spacer. They allow you to specify the amount of extra
# simple_table.py
def simple_table():
doc = SimpleDocTemplate("simple_table.pdf", pagesize=letter)
story = []
tbl = Table(data)
story.append(tbl)
doc.build(story)
if __name__ == '__main__':
simple_table()
Here we create a template object and a flowable list as usual. Then we
create a list of lists which we will use as the data that we load into
our Table. Each list within the top-level list represents a row of data.
Thus there should be 3 rows of data in our Table when we run this code.
adding it to our flowable list (the story) and building the document.
When you run this code, you should end up with the following:
That was easy. Now let’s learn a bit about the Table’s methods.
drawOn
setStyle
wrapOn
The drawOn and wrapOn methods are usually used when you are
working with a Canvas object directly and you want to add a Table to
your Canvas. When you do that, then you may need to handle the splitting
of the Table itself if you want to use the Canvas to create a multi-page
document. I think this is usually a bad idea as you are basically just
re-inventing the wheel here. Just use your Table objects in PLATYPUS and
almost always want to set some kind of style on your Table. The
TableStyle(cmds=None, parent=None, **kw)
The cmds argument is just a list of tuples that define what cell
('BACKGROUND', (0, 0), (-1, 0), colors.red)
The first element in the tuple is the cell formatting command. In
The second and third elements define the cell coordinates that the
formatting will apply to. The coordinates are (column, row), so in this
the formatting to extend across all the columns from left-to-right. When
you use negative values for cell coordinates, you will basically count
backwards from the other end of the table, exactly as you would if you
you wanted to apply the cell formatting from the beginning column to the
second from the last column, you could specify -2, for example.
Anyway, when you tell ReportLab to apply the formatting from (0, 0) to
(-1, 0), what you are saying is that you want the formatting to apply to
the entire first row of the table. In this example, we want to add a red
background color to the first row. If you wanted to apply the formatting
to the entire table, then you would say from (0, 0) to (-1, -1), which
is top-left to bottom-right.
The last tuple element is the color to be applied to the cell formatting
# simple_table_with_style.py
def simple_table_with_style():
doc = SimpleDocTemplate("simple_table_with_style.pdf", pagesize=letter)
story = []
tbl = Table(data)
tbl.setStyle(tblstyle)
story.append(tbl)
doc.build(story)
if __name__ == '__main__':
simple_table_with_style()
Here we create a TableStyle that will make the first row’s background
color red. The second command we pass in is TEXTCOLOR, which will change
the text’s color to whatever we specify. In this case, we want the text
to be blue. We also want to apply this to column zero, row one all the
If you run this code, you should end up with the following table:
Let’s learn what other types of formatting we can apply to our tables!
Cell Formatting
There are several cell formatting commands that you can use to format
Command Description
ALIGNMENT (or
LEFT, RIGHT, CENTRE/CENTER or DECIMAL
ALIGN)
BACKGROUND The cell’s background color
The font name to be applied (optionally can add font
FONT
size and leading)
FONTNAME (or
The font name
FACE)
The size of the font in points (leading will likely get
FONTSIZE (or SIZE)
out of sync)
LEADING The leading space in points
TEXTCOLOR The color name string or (R,G,B) tuple
LEFTPADDING The amount of left padding as an integer (default: 6)
The amount of right padding as an integer (default:
RIGHTPADDING
6)
The amount of bottom padding as an integer (default:
BOTTOMPADDING
3)
TOPPADDING The amount of top padding as an integer (default: 3)
COLBACKGROUNDS A list of colors that ReportLab will cycle through
ROWBACKGROUNDS A list of colors that ReportLab will cycle through
VALIGN TOP, MIDDLE or the default of BOTTOM
did want to add a little more information about the BACKGROUND command.
string name or a numeric tuple / list. If you go with the last one, then
# table_background_gradient.py
def table_background_gradient():
doc = SimpleDocTemplate("table_background_gradient.pdf", pagesize=letter)
story = []
tbl = Table(data)
tbl.setStyle(tblstyle)
story.append(tbl)
doc.build(story)
if __name__ == '__main__':
table_background_gradient()
tblstyle = TableStyle([('BACKGROUND', (0, 0), (-1, 0),
])
Here we are telling the Table to apply a style on the first row. That
starts at red and goes to blue. The result ends up looking like this:
TableStyle. You can use this command to change one or more of your
# table_fonts.py
def table_fonts():
doc = SimpleDocTemplate("table_fonts.pdf", pagesize=letter)
story = []
doc.build(story)
if __name__ == '__main__':
table_fonts()
The FONT command is actually a bit different from the commands we have
others do not. In this case, we can add a font size and leading. For
this example, we set the Font face to Helvetica and the font size to
24. When you run this code, you will see something like this:
Now if you want to, you can use the FONT command and the
# table_fonts_bad.py
from reportlab.lib import colors
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle
def table_fonts_bad():
doc = SimpleDocTemplate("table_font_bad.pdf", pagesize=letter)
story = []
tbl = Table(data)
tbl.setStyle(tblstyle)
story.append(tbl)
doc.build(story)
if __name__ == '__main__':
table_fonts_bad()
Here we set the Font to the same font face but we skip setting the font
size. Then we try setting the font size by using the FONTSIZE command. If you
run this, you should see something like the following:
the source code you will find that the FONT command has a built-in
leading value or padding value that the FONTSIZE command does not.
Now let’s take a look at how we might switch between fonts in our Table:
# table_multiple_fonts.py
def table_multiple_fonts():
doc = SimpleDocTemplate("table_multiple_fonts.pdf", pagesize=letter)
story = []
tbl = Table(data)
tbl.setStyle(tblstyle)
story.append(tbl)
doc.build(story)
if __name__ == '__main__':
table_multiple_fonts()
Here we change the font face in each row. You will note that we don’t
set the font size for the first row and just let it go to its default.
The other two rows, we set the font size explicitly. Here is the result
Now let’s move on and learn how we can draw lines in our Table.
Line Commands
You might be wondering why a table doesn’t have lines around its border
or its cell’s borders. I honestly don’t know why they’re off by default,
but it’s actually pretty easy to turn them on. In fact, ReportLab has
several line commands you can use to add lines to your Tables. Here are
BOX
GRID or OUTLINE
INNERGRID
LINEBELOW
LINEABOVE
LINEBEFORE
LINEAFTER
The GRID line command is actually equivalent to applying the BOX and
INNERGRID commands. Basically BOX will apply lines to the
outside of the specified cells while INNERGRID will add lines inbetween
the cells. The other line commands allow us to add lines above, below,
before or after the specified cells. In other words, they allow you to
# table_grids.py
def table_grids():
doc = SimpleDocTemplate("table_grids.pdf", pagesize=letter)
story = []
tbl = Table(data)
tbl.setStyle(tblstyle)
story.append(tbl)
story.append(Spacer(0, 25))
doc.build(story)
if __name__ == '__main__':
table_grids()
Here we just create a simple table and add a box around the entire
table. Then we also draw lines in-between all the cells. The result ends
You will note that you can apply a line width, which is the 4th element
in the line command. In this example, we set it to 0.25, but you can set
# table_cell_lines.py
def table_cell_lines():
doc = SimpleDocTemplate("table_cell_lines.pdf", pagesize=letter)
story = []
tblstyle = TableStyle(
[('LINEABOVE', (0, 0), (-1, 0), 0.5, colors.red),
('LINEBELOW', (0, 0), (-1, 0), 1.5, colors.blue),
('LINEBEFORE', (0, 0), (0, -1), 2.5, colors.orange),
('LINEAFTER', (-1, 0), (-1, -1), 3.5, colors.green),
])
tbl = Table(data)
tbl.setStyle(tblstyle)
story.append(tbl)
doc.build(story)
if __name__ == '__main__':
table_cell_lines()
to the first row in the color red with a line thickness of 0.5. Since it
specifies the line should go above the cells, it appears at the top of
row one. The next command, LINEBELOW, specifies that we want to add
a line below the cells of row one. In this case, we add a blue line
2.5 and we tell it to be applied to the first column, so that means the
line will appear on the far left side of our Table. To balance things
out, we apply the LINEAFTER command to the last column of our table
and set it to green with a thickness of 3.5. When you run this code, you
Cell Alignment
The examples we have looked at so far do not demonstrate that the cells
default to being aligned to the bottom left of each cell. Let’s create
an example where we actually specify the column and row widths so we can
# table_default_cell_alignment.py
def table_default_cell_alignment():
doc = SimpleDocTemplate("table_default_cell_alignment.pdf", pagesize=letter)
story = []
doc.build(story)
if __name__ == '__main__':
table_default_cell_alignment()
The main difference to this code versus the ones we have seen previously
a list of rowWidths. When you run this example, your output should
Now let’s update our example to apply cell alignment to some of the
# table_cell_alignment.py
def table_cell_alignment():
doc = SimpleDocTemplate("table_cell_alignment.pdf", pagesize=letter)
story = []
doc.build(story)
if __name__ == '__main__':
table_cell_alignment()
Here we tell our Table to center the cells in column one, which will
result in the text being centered on the bottom of the cell. Next we
tell the Table to put the second column’s cells in the middle via the
VALIGN command. What this does is put all the text in column two in
the middle left. The next two commands tell the Table to ALIGN the third
column’s cells to the CENTER and to VALIGN them in the MIDDLE. This
results in the text in those cells being in the exact center of the
cells. We leave column four at its default position and apply ALIGN
RIGHT to the final column, which puts the text on the bottom right of
the cell.
cells.
financial reports where you will see a lot of reports that have white
alternating with gray or occasionally, a light green. Let’s learn how
# table_alternating.py
def table_alternating():
doc = SimpleDocTemplate("table_alternating.pdf", pagesize=letter)
story = []
rs.white]),
('COLBACKGROUNDS', (0,0), (-1,-1),
[colors.red, colors.white, colors.blue])
])
tbl = Table(data)
tbl.setStyle(tblstyle)
story.append(tbl)
story.append(Spacer(0, 25))
doc.build(story)
if __name__ == '__main__':
table_alternating()
rows, we tell ReportLab to alternate between gray and white. Then for
the columns, we pass in red, white and blue. When you run this example,
As you can see, changing row or column colors is quite easy with
ReportLab.
strings. Let’s create a Paragraph with some formatting applied and add
# table_paragraph.py
def table_paragraph():
doc = SimpleDocTemplate("table_paragraph.pdf", pagesize=letter)
story = []
styles = getSampleStyleSheet()
tbl = Table(data)
tbl.setStyle(tblstyle)
story.append(tbl)
doc.build(story)
if __name__ == '__main__':
table_paragraph()
Here we extend our previous example by adding some imports to get the
Paragraph class and a paragraph style sheet. Then we format some text so
that it has one word that is blue and in 14 point font. If you run this
Now sometimes you want to also add images to your Tables. We will look
at that next.
One method would be to insert an image into a Paragraph object and then
add that Paragraph to a cell. When you do that, you will find that the
fit the image. Instead, the best way to insert an image into a Table
# table_with_images.py
from reportlab.lib import colors
from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle
from reportlab.platypus import Paragraph, Image
def table_with_images():
doc = SimpleDocTemplate("table_with_images.pdf", pagesize=letter)
story = []
styles = getSampleStyleSheet()
tbl = Table(data)
tbl.setStyle(tblstyle)
story.append(tbl)
doc.build(story)
if __name__ == '__main__':
table_with_images()
This example builds on the previous example by importing the Image flowable
and then creating an image that we set to a height and width of
50 points. Then we add that object to row three 5 times so that we have
Spanning
Any time you are dealing with tabular data you will find occasions where
(SPAN, (begin_col, begin_row), (end_col, end, row))
For this command, we just tell ReportLab’s Table that we want it to SPAN
or combine the cells starting with (col, row) and ending with another
# table_cell_spanning.py
tbl = Table(data)
tbl.setStyle(tblstyle)
story.append(tbl)
doc.build(story)
if __name__ == '__main__':
table_cell_spanning()
As with all of the commands we have looked at, you can always apply both
positive and negative numbers. In this case, we add grid lines to make
the cell combining more obvious. Then we call the SPAN command and tell
combine that cell with (column 1, last row). In other words, we want the
bottom left two rows to combine or span. If you run this code, you
Fig. 5-14: A Table with cell spanning
work. Feel free to try it out with various other beginning and ending
sets of coordinates.
Wrapping Up
We covered a lot of information about ReportLab’s Table in this chapter.
and columns. Another topic we covered was how to create complex cell
Preformatted
XPreformatted
Image
Spacer
PageBreak
CondPageBreak
KeepTogether
TableOfContents
SimpleIndex
ListFlowable
Preformatted
ReportLab has a couple of Flowables that are somewhat related called
ReportLab’s docstring:
It attempts to display text exactly as you typed it in a fixed width
“typewriter” font. By default the line breaks are exactly where you
put them, and it will not be wrapped. You can optionally define a
maximum line length and the code will be wrapped; and extra characters
Preformatted(text, style, bulletText=None,
newLineChars=None)
create an instance of this Flowable. You can also turn it into a bullet
you happen to set dedent, then any common leading space will be removed.
For example, if you were to set the dedent to 5, then it would dedent or
you define, the line will be automatically split. If you don’t set a
maximum line length and your line is too long for the page, then your
line will just draw until it reaches the edge of the page and then
parenthesis, left square bracket and left curly brace. Of course, you
character.
# preformatted_paragraph.py
def preformatted_paragraph():
doc = SimpleDocTemplate("preformatted_paragraph.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
flowables = []
doc.build(flowables)
if __name__ == '__main__':
preformatted_paragraph()
in it. Both have some XML tags in their text strings. When you run this,
Paragraph’s XML tags, so they are completely visible and have no effect
on the text.
XPreformatted
The XPreformatted Flowable is actually a sub-class of the
rearranging form of the Paragraph class”. It will actually work with XML
tags in the same manner that they work with a regular Paragraph. To
XPreformatted(text, style, bulletText=None,
dedent=0, frags=None)
The parameters here work the same way that they do when using a
# xpreformatted_paragraph.py
def xpreformatted_paragraph():
doc = SimpleDocTemplate("xpreformatted_paragraph.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
flowables = []
doc.build(flowables)
if __name__ == '__main__':
xpreformatted_paragraph()
This Flowable works pretty much the same was as the Preformatted one.
Frankly I personally use the Paragraph Flowable the most and
unless you need one of these, I highly recommend sticking with it.
Image(filename, width=None, height=None)
As you can see, the Image class takes 3 arguments: The filename, the
width and the height of the image. The filename argument is required
and can be a file path, a file-like object or an instance of
supports the jpeg format. However if you have the Pillow (or PIL)
package installed, then most other image types are also supported. The
will just assume that the other dimension of the image is in points too
and use it as is. What this means is that if you only specify one of the
dimensions, the aspect ratio of the image will not be maintained and the
# image_demo.py
def full_size_image():
doc = SimpleDocTemplate("image_full_size.pdf", pagesize=letter)
story = []
img = Image("snakehead.jpg")
story.append(img)
doc.build(story)
def no_scaling():
doc = SimpleDocTemplate("image_no_scaling.pdf", pagesize=letter)
story = []
img = Image("snakehead.jpg", width=50)
story.append(img)
doc.build(story)
if __name__ == '__main__':
full_size_image()
no_scaling()
This demo will create two PDFs. The first one will create a PDF with the
squished-looking manner.
Fig. 6-2: A full size image
Fig. 6-3: An image with no scaling
Now let’s create another script that can actually maintain the aspect
# scaled_image.py
def scaled_image(desired_width):
doc = SimpleDocTemplate("image_with_scaling.pdf", pagesize=letter)
story = []
img = utils.ImageReader('snakehead.jpg')
img_width, img_height = img.getSize()
aspect = img_height / float(img_width)
img = Image("snakehead.jpg",
width=desired_width,
height=(desired_width * aspect))
img.hAlign = 'CENTER'
story.append(img)
doc.build(story)
if __name__ == '__main__':
scaled_image(50)
then use the ImageReader class from the utils submodule to get
an image object. It will use Pillow to get information about the image.
Then we can call it’s getSize method to get the image’s width and
height. Once we have that, we can use a simple calculation to get the
this case, we want the image to be 50 points wide. When you run this
You can also set the Image’s horizontal alignment via it’s hAlign attribute. We
demonstrated that in our scaled demo above.
Spacer(width, height)
Note that the width parameter is ignored currently and has been for
many years. The only use of the Spacer is for adding vertical space that
simple example:
# spacer_demo.py
def no_spacers():
doc = SimpleDocTemplate("no_spacers.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
flowables = []
for p in range(3):
text = "<para align=center>Hello, I'm a Paragraph</para>"
para = Paragraph(text, style=styles["Normal"])
flowables.append(para)
doc.build(flowables)
def use_spacers():
doc = SimpleDocTemplate("use_spacers.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
flowables = []
for p in range(3):
text = "<para align=center>Hello, I'm a Paragraph</para>"
para = Paragraph(text, style=styles["Normal"])
flowables.append(para)
spacer = Spacer(width=0, height=50)
flowables.append(spacer)
doc.build(flowables)
if __name__ == '__main__':
no_spacers()
use_spacers()
Fig. 6-6: Paragraphs
with spacing
I believe you will find this simple Flowable very useful in your own
code.
PageBreak
We mentioned the PageBreak Flowable back in chapter 3 and as its
name implies, it will add a page break to your document by consuming all
the vertical space left on the page. This will cause any other Flowables
that are added after the PageBreak to appear on the following page. Note
that case, ReportLab would detect that as a Frame break and the Flowable
would continue in the next Frame, which could be on the same page. The
# page_break.py
def page_break():
doc = SimpleDocTemplate("page_break.pdf")
styles = getSampleStyleSheet()
flowables = []
pagebreak = PageBreak()
flowables.append(pagebreak)
doc.build(flowables)
if __name__ == '__main__':
page_break()
All this code does is create two Paragraphs with a PageBreak between
each page.
CondPageBreak
The CondPageBreak is a Flowable that applies to Frames and will
attempt to force a break in a Frame. In fact, the documentation says
that this Flowable is poorly named and really ought to have been called
CondFrameBreak.
CondPageBreak(height)
Personally I have never had the need for one of these nor seen one in
the wild.
KeepTogether
The KeepTogether Flowable is a class that takes a list of other
Flowables and will attempt to keep the entire list in the same
Frame.
KeepTogether(flowables)
If the list of Flowables exceeds the height of the Frame in which they
How to Add a Table of Contents
ReportLab has a neat Flowable called TableOfContents that will allow
from reportlab.lib.styles import ParagraphStyle
from reportlab.platypus.tableofcontents import TableOfContents
toc = TableOfContents()
replace them with your own. You can do that like this:
from reportlab.lib.styles import ParagraphStyle
from reportlab.platypus.tableofcontents import TableOfContents
toc = TableOfContents()
toc.levelStyles = [
ParagraphStyle(name = 'Heading1',
fontSize = 16,
leading = 16),
ParagraphStyle(name = 'Heading2',
fontSize = 12,
leading = 14),
]
As you can see, all you need to do is pass in some ParagraphStyles to add your
styles. According to the documentation, you can use the
user guide:
# toc_creator.py
class MyDocTemplate(BaseDocTemplate):
def __init__(self, filename, **kw):
self.allowSplitting = 0
BaseDocTemplate.__init__(self, filename, **kw)
template = PageTemplate('normal', [Frame(2.5*cm, 2.5*cm, 15*cm, 25*cm, i\
d='F1')])
self.addPageTemplates(template)
def main():
heading1_style = ParagraphStyle(name = 'Heading1',
fontSize = 16,
leading = 16)
heading2_style = ParagraphStyle(name = 'Heading2',
fontSize = 12,
leading = 14,
)
doc = MyDocTemplate('toc.pdf')
doc.multiBuild(story)
if __name__ == '__main__':
main()
things in the __init__. Then we override the afterFlowable method. The key
point here is that we call self.notify with a
object that the template has detected an entry that should be added to
the table of contents. In the notification, you will need to pass the
The destination key is a bookmark that should make those items that have
it into clickable links. The rest of the code in this example lies in
You will note that we need to call the multiBuild method on our
is because ReportLab will need to make several passes over the document
SimpleIndex
ReportLab has basic support for adding an index to your PDFs as well.
word in ReportLab, you will need to use the <index> tag along
with the item attribute, like this:
ptext = """I'm a custom <index item="bulletted"/>bulletted paragraph"""
This will tell ReportLab that you would like to add an index for the
# simple_index.py
def simple_index():
doc = SimpleDocTemplate("simple_index.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
flowables = []
doc.build(flowables, canvasmaker=index.getCanvasMaker())
if __name__ == '__main__':
simple_index()
create our index object. Finally, to add the SimpleIndex you need to
doc.build(flowables, canvasmaker=index.getCanvasMaker())
This will make the document builder use the index’s canvas maker object.
When you run this, you should end up with an index that looks like this:
Fig. 6-8: A simple
index
Note that you can supply some other items to the SimpleIndex constructor when
you instantiate it, but the documentation doesn’t
really explain what it does. However you can kind of get a gist for how
Looking at that and the user guide tells us that we could change the
index = SimpleIndex(dot='.')
If you do that then your index will now look like this:
I actually think this is more visually appealing than just using the
defaults for SimpleIndex. You can also create an index that contains
<index item="foo,bar,word"/>
The first term will be at the top-most level while the last will be at
the lowest. If you have additional lists of indexed terms, they will
One other thing of note is that if you happen to have the need to index
a term that has a comma in it, then you will need to escape it by
ListFlowable / ListItem
The last Flowables we will be covering in this chapter is the
can be used to make ordered and unordered lists. You can also nest the
lists. The cool thing about the ListFlowable is that it can contain
any other Flowable and will use them to create the ordered list. You can
also change the font, color, size, style and position of the list number
or the bullets in unordered lists. You may also change the type of
# list_flowable_demo.py
def list_flowable_demo():
doc = SimpleDocTemplate("list_flowable_demo.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
normal = styles['Normal']
story = []
flowables = [
Paragraph('Paragraph numero uno', normal),
ListItem(Paragraph('Paragraph #2', normal),
bulletColor="blue", value=5),
Paragraph('Paragraph #3', normal),
]
lflow = ListFlowable(flowables)
story.append(lflow)
doc.build(story)
if __name__ == '__main__':
list_flowable_demo()
This will create an ordered list which some might call a numbered list.
is that the ordered list will start at “1” but the second item will
its value. Since the value got changed to a “5”, the third item
code:
Now let’s enhance the previous code so that we can add some sub-list
# list_flowable_squares.py
def list_flowable_squares():
doc = SimpleDocTemplate("list_flowable_squares.pdf",
pagesize=letter
)
styles = getSampleStyleSheet()
normal = styles['Normal']
story = []
flowables = [
Paragraph('Paragraph numero uno', normal),
ListItem(Paragraph('Paragraph #2', normal),
bulletColor="blue"),
Paragraph('Paragraph #3', normal),
]
flowables.append(
ListFlowable(
[Paragraph("I'm a sublist item", normal),
ListItem(Paragraph("I'm another sublist item", normal),
bulletColor='blue'),
ListItem(Paragraph("I'm the last sublist item", normal),
bulletColor='red')
],
bulletType='bullet',
start='square'
))
doc.build(story)
if __name__ == '__main__':
list_flowable_squares()
The first half of the code is pretty much the same as we saw previously.
two ListItems. For this nested ListFlowable, we set its bulletType to ‘bullet’
and its shape to square by assigning square to
different.
For those that are curious, you can open up flowables.py in the
_bulletNames = dict(
square=u'\u25a0',
disc=u'\u25cf',
diamond=u'\u25c6',
diamondwx=u'\u2756',
rarrowhead=u'\u27a4',
sparkle=u'\u2747',
squarelrs=u'\u274f',
blackstar=u'\u2605',
As you can see, there 10 bullet names and the example we did only used
the square bullet name. Feel free to edit the code and try running
Wrapping Up
In this chapter we learned about how to create the following Flowables:
Preformatted
XPreformatted
Image
Spacer
PageBreak
CondPageBreak
KeepTogether
TableOfContents
SimpleIndex
ListFlowable
and PageBreak Flowables the most. The rest are used primarily for
very specific use cases that I don’t believe happen all that often.
However you should now know enough about each of these Flowables to be
own custom Flowables whenever you need to. You can look at the Flowables
that are included with ReportLab for ideas. There is also a chapter in
the ReportLab User Guide that has a fun example of creating a custom
Flowable that looks like a hand. You might find some user contributed
In this chapter we will look at how to create two very basic custom
Flowables that should help you see how easy it is to create your own.
# line_flowable.py
class MyLineFlowable(Flowable):
"""
A Custom Line Flowable
"""
def __repr__(self):
return "Line(w=%s)" % self.width
def draw(self):
"""
draw the line
"""
self.canv.line(0, self.height, self.width, self.height)
height as its sole arguments. Then we override the __repr__ method to make the
output from printing our instances of this Flowable
line on the Flowable’s canvas using the width and height that were
passed in.
Now let’s create some code that actually uses our custom Flowable:
# line_flowable_demo.py
def create_line_flowable():
"""
Create a pdf
"""
story = []
doc = SimpleDocTemplate("create_line_flowable.pdf", pagesize=letter)
styles = getSampleStyleSheet()
spacer = Spacer(0, 0.25*inch)
line = MyLineFlowable(500)
story.append(line)
story.append(spacer)
doc.build(story)
if __name__ == "__main__":
create_line_flowable()
The first thing we do is import our custom Flowable. Then we import the
various other bits and pieces we need from ReportLab itself. Next we
Spacer object. Then we add a Paragraph and a spacer to our story list.
line = MyLineFlowable(500)
This will draw a line that is 500 points long! We add it to our story
list just like we did with our previous Flowables. Then we ad a spacer
and another Paragraph before finally generating our PDF, which ends up
This looks pretty good, but I think it’s kind of a boring Flowable. So
a line that went from the top of the box off to the right and I needed
-----------------------------------------
| foobar |
----------
As you may have guessed, this custom Flowable is a little bit more
that much more code to create a really simple version. Let’s take a
look!
# boxy_line.py
class BoxyLine(Flowable):
"""
Draw a box + line + text
-----------------------------------------
| foobar |
---------
"""
def draw(self):
"""
Draw the shape, text, etc
"""
self.canv.rect(self.x, self.y, self.width, self.height)
self.canv.line(self.x, 0, 500, 0)
self.canv.drawString(self.x+5, self.y+3, self.text)
This time when we create our class, we set it up to take five arguments:
x, y, width, height and text. The x and y are for helping in positioning
the Flowable’s lines correctly. The width and height are for setting the
# boxy_line_demo.py
def create_boxy_line_flowable():
"""
Create a pdf
"""
doc = SimpleDocTemplate("boxy_line_flowable.pdf",pagesize=letter)
story=[]
box = BoxyLine(text="foo")
story.append(box)
story.append(Spacer(0, 1*inch))
box = BoxyLine(text="bar")
story.append(box)
doc.build(story)
if __name__ == "__main__":
create_boxy_line_flowable()
Here we import our BoxyLine Flowable class and then we create two
instances of it. We stick a Spacer object between them to give them some
page real estate and then we build the story and generate the PDF. This
is the result:
I think it turned out pretty well. One of the immediate improvements you
string that was passed into it and adjust the bounding rectangle
in a way that the default version doesn’t appear to do. You can easily
subclass a pre-existing Flowable like the Paragraph or the Image and just
override its methods to make it do something different.
Just take a look at the Flowable you would like to change and see what
methods it has. Then you can make an educated guess which ones you will
need to change to make your new Flowable work the way you want it to.
The ReportLab User Guide has one example where they subclassed the
example.
Wrapping Up
At this point you should have a pretty good idea how to create your own
Flowables and subclass a pre-existing one. This gives you a lot of power
as a developer as you can create pretty much any kind of drawing with
What is a Drawing?
ReportLab’s definition of a drawing is a platform-independent
PostScript. On the plus side, most systems that create vector graphics
have all followed the model that Postscript uses, so it is actually easy
unambiguous way.
All drawings contain shapes. The ones you learned back in primary school
drawings. There is another shape that you will see in computer programs
called a Group, which is a container of other shapes that you can
manipulate via transforms. If you have ever used Microsoft Publisher or Adobe
Photoshop, then you may have seen that you can group images
and then manipulate them as a whole (i.e. resize the group, flip or turn
them, etc). This is what I meant when I mentioned the word transform.
ReportLab’s Renderers
ReportLab has several Renderers that you can use to draw your
drawings. In other words, ReportLab can take your drawings and output
your drawings in PDF using the renderPDF renderer. You can also
and Fredrik Lundh’s Python Imaging Library (PIL). Note however that
you cannot install PIL in Python 3. Instead ReportLab uses Alex Clark’s
Pillow fork of PIL for that. In fact, you can use Pillow for Python
which means that its API is the same as PIL’s. There is also an
modules. What this all means is that if you have all of these extensions
The Drawings in ReportLab are also Flowables, so you can add them to
your Platypus documents as you would any other Flowable. You can also
too. Another neat feature is that you can use a utility function to
As an aside, the SVG renderer that I mentioned has not had any
optimization applied to it and it can only use the drawing features that
ReportLab Graphics supplies. What this means is that the SVG format that
defaults to which is that the Y-direction is from the bottom up. Some
graphics models (i.e. SVG) use the opposite coordinate system where the
Y-coordinate points down. The SVG renderer that ReportLab provides will
ReportLab.
A Simple Drawing
After all that information, you’re probably starting to wonder if we
would ever get to actually learning how to draw in ReportLab. You will
be happy to know that that time is now. Let’s create a simple drawing
# simple_drawing.py
def simple_drawing():
drawing = shapes.Drawing(400, 400)
drawing.add(shapes.Circle(200, 200, 100, fillColor=colors.red))
renderPDF.drawToFile(drawing, 'simple_drawing.pdf', msg='My Drawing')
if __name__ == '__main__':
simple_drawing()
Fig. 8-1: An example drawing
Supported Outputs
Drawings support the following output formats:
bmp
eps
gif
jpeg
jpg
pct
pdf
pict
png
ps
py
svg
tif
tiff
tiff1
tiffl
tiffp
You can change the code in the previous example to easily export PNG for
# simple_drawing2.py
def simple_drawing():
drawing = shapes.Drawing(400, 400)
drawing.add(shapes.Circle(200, 200, 100, fillColor=colors.red))
renderPM.drawToFile(drawing, 'simple_drawing.png', 'PNG')
if __name__ == '__main__':
simple_drawing()
You will note that each renderer has a similar API. For example, if you
drawToFile method.
Verifying Attributes
ReportLab has some attribute verification built-in to their Graphics
you can easily add an attribute to any of you objects that may or may
>>> class MyRect:
pass
>>>
>>> rect.fullColor
'green'
To avoid runtime and logic errors, ReportLab will run a check on every
example:
>>> from reportlab.graphics import shapes
>>> reportlab_rect = shapes.Rect(5, 5, 100, 100)
>>> reportlab_rect.fullColor = 'red'
Traceback (most recent call last):
Python Shell, prompt 16, line 1
File "/usr/lib/python3/dist-packages/reportlab/graphics/shapes.py", line 361, \
in __setattr__
validateSetattr(self,attr,value) #from reportlab.lib.attrmap
File "/usr/lib/python3/dist-packages/reportlab/lib/attrmap.py", line 115, in v\
alidateSetattr
raise AttributeError("Illegal attribute '%s' in class %s" % (name, obj.__cla\
ss__.__name__))
builtins.AttributeError: Illegal attribute 'fullColor' in class Rect
>>> import reportlab.rl_config
>>> reportlab.rl_config.shapeChecking = 0
tests to verify that your code is doing what you expect it to do.
Naming Shapes
When you add shapes to your Drawing or Group objects, you can
give them a name. This makes it possible for the developer to easily
access the shape after it was created and also allow you to change any
>>> from reportlab.graphics import shapes
>>> my_drawing = shapes.Drawing(width=400, height=200)
>>> circle = shapes.Circle(50, 50, 100)
>>> my_drawing.add(circle, name='my_circle')
>>> circle.cx
50
>>> my_drawing.my_circle.cx
50
The ReportLab guide points out that you can use the same shape instance
in multiple locations of your code base and even use different names to
refer to the same instance. If you make a change to the object, it will
affect all the names since they all refer to the same thing. This is
via the Graphics sub-module. Here is a list of the types you can
currently create:
Bar charts
Line charts
Line plots
Pie charts
developer to tell ReportLab where to put the plot on the page, add
several copies of the same plot or even add annotations to the plot.
Each chart has two axes. These axes may be set to either Value or
Category. Axes have a property called Label that you can use to
configure all the text labels in your chart at once or individually
change them. When it comes to charts, you will find that most of their
One other thing to note about charts is that they can be easily
Labels
The ReportLab Graphics sub-module provides a Label class that you can
basically just a string of text. You may use a Label for axes, titles or
data points. A Label may contain newline characters, but are limited to
When creating a Label, you will typically set its parent object via
Label methods. You can set properties of your Label (or labels) to
formatting.
Here is a table that describes the properties that are Label specific:
Property Meaning
dx The x displacement of the label
dy The y displacement of the label
angle The angle of rotation (counterclockwise) applied to the label
The label’s box anchor, one of ‘n’, ‘e’, ‘w’, ‘s’, ‘ne’, ‘nw’,
boxAnchor
‘se’, ‘sw’
The place where to anchor the label’s text, one of ‘start’,
textAnchor
‘middle’, ‘end’
boxFillColor The fill color used in the label’s box
boxStrokeColor The stroke color used in the label’s box
boxStrokeWidth The line width of the label’s box
fontName The font name of the label
fontSize The font size of the label
leading The leading value of the label’s text lines
x The X-coordinate of the reference point
y The Y-coordinate of the reference point
width The label’s width
height The label’s height
# simple_label.py
def simple_label():
drawing = shapes.Drawing(width=400, height=200)
x = 50
angle = 0
for item in range(3):
label = Label()
label.setOrigin(200, 100)
label.boxAnchor = 'se'
label.angle = angle
#label.boxStrokeColor = colors.black
label.setText('ReportLab label')
drawing.add(label)
x += 25
angle += 45
renderPDF.drawToFile(drawing, 'simple_label.pdf')
if __name__ == '__main__':
simple_label()
Here we create our Drawing object and then add a Rectangle via the
Rect class. This rectangle will help us mark the origin point for
our Label. Next we create 3 Label objects. For each Label, we set
its origin to the same location and anchor them to se, or South-East.
Then we set the angle for the Label and the Label’s text value. Finally
we add it to our Drawing. You will also note that we update the
will see that ReportLab draws a rectangular box around each of the
Labels. I thought this was less visually appealing for this example, so
I commented it out.
Feel free to play around with any of these settings and see how it
Axes
Earlier in this chapter, we learned that ReportLab supports two kinds of
Let’s create a simple demo that shows how you can create some axes:
# axes_demo.py
def axes_demo():
drawing = shapes.Drawing(width=500, height=300)
x_axis = XCategoryAxis()
x_axis.setPosition(100, 100, 350)
x_axis.configure(data, barWidth=None)
x_axis.categoryNames = ['Python', 'Ruby', 'C++', 'Haskell']
x_axis.labels.boxAnchor = 'n'
x_axis.labels[0].angle = 45
x_axis.labels[0].fontName = 'Times-Bold'
x_axis.labels[1].fontName = 'Courier'
x_axis.labels[1].fontSize = 16
drawing.add(x_axis)
y_axis = YValueAxis()
y_axis.setPosition(75, 75, 150)
y_axis.configure(data)
drawing.add(y_axis)
renderPDF.drawToFile(drawing, 'axes_demo.pdf')
if __name__ == '__main__':
axes_demo()
Next we create our Drawing object and then we create a list with two
tuples in it. We will use these tuples to configure out axes. The first
for the labels. When you configure with the data, you are telling
ReportLab to figure out how to position the axis’s labels itself. The
last few lines for the X axis shows how to configure the first and
second labels. For the first label, we change its angle to 45 degrees
The next step is to create our YValueAxis. Then we set its position
and configure it the same way we did for the X axis. When you run this
Normally you do not create axes like this though. Instead, you will
create a chart or plot and that widget will generate the axes for you.
The last three properties are all uses for joining the two axes
properties in reportlab/graphics/axes.py.
use to join the first axis to the other axis. You will note that it
contains mode and pos parameters that are the same as the ones
Let’s take a few minutes to learn a bit more about how each of these
The XCategoryAxis
The Category Axis divides itself into equally sized “buckets” based on
Value axis is. You can set the location of a category axis by using the
can also use this method to do the same thing. The next step in creating
a Category Axis is to give it some data by calling its configure method. You can
set the axis’s reversed attribute to 1 (or True) to
tickUp - Sets how far above the axis the tick marks
tickDown - Sets how far below the axis the tick marks should
protrude
each of the categories on your axis. You can edit any property of
The YValueAxis
The left axis in your charts is known as the YValueAxis. A Value
versus the Category Axis, which maps data along the x value plane
that you give to it. It will also convert Y values to points to help the
You can use the setPosition and configure methods of the Value
Axis in the same way that you did for the Category Axis. If you don’t
set the max, min and tick interval, then when you call configure(),
You can use the scale method to convert y data values to your
drawing space.
The default is that the largest data point will be at the top of the
axis and the lowest will be at the bottom. The axis will choose “nice
round numbers” to use for it tick marks. You can override any the
tickLeft - Sets how far to the left of the axis the tick marks
tickRight - Sets how far to the right of the axis the tick
set to. Default is None, which means that the axis sets the
valueMax - The y value that the axis’s top should be set to.
Default is None, which means that the axis sets the maximum to the
so the axis will try to set the step to the best group of “round
spacing in points.
Bar Charts
ReportLab supports several different bar charts. You can create the
following:
VerticalBarChart
VerticalBarChart3D
HorizontalBarChart
HorizontalBarChart3D
Each of these bar charts utilize the Labels and Axes that we learned
about in the previous few sections of this chapter. Let’s take a look at
a simple example:
# simple_bar_chart.py
from reportlab.lib.colors import PCMYKColor
from reportlab.graphics.shapes import Drawing
from reportlab.graphics.charts.barcharts import VerticalBarChart
def simple_bar_chart():
d = Drawing(280, 250)
bar = VerticalBarChart()
bar.x = 50
bar.y = 85
data = [[1, 2, 3, None, None, None, 5],
[10, 5, 2, 6, 8, 3, 5],
[5, 7, 2, 8, 8, 2, 5],
[2, 10, 2, 1, 8, 9, 5],
]
bar.data = data
bar.categoryAxis.categoryNames = ['Year1', 'Year2', 'Year3',
'Year4', 'Year5', 'Year6',
'Year7']
bar.bars[0].fillColor = PCMYKColor(0,100,100,40,alpha=85)
bar.bars[1].fillColor = PCMYKColor(23,51,0,4,alpha=85)
d.add(bar, '')
if __name__ == '__main__':
simple_bar_chart()
the same as what we learned about in the Label and Axes sections of this
chapter. That is until we reach the end, where we see the following:
d.save(formats=['pdf'], outDir='.', fnRoot='simple_bar_chart')
Here we just call the Drawing’s save command and tell it we want to
save in PDF format. We also tell it to save it to the folder that the
code is run from and we give it a name via the fnRoot parameter. If
You could import the renderPDF module and use that instead like
this:
from reportlab.graphics import renderPDF
renderPDF.drawToFile(d, 'simple_bar_chart.pdf')
Or you could just add the chart to a list of Flowables and use a
def simple_bar_chart_flowable():
d = Drawing(280, 250)
bar = VerticalBarChart()
bar.x = 50
bar.y = 85
data = [[1, 2, 3, None, None, None, 5],
[10, 5, 2, 6, 8, 3, 5],
[5, 7, 2, 8, 8, 2, 5],
[2, 10, 2, 1, 8, 9, 5],
]
bar.data = data
bar.categoryAxis.categoryNames = ['Year1', 'Year2', 'Year3',
'Year4', 'Year5', 'Year6',
'Year7']
bar.bars[0].fillColor = PCMYKColor(0,100,100,40,alpha=85)
bar.bars[1].fillColor = PCMYKColor(23,51,0,4,alpha=85)
d.add(bar, '')
doc = SimpleDocTemplate("simple_bar_chart_flowable.pdf"
)
story = []
story.append(d)
doc.build(story)
if __name__ == '__main__':
simple_bar_chart_flowable()
then you may need to adjust the Drawing object’s size a bit to better
fit your chart.
of numbers.
that it has enough room for its axis labels and ticks.
Default: None
Default: 5. If you have only one series of data, use this property
formatting information)
formatting information)
Let’s try updating the code a bit so that the chart’s labels are angled
# simple_bar_chart_angled_labels.py
def simple_bar_chart_angled_labels():
d = Drawing(280, 250)
bar = VerticalBarChart()
bar.x = 50
bar.y = 85
data = [[1, 2, 3, None, None, None, 5],
[10, 5, 2, 6, 8, 3, 5],
[5, 7, 2, 8, 8, 2, 5],
[2, 10, 2, 1, 8, 9, 5],
]
bar.data = data
bar.categoryAxis.categoryNames = ['Year1', 'Year2', 'Year3',
'Year4', 'Year5', 'Year6',
'Year7']
bar.bars[0].fillColor = colors.green
bar.bars[1].fillColor = colors.blue
bar.bars[2].fillColor = colors.red
bar.bars[3].fillColor = colors.purple
bar.categoryAxis.labels.angle = 45
bar.categoryAxis.labels.dy = -15
bar.groupSpacing = 12
bar.barSpacing = 2
d.add(bar, '')
doc = SimpleDocTemplate('simple_bar_chart_angled_labels.pdf')
story = []
story.append(d)
doc.build(story)
if __name__ == '__main__':
simple_bar_chart_angled_labels()
Here we set all the labels in the chart to a 45 degree angle and we set
it’s y displacement at -15 points to move the labels away from the axis.
You will note that we also updated the bar chart’s colors to use just
named colors from ReportLab’s color module rather than continuing to use
PCMYKColor. We also update the bar and group spacing via the
Fig. 8-5: A simple bar chart with angled labels
A bar chart’s labels are automatically shown for negative numbers below
the lower end of bar and for positive values above the upper end. You
can add stacked bars by setting the categoryAxis’s style attribute to “stacked”.
Here is some example code:
# simple_stacked_bar_chart.py
def simple_stacked_bar_chart():
"""
Creates a bar chart in a PDF
"""
d = Drawing(280, 250)
bar = VerticalBarChart()
bar.x = 50
bar.y = 85
data = [[1, 2, 3, None, None, None, 5],
[10, 5, 2, 6, 8, 3, 5]
]
bar.data = data
bar.categoryAxis.categoryNames = ['Year1', 'Year2', 'Year3',
'Year4', 'Year5', 'Year6',
'Year7']
bar.bars[0].fillColor = colors.green
bar.bars[1].fillColor = colors.blue
bar.categoryAxis.labels.angle = 45
bar.categoryAxis.labels.dy = -15
bar.categoryAxis.style = 'stacked'
d.add(bar, '')
doc = SimpleDocTemplate('simple_stacked_bar_chart.pdf')
story = []
story.append(d)
doc.build(story)
if __name__ == '__main__':
simple_stacked_bar_chart()
When you run this code, you should see the following:
a VerticalBarChart object:
# simple_vertical_3d_bar_chart.py
from reportlab.lib import colors
from reportlab.graphics.shapes import Drawing
from reportlab.graphics.charts.barcharts import VerticalBarChart3D
from reportlab.platypus import SimpleDocTemplate
def simple_vertical_3d_bar_chart():
d = Drawing(280, 250)
bar = VerticalBarChart3D()
bar.x = 50
bar.y = 85
bar.height = 225
bar.width = 350
data = [[1, 2, 3, None, None],
[10, 5, 2, 6, 8],
[5, 7, 2, 8, 8],
[2, 10, 2, 1, 8],
]
bar.data = data
bar.categoryAxis.categoryNames = ['Year1', 'Year2', 'Year3',
'Year4', 'Year5', 'Year6',
'Year7']
bar.bars[0].fillColor = colors.green
bar.bars[1].fillColor = colors.blue
bar.bars[2].fillColor = colors.red
bar.bars[3].fillColor = colors.purple
bar.categoryAxis.labels.angle = 45
bar.categoryAxis.labels.dy = -15
d.add(bar, '')
doc = SimpleDocTemplate('simple_vertical_3d_bar_chart.pdf')
story = []
story.append(d)
doc.build(story)
if __name__ == '__main__':
simple_vertical_3d_bar_chart()
This code is virtually the same as the last with the exception of
switching to a different chart type and updating the chart’s width and
height. Here is the output:
# simple_horizontal_bar_chart.py
def simple_horizontal_bar_chart():
d = Drawing(280, 250)
bar = HorizontalBarChart()
bar.x = 50
bar.y = 85
bar.height = 225
bar.width = 250
data = [[1, 2, 3, None, None],
[10, 5, 2, 6, 8],
[5, 7, 2, 8, 8],
[2, 10, 2, 1, 8],
]
bar.data = data
bar.categoryAxis.categoryNames = ['Year1', 'Year2', 'Year3',
'Year4', 'Year5', 'Year6',
'Year7']
bar.bars[0].fillColor = colors.green
bar.bars[1].fillColor = colors.blue
bar.bars[2].fillColor = colors.red
bar.bars[3].fillColor = colors.purple
bar.categoryAxis.labels.angle = 45
bar.categoryAxis.labels.dx = -15
d.add(bar, '')
doc = SimpleDocTemplate('simple_horizontal_bar_chart.pdf')
story = []
story.append(d)
doc.build(story)
if __name__ == '__main__':
simple_horizontal_bar_chart()
Fig. 8-8: A simple horizontal bar chart
Line Charts
A Line Chart in ReportLab is basically the same as their Bar Chart
except that they are drawn with lines. The Line and Bar charts both use
Category and Value axes whereas Line Plots use two Value axes. You can
Property Meaning
data The data to be plotted (list of lists of numbers)
x, y, width, The rectangle around the line chart. Note: x/y specify the
height bottom left corner
valueAxis The value axis (see axes section)
categoryAxis The category axis (see axes section)
If set, it will draw a border around the plot. Default is None.
strokeColor
Axes will overwrite this
If set, it will fill the plot’s rectangle with a solid color.
fillColor
Default is None
lines.strokeColor The color of the line
lines.strokeWidth The width of the line
A collection (i.e. 2-dimensional array) of labels used for
lineLabels
formatting
Default is None. You may supply a function or format string
lineLabelFormat
to format the labels. You may
also pass it “values” to display the array you supply to
lineLabelArray
An explicit list of line label values that must match the size
lineLabelArray
of the data provided.
Only shown if lineLabelFormat is set to “values”
HorizontalLineChart class:
# simple_line_chart.py
def simple_line_chart():
d = Drawing(280, 250)
line = HorizontalLineChart()
line.x = 50
line.y = 85
line.height = 150
line.width = 250
line.lines[0].strokeColor = colors.green
line.lines[1].strokeColor = colors.blue
line.lines[0].strokeWidth = 3
line.categoryAxis.labels.angle = 45
line.categoryAxis.labels.dy = -15
d.add(line, '')
doc = SimpleDocTemplate('simple_line_chart.pdf')
story = []
story.append(d)
doc.build(story)
if __name__ == '__main__':
simple_line_chart()
as what we saw in our Bar Chart examples. The main differences here is
that when we want to set the line colors, we need to set strokeColor as fillColor
doesn’t exist for a line in a Line Chart. We can also
set out strokeWidth, which will change the width of the line that it
Fig. 8-9: A simple line chart
The other Line Chart that you might want to create is the 3D version of
Fig. 8-10: A simple 3D line chart
There are some sample charts you can also import and play around with
Line Plots
A Line Plot is slightly different than a Line Chart. It has two Value
Axes instead of a Value Axis and a Category Axis, for example. They also
each data point on the plot. Here is a listing of the various properties
Property Meaning
data The data to be plotted (list of lists of numbers)
x, y, width, The rectangle around the line chart. Note: x/y specify the
height bottom left corner
The vertical value axis which can be formatted as seen in the
xValueAxis
axes section
The vertical value axis which can be formatted as seen in the
yValueAxis
axes section
If set, it will draw a border around the plot. Default is None.
strokeColor
Axes will overwrite this
strokeWidth If set, this will be the width of the plot’s border
If set, it will fill the plot’s rectangle with a solid color.
fillColor
Default is None
lines.strokeColor The color of the line
lines.strokeWidth The width of the line
The marker used for each point in the line. You can create a
lines.symbol
marker via makeMarker()
A collection (i.e. 2-dimensional array) of labels used for
lineLabels
formatting
Default is None. You may supply a function or format string
lineLabelFormat
to format the labels. You may
also pass it “values” to display the array you supply to
lineLabelArray
An explicit list of line label values that must match the size
lineLabelArray
of the data provided.
Only shown if lineLabelFormat is set to “values”
Now let’s try actually creating a Line Plot so we can visually see how
# line_plot_demo.py
def line_plot_demo():
d = Drawing(400, 400)
line = LinePlot()
line.x = 50
line.y = 85
line.height = 150
line.width = 250
line.lineLabelFormat = '%2.0f'
data = [
((1,1), (2,2), (2.5,1), (3,3), (4,5)),
((1,2), (2,3), (2.5,2), (3.5,5), (4,6))
]
line.data = data
line.lines[0].strokeColor = colors.green
line.lines[1].strokeColor = colors.blue
line.lines[0].strokeWidth = 3
line.lines[0].symbol = makeMarker('Circle')
line.lines[1].symbol = makeMarker('Hexagon')
line.xValueAxis.valueMin = 0
line.xValueAxis.valueMax = 10
line.xValueAxis.valueSteps = [1, 2, 4]
line.xValueAxis.labelTextFormat = '%2.1f'
line.yValueAxis.valueMin = 0
line.yValueAxis.valueMax = 12
d.add(line, '')
d.save(formats=['pdf'], outDir='.', fnRoot='line_plot_demo')
if __name__ == '__main__':
line_plot_demo()
When you run this code, you will see the following output:
As you can see, this plot has markers (i.e. dots) that mark each data
point on the plot. If you look closely, you will notice that the green
line had circle markers while the blue line has hexagon shaped markers.
You can set your marker shape to any of the following values:
‘Smiley’,’ArrowHead’, ‘FilledArrowHead’
Now let’s move on and learn about making pie charts!
Pie Charts
ReportLab also supports the creation of Pie charts. Personally I find
their pie charts to be the most fun of the bunch. Let’s start off with a
pie property list. You will notice that this table also shows what
Property Meaning
data A list or tuple of numbers that will go in your chart
The rectangle around the pie chart. Note: x/y specify
x, y, width, height
the bottom left corner
Note: The width and height do not have to be the
same. Pies may be elliptical
None or a list of strings. It is recommended that you
labels
leave this set to None and put the
labels in a Legend object instead for readability
The start angle of the first pie slice. Default is 90 or
startAngle
twelve o’clock
The direction the pie slices progress in. Default is
direction
‘clockwise’
This will create a chart with labels in two columns,
sideLabels
one per side
A fraction of the width of the pie that determines the
sideLabelsOffset
horizontal distance between the pie
itself and the column of labels
Defaults to 1. Set to 0 if you want to enable custom
simpleLabels
labels and properties
A collection of wedges that allows you to customize
slices
each slice
slices.strokeWidth The border width of a slice
slices.strokeColor The color of the border of a slice
slices.strokeDashArray Solid a dashed lines
Distance that the slice(s) stick out from the pie’s
slices.popout
center. Default is 0
slices.fontName The slice label’s font name
slices.fontSize The slice label’s font size
slices.fontColor The slice label’s font color
The anchor point for the slice’s text label. This is a
slices.labelRadius
fraction of the radius of the pie
# simple_pie_chart.py
def simple_pie_chart():
data = [10, 20, 30, 40]
drawing = Drawing()
pie = Pie()
pie.x = 150
pie.y = 65
pie.data = data
pie.labels = [letter for letter in 'abcd']
pie.slices.strokeWidth = 0.5
pie.slices[3].popout = 20
pie.slices[3].strokeDashArray = [1,1]
drawing.add(pie)
drawing.save(formats=['pdf'], outDir='.', fnRoot='simple_pie_chart')
if __name__ == '__main__':
simple_pie_chart()
set the wedge’s strokeWidth and which wedge will be “popped out”,
which in this case is the 4th slice. You will note that to access
enable customization of said labels, then you must set your pie
Now let’s look at an example that does some minor label customization:
# simple_pie_chart_label_customization.py
def simple_pie_chart_label_customization():
data = [10, 20, 30, 40]
drawing = Drawing()
pie = Pie()
pie.x = 150
pie.y = 65
pie.data = data
pie.labels = [letter for letter in 'abcd']
if __name__ == '__main__':
simple_pie_chart_label_customization()
This will change the first label to have a 45 degree angle and it will
also change its text to “foobar”. When you run this example, you will
Here is a listing of all the properties you can change for the labels:
Property Meaning
label_dx The X offset of the label
label_dy The Y offset of the label
The angle of the label. Default is 0 (horizontal), 90
label_angle
is vertical and
180 is upside down
label_boxAnchor The label’s anchor point
label_boxStrokeColor The label box’s border color
label_boxStrokeWidth The label box’s border width
label_boxFillColor The label box’s filling color
label_strokeColor The label’s border color
label_strokeWidth The label’s border width
label_text The label’s text
label_width The label’s width
label_maxWidth The label’s maximum width that it can grow into
label_height The label’s height
label_textAnchor The label’s maximum height that it can grow into
label_visible Whether or not the label will be drawn (visible)
label_topPadding The top padding of the box
label_leftPadding The left padding of the box
label_rightPadding The right padding of the box
label_bottomPadding The bottom padding of the box
label_simple_pointer Enables simple pointers if set to 1
label_pointer_strokeColor The color of the indicator line
label_pointer_strokeWidth The width of the indicator line
labels that have a line drawn from the label to the slice that they
pie.sideLabels = True
The anchor for the side label’s right hand side is set to “start” while
the label’s left hand side is set to “end”. The distance from the pie’s
Let’s modify our previous pie chart example to add some side labels:
# simple_pie_chart_side_labels.py
def simple_pie_chart_side_labels():
data = list(range(15, 105, 15))
drawing = Drawing()
pie = Pie()
pie.sideLabels = True
pie.x = 150
pie.y = 65
pie.data = data
pie.labels = [letter for letter in 'abcdefg']
pie.slices.strokeWidth = 0.5
drawing.add(pie)
drawing.save(formats=['pdf'], outDir='.',
fnRoot='simple_pie_chart_side_labels')
if __name__ == '__main__':
simple_pie_chart_side_labels()
amount of data and labels. Then we run the code and we get this:
Fig. 8-14: A simple pie chart with side labels
This looks pretty good. However there are some known issues with using
side labels. The main two mentioned in the ReportLab user guide is that
if you have too many slices in your pie chart, the side label’s pointers
will cross over each other and become difficult to follow. The other
issue is that labels can also overlap each other, making them difficult
legend. We could easily add a title using the String class from
# simple_pie_chart_with_title.py
from reportlab.graphics.charts.piecharts import Pie
from reportlab.graphics.shapes import Drawing, String
def simple_pie_chart_with_title():
data = list(range(15, 105, 15))
drawing = Drawing(width=400, height=200)
my_title = String(170, 40, 'My Pie Chart', fontSize=14)
pie = Pie()
pie.sideLabels = True
pie.x = 150
pie.y = 65
pie.data = data
pie.labels = [letter for letter in 'abcdefg']
pie.slices.strokeWidth = 0.5
drawing.add(my_title)
drawing.add(pie)
drawing.save(formats=['pdf'], outDir='.',
fnRoot='simple_pie_chart_with_title')
if __name__ == '__main__':
simple_pie_chart_with_title()
Here we import the String class and instantiate it. The rest is pretty
much the same as our previous pie chart code. When you run this, you get
the following:
Fig. 8-15: A simple pie chart with a title
If you’d like to add a Legend, then you will want to import it from the
# pie_chart_with_legend.py
def pie_chart_with_legend():
data = list(range(15, 105, 15))
drawing = Drawing(width=400, height=200)
my_title = String(170, 40, 'My Pie Chart', fontSize=14)
pie = Pie()
pie.sideLabels = True
pie.x = 150
pie.y = 65
pie.data = data
pie.labels = [letter for letter in 'abcdefg']
pie.slices.strokeWidth = 0.5
drawing.add(my_title)
drawing.add(pie)
add_legend(drawing, pie, data)
drawing.save(formats=['pdf'], outDir='.',
fnRoot='pie_chart_with_legend')
if __name__ == '__main__':
pie_chart_with_legend()
Here we use the same code we were using before with some minor
modifications The big one is that we now have an add_legend function that
takes a Drawing object, the Chart object and the data we
chart with the Legend via the Auto validator. Here is what it looks
like:
Fig. 8-16: A simple pie chart with a title and legend
to the Legend class to make it support all the charts correctly. There
do seem confident that you can use the tools in the Graphics module to
create your own. They also provide the tools you need to generate your
own markers and custom shapes such that you could create your own chart
You can also combine charts in the same drawing, creating fairly complex
charts.
There are several other chart classes that are not covered in this
chapter. In fact, the Graphics sub module is in flux so the best way to
see a full list is to run graphdoc.py. You can run this utility with
Wrapping Up
This chapter covered a lot of ground. You should now have the knowledge
them. We also learned how to generate bar charts, line charts, line
If you need other types of charts that ReportLab does not, you can
also worth learning about. While we won’t dive into every single facet
within the Graphics sub-module, I want to make sure that you know about
Shapes
Widgets
Bar codes
Shapes
Shapes are the building blocks that we use to create any output from
using shapes. You can create anything that you can imagine just by
taking the primitive shapes and combining them in different ways. This
is actually how the charts that we created in the previous chapter are
can see what primitive shapes are available for your use. Here’s a
listing:
Circle
Group
Ellipse
Line
PolyLine
Polygon
Rect
String
Wedge (pie slice)
# solid_shapes.py
def solid_shapes():
drawing = Drawing(width=400, height=200)
startangledegrees=0,
endangledegrees=45)
wedge.fillColor = colors.yellow
drawing.add(wedge)
280, 150,
280, 100,
250, 100
])
poly.fillColor = colors.purple
drawing.add(poly)
if __name__ == '__main__':
solid_shapes()
The Ellipse takes arguments called cx, cy, rx and ry. The cx/cy
values are the x and y positions of the center of the ellipse. The rx/ry
the Ellipse in that it also has cx/cy values, but only requires one
radius argument, which is the 3rd one.
The Wedge class takes an x, y and radius for its first three
parameters. Then we also set the beginning and ending angles of the
series of points where the total must be an even amount. The reason is
rectangle, but you could draw any number of points to create whatever
Now let’s learn a bit more about the properties of your shapes.
Properties of a Shape
All shapes have two kinds of properties: those that define their
Most shapes have some required geometric properties that you have to set
Circle, then you must set the circle’s center x/y coordinates and
its radius. For a Rect, you need to set its x/y start position,
which is the bottom left position that it should start at. You must also
The other properties can be set either on separate lines in your code or
properties come with good defaults, so it’s not a problem if you end up
# circle_demo.py
def create_circle():
drawing = Drawing(width=400, height=200)
circle = Circle(50, 170, 25)
circle.fillColor = colors.green
circle.strokeColor = colors.red
circle.strokeWidth = 5
drawing.add(circle)
if __name__ == '__main__':
create_circle()
When you run this code, you should get the following:
If you would like to see a full listing of the properties that you can
change, then you can call the shape’s dumpProperties method. Here is
an example:
>>> from reportlab.graphics.shapes import Circle
>>> circle = Circle(10, 10, 100)
>>> circle.dumpProperties()
cx = 10
cy = 10
fillColor = Color(0,0,0,1)
fillOpacity = None
r = 100
strokeColor = Color(0,0,0,1)
strokeDashArray = None
strokeLineCap = 0
strokeLineJoin = 0
strokeMiterLimit = 0
strokeOpacity = None
strokeWidth = 1
feature you can use to tell ReportLab what should happen when a line
turns a corner, but these properties are used less often or rarely. All
Lines
ReportLab gives you the ability to draw Lines and PolyLines. These Line
classes have all the stroke properties that the other shapes have,
but since they are lines, they do not have the fillColor property.
# line_demo.py
def line_demo():
drawing = Drawing(width=400, height=200)
line = Line(25, 25, 150, 150)
line.strokeColor = colors.red
line.strokeWidth = 5
drawing.add(line)
if __name__ == '__main__':
line_demo()
As you might have guessed, for a Line you just need to pass in the first
x/y coordinate and a second x/y coordinate and then ReportLab will
connect the dots. We also set the color of the line along with its
width.
same manner than the Polygon class did. In this example, we set the
width and the color as parameters, but you could have set those
properties the same way that we did for the Line example. The result of
Fig. 9-3: A Line and a PolyLine
Strings
The Graphics package does not have robust support for fancy text layout.
However you can easily add strings anywhere in your drawing using x/y
coordinates and you can align the text to the right, left or center. We
actually used a String shape near the end of the previous chapter to
Here is a simple example that creates a String object and then dumps the
>>> from reportlab.graphics.shapes import String
>>> my_string = String(50, 50, "Python is amazing!")
>>> my_string.dumpProperties()
fillColor = Color(0,0,0,1)
fontName = Times-Roman
fontSize = 10
text = Python is amazing!
textAnchor = start
x = 50
y = 50
This shows us that there is a textAnchor property that we can set.
The valid values are ‘start’, ‘middle’ and ‘end’. What this means is
that if you set it to ‘start’, then the x and y will relate to where the
string starts. You can use the anchor to align your text, in other
words.
The font standard for strings is Type 1 fonts. If you remember way back
default. These fonts have accurate metrics associated with them that
makes anchoring and editing them easier. There is also support for other
register a Type 1 font and you will be able to use them in your String
classes too.
Let’s take a look at another string demo that’s loosely based on one
# string_demo.py
def string_demo():
drawing = Drawing(width=400, height=200)
for size in range(10, 32, 4):
x = 15 + size * 1.5
y = 15 + size * 1.5
my_string = String(x, y, 'Python rocks!')
my_string.fontName = 'Courier'
my_string.fontSize = size
drawing.add(my_string)
if __name__ == '__main__':
string_demo()
The first thing we do is create a loop that loops over different font
the String with those coordinates and set a few of its properties before
Fig. 9-4: Different strings
Postscript Paths
According to the ReportLab user guide, Postscript paths are a widely
Groups
Group objects are basically containers for other shapes. They also allow
set the transform directly. If you do not, then use the convenience
# group_demo.py
def group_demo():
drawing = Drawing(width=400, height=200)
radius = 25
circles = Group(
Circle(50, 40, radius, fillColor=colors.blue),
Circle(75, 40, radius, fillColor=colors.red),
Circle(100, 40, radius, fillColor=colors.green),
Circle(125, 40, radius, fillColor=colors.yellow),
String(75, 5, 'Circles')
)
drawing.add(circles)
more_circles = Group(circles)
more_circles.translate(75, 55)
more_circles.rotate(35)
drawing.add(more_circles)
if __name__ == '__main__':
group_demo()
For this demo, we create a group that contains four Circles and a
String. Then we put them into a second Group object. For the second
group, we apply both translate and rotate methods. The result is that we
have the same group drawn in difference locations on our drawing and at
Fig. 9-5: Group transforms
This can give you a lot of flexibility for moving around lots of shapes
Widgets
ReportLab also has the concept of Widgets. In fact, in the previous
chapter you were using Widgets without even realizing it. The Charts
of shapes that have their own properties and methods that you can use to
enhance your widget. The founding idea behind all of this is that it is
just a drawing with primitive shapes which makes the renderer simpler
Widgets are built on top of the shapes sub-module. You can create
your own widgets and build up your own custom library of widgets just as
your Widget.
Here is a list of items that a fully fleshed out Widget should be able
to do:
It is a reusable shape
You can initialize the Widget with no arguments
When you call its draw() method it will create a primitive
shape or group
It can have whatever parameters that you desire, but you will be
of your widget
A Widget isn’t just a group of shapes because it has its own unique
code. But it can convert itself into shapes. A Widget may have other
Widgets nested inside of it. During the rendering process, all the
around with. One of their sillier ones is the Face class. Let’s take
a look:
# face_demo.py
def face_demo():
drawing = Drawing(width=400, height=200)
sad_face = widgetbase.Face()
sad_face.skinColor = colors.blue
sad_face.mood = 'sad'
drawing.add(sad_face)
ok_face = widgetbase.Face()
ok_face.skinColor = colors.beige
ok_face.mood = 'ok'
ok_face.x = 110
drawing.add(ok_face)
happy_face = widgetbase.Face()
happy_face.skinColor = colors.yellow
happy_face.mood = 'happy'
happy_face.x = 215
drawing.add(happy_face)
if __name__ == '__main__':
face_demo()
Here we create a “Face” widget. You can make the face’s mood ‘happy’, ‘sad’ or
‘ok’. You can also set its skinColor and various
other properties. Just for fun, this demo creates all three moods with
# flag_demo.py
def flag_demo():
drawing = Drawing(width=612, height=792)
y = 692
flag = flags.Flag(kind='USA')
flag.y = y
drawing.add(flag)
label = String(95, y-15, 'USA', fontSize=14,
textAnchor='middle')
drawing.add(label)
countries = flag.availableFlagNames()
countries.pop(1)
country = 1
for flag in range(5):
flag = flags.Flag()
flag.kind = countries[country]
flag.y = y - 125
drawing.add(flag)
fontSize=14, textAnchor='middle')
drawing.add(label)
country += 1
y -= 125
if __name__ == '__main__':
flag_demo()
When you run this, you will see something like this (truncated for
brevity):
Fig. 9-6: Flag demo
There are several other modules inside the widgets module that you
can look at too for inspiration. I won’t be going into all of them, so
feel free to explore them to get ideas of what you can do with your own
Bar Codes
ReportLab’s Graphics library has a barcodes sub-module that you can
use to add bar codes to your PDFs or just generate them as images. Here
code39 (code 3 of 9)
code93
code128
EANBC
QR
USPS Postnet
Code39
The Code 39 barcode is a very common barcode type that is used for
symbols represents the numbers 0-9, upper-case letters A-Z, the space
look at the code39 module, you will note that ReportLab also supports
the Extended Code 39 Barcode Font Information, which means that you can
standard keyboard.
Code 39 is also known as the Alpha39, Code 3 of 9, Code 3/9, Type 39,
USS Code 39, or USD-3. There are several standards for how a Code 39 bar
organized:
The asterisk character is typically used as the start and the stop
character. You should not use it within the barcode. If you are using
characters can also be used for start and stop characters, which would
If you need more information, please see Wikipedia as they have a really
too.
# code39_demo.py
def code39_demo(barcode_value):
doc = SimpleDocTemplate('code39_demo.pdf')
styles = getSampleStyleSheet()
flowables = []
flowables.append(Paragraph('Code 39 Standard:',
style=styles['Normal']))
barcode39Std = code39.Standard39(
barcode_value, barHeight=20, stop=1)
flowables.append(barcode39Std)
flowables.append(Spacer(0, 25))
flowables.append(Paragraph('Code 39 Extended:',
style=styles['Normal']))
barcode39 = code39.Extended39(barcode_value)
flowables.append(barcode39)
doc.build(flowables)
if __name__ == "__main__":
code39_demo('ReportLab')
In this example, we import the code39 module from the barcode sub-module.
Then we create a standard Code 39 barcode as well as an
Extended Code 39 bar code. You will note that the standard version of
the bar code has its barHeight set as well as its stop character. When you run
this code, you should see something like this:
Fig. 9-7: Code 39 demo
Code93
Code 93 is basically a higher density version of Code 39 that also
code 93 barcode is nine modules wide and always consists of three bars
and three spaces. Also, each bar and space may be 1 to 4 modules wide.
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
. $ / + % SPACE
There are also 5 special characters that can be combined with the above
# code93_demo.py
def code93_demo(barcode_value):
doc = SimpleDocTemplate('code93_demo.pdf')
styles = getSampleStyleSheet()
flowables = []
flowables.append(Paragraph('Code 93 Standard:',
style=styles['Normal']))
barcode93Std = code93.Standard93()
flowables.append(barcode93Std)
flowables.append(Spacer(0, 25))
flowables.append(Paragraph('Code 93 Extended:',
style=styles['Normal']))
barcode93 = code93.Extended93(barcode_value)
flowables.append(barcode93)
doc.build(flowables)
if __name__ == "__main__":
code93_demo('123456789')
standard and extender versions of the bar code. This is what you end up
with:
The API for this module appears to be the same as the previous one, so
you can edit the the bar’s width and height here too. There are also
properties for the stop character and the quiet zones. Admittedly, Code
tell.
Code128
The Code 128 bar code is very compact and can encode all 128 characters
is well supported and commonly used. One example use-case would be for
generating labels for the United Parcel Service (UPS). The docstring for
this module has a lot more information and is actually one of the better
# code128_demo.py
def code128_demo(barcode_value):
doc = SimpleDocTemplate('code128_demo.pdf')
styles = getSampleStyleSheet()
flowables = []
flowables.append(Paragraph('Code 128:',
style=styles['Normal']))
barcode128 = code128.Code128(barcode_value)
flowables.append(barcode128)
doc.build(flowables)
if __name__ == "__main__":
code128_demo('123456789')
it via the Code128 class. When you run this example, you will see
the following bar code:
The docstring has a bunch of URLs in it that you can follow for
write up on it.
EANBC
The EANBC bar code is a bit of a misnomer. This appears to actually be
contains the EAN-13 bar code class, which is used for generating certain
types of Universal Product Codes (UPC). If you look at the source code,
you will find that there are ways to set which country you want. This
module also supports EAN-8 and EAN-5. While the code does not mention
this, I assume that it supports the Global Trade Item Number standard as
well.
# eanbc_demo.py
def eanbc_demo(barcode_value):
c = canvas.Canvas('eanbc_demo.pdf')
barcode_eanbc8 = eanbc.Ean8BarcodeWidget(barcode_value)
d = Drawing(50, 10)
d.add(barcode_eanbc8)
renderPDF.draw(d, c, 15, 555)
if __name__ == "__main__":
eanbc_demo('12345678')
Ean13BarcodeWidget instance. Here is what you will see when you run
this code:
QR Codes
A Quick Response or QR Code is a trademark for a type of matrix barcode
popular because of its fast readability and because it had more storage
numeric
alphanumeric
byte / binary
kanji
You may also use extensions with QR Codes. A QR Code is made up of black
blog:
# qr_code_demo.py
def qr_code_demo(barcode_value):
c = canvas.Canvas('qr_code_demo.pdf')
qr_code = qr.QrCodeWidget(barcode_value)
qr_code.barWidth = 145
qr_code.barHeight = 145
qr_code.qrVersion = 1
bounds = qr_code.getBounds()
width = bounds[2] - bounds[0]
height = bounds[3] - bounds[1]
d = Drawing()
d.add(qr_code)
renderPDF.draw(d, c, 15, 405)
c.save()
if __name__ == "__main__":
qr_code_demo('www.mousevspython.com')
As you can see, ReportLab’s QR code also does not support adding them to
result:
If you scan this with your cell phone, it should take you to my blog. If
you want, you can also try the qrcode Python package:
https://pypi.python.org/pypi/qrcode/
You can generate a QR code much more simply using that library, but if
you want to add it to your PDF in ReportLab, then you will want to
# qr_code_demo2.py
import qrcode
from reportlab.platypus import SimpleDocTemplate
from reportlab.platypus import Image
def qr_code_demo(barcode_value):
doc = SimpleDocTemplate('qr_code_demo2.pdf')
flowables = []
qr_img = qrcode.make(barcode_value)
qr_img.save('test.png')
flowables.append(Image('test.png'))
doc.build(flowables)
if __name__ == "__main__":
qr_code_demo('www.mousevspython.com')
Here we just generate our QR code using the qrcode Python package
and save it to disk. Then we create an instance of the Image class and
If you’d like to give this code a try, then you will need to install
qrcode using pip:
pip install qrcode
Check out its documentation for details on creating the various versions
USPS Postnet
ReportLab supports two different types of bar codes related to the
United State Postal Service: POSTNET and FIM. POSTNET is used for
encoding the zip code. It supports 5, 9, and 11 digit codes. You will
see these bar codes along the bottom of the envelopes. You should use
The FIM (Facing ID Marks) bar code is used for encoding a single letter.
According to the docstring for the FIM class, there are four types
defined:
def usps_demo(barcode_value):
doc = SimpleDocTemplate('usps_demo.pdf')
styles = getSampleStyleSheet()
flowables = []
flowables.append(Paragraph('USPS POSTNET:',
style=styles['Normal']))
barcode128 = usps.POSTNET(barcode_value)
flowables.append(barcode128)
doc.build(flowables)
if __name__ == "__main__":
usps_demo('50158-9999')
Here we create a bar code using one from Iowa. You should end up with
Wrapping Up
We covered a diverse set of topics in this chapter. First we learned
Shapes and are used for the charts we looked at in the last chapter.
creating bar codes. You can create several different types of bar codes
with ReportLab including code 39, code 93, EAN, QR and USPS.
You can generate bar codes with other Python libraries and insert them
Feel free to play around with the examples and see how you can change
them. It is also worth your time to take a dive into the source code of
chapter:
File Annotations
Bookmarks
Page Transitions
Encryption
Interactive Forms
Let’s start by learning about how you can annotate your PDF with
ReportLab.
File Annotations
You can add metadata to your PDF with ReportLab as well. This is
setAuthor()
setTitle()
setSubject()
File annotations are invisible. You can only view them by opening up the
# annotations.py
def annotations():
my_canvas = canvas.Canvas('annotations.pdf')
my_canvas.setAuthor('Mike Driscoll')
my_canvas.setTitle('ReportLab: PDF Processing with Python')
my_canvas.setSubject('Use Python to create PDFs')
my_canvas.save()
if __name__ == '__main__':
annotations()
All this does is set the author, title and subject of the PDF. If you
open up the PDF in Adobe Reader and go to File –> Properties, you
Bookmarks
ReportLab has support for adding Bookmarks to your PDFs. A bookmark is
ReportLab just supports the ability to jump from one part of the
document to another part. You can also control the zoom level in the
document reader after the jump. To add a bookmark to your document, you
bookmark visible to be clicked on, you will also need to call the
addOutlineEntry method.
canvas.bookmarkPage(name,
fit="Fit",
left=None,
top=None,
bottom=None,
right=None,
zoom
the page itself as the destination to jump to. It will also scale the
Here is a table that explains the various ways that the fit argument
can be set:
Parameters
Fit Description
Required
Fit The entire page will fit in the window (default)
The top coord at the top of the window with the width
FitH top
scaled to fit
The left coord at the left of the window with the height
FitV left
scaled to fit
left bottom
FitR Scales the window to fit the specified rectangle
right top
Fine grained control. If a parameter is omitted, the PDF
XYZ left top zoom
interprets that as “leave as is”
# bookmark_demo.py
def bookmark_demo():
my_canvas = canvas.Canvas('bookmarks.pdf')
my_canvas.showPage()
my_canvas.save()
if __name__ == '__main__':
bookmark_demo()
Here we create our canvas object and draw a string. Then we add a
bookmark and call the showPage method. This will effectively create
a pagebreak so that any other drawing calls made to the canvas will be
made on the next page. Thus we draw a new string on page 2 and add
another bookmark. Finally we add the bookmarks to the PDF’s outline via
canvas.addOutlineEntry(self, title, key, level=0, closed=None)
In our example, we pass in just the title and the key. The key must
match the name of the bookmark that we added earlier in the document.
When you run this code, you will see something like the following in
Note: There are two other bookmark related methods in ReportLab called
The Outline Tree
ReportLab calls the navigation sidebar an Outline Tree. When you
want to add an outline entry to the outline tree, you need to use the
As we learned just a bit ago, there are four arguments that you can pass
pane. The key is a unique string within the document that you used
is the uppermost level. You may only go down one level at a time. If you
try to do go down more than one level, an error will be raised. Lastly,
the closed argument tells the navigation pane whether or not the
Page Transitions
ReportLab supports the concept of Page Transitions, which is kind of
PageTransitionEffects = {
'Blinds': [dimension_arg],
'Box': [motion_arg],
'Wipe' : [direction_arg],
'Dissolve' : [],
'Glitter':[direction_arg]
As you can see, you have six to choose from. You can apply a direction
listing from the ReportLab documentation that shows valid values for
these arguments:
direction_arg = [0,90,180,270]
If you would like to add a transition effect to your PDF, then you will
this:
canvas.setPageTransition(self, effectname=None, duration=1,
# transition_demo.py
def transition_demo():
my_canvas = canvas.Canvas('transition_demo.pdf')
my_canvas.setPageTransition(
effectname='Blinds', duration=1, direction=0,
dimension='H',
motion='I')
my_canvas.drawString(20, 750, 'This is a transitional page')
my_canvas.drawImage('snakehead.png', 30, 600,
width=100, height=100)
my_canvas.save()
if __name__ == '__main__':
transition_demo()
In the code above, we create a two page document. The first page just
has a single line of text painted onto it. The second page has a string
and an image in it. This page also has a transition set to use the
running this code, open up the PDF in Adobe Reader and make it full
screen. Then navigate to the second page to see the effect in action.
Now you can go back and edit the code to try out one of the other
effects!
Encryption
Adobe’s implementation of the PDF specification allows you to encrypt
your PDFs. You may do any of the following things when encrypting your
PDFs:
in the viewer
Note that an encrypted PDF does not require a password, but when you do
enters their password, the PDF will open the document, decrypt it and
then display it for the user to read. There are two different types of
passwords:
privileges
The “user” password - Allows you to open the document for viewing
If you have the “owner” password, you can print, copy/paste and change
the password itself. When you use the “user” password, your options may
If you set the user password to a blank string, then Adobe will not
prompt you for a password. If you only set the owner password, then the
PDF will open without a prompt. Finally, if you set the password for the
owner and the user to the same string, then the PDF will only open in
restricted mode with whatever privileges you have set. You will not be
able to modify any of those settings in this case. Always set the owner
When the PDF is encrypted, the encryption will apply to all the strings
and streams in the file. This will prevent nefarious people from trying
to find the password in a text or hex editor and changing it. According
cipher that uses the same mechanic for encrypting and decrypting your
Now we are ready to learn how to actually encrypt a PDF using ReportLab.
def __init__(self, userPassword,
ownerPassword=None,
canPrint=1,
canModify=1,
canCopy=1,
canAnnotate=1,
strength=40):
The ownerPassword is just set to None. The next four arguments set
encryption strength. Looking at the source, it looks like the only valid
values are 40 and 128 with the default being the weakest security.
# encryption_demo.py
if __name__ == '__main__':
encryption_demo(userPassword='bad_password',
ownerPassword='b3Tt3R_P@$$W0Rd')
When you run this code, you should end up with a PDF document that
Interactive Forms
ReportLab also allows you to create interactive fillable forms using
their toolkit. The PDF standard actually has a fairly rich set of
but it does cover most of them. In this section, we will look at the
following widgets:
checkbox
radio
choice
listbox
textfield
canvas.acroform property. Note that you can only have one form per
Checkbox
The checkbox widget is exactly what it sounds like. It’s a little
box that you can check. ReportLab supports several different “check”
depending on the style that you set. Here are all the parameters that
Now let’s write up a simple example that demonstrates how some of these
arguments behave:
# simple_checkboxes.py
def create_simple_checkboxes():
c = canvas.Canvas('simple_checkboxes.pdf')
c.setFont("Courier", 20)
c.drawCentredString(300, 700, 'Pets')
c.setFont("Courier", 14)
form = c.acroForm
textColor=blue, forceBorder=True)
c.save()
if __name__ == '__main__':
create_simple_checkboxes()
As you can see, we set up the name and set the tooltip to basically
match the name of the widget. Then we set its position and a few other
things. You can play around with the width of the checkbox’s border or
turn the border off. If you turn it off though, the checkbox may become
invisible, so you might want to set its background color via fillColor
Fig. 10-3: Checkboxes in an interactive form
If you open up the document and check all of the checkboxes, you will
Radio
Radio widgets are kind of like checkboxes except that usually you
put radio buttons into a group where only one radio button can be
per group. ReportLab does not appear to have a way to explicitly group a
Anyway, here is a listing of the parameters you can use when creating a
radio widget:
Now let’s take a moment and create a simple demo of the radio widgets:
# simple_radios.py
def create_simple_radios():
c = canvas.Canvas('simple_radios.pdf')
c.setFont("Courier", 20)
c.drawCentredString(300, 700, 'Radio demo')
c.setFont("Courier", 14)
form = c.acroForm
textColor=blue, forceBorder=True)
borderWidth=2,
textColor=pink, forceBorder=True)
form.radio(name='radio2', tooltip='Field radio2',
value='value2', selected=False,
x=110, y=595, buttonStyle='cross',
borderStyle='solid', shape='circle',
borderColor=green, fillColor=blue,
borderWidth=2,
textColor=pink, forceBorder=True)
borderWidth=2,
textColor=magenta, forceBorder=False)
form.radio(name='radio3', tooltip='Field radio3',
value='value2', selected=True,
x=110, y=545, buttonStyle='star',
borderStyle='bevelled', shape='circle',
borderColor=blue, fillColor=green,
borderWidth=2,
textColor=magenta, forceBorder=True)
c.save()
if __name__ == '__main__':
create_simple_radios()
When you run this code, you will notice that you only get 3 radios. This
is because you are required to create two instances of each radio button
with the same name, but with different values and sections. The
documentation does not state why, but I assume this is done to help
ReportLab track the “selected” state of the widget. This also allows you
Choice
The choice widget is basically a combobox that displays a dropdown
when the user clicks on it. This allows the user to pick one or more
have set. If you add edit to the fieldFlags, then the user can
Here is a listing of the various arguments you can use when creating
this widget:
document:
# simple_choices.py
def create_simple_choices():
c = canvas.Canvas('simple_choices.pdf')
c.setFont("Courier", 20)
c.drawCentredString(300, 700, 'Choices')
c.setFont("Courier", 14)
form = c.acroForm
c.save()
if __name__ == '__main__':
create_simple_choices()
you will get a rather bizarre error that doesn’t say anything about that
argument being missing. Anyway, when you run this code, you will find
Listbox
The listbox widget is kind of like the choice widget except that a
Here is a listing of all the arguments you can use when creating a
listbox:
Now let’s write up a quick demo so we can see how you might create one
of these widgets:
# simple_listboxes.py
def create_simple_listboxes():
c = canvas.Canvas('simple_listboxes.pdf')
c.setFont("Courier", 20)
c.drawCentredString(300, 700, 'Listboxes')
c.setFont("Courier", 14)
form = c.acroForm
c.save()
if __name__ == '__main__':
create_simple_listboxes()
created the choice widget. The main difference here is the look of
the listbox widget versus the choice widget. Otherwise, they are pretty
much the same. Following is a screenshot of what this code generates:
Textfield
The textfield is a text entry widget. You will see these in forms
for entering your name and address. Most of the arguments for the
textfield are the same as the ones you’ve seen in the previous widgets:
# simple_form.py
def create_simple_form():
c = canvas.Canvas('simple_form.pdf')
c.setFont("Courier", 20)
c.drawCentredString(300, 700, 'Employment Form')
c.setFont("Courier", 14)
form = c.acroForm
c.drawString(10, 650, 'First Name:')
form.textfield(name='fname', tooltip='First Name',
x=110, y=635, borderStyle='inset',
borderColor=magenta, fillColor=pink,
width=300,
textColor=blue, forceBorder=True)
width=300,
textColor=blue, forceBorder=True)
c.save()
if __name__ == '__main__':
create_simple_form()
them. As you can see, we changed the border and background color on
several of the fields. Then we just added some fairly standard ones. We
also used the width parameter to change how wide the textfield is.
Now let’s learn about the various styles and shapes that we can use with
Button Styles
There are actually several different button styles that ReportLab
supports. You can use these styles to indicate to ReportLab what style
or symbol should appear when the widget is selected. Here are the
current choices:
check
circle
cross
diamond
star
document renderer may render the symbols incorrectly for their intended
application. For example, Acrobat Reader will prefer to use its own
Widget Shapes
ReportLab supports two different widget shapes. These shapes apply to
the checkbox and radio widgets. You may use either of the following:
circle
square
Once again, the renderer may override your setting. For example,
Border Styles
The borderStyle argument is used by the widget to change its 3D
bevelled
dashed
inset
solid
underlined
fieldFlags
ReportLab supports a lot of different fieldFlags that you can use to
modify the behavior of your widgets. All the widgets that ReportLab
Let’s take a look at all the different kinds of fieldFlags that can
be applied:
There is another type of flags that we can use with our widgets. We will
annotationFlags
ReportLab’s widgets are technically known as annotations. So they can
Wrapping Up
In this chapter we learned how to add simple annotations to our PDF. We
also learned how to add bookmarks to our outline tree. This is quite
about adding basic page transitional effects, which are useful for
presentation purposes. We also learned how to encrypt our PDF and set
support for interactive forms. You should now spend some time working
through the examples and modifying them to learn how they work before
moving on.
Part II - Tutorials and How-Tos
Welcome to Part II. At this point in the book, you should be well versed in how
to use the ReportLab toolkit. This section of the book will go over a few use-
cases where you will create reports or partial reports using ReportLab. You will
also learn about other Python packages that you can use to work with PDFs.
By the time you finish this part of the book, you should be to generate PDFs
from various data sources, split PDFs up and merge them back together. You will
learn how to create custom headers and footers. You will discover how to fill out
PDF fillable forms using Python. You will also learn how to export data from a
PDF into another format.
common data file types such as XML, JSON, CSV, fixed width file formats, etc.
In this chapter, we will look at how to take an XML data file and turn it into a
report. I found a neat looking report that is actually kind of similar to other
reports that I have created in the past. It was an explanation of benefits (EOB)
from AETNA. I will focus on page two of the document as I think it will be
more instructive. Here’s a link so you can see what I am talking about:
https://ctmirror.org/2014/09/02/what-is-this-form-the-explanation-of-
benefits/
We will be trying to copy that document’s layout with ReportLab. By the end of
this chapter, you will be able to generate a PDF using ReportLab that looks
similar to this:
Fig. 11-1: An Example EOB from AETNA
The code you will create will support generating multi-page reports based on the
amount of data in the XML data file. Let’s find out how!
Getting Started
Usually a client will give you a data file and the output that they expect you to
generate from that file. Occasionally, you will get to
design your own specification that the client will have to use, but I have rarely
had that happen. The first thing to do when receiving a data file and a report is to
identify where the various pieces of data fit on the report. In this case, the pieces
will be in XML tags, so they will actually be labeled. That makes things a lot
easier, especially if the XML has good tag names. Fixed width data files are
more difficult
because they don’t have named fields that you can figure out at a
glance.
Once you have identified all the pieces on the pre-existing document, then you
are ready to start creating some code. Every developer is
different. You could start by writing a parser for the XML at this
point. Or you could try to layout the document using some hard-coded
values. It doesn’t really matter all that much which is done first as the parser and
layout code should be in separate modules.
# eob.py
class EOB:
"""
Explanation of Benefits PDF Class
"""
def create_payment_summary(self):
""""""
pass
def create_claims(self):
""""""
pass
def save(self):
""""""
self.canvas.save()
def main(pdf_file):
""""""
eob = EOB(pdf_file)
eob.save()
if __name__ == '__main__':
pdf_file = 'eob.pdf'
main(pdf_file)
You will note that for our first attempt at generating an Explanation of Benefits
(EOB), we use ReportLab’s canvas module. You could start with platypus too,
but for simplicity, we will start with the canvas and then update it to use platypus
when we need to. The __init__ method is where we create our canvas instance.
We also create some class properties that define the page’s width and height. You
will also note that we created a style sheet, which we will use for styling our
Paragraphs.
The next method that we create is called coord. We will use this method to help
us position elements on the page eventually.
Next we create three stubbed methods: create_header,
create_payment_summary and create_claims. These methods do not do
anything right now. The last method is our save method, which just saves the
PDF to disk.
Finally we have the main function. Here we create our EOB instance and tell it
to save. This code will generate an empty PDF,
which isn’t all that interesting. So let’s make our code do something useful by
updating the create_header method:
def create_header(self): """"""
'</b></font>'.format('01/01/2017') p = Paragraph(ptext,
self.styles["Normal"])
member_id='X123456',
group_num=789456-1235,
group_name='PYTHON CORP'
)
p = Paragraph(ptext, self.styles["Normal"])
Here we create some formatted text and add it to a Paragraph that we draw in a
fixed position on the canvas. Then we create some more text that we add to a
second Paragraph and draw it as well. Don’t forget to update the main() method
to call your create_header method:
def main(pdf_file): """"""
eob = EOB(pdf_file)
eob.create_header()
eob.save()
Save this code as eob2.py and then run the script. If everything works correctly,
you should end up with a PDF that looks like this:
Fig. 11-2: Adding a simple header
def create_payment_summary(self): """"""
p = Paragraph(ptext, self.styles["Normal"])
self.create_bold_text('Provider'),
self.create_bold_text('Amount'),
self.create_bold_text('Sent to'),
self.create_bold_text('Date'),
self.create_bold_text('Amount'),
]]
Here we create another Paragraph object with some text and we also create a
Table. The table will hold the summary of the payment. You will note that we
call another method here named create_bold_text.
def create_bold_text(self, text, size=8): """"""
This is just a helper method that will take a string and a font size as its
arguments. Then it will return a Paragraph object that bolds the
text. For this version of the code, let’s save it as eob3.py. Note that you will also
need to update the main() function so that it calls your newly defined method:
def main(pdf_file): """"""
eob = EOB(pdf_file)
eob.create_header()
eob.create_payment_summary()
eob.save()
Save this code as eob3.py. When you run this updated code, you will see that the
result now looks like the following:
def create_claims(self): """"""
fsize = 8
p = Paragraph(ptext, self.styles["Normal"])
self.styles["Normal"])
billed = Paragraph(
'<font size={}>Amount<br/>billed</font>'.format(fsize),
self.styles["Normal"])
member_rate = Paragraph(
pending = Paragraph(
.format(fsize),
self.styles["Normal"])
applied = Paragraph(
copay = Paragraph(
remaining = Paragraph(
'<font size={}>Amount<br/>remaining</font>'.format(fsize),
self.styles["Normal"])
plan_pays = Paragraph(
coins = Paragraph(
'<font size={}>Your<br/>coinsurance</font>'.format(fsize),
self.styles["Normal"])
owe = Paragraph(
'<font size={}>You owe<br/>C+D+E+H=I</font>'.format(fsize),
self.styles["Normal"])
colWidths = [110, 50, 50, 60, 50, 50, 50, 70, 60]
In this method, we create a series of Paragraphs and end this method with
another Table object. We use the Paragraphs to format the text the way we want.
The Table doesn’t really have any formatting applied to it at this time. As you’ve
probably gathered by now, you will want to
def main(pdf_file): """"""
eob = EOB(pdf_file)
eob.create_header()
eob.create_payment_summary()
eob.create_claims()
eob.save()
To continue to keep our iterations separate, you can edit the previous example
and then save it as eob4.py. When you run this example, the result should look
something like this:
It’s kind of annoying needing to update the main() function all the time. Now is
a good time to refactor the class and switch over to using Flowables rather than
continuing to use Canvas methods.
# eob_flow.py
import time
from reportlab.lib.enums import TA_JUSTIFY
from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.units import inch, mm
from reportlab.pdfgen import canvas
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image
from reportlab.platypus import Flowable, Indenter, Table
Now let’s create a custom Header Flowable to hold our page header.
This will allow us to easily update the header and we can also move the Header
class into its own module if we want to. Here’s the code:
class Header(Flowable):
def draw(self):
ptext = '<font size=10><b>Statement Date: {}' \
'</b></font>'.format('01/01/2017')
p = Paragraph(ptext, self.styles["Normal"])
p.wrapOn(self.canv, self.width, self.height)
p.drawOn(self.canv, *self.coord(110, 14, mm))
class EOB:
"""
Explanation of Benefits PDF Class
"""
def create_header(self):
""""""
header = Header()
self.elements.append(header)
self.elements.append(Spacer(1, 50))
def create_text(self, text, size=8, bold=False): """"""
if bold:
self.styles['Normal'])
self.styles['Normal'])
This method will only bold the text if the bold parameter is set.
Otherwise it just sets the font size and returns a Paragraph object. The next
update needs to happen to the create_payment_summary method.
p = Paragraph(ptext, self.styles["Normal"])
self.elements.append(p)
self.elements.append(Spacer(1, 20))
[self.create_text('Patient', bold=True),
self.create_text('Provider', bold=True),
self.create_text('Amount', bold=True),
self.create_text('Date', bold=True),
self.create_text('Amount', bold=True),
]]
self.elements.append(Indenter(left=60))
table = Table(data, colWidths=colWidths)
self.elements.append(table)
self.elements.append(Indenter(left=-60))
Here we create some Paragraphs and a Spacer. You will note that we have
updated the table’s data to call our create_text method. This allows us to format
the table’s content because we are now using
The next method we need to update is also the largest. It is the create_claims
method:
def create_claims(self): """"""
fsize = 8
p = Paragraph(ptext, self.styles["Normal"])
self.elements.append(p)
self.elements.append(Spacer(1, 20))
claim = Paragraph('''<font size={0}>
Claim ID {1}<br/>
Received on 12/12/16<br/></font>
'''.format(fsize, 'ER123456789'),
self.styles["Normal"])
billed = Paragraph(
'<font size={}>Amount<br/>billed</font>'.format(fsize),
self.styles["Normal"])
member_rate = Paragraph(
pending = Paragraph(
.format(fsize),
self.styles["Normal"])
applied = Paragraph(
copay = Paragraph(
remaining = Paragraph(
'<font size={}>Amount<br/>remaining</font>'.format(fsize),
self.styles["Normal"])
plan_pays = Paragraph(
coins = Paragraph(
'<font size={}>Your<br/>coinsurance</font>'.format(fsize),
self.styles["Normal"])
owe = Paragraph(
claim_one = [
self.create_text('12.50'),
self.create_text('12.50 (100%)'),
'', ''
data.append(claim_one)
colWidths = [110, 50, 50, 60, 50, 50, 50, 40, 60, 60]
self.elements.append(Indenter(left=60))
self.elements.append(table)
self.elements.append(Indenter(left=-60))
This one is pretty similar to the previous method updates that we have looked at.
We do use a bit more in-line XML formatting of the Paragraphs here though.
There is also a couple of examples of using the
Indenter flowable, which you can use to indent another Flowable. In this case,
we want to indent our Table 60 points from the left hand side of the page. Then
we add the table. You will note that we also need to dedent or unindent the
Flowable after adding the Table so that if we add anything after the Table, it
won’t get indented too.
I personally like to have a method that controls most of the rest of the script. We
were using a main function earlier and we could keep doing so if we wanted to.
But I decided to put all that logic into the class and added the following create
method instead:
def create(self):
""""""
self.create_header()
self.create_payment_summary()
self.create_claims()
self.save()
This method will create the various pieces of the document and then call the
save method. One good reason to encapsulate all this logic into a class’s method
is that now you can import the module and run it more easily rather than trying
to guess which of the functions you need to call to create a complete document.
It’s kind of like a demo, in other words.
Anyway, the save method also needs an update since we are now working with a
document template instead of the Canvas:
def save(self):
""""""
self.doc.build(self.elements)
This should look familiar. We have to call the build method and pass in the list
of flowables to have the PDF generated correctly.
Finally we can just remove the main function entirely and put all the logic into
the following:
if __name__ == '__main__': pdf_file = "eob_flow.pdf"
eob = EOB(pdf_file)
eob.create()
That was a lot of changes, but I think you will find that this code is much more
flexible now. Feel free to stop and download the full version of the code from
Github.
Now that you have the code, let’s try running it. When you do, you will find that
the result is pretty similar to the canvas code from the
previous section.
theoretically create a multipage document easily. To test that theory out, let’s
change the create_claims method so that it generates 50
rows of data. The following is the only snippet you need to change:
# Add these imports to the top of the file:
from reportlab.lib import colors
from reportlab.platypus import TableStyle
claim_one = [
self.create_text('FLU VIRUS VACC-SPLIT 3 YR & on 9/12/16'),
self.create_text('12.50'),
'', '', '', '',
self.create_text('12.50'),
self.create_text('12.50 (100%)'),
'', ''
]
self.elements.append(Indenter(left=40))
self.elements.append(table)
self.elements.append(Indenter(left=-40))
I left in the claim_one and the data variables for context. The real change starts
at the loop, where we append the claim we created at the beginning of this
snippet to our data list 50 times. Then we create a TableStyle instance to style
the rows in our table. Finally we add the data to the Table and apply the table
style. Then we get back to the
If you re-run the code with this update, you should now have a multipage
document!
hard-coded in the code that we have created up until this point. We need to
update our code so it can read an XML file and generate a report from it. Let’s
start by looking at a piece of example XML:
<?xml version="1.0" ?>
<patient>
<claim_id>ER123456789</claim_id>
<member_name>Mike D</member_name>
<member_id>X123456</member_id>
<group_num>788221</group_num>
<group_name>Python Corp</group_name>
<amount_owed>525.25</amount_owed>
<sent_to></sent_to>
<date>01/01/2018</date>
<amount_paid></amount_paid>
<received_date>12/20/2017</received_date>
<claims>
<claim>
<description>FLU VIRUS VACC-SPLIT 3 YR and on 9/12/16</description>
<amount_billed>12.50</amount_billed>
<member_rate></member_rate>
<pending></pending>
<deductible></deductible>
<copay></copay>
<amount_remaining>12.50</amount_remaining>
<plan_pays>12.50</plan_pays>
<coinsurance></coinsurance>
<total_owed>0.00</total_owed>
<received_date>12/20/2017</received_date>
</claim>
</claims>
</patient>
We should be able to take this XML and use it for the data that we feed to our
ReportLab module. This particular piece of XML will only create a table with
two rows though, so we won’t be able to generate a multipage document with it.
However, this is a good place to start with. Note that in real data, you would
have several claims per patient and multiple
patients per XML file. In fact, when I used to consume XML like this, it wasn’t
uncommon to have thousands patients with dozens (or occasionally, over a
hundred) claims per patient.
When you have files like that, you might not want to load the entire file into
memory. Instead, what I would do is write a pre-parser (or
pre-processor) script that would split out each patient into its own
self-contained XML file. Then each file would be fed to the report
generating module.
We don’t need to do the pre-processing part of it here. We just need to figure out
how to parse a self-contained XML file.
Python comes with several XML modules built into the standard library.
pip install lxml
Once you have that installed, we will need to update the code so it can consume
XML!
Consuming XML
I am going to create a copy of the last script we wrote and call it
eob_flow_consumer.py. Feel free to download the full code from Github so you
can follow along.
The first step to consume XML in our report module is to update the imports in
our code:
# eob_flow_consumer.py
import os
import time
Here we import objectify, which is a sub-module within the lxml package. The
objectify module allows us to turn our XML into a Python object.
The first thing we will need to do is modify the Header class so that it accepts
the XML object:
class Header(Flowable):
def draw(self):
ptext = '<font size=10><b>Statement Date: {}' \
'</b></font>'.format('01/01/2017')
p = Paragraph(ptext, self.styles["Normal"])
p.wrapOn(self.canv, self.width, self.height)
p.drawOn(self.canv, *self.coord(110, 14, mm))
Then we create an instance attribute with the same name. Finally we need to
update the Header’s draw method so that it uses the XML object that we passed
in to replace the Member and Group information in our
Now we are ready to move on and update the EOB class. However, I will only
focus on the methods that need updating. The first thing we need to do is modify
the __init__ method:
class EOB:
"""
Explanation of Benefits PDF Class
"""
self.xml = self.parse_xml(xml_file)
self.doc = SimpleDocTemplate(
pdf_file, pagesize=letter,
rightMargin=72, leftMargin=36,
topMargin=36, bottomMargin=18)
self.elements = []
self.styles = getSampleStyleSheet()
self.width, self.height = letter
Here we add a new parameter for accepting the XML file path and we also
default the name of the PDF. Next we verify that the XML file path that we
passed in exists. If not, we raise a RuntimeError. Finally, we create an instance
variable called self.xml by calling a function called parse_xml with the XML
file path as its argument. Let’s create that new method now:
def parse_xml(self, xml_file): with open(xml_file) as f:
xml = f.read()
root = objectify.fromstring(xml)
return root
All this method does is read the XML file’s contents into memory. Then we pass
the file’s contents to lxml using the objectify.fromstring method. This will turn
our XML into a Python object that we can then use throughout the rest of our
EOB class.
Next up, we need to modify the create_header method slightly so it can pass the
XML object to the Header class:
def create_header(self): """"""
header = Header(self.xml)
self.elements.append(header)
self.elements.append(Spacer(1, 50))
There are a couple of methods in the EOB class that we don’t even need to
change. They are the following two methods:
create_text
create_payment_summary
These methods do not depend on the XML in any way, so we don’t need to
reproduce them again here. In fact, most of the changes will occur in the
create_claims method. Let’s break that method up a bit to make the changes
more obvious:
fsize = 7.5
p = Paragraph(ptext, self.styles["Normal"])
self.elements.append(p)
self.elements.append(Spacer(1, 20))
claim_id=self.xml.claim_id,
received_date=self.xml.received_date),
self.styles["Normal"])
Here we change the claim variable so that the Paragraph instance is created
using values from the XML rather than using the hard-coded one we had used
earlier. The next portion of the code is required, but no changes were done to it:
billed = Paragraph(
member_rate = Paragraph(
pending = Paragraph(
),
self.styles["Normal"])
applied = Paragraph(
copay = Paragraph(
al"])
plan_pays = Paragraph(
'<font size={}>Plan<br/>pays</font>'.format(fsize), self.styles["Normal"]) coins
= Paragraph(
al"])
owe = Paragraph(
mal"])
All this code does is set up the rest of the header columns for the Table. The
major change is this next bit:
for claim in self.xml.claims.getchildren(): data.append([
self.create_text(claim.description),
self.create_text(claim.amount_billed.text),
self.create_text(claim.member_rate),
self.create_text(claim.pending),
self.create_text(claim.deductible),
self.create_text(claim.copay),
self.create_text(claim.amount_remaining.text),
self.create_text(claim.plan_pays.text),
self.create_text(claim.coinsurance),
self.create_text(claim.total_owed.text)
])
Here we actually loop over the children of the claims XML tag. In this example,
the claims tag only has one child. But when we update the XML to have
additional claims, this code will happily loop over all of them. It then extracts
the values from the XML. One thing I want to point out is that when you use
lxml to extract floats from XML, it will truncate the second significant digit,
thus making 12.50 into 12.5.
report. This is why you will see that when I am dealing with floats, I access the
raw value by accessing the text attribute above. Another way to get around this
would be to just format the text using Python’s .2f string formatter syntax.
This code actually has an intentional bug in it in that I don’t access that text
attribute in all the places that I should. Try updating the XML for the deductible
or copay fields and you might end up seeing how the truncating looks.
table_style = TableStyle(
])
table.setStyle(table_style)
self.elements.append(Indenter(left=40))
self.elements.append(table)
self.elements.append(Indenter(left=-40))
You also do not need to update the create or the save methods since they do not
have anything to do with XML either. However, you do need to update the last
few lines of code to pass in the XML file
itself:
if __name__ == '__main__': eob = EOB(xml_file='eob.xml')
eob.create()
Now if you run this updated code, you will find that it generates the report using
real data instead of the hard-coded values. In the Github repository, I have added
an extra XML file with 50 claims in it so you can create a multipage document
using this code as well. Give that a try and see how it works.
Wrapping Up
This chapter is based on some of my personal experiences writing parsers for
XML and other data formats and then taking the parsed data and
turning it into PDFs. The nice thing about working with XML is that you already
have good libraries or packages like Python’s ElementTree or
lxml to turn the XML into objects. But if your data is JSON, CSV or some other
format, then you will need to find some other way to use that
data.
My favorite method is to create a class that parses out the data I need and
basically creates an object much like the one that is generated by lxml’s
objectify module. Then I can just pass an object to my report module that I can
just use attribute access on for filling out the
In the meantime, feel free to play around with the XML examples provided for
this chapter on Github. Try changing some of the values in the tags and re-
running the code a few times to see how thing change.
Chapter 12 - Custom Headers and Footers
There are many documents and reports that require you to have a header
on the first page or on every page. For example, if you are generating a
report for a lawyer or judge, then you will almost certainly need the
Most of the time, these headers will contain the following fields:
Date
Name or names of people (like all the doctors in the practice)
Address
Phone / Fax
Website
Logo
document number
page number
page number with total number of pages
ID numbers
The header can really have an infinite number of fields when you think
about it. I can think of many others. Just open up some mail from your
financial institutions or your insurance company and you will see other
fields that they put in their headers. This kind of thing can also apply
to footers. However footers usually don’t have as many fields. They tend
to have page numbers most of the time, although I have seen some that
A Naive Header
When you are writing code, you usually want to create your own reusable
this case, we will create a simple header class using the one from the
# header.py
class Header(Flowable):
def draw(self):
ptext = '<font size=10><b>Statement Date: {}' \
'</b></font>'.format('01/01/2017')
p = Paragraph(ptext, self.styles["Normal"])
p.wrapOn(self.canv, self.width, self.height)
p.drawOn(self.canv, *self.coord(110, 14, mm))
Now we just need to write some code that can use our new header module.
Create a new file called main.py and add the following code:
# main.py
def parse_xml(xml_file):
with open(xml_file) as f:
xml = f.read()
root = objectify.fromstring(xml)
return root
xml = parse_xml(xml_file)
elements = []
header = Header(xml)
elements.append(header)
doc.build(elements)
if __name__ == '__main__':
main(pdf_file='header.pdf', xml_file='eob.xml')
All this code does is import our Header class from our header module, create
an instance of it and add it to our list of flowables
before generating the report. The result of running this second example
This looks nice and all, but it’s not a true header. It is just a
flowable. When you create a header using this method, you will have to
manually add it each time you create a new page, which can be difficult
to calculate on the fly. So let’s learn how to make this work the right
way!
that you can use to control what happens on the first page as well as
what happens on all the subsequent pages. These arguments are called
craft custom headers and footers and then just adjust the document
template’s margins to account for them.
# header2.py
styles = getSampleStyleSheet()
'</b></font>'.format('01/01/2017')
p = Paragraph(ptext, styles["Normal"])
p.wrapOn(canvas, width, height)
p.drawOn(canvas, 400, 800)
This code only needs two imports, where we grab the sample style sheet
that we can use with Paragraphs and the Paragraph class itself. Then we
document’s width and height and create a sample style sheet. The rest of
the function is just using absolute positioning to get the various bits
The next thing we need to do is create a new entry point for our code.
For this task, we will create a file called main2.py and save the
following to it:
# main2.py
def parse_xml(xml_file):
with open(xml_file) as f:
xml = f.read()
root = objectify.fromstring(xml)
return root
xml = parse_xml(xml_file)
elements = []
doc.xml = xml
if __name__ == '__main__':
main(pdf_file='main2.pdf', xml_file='eob.xml')
The only new import here is our new header function. We don’t need
doc object to hold our XML object. Then we call doc.build and
will be called on all pages after the first page but NOT on the first
page itself. This is useful for those times when you want a header on
the first page but not on any of the others. Instead you might add a
One other note that I’d like to bring up is that you will need to set
see, I ended up using 125 points for the top margin in this example. You
Now let’s move on to making our header look more professional by adding
a logo!
Adding a Logo
Most headers you see will include some kind of logo on the page. Logos
are important branding for most businesses, so being able to add a logo
Let’s update our header module so that we can add a logo. Make sure
# header3.py
styles = getSampleStyleSheet()
'</b></font>'.format('01/01/2017')
p = Paragraph(ptext, styles["Normal"])
p.wrapOn(canvas, width, height)
p.drawOn(canvas, 400, 800)
Here we add an import for the Image flowable. Then we set the size
of the image to be 76.2 x 76.2 points and we draw the image in the upper
left side of the header.
already at the correct size as you do not want to hard code the
dimensions of the image if the logos you will be inserting might vary in
width. For example, I have written code where we just set the height of
the image but the width can be variable. As long the width of the logo
The final step is to update our main script, which we will rename
main3.py:
# main3.py
def parse_xml(xml_file):
with open(xml_file) as f:
xml = f.read()
root = objectify.fromstring(xml)
return root
elements = []
doc.xml = xml
doc.logo_path = logo
doc.build(elements, onFirstPage=header)
if __name__ == '__main__':
main(pdf_file='main3.pdf', xml_file='eob.xml',
logo='snakehead.jpg')
Here we update the import to use the new header module we created. Then
a “path” to the logo that we want to add to our header. When you run
Fig. 12-2: Header using a canvas + flowables
That looks pretty good, but it’s not very flexible. Let’s learn some
Configurable Logos
There are a lot of different ways that we can tell our code that we want
In this section, we will look at how to pass the file path to our script
read the logo’s file path using a config file and Python’s
configparser library.
# main4.py
import sys
def parse_xml(xml_file):
with open(xml_file) as f:
xml = f.read()
root = objectify.fromstring(xml)
return root
xml = parse_xml(xml_file)
elements = []
doc.build(elements, onFirstPage=header)
if __name__ == '__main__':
logo_path = sys.argv[1]
main(pdf_file='main4.pdf', xml_file='eob.xml',
logo=logo_path)
Here we add an import of the sys library. The other change is in the
if statement at the bottom of the code. You can access all the
arguments that are passed to your script via sys.argv, which returns
a Python list. The first element in the list is always the path to your
your script on the command line in the order that they were passed.
To run this script on the command line, you will need to do the
following:
python3 main4.py /path/to/logo.jpg
When you run this example, you should see something similar to the
code though. If you don’t pass in the path to the logo, you will receive
an IndexError as the list that is returned from sys.argv contains only one
element and we are trying to access element two.
if __name__ == '__main__':
if len(sys.argv) > 1:
logo_path = sys.argv[1]
else:
explains the problem. You could go one better if you also add a check
that verifies that the logo path is valid, but I will leave that to the
reader.
arguments this way. It’s great for a prototype, but you will inevitably
want to add other arguments. When you do that, it can become difficult
to know what the order of the arguments should be. This is why I
recommend that you use Python’s argparse as it gives you a lot more
flexibility when it comes to argument parsing. There are also several
other 3rd party argument parsing libraries that are good too, such as
docopt or python-fire.
For this book, we will stick with argparse so you don’t need to
install anything else though. Feel free to check out some of those other
projects though.
Let’s take a few moments to update our code again so that it can utilize
# main5.py
import argparse
def get_args():
parser = argparse.ArgumentParser(
description="Custom PDF Header with Logos")
parser.add_argument('-l', '-logo', '--logo',
action='store',
required=True,
help="The logo's file path",
dest='logo')
arguments = parser.parse_args()
return arguments
Here we import the argparse library. You will also notice that we
argparse that we want to save off the value and that it is a required
parse the arguments that are passed in and return our object, which is
an instance of argparse.Namespace.
def parse_xml(xml_file):
with open(xml_file) as f:
xml = f.read()
root = objectify.fromstring(xml)
return root
def main(pdf_file, xml_file):
arguments = get_args()
doc = SimpleDocTemplate(
pdf_file,
rightMargin=72, leftMargin=36,
topMargin=125, bottomMargin=18)
xml = parse_xml(xml_file)
elements = []
styles = getSampleStyleSheet()
elements.append(paragraph)
doc.xml = xml
doc.logo_path = arguments.logo
doc.build(elements, onFirstPage=header)
if __name__ == '__main__':
main(pdf_file='main5.pdf', xml_file='eob.xml')
our get_args* function right away and then the rest of the code is
pretty much the same. The only other difference is when we instantiate
logo argument.
python3 main5.py
usage: main5.py [-h] -l LOGO
main4.py: error: argument -l/-logo/--logo is required
That tells me that the “-l” argument is required. But it also mentions
that I can pass a “-h” argument, which is something that we didn’t even
python3 main5.py -h
usage: main5.py [-h] -l LOGO
optional arguments:
-h, --help show this help message and exit
-l LOGO, -logo LOGO, --logo LOGO
The logo's file path
creates a help option. Now we can call our script the right way:
python main5.py -l snakehead.jpg
Remember when I said that you will often want to add new command line
arguments? Well most of the time, your user will want to enter the PDF’s
name themselves and they should probably be able to pass in whatever XML
data file that they want to as well. Let’s make that possible in
main6.py:
# main6.py
import argparse
def get_args():
parser = argparse.ArgumentParser(
description="Custom PDF Header with Logos")
parser.add_argument('-l', '-logo', '--logo',
action='store',
required=True,
help="The logo's file path",
dest='logo')
parser.add_argument('-d', '--data_file',
action='store',
required=True,
help="The data file path",
dest='data_file')
parser.add_argument('-f', '-filepath', '--filepath',
action='store',
required=True,
help="The file path",
dest='path')
arguments = parser.parse_args()
return arguments
we add two new arguments to our argument parser: “-d” and “-f”. We make
both of these arguments required and we save off the values into our
argparse object.
def parse_xml(xml_file):
with open(xml_file) as f:
xml = f.read()
root = objectify.fromstring(xml)
return root
def main():
arguments = get_args()
doc = SimpleDocTemplate(
arguments.path,
rightMargin=72, leftMargin=36,
topMargin=125, bottomMargin=18)
xml = parse_xml(arguments.data_file)
elements = []
styles = getSampleStyleSheet()
elements.append(paragraph)
doc.xml = xml
doc.logo_path = arguments.logo
doc.build(elements, onFirstPage=header)
if __name__ == '__main__':
main()
file path of the PDF. Then we update the call to the parse_xml function so it
uses our argparse object as well for its data file. We
also remove the arguments that were originally required for our main function to
run. Otherwise, the code remains the same.
When you go to run your code now, you will be required to pass in all
three arguments:
python main6.py -l snakehead.jpg -f main6.pdf -d eob.xml
The argparse library will allow you to pass these arguments in whatever
order you want to. It will also prevent you from passing in two few or
two many arguments. Go ahead and give the code a spin to see what
happens when you don’t pass in all the arguments. You can also try
popular method is to put the path into a user modifiable config file.
This gives the user the ability to change how a program works without
needing to access the code, much like using the previous method of
passing in paths on the command line. The nice thing about config files
over command line arguments is that you can just set them and forget
them.
Let’s create a new file called main_config.py and copy the last
# main_config.py
import configparser
import os
def get_logo_from_config():
cfg_path = 'config.ini'
if os.path.exists(cfg_path):
config = configparser.ConfigParser()
config.read(cfg_path)
logo_path = config.get('General', 'logo_path')
return logo_path
else:
return None
def parse_xml(xml_file):
with open(xml_file) as f:
xml = f.read()
root = objectify.fromstring(xml)
return root
xml = parse_xml(xml_file)
doc.xml = xml
doc.logo_path = get_logo_from_config()
elements = []
doc.build(elements, onFirstPage=header)
if __name__ == '__main__':
main(pdf_file='main_config.pdf', xml_file='eob.xml')
simpler. Feel free to take this example and the previous one and combine
that files exists, we read it and try to extract the logo’s file path
from a section of the config file that is called “General”. The other
modification to our main script is where we instantiate the Header class near the
end of the main function. Here we just call our
If you run this code without creating the config file, it should work
without any issues. You just won’t have a logo in your header. Let’s go
ahead and create the config.ini file and enter the following into
it:
[General]
logo_path = snakehead.jpg
Now re-run the code and you should now have a logo, assuming you entered
header4.py:
# header4.py
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import mm
styles = getSampleStyleSheet()
'</b></font>'.format('01/01/2017')
p = Paragraph(ptext, styles["Normal"])
p.wrapOn(canvas, width, height)
p.drawOn(canvas, 400, 800)
You only need to update one line in our main module’s code, which
separate:
# page_number_main.py
The rest of the main module should be the same since the major code
what you use the canvasmaker argument for. But from my experience,
on each page.
# custom_canvas.py
class PageNumCanvas(canvas.Canvas):
def showPage(self):
"""
On a page break, add information to the list
"""
self.pages.append(dict(self.__dict__))
self._startPage()
def save(self):
"""
Add the page number to each page (page x of y)
"""
page_count = len(self.pages)
canvas.Canvas.save(self)
def draw_page_number(self, page_count):
"""
Add the page number
"""
page = "Page %s of %s" % (self._pageNumber, page_count)
self.setFont("Helvetica", 9)
self.drawRightString(200*mm, 20*mm, page)
method gets called whenever a new page is created. This allows us to add
each of the pages in the document and draw a page number along with the
which tells the canvas to close the current page and potentially start a
new one. The save method calls our own custom method,
argument. This is the method that actually draws our page number
# page_number_total_main.py
from custom_canvas import PageNumCanvas
from header3 import header
from lxml import objectify
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import mm
from reportlab.platypus import SimpleDocTemplate
from reportlab.platypus import Paragraph, Spacer
def parse_xml(xml_file):
with open(xml_file) as f:
xml = f.read()
root = objectify.fromstring(xml)
return root
xml = parse_xml(xml_file)
doc.xml = xml
doc.logo_path = logo
elements = []
doc.build(elements, onFirstPage=header,
onLaterPages=header,
canvasmaker=PageNumCanvas)
if __name__ == '__main__':
main(pdf_file='page_number_total_main.pdf',
xml_file='eob.xml',
logo='snakehead.jpg')
There are really only a couple of new things in this main module. We
import our custom canvas at the top of the code. Then we add a new
You will note that we can still call our header function too using the
Headers + Footers
This chapter wouldn’t be complete without a quick example of a document
that has a custom header and a custom footer. Let’s take the
still have a header function. Now add the following function to the
# header_and_footer.py
Create a footer
"""
styles = getSampleStyleSheet()
'</b></font>'
p = Paragraph(ptext, styles["Normal"])
All this function does is accept the same arguments as the header
function and then it draws a single line of text that identifies itself
as the footer.
The next step you can take is to take the code from
page_number_total_main.py and copy it and save it as
almost be done:
# header_and_footer_main.py
def parse_xml(xml_file):
with open(xml_file) as f:
xml = f.read()
root = objectify.fromstring(xml)
return root
xml = parse_xml(xml_file)
doc.xml = xml
doc.logo_path = logo
elements = []
doc.build(elements, onFirstPage=header,
onLaterPages=footer)
if __name__ == '__main__':
main(pdf_file='header_and_footer_main.pdf',
xml_file='eob.xml',
logo='snakehead.jpg')
The final change to make this work is to update the onLaterPages argument to
call the footer we imported. This will cause the footer
to appear on every page except the first one. Of course, this also
prevents us from having the header on any page but the first one, which
might not be what you want. To fix that issue, let’s add one more simple
def header_and_footer(canvas, doc):
"""
"""
header(canvas, doc)
footer(canvas, doc)
All this new function does is call both the header and the footer with
the appropriate arguments. To use it, you will need to update our main
function in the other file to use it. To do that, you will just need to
doc.build(elements, onFirstPage=header_and_footer,
onLaterPages=header_and_footer)
Wrapping Up
At this point you should know how to add custom headers and footers that
can draw pretty much anything on demand. You have also learned how to
add page numbers to each page as well as how to add a page number with
the total number of pages. The main takeaway from this is that when you
are working with headers and footers, you will need to make sure you set
the margins correctly in your document template so that they take your
For this chapter, we will take what we have learned from the previous
two and expand on it a bit. The idea here is that we will create a
and easy to extend. Here is a list of what we will learn about in this
chapter:
The idea behind creating a module for creating PDFs is that it allows
example, you can define in your module if headers and footers are added
and how much space they take up. You can also define the fonts, margins,
page size, page numbering and many other variables that control how your
To keep things organized, we will be putting all the files into folders labeled
version_1, version_2, etc. I will be sure to mention when we switch to a new
version in the section that that occurs in.
this chapter, we will update our code so that we can accept data files
attempt to figure out the file’s type itself, you might be interested in
a 3rd party package called python-magic that you can find here:
https://github.com/ahupp/python-magic
Anyway, let’s update our code a bit so that it checks what the file
main6.py from chapter 12 and save it as main.py for this chapter. We will be
saving all our changes to a folder called
version_1, or you can just check that out of the book’s Github repository if you
don’t want to write code yourself.
# main.py
import argparse
import os
from header import header
from lxml import objectify
from parsers import parse_json
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import Paragraph
from reportlab.platypus import SimpleDocTemplate
def get_args():
parser = argparse.ArgumentParser(
description="Custom PDF Header with Logos")
parser.add_argument('-l', '-logo', '--logo',
action='store',
required=True,
help="The logo's file path",
dest='logo')
parser.add_argument('-d', '--data_file',
action='store',
required=True,
help="The data file path",
dest='data_file')
parser.add_argument('-f', '-filepath', '--filepath',
action='store',
required=True,
help="The file path",
dest='path')
arguments = parser.parse_args()
return arguments
def parse_xml(xml_file):
with open(xml_file) as f:
xml = f.read()
root = objectify.fromstring(xml)
return root
The first thing we need to do is import the os module. We will use it to help us
figure out the file extension shortly. We also import
something new from a parsers module called parse_json. We will use that to
parse JSON data files. Other than that, we really don’t need
to do any updates to the get_args or parse_xml functions at this time. So let’s
move on to our main function instead:
def main():
arguments = get_args()
_, ext = os.path.splitext(arguments.data_file)
msg = 'PDF Creator only accepts the following file types: ' \
doc = SimpleDocTemplate(
arguments.path,
rightMargin=72, leftMargin=36,
topMargin=125, bottomMargin=18)
if ext == '.xml':
data = parse_xml(arguments.data_file)
data = parse_json(arguments.data_file)
elements = []
styles = getSampleStyleSheet()
elements.append(paragraph)
doc.data = data
doc.logo_path = arguments.logo
doc.build(elements, onFirstPage=header)
if __name__ == '__main__':
main()
Here we create a list of support extension types, which in this case are JSON and
XML file types. Then we extract the file extension using
os.path.splitext, which will return a tuple of the file path and the extension
itself. Since we don’t care about the file path here, we just
RuntimeError with a custom error message to tell the user what went wrong.
The last change is to check what the extension is and call the
parse_xml and if we pass in a JSON file, then we want to call parse_json. You
will also note that we changed the doc object to use doc.data instead of doc.xml
like we did in the previous chapter.
The one thing we are missing here is the parse_json function that we imported
from parsers, so let’s write that next! Open up a new file and save it as
parsers.py. Then add the following code:
# parsers.py
import json
class JSON:
"""
A way to "objectify" a json object to match
the API we get from lxml.objectify
"""
def parse_json(json_file):
"""
Opens a JSON file and turns it into an object
"""
with open(json_file) as f:
data = json.load(f)
return JSON(data)
Here we import Python’s json library and then we create a class called JSON.
This reason we create a class is that we want to turn the JSON encoded data into
an object that has the same API as
to do the conversion itself. It relies on the parsers module to do that for it. In
fact, we could (and probably should) add the XML parsing
code to this parser module. Anyway, the last piece of code here is the
parse_json function, which just opens the JSON file and turns it into a JSON
object which it then returns.
The last piece of code that we need to update is our header.py script:
# header.py
styles = getSampleStyleSheet()
'</b></font>'.format('01/01/2017')
p = Paragraph(ptext, styles["Normal"])
p.wrapOn(canvas, width, height)
p.drawOn(canvas, 400, 800)
If you compare this version to the one from the previous chapter, you
will note that we are now referring to doc.data instead of doc.xml. That is the
only change that we needed to make. The rest is the same.
python main.py -l snakehead.jpg -d eob.xml -f output.pdf
code so that it can read everything. We also need to add a table for the
claim data in the EOB data file. That way if we have a lot of claims,
big as there will be a lot of refactoring going on. So we will copy all
the files in version_1 and paste them in a new folder called version_2.
We are going to make the code much more modular in this step, so the
first thing we want to do is reduce all the things that main.py does. Let’s take a
look:
# main.py
import argparse
import os
import pdf_creator
from parsers import parse_json, parse_xml
def get_args():
parser = argparse.ArgumentParser(
description="Custom PDF Header with Logos")
parser.add_argument('-l', '-logo', '--logo',
action='store',
required=True,
help="The logo's file path",
dest='logo')
parser.add_argument('-d', '--data_file',
action='store',
required=True,
help="The data file path",
dest='data_file')
parser.add_argument('-f', '-filepath', '--filepath',
action='store',
required=True,
help="The output file path",
dest='path')
arguments = parser.parse_args()
return arguments
def main():
arguments = get_args()
supported_ext_types = ['.json', '.xml']
if ext == '.xml':
data = parse_xml(arguments.data_file)
elif ext == '.json':
data = parse_json(arguments.data_file)
eob = pdf_creator.EOB(data,
pdf_file=arguments.path,
logo=arguments.logo)
eob.save()
if __name__ == '__main__':
main()
As you can see, we moved the parse_xml function out and into the parsers
module. We also removed all the ReportLab related code from main and put it
into another module called pdf_creator. This reduces the number of imports
needed in main by quite a margin. Now all
that main does is call the parsing logic and then creates the PDF
document via the pdf_creator module. We could simplify this even more
by putting the calls to the parser into the pdf_creator module and by
moving the argument parsing into its own module. If you want to attempt
Let’s create the pdf_creator.py script next. We will be using the EOB class that
we created in chapter 11 and modifying it slightly for this example. Here is the
first part of the class:
# pdf_creator.py
self.doc = SimpleDocTemplate(
pdf_file, pagesize=letter,
rightMargin=72, leftMargin=36,
topMargin=125, bottomMargin=18)
self.doc.logo_path = logo
self.doc.data = self.data
self.elements = []
self.styles = getSampleStyleSheet()
self.width, self.height = letter
self.create()
Here we just do all the imports that we need to create our PDF. You will note
that we also import our header module here, so we do separate out a little of our
PDF creation into a separate module. Anyway, the
main takeaware here is that we are passing in the data object, the name of the
output PDF (pdf_file) and the logo path. This code is pretty self-explanatory, so
I won’t go into it in depth here.
The next two methods that we need to make are create and create_text:
def create(self):
"""
"""
self.create_claims()
"""
"""
if bold:
self.styles['Normal'])
self.styles['Normal'])
The create method is where you would put all the main calls of your
program. You could have it save the PDF too if you wanted. In this case,
we only want it to create the claims table. The create_text method is just a
helper method for creating styled ReportLab Paragraphs.
""""""
fsize = 7.5
p = Paragraph(ptext, self.styles["Heading1"])
self.elements.append(p)
self.elements.append(Spacer(1, 20))
claim_id=self.data.claim_id,
received_date=self.data.received_date),
self.styles["Normal"])
billed = Paragraph(
'<font size={}>Amount<br/>billed</font>'.format(fsize),
self.styles["Normal"])
member_rate = Paragraph(
'<font size={}>Member<br/>rate</font>'.format(fsize),
self.styles["Normal"])
pending = Paragraph(
.format(fsize),
self.styles["Normal"])
applied = Paragraph(
copay = Paragraph(
'<font size={}>Your<br/>copay</font>'.format(fsize),
self.styles["Normal"])
remaining = Paragraph(
'<font size={}>Amount<br/>remaining</font>'.format(fsize),
self.styles["Normal"])
plan_pays = Paragraph(
'<font size={}>Plan<br/>pays</font>'.format(fsize),
self.styles["Normal"])
coins = Paragraph(
'<font size={}>Your<br/>coinsurance</font>'.format(fsize),
self.styles["Normal"])
owe = Paragraph(
data.append([
self.create_text(claim.description),
self.create_text(claim.amount_billed.text),
self.create_text(claim.member_rate),
self.create_text(claim.pending),
self.create_text(claim.deductible),
self.create_text(claim.copay),
self.create_text(claim.amount_remaining.text),
self.create_text(claim.plan_pays.text),
self.create_text(claim.coinsurance),
self.create_text(claim.total_owed.text)
])
colWidths = [110, 50, 40, 60, 50, 40, 50, 40, 55, 60]
table_style = TableStyle(
])
table.setStyle(table_style)
self.elements.append(Indenter(left=40))
self.elements.append(table)
self.elements.append(Indenter(left=-40))
The last method we need to make to complete this class is our save method:
def save(self):
"""
"""
self.doc.build(self.elements, onFirstPage=header)
Note that this save method will build our document and apply our header to the
first page and the first page only! Now we are ready to update
it.
# parsers.py
import json
class String(str):
"""
Custom string sub-class to emulate the "strings"
returned from lxml
"""
@property
def text(self):
"""
Return the str
"""
return self
class Claim:
"""
Represents the data that makes up a claim
"""
The first change here is that we now import lxml because we moved the XML
“parser” into this function. But the next few changes are what’s
really interesting. The first class you will see is one named String that
subclasses Python’s built-in str type. We do that because in our pdf_creator
script you will see that we have some calls to a .text attribute and so we need to
emulate that for our JSON parser.
The next piece is that we need to create a Claim class. This class will represent
all the elements that make up a claim. Since
lxml.objectify turns the XML tree into a series of objects, we need to do the
same with our JSON parser. Now we need to create a Claims class:
class Claims:
"""
Represents a series of Claim objects
"""
def getchildren(self):
"""
Return all the children claim
"""
data_claims = self.data['patient']['claims']['claim']
claims = []
if isinstance(data_claims, dict):
# only a single claim exists
claim = Claim(data_claims)
claims.append(claim)
else:
# claims is a list
for claim in data_claims: claims.append(Claim(claim))
return claims
The Claims class is basically a container object for all our Claim
is, it will then create either a list that contains one claim or a list
class JSON:
"""
A way to "objectify" a json object to match
the API we get from lxml.objectify
"""
self.claims = Claims(data)
def parse_json(json_file):
"""
Opens a JSON file and turns it into an object
"""
with open(json_file) as f: data = json.load(f)
return JSON(data)
def parse_xml(xml_file):
"""
Opens an XML file and turns it into an lxml.objectify object
"""
with open(xml_file) as f: xml = f.read()
root = objectify.fromstring(xml)
return root
The only change here is that we added a couple more attributes,
The next item in our list of changes is the header.py script. All it needs is one
slight adjustment that is actually optional:
# header.py
styles = getSampleStyleSheet()
if doc.logo_path:
img = Image(doc.logo_path, width=76.2, height=76.2)
img.wrapOn(canvas, width, height)
img.drawOn(canvas, 100, 700)
ptext = '''
<b>Member:</b> {member}<br/>
<b>Member ID:</b> {member_id}<br/>
<b>Group #:</b> {group_num}<br/>
<b>Group name:</b> {group_name}<br/>
'''.format(member=doc.data.member_name,
member_id=doc.data.member_id,
group_num=doc.data.group_num,
group_name=doc.data.group_name
)
p = Paragraph(ptext, styles["Normal"])
p.wrapOn(canvas, width, height)
p.drawOn(canvas, 400, 730)
All that was added here was to check if the logo_path wasn’t set to None, since
that is the default in the pdf_creator script. If it’s not None, then we attempt to
add the logo. We should add a check to
the main.py script that verifies the input logo and data files exist, but that can
wait for the next version.
At this point, you should be able to run the code the same way as
before:
python main.py -l snakehead.jpg -d eob.xml -f output.pdf
And if we want to test the json parser, so we run the command a second time:
python main.py -l snakehead.jpg -d eob.json -f output.pdf
I have also included a file called eob2.xml and a eob2.json file that you can use
to create a multipage document if you want to try that.
Custom Stylesheets
Another way to control the look and feel of your PDF is by creating your own
StyleSheet. The nice thing about using a template is that you can easily change
the entire look of all the text in your document by
Let’s take a few moments and write our own stylesheet. Create a new
directory called version_3 and copy in all the files from the version_2 folder we
created in the last section. Then open up your favorite Python editor and create a
new file named
custom_stylesheet.py and save it to the newly created version_3 folder. Now
enter the following code:
# custom_stylesheet.py
_baseFontNameB = tt2ps(_baseFontName,1,0)
_baseFontNameI = tt2ps(_baseFontName,0,1)
_baseFontNameBI = tt2ps(_baseFontName,1,1)
def get_custom_stylesheet_1():
"""
Create and return a custom stylesheet
"""
stylesheet = StyleSheet1()
stylesheet.add(ParagraphStyle(name='Normal',
fontName=_baseFontName,
fontSize=12,
leading=14,
textColor=blue)
)
stylesheet.add(ParagraphStyle(name='BodyText',
parent=stylesheet['Normal'],
spaceBefore=6)
)
stylesheet.add(ParagraphStyle(name='Italic',
parent=stylesheet['BodyText'],
fontName = _baseFontNameI)
)
stylesheet.add(ParagraphStyle(name='Bold',
parent=stylesheet['BodyText'],
fontName = _baseFontNameB)
)
stylesheet.add(ParagraphStyle(name='Heading1',
parent=stylesheet['Normal'],
fontName = _baseFontNameB,
fontSize=20,
leading=24,
spaceAfter=6),
alias='h1')
return stylesheet
Here we import a bunch of required bits and pieces from ReportLab. This file is
actually based on ReportLab’s own styles.py file that you can find in its lib
folder. Anyway, we create five styles. The Normal style is used as the parent for
the BodyText and Heading styles whereas BodyText is the parent for the Italic
and Bold styles. We append each of these ParagraphStyles to our StyleSheet1
instance.
# pdf_creator.py
class EOB:
"""
Explanation of Benefits PDF Class
"""
def __init__(self, data, pdf_file='eob.pdf', logo=None):
""""""
self.data = data
self.doc = SimpleDocTemplate(
pdf_file, pagesize=letter,
rightMargin=72, leftMargin=36,
topMargin=125, bottomMargin=18)
self.doc.logo_path = logo
self.doc.data = self.data
self.elements = []
self.styles = get_custom_stylesheet_1()
self.width, self.height = letter
self.create()
# header.py
styles = get_custom_stylesheet_1()
if doc.logo_path:
img = Image(doc.logo_path, width=76.2, height=76.2)
img.wrapOn(canvas, width, height)
img.drawOn(canvas, 100, 700)
ptext = 'Statement Date: {}'.format('01/01/2017')
p = Paragraph(ptext, styles["Bold"])
p.wrapOn(canvas, width, height)
p.drawOn(canvas, 400, 700)
ptext = '''
<b>Member:</b> {member}<br/>
<b>Member ID:</b> {member_id}<br/>
<b>Group #:</b> {group_num}<br/>
<b>Group name:</b> {group_name}<br/>
'''.format(member=doc.data.member_name,
member_id=doc.data.member_id,
group_num=doc.data.group_num,
group_name=doc.data.group_name
)
p = Paragraph(ptext, styles["Normal"])
p.wrapOn(canvas, width, height)
p.drawOn(canvas, 400, 730)
changed the font color for our new stylesheet to blue. So when you run
the main script, the main difference you will see is that the text color will be
blue instead of black. However the font size of some of
points, so if you look at the header of the table or the block of text
in the upper right hand corner, you might notice that the font size is
You can use the custom stylesheet to apply various settings to various parts of
your document or to the entire document. You can set the font
size, family, and color here as well as many other stylistic settings.
Configuring Your Application
There are several popular file formats used for configuring
applications. One of the most popular that we’ve mentioned before is the
INI file. Another popular configuration file type is YAML. You can use
whatever format you are most comfortable using. For this chapter, we
will focus on using an INI type file. This will allow us to continue to
use Python’s configparser library. You can configure anything about your
application that you want to in the config file. For this example,
we will just create a small sample of the settings you could configure.
Let’s create a version_4 folder and copy the contents of the version_3 folder
into it. Then create a file named config.ini in your new folder with the following
contents:
[General]
logo_path = snakehead.jpg
left_margin = 36
right_margin = 72
top_margin = 125
bottom_margin = 18
style = customer_1_stylesheet
output_dir = output
This config file allows us to configure the logo’s file path so we won’t need to
pass it in any more on the command line. It also allows us to
set the margins of the page. Since we just set up a stylesheet in the
output directory which is the folder our PDF will get saved to.
Let’s start by renaming custom_stylesheet.py to customer_1_stylesheet.py.
Now create a folder inside your version_4 folder called stylesheets and move
your stylesheet Python file in there.
Now let’s create a generic default_stylsheet.py file in that same folder. Once
that is created, add the following code:
# default_stylesheet.py
_baseFontNameB = tt2ps(_baseFontName,1,0)
_baseFontNameI = tt2ps(_baseFontName,0,1)
_baseFontNameBI = tt2ps(_baseFontName,1,1)
def get_stylesheet():
"""
Create and return a custom default stylesheet
"""
stylesheet = getSampleStyleSheet()
stylesheet.add(ParagraphStyle(name='Bold',
parent=stylesheet['BodyText'],
fontName = _baseFontNameB)
)
return stylesheet
able to fall back on something if someone mistypes the style name in the
config file. Since our custom style has a Bold sub-style, we also need the default
stylesheet to have one too. Otherwise it just inherits
everything that it does from ReportLab’s default stylesheet.
the stylesheets folder. Inside of your brand new __init__.py file, enter the
following code:
from . import default_stylesheet
Let’s call it util.py and put it in the version_4 folder. We will add a class in this
module for reading the config file:
# util.py
import configparser
import importlib
import os
class ConfigObj:
"""
Create a configuration object
"""
def __init__(self):
cfg_path = 'config.ini'
# Set some defaults
self.logo_path = None
self.right_margin = 36
self.left_margin = 36
self.top_margin = 18
self.bottom_margin = 18
self.style = None
self.output_dir = None
if os.path.exists(cfg_path):
config = configparser.ConfigParser()
config.read(cfg_path)
self.logo_path = config.get('General', 'logo_path')
self.right_margin = int(config.get('General', 'right_margin'))
self.left_margin = int(config.get('General', 'left_margin'))
self.top_margin = int(config.get('General', 'top_margin'))
self.bottom_margin = int(config.get('General', 'bottom_margin'))
self.style = config.get('General', 'style')
self.output_dir = config.get('General', 'output_dir')
This code attempts to read the config file and basically turn it into an object. If
the config file doesn’t exist, then we also have some
defaults set so that our PDF library should still work, although it
won’t look right without the correct margins applied. We could actually
library, configobj as it already turns config files into objects. It hasn’t been
updated in a while though, but it is well worth a look if
The next piece that we want to add to our util.py module is some code to load up
our custom stylesheet. Add the following function to
your module:
def get_stylesheet():
"""
config = ConfigObj()
try:
path = os.path.abspath(os.path.join('stylesheets',
'{}.py'.format(config.style)))
stylesheet = importlib.util.module_from_spec(spec)
spec.loader.exec_module(stylesheet)
getSampleStyleSheet = stylesheet.get_stylesheet
getSampleStyleSheet = default_stylesheet.get_stylesheet
return getSampleStyleSheet()
attempting to load the style, then we fall back to using our default
stylesheet. You could replace the code in the except block with the following if
you prefer to continue using importlib:
path = os.path.abspath(os.path.join('stylesheets', 'default_stylesheet.py'))
stylesheet = importlib.util.module_from_spec(spec)
spec.loader.exec_module(stylesheet)
getSampleStyleSheet = stylesheet.get_stylesheet
Now we need to update the pdf_creator.py file to use our utility module. I will
only show the portion of the code that we need to change
to make this work in the pdf_creator.py file rather than showing you the entire
module again:
# pdf_creator.py
import os
class EOB:
"""
Explanation of Benefits PDF Class
"""
self.doc = SimpleDocTemplate(
pdf_file, pagesize=letter,
rightMargin=config.right_margin,
leftMargin=config.left_margin,
topMargin=config.top_margin,
bottomMargin=config.bottom_margin)
self.doc.logo_path = config.logo_path
self.doc.data = self.data
self.elements = []
self.styles = get_stylesheet()
self.width, self.height = letter
self.create()
The new code here is where we import ConfigObj and get_stylesheet from our
util module. Then we create an instance of our ConfigObj() class and use that
object to set out margins and the
We also need to update header.py to use our utility module too as it also relies
on our new custom stylesheet infrastructure. Here are the
# header.py
import os
styles = get_stylesheet()
You will notice that we import our get_stylesheet function here and then we use
it in the header function itself. Other than that, the rest of the module stays the
same.
buttons and other widgets to interact with their software over using a
would use them. Python has several choices for creating graphical user
interfaces. Probably the most popular are tkinter, wxPython and PyQt or
one of its variants. I will be using wxPython for this example. The wxPython
toolkit has been around for many years and works on Linux, Mac
and Windows. You can use pip to install wxPython 4 like this:
python -m pip install wxPython
Note that you may need to have a compiler installed on Linux or Mac for this to
succeed successfully. Linux may also require that you install
webkit and one or two other dependencies. Most of the time, the errors
that you will see will be quite obvious and you can just install the
dependency and then retry installing wxPython. I haven’t had any issues
Once you have it installed, we can get started. The first thing we will want to
create is the main user interface. We want to be able to do the
following:
You might find it helpful to sketch out what the user interface should look like.
In my mind, this sounds like we need to have something that
Fig. 13-1: Mock-up of our PDF Creator
Let’s try to create a user interface that looks something like our
mock-up! For the first iteration, we will need to create a new file.
Since we already have a main.py for our command line interface, let’s create a
file called gui.py for our graphical user interface.
We will be going over this piece by piece because the code is fairly
complex. Feel free to just download the code if you think that would
# gui.py
import os
import pdf_creator
import wx
from parsers import parse_json, parse_xml
"JSON (*.json)|*.json|"
"All files (*.*)|*.*")
class PDFPanel(wx.Panel):
"""
Panel container for holding the PDF generation
widgets
"""
This code imports the pieces that we need for the GUI to create the PDF.
You will note that we have imported our custom pdf_creator module and our
two parsers. We also set up a wildcard variable that we will be using when we
go to choose our data file later on in the code.
It can be given a different background color and other panels can also
def __init__(self, parent):
wx.Panel.__init__(self, parent)
self.main_sizer = wx.BoxSizer(wx.VERTICAL)
self.std_paths = wx.StandardPaths.Get()
data_file_btn.Bind(wx.EVT_BUTTON, self.get_data_file)
self.layout(self.data_file, data_file_btn)
output_btn.Bind(wx.EVT_BUTTON, self.get_output_path)
self.layout(self.output_path, output_btn)
generate_btn.Bind(wx.EVT_BUTTON, self.generate_pdf)
self.main_sizer.Add(generate_btn, 0, wx.ALL|wx.CENTER, 5)
self.SetSizer(self.main_sizer)
Panels almost always have a parent widget, so we pass one in here. Then we
create a sizer, which allows you to layout your widgets without
setting an absolute position. This gives the user the ability to resize
the window and have the widgets resize accordingly, depending on how you
the files. We bind the buttons to wx.EVT_BUTTON, which tells wxPython that
when the button is clicked, it should call the associated
function that it is bound to. The other item I want to point out here is
the 3rd line in the __init__, which basically creates a special paths object from
which you can get various standard paths in a
cross-platform way. This allows me to query wxPython for the user’s App
As you may have noticed, in the code above we are calling a method
def layout(self, txt, btn):
"""
"""
hsizer = wx.BoxSizer(wx.HORIZONTAL)
hsizer.Add(txt, 1, wx.ALL, 5)
hsizer.Add(btn, 0, wx.ALL, 5)
self.main_sizer.Add(hsizer, 0, wx.EXPAND)
This is what I call a “helper” method. You pass in a couple of widgets and it adds
them to a sizer for you. The reason I am doing it this way
is to prevent writing the same sizer code repeatedly. There are several
other types of sizers available in wxPython that we could use here, but
def get_data_file(self, event):
"""
"""
dlg = wx.FileDialog(
defaultDir=self.std_paths.GetDocumentsDir(),
defaultFile="",
wildcard=wildcard,
if dlg.ShowModal() == wx.ID_OK:
path = dlg.GetPath()
self.data_file.SetValue(path)
dlg.Destroy()
The get_data_file method is called when the user presses the “Get Data” button.
It will open a file dialog and allow the user to choose a
JSON or XML file or any file using that wildcard we set earlier. If the user
presses the OK button, then we grab the path to the file that
they chose and put that path in the text control. Then we destroy the
def get_output_path(self, event):
"""
"""
dlg = wx.FileDialog(
defaultDir=self.std_paths.GetDocumentsDir(),
defaultFile="",
wildcard="PDF (*.pdf)|*.pdf",
style=wx.FD_SAVE
if dlg.ShowModal() == wx.ID_OK:
path = dlg.GetPath()
if not ext:
path += '.pdf'
self.output_path.SetValue(path)
dlg.Destroy()
The get_output_path method is called when the user presses the “Output
Folder” button. Instead of opening an Open File Dialog, it
instead opens a Save File dialog. They are slightly different in that
the Open File dialog allows you to select a file for opening while the
Save File dialog allows you to choose a location to save the file and
In this example, we only allow the user to save files in the PDF format.
If the user doesn’t set an extension, then we add the “.pdf” extension
to it. Note that the user could enter the wrong extension and this code
will not detect that and will save the PDF with the wrong extension.
Now it’s time to learn what happens when the user presses the “Generate PDF”
button:
def generate_pdf(self, event):
"""
"""
data_file = self.data_file.GetValue()
output_path = self.output_path.GetValue()
_, ext = os.path.splitext(data_file)
if not data_file:
return
msg = 'PDF Creator only accepts the following file types: {}'
self.show_error_msg(msg.format(supported_ext_types))
return
if not output_path:
return
if ext == '.xml':
data = parse_xml(data_file)
eob = pdf_creator.EOB(data,
pdf_file=output_path)
eob.save()
dlg = wx.MessageDialog(
parent=None,
caption='PDF Saved',
style=wx.OK|wx.ICON_INFORMATION
dlg.ShowModal()
dlg.Destroy()
Here we set up some code that should look kind of familiar. We create a list of
supported extension types and then we grab the data file path
and the save location. Next we grab the extension of the data file. The
next three if statements check if we actually have paths to use and that the
extension type is correct. If any of these return False, then
If the user has filled out our GUI correctly, then we call our
pdf_creator module and have it generate the PDF. If that is successful, then we
show a dialog that says as much and also tells the
def show_error_msg(self, msg):
"""
"""
dlg = wx.MessageDialog(parent=None,
message=msg,
caption='Error',
style=wx.OK|wx.ICON_ERROR)
dlg.ShowModal()
dlg.Destroy()
This code is very similar to the code we used to tell the user that the PDF saved
successfully. All we are doing is creating a MessageDialog and displaying it to
the user.
The last bit of code that we need to look at is for creating the top
level widget, which in this case is a wx.Frame class. This is the parent widget
that we pass to our wx.Panel. Here’s the code:
class PDFFrame(wx.Frame):
"""
The top level container for the panel and menu
"""
def __init__(self):
wx.Frame.__init__(self, None, title='PDF Creator')
panel = PDFPanel(self)
self.Show()
if __name__ == '__main__':
app = wx.App(False)
frame = PDFFrame()
app.MainLoop()
All this code does right now is create the frame and the panel inside
the frame. You will note that the last line of code starts the
loop, which is basically an infinite loop that waits for the user to
button, moves the mouse over a widget, or otherwise interacts with your
application.
When you run this code, you should get a user interface that looks
That looks kind of like our mock-up, but we don’t have the menu along
the top of the frame yet. We will need that to make editing the config
possible. Of course, if you want, you could add another button at the
version_4 folder to a new folder called version_5 so we can keep our new
changes separate. Now open up your new copy of gui.py and update the
PDFFrame class:
class PDFFrame(wx.Frame):
"""
The top level container for the panel and menu
"""
def __init__(self):
wx.Frame.__init__(self, None, title='PDF Creator')
panel = PDFPanel(self)
self.create_menu()
self.Show()
Here we add a new call to a method named create_menu. That was a simple
change. Let’s look at this new method though:
def create_menu(self):
"""
"""
menubar = wx.MenuBar()
file_menu = wx.Menu()
exit_menu_item = file_menu.Append(
menubar.Append(file_menu, '&File')
self.Bind(wx.EVT_MENU, self.on_exit, exit_menu_item)
edit_menu = wx.Menu()
config_menu_item = edit_menu.Append(
menubar.Append(edit_menu, '&Edit')
self.SetMenuBar(menubar)
Here we create a wx.MenuBar instance. This is a class that we use for creating
the menu bar itself. Then we need to create wx.Menu objects to add then to our
MenuBar. After creating a wx.Menu object, you can append menu items to them
via the Menu’s Append method.
These require a unique identifier (wx.NewId), a label, and some help text that
would appear in the Status Bar at the bottom of the frame, if
we had one. Finally we bind the menu items to events using self.Bind and
binding to wx.EVT_MENU, which is fired when someone clicks on a menu
item in the menu.
Now let’s write the last two methods that the menu items are bound to:
def on_edit_config(self, event):
"""
"""
dlg = config_editor.ConfigDialog()
dlg.ShowModal()
dlg.Destroy()
"""
"""
self.Close()
The on_edit_config event handler method will utilize a new module called
config_editor. From there we call ConfigDialog to display a dialog that allows
us to edit the configuration file. We also
Make sure that you import the config_editor module at the top of the gui.py
script.
Here is what the GUI should look like now:
Now let’s create a file called config_editor.py and save it to our version_5
folder. Now we can start writing our module:
import configparser
import wx
class ConfigDialog(wx.Dialog):
def __init__(self):
wx.Dialog.__init__(self, None,
title='Edit Configuration',
size=(400, 400))
self.main_sizer = wx.BoxSizer(wx.VERTICAL)
config = configparser.ConfigParser()
config.read('config.ini')
config_dict = dict(config.items('General'))
self.logo = wx.TextCtrl(
self, value=config_dict['logo_path'])
self.layout('Logo Path:', self.logo)
self.left_margin = wx.TextCtrl(
self, value=config_dict['left_margin'])
self.layout('Left Margin:', self.left_margin)
self.right_margin = wx.TextCtrl(
self, value=config_dict['right_margin'])
self.layout('Right Margin:', self.right_margin)
self.top_margin = wx.TextCtrl(
self, value=config_dict['top_margin'])
self.layout('Top Margin:', self.top_margin)
self.bottom_margin = wx.TextCtrl(
self, value=config_dict['bottom_margin'])
self.layout('Bottom Margin:', self.bottom_margin)
self.style = wx.TextCtrl(
self, value=config_dict['style'])
self.layout('Style:', self.style)
btn_sizer = wx.BoxSizer()
save_btn = wx.Button(self, label='Save')
save_btn.Bind(wx.EVT_BUTTON, self.save)
btn_sizer.Add(save_btn, 0, wx.ALL|wx.CENTER, 5)
self.main_sizer.Add(btn_sizer, 0, wx.ALL|wx.CENTER, 5)
self.SetSizer(self.main_sizer)
Here we subclass from wx.Dialog. The nice thing about dialogs in Python is that
they basically have a wx.Panel built into them.
However they do not allow the user to resize them, so that is kind of
annoying. If you want a resizable dialog, then you would want to use a
wx.Frame instead. Anyway, all this code is used to create the form that matches
with the config file we created earlier in this chapter. We
use a layout helper method here as well and we bind the Cancel and Save
buttons to separate event handlers. Let’s take a look at some of
def cancel(self, event):
"""
"""
self.Close()
"""
"""
hsizer = wx.BoxSizer()
hsizer.Add(wx.StaticText(self, label=lbl, size=size),
0, wx.ALL, 5)
hsizer.Add(txt, 1, wx.ALL|wx.EXPAND, 5)
self.main_sizer.Add(hsizer, 0, wx.EXPAND)
The cancel event handler method is just used to close our dialog.
Note that we need to actually call a dialog’s Destroy method to make it actually
go away. Otherwise it will just kind of float around in
memory. This is why we call the Destroy method on the dialog object in
gui.py. Here we just want to close the dialog. This tells wxPython that it’s no
longer needed and will allow Python to continue to destroy
it.
Next up is the layout method, which is just used for laying out the label and text
controls on the dialog. The last method we will look at
def save(self, event):
"""
"""
config = configparser.ConfigParser()
config.add_section('General')
config.set('General', 'logo_path',
self.logo.GetValue())
config.set('General', 'left_margin',
self.left_margin.GetValue())
config.set('General', 'right_margin',
self.right_margin.GetValue())
config.set('General', 'top_margin',
self.top_margin.GetValue())
config.set('General', 'bottom_margin',
self.bottom_margin.GetValue())
config.set('General', 'style',
self.style.GetValue())
config.write(config_file)
dlg = wx.MessageDialog(
parent=None,
message='Config saved',
caption='Config Saved',
style=wx.OK|wx.ICON_INFORMATION
dlg.ShowModal()
dlg.Destroy()
self.Close()
Then we recreate the section and key / value pairs that we need using
the text control’s values. Finally we write the config back to disk by
controls. Finally we pop up a dialog to tell the user that config was
saved successfully. When the user closes this notification, we close the
When the user goes to launch the configuration edit dialog, it should
look something like this:
Try editing some of the values using your new user interface and then
generate the PDF to see how the changes effect the result.
Wrapping Up
You should now have a decent idea of how to create some reusable modules of
your own for generating PDFs. In this chapter we learned how to apply
addition to using the configuration file for controlling the look of the
output. Finally we learned how to add a graphical user interface via the
wxPython toolkit. Try playing around with the examples and changing them
up a bit. You can also enhance the UI by adding a preview menu item or a
PyPDF2 is actually a fork of the original pyPdf which was written by Mathiew
Fenniak and released in 2005. However, the original pyPdf’s last release was in
2014. A company called Phaseit, Inc spoke with Mathieu and ended up
sponsoring PyPDF2 as a fork of pyPdf
At the time of writing this book, the PyPDF2 package hasn’t had a release since
2016. However it is still a solid and useful package that is worth your time to
learn.
Extracting metadata
Splitting documents
Merging 2 PDF files into 1
Rotating pages
Overlaying / Watermarking Pages
Encrypting / decrypting
Installation
PyPDF2 is a pure Python package, so you can install it using pip (assuming pip
is in your system’s path):
python -m pip install pypdf2
As usual, you should install 3rd party Python packages to a Python virtual
environment to make sure that it works the way you want it to.
def get_info(path):
with open(path, 'rb') as f:
pdf = PdfFileReader(f)
info = pdf.getDocumentInfo()
number_of_pages = pdf.getNumPages()
print(info)
author = info.author
creator = info.creator
producer = info.producer
subject = info.subject
title = info.title
if __name__ == '__main__':
path = 'reportlab-sample.pdf'
get_info(path)
Here we import the PdfFileReader class from PyPDF2. This class gives us the
ability to read a PDF and extract data from it using various accessor methods.
The first thing we do is create our own get_info function that accepts a PDF file
path as its only argument. Then we open the file in read-only binary mode. Next
we pass that file handler into PdfFileReader and create an instance of it.
Now we can extract some information from the PDF by using the
getDocumentInfo method. This will return an instance of
PyPDF2.pdf.DocumentInformation, which has the following useful attributes,
among others:
author
creator
producer
subject
title
If you print out the DocumentInformation object, this is what you will see:
{'/Author': 'Michael Driscoll',
'/CreationDate': "D:20180331023901-00'00'",
'/Creator': 'LaTeX with hyperref package',
'/Producer': 'XeTeX 0.99998',
'/Title': 'ReportLab - PDF Processing with Python'}
We can also get the number of pages in the PDF by calling the getNumPages
method.
Let’s try to extract the text from the first page of the PDF that we downloaded in
the previous section:
# extracting_text.py
def text_extractor(path):
with open(path, 'rb') as f:
pdf = PdfFileReader(f)
text = page.extractText()
print(text)
if __name__ == '__main__':
path = 'reportlab-sample.pdf'
text_extractor(path)
You will note that this code starts out in much the same way as our previous
example. We still need to create an instance of PdfFileReader. But this time, we
grab a page using the getPage method. PyPDF2 is zero-based, much like most
things in Python, so when you pass it a one, it actually grabs the second page.
The first page in this case is just an image, so it wouldn’t have any text.
Interestingly, if you run this example you will find that it doesn’t return any text.
Instead all I got was a series of line break characters. Unfortunately, PyPDF2 has
pretty limited support for extracting text. Even if it is able to extract text, it may
not be in the order you expect and the spacing may be different as well.
To get this example code to work, you will need to try running it against a
different PDF. I found one on the United States Internal Revenue Service
website here: https://www.irs.gov/pub/irs-pdf/fw9.pdf
Splitting PDFs
The PyPDF2 package gives you the ability to split up a single PDF into multiple
ones. You just need to tell it how many pages you want. For this example, we
will open up the W9 PDF from the previous example and loop over all six of its
pages. We will split off each page and turn it into its own standalone PDF.
import os
from PyPDF2 import PdfFileReader, PdfFileWriter
def pdf_splitter(path):
fname = os.path.splitext(os.path.basename(path))[0]
pdf = PdfFileReader(path)
for page in range(pdf.getNumPages()):
pdf_writer = PdfFileWriter()
pdf_writer.addPage(pdf.getPage(page))
output_filename = '{}_page_{}.pdf'.format(
fname, page+1)
with open(output_filename, 'wb') as out:
pdf_writer.write(out)
print('Created: {}'.format(output_filename))
if __name__ == '__main__':
path = 'w9.pdf'
pdf_splitter(path)
For this example, we need to import both the PdfFileReader and the
PdfFileWriter. Then we create a fun little function called pdf_splitter. It
accepts the path of the input PDF. The first line of this function will grab the
name of the input file, minus the extension. Next we open the PDF up and create
a reader object. Then we loop over all the pages using the reader object’s
getNumPages method.
Finally we open the new file name in write-binary mode and use the PDF writer
object’s write method to write the object’s contents to disk.
When the original PyPdf came out, the only way to get it to merge multiple
PDFs together was like this:
# pdf_merger.py
import glob
from PyPDF2 import PdfFileWriter, PdfFileReader
def merger(output_path, input_paths):
pdf_writer = PdfFileWriter()
if __name__ == '__main__':
paths = glob.glob('w9_*.pdf')
paths.sort()
merger('pdf_merger.pdf', paths)
import glob
from PyPDF2 import PdfFileMerger
if __name__ == '__main__':
paths = glob.glob('fw9_*.pdf')
paths.sort()
merger('pdf_merger2.pdf', paths)
Here we just need to create the PdfFileMerger object and then loop through the
PDF paths, appending them to our merging object. PyPDF2 will automatically
append the entire document so you don’t need to loop through all the pages of
each document yourself. Then we just write it out to disk.
The PdfFileMerger class also has a merge method that you can use. Its code
definition looks like this:
def merge(self, position, fileobj, bookmark=None, pages=None,
import_bookmarks=True):
"""
Merges the pages from the given file into the output file at the
specified page number.
:param int position: The *page number* to insert this file. File will
be inserted after the given number.
Basically the merge method allows you to tell PyPDF where to merge a page by
page number. So if you have created a merging object with 3 pages in it, you can
tell the merging object to merge the next document in at a specific position. This
allows the developer to do some pretty complex merging operations. Give it a try
and see what you can do!
Rotating Pages
PyPDF2 gives you the ability to rotate pages. However you must rotate in 90
degrees increments. You can rotate the PDF pages either clockwise or counter
clockwise. Here’s a simple example:
# pdf_rotator.py
def rotator(path):
pdf_writer = PdfFileWriter()
pdf_reader = PdfFileReader(path)
page1 = pdf_reader.getPage(0).rotateClockwise(90)
pdf_writer.addPage(page1)
page2 = pdf_reader.getPage(1).rotateCounterClockwise(90)
pdf_writer.addPage(page2)
pdf_writer.addPage(pdf_reader.getPage(2))
if __name__ == '__main__':
rotator('reportlab-sample.pdf')
Here we create our PDF reader and writer objects as before. Then we get the first
and second pages of the PDF that we passed in. We then rotate the first page 90
degrees clockwise or to the right. Then we rotate the second page 90 degrees
counter-clockwise. Finally we add the third page in its normal orientation to the
writer object and write out our new 3-page PDF file.
If you open the PDF, you will find that the first two pages are now rotated in
opposite directions of each other with the third page in its normal orientation.
For this example we will take one of the logos I use for my blog, “The Mouse
vs. the Python”, and overlay it on top of the W9 form from earlier:
# watermarker.py
pdf = PdfFileReader(input_pdf)
pdf_writer = PdfFileWriter()
if __name__ == '__main__':
watermark(input_pdf='w9.pdf',
output_pdf='watermarked_w9.pdf',
watermark_pdf='watermark.pdf')
The first thing we do here is extract the watermark page from the PDF. Then we
open the PDF that we want to apply the watermark to. We use a for loop to
iterate over each of its pages and call the page object’s mergePage method to
apply the watermark. Next we add that watermarked page to our PDF writer
object. Once the loop finishes, we write our new watermarked version out to
disk.
PDF Encryption
The PyPDF2 package also supports adding a password and encryption to your
existing PDFs. As you may recall from Chapter 10, PDFs support a user
password and an owner password. The user password only allows the user to
open and read a PDF, but may have some restrictions applied to the PDF that
could prevent the user from printing, for example. As far as I can tell, you can’t
actually apply any restrictions using PyPDF2 or it’s just not documented well.
# pdf_encryption.py
from PyPDF2 import PdfFileWriter, PdfFileReader
if __name__ == ‘__main__’:
encrypt(input_pdf=’reportlab-sample.pdf’,
output_pdf=’encrypted.pdf’, password=’blowfish’)
All we did here was create a set of PDF reader and write objects and read
all the pages with the reader. Then we added those pages out to the
specified writer object and added the specified password. If you only set the
user password, then the owner password is set to the user password
automatically. Whenever you add a password, 128-bit encryption is applied
by default. If you set that argument to False, then the PDF will be encrypted
at 40-bit encryption instead.
Wrapping Up
We covered a lot of useful information in this chapter. You learned how to
extract metadata and text from your PDFs. We found out how to split and merge
PDFs. You also learned how to rotate pages in a PDF and apply watermarks.
Finally we discovered that PyPDF2 can add encryption and passwords to our
PDFs.
Chapter 15 - The pdfrw Package
Patrick Maupin created a package he called pdfrw and released it back in 2012.
The pdfrw package is a pure-Python library that you can
use to read and write PDF files. At the time of writing, pdfrw was at
rotating and modifying data in PDFs. The pdfrw package has been used by
the rst2pdf package (see chapter 18) since 2010 because pdfrw can
PyPDF2 does. You may recall that PyPDF2 let’s you extract an document
information object that you can use to pull out information like author,
title, etc. While pdfrw does let you get the Info object, it displays it
# reader.py
def get_pdf_info(path):
pdf = PdfReader(path)
print(pdf.keys())
print(pdf.Info)
print(pdf.Root.keys())
print('PDF has {} pages'.format(len(pdf.pages)))
if __name__ == '__main__':
get_pdf_info('w9.pdf')
Here we import pdfrw’s PdfReader class and instantiate it by passing in the path
to the PDF file that we want to read. Then we extract the
PDF object’s keys, the information object and the Root. We also grab how
many pages are in the document. The result of running this code is
below:
['/ID', '/Root', '/Info', '/Size']
{'/Author': '(SE:W:CAR:MP)',
'/CreationDate': "(D:20171109144422-05'00')",
'/Creator': '(Adobe LiveCycle Designer ES 9.0)',
'/Keywords': '(Fillable)',
'/ModDate': "(D:20171109144521-05'00')",
'/Producer': '(Adobe LiveCycle Designer ES 9.0)',
'/SPDF': '(1112)',
'/Subject': '(Request for Taxpayer Identification Number and Certification)',
'/Title': '(Form W-9 \\(Rev. November 2017\\))'}
['/Pages', '/Perms', '/MarkInfo', '/Extensions', '/AcroForm', '/Metadata', '/Typ\
e', '/Names', '/StructTreeRoot']
PDF has 6 pages
If you run this against the reportlaf-sample.pdf file that I also included in the
source code for this book, you will find that the author
‘<feff004d00690063006800610065006c002000440072006900730063006f006c006c>’
is, but I am assuming that PyPDF2 does some extra data massaging on the
Splitting
You can also use pdfrw to split a PDF up. For example, maybe you want to take
the cover off of a book for some reason or you just want to extract
the chapters of a book into multiple PDFs instead of storing them in one
# splitter.py
writer.write(output)
if __name__ == '__main__':
split('reportlab-sample.pdf', 10, 'subset.pdf')
Here we create a function called split that takes an input PDF file path, the
number of pages that you want to extract and the output path.
Then we open up the file using pdfrw’s PdfReader class and grab the total
number of pages from the input PDF. Then we create a PdfWriter object and
loop over the range of pages that we passed in. In each
iteration, we attempt to extract a page from the input PDF and add that
page to our writer object. Finally we write the extracted pages to disk.
Merging / Concatenating
The pdfrw package makes merging multiple PDFs together very easy. Let’s write
up a simple example that demonstrates how to do it:
# concatenator.py
writer.trailer.Info = IndirectPdfDict(
Title='Combined PDF Title',
Author='Michael Driscoll',
Subject='PDF Combinations',
Creator='The Concatenator'
)
writer.write(output)
if __name__ == '__main__':
paths = ['reportlab-sample.pdf', 'w9.pdf']
concatenate(paths, 'concatenate.pdf')
the output path. Then iterate over those paths, open the file and add
all the pages to the writer object via the writer’s addpages method.
Just for fun, we also import IndirectPdfDict, which allows us to add some
trailer information to our PDF. In this case, we add the title,
author, subject and creator script information to the PDF. Then we write
Rotating
The pdfrw package also supports rotating the pages of a PDF. So if you happen
to have a PDF that was saved in a weird way or an intern that
scanned in some documents upside down, then you can use pdfrw (or
PyPDF2) to fix the PDFs. Note that in pdfrw you must rotate clockwise in
increments that are divisible by 90 degrees.
For this example, I created a function that will extract all the odd
# rotator.py
writer.write(output)
if __name__ == '__main__':
rotate_odd('reportlab-sample.pdf', 'rotate_odd.pdf')
Here we just open up the target PDF and create a writer object. Then we grab all
the pages and iterate over them. If the page is an odd numbered
page, we rotate it and then add that page to our writer object. This
code ran pretty fast on my machine and the output is what you would
expect.
address or with your logo. You can also use the overlay one PDF on top
of another PDF. We will actually use the overlay technique for filling
Let’s create a simple watermarker script to demonstrate how you might use
pdfrw to overlay one PDF on top of another.
# watermarker.py
writer = PdfWriter()
writer.write(output, base_pdf)
if __name__ == '__main__':
watermarker('reportlab-sample.pdf',
'watermark.pdf',
'watermarked-test.pdf')
path, the PDF that contains the watermark and the output path of the end
result. Then we open up the base PDF and the watermark PDF. We extract
the watermark page and then iterate over the pages in the base PDF. In
each iteration, we create a PageMerge object using the current base PDF page
that we are on. Then we overlay the watermark on top of that
page and render it. After the loop finished, we create a PdfWriter
Scaling
The pdfrw package can also manipulate PDFs in memory. In fact, it will allow
you to create Form XObjects. These objects can represent any page or rectangle
in a PDF. What this means is that you once you have
one of these objects created, you can then scale, rotate and position
called 4up.py that takes pages from a PDF and scales them down to a quarter of
their size and positions four pages to a single page.
I decided to ask the creator of pdfrw, Patrick Maupin, if I could usea modified
version of that example in this book to show some of the power
# scaler.py
def get4(srcpages):
scale = 0.5
srcpages = PageMerge() + srcpages
x_increment, y_increment = (scale * i for i in srcpages.xobj_box[2:])
for i, page in enumerate(srcpages):
page.scale(scale)
page.x = x_increment if i & 1 else 0
page.y = 0 if i & 2 else y_increment
return srcpages.render()
writer.write()
if __name__ == '__main__':
scale_pdf('reportlab-sample.pdf', 'four-page.pdf')
The get4 function comes from the 4up.py script. This function takes a series of
pages and uses pdfrw’s PageMerge class to merge those pages together. We
basically loop over the passed in pages and
scale them down a bit, then we position them on the page and render the
The next function is scale_pdf, which takes the input PDF and the path for the
output. Then we extract the pages from the input file and
create a writer object. Next we loop over the pages of the input
document 4 at a time and pass them to the get4 function. Then we take the result
of that function and add it to our writer object.
that show different ways to use the two packages together. The creator
For example, you can use pdfrw to read in pages from a pre-existing PDF
and turn them into objects that you can write out in ReportLab. Let’s
write a script that will create a subset of a PDF using pdfrw and
ReportLab. The following example is based on one from the pdfrw project:
# split_with_rl.py
my_canvas = Canvas(output)
if __name__ == '__main__':
split('reportlab-sample.pdf', 10, 'subset-rl.pdf')
Here we import some new functionality. First we import the pagexobj which
will create a Form XObject from the view that you give it. The view defaults to
an entire page, but you could tell pdfrw to just
extract a portion of the page. Next we import the makerl function which will
take a ReportLab canvas object and a pdfrw Form XObject and
turn it into a form that ReportLab can add to its canvas object.
So let’s examine this code a bit and see how it works. Here we create a reader
object and a canvas object. Then we create a list of Form XForm
objects starting with the first page to the last page that we specified.
Note that we do not check if we asked for too many pages though, so that
likely to fail.
Next we iterate over the pages that we just created and add them to our
ReportLab canvas. You will note that we set the page size using the
width and height that we extract using pdfrw’s BBox attributes. Then we add the
Form XObjects to the canvas. The call to showPage tells ReportLab that you
finished creating a page and to start a new one.
There are some other examples on pdfrw’s site that you should review.
For example, there is a neat piece of code that shows how you could take
a page from a pre-existing PDF and use it as the background for a new
scaling example where you can use pdfrw and ReportLab to scale pages
down in much the same way that we did with pdfrw all by itself.
Wrapping Up
The pdfrw package is actually pretty powerful and has features that
PyPDF2 does not. Its ability to integrate with ReportLab is one feature
robust in generating viable PDFs than PyPDF2 but I have not done
aren’t a lot of Python packages that do the extraction part very well.
you can use to extract text. We will also learn how to extract some
images from PDFs. While there is no complete solution for these tasks in
Python, you should be able to use the information herein to get you
started. Once we have extracted the data we want, we will also look at
is to extract text from a PDF. In fact, PDFMiner can tell you the exact
fonts. For Python 2.4 - 2.7, you can refer to the following websites for
Github - https://github.com/euske/pdfminer
PyPI - https://pypi.python.org/pypi/pdfminer/
Webpage - https://euske.github.io/pdfminer/
The directions for installing PDFMiner are out-dated at best. You can actually
use pip to install it:
python -m pip install pdfminer
If you want to install PDFMiner for Python 3 (which is what you should
probably be doing), then you have to do the install like this:
python -m pip install pdfminer.six
likely need to use Google and StackOverflow to figure out how to use
Note: Even when using the pdfminer.six package, I have had reports that it
doesn’t fully work in Python 3. So if you want to use any of
PDFMiner’s advanced features, you may need to use Python 2.7 until they
get that package completely converted over to Python 3.
will look at some of the programmatic methods first. Let’s try reading
all the text out of an Internal Revenue Service W9 form. You can get a
Once you had the PDF properly saved off, we can look at the code:
# miner_text_extraction.py
import io
def extract_text_from_pdf(pdf_path):
resource_manager = PDFResourceManager()
fake_file_handle = io.StringIO()
converter = TextConverter(resource_manager, fake_file_handle)
page_interpreter = PDFPageInterpreter(resource_manager, converter)
caching=True,
check_extractable=True):
page_interpreter.process_page(page)
text = fake_file_handle.getvalue()
# close open handles
converter.close()
fake_file_handle.close()
if text:
return text
if __name__ == '__main__':
print(extract_text_from_pdf('w9.pdf'))
The PDFMiner package tends to be a bit verbose when you use it directly.
Here we import various bits and pieces from various parts of PDFMiner.
dig into the source code yourself if you’re really curious. However, I
create a file-like object via Python’s io module. If you are using Python 2, then
you will want to use the StringIO module. Our next step is to create a converter.
In this case, we choose the
The last step is to open the PDF and loop through each page. At the end, we grab
all the text, close the various handlers and print out the text
to stdout.
Extracting text by page
Frankly grabbing all the text from a multi-page document isn’t all that useful.
Usually you will want to do work on smaller subsets of the
time:
# miner_text_generator.py
import io
def extract_text_by_page(pdf_path):
with open(pdf_path, 'rb') as fh:
for page in PDFPage.get_pages(fh,
caching=True,
check_extractable=True):
resource_manager = PDFResourceManager()
fake_file_handle = io.StringIO()
converter = TextConverter(resource_manager, fake_file_handle)
page_interpreter = PDFPageInterpreter(resource_manager, converter)
page_interpreter.process_page(page)
text = fake_file_handle.getvalue()
yield text
def extract_text(pdf_path):
for page in extract_text_by_page(pdf_path):
print(page)
print()
if __name__ == '__main__':
print(extract_text('w9.pdf'))
In this example, we create a generator function that yields the text for each page.
The extract_text function prints out the text of each page. This is where we
could add some parsing logic to parse out
what we want. Or we could just save the text (or HTML or XML) off as
You will note that the text may not be in the order you expect. So you will
definitely need to figure out the best way to parse out the text
The nice thing about PDFMiner is that you can already “export” the PDF
You can also use PDFMiner’s command line tools, pdf2txt.py and dumppdf.py
to do the exporting for you if you don’t want to try to figure out PDFMiner
yourself. According to the source code of
pdf2txt.py, it can be used to export a PDF as plain text, html, xml or “tags”.
will not recognize text that is images as PDFMiner does not support
using it which is just passing it the path to a PDF file. We will use
the w9.pdf. Open up a terminal and navigate to the location that you have saved
that PDF or modify the command below to point to that file:
pdf2txt.py w9.pdf
If you run this, it will print out all the text to stdout. You can also make
pdf2txt.py write the text to file as text, HTML, XML or “tagged
PDF”. The XML format will give to the most information about the PDF as
tends to be ugly. Here’s how you can get different formats output:
pdf2txt.py -o w9.html w9.pdf
The first command will create an HTML document while the second will
Fig. 16-1: HTML conversion of a PDF
As you can see, the end result looks a bit off, but it’s not too bad.
like:
<pages>
<page id="1" bbox="0.000,0.000,611.976,791.968" rotate="0">
<textbox id="0" bbox="36.000,732.312,100.106,761.160">
<textline bbox="36.000,732.312,100.106,761.160">
<text font="JYMPLA+HelveticaNeueLTStd-Roman" bbox="36.000,736.33\
4,40.018,744.496" size="8.162">F</text>
<text font="JYMPLA+HelveticaNeueLTStd-Roman" bbox="40.018,736.33\
4,44.036,744.496" size="8.162">o</text>
<text font="JYMPLA+HelveticaNeueLTStd-Roman" bbox="44.036,736.33\
4,46.367,744.496" size="8.162">r</text>
<text font="JYMPLA+HelveticaNeueLTStd-Roman" bbox="46.367,736.33\
4,52.338,744.496" size="8.162">m</text>
<text font="JYMPLA+HelveticaNeueLTStd-Roman" bbox="52.338,736.33\
4,54.284,744.496" size="8.162"> </text>
<text font="JYMPLA+HelveticaNeueLTStd-Roman" bbox="54.284,736.33\
4,56.230,744.496" size="8.162"> </text>
<text font="JYMPLA+HelveticaNeueLTStd-Roman" bbox="56.230,736.33\
4,58.176,744.496" size="8.162"> </text
><text font="JYMPLA+HelveticaNeueLTStd-Roman" bbox="58.176,736.3\
34,60.122,744.496" size="8.162"> </text>
<text font="ZWOHBU+HelveticaNeueLTStd-BlkCn" bbox="60.122,732.31\
2,78.794,761.160" size="28.848">W</text>
<text font="ZWOHBU+HelveticaNeueLTStd-BlkCn" bbox="78.794,732.31\
2,87.626,761.160" size="28.848">-</text>
<text font="ZWOHBU+HelveticaNeueLTStd-BlkCn" bbox="87.626,732.31\
2,100.106,761.160" size="28.848">9</text>
<text></text>
</textline>
Python 3 compatible. If you want to give it a try, you may need to have
easy_install distribute
I wasn’t able to get pip to install that package correctly. Once it’s installed
though, you will be able to use pip to install slate:
python -m pip install slate
Note that the latest version is 0.5.2 and pip may or may not grab that version. If
it does not, then you can install slate directly from
Github:
python -m pip install git+https://github.com/timClicks/slate
Now we’re ready to write some code to extract the text from a PDF:
# slate_text_extraction.py
import slate
def extract_text_from_pdf(pdf_path):
with open(pdf_path) as fh:
document = slate.PDF(fh, password='', just_text=1)
if __name__ == '__main__':
extract_text_from_pdf('w9.pdf')
As you can see, to make slate parse a PDF, you just need to import slate and then
create an instance of its PDF class. The PDF class is actually a subclass of
Python’s list builtin, so it just returns a list / iterable of pages of text. You will
also note that we can pass in
a password argument if the PDF has a password set. Anyway, once the
looking through the source code, it appears that all this package
ways:
XML
JSON
CSV
Exporting to XML
The eXtensible Markup Language (XML) format is one of the most well
known output and input formats. It is used widely on the internet for
Let’s create our own XML creation tool though. Here’s a simple example:
# xml_exporter.py
import os
import xml.etree.ElementTree as xml
counter = 1
for page in extract_text_by_page(pdf_path):
text = xml.SubElement(pages, 'Page_{}'.format(counter))
text.text = page[0:100]
counter += 1
tree = xml.ElementTree(root)
xml_string = xml.tostring(root, 'utf-8')
parsed_string = minidom.parseString(xml_string)
pretty_string = parsed_string.toprettyxml(indent=' ')
if __name__ == '__main__':
pdf_path = 'w9.pdf'
xml_path = 'w9.xml'
export_as_xml(pdf_path, xml_path)
This script will use Python’s built-in XML libraries, minidom and
ElementTree. We also import our PDFMiner generator script that we use to grab
a page of text at a time. In this example, we create our top
level element which is the file name of the PDF. Then we add a Pages element
underneath it. The next step is our for loop where we extract each page from the
PDF and save off the information we want.
Here is where you could add a special parser where you might split up
the page into sentences or words and parse out more interesting
For this example, we just extract the first 100 characters from each
page and save them off into an XML SubElement. Technically the next bit of
code could be simplified to just write out the XML. However,
kind of ends up looking like minified javascript in that its just one
we use minidom to “prettify” the XML with whitespace before writing it out.
The result ends up looking like this:
<?xml version="1.0" ?>
<w9>
<Pages>
<Page_1>Form W-9(Rev. November 2017)Department of the Treasury Internal \
That’s pretty clean XML and it’s also easy to read. For bonus points, you could
take what you learned in the PyPDF2 chapter and use it to
extract the metadata from the PDF and add it to your XML as well.
Exporting to JSON
JavaScript Object Notation or JSON is a lightweight data-interchange
format that is easy to read and write. Python includes a json module in its
standard library that allows you to read and write JSON
and use that to create an exporter script that outputs JSON instead of
XML:
# json_exporter.py
import json
import os
def export_as_json(pdf_path, json_path):
filename = os.path.splitext(os.path.basename(pdf_path))[0]
data = {'Filename': filename}
data['Pages'] = []
counter = 1
for page in extract_text_by_page(pdf_path):
text = page[0:100]
page = {'Page_{}'.format(counter): text}
data['Pages'].append(page)
counter += 1
if __name__ == '__main__':
pdf_path = 'w9.pdf'
json_path = 'w9.json'
export_as_json(pdf_path, json_path)
Here we import the various libraries that we need including our PDFMiner
module. Then we create a function that accepts the PDF input path and
The Pages key maps to an empty list. Next we loop over each page of the PDF
and extract the first 100 characters of each page. Then we
create a dictionary with the page number as the key and the 100
Finally we write the file using the json module’s dump command.
{'Filename': 'w9',
'Pages': [{'Page_1': 'Form W-9(Rev. November 2017)Department of the Treasury\
{'Page_6': 'Form W-9 (Rev. 11-2017)Page 6 The IRS does not initiate c\
Once again, we have some nice output that is easy to read. You could
enhance this example with the PDF’s metadata as well, if you would like
to. Note that the output will change depending on what you want to parse
Exporting to CSV
CSV stands for comma separated values. It is a pretty standard format that has
been around a very long time. The nice thing about CSV
Python has a built-in csv module that you can use to read and write CSV files.
We will use it here to create a CSV from the text that we
# csv_exporter.py
import csv
import os
counter = 1
with open(csv_path, 'w') as csv_file:
writer = csv.writer(csv_file)
for page in extract_text_by_page(pdf_path):
text = page[0:100]
words = text.split()
writer.writerow(words)
if __name__ == '__main__':
pdf_path = 'w9.pdf'
csv_path = 'w9.csv'
export_as_csv(pdf_path, csv_path)
For this example, we import Python’s csv library. Otherwise the imports are the
same as the previous example. In our function, we create
a CSV file handler using the CSV file path. Then we initialize a CSV
writer object with that file handler as its sole argument. Next we loop
over the pages of the PDF as before. The only difference here is that we
split the first 100 characters into individual words. This allows us to
have some actual data to add to the CSV. If we did not do this, then
each row would only have one element in it, which isn’t really a CSV
file at that point. Finally we write out our list of words to the CSV
file.
Form,W-9(Rev.,November,2017)Department,of,the,Treasury,Internal,Revenue,Service,\
Request,for,Taxp
Form,W-9,(Rev.,11-2017)Page,2,By,signing,the,filled-out,"form,",you:,1.,Certify,\
that,the,TIN,you,are,g
Form,W-9,(Rev.,11-2017)Page,3,Criminal,penalty,for,falsifying,information.,Willf\
ully,falsifying,cert
Form,W-9,(Rev.,11-2017)Page,4,The,following,chart,shows,types,of,payments,that,m\
ay,be,exempt,from,ba
Form,W-9,(Rev.,11-2017)Page,5,1.,"Interest,","dividend,",and,barter,exchange,acc\
ounts,opened,before,1984
Form,W-9,(Rev.,11-2017)Page,6,The,IRS,does,not,initiate,contacts,with,taxpayers,\
via,emails.,"Also,",th
I think this one is a bit harder to read than the JSON or XML examples, but it’s
not too bad. Now let’s move on and look at how we might extract
Extracting Images
Unfortunately, there are no Python packages that actually do image
extraction from PDFs. The closest thing I found was a project called
minecart that claims to be able to do it, but only works on Python 2.7. I was not
able to get it to work with the sample PDFs I had. There
(https://nedbatchelder.com/blog/200712/extracting_jpgs_from_pdfs.html) that
talks a bit about how he was able to extract JPGs from PDFs. His
code is as follows:
# Extract jpg's from pdf's. Quick and dirty.
import sys
startmark = "\xff\xd8"
startfix = 0
endmark = "\xff\xd9"
endfix = 2
i = 0
njpg = 0
while True:
istream = pdf.find("stream", i)
if istream < 0:
break
istart = pdf.find(startmark, istream, istream+20)
if istart < 0:
i = istream+20
continue
iend = pdf.find("endstream", istart)
if iend < 0:
raise Exception("Didn't find end of stream!")
iend = pdf.find(endmark, iend-20)
if iend < 0:
raise Exception("Didn't find end of JPG!")
istart += startfix
iend += endfix
print("JPG %d from %d to %d" % (njpg, istart, iend))
jpg = pdf[istart:iend]
jpgfile = file("jpg%d.jpg" % njpg, "wb")
jpgfile.write(jpg)
jpgfile.close()
njpg += 1
i = iend
This also did not work for the PDFs I was using. There are some people in the
comments that do claim it works for some of their PDFs and there
are some examples of updated code in the comments too. StackOverflow has
variations of this code on it, some of which use PyPDF2 in some way or
pdfimages -all reportlab-sample.pdf images/prefix-jpg
Make sure that the images folder (or whatever output folder you want to create)
is already created as pdfimages doesn’t create it for you.
Let’s write up a Python script that also executes this command and will make
sure the output folder exists for you too:
# image_exporter.py
import os
import subprocess
'{}/prefix'.format(output_dir)]
subprocess.call(cmd)
print('Images extracted:')
print(os.listdir(output_dir))
if __name__ == '__main__':
pdf_path = 'reportlab-sample.pdf'
image_exporter(pdf_path, output_dir='images')
In thie example, we import the subprocess and os modules. If the output
directory does not exist, we attempt to create it. Them we use
subprocess’s call method to execute pdfimages. We use call because it will wait
for pdfimages to finish running. You could use
Popen instead, but that will basically run the process in the background. Finaly
we print out a listing of the output directory to
There are some other articles on the internet that reference a library called Wand
that you might also want to try. It is an ImageMagick wrapper. Also of note is
that there is a Python binding to Poppler
called pypoppler, although I wasn’t able to find any examples of that package
that did image extraction.
Wrapping Up
We covered a lot of different information in this chapter. You learned about
several different packages that we can use to extract text from
built-in libraries to export the text to XML, JSON and CSV. Finally we
Python does not currently have any good libraries for this task, you can
utility.
Chapter 17 - Filling in PDF Forms
Fillable forms have been a part of Adobe’s PDF format for years. One of the
most famous examples of fillable forms in the United States are documents from
the Internal Revenue Service. There are lots of government forms that use
fillable forms. There are many different approaches for filling in these forms
programmatically. The most time consuming method I have heard about is to just
recreate the form in ReportLab by hand and then fill it in. Frankly I think this is
probably the worst idea, except when your company is in charge of creating the
PDFs itself. Then that might be a viable option because you then have complete
control over the PDF creation and the inputs that need to go into it.
def create_simple_form():
c = canvas.Canvas('simple_form.pdf')
c.setFont("Courier", 20)
c.drawCentredString(300, 700, 'Employment Form')
c.setFont("Courier", 14)
form = c.acroForm
c.save()
if __name__ == '__main__':
create_simple_form()
When you run this example, the interactive PDF form looks like this:
Now we are ready to learn one of the ways that we can fill in this form!
Merging Overlays
Jan Chęć wrote an article on Medium
(https://medium.com/@zwinny/filling-pdf-forms-in-python-the-right-way-
eb9592e03dba) that contained several different approaches to this problem of
filling in forms in PDFs. The first solution proposed was to take an unfilled form
in a PDF and create a separate PDF using ReportLab that has the data we want to
us to “fill” this form. The author then used pdfrw to merge the two PDFs
together. You could theoretically use PyPDF2 for the merging process too. Let’s
go ahead and take a look at how this approach might work using the pdfrw
package.
If you haven’t already installed pdfrw when you read chapter 15, then you will
want to get started by installing it now:
python -m pip install pdfrw
Now that we have pdfrw installed, let’s create a file called fill_by_overlay.py.
We will add two functions to this file. The first function will create our overlay.
Let’s check that out:
# fill_by_overlay.py
import pdfrw
from reportlab.pdfgen import canvas
def create_overlay():
"""
Create the data that will be overlayed on top
of the form that we want to fill
"""
c = canvas.Canvas('simple_form_overlay.pdf')
c.save()
Here we import the pdfrw package and we also import the canvas sub-module
from ReportLab. Then we create a function called create_overlay that creates a
simple PDF using ReportLab’s Canvas class. We just use the drawString
canvas method. This will take some trial-and-error. Fortunately on Linux and
Mac, there are decent PDF Previewer applications that you can use to just keep
the PDF open and they will automatically refresh with each change. This is very
helpful in figuring out the exact coordinates you need to draw your strings to.
Since we created the original form, figuring out the offset for the overlay is
actually pretty easy. We already knew where on the page the form elements
were, so we can make a good educated guess of where to draw the strings to.
The next piece of the puzzle is actually merging the overlay we created above
with the form we created in the previous section. Let’s write that function next:
def merge_pdfs(form_pdf, overlay_pdf, output):
"""
Merge the specified fillable form PDF with the
overlay PDF and save the output
"""
form = pdfrw.PdfReader(form_pdf)
olay = pdfrw.PdfReader(overlay_pdf)
writer = pdfrw.PdfWriter()
writer.write(output, form)
if __name__ == '__main__':
create_overlay()
merge_pdfs('simple_form.pdf',
'simple_form_overlay.pdf',
'merged_form.pdf')
Here we open up both the form and the overlay PDFs using pdfrw’s PdfReader
classes. Then we loop over the pages of both PDFs and merge them together
using PageMerge. At the end of the code, we create an instance of PdfWriter
that we use to write the newly merged PDF out. The end result should look like
this:
Note: When I ran this code, I did receive some errors on stdout. Here’s an
example:
[ERROR] tokens.py:226 stream /Length attribute (171) appears to be too small (si\
ze 470) -- adjusting (line=192, col=1)
As I mentioned, this doesn’t actually prevent the merged PDF from being
created. But you might want to keep an eye on these as they might hint at a
problem should you have any issues.
A better method would be to open a PDF in a PDF editor where you can add
invisible read-only fields. You can label the fields with unique names and then
access them via the PDF’s metadata. Loop over the metadata and use
ReportLab’s canvas methods to create an overlay again and then merge it in
much the same way as before.
I have also seen a lot of people talking about using Forms Data Format or FDF.
This is the format that PDFs are supposed to use to hold that data that is to be
filled in a PDF. You can use PyPDFtk and PdfJinja to do the form filling.
Interestingly, PyPDFtk doesn’t work with image fields, such as where you
might want to paste a signature image. You can use PdfJinja for this purpose.
However PdfJinja seems to have some limitations when working with
checkboxes and radioboxes.
You can read more about these topics at the following links:
https://yoongkang.com/blog/pdf-forms-with-python/
https://medium.com/@zwinny/filling-pdf-forms-in-python-the-right-
way-eb9592e03dba
To use pdfforms, you must first have it inspect the PDF that contains a form so it
knows how to fill it out. You can do the inspection like this:
pdfforms inspect simple_form.pdf
If pdfforms works correctly, it will create a “filled” PDF in its “test” sub-folder.
This sub-folder appears next to where pdfforms itself is, not where you run it
from. It will fill the form with numbers in a sequential order. These are the field
numbers.
The next thing you do is create a CSV file where the first column and row
contains the name of the PDF. The other rows in the first column correspond to
the field numbers. You enter the numbers of the fields that you want to fill here.
Then you enter the data you want to fill use in the form in the third column of
your CSV file. The second column is ignored, so you can put a description here.
All columns after the third column are also ignored, so these can be used for
whatever you want.
For this example, your CSV file might look something like this:
simple_form.pdf,,,
1,first name,Mike
2,last name,Driscoll
Once you have the CSV filled out, you can run the following command to
actually fill your form out with your custom data:
pdfforms fill data.csv
Now on to the bad news. I wasn’t able to get this to work correctly on Windows
or Mac. I got the inspect step to work on Windows, but on Mac it just hangs. On
Windows, when I run the fill command it just fails with an error about not
finding the PDF to fill.
I think when this package becomes less error-prone, it will be really amazing.
The only major downside other than it having issues running is that you need to
install a 3rd party tool that isn’t written in Python at all.
Wrapping Up
After looking at the many different options available to the Python developer for
filling PDF forms, I think the most straight-forward method is creating the
overlay and then merging it to the fillable form PDF using a tool like pdfrw.
While this feels a bit like a hack, the other methods that I have seen seem just as
hacky and just as time consuming. Once you have the position of one of the cells
in the form, you can reasonably calculate the majority of the others on the page.
Chapter 18 - Converting Markup to PDF
While ReportLab is great for laying out your PDFs programmatically in Python,
sometimes you might want to write your documents using a type of markup. For
example, you might have non-developers that need to be able to get their work
turned into PDFs too. Now they could just use a print-to-PDF option. That
actually works great in most applications. But what I am referring to are the
people who know markup, like HTML, Markdown or restructuredText. There
are a lot of tools out there to help you write well using a text editor and still get a
nice PDF output.
We will look at just a handful of the ones that are currently available in this
chapter.
rst2pdf
I personally really like restructuredText. This markup format used to be the
format of choice for writing documentation in Python. I believe the current
Python documentation is still written in restructuredText. This is also the format
that the Sphinx documentation framework was originally based on. Sphinx is the
documentation tool used for creating Python’s documentation and the
documentation used on the ReadTheDocs website. Of course, today Sphinx also
supports Markdown and several other formats.
When I was first learning about how to write a book, I recalled that I had heard
of a project called rst2pdf that a fellow named Roberto Alsina had created. It
basically takes restructuredText as its input and then outputs a PDF using
ReportLab underneath. It is an alternative to using LaTeX, which I won’t be
covering in detail in this chapter. You can read all about rst2pdf on its website
here: http://rst2pdf.ralsina.me/stories/index.html or download the manual here:
http://ralsina.me/static/manual.pdf
Subtitle
--------
Let’s save the markup in the example above to a file called test.rst. Now let’s try
converting this restructuredText file into a PDF. Open up a terminal and navigate
to the location that you saved the file to. Then run the following command:
rst2pdf test.rst test.pdf
The first argument to rst2pdf is the path to the input file and the second argument
is the path to the output file. The command should just return without any errors.
Now if you open your test.pdf file, it should look something like this:
restructuredText supports pretty much any type of markup you can imagine. You
can add a code block like this:
.. code-block:: python
def doubler(x):
return x * 2
This is an example of a directive. There are many other directives and you can
even create custom ones. There are directives for inserting images, figures,
tables, footnotes, sidebars and much much more! There are already several
documents detailing what you can do with restructuredText and that is really
outside the scope of this book. However, I highly recommend reading the
documentation for restructuredText and then trying it out with rst2pdf.
If you look in the book’s source code on Github, you will find that I have
included a chapter (intro.rst) of this book in restructuredText format. You can
use rst2pdf on this file too to see what kind of output it can generate with a
more full-fledged example.
One final note on this topic is that when you installed rst2pdf it installed docutils
as a dependency. Docutils includes tools for converting restructuredText to
HTML, XML, LaTeX and more. So if you are looking for a versatile markup
language, you might want to check restructuredText out.
WeasyPrint
WeasyPrint is a specialized beast. It is a “visual rendering engine for HTML and
CSS that can export to PDF”. It does not allow you to specify absolute
positioning of elements on the page, but it does make pagination easy. You can
read more about WeasyPrint on the ReadTheDocs website here:
http://weasyprint.readthedocs.io
This does a pretty good job of downloading the web page and turning it into a
PDF. Here is a screenshot of how part of the first page turned out:
Fig. 18-2: WeasyPrint PDF Example
In this case, it looks like we ended up with a PDF that has a bunch of the
website’s contents truncated off the bottom. Somehow WeasyPrint managed to
create bookmarks for parts of the web page that it didn’t actually render in the
PDF. Regardless, it still made the pages look nice, even if it wasn’t able to get all
of the information.
Now let’s take a moment and find out how you might use WeasyPrint in Python
itself:
from weasyprint import HTML
html = HTML('https://www.blog.pythonlibrary.org/')
html.write_pdf('/tmp/test.pdf')
This code is the equivalent of calling the weasyprint executable we were using in
the previous examples. Here we just use the weasyprint module’s HTML class
to basically turn a web page into an object that can then be turned into a PDF or
PNG. In this case, we call the write_pdf method to create a PDF. One thing I
noticed while using WeasyPrint is that it doesn’t work very well with web pages
that have a lot of dynamically loaded content.
Anyway, one cool extra that WeasyPrint provides is the ability to use custom
CSS to style the HTML your download or create before turning it into a PDF.
You just have to import the CSS class from the weasyprint module. You can
give this class an inline stylesheet by passing it a string or you can pass it a file
path. WeasyPrint also supports font-face rules in your CSS, although you will
have to use its FontConfiguration class to make that work.
I personally think this is a pretty slick package for web developers or people who
are just familiar with HTML / CSS.
Pandoc
Pandoc is well known as a “universal document converter”. You can use it to
convert documents created in many types of markup to HTML, Microsoft Word,
epub, TeX formats, PDF and much more. I actually use Pandoc when I am
writing my books to convert from restructuredText to Markdown. I have also
used Pandoc for converting restructuredText to PDF and other formats. It is free
software that is licensed under GPL. You can read all about it on Pandoc’s
website here: https://pandoc.org/
If you want to convert straight to PDF, then you need to tell Pandoc to use its
LaTeX converter:
pandoc test.rst -f rst -t latex -o test-pandoc.pdf
When you run that, you should get a nicely formatted PDF. You will note that
LaTeX converted the restructuredText a bit differently than rst2pdf did.
LaTeX
LaTeX (usually pronounced la-tec) is a typesetting system. A lot of scientific and
technical writers use LaTeX to produce documentation. It is used especially by
scientists for scientific documents. It is also free under the LPPL license. I
personally know some authors who use LaTeX for writing their books. Some use
it for every part of their book and others use it just for specific parts of their
books. LaTeX has it’s own special kind of markup that you will have to learn. Its
markup is nothing like restructuredText, HTML or Markdown, so knowing those
will not be of much help. I personally have never used LaTeX and do not plan to.
I only mention LaTeX because you will see it mentioned in a lot of other
programs as an intermediary program. For example, in the previous section,
Pandoc uses LaTeX to generate PDFs. You will find that a lot of conversion
utility use LaTeX or a variant of it under the hood to do certain types of
conversions.
Learning how to use LaTeX is way outside the scope of this book. There is a lot
of documentation for this program on their website and a lot of good books on
the subject that you can get too.
Wrapping Up
While this chapter was shorter than some of the previous ones, its intent was to
help you learn about the many different tools you can use to convert markup to
PDFs and other formats. My favorite is probably rst2pdf because I really enjoy
using restructuredText. There are lots of other products for converting other
types of markup to PDFs too. We looked at WeasyPrint for converting HTML /
CSS documentation into PDFs. You could also use docutils html2pdf utility for
the same purpose. If you like the Markdown format, then you can use Pandoc
and many other utilities to convert that format to PDF. Finally we mentioned
LaTeX in passing. This is a really powerful program for typesetting that you
hopefully will never need to learn. But if you do, you can rest assured that it can
make beautiful looking documentation and PDFs should you ever need to use it.
Chapter 19 - The PyFPDF Package
ReportLab is the primary toolkit that you can use for generating PDFs from
scratch. However I have found that there is another one called
PyFPDF or FPDF for Python. The PyFPDF package is actually a port of the
“Free”-PDF package that was written in PHP. There hasn’t
been a release of this project in a few years, but there have been
commits to its Github repository so there is still some work being done
on the project. The PyFPDF package supports Python 2.7 and Python 3.4+.
package. However it will cover more than enough for you to get started
“Python does PDF: pyFPDF” by Edwood Ocasio on Leanpub if you would like
to learn more about the library than what is covered in this chapter or
Installation
Installing PyFPDF is easy since it was designed to work with pip. You can install
this package just like you have all the others in this book:
python -m pip install fpdf
At the time of writing, this command installed version 1.7.2 on Python 3.6 with
no problems whatsoever. You will notice when you are installing
Basic Usage
Now that you have PyFPDF installed, let’s try using it to create a simple PDF.
Open up your Python editor and create a new file called
# simple_demo.py
pdf = FPDF()
pdf.add_page()
pdf.set_font("Arial", size=12)
pdf.cell(200, 10, txt="Welcome to Python!", ln=1, align="C")
pdf.output("simple_demo.pdf")
The first item that we need to talk about is the import. Here we import the FPDF
class from the fpdf package. The defaults for this class are to create the PDF in
Portrait mode, use millimeters for its
pdf = FPDF(orientation='P', unit='mm', format='A4')
I am not a fan of using the letter ‘P’ to tell the class what its orientation is. You
may also use ‘L’ if you prefer landscape over
portrait.
The PyFPDF package supports ‘pt’, ‘cm’ and ‘in’ as alternative measurement
units.
If you go diving into the source, you will find that the PyFPDF package only
supports the following page sizes:
A3
A4
A5
letter
legal
This is a bit limiting compared to ReportLab where you have several additional
sizes supported out of the box and you can set the page size
Anyway, the next step is to create a page using the add_page method. Then we
set the page’s font via the set_font method. You will note that we pass in the
font’s family name and the size that we
want. You can also set the font’s style with the style argument. If you want to do
this, note that it takes a string such as ‘B’ for bold or
millimeters high. A cell is basically a flowable that holds text and can
break is enabled and the cell goes beyond the page’s size limit. The
txt parameter is the text that you want to print in the PDF. The ln parameter tells
PyFPDF to add a line break if set to one, which is what we do here. Finally we
can set the alignment of the text to
here.
Finally we save the document to disk by calling the output method with the path
to the file that we want to save.
When I ran this code, I ended up with a PDF that looked like this:
Now let’s learn a little bit about how PyFPDF works with fonts.
self.core_fonts={'courier': 'Courier', 'courierB': 'Courier-Bold',
'courierBI': 'Courier-BoldOblique',
'courierI': 'Courier-Oblique',
'helvetica': 'Helvetica',
'helveticaB': 'Helvetica-Bold',
'helveticaBI': 'Helvetica-BoldOblique',
'helveticaI': 'Helvetica-Oblique',
'symbol': 'Symbol',
'times': 'Times-Roman',
'timesB': 'Times-Bold',
'timesBI': 'Times-BoldItalic',
'timesI': 'Times-Italic',
'zapfdingbats': 'ZapfDingbats'}
You will note that Arial is not listed here even though we used it in the previous
example. Arial is getting remapped to Helvetica in the
actual source code, so you are not really using Arial at all. Anyway,
# change_fonts.py
pdf.output("change_fonts.pdf")
if __name__ == '__main__':
change_fonts()
and then loop over the core fonts. When I tried that, I discovered that
PyFPDF doesn’t consider the variant names of its core fonts as valid
create a list comprehension and check for any capital characters in the
font’s name. If there is one, we skip that font. Otherwise we set the
font and the font size and write it out. We also increase the font size
by two points each time through the loop. If you want to change the
font’s color, then you can call set_text_color and pass in the RGB
I like how easy it is to change fonts in PyFPDF. However the number of core
fonts is pretty small. You can add TrueType, OpenType or Type1
fonts using PyFPDF though via the add_font method. This method takes the
following arguments:
pdf.add_font('DejaVu', '', 'DejaVuSansCondensed.ttf', uni=True)
You would call add_font before attempting to use it via the set_font method. I
tried this on Windows and got an error as Windows couldn’t find this font, which
is what I expected. This is a
really simply way to add fonts though and will probably work. Note that
FPDF_FONTPATH
SYSTEM_TTFONTS
These appear to be constants that are defined either in your environment or in the
PyFPDF package itself. The documentation does not explain how
these are set or modified however, if you look closely at the API and
the source code, it would appear that you would have to do the following
import fpdf
fpdf.SYSTEM_TTFONTS = '/path/to/system/fonts'
Drawing
The PyFPDF package has limited drawing support. You can draw lines, ellipses
and rectangles. Let’s take a look at how to draw lines first:
# draw_lines.py
def draw_lines():
pdf = FPDF()
pdf.add_page()
pdf.line(10, 10, 10, 100)
pdf.set_line_width(1)
pdf.set_draw_color(255, 0, 0)
pdf.line(20, 20, 100, 20)
pdf.output('draw_lines.pdf')
if __name__ == '__main__':
draw_lines()
Here we call the line method and pass it two pairs of x/y coordinates. The line
width defaults to 0.2 mm so we increase it to 1 mm
for the second line by calling the set_line_width method. We also set the color
of the second line by calling set_draw_color to an RGB value equivalent to red.
The output looks like this:
# draw_shapes.py
from fpdf import FPDF
def draw_shapes():
pdf = FPDF()
pdf.add_page()
pdf.set_fill_color(255, 0, 0)
pdf.ellipse(10, 10, 10, 100, 'F')
pdf.set_line_width(1)
pdf.set_fill_color(0, 255, 0)
pdf.rect(20, 20, 100, 50)
pdf.output('draw_shapes.pdf')
if __name__ == '__main__':
draw_shapes()
When you draw a shape like an ellipse or a rect, you will need to pass in the x
and y coordinates that represent the upper left corner
of the drawing. Then you will want to pass in the width and height of
the shape. The last argument you can pass in is for style which can be “D” or an
empty string (default), “F” for fill or “DF” for draw and
fill. In this example, we fill the ellipse and use the default for the
Fig. 19-4: PyFPDF - Drawing Shapes
Adding Images
The PyFPDF package supports adding JPEG, PNG and GIF formats to your
PDF. If you happen to try to use an animated GIF, only the first frame
is used. Also of note is that if you add the same image multiple times
to the document, PyFPDF is smart enough to only embed one actual copy of
using PyFPDF:
# add_image.py
if __name__ == '__main__':
add_image('snakehead.jpg')
The new piece of code here is the call to the image method. Its signature looks
like the this:
image(name, x = None, y = None, w = 0, h = 0, type = '', link = '')
You specify the image file path, the x and y coordinate and the width and height.
If you only specify the width or the height, the other is
the image. You can also specify the file type explicitly, otherwise it
is guessed from the file name. Finally you can add a link / URL when
When you run this code, you should see something like the following:
Fig. 19-5: PyFPDF - Adding an Image
Multipage Documents
PyFPDF had multipage support enabled by default. If you add enough cells to a
page, it will automatically create a new page and continue to add
# multipage_simple.py
if __name__ == '__main__':
multipage_simple()
All this does is create 100 lines of text. When I ran this code, I ended up with a
PDF that contained 4 pages of text.
# header_footer.py
class CustomPDF(FPDF):
def header(self):
# Set up a logo
self.image('snakehead.jpg', 10, 8, 33)
self.set_font('Arial', 'B', 15)
# Add an address
self.cell(100)
self.cell(0, 5, 'Mike Driscoll', ln=1)
self.cell(100)
self.cell(0, 5, '123 American Way', ln=1)
self.cell(100)
self.cell(0, 5, 'Any Town, USA', ln=1)
# Line break
self.ln(20)
def footer(self):
self.set_y(-10)
self.set_font('Arial', 'I', 8)
def create_pdf(pdf_path):
pdf = CustomPDF()
# Create the special value {nb}
pdf.alias_nb_pages()
pdf.add_page()
pdf.set_font('Times', '', 12)
line_no = 1
for i in range(50):
pdf.cell(0, 10, txt="Line #{}".format(line_no), ln=1)
line_no += 1
pdf.output(pdf_path)
if __name__ == '__main__':
create_pdf('header_footer.pdf')
Since this is a fairly long piece of code, let’s go over this piece-by-piece. The
first section that we want to look at is the
header method:
def header(self):
# Set up a logo
# Add an address
self.cell(100)
self.cell(100)
self.cell(100)
# Line break
self.ln(20)
Here we just hard-code in the logo image that we want to use and then we set the
font that we will be using in our header. Next we add an address
and we position that address to the right of the image. You will notice
that when you are using PyFPDF, the origin is the top left of the page.
So if we want to move our text over to the right, then we need to create
a cell with a number of units of measurement. In this case, we move the
next three lines over to the right by adding a cell of 100 mm. Then we
add a line break at the end, which should add 20 mm of vertical space.
def footer(self):
self.set_y(-10)
self.set_font('Arial', 'I', 8)
The first thing we do here is set the y-position of the origin on the page to -10
mm or -1 cm. This puts the footer’s origin right above the
bottom of the page. Then we set our font for the footer. Finally we
create the page number text. You will note the reference to {nb}.
def create_pdf(pdf_path):
pdf = CustomPDF()
pdf.alias_nb_pages()
pdf.add_page()
line_no = 1
for i in range(50):
line_no += 1
pdf.output(pdf_path)
This is where we call the somewhat magical alias_nb_pages method that will
help us get the total number of pages. We also set the font
for the portion of the page that is not taken up by the header or
footer. Then we write 50 lines of text to the document to make it create
a multipage PDF.
When you run this code you should see a page that looks something like this:
Now let’s find out how you can create tables with PyFPDF.
Tables
The PyFPDF does not have a table control. Instead you have to build your tables
using cells or HTML. Let’s take a look at how you might create a
# simple_table.py
pdf = FPDF()
pdf.set_font("Arial", size=12)
pdf.add_page()
pdf.output('simple_table.pdf')
if __name__ == '__main__':
simple_table()
Here we just create a simple list of lists and then loop over it. For each row in
the list and each element in the nested row, we add a cell
to our PDF object. Note that we turn the border on for these cells. When
cells to have more space in the cells, then you can pass in a spacing
value. When I ran this script, I ended up with a table that looked like
this:
Fig. 19-7: PyFPDF - Adding a Simple Table
# simple_table_html.py
def simple_table_html():
pdf = HTML2PDF()
pdf.add_page()
pdf.write_html(table)
pdf.output('simple_table_html.pdf')
if __name__ == '__main__':
simple_table_html()
Here we use PyFPDF’s HTMLMixin class to allow it to accept HTML as an
input and transform that into a PDF. When you run this example, you
There are some examples on the website that use the Web2Py framework in
conjunction with PyFPDF to create better looking tables, but the code
also add hyperlinks, images, lists and tables. Check the documentation
for the full list of tags and attributes that are supported. You can
then take basic HTML and turn it into a PDF using the HTMLMixin that we
saw in the previous section when we created our table.
# html2fpdf.py
def html2pdf():
html = '''<h1 align="center">PyFPDF HTML Demo</h1>
<p>This is regular text</p>
<p>You can also <b>bold</b>, <i>italicize</i> or <u>underline</u>
'''
pdf = HTML2PDF()
pdf.add_page()
pdf.write_html(html)
pdf.output('html2pdf.pdf')
if __name__ == '__main__':
html2pdf()
Here we just use pretty standard HTML markup to design the PDF. It actually
ends up looking pretty good when you run this code:
Web2Py
The Web2Py framework includes PyFPDF package to make creating reports in
the framework easier. This allows you to create PDF templates in Web2Py.
The documentation is a bit scarce on this subject, so I won’t be
covering this subject in this book. However it does appear that you can
Templates
You can also create templates using PyFPDF. The package even includes a
designer script that uses wxPython for its user interface. The templates
that you can create would be where you want to specify where each
element appears on the page, its style (font, size, etc) and the default
Wrapping Up
The PyFPDF package is a fairly nice project that let’s you do basic PDF
generation. They do point out in the FAQ that they do not support charts
PyPDF2. However if all you need are the bare bone basics to generate a
PDF, then this library might work for you. I think its learning curve is
simpler than ReportLab’s is. However PyFPDF is nowhere near as
feature-rich as ReportLab and I didn’t feel like you had quite the same
The svglib package will work on Linux, Mac OS and Windows. The website
states that it works with Python 2.7 - 3.5, but it should work in newer versions of
Python as well.
You can use svglib to read your existing SVG giles and convert them into
ReportLab Drawing objects. The svglib package also has a command-line tool,
svg2pdf, that can convert SVG files to PDFs.
Dependencies
The svglib package depends on ReportLab and lxml. You can install both of
these packages using pip:
python -m pip install reportlab lxml
Installation
The svglib package can be installed using one of three methods.
Manual installation
Most of the time, using pip is the way to go. But you can also download the
tarball from the Python Packaging Index and do all the steps that pip does for
you automatically if you want to. Just run the following three commands in your
terminal in order:
tar xfz svglib-0.8.1.tar.gz
cd svglib-0.8.1
python setup.py install
Now that we have svglib installed, let’s learn how to use it!
Usage
Using svglib with ReportLab is actually quite easy. All you need to do is import
svg2rlg from svglib.svglib and give it the path to your SVG file. Let’s take a
look:
# svg_demo.py
if __name__ == '__main__':
svg_demo('snakehead.svg', 'svg_demo.pdf')
After giving svg2rlg your path to the SVG file, it will return a drawing object.
Then you can use this object to write it out as a PDF or a PNG. You could go on
to use this script to create your own personal SVG to PNG converting utility!
def add_image(image_path):
my_canvas = canvas.Canvas('svg_on_canvas.pdf')
drawing = svg2rlg(image_path)
renderPDF.draw(drawing, my_canvas, 0, 40)
my_canvas.drawString(50, 30, 'My SVG Image')
my_canvas.save()
if __name__ == '__main__':
image_path = 'snakehead.svg'
add_image(image_path)
Here we create a canvas.Canvas object and then create our SVG drawing
object. Now you can use renderPDF.draw to draw your drawing on your canvas
at a specific x/y coordinate. We go ahead and draw out some small text
underneath our image and then save it off. The result should look something like
this:
https://upload.wikimedia.org/wikipedia/commons/b/bd/Flag_of_Cuba.svg
I have also copied the image into the book’s Github repository as well.
Once you have the image saved off, we can take a look at the code:
# svg_demo2.py
doc = SimpleDocTemplate(output_path)
story = []
story.append(drawing)
doc.build(story)
if __name__ == '__main__':
svg_demo('Flag_of_Cuba.svg', 'svg_demo2.pdf')
This worked pretty well, although the flag is cut off on the right side. Here’s the
output:
doc = SimpleDocTemplate(output_path,
rightMargin=0,
leftMargin=0)
story = []
story.append(drawing)
doc.build(story)
if __name__ == '__main__':
svg_demo('snakehead.svg', 'svg_demo3.pdf')
if __name__ == '__main__':
image_path = 'snakehead.svg'
add_image(image_path, scaling_factor=0.5)
Here we have two functions. The first function will scale our image using a
scaling factor. In this case, we use 0.5 as our scaling factor. Then we do some
math against our drawing object and tell it to scale itself. Finally we draw it back
out in much the same way as we did in the previous example.
Now that we have matplotlib installed, we can create a simple plot and export it
as SVG. Let’s see how this works:
import matplotlib.pyplot as pyplot
def create_matplotlib_svg(plot_path):
pyplot.plot(list(range(5)))
pyplot.title = 'matplotlib SVG + ReportLab'
pyplot.ylabel = 'Increasing numbers'
pyplot.savefig(plot_path, format='svg')
if __name__ == '__main__':
from svg_demo import svg_demo
svg_path = 'matplot.svg'
create_matplotlib_svg(svg_path)
svg_demo(svg_path, 'matplot.pdf')
In this code, we import the pyplot sub-library from matplotlib. Next we create a
simple function that takes the path to where we want to save our plot. For this
simple plot, we create a simple range of five numbers for one of the axes. hen we
add a title and a y-label. Finally we save the plot to disk as an SVG.
The last step is in the if statement at the bottom of the code. Here we import our
svg_demo code from earlier in this appendix. We create oru SVG image and
then we run it through our demo code to turn it into a PDF.
Using svg2pdf
When you install svglib, you also get a command-line tool called svg2pdf. As
the name implies, you can use this tool to convert SVG files to PDF files. Let’s
look at a couple of examples:
svg2pdf /path/to/plot.svg
This command just takes the path to the SVG file that you want to turn into a
PDF. It will automatically rename the output to the same name as the input file,
but with the PDF extension. You can specify the output name though:
svg2pdf -o /path/to/output.pdf /path/to/plot.svg
The -o flag tells svg2pdf requires that you pass in the output PDF path followed
by the input SVG path.
The documentation also mentions that you can convert all the SVG files to PDFs
using a command like the following:
svg2pdf -o "%(base)s.pdf" path/to/file*.svg
This will rename the output PDF to the same name as the input SVG file for
each SVG file in the specified folder.
Wrapping Up
The svglib is the primary method to add SVGs to ReportLab at the time of
writing this book. While it isn’t full featured, it works pretty well and the API is
quite nice. We also learned how to insert a plot SVG created via the popular
matplotlib package. Finally we looked at how to turn SVGs to PDFs using the
svg2pdf command line tool.
Appendix B - Getting System Fonts
ReportLab does not have a way to get all the fonts that are installed on a system.
In fact, Python itself has no good way to grab the fonts either. But I thought it
would be fun to write a script that would grab all the TrueType fonts on your
system and try to add them to your PDF.
After much research, I thought I had found a handy answer with Python’s built-
in GUI toolkit, Tkinter:
>>> from tkinter import Tk, font
>>> root = Tk()
>>> font.families()
This code basically creates a Tk object which then allows us to query the
operating system for installed font families and names. Alas, it does not provide
us with the file paths to those families as far as I could find.
python -m pip install fonttools
Once you have that installed, we can write some code to get the information that
ReportLab requires!
Getting the System’s Fonts
The first step is to figure out a way to find all the TTF files on your system. To
do that, we can use ReportLab’s rl_settings, which has some search paths built
into it. Let’s start by creating a file named find_installed_fonts.py. Here’s how
the code should start:
# find_installed_fonts.py
import os
def find_installed_ttf_fonts():
installed_fonts = []
for folder in rl_settings.TTFSearchPath:
if os.path.exists(folder):
for entry in os.scandir(folder):
if entry.name.endswith('.ttf'):
installed_fonts.append(entry.path)
return installed_fonts
Now let’s add another function to our script. This function will extract the font
name from the TTF file using the fontTools package.
def get_font_names(font_paths): fonts = {}
for font in font_paths:
tt = ttLib.TTFont(font)
name = ''
family = ''
if '\000' in str(record.string):
name = record.string
if '\000' in str(record.string):
family = record.string
break
fonts[name] = font
return fonts
if __name__ == '__main__':
paths = find_installed_ttf_fonts()
get_font_names(paths)
Here we create an empty dictionary that we will use to map our font names to
their respective font files. This code is based on some used by the TTFQuery
package (https://pypi.org/project/TTFQuery/), which is a useful wrapper around
fontTools. Anyway, I was able to use this to extract font names pretty
successfully.
Now we just need to write some ReportLab code to embed our newly found
fonts into a PDF.
# system_font_demo.py
import find_installed_fonts
my_canvas.setFont(font, 12)
my_canvas.drawString(30, pos_y, font)
pos_y -= 10
if pos_y < 40:
my_canvas.showPage()
pos_y = 750
if __name__ == '__main__':
my_canvas = canvas.Canvas("system_font_demo.pdf")
font_paths = find_installed_fonts.find_installed_ttf_fonts()
fonts = find_installed_fonts.get_font_names(font_paths)
system_font_demo(my_canvas, fonts)
my_canvas.save()
For this example, we import our font finder script as well as a few items from
ReportLab. For this example, we use ReportLab’s canvas.Canvas class. For
each font that we found on the system, we attempt to load it using ReportLab’s
TTFont class. I had some fonts that wouldn’t load, so I wrapped this step in a
try/except. Then we write out the font name and adjust the y coordinate so that
each font is on a different line. When I ran this code, I ended up with several
pages worth of fonts. Note that some fonts are rendered as boxes. I am guessing
these fonts could not be embedded as-is so ReportLab did the best that it could.
Fig. Appendix B-1: System Fonts
Wrapping Up
Now you have enough code that you will be able to access the pre-installed fonts
on your system. You can updated this code as you see fit to add new font search
paths or to look for other types of fonts besides TrueType.
Appendix C - Creating a Color Demo
ReportLab has many built-in colors included in reportlab.lib.colors.
colors actually look like. So in this appendix, we will write some code
that can generate a PDF that shows all the colors in the
of variables that represent the ReportLab color names. They are set
setting the color. The HexColor function returns a ReportLab Color object. So
what we need to do is figure out how to access those color
variables.
Let’s take a look at some code that shows one way to do what we want:
from reportlab.lib import colors
from reportlab.lib.pagesizes import letter
from reportlab.graphics.shapes import String, Rect, Drawing
from reportlab.platypus import Paragraph, SimpleDocTemplate
def color_demo():
doc = SimpleDocTemplate("colors.pdf",
pagesize=letter)
flowables = []
for color in dir(colors):
attr = getattr(colors, color)
if isinstance(attr, colors.Color):
drawing = Drawing(width=100, height=50)
string = String(10, 10, color)
rectangle = Rect(125, 10, 100, 35)
rectangle.fillColor = attr
drawing.add(string)
drawing.add(rectangle)
flowables.append(drawing)
doc.build(flowables)
if __name__ == '__main__':
color_demo()
the attributes and methods from the colors module. This will return
a LOT of cruft that we just don’t care about. So when we iterate over
Color type. If it is, then we create a drawing and add a string with
the color’s name. We also add a rectangle that is filled with that
color.
When you run this code you should get a multipage document that look
Wrapping Up
Being able to see what the color looks like before we attempt to use it
is quite nice. You might want to use the code in this appendix to
generate a listing of the colors and keep that file around to refer to
or print it out and put them on the wall near your PC.
Afterword and Thanks
This book was a lot of fun to put together. I enjoyed learning how to do new
things with ReportLab and Python. There are lots of really interesting Python
packages that you can use to work with PDFs. In this book, we looked at the
major ones:
ReportLab
PyPDF2
pdfrw
PyFPDF
pdfminer
rst2pdf
WeasyPrint
I really hope you enjoyed learning how to create, edit and remix PDFs using
Python. There are so many applications you can create using the information that
you learned in this book. I hope you will let me know about some of the fun
applications that you create.
I want to thank Axel Kielhorn, Keven Webb and Steve Barnes for all their
suggestions and bug reports. I also want to thank the ReportLab developers for
creating such a useful package and supporting it for so many years. Finally I
want to thank the many other people who have written neat packages for
manipulating PDFs with Python and a special thanks goes out to my readers for
supporting me and this book!
Thanks, Mike