
Use Python To Fill PDF Files! - AKDux

The document discusses various methods for automating the filling of PDF forms using Python, VBA, and libraries like pdfrw and PyPDF2. It details the challenges faced with each approach and ultimately presents a working solution using Python that allows for easy population of PDF forms from a data dictionary. The author provides code examples and tips for ensuring that filled fields appear correctly in Adobe Acrobat.

29/3/24, 0:16 Use Python To Fill PDF Files!

- AKDux


OCTOBER 31

Use Python To Fill PDF Files!


PYTHON 1 COMMENTS

PDFs are hard to work with. Over the years I've tried several approaches to filling them out in an
automated way. It's amazing how many manual tasks at my job require filling out PDFs; it's
fairly routine for me to fill out PDF files by hand to process transactions. Needless to say, I've
either created or borrowed several solutions. First let me say I'm no VBA expert, but I have
experimented with solutions here as well.

https://akdux.com/python/2020/10/31/python-fill-pdf-files/

TABLE OF CONTENTS

1 PDF Setup
2 pdfrw Setup
3 Accessing our PDF
4 Filling a PDF
5 Bringing it all together
6 Bonus material: how to fill in multiple fields with same name

I've written a VBA script to fill out a PDF using "send keys". Think "how could I do everything
using just keyboard shortcuts?" Once you know how to open a PDF with shortcuts, tab through the
form fields and use shortcuts to save, you can automate this in VBA. The problem here is that it's
painstaking to set up. Plus, if the form changes or you want to add a new form, it's basically like
starting from scratch.

Next I used VBA and the Acrobat reference to access and manipulate PDFs. This works much better
because you can access the PDF form fields using VBA and JavaScript. I would highly recommend this
route if you're going to use VBA. I still felt as if every PDF template had to be set up completely
separately. Some of this is likely due to my experience level with VBA. Either way, there was a lot
of copying and pasting code.


Then came my experience with Python, PyPDF2 and reportlab. I won't go into too much detail about
exactly how I did this. In short, you create your PDF template, create a blank PDF with just your
data fields, and paste the new PDF as a watermark on top of your PDF template. Again, this is
painstaking because you're using grid coordinates to position where text should be placed on the
page. This worked and it was fast, but it wasn't great if the PDF template changed or if you wanted
to manipulate the PDF file afterward.

It was great when I found you could fill PDF form fields with Python using PyPDF2 and pdfrw. Both
of these libraries look to be able to do similar tasks, but I chose pdfrw because it appeared to be
better maintained. PyPDF2 appeared to be unmaintained when I wrote this (October 2020). There is a
PyPDF3 and a PyPDF4; however, I had already settled on pdfrw. The only issue I ran into is that you
could fill in the fields but those values wouldn't show until you refreshed the field in Acrobat. I
found two ways around this. One was to click into every field and hit Enter, which isn't doable if
you have several PDFs. The other was to open the PDFs in a web browser, which causes a refresh of
the fields.

Because of these challenges I gave up for a while... However, while digging into Python and PDFs
again I found the solution that refreshes the fields!

So now I have a working solution I can pass around the office easily. A basic macro references a
Python exe file located on a shared network drive, meaning nobody needs a Python install! We can
populate PDF forms with a simple Excel macro while still getting all the flexibility and
functionality of Python. The rest of this post goes through an example of how to fill out a PDF
using Python.
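
The post does not show how the Excel macro hands data to the Python executable. One possible arrangement, sketched here purely as an assumption, is a small command-line entry point that receives the template path, the output path, and the field values as a JSON string. The argument names and the parse_args function are hypothetical.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the (hypothetical) arguments a macro could pass to the executable."""
    parser = argparse.ArgumentParser(description="Fill a PDF form from JSON data.")
    parser.add_argument("template", help="path to the fillable PDF template")
    parser.add_argument("output", help="path of the filled PDF to write")
    parser.add_argument("--data", default="{}",
                        help="field values as a JSON object, e.g. '{\"name\": \"A\"}'")
    args = parser.parse_args(argv)
    # Turn the JSON text into the data dictionary used in the rest of the post.
    args.data = json.loads(args.data)
    return args


if __name__ == "__main__":
    cli_args = parse_args()
    print(cli_args.template, cli_args.output, cli_args.data)
```

With something like this in place, the macro only has to build one command line per form; everything else stays in Python.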


PDF Setup
I’m using Adobe Acrobat DC. I’m going to create a sample PDF file for this example. If you have an
existing PDF you want to use, just open it and click on Tools > Prepare Form. This action will
create a fillable PDF form.

Now let’s create a simple PDF for this example. We have the following fields.

name

phone

date

account_number

cb_1 (check box “Yes”)

cb_2 (check box “No”)

Now that we have a sample PDF we will get started with a little Python.


Example of the form I’m using

pdfrw Setup
The first thing to do is install pdfrw. In a Jupyter notebook you can run !pip3 install pdfrw

Python

!pip3 install pdfrw

Python

import pdfrw
pdfrw.__version__

'0.4'

Accessing our PDF

Python
# Let's first set some variables to reference our PDF template and output.pdf
pdf_template = "template.pdf"
pdf_output = "output.pdf"

Python
template_pdf = pdfrw.PdfReader(pdf_template)  # create a pdfrw object from our template.pdf
# template_pdf  # uncomment to see all the data captured from this PDF.

You should print out template_pdf to see everything available in the PDF. There is a lot, so for
ease of reading I’ve commented it out.

For now let’s just try to get the form fields of the PDF we created. To do this we will define the
keys we care about as constants. I grabbed this code from a random snippet online, but you can find
several similar setups on Stack Overflow.

Python

ANNOT_KEY = '/Annots'
ANNOT_FIELD_KEY = '/T'
ANNOT_VAL_KEY = '/V'
ANNOT_RECT_KEY = '/Rect'
SUBTYPE_KEY = '/Subtype'
WIDGET_SUBTYPE_KEY = '/Widget'

Next, we can loop through the page(s). Here we only have one, but it’s a good idea to prepare
for future functionality. We go through all the annotations and keep just the form field names.

Python
for page in template_pdf.pages:
    annotations = page[ANNOT_KEY]
    for annotation in annotations:
        if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY:
            if annotation[ANNOT_FIELD_KEY]:
                key = annotation[ANNOT_FIELD_KEY][1:-1]
                print(key)

name
phone
date
account_number
cb_1
cb_2

There you can see we were able to grab our form field names!
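
If you want the names as a list rather than printed output, the same loop can be wrapped in a small helper. This is a minimal sketch: list_field_names is a hypothetical name, and the `or []` guard assumes that indexing a page without an '/Annots' entry yields None, which is the same behavior the `if annotation[ANNOT_FIELD_KEY]:` check above relies on.

```python
ANNOT_KEY = '/Annots'
ANNOT_FIELD_KEY = '/T'
SUBTYPE_KEY = '/Subtype'
WIDGET_SUBTYPE_KEY = '/Widget'


def list_field_names(pdf):
    """Return the names of all widget fields in `pdf`, in page order."""
    names = []
    for page in pdf.pages:
        # A page without form fields may have no annotations at all,
        # so fall back to an empty list instead of iterating over None.
        for annotation in page[ANNOT_KEY] or []:
            if annotation[SUBTYPE_KEY] != WIDGET_SUBTYPE_KEY:
                continue
            raw_name = annotation[ANNOT_FIELD_KEY]
            if raw_name:
                # Same slicing as above: drop the surrounding parentheses.
                names.append(raw_name[1:-1])
    return names


# Usage (assumes template_pdf was created with pdfrw.PdfReader as above):
# field_names = list_field_names(template_pdf)
```

Having the names in a list makes it easy to compare them against the keys of your data dictionary before filling anything.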


Filling a PDF
To fill a PDF we can create a dictionary of the values we want to populate the PDF with. The
dictionary keys will be the form field names and the values will be what we want to fill into the PDF.

Python

from datetime import date

data_dict = {
    'name': 'Andrew Krcatovich',
    'phone': '(123) 123-1234',
    'date': date.today(),
    'account_number': '123123123',
    'cb_1': True,
    'cb_2': False,
}
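
Note that the date is stored as a date object, not a string. When a value is later written with '{}'.format(...), a date comes out in ISO format. If your form expects a different layout, you can format the date yourself before putting it in the dictionary. A small illustration, using a fixed date so the result is predictable (the variable names are just for this example):

```python
from datetime import date

trade_date = date(2020, 10, 31)

# '{}'.format(...) falls back to str(), which for a date is ISO 8601.
iso_text = '{}'.format(trade_date)

# strftime gives explicit control over the text written into the field.
us_text = trade_date.strftime('%m/%d/%Y')

print(iso_text)  # 2020-10-31
print(us_text)   # 10/31/2020
```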

Let’s set up a function to handle grabbing the keys, populating the values, and saving out
the output.pdf file.

Python

def fill_pdf(input_pdf_path, output_pdf_path, data_dict):
    template_pdf = pdfrw.PdfReader(input_pdf_path)
    for page in template_pdf.pages:
        annotations = page[ANNOT_KEY]
        for annotation in annotations:
            if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY:
                if annotation[ANNOT_FIELD_KEY]:
                    key = annotation[ANNOT_FIELD_KEY][1:-1]
                    if key in data_dict.keys():
                        if type(data_dict[key]) == bool:
                            if data_dict[key] == True:
                                annotation.update(pdfrw.PdfDict(
                                    AS=pdfrw.PdfName('Yes')))
                        else:
                            annotation.update(
                                pdfrw.PdfDict(V='{}'.format(data_dict[key]))
                            )
                            annotation.update(pdfrw.PdfDict(AP=''))
    pdfrw.PdfWriter().write(output_pdf_path, template_pdf)

Python
fill_pdf(pdf_template, pdf_output, data_dict)

Okay! That just filled out a PDF. Opening it in Preview on my Mac shows the filled-in values.


However, opening the very same PDF in Acrobat doesn’t show the values of the form fields. If you
click into the field you can see it did fill but for some reason the field isn’t refreshed to show the
value. Printing the PDF here won’t help either as it will print blank. After a long while searching for
an answer I found the following solution. Worked like a charm and the form fields are now showing
in Acrobat as well.

Tip: add template_pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))

Honestly, I don’t know why this isn’t the default setting. It seems like everyone online runs into
the same issue, and this solution is hidden away well enough that people resort to several
difficult work-arounds. Either way, just add the line above to the fill_pdf function like so.

Python
def fill_pdf(input_pdf_path, output_pdf_path, data_dict):
    template_pdf = pdfrw.PdfReader(input_pdf_path)
    for page in template_pdf.pages:
        annotations = page[ANNOT_KEY]
        for annotation in annotations:
            if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY:
                if annotation[ANNOT_FIELD_KEY]:
                    key = annotation[ANNOT_FIELD_KEY][1:-1]
                    if key in data_dict.keys():
                        if type(data_dict[key]) == bool:
                            if data_dict[key] == True:
                                annotation.update(pdfrw.PdfDict(
                                    AS=pdfrw.PdfName('Yes')))
                        else:
                            annotation.update(
                                pdfrw.PdfDict(V='{}'.format(data_dict[key]))
                            )
                            annotation.update(pdfrw.PdfDict(AP=''))
    template_pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))  # NEW
    pdfrw.PdfWriter().write(output_pdf_path, template_pdf)
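
One thing the check box branch takes for granted is that the checked state of every check box is named Yes. In the PDF format, a check box widget's normal appearance dictionary ('/AP', then '/N') has one entry per state, and the checked state is whichever entry is not '/Off'. '/Yes' is common, but other names are possible. The helper below is a minimal sketch of picking that name. checkbox_on_state is a hypothetical name, not part of pdfrw, and the commented usage is an untested assumption about how you would read the state names from your own form.

```python
OFF_STATE = '/Off'


def checkbox_on_state(state_names, default='/Yes'):
    """Return the first appearance state that is not '/Off'.

    `state_names` is any iterable of state names such as ['/Off', '/Yes'].
    If no "on" state is found, fall back to `default`.
    """
    for name in state_names:
        if name != OFF_STATE:
            return name
    return default


# Possible usage inside the loop of fill_pdf (assumption, verify on your PDF):
#     states = annotation['/AP']['/N'].keys()
#     on_state = checkbox_on_state(states)          # e.g. '/Yes' or '/On'
#     annotation.update(pdfrw.PdfDict(AS=pdfrw.PdfName(on_state[1:])))
```

If a check box in your form stays empty after filling, printing its state names is a quick way to see whether it uses something other than Yes.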

I added one additional function, fill_simple_pdf_file, as I found it very useful to manipulate the
data dictionary first, especially if it came from an Excel file, before populating the form. This
way you can create many fillable forms from the same data source, apply formatting to the fields
and set default values if nothing was supplied.
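
As an illustration of that kind of clean-up step, here is a minimal sketch that assumes each record arrives as a plain dictionary (for example, one per spreadsheet row). normalize_row, format_phone, to_bool and the set of words treated as "checked" are illustrative choices, not part of the original code.

```python
import re

# Words that should count as a ticked check box (an arbitrary, illustrative set).
TRUE_WORDS = {'yes', 'y', 'true', '1', 'x'}


def format_phone(raw):
    """Format a 10-digit phone number as (123) 123-1234; leave anything else alone."""
    digits = re.sub(r'\D', '', str(raw))
    if len(digits) != 10:
        return str(raw)
    return '({}) {}-{}'.format(digits[:3], digits[3:6], digits[6:])


def to_bool(value):
    """Convert spreadsheet-style values such as 'Yes' or 1 to a real bool."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in TRUE_WORDS


def normalize_row(row):
    """Apply defaults and formatting so the result is ready for fill_pdf."""
    return {
        'name': str(row.get('name') or '').strip(),
        'phone': format_phone(row.get('phone') or ''),
        'account_number': str(row.get('account_number') or '').strip(),
        'cb_1': to_bool(row.get('cb_1', False)),
        'cb_2': to_bool(row.get('cb_2', False)),
    }
```

Converting check box values to real booleans matters here because fill_pdf decides between the check box branch and the text branch by testing for the bool type.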


Bringing it all together

Python

import pdfrw
from datetime import date

ANNOT_KEY = '/Annots'
ANNOT_FIELD_KEY = '/T'
ANNOT_VAL_KEY = '/V'
ANNOT_RECT_KEY = '/Rect'
SUBTYPE_KEY = '/Subtype'
WIDGET_SUBTYPE_KEY = '/Widget'

def fill_pdf(input_pdf_path, output_pdf_path, data_dict):
    template_pdf = pdfrw.PdfReader(input_pdf_path)
    for page in template_pdf.pages:
        annotations = page[ANNOT_KEY]
        for annotation in annotations:
            if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY:
                if annotation[ANNOT_FIELD_KEY]:
                    key = annotation[ANNOT_FIELD_KEY][1:-1]
                    if key in data_dict.keys():
                        if type(data_dict[key]) == bool:
                            if data_dict[key] == True:
                                annotation.update(pdfrw.PdfDict(
                                    AS=pdfrw.PdfName('Yes')))
                        else:
                            annotation.update(
                                pdfrw.PdfDict(V='{}'.format(data_dict[key]))
                            )
                            annotation.update(pdfrw.PdfDict(AP=''))
    template_pdf.Root.AcroForm.update(pdfrw.PdfDict(NeedAppearances=pdfrw.PdfObject('true')))
    pdfrw.PdfWriter().write(output_pdf_path, template_pdf)

# NEW
def fill_simple_pdf_file(data, template_input, template_output):
    some_date = date.today()
    data_dict = {
        'name': data.get('name', ''),
        'phone': data.get('phone', ''),
        'date': some_date,
        'account_number': data.get('account_number', ''),
        'cb_1': data.get('cb_1', False),
        'cb_2': data.get('cb_2', False),
    }
    return fill_pdf(template_input, template_output, data_dict)

if __name__ == '__main__':
    pdf_template = "template.pdf"
    pdf_output = "output.pdf"

    sample_data_dict = {
        'name': 'Andrew Krcatovich',
        'phone': '(123) 123-1234',
        # 'date': date.today(),  # Removed date so we can dynamically set in python.
        'account_number': '123123123',
        'cb_1': True,
        'cb_2': False,
    }
    fill_simple_pdf_file(sample_data_dict, pdf_template, pdf_output)
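
To produce many forms from one data source, one option is to export the spreadsheet to CSV and loop over the rows. This is only an assumption about the workflow, since the post does not show its Excel integration. In the sketch below, iter_jobs and the output file naming scheme are hypothetical.

```python
import csv


def iter_jobs(csv_file, name_field='account_number'):
    """Yield (row, output_filename) pairs, one per CSV row.

    The output name is built from `name_field`; rows where that column is
    missing or empty fall back to their 1-based row number.
    """
    for index, row in enumerate(csv.DictReader(csv_file), start=1):
        stem = row.get(name_field) or 'row_{}'.format(index)
        yield row, 'output_{}.pdf'.format(stem)


# Usage with the function defined above (assumes a data.csv next to the script):
#     with open('data.csv', newline='') as handle:
#         for row, output_name in iter_jobs(handle):
#             fill_simple_pdf_file(row, 'template.pdf', output_name)
```

Keep in mind that every value read from a CSV file is a string, so check box columns would need to be converted to real booleans before they reach fill_pdf.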

Thanks for reading! Hope this can help someone else!


Bonus material: how to fill in multiple fields with same name
There are really two ways to get around this issue. One way is to rename the fields so each has a
different name, e.g. name__1 and name__2. If you have a lot of duplicate fields or need to fill
this form out manually, a better option would be to experiment with the widget annotations.
Typically, I find that duplicates create a ‘/Parent’ annotation that holds the ‘/T’ entry, so the
widget itself has no ‘/T’ of its own.

You could do something like:

Python
from datetime import date
from pdfrw import PdfReader, PdfDict, PdfName, PdfObject, PdfWriter

ANNOT_KEY = '/Annots'
ANNOT_FIELD_KEY = '/T'
ANNOT_VAL_KEY = '/V'
ANNOT_RECT_KEY = '/Rect'
SUBTYPE_KEY = '/Subtype'
WIDGET_SUBTYPE_KEY = '/Widget'

data_dict = {
    'account_number': '12312312',
    'trade_date': date.today(),
}

template_pdf = PdfReader("test.pdf")
for page in template_pdf.pages:
    annotations = page[ANNOT_KEY]
    for annotation in annotations:
        if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY:
            # CHANGED: for example purposes
            if not annotation[ANNOT_FIELD_KEY]:
                if annotation['/Parent']:  # note the '/Parent' widget
                    key = annotation['/Parent'][ANNOT_FIELD_KEY][1:-1]  # so '/T' is inside the '/Pare
                    if key in data_dict.keys():
                        annotation['/Parent'].update(
                            PdfDict(V='{}'.format(data_dict[key]))
                        )
                        annotation['/Parent'].update(PdfDict(AP=''))
template_pdf.Root.AcroForm.update(PdfDict(NeedAppearances=PdfObject('true')))
PdfWriter().write("output.pdf", template_pdf)
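
The loop above reads '/T' from the immediate '/Parent' only. A slightly more general sketch, assuming the nesting can be more than one level deep, walks up the '/Parent' chain until it finds a name. resolve_field_name and max_depth are hypothetical names. The function relies only on missing keys coming back as None, the same behavior the `if annotation['/Parent']:` check above depends on.

```python
def resolve_field_name(annotation, max_depth=10):
    """Return the nearest '/T' name for a widget, looking up through '/Parent'.

    Returns None if no name is found within `max_depth` levels.
    """
    node = annotation
    for _ in range(max_depth):
        if node is None:
            return None
        raw_name = node['/T']
        if raw_name:
            # Same slicing as in the post: drop the surrounding parentheses.
            return raw_name[1:-1]
        node = node['/Parent']
    return None


# Possible usage inside the loop above:
#     key = resolve_field_name(annotation)
#     if key in data_dict: ...
```

This returns only the nearest partial name. In the PDF format a field's full name is its ancestors' names joined with periods, so deeply nested forms may need that longer name to tell fields apart.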


Copyright 2024 Andrew Krcatovich, all rights reserved.
