How To Create PDF Reports With Python - The Essential Guide - Python-Bloggers
[This article was first published on python – Better Data Science, and kindly contributed to
python-bloggers]. (You can report an issue about the content on this page here)
Reports are everywhere, so any tech professional must know how to create them. It’s a
tedious and time-consuming task, which makes it a perfect candidate for automation
with Python.
You can benefit from automated report generation whether you’re a data scientist or
a software developer. For example, data scientists might use reports to show
performance or explanations of machine learning models.
This article will teach you how to make data-visualization-based reports and save them
as PDFs. To be more precise, you’ll learn how to combine multiple data visualizations
(dummy sales data) into a single PDF file.
And the best thing is – it’s easier than you think!
You can download the Notebook with the source code here.
Data generation
You can’t have reports without data. That’s why you’ll have to generate some first; more
on that in a bit.
Let’s start with the imports. You’ll need a bunch of things – but the FPDF library is likely
the only unknown. Put simply, it’s used to create PDFs, and you’ll work with it a bit later.
Refer to the following snippet for the imports:
import os
import shutil
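The import list here appears to be cut short. Judging by the names used later in the article (`pd`, `plt`, `FPDF`, and the calendar library), the rest of it most likely pulls in pandas, matplotlib, and fpdf; numpy is a further guess for the random sales numbers. As a rough sketch under those assumptions, this standard-library snippet reports which of the third-party packages are missing from your environment:

```python
import importlib.util

# Assumed third-party dependencies and how they would be imported.
# The package list is inferred from the article, not confirmed by it.
ASSUMED_IMPORTS = {
    "numpy": "import numpy as np",
    "pandas": "import pandas as pd",
    "matplotlib": "import matplotlib.pyplot as plt",
    "fpdf": "from fpdf import FPDF",
}


def missing_packages(packages=ASSUMED_IMPORTS):
    """Return the names of packages that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


for name in missing_packages():
    print(f"Missing package: {name} (needed for `{ASSUMED_IMPORTS[name]}`)")
```

If nothing is printed, all four packages are available and the assumed import lines should work as written.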
Let’s generate some fake data next. The idea is to declare a function that returns a data
frame of dummy sales data for a given month. It does that by constructing a date range
for the entire month and then assigning the sales amount as a random integer within a
given range.
You can use the calendar library to get the last day for any year/month combination.
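To make the calendar trick concrete, here is a minimal sketch. `calendar.monthrange` returns a tuple of the weekday of the first day and the number of days in the month, so the second element is the last day. The helper name `last_day_of_month` is only illustrative:

```python
import calendar


def last_day_of_month(year: int, month: int) -> int:
    """Return the number of the last day in the given month."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    _first_weekday, days_in_month = calendar.monthrange(year, month)
    return days_in_month


print(last_day_of_month(2020, 2))   # 29, since 2020 is a leap year
print(last_day_of_month(2020, 12))  # 31
```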
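def generate_sales_data(month: int) -> pd.DataFrame:
    # Date range from the first day of the month to the last;
    # calendar.monthrange(year, month)[1] gives the last day
    dates = pd.date_range(
        start=datetime(year=2020, month=month, day=1),
        end=datetime(year=2020, month=month, day=calendar.monthrange(2020, month)[1])
    )

    # Sales numbers as a random integer between 1000 and 2000
    sales = np.random.randint(low=1000, high=2000, size=len(dates))

    # Combine into a single dataframe
    return pd.DataFrame({
        'Date': dates,
        'ItemsSold': sales
    })

# Test
generate_sales_data(month=3)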
And that’s it – you now have a function that generates dummy sales data. Let’s see how
to visualize it next.
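If you want to try the idea without any third-party dependencies, here is a rough standard-library sketch of the same approach. The function name `generate_sales_rows`, the `seed` parameter, and the list-of-dicts return format are illustrative choices rather than part of the article's code; the default year of 2020 mirrors the year used throughout the article.

```python
import calendar
import random
from datetime import date


def generate_sales_rows(month: int, year: int = 2020, seed=None) -> list:
    """Return one {'Date', 'ItemsSold'} record per day of the month."""
    rng = random.Random(seed)  # a fixed seed makes the output repeatable
    days_in_month = calendar.monthrange(year, month)[1]
    return [
        {
            "Date": date(year, month, day),
            # randint includes both endpoints: 1000 <= value <= 2000
            "ItemsSold": rng.randint(1000, 2000),
        }
        for day in range(1, days_in_month + 1)
    ]


rows = generate_sales_rows(month=3, seed=42)
print(len(rows))        # 31 records, one per day in March
print(rows[0]["Date"])  # 2020-03-01
```

Passing a seed is handy for reports you want to regenerate with identical numbers; leave it out to get fresh random data on every run.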
Data visualization
Your next task is to create a function that visualizes the previously generated dataset as a line
plot. It’s the most appropriate visualization type, as you’re dealing with time series data.
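def plot(data: pd.DataFrame, filename: str) -> None:
    # Figure size, title text, and y-axis label are illustrative assumptions
    plt.figure(figsize=(12, 4))
    plt.plot(data['Date'], data['ItemsSold'])
    plt.title(f"Sales 2020/{data['Date'].dt.month.iloc[0]}", fontsize=17)
    plt.xlabel('Period', fontsize=13)
    plt.xticks(fontsize=9)
    plt.ylabel('Number of items sold', fontsize=13)
    plt.yticks(fontsize=9)
    # Save to disk instead of showing the figure
    plt.savefig(filename, dpi=300, bbox_inches='tight')
    plt.close()
    return

# Test
december = generate_sales_data(month=12)
plot(data=december, filename='december.png')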
In a nutshell – you’re creating a data visualization, setting the title, and adjusting font
sizes – nothing special. The visualization isn’t shown to the user but is instead saved to
disk. You’ll see later how powerful this can be.
An example call will save a data visualization for December of 2020. Here’s what it looks
like:
Image 2 – Sales for December/2020 plot (image by author)
And that’s your visualization function. There’s only one step remaining before you can
create PDF documents, and that is to save all the visualizations and define the report
page structure.
PLOT_DIR = 'plots'

def construct():
    # Delete the folder if it exists and create it again
    try:
        shutil.rmtree(PLOT_DIR)
        os.mkdir(PLOT_DIR)
    except FileNotFoundError:
        os.mkdir(PLOT_DIR)

    # Iterate over all months in 2020 except January
    for i in range(2, 13):
        # Save visualization
        plot(data=generate_sales_data(month=i), filename=f'{PLOT_DIR}/{i}.png')

    # Construct data shown in document
    counter = 0
    pages_data = []
    temp = []
    # Get all plots
    files = os.listdir(PLOT_DIR)
    # Sort them by month - a bit tricky because the file names are strings
    files = sorted(files, key=lambda x: int(x.split('.')[0]))
    # Iterate over all created visualizations
    for fname in files:
        # Three plots per page: start a new page once the current one is full
        if counter == 3:
            pages_data.append(temp)
            temp = []
            counter = 0

        temp.append(f'{PLOT_DIR}/{fname}')
        counter += 1

    return [*pages_data, temp]
It’s possibly a lot to digest, so go over it line by line. The comments should help. The idea
behind sorting is to obtain the month integer representation from the string – e.g., 3
from “3.png” and use this value to sort the charts. Delete this line if the order doesn’t
matter, but that’s not the case with months.
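To see why the numeric sort key matters, compare plain string sorting with sorting by the extracted month number. This is a small standalone sketch; `month_number` is an illustrative helper name:

```python
def month_number(filename: str) -> int:
    """Extract the month from a name such as '3.png'."""
    return int(filename.split(".")[0])


files = ["10.png", "2.png", "12.png", "3.png", "11.png"]

print(sorted(files))
# ['10.png', '11.png', '12.png', '2.png', '3.png']  (string order)

print(sorted(files, key=month_number))
# ['2.png', '3.png', '10.png', '11.png', '12.png']  (month order)
```

With string sorting, October through December would land ahead of February in the report.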
plots_per_page = construct()
plots_per_page
You should see the following in your Notebook after running the above snippet:
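The result is a list of lists, with one inner list of file paths per page. Assuming three plots per page (which is what the branches in `page_body()` suggest) and eleven monthly plots for February through December, the structure can be mimicked with a small sketch. The `chunk` helper is hypothetical and only illustrates the grouping; it is not the article's `construct()` function:

```python
def chunk(items: list, size: int) -> list:
    """Split items into consecutive groups of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# File names for February through December, mirroring the plots/ folder
paths = [f"plots/{month}.png" for month in range(2, 13)]

for page in chunk(paths, 3):
    print(page)
```

With eleven plots this gives three full pages and a final page holding the last two plots.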
In case you’re wondering – here’s how the plots/ folder looks on my machine (after
calling the construct() function):
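Because `construct()` deletes and recreates the folder on every call, it only ever holds the most recent set of plots. That reset step can also be written without the try/except. Here is a minimal sketch, where `reset_dir` is a hypothetical helper name and a temporary directory stands in for the real plots/ folder:

```python
import os
import shutil
import tempfile


def reset_dir(path: str) -> None:
    """Delete `path` if it exists, then create it again, empty."""
    shutil.rmtree(path, ignore_errors=True)  # no error if the folder is missing
    os.makedirs(path)


# Demonstrate inside a throwaway directory so nothing real is touched
base = tempfile.mkdtemp()
plot_dir = os.path.join(base, "plots")

reset_dir(plot_dir)                                   # first call creates the folder
open(os.path.join(plot_dir, "old.png"), "w").close()  # leave a stale file behind
reset_dir(plot_dir)                                   # second call wipes it
print(os.listdir(plot_dir))                           # []

shutil.rmtree(base)  # clean up the throwaway directory
```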
And that’s all you need to construct PDF reports – you’ll learn how to do that next.
The PDF class shown below defines four methods:
header() – used to define the document header. A custom logo is placed on the
left (make sure to have one or delete this code line), and a hardcoded text is
placed on the right
footer() – used to define the document footer. It will simply show the page
number
page_body() – used to define what the page looks like. This will depend on the
number of visualizations shown per page, so positions and margins are set
accordingly (feel free to play around with the values)
print_page() – used to add a blank page and fill it with content
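class PDF(FPDF):
    def __init__(self):
        super().__init__()
        # A4 page size in millimetres
        self.WIDTH = 210
        self.HEIGHT = 297

    def header(self):
        # Custom logo and positioning
        # Create an `assets` folder and put any wide and short image inside
        # (the file name `logo.png` is an assumption; adjust it to your image)
        self.image('assets/logo.png', 10, 8, 33)
        self.set_font('Arial', 'B', 11)
        self.cell(self.WIDTH - 80)
        # Hardcoded header text, right-aligned (placeholder wording)
        self.cell(60, 1, 'Sales report', 0, 0, 'R')
        self.ln(20)

    def footer(self):
        # Page number in the footer, centered 15 mm above the bottom edge
        self.set_y(-15)
        self.set_font('Arial', 'I', 8)
        self.cell(0, 10, 'Page ' + str(self.page_no()), 0, 0, 'C')

    def page_body(self, images):
        # Determine how many plots there are per page and set positions
        # and margins accordingly
        if len(images) == 3:
            self.image(images[0], 15, 25, self.WIDTH - 30)
            self.image(images[1], 15, self.WIDTH / 2 + 5, self.WIDTH - 30)
            # Third offset assumes the same 85 mm spacing as the first two
            self.image(images[2], 15, self.WIDTH / 2 + 90, self.WIDTH - 30)
        elif len(images) == 2:
            self.image(images[0], 15, 25, self.WIDTH - 30)
            self.image(images[1], 15, self.WIDTH / 2 + 5, self.WIDTH - 30)
        else:
            self.image(images[0], 15, 25, self.WIDTH - 30)

    def print_page(self, images):
        # Add a blank page and fill it with content
        self.add_page()
        self.page_body(images)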
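The coordinates in `page_body()` follow a simple pattern: every image starts 15 mm from the left edge and is `WIDTH - 30 = 180` mm wide, and the first two images sit at `y = 25` and `y = WIDTH / 2 + 5 = 110`, which is 85 mm apart. As a sketch, assuming an even 85 mm step between stacked images, the positions can be computed instead of hardcoded. `image_positions` is a hypothetical helper name, not part of FPDF:

```python
def image_positions(count: int, width: float = 210, margin: float = 15,
                    top: float = 25, step: float = 85) -> list:
    """Return (x, y, w) tuples in millimetres for `count` stacked images."""
    image_width = width - 2 * margin
    return [(margin, top + i * step, image_width) for i in range(count)]


for x, y, w in image_positions(3):
    print(x, y, w)
# 15 25 180
# 15 110 180
# 15 195 180
```

Inside `page_body()`, each tuple could then be passed along as `self.image(path, x, y, w)`, which would let you change the margins or spacing in one place.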
Now it’s time to instantiate it and to append pages from the 2-dimensional content
matrix:
pdf = PDF()

for elem in plots_per_page:
    pdf.print_page(elem)

pdf.output('SalesReport.pdf', 'F')
The above cell will take some time to execute, and will return an empty string when
done. That’s expected, as your report is saved to the folder where the Notebook is
stored.
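If you’d like to confirm that a report file was actually written, a quick check is enough: every PDF file starts with the bytes `%PDF-`. Here is a minimal sketch, where `looks_like_pdf` is a hypothetical helper and the demo uses a stand-in file rather than the real report:

```python
import os
import tempfile


def looks_like_pdf(path: str) -> bool:
    """Return True if `path` exists and begins with the PDF magic bytes."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"


# Demo with a stand-in file; in the Notebook, pass the report's file name
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.pdf")
    with open(sample, "wb") as handle:
        handle.write(b"%PDF-1.3\n")
    print(looks_like_pdf(sample))                            # True
    print(looks_like_pdf(os.path.join(tmp, "missing.pdf")))  # False
```

In the Notebook, call it with the same file name you passed to `pdf.output()`.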
Of course, yours will look different due to the different logo and due to sales data being
completely random.
And that’s how you create data-visualization-powered PDF reports with Python. Let’s
wrap things up next.
Conclusion
You’ve learned many things today – how to create dummy data for any occasion, how to
visualize it, and how to embed visualizations into a single PDF report. Embedding your
visualizations will require minimal code changes – mostly for positioning and margins.
Let me know if you’d like to see a guide for automated report creation based on
machine learning model interpretations (SHAP or LIME) or something else related to
data science.
Connect on LinkedIn.
Learn more
Top 5 Books to Learn Data Science in 2021
SHAP: How to Interpret Machine Learning Models With Python
Top 3 Classification Machine Learning Metrics – Ditch Accuracy Once and For All
ROC and AUC – How to Evaluate Machine Learning Models
Precision-Recall Curves: How to Easily Evaluate Machine Learning Models
The post How to Create PDF Reports with Python – The Essential Guide appeared first on
Better Data Science.