PYTHON-BLOGGERS
Data science news and tutorials - contributed by Python bloggers
How to Create PDF Reports with Python – The Essential Guide
Posted on January 18, 2021 by Dario Radečić in Data science | 0 Comments
[This article was first published on python – Better Data Science, and kindly contributed to
python-bloggers].
Reports are everywhere, so any tech professional must know how to create them. It’s a
tedious and time-consuming task, which makes it a perfect candidate for automation
with Python.
You can benefit from automated report generation whether you’re a data scientist or
a software developer. For example, data scientists might use reports to show
performance or explanations of machine learning models.
This article will teach you how to make data-visualization-based reports and save them
as PDFs. To be more precise, you’ll learn how to combine multiple data visualizations
(dummy sales data) into a single PDF file.
And the best thing is – it’s easier than you think!
The article is structured as follows:
Data generation
Data visualization
Create a PDF page structure
Create PDF reports
Conclusion
You can download the Notebook with the source code here.
Data generation
You can’t have reports without data. That’s why you’ll have to generate some first – more
on that in a bit.
Let’s start with the imports. You’ll need a bunch of things – but the FPDF library is likely
the only unknown. Put simply, it’s used to create PDFs, and you’ll work with it a bit later.
Refer to the following snippet for the imports:
import os
import shutil
import numpy as np
import pandas as pd
import calendar
from datetime import datetime
from fpdf import FPDF

import matplotlib.pyplot as plt
from matplotlib import rcParams
# Hide the top and right borders on every chart
rcParams['axes.spines.top'] = False
rcParams['axes.spines.right'] = False
Let’s generate some fake data next. The idea is to declare a function that returns a data
frame of dummy sales data for a given month. It does that by constructing a date range
for the entire month and then assigning the sales amount as a random integer within a
given range.
You can use the calendar library to get the last day for any year/month combination.
Here’s the entire code snippet:
def generate_sales_data(month: int) -> pd.DataFrame:
    # Date range from the first day of the month until the last
    # calendar.monthrange(year, month) returns (weekday of day 1, days in month),
    # so index [1] is the last day of the month
    dates = pd.date_range(
        start=datetime(year=2020, month=month, day=1),
        end=datetime(year=2020, month=month, day=calendar.monthrange(2020, month)[1])
    )

    # Sales numbers as random integers from 1000 to 1999 (`high` is exclusive)
    sales = np.random.randint(low=1000, high=2000, size=len(dates))

    # Combine into a single dataframe
    return pd.DataFrame({
        'Date': dates,
        'ItemsSold': sales
    })

# Test
generate_sales_data(month=3)
A call to generate_sales_data(month=3) generated 31 data points for March of 2020.
Here’s what the first couple of rows look like:
Image 1 – Sample of generated data (image by author)
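To see what the last-day lookup inside generate_sales_data() does on its own, here is a minimal sketch that uses only the standard library. The helper name last_day_of_month is an assumption made for illustration and is not part of the original code.

```python
import calendar


def last_day_of_month(year: int, month: int) -> int:
    """Return the number of days in the given month (hypothetical helper)."""
    # monthrange returns a tuple: (weekday of the first day, days in the month)
    _, n_days = calendar.monthrange(year, month)
    return n_days


print(last_day_of_month(2020, 2))  # 29, because 2020 is a leap year
print(last_day_of_month(2020, 3))  # 31
```

The second value matches the 31 data points generated for March of 2020.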
And that’s it – you now have a function that generates dummy sales data. Let’s see how
to visualize it next.
Data visualization
Your next task is to create a function that visualizes the earlier created dataset as a line
plot. It’s the most appropriate visualization type, as you’re dealing with time series data.
Here’s the function for data visualization and an example call:
def plot(data: pd.DataFrame, filename: str) -> None:
    plt.figure(figsize=(12, 4))
    plt.grid(color='#F2F2F2', alpha=1, zorder=0)
    # The zorder value is cut off in the source; 5 is an assumed value.
    # Anything above the grid's zorder of 0 draws the line on top of the grid.
    plt.plot(data['Date'], data['ItemsSold'], color='#087E8B', lw=3, zorder=5)
    plt.title(f'Sales 2020/{data["Date"].dt.month[0]}', fontsize=17)
    plt.xlabel('Period', fontsize=13)
    plt.xticks(fontsize=9)
    plt.ylabel('Number of items sold', fontsize=13)
    plt.yticks(fontsize=9)
    # Save to disk instead of showing the figure, then free the figure
    plt.savefig(filename, dpi=300, bbox_inches='tight', pad_inches=0)
    plt.close()

# Test
december = generate_sales_data(month=12)
plot(data=december, filename='december.png')
In a nutshell – you’re creating a data visualization, setting the title, and adjusting font
sizes – nothing special. The visualization isn’t shown to the user but is instead saved to
disk as an image file, which is what lets you embed it in a PDF later.
An example call will save a data visualization for December of 2020. Here’s what it
looks like:
Image 2 – Sales for December/2020 plot (image by author)
And that’s your visualization function. There’s only one step remaining before you can
create PDF documents, and that is to save all the visualizations and define the report
page structure.
Create a PDF page structure
The task now is to create a function that does the following:
Creates a folder for charts – deletes it if it exists and re-creates it
Saves a data visualization for every month in 2020 except for January – so you can
  see how to work with a different number of elements per page (feel free to include
  January too)
Creates a PDF matrix from the visualizations – a 2-dimensional matrix where a row
  represents a single page in the PDF report
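The first step in the list, deleting and re-creating a folder, can be tried in isolation. The sketch below is one possible way to do it, assuming a hypothetical helper named reset_dir, and it works inside a temporary directory so it doesn't touch a real plots folder.

```python
import os
import shutil
import tempfile


def reset_dir(path: str) -> None:
    """Delete `path` if it exists, then create it empty (hypothetical helper)."""
    # ignore_errors=True turns rmtree into a no-op when the folder is missing
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)


base = tempfile.mkdtemp()             # throwaway parent directory
target = os.path.join(base, 'plots')

reset_dir(target)                     # folder did not exist yet
open(os.path.join(target, 'stale.png'), 'w').close()
reset_dir(target)                     # folder existed and held a stale file

result = os.listdir(target)
print(result)                         # [] since the stale file is gone

shutil.rmtree(base)                   # clean up the temporary directory
```

Note that ignore_errors=True also hides other deletion failures, such as permission problems. In that case os.makedirs() raises an error because the folder is still there.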
Here’s the code snippet for the function:
PLOT_DIR = 'plots'

def construct():
    # Delete the folder if it exists and create it again
    try:
        shutil.rmtree(PLOT_DIR)
        os.mkdir(PLOT_DIR)
    except FileNotFoundError:
        os.mkdir(PLOT_DIR)

    # Iterate over all months in 2020 except January
    for i in range(2, 13):
        # Save visualization
        plot(data=generate_sales_data(month=i), filename=f'{PLOT_DIR}/{i}.png')

    # Construct data shown in document
    counter = 0
    pages_data = []
    temp = []
    # Get all plots
    files = os.listdir(PLOT_DIR)
    # Sort them by month - a bit tricky because the file names are strings
    files = sorted(os.listdir(PLOT_DIR), key=lambda x: int(x.split('.')[0]))
    # Iterate over all created visualizations
    for fname in files:
        # We want 3 per page
        if counter == 3:
            pages_data.append(temp)
            temp = []
            counter = 0

        temp.append(f'{PLOT_DIR}/{fname}')
        counter += 1

    # Append the last (possibly shorter) page
    return [*pages_data, temp]
It’s possibly a lot to digest, so go over it line by line. The comments should help. The idea
behind sorting is to obtain the month integer representation from the string – e.g., 3
from “3.png” and use this value to sort the charts. Delete this line if the order doesn’t
matter, but that’s not the case with months.
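To make the sorting point concrete, here is a small sketch that compares plain string sorting with sorting by the month number. The file list is a made-up example, and month_number is a hypothetical name for the same logic as the lambda in construct().

```python
# Hypothetical directory listing, in arbitrary order
files = ['10.png', '2.png', '11.png', '3.png']


def month_number(fname: str) -> int:
    # '3.png' -> '3' -> 3
    return int(fname.split('.')[0])


# Plain string sorting compares character by character, so '10' sorts before '2'
print(sorted(files))                    # ['10.png', '11.png', '2.png', '3.png']

# Sorting by the integer month gives calendar order
print(sorted(files, key=month_number))  # ['2.png', '3.png', '10.png', '11.png']
```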
Here’s an example call of the construct() function:
plots_per_page = construct()
plots_per_page
You should see the following in your Notebook after running the above snippet:
Image 3 – PDF report content matrix (image by author)
In case you’re wondering – here’s how the plots/ folder looks on my machine (after
calling the construct() function):
Image 4 – Generated visualizations (image by author)
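The grouping that construct() performs with a counter can also be written with list slicing. This is an alternative sketch, not the code used in the article, and the chunk function name is an assumption.

```python
from typing import List


def chunk(items: List[str], size: int = 3) -> List[List[str]]:
    """Split `items` into consecutive groups of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Same file names construct() produces for February through December
paths = [f'plots/{month}.png' for month in range(2, 13)]
pages = chunk(paths)

print(len(pages))   # 4
print(pages[0])     # ['plots/2.png', 'plots/3.png', 'plots/4.png']
print(pages[-1])    # ['plots/11.png', 'plots/12.png']
```

Eleven charts give three full pages and one page with two charts. One difference from construct(): for an empty input this version returns an empty list, while the counter-based loop returns a list holding one empty page.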
And that’s all you need to construct PDF reports – you’ll learn how to do that next.
Create PDF reports
This is where everything comes together. You’ll now create a custom PDF class that
inherits from FPDF. This way, all of its properties and methods are available in your class,
as long as you call super().__init__() in the constructor. The constructor will
also hold values for page width and height (A4 paper, 210 × 297 mm).
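Why the super().__init__() call matters is easier to see with a toy example. The sketch below uses a stand-in Base class rather than FPDF itself, so every class name in it is hypothetical.

```python
class Base:
    """Stand-in for a library class such as FPDF (for illustration only)."""

    def __init__(self):
        self.pages = []           # state that the parent's methods rely on

    def add_page(self):
        self.pages.append('blank')


class WithSuper(Base):
    def __init__(self):
        super().__init__()        # runs Base.__init__, so self.pages exists
        self.WIDTH = 210
        self.HEIGHT = 297


class WithoutSuper(Base):
    def __init__(self):
        self.WIDTH = 210          # Base.__init__ never runs here


ok = WithSuper()
ok.add_page()
print(len(ok.pages))              # 1

try:
    WithoutSuper().add_page()
except AttributeError as err:
    # The parent's state was never set up, so its method fails
    print('missing parent state:', err)
```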
Your PDF class will have a couple of methods:
header() – used to define the document header. A custom logo is placed on the
  left (make sure to have one or delete this code line), and hardcoded text is
  placed on the right
footer() – used to define the document footer. It simply shows the page
  number
page_body() – used to define what the page looks like. This depends on the
  number of visualizations shown per page, so positions and margins are set
  accordingly (feel free to play around with the values)
print_page() – used to add a blank page and fill it with content
Here’s the entire code snippet for the class:
class PDF(FPDF):
    def __init__(self):
        super().__init__()
        # A4 page size in millimeters
        self.WIDTH = 210
        self.HEIGHT = 297

    def header(self):
        # Custom logo and positioning
        # Create an `assets` folder and put any wide and short image inside
        # Name the image `logo.png`
        self.image('assets/logo.png', 10, 8, 33)
        self.set_font('Arial', 'B', 11)
        self.cell(self.WIDTH - 80)
        self.cell(60, 1, 'Sales report', 0, 0, 'R')
        self.ln(20)

    def footer(self):
        # Page numbers in the footer
        self.set_y(-15)
        self.set_font('Arial', 'I', 8)
        self.set_text_color(128)
        self.cell(0, 10, 'Page ' + str(self.page_no()), 0, 0, 'C')

    def page_body(self, images):
        # Determine how many plots there are per page and set positions
        # and margins accordingly
        if len(images) == 3:
            self.image(images[0], 15, 25, self.WIDTH - 30)
            self.image(images[1], 15, self.WIDTH / 2 + 5, self.WIDTH - 30)
            self.image(images[2], 15, self.WIDTH / 2 + 90, self.WIDTH - 30)
        elif len(images) == 2:
            self.image(images[0], 15, 25, self.WIDTH - 30)
            self.image(images[1], 15, self.WIDTH / 2 + 5, self.WIDTH - 30)
        else:
            self.image(images[0], 15, 25, self.WIDTH - 30)

    def print_page(self, images):
        # Generates the report
        self.add_page()
        self.page_body(images)
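The positions used in page_body() are easier to follow once the arithmetic is written out. The sketch below computes the same (x, y, width) values without touching FPDF. The function name image_positions is an assumption, and it simply caps the result at three images.

```python
WIDTH = 210  # A4 page width in millimeters, as in the PDF class


def image_positions(n_images: int):
    """Return (x, y, width) tuples matching the values used in page_body()."""
    # Vertical offsets for the first, second, and third chart on a page
    y_offsets = [25, WIDTH / 2 + 5, WIDTH / 2 + 90]
    # Every chart starts 15 mm from the left and is WIDTH - 30 mm wide,
    # which leaves a 15 mm margin on both sides
    return [(15, y, WIDTH - 30) for y in y_offsets[:n_images]]


print(image_positions(3))
# [(15, 25, 180), (15, 110.0, 180), (15, 195.0, 180)]
```

The charts start 85 mm apart vertically, so changing the figure size in plot() will likely mean adjusting these offsets as well.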
Now it’s time to instantiate it and to append pages from the 2-dimensional content
matrix:
pdf = PDF()

# Each element of the matrix holds the image paths for one page
for elem in plots_per_page:
    pdf.print_page(elem)

pdf.output('SalesReport.pdf', 'F')
The above cell will take some time to execute, and will return an empty string when
done. That’s expected, as your report is saved to the folder where the Notebook is
stored.
Here’s what the first page of the report should look like:
Image 5 – First page of the PDF report (image by author)
Of course, yours will look different due to the different logo and due to sales data being
completely random.
And that’s how you create data-visualization-powered PDF reports with Python. Let’s
wrap things up next.
Conclusion
You’ve learned many things today – how to create dummy data for any occasion, how to
visualize it, and how to embed visualizations into a single PDF report. Embedding your
visualizations will require minimal code changes – mostly for positioning and margins.
Let me know if you’d like to see a guide for automated report creation based on
machine learning model interpretations (SHAP or LIME) or something else related to
data science.
Thanks for reading.