
How To Create PDF Reports With Python - The Essential Guide - Python-Bloggers

This document summarizes how to create PDF reports with Python. It discusses generating sample sales data, visualizing the data with line plots, and saving the plots. It then describes creating a folder to store the plots and constructing a 2D matrix structure to organize the plots into pages for the PDF report. Functions are provided to generate data, create visualizations, save the plots, and define the page structure to combine the plots into a multi-page PDF document.

Uploaded by

Mario Colosso V.


How to Create PDF Reports with Python – The Essential Guide

Posted on January 18, 2021 by Dario Radečić in Data science

[This article was first published on python – Better Data Science, and kindly contributed to
python-bloggers].

Reports are everywhere, so any tech professional must know how to create them. It’s a
tedious and time-consuming task, which makes it a perfect candidate for automation
with Python.

You can benefit from automated report generation whether you’re a data scientist or
a software developer. For example, data scientists might use reports to show
performance or explanations of machine learning models.

This article will teach you how to make data-visualization-based reports and save them
as PDFs. To be more precise, you’ll learn how to combine multiple data visualizations
(dummy sales data) into a single PDF file.

And the best thing is – it’s easier than you think!

The article is structured as follows:

Data generation
Data visualization
Create a PDF page structure
Create PDF reports
Conclusion

You can download the Notebook with the source code here.

Data generation
You can’t have reports without data. That’s why you’ll have to generate some first, more
on that in a bit.

Let’s start with the imports. You’ll need a bunch of things, but the FPDF library is likely
the only unknown. Put simply, it’s used to create PDFs, and you’ll work with it a bit later.
Refer to the following snippet for the imports:

import os
import shutil
import numpy as np
import pandas as pd
import calendar
from datetime import datetime
from fpdf import FPDF

import matplotlib.pyplot as plt
from matplotlib import rcParams

# Hide the top and right borders of every plot
rcParams['axes.spines.top'] = False
rcParams['axes.spines.right'] = False

Let’s generate some fake data next. The idea is to declare a function that returns a data
frame of dummy sales data for a given month. It does that by constructing a date range
for the entire month and then assigning the sales amount as a random integer within a
given range.

You can use the calendar library to get the last day for any year/month combination.

Here’s the entire code snippet:

def generate_sales_data(month: int) -> pd.DataFrame:
    # Date range from the first day of the month until the last
    # calendar.monthrange(year, month) returns (weekday of day 1, number of days)
    # Note: the end of the `end=` line was cut off in the source; `month)[1]`
    # is reconstructed from the comment above and picks the number of days
    dates = pd.date_range(
        start=datetime(year=2020, month=month, day=1),
        end=datetime(year=2020, month=month, day=calendar.monthrange(2020, month)[1])
    )

    # Sales numbers as random integers from 1000 up to (but not including) 2000
    sales = np.random.randint(low=1000, high=2000, size=len(dates))

    # Combine into a single dataframe
    return pd.DataFrame({
        'Date': dates,
        'ItemsSold': sales
    })

# Test
generate_sales_data(month=3)

A call to generate_sales_data(month=3) generated 31 data points for March of 2020.


Here’s what the first couple of rows look like:
Image 1 – Sample of generated data (image by author)
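
As a small aside on the calendar step, the sketch below uses only the standard library to show what `calendar.monthrange` returns and how the last day of a month falls out of it. The helper name `month_dates` is hypothetical and not part of the original article.

```python
import calendar
from datetime import date


def month_dates(year: int, month: int) -> list:
    """Return every calendar date in the given month."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    _, days_in_month = calendar.monthrange(year, month)
    return [date(year, month, day) for day in range(1, days_in_month + 1)]


print(calendar.monthrange(2020, 2))  # (5, 29): February 1, 2020 was a Saturday
print(len(month_dates(2020, 3)))     # 31
```

The second element of the tuple is the one the data generation function relies on, which is why March of 2020 yields 31 rows.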

And that’s it – you now have a function that generates dummy sales data. Let’s see how
to visualize it next.

Data visualization
Your next task is to create a function that visualizes the dataset created earlier as a line
plot. It’s the most appropriate visualization type, as you’re dealing with time series data.

Here’s the function for data visualization and an example call:

def plot(data: pd.DataFrame, filename: str) -> None:
    plt.figure(figsize=(12, 4))
    plt.grid(color='#F2F2F2', alpha=1, zorder=0)
    # The end of this call was cut off in the source; zorder=5 is an assumed value
    # (anything above the grid's zorder=0 keeps the line drawn on top of the grid)
    plt.plot(data['Date'], data['ItemsSold'], color='#087E8B', lw=3, zorder=5)
    plt.title(f'Sales 2020/{data["Date"].dt.month[0]}', fontsize=17)
    plt.xlabel('Period', fontsize=13)
    plt.xticks(fontsize=9)
    plt.ylabel('Number of items sold', fontsize=13)
    plt.yticks(fontsize=9)
    # Save to disk instead of showing the figure, then free it
    plt.savefig(filename, dpi=300, bbox_inches='tight', pad_inches=0)
    plt.close()

# Test
december = generate_sales_data(month=12)
plot(data=december, filename='december.png')

In a nutshell, you’re creating a data visualization, setting the title, and playing around
with fonts. Nothing special. The visualization isn’t shown to the user but is instead saved
to the machine. You’ll see later how powerful this can be.

An example call will save a data visualization for December of 2020. Here’s what it looks
like:
Image 2 – Sales for December/2020 plot (image by author)
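
Because the plots are only ever written to disk, the same idea can be sketched with a non-interactive backend, which is useful when the report runs on a machine without a display. This is a minimal sketch and not the article's function: the helper name `save_line_plot`, the sample values, and the temporary output folder are all assumptions made for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, opens no window
import matplotlib.pyplot as plt


def save_line_plot(x, y, filename):
    """Draw a line plot on an explicit Figure/Axes pair and save it to disk."""
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(x, y, lw=3)
    ax.set_xlabel("Period")
    ax.set_ylabel("Number of items sold")
    fig.savefig(filename, dpi=100, bbox_inches="tight")
    plt.close(fig)  # release the figure so memory does not grow across many plots
    return filename


# Usage with made-up values, written to a temporary folder
out_path = os.path.join(tempfile.mkdtemp(), "example.png")
save_line_plot([1, 2, 3, 4], [1200, 1500, 1100, 1800], out_path)
```

Closing each figure after saving matters once you generate one chart per month in a loop.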

And that’s your visualization function. There’s only one step remaining before you can
create PDF documents, and that is to save all the visualizations and define the report
page structure.

Create a PDF page structure


The task now is to create a function that does the following:

Creates a folder for charts – deletes it if it exists and re-creates it
Saves a data visualization for every month in 2020 except for January – so you can
see how to work with a different number of elements per page (feel free to include
January too)
Creates a PDF matrix from the visualizations – a 2-dimensional matrix where a row
represents a single page in the PDF report
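
The first step in the list, resetting the chart folder, can be sketched on its own with the standard library. The helper name `reset_dir` and the folder name `plots_demo` are hypothetical, and the example runs in a temporary location so it touches nothing in your project.

```python
import os
import shutil
import tempfile


def reset_dir(path: str) -> None:
    """Delete `path` if it exists, then create it empty."""
    # ignore_errors=True keeps rmtree from raising when the folder is missing
    # (note that it also suppresses other deletion errors)
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)


demo_dir = os.path.join(tempfile.mkdtemp(), "plots_demo")
reset_dir(demo_dir)                                    # folder did not exist yet
open(os.path.join(demo_dir, "old.png"), "w").close()   # leave a stale file behind
reset_dir(demo_dir)                                    # stale file is removed
print(os.listdir(demo_dir))                            # []
```

Starting from an empty folder guarantees that charts from an earlier run never leak into a new report.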

Here’s the code snippet for the function:

PLOT_DIR = 'plots'

def construct():
    # Delete folder if exists and create it again
    try:
        shutil.rmtree(PLOT_DIR)
        os.mkdir(PLOT_DIR)
    except FileNotFoundError:
        os.mkdir(PLOT_DIR)

    # Iterate over all months in 2020 except January
    for i in range(2, 13):
        # Save visualization
        plot(data=generate_sales_data(month=i), filename=f'{PLOT_DIR}/{i}.png')

    # Construct data shown in document
    counter = 0
    pages_data = []
    temp = []
    # Get all plots
    files = os.listdir(PLOT_DIR)
    # Sort them by month - a bit tricky because the file names are strings
    files = sorted(files, key=lambda x: int(x.split('.')[0]))
    # Iterate over all created visualizations
    for fname in files:
        # We want 3 per page
        if counter == 3:
            pages_data.append(temp)
            temp = []
            counter = 0

        temp.append(f'{PLOT_DIR}/{fname}')
        counter += 1

    # Append the last, possibly partial, page
    return [*pages_data, temp]

It’s possibly a lot to digest, so go over it line by line. The comments should help. The idea
behind sorting is to obtain the month integer representation from the string (e.g., 3
from “3.png”) and use this value to sort the charts. Delete this line if the order doesn’t
matter, but that’s not the case with months.
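
To see why the numeric sort key matters and how the rows of the page matrix come about, here is a minimal standalone sketch. It uses a slicing-based helper, `paginate`, which is a hypothetical alternative to the counter loop and not the article's code.

```python
def paginate(items, per_page=3):
    """Split a list into consecutive chunks of at most `per_page` items."""
    return [items[i:i + per_page] for i in range(0, len(items), per_page)]


names = [f"{month}.png" for month in range(2, 13)]

# Plain string sorting puts '10.png' before '2.png'
print(sorted(names)[:3])  # ['10.png', '11.png', '12.png']

# Sorting by the integer before the extension restores calendar order
by_month = sorted(names, key=lambda name: int(name.split(".")[0]))
pages = paginate(by_month)
print(pages[0])   # ['2.png', '3.png', '4.png']
print(pages[-1])  # ['11.png', '12.png']
```

One difference from the counter-based loop: for an empty input, `paginate` returns an empty list rather than a list holding one empty page.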

Here’s an example call of the construct() function:

plots_per_page = construct()
plots_per_page

You should see the following in your Notebook after running the above snippet:

Image 3 – Generated visualizations (image by author)

In case you’re wondering – here’s how the plots/ folder looks on my machine (after
calling the construct() function):

Image 4 – PDF report content matrix (image by author)

And that’s all you need to construct PDF reports – you’ll learn how to do that next.

Create PDF reports


This is where everything comes together. You’ll now create a custom PDF class that
inherits from FPDF. This way, all properties and methods are available in your class,
as long as you don’t forget to call super().__init__() in the constructor. The constructor
will also hold values for page width and height in millimeters (A4 paper).

Your PDF class will have a couple of methods:

header() – used to define the document header. A custom logo is placed on the
left (make sure to have one or delete this code line), and a hardcoded text is
placed on the right
footer() – used to define the document footer. It will simply show the page
number
page_body() – used to define what the page looks like. This will depend on the
number of visualizations shown per page, so positions and margins are set
accordingly (feel free to play around with the values)
print_page() – used to add a blank page and fill it with content

Here’s the entire code snippet for the class:

class PDF(FPDF):
    def __init__(self):
        super().__init__()
        # A4 page size in millimeters
        self.WIDTH = 210
        self.HEIGHT = 297

    def header(self):
        # Custom logo and positioning
        # Create an `assets` folder and put any wide and short image inside
        # Name the image `logo.png`
        self.image('assets/logo.png', 10, 8, 33)
        self.set_font('Arial', 'B', 11)
        self.cell(self.WIDTH - 80)
        self.cell(60, 1, 'Sales report', 0, 0, 'R')
        self.ln(20)

    def footer(self):
        # Page numbers in the footer
        self.set_y(-15)
        self.set_font('Arial', 'I', 8)
        self.set_text_color(128)
        self.cell(0, 10, 'Page ' + str(self.page_no()), 0, 0, 'C')

    def page_body(self, images):
        # Determine how many plots there are per page and set positions
        # and margins accordingly
        if len(images) == 3:
            self.image(images[0], 15, 25, self.WIDTH - 30)
            self.image(images[1], 15, self.WIDTH / 2 + 5, self.WIDTH - 30)
            self.image(images[2], 15, self.WIDTH / 2 + 90, self.WIDTH - 30)
        elif len(images) == 2:
            self.image(images[0], 15, 25, self.WIDTH - 30)
            self.image(images[1], 15, self.WIDTH / 2 + 5, self.WIDTH - 30)
        else:
            self.image(images[0], 15, 25, self.WIDTH - 30)

    def print_page(self, images):
        # Generates the report
        self.add_page()
        self.page_body(images)
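
The three branches in page_body() place the images at y = 25, WIDTH / 2 + 5 = 110, and WIDTH / 2 + 90 = 195 millimeters, which is a fixed step of 85 between images. Under that observation, the positions could be computed instead of hardcoded. The helper name `image_positions` and its parameters are hypothetical, offered as one possible refactoring and not as the article's method.

```python
def image_positions(count, x=15, top=25, step=85):
    """Return (x, y) coordinates in millimeters for `count` stacked images."""
    return [(x, top + index * step) for index in range(count)]


print(image_positions(3))  # [(15, 25), (15, 110), (15, 195)]

# Inside the PDF class, page_body could then become (untested sketch):
#     for image, (x, y) in zip(images, image_positions(len(images))):
#         self.image(image, x, y, self.WIDTH - 30)
```

This keeps the same layout for one, two, or three images while removing the repeated branches.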

Now it’s time to instantiate it and to append pages from the 2-dimensional content
matrix:

pdf = PDF()

for elem in plots_per_page:
    pdf.print_page(elem)

# 'F' saves the document to a local file
pdf.output('SalesReport.pdf', 'F')

The above cell will take some time to execute, and will return an empty string when
done. That’s expected, as your report is saved to the folder where the Notebook is
stored.

Here’s what the first page of the report should look like:

Image 5 – First page of the PDF report (image by author)

Of course, yours will look different due to the different logo and due to sales data being
completely random.

And that’s how you create data-visualization-powered PDF reports with Python. Let’s
wrap things up next.

Conclusion
You’ve learned many things today – how to create dummy data for any occasion, how to
visualize it, and how to embed visualizations into a single PDF report. Embedding your
visualizations will require minimal code changes – mostly for positioning and margins.

Let me know if you’d like to see a guide for automated report creation based on
machine learning model interpretations (SHAP or LIME) or something else related to
data science.

Thanks for reading.
