
Updated on
March 25, 2024
Convert HTML to PDF Using Puppeteer

Converting web pages into PDFs matters for documents such as invoices, detailed reports and tables, where the
content shown on a web page has to look the same once it is turned into a PDF.
Puppeteer is well suited to this. It is a Node.js library that lets developers navigate web pages in a headless
browser and convert their HTML into PDFs. This article focuses on Puppeteer and Node.js and shows how the
combination simplifies PDF generation for invoices, reports and tables.
Jump straight to code examples:
Method 1: Generating PDF from a web page URL.
Method 2: Generating PDF from an HTML file.

Choosing Puppeteer as a PDF generation package

When it comes to converting HTML to PDF using Node.js, Puppeteer is a reasonable choice. While there are more
specialized tools out there, like the HTML to PDF plugin for Node.js, generating PDFs directly with Puppeteer keeps
the number of dependencies down.
The main reason to use Puppeteer for PDF generation is its headless browser automation. Puppeteer is built on top
of the Chrome DevTools Protocol, which lets it render and interact with web pages without a visible browser
window. That makes it well suited to tasks like converting HTML to PDF directly from a target URL. However, its
native support for manipulating HTML and PDF files is limited.

HTML to PDF conversion methods


In this section, we’ll explore methods of converting HTML to PDF using Puppeteer and Node.js. Each method offers
unique features and advantages, allowing you to choose the most suitable method based on your specific
requirements.
Method 1: Generating PDF from a web page using URL
One practical use case of Puppeteer is generating a PDF directly from a web page using its URL. This method is
particularly useful when you want to capture the content of a live webpage and save it as a PDF without first
downloading or saving its HTML yourself. Puppeteer loads and renders the page in a headless browser, then prints
the rendered result. The advantages of this method are:
Direct Capture: Produces the PDF straight from the specified URL, with no intermediate HTML file and no visible
browser window.
Time Efficiency: A single script call replaces opening the page and printing it by hand, which helps when many
pages need converting.
Ideal for Static Pages: Works best for static web pages. Pages that load content dynamically may need extra
waiting before the PDF is generated.
Here’s a simple code example:

const puppeteer = require("puppeteer");

async function downloadPdfFromUrl(url, outputPath) {
  const browser = await puppeteer.launch({ headless: "new" });
  const page = await browser.newPage();

  // Navigate to the specified URL
  await page.goto(url, { waitUntil: "networkidle0" });

  // Generate PDF from the page content
  await page.pdf({ path: outputPath, format: "A4" });

  // Close the browser
  await browser.close();
}

const targetUrl = "https://www.webshare.io/blog/what-are-datacenter-proxies";
const outputFile = "downloaded_page.pdf";

downloadPdfFromUrl(targetUrl, outputFile)
  .then(() => console.log(`PDF downloaded successfully at: ${outputFile}`))
  .catch((error) => console.error("Error:", error));

Here's what the code does:


Navigate to URL: The page.goto() function is used to navigate to the specified URL. The { waitUntil:
'networkidle0' } option ensures the page is considered loaded when there are no network connections for at
least 500 milliseconds.
Generate PDF: The page.pdf() function is used to generate a PDF from the page content. The PDF is saved at
the specified output path outputPath in A4 format.
Close the Browser: The browser.close() function is called to close the Puppeteer browser instance.
Running the script saves the rendered page as downloaded_page.pdf in the current working directory.
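The downloadPdfFromUrl() function above leaves the browser process running if page.goto() or page.pdf() throws. A minimal sketch of a more defensive variant follows. The names pdfOptionsFor, assertHttpUrl and urlToPdf, as well as the margin values, are assumptions made for illustration and are not part of the original example. The sketch validates the URL, closes the browser in a finally block, and passes a few extra page.pdf() options.

```javascript
// Sketch only: helper names and default values are illustrative assumptions.

// Build the options object for page.pdf(); callers can override any field.
function pdfOptionsFor(outputPath, overrides = {}) {
  return {
    path: outputPath,
    format: "A4",
    printBackground: true, // include CSS background colors and images
    margin: { top: "20mm", right: "15mm", bottom: "20mm", left: "15mm" },
    ...overrides,
  };
}

// Reject anything that is not an absolute http(s) URL before launching Chrome.
function assertHttpUrl(value) {
  const parsed = new URL(value); // throws a TypeError on malformed input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.href;
}

async function urlToPdf(url, outputPath, overrides = {}) {
  const target = assertHttpUrl(url);
  // Required lazily so the helpers above work without Puppeteer installed.
  const puppeteer = require("puppeteer");
  // Same launch option as the example above.
  const browser = await puppeteer.launch({ headless: "new" });
  try {
    const page = await browser.newPage();
    await page.goto(target, { waitUntil: "networkidle0" });
    await page.pdf(pdfOptionsFor(outputPath, overrides));
  } finally {
    // Runs on success and on failure, so no browser process is left behind.
    await browser.close();
  }
}
```

A call such as urlToPdf("https://example.com", "example.pdf", { landscape: true }) would then produce a landscape A4 file; the URL and file name here are placeholders.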

A common use case for this method is generating a PDF from an invoice URL. Invoices are crucial documents in business
transactions, and converting them to PDFs is a common requirement for archival, sharing, or printing purposes.
The PDF format ensures that the invoice maintains a consistent appearance across different devices and
platforms.
Suppose you have an invoice hosted on a website like Stripe, and you want to generate a PDF from its URL using
Puppeteer. Below is a code example demonstrating this scenario:

const puppeteer = require("puppeteer");

async function generateInvoicePdfFromUrl(invoiceUrl, outputPath) {
  const browser = await puppeteer.launch({ headless: "new" });
  const page = await browser.newPage();

  // Navigate to the specified invoice URL
  await page.goto(invoiceUrl, { waitUntil: "networkidle0" });

  // Generate PDF from the invoice page content
  await page.pdf({ path: outputPath, format: "A4" });

  // Close the browser
  await browser.close();
}

const invoiceUrl = "https://b.stripecdn.com/docs-statics-srv/assets/hosted-invoice-page.46a27a6f0e9fee330cde9bdb884dce68.png";
const outputFile = "invoice.pdf";

generateInvoicePdfFromUrl(invoiceUrl, outputFile)
  .then(() =>
    console.log(`Invoice PDF generated successfully at: ${outputFile}`)
  )
  .catch((error) => console.error("Error:", error));

Running the script saves the result as invoice.pdf.

Method 2: Generating PDF from an HTML file


In this method, Puppeteer is employed to generate a PDF directly from an HTML file. This is beneficial when you
have a pre-existing HTML file that you want to convert into a PDF without navigating to a live web page. Puppeteer
can seamlessly render the HTML file and generate a PDF based on its content. The advantages of this method are:
Offline Processing: Ideal for scenarios where the HTML content is available locally, eliminating the need to
fetch content from a live URL.
Batch Processing: Suitable for batch processing multiple HTML files to generate corresponding PDFs.
Custom Styling: Allows customization of the PDF output based on the styling and structure of the provided
HTML file.
Suppose you have an HTML file named template.html with the content below.

<!DOCTYPE html>
<html>
<head>
<title>HTML content</title>
</head>
<body>
<h1>Sample</h1>
<div>
<p>
</p><ul>
<li>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Integer interdum felis nec orci mattis, ac dignissim mauris commodo.</li>
</ul>
<p></p>
<p>
</p><ul>
<li>In et augue non turpis faucibus tincidunt a et lectus.</li>
<li>Nulla congue nisi vel diam hendrerit, at pulvinar massa aliquam.</li>
</ul>
<p></p>
</div>

<h1>Ipsum Paragraphs</h1>
<div>
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas sit amet magna turpis. Donec a tellus in mi pharetra volutpat a
</p>
</div>
</body>

</html>

Here’s a code example that uses Puppeteer to convert this HTML file into a PDF:

const puppeteer = require("puppeteer");
const path = require("path");
const { pathToFileURL } = require("url");

async function downloadPdfFromHtmlFile(htmlFilePath, outputPath) {
  const browser = await puppeteer.launch({ headless: "new" });
  const page = await browser.newPage();

  // Load HTML content from the file. pathToFileURL() builds a valid file://
  // URL on every platform, including Windows paths and paths with spaces.
  const fileUrl = pathToFileURL(path.resolve(htmlFilePath)).href;
  await page.goto(fileUrl, { waitUntil: "networkidle0" });

  // Generate PDF from the page content
  await page.pdf({ path: outputPath, format: "A4" });

  // Close the browser
  await browser.close();
}

const inputHtmlFile = "template.html";
const outputFile = "downloaded_from_html.pdf";

downloadPdfFromHtmlFile(inputHtmlFile, outputFile)
  .then(() => console.log(`PDF downloaded successfully at: ${outputFile}`))
  .catch((error) => console.error("Error:", error));

Here's what the code does:

Load HTML from File: The page.goto() function is used to load HTML content from the specified local file.
Generate PDF: The page.pdf() function is used to generate a PDF from the loaded HTML content. The PDF is
saved at the specified output path in A4 format.
Close the Browser: The browser.close() function is called to close the Puppeteer browser instance.
Running the script saves the result as downloaded_from_html.pdf.
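The batch processing advantage listed earlier can be sketched as follows. This is an illustration under stated assumptions, not the article's own code: pdfPathFor and htmlFilesToPdfs are hypothetical names, and the sketch assumes each PDF should be written next to its source HTML file. It reuses a single browser for all files and builds file URLs with Node's built-in url.pathToFileURL().

```javascript
// Sketch only: function names and output locations are illustrative assumptions.
const path = require("path");
const { pathToFileURL } = require("url");

// Map an HTML file path to a PDF path in the same directory.
function pdfPathFor(htmlFilePath) {
  const { dir, name } = path.parse(htmlFilePath);
  return path.format({ dir, name, ext: ".pdf" });
}

// Convert several local HTML files, reusing one browser for the whole batch.
async function htmlFilesToPdfs(htmlFilePaths) {
  // Required lazily so pdfPathFor() works without Puppeteer installed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch({ headless: "new" });
  const written = [];
  try {
    for (const htmlFilePath of htmlFilePaths) {
      const page = await browser.newPage();
      const fileUrl = pathToFileURL(path.resolve(htmlFilePath)).href;
      await page.goto(fileUrl, { waitUntil: "networkidle0" });

      const outputPath = pdfPathFor(htmlFilePath);
      await page.pdf({ path: outputPath, format: "A4" });
      await page.close(); // free the tab before moving to the next file
      written.push(outputPath);
    }
  } finally {
    await browser.close();
  }
  return written;
}
```

Launching the browser once is the main design choice here, since starting Chrome is typically the slowest step of each conversion. The files are processed one after another to keep the sketch simple.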


A common use case for this method is generating PDF reports from HTML. If you do not have an HTML file ready,
your report data may well be in an Excel table. You can use a tool like Table Convert to generate HTML from your
Excel table. Once it is ready, replace the <html> content in the example below with your generated table HTML.
Alternatively, instead of pasting HTML directly into the code, you can set up a placeholder HTML file and
reference it, as in the previous code examples.

const puppeteer = require("puppeteer");

async function generateReportPdf() {
  const browser = await puppeteer.launch({ headless: "new" });
  const page = await browser.newPage();

  // Load HTML content for the report
  const reportHtml = `

<html>
<head>
<title>Sample Report</title>
<style>
body {
font-family: Arial, sans-serif;
margin: 20px;
}
h1 {
color: #333;
}
p{
color: #555;
margin-bottom: 10px;
}
table {
width: 100%;
border-collapse: collapse;
margin-top: 20px;
}
th, td {
border: 1px solid #ddd;
padding: 8px;
text-align: left;
}
th {
background-color: #f2f2f2;
}
</style>
</head>
<body>

<h1>Monthly Sales Report</h1>

<p>Date: January 10, 2024</p>

<table>
<thead>
<tr>
<th>Product</th>
<th>Units Sold</th>
<th>Revenue</th>
</tr>
</thead>
<tbody>
<tr>
<td>Product A</td>
<td>150</td>
<td>$5,000</td>
</tr>
<tr>
<td>Product B</td>
<td>120</td>
<td>$4,000</td>
</tr>
<tr>
<td>Product C</td>
<td>200</td>
<td>$6,500</td>
</tr>
</tbody>
</table>

<p>Total Revenue: $15,500</p>

</body>
</html>

`;
  await page.setContent(reportHtml);

  // Generate PDF for the report
  await page.pdf({ path: "report.pdf", format: "A4" });

  // Close the browser
  await browser.close();
}

// Call the function to generate a report PDF
generateReportPdf()
  .then(() => console.log("Report PDF generated successfully at: report.pdf"))
  .catch((error) => console.error("Error:", error));

Running the script saves the report as report.pdf.
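If the report rows already exist as JavaScript data, the table markup can be generated rather than pasted in by hand. The sketch below is one possible approach: escapeHtml and buildTableHtml are hypothetical helpers, and the sample rows are the figures from the report above.

```javascript
// Sketch only: escapeHtml and buildTableHtml are hypothetical helper names.

// Escape characters that have special meaning in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a <table> from column headers and an array of row arrays.
function buildTableHtml(headers, rows) {
  const headCells = headers.map((h) => `<th>${escapeHtml(h)}</th>`).join("");
  const bodyRows = rows
    .map((row) => {
      const cells = row.map((c) => `<td>${escapeHtml(c)}</td>`).join("");
      return `<tr>${cells}</tr>`;
    })
    .join("");
  return `<table><thead><tr>${headCells}</tr></thead><tbody>${bodyRows}</tbody></table>`;
}

// Sample data taken from the monthly sales report.
const tableHtml = buildTableHtml(
  ["Product", "Units Sold", "Revenue"],
  [
    ["Product A", 150, "$5,000"],
    ["Product B", 120, "$4,000"],
    ["Product C", 200, "$6,500"],
  ]
);
```

The resulting tableHtml string could be interpolated into reportHtml in place of the hand-written table before calling page.setContent(). Escaping the cell values keeps characters such as < or & in the data from breaking the markup.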

Styling tips
Let's explore styling tips and enhancements for HTML to PDF conversion using Puppeteer.
CSS styling considerations
Styling plays a crucial role in ensuring that the PDF output looks polished and meets specific design requirements.
Here are some considerations:
While Puppeteer supports inline styles, embedded <style> blocks and external stylesheets, styles kept inside the
HTML document are often more straightforward for PDF generation because the page has no extra files to load.
Consider embedding styles directly within the HTML using the <style> tag for simplicity.

<style>
body {
font-family: 'Arial', sans-serif;
}
h1 {
color: #333;
}
/* ... (additional styles) ... */
</style>
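Print output has a few concerns that screen styles do not cover. By default page.pdf() omits CSS backgrounds, so a rule such as the report's th background color only appears when printBackground is set to true. The sketch below shows one way to add print-oriented rules to an HTML string before rendering. PRINT_CSS, withPrintStyles and htmlToStyledPdf are hypothetical names, and the CSS rules are examples rather than requirements.

```javascript
// Sketch only: PRINT_CSS, withPrintStyles and htmlToStyledPdf are hypothetical names.

// Example print rules: paper size, repeated table headers, fewer awkward breaks.
const PRINT_CSS = `
  @page { size: A4; }
  thead { display: table-header-group; }
  tr { break-inside: avoid; }
  h1 { break-after: avoid; }
`;

// Insert a <style> block before </head>, or prepend it if there is no <head>.
function withPrintStyles(html, css = PRINT_CSS) {
  const styleTag = `<style>${css}</style>`;
  return html.includes("</head>")
    ? html.replace("</head>", () => `${styleTag}</head>`)
    : styleTag + html;
}

async function htmlToStyledPdf(html, outputPath) {
  // Required lazily so withPrintStyles() works without Puppeteer installed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch({ headless: "new" });
  try {
    const page = await browser.newPage();
    await page.setContent(withPrintStyles(html), { waitUntil: "networkidle0" });
    await page.pdf({
      path: outputPath,
      printBackground: true, // backgrounds are left out unless this is set
      preferCSSPageSize: true, // let the @page size rule choose the paper size
    });
  } finally {
    await browser.close();
  }
}
```

Keeping the print rules in one constant means the same rules can be applied to every document a script generates, whatever styles the source HTML already carries.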

For more advanced styling techniques, consider exploring the following resources:
CSS Tricks: CSS Tricks is a comprehensive resource with articles and guides on various CSS techniques. You
can explore their tips on responsive design, flexbox and grid layout.
MDN Web Docs - CSS: The MDN Web Docs is an excellent reference for CSS properties and values. Dive into their
CSS documentation for in-depth information.
Google Fonts: Google Fonts offers a wide selection of free and open-source fonts. Choose fonts that are not
only pleasing but also supported in PDF rendering.

Conclusion
In this article, we covered two handy methods for turning HTML into PDFs with Puppeteer and Node.js. These
methods are crucial for creating polished documents like invoices, reports and tables. Further, we delved into
styling tips that empower developers to enhance the visual appeal of their PDFs, ensuring a seamless blend of
precision and elegance in their document generation process.

Anber Arif
TECHNICAL WRITER
Anber Arif, a Software Engineer, leverages over three years of expertise in crafting clear and impactful technical content.
