0% found this document useful (0 votes)

43 views16 pages

PDFHTML Whitepaper FINAL

Uploaded by

Broli Tiako

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

43 views16 pages

PDFHTML Whitepaper FINAL

Uploaded by

Broli Tiako

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 16

pdfHTML

Convert HTML/CSS to PDF

www.itextpdf.com
pdfHTML
1
Due to the popularity of the World Wide Web, its underlying technologies have become commonplace and ubiquitous.
HTML (and CSS) in particular are widely used to present text and information in a visually pleasing manner. At the same
time, many companies rely on the PDF format for document workflows, internal documentation and archiving. The main
draws of PDF are the reliability and portability of the format, and the commitment to present the same appearance
across different devices, which is an explicit mission statement of the PDF specification.

PDF however, is not an easily editable format, and is often converted from other formats as a final step in document
creation. One such format can be HTML, pdfHTML leverages the widespread knowledge of the format and existing skills
of development resources in converting HTML to PDF. pdfHTML provides the engine that converts HTML/CSS content to
a well-formatted, well-structured PDF document, without the need to know the technical details of the PDF format.

HTML/CSS to PDF transformation

In HTML, content is wrapped in HTML tags. Each tag corresponds to a conceptual element (e.g. paragraph, table), and
many tags can be nested. These tags can be organized into a hierarchical model, such as the DOM (Document Object
Model) of the HTML document, detailing the structure and semantics of the content.

Styling and visual representation for HTML content is provided by the use of Cascading Style Sheets (CSS). CSS
declarations define styling and layout information (e.g. font, font-size, margins, borders, color, alignment, etc.) for the
various HTML tags and their content. These can be found in the HTML file itself, or as a separate style-sheet file.

A web browser parses and interprets the HTML file and accompanying CSS to create a visual representation of the
content, calculating the rendering and layout on the fly and visualizing the various contents according to their CSS
declarations and the renderers own settings.

In contrast, at its core, a PDF document is not inherently structured and semantic. Its content consists of a set of
instructions that result in painting at absolute positions on a large canvas. The concept of a line of text, for example,
does not exist at this basic level. We only infer that visually because characters appear next to each other at the same
vertical position.

PDF does offer an additional layer of functionality to store semantic and structural information, using similar concepts
as HTML: tagging pieces of content according to their roles in the document and constructing a hierarchical tree. To
support features like proper content extraction, repurposing of content, search indexing and accessibility, it’s crucial to
augment the visual-only representation of PDF documents with this additional information.

Since HTML documents inherently contain semantic and structural information, they are an excellent source to convert
to rich, smart PDF documents. This is where pdfHTML comes in.

pdfHTML
2
What is pdfHTML?

pdfHTML is an add-on module for the iText7 platform that transforms HTML and accompanying CSS into a PDF
document. Built for the creation of PDF files from HTML templates, pdfHTML allows you to automate PDF generation for
documents like internal reports, tickets, invoices, and more.

pdfHTML provides you with the means to create beautiful and functional PDFs without forcing designers to learn the
complex PDF syntax or the intricacies of the iText 7 platform. They simply need to utilize their common, well-honed
HTML and CSS skills to create the template, and pdfHTML takes care of the rest, converting to a visually equivalent PDF
document, with the possibility to preserve semantic, structural and accessibility information.

How does it work?

A technical in-depth look

On a conceptual level, pdfHTML works by mapping HTML tags to iText 7 layout objects, and CSS property declarations to
iText layout properties. On a practical level, this process happens through the use of TagWorkers and CssAppliers.

What’s a TagWorker?

A TagWorker is a class responsible for processing an HTML tag. This processing includes resolving any resources
required by the tag and its content; translating the tags’ content into an iText layout element; resolving any non-
style attribute and resolving any style attributes through a CssApplier.

What’s a CssApplier?

A CssApplier is a class responsible for the processing of CSS styles and any declarations for a HTML tag. The
implementation of this CssApplier contains all the necessary logic to resolve and apply the style declarations to the
iText layout element that is associated with the CssApplier.

pdfHTML
3
Each HTML tag is mapped to a TagWorker and CssApplier, and those classes contain the necessary logic to process
the tag, selecting the iText layout object it corresponds to and applying any necessary CSS. When processing the HTML
DOM, pdfHTML walks through the tree in a depth-first manner, starting the translation on a tag, and then recursively
processing all its children, ending the processing when all its children have been processed. The visualization of this
process can be seen in figures 1 and 2:

.html file
.html file .css file.css file Tree Node
Tree Node

Create Create
Layout Layout
Object Object

pdfHTML
pdfHTML

Preprocessing
CSS style
CSS style ProcessProcess
Preprocessing sheet parser ProcessProcess
Tag Tag Children
Children
sheet parser

hierarchical
hierarchical tree tree Css
Css style style sheet
sheet Apply CSS
Apply CSS

iText iText
layout object
layout object

hand over
hand over
TagWorkers
TagWorkers CssAppliers
CssAppliers Layout Layout
Object Object
to parent
to parent

iText iText
PdfDocument
PdfDocument
Figure 2: pdfHTML tag processing

PdfWriter
PdfWriter

.pdf file.pdf file

Figure 1: pdfHTML high level flow

pdfHTML
4
In practice

A basic example will show the use of pdfHTML. For this, we will use the following HTML and CSS.

<html>
<head>
<link rel="stylesheet" type="text/css" href="simple.css"/>
</head>
<body>
<p>iText pdfHTML</p>
<div>Converting HTML to PDF with (Cascading) Style (Sheets)!</div>
</body>
</html>

HTML code

p{
font-style:italic;
}
div{
color:red;
border-style:solid;
border-width:2pt;
border-color:blue;
}

simple.css

The output will be directly written to a PDF file, using the following code:

ConverterProperties converterProperties = new ConverterProperties();

HtmlConverter.convertToPdf(new FileInputStream(htmlSource), new FileOutputStream(pdfDest),
converterProperties);

The resulting PDF will look as follows:

Figure 3: pdfHTML output

pdfHTML
5
Let us explain what happened, in order:

1. The HTML file is transformed into a DOM representation of the content.

2. All CSS declarations are extracted from both the HTML as well as the style sheet, and are resolved.
3. The html tag is processed by the HTMLTagWorker, which creates a layout document object.
4. The head and body tags are special cases and do not have a corresponding TagWorker. They're skipped in the
walkthrough.
5. The p tag is encountered and a PTagWorker creates a Paragraph object. The text iText pdfHTML is encountered next,
and added to the paragraph created in its parent. A BlockCssApplier is created for the p tag , and applies the italic
to the text. The PTagWorker finished its work and the paragraph is added to the document.
6. The div tag is encountered and a DivTagWorker creates a Div object. The text Converting HTML to PDF
with (Cascading) Style (Sheets)! is encountered next, and added to the div created by the DivTagWorker. A
BlockCssApplier is created for the div tag, and it applies the red colour to the text, and the blue border surrounding
it to the Div layout object.
7. The DivTagWorker finishes its work and the Div is added to the document.

The HTMLTagworker finishes its work and passes the document back to the HTMLConverter, which closes it, writing the
contents to the output stream.

Some more in-depth information

Out of the box, pdfHTML offers a range of additional configuration options. No extra knowledge of PDF is required.

Configuration settings can be enabled with the optional last parameter ConverterProperties of all HTMLConverter
public methods. This parameter contains the basic configuration options that allow users to customize handling of the
input data in various ways. The following options allow the configuration of your pdfHTML code for optimal results.

baseUri
If the HTML file requires any external resources, such as a standalone CSS file or image files, pdfHTML will need to know
where these elements are located. That location may either be a URI on the local file system or online.

pdfHTML will try to use a reasonable default value in case you don't specify a baseUri. If you use a String parameter
to pass your HTML, then the default value will be the folder where the code is executed. If you use the overload of
convertToPdf with a File parameter, it will use the same location as the input file.

If neither of these defaults is applicable, you will need to define the default resource location root. Any references in the
HTML file will start from that path in order to find the resource. It is possible to use the ../ syntax to go up in the directory
tree of your file system, but for most purposes, the specified URI acts as the root folder for determining paths. Thus e.g.
<img src="static/img/logo.png"/> and <img src="/static/img/logo.png"/> will both refer to the same file,
relative to the baseUri folder you specified or to the default value.

pdfHTML
6
fontProvider

To customize the fonts for use by pdfHTML, you can define a FontProvider object that will act as a repository for those
fonts.

There is a standard implementation available in the pdfHTML project which is called DefaultFontProvider. You can
specifically determine which fonts should be supplied to the PDF rendering logic. On top of the basic behavior that you
can define in the constructor, you can also add a font that you specify as a location on the system, a byte[], or an iText
FontProgram object.

DefaultFontProvider has a constructor with three boolean arguments, which allow you to specify (in order):

• Whether or not to select the 14 standard PDF fonts

• Whether or not to select a number of Free fonts included with pdfHTML
• Whether or not to try and select all fonts installed in the system's default font folder(s)

There is also a default no-argument constructor for DefaultFontProvider. This constructor will select the 14 standards
fonts as well as the fonts shipped with iText and pdfHTML but will not register the system fonts by default.

In addition to this, fonts can be added by either supplying the path to a font file or by telling the FontProvider to
register all fonts contained in a specified folder.

ConverterProperties props = new ConverterProperties();

FontProvider dfp = new DefaultFontProvider(true, false, false);
dfp.addFont("/path/to/MyFont.ttf");
props.setFontProvider(dfp);

pdfHTML
7
mediaDeviceDescription

If the input file uses media queries, you can simply tell pdfHTML to interpret the relevant set of rules:

ConverterProperties props = new ConverterProperties();

props.setMediaDeviceDescription(new MediaDeviceDescription(MediaType.PRINT));

The registered MediaType constants are:

• MediaType.ALL
• MediaType.AURAL
• MediaType.BRAILLE
• MediaType.EMBOSSED
• MediaType.HANDHELD
• MediaType.PRINT
• MediaType.PROJECTION
• MediaType.SCREEN
• MediaType.SPEECH
• MediaType.TTY
• MediaType.TV

A use case: Accessible PDF creation

In a company, policies can impose strict guidelines concerning aspects of documents, as e.g. the accessibility of
documents.

With minimal effort, pdfHTML can be configured to generate PDFs that comply with the accessibility standards as
defined in PDF/A and PDF/UA, by leveraging the functionality provided by the iText 7 core engine regarding the setting of
both the tagging as well as adding the additional accessibility information.

public void createPdf(String src, String dest, String resources) throws IOException {
try {
FileOutputStream outputStream = new FileOutputStream(dest);
writerProperties writerProperties = new WriterProperties();
writerProperties.addXmpMetadata();

PdfWriter pdfWriter = new PdfWriter(outputStream, writerProperties);

PdfDocument pdfDoc = new PdfDocument(pdfWriter);
pdfDoc.getCatalog().setLang(new PdfString("en-US"));

//Set the pdf to be tagged

pdfDoc.setTagged();

pdfHTML
8
pdfDoc.getCatalog().setViewerPreferences(new PdfViewerPreferences().
setDisplayDocTitle(true));

//Set meta tags

pdfDocumentInfo pdfMetaData = pdfDoc.getDocumentInfo();

pdfMetaData.setAuthor("Samuel Huylebroeck");
pdfMetaData.addCreationDate();
pdfMetaData.getProducer();
pdfMetaData.setCreator("iText Software");
pdfMetaData.setKeywords("HTML, PDF");
pdfMetaData.setSubject("PDF accessibility");
//Title is derived from html

//Setup custom header role, mapping it to heading

pdfDoc.getStructTreeRoot().getRoleMap().put(new PdfName("Heading"),PdfName.H);

// pdf conversion

ConverterProperties props = new ConverterProperties();

FontProvider fp = new FontProvider();
fp.addSystemFonts();
props.setFontProvider(fp);
props.setBaseUri(resources);

//Setup custom tagworker factory for better tagging of headers

DefaultTagWorkerFactory tagWorkerFactory = new AccessibilityTagWorkerFactory();

props.setTagWorkerFactory(tagWorkerFactory);
HtmlConverter.convertToPdf(new FileInputStream(src), pdfDoc, props);
pdfDoc.close();

} catch (Exception e) {
e.printStackTrace();
}

pdfHTML
9
How to customize pdfHTML

In addition to the out-of-the-box functionality of pdfHTML, based on the capabilities of the iText7 platform, and its
configuration options, pdfHTML also enables additional customization opportunities, as discussed in this section.

There are two plugin mechanisms in pdfHTML that allow you to execute custom behavior in your HTML/CSS conversion
process. Both have a very similar working.

Much like the configuration options as specified in the associated pdfHTML white paper, registering your plugin code
with pdfHTML happens through the ConverterProperties:

ConverterProperties props = new ConverterProperties();

props.setTagWorkerFactory(new MyTagWorkerFactory()); // Custom HTML parsing
props.setCssApplierFactory(new MyCssApplierFactory()); // Custom CSS parsing

TagWorkerFactory

To define custom rules for existing HTML tags, you can create a ITagWorker implementation that will execute logic
defined by you. The most common use cases are handling a tag in a nonstandard way or as a no-op, but you can
also implement a custom tag for your specific purposes. After implementing this interface or extending an existing
implementation, you still need to register it with pdfHTML so that it knows what to call.

This can be achieved by extending DefaultTagWorkerFactory and overriding the following method:

public class MyTagWorkerFactory extends DefaultTagWorkerFactory {

public ITagWorker getCustomTagWorker(IElementNode tag, ProcessorContext context) {
if (tag.name().equalsIgnoreCase("custom")) {
return new CustomTagWorker(); // implements ITagWorker
}
if (tag.name().equalsIgnoreCase("p")) {
return new CustomParagraphTagWorker(); // extends ParagraphTagWorker
}
// default return value should be either null
// so the default implementations can be called ...
return null;
// ... or you can directly call the superclass method
// for the exact same effect
return super.getTagWorker(tag, context);
}
}

pdfHTML
10
One particular use case might be to add dynamic content to your PDF, such as barcodes. In that case, you can define
<qr>http://www.example.com</qr> in the source HTML, rather than having to generate an image separately. Your
custom TagWorker then leverages the iText APIs to create the QR code, and adds it to the document.

The input:

<!DOCTYPE html>

<html lang="en">
<head>
<meta charset="UTF-8">
<title>QRCode Example</title>
<link rel="stylesheet" type="text/css" href="qrcode.css"/>
</head>
<body>
<p>QR Code below,Q </p>
<qr charset="Cp437" errorcorrection="Q">
With great power comes great current squared times resistance
</qr>
<p>QR Code below, L</p>
<qr charset="Cp437" errorcorrection="L">
With great power comes great current squared times resistance
</qr>
</body>
</html>

qr{
border:solid 1px red;
height:200px;
width:200px;

pdfHTML
11
The output without a custom TagWorker:

Figure 4: Output with no custom TagWorker

The output with a custom TagWorker:

Figure 5: Output with Custom Tagworker

import com.itextpdf.barcodes.BarcodeQRCode;
import com.itextpdf.barcodes.qrcode.EncodeHintType;
import com.itextpdf.barcodes.qrcode.ErrorCorrectionLevel;
import com.itextpdf.html2pdf.attach.ITagWorker;
import com.itextpdf.html2pdf.attach.ProcessorContext;
import com.itextpdf.html2pdf.html.node.IElementNode;
import com.itextpdf.layout.IPropertyContainer;
import com.itextpdf.layout.element.Image;

import java.util.HashMap;
import java.util.Map;

public class QRCodeTagWorker implements ITagWorker {

private static String[] allowedErrorCorrection = {"L","M","Q","H"};
private static String[] allowedCharset = {"Cp437","Shift_JIS","ISO-8859-
1","ISO-8859-16"};
private BarcodeQRCode qrCode;
private Image qrCodeAsImage;

public QRCodeTagWorker(IElementNode element, ProcessorContext context){

//Retrieve all necessary properties to create the barcode

pdfHTML
12
Map<EncodeHintType, Object> hints = new HashMap<>();
//Character set
String charset = element.getAttribute("charset");
if(checkCharacterSet(charset )){
hints.put(EncodeHintType.CHARACTER_SET, charset);
}
//Error-correction level
String errorCorrection = element.getAttribute("errorcorrection");
if(checkErrorCorrectionAllowed(errorCorrection)){
ErrorCorrectionLevel errorCorrectionLevel =
getErrorCorrectionLevel(errorCorrection);
hints.put(EncodeHintType.ERROR_CORRECTION, errorCorrectionLevel);
}
//Create the QR-code
qrCode = new BarcodeQRCode("placeholder",hints);

@Override
public void processEnd(IElementNode element, ProcessorContext context) {
//Transform barcode into image
qrCodeAsImage = new Image(qrCode.createFormXObject(context.getPdfDocument()));
}
@Override
public boolean processContent(String content, ProcessorContext context) {
//Add content to the barcode
qrCode.setCode(content);
return true;
}
@Override
public IPropertyContainer getElementResult() {
return qrCodeAsImage;
}
@Override
public boolean processTagChild(ITagWorker childTagWorker, ProcessorContext context) {
return false;
}

private static boolean checkErrorCorrectionAllowed(String toCheck){

for(int i = 0; i<allowedErrorCorrection.length;i++){
if(toCheck.toUpperCase().equals(allowedErrorCorrection[i])){
return true;
}
}
return false;

pdfHTML
13
}
private static boolean checkCharacterSet(String toCheck){
for(int i = 0; i<allowedCharset.length;i++){
if(toCheck.equals(allowedCharset[i])){
return true;
}
}
return false;
}
private static ErrorCorrectionLevel getErrorCorrectionLevel(String level){
switch(level) {
case "L":
return ErrorCorrectionLevel.L;
case "M":
return ErrorCorrectionLevel.M;
case "Q":
return ErrorCorrectionLevel.Q;
case "H":
return ErrorCorrectionLevel.H;
}
return null;
}
}

The custom tagWorkerFactory

import com.itextpdf.html2pdf.attach.ITagWorker;
import com.itextpdf.html2pdf.attach.ProcessorContext;
import com.itextpdf.html2pdf.attach.impl.DefaultTagWorkerFactory;
import com.itextpdf.html2pdf.html.node.IElementNode;

public class QRCodeTagWorkerFactory extends DefaultTagWorkerFactory {

@Override
public ITagWorker getCustomTagWorker(IElementNode tag, ProcessorContext context) {
if(tag.name().equals("qr")){
return new QRCodeTagWorker(tag, context);
}
return super.getTagWorker(tag, context);
}
}

pdfHTML
14
cssApplierFactory

By default, pdfHTML will only execute CSS logic on standard HTML tags. If you have defined a custom HTML tag
and you want to apply CSS to it, then you will also have to write an ICssApplier and register it by extending
DefaultTagWorkerFactory.

Similarly, you may want to change the way a standard HTML tag reacts to a CSS property. In that case, you can extend
the ICssApplier for that tag and write custom logic.

Integration with existing iText add-ons

pdfHTML can integrate with existing iText add-ons. In particular, it integrates seamlessly with pdfCalligraph - iText's
advanced typography module. This add-on enables the correct representation of text in complex alphabets, such
as Arabic and Hindi. Leveraging this module in an application requires absolutely no extra code: if the pdfCalligraph
module is available on the classpath when the PDF instructions are written, its code will be used to create correctly
expressed text in the relevant writing system.

This also holds true for pdfHTML: all you need is the JAR and a license file for pdfCalligraph, and your HTML parsing
code will work out of the box.

Comparison to XMLWorker

iText’s previous HTML/CSS conversion tool is XML Worker in combination with iText 5. It has been replaced by pdfHTML
in combination with iText 7. A quick comparison.

pdfHTML
Specifically developed for use in combination with the re-architectured and re-written iText 7 PDF platform, the pdfHTML
add-on makes full use of the enhanced and new capabilities of iText 7. pdfHTML supports more HTML tags and CSS
features such as floating and fixed positioning, and @media rules and queries. It’s also easier to extend for custom tags.
It integrates seamlessly with other iText functionalities such as barcodes, PDF/A and PDF/UA output, and advanced
typography features of pdfCalligraph. pdfHTML has more robust handling of imperfect or invalid HTML input.

XML Worker
XML Worker was designed and developed based on the previous version of iText – iText 5. As such, it is limited by the
constraints of the iText 5 PDF engine – challenged by the growing needs of the users. XML Worker was fundamentally
designed to be top-to-bottom, and text line based, which has its limitations when dealing with the block level concepts
and the flexible layout options of HTML. It also lacks generic handling of CSS styles, support for media queries (e.g.
screen vs print), and a convenient extension mechanism (custom HTML elements, custom CSS handling)

pdfHTML
15
Integration in today’s document workflow

Today, it is in the interest of HTML/CSS experts to be able to inject their documents and files in the wider corporate
document workflow, which is often PDF based. It should be quite clear this is possible without the HTML/CSS experts
acquiring an in-depth or advanced knowledge of the ins and outs of PDF syntax and quirks. With the iText 7 PDF
platform, assisted by the pdfHTML add-on, this insertion of HTML/CSS documents into the corporate document workflow
can be a ‘fire and forget’ automated process. It allows for HTML/CSS integration in the world of applications that are
document and workflow oriented, supporting mission critical company applications.

About iText

iText is the world’s foremost platform to create PDF files and integrate them in corporate applications. Originally
developed by Bruno Lowagie, iText is available as an open source product with community support, as well as a
commercial product, worldwide supported by the iText Group NV (with offices in Europe, North America and Asia).
In 2016, the popular iText 5 version was succeeded by the fully re-written and re-architectured iText 7 platform, with
advanced add-on options. iText represents a unique capability of facilitating the use of dynamic and complex PDF
documents in corporate document workflows, throughout the whole lifecycle of the documents.

pdfHTML
16

2015 - AutoCAD Tutorial Architecture Imperial Version
67% (6)
2015 - AutoCAD Tutorial Architecture Imperial Version
44 pages
Activity Sheets in Econ 101
100% (4)
Activity Sheets in Econ 101
14 pages
Webtopdf Com HTML To PDF
No ratings yet
Webtopdf Com HTML To PDF
11 pages
The Odin Project PDF - Javascript Path
No ratings yet
The Odin Project PDF - Javascript Path
507 pages
Tecnicas de Estudio
No ratings yet
Tecnicas de Estudio
367 pages
Block Printing
100% (1)
Block Printing
30 pages
James Learning Javascript Programming
From Everand
James Learning Javascript Programming
James Lombard
No ratings yet
WWW Webtoons Com en Action Omniscient Reader Episode 119 Vie
No ratings yet
WWW Webtoons Com en Action Omniscient Reader Episode 119 Vie
94 pages
Character When Relevant
No ratings yet
Character When Relevant
4 pages
Deriving HTML From PDF
No ratings yet
Deriving HTML From PDF
52 pages
HTML Cheat Sheet of 2019 - With All New HTML5 Tags
No ratings yet
HTML Cheat Sheet of 2019 - With All New HTML5 Tags
26 pages
PDF Converter Free Complete, ILovepdf3
No ratings yet
PDF Converter Free Complete, ILovepdf3
27 pages
Cause Effect Eng.5lpppp
100% (2)
Cause Effect Eng.5lpppp
3 pages
Prince - Sample Documents
No ratings yet
Prince - Sample Documents
9 pages
Webtopdf-Com
No ratings yet
Webtopdf-Com
11 pages
Webtopdf-Com
No ratings yet
Webtopdf-Com
10 pages
WWW Sejda
No ratings yet
WWW Sejda
2 pages
SVC200203 - ST Vincent's Clinic A4 Booklet Web Final
No ratings yet
SVC200203 - ST Vincent's Clinic A4 Booklet Web Final
84 pages
5+ Ways To Save A Webpage As A PDF (Iphone, Android, Chrome, and More) - Inkit
No ratings yet
5+ Ways To Save A Webpage As A PDF (Iphone, Android, Chrome, and More) - Inkit
10 pages
PDF Generator Ne3t.aspx
No ratings yet
PDF Generator Ne3t.aspx
11 pages
639321
No ratings yet
639321
16 pages
Create PDF in Your Applications With The Pdfcrowd
No ratings yet
Create PDF in Your Applications With The Pdfcrowd
47 pages
How To Swim
No ratings yet
How To Swim
7 pages
Care Management of Small Ruminant
No ratings yet
Care Management of Small Ruminant
29 pages
Obasi
No ratings yet
Obasi
8 pages
Convert From HTML To PDF Online: Options
No ratings yet
Convert From HTML To PDF Online: Options
2 pages
How To Save A Website As A PDF - IONOS
No ratings yet
How To Save A Website As A PDF - IONOS
8 pages
Lesson2 HTM
No ratings yet
Lesson2 HTM
6 pages
HTML To PDF API - PDFCrowd
No ratings yet
HTML To PDF API - PDFCrowd
4 pages
CS Xii PB1 QP 2024-25 (Set-1)
No ratings yet
CS Xii PB1 QP 2024-25 (Set-1)
10 pages
Adiciones Al Libro de Las Putas de Juan Cú. Xilote Negro Editor
No ratings yet
Adiciones Al Libro de Las Putas de Juan Cú. Xilote Negro Editor
14 pages
Lectura de Auras 1
No ratings yet
Lectura de Auras 1
35 pages
WWW Ilovepdf
No ratings yet
WWW Ilovepdf
3 pages
My File Name
No ratings yet
My File Name
9 pages
Webtopdf-Com
No ratings yet
Webtopdf-Com
7 pages
Lectura de Auras 2
No ratings yet
Lectura de Auras 2
24 pages
MPDF
No ratings yet
MPDF
12 pages
Pdfaid Com en Document Create
No ratings yet
Pdfaid Com en Document Create
5 pages
VW 1320 - Ps A4 - Po Portrait - SP - F PDF
No ratings yet
VW 1320 - Ps A4 - Po Portrait - SP - F PDF
3 pages
Getting Started
No ratings yet
Getting Started
2 pages
HTML To PDF - CloudConvert
No ratings yet
HTML To PDF - CloudConvert
2 pages
Convert Webpages To PDF - Quick, Online, Free - PDF24
No ratings yet
Convert Webpages To PDF - Quick, Online, Free - PDF24
5 pages
Convert From HTML To PDF Online: Options
No ratings yet
Convert From HTML To PDF Online: Options
2 pages
Test 1
No ratings yet
Test 1
2 pages
PDF 24
No ratings yet
PDF 24
5 pages
Convert From HTML To PDF Online: Options
No ratings yet
Convert From HTML To PDF Online: Options
2 pages
HTML Code
No ratings yet
HTML Code
1 page
How To Convert A HTML File To PDF (With Colors) - Ask Ubuntu
No ratings yet
How To Convert A HTML File To PDF (With Colors) - Ask Ubuntu
5 pages
What Is The Best Way To Generate A PDF File From HTML and CSS - Quora
No ratings yet
What Is The Best Way To Generate A PDF File From HTML and CSS - Quora
5 pages
Convertir Pages Web Ou Fichiers HTML Vers Fichiers PDF
No ratings yet
Convertir Pages Web Ou Fichiers HTML Vers Fichiers PDF
5 pages
PDF 24
No ratings yet
PDF 24
5 pages
PDF xhtml2pdf
No ratings yet
PDF xhtml2pdf
4 pages
Accounting Treatment of Spare Parts
No ratings yet
Accounting Treatment of Spare Parts
4 pages
Crystal Academy: Ecosystem
No ratings yet
Crystal Academy: Ecosystem
3 pages
Convert Web Page To PDF For Free Online - #1 Web To PDF ... : About 5,050,000,000 Results (0.51 Seconds)
No ratings yet
Convert Web Page To PDF For Free Online - #1 Web To PDF ... : About 5,050,000,000 Results (0.51 Seconds)
3 pages
Convert HTML To PDF in PHP
No ratings yet
Convert HTML To PDF in PHP
4 pages
HTML To PDF - Brave Search (7 - 31 - 2024 2 - 37 - 39 PM)
No ratings yet
HTML To PDF - Brave Search (7 - 31 - 2024 2 - 37 - 39 PM)
2 pages
Sample Documents: Dictionary
No ratings yet
Sample Documents: Dictionary
1 page
Dompdf Github Guia php5
No ratings yet
Dompdf Github Guia php5
4 pages
Green Economy Presentation
No ratings yet
Green Economy Presentation
17 pages
Republic of The Philippines City of Taguig Taguig City University Gen. Santos Avenue, Central Bicutan, Taguig City
No ratings yet
Republic of The Philippines City of Taguig Taguig City University Gen. Santos Avenue, Central Bicutan, Taguig City
7 pages
Notes and Questions: Aqa Gcse
No ratings yet
Notes and Questions: Aqa Gcse
18 pages
Document
No ratings yet
Document
2 pages
Add HTML To PDF Elements To PDF PDF
No ratings yet
Add HTML To PDF Elements To PDF PDF
2 pages
EVO PDF Tools Demo
No ratings yet
EVO PDF Tools Demo
2 pages
EVO PDF Tools Demo
No ratings yet
EVO PDF Tools Demo
2 pages
EvoHtmlToPdf PDF
No ratings yet
EvoHtmlToPdf PDF
2 pages
Evo HTML To PDF
No ratings yet
Evo HTML To PDF
2 pages
Convert From HTML To PDF Online: Options
No ratings yet
Convert From HTML To PDF Online: Options
2 pages
Bookshop Business Quick Guide: by Crack A Business Kenya
No ratings yet
Bookshop Business Quick Guide: by Crack A Business Kenya
19 pages
Css2pdf@cloudformatter Why Did We Develop This?
No ratings yet
Css2pdf@cloudformatter Why Did We Develop This?
2 pages
Hiqpdf Software: HTML To PDF Converter
No ratings yet
Hiqpdf Software: HTML To PDF Converter
1 page
Guitar Rig 4 Getting Started English
No ratings yet
Guitar Rig 4 Getting Started English
29 pages
Expertpdf Htmltopdf Converter v5 3
No ratings yet
Expertpdf Htmltopdf Converter v5 3
1 page
BoQ For Electrical, Voice Data
No ratings yet
BoQ For Electrical, Voice Data
3 pages
Master Cascading Style Sheets (CSS) Quickly: CSS Properties, CSS Property Definitions, Inline CSS, Internal CSS, External CSS and Sample Codes
From Everand
Master Cascading Style Sheets (CSS) Quickly: CSS Properties, CSS Property Definitions, Inline CSS, Internal CSS, External CSS and Sample Codes
Steven Bright
No ratings yet
Functions of Housekeeping Department and Inter Relationship With Other Departments
No ratings yet
Functions of Housekeeping Department and Inter Relationship With Other Departments
12 pages
GOD's AWESOME ANIMALS: MY BLOG
No ratings yet
GOD's AWESOME ANIMALS: MY BLOG
472 pages
Problems CHAPTER 17
100% (2)
Problems CHAPTER 17
4 pages
17 Wemew
100% (1)
17 Wemew
31 pages
Wika 0900766b813ecd99
No ratings yet
Wika 0900766b813ecd99
2 pages
Pathfit 103 Learning Materials
No ratings yet
Pathfit 103 Learning Materials
47 pages
De-140613-160123-Fortnight Attendence and Mid Marks Uploading Instructions To Principals
No ratings yet
De-140613-160123-Fortnight Attendence and Mid Marks Uploading Instructions To Principals
2 pages
Hemanta Biswal
No ratings yet
Hemanta Biswal
3 pages
Overcoming Obstacles
No ratings yet
Overcoming Obstacles
1 page
Ethical Issues Powerpoint
No ratings yet
Ethical Issues Powerpoint
12 pages
Hazard Identification and Risk Assessment Form
No ratings yet
Hazard Identification and Risk Assessment Form
4 pages
Application For Death Certificate: Department of Home Affairs
No ratings yet
Application For Death Certificate: Department of Home Affairs
1 page
Hoffmann, Goethe, and Miyazaki's Spirited Away
No ratings yet
Hoffmann, Goethe, and Miyazaki's Spirited Away
4 pages
Autumn and Summer in A Frozen Fire
No ratings yet
Autumn and Summer in A Frozen Fire
2 pages

PDFHTML Whitepaper FINAL

Uploaded by

PDFHTML Whitepaper FINAL

Uploaded by

pdfHTML

Convert HTML/CSS to PDF

HTML/CSS to PDF transformation

How does it work?

A technical in-depth look

.pdf file.pdf file

Figure 1: pdfHTML high level flow

ConverterProperties converterProperties = new ConverterProperties();

The resulting PDF will look as follows:

Figure 3: pdfHTML output

1. The HTML file is transformed into a DOM representation of the content.

Some more in-depth information

• Whether or not to select the 14 standard PDF fonts

ConverterProperties props = new ConverterProperties();

ConverterProperties props = new ConverterProperties();

The registered MediaType constants are:

A use case: Accessible PDF creation

PdfWriter pdfWriter = new PdfWriter(outputStream, writerProperties);

//Set the pdf to be tagged

//Set meta tags

pdfDocumentInfo pdfMetaData = pdfDoc.getDocumentInfo();

//Setup custom header role, mapping it to heading

ConverterProperties props = new ConverterProperties();

//Setup custom tagworker factory for better tagging of headers

DefaultTagWorkerFactory tagWorkerFactory = new AccessibilityTagWorkerFactory();

ConverterProperties props = new ConverterProperties();

public class MyTagWorkerFactory extends DefaultTagWorkerFactory {

Figure 4: Output with no custom TagWorker

The output with a custom TagWorker:

Figure 5: Output with Custom Tagworker

public class QRCodeTagWorker implements ITagWorker {

public QRCodeTagWorker(IElementNode element, ProcessorContext context){

private static boolean checkErrorCorrectionAllowed(String toCheck){

The custom tagWorkerFactory

public class QRCodeTagWorkerFactory extends DefaultTagWorkerFactory {

Integration with existing iText add-ons

You might also like