Portable Document Format: History and Standardization Technical Foundations Technical Overview

The PDF file format was developed by Adobe in the 1990s to present documents independently of application software, hardware, and operating systems. PDF files can contain text, images, and other content like interactive forms and multimedia. The format is based on PostScript but is simplified for display purposes. PDF was standardized by ISO in 2008 and made an open standard, though it still includes some proprietary Adobe technologies. It allows embedding of fonts and color management features for consistent color output.

Uploaded by

Roger Sepulveda
19/8/2020 PDF - Wikipedia

PDF
The Portable Document Format (PDF) is a file format developed by Adobe in the 1990s to present
documents, including text formatting and images, in a manner independent of application software,
hardware, and operating systems.[2][3] Based on the PostScript language, each PDF file encapsulates a
complete description of a fixed-layout flat document, including the text, fonts, vector graphics, raster
images and other information needed to display it. PDF was standardized as ISO 32000 in 2008, and
no longer requires any royalties for its implementation.[4]

PDF files may contain a variety of content besides flat text and graphics, including logical structuring
elements, interactive elements such as annotations and form fields, layers, rich media (including video
content), three-dimensional objects using U3D or PRC, and various other data formats. The PDF
specification also provides for encryption and digital signatures, file attachments and metadata to
enable workflows requiring these features.

Filename extension: .pdf[note 1]
Internet media type: application/pdf,[1] application/x-pdf, application/x-bzpdf, application/x-gzpdf
Type code: 'PDF '[1] (including a single space)
Uniform Type Identifier (UTI): com.adobe.pdf
Magic number: %PDF
Developed by: Adobe Inc. (1993–2008), ISO (2008–)
Initial release: 15 June 1993
Latest release: 2.0
Extended to: PDF/A, PDF/E, PDF/UA, PDF/VT, PDF/X
Standard: ISO 32000-2
Open format?: Yes
Website: www.iso.org/standard/63534.html (https://www.iso.org/standard/63534.html)

https://en.wikipedia.org/wiki/PDF 1/23

Contents
History and standardization
Technical foundations
PostScript
Technical overview
File structure
Imaging model
Vector graphics
Raster images
Text
Fonts
Standard Type 1 Fonts (Standard 14 Fonts)
Encodings
Transparency
Interactive elements
AcroForms
Forms Data Format (FDF)
XML Forms Data Format (XFDF)
Adobe XML Forms Architecture (XFA)
Logical structure and accessibility
Optional Content Groups (layers)
Security and signatures
Usage rights
Vulnerabilities
File attachments
Metadata
Usage restrictions and monitoring
Default display settings
Intellectual property
Technical issues
Accessibility
Viruses and exploits
Content
Software
Conversions
Annotation
Other
See also
Notes
References
Further reading
External links

History and standardization

Adobe Systems made the PDF specification available free of charge in 1993. In the early years PDF was popular mainly in desktop
publishing workflows, and competed with a variety of formats such as DjVu, Envoy, Common Ground Digital Paper, Farallon Replica and
even Adobe's own PostScript format.

PDF was a proprietary format controlled by Adobe until it was released as an open standard on July 1, 2008, and published by the
International Organization for Standardization as ISO 32000-1:2008,[5][6] at which time control of the specification passed to an ISO
Committee of volunteer industry experts. In 2008, Adobe published a Public Patent License to ISO 32000-1 granting royalty-free rights for
all patents owned by Adobe that are necessary to make, use, sell, and distribute PDF-compliant implementations.[7]

PDF 1.7, the sixth edition of the PDF specification that became ISO 32000-1, includes some proprietary technologies defined only by
Adobe, such as Adobe XML Forms Architecture (XFA) and JavaScript extension for Acrobat, which are referenced by ISO 32000-1 as
normative and indispensable for the full implementation of the ISO 32000-1 specification. These proprietary technologies are not
standardized and their specification is published only on Adobe's website.[8][9][10][11][12] Many of them are also not supported by popular
third-party implementations of PDF.

On July 28, 2017, ISO 32000-2:2017 (PDF 2.0) was published.[13] ISO 32000-2 does not include any proprietary technologies as
normative references.[14]

Technical foundations
PDF combines three technologies:

- A subset of the PostScript page description programming language, for generating the layout and graphics.
- A font-embedding/replacement system to allow fonts to travel with the documents.
- A structured storage system to bundle these elements and any associated content into a single file, with data compression where
  appropriate.

PostScript

PostScript is a page description language run in an interpreter to generate an image, a process requiring many resources. It can handle
graphics and standard features of programming languages such as if and loop commands. PDF is largely based on PostScript but
simplified to remove flow control features like these, while graphics commands such as lineto remain.

Often, the PostScript-like PDF code is generated from a source PostScript file. The graphics commands that are output by the PostScript
code are collected and tokenized. Any files, graphics, or fonts to which the document refers also are collected. Then, everything is
compressed to a single file. Therefore, the entire PostScript world (fonts, layout, measurements) remains intact.
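
The removal of flow control can be illustrated with a small sketch. It assumes the short PDF content-stream operator names m (moveto), l (lineto) and S (stroke), which are not spelled out in the text above, and the helper name flatten_polyline is hypothetical. The loop runs in the generating program; what it emits is a flat list of drawing commands with no loop in it.

```python
def flatten_polyline(points):
    """Emit flat PDF-style path operators for a list of (x, y) points.

    Assumed operator names: m = moveto, l = lineto, S = stroke.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    x0, y0 = points[0]
    ops = [f"{x0} {y0} m"]           # start a new subpath
    for x, y in points[1:]:
        ops.append(f"{x} {y} l")     # one straight segment per point
    ops.append("S")                  # stroke the finished path
    return "\n".join(ops)


# The loop and the arithmetic live here, in the generator,
# not in the page description that comes out.
steps = [(10 * i, 5 * i * i) for i in range(4)]
content = flatten_polyline(steps)
print(content)
```

A PostScript program could compute the same points with its own loop at interpretation time; the PDF-style output only records the results.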

As a document format, PDF has several advantages over PostScript:

- PDF contains tokenized and interpreted results of the PostScript source code, for direct correspondence between changes to items in
  the PDF page description and changes to the resulting page appearance.
- PDF (from version 1.4) supports transparent graphics; PostScript does not.
- PostScript is an interpreted programming language with an implicit global state, so instructions accompanying the description of one
  page can affect the appearance of any following page. Therefore, all preceding pages in a PostScript document must be processed to
  determine the correct appearance of a given page, whereas each page in a PDF document is unaffected by the others. As a result,
  PDF viewers allow the user to quickly jump to the final pages of a long document, whereas a PostScript viewer needs to process all
  pages sequentially before being able to display the destination page (unless the optional PostScript Document Structuring Conventions
  have been carefully followed).

Technical overview

File structure

A PDF file is a 7-bit ASCII file, except for certain elements that may have binary content. A PDF file starts with a header containing the
magic number and the version of the format such as %PDF-1.7. The format is a subset of a COS ("Carousel" Object Structure) format.[15] A
COS tree file consists primarily of objects, of which there are eight types:[16]

- Boolean values, representing true or false
- Numbers
- Strings, enclosed within parentheses ((...)), which may contain 8-bit characters
- Names, starting with a forward slash (/)
- Arrays, ordered collections of objects enclosed within square brackets ([...])
- Dictionaries, collections of objects indexed by Names, enclosed within double angle brackets (<<...>>)
- Streams, usually containing large amounts of data, which can be compressed and binary, between the stream and endstream
  keywords, preceded by a dictionary
- The null object

Furthermore, there may be comments, introduced with the percent sign (%). Comments may contain 8-bit characters.
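
The eight object types can be sketched as a small serializer from Python values to PDF syntax. This is an illustration under stated assumptions, not a complete writer: the Name class and the functions to_cos and to_cos_stream are hypothetical, names are assumed to contain only plain characters, only backslashes and parentheses are escaped in strings, and the /Length entry in the stream dictionary is taken from the PDF specification, not from the text above.

```python
class Name(str):
    """Hypothetical marker type: tells PDF names apart from PDF strings."""


def to_cos(value):
    """Serialize a Python value to PDF object syntax (simplified sketch)."""
    if value is None:
        return "null"
    if isinstance(value, bool):              # bool first: it is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):             # fixed notation, no exponent form
        return f"{value:.6f}".rstrip("0").rstrip(".")
    if isinstance(value, Name):              # Name first: it is a str subclass
        return "/" + value
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return "(" + escaped + ")"
    if isinstance(value, (list, tuple)):
        return "[" + " ".join(to_cos(v) for v in value) + "]"
    if isinstance(value, dict):              # keys are always written as names
        items = " ".join(f"{to_cos(Name(k))} {to_cos(v)}" for k, v in value.items())
        return f"<< {items} >>"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_cos_stream(extra, data):
    """A stream is a dictionary followed by raw bytes between two keywords."""
    header = dict(extra, Length=len(data))   # /Length: assumed required entry
    return to_cos(header).encode("latin-1") + b"\nstream\n" + data + b"\nendstream"


page = {"Type": Name("Page"), "MediaBox": [0, 0, 612, 792], "Hidden": False, "Parent": None}
print(to_cos(page))
print(to_cos_stream({}, b"BT ET"))
```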

Objects may be either direct (embedded in another object) or indirect. Indirect objects are numbered with an object number and a
generation number and defined between the obj and endobj keywords if residing in the document root. Beginning with PDF version 1.5,
indirect objects (except other streams) may also be located in special streams known as object streams (marked /Type /ObjStm). This
technique enables non-stream objects to have standard stream filters applied to them, reduces the size of files that have large numbers of
small indirect objects and is especially useful for Tagged PDF. Object streams do not support specifying an object's generation number
(other than 0).
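
As a minimal sketch of the obj/endobj framing described above, the helpers below wrap an object body in its object number and generation number. The function names are hypothetical, and the "N G R" syntax used to refer to an indirect object from elsewhere comes from the PDF specification, not from the text above.

```python
def indirect_object(num, gen, body):
    """Wrap a serialized object body as an indirect object definition."""
    return f"{num} {gen} obj\n{body}\nendobj\n"


def reference(num, gen=0):
    """Refer to an indirect object by number and generation (assumed 'R' syntax)."""
    return f"{num} {gen} R"


# A catalog (object 1) that points at a page tree stored as object 2.
catalog = indirect_object(1, 0, f"<< /Type /Catalog /Pages {reference(2)} >>")
print(catalog)
```

A direct object would simply appear inline in place of the reference, with no number of its own.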

An index table, also called the cross-reference table, is typically located near the end of the file and gives the byte offset of each indirect
object from the start of the file.[17] This design allows for efficient random access to the objects in the file, and also allows for small changes
to be made without rewriting the entire file (incremental update). Before PDF version 1.5, the table would always be in a special ASCII
format, be marked with the xref keyword, and follow the main body composed of indirect objects. Version 1.5 introduced optional cross-
reference streams, which have the form of a standard stream object, possibly with filters applied. Such a stream may be used instead of the
ASCII cross-reference table and contains the offsets and other information in binary format. The format is flexible in that it allows for
integer width specification (using the /W array), so that for example a document not exceeding 64 KiB in size may dedicate only 2 bytes for
object offsets.
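
The 2-byte example can be checked with a short sketch of how a /W array might size the fields of one binary cross-reference entry. The three-field layout (entry type, byte offset, generation number) and big-endian byte order are assumptions taken from the PDF specification, and all function names here are hypothetical.

```python
def min_offset_width(file_size):
    """Smallest number of bytes that can hold any byte offset in the file."""
    largest = max(file_size - 1, 0)
    return max(1, (largest.bit_length() + 7) // 8)


def pack_xref_entry(fields, widths):
    """Pack one entry; each field uses the width given by the /W array."""
    return b"".join(v.to_bytes(w, "big") for v, w in zip(fields, widths))


def unpack_xref_entry(raw, widths):
    """Inverse of pack_xref_entry."""
    fields, pos = [], 0
    for width in widths:
        fields.append(int.from_bytes(raw[pos:pos + width], "big"))
        pos += width
    return tuple(fields)


# A file of at most 64 KiB needs only 2 bytes per offset.
W = [1, min_offset_width(64 * 1024), 1]
entry = pack_xref_entry((1, 0xABCD, 0), W)   # type 1: object stored at an offset
print(W, entry.hex(), unpack_xref_entry(entry, W))
```

With these widths each entry takes 4 bytes, compared with the 20 bytes of a line in the ASCII table.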

At the end of a PDF file is a footer containing:

- The startxref keyword followed by an offset to the start of the cross-reference table (starting with the xref keyword) or the cross-
  reference stream object
- And the %%EOF end-of-file marker.
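
A reader can use this footer to locate the cross-reference data without scanning the file from the start. The sketch below is one way to do it, assuming the footer sits within the last 1024 bytes; the function name find_startxref and the window size are hypothetical choices, and the sample bytes are a toy fragment, not a complete PDF.

```python
import re


def find_startxref(data):
    """Return the offset that follows the last startxref keyword in the file tail."""
    tail = data[-1024:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref/%%EOF footer found")
    return int(matches[-1])      # the last footer wins if several are present


sample = (
    b"%PDF-1.7\n"
    b"xref\n0 1\n0000000000 65535 f \n"
    b"trailer\n<< /Size 1 >>\n"
    b"startxref\n9\n%%EOF\n"
)
offset = find_startxref(sample)
print(offset, sample[offset:offset + 4])
```

Taking the last match matters for files that were extended by incremental update, where each update appends its own footer.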

If a cross-reference stream is not being used, the footer is preceded by the trailer keyword followed by a dictionary containing
information that would otherwise be contained in the cross-reference stream object's dictionary:

- A reference to the root object of the tree structure, also known as the catalog (/Root)
- The count of indirect objects in the cross-reference table (/Size)
- And other optional information.
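
Putting the pieces together, the following sketch writes a header, two indirect objects, an ASCII cross-reference table, a trailer dictionary and the footer. It is an illustration, not a validated writer: build_pdf is a hypothetical helper, the fixed-width entry layout (10-digit offset, 5-digit generation, n or f flag) and the reserved free entry for object 0 are assumptions taken from the PDF specification, and the resulting file has an empty page tree, so viewers may refuse to display it.

```python
def build_pdf(bodies):
    """Assemble a file from object bodies; object i+1 gets generation 0."""
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))                  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    size = len(bodies) + 1                        # +1 for the reserved object 0
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"               # object 0 is always a free entry
    for off in offsets:
        out += b"%010d 00000 n \n" % off          # one fixed-width line per object
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets


pdf, offsets = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [] /Count 0 >>",
])
print(pdf.decode("ascii"))
```

Because every object is reachable through its recorded offset, a reader can jump from the footer to the table, then to the catalog, without parsing anything in between.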

PDF files come in two layouts: non-linear (not "optimized") and linear ("optimized"). Non-linear PDF files can be smaller
than their linear counterparts, though they are slower to access because portions of the data required to assemble pages of the document
are scattered throughout the PDF file. Linear PDF files (also called "optimized" or "web optimized" PDF files) are constructed in a manner
that enables them to be read in a Web browser plugin without waiting for the entire file to download, since they are written to disk in a
linear (as in page order) fashion.[18] PDF files may be optimized using Adobe Acrobat software or QPDF.
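
A rough way to tell the two layouts apart is sketched below. It relies on an assumption from the PDF specification that is not stated above: a linearized file places a parameter dictionary containing the /Linearized key in its first object, near the start of the file. The function name and the 1024-byte window are hypothetical, and the check is only a heuristic; it does not verify that the rest of the file is laid out correctly.

```python
def looks_linearized(data, window=1024):
    """Heuristic: header present and /Linearized key near the start of the file."""
    head = data[:window]
    return head.startswith(b"%PDF-") and b"/Linearized" in head


linear_head = b"%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 12345 >>\nendobj\n"
plain_head = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(looks_linearized(linear_head), looks_linearized(plain_head))
```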

Imaging model

The basic design of how graphics are represented in PDF is very similar to that of PostScript, except for the use of transparency, which was
added in PDF 1.4.
