Document 3

Uploaded by

amidattitilope5
Copyright
© © All Rights Reserved

• The update cross-reference table need not contain any entries.

A conforming writer that uses the hybrid-reference format creates the main cross-reference table,
the update cross-reference table, and the cross-reference stream at the same time.
Objects 12 and 13, for example, are not compressed. They might have entries in
the update table. Since objects 2 and 11, the object stream and the cross-reference
stream, are not compressed, they might also be defined in the update table. Since
they are part of the hidden section, however, it makes sense to define them in the
cross-reference stream.
• The update cross-reference section shall appear at the end of the file, but
otherwise, there are no ordering restrictions on any of the objects or on the main
cross-reference section. However, a file that uses both the hybrid-reference format
and the linearized format has ordering requirements (see Annex F).
7.6 Encryption
7.6.1 General
A PDF document can be encrypted (PDF 1.1) to protect its contents from
unauthorized access. Encryption applies to all strings and streams in the
document's PDF file, with the following exceptions:
• The values for the ID entry in the trailer
• Any strings in an Encrypt dictionary
• Any strings that are inside streams such as content streams and compressed
object streams, which themselves are encrypted
Encryption is not applied to other object types such as integers and boolean values,
which are used primarily to convey information about the document's structure
rather than its contents. Leaving these values unencrypted allows random access to
the objects within a document, whereas encrypting the strings and streams protects
the document's contents.
When a PDF stream object (see 7.3.8, "Stream Objects") refers to an external file,
the stream's contents shall not be encrypted, since they are not part of the PDF file
itself. However, if the contents of the stream are embedded within the PDF file (see
7.11.4, "Embedded File Streams"), they shall be encrypted like any other stream in
the file. Beginning with PDF 1.5, embedded files can be encrypted in an otherwise
unencrypted document (see 7.6.5, "Crypt Filters").
Encryption-related information shall be stored in a document's encryption
dictionary, which shall be the value of the Encrypt entry in the document's trailer
dictionary (see Table 15). The absence of this entry from the trailer dictionary
means that a conforming reader shall consider the document to be not encrypted.
The entries shown in Table 20 are common to all encryption dictionaries.
The encryption dictionary's Filter entry identifies the file's security handler, a
software module that implements various aspects of the encryption process and
controls access to the contents of the encrypted document. PDF specifies a
standard password-based security handler that all conforming readers shall support,
but conforming readers can optionally provide additional security handlers of their
own.
The SubFilter entry specifies the syntax of the encryption dictionary contents. It
allows interoperability between handlers; that is, a document can be decrypted by a
handler other than the preferred one (the Filter entry) if they both support the
format specified by SubFilter.
The V entry, in specifying which algorithm to use, determines the length of the
encryption key, on which the encryption (and decryption) of data in a PDF file
shall be based. For V values 2 and 3, the Length entry specifies the exact length of
the encryption key. In PDF 1.5, a value of 4 for V permits the security handler to
use its own encryption and decryption algorithms and to specify crypt filters to use
on specific streams (see 7.6.5, "Crypt Filters").
The remaining contents of the encryption dictionary shall be determined by the
security handler and may vary from one handler to another. Entries for the standard
security handler are described in 7.6.3, "Standard Security Handler." Entries for
public-key security handlers are described in 7.6.4,
"Public-Key Security Handlers."
Unlike strings within the body of the document, those in the encryption dictionary
shall be direct objects. The contents of the encryption dictionary shall not be
encrypted by the algorithm specified by the V entry. Security handlers shall be
responsible for encrypting any data in the encryption dictionary that they need to
protect.
NOTE Conforming writers have two choices if the encryption methods
and syntax provided by PDF are not sufficient for their needs:
they can provide an alternate security handler or they can
encrypt whole PDF documents themselves, not making use of
PDF security.
7.6.2 General Encryption Algorithm
One of the following algorithms shall be used when encrypting data in a PDF file:
• A proprietary encryption algorithm known as RC4. RC4 is a symmetric
stream cipher: the same algorithm shall be used for both encryption and
decryption, and the algorithm does not change the length of the data. RC4 is a
copyrighted, proprietary algorithm of RSA Security, Inc. Independent software
vendors may be required to license RC4 to develop software that encrypts or
decrypts PDF documents. For further information, visit the RSA Web site
<http://www.rsasecurity.com> or send e-mail to <[email protected]>.
• The AES (Advanced Encryption Standard) algorithm (beginning with PDF
1.6). AES is a symmetric block cipher: the same algorithm shall be used for both
encryption and decryption, and the length of the data when encrypted is rounded
up to a multiple of the block size, which is fixed to always be 16 bytes, as specified
in FIPS 197, Advanced Encryption Standard (AES) (see the Bibliography).
Strings and streams encrypted with AES shall use a padding scheme that is
described in Internet RFC 2898, PKCS #5: Password-Based Cryptography
Specification Version 2.0; see the Bibliography. For an original message length of
M, the pad shall consist of 16 - (M mod 16) bytes whose value shall also be 16 -
(M mod 16).
EXAMPLE A 9-byte message has a pad of 7 bytes, each with the value
0x07. The pad can be unambiguously removed to determine the
original message length when decrypting. Note that the pad is
present when M is evenly divisible by 16; it contains 16 bytes of
0x10.
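The padding rule above can be sketched in a few lines. This is a minimal illustration only; the function names `add_padding` and `remove_padding` are hypothetical and do not come from the specification.

```python
BLOCK_SIZE = 16  # AES block size in bytes, as stated in the text


def add_padding(message: bytes) -> bytes:
    """Append 16 - (M mod 16) bytes, each holding that same count."""
    pad_len = BLOCK_SIZE - (len(message) % BLOCK_SIZE)
    return message + bytes([pad_len]) * pad_len


def remove_padding(padded: bytes) -> bytes:
    """Strip the pad; the final byte says how many bytes to drop."""
    if not padded or len(padded) % BLOCK_SIZE != 0:
        raise ValueError("padded data must be a non-empty multiple of 16 bytes")
    pad_len = padded[-1]
    if not 1 <= pad_len <= BLOCK_SIZE:
        raise ValueError("invalid pad length")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid pad bytes")
    return padded[:-pad_len]
```

A 9-byte input gains seven bytes of 0x07, and a 16-byte input gains a full extra block of 0x10, which is what lets the pad be removed without ambiguity.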
PDF's standard encryption methods also make use of the MD5 message-digest
algorithm for key generation purposes (described in Internet RFC 1321, The MD5
Message-Digest Algorithm; see the Bibliography).
The encryption of data in a PDF file shall be based on the use of an encryption key
computed by the security handler. Different security handlers compute the
encryption key using their own mechanisms. Regardless of how the key is
computed, its use in the encryption of data shall always be the same (see
"Algorithm 1: Encryption of data using the RC4 or AES algorithms"). Because the RC4 algorithm
and AES algorithms are symmetric, this same sequence of steps shall be used both
to encrypt and to decrypt data.
Algorithms in 7.6, "Encryption" are uniquely numbered within that clause in a
manner that maintains compatibility with previous documentation.
Algorithm 1: Encryption of data using the RC4 or AES algorithms
a) Obtain the object number and generation number from the object identifier
of the string or stream to be encrypted (see 7.3.10, "Indirect Objects"). If the string
is a direct object, use the identifier of the indirect object containing it.
b) For all strings and streams without crypt filter specifier; treating the object
number and generation number as binary integers, extend the original n-byte
encryption key to n + 5 bytes by appending the low-order 3 bytes of the object
number and the low-order 2 bytes of the generation number in that order, low-order
byte first. (n is 5 unless the value of V in the encryption dictionary is greater than
1, in which case n is the value of Length divided by 8.)
If using the AES algorithm, extend the encryption key an additional 4 bytes by
adding the value "sAlT", which corresponds to the hexadecimal values 0x73,
0x41, 0x6C, 0x54. (This addition is done for backward compatibility and is not
intended to provide additional security.)
c) Initialize the MD5 hash function and pass the result of step (b) as input to
this function.
d) Use the first (n + 5) bytes, up to a maximum of 16, of the output from the
MD5 hash as the key for the RC4 or AES symmetric key algorithms, along with
the string or stream data to be encrypted.
If using the AES algorithm, the Cipher Block Chaining (CBC) mode,
which requires an initialization vector, is used. The block size parameter is
set to 16 bytes, and the initialization vector is a 16-byte random number that is
stored as the first 16 bytes of the encrypted stream or string.
The output is the encrypted data to be stored in the PDF file.
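Steps (a) through (d) of Algorithm 1 can be sketched as follows, for strings and streams without a crypt filter specifier. The function names `object_key` and `split_aes_payload` are hypothetical, and the actual RC4 or AES call is left out; only the per-object key derivation and the initialization-vector layout are shown.

```python
import hashlib


def object_key(file_key: bytes, obj_num: int, gen_num: int,
               use_aes: bool = False) -> bytes:
    """Derive the per-object key from the n-byte document encryption key."""
    # Step (b): low-order 3 bytes of the object number, then low-order
    # 2 bytes of the generation number, each low-order byte first.
    material = (file_key
                + (obj_num & 0xFFFFFF).to_bytes(3, "little")
                + (gen_num & 0xFFFF).to_bytes(2, "little"))
    if use_aes:
        # The four extra bytes 0x73 0x41 0x6C 0x54 used for AES.
        material += bytes([0x73, 0x41, 0x6C, 0x54])
    # Steps (c) and (d): MD5, then keep the first n + 5 bytes, at most 16.
    digest = hashlib.md5(material).digest()
    return digest[:min(len(file_key) + 5, 16)]


def split_aes_payload(data: bytes):
    """Separate the 16-byte initialization vector from the AES ciphertext."""
    return data[:16], data[16:]
```

With a 5-byte document key the result is 10 bytes long; with a 16-byte key the n + 5 rule would give 21 bytes, so the 16-byte cap applies.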
Stream data shall be encrypted after applying all stream encoding filters and shall
be decrypted before applying any stream decoding filters. The number of bytes to
be encrypted or decrypted shall be given by the Length entry in the stream
dictionary. Decryption of strings (other than those in the encryption dictionary)
shall be done after escape-sequence processing and hexadecimal decoding as
appropriate to the string representation described in 7.3.4, "String Objects."
7.6.3 Standard Security Handler
7.6.3.1 General
PDF's standard security handler shall allow access permissions and up to two
passwords to be specified for a document: an owner password and a user password.
An application's decision to encrypt a document shall be based on whether the user
creating the document specifies any passwords or access restrictions.
EXAMPLE A conforming writer may have a security settings dialog box that
the user can invoke before saving the PDF file.
If passwords or access restrictions are specified, the document shall be encrypted,
and the permissions and information required to validate the passwords shall be
stored in the encryption dictionary. Documents in which only file attachments are
encrypted shall use the same password as the user and owner password.
NOTE 1 A conforming writer may also create an encrypted document
without any user interaction if it has some other source of
information about what passwords and permissions to use.
If a user attempts to open an encrypted document that has a user password, the
conforming reader shall first try to authenticate the encrypted document using the
padding string defined in 7.6.3.3, "Encryption Key Algorithm" (default user
password):
• If this authentication attempt is successful, the conforming reader may open,
decrypt and display the document on the screen.
• If this authentication attempt fails, the application should prompt for a
password. Correctly supplying either password (owner or user password) should
enable the user to open the document, decrypt it, and display it on the screen.
Whether additional operations shall be allowed on a decrypted document depends
on which password (if any) was supplied when the document was opened and on
any access restrictions that were specified when the document was created:
• Opening the document with the correct owner password should allow full
(owner) access to the document. This unlimited access includes the ability to
change the document's passwords and access permissions.
• Opening the document with the correct user password (or opening a
document with the default password) should allow additional operations to be
performed according to the user access permissions specified in the document's
encryption dictionary.
Access permissions shall be specified in the form of flags corresponding to the
various operations, and the set of operations to which they correspond shall depend
on the security handler's revision number (also stored in the encryption dictionary).
If the security handler's revision number is 2 or greater, the operations to which
user access can be controlled shall be as follows:
• Modifying the document's contents
• Copying or otherwise extracting text and graphics from the document,
including extraction for accessibility purposes (that is, to make the contents of the
document accessible through assistive technologies such as screen readers or
Braille output devices; see 14.9, "Accessibility Support").
• Adding or modifying text annotations (see 12.5.6.4, "Text Annotations") and
interactive form fields (see 12.7, "Interactive Forms")
• Printing the document
If the security handler's revision number is 3 or greater, user access to the
following operations shall be controlled more selectively:
• Filling in forms (that is, filling in existing interactive form fields) and
signing the document (which amounts to filling in existing signature fields, a type
of interactive form field).
• Assembling the document: inserting, rotating, or deleting pages and creating
navigation elements such as bookmarks or thumbnail images (see 12.3,
"Document-Level Navigation").
• Printing to a representation from which a faithful digital copy of the PDF
content could be generated. Disallowing such printing may result in degradation of
output quality.
• In addition, security handlers of revisions 3 and greater shall enable the
extraction of text and graphics (in support of accessibility to users with disabilities
or for other purposes) to be controlled separately.
If a security handler of revision 4 is specified, the standard security handler shall
support crypt filters (see 7.6.5, "Crypt Filters"). The support shall be limited to the
Identity crypt filter (see Table 26) and crypt filters named StdCF whose
dictionaries contain a CM value of V2 or AESV2 and an AuthEvent value of
DocOpen. Public-Key security handlers in this case shall use crypt filters named
DefaultCryptFilter when all document content is encrypted, and shall use crypt
filters named DefEmbeddedFile when file attachments only are encrypted in place
of StdCF name. This nomenclature shall not be used as indicator of the type of the
security handler or encryption.
Once the document has been opened and decrypted successfully, a conforming
reader technically has access to the entire contents of the document. There is
nothing inherent in PDF encryption that enforces the document permissions
specified in the encryption dictionary. Conforming readers shall respect the intent
of the document creator by restricting user access to an encrypted PDF file
according to the permissions contained in the file.
NOTE 2 PDF 1.5 introduces a set of access permissions that do not
require the document to be encrypted (see
12.8.4, "Permissions"). This enables limited access to a
document when a user is not able to respond to a prompt for a
password. For example, there may be conforming readers that do not have a
person running them such as printing off-line or on a server.
7.6.3.2 Standard Encryption Dictionary
The values of the O and U entries in this dictionary shall be used to determine
whether a password entered when the document is opened is the correct owner
password, user password, or neither.
The value of the P entry shall be interpreted as an unsigned 32-bit quantity
containing a set of flags specifying which access permissions shall be granted
when the document is opened with user access. Table 22 shows the meanings of
these flags. Bit positions within the flag word shall be numbered from 1 (low-
order) to 32 (high-order). A 1 bit in any position shall enable the corresponding
access permission. Which bits shall be meaningful, and in some cases how they
shall be interpreted, shall depend on the security handler's revision number
(specified in the encryption dictionary's R entry). Conforming readers shall ignore
all flags other than those at bit positions 3, 4, 5, 6, 9, 10, 11, and 12.
NOTE PDF integer objects can be interpreted as binary values in a
signed twos-complement form. Since all the reserved high-order
flag bits in the encryption dictionary's P value are required to be
1, the integer value P shall be specified as a negative integer. For
example, assuming revision 2 of the security handler, the value
-44 permits printing and copying but disallows modifying the
contents and annotations.
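The flag arithmetic in the note can be checked with a short sketch. The bit positions are the ones listed above; the descriptive labels are assumptions taken from Table 22, which is not reproduced here, and `decode_permissions` is a hypothetical helper name. The sketch does not model the differences between handler revisions.

```python
# Bit positions are numbered from 1 (low-order), as in the text.
# The labels are illustrative names, not identifiers from the specification.
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract",
    6: "annotate or edit form fields",
    9: "fill in forms",
    10: "extract for accessibility",
    11: "assemble document",
    12: "high-quality print",
}


def decode_permissions(p: int) -> dict:
    """Map a (usually negative) P value to a dict of enabled permissions."""
    unsigned = p & 0xFFFFFFFF  # reinterpret the signed value as 32 unsigned bits
    return {
        label: bool((unsigned >> (bit - 1)) & 1)
        for bit, label in PERMISSION_BITS.items()
    }
```

For P = -44 the unsigned form is 0xFFFFFFD4, whose low byte 1101 0100 has bits 3 and 5 set and bits 4 and 6 clear, matching the example in the note.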
7.6.3.3 Encryption Key Algorithm
As noted earlier, one function of a security handler is to generate an encryption key
for use in encrypting and decrypting the contents of a document. Given a password
string, the standard security handler computes an encryption key as shown in
"Algorithm 2: Computing an encryption key".
Algorithm 2: Computing an encryption key
a) Pad or truncate the password string to exactly 32 bytes. If the password string is
more than 32 bytes long, use only its first 32 bytes; if it is less than 32 bytes long,
pad it by appending the required number of additional bytes from the beginning of
the following padding string:
< 28 BF 4E 5E 4E 75 8A 41 64 00 4E 56 FF FA 01 08
2E 2E 00 B6 D0 68 3E 80 2F 0C A9 FE 64 53 69 7A >
That is, if the password string is n bytes long, append the first 32 - n bytes of the
padding string to the end of the password string. If the password string is empty
(zero-length), meaning there is no user password, substitute the entire padding
string in its place.
b) Initialize the MD5 hash function and pass the result of step (a) as input to
this function.
c) Pass the value of the encryption dictionary's O entry to the MD5 hash
function. ("Algorithm 3: Computing the encryption dictionary's O (owner
password) value" shows how the O value is computed.)
d) Convert the integer value of the P entry to a 32-bit unsigned binary number
and pass these bytes to the MD5 hash function, low-order byte first.
e) Pass the first element of the file's file identifier array (the value of the ID
entry in the document's trailer dictionary; see Table 15) to the MD5 hash function.
NOTE The first element of the ID array generally remains the same for a
given document. However, in some situations, conforming
writers may regenerate the ID array if a new generation of a
document is created. Security handlers are encouraged not to
rely on the ID in the encryption key computation.
f) (Security handlers of revision 4 or greater) If document metadata is not
being encrypted, pass 4 bytes with the value 0xFFFFFFFF to the MD5 hash
function.
g) Finish the hash.
h) (Security handlers of revision 3 or greater) Do the following 50 times: Take
the output from the previous MD5 hash and pass the first n bytes of the output as
input into a new MD5 hash, where n is the number of bytes of the encryption key
as defined by the value of the encryption dictionary's Length entry.
i) Set the encryption key to the first n bytes of the output from the final MD5
hash, where n shall always be 5 for security handlers of revision 2 but, for security
handlers of revision 3 or greater, shall depend on the value of the encryption
dictionary's Length entry.
This algorithm, when applied to the user password string, produces the encryption
key used to encrypt or decrypt string and stream data according to "Algorithm 1:
Encryption of data using the RC4 or AES algorithms" in 7.6.2, "General
Encryption Algorithm." Parts of this algorithm are also used in the algorithms
described below.
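Algorithm 2 can be sketched with Python's standard `hashlib` module. The function and parameter names (`pad_password`, `compute_encryption_key`, `id0`, `encrypt_metadata`) are hypothetical; the password is assumed to be supplied already as bytes, and `id0` stands for the first element of the ID array.

```python
import hashlib

# The 32-byte padding string from step (a).
PAD = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108"
    "2E2E00B6D0683E802F0CA9FE6453697A"
)


def pad_password(password: bytes) -> bytes:
    """Step (a): truncate to 32 bytes or fill up from the start of PAD."""
    return (password + PAD)[:32]


def compute_encryption_key(password: bytes, o_entry: bytes, p: int,
                           id0: bytes, revision: int = 2,
                           length_bits: int = 40,
                           encrypt_metadata: bool = True) -> bytes:
    n = 5 if revision == 2 else length_bits // 8
    md5 = hashlib.md5(pad_password(password))            # steps (a), (b)
    md5.update(o_entry)                                   # step (c)
    md5.update((p & 0xFFFFFFFF).to_bytes(4, "little"))    # step (d)
    md5.update(id0)                                       # step (e)
    if revision >= 4 and not encrypt_metadata:
        md5.update(b"\xff\xff\xff\xff")                   # step (f)
    digest = md5.digest()                                 # step (g)
    if revision >= 3:
        for _ in range(50):                               # step (h)
            digest = hashlib.md5(digest[:n]).digest()
    return digest[:n]                                     # step (i)
```

Concatenating the password with the padding string and slicing to 32 bytes covers both the truncation and the padding case, including the empty password.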
7.6.3.4 Password Algorithms
In addition to the encryption key, the standard security handler shall provide the
contents of the encryption dictionary (Table 20 and Table 21). The values of the
Filter, V, Length, R, and P entries are straightforward, but the computation of the O
(owner password) and U (user password) entries requires further explanation. The
algorithms 3 through 7 that follow show how the values of the owner password and
user password entries shall be computed (with separate versions of the latter
depending on the revision of the security handler).
Algorithm 3: Computing the encryption dictionary's O (owner password) value
a) Pad or truncate the owner password string as described in step (a) of
"Algorithm 2: Computing an encryption key". If there is no owner password, use
the user password instead.
b) Initialize the MD5 hash function and pass the result of step (a) as input to
this function.
c) (Security handlers of revision 3 or greater) Do the following 50 times: Take
the output from the previous MD5 hash and pass it as input into a new MD5 hash.
d) Create an RC4 encryption key using the first n bytes of the output from the
final MD5 hash, where n shall always be 5 for security handlers of revision 2 but,
for security handlers of revision 3 or greater, shall depend on the value of the
encryption dictionary's Length entry.
e) Pad or truncate the user password string as described in step (a) of
"Algorithm 2: Computing an encryption key".
f) Encrypt the result of step (e), using an RC4 encryption function with the
encryption key obtained in step (d).
g) (Security handlers of revision 3 or greater) Do the following 19 times: Take
the output from the previous invocation of the RC4 function and pass it as input to
a new invocation of the function; use an encryption key generated by taking each
byte of the encryption key obtained in step (d) and performing an XOR (exclusive
or) operation between that byte and the single-byte value of the iteration counter
(from 1 to 19).
h) Store the output from the final invocation of the RC4 function as the value of
the O entry in the encryption dictionary.
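A sketch of Algorithm 3 follows. Python's standard library has no RC4, so a small textbook RC4 routine is included for illustration; the names `rc4`, `owner_key`, and `compute_o` are hypothetical, and passwords are assumed to be bytes already.

```python
import hashlib

PAD = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108"
    "2E2E00B6D0683E802F0CA9FE6453697A"
)


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4; the same call encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key scheduling
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:                         # keystream generation
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def owner_key(owner_pw: bytes, user_pw: bytes, revision: int,
              length_bits: int) -> bytes:
    """Steps (a) to (d): derive the RC4 key from the owner password."""
    n = 5 if revision == 2 else length_bits // 8
    # With no owner password, the user password is used instead.
    digest = hashlib.md5(((owner_pw or user_pw) + PAD)[:32]).digest()
    if revision >= 3:
        for _ in range(50):
            digest = hashlib.md5(digest).digest()
    return digest[:n]


def compute_o(owner_pw: bytes, user_pw: bytes, revision: int = 2,
              length_bits: int = 40) -> bytes:
    key = owner_key(owner_pw, user_pw, revision, length_bits)
    data = rc4(key, (user_pw + PAD)[:32])     # steps (e), (f)
    if revision >= 3:
        for counter in range(1, 20):          # step (g)
            data = rc4(bytes(b ^ counter for b in key), data)
    return data                               # step (h): the O value
```

Because RC4 is symmetric, decrypting a revision 2 O value with the owner key returns the padded user password, which is the property Algorithm 7 relies on.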
Algorithm 4: Computing the encryption dictionary's U (user password) value
(Security handlers of revision 2)
a) Create an encryption key based on the user password string, as described in
"Algorithm 2: Computing an encryption key".
b) Encrypt the 32-byte padding string shown in step (a) of "Algorithm 2:
Computing an encryption key", using an RC4 encryption function with the
encryption key from the preceding step.
c) Store the result of step (b) as the value of the U entry in the encryption
dictionary.
Algorithm 5: Computing the encryption dictionary's U (user password) value
(Security handlers of revision 3 or greater)
a) Create an encryption key based on the user password string, as described in
"Algorithm 2: Computing an encryption key".
b) Initialize the MD5 hash function and pass the 32-byte padding string shown
in step (a) of "Algorithm 2: Computing an encryption key" as input to this function.
c) Pass the first element of the file's file identifier array (the value of the ID
entry in the document's trailer dictionary; see Table 15) to the hash function and
finish the hash.
d) Encrypt the 16-byte result of the hash, using an RC4 encryption function
with the encryption key from step (a).
e) Do the following 19 times: Take the output from the previous invocation of
the RC4 function and pass it as input to a new invocation of the function; use an
encryption key generated by taking each byte of the original encryption key
obtained in step (a) and performing an XOR (exclusive or) operation between that
byte and the single-byte value of the iteration counter (from 1 to 19).
f) Append 16 bytes of arbitrary padding to the output from the final invocation
of the RC4 function and store the 32-byte result as the value of the U entry in the
encryption dictionary.
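Algorithms 4 and 5 can be sketched together. To keep the example short, the encryption key from step (a) is taken as an argument rather than recomputed; `compute_u_r2`, `compute_u_r3`, `id0`, and `filler` are hypothetical names, and the RC4 routine is a textbook implementation included only so the block runs on its own.

```python
import hashlib

PAD = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108"
    "2E2E00B6D0683E802F0CA9FE6453697A"
)


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4; the same call encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def compute_u_r2(key: bytes) -> bytes:
    """Algorithm 4: encrypt the padding string with the document key."""
    return rc4(key, PAD)


def compute_u_r3(key: bytes, id0: bytes, filler: bytes = bytes(16)) -> bytes:
    """Algorithm 5: hash PAD and the first ID element, then 20 RC4 passes."""
    data = rc4(key, hashlib.md5(PAD + id0).digest())   # steps (b) to (d)
    for counter in range(1, 20):                       # step (e)
        data = rc4(bytes(b ^ counter for b in key), data)
    # Step (f): the last 16 bytes are arbitrary; zeros are used here.
    return data + filler
```

Only the first 16 bytes of a revision 3 value carry information, which is why a later comparison looks at those bytes alone.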
NOTE The standard security handler uses the algorithms 6 and 7 that
follow, to determine whether a supplied password string is the
correct user or owner password. Note too that algorithm 6 can
be used to determine whether a document's user password is the
empty string, and therefore whether to suppress prompting for a
password when the document is opened.
Algorithm 6: Authenticating the user password
a) Perform all but the last step of "Algorithm 4: Computing the encryption
dictionary's U (user password) value (Security handlers of revision 2)" or
"Algorithm 5: Computing the encryption dictionary's U (user password) value
(Security handlers of revision 3 or greater)" using the supplied password string.
b) If the result of step (a) is equal to the value of the encryption dictionary's U
entry (comparing on the first 16 bytes in the case of security handlers of revision 3
or greater), the password supplied is the correct user password. The key obtained in
step (a) (that is, in the first step of "Algorithm 4: Computing the encryption
dictionary's U (user password) value (Security handlers of revision 2)" or
"Algorithm 5: Computing the encryption dictionary's U (user password) value
(Security handlers of revision 3 or greater)") shall be used to decrypt the document.
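Algorithm 6 can be sketched end to end as below. The block condenses Algorithm 2 into `encryption_key` and folds Algorithms 4 and 5 into the comparison; all function and parameter names are hypothetical, the RC4 routine is a textbook implementation, and `authenticate_user` returns the document key on success or `None` on failure as a convenience of this sketch.

```python
import hashlib

PAD = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108"
    "2E2E00B6D0683E802F0CA9FE6453697A"
)


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4; the same call encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def encryption_key(password, o_entry, p, id0, revision=2, length_bits=40,
                   encrypt_metadata=True):
    """Algorithm 2, condensed."""
    n = 5 if revision == 2 else length_bits // 8
    md5 = hashlib.md5((password + PAD)[:32])
    md5.update(o_entry)
    md5.update((p & 0xFFFFFFFF).to_bytes(4, "little"))
    md5.update(id0)
    if revision >= 4 and not encrypt_metadata:
        md5.update(b"\xff" * 4)
    digest = md5.digest()
    if revision >= 3:
        for _ in range(50):
            digest = hashlib.md5(digest[:n]).digest()
    return digest[:n]


def authenticate_user(password, u_entry, o_entry, p, id0,
                      revision=2, length_bits=40, encrypt_metadata=True):
    """Return the document key if the password matches U, else None."""
    key = encryption_key(password, o_entry, p, id0, revision,
                         length_bits, encrypt_metadata)
    if revision == 2:
        expected, stored = rc4(key, PAD), u_entry
    else:
        expected = rc4(key, hashlib.md5(PAD + id0).digest())
        for counter in range(1, 20):
            expected = rc4(bytes(b ^ counter for b in key), expected)
        stored = u_entry[:16]   # only the first 16 bytes are compared
    return key if expected == stored else None
```

Calling `authenticate_user(b"", ...)` first is one way to test for the empty (default) user password before prompting anyone.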
Algorithm 7: Authenticating the owner password
a) Compute an encryption key from the supplied password string, as described
in steps (a) to (d) of
"Algorithm 3: Computing the encryption dictionary's O (owner password) value".
b) (Security handlers of revision 2 only) Decrypt the value of the encryption
dictionary's O entry, using an RC4 encryption function with the encryption key
computed in step (a).
(Security handlers of revision 3 or greater) Do the following 20 times: Decrypt the
value of the encryption dictionary's O entry (first iteration) or the output from the
previous iteration (all subsequent iterations), using an RC4 encryption function
with a different encryption key at each iteration. The key shall be generated by
taking the original key (obtained in step (a)) and performing an XOR (exclusive or)
operation between each byte of the key and the single-byte value of the iteration
counter (from 19 to 0).
c) The result of step (b) purports to be the user password. Authenticate this user
password using "Algorithm 6: Authenticating the user password". If it is correct, the password supplied is the
correct owner password.
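Steps (a) and (b) of Algorithm 7 can be sketched as follows; the result is the candidate user password, which step (c) then hands to Algorithm 6. The names `owner_key` and `candidate_user_password` are hypothetical, and the RC4 routine is a textbook implementation included so the block is self-contained.

```python
import hashlib

PAD = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108"
    "2E2E00B6D0683E802F0CA9FE6453697A"
)


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4; the same call encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def owner_key(password: bytes, revision: int, length_bits: int) -> bytes:
    """Step (a): steps (a) to (d) of Algorithm 3 on the supplied password."""
    n = 5 if revision == 2 else length_bits // 8
    digest = hashlib.md5((password + PAD)[:32]).digest()
    if revision >= 3:
        for _ in range(50):
            digest = hashlib.md5(digest).digest()
    return digest[:n]


def candidate_user_password(password: bytes, o_entry: bytes,
                            revision: int = 2,
                            length_bits: int = 40) -> bytes:
    """Step (b): undo the RC4 passes applied when O was computed."""
    key = owner_key(password, revision, length_bits)
    if revision == 2:
        return rc4(key, o_entry)
    data = o_entry
    for counter in range(19, -1, -1):   # counter runs from 19 down to 0
        data = rc4(bytes(b ^ counter for b in key), data)
    return data
```

The counter runs downward because the passes must be undone in the reverse of the order in which they were applied; the pass with counter 0 uses the unmodified key.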
7.6.4 Public-Key Security Handlers
7.6.4.1 General
Security handlers may use public-key encryption technology to encrypt a
document (or strings and streams within a document). When doing so, one or more
lists of recipients may be specified, where each list has its own unique access
permissions. Only specified recipients shall open the encrypted
document or content, unlike the standard security handler, where a password
determines access. The permissions defined for public-key security handlers are
shown in Table 24 in 7.6.4.2, "Public-Key Encryption Dictionary".
Public-key security handlers use the industry standard Public Key Cryptographic
Standard Number 7 (PKCS#7) binary encoding syntax to encode recipient list,
decryption key, and access permission information. The PKCS#7 specification is in
Internet RFC 2315, PKCS #7: Cryptographic Message Syntax, Version 1.5 (see the
Bibliography).
When encrypting the data, each recipient's X.509 public key certificate (as
described in ITU-T Recommendation X.509; see the Bibliography) shall be
available. When decrypting the data, the conforming reader shall scan the recipient
list for which the content is encrypted and shall attempt to find a match with a
certificate that belongs to the user. If a match is found, the user requires access to
the corresponding private key, which may require authentication, possibly using a
password. Once access is obtained, the private key shall be used to decrypt the
encrypted data.
7.6.4.2 Public-Key Encryption Dictionary
Encryption dictionaries for public-key security handlers contain the common
entries shown in Table 20, whose values are described above. In addition, they may
contain the entry shown in Table 23 as described below.
The Filter entry shall be the name of a public-key security handler.
NOTE Examples of existing security handlers that support public-key
encryption are Entrust.PPKEF, Adobe.PPKLite, and
Adobe.PubSec. This handler will be the preferred handler when
encrypting the document.
Permitted values of the SubFilter entry for use with conforming public-key security
handlers are adbe.pkcs7.s3, adbe.pkcs7.s4, which shall be used when not using
crypt filters (see 7.6.5, "Crypt Filters") and adbe.pkcs7.s5, which shall be used
when using crypt filters. The CF, StmF, and StrF entries may be present when
SubFilter is adbe.pkcs7.s5.
The value of the P entry shall be interpreted as an unsigned 32-bit quantity
containing a set of flags specifying which access permissions shall be granted
when the document is opened with user access. Table 24 shows the meanings of
these flags. Bit positions within the flag word shall be numbered from 1 (low-
order) to 32 (high-order). A 1 bit in any position shall enable the corresponding
access permission.
Conforming readers shall ignore all flags other than those at bit positions 2, 3, 4, 5,
6, 9, 10, 11, and 12.
7.6.4.3 Public-Key Encryption Algorithms
The enveloped data in the PKCS#7 object contains keying material that shall be used
to decrypt the document (or individual strings or streams in the document, when
crypt filters are used; see 7.6.5, "Crypt Filters"). A key shall be used to encrypt
(and decrypt) the enveloped data. This key (the plaintext key in Figure 4) shall be
encrypted for each recipient, using that recipient's public key, and shall be stored in
the PKCS#7 object (as the encrypted key for each recipient). To decrypt the
document, that key shall be decrypted using the recipient's private key, which
yields a decrypted (plaintext) key. That key, in turn, shall be used to decrypt the
enveloped data in the PKCS#7 object, resulting in a byte array that includes the
following information:
• A 20-byte seed that shall be used to create the encryption key that is used by
"Algorithm 1: Encryption of data using the RC4 or AES algorithms". The seed
shall be a unique random number generated by the security handler that encrypted
the document.
• A 4-byte value defining the permissions, least significant byte first. See
Table 24 for the possible permission values.
• When SubFilter is adbe.pkcs7.s3, the relevant permissions shall be only those
specified for revision 2 of the standard security handler.
• For adbe.pkcs7.s4, security handlers of revision 3 permissions shall apply.
• For adbe.pkcs7.s5, which supports the use of crypt filters, the permissions
shall be the same as adbe.pkcs7.s4 when the crypt filter is referenced from the
StmF or StrF entries of the encryption dictionary. When referenced from the Crypt
filter decode parameter dictionary of a stream object (see Table 14), the 4 bytes of
permissions shall be absent from the enveloped data.
The algorithms that shall be used to encrypt the enveloped data in the PKCS#7
object are: RC4 with key lengths up to 256 bits, DES, Triple DES, RC2 with key
lengths up to 128 bits, 128-bit AES in Cipher Block Chaining (CBC) mode, 192-bit
AES in CBC mode, 256-bit AES in CBC mode. The PKCS#7 specification is in
Internet RFC 2315, PKCS #7: Cryptographic Message Syntax, Version 1.5 (see the
Bibliography).
The encryption key used by "Algorithm 1: Encryption of data using the RC4 or
AES algorithms" shall be calculated by means of an SHA-1 message digest
operation that digests the following data, in order:
a) The 20 bytes of seed
b) The bytes of each item in the Recipients array of PKCS#7 objects in the order
in which they appear in the array
c) 4 bytes with the value 0xFF if the key being generated is intended for use in
document-level encryption and the document metadata is being left as plaintext
The first n/8 bytes of the resulting digest shall be used as the encryption key, where
n is the bit length of the encryption key.
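The SHA-1 key calculation above can be sketched with `hashlib`. The function and parameter names are hypothetical; `recipients` is assumed to hold the raw bytes of each PKCS#7 object in array order, and decrypting the enveloped data to obtain the seed is outside the scope of this sketch.

```python
import hashlib


def public_key_encryption_key(seed: bytes, recipients: list,
                              key_bits: int,
                              metadata_plaintext: bool = False) -> bytes:
    """Digest seed, recipients, and the optional marker; keep n/8 bytes."""
    if len(seed) != 20:
        raise ValueError("the seed is expected to be 20 bytes long")
    if key_bits // 8 > hashlib.sha1().digest_size:
        raise ValueError("SHA-1 yields only 20 bytes of key material")
    sha1 = hashlib.sha1(seed)                 # item (a)
    for recipient in recipients:              # item (b), in array order
        sha1.update(recipient)
    if metadata_plaintext:                    # item (c)
        sha1.update(b"\xff\xff\xff\xff")
    return sha1.digest()[:key_bits // 8]
```

A SHA-1 digest is 20 bytes long, so this sketch rejects key lengths above 160 bits rather than guessing how longer keys would be formed.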
7.6.5 Crypt Filters
PDF 1.5 introduces crypt filters, which provide finer granularity control of
encryption within a PDF file. The use of crypt filters involves the following
structures:
• The encryption dictionary (see Table 20) contains entries that enumerate the
crypt filters in the document (CF) and specify which ones are used by default to
decrypt all the streams (StmF) and strings (StrF) in the document. In addition, the
value of the V entry shall be 4 to use crypt filters.
• Each crypt filter specified in the CF entry of the encryption dictionary shall
be represented by a crypt filter dictionary, whose entries are shown in Table 25.
• A stream filter type, the Crypt filter (see 7.4.10, "Crypt Filter"), can be
specified for any stream in the document to override the default filter for streams.
A conforming reader shall provide a standard Identity filter which shall pass the
data unchanged (see Table 26) to allow specific streams, such as document
metadata, to be unencrypted in an otherwise encrypted document. The stream's
DecodeParms entry shall contain a Crypt filter decode parameters dictionary (see
Table 14) whose Name entry specifies the particular crypt filter to be used (if
missing, Identity is used). Different streams may specify different crypt filters.
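The selection rule described in the list above can be sketched as follows. This is a minimal illustration, not a reader implementation: dictionaries are modelled as plain Python dicts with the leading SOLIDUS dropped from names, the function name is hypothetical, and falling back to Identity when StmF is absent is an assumption of the sketch.

```python
def crypt_filter_for_stream(stream_dict: dict, encrypt_dict: dict) -> str:
    """Return the name of the crypt filter that applies to one stream."""
    filters = stream_dict.get("Filter", [])
    parms = stream_dict.get("DecodeParms", [])
    # Filter and DecodeParms may each be a single value or an array.
    if not isinstance(filters, list):
        filters = [filters]
    if not isinstance(parms, list):
        parms = [parms]
    for index, name in enumerate(filters):
        if name == "Crypt":
            entry = parms[index] if index < len(parms) else None
            # A missing Name entry means the Identity filter.
            return (entry or {}).get("Name", "Identity")
    # No explicit Crypt filter: use the document default for streams.
    return encrypt_dict.get("StmF", "Identity")
```

A metadata stream that names Identity is therefore left unencrypted, while a page content stream with no Crypt filter falls back to the filter named by StmF.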
Authorization to decrypt a stream shall always be obtained before the stream can
be accessed. This typically occurs when the document is opened, as specified by a
value of DocOpen for the AuthEvent entry in the crypt filter dictionary.
Conforming readers and security handlers shall treat any attempt to access a stream
for which authorization has failed as an error. AuthEvent can also be FOpen, which
indicates the presence of an embedded file that is encrypted with a crypt filter that
may be different from the crypt filters used by default to encrypt strings and
streams in the document.
In the file specification dictionary (see 7.11.3, "File Specification Dictionaries"),
related files (RF) shall use the same crypt filter as the embedded file (EF).
A value of None for the CM entry in the crypt filter dictionary allows the security
handler to do its own decryption. This allows the handler to tightly control key
management and use any preferred symmetric-key cryptographic algorithm.
Security handlers may add their own private data to crypt filter dictionaries. Names
for private data entries shall conform to the PDF name registry (see Annex E).
EXAMPLE The following shows the use of crypt filters in an encrypted
document containing a plaintext document-level metadata
stream. The metadata stream is left as is by applying the Identity
crypt filter. The remaining streams and strings are decrypted
using the default filters.
%PDF-1.5
1 0 obj                                   % Document catalog
<< /Type /Catalog
   /Pages 2 0 R
   /Metadata 6 0 R
>>
endobj
2 0 obj                                   % Page tree
<< /Type /Pages
   /Kids [3 0 R]
   /Count 1
>>
endobj
3 0 obj                                   % 1st page
<< /Type /Page
   /Parent 2 0 R
   /MediaBox [0 0 612 792]
   /Contents 4 0 R
>>
endobj
4 0 obj                                   % Page contents
<< /Length 35 >>
stream
*** Encrypted Page-marking operators ***
endstream
endobj
5 0 obj
<< /Title (S#*#%*$#^&##) >>               % Info dictionary: encrypted text string
endobj
6 0 obj
<< /Type /Metadata
   /Subtype /XML
   /Length 15
   /Filter [/Crypt]                       % Uses a crypt filter
   /DecodeParms                           % with these parameters
      << /Type /CryptFilterDecodeParms
         /Name /Identity                  % Indicates no encryption
      >>
>>
stream
XML metadata                              % Unencrypted metadata
endstream
endobj
8 0 obj                                   % Encryption dictionary
<< /Filter /MySecurityHandlerName
   /V 4                                   % Version 4: allow crypt filters
   /CF                                    % List of crypt filters
      << /MyFilter0
            << /Type /CryptFilter
               /CFM /V2                   % Uses the standard algorithm
            >>
      >>
   /StrF /MyFilter0                       % Strings are decrypted using /MyFilter0
   /StmF /MyFilter0                       % Streams are decrypted using /MyFilter0
                                          % Private data for /MySecurityHandlerName
   /MyUnsecureKey (12345678)
   /EncryptMetadata false
>>
endobj
trailer
<< /Size 8
   /Root 1 0 R
   /Info 5 0 R
   /Encrypt 8 0 R
>>
startxref
495
%%EOF

7.7 Document Structure

7.7.1 General

A PDF document can be regarded as a hierarchy of objects contained in the body section of a PDF file. At
the root of the hierarchy is the document's catalog dictionary (see 7.7.2, "Document Catalog").

NOTE Most of the objects in the hierarchy are dictionaries. Figure 5 illustrates the structure of
the object hierarchy.

EXAMPLE Each page of the document is represented by a page object, a dictionary that includes
references to the page's contents and other attributes, such as its thumbnail image (12.3.4, "Thumbnail
Images") and any annotations (12.5, "Annotations") associated with it. The individual page objects are
tied together in a structure called the page tree (described in 7.7.3, "Page Tree"), which in turn is specified
by an indirect reference in the document catalog. Parent, child, and sibling relationships within the
hierarchy are defined by dictionary entries whose values are indirect references to other dictionaries.

The data structures described in this sub-clause, particularly the Catalog and Page dictionaries, combine
entries describing document structure with ones dealing with the detailed semantics of documents and
pages.

All entries are listed here, but many of their descriptions are deferred to subsequent sub-clauses.

7.7.2 Document Catalog

The root of a document's object hierarchy is the catalog dictionary, located by means of the Root entry in
the trailer of the PDF file (see 7.5.5, "File Trailer"). The catalog contains references to other objects
defining the document's contents, outline, article threads, named destinations, and other attributes. In
addition, it contains information about how the document shall be displayed on the screen, such as
whether its outline and thumbnail page images shall be displayed automatically and whether some
location other than the first page shall be shown when the document is opened. Table 28 shows the entries
in the catalog dictionary.

EXAMPLE The following shows a sample catalog object.

1 0 obj
<< /Type /Catalog
   /Pages 2 0 R
   /PageMode /UseOutlines
   /Outlines 3 0 R
>>
endobj

7.7.3 Page Tree

7.7.3.1 General

The pages of a document are accessed through a structure known as the page tree, which defines the order
of pages in the document. Using the tree structure, conforming readers, using only limited memory, can
quickly open a document containing thousands of pages. The tree contains nodes of two types: intermediate
nodes, called page tree nodes, and leaf nodes, called page objects, whose form is described in the subsequent
sub-clauses. Conforming products shall be prepared to handle any form of tree structure built of such nodes.

NOTE The simplest structure can consist of a single page tree node that references all of the
document's page objects directly. However, to optimize application performance, a conforming writer can
construct trees of a particular form, known as balanced trees. Further information on this form of tree can be
found in Data Structures and Algorithms, by Aho, Hopcroft, and Ullman (see the Bibliography).

NOTE The structure of the page tree is not necessarily related to the logical structure of the document;
that is, page tree nodes do not represent chapters, sections, and so forth. Other data structures are defined
for this purpose; see 14.7, "Logical Structure".

Conforming products shall not be required to preserve the existing structure of the page tree.

EXAMPLE The following illustrates the page tree for a document with three pages. See 7.7.3.3, "Page
Objects," for the contents of the individual page objects, and H.5, "Page Tree Example", for a more
extended example showing the page tree for a longer document.

2 0 obj
<< /Type /Pages
   /Kids [ 4 0 R
           10 0 R
           24 0 R
         ]
   /Count 3
>>
endobj

4 0 obj
<< /Type /Page
   ... Additional entries describing the attributes of this page ...
>>
endobj

10 0 obj
<< /Type /Page
   ... Additional entries describing the attributes of this page ...
>>
endobj

24 0 obj
<< /Type /Page
   ... Additional entries describing the attributes of this page ...
>>
endobj
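
A reader that needs the pages in document order can walk the tree depth-first, visiting Kids from left to right. The following Python sketch assumes a hypothetical in-memory model in which objects maps object numbers to dicts and indirect references are stored as plain object numbers; the cycle check is an extra safeguard added for the sketch.

```python
def iter_pages(objects: dict, root_ref: int):
    """Yield the object numbers of page objects in document order."""
    seen = set()

    def walk(ref):
        if ref in seen:
            raise ValueError(f"object {ref} appears twice in the page tree")
        seen.add(ref)
        node = objects[ref]
        if node.get("Type") == "Page":
            yield ref                       # leaf node: a page object
        else:
            for kid in node.get("Kids", []):
                yield from walk(kid)        # page tree node: descend in order

    return walk(root_ref)
```

Applied to the three-page tree in the example above, the walk starting at object 2 yields objects 4, 10, and 24 in that order. The same function handles deeper trees without change.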

In addition to the entries shown in Table 29, a page tree node may contain further entries defining
inherited attributes for the page objects that are its descendants (see 7.7.3.4, "Inheritance of Page
Attributes").

7.7.3.3 Page Objects

The leaves of the page tree are page objects, each of which is a dictionary specifying the attributes of a
single page of the document. Table 30 shows the contents of this dictionary. The table also identifies
which attributes a page may inherit from its ancestor nodes in the page tree, as described under 7.7.3.4,
"Inheritance of Page Attributes." Attributes that are not explicitly identified in the table as inheritable shall
not be inherited.

EXAMPLE The following shows the definition of a page object with a thumbnail image and two
annotations. The media box specifies that the page is to be printed on letter-size paper. In addition, the
resource dictionary is specified as a direct object and shows that the page makes use of three fonts named
F3, F5, and F7.

3 0 obj
<< /Type /Page
   /Parent 4 0 R
   /MediaBox [0 0 612 792]
   /Resources << /Font << /F3 7 0 R
                          /F5 9 0 R
                          /F7 11 0 R
                       >>
                 /ProcSet [/PDF]
              >>
   /Contents 12 0 R
   /Thumb 14 0 R
   /Annots [ 23 0 R
             24 0 R
           ]
>>
endobj

7.7.3.4 Inheritance of Page Attributes

Some of the page attributes shown in Table 30 are designated as inheritable. If such an attribute is omitted
from a page object, its value shall be inherited from an ancestor node in the page tree. If the attribute is a
required one, a value shall be supplied in an ancestor node. If the attribute is optional and no inherited
value is specified, the default value shall be used.

An attribute can thus be defined once for a whole set of pages by specifying it in an intermediate page
tree node and arranging the pages that share the attribute as descendants of that node.

EXAMPLE A document may specify the same media box for all of its pages by including a MediaBox
entry in the root node of the page tree. If necessary, an individual page object may override this inherited
value with a MediaBox entry of its own.
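
The lookup rule above amounts to following Parent references until a node supplies the attribute. A minimal Python sketch follows, assuming a hypothetical model in which objects maps object numbers to dicts and Parent holds the parent's object number. The sketch does not check whether the requested key is one that Table 30 marks as inheritable; a real reader would need that check.

```python
def inherited_attribute(objects: dict, page_ref: int, key: str, default=None):
    """Return the nearest value of key on the page or its ancestors."""
    ref = page_ref
    while ref is not None:
        node = objects[ref]
        if key in node:
            return node[key]          # the page's own entry overrides ancestors
        ref = node.get("Parent")      # the root node has no Parent entry
    return default                    # optional attribute with no inherited value
```

With a MediaBox entry on the root node only, every page resolves to that value, and a page that carries its own MediaBox resolves to its own entry instead.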

In a document conforming to the Linearized PDF organization (see Annex F), all page attributes shall be
specified explicitly as entries in the page dictionaries to which they apply; they shall not be inherited from
an ancestor node.

7.7.4 Name Dictionary

Some categories of objects in a PDF file can be referred to by name rather than by object reference. The
correspondence between names and objects is established by the document's name dictionary (PDF 1.2),
located by means of the Names entry in the document's catalog (see 7.7.2, "Document Catalog"). Each
entry in this dictionary designates the root of a name tree (see 7.9.6, "Name Trees") defining names for a
particular category of objects. Table 31 shows the contents of the name dictionary.

7.8 Content Streams and Resources

7.8.1 General

Content streams are the primary means for describing the appearance of pages and other graphical
elements. A content stream depends on information contained in an associated resource dictionary; in
combination, these two objects form a self-contained entity. This sub-clause describes these objects.

7.8.2 Content Streams

A content stream is a PDF stream object whose data consists of a sequence of instructions describing the
graphical elements to be painted on a page. The instructions shall be represented in the form of PDF
objects, using the same object syntax as in the rest of the PDF document. However, whereas the document
as a whole is a static, random-access data structure, the objects in the content stream shall be interpreted
and acted upon sequentially.

Each page of a document shall be represented by one or more content streams. Content streams shall also
be used to package sequences of instructions as self-contained graphical elements, such as forms (see
8.10, "Form XObjects"), patterns (8.7, "Patterns"), certain fonts (9.6.5, "Type 3 Fonts"), and annotation
appearances (12.5.5, "Appearance Streams").

A content stream, after decoding with any specified filters, shall be interpreted according to the PDF
syntax rules described in 7.2, "Lexical Conventions." It consists of PDF objects denoting operands and
operators. The operands needed by an operator shall precede it in the stream. See EXAMPLE 4 in 7.4,
"Filters," for an example of a content stream.

An operand is a direct object belonging to any of the basic PDF data types except a stream. Dictionaries
shall be permitted as operands only by certain specific operators. Indirect objects and object references
shall not be permitted at all.

An operator is a PDF keyword specifying some action that shall be performed, such as painting a
graphical shape on the page. An operator keyword shall be distinguished from a name object by the
absence of an initial SOLIDUS character (2Fh) (/). Operators shall be meaningful only inside a content
stream.

NOTE 1 This postfix notation, in which an operator is preceded by its operands, is superficially the
same as in the PostScript language. However, PDF has no concept of an operand stack as PostScript has.

In PDF, all of the operands needed by an operator shall immediately precede that operator. Operators do
not return results, and operands shall not be left over when an operator finishes execution.
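
The operand-before-operator rule can be illustrated with a deliberately simplified Python sketch. It assumes whitespace-separated tokens and recognizes only numbers and names as operands; strings, arrays, dictionaries, comments, and the boolean and null keywords are outside its scope. The function name is hypothetical.

```python
import re

NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")


def group_instructions(stream_text: str):
    """Pair each operator with the operands that immediately precede it."""
    instructions = []
    operands = []
    for token in stream_text.split():
        if token.startswith("/") or NUMBER.fullmatch(token):
            operands.append(token)              # name or number: an operand
        else:
            instructions.append((token, operands))  # no leading solidus: an operator
            operands = []
    if operands:
        raise ValueError("operands left over without an operator")
    return instructions
```

For the input "/F1 12 Tf 100 700 Td", the result pairs Tf with the operands /F1 and 12, and Td with 100 and 700. Because each operator consumes exactly the operands collected since the previous operator, no operand stack is needed.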

NOTE 2 Most operators have to do with painting graphical elements on the page or with specifying
parameters that affect subsequent painting operations. The individual operators are described in the
clauses devoted to their functions:

Clause 8, "Graphics" describes operators that paint general graphics, such as filled areas, strokes, and
sampled images, and that specify device-independent graphical parameters, such as colour.

Clause 9, "Text" describes operators that paint text using character glyphs defined in fonts.

Clause 10, "Rendering" describes operators that specify device-dependent rendering parameters.

Clause 14, "Document Interchange" describes the marked-content operators that associate higher-level
logical information with objects in the content stream. These operators do not affect the rendered
appearance of the content; they specify information useful to applications that use PDF for document
interchange.

Ordinarily, when a conforming reader encounters an operator in a content stream that it does not
recognize, an error shall occur. A pair of compatibility operators, BX and EX (PDF 1.1), shall modify this
behaviour (see Table 32). These operators shall occur in pairs and may be nested. They bracket a
compatibility section, a portion of a content stream within which unrecognized operators shall be ignored
without error. This mechanism enables a conforming writer to use operators defined in later versions of
PDF without sacrificing compatibility with older applications. It should be used only in cases where
ignoring such newer operators is the appropriate thing to do. The BX and EX operators are not themselves
part of any graphics object (see 8.2, "Graphics Objects") or of the graphics state (8.4, "Graphics State").
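
The effect of compatibility sections can be sketched by tracking the BX/EX nesting depth while scanning instructions. In this Python sketch the instructions are (operator, operands) pairs and known is the set of operators the reader recognizes; both representations and the function name are hypothetical. Treating an unbalanced BX or EX as an error is a choice made for the sketch.

```python
def apply_compatibility(instructions, known):
    """Drop unrecognized operators inside BX/EX; reject them elsewhere."""
    depth = 0
    accepted = []
    for operator, operands in instructions:
        if operator == "BX":
            depth += 1                        # enter a (possibly nested) section
        elif operator == "EX":
            if depth == 0:
                raise ValueError("EX without matching BX")
            depth -= 1
        elif operator in known:
            accepted.append((operator, operands))
        elif depth == 0:
            raise ValueError(f"unrecognized operator {operator!r}")
        # Otherwise: unrecognized operator inside a section, ignored without error.
    if depth != 0:
        raise ValueError("BX without matching EX")
    return accepted
```

An unknown operator bracketed by BX and EX is silently skipped, while the same operator outside any compatibility section raises an error.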

7.8.3 Resource Dictionaries

As stated above, the operands supplied to operators in a content stream shall only be direct objects;
indirect objects and object references shall not be permitted. In some cases, an operator shall refer to a
PDF object that is defined outside the content stream, such as a font dictionary or a stream containing
image data. This shall be accomplished by defining such objects as named resources and referring to them
by name from within the content stream.

Named resources shall be meaningful only in the context of a content stream. The scope of a resource
name shall be local to a particular content stream and shall be unrelated to externally known identifiers
for objects such as fonts. References from one object outside of content streams to another outside of
content streams shall be made by means of indirect object references rather than named resources.

A content stream's named resources shall be defined by a resource dictionary, which shall enumerate the
named resources needed by the operators in the content stream and the names by which they can be
referred to.

EXAMPLE 1 If a text operator appearing within the content stream needs a certain font, the content
stream's resource dictionary can associate the name F42 with the corresponding font dictionary. The text
operator can use this name to refer to the font.

A resource dictionary shall be associated with a content stream in one of the following ways:

• For a content stream that is the value of a page's Contents entry (or is an element of an array that is
the value of that entry), the resource dictionary shall be designated by the page dictionary's Resources
entry or shall be inherited, as described under 7.7.3.4, "Inheritance of Page Attributes," from some ancestor node of the
page object.

• For other content streams, a conforming writer shall include a Resources entry in the stream's
dictionary specifying the resource dictionary which contains all the resources used by that content stream.
This shall apply to content streams that define form XObjects, patterns, Type 3 fonts, and annotation appearances.

• PDF files written obeying earlier versions of PDF may have omitted the Resources entry in all form
XObjects and Type 3 fonts used on a page. All resources that are referenced from those forms and fonts
shall be inherited from the resource dictionary of the page on which they are used. This construct is
obsolete and should not be used by conforming writers.

In the context of a given content stream, the term current resource dictionary refers to the resource
dictionary associated with the stream in one of the ways described above.

Each key in a resource dictionary shall be the name of a resource type, as shown in Table 33. The
corresponding values shall be as follows:

• For resource type ProcSet, the value shall be an array of procedure set names

• For all other resource types, the value shall be a subdictionary. Each key in the subdictionary shall be
the name of a specific resource, and the corresponding value shall be a PDF object associated with the
name.

EXAMPLE 2 The following shows a resource dictionary containing procedure sets, fonts, and external
objects. The procedure sets are specified by an array, as described in 14.2, "Procedure Sets". The fonts are
specified with a subdictionary associating the names F5, F6, F7, and F8 with objects 6, 8, 10, and 12,
respectively.

Likewise, the XObject subdictionary associates the names Im1 and Im2 with objects 13 and 15
respectively.

<< /ProcSet [/PDF /ImageB]
   /Font << /F5 6 0 R
            /F6 8 0 R
            /F7 10 0 R
            /F8 12 0 R
         >>
   /XObject << /Im1 13 0 R
               /Im2 15 0 R
            >>
>>
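
Resolving a named resource is a two-level lookup: first the resource type, then the resource name. The Python sketch below models the resource dictionary of EXAMPLE 2 as nested dicts, with indirect references reduced to object numbers; the function name and this representation are assumptions made for illustration.

```python
def lookup_resource(resources: dict, category: str, name: str):
    """Return the object associated with a resource name in one category."""
    if category == "ProcSet":
        # ProcSet holds an array of names, not a subdictionary.
        raise ValueError("ProcSet entries are not looked up by name")
    try:
        return resources[category][name]
    except KeyError:
        raise KeyError(f"no /{category} resource named /{name}") from None


resources = {
    "ProcSet": ["PDF", "ImageB"],
    "Font": {"F5": 6, "F6": 8, "F7": 10, "F8": 12},
    "XObject": {"Im1": 13, "Im2": 15},
}
```

Here lookup_resource(resources, "Font", "F7") returns object number 10, and asking for a name that the current resource dictionary does not define raises KeyError.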

7.9 Common Data Structures

7.9.1 General

As mentioned at the beginning of this clause, there are some general-purpose data structures that are built
from the basic object types described in 7.3, "Objects," and are used in many places throughout PDF. This
sub-clause describes data structures for text strings, dates, rectangles, name trees, and number trees. More
complex data structures are described in 7.10, "Functions," and 7.11, "File Specifications."

All of these data structures are meaningful only as part of the document hierarchy; they may not appear
within content streams. In particular, the special conventions for interpreting the values of string objects
apply only to strings outside content streams. An entirely different convention is used within content
streams for using strings to select sequences of glyphs to be painted on the page (see clause 9, "Text").
Table 34 summarizes the basic and higher-level data types that are used throughout this standard to
describe the values of dictionary entries and other PDF data values.
