<?xml version='1.0' encoding='utf-8'?>
<html lang="en">
<h2>Manifest specification</h2>
<h3>Contents</h3>
<toc level='h2'/>
<h2>Introduction</h2>
<p>
Zero Install <i>implementations</i> are directory trees identified by an
algorithm name (e.g., "sha256"), and digest of their contents calculated using
that algorithm. Adding, deleting, renaming or modifying any file in a tree will
change its digest. It should be infeasibly difficult to generate a new tree
with the same digest as a given tree. Thus, if you know the digest of the
implementation you want, and someone gives you a tree with that digest, you can
trust that it is the implementation you want.
</p>
<p>
This document describes how a digest is calculated from a directory tree.
</p>
<p>
Changes to this document (and to the rest of the web-site) are <a href='https://sourceforge.net/p/zero-install/code/commit_browser'>tracked using Git</a>.
</p>
<h3>Algorithms</h3>
<p>
There are several different algorithms that can be used to generate a digest from a directory
tree, so an implementation's identifier includes the algorithm.
This allows new algorithms to be added easily if weaknesses are discovered in older ones.
The currently supported algorithms are:
</p>
<dl>
<dt>sha1=XXX</dt><dd>This is supported by all versions of 0install. It
is less secure than the format used with the other algorithms.</dd>
<dt>sha1new=XXX</dt><dd>Supported from version 0.20. This is the same hash as <b>sha1</b> but with
the new manifest format.</dd>
<dt>sha256=XXX</dt><dd>Supported from version 0.20 and also requires the <b>hashlib</b> Python module
to be installed.</dd>
<dt>sha256new_XXX</dt><dd>Supported from version 1.10. This is the same as <b>sha256</b>, except that the final digest is <a href='http://en.wikipedia.org/wiki/Base32'>base32-encoded</a> (without padding) to make it a bit shorter, and the separator character is "_" rather than "=", as pathnames containing "=" cause problems for some programs.</dd>
</dl>
<p>
When checking a new tree (e.g., that has just been downloaded from the net and
unpacked), 0install generates a 'manifest' file. The manifest lists every
file, directory and symlink in the tree, and gives the digest of each file's
content. Here is a sample manifest file for a tree containing two files
(<b>README</b> and <b>src/main.c</b>) and using the <b>sha1</b> algorithm (you can use
<b>0store manifest</b> to generate this):
</p>
<pre>
F 0a4d55a8d778e5022fab701977c5d840bbc486d0 1132502750 11 README
D 1132502769 /src
F 83832457b29a423c8e6daf05c6dbcba17d0514dd 1132502769 17 main.c
</pre>
<p>
If you generate a manifest file for a directory tree and find that it is identical
to the manifest file you want, then you can feel confident that you have the tree you
want. This is convenient, because the manifest file is much smaller than the packaged tree.
</p>
<p>
After checking, the generated manifest file is stored in a file called
<b>.manifest</b> in the top-level of the tree.
</p>
<p>
To save even more space, we can simply compare the <i>digests</i>
(rather than the <i>contents</i>) of these manifest files. For example, the digest of the three-line
manifest file above is given by:
</p>
<pre>
$ <b>sha1sum .manifest</b>
b848561cd89be1b806ee00008a503c63eb4ad56e
</pre>
<p>
When 0install adds a new archive to the cache, the top-level
directory is renamed to this final digest (so the <b>main.c</b> above would be
stored as <b>.../sha1=b848561cd89be1b806ee00008a503c63eb4ad56e/src/main.c</b>,
for example).
</p>
<p>
Because only the digest of the manifest is needed, it is not strictly necessary
to store the <b>.manifest</b> file at all. However, if the tree is modified later
somehow it can show you exactly which files were changed (rather than just
letting you know that the tree has changed in some unknown way).
</p>
<h2>Manifest file format</h2>
<p>
This description of the manifest file is based on Joachim's 12 Oct 2005 post to
the zero-install-devel list:
</p>
<p>
The manifest file lists, line by line, all nodes in a directory
identified as "/", without "/" itself. All relevant numbers are coded
as decimal numbers without leading zeros (unless 0 is to be coded, which
is coded as "0"). Times are represented as the number of seconds since
the epoch. Nodes are of one of their possible types: 'D', 'F', 'X', and
'S'. Names must not contain newline characters (the tree will be rejected if they
do).
</p>
<p>
The file itself is encoded as UTF-8, with Unix line-endings ("\n") and no
<a href='http://en.wikipedia.org/wiki/Byte-order_mark'>BOM</a>. Note that
some operating systems treat filenames as sequences of bytes (rather than as
sequences of characters), and thus may be able to handle filenames which cannot
be represented as strings. A Zero Install implementation cannot contain such
filenames.
</p>
<h3>Directories</h3>
<p>
'D' nodes correspond to directories, and their line format is:
</p>
<pre>"D", space, [mtime, space,] full path name, newline</pre>
<p>
So, top level directories, for example, would have a "full path name"
that matches the regular expression <b>^/[^/\n]+$</b>.
</p>
<p>
The modification time is only included when using the original <b>sha1</b>
algorithm, as it was not found to be useful and caused problems with many
archives.
</p>
<h3>Files</h3>
<p>
'F' and 'X' nodes correspond to files and executable files, respectively, and
their line formats are:
</p>
<pre>"F", space, hash, space, mtime, space, size, space, file name, newline
"X", space, hash, space, mtime, space, size, space, file name, newline</pre>
<p>
As opposed to directories, no full path names are given. Hence, file
names match <b>^[^/\n]+$</b> . The hash is the hexadecimal representation of
the digest of the contents of the respective file, using the same digest algorithm as
the manifest file itself. Hexadecimal digits a through f are used (rather than A
through F).
</p>
<h3>Symlinks</h3>
<p>
'S' nodes correspond to symbolic links, and their line format is:
</p>
<pre>"S", space, hash, space, size, space, symlink name, newline</pre>
<p>
The symlink name is given analogously to file names in 'F' and 'X'
nodes. The size of a symlink is the number of bytes in its target
(name). The hash sum, similarly to that of files, is the digest of
the target (name) of the respective symlink.
</p>
<h3>Other files types</h3>
<p>
It is an error for a tree to contain other types of object (such as device files).
Such trees are rejected.
</p>
<h3>Ordering</h3>
<p>
These lines appear in the order of a depth-first search. Within a directory,
regular files and symlinks come first (ordered lexicographically by their name, i.e.,
they appear in the order that "LC_ALL=C sort" would produce), and then each subdirectory
(again sorted in the same way).
</p>
<p>
For the original <b>sha1</b> algorithm the sort order is slightly different:
sub-directories, regular files and symlinks are all sorted together rather than
the subdirectories always coming after other types.
</p>
<p>
Implementations have to abide by these rules to the letter because such
a file is to be generated automatically and this process absolutely must
generate the same file that the directory tree packager had computed.
</p>
</html>