<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!--
* CodeSnip File Format Documentation: Main Database Update Data Stream
*
* $Rev$
* $Date$
-->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>
CodeSnip File Format Documentation - Main Database Update Data Stream
</title>
<link
rel="stylesheet"
type="text/css"
media="screen"
href="main.css"
/>
<style type="text/css">
div.flowchart {
width: 26em;
background-color: #eee;
text-align: center;
padding: 1em;
margin: 1em auto;
border: 1px silver solid;
}
div.flowchart .box {
width: 25em;
pading: 0.5em;
background-color: white;
border: 1px silver solid;
}
div.flowchart .label {
}
</style>
</head>
<body>
<div class="title">
<div>
DelphiDabbler CodeSnip
</div>
<div class="subtitle">
File Format Documentation
</div>
</div>
<h1>
Main Database Update Data Stream
</h1>
<p class="todo">
<strong>TODO:</strong>
Add list of links to sections of this document and to any other related
documents.
</p>
<h2>
Introduction
</h2>
<p>
The Database Update Data Stream is a stream of data received from the CodeSnip
Database Update web service that is used to update the local copy of the main
database.
</p>
<p>
The stream is plain text and consists of a concatenation of text files from
the online database along with some housekeeping information. The text files
are recreated in the main database directory.
</p>
<h2>
Encoding
</h2>
<p>
The data stream is received from the web server in a single syte or multi byte
ANSI encoding. The encoding must be such that characters from the ASCII
character set occupy one byte each. Therefore encodings that use two bytes for
such characters, such as UTF-16, cannot be used.
</p>
<p>
The actual encoding used is determined by the web server should be specified
in HTTP header. If the HTTP headers do not specify the encoding then
ISO-8859-1 is assumed.
</p>
<p>
The encoding used for the files recreated in the main database directory is
UTF-8 with byte order mark. <span class="todo">TODO: Reference main database
document here.</span>
</p>
<p>
Data is converted between several formats on its journey from the web server
to the final database file. See the <a href="#appendix">appendix</a> for
details.
</p>
<h2>
Stream Format
</h2>
<p>
The stream contains structured plain text comprising both numeric and string
information. Variable length strings are preceded by numeric values that
indicate the length of the following string. Numeric values are encoded as hex
characters. The format is as follows:
</p>
<dl>
<dt>
<code>FileCount</code>
</dt>
<dd>
Number of files encoded in the data stream. SmallInt encoded as four hex
digits. Maximum number of files is 32,767.
</dd>
</dl>
<p>
Followed by <code>FileCount</code> file records of:
</p>
<dl>
<dt>
<code>Name</code>
</dt>
<dd>
Name of file without path information. AnsiString preceded by its size in
bytes as a SmallInt encoded as four hex digits.
</dd>
<dt>
<code>UnixDate</code>
</dt>
<dd>
File's modification date (GMT) in Unix format. Int64 encoded as 16 hex
digits.
</dd>
<dt>
<code>Content</code>
</dt>
<dd>
File contents. AnsiString preceded by its size in bytes as a SmallInt
encoded as four hex digits. File size is limited to 32kB.
</dd>
</dl>
<h2 id="appendix">
Appendix: Description of Data Encoding Conversions
</h2>
<p>
The following flowchart show the various encodings used for downloaded data on
its journey from web server to main database file.
</p>
<div class="flowchart">
<div class="box">
Text sent from web server using a single or multi-byte ANSI encoding.<br />
Encoding used sent in HTTP header.
</div>
<div class="label">
↓
</div>
<div class="label">
ANSI text stream
</div>
<div class="label">
↓
</div>
<div class="box">
CodeSnip's HTTP handling code automatically converts ANSI text stream into
Unicode string using encoding specified in HTTP header.
</div>
<div class="label">
↓
</div>
<div class="label">
Unicode string
</div>
<div class="label">
↓
</div>
<div class="box">
Database download manager code converts Unicode string back into ANSI text
stream with same encoding in which it was sent from web server.
</div>
<div class="label">
↓
</div>
<div class="label">
ANSI text stream
</div>
<div class="label">
↓
</div>
<div class="box">
File updater interprets information stored in formatted ANSI text stream and
get contents of each file, converting them to Unicode.
</div>
<div class="label">
↓
</div>
<div class="label">
Unicode string
</div>
<div class="label">
↓
</div>
<div class="box">
File writer finally writes each file as UTF-8 with a BOM.
</div>
<div class="label">
↓
</div>
<div class="label">
UTF-8 stream
</div>
<div class="label">
↓
</div>
<div class="box">
UTF-8 text files.
</div>
</div>
</body>
</html>