The ADABAS Buffer Pool Manager: Address Converter
Harald Schöning
Software AG, Uhlandstr. 12, 64297 Darmstadt
hsg@software-ag.de
index, and for the address converter, can be chosen independently. This flexibility in page size can also be used for an adaptation of the database to changing storage medium characteristics, as pointed out in [GG97].

As a consequence, the ADABAS buffer pool manager must cope with pages of various sizes from all three container types. In particular, the varying page sizes affect the buffer replacement algorithms.

Other commercial database systems do not have this freedom in configuration, and, as a consequence, do not need to handle such complex replacement problems. The research database system PRIMA [HMMS87], developed at the University of Kaiserslautern, also supports multiple page sizes. The replacement algorithm implemented there [Si88], however, differs from the one used in ADABAS. There, a free list is kept separate from the LRU chain. This list is then copied and the buffers contained in the LRU chain are consecutively marked as replaceable. When an area has been found which is large enough for the new buffer, the process stops. This algorithm, however, lacks the flexibility of the ADABAS algorithm described below.

1.3 Parallel access

The buffer pool manager has to ensure proper synchronization of the accesses to the pages in the buffer pool. Several threads which execute in parallel on multiple CPUs may want to access the same database page, and hence the same buffer. Of course, the synchronization has to be very efficient, not only avoiding deadlocks, but also keeping waiting times as short as possible.

The following sections discuss the architecture and the algorithms chosen for the new ADABAS buffer pool manager.

2 Architecture of the buffer pool manager

The buffer pool is allocated as a contiguous piece of memory. In order to avoid double page faults [EH84], i.e. page faults in the operating system’s virtual memory, the whole buffer pool can be pinned in physical memory. When a block from disk is read into the buffer pool, a header structure is assigned to it, which stores all information needed for the management of the block, including, of course, the identification of the database pages this block corresponds to. These headers themselves are allocated in the buffer pool in contiguous areas. Note that the variable page size in ADABAS makes it impossible to predict the number of headers needed (of course, the smallest possible page size determines an upper limit to the number of headers, but allocating that many headers in advance could waste a lot of space).

ADABAS directly references the pages in the buffer pool. Therefore, the address of a page must not change and the page must not be removed from the buffer pool while a command is still working on it. To guarantee this, database management systems usually FIX and UNFIX the pages in a buffer pool explicitly [EH84]. In the ADABAS buffer pool manager, this functionality is combined with the synchronization of page accesses by the ADABAS commands. For this purpose, each header contains a readers/writer lock.

The headers are linked in physical sequence (so-called physical chain) and in LRU sequence (LRU chain). Furthermore, to enable an efficient search for a specific database page, a hash structure is allocated. Each hash bucket contains the pointers to the corresponding headers and is protected by its own latch. Hence, lock conflicts on the hash structure are rare. The overflow of a hash bucket is organized as an AVL tree. Thus, even in the case where a bucket has a large amount of overflow information, the access remains fast, which is another prerequisite for low lock contention on the hash structure. The buffer pool architecture is depicted in Figure 1 (LRU chain not shown).

3 Page Access Synchronization

The access to a database page works as follows: The page is searched in the hash structure. The hash bucket is protected by a latch (a short-time mutual exclusion lock). If the page is not found, a header for the database page is allocated, exclusively locked, and entered into the hash structure such that other tasks have a reference to it. Then the physical I/O is started. When it is finished, the exclusive lock can be downgraded to a shared lock if the database page was needed for reading only.

If the page has been found in the hash structure, the buffer pool tries to acquire a lock of the requested quality (shared or exclusive) and, if successful, returns a pointer to the corresponding location in the buffer pool.

Locks on database pages are held until the command has performed its changes to the page, i.e. for a very short time only.

When database pages are logically linked (e.g. nodes in an index tree), and updates affecting this link have to be performed, more than one page has to be exclusively locked at a time. While deadlocks in such situations can often be prevented by enforcing a certain sequence of locking, this is not possible in all cases. For example, positioning in an index is done from root to leaf, while index updates occur from the leaf to the root. Hence, positioning and updating simultaneously could lead to a deadlock. Locking the whole index would lead to unacceptable waiting situations. To cope with this situation, the ADABAS buffer pool manager uses a multi-version locking scheme: when an exclusive lock is not granted, the buffer pool tries to acquire a shared lock. If this is granted, a copy of the database page is generated in
the buffer pool, and the pointer in the hash structure is set to this copy. Hence, all threads that subsequently search for this page will find the copy. The block containing the original page remains in the buffer pool, but is placed at the end of the LRU chain, thus being a prime candidate for replacement once all (shared) locks on it are released. Its access time stamp is set to zero.

[Figure 1: The architecture of the ADABAS buffer pool. Components shown: hash structure; buffer headers; contiguous buffer area containing database pages of varying sizes.]

A consequence of this locking scheme is that shared locks do not protect a logical link between pages against changes. Therefore, pages found by following a link always have to be re-evaluated before they can be used. Example 1 illustrates this effect.

Note that in high-load situations, more than one thread could copy the same database page. Only one of these copies must survive. For this purpose, the exchange in the hash structure for search must be atomic. It fails if the address to be replaced is not the expected one.

4 Saving Changes to Disk

The ADABAS buffer pool manager saves changed pages to disk in an asynchronous manner. When a certain (configurable) percentage of all pages has been modified, an asynchronous thread (called the buffer flush thread) starts to write all changed pages to the disk. Of course, the pages must not be modified while they are written to disk. On the other hand, it is not acceptable to defer changes to those pages until the writing has been done. If a page which is locked by the buffer flush thread is to be changed by another thread, the same multi-version locking as described above is applied. The pages involved in the buffer flush are locked by the buffer flush thread using a privileged read lock. If a page is currently write locked, it is entered into a refused-lock list and skipped. After all other pages are locked, the buffer flush thread blocks on the pages in the refused-lock list if necessary. Typically, the updating command that had held a lock on those pages has meanwhile released the lock.

The asynchronous writing of changed pages has some consequences:

• A page which has been chosen for replacement need not be written to disk before it is replaced, allowing a fast replacement.
• Pages cannot be replaced if they have been changed after the last buffer flush. Therefore, a buffer flush must occur before too many pages are “dirty”. On the other hand, flushing too early destroys the caching effect for updates because pages are written to disk after fewer updates per page. Obviously, finding a reasonable percentage of dirty pages for the start of a buffer flush is not trivial. In order to relieve the DBA from this task, ADABAS can choose a useful percentage and internally adapt it to the current situation.
• The crash recovery algorithms are tightly coupled to the asynchronous writing. The start and the end of the buffer flush are logged. From this logging information the crash recovery algorithm can infer which database changes are already reflected on disk and which are (possibly) not. Therefore it is essential that all changed pages are covered by the buffer flush. On the other hand, pages which are very frequently updated could defer the whole buffer flush considerably. To cope with such situations, the buffer flush can be split into a first part which contains all pages which could be locked without blocking on the lock, and a second part which flushes all the other pages (and usually handles very few pages).

5 Buffer Replacement Handling

As pointed out earlier, buffer replacement in ADABAS is quite sophisticated. If a page of a certain page size is to be read into the buffer pool, and the buffer pool is filled up (which is the normal case after an initial filling phase), the necessary space must be provided by selecting another buffer which can be overwritten. However, because of the varying page sizes in an ADABAS database, it might be necessary to overwrite several other pages of smaller page size. In databases with one fixed page size, the first available page when searching from the end of the LRU chain can be chosen for replacement. This is not true for ADABAS.

Consider the following case: A page with size 4 KB has to be read into the buffer pool. At the end of the LRU chain, only 2 KB pages can be found. The next 4 KB page is close to the beginning of the LRU chain, i.e., it is a rather new page. In this case, one of the 2 KB pages and its physical neighbor should be replaced. However, the physical neighbor might also be a very new page. To find good replacement candidates, the following handling is applied. The LRU chain is searched from its end. When a page is found which is available for replacement (i.e. contains no unwritten changes and is not locked), ADABAS searches for the necessary space starting from this page. The left neighbors are considered, as long as
they can be replaced and the necessary space is not yet gathered. Then, the right neighbors are checked. The space between the left-most neighbor found and the right-most one usually leaves several choices for so-called overlay sets, i.e. sets of buffers which could be replaced to gain the needed space (cf. Figure 2). The overlay set with the lowest cost is stored.

[Figure 2: Two overlay sets based on page 1567. Labels shown: LRU chain; buffer headers.]

Then the LRU search is continued, until an upper limit of found pages is reached, or until a single page is found which is large enough to provide the necessary space. This page is a singleton overlay set. The cheapest overlay set is chosen for replacement.

The cost of an overlay set is determined by applying a function to the access time stamps of the pages in the set. The choice of this function heavily influences the replacement algorithm. If the function is MIN, for example, the first overlay set found would be selected. If the function is SUM, the likelihood of a multi-buffer replacement decreases rapidly with the number of pages. In the case of MAX, an overlay set is chosen only if all its pages are older than the oldest replaceable single page of sufficient size.

Obviously, the search for replacement candidates is an operation which takes quite long due to the need to cope with different page sizes. Unfortunately, the LRU chain cannot be changed while such a search is performed. Since every access to a database page should update the LRU chain (placing the accessed page in front of the LRU chain), there is a considerable bottleneck. To avoid lock contention on the LRU chain, ADABAS splits the buffer pool into several physical regions, where each region has its own LRU chain. The number of regions depends on the size of the buffer pool, the maximum parallelism allowed by the DBA, and other criteria.

These physical regions are chosen for replacement in a round-robin manner. Only the affected LRU chain is locked. Furthermore, the updates to the LRU chain are deferred. Obviously, the updates need not be done before a replacement search is performed. Hence, the access to pages is memorized, but it is reflected in the LRU chain only when the lock for replacement search is required anyway. Note that in contrast to the procedure used in ORACLE [Br97], the ADABAS algorithm reflects the correct sequence of accesses in the LRU chain.

6 Prefetching

Sequential operations which scan the data of multiple adjacent database pages are quite common. In ADABAS, the DBA can optimize for such operations, e.g. by choosing large page sizes. However, such optimizations are complex and might decrease the performance of commands with different access patterns. Therefore, ADABAS recognizes sequential access, and can read several pages in one I/O into a contiguous buffer pool area. The replacement algorithm described above obviously covers this case without adaptation. Although read with one I/O, all pages have their own header and are managed by the buffer pool manager as if they had been read separately. The number of pages to be read in one I/O is dynamically determined according to the following criteria:

• Maximum number of pages needed by the current command
• Number of pages which fit into a single physical I/O. This is platform dependent, but it also depends on the distribution of the container files over disks.
• Next page which is already in the buffer pool. In order to avoid inconsistencies with updates on this page, the page must not be re-read into the buffer pool.
• Available space in the buffer pool

7 Summary

The ADABAS buffer pool manager has been completely redesigned for the latest parallel version of ADABAS. In particular, the locking of the LRU chain had been a bottleneck, mainly because the replacement algorithm is very complex due to the different page sizes used in a database. To prevent lock contention on the LRU chain, the chain has been split into several areas. Furthermore, the update of the LRU chain is done lazily. In order to increase parallelism and avoid deadlocks, in particular in index operations, dedicated multi-version locking protocols have been introduced. Various dynamic optimizations relieve the DBA from overly sophisticated tuning. The use of further self-tuning algorithms such as LRU-2 [OOW93] is constantly investigated.
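
As an illustration of the overlay-set selection described in Section 5, the following Python sketch may help. It is not the ADABAS implementation. It assumes that physical adjacency can be modelled by list positions, that every candidate overlay set contains the page found in the LRU search, and that a lower value of the cost function means a cheaper set. The names `Buffer`, `overlay_sets`, `choose_overlay` and the parameter `limit` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


@dataclass
class Buffer:
    size_kb: int       # page size of the buffer
    stamp: int         # access time stamp; smaller means older
    replaceable: bool  # no unwritten changes and not locked


def overlay_sets(pool: List[Buffer], start: int,
                 needed_kb: int) -> Iterator[Tuple[int, int]]:
    """Yield (first, last) index ranges around `start` that free `needed_kb`."""
    # Extend to the left while neighbors are replaceable and space is missing.
    lo, gathered = start, pool[start].size_kb
    while gathered < needed_kb and lo > 0 and pool[lo - 1].replaceable:
        lo -= 1
        gathered += pool[lo].size_kb
    # Then check the right neighbors in the same way.
    hi, gathered = start, pool[start].size_kb
    while gathered < needed_kb and hi + 1 < len(pool) and pool[hi + 1].replaceable:
        hi += 1
        gathered += pool[hi].size_kb
    # Each shortest window inside [lo, hi] that contains `start` is a candidate.
    for first in range(lo, start + 1):
        total = sum(b.size_kb for b in pool[first:start])
        for last in range(start, hi + 1):
            total += pool[last].size_kb
            if total >= needed_kb:
                yield first, last
                break


def choose_overlay(pool: List[Buffer], lru_oldest_first: Iterable[int],
                   needed_kb: int,
                   cost: Callable[[Iterable[int]], int] = max,
                   limit: int = 8) -> Optional[Tuple[int, int, int]]:
    """Return (cost, first, last) of the cheapest overlay set, or None."""
    best = None
    examined = 0
    for idx in lru_oldest_first:          # search from the end of the LRU chain
        if not pool[idx].replaceable:
            continue
        examined += 1
        for first, last in overlay_sets(pool, idx, needed_kb):
            c = cost(b.stamp for b in pool[first:last + 1])
            if best is None or c < best[0]:
                best = (c, first, last)
        # Stop at a singleton overlay set or at the upper limit of found pages.
        if pool[idx].size_kb >= needed_kb or examined >= limit:
            break
    return best


# The 4 KB example from the text: old 2 KB pages, one with a very new neighbor.
pool = [Buffer(2, 1, True), Buffer(2, 50, True), Buffer(2, 2, True),
        Buffer(2, 3, True), Buffer(4, 40, True)]
lru = [0, 2, 3, 4, 1]                     # indices ordered oldest first
print(choose_overlay(pool, lru, 4, cost=max))
print(choose_overlay(pool, lru, 4, cost=min))
```

In this toy pool, MAX avoids the set that contains the very new neighbor and picks the two old adjacent 2 KB pages, whereas MIN keeps the first overlay set found, which matches the behavior described above.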
Example 1 (step a): Thread 1 wants to insert the value 20 into the index. It locks leaf 3 exclusively (marked X(thr. 1) in the figure).
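
The atomic exchange in the hash structure mentioned in Section 3 (it fails if the address to be replaced is not the expected one) behaves like a compare-and-swap. The following Python sketch shows the idea only; it emulates atomicity with a lock because Python offers no native compare-and-swap on object references. The class `HashEntry` and the function `install_copy` are hypothetical names, and representing a page as a `bytearray` is an assumption made for the illustration.

```python
import threading


class HashEntry:
    """One pointer in the hash structure, exchanged atomically (hypothetical model)."""

    def __init__(self, block):
        self._block = block
        self._latch = threading.Lock()   # stands in for an atomic instruction

    def current(self):
        with self._latch:
            return self._block

    def exchange(self, expected, new):
        """Point the entry at `new` only if it still points at `expected`."""
        with self._latch:
            if self._block is not expected:
                return False             # another thread installed its copy first
            self._block = new
            return True


def install_copy(entry, original):
    """Copy a page and try to publish the copy; return the block that survived."""
    copy = bytearray(original)           # private copy of the page contents
    if entry.exchange(original, copy):
        return copy
    return entry.current()               # lost the race: use the surviving copy


original = bytearray(b"page")
entry = HashEntry(original)
first = install_copy(entry, original)    # succeeds and publishes its copy
second = install_copy(entry, original)   # fails the exchange, reuses the winner
print(first is second)
```

Both callers end up with the same surviving copy, so only one copy of the page is reachable through the hash structure, while the original block stays untouched for the threads that still hold shared locks on it.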