Inside MaxDB
Contents:
◼ Deeper insight into the technical background of MaxDB
Inside MaxDB
Topic 1: Processes
Topic 2: Locks
Topic 3: Memory Areas
Topic 4: Disk Areas
Topic 5: Logging
[Diagram: MaxDB architecture overview with kernel threads and tasks (trace writer, log writer, requester, garbage collector, server, pager, user, utility, timer, IO threads), caches (catalog cache, SharedSQL), and disk areas (DATA, LOG, /sapdb (programs etc.))]
[Diagram: process structure with the coordinator, requester, console, and clock (timer) threads; user kernel threads (UKTs) containing user, server, pager, log writer, trace writer, garbage collector, and utility tasks; and the IO threads (IO workers)]
◼ The MaxDB database kernel runs as a process that is divided into threads. Threads can be active in
parallel on multiple processors within the operating system.
◼ A distinction is made between user kernel threads (UKTs) and special threads.
◼ User kernel threads (UKTs) each consist of multiple tasks (internal tasking) that fulfill different requirements. This internal tasking allows the tasks to be coordinated more efficiently than if the operating system scheduled each task as a separate thread.
◼ The runtime environment (RTE) defines the structure of the process and user kernel threads.
[Diagram: coordinator (restart), requester (client connect), and console (knlMsg) threads]
◼ When the runtime environment is started (that is, when the database instance is started in the ADMIN state), the coordinator thread is generated first. This thread has special significance:
• When it starts, the coordinator thread uses database parameters to determine the memory and
process configuration of the database instance.
• The coordinator thread also coordinates the start processes of the other threads and monitors
these while the database instance is running.
• If operating errors occur, the coordinator thread can stop other database threads.
◼ The requester thread receives user process logons to the database and assigns them to a task within a
user kernel thread.
◼ The console thread collects important messages from the other threads for the administrator and
places them in the knlMsg file, which is stored in the run directory of the database instance.
◼ The clock or timer thread is used to calculate times internally, for example, to determine how long it
took to execute an SQL statement.
[Diagram: user tasks in UKTs processing SQL statements such as SELECT * FROM tab WHERE col1 = 5; parameters MaxUserTasks (in this example = 6) and MaxUserTaskStackSize (usually = 1024)]
◼ Exactly one user task is assigned permanently to every user or application process at logon.
The maximum number of available user tasks depends on the MaxUserTasks database parameter.
This parameter therefore also limits the number of user sessions that can be active simultaneously in
the database system.
◼ The MaxCPUs database parameter specifies the number of user kernel threads across which the user
tasks are distributed. The user tasks generate the main workload. Other tasks and the global threads
use very little CPU time. You can therefore limit the number of processors used in parallel by the
database instance using the MaxCPUs parameter.
◼ However, the CPU load is usually about 10-30% of the overall load.
◼ With the MaxUserTaskStackSize database parameter (default: 1024 KB), you can estimate the memory consumption based on the number of user tasks. If the address space or available memory is restricted, the number of user tasks should therefore not exceed the recommendation of twice the number of work processes in the connected application instances.
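◼ To check these values on a running instance, you can read them via the DBM server. A minimal sketch, assuming the param_directget DBM command and placeholder credentials:
dbmcli -d <SID> -u <dbmuser>,<password> param_directget MaxUserTasks
dbmcli -d <SID> -u <dbmuser>,<password> param_directget MaxCPUs
dbmcli -d <SID> -u <dbmuser>,<password> param_directget MaxUserTaskStackSize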
[Diagram: cooperative multitasking in a UKT over time; while one task's I/O request is handled by the IO threads, other tasks of the same UKT run; task states shown as running, runnable, and waiting]
◼ Cooperative multitasking is carried out within the user kernel thread, while the operating system
executes preemptive multitasking between the threads.
◼ This means that only one task can ever be active within a thread; the other user tasks in the same UKT have to wait until the active task yields control before they can run.
◼ With MaxDB 7.5.0, a new parameter has been introduced that can change this behavior and also
switch to preemptive multitasking within the UKTs. However, this parameter
(UseCoroutineForUserTask) should only be set to NO on liveCache instances or databases that also
perform liveCache functions (see also SAP Note 1038709 - FAQ: SAP MaxDB/liveCache OneDB).
• Advantage: Allows forced interruption of long-running user tasks with liveCache application
functions (LCApps).
• Disadvantage: Stripes (regions) in the database can experience higher collision rates when user tasks access them (other databases refer to these regions as latches).
◼ In the liveCache environment, this function is often used to move user tasks between UKTs. The liveCache application routines (LCApps) can run for hours and process a lot of data from the cache, so it can be worthwhile to move such tasks to idle UKTs and thus to idle CPUs.
◼ After the switch process, the entire context (memory and so on) of the user task belongs to the
environment of the new user kernel thread.
◼ Database parameters that are used here:
• LoadBalancingCheckLevel – Load balancing is activated with values from 4 to 600 seconds. The
value is the wait time between measurement points.
• LoadBalancingWorkloadDeviation – Describes the required difference (percentage) in the
internal runtime of UKTs, below which the UKTs are considered equal (default 5%).
• LoadBalancingWorkloadThreshold – Load difference (percentage) between UKTs before load
balancing is even considered (default 10%).
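◼ For illustration, load balancing could be enabled by setting the check interval; a sketch assuming the param_directput DBM command (the value 60 is a hypothetical interval in seconds):
dbmcli -d <SID> -u <dbmuser>,<password> param_directput LoadBalancingCheckLevel 60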
◼ More information about this topic is covered in TEWA60 (liveCache Monitoring and Performance
Optimization).
[Diagram: server tasks executing an asynchronous DROP TABLE/INDEX via the IO threads]
◼ Server tasks are used to carry out administrative tasks in the MaxDB instance. Some server tasks
control how data is read from the data volumes, while others control how data is written to the
backup medium.
◼ When the CREATE INDEX statement and asynchronous DROP INDEX or DROP TABLE statements are executed, the server tasks receive a request to read the table data in parallel from the data volumes or to delete it asynchronously. The server tasks also handle the statements executed to check data (Verify).
◼ As of MaxDB 7.6.05, the server tasks are also responsible for prefetching. This function will soon be available in MaxDB 7.7 as well (as of MaxDB 7.7.06).
◼ The system automatically calculates the number of server tasks when the database instance is
configured based on the number of data volumes and anticipated number of backup devices.
You can set a higher number using the MaxServerTasks parameter.
◼ As usual, the read and write actions are not completed by the server task itself. Instead, they are
transferred to the IO threads.
◼ With MaxDB 7.5, the server tasks can also be distributed to more UKTs (that is, more CPUs) in
special situations where the server tasks only block each other. The MaxDB parameter used here is
EnableMultipleServerTaskUKT. This option is mainly used for migrations.
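◼ To observe which server tasks exist and what they are currently doing, the database console can list all tasks; a minimal sketch, assuming the show tasks console command:
x_cons <SID> show tasks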
"AutoLgW" "Autosave Log Writer" Writes Log pages to the backup medium
"AutoLgW" "Autosave Log Reader" Reads Log pages from the log volumes
"BUPmed" "Backup / Restore Medium Task" Reads/Writes from/to a data backup medium
"BUPvol" "Backup / Restore Volume Task" Reads/Writes from/to data volumes for backup
◼ List of possible server task types in the task list of the MaxDB Task Manager.
◼ You can see the numerous prefetch task types for preparing diverse processes within the database.
"RedoRea" "RedoLog Reader" Reads from log volume or log file during restore
"StdbySy" "Standby Synchronize Server" Waits for sync calls from a Hot Standby master
◼ Continuation of list.
[Diagram: pager tasks writing to the DATA volumes via the IO threads (IO workers); the timer task handling session timeouts]
◼ Pager tasks are responsible for writing data from the IO buffer cache to the data volumes. They
become active when a savepoint is declared.
The number of pagers is calculated by the system. The number depends primarily on the size of the
IO buffer cache and the number of data volumes.
◼ The timer task is used to manage all kinds of timeout situations (such as a session timeout for
connections, request timeout for locks, and so on).
[Diagram: UKTs with pager tasks, floating services, and the log writer, which writes to the LOG volumes via the IO threads]
◼ Floating services can assume many different functions in the database. The two most important
functions are:
• DBAnalyzer – This enables the database to be monitored extensively and the results to be logged.
• Event management – Enables management functions for operating the database to be triggered
automatically and autonomously, for example, database extension functions and update table
statistics can be executed automatically. However, these functions have not been used to date in
the SAP environment.
◼ The log writer is responsible for writing the after images to the log volumes. It also uses the IO
threads to execute the actual write operations.
[Diagram: UKTs with pager tasks, floating services, the log writer, and the garbage collector (in its own UKT) working on the DATA volumes]
◼ The familiar garbage collector from liveCache has now been introduced as of MaxDB 7.6 and is
responsible for cleanup actions within the database. It removes superfluous data (history, that is,
before images of transaction logic) from the database. In earlier versions, the user task was
responsible for this, but it can now be executed asynchronously.
◼ See also SAP Note 1247666 – (FAQ: MaxDB/liveCache history pages)
[Diagram: UKTs with pager tasks, floating services, and the log writer]
◼ MaxDB provides the option of writing a special log (called the kernel trace) for diagnosis purposes.
If you activate this function, the trace writer task becomes active.
The log data is written to a buffer by the active tasks. The trace writer task writes the data from this
buffer to the knltrace file.
◼ The utility task is reserved exclusively for managing the database instance.
Since only one utility task exists for each database instance, administrative tasks cannot be carried
out in parallel. This prevents conflicts from occurring.
This does not include automatic log backups, which can be performed in parallel to other
administrative tasks. The utility task is used only to activate and deactivate the function for backing
up logs automatically.
◼ In MaxDB 7.5.0 and later, the utility task becomes less and less important and is provided only for
backwards compatibility purposes. The actions previously performed with the utility task can also
be coordinated directly via the DBMServer process by means of db_execute commands. This
process has an internal table that defines which action can be run in parallel with which other
activity.
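◼ For illustration, an SQL or utility statement can be issued directly through the DBM server; a sketch assuming the db_execute DBM command mentioned above and placeholder credentials:
dbmcli -d <SID> -u <dbmuser>,<password> db_execute select * from lockliststatistics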
Here is an example of how tasks are arranged for a small database instance in Linux:
◼ Here, you see a minimum configuration on Linux, as created without any intervention from the
administrator.
◼ The output can also be determined with the following console command:
x_cons <SID> show rte
or via the DBM server: dbmcli -d <SID> -u <controluser>,<password> db_cons show rte
UNIX
◼ vserver (watchdog)
◼ vserver
◼ vserver
◼ …
◼ vserver (n+1)
x_show
◼ Shows the status of X Server at the end
◼ Access to the database via the network is handled by the vserver (UNIX) or serv.exe (Microsoft
Windows) processes. You can start these server processes manually using the x_server command.
However, they are normally started automatically when the system is started. X server runs as a
service on Windows.
◼ A new X server process is created for every user process that logs on to the database remotely. The generating process serves the user; the new process waits for the next user logon. Depending on the platform, these are implemented either as separate processes (UNIX) or as threads within the X server process (Linux, Microsoft Windows).
◼ By default, remote accesses are allowed as soon as X server is started. With the xtcpupd program,
however, you can activate or deactivate "remote SQL" access on Windows, that is, allow or deny
access to the database instances via a network.
◼ You can determine the status of X server using the x_show tool. The status information is displayed
at the end of the output.
◼ With the –r and –v options of x_show, you can obtain more information about X server on
Windows.
◼ For analysis purposes, you can also activate a trace level for X server while it is running:
x_server -N <trace level 0…9>
In this way, for example, you can check which accesses are made to the database instance. This information is logged in the file /sapdb/data/wrk/xserver_<hostname>.prt.
◼ To use the SSL tunnel, additional encryption libraries from SAP are required. These are delivered
on special media. The usual export restrictions with regard to “strong encryption” apply. The
libraries can be found on SAP Service Marketplace at http://service.sap.com/swdc → Downloads →
SAP Cryptographic Software (SAP Note 455033).
◼ If the libraries are not found, the X server still works but a warning message is displayed (WNG
12453 NISSLSRV NISSL Init: SSL: Could not locate licence file). For more information, see SAP
Note 1032643.
◼ Activating encryption affects performance (about 20% more workload).
Inside MaxDB
Topic 1: Processes
Topic 2: Locks
Topic 3: Memory Areas
Topic 4: Disk Areas
Topic 5: Logging
Lock list
◼ Row locks (row_exclusive)
◼ Table locks (tab_exclusive)
◼ Catalog locks (sys_exclusive)
◼ With the SQL command select * from lockliststatistics, you can determine more information from lock
management:
DESCRIPTION VALUE
MAX LOCKS 300000
TRANS LIST REGIONS 8
TABLE LIST REGIONS 8
ROW LIST REGIONS 8
ENTRIES 902400
USED ENTRIES 0
USED ENTRIES (%) 0
AVG USED ENTRIES 15
AVG USED ENTRIES (%) 0
MAX USED ENTRIES 25500
MAX USED ENTRIES (%) 3
OMS SHARE LOCK CONTROL ENTRIES 0
OMS SHARE LOCK CONTROL ENTRIES USED 0
OMS SHARE LOCK ENTRIES 0
OMS SHARE LOCK ENTRIES USED 0
OMS SHARE LOCK COLLISION ENTRIES USED 0
CONSIST VIEW ENTRIES 0
OPEN TRANS ENTRIES 0
LOCK ESCALATION VALUE 60000
LOCK ESCALATIONS 0
SQL LOCK COLLISIONS 63
OMS LOCK COLLISIONS 0
DEADLOCKS 0
SQL REQUEST TIMEOUTS 0
OMS REQUEST TIMEOUTS 0
TRANSACTIONS HOLDING LOCKS 0
TRANSACTIONS REQUESTING LOCKS 0
TRANSACTIONS HOLDING OMS LOCKS 0
CHECKPOINT WANTED FALSE
SHUTDOWN WANTED FALSE
SHARED lock
◼ Other transactions can read objects with a SHARED lock but cannot change them.
EXCLUSIVE lock
◼ Other transactions can still read the data records, but they risk reading data that is in the process of being changed.
[Table: lock compatibility matrix showing, for each existing lock type, whether another transaction can, for example, set a SHARED lock on a row of the table or read the table definition from the catalog]
Lock list
◼ The table is then locked completely and exclusively for the process.
◼ The LOCK ESCALATION VALUE in the LOCKLISTESCALATION system table defines the
point at which a lock is escalated. Here, this value corresponds to 60,000 (20% of the MAX
LOCKS, which is 300,000).
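◼ To watch for escalations at runtime, you can query the LOCKLISTSTATISTICS view shown earlier; a minimal sketch:
select DESCRIPTION, VALUE from LOCKLISTSTATISTICS where DESCRIPTION like 'LOCK ESCALATION%'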
Lock list
◼ Increase the size of the lock list; up to 5 million entries are common.
◼ Integrate COMMITs into the batch input or report.
[Diagram: lock list usage by a table and by a process relative to the escalation threshold (20%)]
◼ Very long lock lists slow down processes involving data records since these lists have to be checked
every time a data record is accessed to determine whether a record is locked. However, if the lock
list does not contain any locks, it is not a problem.
◼ As of MaxDB 7.7.04, table indexes can be created without the previous lock situations on the base
table.
◼ This procedure applies to larger tables only. In the case of smaller base tables, only a short time is
usually needed to create the indexes and so the temporary catalog lock does not affect them.
◼ In the case of larger tables, the change requests for the base tables are written to temporary
structures while the indexes are created and then finally processed when the indexes have been
created.
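◼ For illustration, such an index is created with the standard syntax (hypothetical table, index, and column names):
CREATE INDEX IDX_ORDER_DATE ON ZORDERS (ORDER_DATE)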
◼ In the knlMsg diagnosis file, this procedure can be followed as shown above.
Inside MaxDB
Topic 1: Processes
Topic 2: Locks
Topic 3: Memory Areas
Topic 4: Disk Areas
Topic 5: Logging
[Diagram: MaxDB architecture overview with kernel threads and tasks, caches (catalog cache, SharedSQL), and disk areas (DATA, LOG, /sapdb (programs etc.))]
◼ The database has buffers to keep the number of read and write operations on the disks as low as
possible.
• The IO buffer cache contains the data most recently requested from the data volumes (LRU
mechanism). This data contains:
- Data pages from the tables
- Converter information with logical positions of data pages in the data volumes
- Before images (history data) for the transactions that are currently running
• Log entries are written to the log volumes via the log queue.
• The catalog cache contains information about the structure of tables. The catalog cache is
assigned dynamically to every session by UKTs and is a process-local memory.
◼ Cache sizes can be set according to user requirements.
◼ The memory requirement of a MaxDB instance can be determined from the
ALLOCATORSTATISTIC table. This table stores information about different memory structures of
MaxDB. The overall size including the IO buffer cache, other caches, task stacks, and so on (but not
executables) can be found in the SystemHeap row and the USED_BYTES column. The
corresponding SQL statement is (ensure that 'SystemHeap' is written correctly):
select USED_BYTES from ALLOCATORSTATISTIC where ALLOCATOR = 'SystemHeap'
[Diagram: the IO buffer cache (converter pages, data pages, undo pages, free pages), the catalog cache, the shared SQL cache, and the log queue in front of the LOG volumes]
The IO buffer cache is synchronized with the data volumes by means of SAVEPOINTs.
In contrast, the log is written continuously and even synchronously in the case of a COMMIT.
The catalog cache and shared SQL cache are process buffers and so do not have any
corresponding disk areas.
◼ Experience has shown that a database can function well with these memory provisions for MaxDB. This does not apply to applications with functions that read large amounts of data; their memory requirements have to be evaluated differently.
[Diagram: pages 1 to 7 in the I/O buffer cache]
◼ In the data volume (that is, in the converter), logical page numbers are mapped to physical page
addresses. The I/O buffer cache contains the last read or write-accessed pages of the converter. It
is used by all concurrently active users.
◼ The size of the converter is about 1/1861 of the database size, since one converter page can hold the administrative data (references) for 1861 data pages. A database with 500 GB of data therefore requires a converter of roughly 275 MB (512,000 MB / 1861).
◼ As of Version 7.4, the converter can grow and shrink dynamically. The converter pages are
distributed across all data volumes. Upon restart, the converter pages are read via a tree structure.
◼ The tree has three levels: a root level, an index level, and a leaf level. When restarted, the database
finds the root page of the converter via the restart page at the beginning of the first data volume. It
contains the positions of the index pages. In turn, the index pages contain positions of the leaf pages.
The leaf pages are not necessarily sorted. When started, the database reads the converter in parallel.
Page addressing:
◼ MaxDB numbers the data pages and assigns a physical page address to them in the data volume.
This assignment is managed in the converter. The converter is part of the I/O buffer cache.
◼ Through the use of shadow memory technology, a current version of each data page can exist as
well as an old version that is required for a possible restart. This means that two converter entries
can also exist for each data page: the address of the old and new version. There is a 35-bit converter
entry with the following information for each of the versions of a logical page:
• Device number (8 bit): Data volume number
• Device offset (24 bit): Position in the relevant data volume
• Page type (1 bit): Permanently (p) or temporarily (t) required
• Save pages (1 bit): This indicator is set if the data page is to be backed up during an
incremental data backup (SAVEPAGES).
• Saved (1 bit): This indicator is set during the backup.
◼ At a savepoint, the converter "notes" the address of the current page version; the address of the old
version is released again for overwriting after the savepoint.
◼ 1861 pages of the data volume can be managed with a converter page of 8 KB. The size of the
converter is therefore approximately 1/1861 of the database size.
◼ Shifting of synchronous operations to asynchronous execution; user tasks do not have to wait for the end of I/O operations
◼ Increased use of cache technologies (converter)
◼ Reduction of write load; changed data pages are written from the cache to the hard disks only at savepoints
◼ The database status is always structurally consistent (restartable) on the hard disks; if errors occur, you can return to this status
[Diagram: a change to page 4711 in memory is written to disk as a new page 4711' at the savepoint, so two versions of one page exist; a page without changes is not rewritten]
◼ The I/O concept functions according to the principle of shadow memory management. The key points are: optimized support of symmetrical multiprocessor systems, shifting as much I/O as possible to asynchronous execution, and optimizing data backup performance to a level that can handle today's database sizes.
◼ A user task should not have to wait until I/O operations have ended. All change operations are
carried out in the main memory. The I/O subsystem must ensure that a full restart is possible at any
time.
◼ Shadow memory management distinguishes between original and copied data pages. When the
system is restarted, the respective statuses of the data pages are identified. The concept is based on
savepoint cycles that are completed by means of savepoints. A completed cycle is specified by the
version number of its savepoint. This number is referred to as the "savepoint version" or "converter
version".
◼ The different versions of the data pages created by these savepoint cycles are managed in the
"converter". The converter assigns physical blocks to the original and copied logical data pages. The
location in which a logical data page is saved can therefore change from one savepoint cycle to the
next.
◼ The data cache has been optimized to support symmetrical multiprocessor architectures (SMP)
through the use of pager tasks and server tasks working in parallel.
[Diagram: events that trigger a savepoint, such as time control (default), startup/shutdown, backups, CREATE INDEX, LOG FULL, DB FULL, and restart]
[Diagram: two savepoint cycles (savepoint versions 25 and 26); pages changed in the I/O buffer cache and the converter cache are flushed to the data volumes at each savepoint; savepoints occur every 10 minutes or longer]
◼ The savepoint can be considered a core function of the I/O concept. The illustration shows what
happens during a savepoint.
◼ The savepoint has to flush the data cache and converter cache to the corresponding data volumes.
Due to the size of the two caches, this cannot be carried out as a synchronous action since the
system would be blocked for too long. However, there must be a phase (as short as possible) during
which the caches can be securely synchronized.
◼ Savepoints are time based and occur every 10 minutes by default. To minimize the amount of data
to be flushed in the protected section (phase 2), the savepoint begins by flushing the data cache in
parallel to ongoing operations. In this process, the data cache is processed by multiple pager tasks
simultaneously. Most of the pages are flushed in this phase.
◼ In the second phase, an indicator is set that prevents clearing operations on B* trees. It also prevents
new transactions from being opened during this phase. All pages that were changed during the first
phase are marked as savepoint relevant. An open trans file is created for the open transactions.
◼ In the last phase, all pages that were marked during the second phase are flushed. The markings are
reset. Finally, all changed converter pages are written in parallel to the data volumes. The savepoint
itself is complete when the restart page is written. Afterwards, the savepoint version (number) is
updated.
◼ The protected phase of the savepoint is generally quite short and goes unnoticed by the end user.
For performance reasons, it is important to have a large ratio of written pages to the
number of IOs executed for these pages in the first phase of the savepoint.
◼ Stripes (regions) can be found in all areas where resources are used in parallel in the database.
◼ For example, the data cache is split into multiple segments in the main memory.
◼ Definable allocation of stripes in the data cache (8 – 1024, default setting depends on the size of the
data cache)
◼ MaxDB parameters: DataCacheStripes, ConverterStripes, and so on.
◼ Multiplying the stripes results in a higher parallelism (MaxDB-internal synchronization
mechanism). However, this higher number of stripes also means more administration. Both aspects
must be weighed up against each other.
◼ In a multiprocessor system (MaxCPUs parameter > 1), collisions between processes of different
UKTs are unavoidable. The deciding factor is how fast the switch takes place on the stripes, that is,
how long the colliding task has to wait to reach the stripe (region) and start working. This means
that the Waits percentage is very important since it indicates how often a task had to wait for a stripe
on average.
◼ The DBA Cockpit and DBAnalyzer offer a lot of information here in various forms (split according
to time or summarized).
◼ There are many different types of stripes that are related to the various divided functions within the
database. The UMEW60 training course (MaxDB Performance Optimization) lists these according
to version.
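◼ You can also inspect the collision and wait figures per stripe on the database console; a minimal sketch, assuming the show regions console command:
x_cons <SID> show regions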
You can initially activate this function by converting the table with
◼ ALTER TABLE <table name> CLUSTER
Activation
◼ ALTER TABLE <table name> CACHE
◼ Tables to be kept in the cache
◼ ALTER TABLE <table name> NOCACHE
◼ Tables to be removed from the cache
◼ These additions are also available for the CREATE TABLE statement.
◼ You can reset the default behavior with
◼ NOT CACHE
◼ NOT NOCACHE
You can determine the current setting for every table by checking the FILES system
table.
◼ This function is offered for the first time with MaxDB 7.7.06.
Inside MaxDB
Topic 1: Processes
Topic 2: Locks
Topic 3: Memory Areas
Topic 4: Disk Areas
Topic 5: Logging
[Diagram: MaxDB architecture overview with kernel threads and tasks, caches (catalog cache, SharedSQL), and disk areas (DATA with restart and undo pages, LOG, /sapdb (programs etc.))]
◼ The term "volume" (also referred to as "devspace" in versions before 7.4) denotes a physical disk or
part of a physical disk.
◼ There are two types of volumes:
• Data volumes mainly contain the user data (tables, indexes, and so on) and the database catalog,
but also store the converter as well as history pages with before images of the current
transactions. Through internal database striping, the data for a table is equally distributed across
all data volumes.
• In the log volumes, all changes to database content are recorded as after images to allow changes
that are not contained in the last complete data backup to be reloaded when a database instance is
restored.
To allow a database in a small system without RAID1 drives to run smoothly, the database can mirror the log volumes itself. In unmirrored log mode, the disks must be mirrored physically or by the operating system, at least for production systems.
◼ As of Version 7.2.04, the directory for the database software is registered based on the database
instance. In the Database Manager CLI, you can use the command db_enum to call up information
about which database instances are installed on the server and the directory in which the associated
software is stored. The information fast and slow describes the database kernels that are active. The
slow kernel is a special diagnosis and trace kernel.
Example: dbmcli -n <server name> db_enum
• P01 /sapdb/P01/db 7.7.04.29 fast running
• P01 /sapdb/P01/db 7.7.04.29 slow offline
◼ Backwards compatible database programs are stored in the directory /sapdb/programs (dbmcli
dbm_getpath IndepProgPath). Configuration data and diagnosis files are stored in the directory
/sapdb/data (dbmcli dbm_getpath IndepDataPath).
◼ The following directories should be accessible in the environment (SAP Note 327578):
• PATH variable in the SYSTEM environment:
Drive:\sapdb\programs\bin
Drive:\sapdb\programs\pgm
• PATH variable in the USER environment:
Drive:\sapdb\<SID>\db\bin
Drive:\sapdb\<SID>\db\pgm
Drive:\sapdb\<SID>\db\sap
◼ On UNIX, the path information should be adjusted and entered in dbenv_<server name>.<shell> of
the database administrator <sid>adm.
◼ As of MaxDB 7.8, an "isolated installation" will also be possible.
[Diagram: directory tree /sapdb/<SID>/db with subdirectories bin, pgm, env, etc, lib, incl, misc, symbols, and sap; sapdata and saplog directories containing the volumes (M_DISKL001 … M_DISKL0xx)]
◼ When a MaxDB database is installed for SAP applications, certain naming conventions apply to the
required directories.
◼ The sapdata directory includes all the data volumes. You can set up a maximum of 256 data
volumes.
◼ The saplog directory includes the log volumes. If you are using the mirrored log mode, the log
volumes are mirrored by the database. You can set up any number of log volumes.
◼ Usually, only one sapdata and one saplog directory are created. However, more can be created.
◼ In these directories, you can also insert links to file names in file systems where there is capacity to
create volumes.
◼ You can also create mount points for different sapdata or saplog directories to make space for the
volumes. For a large number of data volumes, we recommend that you create between five and ten
data volumes for each file system. Reason: There is no way to inform the operating system of how
many parallel write/read devices are behind one mount point.
◼ The previous DBROOT directory is now comparable to INSTROOT and is available under
/sapdb/<SID>/db.
◼ If your system offers hardware-based mirroring (RAID1 systems), we recommend that you use this
to mirror the log volumes.
[Diagram: data volumes DISKD0001 to DISKD0003 containing data, restart, and undo pages]
◼ The database system distributes the data pages across the available data volumes to ensure a fill
level and I/O load that are as even as possible.
Here, the data volumes are filled based on the absolute fill level, and not relatively.
◼ If you are using a RAID controller with local disks, we recommend that you configure the same
number of data volumes as available disk drives. Data volumes are normally stored on RAID5 disk
systems due to the lower security overhead (parity information).
◼ Database-independent files such as the OS swap or page areas and SAP system files should be
stored on separate disks.
◼ Examples based on the square root formula:
• Total size of database → number of data volumes to be created:
10 GB → 4 volumes
50 GB → 8 volumes
100 GB → 10 volumes
200 GB → 15 volumes
500 GB → 23 volumes
1 TB → 32 volumes
• More recent installation routines of SAPinst already use this formula.
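◼ The pattern behind these values appears to be: number of data volumes ≈ √(database size in GB), rounded up. For example, √200 ≈ 14.1, which gives 15 volumes, and √1024 = 32 volumes for 1 TB.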
USE_OPEN_DIRECT
◼ Uses the O_DIRECT operating system function to write data to volumes and backups in the file system directly, without a file system cache
◼ Since internal file system caches are used when RAW devices are implemented on Linux, you should also set the parameter here.
◼ When mounting file systems from third-party manufacturers on UNIX, use O_DIRECT or similar options.
[Diagram: with USE_OPEN_DIRECT, the IO thread bypasses the file system cache and writes directly to the DATA volumes]
◼ In UNIX systems, we recommend that you configure log and data volumes as RAW devices. This
avoids an additional administration level (in this case, particularly the records kept in modern file
systems).
◼ If a file system appears faster than a RAW device, this is because internal caches are used in file management, and these can jeopardize the operation of the database in the event of abrupt system failures (power loss and so on). Administrators cannot assume that data regarded as written was in fact written immediately: file system management usually confirms receipt of pages immediately and may not execute the write action until some time later, after several write commands have accumulated.
◼ During subsequent database operation or at the latest when the database is next backed up or
checked, page errors occur when you try to access pages that were not previously saved. In this
situation, a database recovery is then usually required.
◼ When using file systems with MaxDB, you should set the parameters
UseFilesystemCacheForVolume (USE_OPEN_DIRECT) and UseFilesystemCacheForBackup
(USE_OPEN_DIRECT_FOR_BACKUP) for security reasons to make absolutely sure that all pages
were written prior to reconfirmation. The database kernel then opens the volumes with the
O_DIRECT operating system option.
◼ Since operating system caches are also used to store data temporarily when RAW devices are
implemented on older Linux releases, we also recommend that you use this parameter for security
reasons.
◼ In other versions of UNIX and Windows systems, this parameter is not used even if it can be set. In
Windows, files are always accessed with corresponding options for writing directly to the disks,
whereas UNIX usually offers the O_DIRECT mount option.
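◼ A sketch of how this could be set via the DBM server, assuming the param_directput DBM command and that the value NO switches off the file system cache (so the kernel opens the volumes with O_DIRECT):
dbmcli -d <SID> -u <dbmuser>,<password> param_directput UseFilesystemCacheForVolume NO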
[Diagram: a filled data volume next to a newly added, still empty data volume]
◼ This topic is also addressed in conjunction with the automation of page clustering:
◼ Data is equally distributed only when the workload is low.
◼ However, it can still take a while for the database to interrupt the distribution process if the
workload increases again. Equal distribution is then continued later.
◼ This function is implemented as of MaxDB 7.7.06.
◼ Until now, this was carried out via the data cache and caused long-term cache displacement, thus
affecting day-to-day business significantly.
◼ In the final implementation, memory structures outside of the data cache are used, which still have
to be provided.
[Diagram: db_activate, with the database kernel formatting several DATA volumes in parallel]
◼ Since the sequential formatting process required very few system resources in the past, this
installation step could not easily be identified. For this reason, users often canceled the installation
process because there was no activity on the server and the installation process involved long wait
periods. Through parallel formatting, this process should be much faster.
Inside MaxDB
Topic 1: Processes
Topic 2: Locks
Topic 3: Memory Areas
Topic 4: Disk Areas
Topic 5: Logging
[Diagram: log volumes DISKL001 and DISKL002 with their mirror volumes M_DISKL001 and M_DISKL002]
◼ The log area can consist of any number of log volumes and mirrored log volumes. MaxDB handles these as one continuous area and writes to it cyclically. Several log volumes therefore only complicate administration and bring no additional advantage, so it is better to keep the number of log volumes low; one or two log volumes are recommended. A typical volume is 2 to 8 GB, but can be up to 64 GB in some cases.
◼ All log volumes should be on separate disks, physically separated from the data areas. This is
necessary for data security, availability, and performance reasons.
◼ You can set up software mirroring of the log volumes through the database kernel, but we
recommend using hardware mirroring (RAID1) to achieve this.
◼ Log information is released for rewriting only once the log has been backed up successfully.
◼ For performance reasons, we recommend that you do not store log volumes on RAID5 disk drives.
• Log information is usually written asynchronously in small portions (between one and two 8 KB
pages for each action). Only when a transaction is completed (COMMIT) does the process that
triggered the COMMIT have to wait for the log data to be written successfully for this transaction
in a synchronous action.
For this reason, the disks on which the log data is stored should be as fast as possible so that wait
times can be avoided when the COMMIT is executed. Ideally, log write times should be under
one millisecond. Exceptions here are distributed and mirrored drives in different data centers.
Due to signal runtimes between the drives, it can take 2-3 ms in this case.
• Since the read/write effort for small write operations of this kind in simple RAID5-based disk
systems is considerably higher than for RAID1 disks, the latter should be used. In large disk
storage systems with advanced cache technology, the difference in technology between RAID1
and RAID5 does not affect performance significantly, so RAID5 could also be used for the log
area in this case. Definitive conclusions can be made only after performance has been measured.
[Diagram: pagers write history pages to the DATA volumes; the log writer writes the log queue to the LOG volumes]
[Diagram: detailed example of a table and its history (before images) in the data cache]
◼ The process of separating short-term log data (before images) and log data to be backed up
permanently is illustrated again here in a detailed example.
◼ Experience has shown that the history pages are almost never written from the data cache to the data volumes. In the liveCache environment, however, this does happen if many consistent views are used.
◼ Each log queue (one per UKT) can be enlarged using the LogQueueSize parameter.
[Diagram: if logging is deactivated, a restart is possible only from the last SAVEPOINT, which may be days old in some cases; all transaction data since then is lost]
◼ In the future, this option will not be as easy to use on the graphical front ends.
Unit: 3
Topic: Inside MaxDB