Sybase Data Storage & Fragmentation
Software Gems Pty Ltd
Derek Asirvadem
V2.5
04 Sep 12
Sybase Data Storage & Fragmentation
Introduction
Purpose
1 This Software Gems document defines the physical elements of a Sybase ASE database, and assists in understanding both the terminology used in the manuals and the operation of ASE. Indeed, it overcomes the problem of abysmal manuals in that subject area.
2 There is an awful lot of shallow, inaccurate, misleading and false information on the Internet. Unfortunately, some of that false or misleading information is published by Sybase, both in the manuals and on the web. This document therefore provides full and complete information (albeit very condensed), such that the reader is no longer vulnerable to false or confusing information on the subject.
Structure
This document combines three closely related HTML documents into a single PDF, and resolves the links. It remains in three Parts, with a single numbering scheme (19 chapters) throughout (Levels are numbered in Roman numerals). Where relevant, a section presents the APL and DPL/DRL LockSchemes separately. The definitions are Normalised, and cross-referenced. Virtually all objects can be selected to open further detail.
Sybase Data Storage
The elements of data storage units, their relations, and their types. This is a pre-requisite to the second part.
1 Unit
Units of data storage, their relations, the hierarchy
2 DataStructure
The five possible DataStructures that constitute a table, four of which are fully illustrated and examined
3.1 Heap
3.2 Clustered Index
3.3 Nonclustered Index
3.4 Placement Index
4 Data Model/Catalogue
Explains the entities in the Sybase ASE catalogue that pertain to Data Storage
5 Data Model/DataStruct
Presents all the elements relevant to Data Storage in the form of a Relational Data Model
Sybase Fragmentation
Definition & identification of the three distinct levels of fragmentation & the types within them; determination of each level/type; followed by chapters for each level/type
6 Definition
Defines Fragmentation, Levels, terminology and differentiates the types
7 Determination: Level I, Level II, Level III, Partition
Guidance on the accurate determination of each Level/Type of Fragmentation
8 I Allocation Unit
Identifies Fragmentation in AllocationUnits & Extents within AllocationUnits
9 I Drop-Create
Why Drop-Create Clustered Index does not return Asynch Pre-Fetch & Large I/O
10 I Segment
The value of Segments
12 II Page Chain
Identifies and discusses Fragmentation in the Page Chain
13 II Overflow Page
Identifies and discusses Fragmentation in Overflow Pages
14 II Unused Space/Extent
Identifies Fragmentation in Unused Space in Extents
15 II Unused Space/Page
Identifies Fragmentation in Unused Space in Pages
17 III Page
Identifies Level III Fragmentation (DOL only): Rows within Pages, displaced rows
19 Index Type
Compares APL vs DOL from an Index Type perspective.

Education
• This document is actually a consolidated version of a selection of the Memory Tag pages from our courses. As such, the pages are detailed, very condensed and complete but, of course, limited in scope.
• We do not provide ordinary SQL and Sybase courses; there are many providers.
• However, as true performance experts, we provide specialist Sybase Quality & Performance courses at both the DBA and Developer level, which allow you to take full advantage of your software investment.
• We also provide high performance, standard-compliant Relational Database Design and education.
• There is no substitute for formal, qualified education. Please inquire if you need further detail, or if you have an interest in improving your Sybase performance or SQL coding.

Manual
These documents are provided to complement the Sybase manuals, and to correct them, as follows:
• they contain information that is not in the manuals (ie. they overcome the lack of information)
• where the manuals contain contradictory information, only the correct version is provided; the goal is to eliminate confusion and half-truths!
• where misleading or false technical terms are used, correct technical terms are used instead
• they bring all the relevant information about a subject together, in one place

Document Status
What was once a few single pages made available on the web has, due to interaction with the Sybase community, been consolidated into a single document, and expanded. It remains a collection of diagrams from our course documents, in a terse, condensed, diagrammatic style, rather than one of our usual polished final documents that some of you have come to expect. Progress (adding diagrams and explanatory text) is made between assignments, based on questions and feedback received.
Version
V2.0 12 Sep 11 Consolidation of three previous docs; full exposition to 14 pages; first open publication; enabled HTML Image Map
V2.5 28 Mar 12 Data Storage (now 9p); Definition & Determination added (8p); Fragmentation (now 12p); PDF version (now 31p).
It is valid for Sybase ASE versions 12.5.4.x and 15.x.
Copyright
The entire document is the property of, and copyright, Software Gems Pty Ltd. It is provided free of charge to assist the Sybase community in server and database administration, where no fee is charged. Permission is granted to copy or distribute this document, as long as it remains unaltered; the copyright notice remains intact; due credit is given to the author; and the distribution and ensuing consultation remain free. Contact us regarding commercial use.
Moral Right & Contact
The author is Derek Asirvadem, Information Architect and Sybase performance specialist; he is solely responsible for the content. He welcomes constructive commentary, and answers questions from professionals (click the link at the bottom of the page).
2 of 32 • Sybase Data Storage & Fragmentation Copyright © 2012 Software Gems Pty Ltd Derek Asirvadem • V2.5.1 • 12 Sep 15
1 Data Storage Unit
First we need to understand the different Data Storage Elements, what they contain, how they relate to each other, and their Units of
Measure. This is presented in its natural hierarchy, from top to bottom, largest to smallest, and identifies the Pages used to control space management.

Unit            Size                              Resides On / Contains
Database        specified in MB                   v Devices, s Segments, d DataStructures
DbAllocation    specified in MB                   1 Device [DevFragment]; s Segments, d DataStructures
AllocUnit       32 Extents, 256 Pages, 512KB      1 Device, 1 Segment, 1-32 DataStructures
Extent          8 Pages                           1 DataStructure (1 of 5 Types), ▶OAMPage
Page                                              Rows (including Forwarded Rows and Deleted Rows)

AllocationPage (A) of each AllocUnit: for each of its 32 Extents, it records ObjectId, IndexId, PartitionId, UsedPages and FreePages.

An Object (the physical term, as in ObjectAllocationMap) is a discrete DataStructure, identified by ObjectId, IndexId and PartitionId. (An ObjectId alone identifies the table, which is not a DataStructure.)

Rows within Pages, by DataStructure Type:
• DOL Heap (Always): RowIds do not change; DELETEs are marked but not removed; expanded Rows are Forwarded; interspersed INSERTs go at the end; no Clustered Index
• APL Heap (When No CI): Rows shifted on Expand/Contract/INSERT/DELETE; chronological order; INSERTs at the end
• Clustered Index: Index/Row order maintained; Rows shifted on Expand/Contract/INSERT/DELETE; Heap eliminated; Page Splits when full, for interspersed INSERTs
• Nonclustered Index (including Placement Index): Index Entries & RowId; Entries shifted on INSERT/DELETE
• Text/Image Chain: the entries are the content of a single Row/Column; allocated in units of Pages
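The unit arithmetic above can be sketched in a few lines of Python. This is an illustrative sketch, not ASE code: it assumes pages are numbered from 0 within the database, and it takes the logical page size as a parameter (the 512KB AllocUnit above corresponds to a 2KB page; ASE servers can be built with larger logical pages). The names `locate` and `alloc_unit_kb` are hypothetical.

```python
# Illustrative sketch of the Data Storage Unit hierarchy (not ASE internals).
# Assumption: pages are numbered from 0 within the database.

PAGES_PER_EXTENT = 8
EXTENTS_PER_ALLOC_UNIT = 32
PAGES_PER_ALLOC_UNIT = PAGES_PER_EXTENT * EXTENTS_PER_ALLOC_UNIT  # 256


def locate(page: int) -> dict:
    """Return the Extent and AllocationUnit (by first page number) holding a page."""
    offset_in_au = page % PAGES_PER_ALLOC_UNIT
    return {
        "page": page,
        "extent": page - page % PAGES_PER_EXTENT,          # divisible by 8
        "alloc_unit": page - offset_in_au,                 # divisible by 256
        "extent_slot": offset_in_au // PAGES_PER_EXTENT,   # 0..31 within the AU
        "is_allocation_page": offset_in_au == 0,           # first page of the AU
    }


def alloc_unit_kb(page_size_kb: int = 2) -> int:
    """Size of one AllocationUnit in KB for a given logical page size."""
    return PAGES_PER_ALLOC_UNIT * page_size_kb


print(locate(1300))     # extent 1296, alloc_unit 1280, extent_slot 2
print(alloc_unit_kb())  # 512, the 512KB quoted above for 2KB pages
```

Computing the Extent and AllocationUnit by subtracting the remainder, rather than by integer division, keeps the results as page numbers, which is how the Data Model chapter identifies them.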

1.1 AllocationPage
• The first Page of each AllocationUnit is the AllocationPage; it identifies:
  • the 32 Extents that the AllocationUnit contains
  • the Physical DataStructure residing in each Extent (identified by ObjectId, IndexId and PartitionId)
  • pointers to the OAMPages of those Physical DataStructures (up to 32), and
  • the space available in each Extent, and in each Page of each Extent.

1.2 ObjectAllocationMap
• Just as the first Page of an AllocationUnit is the AllocationPage, the first Page of a DataStructure is the ObjectAllocationMap (OAM).
• It contains a linked list of the AllocationUnits in which Extents belonging to the DataStructure reside.
• The AllocationPage of each AllocationUnit is then interrogated to locate the Extent.
• The AllocationPage identifies which Extents & Pages have free space. If such exists, this allows rows in the DataStructure to be placed close to other rows; however, it is quite independent of rows in other DataStructures.
• If more than one Page is required for the OAM, a linked list of OAMs is provided.
• While the OAM provides a second access path to the DataStructure, it is especially relied upon during Table Scans of DOL Heaps, since they do not have PageChains.
[Figure: an ObjectAllocationMap (O) listing AllocationUnits AU0, AU256, AU1024, AU512, AU1280 and AU768; each ▶ its AllocationPage (A) ▶ an Extent.]
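The OAM, AllocationPage, Extent lookup path can be modelled as a toy in Python. The classes `AllocationPage` and `ObjectAllocationMap` and the function `extents_of` are hypothetical stand-ins, not ASE structures; only the lookup order follows the text above (the OAM lists AllocationUnits, and each AllocationPage names the owner of each of its 32 Extents). The sample ids are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A DataStructure is identified by (ObjectId, IndexId, PartitionId).
DataStructureId = Tuple[int, int, int]
PAGES_PER_EXTENT = 8


@dataclass
class AllocationPage:
    alloc_unit: int  # first page number of its AllocationUnit
    # Extent slot (0..31) -> owning DataStructure; missing or None = unallocated.
    owners: Dict[int, Optional[DataStructureId]] = field(default_factory=dict)


@dataclass
class ObjectAllocationMap:
    owner: DataStructureId
    alloc_units: List[int]  # AllocationUnits holding at least one of its Extents


def extents_of(oam: ObjectAllocationMap,
               alloc_pages: Dict[int, AllocationPage]) -> List[int]:
    """Follow the OAM to each AllocationPage and collect the owner's Extents."""
    found = []
    for au in oam.alloc_units:             # order as listed in the OAM
        page = alloc_pages[au]             # interrogate that AllocationPage
        for slot, owner in sorted(page.owners.items()):
            if owner == oam.owner:
                found.append(au + slot * PAGES_PER_EXTENT)  # Extent's first page
    return found


# Usage with made-up ids: one Heap spread over two AllocationUnits.
heap = (1001, 0, 1001)
alloc_pages = {
    512: AllocationPage(512, {1: heap, 2: (1002, 0, 1002)}),
    768: AllocationPage(768, {0: heap, 5: heap}),
}
oam = ObjectAllocationMap(heap, [768, 512])
print(extents_of(oam, alloc_pages))  # [768, 808, 520]
```

The Extents come back in OAM order, not page order; nothing in the model makes them contiguous, which is the point the Fragmentation chapters build on.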

1.3 Other Control


In order to administer Sybase ASE, the above Data Storage units need to be understood, and they are covered in detail in the following pages. In order to
complete the picture, however, there are two more Pages that are used to manage space efficiently (these are not expanded):
• GlobalAllocationMap
Contains space usage bits (Used/Free) for all AllocationUnits in the database
• PartitionControlPage
Each Partition has an additional Page identifying free space

2 DataStructure
This chapter introduces Sybase ASE DataStructures, again in logical order, and illustrates how they relate to each other.
1. Table
• a Table has a single entry in sysobjects WHERE type = 'U'
• the Primary Key is (id), as in OBJECT_ID() or ObjectId
• a Table is a collection of Logical DataStructures
(The catalogue tables would be easier to understand had they been named: sysindexes as sysLogicalStruct, and syspartitions as sysPhysicalStruct.)
2. Logical DataStructure
• each Logical DataStructure has a single entry in sysindexes, which defines its logical structure, keys, etc
• the Primary Key is (id, indid); indid identifies the DataStructure Type
• There are five types of Logical DataStructure (the APL Heap and DOL Heap are very different, as detailed in the next chapter):
Logical DataStructure Types of a Table (sysobjects.id, type U):

Type                 LockScheme   Occurs                       Allowed                         sysindexes.indid
DOL Heap             DPL/DRL      Always                       1                               0 (Heap, No CI)
APL Heap             APL          Only when no CI              1 Heap xor 1 Clustered Index    0 (Heap, No CI)
Clustered Index      APL          Eliminates the Heap          1 Heap xor 1 Clustered Index    1 (CI, No Heap)
Nonclustered Index   Any                                       249                             2 to 250 (NCI)
Text/Image Chain     Any          One for all Text/Image       1                               255 (Text/Image Chain)
                                  columns in the table

3. Physical DataStructure
• each Logical DataStructure is rendered physically as one or more Physical DataStructures
• the Heap or Clustered Index, which contains data rows, may be divided into several Physical DataStructures, called Partitions
• the Nonclustered Index and Text/Image Chain are not Partitioned
• each Physical DataStructure has a single entry in syspartitions, which defines its physical structure, Data Storage location, etc
  • hence the silliness in the manuals that "unpartitioned objects have one partition"
• the Primary Key is (id, indid, partitionid)
[Figure: sysindexes.indid (DOL Heap, APL Heap and Clustered Index as Partitions; Nonclustered Index; Text/Image Chain) ▶ syspartitions.partitionid (DOL Heap Partition, APL Heap Partition, CI Partition, Nonclustered Index, Text/Image Chain).]

4. Partitioned DataStructure
There are, therefore, five types of Physical DataStructure, and the Heap or the CI may be Partitioned.
5. In summary, a DataStructure is an independent Data Storage structure that:
• first, belongs to a Table (ObjectId)
• second, is one of five logical types (IndexId)
• third, is a physical structure, which may be a Partition (PartitionId)
During the discussion of logical or physical DataStructures, non-technical terms such as 'table', 'base table' and 'object-index pair' are too ambiguous to be meaningful: those who use them are committed to your continued confusion.
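The indid ranges and the three-part identity above can be expressed as a small Python classifier. It is a sketch under assumptions: `indid` has already been read from sysindexes and `partitionid` from syspartitions, and the table's lock scheme is available as one of the strings "allpages", "datapages" or "datarows" (an assumed encoding). The functions `logical_type` and `describe` are hypothetical helpers.

```python
# Sketch: naming a DataStructure from its catalogue keys (hypothetical helpers).
# Assumption: the lock scheme is given as "allpages", "datapages" or "datarows".

LOCKSCHEMES = ("allpages", "datapages", "datarows")


def logical_type(indid: int, lockscheme: str) -> str:
    """Logical DataStructure Type for a sysindexes.indid value."""
    if lockscheme not in LOCKSCHEMES:
        raise ValueError(f"unknown lock scheme: {lockscheme}")
    apl = lockscheme == "allpages"
    if indid == 0:
        return "APL Heap" if apl else "DOL Heap"
    if indid == 1:
        if not apl:
            raise ValueError("indid 1 (Clustered Index) exists only for APL tables")
        return "Clustered Index"
    if 2 <= indid <= 250:
        return "Nonclustered Index"   # includes a DOL Placement Index
    if indid == 255:
        return "Text/Image Chain"
    raise ValueError(f"indid {indid} is not a DataStructure Type")


def describe(object_id: int, indid: int, partition_id: int, lockscheme: str) -> str:
    """A DataStructure is (ObjectId, IndexId, PartitionId): table, type, partition."""
    kind = logical_type(indid, lockscheme)
    return f"{kind} of table {object_id}, partition {partition_id}"


print(describe(1001, 0, 1001, "datarows"))  # DOL Heap of table 1001, partition 1001
print(describe(1002, 1, 1002, "allpages"))  # Clustered Index of table 1002, partition 1002
```

Note that the same indid 0 yields two different types: the LockScheme, not the indid, distinguishes the APL Heap from the DOL Heap.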

6. The five types of Physical DataStructure, the first three of which may be Partitioned, are located on Devices, which are identified by Segment:
[Figure: the five Physical DataStructures (DOL Heap Partition, APL Heap Partition, CI Partition, Nonclustered Index, Text/Image Chain), by sysindexes.indid and syspartitions.partitionid.]

2.1 Segment
A Segment [1] is a logical group of one or more Devices, within a database. A good Segment Plan has two fundamental purposes:
1. It allows DataStructures to be distributed for load balancing purposes:
• separating the data (CI or Heap) of a single table from its related NCIs
• separating the different tables within a Transaction
• separating the Partitions of a table, in order to support full parallelism
2. It drastically reduces Level I and II Fragmentation, which would otherwise be massive.
Either a Logical DataStructure (all Partitions in the DataStructure) or a Physical DataStructure (a single Partition) may be placed on a Segment:
• placing all the Partitions of a DataStructure on one Segment/Device has the same I/O contention as an unpartitioned DataStructure (shown)
• placing each Partition of a DataStructure on a separate Segment/Device eliminates that contention, and maximises parallelism (not shown)

[Figure: Segments within the databases tempdb and user_db: default, system, logsegment, data_seg, index_seg and text_seg.]
• default is not a segment: it is the segment one has when one does not have segments.
• Much like public is not a group: it is the group one belongs to when one has no group.
• Or one is the number of partitions in an unpartitioned DataStructure.
2.2 Device
A Sybase Device [1] is one of the following; note that ASE treats it as a contiguous set of disk blocks:
• File
• Raw Partition
• Logical Volume (SAN or Volume Manager), which is a File or Raw Partition
Devices are server-level resources: part or all of a Device is allocated to a single Database.
[Figure: Devices temp_01, data_01 to data_04, data_09 and log_01.]
• Each Device is a separate I/O queue within the server.
• It is therefore best to use neither too few Devices nor too many, based on the size of each database.

1. This is an introduction to Segments and Devices; it is not a full exposition.


3.1 Heap
This chapter discusses the APL Heap and the DOL Heap, and their characteristics.

AllPage Locked: Heap (When No Clustered Index), Fresh
• All the Nonclustered Indices belonging to an APL table are Clustered Index based (RowIds may change); they are Heap-based (RowIds are static) only when the CI is absent
• The creation of a Clustered Index eliminates the Heap; dropping the CI returns the Heap
• This illustrates a Heap, which occurs only when the Clustered Index has been actively avoided
• Except when used as 'pipes' or 'queues', APL tables should always have a Clustered Index

DataPage/DataRow Locked: Heap (Always), Fresh
• DOL tables always have a Heap
• RowIds do not move; they are Static (except during REORG, of course)
• All the Nonclustered Indices (including the Placement Index) belonging to a DOL table are Heap (or Static RowId) based
• It is a mistake to view the DOL Heap as PI based, since all the NCIs (including the PI) are dependent on the Heap, not the other way around. The NCIs cannot change because the RowIds in the Heap cannot change.
• By design, the Heap and any NCIs (including the PI) are logically and physically separated, in order to reduce dependencies

[Figure: a fresh APL Heap (indid = 0; ObjectAllocMap (H) ▶AU512 ▶A ▶Extent) with a PageChain; a fresh DOL Heap (indid = 0; ObjectAllocMap (H) ▶AU768 ▶A ▶Extent) with no PageChain, scanned via the OAM Heap method. Legend: Page, Rows, Unused Space, Deleted Row, Forwarded Row (original location), Forward (new location).]

AllPage Locked Heap:
• Table scans via PageChain
• INSERTs are placed at the end of the Heap
• Pages are kept trim; rows are contiguous
• Rows within the Page are shifted upon DELETE and UPDATE (Row Expansion/Contraction)
• Row Expansion may cause the row to be moved to the end of the Heap, changing its RowId
• If there are NCIs, the RowIds need to be updated

DataPage/DataRow Locked Heap:
• Table scans via OAM method only
• RowIds do not change
• Deleted rows are marked for delete but not deleted (they are deleted, and the space is reclaimed, during REORG or aggressive Garbage Collection)
• If space is available in the current Page or Extent of the Heap (as a result of reserving same), the Forwarded Row or interspersed INSERT is placed there; otherwise (the usual case) it is placed at the end of the Heap. The intended and actual locations are nowhere "near" the original location, and nowhere "near" the Placement Index; refer to sections [8.3] and [9.5]. Forwards accumulate in Overflow Pages.
• When a row is Forwarded, the NCIs (including the PI) must access the original location to obtain the forward address, then access the Forwarded Row.
• Contracted Rows are not repatriated

AllPage Locked: Heap (When No Clustered Index), Fragmented
• Rows are shifted upon DELETE or UPDATE
• This leads to Unused Space, but the DataStructure retains its speed and traversal capability
[Figure: APL Heap with a PageChain; INSERTed Rows at End.]

DataPage/DataRow Locked: Heap (Always), Fragmented
• This leads to substantial Unused Space, which cannot be used for new rows; the DataStructure cannot retain its speed
• There is no traversal capability; the OAM method must be used
[Figure: DOL Heap with no PageChain, showing Deleted Rows, Forwarded Rows, INSERTed Rows at End, and Forwards in Overflow Pages.]
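The contrast between the two Heaps on Row Expansion can be sketched as a toy Python model. It assumes a page holds `PAGE_BYTES` of row data, a RowId is `(page, slot)`, and header and forwarding-stub overhead is ignored; the `Heap` class is hypothetical and models only the behaviours listed above: INSERTs at the end, APL rows moved and re-addressed, DOL rows Forwarded with the original RowId kept.

```python
from typing import Dict, List, Tuple

PAGE_BYTES = 100          # assumed usable bytes per page (illustrative)
RowId = Tuple[int, int]   # (page, slot)


class Heap:
    def __init__(self, dol: bool) -> None:
        self.dol = dol
        self.pages: List[Dict[int, int]] = [{}]   # per page: slot -> bytes used
        self.next_slot: List[int] = [0]           # slots are never reused
        self.forwards: Dict[RowId, RowId] = {}    # DOL only: original -> new

    def _free(self, page: int) -> int:
        return PAGE_BYTES - sum(self.pages[page].values())

    def insert(self, size: int) -> RowId:
        """INSERTs are placed at the end of the Heap, in both LockSchemes."""
        if self._free(len(self.pages) - 1) < size:
            self.pages.append({})
            self.next_slot.append(0)
        page = len(self.pages) - 1
        slot = self.next_slot[page]
        self.next_slot[page] += 1
        self.pages[page][slot] = size
        return (page, slot)

    def expand(self, rid: RowId, new_size: int) -> RowId:
        """Grow a row; return the RowId that the NCIs must hold afterwards."""
        if rid in self.forwards:
            raise NotImplementedError("re-expanding a Forwarded Row is not modelled")
        page, slot = rid
        growth = new_size - self.pages[page][slot]
        if self._free(page) >= growth:
            self.pages[page][slot] = new_size     # fits: stays in its Page
            return rid
        if self.dol:
            # The original slot stays occupied (treated as Unused Space until
            # REORG); the row itself goes to the end of the Heap.
            self.forwards[rid] = self.insert(new_size)
            return rid                            # NCIs untouched
        del self.pages[page][slot]                # APL: row leaves its Page
        return self.insert(new_size)              # new RowId: NCIs must change

    def reads_to_fetch(self, rid: RowId) -> int:
        """Page reads to reach a row from an NCI entry."""
        return 2 if rid in self.forwards else 1


for dol in (False, True):
    heap = Heap(dol)
    a = heap.insert(40)
    heap.insert(50)                # page 0 now holds 90 of 100 bytes
    held = heap.expand(a, 70)      # needs 30 more bytes; only 10 free
    print("DOL" if dol else "APL", a, "->", held, "reads:", heap.reads_to_fetch(held))
```

The APL row gets a new RowId (so every NCI must be updated) but is reached in one read; the DOL row keeps its RowId but now costs two reads, and its old slot remains as Unused Space.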

3.2 Clustered Index
This section discusses the Clustered Index, and its characteristics.

AllPage Locked: Clustered Index (Heap Eliminated)
DataPage/DataRow Locked: (None)
[Figure: an APL Clustered Index (indid = 1; ObjectAllocMap (B, D) ▶AU512 ▶A ▶Extent). The Clustered Index B-Tree is Sparse at every Index Level; the Leaf Level is the Data Row. Labels: Leaf Page, Page Chain. Legend: B-Tree Entry, Row.]

AllPage Locked:
• The Index B-Tree is clustered with the data rows, into a single DataStructure
• The Leaf level of the B-Tree is the data row (put another way, there is no Leaf level; the B-Tree is clustered with the data rows)
• Creation of the CI eliminates the Heap; dropping the CI returns the Heap
• One less logical Read on every access
• There are still two OAMs to allow independent access
• All the DataStructures belonging to an APL table are Clustered Index based
• Index order = Row order
• Rows are distributed as per Index Key, and remain so
• Designed for:
  • Relational Keys (compound or composite keys)
  • Range Queries
  • INSERTs into Key location
• For interspersed INSERTs, if the page is full, a Page Split is necessary, and the RowIds (in the split Page) which are referenced in any NCIs must be updated
• Pages are kept trimmed
• On Expand/Contract/INSERT/DELETE, Rows in the CI may be shifted within a Page, without additional overhead, maintaining free space in the page
• According to the Relational Model, rows in a table must be unique. The Clustered Index is designed for Relational tables, and to be unique, and therefore it should be unique
• Non-unique keys cause Overflow Pages
A man and a woman are meant to be married; together they achieve more than each achieves separately. Implementing APL tables without a Clustered Index is analogous to a divorced couple. Likewise, there is no fidelity in non-unique Clustered Indices.

DataPage/DataRow Locked:
• Despite the demanded "clustered" syntax, there is no such thing as a DOL "clustered" index or DOL "clustered" table. The DataStructure addressed is in fact a Placement Index.
• There is nothing remotely like the Clustered Index available for DOL tables.
Confirmation
If anyone suggests that DOL "clustered" indices do exist, run this simple query on a database that has both APL Clustered Indices and DOL "clustered" indices. Study the DataStructure chapter, along with the report, and ask them why, as far as Sybase ASE internally is concerned:
• Clustered Indices always appear without a Heap
• Heaps always appear without a Clustered Index
• Placement Indices are Nonclustered Indices
• Placement Indices always appear with a Heap (which means they are two separate Logical, and therefore Physical, DataStructures)
Such persons evidently have little technical knowledge of Sybase.
All the technical evidence, from all the functions and catalogue components, is consistent. Even a simple query demonstrates the truth. It can be extended to show other items as desired.
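The query referred to above is not reproduced in this document. As a stand-in, the four observations can be checked in Python against catalogue rows you have fetched yourself. This is a sketch under assumptions: each row is `(table_id, indid, lockscheme, declared_clustered)`, where `declared_clustered` is a hypothetical flag meaning the index was created with the "clustered" syntax; how you derive it from the catalogue is left open, and `check` is not a Sybase function.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (table_id, indid, lockscheme, declared_clustered); the last field is a
# hypothetical flag for "created with the clustered syntax".
Row = Tuple[int, int, str, bool]


def check(rows: Iterable[Row]) -> List[str]:
    """Return violations of the four observations; an empty list means all hold."""
    indids: Dict[int, Set[int]] = defaultdict(set)
    placement: List[Tuple[int, int]] = []   # "clustered" indices on DOL tables
    for table_id, indid, lockscheme, declared_clustered in rows:
        indids[table_id].add(indid)
        if declared_clustered and lockscheme != "allpages":
            placement.append((table_id, indid))

    problems = []
    for table_id, ids in sorted(indids.items()):
        # Observations 1 and 2: a Clustered Index and a Heap never coexist.
        if 0 in ids and 1 in ids:
            problems.append(f"table {table_id}: Heap and Clustered Index together")
    for table_id, indid in placement:
        # Observation 3: a Placement Index is a Nonclustered Index.
        if not 2 <= indid <= 250:
            problems.append(f"table {table_id}: Placement Index with indid {indid}")
        # Observation 4: a Placement Index always appears with a Heap.
        if 0 not in indids[table_id]:
            problems.append(f"table {table_id}: Placement Index without a Heap")
    return problems


rows = [
    (10, 1, "allpages", True),    # APL Clustered Index, no Heap row
    (10, 2, "allpages", False),   # APL Nonclustered Index
    (20, 0, "datarows", False),   # DOL Heap
    (20, 3, "datarows", True),    # DOL "clustered" index, i.e. Placement Index
]
print(check(rows))  # []
```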

3.3 Nonclustered Index
This chapter discusses the Nonclustered Index, and its characteristics under the different LockSchemes.

AllPage Locked: Nonclustered Index
DataPage/DataRow Locked: Nonclustered Index
[Figure: APL: an NCI (indid = 2; ObjectAllocMap (N) ▶AU1280 ▶A ▶Extent) whose B-Tree entries point into the Clustered Index (indid = 1; ObjectAllocMap (B, D) ▶AU512 ▶A ▶Extent). DOL: an NCI (indid = 2; ObjectAllocMap (N) ▶AU1280 ▶A ▶Extent) whose B-Tree entries point into the Heap (indid = 0; ObjectAllocMap (H) ▶AU768 ▶A ▶Extent). In both, Nonclustered Indices are Dense, with a Page Chain at the Leaf Level only. Legend: B-Tree Entry, IndexKey + RowId, Row.]

AllPage Locked:
• If there is no space available in the NCI for interspersed INSERTs, the Index Page must be split.
  • This disturbs the PageChain
• The NCI contains the RowId in the CI; when the row moves (as the CI is re-ordered and kept trim), the NCIs need to be updated.

DataPage/DataRow Locked:
• If there is no space available in the NCI for interspersed INSERTs, the Index Page must be split.
  • This disturbs the PageChain
• The Placement Index is a Nonclustered Index, with a couple of additional attributes.
• The NCI contains the RowId in the Heap; the rows do not move, and so there is nothing to update in the NCIs (including the PI). This is better stated as: in order to eliminate updating the NCIs, the rows in the Heap are designed to be static.
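Why a Page Split in the APL Clustered Index reaches into the NCIs can be sketched with a toy Python model. It assumes a page holds `CAPACITY` rows in key order, that keys are unique, that shifting rows within a Page costs nothing (as stated in 3.2), and that a split moves the upper half of the Page to a newly allocated Page; only rows that change Page need their NCI RowIds updated. The `ClusteredIndex` class is hypothetical, not ASE's split algorithm.

```python
import bisect
from typing import Dict, List

CAPACITY = 4  # assumed rows per Page (illustrative)


class ClusteredIndex:
    def __init__(self) -> None:
        self.pages: Dict[int, List[int]] = {0: []}  # page number -> unique keys, in order
        self.chain: List[int] = [0]                 # PageChain order

    def _page_of(self) -> Dict[int, int]:
        return {k: p for p, keys in self.pages.items() for k in keys}

    def insert(self, key: int) -> int:
        """Insert in key order; return how many existing rows changed Page."""
        before = self._page_of()
        firsts = [self.pages[p][0] if self.pages[p] else key for p in self.chain]
        target = self.chain[max(0, bisect.bisect_right(firsts, key) - 1)]
        bisect.insort(self.pages[target], key)      # shift within the Page
        if len(self.pages[target]) > CAPACITY:      # Page Split
            new = max(self.pages) + 1               # newly allocated Page
            half = len(self.pages[target]) // 2
            self.pages[new] = self.pages[target][half:]
            self.pages[target] = self.pages[target][:half]
            self.chain.insert(self.chain.index(target) + 1, new)
        after = self._page_of()
        # Each moved row has a new RowId, so every NCI entry for it must change.
        return sum(1 for k, p in before.items() if after[k] != p)


ci = ClusteredIndex()
print([ci.insert(k) for k in (10, 20, 30, 40)])  # [0, 0, 0, 0]: no split yet
print(ci.insert(25))                              # 2: keys 30 and 40 moved Page
print(ci.chain, ci.pages)                         # [0, 1] {0: [10, 20], 1: [25, 30, 40]}
```

There is no equivalent count to make for the DOL Heap: its rows keep their RowIds, so the NCIs are not updated.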

3.4 Placement Index
This chapter discusses the Placement Index, and its characteristics.

AllPage Locked: (None)
DataPage/DataRow Locked: Heap (Always) & Placement Index, Fresh
[Figure: a Placement Index (indid = 3; ObjectAllocMap (N) ▶AU1280 ▶A ▶Extent), an NCI B-Tree with a Page Chain at the Leaf Level only, whose entries point into the Heap (indid = 0; ObjectAllocMap (H) ▶AU768 ▶A ▶Extent). The Heap has no Page Chain: scans must use the OAM method. Legend: B-Tree Entry, IndexKey + RowId, Data Row.]

AllPage Locked:
There is no equivalent on the APL side. A rough equivalent would be:
• a Heap (ie. where a Clustered Index has been actively avoided, thereby crippling it)
  • but even then the APL Heap has a PageChain, providing faster scans
• plus a Nonclustered Index
The Placement Index is not comparable to a Clustered Index, which is available only for APL:
• It has no clustering (as per the definition of that term since 1984); the B-Tree is not clustered with the data rows, forming a single physical DataStructure; it remains a separate DataStructure to the Heap
• There is no such thing as a DOL "clustered" Index
• The use of the term "clustered" Index in relation to DOL tables is therefore incorrect, confusing, and fraudulent.
• The correct term, as per some, but not all, Sybase documentation, is Placement Index
• Unfortunately, to address the Placement Index or the Heap, one is required to use the "clustered" syntax. Talk about forced confusion.

DataPage/DataRow Locked:
DOL tables always have a Heap. They may have a single Placement Index. It is a Nonclustered Index (there is no structural difference), a separate DataStructure to the Heap, with two additional criteria:
1. It identifies the initial placement of rows in the Heap
2. Any settings made, such as placement ON segment and FILLFACTOR, apply to the Heap as well.
As such, its relationship to the Heap is slightly closer than that of other NCIs, but that does not constitute clustering à la Clustered Index, a term which existed before its advent. Note that they are separate by design.
• This initial row placement is not maintained under:
  • interspersed INSERTs
  • DELETEs, and
  • UPDATEs that cause Row Expansion
• The Index & Heap remain two separate DataStructures; two OAMs
• Two Logical Reads on every access (via any NCI, including the PI)
• Key order in each NCI is maintained, but Row order in the Heap cannot be maintained
• The Heap is Static RowId based.
• Other than to rebuild the Heap, there is no value in a Placement Index
• Range Queries are not possible, since it is not a Clustered Index (there is no order to the Heap, and it does not have a PageChain).
• Ideal for non-relational Keys (surrogates, monotonic)
DOL tables have an additional, third level of Fragmentation; they get fragmented at this level very quickly, and require regular REORG. The above illustrates a fresh, unfragmented Heap and Placement Index; section [18] illustrates a fragmented Heap and Placement Index.

Deeper Understanding, Less Irrelevant Work

Consider this. Since:
• Range Queries are not supported, there is no value in the Placement of rows in the Heap, or in maintaining the order of the rows
• whatever placement is obtained by DROP/CREATE INDEX is lost as soon as ordinary DML commences
therefore the placement intended by the Placement Index is actually quite irrelevant, and can be dispensed with. This merely eliminates the confusion, and the small mountain of false expectations heaped upon it.
The issue that remains, and that does matter, is fragmentation, since it hinders Asynch Pre-Fetch and Large I/O efficiency, and consumes space that remains unused. When the Heap becomes fragmented enough to warrant it, de-fragment it by creating and dropping a Placement Index (realising its fleeting value, which is to identify some order when rebuilding the Heap). This method is usually much faster than REORG REBUILD, even though the WITH SORTED_DATA qualifier cannot be used, since the data in the Heap is not in any order.
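The outcome of the create-and-drop on a fragmented Heap can be sketched as a toy Python model. Assumptions, all illustrative: each Page has `SLOTS_PER_PAGE` equal-sized slots; a slot holds a live row, a deleted row (marked, not removed) or a forwarding stub, and the real row behind a stub sits as a live row in an Overflow Page at the end of the Heap; FILLFACTOR is reduced to a simple percentage of slots. The `rebuild` function is hypothetical and is not how ASE implements CREATE INDEX or REORG REBUILD; it shows only the result: live rows only, packed, in placement-key order.

```python
from typing import List, Tuple

SLOTS_PER_PAGE = 4         # assumed equal-sized slots per Page (illustrative)
Slot = Tuple[str, int]     # (state, key); state is "live", "deleted" or "stub"


def rebuild(heap: List[List[Slot]], fillfactor: int = 100) -> List[List[int]]:
    """Pack the live rows into fresh Pages, ordered by the placement key."""
    live = sorted(key for page in heap for state, key in page if state == "live")
    per_page = max(1, SLOTS_PER_PAGE * fillfactor // 100)
    return [live[i:i + per_page] for i in range(0, len(live), per_page)]


fragmented = [
    [("live", 7), ("deleted", 3), ("stub", 9), ("live", 1)],
    [("deleted", 4), ("live", 5), ("stub", 2), ("deleted", 8)],
    [("live", 9), ("live", 2)],    # Overflow Page holding the Forwarded Rows
]
print(rebuild(fragmented))                 # [[1, 2, 5, 7], [9]]
print(rebuild(fragmented, fillfactor=50))  # [[1, 2], [5, 7], [9]]
```

Three Pages, largely occupied by deleted rows and stubs, become two. Which key drives the order hardly matters, as the Placement Index Key discussion argues; only the compaction has lasting value.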

Placement Index Key


Since Range Queries cannot be supported, and the order cannot be maintained, the index that is chosen for the Placement Index is actually quite
irrelevant. The candidate index that explicitly identifies, or implies, a chronological order is best, since it groups the most frequently updated rows
away from the least frequently updated rows.

4 Data Model • Catalogue
A formal Relational Data Model is the best way to understand data, and its relations. This chapter presents the entities in the catalogue that pertain to
Data Storage elements, in terms of a formal Data Model (Entity Relation level), rendered in IDEF1X. Specifically, it shows the catalogue in which
information about each Data Storage Unit is stored.
[Figure: IDEF1X Entity-Relation diagram of the Data Storage catalogue, in three columns (Distribution, Logical, Physical). Catalogue entities: Database (sysdatabases), Device (sysdevices), Segment (syssegments), Table (sysobjects, type U), GlobalAllocationMap (sysgams), DbAllocation (sysusages), DataStructure (sysindexes), DbFragment (sysusages) and Partition (syspartitions). Other entities: Server, PartitionControlPage, ObjectAllocationMap, AllocationUnit, AllocationPage, Extent and Page. VerbPhrases include Comprises, May Be Arranged As, Is Created On, Consists Of (Logical), May Be Deployed On, Has Space Available In, Manifests As (Physical), Exists As, Locates [1], Identifies [1], May Contain and May House.]
• Devices and Databases are server objects.
• The GAM is a server level entity; however, it is located in the Database catalogue.
• The Database Allocation is the collection of Database Fragments.
• The size of Database Fragments is automatically set, based on the ALTER DATABASE request versus the space availability and location. The smaller the Fragment, the more the database is fragmented at Level I.
• As discussed above, the physical manifestation of a DataStructure is one or more Partitions.
• A Page or Extent number that is divisible by 256 is an AllocationUnit, containing an AllocationPage and up to 32 Extents.
• A Page number that is divisible by 8 is an Extent. Contrary to the manuals, an Extent contains only one DataStructure.
• The Page is the atomic unit of Storage, and of I/O. Asynch Pre-Fetch can read an Extent or AllocationUnit in a single request.
Notation:
• The entities in the catalogue are rendered with a shadow and the catalogue name; the remainder are in the DataStructures
• Square corners mean Independent, round corners mean Dependent
• Solid lines mean an Identifying relation; dashed lines mean a Non-identifying relation
• Read the VerbPhrases to understand the relations
• For a full introduction to IDEF1X Notation, etc, use the link at the bottom.
This models the normal case: exceptional cases, such as the mandatory logsegment, which may or may not be correctly deployed, are not differentiated.

IDEF1X Notation

1. Additionally: Has Space Available In.


5 Data Model • DataStructure
This chapter exposes the five types of DataStructures, starting from the catalogue, in terms of a formal Data Model (ER level).

[Figure: IDEF1X Data Model of the DataStructures. Table (sysobjects, type U) Consists Of (Logical) DataStruct (sysindexes), which Manifests As (Physical) Partition (syspartitions), with a Partition Control Page. The five numbered DataStructures are: 1 DOL Heap, containing DOL Rows (RowId), with Forwarded and Deleted DOL Rows; 2 APL Heap, containing APL Heap Rows; 3 CI (B-Tree and Leaf); 4 NCI (B-Tree and Leaf); 5 Text/Image Chain (Text/Image Entry, Text/Image Page). Also shown: Object Allocation Map, AllocationUnit, Allocation Page, Extent and Page.]
• The entities in the catalogue are rendered with a shadow and the catalogue name; the remainder are in the DataStructures
• Square corners mean Independent, round corners mean dependent
• Solid lines mean an Identifying relation; dashed lines mean Non-identifying relations
• Read the VerbPhrases to understand the relations
• For a full introduction to IDEF1X Notation, etc, use the link at the bottom.

The Text/Image Chain is a PageChain. Each entry is one or more Pages, belonging to a column in a specific row.
PageChain: • APL Heap • Clustered Index (Leaf & Non-Leaf) • Nonclustered Index (Leaf only) • Text/Image Chain.
No PageChain: • Nonclustered Index (Non-Leaf) • DOL Heap.
Page
There are always at least two paths to the data. That a Page belongs to a specific DataStructure is directly identifiable (grey
relation); but the DataStructure consisting of Pages is not directly identifiable by this means. The PageChain or the OAM provides that path.

IDEF1X Notation

1. Additionally: Has Space Available In.


6 Definition
This document defines and discusses all aspects of Fragmentation, in substantial detail (albeit condensed) as it occurs in Sybase ASE.
The document is laid out as follows:
• this introduction, containing definitions and approach
• the impact of fragmented DataStructures
• Definition of every Type of Fragmentation, within each of the three Levels
• four sections identifying how Fragmentation can be determined accurately, and without confusion, fully detailed
• a section on evaluation of the various determinants
• an additional section of issues relating to Partitioned DataStructures
• eleven sections discussing the different Types of Fragmentation within each Level, fully illustrated and discussed
In particular, the level of detail allows Fragmentation to be fully understood, and therefore prevented, and leads up to why common
methods of correcting Fragmentation do not work. Put another way, the detail identifies why Fragmentation must be addressed using an overall approach,
at all three levels, if substantial performance gains are sought. It is not a point problem, and therefore point solutions do not apply.
Understanding the Data Storage structures that Sybase uses is a pre-requisite to understanding Fragmentation.
• A table does not exist physically; it exists as a collection of Physical DataStructures: when a query is executed, it is the DataStructures that belong
to the table that are accessed. In order to administer tables efficiently, the DataStructures, and how they are accessed, must be clearly understood.
Level
The three Levels of Fragmentation are quite independent of each other, and can be differentiated easily. It is quite possible for a DataStructure to be
fragmented at one Level and free of Fragmentation at another Level: indeed, each Level requires quite different correction operations, and they
affect only that Level. The highest performance is obtained when all three levels are addressed.

Frequency
The frequency of correction operations for each Level is also different:
• Level III de-fragmentation (REORG REBUILD or DROP/CREATE CI or "CI") is required weekly at a minimum.
• Level II is dependent on
  a. whether a good Segment plan has been implemented, and
  b. the turnover within the DataStructure.
  The frequency required varies from monthly to annually. A good Segment plan and a well designed Clustered Index may well eliminate the need for
  de-fragmentation altogether.
• Level I de-fragmentation is required once, if it is done properly. It provides
  a. the basis for reduced fragmentation at Level II
  b. reduced frequency of Level II de-fragmentation operations, because it renders the correction operations at Level II more permanent.
It is normal to de-fragment a DataStructure at Level II because it is demanded at the time, but to leave a full de-fragmentation operation of Level I
to a separate maintenance window, addressing many DataStructures together, because it requires reasonable planning, and the scripts require testing, etc.
What it is Not
Administrators are sometimes confused by the masses of misinformation either available on the internet, or presented by Storage Teams who are
avoiding work, or hardware salesmen who are selling something on the false basis that it will result in less work for the DBA. To address this, it is
important to understand what Fragmentation is not:
• Hardware Striping equals Fragmentation
The SAN (or Logical Volume Manager) and Sybase ASE are completely independent of each other. ASE treats the Logical Volume as a
contiguous series of disk blocks. Whether the LV is striped or not is irrelevant to ASE, to Fragmentation, and to performance within ASE. Striping
affects only the speed of the LV within the hardware unit. De-fragmentation operations within ASE reclaim performance within ASE.
• If you use a SAN, you don't need Segments
See above. Total lack of technical ability and logic. My father works 50 hours a week, therefore your father does not need to work.
• Partitions equals Fragmentation
When the Partitions of a table (Physical DataStructures) are placed on several Devices or Segments, by design and for performance purposes, it is
distribution, not fragmentation; the result is substantially different from the fragmentation that occurs when there is no design.
• Data Distribution equals Fragmentation
Substantial performance can be gained in Relational tables when the Key (usually a composite Key) is used to distribute the data 1, and therefore
decrease contention. That, again, is by design, and space must be reserved for interspersed INSERTs. Such reserved space is not the same as
unused or wasted space, which cannot be used for interspersed INSERTs.
What it Is
Level I
Database Fragmentation: the unplanned or unconscious occupation of space, and the disturbed contiguity, of DataStructures across the Database.
Level II
DataStructure Fragmentation: the unplanned or unconscious occupation of space, and the disturbed contiguity, within the DataStructures.
Level III
Page Fragmentation: the unplanned or unconscious occupation of space, and the disturbed contiguity, within the DataStructures, in systems that
have been implemented quickly and without OLTP Standards or Relational technology.

1. That is not possible in record filing systems, where surrogate keys (single-column; monotonic) are used across the board.
6.1 Impact
This document is written for the qualified Sybase Database Administrator, and the subject is Fragmentation. As such, it does not detail how
the I/O subsystem, disk resources, caches and their configuration, etc, operate. It is expected that the reader understands all that, and therefore appreciates
the relevance of maintaining DataStructures in an un-fragmented state. However, there are basic features within Sybase ASE that are commonly
unappreciated and therefore unused. It is a shame that at many sites, Sybase operates at a mere fraction of the speed that it is capable of.
Two such features that are fundamental to ASE delivering great speed when accessing the DataStructures, are described here.

Asynch Pre-Fetch
is the mechanism, the set of methods that enables Sybase ASE to read large amounts of data, in anticipation of the query requirement.
• Asynch Pre-Fetch reads:
  Min 8 Pages (1 Extent) (Min for Covered NCI Scan is 2 Rows)
  Max 256 Pages (1 AllocationUnit or 32 Extents) (the first Page/Extent uses 2K I/O, due to it being an AllocPage)
• Asynch Pre-Fetch is requested for Table Scans, Range Scans, Covered NCI Scans, DBCC, and Recovery
• It has a self-modulating Look-Ahead Set which:
  • prevents it from saturating the I/O subsystem, and
  • prevents it from reading large numbers of Extents or Pages that will not be used.
  The modulation is based on the extent of success/failure of previous APF attempts on the DataStructure.
• Due to ASE's brilliant architecture, Asynch Pre-Fetch operates independently of the Caches and PoolSizes, and concerns itself with
  Buffers; and subsequently, Pages used.

Large I/O
refers to the resources used by Asynch Pre-Fetch. When large Buffers are requested, (a) the specific Cache and (b) the available PoolSizes
come into play. The integrity of resident Buffers may cause denials: Pages or Extents within the requested Buffer may already exist in a
smaller PoolSize; the PoolSize requested may not be present; etc. Therefore Large I/O statistics relate to Caches and PoolSizes (not Buffers).
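The Page counts above translate into different volumes of data depending on the server's logical page size. The following Python sketch restates that arithmetic, assuming only the 8-Page Extent and the 256-Page (32-Extent) AllocationUnit described above; the function name `apf_read_kb` is hypothetical, and the sketch ignores the 2K AllocPage read and the Look-Ahead modulation.

```python
# Span of one Asynch Pre-Fetch request, restating the Page counts above.
# Assumption: only the 8-Page Extent and 256-Page AllocationUnit are modelled;
# the 2K AllocPage read and Look-Ahead modulation are ignored.
PAGES_PER_EXTENT = 8
EXTENTS_PER_ALLOC_UNIT = 32
PAGES_PER_ALLOC_UNIT = PAGES_PER_EXTENT * EXTENTS_PER_ALLOC_UNIT  # 256 Pages


def apf_read_kb(page_size_kb: int) -> tuple[int, int]:
    """Return (min, max) KB spanned by one request: 1 Extent .. 1 AllocationUnit."""
    if page_size_kb not in (2, 4, 8, 16):
        raise ValueError("ASE logical page sizes are 2, 4, 8 or 16 KB")
    return PAGES_PER_EXTENT * page_size_kb, PAGES_PER_ALLOC_UNIT * page_size_kb


for size in (2, 4, 8, 16):
    low, high = apf_read_kb(size)
    print(f"{size}K Pages: {low} KB (1 Extent) to {high} KB (1 AllocationUnit)")
```

On a 2K server, then, one Asynch Pre-Fetch request spans between 16 KB and 512 KB; whether that data is read via Large I/O still depends on the PoolSizes configured for the Cache.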

The impact of fragmentation is usually a subjective issue: people are used to a certain level of response from their queries; when the database contains a
somewhat higher population than it did during the initial testing, the response slows down. It is an awareness that is quite real, but unscientific.
• the loss of speed is certainly the result of naïve server installation and configuration, and a lack of planning and configuration at the Device and
Segment levels
• that loss of speed is not necessary: the server and its resources can be configured such that response does not slow down with population, even
with very large tables 2
• that subjectivity is relevant only in the absence of science and knowledge; chapter [7] details the accurate determination of fragmentation, such that
science and knowledge can be used instead of subjectivity
• the initial value of that subjective sense of speed is actually quite low (since the query did not enjoy the benefit of proper configuration, and thus
the use of Asynch Pre-Fetch and Large I/O), and therefore the users are in reality comparing 'slow' with 'very slow' on the scale of possible speed;
they have never enjoyed 'fast' and they do not know what they are missing.
Level I
Correcting Level I Fragmentation returns great speed to the DataStructures, by enabling Asynch Pre-Fetch and Large I/O to their maximum
extents. It allows Sybase to operate at the 'fast' end of the possible speed spectrum. Further, it contains, and therefore reduces, the extent of Level II
Fragmentation 3.
Level II
Most DBAs are aware of some of the aspects of Level II Fragmentation, and how to correct it. There are some traps for young players, as detailed in
chapter [9], ignorance of which will cause de-fragmentation operations to be very transient, to have no persistence. However, without an awareness of
Level I, the baseline speed is 'slow' and the frequency of de-fragmentation operations is increased.
Level III
This is mainly the consequence of storing unnormalised spreadsheets in a database container, as opposed to storing Normalised Relational tables. One
has to live with the consequences of such actions, and deal with the myriad problems, such as fragmentation of a new order; frequent and offline
maintenance of DataStructures; reduced concurrency (increased contention); increased number of locks; etc.

Performance & Tuning


• APF is generally automatic (one need not do anything to invoke it)
• Large I/O is possible if a large PoolSize is configured for the Cache
• Resources for both APF and Large I/O are fully configurable, monitored in detail, and
can be tuned at several levels.
• Sysmon reports statistics for both the APF mechanism and the Large I/O resources.
• the low usage of these facilities is always due to fragmentation at Level I or Level II
or both. Correcting that fragmentation returns great speed to the DataStructures.

2. Contrary to most articles on the web, Sybase is quite capable of high speed on very large tables. Archiving history data onto a separate database, the
consequent requirement to modify code (to look in two places for one thing), the maintenance of an archive database, and the loss of DRI are all quite
unnecessary.
3. Software Gems provides a High Performance Sybase Configuration that ensures the server is operating at the highest levels of performance. We also
provide a complete Device & Segment [re-]configuration, such that Level I issues are eliminated. Both on a fixed price, guaranteed result basis.
6.2 Fragmentation Type
It is convenient when the Type identifies the exact location of the Fragmentation within the Database or DataStructure; other
forms of identifying the Type are confusing. In order to fully understand the three Levels of Fragmentation, and the types of Fragmentation within each
Level, let us look at the best and worst scenarios in each Level and Type. Your DataStructures will be either one or the other; there is no 'in-between'.
However, after correction operations using an overall plan have commenced, the DataStructures will move into that 'in-between' zone.

Level • Location/Type • Applies, with the Condition, Result and Correction for the Best and Worst cases:

Level I • AllocationUnit (AllocUnits across the Db; Extents within AllocUnits) • Applies: APL, DOL
  Best
    Condition: • AllocUnits of a DataStructure spread across the smallest range • Extents of a DataStructure spread across the fewest AllocUnits [10.3]
               • Each AllocUnit contains the fewest DataStructures
    Result: • Highest level of APF: AllocUnits; Extents as required • LIO structures heavily used • Fewest I/Os required to read the DataStructure
  Worst
    Condition: • AllocUnits of a DataStructure spread across the largest range • Extents of a DataStructure spread across the most AllocUnits, on the
               most Devices, across the database [8.3] • Each AllocUnit contains the most DataStructures
    Result: • Loss of APF • LIO Structures not used • More I/Os required to read the DataStructure
    Correction: • Separate the tables within a transaction • Separate DataStructs in a table from each other • Rebuild DataStructs in a fresh location

Level II • PageChain • Applies: APL 4
  Best
    Condition: Contiguous PageChain [12.1]
    Result: • Level I modulated 5 • No interrupts during scans
  Worst
    Condition: Disturbed PageChain, spread across Extents & AllocUnits [12.2]
    Result: • Level I modulated 5 • More interrupts during scans

Level II • OverflowPage (Duplicate Rows) • Applies: APL
  Best
    Condition: No Non-unique CIs
    Result: Prevention of insanity
  Worst
    Condition: High percentage of duplicated CI 'keys' [11]
    Result: Additional I/O for duplicated 'keys'
    Correction: Implement a Unique CI 6

Level II • OverflowPage (Forwards) • Applies: DOL
  Best
    Condition: No Forwards
    Result: Substantially faster queries
  Worst
    Condition: High percentage of Forwards [11, 17]
    Result: Additional I/O for Forwarded rows
    Correction: Fixed length rows, or REORG REBUILD or DROP/CREATE "CI"

Level II • Unused Space (Extent) 7 • Applies: APL, DOL
  Best
    Condition: No Unused Pages per Extent
    Result: • Level I modulated 5 • Highest level of APF & LIO
  Worst
    Condition: High percentage of Unused Pages per Extent [14]
    Result: • Level I modulated 5 • APF & LIO scaled back
    Correction: DROP/CREATE CI or "CI"

Level II • Unused Space (Page) 7 • Applies: APL, DOL
  Best
    Condition: No Unused space per Page
    Result: • Level I modulated 5 • Fewest I/Os required
  Worst
    Condition: High percentage of Unused space per Page [15]
    Result: • Level I modulated 5 • More I/Os required
    Correction: DROP/CREATE CI or "CI"

Level III • Page (Heap) • Applies: DOL
  Best
    Condition: No Forwards & Deletes
    Result: • Level I modulated 5 • Fewest I/Os required
  Worst
    Condition: High percentage of Forwards & Deletes [17]
    Result: • Level I modulated 5 • Additional I/O for Forwards & Deletes • Creates Unused Space/Page
    Correction: REORG REBUILD or DROP/CREATE "CI"
• Creates Unused Space/Page

4. The DOL Heap (containing the data rows) has no PageChain; all scans must use the OAM method.
5. The same Result identified at Level I, modulated to the scope identified by Location/Type (the row).
6. Duplicate rows (Keys) are illegal in Relational Databases.
7. It is good practice to plan and allocate extra space in the Pages and Extents of the DataStructure that contains the data rows, to allow for interspersed
INSERTs; such planned space is not considered unused. Unused Space is specifically the space consumed that is unplanned or unconscious.
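The PageChain row of the table distinguishes a Contiguous from a Disturbed PageChain. As a hypothetical illustration only, this Python sketch walks a list of Page numbers in PageChain order and counts each hop that neither stays within the current Extent nor moves to the next one, treating such a hop as an "interrupt"; that definition, the function name and the input list are assumptions of the sketch, not ASE metrics.

```python
PAGES_PER_EXTENT = 8
PAGES_PER_ALLOC_UNIT = 256


def chain_interrupts(chain: list[int]) -> tuple[int, int]:
    """Return (interrupts, AllocUnits touched) for Pages listed in PageChain order.

    An 'interrupt' here is a hop to a Page that is neither in the current
    Extent nor in the next Extent: a hypothetical measure, not an ASE metric.
    """
    interrupts = 0
    for prev, nxt in zip(chain, chain[1:]):
        extent_step = nxt // PAGES_PER_EXTENT - prev // PAGES_PER_EXTENT
        if extent_step not in (0, 1):
            interrupts += 1
    alloc_units = {page // PAGES_PER_ALLOC_UNIT for page in chain}
    return interrupts, len(alloc_units)


print(chain_interrupts(list(range(8, 40))))                  # (0, 1) contiguous
print(chain_interrupts([8, 9, 10, 520, 521, 16, 17, 3000]))  # (3, 3) disturbed
```

The contiguous chain stays within one AllocUnit and never jumps; the disturbed chain jumps three times across three AllocUnits, which is the pattern the Worst case describes.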

7 Determination
This chapter explains how Fragmentation at each Level and Type (explained in the previous chapter), for each type of DataStructure,
can be determined accurately, and evaluated. The next three sections provide information specific to each of the three Levels of Fragmentation; the
fourth section identifies issues relating to Partitions.

7.1 Determination I
There are no Sybase facilities for identifying Level I Fragmentation; it requires proprietary code, such as our HelpSpace or PhysicalSpace utility, the
report of which is shown here.

This section is for Customers only
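The HelpSpace and PhysicalSpace utilities are not reproduced here, but the core of one Level I measure can be sketched from the Level I row of section 6.2: how many AllocUnits the Extents of a DataStructure occupy, compared with the fewest that could hold them. This is a minimal Python sketch under stated assumptions, not the Software Gems utility: the input list of Extent starting Pages (`extent_first_pages`) is hypothetical, and obtaining it from ASE is not shown.

```python
import math

PAGES_PER_ALLOC_UNIT = 256   # 32 Extents of 8 Pages
EXTENTS_PER_ALLOC_UNIT = 32


def alloc_unit_spread(extent_first_pages: list[int]) -> dict:
    """Summarise how widely one DataStructure's Extents are spread.

    extent_first_pages: hypothetical input, the first Page number of every
    Extent allocated to the DataStructure (obtaining it is not shown).
    """
    if not extent_first_pages:
        raise ValueError("the DataStructure has no Extents")
    extents = len(extent_first_pages)
    units = {page // PAGES_PER_ALLOC_UNIT for page in extent_first_pages}
    fewest = math.ceil(extents / EXTENTS_PER_ALLOC_UNIT)
    return {
        "extents": extents,
        "alloc_units_used": len(units),
        "alloc_units_minimum": fewest,
        # 1.0 when the Extents occupy the fewest possible AllocUnits
        "packing": fewest / len(units),
        # AllocUnits from the first to the last one touched, inclusive
        "range_in_alloc_units": max(units) - min(units) + 1,
    }


packed = alloc_unit_spread(list(range(0, 256, 8)))      # 32 Extents, one AllocUnit
scattered = alloc_unit_spread([0, 2560, 12808, 23056])  # 4 Extents, four AllocUnits
print(packed["packing"], scattered["packing"], scattered["range_in_alloc_units"])
```

A packing of 1.0 with a small range corresponds to the Best condition; the scattered case (0.25, spread over 91 AllocUnits) is the Worst condition in miniature.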

7.2 Determination II Space
First, we will examine the basic space metrics related to Level II Fragmentation of the Logical DataStructures, summarising the
underlying Physical DataStructures (Partitions) to the logical level. For non-partitioned DataStructures, this is all that is required. A simple query from
sysindexes, which identifies each Logical DataStructure, is required 1 2.

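| Table | Lck | Row | Fwd | Del | Struct | IndexName | Idx_KB (1) | Unused (1) | Used_% (1) | Data_KB (2) | Unused (2) | Used_% (2) | LGIO | SPUT | DPCR | IPCR | DRCR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TestBase_APL | APL | 2,000,010 |  |  | Clst | UC_SecurityId | 508 | 96 | 81.1 | 89,020 | 124 | 99.86 | 99.96 | 93.74 | 99.99 |  |  |
|  |  |  |  |  | NC1 | U__Name | 75,720 | 38 | 99.95 |  |  |  | 98.92 |  |  | 99.64 | 81.85 |
| TestBase_APL_Heap | APL | 80,000 |  |  | Heap |  |  |  |  | 3,660 | 100 | 97.27 | 99.62 | 93.63 | 99.87 |  |  |
|  |  |  |  |  | NC2 | U__Name | 3,056 | 28 | 99.08 |  |  |  | 99.08 |  |  | 99.69 | 81.68 |
| TestBase_APL_Loc | APL | 2,000,000 |  |  | Clst | C__SecurityId | 512 | 100 | 80.47 | 88,968 | 78 | 99.91 | 99.99 | 93.75 | 100.00 |  |  |
|  |  |  |  |  | NC1 | U__SecurityId | 22,048 | 22 | 99.9 |  |  |  | 99.69 |  |  | 99.90 | 100.00 |
| TestBase_DPL | DPL | 2,105,177 | 0 | 309 | Heap |  |  |  |  | 105,768 | 3,056 | 97.11 | 100.00 | 94.19 |  |  |  |
|  |  |  |  |  | NC1 | U__SecurityId | 51,672 | 230 | 99.55 |  |  |  | 26.02 |  |  | 5.25 | 90.63 |
|  |  |  |  |  | NC2 | UP_Name | 133,868 | 40 | 99.97 |  |  |  | 30.74 |  |  | 24.91 | 92.45 |
| TestBase_DRL | DRL | 100,000 | 0 | 0 | Heap |  |  |  |  | 4,896 | 16 | 99.67 | 100.00 | 94.17 |  |  |  |
|  |  |  |  |  | NC1 | UP_SecurityId | 1,326 | 16 | 98.79 |  |  |  | 100.00 |  |  | 100.00 | 100.00 |
|  |  |  |  |  | NC2 | U__Name | 3,984 | 30 | 99.25 |  |  |  | 99.65 |  |  | 99.88 | 0.05 |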

| Statistic | Requested For (DataStructure) | Returns |
|---|---|---|
| 1 Unused Space/Index | Clustered Index (B-Tree) 3 | Unused pages in the B-Tree portion of the CI |
|  | Nonclustered Index | Unused pages in the NCI |
| 2 Unused Space/Data | Heap | Unused pages in the Heap |
|  | Clustered Index (Data) 3 | Unused pages in the Data portion of the CI |

• The RESERVED_PAGES() function returns the number of Pages reserved for the DataStructure. If the partitionid is not supplied, all Partitions in
the DataStructure are summarised. Multiplying this value by @@PAGESIZE returns bytes, which can then be converted to kilobytes or megabytes.
• Space for each DataStructure is allocated on an Extent basis (eight Pages); the Extent cannot be used by other DataStructures. Thus it is reserved.
• The value returned is, of course, whole Pages.
• The DATA_PAGES() function returns the number of Pages in the DataStructure that contain data. If the partitionid is not supplied, all Partitions in
the DataStructure are summarised.
• Subtracting DATA_PAGES() from RESERVED_PAGES() yields unused Pages.
• Dividing DATA_PAGES() by RESERVED_PAGES() yields the proportion used; multiplied by 100, the percentage.
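The arithmetic in the bullets above can be sketched in Python as follows, assuming the two function results have already been fetched for one DataStructure and that the logical page size is known (2K is assumed in the example). `space_metrics` is a hypothetical helper for illustration, not a Sybase function.

```python
def space_metrics(reserved_pages: int, data_pages: int, page_size_bytes: int = 2048) -> dict:
    """Unused Space figures for one DataStructure (all Partitions summarised).

    reserved_pages, data_pages: values assumed to come from RESERVED_PAGES()
    and DATA_PAGES(); page_size_bytes: the logical page size (2K assumed).
    """
    if reserved_pages <= 0 or not 0 <= data_pages <= reserved_pages:
        raise ValueError("expected reserved_pages > 0 and 0 <= data_pages <= reserved_pages")
    kb_per_page = page_size_bytes // 1024
    unused_pages = reserved_pages - data_pages  # reserved, but holding no data
    return {
        "reserved_kb": reserved_pages * kb_per_page,
        "unused_kb": unused_pages * kb_per_page,
        "used_pct": round(100.0 * data_pages / reserved_pages, 2),
    }


# TestBase_DPL Heap: 105,768 KB reserved and 3,056 KB unused on a 2K server
# correspond to 52,884 reserved Pages and 51,356 data Pages.
print(space_metrics(52_884, 51_356))
# {'reserved_kb': 105768, 'unused_kb': 3056, 'used_pct': 97.11}
```

The example reproduces the TestBase_DPL Heap row of the report (105,768 KB, 3,056 KB unused, 97.11% used).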

1. For DOL tables, on the physical plane, a Heap DataStructure always exists. Additionally, a separate Placement Index (falsely named "clustered")
DataStructure may exist. Such DataStructures are quite different from the single Clustered Index DataStructure. This is reflected in the catalogue, and is
easily confirmed in any report, such as the example.
2. The information in the example reports, and much more, is provided in our HelpIndex/HelpPartition utilities.
3. The Clustered Index DataStructure has both B-Tree and Data components: the Pages reserved and the Pages used can be obtained for the B-Tree
portion and the Data portion of the Clustered Index, separately.
7.3 Determination II DerivedStat
Second, we will examine the Derived Statistics provided by Sybase that relate to Level II Fragmentation of the Logical DataStructures,
again summarising the underlying Physical DataStructures (Partitions) to the logical level. A simple query from sysindexes, which identifies each
Logical DataStructure, is required 1 2.
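| Table | Lck | Row | Fwd | Del | Struct | IndexName | Idx_KB | Unused | Used_% | Data_KB | Unused | Used_% | LGIO (3) | SPUT (4) | DPCR (5) | IPCR (6) | DRCR (7) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TestBase_APL | APL | 2,000,010 |  |  | Clst | UC_SecurityId | 508 | 96 | 81.1 | 89,020 | 124 | 99.86 | 99.96 | 93.74 | 99.99 |  |  |
|  |  |  |  |  | NC1 | U__Name | 75,720 | 38 | 99.95 |  |  |  | 98.92 |  |  | 99.64 | 81.85 |
| TestBase_APL_Heap | APL | 80,000 |  |  | Heap |  |  |  |  | 3,660 | 100 | 97.27 | 99.62 | 93.63 | 99.87 |  |  |
|  |  |  |  |  | NC2 | U__Name | 3,056 | 28 | 99.08 |  |  |  | 99.08 |  |  | 99.69 | 81.68 |
| TestBase_APL_Loc | APL | 2,000,000 |  |  | Clst | C__SecurityId | 512 | 100 | 80.47 | 88,968 | 78 | 99.91 | 99.99 | 93.75 | 100.00 |  |  |
|  |  |  |  |  | NC1 | U__SecurityId | 22,048 | 22 | 99.9 |  |  |  | 99.69 |  |  | 99.90 | 100.00 |
| TestBase_DPL | DPL | 2,105,177 | 0 | 309 | Heap |  |  |  |  | 105,768 | 3,056 | 97.11 | 100.00 | 94.19 |  |  |  |
|  |  |  |  |  | NC1 | U__SecurityId | 51,672 | 230 | 99.55 |  |  |  | 26.02 |  |  | 5.25 | 90.63 |
|  |  |  |  |  | NC2 | UP_Name | 133,868 | 40 | 99.97 |  |  |  | 30.74 |  |  | 24.91 | 92.45 |
| TestBase_DRL | DRL | 100,000 | 0 | 0 | Heap |  |  |  |  | 4,896 | 16 | 99.67 | 100.00 | 94.17 |  |  |  |
|  |  |  |  |  | NC1 | UP_SecurityId | 1,326 | 16 | 98.79 |  |  |  | 100.00 |  |  | 100.00 | 100.00 |
|  |  |  |  |  | NC2 | U__Name | 3,984 | 30 | 99.25 |  |  |  | 99.65 |  |  | 99.88 | 0.05 |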

Entries marked "Does not apply" are Meaningless & Confusing 4.

| Statistic | Requested For (DataStructure) | Returns |
|---|---|---|
| 3 LGIO Large I/O Efficiency | Heap | Page/Extent/AllocationUnit contiguity of Heap |
|  | Clustered Index | Page/Extent/AllocationUnit contiguity of CI |
|  | Nonclustered Index | Page/Extent/AllocationUnit contiguity of NCI |
| 4 SPUT Data Space Utilisation | Heap | Density of data rows per data page |
|  | Clustered Index | Density of data rows per data page |
|  | Nonclustered Index 5 | Does not apply |
| 5 DPCR Data Page Cluster Ratio | Heap/APL | Density of data per page in the Heap, via PageChain |
|  | Heap/DOL 6 | Does not apply |
|  | Clustered Index | Density of data per page in CI order |
|  | Nonclustered Index 7 | Does not apply |
| 6 IPCR Index Page Cluster Ratio | Heap 8 | Does not apply |
|  | Clustered Index 9 | Does not apply |
|  | Nonclustered Index | Density of index pages in NCI order |
| 7 DRCR Data Row Cluster Ratio | Heap 10 | Does not apply |
|  | Clustered Index 11 | Does not apply |
|  | Nonclustered Index | Density of data rows in NCI order |
• The DERIVED_STAT() function returns five statistics for four of the five types of DataStructure 12. Again, if the partitionid is not supplied, all
Partitions in the DataStructure are summarised.
• Unfortunately, DERIVED_STAT() does not operate the way RESERVED_PAGES() and DATA_PAGES() operate: the Clustered Index is treated as a
whole unit; values for the B-Tree and the data cannot be obtained separately.
• Further, instead of returning Null for requests that are not applicable, it returns interesting values or fixed values (0% or 100%), which lead to
confusion 4. The cells for meaningless figures are empty in the example report.
• Note that the function (all five statistics) returns fairly exact information, at the row or intra-page level, whereas RESERVED_PAGES() and
DATA_PAGES() return whole Pages.
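Note 4 argues for suppressing meaningless figures rather than displaying them. A report generator can apply the "Does not apply" entries of the table above mechanically; this Python sketch transcribes that applicability matrix, using structure labels of its own choosing ("APL Heap", "DOL Heap", "CI", "NCI"), and returns None in place of the fixed value. It post-processes values already obtained; it does not call DERIVED_STAT() itself.

```python
from typing import Optional

# Structures for which each statistic is meaningful, per the table above.
# The labels are this sketch's own; Text/Image Chains appear nowhere (note 12).
MEANINGFUL = {
    "LGIO": {"APL Heap", "DOL Heap", "CI", "NCI"},
    "SPUT": {"APL Heap", "DOL Heap", "CI"},
    "DPCR": {"APL Heap", "CI"},
    "IPCR": {"NCI"},
    "DRCR": {"NCI"},
}


def displayable(stat: str, structure: str, value: float) -> Optional[float]:
    """Return value if stat is meaningful for structure, else None (blank cell)."""
    if stat not in MEANINGFUL:
        raise KeyError(f"unknown statistic: {stat}")
    return value if structure in MEANINGFUL[stat] else None


print(displayable("DPCR", "DOL Heap", 100.0))  # None: fixed 100% for a DOL Heap
print(displayable("DPCR", "APL Heap", 99.87))  # 99.87
```

Applied to every cell, this yields exactly the blank cells of the example report.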

4. Display of meaningless figures causes great confusion, and invites comparison with meaningful figures, eg. DPCR for a DOL Heap (fixed 100%,
meaningless) cannot be related to or be compared with DPCR for an APL Heap (meaningful) which can be addressed, in order to achieve close to
100%. Administrative time is wasted in correlating such figures and trying to make sense of them; decisions that may be made on the basis of such
confusion are consequently irrelevant and meaningless. It is therefore better to avoid displaying meaningless figures, and to focus on the meaningful
figures alone.
5. Data Space Utilisation: Data is contained in either the Heap or the Clustered Index only; therefore SPUT applies to them alone, and the figure for the
NCI (always 0%) is meaningless.
6. Data Page Cluster Ratio: The DOL Heap does not have a PageChain; data page access is via the OAM only; the figure (always 100%) is
meaningless (space may well be poorly utilised); use LGIO or SPUT instead. It is not comparable with the DPCR of the APL Heap or CI.
7. DPCR is relevant for fetching data pages, which reside in the Heap or the Clustered Index only. It does not apply to the Nonclustered Index, since it is
used to access data rows; data pages are never fetched via that structure. The Nonclustered Index (including the Placement Index) does not support
Range Queries, only the Clustered Index does, and there it does fetch pages.
8. Index Page Cluster Ratio is relevant for fetching index pages; it applies to the Nonclustered Index. There are no index pages in the Heap; the figure
(always 0%) is meaningless; refer to IPCR of the relevant NCI.
9. Index pages in the Clustered Index are not provided separately; the figure (always 0%) is meaningless; use DPCR instead.
10. Data Row Cluster Ratio is relevant for fetching data rows; it applies to the Nonclustered Index, since it is used to fetch data rows. It does not apply
to the Heap since access to it is for pages, via the PageChain (APL) or the OAM (DOL). The figure (always 100%) is meaningless: for APL, use
DPCR instead; otherwise, refer to DRCR of the relevant Nonclustered Index.
11. DRCR does not apply to the Clustered Index. Since the data rows in the Clustered Index are maintained in index order, by definition the DRCR is
100%. The figure is meaningless: for APL, use DPCR instead; for DOL, there is no Clustered Index, refer to DRCR of the relevant Nonclustered Index.
12. The function does not provide statistics for the Text/Image chain.
Intro Unit DataStruct Defn II Determ III Determ I AllocUnit I Segment II PageChain II Unused III Page

Derek Asirvadem • V2.5.1 • 12 Sep 15 Copyright © 2012 Software Gems Pty Ltd Sybase Data Storage & Fragmentation • 17 of 32
Sybase Data Storage & Fragmentation
7.4 Determination III
Third, we will examine the Forwarded and Deleted row counts that relate to Level III Fragmentation of the Logical DataStructures,
which occur in DPL/DRL lockschemes only. This applies to the Heap, and is in addition to, not instead of, LGIO and SPUT (which are explained in
[7.3]). Again, the underlying Physical DataStructures (Partitions) are summarised to the logical level. A simple query from sysindexes, which identifies
each Logical DataStructure, and systabstats.forwrowcnt & delrowcnt is required 1 2.

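| Table | Lck | Row | Fwd (8) | Del (9) | Struct | IndexName | Idx_KB | Unused | Used_% | Data_KB | Unused | Used_% | LGIO (3) | SPUT (4) | DPCR | IPCR | DRCR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TestBase_APL | APL | 2,000,010 |  |  | Clst | UC_SecurityId | 508 | 96 | 81.1 | 89,020 | 124 | 99.86 | 99.96 | 93.74 | 99.99 |  |  |
|  |  |  |  |  | NC1 | U__Name | 75,720 | 38 | 99.95 |  |  |  | 98.92 |  |  | 99.64 | 81.85 |
| TestBase_APL_Heap | APL | 80,000 |  |  | Heap |  |  |  |  | 3,660 | 100 | 97.27 | 99.62 | 93.63 | 99.87 |  |  |
|  |  |  |  |  | NC2 | U__Name | 3,056 | 28 | 99.08 |  |  |  | 99.08 |  |  | 99.69 | 81.68 |
| TestBase_APL_Loc | APL | 2,000,000 |  |  | Clst | C__SecurityId | 512 | 100 | 80.47 | 88,968 | 78 | 99.91 | 99.99 | 93.75 | 100.00 |  |  |
|  |  |  |  |  | NC1 | U__SecurityId | 22,048 | 22 | 99.9 |  |  |  | 99.69 |  |  | 99.90 | 100.00 |
| TestBase_DPL | DPL | 2,105,177 | 0 | 309 | Heap |  |  |  |  | 105,768 | 3,056 | 97.11 | 100.00 | 94.19 |  |  |  |
|  |  |  |  |  | NC1 | U__SecurityId | 51,672 | 230 | 99.55 |  |  |  | 26.02 |  |  | 5.25 | 90.63 |
|  |  |  |  |  | NC2 | UP_Name | 133,868 | 40 | 99.97 |  |  |  | 30.74 |  |  | 24.91 | 92.45 |
| TestBase_DRL | DRL | 100,000 | 0 | 0 | Heap |  |  |  |  | 4,896 | 16 | 99.67 | 100.00 | 94.17 |  |  |  |
|  |  |  |  |  | NC1 | UP_SecurityId | 1,326 | 16 | 98.79 |  |  |  | 100.00 |  |  | 100.00 | 100.00 |
|  |  |  |  |  | NC2 | U__Name | 3,984 | 30 | 99.25 |  |  |  | 99.65 |  |  | 99.88 | 0.05 |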

| Statistic | Requested For (DataStructure) | Returns |
|---|---|---|
| 8 Forward 13 | DOL Heap | Variable length rows that have been transferred to another location |
| 9 Delete 14 | DOL Heap | Rows that are marked for deletion |

• systabstats contains one row for each Physical DataStructure, which means the columns must be summed to produce a Logical level report.
• Execute sp_flushstats before querying the table.
• Forwards and Deletes apply to the DOL Heap only.
• DOL tables always have a Heap, wherein the row resides. The Heap is Static RowId based. The space allocated for Forwarded rows (which
consume the space of two rows) and Deleted rows (which consume the space of one row) cannot be re-used for interspersed INSERTs.
• Since Forwards and Deletes do not apply to APL tables (row expansion is performed in-place and deletion is immediate), the relevant cells are
empty in the example report.
• Space can be reclaimed via REORG or DROP/CREATE "CLUSTERED" INDEX (there is no Clustered index for DOL tables, but the syntax is
required).
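Because systabstats holds one row per Physical DataStructure, the Logical figures are sums across Partitions. The Python sketch below assumes the rowcnt, forwrowcnt and delrowcnt values for one DOL Heap have already been fetched after sp_flushstats (the example takes the Row column of the Partition report in 7.6 as rowcnt). The combined percentage of Forwards and Deletes against the row count is this sketch's own measure, offered as one way to express the "High percentage of Forwards & Deletes" condition; `heap_fwd_del` is hypothetical.

```python
def heap_fwd_del(partitions: list[tuple[int, int, int]]) -> dict:
    """Sum per-Partition systabstats counts for one DOL Heap.

    partitions: hypothetical pre-fetched (rowcnt, forwrowcnt, delrowcnt) tuples,
    one per Physical DataStructure, read after sp_flushstats.
    """
    rows = sum(p[0] for p in partitions)
    forwards = sum(p[1] for p in partitions)
    deletes = sum(p[2] for p in partitions)
    # this sketch's own measure: Forwards and Deletes as a percentage of rows
    pct = 100.0 * (forwards + deletes) / rows if rows else 0.0
    return {"rows": rows, "forwards": forwards, "deletes": deletes,
            "fwd_del_pct": round(pct, 4)}


# TestBase_DPL Heap Partitions [1]..[4], from the Partition report
dpl = [(571_980, 0, 94), (494_252, 0, 49), (508_744, 0, 90), (530_201, 0, 76)]
print(heap_fwd_del(dpl))
```

The totals agree with the Logical report for TestBase_DPL: 2,105,177 rows, 0 Forwards, 309 Deletes.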

7.5 Evaluation
a. The three sets of metrics (Unused Space; Derived Statistics; Forwards & Deletes) regarding Fragmentation of a DataStructure must be taken
together; any single metric should not be evaluated alone.
b. Similarly, all the DataStructures that belong to a table should be evaluated together. This should be done in the context of the actual usage: certain
queries require single-row data (via an index); covered queries require access across an entire index; yet others would require table scans.
Knowledge of how the data is accessed, and the DataStructures that are used to support that access, is essential to relevant administration.
c. In addition, the actual speed of the DataStructures belonging to the relevant tables must be monitored: timing records (for either a controlled test or
an actual production sample at certain times of day, ensuring the same configuration and cache settings) must be kept, so that they can be compared
before and after de-fragmentation operations.
• The value of any particular de-fragmentation operation must be confirmed: there is no point in performing operations that do not provide a benefit.
• The length of time between de-fragmentation operations, when speed is regained, and the point where the DataStructure has deteriorated enough
to warrant the operation being repeated, should be recorded. If Level I Fragmentation is addressed, the frequency of such operations is
substantially reduced.
d. Likewise, sysmon reports covering the period of the day should be maintained, or MDA data should be captured at relevant intervals. This is very
important because it will allow you to tune the structures at an overall level (rather than on a DataStructure basis).
• The most important indicator of Fragmentation is that the Asynchronous Pre-Fetch capability that is built into the server, and the Large I/O
resources that have been configured, are not used. Denying these facilities cripples the speed of Sybase.
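For item c, a small amount of arithmetic is enough to confirm whether an operation paid off. This Python sketch assumes timing records for the same controlled test, taken under the same configuration and cache settings, before and after the operation; comparing medians is a choice made here to damp the effect of an occasional outlier, not a prescribed method, and the timings shown are invented for illustration.

```python
from statistics import median


def defrag_benefit(before_s: list[float], after_s: list[float]) -> float:
    """Percentage reduction in median elapsed time; negative means slower after."""
    if not before_s or not after_s:
        raise ValueError("timings are needed both before and after the operation")
    before, after = median(before_s), median(after_s)
    return round(100.0 * (before - after) / before, 1)


# Hypothetical timings (seconds) for one controlled test, same configuration and caches
print(defrag_benefit([12.0, 11.5, 12.4], [6.1, 5.9, 6.3]))  # 49.2
```

Recording the date alongside each result also captures the interval between operations, which item c asks to be tracked.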

13. For each Forwarded row, two row 'slots' are consumed: the first for the original location, the address of which is fixed, and cannot be moved; and the
second for the forwarded location, which contains the expanded data row.
14. Deletes are not physically removed from DOL Heaps until REORG is executed.
7.6 Determination Partition
The above reports view the Logical DataStructures, and that is quite adequate for initial inspection, before further inspection is
warranted. It is the end point for non-partitioned DataStructures. For Partitioned DataStructures 15, the Physical DataStructure must be inspected. The
determination of Level II & III Fragmentation is only slightly more complex; it requires a simple query from syspartitions, which identifies Physical
DataStructures, and systabstats.forwrowcnt & delrowcnt 1 2.

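| Table | Lck | Struct | IndexName | Partition | Row | Fwd (8) | Del (9) | Idx_KB (1) | Unused (1) | Used_% (1) | Data_KB (2) | Unused (2) | Used_% (2) | LGIO (3) | SPUT (4) | DPCR (5) | IPCR (6) | DRCR (7) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TestBase_APL | APL | Clst | UC_SecurityId | [1] | 496,821 |  |  | 128 | 24 | 81.25 | 22,126 | 42 | 99.81 | 99.94 | 93.74 | 99.98 |  |  |
|  |  |  |  | [2] | 496,195 |  |  | 126 | 24 | 80.95 | 22,080 | 26 | 99.88 | 100.00 | 93.75 | 100.00 |  |  |
|  |  |  |  | [3] | 496,091 |  |  | 128 | 26 | 79.69 | 22,080 | 30 | 99.86 | 100.00 | 93.74 | 100.00 |  |  |
|  |  |  |  | [4] | 510,903 |  |  | 126 | 22 | 82.54 | 22,734 | 26 | 99.89 | 100.00 | 93.75 | 100.00 |  |  |
|  |  | NC 1 | U__Name |  |  |  |  | 75,720 | 38 | 99.95 |  |  |  | 98.92 |  |  | 99.64 | 81.85 |
| TestBase_APL_Heap | APL | Heap |  | [1] | 20,891 |  |  |  |  |  | 960 | 30 | 96.88 | 100.00 | 93.60 | 100.00 |  |  |
|  |  |  |  | [2] | 20,245 |  |  |  |  |  | 928 | 28 | 96.98 | 100.00 | 93.73 | 100.00 |  |  |
|  |  |  |  | [3] | 20,007 |  |  |  |  |  | 910 | 20 | 97.80 | 100.00 | 93.67 | 100.00 |  |  |
|  |  |  |  | [4] | 18,857 |  |  |  |  |  | 862 | 22 | 97.45 | 100.00 | 93.54 | 100.00 |  |  |
|  |  | NC 2 | U__Name |  |  |  |  | 3,056 | 28 | 99.08 |  |  |  | 99.08 |  |  | 99.69 | 81.68 |
| TestBase_APL_Loc | APL | Clst | C__SecurityId | data_1 | 494,325 |  |  | 128 | 26 | 79.69 | 21,998 | 28 | 99.87 | 100.00 | 93.75 | 100.00 |  |  |
|  |  |  |  | data_2 | 493,200 |  |  | 128 | 26 | 79.69 | 21,934 | 14 | 99.94 | 100.00 | 93.75 | 100.00 |  |  |
|  |  |  |  | data_3 | 493,920 |  |  | 128 | 26 | 79.69 | 21,966 | 14 | 99.94 | 100.00 | 93.75 | 100.00 |  |  |
|  |  |  |  | data_4 | 518,555 |  |  | 128 | 22 | 82.81 | 23,070 | 22 | 99.90 | 100.00 | 93.75 | 100.00 |  |  |
|  |  | NC 1 | U__SecurityId |  |  |  |  | 22,048 | 22 | 99.90 |  |  |  | 99.69 |  |  | 99.90 | 100.00 |
| TestBase_DPL | DPL | Heap |  | [1] | 571,980 | 0 | 94 |  |  |  | 28,748 | 840 | 97.08 | 100.00 | 94.18 |  |  |  |
|  |  |  |  | [2] | 494,252 | 0 | 49 |  |  |  | 24,998 | 884 | 96.46 | 100.00 | 94.19 |  |  |  |
|  |  |  |  | [3] | 508,744 | 0 | 90 |  |  |  | 25,540 | 718 | 97.19 | 100.00 | 94.19 |  |  |  |
|  |  |  |  | [4] | 530,201 | 0 | 76 |  |  |  | 26,482 | 614 | 97.68 | 100.00 | 94.19 |  |  |  |
|  |  | NC 1 | U__SecurityId |  |  |  |  | 51,672 | 230 | 99.55 |  |  |  | 26.02 |  |  | 5.25 | 90.63 |
|  |  | NC 2 | UP_Name |  |  |  |  | 133,868 | 40 | 99.97 |  |  |  | 30.74 |  |  | 24.91 | 92.45 |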

• The columns have been re-arranged to clarify the DataStructure hierarchy. The row counts, space usage, and derived statistics are shown at the Partition (Physical) level, where the data actually resides.
• The Heap and the Text/Image Chain are not named. Where the Partition is not explicitly named, an ordinal number is used to identify it, rather than the default Partition name, which is derived from the long and unusable partitionid.
• This example report lists Partitioned tables. It shows all DataStructures relating to each Partitioned table, in one place, in order to avoid having to
examine two reports.
• TestBase_DRL is not Partitioned, thus it is absent from this report.
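The Used_% columns appear to be derived as (KB - Unused) / KB x 100. That formula is an inference from the figures, not a documented ASE calculation, but it reproduces the report above, as this short check shows. The function name is hypothetical.

```python
def used_pct(total_kb: int, unused_kb: int) -> float:
    """Percentage of allocated space in use (inferred from the report, not an ASE formula)."""
    return 100.0 * (total_kb - unused_kb) / total_kb


# (total KB, unused KB, reported Used_%) taken from the report above
samples = [
    (128, 24, 81.25),       # TestBase_APL [1], index portion of the CI
    (22_126, 42, 99.81),    # TestBase_APL [1], data portion of the CI
    (960, 30, 96.88),       # TestBase_APL_Heap [1]
    (28_748, 840, 97.08),   # TestBase_DPL [1]
    (75_720, 38, 99.95),    # TestBase_APL, NC U__Name
]
for total, unused, reported in samples:
    print(f"{total:>7,} KB, {unused:>4} unused -> {used_pct(total, unused):6.2f} (reported {reported})")
```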

No. Statistic (Partition)              Requested For               Returns / Meaningless & Confusing (note 4)
1   Unused Space/Index                 Clustered Index (B-Tree)    Unused pages in the B-Tree portion of the CI
2   Unused Space/Data                  Heap                        Unused pages in the Heap
                                       Clustered Index (Data)      Unused pages in the Data portion of the CI
3   LGIO: Large I/O Efficiency         Heap                        Page/Extent/AllocationUnit contiguity of the Heap
                                       Clustered Index             Page/Extent/AllocationUnit contiguity of the CI
4   SPUT: Data Space Utilisation       Heap                        Density of data rows per data page
                                       Clustered Index             Density of data rows per data page
5   DPCR: Data Page Cluster Ratio      Heap/APL                    Density of data per page in the Heap, via PageChain
                                       Heap/DOL (note 6)           Does not apply
                                       Clustered Index             Density of data per page in CI order
6   IPCR: Index Page Cluster Ratio     Heap (note 8)               Does not apply
                                       Clustered Index (note 9)    Does not apply
7   DRCR: Data Row Cluster Ratio       Heap (note 10)              Does not apply
                                       Clustered Index (note 11)   Does not apply
8   Forward (note 13)                  DOL Heap                    Variable-length rows that have been transferred to another location
9   Delete (note 14)                   DOL Heap                    Rows that are marked for deletion

• syspartitions and systabstats each contain one row for each Physical DataStructure (Partition).
• Execute sp_flushstats before querying the tables.
• Only the DataStructure that holds data rows, either the Heap or the Clustered Index, is Partitioned; the Nonclustered Index and the Text/Image
Chain are not Partitioned.

15. Partitioning (if implemented correctly at all resource levels) provides massively increased performance, improved concurrency (if OLTP Standards are implemented), and substantially reduced maintenance and de-fragmentation windows, because Partitions can be administered individually, on a needs basis.
8 I Allocation Unit
This part of the document identifies Level I Fragmentation: AllocationUnits within the Database (Allocations) and Extents within
AllocationUnits. It is provided in three sections:
• AllocationUnit basics
• Why Drop/Create does not restore Asynch Pre-Fetch and Large I/O, and
• Prevention of Level I fragmentation, the use of Segments.

8.1 Fresh
AllocUnit 32 Extents, 256 Pages, 512KB

Extents A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

This shows the result of loading a single DataStructure into an empty AllocationUnit and creating the Clustered Index (WITH SORTED_DATA if the CI was just dropped). The Extents are contiguous within the AllocationUnit; Asynch Pre-Fetch and Large I/O are fully operational. Even if the order were not sequential, and the PageChain not linear, these facilities would remain fully operational; the Look-Ahead set is not scaled down.
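The geometry shown above (32 Extents, 256 Pages, 512KB per AllocationUnit) implies 8 Pages per Extent and a 2KB Page. This minimal sketch locates a logical page number within that geometry. It assumes the 2KB Page size implied by those figures, and that AllocationUnits are named by their first page (AU0, AU256, ...), as in the illustrations. The function name is hypothetical.

```python
PAGES_PER_EXTENT = 8
EXTENTS_PER_AU = 32
PAGES_PER_AU = PAGES_PER_EXTENT * EXTENTS_PER_AU   # 256
PAGE_KB = 2                                        # assumed: 512KB / 256 Pages


def locate(page: int) -> tuple[int, int, int]:
    """Return (AllocationUnit first page, Extent first page, Extent index within the AU)."""
    au = page - page % PAGES_PER_AU                 # e.g. page 300 is in AU256
    extent = page - page % PAGES_PER_EXTENT         # first page of the containing Extent
    extent_index = (page % PAGES_PER_AU) // PAGES_PER_EXTENT  # 0 = the Extent shown as "A"
    return au, extent, extent_index


print(PAGES_PER_AU * PAGE_KB, "KB per AllocationUnit")
for page in (0, 8, 300, 5120):
    print(page, locate(page))
```

Extent index 0 is the one holding the AllocationPage, which is why the illustration labels it "A" and numbers the remaining Extents 1 to 31.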

8.2 Fragmented

Where Segments are not understood or used, as at most sites, the reality is somewhat different. Since the Extents of up to 32 DataStructures (physical objects) can be located in an AllocationUnit, and all tables were loaded by concurrent INSERTS, each AllocationUnit ends up with Extents belonging to 32 different DataStructures. The Extents are fragmented within the AllocationUnits, and the AllocationUnits are fragmented across all Devices.
• Where 128 DataStructures are loaded, they are all fragmented across four AllocationUnits, etc.
• The INSERTS to all tables contend for the few currently active AllocationUnits, creating an AllocationUnit Hotspot.
• Further, the INSERTS to each table contend with its own Nonclustered indices: if a nominal table has 3 Nonclustered indices, that would be 32 tables
with their NCIs, resulting in 128 DataStructures, across four AllocationUnits.
ASE correctly identifies that Asynch Pre-Fetch & Large I/O (multiple Extents, up to an entire AllocationUnit, at Level I) is not worth attempting. In such circumstances, drop/create Clustered Index, while de-fragmenting the DataStructure within itself (Levels II & III), does nothing to improve the established fragmentation at the AllocationUnit level (I): once it is set, it is set for life (refer to chapter [ 9 ]), until Segments are used along with fresh AllocationUnits.
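A toy model of the situation above: 128 DataStructures (32 tables, each with 3 Nonclustered Indices) claim Extents in round-robin order from the single default Segment, roughly as concurrent INSERTs would cause. The model is a deliberate simplification (one Extent per DataStructure per turn, 32 Extents per AllocationUnit, the AllocationPage ignored) and is not ASE's actual allocation algorithm.

```python
from collections import defaultdict

EXTENTS_PER_AU = 32


def allocate_round_robin(n_structures: int, turns: int) -> dict[int, set[int]]:
    """Toy model: DataStructures take turns claiming the next free Extent on one Segment.

    Returns {AllocationUnit index: ids of the DataStructures with an Extent in it}.
    """
    au_members: dict[int, set[int]] = defaultdict(set)
    next_extent = 0
    for _ in range(turns):                  # one "turn" of concurrent INSERT activity
        for sid in range(n_structures):     # every DataStructure claims one more Extent
            au_members[next_extent // EXTENTS_PER_AU].add(sid)
            next_extent += 1
    return dict(au_members)


# 32 tables x (data DataStructure + 3 NCIs) = 128 DataStructures, 4 Extents each
layout = allocate_round_robin(128, 4)

aus_per_structure: dict[int, set[int]] = defaultdict(set)
for au, ids in layout.items():
    for sid in ids:
        aus_per_structure[sid].add(au)

print(len(layout), "AllocationUnits used")
print(min(len(ids) for ids in layout.values()), "DataStructures in every AllocationUnit")
print(max(len(aus) for aus in aus_per_structure.values()), "AllocationUnits per DataStructure")
```

Every AllocationUnit ends up shared by 32 DataStructures, and no DataStructure owns two adjacent Extents, which is the condition under which multi-Extent Large I/O is not worth attempting.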

8.3 DataStructure Perspective


Figure: Extents of a Clustered Index (C) and a Nonclustered Index (N) scattered across AU0, AU256, AU512, AU768, AU1024 and AU1280; the numbers show the PageChain sequence. On the right, each DataStructure's ObjectAllocMap lists the AllocationUnits (▶AU ▶A ▶Extent) that hold its Extents.

This shows the Extents of a typical table, comprising two DataStructures:


• a single Clustered Index (containing data and index Pages) or a DOL Heap (data Pages only) in green,
• and one Nonclustered Index (index Pages only) in blue. (The DOL Placement Index is an ordinary Nonclustered Index, a separate DataStructure, some
distance away from its data Heap, although on the same Segment.)
• The Pages within each DataStructure are some distance apart from each other. Here they cross Allocation Units.
• For DataStructures that have a PageChain, it is disturbed (the numbers show the sequence); it traverses AllocationUnits.
The first page of each AllocationUnit is the AllocationPage. The first page of a DataStructure contains its ObjectAllocationMap, that perspective is on
the right. It is a list of all AllocationUnits that contain the DataStructure. The AllocationUnit is then interrogated via its AllocationPage to find the
Extents that belong to the DataStructure.
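The two-step lookup just described (ObjectAllocationMap to AllocationUnits, then each AllocationPage to the Extents owned by the DataStructure) can be sketched with plain dictionaries. The dictionaries, object names and Extent numbers below are hypothetical stand-ins for the on-disk pages, not ASE's actual page formats.

```python
# Hypothetical stand-ins for on-disk structures.
# ObjectAllocationMap: DataStructure -> AllocationUnits (by first page) that contain it.
oam: dict[str, list[int]] = {
    "CI": [0, 256, 512, 1024, 1280],
    "NCI": [512, 768],
}
# AllocationPage of each AllocationUnit: Extent first page -> owning DataStructure.
allocation_pages: dict[int, dict[int, str]] = {
    0: {8: "CI", 16: "other"},
    256: {264: "CI"},
    512: {520: "NCI", 528: "CI"},
    768: {776: "NCI"},
    1024: {1032: "CI"},
    1280: {1288: "CI", 1296: "other"},
}


def extents_of(structure: str) -> list[int]:
    """Visit each AllocationUnit listed in the OAM, then read its AllocationPage."""
    found = []
    for au in oam[structure]:                                  # step 1: OAM -> AllocationUnits
        for extent, owner in sorted(allocation_pages[au].items()):
            if owner == structure:                             # step 2: AllocationPage -> Extents
                found.append(extent)
    return found


print(extents_of("CI"))
print(extents_of("NCI"))
```

Each AllocationUnit listed in the OAM costs an extra AllocationPage read, so a DataStructure scattered over many AllocationUnits is more expensive to navigate than one confined to a few.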

An Object (a physical term, as in ObjectAllocationMap, which is unfortunately different from OBJECT_ID(), etc., which is a logical term) is a DataStructure, one of:
• Clustered Index (APL only)
• Heap (DOL: always; APL: only when there is no CI)
• the DOL Heap and APL Heap are very different
• Nonclustered Index (the DOL Placement Index is an NCI)
• Text/Image Chain

• The web is full of misinformation and shallow information.
• Single-vendor sites are censored, and exclude robust discussion of technical issues related to their offerings; they have their commercial agenda.
• There is no substitute for actual experience, or for diligently verifying that you have actually accomplished what you set out to do.
• Fragmentation at every level shown here is easy to identify.
• The success, and ease, of correction depends on your skills and understanding of this information: this is published free to assist you in that regard.
9 I Drop-Create
9.1 Common De-Fragmentation Issue
This chapter discusses some of the issues relevant to typical de-fragmentation exercises, and the limitations of DROP/CREATE CLUSTERED INDEX.
Many DBAs de-fragment their DataStructures by performing the full complement of the three steps identified here, and are puzzled: while the table is significantly faster, Asynch Pre-Fetch and Large I/O are not restored. The DataStructure concerned is either a Clustered Index or a DOL Heap; the before image is illustrated in [ 8.3 ].

9.2 BCP-Out, Drop


Figure: AU0, AU256, AU512, AU768, AU1024 and AU1280 each hold only their AllocationPage (A); the table's Extents have been de-allocated.

The data has been bcped-out, and the table has been truncated, or the table is dropped and recreated. As long as Segments are not used to place the
table on different Devices, or separate groups of tables, this sequence applies.

9.3 BCP-In, Create Clustered Index Sorted Data


Figure: after bcp-in, the Clustered Index (C) re-occupies the vacated Extents: 1 and 2 in AU256, then 3, 4, 5 and 6 in AU512, AU768, AU1024 and AU1280; its ObjectAllocMap lists AU0 through AU1280.

When the data is bcped-in, it is placed in the available Extents, most likely the recently evacuated ones (assuming the unload/load is performed when the database is not in use). Certainly, the DataStructure is de-fragmented within its own Extents and Pages (Levels II & III). However, if proceeding with one or a few DataStructures at a time, the de-allocated Extents will be re-used; they were fragmented at Level I before, and they remain so. Asynch Pre-Fetch & Large I/O (multiple Extents, up to an entire AllocationUnit, at Level I) is still not possible. Although advised by many Sybase identities, this is a common mistake; at any rate, its effect is temporary, and it needs to be repeated.
If SORTED_DATA is used, which does not re-write the data Pages, the Extents remain in their location.

9.4 Drop, Create Clustered Index


Figure: the Clustered Index (C), rebuilt without SORTED_DATA, occupies Extents 1 to 6 contiguously in the fresh AU5120, and its ObjectAllocMap lists only ▶AU5120; AU0 to AU1280 hold only their AllocationPages (A).

The distilled requirement is simply to create the Clustered Index without the SORTED_DATA option; this re-writes the data Pages to a new location, which makes the bcp-out/bcp-in unnecessary. However, the original DataStructure space, which is released at the end of the process, will be used for whichever Clustered Index is created next, as shown in section [ 9.6 ].
bcp-out/bcp-in is effective only when the entire database, or at least a large group of tables, is de-fragmented together. Otherwise, a new location can be specified by creating a new Device and identifying a new Segment on it.

Drop-Create and Sorted_Data


• DROP/CREATE CLUSTERED INDEX in its unqualified form rewrites all data Pages, and the PageChain
(if the structure has one); the operation requires 125% of the space used, at the new location.
• When it is qualified WITH SORTED_DATA, the data Pages are not re-written, which means
fragmentation is not corrected. It is extremely fast, especially in 15.0.
• To correct fragmentation without losing the speed, use WITH SORTED_DATA, along with FILLFACTOR
and/or RESERVEPAGEGAP, which forces the data Pages to be rewritten.
DPL/DRL Lockscheme
• The WITH SORTED_DATA qualifier cannot be used with DOL tables, because the order of the data in the
Heap is not maintained (there is no Clustered Index).
• The exception is that fleeting moment immediately following a full REORG or DROP/CREATE
"CLUSTERED" INDEX (using the syntax demanded to address the Placement Index), in which case
the rebuild is not required.
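The 125% figure above translates into a simple pre-check before an unqualified DROP/CREATE CLUSTERED INDEX. This sketch assumes the free space on the target Segment is already known (for example, from the DBA's own space reports). The function names are hypothetical, the sample numbers reuse the TestBase_APL [1] data portion from the earlier report, and the 125% factor is the document's figure rather than a value computed by ASE.

```python
import math

REBUILD_FACTOR = 1.25   # the rebuild needs 125% of the space used, at the new location


def rebuild_space_kb(used_kb: int) -> int:
    """KB of free space that should be available on the Segment before the rebuild."""
    return math.ceil(used_kb * REBUILD_FACTOR)


def can_rebuild(used_kb: int, segment_free_kb: int) -> bool:
    return segment_free_kb >= rebuild_space_kb(used_kb)


# Hypothetical example: a 22,126 KB data portion on a Segment with 30,000 KB free.
print(rebuild_space_kb(22_126))
print(can_rebuild(22_126, 30_000))
```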

9.5 Drop, Create Placement Index
Figure: AU5120 holds Placement Index (P) Extents interspersed with Heap (H) Extents 1 to 10; the Placement Index and the Heap each have their own ObjectAllocMap (▶AU5120).
For DOL tables containing more than a few Extents, even immediately following a careful de-fragmentation exercise (DROP/CREATE "CLUSTERED" INDEX in fresh AllocationUnits), and although the Heap is initially contiguous, the Heap and Placement Index are two separate DataStructures; except for the first few Pages, the index and data Pages are substantially removed from each other.

9.6 Create Clustered Index/Next Clustered Index


Figure: the next Clustered Index created (C) takes Extents 1 to 6, one in each of AU0, AU256, AU512, AU768, AU1024 and AU1280, and its ObjectAllocMap lists all six AllocationUnits.

The next Clustered Index created takes up the fragmented Extents which were vacated by the previous Clustered Index (green) when it was re-written to
a new location.
There really is no substitute for Segments.
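Why the next Clustered Index inherits the vacated, fragmented Extents can be seen in a toy free-list model. It assumes free Extents are handed out lowest-first within the Segment; that policy is an assumption made for illustration, and ASE's actual allocation is more involved. The object names are hypothetical.

```python
import heapq


class Segment:
    """Toy Segment: a pool of free Extent numbers, handed out lowest-first (assumed policy)."""

    def __init__(self, n_extents: int):
        self.free = list(range(n_extents))
        heapq.heapify(self.free)
        self.owner: dict[int, str] = {}

    def allocate(self, name: str, count: int) -> list[int]:
        got = [heapq.heappop(self.free) for _ in range(count)]
        for e in got:
            self.owner[e] = name
        return got

    def release(self, name: str) -> None:
        for e, o in list(self.owner.items()):
            if o == name:
                del self.owner[e]
                heapq.heappush(self.free, e)


seg = Segment(64)
for _ in range(6):                       # CI_A and CI_B grow Extent by Extent (Level I fragmentation)
    seg.allocate("CI_A", 1)
    seg.allocate("CI_B", 1)
seg.allocate("filler", 20)               # other objects occupy the following space
new_a = seg.allocate("CI_A_new", 6)      # rebuild without SORTED_DATA: write to new Extents first,
seg.release("CI_A")                      # then release the old ones
next_ci = seg.allocate("CI_C", 6)        # the next Clustered Index created
print("CI_A rebuilt into", new_a)
print("next CI gets", next_ci)
```

The rebuilt structure is contiguous, but the next Clustered Index inherits every second Extent of the old layout, which is the situation shown in the figure above.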

DPL/DRL Lockscheme
• For DOL tables, once the Pages and Extents in the Heap are reasonably
full, unless space is reserved for interspersed INSERTs and row
expansion, it is not possible for rows to be placed "near" each other (as
intended by the Placement Index); logically sequential rows or Pages
could be hundreds of megabytes apart.
• Further, the index Pages in Placement Index and the related data Pages
in the Heap could be hundreds of megabytes apart (while remaining "on
the same Segment", default or otherwise).

10 I Segment
10.1 Normal Growth
Refer to section [ 2.1 ] for an introduction to Segments; this chapter discusses the value of Segments in reducing or eliminating Fragmentation.
The use of Segments allows groups of tables to be stored together, and thus separated from competing table groups, on discrete Devices. This shows the
AllocationUnits of:
• 6 Segments Data1 through Data6 (table groups, base colours) used for the Clustered Indices of 18 tables (distinct shades)
• for the purpose of explanation, the Devices may well be named Data1 through Data6 as well
• 2 Segments NC1 and NC2, for all their Nonclustered Indices (an arbitrary 3 Nonclustered Indices A, B, C, per table is shown).
ObjectAllocMap (CI)
Data1 AU0 A 1 1 1 2 3 2 2 4 5 3 3 6 7 4 4 8 9 5 5 10 11 6 6 12 13 7 7 14 15 8 8 O O O ▶AU0

Data2 AU256 A 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 O O O ▶AU256

Data3 AU512 A 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 11 11 O O O ▶AU512

Data4 AU768 A 1 1 1 2 3 2 2 4 5 3 3 6 7 4 4 8 9 5 5 10 11 6 6 12 13 7 7 14 15 8 8 O O O ▶AU768

Data5 AU1024 A 1 1 1 2 3 2 2 4 5 3 3 6 7 4 4 8 9 5 5 10 11 6 6 12 13 7 7 14 15 8 8 O O O ▶AU1024

Data6 AU1280 A 1 1 1 2 3 2 2 4 5 3 3 6 7 4 4 8 9 5 5 10 11 6 6 12 13 7 7 14 15 8 8 O O O ▶AU1280

NC1 AU1792 A A B C A B C A B C A B C A B C A B C A B C A B C A B C

NC2 AU2048 A A B C A B C A B C A B C A B C A B C A B C A B C A B C

Where Segments are not used, all data is placed in the default Segment. Since all Objects are loaded via concurrent INSERTS, the Extents are fragmented within the AllocationUnits, and the AllocationUnits are fragmented across all Devices. That case, unfortunately quite common, is illustrated in sections [ 8 ] and [ 9 ]. The illustration above shows exactly the same quantity of DataStructures and Extents as those sections; as before, the numbers identify the Extent number within the DataStructure. It illustrates the result of all tables being evenly, and concurrently, INSERTED into.
The use of Segments provides three major advantages:
1. Reduction of fragmentation, due to more Extents belonging to fewer DataStructures being placed in each AllocationUnit
• thus Level I de-fragmentation operations are reduced, if not eliminated.
• Asynch Pre-Fetch & Large I/O (multiple Extents, up to an entire AllocationUnit, at Level I) is now reasonably possible; it is worthy of consideration by the Optimiser.
2. Substantially increased performance, due to:
• enhanced concurrent INSERT speed, for several reasons, primarily because:
• the tables required in each transaction are separated from each other, on separate Segments, and
• Nonclustered Indices are separated from their data (Clustered Index or DOL Heap), on separate Segments, onto many Device queues.
3. Elimination of AllocationUnit Hotspots. The absence of Segments results in a few current AllocationUnit Hotspots, on one (the current) Device, despite many Devices being available; with Segments, such hotspots are eliminated.
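A companion toy model for the Segment layout above: 18 tables are split into 6 groups of 3, each group on its own Segment, and Extents are again claimed in round-robin order, roughly as concurrent INSERTs would cause. The grouping mirrors the illustration; the table names and the allocation model are simplifications for illustration, not ASE's algorithm.

```python
from collections import defaultdict

EXTENTS_PER_AU = 32


def structures_per_au(groups: list[list[str]], turns: int) -> dict[tuple[int, int], set[str]]:
    """Toy model: each group (Segment) has its own AllocationUnits; the tables in a group
    take turns claiming Extents. Returns {(segment, AU index): tables present}."""
    result: dict[tuple[int, int], set[str]] = defaultdict(set)
    for seg_no, tables in enumerate(groups):
        next_extent = 0                      # each Segment starts in its own Device space
        for _ in range(turns):
            for t in tables:
                result[(seg_no, next_extent // EXTENTS_PER_AU)].add(t)
                next_extent += 1
    return dict(result)


tables = [f"T{i:02d}" for i in range(18)]
with_segments = [tables[i:i + 3] for i in range(0, 18, 3)]   # Data1 .. Data6
without_segments = [tables]                                   # everything on default

for label, groups in (("6 Segments", with_segments), ("default only", without_segments)):
    layout = structures_per_au(groups, 10)
    worst = max(len(s) for s in layout.values())
    print(f"{label}: at most {worst} tables per AllocationUnit")
```

With Segments, each AllocationUnit is shared by only the 3 tables of its group, rather than by all 18.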

10.2 Fragmented
ObjectAllocMap (CI)


Data1 AU0 A 1 1 1 3 4 2 3 6 7 3 2 9 10 4 8 12 13 5 4 14 2 6 6 5 8 7 5 11 15 8 7 O O O ▶AU0

Data2 AU256 A 1 1 1 2 2 2 4 3 4 3 4 5 7 5 7 5 6 8 6 7 10 11 8 3 8 9 6 9 10 9 10 O O O ▶AU256

Data3 AU512 A 1 1 1 3 2 2 4 3 4 6 4 5 7 5 7 10 6 8 11 7 11 2 8 3 5 9 6 8 10 9 9 O O O ▶AU512

Data4 AU768 A 1 1 1 3 4 3 2 5 6 4 3 8 10 6 4 12 13 7 5 14 2 2 6 7 9 5 7 11 15 8 8 O O O ▶AU768

Data5 AU1024 A 1 1 1 3 4 3 2 6 7 5 3 8 10 7 4 11 12 2 5 2 5 4 6 9 13 6 7 14 15 8 8 O O O ▶AU1024

Data6 AU1280 A 1 1 1 2 4 2 3 5 7 3 4 8 10 4 7 11 12 5 8 14 15 6 2 3 6 7 5 9 13 8 6 O O O ▶AU1280

NC1 AU1792 A A B C A B C A B C A B C A B C A B C A B C A B C A B C

NC2 AU2048 A A B C A B C A B C A B C A B C A B C A B C A B C A B C

This shows the same group, eventually fragmented at Level I under interspersed INSERT/DELETE activity, which causes PageSplits, etc. (UPDATE causes row migration or PageSplits only when the columns are variable-length); the resulting fragmentation is depicted. Where even simple Segment plans are used, fragmentation can be substantially reduced; where carefully considered Segment plans are used, Level I de-fragmentation operations can be avoided altogether. Even though fragmented, Asynch Pre-Fetch & Large I/O are fully enabled (although slightly less efficient than when not fragmented).
Note also that since the AllocationUnits are laid out initially as per [ 10.1 ], the structures are essentially immune to becoming fragmented. Therefore what is shown here is the result of extreme interspersed INSERT/DELETE activity, over a long period.
De-fragmenting single tables (ie. at the DataStructure level, as and when required) via DROP/CREATE CLUSTERED INDEX, to correct Level II fragmentation as illustrated above, without requiring unload/reload, produces [ 10.1 ] for the subject DataStructure. Since each new DataStructure takes up the Extents of the previous DataStructure, and the latter was unfragmented for the most part, the sequence of Extents is corrected. However, the result is not completely contiguous; that state is shown next.

10.3 Fresh
ObjectAllocMap (CI)
Data1 AU0 A 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 O O O ▶AU0

Data2 AU256 A 1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10 O O O ▶AU256

Data3 AU512 A 1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 11 1 2 3 4 5 6 7 8 9 10 O O O ▶AU512

Data4 AU768 A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 O O O ▶AU768

Data5 AU1024 A 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 O O O ▶AU1024

Data6 AU1280 A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 O O O ▶AU1280

NC1 AU1792 A A B C A B C A B C A B C A B C A B C A B C A B C A B C

NC2 AU2048 A A B C A B C A B C A B C A B C A B C A B C A B C A B C

The effect of de-fragmenting most or all of the tables in each Segment is illustrated here. Of course, each Segment can be de-fragmented as and when necessary; all Segments do not need to be de-fragmented at the same time. Where Segments are not used, none of this is possible.

11 Level I Fragmentation Summary


To summarise the types of fragmentation covered in Level I:
• AllocationUnits are fragmented across the Database, preventing Asynch Pre-Fetch & Large I/O (multiple Extents up to an AllocationUnit at Level I).
• Extents are fragmented across the AllocationUnits, preventing Asynch Pre-Fetch & Large I/O (multiple Extents up to an AllocationUnit at Level I).
• Such fragmentation can be greatly reduced by implementing Segments, since it limits the physical range of DataStructures.
Further, Segments increase performance by separating DataStructures that compete or contend with each other.

Segment Limit
For large databases, the 29 Segment limit poses an obstacle, which must be worked around by loading tables in tranches. At the least, when Clustered Indices are rebuilt to address Level II Fragmentation, they can be rebuilt singly, in place, and without the vulnerability illustrated in chapter [ 9 ].

DPL/DRL Lockscheme
• Placement Indices and Heaps, which are separate DataStructures (although on the same Segment), are not explicitly illustrated here; a single pair is illustrated in [ 9.5 ].
• Sites that use such tables generally do not use Segments, and thus all DataStructures in the entire database are fragmented across the single default Segment. Florists call this "striped", and wonder why it is slow; engineers call it retarded.

Surrogate Key Mythology
• A monotonically increasing value, such as an IDENTITY column, creates myriad problems, which do not occur with true Relational keys.
• IDENTITY columns are fine for prototype systems (development). Due to the many attendant restrictions they impose on ordinary maintenance tasks, they must not be allowed in production systems.
• It creates an INSERT HotSpot on the last Page, and guarantees contention.
• The hotspot exists for both APL and DOL tables, with the latter being slightly faster.
• A monotonically increasing key is the worst candidate for a Clustered Index: choose a Key that distributes the data, and therefore eliminates the hotspot.
• Contact the author for alternative, high performance methods.

Based on the naïve belief that The SAN Does Everything, and in place of knowledge or technical examination:
• Myth that fragmentation does not matter when a SAN is used, because the volumes are striped (effectively "fragmented").
• Myth that Segments are not required where a SAN is used. The reasons always fall apart under questioning; and the proof is in the pudding.
The SAN, and whatever configuration is implemented, is independent of ASE, and vice versa. ASE 'sees' the Logical Volume on the SAN as a contiguous disk allocation, and treats it that way in attempting to obtain performance out of it (eg. Asynch Pre-Fetch & Large I/O).
Based on the naïve belief that Evangelists Preach the Gospel, while ignoring the fact that Evangelism is a marketing concept, and in place of genuine knowledge and technical examination:
• Myth that the DOL Placement Index (unfortunately addressed via the "clustered" syntax) is the same as the Clustered Index.
The unconfused technical term for the two separate DataStructures is Heap and Placement Index.
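To see why a monotonically increasing key concentrates INSERTs on the last Page, this sketch maps each new key to the Page whose key range covers it, using a sorted list of each Page's lowest key as a simplified stand-in for a B-Tree Leaf level (Page splits are ignored). The Page boundaries and keys are hypothetical; the point is only the distribution of target Pages.

```python
import bisect
import random
from collections import Counter

# Hypothetical Leaf level: Page i holds keys from page_low_keys[i] up to the next boundary.
page_low_keys = [0, 1_000, 2_000, 3_000, 4_000, 5_000, 6_000, 7_000]


def target_page(key: int) -> int:
    """Index of the Page whose key range covers key."""
    return bisect.bisect_right(page_low_keys, key) - 1


def insert_distribution(keys: list[int]) -> Counter:
    return Counter(target_page(k) for k in keys)


identity_keys = list(range(8_000, 8_200))                         # IDENTITY-style: always above the max
rng = random.Random(42)
distributed_keys = [rng.randrange(0, 8_000) for _ in range(200)]  # keys spread over the range

print("monotonic:", dict(insert_distribution(identity_keys)))
print("distributed:", len(insert_distribution(distributed_keys)), "Pages receive INSERTs")
```

Every IDENTITY-style key lands on the last Page, so all concurrent INSERTs contend for it; keys that distribute the data spread the same INSERTs across the Leaf level.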

12 II Page Chain
This part of the document identifies Level II Fragmentation: Pages within Extents, and shows the effect for the different
LockSchemes. There are four aspects to this level, presented in seven sections:
• PageChain Fragmentation
• Overflow Pages
• Unused Space (Pages) per Extent, and
• Unused Space per Page.

12 Page Chain Fragmentation

AllPage Locked
PageChains exist for:
• Heap (which exists only when there is no Clustered Index)
• Clustered Index (all Index levels & Leaf levels, meaning index and data Pages, since the Leaf is the data row), as per [ 3.2 ].
• Nonclustered Index (Leaf level only)

DataPage/DataRow Locked
• There is no PageChain for the DOL Heap (it would defeat the purpose of the RowId-based design).
• There is a PageChain for the Leaf level of Nonclustered Indices (including the Placement Index), as per [ 3.3 ].

12.1 Fresh
AllPage Locked: Clustered Index
Figure: Extents of 8 Pages. Clustered Index (C), ObjectAllocMap ▶AU512: E512 holds Pages 1 to 8; E520 holds Pages 9 to 16.
This illustrates an unfragmented Clustered Index Leaf level PageChain, containing index Leaf plus data. It is contiguous, fresh after loading via bcp or DROP/CREATE CLUSTERED INDEX.
• Asynch Pre-Fetch & Large I/O (multiple Extents, up to an entire AllocUnit, at Level II, and multiple Pages) are fully enabled.

DataPage/DataRow Locked: Heap & Placement Index
Figure: Extents of 8 Pages. Heap (H), ObjectAllocMap ▶AU768: E768 holds Pages 1 to 8; E776 holds Pages 9 to 16. Placement Index (P), ObjectAllocMap ▶AU1280: E1280 holds Pages 1 to 4.
This shows an unfragmented DOL Heap (the data); it is contiguous because it has been freshly re-ordered via DROP/CREATE CLUSTERED INDEX. It also shows the unfragmented Placement Index.
• Although the syntax demands "clustered", that is false; the index is in fact a Placement Index, which is a Nonclustered Index with two additional criteria (the data is not clustered with the index); the illustration shows what initial placement does.

12.2 Fragmented
AllPage Locked: Clustered Index (C), ObjectAllocMap ▶AU512
E512: 1 2 4 5 6 8 9 10
E520: 11 13 14 16 17 19 20 3
E528: 22 7 23 12 24 15 18 21
DataPage/DataRow Locked: Heap (H), ObjectAllocMap ▶AU768
E768: 1 2 4 5 6 8 9 10
E776: 11 13 14 16 17 19 20 3
E784: 22 7 23 12 24 15 18 21
Placement Index (P), ObjectAllocMap ▶AU1280
E1280: 1 2 3 4

AllPage Locked
• This shows a disturbed PageChain, caused by Page Splits: full Pages must be split due to interspersed INSERTS when no space is available on the Page.
• This shows Pages out of sequence while remaining in the same AllocationUnit; the I/O penalty is more severe when the out-of-sequence Pages are located in other AllocationUnits, as per [ 8.3 ].

DataPage/DataRow Locked
• The Heap is fragmented due to DML activity with no space available in the Page, standard fare for monotonically increasing indices.
• The sequence is not real, since Pages are not accessed in sequence; it merely provides a comparison with the AllPage Locked case (the real sequence is much worse).
• To some extent that does not matter, because there is no PageChain and Range Queries are not supported. However, the overall access to the table is slowed, and scans must use the OAM method.
12.3 Effect/Range Query & Table Scan
1 2 • 3 • 4 5 6 • 7 • 8 9 10 11 • 12

13 14 • 15 • 16 17 • 18 • 19 20 • 21 • 22 • 23 • 24

AllPage Locked
• This shows the sequence in which the Pages must be fetched when traversing the PageChain, eg. for Range Queries and Table Scans, and highlights the interrupts involved in the traversal.
• Asynch Pre-Fetch & Large I/O (multiple Extents, up to an entire AllocUnit, at Level II) are prevented. Multiple Pages are hindered.
• When traversing the PageChain, 15 reads are required instead of 3.
• On a busy server, that could be up to 14 interrupts, or context switches, which are to be avoided.
• PageChains that are fragmented across AllocationUnits require more of those to be read, and even more I/O.
• If the Pages are aged out of the cache during this time, they must be read again, etc. (Not illustrated.)

DataPage/DataRow Locked
• Range Queries are based on a Clustered Index (index Leaf plus data), Relational or compound Keys, and require a PageChain; since DOL tables cannot have a Clustered Index, the feature is not possible for them.
• Traversing the Heap, eg. Table Scans, requires navigation via the ObjectAllocationMap, to the AllocationPage, to the Extent, to the Page. That is much slower than retrieval via a PageChain (or comparable to a heavily fragmented PageChain).
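The "15 reads instead of 3" figure can be reproduced from the fragmented layout in [ 12.2 ]. This sketch walks the PageChain in logical order and counts each run of physically adjacent Pages as one read, capping a read at one Extent (8 Pages). That cost model is a simplification chosen for illustration, not ASE's actual I/O accounting.

```python
import math

PAGES_PER_EXTENT = 8


def chain_reads(physical_order: list[int]) -> tuple[int, int]:
    """Return (reads, interrupts) to traverse a PageChain 1..N.

    physical_order lists logical Page numbers in the order they sit on disk.
    A read covers a run of physically adjacent Pages, at most one Extent long.
    """
    pos = {page: i for i, page in enumerate(physical_order)}
    runs, run_len = [], 1
    for page in range(2, len(physical_order) + 1):
        if pos[page] == pos[page - 1] + 1:
            run_len += 1                    # next Page in the chain is physically next
        else:
            runs.append(run_len)            # chain jumps elsewhere: an interrupt
            run_len = 1
    runs.append(run_len)
    reads = sum(math.ceil(r / PAGES_PER_EXTENT) for r in runs)
    return reads, len(runs) - 1


fresh = list(range(1, 25))                        # three contiguous Extents
fragmented = [1, 2, 4, 5, 6, 8, 9, 10,            # E512, as in [ 12.2 ]
              11, 13, 14, 16, 17, 19, 20, 3,      # E520
              22, 7, 23, 12, 24, 15, 18, 21]      # E528
print("fresh:", chain_reads(fresh))               # (3, 0)
print("fragmented:", chain_reads(fragmented))     # (15, 14)
```

The 14 jumps match the bullets in the sequence illustrated above, and the same arithmetic applies to the Nonclustered Index Leaf level discussed in [ 12.4 ].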

AllPage Locked DataPage/DataRow Locked

12.4 Effect/Covered Query


A Covered Query pertains to either the Clustered Index or Nonclustered Indices (including the Placement Index), where the query can be serviced by reading the Index Leaf level alone, and reading the data Pages is avoided. This is quite different from Range Queries, which apply to index plus data. It uses the PageChain available at the Leaf level of the Index.
• Refer to [ 3.2 ] for a definition of the CI; note the PageChain at every level of the B-Tree, and at the Leaf (data) level.
• Refer to [ 3.3 ] for a definition of the NCI or PI and its relation to the data; note the PageChain at the Index Leaf level only.
• Refer to [ 12.2 ] and [ 12.3 ] for the effect of fragmentation on a Clustered Index on Table Scans and Range Queries.
We will now contemplate the effect of fragmentation on Nonclustered Indices.

AllPage Locked: Nonclustered Index. DataPage/DataRow Locked: Nonclustered or Placement Index. The illustrations are identical for both LockSchemes.
Figure: index (N), ObjectAllocMap ▶AU1280: E1280 holds Pages 1 to 8; E1288 holds Pages 9 to 16.
This illustrates an unfragmented Nonclustered Index Leaf level PageChain, containing index Leaf entries. It is contiguous, fresh after DROP/CREATE NONCLUSTERED INDEX (or "clustered" if it is a Placement Index).

1 2 • 3 • 4 5 6 • 7 • 8 9 10 11 • 12
13 14 • 15 • 16 17 • 18 • 19 20 • 21 • 22 • 23 • 24

This illustrates the effect of fragmentation on the PageChain of a Nonclustered Index (including the PI). It shows the sequence in which the Pages must be fetched when traversing the PageChain, and highlights the interrupts involved in the traversal.
• Asynch Pre-Fetch & Large I/O (multiple Extents, up to an entire AllocUnit, at Level II) are prevented. Multiple Pages are hindered.
• When traversing the PageChain, 15 reads are required instead of 3.
• On a busy server, that could be up to 14 interrupts, or context switches, which are to be avoided.
• PageChains that are fragmented across AllocationUnits require more of those to be read, and even more I/O.
• If the Pages are aged out of the cache, they must be read again, etc. (Not illustrated.)

Focus
In order to avoid confusion, and to maintain focus, other Levels of
fragmentation are excluded from this Level II discussion. Page level
issues such as the space usage consequences relating to DOL tables are
discussed in Level III Fragmentation. Unused Space within Extents is
discussed in [ 14 ]; Unused Space within Pages is discussed in [ 15 ].

13 II Overflow Page
AllPage Locked DataPage/DataRow Locked

13 Overflow Page
Clustered Index/Duplicate Row Heap & Placement Index/Forward
indid = 1 indid = 4 indid = 0
C ObjectAllocMap P ObjectAllocMap H ObjectAllocMap
▶AU512 ▶A ▶Ext ▶AU1280 ▶A ▶Ext ▶AU768 ▶A ▶Ext

Row Forwarded;
Original RowId
Placement
Clust Unchanged
Index Additional Read
Heap on Every Access
NCI
Additional
Page per
Dupe Key

Forwards at
End of Heap

AllPage Locked
Overflow Pages occur only for a Clustered Index that is non-unique. For each CI key that is duplicated, an Overflow Page is required, which contains a chain of the duplicate rows; the single original row remains in the contiguous CI DataStructure. The Clustered Index DataStructure is not designed to allow duplicate keys.
• By definition, in a Relational Database, every row must be unique; APL tables are highly suited to that purpose, and thus it is not an issue in Relational tables.
• Record filing systems with IDENTITY or surrogate keys should use DOL tables, and thus it is again not an issue.
• In any case, every CI should be unique; a non-unique CI should be viewed as a serious error, not merely as additional I/O.
• For 'queue' or 'pipe' or log tables, a Heap without a CI is best. Where a CI has been chosen (eliminating a Heap), ensure that the CI is unique.

DataPage/DataRow Locked
DOL DataStructures do not have Overflow Pages, in the sense that Sybase has not given them that name. However, the concept of Forwarded Rows is identical, and far more frequent (row expansion vs row duplication), although the overhead is greater. A technically accurate name, in the context of existing, established names, is Overflow Pages, albeit for Forwards rather than for Duplicates.
A further difference is that a Forwarded row consumes the space of two rows, since the original location cannot be used, whereas an APL duplicate consumes one row.
Since the Nonclustered Index (including the Placement Index) and the Heap are physically separate DataStructures, and row order is not maintained, duplicate rows are not an issue: duplicate keys are handled within the index B-tree structure. For such indices, there is one Leaf entry (RowId) for each row, whether its key is duplicated or not; two duplicate rows are merely two Index Leaf entries, with two different RowIds.
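Both points (duplicates are just extra Leaf entries in an NCI, and a Forward costs two row slots where an APL duplicate costs one) can be sketched in Python. The `NciLeaf` class, the `(page, row)` RowId tuple and the slot-counting helpers are hypothetical illustrations under the stated assumptions, not Sybase structures.

```python
import bisect

RowId = tuple[int, int]  # hypothetical RowId: (Page number, row number)


class NciLeaf:
    """Toy Nonclustered Index Leaf level: one (IndexKey, RowId) entry per row."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, RowId]] = []

    def insert(self, key: str, rowid: RowId) -> None:
        # A duplicate key is just one more sorted Leaf entry; no Overflow Page.
        bisect.insort(self.entries, (key, rowid))

    def lookup(self, key: str) -> list[RowId]:
        # (key,) sorts before every (key, rowid), so this finds the first match.
        i = bisect.bisect_left(self.entries, (key,))
        found = []
        while i < len(self.entries) and self.entries[i][0] == key:
            found.append(self.entries[i][1])
            i += 1
        return found


def row_slots_apl(rows: int) -> int:
    """APL: a duplicate row on an Overflow Page still occupies one row slot."""
    return rows


def row_slots_dol(rows: int, forwarded: int) -> int:
    """DOL: each Forwarded row occupies two slots (original location + Forward)."""
    return rows + forwarded


leaf = NciLeaf()
leaf.insert("Smith", (4, 1))
leaf.insert("Jones", (4, 2))
leaf.insert("Smith", (9, 3))
print(leaf.lookup("Smith"))                          # [(4, 1), (9, 3)]
print(row_slots_apl(1000), row_slots_dol(1000, 50))  # 1000 1050
```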

14 II Unused Space Extent
For all DataStructures, a few empty slots in each Page (via FILLFACTOR) and a few empty Pages in each Extent (via
RESERVEPAGEGAP) are desirable, to allow for interspersed INSERTs. However, where there are more interspersed DELETEs than interspersed INSERTs,
the resulting unused space may be more than is desired. Where there are no interspersed INSERTs, unused space is not required.
The relevant question regarding unused space is whether it was planned; only unplanned unused space is a problem, so that is what is considered here.
The DataStructure that contains the data rows (Clustered Index for APL, Heap for DOL) is the most relevant, and is detailed below. Nonclustered
Indices do get fragmented (in the category of unused space) when there are interspersed bulk DELETEs; however, this is easy and fast to correct
(drop and create the index). In any case, Nonclustered Indices are affected more by disturbed PageChains than by unused Extents or Pages.

Figure 14 Unused Space/Extent: Extents of 8 Pages containing unused Pages, with the ObjectAllocMap of each DataStructure. AllPage Locked: Clustered Index (Extents E512, E520; AllocUnit 512) and Nonclustered Index (Extents E1280, E1288; AllocUnit 1280). DataPage/DataRow Locked: Heap (Extents E768, E776; AllocUnit 768) and Nonclustered or Placement Index (Extents E1280, E1288; AllocUnit 1280).

AllPage Locked
Both the CI and the NCI are shown here; obviously the effect on data Pages, and the correction thereof, is much more serious. The NCI is easy and fast to correct.

DataPage/DataRow Locked
Both the Heap and the Placement Index are shown here; obviously the effect on data Pages, and the correction thereof, is much more serious. Correcting the Heap demands dropping and re-creating the Placement Index (unfortunately addressed via the "clustered" syntax), since the PI defines the initial placement of rows in the Heap.

14.1 Effect
AllPage Locked
• Asynch Pre-Fetch & Large I/O (multiple Pages, up to an entire Extent, at Level II), where Extents are requested, is not hindered. The self-modulating Look-Ahead Set is simply scaled down a little, unless the ratio of empty Pages is large.
• This applies when traversing the Clustered Index, eg. for Range Queries, Covered Queries and Table Scans, and when traversing the Nonclustered Index for Covered Queries.
DataPage/DataRow Locked
• Asynch Pre-Fetch & Large I/O (multiple Pages, up to an entire Extent, at Level II), where Extents are requested, is somewhat hindered. The self-modulating Look-Ahead Set is scaled down a little more than in APL.
• This applies when traversing the relevant Nonclustered Index, eg. for Covered Queries.
• Range Queries are not supported for DOL tables.
• Table Scans use the OAMPage access method.
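Why unplanned empty Pages scale down whole-Extent reads can be shown with a small Python sketch. It assumes every read fetches a whole 8-Page Extent and that only Extents holding at least one used Page are read; the layouts are hypothetical, not taken from the figure.

```python
import math

PAGES_PER_EXTENT = 8  # an ASE Extent is 8 Pages


def extents_if_trimmed(used_pages: int) -> int:
    """Extents needed when the used Pages are packed with no gaps."""
    return math.ceil(used_pages / PAGES_PER_EXTENT)


def extents_touched(layout: list[list[bool]]) -> int:
    """Extents that must be read; `layout` marks each Page used (True) or empty."""
    return sum(1 for ext in layout if any(ext))


def useful_fraction(layout: list[list[bool]]) -> float:
    """Share of the Pages brought in by whole-Extent reads that hold rows."""
    touched = [ext for ext in layout if any(ext)]
    pages_read = sum(len(ext) for ext in touched)
    return sum(sum(ext) for ext in touched) / pages_read if pages_read else 0.0


def make_extent(used: int) -> list[bool]:
    """Hypothetical Extent with `used` Pages in use and the rest empty."""
    return [True] * used + [False] * (PAGES_PER_EXTENT - used)


# 13 used Pages: packed into 2 Extents, vs spread over 4 Extents by
# unplanned unused space.
packed = [make_extent(8), make_extent(5)]
scattered = [make_extent(4), make_extent(3), make_extent(3), make_extent(3)]
print(extents_if_trimmed(13), extents_touched(scattered))   # 2 4
print(useful_fraction(packed), useful_fraction(scattered))  # 0.8125 0.40625
```

Halving the useful fraction doubles the Extents read for the same rows, which is the sense in which the Look-Ahead Set is "scaled down".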

15 II Unused Space Page
This illustrates the result of heavy interspersed INSERTs/DELETEs at the Page level for each Lock Scheme: the rows in the Pages of the same pair of Extents as in [14] above are shown.

Figure 15 Unused Space/Page. AllPage Locked: Extents E512, E520. DataPage/DataRow Locked: Extents E768, E776, with Deleted and Expanded rows, and the Forwards in Extent E792.

AllPage Locked
The Page is kept trim: rows are shifted upon deletion and upon row expansion/contraction.

DataPage/DataRow Locked
Note that even at this level, the forwarded rows (red), the forwards (dark pink) and the deleted rows (dark grey) are visible, separate from the unused space (light grey). The additional space requirement is obvious. (In order to avoid confusion, Level III Fragmentation is excluded from this Level II discussion; it is discussed separately, overleaf.)

15.1 Effect
AllPage Locked
• Asynch Pre-Fetch & Large I/O (multiple Pages, up to an entire Extent, at Level II), where Extents are requested, is not hindered, since the Pages are trimmed. The self-modulating Look-Ahead Set is simply scaled down a little, unless the ratio of Unused Space per Page is large.
• This applies when traversing the Clustered Index, eg. for Range Queries and Table Scans, and when traversing the Nonclustered Index for Covered Queries.
DataPage/DataRow Locked
• Asynch Pre-Fetch & Large I/O (multiple Pages, up to an entire Extent, at Level II), where Extents are requested, is somewhat hindered, since the Pages are not trimmed, DELETEd rows are not removed, and rows are Forwarded. The self-modulating Look-Ahead Set is scaled down a lot more than in APL.
• This applies when traversing the relevant Nonclustered Index, eg. for Covered Queries.
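The extra Pages a DOL scan must read can be estimated with simple arithmetic, sketched here in Python. It assumes every row, and every Forward, occupies one equal-sized slot, and that deleted DOL rows keep their slots, as described above; the row counts are hypothetical.

```python
import math


def pages_to_scan(live_rows: int, rows_per_page: int,
                  deleted_rows: int = 0, forwarded_rows: int = 0) -> int:
    """Pages a scan must read to see every live row.

    Assumes every row, and every Forward, occupies one equal-sized slot.
    APL: deleted rows are removed and Pages trimmed, so leave the extras at 0.
    DOL: deleted rows keep their slots, and each Forwarded row uses two slots.
    """
    slots = live_rows + deleted_rows + forwarded_rows
    return math.ceil(slots / rows_per_page)


apl = pages_to_scan(1000, rows_per_page=10)
dol = pages_to_scan(1000, rows_per_page=10, deleted_rows=100, forwarded_rows=100)
print(apl, dol, (dol - apl) * 100 // apl)  # 100 120 20
```

With 10% deleted and 10% forwarded rows, this model reads 20% more Pages for the same live data.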

16 Level II Summary
To summarise the types of fragmentation covered in Level II:
• PageChains are fragmented across Extents, or worse, across AllocationUnits.
• This prevents Asynch Pre-Fetch & Large I/O (multiple Extents and Pages at Level II).
• Such fragmentation can be greatly reduced at the highest level by implementing Segments, since Segments limit the physical range of DataStructures.
• It can be reduced at the DataStructure level by reserving space for expected interspersed INSERTs and row expansion. Disk space is cheap.
• Unplanned Unused Space within Extents and within Pages scales down Asynch Pre-Fetch & Large I/O.
• Planned reserved space maintains the speed of the DataStructure. Yes sir, everything in a computer system is a trade-off.
• Level II fragmentation is corrected via DROP/CREATE CLUSTERED INDEX with the appropriate FILLFACTOR.

Elimination of Row Movement


• Variable-length rows are the main cause of Deferred Writes,
which are much slower than Direct Writes.
• Row movement within the Page, and the consequent PageSplits
(in APL) and Row Forwarding (in DOL), are caused by changes in
row size. These can be eliminated by implementing fixed-length
rows, which means eliminating variable-length and Nullable columns.
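The chain of cause and effect (row size change, then row movement, then PageSplit or Forward) can be summarised in a toy Python classifier. The outcome labels, the byte-level free-space test and the assumption that a shrinking DOL row leaves its neighbours in place are simplifications for illustration; they are not the server's update algorithm.

```python
def update_outcome(lock_scheme: str, page_free_bytes: int,
                   old_len: int, new_len: int) -> str:
    """Toy classification of what an UPDATE does to a row's location.

    lock_scheme is one of "APL", "DPL", "DRL" (this document's abbreviations).
    """
    if lock_scheme not in ("APL", "DPL", "DRL"):
        raise ValueError(f"unknown lock scheme {lock_scheme!r}")
    growth = new_len - old_len
    if growth == 0:
        return "in place"  # fixed-length rows always end here
    if growth < 0:
        # APL keeps the Page trim by shifting rows; DOL (assumed) leaves a hole.
        return "row movement within page" if lock_scheme == "APL" else "in place"
    if growth <= page_free_bytes:
        return "row movement within page"  # row and/or neighbours move on the Page
    return "page split" if lock_scheme == "APL" else "forwarded"


print(update_outcome("APL", 0, 120, 120))   # in place
print(update_outcome("DRL", 40, 120, 160))  # row movement within page
print(update_outcome("APL", 10, 120, 160))  # page split
print(update_outcome("DPL", 10, 120, 160))  # forwarded
```

With fixed-length rows `growth` is always 0, so neither branch that moves rows can ever be reached.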

Reserved Space
• Space can always be reserved, for any DataStructure (Heap, CI, NCI), via:
RESERVEPAGEGAP: reserve Page(s) per AllocationUnit or Extent
FILLFACTOR: reserve space per Page
• Use sp_chgattribute in order to make the settings permanent.

Reserved Space/DOL
• Fixed-length rows are best, because they eliminate Row Forwarding entirely. However, if that cannot be achieved, EXP_ROW_SIZE should always be set correctly.
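The disk-space side of that trade-off can be estimated with a rough Python calculation. It assumes FILLFACTOR fills each Page to that percentage of its row capacity, and models RESERVEPAGEGAP = g as one empty Page after every g - 1 filled Pages; both are simplifications, not the server's exact allocation algorithm, and the row counts are hypothetical.

```python
import math


def pages_allocated(rows: int, rows_per_full_page: int,
                    fillfactor: int = 100, reservepagegap: int = 0) -> int:
    """Rough Page count for a DataStructure built with reserved space.

    Simplified assumptions, not the server's exact algorithm:
    * FILLFACTOR fills each Page to `fillfactor` percent of its row capacity.
    * RESERVEPAGEGAP = g leaves one empty Page after every g - 1 filled Pages
      (0 or 1 is treated here as "no gap").
    """
    if not 1 <= fillfactor <= 100:
        raise ValueError("fillfactor must be 1..100 in this sketch")
    rows_per_page = max(1, rows_per_full_page * fillfactor // 100)
    data_pages = math.ceil(rows / rows_per_page)
    empty_pages = data_pages // (reservepagegap - 1) if reservepagegap > 1 else 0
    return data_pages + empty_pages


print(pages_allocated(10_000, 20))                                   # 500
print(pages_allocated(10_000, 20, fillfactor=80))                    # 625
print(pages_allocated(10_000, 20, fillfactor=80, reservepagegap=8))  # 714
```

In this example the reserved space costs about 43% more Pages than a fully packed build, which is the price paid for absorbing interspersed INSERTs and row expansion without fragmentation.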

17 III Page
Level III is a new form of fragmentation (Pages and Rows) that applies to DOL tables only. These pages illustrate, step by step, the
fragmentation in DOL DataStructures as a consequence of normal DML activity, and compare it with APL. Understanding the different
DataStructures and their relations is a pre-requisite.

Before launching into the detail, the key issue that must be understood is the essential difference between APL and DOL:
AllPage Locked
• APL DataStructures are Clustered Index based, and
• The Clustered Index is kept ordered and trim
DataPage/DataRow Locked
• DOL DataStructures are Static Heap based, and
• The Heap is not ordered, and it is not kept trim

17.1 Clustered Index Fresh vs Heap & Placement Index Fresh
Figure 17.1. AllPage Locked: the Clustered Index (indid = 1) is a B-Tree with a PageChain at every Index Level; its Leaf Level is the Data Row, and the Clustered Index is sparse. DataPage/DataRow Locked: the Placement Index (indid = 3) is a B-Tree with a PageChain at the Leaf Level only, where each B-Tree entry is an IndexKey plus a RowId pointing to a Data Row; the Heap (indid = 0) has no PageChain, so scans must use the OAM method.
• The DOL side shows that fleeting moment after REORG, when the order in the Heap matches the order in the PI.

17.2 Clustered Index Next Sequential Insert vs Heap & Placement Index Next Sequential Insert
AllPage Locked
• The next (new max) value of a monotonic or surrogate Key.
• Such keys are the worst candidates for a Clustered Index.
Figure: a New Page is added at the end of the PageChain.
DataPage/DataRow Locked
• The next (new max) value of a monotonic or surrogate Key.
Figure: a New Page is added at the end of the Heap.

17.3 Clustered Index Interspersed Insert/Space vs Heap & Placement Index Interspersed Insert/Space
AllPage Locked
• A random value of a Relational (composite) Key, where there is space on the Page. The rows remain ordered and distributed.
• Such keys are the best candidates for a Clustered Index.
DataPage/DataRow Locked
• A random value of a Relational (composite) Key, where there is space on the Page. The rows are not ordered; the new row is located "near by".

17.4 Clustered Index Interspersed Insert/No Space vs Heap & Placement Index Interspersed Insert/No Space
AllPage Locked
Figure: the original Page is split (p.1, p.2), and the contiguity of the PageChain is disturbed. PageChain Fragmentation is Level II, shown here for comparison. In terms of the CI, or logically, the split Pages appear next to each other; physically, the new Page is at the end of the structure.
DataPage/DataRow Locked
• The Page does not need to be full; if the new row would cause existing RowIds to move, a new Page or Extent is used.
Figure: there is no PageChain to disturb; the New Page is at the end of the Heap.
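The contrast between a PageSplit and a Heap insert can be modelled in a short Python sketch. The classes, the fixed rows-per-Page capacity, the half-and-half split and the "near by, else end of Heap" rule are hypothetical simplifications of the behaviour described above, not ASE internals.

```python
import bisect


class AplClusteredIndex:
    """Toy APL Clustered Index leaf: ordered Pages linked by a PageChain."""

    def __init__(self, pages: dict[int, list[int]], capacity: int) -> None:
        self.pages = pages               # Page number -> sorted keys (non-empty)
        self.chain = sorted(pages)       # logical (PageChain) order
        self.capacity = capacity         # rows per Page
        self.next_free = max(pages) + 1  # new Pages go at the physical end

    def insert(self, key: int) -> None:
        # Target = first Page in chain order whose highest key >= the new key.
        idx = next((i for i, p in enumerate(self.chain)
                    if self.pages[p][-1] >= key), len(self.chain) - 1)
        page = self.chain[idx]
        bisect.insort(self.pages[page], key)
        if len(self.pages[page]) > self.capacity:
            # PageSplit: the upper half moves to a new Page at the physical end,
            # but that Page is linked next to the original in the PageChain.
            rows = self.pages[page]
            half = len(rows) // 2
            new_page = self.next_free
            self.next_free += 1
            self.pages[page], self.pages[new_page] = rows[:half], rows[half:]
            self.chain.insert(idx + 1, new_page)

    def chain_jumps(self) -> int:
        """PageChain links that are not physically adjacent Pages."""
        return sum(1 for a, b in zip(self.chain, self.chain[1:]) if b != a + 1)


class DolHeap:
    """Toy DOL Heap: no order, no PageChain."""

    def __init__(self, pages: dict[int, list[int]], capacity: int) -> None:
        self.pages = pages
        self.capacity = capacity

    def insert(self, key: int, near_page: int) -> int:
        """Place the row 'near by' if that Page has room, else at the Heap's end."""
        if len(self.pages[near_page]) < self.capacity:
            self.pages[near_page].append(key)
            return near_page
        new_page = max(self.pages) + 1
        self.pages[new_page] = [key]
        return new_page


apl = AplClusteredIndex({1: [10, 20, 30, 40], 2: [50, 60, 70, 80], 3: [90, 100]}, 4)
apl.insert(25)
print(apl.chain, apl.pages[4], apl.chain_jumps())  # [1, 4, 2, 3] [25, 30, 40] 2
heap = DolHeap({1: [40, 10, 30, 20], 2: [50]}, 4)
print(heap.insert(25, near_page=1), heap.insert(60, near_page=2))  # 3 2
```

The split leaves the keys in order but the PageChain now jumps from Page 1 to Page 4 and back to Page 2 (Level II); the Heap has no chain to disturb, and the row simply lands on a new Page at the end.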

17 III Page
17.5 Clustered Index Interspersed Delete vs Heap & Placement Index Interspersed Delete
AllPage Locked
Figure: rows are shifted; Pages are trimmed (Clustered Index: sysindexes.indid = 1).
DataPage/DataRow Locked
Figure: Deletes are marked but not removed; Pages are not trimmed (Placement Index: indid = 2; Heap: indid = 0).
• Note the unused space; it cannot be used for new rows.
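The two delete behaviours can be contrasted at the byte-offset level with a small Python sketch. The 32-byte header, the 100-byte rows and the `Row` record are hypothetical; RowId labels 45 to 49 follow the figures in this section.

```python
from dataclasses import dataclass

HEADER_BYTES = 32  # hypothetical Page header size, for illustration only


@dataclass
class Row:
    label: int      # the RowId label used in the figures (45, 46, ...)
    offset: int     # byte offset of the row within the Page
    length: int
    deleted: bool = False


def make_page(labels: list[int], length: int) -> list[Row]:
    rows, offset = [], HEADER_BYTES
    for label in labels:
        rows.append(Row(label, offset, length))
        offset += length
    return rows


def delete_apl(rows: list[Row], label: int) -> None:
    """APL: remove the row and shift the following rows down (Page kept trim)."""
    victim = next(r for r in rows if r.label == label)
    for r in rows:
        if r.offset > victim.offset:
            r.offset -= victim.length
    rows.remove(victim)


def delete_dol(rows: list[Row], label: int) -> None:
    """DOL: only mark the row deleted; nothing moves and the space stays dead."""
    next(r for r in rows if r.label == label).deleted = True


def dead_bytes(rows: list[Row]) -> int:
    return sum(r.length for r in rows if r.deleted)


def free_space_starts_at(rows: list[Row]) -> int:
    """Offset where the contiguous free space at the end of the Page begins."""
    return max((r.offset + r.length for r in rows), default=HEADER_BYTES)


apl, dol = make_page([45, 46, 47, 48, 49], 100), make_page([45, 46, 47, 48, 49], 100)
delete_apl(apl, 46)
delete_dol(dol, 46)
print([r.offset for r in apl], free_space_starts_at(apl), dead_bytes(apl))
# [32, 132, 232, 332] 432 0
print([r.offset for r in dol], free_space_starts_at(dol), dead_bytes(dol))
# [32, 132, 232, 332, 432] 532 100
```

After the same DELETE the APL Page has all of its free space in one piece at the end, while the DOL Page carries 100 dead bytes in the middle that new rows cannot use.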

17.6 Clustered Index Interspersed Update (Expand) vs Heap & Placement Index Interspersed Update (Expand)
AllPage Locked
Figure: rows are shifted; Pages are trimmed.
DataPage/DataRow Locked
Figure: the row is Forwarded, and the Forwards are placed at the end of the Heap; the original RowId in the Placement Index is unchanged, so every access requires an additional read.
• Note the unused space; it cannot be used for new rows. Forwards consume twice the space.

17.7 Clustered Index No Page Fragmentation vs Heap & Placement Index Page Fragmentation
RowIds on Page P4, step by step.
AllPage Locked (no Page Fragmentation):
P4: 45 46 47 48 49 (initial RowIds)
P4: 45 47 48 49 (46 Deleted)
P4: 45 47 48 49 (47 Expanded in place)
P93: available
DataPage/DataRow Locked (Page Fragmentation):
P4: 45 46 47 48 49 (initial RowIds)
P4: 45 47 48 49 (46 Deleted)
P4: 45 48 49 (47 Expanded)
P93: 91 47 92 95 (47 is the Forward)
The APL side is shown here for comparison only. In APL tables there is no Level III Fragmentation, and the Pages are kept trim.
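The DOL sequence for Page P4 can be replayed with a toy Python model of a Heap that marks deletes and Forwards rows. The byte sizes, the free-space figures and the "first other Page with room" placement rule are hypothetical assumptions; the RowId labels and Page numbers follow the example above.

```python
class ForwardingHeap:
    """Toy DOL Heap showing Deleted and Forwarded rows (Level III)."""

    def __init__(self, rows_by_page: dict[int, dict[int, int]],
                 free_bytes: dict[int, int]) -> None:
        self.rows = rows_by_page  # Page -> {row label: length in bytes}
        self.free = free_bytes    # Page -> free bytes left on that Page
        self.deleted: set[tuple[int, int]] = set()
        self.forwards: dict[tuple[int, int], int] = {}  # RowId -> Page holding row

    def delete(self, rowid: tuple[int, int]) -> None:
        # Marked only: the bytes stay on the Page and are not reused.
        self.deleted.add(rowid)

    def expand(self, rowid: tuple[int, int], new_len: int) -> int:
        page, label = rowid
        growth = new_len - self.rows[page][label]
        if growth <= self.free[page]:
            self.rows[page][label] = new_len
            self.free[page] -= growth
            return page
        # No room: Forward the row. The original slot stays occupied, so the
        # RowId held by the Placement Index and NCIs does not change.
        target = next(p for p in sorted(self.free)
                      if p != page and self.free[p] >= new_len)
        self.rows[target][label] = new_len
        self.free[target] -= new_len
        self.forwards[rowid] = target
        return target

    def reads_to_fetch(self, rowid: tuple[int, int]) -> int:
        """Page reads to fetch a row via its (unchanged) RowId."""
        return 2 if rowid in self.forwards else 1


heap = ForwardingHeap({4: {45: 100, 46: 100, 47: 100, 48: 100, 49: 100},
                       93: {91: 100, 92: 100, 95: 100}},
                      free_bytes={4: 40, 93: 500})
heap.delete((4, 46))
print(heap.expand((4, 47), 180))                                   # 93
print(heap.reads_to_fetch((4, 45)), heap.reads_to_fetch((4, 47)))  # 1 2
print(sorted(heap.rows[93]), heap.free[93])                        # [47, 91, 92, 95] 320
```

Page P4 still holds all 500 bytes (the deleted row and the Forward's original slot), while row 47 now also occupies 180 bytes on P93, and every fetch of it costs two reads.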

18 Level III Summary
Figure 18. AllPage Locked: Clustered Index with PageChain, and NCI. DataPage/DataRow Locked: Heap with no PageChain, showing Deleted Rows, Forwarded Rows, INSERTed Rows at the end, and Forwards on Overflow Pages.

AllPage Locked (shown here for comparison only; there is no Level III Fragmentation in APL tables, and PageChain fragmentation is Level II)
• No Level III Fragmentation:
• Deletes are immediate; there is no dead space
• Row expansion is in place; there is no Row Forwarding
• No REORG required
• There are therefore two levels of difference between APL and DOL DataStructures, regarding maintenance and performance

DataPage/DataRow Locked
• Level III Fragmentation:
• Deleted row positions are not reused: dead space
• Row expansion causes Row Forwarding (twice the space usage)
• Regular REORG REBUILD is demanded
• Substantial additional space requirement
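The end state that a regular rebuild restores can be pictured with a toy Python sketch. It drops deleted slots and Forward stubs, then repacks the live rows in Placement Index order into trimmed Pages. This only models the result, not how REORG REBUILD is implemented; the `(label, kind)` slot representation and the use of the label as the placement key are hypothetical, and the rows reuse Pages P4 and P93 from 17.7.

```python
Slot = tuple[int, str]  # hypothetical slot: (row label, "row" | "deleted" | "forward")


def rebuild(pages: dict[int, list[Slot]], rows_per_page: int,
            first_page: int = 1) -> dict[int, list[int]]:
    """Repack live rows in (assumed) Placement Index order into full Pages."""
    live = sorted(label for slots in pages.values()
                  for label, kind in slots if kind == "row")
    return {first_page + n: live[i:i + rows_per_page]
            for n, i in enumerate(range(0, len(live), rows_per_page))}


def slot_count(pages: dict[int, list]) -> int:
    return sum(len(slots) for slots in pages.values())


fragmented = {
    4: [(45, "row"), (46, "deleted"), (47, "forward"), (48, "row"), (49, "row")],
    93: [(91, "row"), (47, "row"), (92, "row"), (95, "row")],
}
rebuilt = rebuild(fragmented, rows_per_page=5)
print(rebuilt)                                     # {1: [45, 47, 48, 49, 91], 2: [92, 95]}
print(slot_count(fragmented), slot_count(rebuilt))  # 9 7
```

The two slots that held no live data (the deleted row 46 and the Forward stub for 47) disappear, and row 47 is back on a single Page, so fetching it needs one read again.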

19 Index Type
19.1 Heap (When No Clustered Index) vs Heap (Always)
Figure: in both Lock Schemes the Heap (sysindexes.indid = 0) resides on data_segment, and its rows (A, F, C, Z, E, D, B) are in chronological (INSERT) order.

19.2 Heap plus NCI (When No Clustered Index) vs Heap plus NCI (No Placement Index)
Figure: in both Lock Schemes the NCI (sysindexes.indid = 2) resides on NCI_segment as a 4-Level B-Tree (Intermediate Levels I, Leaf Level R holding RowIds), pointing into the Heap (sysindexes.indid = 0) on data_segment, whose rows (A, F, C, Z, E, D, B) remain in chronological order.

19.3 Clustered Index vs Heap & Placement Index
AllPage Locked (sysindexes.indid = 1; data_segment)
• Rows are maintained in Clustered Index order (A, B, C, D, E, F, Z)
• The Heap is eliminated
• Pages & Extents are trimmed
• One less I/O on every access
• RowIds may change on interspersed INSERT/DELETE/UPDATE (Expand/Shrink)
DataPage/DataRow Locked (sysindexes.indid = 0 & 2; data_segment)
• RowId based; RowIds do not move
• The Placement Index and the Heap remain separate storage structures (sysindexes), but on the same Segment
• Rows are placed in order initially (but that cannot be maintained under DML activity)
• Rows are not shifted on INSERT/DELETE/UPDATE (Expand/Shrink)

19.4 Clustered Index plus NCI vs Heap & Placement Index plus NCI
Figure. AllPage Locked: the Clustered Index (sysindexes.indid = 1) on data_segment, whose Leaf Level holds the data rows (A to Z), plus two NCIs (sysindexes.indid = 2 & 3) on NCI_segment, whose Leaf Levels hold RowIds. DataPage/DataRow Locked: the Heap and Placement Index (sysindexes.indid = 0 & 2) on data_segment, plus two NCIs (sysindexes.indid = 3 & 4) on NCI_segment.
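The sysindexes.indid values in these figures can be reproduced with a short Python sketch. `describe_indid` follows the ASE convention (0 table data, 1 APL clustered index, 2 to 250 nonclustered or DOL "clustered" index, 255 text/image); `sysindexes_indids` is a hypothetical helper that assumes each index takes the next free indid in creation order and that the DOL Placement Index is created first.

```python
def describe_indid(indid: int) -> str:
    """What a sysindexes row with this indid represents (ASE convention)."""
    if indid == 0:
        return "table data (Heap)"
    if indid == 1:
        return "APL clustered index (leaf level is the data rows)"
    if 2 <= indid <= 250:
        return "nonclustered index, or Placement Index on a DOL table"
    if indid == 255:
        return "text/image data"
    raise ValueError(f"unexpected indid {indid}")


def sysindexes_indids(lock_scheme: str, clustered: bool, nci_count: int) -> list[int]:
    """indids a table would have, assuming indexes get the next free indid
    in creation order and the Placement Index is created first."""
    ids = [1] if (lock_scheme == "APL" and clustered) else [0]
    next_id = 2
    if clustered and lock_scheme != "APL":
        ids.append(next_id)  # DOL Placement ("clustered") Index
        next_id += 1
    ids.extend(range(next_id, next_id + nci_count))
    return ids


print(sysindexes_indids("APL", clustered=True, nci_count=2))   # [1, 2, 3]
print(sysindexes_indids("DRL", clustered=True, nci_count=2))   # [0, 2, 3, 4]
print(sysindexes_indids("APL", clustered=False, nci_count=1))  # [0, 2]
```

The first two calls match 19.4: an APL table has no indid 0 row because its data rows are the Clustered Index Leaf Level, whereas a DOL table always keeps its Heap (indid 0) alongside the Placement Index.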

19.5 No Level III Fragmentation vs Level III Fragmentation
Figure: the same structures as in 19.4, after DML activity. AllPage Locked (sysindexes.indid = 1; NCIs indid = 2 & 3): the data rows remain ordered and trim, with the expanded row (C5) in place. DataPage/DataRow Locked (Heap and Placement Index indid = 0 & 2; NCIs indid = 3 & 4): Deleted Rows and Forwarded Rows remain in the Heap, and the Forward of row C (C5) is at the end of the Heap.

APL Dis/Advantage
• Extents and Pages are kept trim, to maintain contiguity
• RowIds change if a Page is split or a row is expanded:
• NCI entries need to be updated if the RowId in the CI changes
• The Clustered Index & PageChain allow Range Queries
• No Level III fragmentation; REORG (offline maintenance) is not required

DPL/DRL Dis/Advantage
• RowIds do not change:
• Rows do not move
• No PageChain
• No Range Queries
• Becomes heavily fragmented (Level III) over time
• Expanded rows are forwarded
• Inserted rows are placed at the end of the Heap
• Deleted rows are not deleted (only marked for deletion)
• Regular de-fragmentation via REORG REBUILD (offline) is required
• REORG RECLAIM_SPACE & FORWARDED_ROWS are ineffective in correcting Level III Fragmentation

Legend: Indices are B-Trees. 4 = Levels (Index Height); I = Intermediate Level; Z = CI Leaf: Data row; R = NCI Leaf: RowId. Only DOL tables are afflicted by Level III Fragmentation, which is shown here in summary form (full detail in [17]): F = Deleted Rows; G = Forwarded Rows; C5 = Forward.
