
Inside The PBXT Storage Engine Presentation

PrimeBase XT is a pluggable storage engine for MySQL 5.1+. It is transactional and ACID compliant (v1.0+), open source (GPL), run as a community project, and designed and built specifically for MySQL.


Inside the PrimeBase XT Storage Engine

MySQL Conference & Expo 2008

Paul McCullagh
PrimeBase Technologies GmbH
www.primebase.org

© Copyright 2008 PrimeBase Technologies Paul McCullagh www.primebase.org


Contents
• Design & How it Works
• Applications of the Design: SSD
• Future of PBXT: HA Solutions



What is PrimeBase XT?
• A pluggable storage engine for MySQL 5.1+
• Transactional, ACID compliant (v1.0+)
• Open source (GPL), community project
• Designed and built specifically for MySQL
• Developed by PrimeBase Technologies:
http://www.primebase.org
• Hosted by Sourceforge.net:
http://sourceforge.net/projects/pbxt



Design Principles
• MVCC, all versions are stored on disk.
• Writes sequentially/write once (log-based).
• Never updates in place.
• No undo: non-committed data is garbage (collected by background threads).
• File-per-table, no table spaces.
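
The principles above can be illustrated with a minimal sketch in Python. This is not PBXT's code: the names `AppendOnlyLog` and `LogEntry` are hypothetical, and transaction state is kept in plain sets for brevity. Every write appends a new entry, nothing is overwritten, and a rollback does no undo work; entries of aborted transactions are simply garbage that a background sweep removes later.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class LogEntry:
    xact_id: int   # transaction that wrote this version
    row_id: int
    value: str


class AppendOnlyLog:
    """Toy log-based store: writes append, nothing is updated in place."""

    def __init__(self) -> None:
        self.entries: List[LogEntry] = []
        self.committed: Set[int] = set()
        self.aborted: Set[int] = set()

    def write(self, xact_id: int, row_id: int, value: str) -> None:
        # A new version is always appended; older versions stay where they are.
        self.entries.append(LogEntry(xact_id, row_id, value))

    def commit(self, xact_id: int) -> None:
        self.committed.add(xact_id)

    def rollback(self, xact_id: int) -> None:
        # No undo: the transaction's entries just become garbage.
        self.aborted.add(xact_id)

    def read(self, row_id: int) -> Optional[str]:
        # Newest committed version wins; uncommitted versions are skipped.
        for entry in reversed(self.entries):
            if entry.row_id == row_id and entry.xact_id in self.committed:
                return entry.value
        return None

    def sweep(self) -> int:
        """Discard entries of aborted transactions, as a background thread might."""
        before = len(self.entries)
        self.entries = [e for e in self.entries if e.xact_id not in self.aborted]
        return before - len(self.entries)
```

A reader calling `read` never sees the value written by a transaction that has not committed, and `sweep` returns how many garbage entries it discarded.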



Disk Structure (v1.0.01)
CREATE TABLE test.notes (
    n_id INT,
    n_name CHAR(35),
    n_when DATETIME,
    n_text VARCHAR(500)
) ENGINE=PBXT;

Resulting files:

var
    pbxt
        data
            dlog-1.xt       Data Log Files
            dlog-2.xt
            ....
        location            Table locations (paths)
        system
            restart-1.xt    Recovery points
            restart-2.xt
            xlog-85.xt      Transaction Log
    test
        notes-5.xtr         Row Index File (with Table ID)
        notes.frm
        notes.xtd           Handle Data File
        notes.xti           Index File



Record Structure
• Control data (14 - 26 bytes): Status, Prev. version, Xaction ID, Row ID, Ext. Data Ref
• Handle Data File (.xtd): fixed length record data (set at table creation time)
• Data Log File (dlog-n.xt): extended record data, variable size



Row Structure
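[Diagram: the Row Index File (.xtr) holds one entry per row (Row n-1, Row n, Row n+1). The entry for a row leads to the most recent version of its record, followed by the previous version, down to the oldest version.]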
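
The version chain in this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not PBXT's implementation: the names `Table`, `Record` and `snapshot_xact` are hypothetical, all transactions are assumed to have committed, and a version is treated as visible when its transaction ID is not greater than the reader's snapshot ID.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    xact_id: int          # transaction that wrote this version
    data: str
    prev: Optional[int]   # position of the previous version, None for the oldest


class Table:
    def __init__(self) -> None:
        self.records: List[Record] = []       # stands in for record storage
        self.row_index: Dict[int, int] = {}   # row id -> most recent version

    def add_version(self, row_id: int, xact_id: int, data: str) -> None:
        # The new version links back to the current head of the chain,
        # then becomes the head itself. Older versions are left untouched.
        prev = self.row_index.get(row_id)
        self.records.append(Record(xact_id, data, prev))
        self.row_index[row_id] = len(self.records) - 1

    def read(self, row_id: int, snapshot_xact: int) -> Optional[str]:
        """Walk from the newest version to the oldest and return the first
        one visible to the snapshot (simplified visibility rule)."""
        pos = self.row_index.get(row_id)
        while pos is not None:
            rec = self.records[pos]
            if rec.xact_id <= snapshot_xact:
                return rec.data
            pos = rec.prev
        return None
```

Because old versions remain reachable through the chain, a reader with an older snapshot still finds the version it is entitled to see while newer versions are being added.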



Background Threads

• Writer
• Sweeper
• Compactor
• Checkpointer



Writer Thread

[Diagram. Labels: UPDATE/INSERT; Index File; Xaction Log; Record Cache; Data Log; Writer Thread; Row Index & Handle Data File.]



Sweeper Thread
[Diagram. Labels: Index File; Transaction Log; Row n; Record x; Sweeper Thread.]



Compactor Thread
[Diagram. Labels: Data Log Files (dlog-24.xt, dlog-31.xt, ...); Compactor Thread.]



Checkpointer Thread
[Diagram. Labels: Index File; Row Index & Handle Data File; Transaction Log; Checkpointer Thread (steps 1, 2, 3); Restart File (restart-1/2.xt).]



Recovery
[Diagram. Labels: Transaction Log; Recovery Process; Row Index & Handle Data File; Index File; Data Log.]
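
The slides show only the components involved in checkpointing and recovery, so the following is a generic checkpoint-and-replay sketch rather than PBXT's actual algorithm. It assumes (hypothetically) that the restart file stores a transaction log position up to which the data files are known to be up to date, and that recovery re-applies every log entry from that position onward. The names `recover` and `restart_point` are illustrative.

```python
from typing import Dict, List, Tuple

# One transaction log entry: (row id, new value). Hypothetical format.
LogEntry = Tuple[int, str]


def recover(data_file: Dict[int, str],
            xlog: List[LogEntry],
            restart_point: int) -> Dict[int, str]:
    """Rebuild the data file state by replaying the transaction log
    from the position recorded at the last checkpoint."""
    recovered = dict(data_file)          # leave the input untouched
    for row_id, value in xlog[restart_point:]:
        recovered[row_id] = value        # re-applying an entry is harmless
    return recovered
```

In this sketch a more recent restart point means fewer log entries to replay, which is the usual reason for checkpointing in the background.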



Other Design Innovations
• Index entry recovery
  - Indexes need not be flushed on transaction commit.

• Operation IDs
  - Modifications normally require simultaneous update of cache and transaction log.
  - The Writer Thread uses the operation ID to sort changes.

• Update clustering
  - New records are grouped so that they can be written together by the Writer.

• Update consolidation
  - The Writer sorts updates from the log.
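
One way to picture how operation IDs and update consolidation could fit together is the sketch below. It is an assumption-laden illustration, not the Writer's real logic: `Change`, `op_id`, `offset` and `plan_writes` are hypothetical names, and a change is modeled as a payload destined for one position in a data file.

```python
from typing import Dict, List, NamedTuple


class Change(NamedTuple):
    op_id: int      # hypothetical monotonically increasing operation ID
    offset: int     # target position in the data file
    payload: bytes


def plan_writes(changes: List[Change]) -> List[Change]:
    """Order changes by operation ID, keep only the last change per offset,
    then sort the survivors by offset so neighbours are written together."""
    latest: Dict[int, Change] = {}
    for change in sorted(changes, key=lambda c: c.op_id):
        latest[change.offset] = change   # the later operation wins
    return sorted(latest.values(), key=lambda c: c.offset)
```

Sorting by operation ID restores the order in which the modifications happened even if they reached the log out of order; sorting the remaining writes by offset turns scattered updates into a pass that moves through the file in one direction.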



Applications of Flexible Design

• Optimization for Solid State Drives (SSD)
• Future of PBXT: High Availability (HA) solutions.



SSD Performance
Operation                       SSD          HD
Random writes                   269/s        175/s
Random writes (in cache)        271/s        427,204/s
Sequential writes               112,961/s    71,975/s
Sequential writes (in cache)    559,003/s    721,709/s
Random reads                    6,925/s      225/s
Random reads (from cache)       650,533/s    676,956/s
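
To make the comparison easier to read, the short script below computes the SSD-to-HD ratio for each row. The figures are copied from the table; the names `RATES` and `ssd_to_hd_ratio` are only illustrative.

```python
# Operations per second, copied from the SSD Performance table: (SSD, HD).
RATES = {
    "Random writes": (269, 175),
    "Random writes (in cache)": (271, 427_204),
    "Sequential writes": (112_961, 71_975),
    "Sequential writes (in cache)": (559_003, 721_709),
    "Random reads": (6_925, 225),
    "Random reads (from cache)": (650_533, 676_956),
}


def ssd_to_hd_ratio(operation: str) -> float:
    """Return how many times faster (or slower, if < 1) the SSD is."""
    ssd, hd = RATES[operation]
    return ssd / hd


if __name__ == "__main__":
    for name in RATES:
        print(f"{name:30s} SSD/HD = {ssd_to_hd_ratio(name):.2f}")
```

On these figures the SSD is roughly 31 times faster for uncached random reads and about 1.6 times faster for uncached sequential writes, while random writes on the SSD stay near 270/s whether or not they go through the cache.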



Optimizing for SSD/HD
[Diagram. Left: "Row Optimized for HD", shown with the Row Index & Handle Data File (written mostly randomly). Right: "Row Optimized for SSD", shown with the Data Log (written only sequentially).]



Combination HD & SSD
[Diagram. Labels: UPDATE/INSERT; Transaction Log on HD; Writer Thread; Compactor Thread; Data Log on SSD.]



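How Will HA Work?

HA Ramp Up:
1. MVCC Snapshot transfer
2. Asynchronous replication
3. Synchronous replication
   3.1 Real-time feedback
   3.2 Log flushing disabled
   3.3 Bi-directional replication

[Diagram: a Master System and a Slave System, each receiving MODIFY operations and each running MySQL with PBXT. A Master Thread and a Slave Thread connect the two PBXT instances; the connections are numbered with the ramp-up steps 1., 2., 3., 3.1 and 3.3.]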



Advantages of this Solution
• Does not require writing or synchronizing with the binary log.
• Allows for rapid failover.
• Changes cannot be lost.
• Slave is online and can be used for reading or backup.
• Scalable:
  - Multiple-reader slaves
  - Master-to-master (switch master)
  - Bi-directional replication, scalable writes



Alternative HA Configuration
[Diagram. Labels: UPDATE/INSERT; Master System; Slave System; Shared Data Logs on Alternative FS/HA (GFS/OCFS).]



Scaling Writes
[Diagram. Labels: SELECT/UPDATE/INSERT/DELETE; Server Nodes; Shared Data Logs.]



PBXT Road Map
• Q1 2008:
  - Alpha Version
  - ACID Compliant
  - Referential Integrity
• Q2 2008:
  - Beta Version
  - Index Consistent Write
  - Windows Version
• Q3 2008:
  - Sync. HA - Alpha
• Q4 2008:
  - Release Candidate
• Q1 2009:
  - GA
  - Sync. HA - Beta



Q&A
Thanks for Listening!

http://www.primebase.org

http://sourceforge.net/projects/pbxt

http://pbxt.blogspot.com
