Optimizing MySQL Performance With ZFS

ZFS offers many leading-edge features, including automatic protection against silent data corruption, immense capacity, and vastly simplified administration. But how well does it perform with MySQL? This session explores MySQL performance with ZFS compared to alternative file system implementations. The performance implications of ZFS compression and other features are also examined.

Neelakanth Nadgir
Allan Packer
Sun Microsystems

Who are we?
Allan Packer
Principal Engineer, Performance
http://blogs.sun.com/allanp

Neelakanth Nadgir
Senior Engineer, Performance
http://blogs.sun.com/realneel

MySQL User Conference 2009

What do we do?
• Work in the performance organization at Sun Microsystems
• Work in the MySQL performance virtual team
• Check out a video about our group
> Google for “mysql optimization lab”

Agenda
• ZFS Introduction
• ZFS Performance features
• ZFS and MySQL
> MySQL IO model
> Best practices
> Performance Results
• ZFS FAQ

ZFS – The last word in Filesystems
• Developed at Sun circa 2004
• Open source (CDDL)
> 47 patents have been donated to CDDL Patents Common
• Supported Platforms
> Officially supported on Solaris
> Default filesystem on OpenSolaris
> Read-Only access in MacOS 10.5
> Experimental feature on FreeBSD 7.1
> FUSE on Linux

ZFS – Core Features
• Data Integrity
> Everything is checksummed
• Immense capacity
> 128 bit filesystem
> Max size of a file is 2^64 bytes
> Each directory can hold 2^48 files
• Simple administration
> zpool & zfs are the only two commands you need to know
• Performance
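
To put the capacity limits above in familiar units, here is a small arithmetic sketch. The constant and function names are illustrative only and are not part of ZFS.

```python
# Limits quoted on the slide, converted to more familiar units.
MAX_FILE_BYTES = 2 ** 64     # maximum size of a single file, in bytes
MAX_DIR_ENTRIES = 2 ** 48    # maximum number of files in one directory

EIB = 2 ** 60                # one exbibyte, in bytes


def max_file_size_eib() -> int:
    """Return the maximum file size expressed in exbibytes."""
    return MAX_FILE_BYTES // EIB


print(max_file_size_eib())       # 16
print(f"{MAX_DIR_ENTRIES:,}")    # 281,474,976,710,656
```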

ZFS – Design Principles
• Pooled Storage
> Common pool from which filesystems are allocated
• End-to-end data integrity
> Historically thought to be expensive, not really
> Alternative is unacceptable
• Everything is transactional
> Always consistent on disk
> Removes almost all constraints on IO order
> Think database transactions

ZFS – Design Principles
• Copy on Write
> Never overwrite live data
> On-disk state is always consistent
– No fsck
• Entire storage pool is a tree of blocks rooted at the "uberblock"
> Transactions are COW of the tree
> Transaction group is committed when the uberblock is rewritten to point to the new tree
> All levels of the tree are checksummed
> Checksum stored in parent node, separate from data

Copy-On-Write Transactions
1. Initial block tree
2. COW some blocks
3. COW indirect blocks
4. Rewrite uberblock (atomic)
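
The four steps above can be modelled in a few lines. This is a toy sketch, not ZFS code: it assumes a two-level tree (one indirect block pointing at data blocks), uses SHA-256 as a stand-in checksum, and the `Pool` and `BlockPointer` names are invented for illustration.

```python
import hashlib
from dataclasses import dataclass


def checksum(payload: bytes) -> str:
    # Stand-in checksum for this sketch only.
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class BlockPointer:
    address: int     # where the child block lives
    checksum: str    # checksum of the child, stored in the parent


class Pool:
    """Toy pool: one indirect block pointing at data blocks."""

    def __init__(self, leaves):
        self.blocks = {}          # append-only "disk": address -> payload
        self.next_address = 0
        pointers = [self._allocate(data) for data in leaves]
        # Step 1: initial tree. The uberblock points at the indirect block.
        self.uberblock = self._allocate(self._encode(pointers))

    def _allocate(self, payload: bytes) -> BlockPointer:
        # Always write to a fresh address; live blocks are never overwritten.
        address = self.next_address
        self.next_address += 1
        self.blocks[address] = payload
        return BlockPointer(address, checksum(payload))

    @staticmethod
    def _encode(pointers) -> bytes:
        return "\n".join(f"{p.address} {p.checksum}" for p in pointers).encode()

    @staticmethod
    def _decode(payload: bytes):
        pointers = []
        for line in payload.decode().splitlines():
            address, digest = line.split()
            pointers.append(BlockPointer(int(address), digest))
        return pointers

    def _read_verified(self, pointer: BlockPointer) -> bytes:
        payload = self.blocks[pointer.address]
        if checksum(payload) != pointer.checksum:
            raise IOError(f"checksum mismatch at block {pointer.address}")
        return payload

    def read(self, index: int, root: BlockPointer = None) -> bytes:
        root = root or self.uberblock
        pointers = self._decode(self._read_verified(root))
        return self._read_verified(pointers[index])

    def write(self, index: int, data: bytes) -> None:
        pointers = self._decode(self._read_verified(self.uberblock))
        pointers[index] = self._allocate(data)                # step 2: COW the data block
        new_root = self._allocate(self._encode(pointers))     # step 3: COW the indirect block
        self.uberblock = new_root                             # step 4: switch the uberblock
```

Keeping a copy of the old `uberblock` value before calling `write` gives a readable view of the previous tree, for example `pool.read(1, root=old_uberblock)`, because none of its blocks were overwritten.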

Constant-Time Snapshots
• At end of TX group, don't free COWed blocks
> Actually cheaper to take a snapshot than not!
[Figure: block trees rooted at the snapshot uberblock and the current uberblock]

End-to-End Checksums
• Disk Block Checksums
> Checksum stored with data block
> Any self-consistent block will pass
> Can't even detect stray writes
> Inherent FS/volume interface limitation
> Only validates the media
– ✔ Bit rot
– ✗ Phantom writes
– ✗ Misdirected reads and writes
– ✗ DMA parity errors
– ✗ Driver bugs
– ✗ Accidental overwrite
• ZFS Checksum Trees
> Checksum stored in parent block pointer
> Fault isolation between data and checksum
> Entire pool (block tree) is self-validating
> Enabling technology: ZFS stack integration
> Validates the entire I/O path
– ✔ Bit rot
– ✔ Phantom writes
– ✔ Misdirected reads and writes
– ✔ DMA parity errors
– ✔ Driver bugs
– ✔ Accidental overwrite

Traditional Mirroring
1. Application issues a read. Mirror reads the first disk, which has a corrupt block. It can't tell.
2. Volume manager passes bad block up to filesystem. If it's a metadata block, the filesystem panics. If not...
3. Filesystem returns bad data to the application. If the data is modified, both the good & bad mirror copies will then be corrupted.
[Figure: the Application / FS / xxVM mirror stack at each of the three steps]

Self-Healing Data in ZFS
1. Application issues a read. ZFS mirror tries the first disk. Checksum reveals that the block is corrupt on disk.
2. ZFS tries the second disk. Checksum indicates that the block is good.
3. ZFS returns good data to the application and repairs the damaged block.
[Figure: the Application / ZFS mirror stack at each of the three steps]
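
The three self-healing steps can be sketched as a read loop over mirror copies. This is a simplified model under stated assumptions: each disk is a plain dictionary, SHA-256 stands in for the real checksum, and `mirror_read` is a hypothetical name, not a ZFS function.

```python
import hashlib


def checksum(payload: bytes) -> str:
    # Stand-in checksum for this sketch only.
    return hashlib.sha256(payload).hexdigest()


def mirror_read(disks, address: int, expected: str) -> bytes:
    """Return the first copy whose checksum matches `expected`.

    Copies that failed verification before a good one was found are
    rewritten with the good data (the "self-healing" step).
    """
    damaged = []
    for disk in disks:
        payload = disk.get(address)
        if payload is not None and checksum(payload) == expected:
            for bad_disk in damaged:
                bad_disk[address] = payload   # repair the bad copy
            return payload
        damaged.append(disk)
    raise IOError(f"no valid copy of block {address} on any mirror side")


# Usage: the first side holds a corrupt copy, the second a good one.
good = b"row data"
side_a = {7: b"garbage!"}
side_b = {7: good}
result = mirror_read([side_a, side_b], 7, checksum(good))
```

The expected checksum comes from the parent block pointer, which is why the mirror can tell which copy is the good one.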

ZFS Performance features
• Dynamic striping across all devices maximizes throughput
• Copy on write makes most writes sequential
• Intelligent prefetch
• Multiple block sizes
• O(1) directory operations
• Explicit IO priority with deadline scheduling
• Globally optimal IO sorting and aggregation
• Concurrent writes
• Safely use write cache

ZFS read(2) code path
• ZFS Primary cache – The ARC
> primarycache=[all|metadata|none]
• ZFS Second level cache – L2ARC
• When primarycache is used, data is buffered in ARC
• If L2ARC is used, it is checked before going to disk
• Prefetch is triggered if needed
• Reads have higher priority than regular writes
[Figure: MySQL read path, numbered 1 to 3, through the ARC and the L2ARC]

ZFS write(2) code path
• Regular writes are buffered in memory
• Periodically they are flushed to disk
> Usually a sequential write to disk
• Synchronous writes are written to the ZFS Intent Log (ZIL)
> After periodic write, the ZIL is cleaned up
> ZIL aggregates IO from multiple writers
> Can use a separate disk (or SSD) for the ZIL
> ZIL can be disabled (Don't)
• ZFS employs byte-range locking to allow maximum concurrency, i.e., no single-writer lock

ZFS ARC (Filesystem buffer)
• Adaptive Replacement Cache
• Dynamically switches between MRU/MFU
• Caches data from all pools
• Dynamically shrinks or grows based on memory pressure
• Survives full table scan
• Limitations
> Works better with a 64-bit kernel
> Works better with swap configured

MySQL IO Model
• Dependent on Storage engine
• Dependent on Workload
• Replication
> One thread reading and applying the binlog to the datafiles (sequential reads, random writes)
> One thread updating the binlog (sequential writes)
• MyISAM
> Relies on filesystem to buffer data
> Index is buffered in the key cache

InnoDB IO Model
• InnoDB
> Reads are issued by user connection threads (N)
> Writes are done by asynchronous threads
– 1 for log and 1 for data files
– Configurable with Performance version
> Writes are either
– Synchronous writes
– Writes followed by a fsync()
> Doublewrite buffer

MySQL and ZFS Best practices

Best practices - Caching
• Prefer to cache inside MySQL/InnoDB rather than in the ARC
> Benchmark shows 7-200% improvement
> Otherwise the same block is buffered inside InnoDB as well as in the ARC
• Limit ARC size
> Even though the ARC is dynamic, it is more efficient to just limit it
• Cache only metadata for InnoDB
> zfs set primarycache=metadata tank/db

Best practices – Record size
• Match recordsize to the database block size
> zfs set recordsize=16k tank/db
> Can be changed dynamically, but only newly written files pick it up, so set it before creating the database
• Prevents read-modify-write
• Read only the data that you want and nothing more
• InnoDB
> 16k recordsize for data
> 128k recordsize for log and binlog
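
The read-modify-write cost behind this advice can be estimated with simple arithmetic. The sketch below assumes an aligned page write, a record that is not already cached, and no compression; `page_write_io` is an illustrative helper, not a ZFS or MySQL API.

```python
KIB = 1024


def page_write_io(page_size: int, recordsize: int):
    """Return (bytes_read, bytes_written) for one database page write.

    Assumes the write is aligned, the affected record is not cached,
    and compression is off.
    """
    if page_size >= recordsize:
        if page_size % recordsize:
            raise ValueError("page size must be a multiple of recordsize")
        # Every touched record is overwritten in full: nothing to read first.
        return 0, page_size
    if recordsize % page_size:
        raise ValueError("recordsize must be a multiple of page size")
    # Partial overwrite: read the whole record, modify it, write it back.
    return recordsize, recordsize


matched = page_write_io(16 * KIB, 16 * KIB)      # 16k page on 16k records
mismatched = page_write_io(16 * KIB, 128 * KIB)  # 16k page on 128k records
print(matched)      # (0, 16384)
print(mismatched)   # (131072, 131072)
```

Under these assumptions a 16k page write on a 128k record moves eight times as much data in each direction as on a matched 16k record.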

Best practices – Prefetch
• ZFS has two kinds of prefetch
> File level prefetch AKA zfetch
> Low level prefetch AKA vdev prefetch
• Turn off file level prefetch
> set zfs:zfs_prefetch_disable = 1
• Low level prefetch is not triggered when recordsize is set (i.e., not 128k)
• InnoDB prefetch assumes the file is laid out in order of primary key
> Not true for ZFS
> Not configurable right now, but should be easy to fix

Best practices – IO
• ZFS IO scheduler prioritizes reads over regular writes
> ZFS Log writes are still higher priority
> If IO queue is full, have to wait for empty slot
> Bug 6471212: will be fixed soon using reserved slots
• Prefer RAID-0 or mirroring over RAID-Z
> RAID-Z is not suitable for random IO
• Use L2ARC to reduce the penalty of missing the buffer cache
> zpool add tank cache c2t0d0 c2t1d0

Best practices – Separate Intent Log
• ZFS log writes can use the Separate Intent log
> Usually NVRAM card or SSD
> Can be done dynamically.
> Match reliability of the pool
> Seen 10-20% improvement for certain workloads
• Use slog to get low latency writes
> zpool add tank log c2t0d0
> Watch out – Cannot remove a slog. Fix in progress

Best practices – Cache flush
• ZFS issues a cache flush after every transaction group sync and after synchronous writes
• Some vendors' storage flushes every time even with a battery-backed cache
> set zfs:zfs_nocacheflush = 1
• Be fair when comparing ZFS with other filesystems that do not flush caches

Best practices – Compression
• ZFS supports pluggable compression
> Gzip and other algorithms
> A block is stored uncompressed if compression saves less than 12.5%
• Scalable, asynchronous compression
> No need for query to wait for compression to complete
• CPU cost
> Compression is not free, but there are many algorithms to choose from
• IO reduction
> CPU cost sometimes offset by IO reduction
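
The 12.5% rule above can be illustrated with Python's `zlib` module. This only models the decision described on the slide; it is not how ZFS implements it, and `store_record` is a hypothetical helper.

```python
import os
import zlib


def store_record(record: bytes, level: int = 6):
    """Return (is_compressed, payload) for one record.

    The compressed form is kept only if it saves at least one eighth
    (12.5%) of the original size; otherwise the record is stored as-is.
    """
    compressed = zlib.compress(record, level)
    threshold = len(record) - len(record) // 8
    if len(compressed) <= threshold:
        return True, compressed
    return False, record


RECORD = 128 * 1024

# Highly compressible data is stored compressed.
zeros_compressed, zeros_payload = store_record(bytes(RECORD))

# Random data does not reach the 12.5% saving and is stored unchanged.
random_record = os.urandom(RECORD)
random_compressed, random_payload = store_record(random_record)
```

The point of the threshold is that data which barely compresses is not worth the CPU cost of decompressing on every read.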

Best practices – InnoDB
• Follow general InnoDB IO tunings
• InnoDB provides checksum and compression
> So does ZFS
> But ZFS “self heals” instead of crashing :-)
• Disable the InnoDB doublewrite buffer to remove redundant writes
> skip-innodb_doublewrite
Best practices – Backup/Restore
• ZFS snapshots are very cheap
• Need to quiesce the database before snapshot
> FLUSH TABLES WITH READ LOCK
> Take the snapshot
> UNLOCK TABLES
• Zmanda Recovery Manager supports backup and recovery using ZFS
• A ZFS clone is a writable copy of a snapshot
> Useful in replication in a shared storage scenario
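
The quiesce, snapshot, unlock sequence has to run inside a single MySQL session, because the read lock is released when the session ends. One way to sketch this is to feed the `mysql` command-line client a short script that uses its `\!` shell escape. The helper names and the `tank/db` dataset are assumptions for illustration; credentials are expected to come from the client's own configuration.

```python
import re
import subprocess


def build_snapshot_sql(dataset: str, snapshot: str) -> str:
    """Build a mysql client script that snapshots `dataset` under a read lock."""
    # Reject unexpected characters, since the name is passed to a shell.
    if not re.fullmatch(r"[A-Za-z0-9_.:/-]+", dataset):
        raise ValueError(f"unexpected characters in dataset name: {dataset!r}")
    if not re.fullmatch(r"[A-Za-z0-9_.:-]+", snapshot):
        raise ValueError(f"unexpected characters in snapshot name: {snapshot!r}")
    lines = [
        "FLUSH TABLES WITH READ LOCK;",                 # quiesce the database
        f"\\! zfs snapshot {dataset}@{snapshot}",       # shell escape, same session
        "UNLOCK TABLES;",                               # resume writes
    ]
    return "\n".join(lines) + "\n"


def take_snapshot(dataset: str, snapshot: str) -> None:
    """Run the script through the mysql client (requires mysql and zfs on PATH)."""
    subprocess.run(
        ["mysql"],
        input=build_snapshot_sql(dataset, snapshot),
        text=True,
        check=True,
    )


script = build_snapshot_sql("tank/db", "nightly")
```

Calling `take_snapshot("tank/db", "nightly")` on the database host would then run all three steps in one session. The snapshot itself is near-instant, so the lock is held only briefly.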

Applying best practices

ZFS COW penalty for table scans
• Because ZFS is copy-on-write, sequential scans will be slower than on filesystems that modify data 'in place'
• We ran sysbench read-write for a week to study this impact
> One-hour sysbench read-write test
> Followed by select count(*) from sbtest
> Repeat

ZFS COW penalty for table scans

Around 25% penalty after a few hours

ZFS COW penalty for table scans
• Copying the file rewrites it in the optimal order
• Idea for 'in-place' editing for db workloads
> Bug#6699230
• SSDs nullify this penalty

ZFS FAQ (for database people)
• ZFS needs more RAM – Not true
> Need one byte to cache one byte
> Metadata is usually 1% (and can be compressed)
• ZFS needs more CPU – Somewhat true
> Feature set provided is much stronger than for other FS
> Performance bugs are actively being fixed
> Checksumming is not free
> Some people want more CPU-hungry features (e.g., gzip)
> Industry benchmarks that run at 100% CPU utilization will be at a disadvantage; however, most customers rarely run at 100% utilization

ZFS FAQ - DirectIO
• ZFS and DirectIO
> DirectIO is an overloaded term that means several things
> No double caching – ZFS primarycache property
> Concurrent writes – ZFS range locks
> Direct copy to application buffer – Not supported in ZFS
> No inflated IO – ZFS supports multiple recordsizes
> ZFS does not yet support the directio() hint

More information
• ZFS Best Practices Guide
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
• ZFS Evil Tuning Guide
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
• Blogs
> http://blogs.sun.com/realneel
> http://blogs.sun.com/roch
• Mail to [email protected]

Questions?
[email protected]

[email protected]