Btrfs - B-Tree File System
Linux filesystem for the future
- Copy on write (COW)
- Checksums
- Writable snapshots
Supported by Oracle
Open source project with developers from many companies
Chris Mason (Oracle) is the primary author
Still experimental (missing features)
Back in 2007
Linux needs a better filesystem
- Ext* are reaching their limits (30+ year old format)
- Reiserfs has issues
- ZFS had been available for two years
Ohad Rodeh (IBM) presents a B-tree that is COW friendly.
Chris Mason starts working on btrfs using those B-trees (all objects in btrfs use them).
Btrees
Guaranteed logarithmic-time key search, insert and remove.
Can represent sparse files well.
Not well suited for COW as-is: with linked leaves, a change to one leaf results in the entire B-tree being rewritten.
Rodeh's B-trees do not have the links between leaves, so they work with COW (similar to B+trees, but with additional algorithms).
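The COW point above can be illustrated with path copying. The following is a minimal sketch, not btrfs code: it uses a simplified fixed-fanout tree rather than a real B-tree, and every name in it (`cow_node`, `cow_set`, `cow_get`, `cow_build`, `FANOUT`) is hypothetical. Because no node points sideways at a neighbour, an update only has to copy the nodes on the path from the root to the changed leaf; everything else is shared with the old version.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FANOUT 4   /* hypothetical fanout, chosen small for illustration */

struct cow_node {
    int level;                        /* 0 = leaf */
    int values[FANOUT];               /* payload, used when level == 0 */
    struct cow_node *child[FANOUT];   /* children, used when level > 0 */
};

/* Number of value slots covered by a subtree rooted at this level. */
static int cow_span(int level)
{
    int span = FANOUT;
    while (level-- > 0)
        span *= FANOUT;
    return span;
}

/* Build a full tree of the given height with all values set to zero. */
static struct cow_node *cow_build(int level)
{
    struct cow_node *n = calloc(1, sizeof(*n));
    assert(n != NULL);
    n->level = level;
    if (level > 0)
        for (int i = 0; i < FANOUT; i++)
            n->child[i] = cow_build(level - 1);
    return n;
}

/*
 * Copy-on-write update: never modifies an existing node. It copies the
 * node it is visiting, recurses into the one child that covers `index`,
 * and returns the new copy. Untouched children are shared by pointer.
 * (Freeing old versions is left out of this sketch.)
 */
static struct cow_node *cow_set(const struct cow_node *n, int index, int value)
{
    struct cow_node *copy = malloc(sizeof(*copy));
    assert(copy != NULL);
    memcpy(copy, n, sizeof(*copy));

    if (n->level == 0) {
        copy->values[index] = value;
    } else {
        int per_child = cow_span(n->level - 1);
        int slot = index / per_child;
        copy->child[slot] = cow_set(n->child[slot], index % per_child, value);
    }
    return copy;
}

/* Read a value by walking from the given root down to the leaf. */
static int cow_get(const struct cow_node *n, int index)
{
    while (n->level > 0) {
        int per_child = cow_span(n->level - 1);
        const struct cow_node *next = n->child[index / per_child];
        index %= per_child;
        n = next;
    }
    return n->values[index];
}
```

Calling `cow_set` on a root returns a new root while the old root still describes the previous contents, which is the same property that makes cheap snapshots possible. If leaves carried sibling links, the neighbour of a copied leaf would also need rewriting to point at the new copy, and that change would cascade.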
Beginning of BTRFS
Chris Mason started working on it shortly after joining Oracle's Linux division (he previously worked on Reiserfs).
Everything is in a B-tree
- Nodes have keys and block headers
- Leaves have keys and data
Code, space and time efficient
Accepted into Linux (2009, 2.6.29 kernel)
BTRFS vs ZFS
Many similar features and goals
Completely different implementation
ZFS is much more mature and ready to use
BTRFS needs more development
BTRFS Features
Subvolumes - each filesystem can have independent child filesystems
Snapshots - snapshots are clones of subvolumes
Multidevice filesystems
- RAID{0,1,10} currently supported
- RAID{5,6} are planned
Copy on Write
BTRFS Structure
Three on-disk structure types: block headers, keys, items
Nodes in tree: only keys and block headers
- keys point to items
- block headers point to nodes on disk
Leaves are items: keys and data
Data Structures
struct btrfs_header {
    u8 csum[32];
    u8 fsid[16];
    __le64 blocknr;
    __le64 flags;
    u8 chunk_tree_uid[16];
    __le64 generation;
    __le64 owner;
    __le32 nritems;
    u8 level;
};

struct btrfs_disk_key {
    __le64 objectid;
    u8 type;
    __le64 offset;
};

struct btrfs_item {
    struct btrfs_disk_key key;
    __le32 offset;
    __le32 size;
};
Item - each item has a key
Key
- objectid: each object has a unique ID, similar to an inode number
- type: inode, dir entry, extent, file data
Each extent in the B-tree also contains back-references to all extents that reference it
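The `__le64` and `__le32` types in the structures above mark fields that are stored little-endian on disk, whatever the CPU's byte order. A minimal sketch of what that means for a key, assuming the on-disk key is packed with no padding (8 + 1 + 8 = 17 bytes); the names `demo_cpu_key`, `demo_key_encode`, `demo_key_decode` and `DEMO_KEY_SIZE` are hypothetical and are not part of btrfs:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_KEY_SIZE 17   /* assumption: packed layout, 8 + 1 + 8 bytes */

/* In-memory (CPU byte order) view of a key. */
struct demo_cpu_key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
};

/* Store a 64-bit value least-significant byte first. */
static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Load a 64-bit value stored least-significant byte first. */
static uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Serialize a key into its assumed 17-byte on-disk form. */
static void demo_key_encode(uint8_t out[DEMO_KEY_SIZE],
                            const struct demo_cpu_key *k)
{
    assert(out && k);
    put_le64(out, k->objectid);     /* bytes 0..7  */
    out[8] = k->type;               /* byte  8     */
    put_le64(out + 9, k->offset);   /* bytes 9..16 */
}

/* Parse the assumed 17-byte on-disk form back into CPU byte order. */
static void demo_key_decode(struct demo_cpu_key *k,
                            const uint8_t in[DEMO_KEY_SIZE])
{
    assert(k && in);
    k->objectid = get_le64(in);
    k->type = in[8];
    k->offset = get_le64(in + 9);
}
```

Encoding byte by byte keeps the sketch independent of the host's endianness and of compiler padding rules, at the cost of a few extra instructions compared with a packed struct and byte-swap helpers.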
B-tree implementation
B-trees store data with a 136-bit key
- first 64 bits are the object ID
- next 8 bits are the object type
- last 64 bits are a type-specific offset
- keys sort by object ID, then type, then offset, so items of the same object (and, within it, of the same type) are physically adjacent
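The three key fields can be compared in order of significance to give the sort order used for lookups. The following is a minimal sketch under that assumption, not the kernel's implementation; `demo_key`, `demo_key_cmp` and `demo_key_search` are hypothetical names, and the type values used with them are arbitrary illustrative numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU-order view of the 136-bit key: 64 + 8 + 64 bits. */
struct demo_key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
};

/* Three-way compare: objectid first, then type, then offset. */
static int demo_key_cmp(const struct demo_key *a, const struct demo_key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/*
 * Binary search over a sorted array of keys, as one might do inside a
 * single node or leaf. Returns the index of an exact match, or -1.
 */
static int demo_key_search(const struct demo_key *keys, int n,
                           const struct demo_key *want)
{
    assert(keys != NULL || n == 0);
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int c = demo_key_cmp(&keys[mid], want);
        if (c == 0)
            return mid;
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

With this ordering, all keys that share an object ID form one contiguous run in a sorted leaf, and within that run the keys of each type are grouped together and ordered by offset.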
Commands
mkfs.btrfs
- mkfs.btrfs /dev/sda
- mkfs.btrfs -m raid1 -d raid1 /dev/sda /dev/sdb
  (-m sets the metadata profile, -d sets the data profile)
BTRFS Availability
Still in development
- missing features
- may lose/corrupt data
Several Linux distributions have it as an install option:
- Ubuntu, https://help.ubuntu.com/community/btrfs
- Red Hat (tech preview)
- Fedora, fedoraproject.org/wiki/Features/F16BtrfsDefaultFs
- openSUSE
- Debian, http://wiki.debian.org/Btrfs