UNLIMITED
Episode 240: TCP Blackbox Recording | BSD Now 240: New ZFS features landing in FreeBSD, MAP_STACK for OpenBSD, how to write safer C code with Clang’s address sanitizer, Michael W. Lucas on sponsor gifts, TCP blackbox recorder, and Dell disk system hacking. by BSD Nowratings:
Length:
80 minutes
Released:
Jul 18, 2018
Format:
Podcast episode
Description
What ZFS blockpointers are, zero-day rewards offered, KDE on FreeBSD status, new FreeBSD core team, NetBSD WiFi refresh, poor man’s CI, and the power of Ctrl+T.
##Headlines
What ZFS block pointers are and what’s in them
I’ve mentioned ZFS block pointers in the past; for example, when I wrote about some details of ZFS DVAs, I said that DVAs are embedded in block pointers. But I’ve never really looked carefully at what is in block pointers and what that means and implies for ZFS.
The very simple way to describe a ZFS block pointer is that it’s what ZFS uses in places where other filesystems would simply put a block number. Just like block numbers but unlike things like ZFS dnodes, a block pointer isn’t a separate on-disk entity; instead it’s an on disk data format and an in memory structure that shows up in other things. To quote from the (draft and old) ZFS on-disk specification (PDF):
A block pointer (blkptr_t) is a 128 byte ZFS structure used to physically locate, verify, and describe blocks of data on disk.
Block pointers are embedded in any ZFS on disk structure that points directly to other disk blocks, both for data and metadata. For instance, the dnode for a file contains block pointers that refer to either its data blocks (if it’s small enough) or indirect blocks, as I saw in this entry. However, as I discovered when I paid attention, most things in ZFS only point to dnodes indirectly, by giving their object number (either in a ZFS filesystem or in pool-wide metadata).
So what’s in a block pointer itself? You can find the technical details for modern ZFS in spa.h, so I’m going to give a sort of summary. A regular block pointer contains:
various metadata and flags about what the block pointer is for and what parts of it mean, including what type of object it points to.
Up to three DVAs that say where to actually find the data on disk. There can be more than one DVA because you may have set the copies property to 2 or 3, or this may be metadata (which normally has two copies and may have more for sufficiently important metadata).
The logical size (size before compression) and ‘physical’ size (the nominal size after compression) of the disk block. The physical size can do odd things and is not necessarily the asize (allocated size) for the DVA(s).
The txgs that the block was born in, both logically and physically (the physical txg is apparently for dva[0]). The physical txg was added with ZFS deduplication but apparently also shows up in vdev removal.
The checksum of the data the block pointer describes. This checksum implicitly covers the entire logical size of the data, and as a result you must read all of the data in order to verify it. This can be an issue on raidz vdevs or if the block had to use gang blocks.
Just like basically everything else in ZFS, block pointers don’t have an explicit checksum of their contents. Instead they’re implicitly covered by the checksum of whatever they’re embedded in; the block pointers in a dnode are covered by the overall checksum of the dnode, for example. Block pointers must include a checksum for the data they point to because such data is ‘out of line’ for the containing object.
(The block pointers in a dnode don’t necessarily point straight to data. If there’s more than a bit of data in whatever the dnode covers, the dnode’s block pointers will instead point to some level of indirect block, which itself has some number of block pointers.)
There is a special type of block pointer called an embedded block pointer. Embedded block pointers directly contain up to 112 bytes of data; apart from the data, they contain only the metadata fields and a logical birth txg. As with conventional block pointers, this data is implicitly covered by the checksum of the containing object.
Since block pointers directly contain the address of things on disk (in the form of DVAs), they have to change any time that address changes, which means any time ZFS does its copy on write thing.
##Headlines
What ZFS block pointers are and what’s in them
I’ve mentioned ZFS block pointers in the past; for example, when I wrote about some details of ZFS DVAs, I said that DVAs are embedded in block pointers. But I’ve never really looked carefully at what is in block pointers and what that means and implies for ZFS.
The very simple way to describe a ZFS block pointer is that it’s what ZFS uses in places where other filesystems would simply put a block number. Just like block numbers but unlike things like ZFS dnodes, a block pointer isn’t a separate on-disk entity; instead it’s an on disk data format and an in memory structure that shows up in other things. To quote from the (draft and old) ZFS on-disk specification (PDF):
A block pointer (blkptr_t) is a 128 byte ZFS structure used to physically locate, verify, and describe blocks of data on disk.
Block pointers are embedded in any ZFS on disk structure that points directly to other disk blocks, both for data and metadata. For instance, the dnode for a file contains block pointers that refer to either its data blocks (if it’s small enough) or indirect blocks, as I saw in this entry. However, as I discovered when I paid attention, most things in ZFS only point to dnodes indirectly, by giving their object number (either in a ZFS filesystem or in pool-wide metadata).
So what’s in a block pointer itself? You can find the technical details for modern ZFS in spa.h, so I’m going to give a sort of summary. A regular block pointer contains:
various metadata and flags about what the block pointer is for and what parts of it mean, including what type of object it points to.
Up to three DVAs that say where to actually find the data on disk. There can be more than one DVA because you may have set the copies property to 2 or 3, or this may be metadata (which normally has two copies and may have more for sufficiently important metadata).
The logical size (size before compression) and ‘physical’ size (the nominal size after compression) of the disk block. The physical size can do odd things and is not necessarily the asize (allocated size) for the DVA(s).
The txgs that the block was born in, both logically and physically (the physical txg is apparently for dva[0]). The physical txg was added with ZFS deduplication but apparently also shows up in vdev removal.
The checksum of the data the block pointer describes. This checksum implicitly covers the entire logical size of the data, and as a result you must read all of the data in order to verify it. This can be an issue on raidz vdevs or if the block had to use gang blocks.
Just like basically everything else in ZFS, block pointers don’t have an explicit checksum of their contents. Instead they’re implicitly covered by the checksum of whatever they’re embedded in; the block pointers in a dnode are covered by the overall checksum of the dnode, for example. Block pointers must include a checksum for the data they point to because such data is ‘out of line’ for the containing object.
(The block pointers in a dnode don’t necessarily point straight to data. If there’s more than a bit of data in whatever the dnode covers, the dnode’s block pointers will instead point to some level of indirect block, which itself has some number of block pointers.)
There is a special type of block pointer called an embedded block pointer. Embedded block pointers directly contain up to 112 bytes of data; apart from the data, they contain only the metadata fields and a logical birth txg. As with conventional block pointers, this data is implicitly covered by the checksum of the containing object.
Since block pointers directly contain the address of things on disk (in the form of DVAs), they have to change any time that address changes, which means any time ZFS does its copy on write thing.
Released:
Jul 18, 2018
Format:
Podcast episode
Titles in the series (100)
- 99 min listen