
Singularity crashes while caching SIF image #6408

Closed
soichih opened this issue Mar 8, 2022 · 1 comment

soichih commented Mar 8, 2022

This is related to #5329

IU HPC systems use GPFS for the home directories (where .local/share/containers is created) and Lustre for everything else. We've been seeing the following error message quite frequently while trying to start singularity.

brlife@carbonate(h1):~ $ singularity -d exec -e docker://busybox whoami
DEBUG   [U=1589653,P=27653]persistentPreRunE()           Singularity version: 3.5.2
DEBUG   [U=1589653,P=27653]handleConfDir()               /N/u/brlife/Carbonate/.singularity already exists. Not creating.
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/library
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/oci-tmp
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/oci
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/net
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/shub
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/oras
DEBUG   [U=1589653,P=27653]parseURI()                    Parsing docker://busybox into reference
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/oci-tmp/836945da1f3afe2cfff376d379852bbb82e0237cb2925d53a13f53d6e8a8c48c
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/oci-tmp/836945da1f3afe2cfff376d379852bbb82e0237cb2925d53a13f53d6e8a8c48c
INFO    [U=1589653,P=27653]handleOCI()                   Converting OCI blobs to SIF format
DEBUG   [U=1589653,P=27653]newBundle()                   Created temporary directory "/tmp/bundle-temp-602597868" for the bundle
DEBUG   [U=1589653,P=27653]newBundle()                   Created directory "/tmp/rootfs-510567b9-a430-11ea-84de-42f2e9c677b7" for the bundle
DEBUG   [U=1589653,P=27653]ensureGzipComp()              Ensuring gzip compression for mksquashfs
DEBUG   [U=1589653,P=27653]ensureGzipComp()              Gzip compression by default ensured
INFO    [U=1589653,P=27653]Full()                        Starting build...
DEBUG   [U=1589653,P=27653]Get()                         Reference: busybox
Getting image source signatures
panic: page 57 already freed

goroutine 117 [running]:
github.com/etcd-io/bbolt.(*freelist).free(0xc000660400, 0x460, 0x2b4261a46000)
	github.com/etcd-io/[email protected]/freelist.go:175 +0x3d4
github.com/etcd-io/bbolt.(*node).spill(0xc0002ca540, 0xc000538a40, 0x1)
	github.com/etcd-io/[email protected]/node.go:363 +0x206
github.com/etcd-io/bbolt.(*node).spill(0xc0002ca4d0, 0xc000536730, 0x47)
	github.com/etcd-io/[email protected]/node.go:350 +0xba
github.com/etcd-io/bbolt.(*Bucket).spill(0xc000514380, 0xc000545800, 0xc0002451d8)
	github.com/etcd-io/[email protected]/bucket.go:568 +0x473
github.com/etcd-io/bbolt.(*Bucket).spill(0xc000514340, 0xc000545000, 0xc0002453d0)
	github.com/etcd-io/[email protected]/bucket.go:535 +0x3d5
github.com/etcd-io/bbolt.(*Bucket).spill(0xc000514300, 0xc000544e00, 0xc0002455c8)
	github.com/etcd-io/[email protected]/bucket.go:535 +0x3d5
github.com/etcd-io/bbolt.(*Bucket).spill(0xc0003a27f8, 0x6f1352e0, 0x1ecb360)
	github.com/etcd-io/[email protected]/bucket.go:535 +0x3d5
github.com/etcd-io/bbolt.(*Tx).Commit(0xc0003a27e0, 0x0, 0x0)
	github.com/etcd-io/[email protected]/tx.go:160 +0xec
github.com/etcd-io/bbolt.(*DB).Update(0xc0005c4400, 0xc0002218c0, 0x0, 0x0)
	github.com/etcd-io/[email protected]/db.go:701 +0x106
github.com/containers/image/pkg/blobinfocache/boltdb.(*cache).update(0xc0006da0c0, 0xc0002218c0, 0x0, 0x0)
	github.com/containers/[email protected]+incompatible/pkg/blobinfocache/boltdb/boltdb.go:141 +0x147
github.com/containers/image/pkg/blobinfocache/boltdb.(*cache).RecordKnownLocation(0xc0006da0c0, 0x15f5e00, 0x1ee9ec8, 0xc000512920, 0x9, 0xc0005360a0, 0x47, 0xc0002687c0, 0x19)
	github.com/containers/[email protected]+incompatible/pkg/blobinfocache/boltdb/boltdb.go:226 +0xcd
github.com/containers/image/docker.(*dockerImageSource).GetBlob(0xc0004221c0, 0x15fabc0, 0xc000040098, 0xc0005360a0, 0x47, 0xb9c0f, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/containers/[email protected]+incompatible/docker/docker_image_src.go:239 +0x484
github.com/containers/image/copy.(*imageCopier).copyLayer(0xc0003e1dd0, 0x15fabc0, 0xc000040098, 0xc0005360a0, 0x47, 0xb9c0f, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/containers/[email protected]+incompatible/copy/copy.go:690 +0x16b
github.com/containers/image/copy.(*imageCopier).copyLayers.func1(0x0, 0xc0005360a0, 0x47, 0xb9c0f, 0x0, 0x0, 0x0, 0x0, 0xc0005d21c0, 0x31, ...)
	github.com/containers/[email protected]+incompatible/copy/copy.go:486 +0x4a6
created by github.com/containers/image/copy.(*imageCopier).copyLayers.func2
	github.com/containers/[email protected]+incompatible/copy/copy.go:497 +0x204

The output above is from singularity 3.5.2. We are currently using 3.7.2 on IU Quartz and see a very similar error message there.

Normally, we can just remove the boltdb file ~/.local/share/containers/cache/blob-info-cache-v1.boltdb and get singularity working again, but after a few days of operation the boltdb file gets corrupted again and we have to keep removing it.

There is already an old open issue on the bbolt repo, etcd-io/bbolt#135, but I am not sure they are any closer to solving it. My theory is that this issue occurs when multiple jobs start up and try to cache the same Docker image at the same time. Maybe bbolt is not designed to handle that kind of concurrency (although it claims ACID semantics), particularly on GPFS?
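For context on the concurrency theory: bbolt serializes writers with an advisory flock() on the database file. On a local filesystem a second exclusive lock attempt reliably fails, but advisory-lock semantics on network filesystems like GPFS or Lustre can differ across nodes, which is one plausible way two jobs end up mutating the same db. The sketch below (not singularity code; just a minimal stdlib demonstration, with a hypothetical path /tmp/flock-demo.db) shows the exclusion bbolt depends on:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// tryExclusiveLocks opens the same file twice and attempts a
// non-blocking exclusive flock() on each descriptor, mimicking two
// processes racing for bbolt's write lock.  Separate open() calls
// create independent open file descriptions, so flock treats them as
// distinct lock holders even within one process.
func tryExclusiveLocks(path string) (first bool, second bool, err error) {
	f1, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0o600)
	if err != nil {
		return false, false, err
	}
	defer f1.Close()

	f2, err := os.OpenFile(path, os.O_RDWR, 0o600)
	if err != nil {
		return false, false, err
	}
	defer f2.Close()

	// First exclusive lock should succeed; the second, competing one
	// should be refused with EWOULDBLOCK on a local Linux filesystem.
	first = syscall.Flock(int(f1.Fd()), syscall.LOCK_EX|syscall.LOCK_NB) == nil
	second = syscall.Flock(int(f2.Fd()), syscall.LOCK_EX|syscall.LOCK_NB) == nil
	return first, second, nil
}

func main() {
	first, second, err := tryExclusiveLocks("/tmp/flock-demo.db")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("first lock acquired: %v, second lock acquired: %v\n", first, second)
}
```

On a local Linux filesystem this prints "first lock acquired: true, second lock acquired: false". If the same exclusion is not enforced between nodes on a parallel filesystem, two writers could both proceed and corrupt the freelist, matching the "page 57 already freed" panic above.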

I wonder if it's possible to work around this issue within singularity, or perhaps to automatically remove the corrupted boltdb file?

github-actions bot commented Mar 8, 2022

New issues are no longer accepted in this repository. If singularity --version says singularity-ce, submit instead to https://github.com/sylabs/singularity, otherwise submit to https://github.com/apptainer/apptainer.

@github-actions github-actions bot closed this as completed Mar 8, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Mar 8, 2022