
Singularity crashes while caching SIF image #6408

Closed
soichih opened this issue Mar 8, 2022 · 1 comment

soichih commented Mar 8, 2022

This is related to #5329

IU HPC systems use GPFS for the home directories (where .local/share/containers is created) and Lustre for everything else. We've been seeing the following error message quite frequently while trying to start singularity.

brlife@carbonate(h1):~ $ singularity -d exec -e docker://busybox whoami
DEBUG   [U=1589653,P=27653]persistentPreRunE()           Singularity version: 3.5.2
DEBUG   [U=1589653,P=27653]handleConfDir()               /N/u/brlife/Carbonate/.singularity already exists. Not creating.
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/library
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/oci-tmp
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/oci
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/net
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/shub
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/oras
DEBUG   [U=1589653,P=27653]parseURI()                    Parsing docker://busybox into reference
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/oci-tmp/836945da1f3afe2cfff376d379852bbb82e0237cb2925d53a13f53d6e8a8c48c
DEBUG   [U=1589653,P=27653]updateCacheSubdir()           Caching directory set to /tmp/soichi/cache/oci-tmp/836945da1f3afe2cfff376d379852bbb82e0237cb2925d53a13f53d6e8a8c48c
INFO    [U=1589653,P=27653]handleOCI()                   Converting OCI blobs to SIF format
DEBUG   [U=1589653,P=27653]newBundle()                   Created temporary directory "/tmp/bundle-temp-602597868" for the bundle
DEBUG   [U=1589653,P=27653]newBundle()                   Created directory "/tmp/rootfs-510567b9-a430-11ea-84de-42f2e9c677b7" for the bundle
DEBUG   [U=1589653,P=27653]ensureGzipComp()              Ensuring gzip compression for mksquashfs
DEBUG   [U=1589653,P=27653]ensureGzipComp()              Gzip compression by default ensured
INFO    [U=1589653,P=27653]Full()                        Starting build...
DEBUG   [U=1589653,P=27653]Get()                         Reference: busybox
Getting image source signatures
panic: page 57 already freed

goroutine 117 [running]:
github.com/etcd-io/bbolt.(*freelist).free(0xc000660400, 0x460, 0x2b4261a46000)
	github.com/etcd-io/[email protected]/freelist.go:175 +0x3d4
github.com/etcd-io/bbolt.(*node).spill(0xc0002ca540, 0xc000538a40, 0x1)
	github.com/etcd-io/[email protected]/node.go:363 +0x206
github.com/etcd-io/bbolt.(*node).spill(0xc0002ca4d0, 0xc000536730, 0x47)
	github.com/etcd-io/[email protected]/node.go:350 +0xba
github.com/etcd-io/bbolt.(*Bucket).spill(0xc000514380, 0xc000545800, 0xc0002451d8)
	github.com/etcd-io/[email protected]/bucket.go:568 +0x473
github.com/etcd-io/bbolt.(*Bucket).spill(0xc000514340, 0xc000545000, 0xc0002453d0)
	github.com/etcd-io/[email protected]/bucket.go:535 +0x3d5
github.com/etcd-io/bbolt.(*Bucket).spill(0xc000514300, 0xc000544e00, 0xc0002455c8)
	github.com/etcd-io/[email protected]/bucket.go:535 +0x3d5
github.com/etcd-io/bbolt.(*Bucket).spill(0xc0003a27f8, 0x6f1352e0, 0x1ecb360)
	github.com/etcd-io/[email protected]/bucket.go:535 +0x3d5
github.com/etcd-io/bbolt.(*Tx).Commit(0xc0003a27e0, 0x0, 0x0)
	github.com/etcd-io/[email protected]/tx.go:160 +0xec
github.com/etcd-io/bbolt.(*DB).Update(0xc0005c4400, 0xc0002218c0, 0x0, 0x0)
	github.com/etcd-io/[email protected]/db.go:701 +0x106
github.com/containers/image/pkg/blobinfocache/boltdb.(*cache).update(0xc0006da0c0, 0xc0002218c0, 0x0, 0x0)
	github.com/containers/[email protected]+incompatible/pkg/blobinfocache/boltdb/boltdb.go:141 +0x147
github.com/containers/image/pkg/blobinfocache/boltdb.(*cache).RecordKnownLocation(0xc0006da0c0, 0x15f5e00, 0x1ee9ec8, 0xc000512920, 0x9, 0xc0005360a0, 0x47, 0xc0002687c0, 0x19)
	github.com/containers/[email protected]+incompatible/pkg/blobinfocache/boltdb/boltdb.go:226 +0xcd
github.com/containers/image/docker.(*dockerImageSource).GetBlob(0xc0004221c0, 0x15fabc0, 0xc000040098, 0xc0005360a0, 0x47, 0xb9c0f, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/containers/[email protected]+incompatible/docker/docker_image_src.go:239 +0x484
github.com/containers/image/copy.(*imageCopier).copyLayer(0xc0003e1dd0, 0x15fabc0, 0xc000040098, 0xc0005360a0, 0x47, 0xb9c0f, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/containers/[email protected]+incompatible/copy/copy.go:690 +0x16b
github.com/containers/image/copy.(*imageCopier).copyLayers.func1(0x0, 0xc0005360a0, 0x47, 0xb9c0f, 0x0, 0x0, 0x0, 0x0, 0xc0005d21c0, 0x31, ...)
	github.com/containers/[email protected]+incompatible/copy/copy.go:486 +0x4a6
created by github.com/containers/image/copy.(*imageCopier).copyLayers.func2
	github.com/containers/[email protected]+incompatible/copy/copy.go:497 +0x204

The output above is from singularity 3.5.2. We are currently using 3.7.2 on IU Quartz and see a very similar error message there.

Normally, we can just remove the boltdb file ~/.local/share/containers/cache/blob-info-cache-v1.boltdb and get singularity working again, but after a few days of operation the boltdb file gets corrupted again and we have to keep removing it.

There is already an old open issue on the bbolt repo, etcd-io/bbolt#135, but I am not sure they are any closer to solving it. My theory is that this issue occurs when multiple jobs start up and try to cache the same Docker image at the same time. Maybe bbolt is not designed to handle that kind of concurrency (although it claims ACID semantics), particularly on GPFS?
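For context on the concurrency theory: bbolt serializes writers with an advisory flock() on the database file. On a local filesystem a second exclusive lock attempt reliably fails, but advisory-lock semantics on network filesystems like GPFS or Lustre can differ across nodes, which is one plausible way two jobs end up mutating the same db. The sketch below (not singularity code; just a minimal stdlib demonstration, with a hypothetical path /tmp/flock-demo.db) shows the exclusion bbolt depends on:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// tryExclusiveLocks opens the same file twice and attempts a
// non-blocking exclusive flock() on each descriptor, mimicking two
// processes racing for bbolt's write lock.  Separate open() calls
// create independent open file descriptions, so flock treats them as
// distinct lock holders even within one process.
func tryExclusiveLocks(path string) (first bool, second bool, err error) {
	f1, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0o600)
	if err != nil {
		return false, false, err
	}
	defer f1.Close()

	f2, err := os.OpenFile(path, os.O_RDWR, 0o600)
	if err != nil {
		return false, false, err
	}
	defer f2.Close()

	// First exclusive lock should succeed; the second, competing one
	// should be refused with EWOULDBLOCK on a local Linux filesystem.
	first = syscall.Flock(int(f1.Fd()), syscall.LOCK_EX|syscall.LOCK_NB) == nil
	second = syscall.Flock(int(f2.Fd()), syscall.LOCK_EX|syscall.LOCK_NB) == nil
	return first, second, nil
}

func main() {
	first, second, err := tryExclusiveLocks("/tmp/flock-demo.db")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("first lock acquired: %v, second lock acquired: %v\n", first, second)
}
```

On a local Linux filesystem this prints "first lock acquired: true, second lock acquired: false". If the same exclusion is not enforced between nodes on a parallel filesystem, two writers could both proceed and corrupt the freelist, matching the "page 57 already freed" panic above.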

I wonder if it's possible to work around this issue within singularity, or perhaps to automatically remove the corrupted boltdb file?

github-actions bot commented Mar 8, 2022

New issues are no longer accepted in this repository. If singularity --version says singularity-ce, submit instead to https://github.com/sylabs/singularity, otherwise submit to https://github.com/apptainer/apptainer.

@github-actions github-actions bot closed this as completed Mar 8, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Mar 8, 2022