IU HPC systems use GPFS for the home directories (where .local/share/containers is created) and Lustre for everything else. We've been seeing the following error message quite frequently while trying to start Singularity.
brlife@carbonate(h1):~ $ singularity -d exec -e docker://busybox whoami
DEBUG [U=1589653,P=27653]persistentPreRunE() Singularity version: 3.5.2
DEBUG [U=1589653,P=27653]handleConfDir() /N/u/brlife/Carbonate/.singularity already exists. Not creating.
DEBUG [U=1589653,P=27653]updateCacheSubdir() Caching directory set to /tmp/soichi/cache/library
DEBUG [U=1589653,P=27653]updateCacheSubdir() Caching directory set to /tmp/soichi/cache/oci-tmp
DEBUG [U=1589653,P=27653]updateCacheSubdir() Caching directory set to /tmp/soichi/cache/oci
DEBUG [U=1589653,P=27653]updateCacheSubdir() Caching directory set to /tmp/soichi/cache/net
DEBUG [U=1589653,P=27653]updateCacheSubdir() Caching directory set to /tmp/soichi/cache/shub
DEBUG [U=1589653,P=27653]updateCacheSubdir() Caching directory set to /tmp/soichi/cache/oras
DEBUG [U=1589653,P=27653]parseURI() Parsing docker://busybox into reference
DEBUG [U=1589653,P=27653]updateCacheSubdir() Caching directory set to /tmp/soichi/cache/oci-tmp/836945da1f3afe2cfff376d379852bbb82e0237cb2925d53a13f53d6e8a8c48c
DEBUG [U=1589653,P=27653]updateCacheSubdir() Caching directory set to /tmp/soichi/cache/oci-tmp/836945da1f3afe2cfff376d379852bbb82e0237cb2925d53a13f53d6e8a8c48c
INFO [U=1589653,P=27653]handleOCI() Converting OCI blobs to SIF format
DEBUG [U=1589653,P=27653]newBundle() Created temporary directory "/tmp/bundle-temp-602597868" for the bundle
DEBUG [U=1589653,P=27653]newBundle() Created directory "/tmp/rootfs-510567b9-a430-11ea-84de-42f2e9c677b7" for the bundle
DEBUG [U=1589653,P=27653]ensureGzipComp() Ensuring gzip compression for mksquashfs
DEBUG [U=1589653,P=27653]ensureGzipComp() Gzip compression by default ensured
INFO [U=1589653,P=27653]Full() Starting build...
DEBUG [U=1589653,P=27653]Get() Reference: busybox
Getting image source signatures
panic: page 57 already freed
goroutine 117 [running]:
github.com/etcd-io/bbolt.(*freelist).free(0xc000660400, 0x460, 0x2b4261a46000)
github.com/etcd-io/[email protected]/freelist.go:175 +0x3d4
github.com/etcd-io/bbolt.(*node).spill(0xc0002ca540, 0xc000538a40, 0x1)
github.com/etcd-io/[email protected]/node.go:363 +0x206
github.com/etcd-io/bbolt.(*node).spill(0xc0002ca4d0, 0xc000536730, 0x47)
github.com/etcd-io/[email protected]/node.go:350 +0xba
github.com/etcd-io/bbolt.(*Bucket).spill(0xc000514380, 0xc000545800, 0xc0002451d8)
github.com/etcd-io/[email protected]/bucket.go:568 +0x473
github.com/etcd-io/bbolt.(*Bucket).spill(0xc000514340, 0xc000545000, 0xc0002453d0)
github.com/etcd-io/[email protected]/bucket.go:535 +0x3d5
github.com/etcd-io/bbolt.(*Bucket).spill(0xc000514300, 0xc000544e00, 0xc0002455c8)
github.com/etcd-io/[email protected]/bucket.go:535 +0x3d5
github.com/etcd-io/bbolt.(*Bucket).spill(0xc0003a27f8, 0x6f1352e0, 0x1ecb360)
github.com/etcd-io/[email protected]/bucket.go:535 +0x3d5
github.com/etcd-io/bbolt.(*Tx).Commit(0xc0003a27e0, 0x0, 0x0)
github.com/etcd-io/[email protected]/tx.go:160 +0xec
github.com/etcd-io/bbolt.(*DB).Update(0xc0005c4400, 0xc0002218c0, 0x0, 0x0)
github.com/etcd-io/[email protected]/db.go:701 +0x106
github.com/containers/image/pkg/blobinfocache/boltdb.(*cache).update(0xc0006da0c0, 0xc0002218c0, 0x0, 0x0)
github.com/containers/[email protected]+incompatible/pkg/blobinfocache/boltdb/boltdb.go:141 +0x147
github.com/containers/image/pkg/blobinfocache/boltdb.(*cache).RecordKnownLocation(0xc0006da0c0, 0x15f5e00, 0x1ee9ec8, 0xc000512920, 0x9, 0xc0005360a0, 0x47, 0xc0002687c0, 0x19)
github.com/containers/[email protected]+incompatible/pkg/blobinfocache/boltdb/boltdb.go:226 +0xcd
github.com/containers/image/docker.(*dockerImageSource).GetBlob(0xc0004221c0, 0x15fabc0, 0xc000040098, 0xc0005360a0, 0x47, 0xb9c0f, 0x0, 0x0, 0x0, 0x0, ...)
github.com/containers/[email protected]+incompatible/docker/docker_image_src.go:239 +0x484
github.com/containers/image/copy.(*imageCopier).copyLayer(0xc0003e1dd0, 0x15fabc0, 0xc000040098, 0xc0005360a0, 0x47, 0xb9c0f, 0x0, 0x0, 0x0, 0x0, ...)
github.com/containers/[email protected]+incompatible/copy/copy.go:690 +0x16b
github.com/containers/image/copy.(*imageCopier).copyLayers.func1(0x0, 0xc0005360a0, 0x47, 0xb9c0f, 0x0, 0x0, 0x0, 0x0, 0xc0005d21c0, 0x31, ...)
github.com/containers/[email protected]+incompatible/copy/copy.go:486 +0x4a6
created by github.com/containers/image/copy.(*imageCopier).copyLayers.func2
github.com/containers/[email protected]+incompatible/copy/copy.go:497 +0x204
The output above is from Singularity 3.5.2. We are currently running 3.7.2 on IU Quartz and see a very similar error message.
Normally we can just remove the boltdb file ~/.local/share/containers/cache/blob-info-cache-v1.boltdb and get Singularity working again, but after a few days of operation the boltdb file gets corrupted again and we have to keep removing it.
There is already an old open issue on the bbolt repo, etcd-io/bbolt#135, but I am not sure whether they are any closer to solving it. My theory is that this issue occurs when multiple jobs start up and try to cache the same Docker image at the same time. Maybe bbolt is not designed to handle that kind of concurrency (although it claims ACID support), particularly on GPFS?
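If it helps with reproducing, the pattern we suspect is simply many jobs hitting the same image at once. A minimal sketch of that scenario (hypothetical, not a confirmed reproducer; the image and the job count are arbitrary):

```bash
#!/bin/bash
# Launch several Singularity runs of the same Docker image concurrently.
# Each process opens ~/.local/share/containers/cache/blob-info-cache-v1.boltdb,
# which is what we suspect races on GPFS and triggers the
# "panic: page ... already freed" in bbolt.
for i in $(seq 1 10); do
    singularity exec -e docker://busybox whoami &
done
wait
```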
I wonder if it is possible to work around this issue within Singularity, or maybe to automatically remove the corrupted boltdb file?
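For what it's worth, this is roughly the kind of automatic cleanup we have in mind: a wrapper script (a sketch only; the log path, panic matching, and single retry are assumptions) that deletes the blob-info cache and retries once when a run dies with the bbolt panic.

```bash
#!/bin/bash
# Hypothetical wrapper around singularity: if the run fails and the output
# contains the bbolt "already freed" panic, remove the blob-info cache file
# (the same manual fix described above) and retry once.
BOLTDB="$HOME/.local/share/containers/cache/blob-info-cache-v1.boltdb"
LOG="/tmp/singularity-wrapper-$$.log"

run() {
    # Capture output so we can look for the panic, but still show it.
    singularity "$@" 2>&1 | tee "$LOG"
    return "${PIPESTATUS[0]}"
}

if ! run "$@"; then
    if grep -q "already freed" "$LOG"; then
        echo "bbolt cache looks corrupted; removing $BOLTDB and retrying" >&2
        rm -f "$BOLTDB"
        run "$@"
    fi
fi
rm -f "$LOG"
```

Usage would be, for example: ./singularity-retry.sh exec -e docker://busybox whoami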
This is related to #5329