Corrupted db file when the vm got turned off because of an overload #103

Open
bharathramh92 opened this issue Jun 14, 2018 · 13 comments

Comments

@bharathramh92

bharathramh92 commented Jun 14, 2018

OS: MacOS with RHEL VM

The db file got corrupted when macOS restarted by itself while my program was running in a RHEL VM. The check output follows.

$ bolt check tmp.db
page 0: multiple references
page 0: invalid type: unknown<00>
panic: invalid page type: 0: 0

goroutine 5 [running]:
panic(0x4e4120, 0xc420010610)
        /usr/lib/golang/src/runtime/panic.go:500 +0x1a1
github.com/boltdb/bolt.(*Cursor).search(0xc42003eba8, 0x7f50350f20f0, 0xa, 0xa, 0x1bb69)
        /opt/pindrop/include/go/src/github.com/boltdb/bolt/cursor.go:256 +0x429
github.com/boltdb/bolt.(*Cursor).seek(0xc42003eba8, 0x7f50350f20f0, 0xa, 0xa, 0x0, 0x0, 0x4f77a0, 0xc42000a3f0, 0x2, 0x2, ...)
        /opt/pindrop/include/go/src/github.com/boltdb/bolt/cursor.go:159 +0xb1
github.com/boltdb/bolt.(*Bucket).Bucket(0xc420078018, 0x7f50350f20f0, 0xa, 0xa, 0x0)
        /opt/pindrop/include/go/src/github.com/boltdb/bolt/bucket.go:112 +0x108
github.com/boltdb/bolt.(*Tx).checkBucket.func2(0x7f50350f20f0, 0xa, 0xa, 0x7f50350f20fa, 0x66, 0x66, 0x66, 0x0)
        /opt/pindrop/include/go/src/github.com/boltdb/bolt/tx.go:449 +0x70
github.com/boltdb/bolt.(*Bucket).ForEach(0xc420078018, 0xc42003ecc0, 0x0, 0xc42003ecf0)
        /opt/pindrop/include/go/src/github.com/boltdb/bolt/bucket.go:390 +0xff
github.com/boltdb/bolt.(*Tx).checkBucket(0xc420078000, 0xc420078018, 0xc42003eea0, 0xc42003eed0, 0xc4200540c0)
        /opt/pindrop/include/go/src/github.com/boltdb/bolt/tx.go:453 +0x135
github.com/boltdb/bolt.(*Tx).check(0xc420078000, 0xc4200540c0)
        /opt/pindrop/include/go/src/github.com/boltdb/bolt/tx.go:404 +0x5f7
created by github.com/boltdb/bolt.(*Tx).Check
        /opt/pindrop/include/go/src/github.com/boltdb/bolt/tx.go:379 +0x67

Is there any way to fix the db file? I checked boltdb/bolt#348, and my version (ee30b748bcfbd74ec1d8439ae8fd4f9123a5c94e) is newer than that.

Note that it didn't happen again when I tried to reproduce it by powering off the virtual machine manually from macOS.

@bharathramh92
Author

Has anyone else bumped into this issue?

@bharathramh92
Author

Can any maintainer check this?

@vtolstov

ping?

@bharathramh92
Author

Is this repo actively maintained?

@dtfinch

dtfinch commented Aug 17, 2018

I don't know, but it's still the main fork that I know of. The original repository was archived because it was considered complete and its maintainers didn't want to weigh it down with extra features.

Virtual machines are tricky. You didn't say what you ran the VM in, but VirtualBox for example ignores flush requests by default, which Bolt (and every other database) depends on to ensure that writes occur in the correct order. That's not a problem if it's shut down normally, but a forced shutdown outside of the VM software's control can lead to partial, out-of-order writes which lead to corruption.
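To make the ordering point concrete, here is a minimal sketch using only the Go standard library. It is not Bolt's actual commit code; `writeInOrder` and `demo` are hypothetical names. It writes the data first, calls `Sync` (fsync), and only then writes a small commit record. If the layer underneath ignores the flush, the commit record may reach disk before the data it describes.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeInOrder writes data first and a small commit record second, calling
// Sync (fsync) after each step so the commit record should never reach stable
// storage before the data it refers to. A hypervisor that ignores flush
// requests silently removes this guarantee.
func writeInOrder(path string, data, commit []byte) error {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()

	// Step 1: write the data after the space reserved for the commit record.
	if _, err := f.WriteAt(data, int64(len(commit))); err != nil {
		return err
	}
	if err := f.Sync(); err != nil { // flush the data before publishing it
		return err
	}

	// Step 2: only now write the commit record at offset 0.
	if _, err := f.WriteAt(commit, 0); err != nil {
		return err
	}
	return f.Sync() // flush the commit record itself
}

// demo runs writeInOrder against a temporary file and returns its contents.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "ordered-write")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "demo.bin")
	if err := writeInOrder(path, []byte("payload"), []byte("OK")); err != nil {
		return "", err
	}
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	got, err := demo()
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Printf("%q\n", got)
}
```

On a host that honors flushes, a crash between the two steps leaves the old commit record in place. With flushes ignored, either write can land first or only partially.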

@liqingsanjin

I have the same problem, and it happened on Windows XP. I use this repo in a released project, and it happened yesterday. I didn't run it in a VM and didn't power off the system. I just used the Put function to save some info; the bucket can be read but cannot be written.

@bharathramh92
Author

@dtfinch It was a Red Hat OS in the VM. In that case, what would happen on an accidental power failure?

@bharathramh92
Author

@liqingsanjin It just got corrupted for no reason?

"I just used the put function to save some info and the bucket can be read but cannot be written." Can you explain how it was done?

@liqingsanjin

liqingsanjin commented Aug 18, 2018

@bharathramh92 Sorry, I don't know how it happened. I deployed my program on 600+ computers running Windows 7 and Windows XP about a month ago, and there were no problems until yesterday. From my program's log files, I saw that when it tried to write to a bucket, it panicked with the same error as yours, although the bucket could still be read. I tried restarting the program and Windows, but the bucket still can't be written to.
Here is my log output:
time="2018-08-17T22:04:16+08:00" level=error msg="invalid page type: 0: 0"

@bharathramh92
Author

@liqingsanjin That is so strange. I never had that issue.

@xiusin

xiusin commented Sep 15, 2018

I have the same problem.

@tmm1
Contributor

tmm1 commented Oct 27, 2022

I saw a similar problem:

invalid page type: 0: 0
  File "go.etcd.io/bbolt@…/cursor.go", line 250, in go.etcd.io/bbolt.(*Cursor).search
  File "go.etcd.io/bbolt@…/cursor.go", line 159, in go.etcd.io/bbolt.(*Cursor).seek
  File "go.etcd.io/bbolt@…/bucket.go", line 105, in go.etcd.io/bbolt.(*Bucket).Bucket
  File "go.etcd.io/bbolt@…/tx.go", line 101, in go.etcd.io/bbolt.(*Tx).Bucket

This message comes from:

bbolt/cursor.go

Lines 249 to 250 in 4b8b43e

if p != nil && (p.flags&(branchPageFlag|leafPageFlag)) == 0 {
        panic(fmt.Sprintf("invalid page type: %d: %x", p.id, p.flags))

So p.id == 0 and also p.flags == 0. If this is truly page 0, it should have flags = metaPageFlag set, and regardless flags == 0 is not one of the valid values:

bbolt/cmd/bbolt/main.go

Lines 1841 to 1846 in 4b8b43e

const (
        branchPageFlag   = 0x01
        leafPageFlag     = 0x02
        metaPageFlag     = 0x04
        freelistPageFlag = 0x10
)
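As a rough illustration of why flags == 0 trips the panic, the sketch below reuses the four flag values quoted above. `describeFlags` and `searchable` are hypothetical helpers written for this comment, not bbolt API; the `unknown<00>` format is assumed from the `bolt check` output earlier in the thread.

```go
package main

import "fmt"

// Flag values copied from the excerpt quoted above.
const (
	branchPageFlag   = 0x01
	leafPageFlag     = 0x02
	metaPageFlag     = 0x04
	freelistPageFlag = 0x10
)

// describeFlags is a hypothetical helper (not bbolt API) that names a page
// type from its flags, falling back to "unknown<xx>" for unrecognized values.
func describeFlags(flags uint16) string {
	switch {
	case flags&branchPageFlag != 0:
		return "branch"
	case flags&leafPageFlag != 0:
		return "leaf"
	case flags&metaPageFlag != 0:
		return "meta"
	case flags&freelistPageFlag != 0:
		return "freelist"
	}
	return fmt.Sprintf("unknown<%02x>", flags)
}

// searchable mirrors the condition in the quoted check: a cursor can only
// search pages that are branch or leaf pages.
func searchable(flags uint16) bool {
	return flags&(branchPageFlag|leafPageFlag) != 0
}

func main() {
	for _, f := range []uint16{0x00, 0x01, 0x02, 0x04, 0x10} {
		fmt.Printf("flags=%#04x type=%s searchable=%v\n", f, describeFlags(f), searchable(f))
	}
}
```

A zeroed page has flags 0x00, so it is neither searchable nor any known type, which matches both the panic message and the `unknown<00>` line from the check.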

Unfortunately I don't have access to the db file that caused the issue in my case, but if someone else does, I would suggest looking at the backup meta page (the second meta page, page ID 1) to see if it's correct.
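For anyone who does have a damaged file, a sketch like the following could dump the page headers of the two meta pages from the raw bytes. It rests on several assumptions that should be verified against the actual file: a 4096-byte page size, little-endian encoding, and a page header that starts with an 8-byte page ID followed by 2-byte flags. `readHeader` and `pageHeader` are hypothetical names, not bbolt API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// assumedPageSize is an assumption; the real page size depends on the OS
// that created the file.
const assumedPageSize = 4096

// pageHeader holds the first two fields of the assumed on-disk page header.
type pageHeader struct {
	ID    uint64
	Flags uint16
}

// readHeader decodes the ID and flags of the page at pageID from the raw
// file contents, assuming little-endian encoding.
func readHeader(db []byte, pageID int) (pageHeader, error) {
	off := pageID * assumedPageSize
	if pageID < 0 || off+10 > len(db) {
		return pageHeader{}, errors.New("page out of range")
	}
	return pageHeader{
		ID:    binary.LittleEndian.Uint64(db[off : off+8]),
		Flags: binary.LittleEndian.Uint16(db[off+8 : off+10]),
	}, nil
}

func main() {
	// Synthetic two-page buffer: page 0 is left zeroed, as in this report,
	// and page 1 is given ID 1 with the meta flag (0x04).
	db := make([]byte, 2*assumedPageSize)
	binary.LittleEndian.PutUint64(db[assumedPageSize:], 1)
	binary.LittleEndian.PutUint16(db[assumedPageSize+8:], 0x04)

	for id := 0; id < 2; id++ {
		h, err := readHeader(db, id)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("page %d: id=%d flags=%#04x\n", id, h.ID, h.Flags)
	}
}
```

To use it on a real file, read a copy of the database into memory (for example with `os.ReadFile`) and pass the bytes to `readHeader`; work on a copy, never on the only remaining file.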

@ahrtr
Copy link
Member

ahrtr commented Jun 1, 2023

The page was somehow reset; in other words, all of its contents are zero values. FYI: #520
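A quick way to look for such reset pages is to scan a copy of the file for pages that consist entirely of zero bytes. The sketch below is one way to do that, not a bbolt tool; `isZeroed` and `zeroedPages` are hypothetical helpers, and the page size has to be supplied by the caller.

```go
package main

import "fmt"

// isZeroed reports whether every byte in page is zero.
func isZeroed(page []byte) bool {
	for _, b := range page {
		if b != 0 {
			return false
		}
	}
	return true
}

// zeroedPages returns the indexes of all full pages in db that are entirely
// zero. A trailing partial page, if any, is ignored.
func zeroedPages(db []byte, pageSize int) []int {
	if pageSize <= 0 {
		return nil
	}
	var ids []int
	for id := 0; (id+1)*pageSize <= len(db); id++ {
		if isZeroed(db[id*pageSize : (id+1)*pageSize]) {
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	// Synthetic example with 8-byte "pages": pages 0 and 2 are zeroed.
	db := make([]byte, 24)
	db[9] = 0x04
	fmt.Println(zeroedPages(db, 8))
}
```

Note that a zeroed page is not necessarily corruption: a page that was allocated but never written may also be all zero, so any hits still need to be checked against what the meta and freelist pages reference.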

@github-actions github-actions bot added the stale label May 11, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 1, 2024
@ahrtr ahrtr reopened this Jun 1, 2024
@ahrtr ahrtr removed the stale label Jun 1, 2024