On Mon, Aug 30, 2004 at 12:34:22AM -0700, Ash wrote:
> Hi
>
> I was running a kind of crash test on an XFS
> filesystem to check recovery/corruptions from unclean
> shutdowns.
> ...
> logged the dmesg outputs for each reboot cycle
> and all of them showed that XFS recovery did not face
> any problems. The message seen in each dmesg log was
>
> Starting XFS recovery on filesystem: cciss/c0d0p8
> (dev: cciss/c0d0p8)
> Ending XFS recovery on filesystem: cciss/c0d0p8 (dev:
> cciss/c0d0p8)
Hmm, we should fix that dup'd device name (looks like
you're running a debug version of XFS here...?)
> Here, in the "rm -rf" command for one of the
> directories, I noticed a hang.
A kdb backtrace at this point would have been useful (in
case you see it again).
> After some time of inactivity, I rebooted the system (a
> clean reboot) and noticed
> that XFS recovery failed. The relevant sections of the
> boot messages are attached in xfs_bootup_failure.txt
>
> Next, I tried xfs_check. It basically printed a lot of
> "block 12/232064 type unknown not expected" messages
> and stopped responding too. I noticed a defunct xfs_db
> process on the system at this point.
That would be because log recovery hasn't been run yet. More
recent versions of xfs_check now act like repair, and won't
run on a filesystem with a dirty log.
> xfs_repair with -L also results in a hang after this
> point.
>
> Any ideas what's going wrong?
> Basically, it's looking like my filesystem is
> inaccessible now.
> I am unable to mount it or run any repair on it.
If you can't even repair, it looks like the device has got
into a funny state (repair talks directly to the device).
I'd reboot to try to clear that up, then run repair with -L
again and see if that resolves it.
If repair still hangs, kdb will be of use - get a backtrace
on the hung repair process.
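As a rough sketch of that sequence after the reboot (the /dev/
prefix on the device path is an assumption based on the name in
your dmesg output, and repair_cmd is just a made-up helper that
prints the command rather than running it, since -L throws away
whatever is still in the log):

```shell
#!/bin/sh
# Sketch only. DEV is assumed from the dmesg lines quoted above;
# check it against your own setup before running anything.
DEV=/dev/cciss/c0d0p8

# repair_cmd is a hypothetical helper: it only prints the command,
# so nothing touches the device until you run it by hand.
repair_cmd() {
    # -L zeroes the log before repairing; any metadata updates
    # that were still only in the log are lost.
    echo "xfs_repair -L $1"
}

repair_cmd "$DEV"
```

If the repair hangs again, note the pid of the xfs_repair process
before dropping into kdb; with the kdb patches I'm aware of,
"btp <pid>" at the kdb prompt should give the backtrace for it.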
> Unable to handle kernel NULL pointer dereference at virtual address 000002f2
> printing eip:
> c026447f
> *pde = 00000000
> Oops: 0000 [#1]
> Modules linked in: usbcore
> CPU: 0
> EIP: 0060:[<c026447f>] Not tainted
> EFLAGS: 00010282 (2.6.7-mirahp1compiled30jul)
> EIP is at xfs_trans_brelse+0x1f/0x100
There are a couple of known use-after-free bugs related to forced
filesystem shutdown; I suspect that's what you're hitting here
where it oops'd.
HTH.
cheers.
--
Nathan