Re: xfs_force_shutdown with linux-2.4.6-xfs-07052001?
> No, I don't believe this. I think it's an xfs problem. I've got similar
> problems.
> I copy about 120 GB of files to an xfs filesystem and get errors. When I
> change the filesystem type from xfs to ext2, no errors occur during copying.
Can someone who is seeing this type of error please do the following:
1. Edit the function _xfs_force_shutdown() in fs/xfs/xfs_rw.c and add a
call to BUG() at line 126.
2. Build a kernel with kdb enabled.
3. When you hit the shutdown, the kernel will drop into the debugger. Use the
bt command to produce a stack trace and send us the output.
There are about 18 calls to xfs_force_shutdown in the filesystem. They
occur for a variety of reasons: some are I/O errors reported from
the block layer, and some are due to consistency check failures on
metadata. If we get an I/O error reported, then it probably really is
one. If we get one of the other errors, then it is possible it is a Linux
bug, maybe in the endian conversion code.
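Until a kdb trace is available, the call site the kernel already logs helps narrow down which of those calls fired. Here is a minimal sketch in Python that pulls the device, flags, source file, line and return address out of a shutdown message. It assumes the message wording seen in the report quoted in this thread and that any wrapped log line has been joined back into one string. The names parse_shutdown and SHUTDOWN_RE are made up for illustration, and other XFS versions may word the message differently.

```python
import re

# Pattern for the message wording seen in the quoted report. This is an
# assumption: other XFS versions may format the line differently.
SHUTDOWN_RE = re.compile(
    r"xfs_force_shutdown\((?P<dev>.+),(?P<flags>0x[0-9a-fA-F]+)\)\s+"
    r"called from line\s+(?P<line>\d+)\s+of file\s+(?P<file>\S+?)\.\s+"
    r"Return address = (?P<ret>0x[0-9a-fA-F]+)"
)


def parse_shutdown(message):
    """Return the call-site details from one log line, or None if no match."""
    m = SHUTDOWN_RE.search(message)
    if m is None:
        return None
    return {
        "device": m.group("dev"),
        "flags": int(m.group("flags"), 16),   # raw flag bits, not decoded here
        "file": m.group("file"),
        "line": int(m.group("line")),
        "return_address": int(m.group("ret"), 16),
    }


if __name__ == "__main__":
    sample = (
        "Jul 31 14:06:43 server kernel: xfs_force_shutdown(sd(8,5),0x1) "
        "called from line 1013 of file xfs_trans.c. "
        "Return address = 0xcc8e71b3"
    )
    print(parse_shutdown(sample))
```

The file and line only identify which xfs_force_shutdown call was reached. They do not show how the code got there, so the stack trace is still needed.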
In Andreas' case the shutdown happened because we attempted to cancel a
transaction after it had been started and we had already dirtied metadata.
In XFS this does not happen in normal operation; it only happens as a
result of some other failure. To make progress I need to see more of
what was going on, hence the request for help.
Steve
>
> Andreas
>
> System:
> Debian/GNU Linux (unstable) on a PC equipped with a Pentium III-450
> (Katmai), 192 M of RAM, and a RAID (easyRAID II) with 165 GB hooked to an
> Adaptec AIC-7881U SCSI host adapter. The kernel I'm using is 2.4.6 with XFS
> patch "linux-2.4.6-xfs-07052001" applied.
>
> Soon after I've written some gigs of data I'm seeing the following in
> /var/log/messages:
>
> Jul 31 14:06:43 server kernel: xfs_force_shutdown(sd(8,5),0x1) called from line 1013 of file xfs_trans.c. Return address = 0xcc8e71b3
> Jul 31 14:06:43 server kernel: I/O Error Detected. Shutting down filesystem: sd(8,5)
> Jul 31 14:06:43 server kernel: Please umount the filesystem, and rectify the problem(s)
> >
> >> > This error could be on the device (bad cluster on the disk) or
> >> > something in a software layer like md or lvm going wrong which
> >> > is seen by XFS as a hardware error. What is actually the lvm
> >> > device? Do you use md or any other software that might
> >> > interfere? IDE or scsi and what controller and system? How is
> >> > the lvm device constructed?
> >>
> >> EIDE, no UDMA. No MD at all, LVM is pretty straightforward, a
> >> lonely 20G disk sliced into two, sitting in an extended partition.
> >> A big partition is actually the only PV, carrying a VG with six
> >> LVs. No magic, this is supposed to be a workstation. Kernel is
> >> tracked CVS, compiled with egcs-1.1.2. No overclock.
> >
> >That indeed seems very simple. The problem is that it is hard to debug
> >since it will probably be extremely hard to replicate. If you can find a
> >way to reproduce this, let us know please. We'd love to squash XFS related
> >bugs ;-)
> >
> >I have absolutely zip experience with LVM so I can't comment on any
> >problems on that front.
> >
> >> The thing hasn't happened since (knock on wood).
> >
> >Good luck then, you will need it for the next knock on wood.
> >
> >Cheers
> >
> >
>
>