xfs umount with i/o error hang/memory corruption
Bob Mastors
bob.mastors at solidfire.com
Fri Apr 4 14:47:28 CDT 2014
On Fri, Apr 4, 2014 at 12:50 PM, Stan Hoeppner <stan at hardwarefreak.com> wrote:
> On 4/4/2014 1:15 PM, Bob Mastors wrote:
> > Greetings,
> >
> > I am new to xfs and am running into a problem
> > and would appreciate any guidance on how to proceed.
> >
> > After an i/o error from the block device that xfs is using,
> > an umount results in a message like:
> > [ 370.636473] XFS (sdx): Log I/O Error Detected. Shutting down filesystem
> > [ 370.644073] XFS (h ���h"h ���H#h ���bsg):
> > Please umount the filesystem and rectify the problem(s)
> > Note the garbage on the previous line which suggests memory corruption.
> > About half the time I get the garbled log message. About half the time
> > umount hangs.
> >
> > And then I get this kind of error and the system is unresponsive:
> > Message from syslogd at debian at Apr 4 09:27:40 ...
> > kernel:[ 680.080022] BUG: soft lockup - CPU#2 stuck for 22s! [umount:2849]
> >
> > The problem appears to be similar to this issue:
> > http://www.spinics.net/lists/linux-xfs/msg00061.html
> >
> > I can reproduce the problem easily using open-iscsi to create
> > the block device with an iscsi initiator.
> > I use lio to create an iscsi target.
> >
> > The problem is triggered by doing an iscsi logout which causes
> > the block device to return i/o errors to xfs.
> > Steps to reproduce the problem are below.
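
For reference, the sequence is roughly the following. The target
address, IQN, device name, and mount point are placeholders;
substitute whatever your initiator reports:

    # log in to the lio target from the open-iscsi initiator
    iscsiadm -m discovery -t sendtargets -p 192.168.1.10
    iscsiadm -m node -T iqn.2014-04.com.example:target1 -p 192.168.1.10 --login

    # put a filesystem on the new disk and start some background i/o
    mkfs.xfs /dev/sdx
    mount /dev/sdx /mnt/test
    dd if=/dev/zero of=/mnt/test/file bs=1M count=1024 &

    # log out while i/o is in flight, then try to unmount
    iscsiadm -m node -T iqn.2014-04.com.example:target1 -p 192.168.1.10 --logout
    umount /mnt/test
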
>
> This is not a problem but the expected behavior. XFS is designed to do
> this to prevent filesystem corruption. Logging out of a LUN is no
> different than pulling the power plug on a direct attached disk drive.
> Surely you would not do that to a running filesystem.
>
Sorry, I don't think I was clear on the nature of the problem and its
wide-ranging effects.

Using iscsi to access block storage on another server is the goal. A
failure on the other server is possible, which could result in the iscsi
initiator returning i/o errors to xfs. The behavior that I would like
from xfs is to put the filesystem into some kind of offline state on an
i/o error from the block device. Today xfs usually has this desirable
behavior, but there is a corner case where xfs instead hangs the entire
server, forcing a hard reboot.

On a large file server with many filesystems using iscsi to talk to
block storage on multiple block servers, I would like the failure of a
single block server to impact only the filesystems that depend on it,
and not the other filesystems.
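
For what it is worth, xfs already has the offline state I am asking for:
a shut-down filesystem fails further i/o with EIO until it is unmounted,
and the rest of the system keeps running. As a sketch of the end state I
want to reach on a block device error, assuming a filesystem mounted at
/mnt/test:

    # force the filesystem into the shut-down state by hand
    # (xfs_io expert mode); subsequent i/o on it returns EIO
    xfs_io -x -c shutdown /mnt/test
    umount /mnt/test
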
Bob
> > Using VirtualBox, I can reproduce it with two processors but not one.
> > I first saw this on a 3.8 kernel and most recently reproduced it with 3.14+.
> ...
>
> The only problem I see here is that XFS should be shutting down every
> time the disk device disappears. Which means in your test cases where
> it does not, your VM environment isn't passing the IO errors up the
> stack, and it should be. Which means your VM environment is broken.
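
One quick sanity check for that: after the logout, read the raw device
directly. If the stack is passing errors up, this fails immediately with
an i/o error instead of hanging (the device name is a placeholder):

    dd if=/dev/sdx of=/dev/null bs=4k count=1
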
>
> Cheers,
>
> Stan
>
>
>