<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Apr 4, 2014 at 12:50 PM, Stan Hoeppner <span dir="ltr"><<a href="mailto:stan@hardwarefreak.com" target="_blank">stan@hardwarefreak.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On 4/4/2014 1:15 PM, Bob Mastors wrote:<br>
> Greetings,<br>
><br>
> I am new to xfs and am running into a problem<br>
> and would appreciate any guidance on how to proceed.<br>
><br>
> After an i/o error from the block device that xfs is using,<br>
> an umount results in a message like:<br>
> [ 370.636473] XFS (sdx): Log I/O Error Detected. Shutting down filesystem<br>
> [ 370.644073] XFS (h ���h"h ���H#h ���bsg):<br>
> Please umount the filesystem and rectify the problem(s)<br>
> Note the garbage on the previous line which suggests memory corruption.<br>
> About half the time I get the garbled log message. About half the time<br>
> umount hangs.<br>
><br>
> And then I get this kind of error and the system is unresponsive:<br>
> Message from syslogd@debian at Apr 4 09:27:40 ...<br>
> kernel:[ 680.080022] BUG: soft lockup - CPU#2 stuck for 22s! [umount:2849]<br>
><br>
> The problem appears to be similar to this issue:<br>
> <a href="http://www.spinics.net/lists/linux-xfs/msg00061.html" target="_blank">http://www.spinics.net/lists/linux-xfs/msg00061.html</a><br>
><br>
> I can reproduce the problem easily using open-iscsi to create<br>
> the block device with an iscsi initiator.<br>
> I use lio to create an iscsi target.<br>
><br>
> The problem is triggered by doing an iscsi logout which causes<br>
> the block device to return i/o errors to xfs.<br>
> Steps to reproduce the problem are below.<br>
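<div>(As an illustration only, and not the original steps: the I/O half of such a reproducer can be as small as a write/fsync loop on the mounted filesystem, so that errors from the logged-out device actually reach xfs. The file path below is a placeholder.)</div>
<pre>
/*
 * Illustration only, not the original reproduction steps: keep dirtying a
 * file on the XFS mount so that, once the iSCSI session is logged out,
 * the resulting I/O errors are seen by the filesystem. Placeholder path.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/xfs-test/io-load";    /* placeholder */
    char buf[4096];
    int fd = open(path, O_CREAT | O_WRONLY, 0644);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    memset(buf, 0xab, sizeof(buf));

    for (;;) {
        if (pwrite(fd, buf, sizeof(buf), 0) < 0 || fsync(fd) < 0) {
            fprintf(stderr, "i/o error: %s\n", strerror(errno));
            break;      /* then run umount and watch dmesg */
        }
        sleep(1);
    }
    close(fd);
    return 0;
}
</pre>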
<br>
</div>This is not a problem but the expected behavior. XFS is designed to do<br>
this to prevent filesystem corruption. Logging out of a LUN is no<br>
different than pulling the power plug on a direct attached disk drive.<br>
Surely you would not do that to a running filesystem.<br></blockquote><div>Sorry, I don't think I was clear on the nature of the problem and its wide-ranging effects.</div><div>Using iscsi to access block storage on another server is the goal.</div>
<div>A failure on the other server is possible, which could result in the iscsi initiator returning i/o errors to xfs.</div><div><br></div><div>The behavior that I would like from xfs is to put the filesystem in some kind of offline state</div>
<div>on an i/o error from the block device. Today xfs usually has this desirable behavior.</div><div>But there is a corner case where instead xfs hangs the entire server, forcing a hard reboot. (A sketch of what I mean by "offline state" is below.)</div><div><br></div><div>On a large file server with many filesystems using iscsi to talk to block storage on multiple</div>
<div>block servers, I would like the failure of a single block server to only impact the filesystems</div><div>that are dependent on the block server, and not to impact the other filesystems.</div>
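<div><br></div><div>To be concrete about "offline state": what I have in mind is essentially the forced-shutdown state xfs already has. A rough, untested sketch of putting a mounted xfs filesystem into that state by hand with the XFS_IOC_GOINGDOWN ioctl (as far as I know, the mechanism behind xfs_io's expert "shutdown" command) follows. It assumes the xfsprogs development headers are installed, and the mount point is only a placeholder.</div>
<pre>
/*
 * Sketch only: force a mounted XFS filesystem into its shut-down state.
 * Uses XFS_IOC_GOINGDOWN from the xfsprogs headers; the mount point
 * below is a placeholder.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <xfs/xfs.h>    /* XFS_IOC_GOINGDOWN, XFS_FSOP_GOING_FLAGS_* */

int main(int argc, char **argv)
{
    const char *mnt = argc > 1 ? argv[1] : "/mnt/xfs-test";   /* placeholder */
    int fd = open(mnt, O_RDONLY | O_DIRECTORY);
    /* Do not try to flush the log to a device that is already gone. */
    uint32_t flags = XFS_FSOP_GOING_FLAGS_NOLOGFLUSH;

    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (ioctl(fd, XFS_IOC_GOINGDOWN, &flags) < 0) {
        perror("XFS_IOC_GOINGDOWN");
        close(fd);
        return 1;
    }
    close(fd);

    /* From here on, I/O against this filesystem fails with EIO and
     * umount can complete instead of hanging. */
    printf("%s: filesystem shut down\n", mnt);
    return 0;
}
</pre>
<div>That is roughly the state I would like xfs to reach on its own when the block device starts returning errors, rather than the umount spinning and taking the whole box down with it.</div><div><br></div><div>Bob</div>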
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">
> Using VirtualBox, I can reproduce it with two processors but not one.<br>
> I first saw this on a 3.8 kernel and most recently reproduced it with 3.14+.<br>
</div>...<br>
<br>
The only problem I see here is that XFS should be shutting down every<br>
time the disk device disappears. Which means in your test cases where<br>
it does not, your VM environment isn't passing the IO errors up the<br>
stack, and it should be. Which means your VM environment is broken.<br>
<br>
Cheers,<br>
<br>
Stan<br>
<br>
<br>
</blockquote></div><br></div></div>