Disk error, then endless loop
Chris Dunlop
chris at onthe.net.au
Tue Nov 17 16:16:09 CST 2015
On Tue, Nov 17, 2015 at 03:34:55PM -0500, Brian Foster wrote:
> On Tue, Nov 17, 2015 at 03:21:31PM -0500, Brian Foster wrote:
>> On Wed, Nov 18, 2015 at 06:35:34AM +1100, Chris Dunlop wrote:
>>> On Tue, Nov 17, 2015 at 12:37:24PM -0500, Brian Foster wrote:
>>>> If the device has already dropped and reconnected as a new dev node,
>>>> it's probably harmless at this point to just try to forcibly shut down
>>>> the fs on the old one. Could you try the following?
>>>>
>>>> xfs_io -x -c shutdown <mnt>
>>>
>>> # xfs_io -x -c shutdown /var/lib/ceph/osd/ceph-18
>>> foreign file active, shutdown command is for XFS filesystems only
>>>
>>> # grep ceph-18 /etc/mtab
>>> <<< crickets >>>
>>>
>>> I don't know when the fs disappeared from mtab, it could have been when I
>>> first did the umount I guess, I didn't think to check at the time. But the
>>> umount is still there:
>>>
>>> # date; ps -opid,lstart,time,stat,wchan='WCHAN-xxxxxxxxxxxxxxxxxx',cmd -C umount
>>> Wed Nov 18 06:23:21 AEDT 2015
>>> PID STARTED TIME STAT WCHAN-xxxxxxxxxxxxxxxxxx CMD
>>> 23946 Tue Nov 17 17:30:41 2015 00:00:00 D+ xfs_ail_push_all_sync umount /var/lib/ceph/osd/ceph-18
>>
>> Ah, so it's already been removed from the namespace. Apparently it's
>> stuck at some point after the mount is made inaccessible and before it
>> actually finishes with I/O. I'm not sure we have any other option other
>> than a reset at this point, unfortunately. :/
Yes, I thought this would likely be the case.
> One last thought... it occurred to me that scsi devs have a delete
> option under the /sysfs fs. Does the old/stale device still exist under
> /sys/block/<dev>? If so, perhaps an 'echo 1 >
> /sys/block/<dev>/device/delete' would move things along..?
Unfortunately, no, it's not there.
> Note that I have no idea what effect that will have beyond removing the
> device node (so if it is still accessible now, it probably won't be
> after that command). I just tried it while doing I/O to a test device
> and it looked like it caused an fs shutdown, so it could be worth a try
> as a last resort before a system restart.
>
> Brian
Thanks again,
Chris
More information about the xfs
mailing list