
Re: xfs umount with i/o error hang/memory corruption

To: Bob Mastors <bob.mastors@xxxxxxxxxxxxx>
Subject: Re: xfs umount with i/o error hang/memory corruption
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Sat, 5 Apr 2014 08:20:16 +1100
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CALjwKZAJ-R8dS13Rsj3+K3hM9p0z08qvi4ZVTYbDWKT1Biu=-Q@xxxxxxxxxxxxxx>
References: <CALjwKZAJ-R8dS13Rsj3+K3hM9p0z08qvi4ZVTYbDWKT1Biu=-Q@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Apr 04, 2014 at 12:15:23PM -0600, Bob Mastors wrote:
> Greetings,
> 
> I am new to xfs and am running into a problem
> and would appreciate any guidance on how to proceed.
> 
> After an i/o error from the block device that xfs is using,
> an umount results in a message like:
> [  370.636473] XFS (sdx): Log I/O Error Detected.  Shutting down filesystem
> [  370.644073] XFS (h           ïïïh"h          ïïïH#h          ïïïbsg):
> Please umount the filesystem and rectify the problem(s)
> Note the garbage on the previous line which suggests memory corruption.
> About half the time I get the garbled log message. About half the time
> umount hangs.

I got an email about this last night with a different trigger - thin
provisioning failing log IO in the unmount path. I know what the
problem is, I just don't have a fix for it yet.

To confirm it's the same problem, can you post the entirety of the
dmesg output from where the error occurs?

In essence, the log IO failure is triggering a shutdown, and as part
of the shutdown process it wakes anyone waiting on a log force.
The log quiesce code that waits for log completion during unmount
uses a log force to ensure the log is idle before tearing down all
the log structures and finishing the unmount. Unfortunately, the log
force the unmount blocks on is woken prematurely by the shutdown,
and hence the teardown runs before the log IO completion processing
has finished. Hence the use after free.
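
As a rough illustration of that ordering problem, here is a toy Python
model. It is not XFS code: the Log class, the function names and the
event strings are all made up for the example, and the torn_down wait
exists only to force the bad interleaving deterministically (in the
kernel it is a race, which would fit the "about half the time" symptoms).

```python
import threading


class Log:
    """Toy stand-in for the in-memory log structures (hypothetical)."""

    def __init__(self):
        self.force_done = threading.Event()  # what a log force waiter sleeps on
        self.torn_down = threading.Event()   # set once unmount has freed the log
        self.freed = False
        self.events = []


def unmount(log):
    # Quiesce: block on a log force, assuming a wakeup means the log is idle.
    log.force_done.wait()
    log.events.append("unmount: woken, tearing down log")
    log.freed = True
    log.torn_down.set()


def log_io_completion(log, io_failed):
    if io_failed:
        log.events.append("iodone: log IO error, shutting down")
        # Shutdown wakes every log force waiter straight away...
        log.force_done.set()
        # ...and, for this demo only, we wait here so that the unmount
        # teardown always wins the race.
        log.torn_down.wait()
    # Completion processing still has work to do on the log.
    if log.freed:
        log.events.append("iodone: touched freed log (use after free)")
    else:
        log.events.append("iodone: completed cleanly")
        # Normal case: waiters are woken only after completion is done.
        log.force_done.set()


def run(io_failed):
    log = Log()
    t = threading.Thread(target=unmount, args=(log,))
    t.start()
    log_io_completion(log, io_failed)
    t.join()
    return log.events


if __name__ == "__main__":
    for failed in (False, True):
        print("io_failed =", failed)
        for line in run(failed):
            print("   ", line)
```

In the clean case the waiter is woken only after completion processing
is done, so the teardown is safe. In the failure case the wakeup comes
first and completion processing ends up touching a log that has already
been torn down.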

> And then I get this kind of error and the system is unresponsive:
> Message from syslogd@debian at Apr  4 09:27:40 ...
>  kernel:[  680.080022] BUG: soft lockup - CPU#2 stuck for 22s! [umount:2849]
> 
> The problem appears to be similar to this issue:
> http://www.spinics.net/lists/linux-xfs/msg00061.html

Similar symptoms, but not the same bug.

> The problem is triggered by doing an iscsi logout which causes
> the block device to return i/o errors to xfs.
> Steps to reproduce the problem are below.

Seems similar to the thinp ENOSPC issue I mentioned above - data IO
errors occur, then you do an unmount, which causes a log IO error
writing the superblock, and then this happens....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
