On Sun, Jan 12, 2014 at 11:53:59AM -0800, Zachary Kotlarek wrote:
> On Jan 12, 2014, at 10:47 AM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
> > http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> > If this is due to a bug it may have already been fixed. Note the first
> > two things asked for.
> Thanks for the pointer.
> My kernel's a bit old, but xfsprogs is shiny and new:
> Linux vera 22.214.171.124 #1 SMP Fri Sep 30 23:55:41 PDT 2011 x86_64 x86_64 x86_64
> xfs_repair version 3.1.11
> 2x4 core CPUs
> 8 GB RAM, mostly free (more than 6 GB cached)
> Related mount:
> /dev/lvmsas/tv /mnt/media/TV xfs
> 0 0
> Underlying partition:
> 254 31 16252928000 dm-31
> Which is a no-frills LVM2 volume allocation over mdadm raid-6.
> meta-data=/dev/lvmsas/tv         isize=256    agcount=33, agsize=126975872
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=4063232000, imaxpct=5
>          =                       sunit=128    swidth=512 blks
> naming   =version 2              bsize=4096   ascii-ci=1
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> Attempts to access the now-busted files/directories with accents in their
> paths result in a kernel log like:
> Jan 11 02:05:39 vera XFS (dm-31): I/O error occurred: meta-data dev dm-31
> block 0x3c8ff73e0 ("xfs_trans_read_buf") error 11 buf count 4096
error 11 = EAGAIN/EWOULDBLOCK
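As a rough sketch of that decoding step, the Python below pulls the fields out of the log line quoted above and maps the error number to a name. The function name `decode_xfs_io_error`, the regular expression and the small `LINUX_ERRNO` table are assumptions made for illustration; the table holds only a few Linux errno values and the pattern matches only the message wording shown in this report.

```python
import re

# A few Linux errno values; on Linux EWOULDBLOCK is an alias for EAGAIN.
# This table is an illustrative subset, not a complete list.
LINUX_ERRNO = {5: "EIO", 11: "EAGAIN", 12: "ENOMEM"}

# Matches the message wording quoted in this report; other kernel
# versions may format the message differently.
LOG_RE = re.compile(
    r'meta-data dev (?P<dev>\S+)\s+block (?P<block>0x[0-9a-fA-F]+)\s+'
    r'\("(?P<func>\w+)"\) error (?P<error>\d+) buf count (?P<count>\d+)'
)

def decode_xfs_io_error(line):
    """Return the parsed fields of an XFS I/O error line, or None if no match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    err = int(m.group("error"))
    return {
        "dev": m.group("dev"),
        "block": int(m.group("block"), 16),
        "func": m.group("func"),
        "error": err,
        "errname": LINUX_ERRNO.get(err, "unknown"),
        "count": int(m.group("count")),
    }

# The two wrapped log lines from the report, joined back into one.
line = ('Jan 11 02:05:39 vera XFS (dm-31): I/O error occurred: meta-data dev dm-31 '
        'block 0x3c8ff73e0 ("xfs_trans_read_buf") error 11 buf count 4096')
info = decode_xfs_io_error(line)
print(info["dev"], info["func"], info["error"], info["errname"])
```

For the line in this report the decoded error name is EAGAIN, coming back from a metadata read issued via xfs_trans_read_buf.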
That tends to imply that there's some interesting error occurring in
the layers below XFS here. XFS on a kernel that old is not expecting
an EAGAIN error from storage, so it is likely not being captured
properly. There have been bugs in the raid/dm code in the past that
would cause issues like this, and bugs in the XFS error handling
that allowed them to slip through and shut down the filesystem.
For example, this fix made in March 2013:
$ gl -n1 -p c163f9a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Mar 12 23:30:34 2013 +1100
xfs: ensure we capture IO errors correctly
Failed buffer readahead can leave the buffer in the cache marked
with an error. Most callers that then issue a subsequent read on the
buffer do not zero the b_error field out, and so we may incorrectly
detect an error during IO completion due to the stale error value
left on the buffer.
Avoid this problem by zeroing the error before IO submission. This
ensures that the only IO errors that are detected are those captured
from bio submission or completion.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
is probably relevant, but there are many more changes up and down
the stack that may be the cause of your problem. Indeed, the above
fix may simply turn EAGAIN into EIO because there really is
something wrong with that block on disk....
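
To make the stale-error problem from the commit message above concrete, here is a toy model in Python. It is not kernel code: the `Buf` class, `submit_io` and `read_after_failed_readahead` are hypothetical names invented for this sketch, and only the `b_error` field name comes from the commit message. It shows a failed readahead leaving an error on a cached buffer, and how zeroing the field before submission keeps a later, successful read from reporting it.

```python
EIO = 5  # Linux errno value for a generic I/O error

class Buf:
    """Toy stand-in for a cached buffer; only the error field matters here."""
    def __init__(self):
        self.b_error = 0

def submit_io(buf, device_ok, clear_error_first):
    """Simulate one read and return the error seen at IO completion."""
    if clear_error_first:
        buf.b_error = 0      # the fix: drop any stale value before submission
    if not device_ok:
        buf.b_error = EIO    # error recorded by this submission/completion
    return buf.b_error

def read_after_failed_readahead(clear_error_first):
    buf = Buf()
    # Readahead fails and leaves the buffer in the cache with b_error set.
    submit_io(buf, device_ok=False, clear_error_first=clear_error_first)
    # A later read of the same buffer succeeds at the device level.
    return submit_io(buf, device_ok=True, clear_error_first=clear_error_first)

print(read_after_failed_readahead(clear_error_first=False))  # stale EIO is reported
print(read_after_failed_readahead(clear_error_first=True))   # no error reported
```

In this model, without the clearing step the second read reports the leftover error even though the device returned good data. With it, only an error from the current submission or completion can be seen, which is why a block that really is bad on disk would still fail.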