Files with non-ASCII names inaccessible after xfs_repair
Dave Chinner
david at fromorbit.com
Sun Jan 12 19:50:07 CST 2014
On Sun, Jan 12, 2014 at 11:53:59AM -0800, Zachary Kotlarek wrote:
>
> On Jan 12, 2014, at 10:47 AM, Stan Hoeppner <stan at hardwarefreak.com> wrote:
>
> > http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> >
> > If this is due to a bug it may have already been fixed. Note the first
> > two things asked for.
>
>
> Thanks for the pointer.
>
> My kernel's a bit old, but xfsprogs is shiny and new:
> Linux vera 2.6.39.2 #1 SMP Fri Sep 30 23:55:41 PDT 2011 x86_64 x86_64 x86_64 GNU/Linux
> xfs_repair version 3.1.11
>
> 2x4 core CPUs
> 8 GB RAM, mostly free (more than 6 GB cached)
>
> Related mount:
> /dev/lvmsas/tv /mnt/media/TV xfs rw,nosuid,nodev,noexec,relatime,attr2,delaylog,inode64,sunit=1024,swidth=4096,noquota 0 0
>
> Underlying partition:
> 254 31 16252928000 dm-31
>
> Which is a no-frills LVM2 volume allocation over mdadm raid-6.
>
> meta-data=/dev/lvmsas/tv isize=256 agcount=33, agsize=126975872 blks
> = sectsz=512 attr=2
> data = bsize=4096 blocks=4063232000, imaxpct=5
> = sunit=128 swidth=512 blks
> naming =version 2 bsize=4096 ascii-ci=1
> log =internal bsize=4096 blocks=521728, version=2
> = sectsz=512 sunit=8 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
> Attempts to access the now-busted files/directories with accents in their paths result in a kernel log like:
> Jan 11 02:05:39 vera XFS (dm-31): I/O error occurred: meta-data dev dm-31 block 0x3c8ff73e0 ("xfs_trans_read_buf") error 11 buf count 4096
error 11 = EAGAIN/EWOULDBLOCK
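
A minimal sketch of how that number can be checked from userspace, assuming it is run on a Linux machine (errno values are platform-specific, so another OS may give a different name). The helper `describe_errno` is a hypothetical name used only for this illustration:

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The kernel log above reported "error 11".
name, message = describe_errno(11)
print(f"errno 11 -> {name}: {message}")

# EAGAIN and EWOULDBLOCK are separate names; on Linux they share one value.
print("EAGAIN == EWOULDBLOCK:", errno.EAGAIN == errno.EWOULDBLOCK)
```
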
That tends to imply that there's some interesting error occurring in
the layers below XFS here. XFS on a kernel that old is not expecting
an EAGAIN error from storage, so it is likely not being captured
properly. There have been bugs in the raid/dm code in the past that
would cause issues like this, and bugs in the XFS error handling
that allowed them to slip through and shut down the filesystem.
For example, this fix made in March 2013:
$ gl -n1 -p c163f9a
commit c163f9a1760229a95d04e37b332de7d5c1c225cd
Author: Dave Chinner <dchinner at redhat.com>
Date: Tue Mar 12 23:30:34 2013 +1100
xfs: ensure we capture IO errors correctly
Failed buffer readahead can leave the buffer in the cache marked
with an error. Most callers that then issue a subsequent read on the
buffer do not zero the b_error field out, and so we may incorectly
detect an error during IO completion due to the stale error value
left on the buffer.
Avoid this problem by zeroing the error before IO submission. This
ensures that the only IO errors that are detected those captured
from are those captured from bio submission or completion.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Mark Tinguely <tinguely at sgi.com>
Signed-off-by: Ben Myers <bpm at sgi.com>
Is probably relevant, but there are many more changes up and down
the stack that may be the cause of your problem. Indeed, the above
fix may simply turn EAGAIN into EIO because there really is
something wrong with that block on disk....
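
As a rough illustration of the stale-error behaviour the commit message describes, here is a toy model in Python. It is not the kernel code: `Buffer`, `submit_io` and the `zero_error_first` flag are hypothetical names for this sketch, only the `b_error` field name comes from the commit message, and using EAGAIN as the error left behind by the failed readahead is an assumption made to match the log above.

```python
EAGAIN = 11  # value taken from the kernel log above (Linux numbering)


class Buffer:
    """Toy stand-in for a cached buffer; only the error field is modelled."""

    def __init__(self):
        self.b_error = 0


def submit_io(buf, io_result, zero_error_first):
    """Simulate one read and return the error the caller would observe.

    io_result is the errno from the simulated storage layer, 0 on success.
    """
    if zero_error_first:
        buf.b_error = 0          # the fix: clear any stale value before submission
    if io_result != 0:
        buf.b_error = io_result  # a real failure is recorded on the buffer
    return buf.b_error


# Without zeroing: a failed readahead leaves its error on the cached buffer,
# and a later read that succeeds still appears to fail.
buf = Buffer()
submit_io(buf, EAGAIN, zero_error_first=False)
stale = submit_io(buf, 0, zero_error_first=False)

# With zeroing: only errors from the current IO are reported.
buf = Buffer()
submit_io(buf, EAGAIN, zero_error_first=True)
clean = submit_io(buf, 0, zero_error_first=True)

print("without zeroing:", stale)
print("with zeroing:", clean)
```

In this model the second read reports 11 without the zeroing step and 0 with it. A read that genuinely fails is reported either way, which is the case where the fix would surface a real problem with the block rather than hide it.
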
Cheers,
Dave.
--
Dave Chinner
david at fromorbit.com