xfs_iunlink_remove: xfs_inotobp() returned error 22 -- debugging

符永涛 yongtaofu at gmail.com
Sat Apr 20 08:12:35 CDT 2013


Hi Brian,
Yes, this directory (./crashtest/.glusterfs/indices) is the glusterfs
index-base directory for the glusterfs features/index xlator. The
./crashtest/.glusterfs/indices/xattrop directory is the XATTROP_SUBDIR
directory for glusterfs. This path can be set in the glusterfs
server-side volume file, so before there is any further progress I'll
try to isolate the issue by setting it to an ext4 path or somewhere else.
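
For reference, the stanza in the server-side volume file would look
something like this (the volume names and the ext4 mount point below
are hypothetical):

volume crashtest-index
    type features/index
    option index-base /mnt/ext4disk/indices
    subvolumes crashtest-io-threads
end-volume
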
Thank you.


2013/4/20 Brian Foster <bfoster at redhat.com>

> On 04/20/2013 07:52 AM, 符永涛 wrote:
> > Hi Brian,
> > Here is the file:
> > find . -inum 539274
> > ./crashtest/.glusterfs/indices/xattrop
> > [root at 10.23.72.94 xfsd]# ls -lih ./crashtest/.glusterfs/indices/xattrop
> > total 0
> > 539275 ---------- 2 root root 0 Apr 20 17:17
> > 132ef294-71d1-4435-8daa-aa002e67cb6e
> > 539275 ---------- 2 root root 0 Apr 20 17:17
> > xattrop-f3ad589a-b8dc-4416-ab84-fc9ad4033540
> > find . -inum 539275
> > ./crashtest/.glusterfs/indices/xattrop/xattrop-f3ad589a-b8dc-4416-ab84-fc9ad4033540
> > ./crashtest/.glusterfs/indices/xattrop/132ef294-71d1-4435-8daa-aa002e67cb6e
> > I'm not sure whether this is normal or glusterfs has fallen into an
> > infinite loop. Is there a chance that the kernel falls into a dead
> > loop? I'll study it.
> >
>
> Very interesting, thank you. I don't have the full context here yet, but
> the short of it is that this particular indices/xattrop directory is
> managed by a backend translator and driven by replication. It appears to
> be doing some kind of transaction level tracking based on links. E.g.,
> some quick behavioral observations:
>
> - This directory is created and an xattrop-#### file is created (no size).
> - On starting a large sequential write from a client, I start observing
> a continuous sequence of xfs_link/xfs_remove operations via tracing.
> - The backend appears to create a link to the xattrop-#### file for every
> replication transaction (handwave), with the name of the link referring
> to the gfid name of the file under write (but note again the link is not
> to the inode under write, but to this special xattrop-#### file). It then
> apparently removes this link on transaction completion and the process
> repeats. I suspect this has to do with identifying what files were under
> modification in the event of a crash before transaction completion, but
> the underlying pattern/workload is what's more important to us here for
> the time being...
>
> So we're seeing a heavy link related workload on a directory inode
> (0x83a8a) where the entries are all links to the same inode (0x83a8b).
> These inodes happen to be in close proximity, which may or may not be a
> factor. The translator that generates these link/unlink ops sits right
> above a generic thread pool translator, so this is multi-threaded.
>
> What isn't clear to me yet is where the xfs_iunlink() for this heavily
> linked inode is induced. The primary dentry remains after my file copy
> test, but then taking another look after a few minutes I see it removed.
> The same thing occurs if I gracefully restart the volume. I'm going to
> have to dig into that some more and also see if we can use this to
> narrow in on a reproducer. I'm thinking something along the lines of:
>
> - Create a directory/file. Ideally the directory and file inodes are in
> the same cluster.
> - Start a highly-threaded link-unlink workload against that file in the
> same directory.
> - Somewhere in the background unlink the main file.
> - Check for multiple xfs_iunlink() ops, repeat.
>
> ... the assumption being that the xfs_iunlink() race could have led to
> a possible unlinked list corruption on the associated list, such that a
> later inactivation/xfs_iunlink_remove of some other inode in that bucket
> could fail.
>
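> FWIW, a rough sketch of the kind of reproducer I have in mind -- my own
> untested illustration, with arbitrary names and thread/iteration counts:
>
> /* repro.c - hammer link()/unlink() on one file from several threads
>  * while the primary entry is unlinked in the background, then watch
>  * for multiple xfs_iunlink ops via the tracepoints. */
> #include <pthread.h>
> #include <stdio.h>
> #include <unistd.h>
>
> #define NTHREADS 8
> #define NITERS   100000
>
> static const char *target = "testdir/file";
>
> static void *link_worker(void *arg)
> {
>     long id = (long)arg;
>     char name[64];
>
>     for (int i = 0; i < NITERS; i++) {
>         snprintf(name, sizeof(name), "testdir/link.%ld.%d", id, i);
>         if (link(target, name) == 0)  /* fails once target is gone */
>             unlink(name);
>     }
>     return NULL;
> }
>
> int main(void)
> {
>     pthread_t threads[NTHREADS];
>
>     /* testdir/ and testdir/file are assumed to already exist, ideally
>      * with the directory and file inodes in the same cluster. */
>     for (long i = 0; i < NTHREADS; i++)
>         pthread_create(&threads[i], NULL, link_worker, (void *)i);
>
>     sleep(1);
>     unlink(target);  /* the background unlink of the main file */
>
>     for (long i = 0; i < NTHREADS; i++)
>         pthread_join(threads[i], NULL);
>     return 0;
> }
>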
> Brian
>
> >
> > 2013/4/20 Brian Foster <bfoster at redhat.com>
> >
> >     On 04/20/2013 06:10 AM, 符永涛 wrote:
> >     > Dear Eric,
> >     > I have applied your latest patch and collected the following log:
> >     >
> >     > /var/log/message
> >     > Apr 20 17:28:23 10 kernel: XFS (sdb): xfs_iunlink_remove: xfs_inotobp() returned error 22 for inode 0x1b20b ag 0 agino 1b20b
> >     > Apr 20 17:28:23 10 kernel:
> >     > Apr 20 17:28:23 10 kernel: XFS (sdb): xfs_inactive: xfs_ifree returned error 22
> >     > Apr 20 17:28:23 10 kernel: XFS (sdb): xfs_do_force_shutdown(0x1) called from line 1184 of file fs/xfs/xfs_vnodeops.c.  Return address = 0xffffffffa02d4d0a
> >     > Apr 20 17:28:23 10 kernel: XFS (sdb): I/O Error Detected. Shutting down filesystem
> >     > Apr 20 17:28:23 10 kernel: XFS (sdb): Please umount the filesystem and rectify the problem(s)
> >     > Apr 20 17:28:37 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
> >     > Apr 20 17:29:07 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
> >     > Apr 20 17:29:37 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
> >     > Apr 20 17:30:07 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
> >     >
> >     > debugfs trace:
> >     >
> >     > https://docs.google.com/file/d/0B7n2C4T5tfNCTlZGUVpnZENrZ3M/edit?usp=sharing
> >     >
> >
> >     FWIW...
> >
> >     <...>-6908  [001]  8739.967623: xfs_iunlink: dev 8:16 ino 0x83a8b mode 0100000, flags 0x0
> >     <...>-6909  [001]  8739.970252: xfs_iunlink: dev 8:16 ino 0x83a8b mode 0100000, flags 0x0
> >
> >     0x83a8b and 0x1b20b both hash to unlinked list bucket 11.
> >
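> >     (For reference, xfs_iunlink() picks the bucket as agino %
> >     XFS_AGI_UNLINKED_BUCKETS, i.e. modulo 64 -- a trivial check of
> >     the two aginos above:)
> >
> >     #include <stdio.h>
> >
> >     #define XFS_AGI_UNLINKED_BUCKETS 64  /* buckets per AGI */
> >
> >     int main(void)
> >     {
> >         /* same computation xfs_iunlink() uses for the bucket index */
> >         printf("%d %d\n", 0x83a8b % XFS_AGI_UNLINKED_BUCKETS,
> >                           0x1b20b % XFS_AGI_UNLINKED_BUCKETS);
> >         return 0;  /* prints "11 11" */
> >     }
> >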
> >     As to the rest of the trace, there appears to be a significant
> >     amount of link activity on (directory) inode 0x83a8a (the inode
> >     immediately prior to the one involved in the race). The name data
> >     in the trace suggests activity somewhere under .glusterfs. A
> >     couple of questions:
> >
> >     1.) Any idea what entries point to this inode right now (e.g.,
> >     how many links on this inode) and where it resides in the fs (path)?
> >
> >     2.) Can you associate this kind of heavy remove/link pattern on a
> >     single inode with some higher-level activity? For example, if you
> >     were to watch the trace data live, is this a normal pattern you
> >     observe? Does it only occur when a rebalance is in progress? Or
> >     when a rebalance finishes? Any detailed observations you can make
> >     in that regard could be helpful.
> >
> >     Brian
> >
> >     > Thank you.
> >     >
> >     >
> >     > 2013/4/20 符永涛 <yongtaofu at gmail.com>
> >     >
> >     >     Hi Eric,
> >     >     The xfs module is loaded from the system kernel, and this
> >     >     happens on our production servers too (I have not touched
> >     >     those until now). If the xfs module were messed up, systemtap
> >     >     would probably not be working either, but it works now. As
> >     >     you mentioned, the strange thing is that the xfs shutdown
> >     >     always happens when a glusterfs rebalance completes.
> >     >
> >     >
> >     >     2013/4/20 Eric Sandeen <sandeen at sandeen.net>
> >     >
> >     >         On 4/19/13 9:03 PM, 符永涛 wrote:
> >     >         > Hi Eric,
> >     >         > I will enable them and run the test again. I can
> >     >         > only reproduce it with glusterfs rebalance. Glusterfs
> >     >         > uses a mechanism it calls syncop to unlink files. For
> >     >         > rebalance it uses syncop_unlink
> >     >         > (glusterfs/libglusterfs/src/syncop.c). In the
> >     >         > glusterfs sync_task framework
> >     >         > (glusterfs/libglusterfs/src/syncop.c) it uses
> >     >         > "makecontext/swapcontext"
> >     >         > <http://www.opengroup.org/onlinepubs/009695399/functions/makecontext.html>.
> >     >         > Does this lead to racing unlinks from different CPU
> >     >         > cores?
> >     >
> >     >         Yep, I understand that it's rebalance.  It dies when
> >     >         rebalance finishes because an open but unlinked file
> >     >         trips over the corrupted list from earlier, it seems.
> >     >
> >     >         I don't know why makecontext would matter...
> >     >
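> >     >         (For what it's worth, makecontext/swapcontext is a
> >     >         cooperative switch within a single thread; a minimal
> >     >         illustration -- not glusterfs code -- looks like:
> >     >
> >     >         #include <stdio.h>
> >     >         #include <ucontext.h>
> >     >
> >     >         static ucontext_t main_ctx, task_ctx;
> >     >         static char task_stack[64 * 1024];
> >     >
> >     >         static void task(void)
> >     >         {
> >     >             printf("in task context\n");
> >     >             /* returning resumes main_ctx via uc_link */
> >     >         }
> >     >
> >     >         int main(void)
> >     >         {
> >     >             getcontext(&task_ctx);
> >     >             task_ctx.uc_stack.ss_sp = task_stack;
> >     >             task_ctx.uc_stack.ss_size = sizeof(task_stack);
> >     >             task_ctx.uc_link = &main_ctx;
> >     >             makecontext(&task_ctx, task, 0);
> >     >             swapcontext(&main_ctx, &task_ctx);  /* run task */
> >     >             printf("back in main context\n");
> >     >             return 0;
> >     >         }
> >     >
> >     >         The switch itself stays on one thread; any cross-CPU
> >     >         concurrency would come from the thread pool that
> >     >         resumes these contexts, not from makecontext.)
> >     >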
> >     >         Just to be sure, you are definitely loading the xfs
> >     >         module from the kernel you built, right, and you don't
> >     >         have a "priority" module getting loaded from elsewhere?
> >     >         Seems unlikely, but just to be sure.
> >     >
> >     >         > Thank you.
> >     >
> >     >         You could also add this patch to the xfs tracepoints
> >     >         to print more information about the inodes - the mode
> >     >         & flags.
> >     >
> >     >         -Eric
> >     >
> >     >
> >     >         diff --git a/fs/xfs/linux-2.6/xfs_trace.h b/fs/xfs/linux-2.6/xfs_trace.h
> >     >         index e8ce644..c314b87 100644
> >     >         --- a/fs/xfs/linux-2.6/xfs_trace.h
> >     >         +++ b/fs/xfs/linux-2.6/xfs_trace.h
> >     >         @@ -544,14 +544,18 @@ DECLARE_EVENT_CLASS(xfs_inode_class,
> >     >                 TP_STRUCT__entry(
> >     >                         __field(dev_t, dev)
> >     >                         __field(xfs_ino_t, ino)
> >     >         +               __field(__u16, mode)
> >     >         +               __field(unsigned long, flags)
> >     >                 ),
> >     >                 TP_fast_assign(
> >     >                         __entry->dev = VFS_I(ip)->i_sb->s_dev;
> >     >                         __entry->ino = ip->i_ino;
> >     >         +               __entry->mode = VFS_I(ip)->i_mode;
> >     >         +               __entry->flags = ip->i_flags;
> >     >                 ),
> >     >         -       TP_printk("dev %d:%d ino 0x%llx",
> >     >         +       TP_printk("dev %d:%d ino 0x%llx mode 0%o, flags 0x%lx",
> >     >                           MAJOR(__entry->dev), MINOR(__entry->dev),
> >     >         -                 __entry->ino)
> >     >         +                 __entry->ino, __entry->mode, __entry->flags)
> >     >          )
> >     >
> >     >          #define DEFINE_INODE_EVENT(name) \
> >     >
> >     >
> >     >
> >     >
> >     >
> >     >
> >     >     --
> >     >     符永涛
> >     >
> >     >
> >     >
> >     >
> >     > --
> >     > 符永涛
> >
> >
> >
> >
> > --
> > 符永涛
>
>


-- 
符永涛