xfs_iunlink_remove: xfs_inotobp() returned error 22 -- debugging
符永涛
yongtaofu at gmail.com
Sat Apr 20 08:12:35 CDT 2013
Hi Brian,
Yes, this directory (./crashtest/.glusterfs/indices) is the glusterfs
index base directory for the glusterfs features/index xlator. The
./crashtest/.glusterfs/indices/xattrop directory is the XATTROP_SUBDIR
directory for glusterfs. This path can be set in the glusterfs server-side
volume file, so until there is further progress I'll try to isolate this
issue by setting it to an ext4 path or somewhere else.
Thank you.
2013/4/20 Brian Foster <bfoster at redhat.com>
> On 04/20/2013 07:52 AM, 符永涛 wrote:
> > Hi Brian,
> > Here is the file:
> > find . -inum 539274
> > ./crashtest/.glusterfs/indices/xattrop
> > [root at 10.23.72.94 xfsd]# ls -lih
> > ./crashtest/.glusterfs/indices/xattrop
> > total 0
> > 539275 ---------- 2 root root 0 Apr 20 17:17 132ef294-71d1-4435-8daa-aa002e67cb6e
> > 539275 ---------- 2 root root 0 Apr 20 17:17 xattrop-f3ad589a-b8dc-4416-ab84-fc9ad4033540
> > find . -inum 539275
> > ./crashtest/.glusterfs/indices/xattrop/xattrop-f3ad589a-b8dc-4416-ab84-fc9ad4033540
> > ./crashtest/.glusterfs/indices/xattrop/132ef294-71d1-4435-8daa-aa002e67cb6e
> > I'm not sure if this is normal or if glusterfs has fallen into an
> > infinite loop. Is there a chance that the kernel falls into a dead loop?
> > I'll study it.
> >
>
> Very interesting, thank you. I don't have the full context here yet, but
> the short of it is that this particular indices/xattrop directory is
> managed by a backend translator and driven by replication. It appears to
> be doing some kind of transaction level tracking based on links. E.g.,
> some quick behavioral observations:
>
> - This directory is created and an xattrop-#### file created (no size).
> - On starting a large sequential write from a client, I start observing
> a continuous sequence of xfs_link/xfs_remove operations via tracing.
> - The backend appears to create a link to the xattrop-#### file for every
> replication transaction (handwave), with the name of the link referring
> to the gfid name of the file under write (but note again the link is not
> to the inode under write, but to this special xattrop-#### file). It then
> apparently removes this link on transaction completion and the process
> repeats. I suspect this has to do with identifying which files were under
> modification in the event of a crash before transaction completion, but
> the underlying pattern/workload is what's more important to us here for
> the time being (a rough sketch of the syscall pattern follows below)...
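>
> To make that pattern concrete, here is a minimal sketch of the
> per-transaction syscall sequence as I understand it (illustrative only;
> the index file name, paths, and helper below are made up, not glusterfs
> code):
>
>     #include <stdio.h>
>     #include <unistd.h>
>
>     /* base/xattrop-<uuid> is the long-lived index file; each replication
>      * transaction briefly adds a hard link to it, named after the gfid of
>      * the file under write, and removes the link again on completion. */
>     static void xattrop_transaction(const char *base, const char *gfid)
>     {
>         char idx[4096], lnk[4096];
>
>         snprintf(idx, sizeof(idx), "%s/xattrop-XXXX", base);
>         snprintf(lnk, sizeof(lnk), "%s/%s", base, gfid);
>
>         link(idx, lnk);      /* transaction start -> xfs_link   */
>         /* ... replication transaction runs ... */
>         unlink(lnk);         /* transaction end   -> xfs_remove */
>     }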
>
> So we're seeing a heavy link-related workload on a directory inode
> (0x83a8a) where the entries are all links to the same inode (0x83a8b).
> These inodes happen to be in close proximity, which may or may not be a
> factor. The translator that generates these link/unlink ops sits right
> above a generic thread pool translator, so this is multi-threaded.
>
> What isn't clear to me yet is where the xfs_iunlink() for this heavily
> linked inode is induced. The primary dentry remains after my file copy
> test, but then taking another look after a few minutes I see it removed.
> The same thing occurs if I gracefully restart the volume. I'm going to
> have to dig into that some more and also see if we can use this to
> narrow in on a reproducer. I'm thinking something along the lines of:
>
> - Create a directory/file. Ideally the directory and file inodes are in
> the same cluster.
> - Start a highly-threaded link-unlink workload against that file in the
> same directory.
> - Somewhere in the background unlink the main file.
> - Check for multiple xfs_iunlink() ops, repeat.
>
> ... the assumption being that the xfs_iunlink() race could have led to
> a possible unlinked list corruption on the associated list, such that a
> later inactivation/xfs_iunlink_remove of some other inode in that bucket
> could fail. A rough sketch of such a workload generator follows below.
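>
> A rough sketch of such a workload generator (my own illustration, not a
> tested reproducer; the directory/file names and thread count below are
> arbitrary):
>
>     /* Hammer link()/unlink() on one file from several threads while the
>      * main thread unlinks the primary file in the background, then watch
>      * the xfs_iunlink tracepoints for overlapping inserts. */
>     #include <fcntl.h>
>     #include <pthread.h>
>     #include <stdio.h>
>     #include <sys/stat.h>
>     #include <unistd.h>
>
>     #define NTHREADS 8
>
>     static void *link_worker(void *arg)
>     {
>         char name[64];
>
>         snprintf(name, sizeof(name), "dir/link-%ld", (long)arg);
>         for (int i = 0; i < 100000; i++) {
>             link("dir/file", name);   /* fails with ENOENT once the    */
>             unlink(name);             /* primary file has been removed */
>         }
>         return NULL;
>     }
>
>     int main(void)
>     {
>         pthread_t tids[NTHREADS];
>
>         mkdir("dir", 0755);
>         close(creat("dir/file", 0644));
>         for (long i = 0; i < NTHREADS; i++)
>             pthread_create(&tids[i], NULL, link_worker, (void *)i);
>         sleep(1);
>         unlink("dir/file");           /* background unlink of the main file */
>         for (int i = 0; i < NTHREADS; i++)
>             pthread_join(tids[i], NULL);
>         return 0;
>     }
>
> (Build with -pthread and run it on the XFS mount while tracing xfs_iunlink.)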
>
> Brian
>
> >
> > 2013/4/20 Brian Foster <bfoster at redhat.com>
> >
> > On 04/20/2013 06:10 AM, 符永涛 wrote:
> > > Dear Eric,
> > > I have applied your latest patch and collected the following log:
> > >
> > > /var/log/message
> > > Apr 20 17:28:23 10 kernel: XFS (sdb): xfs_iunlink_remove: xfs_inotobp() returned error 22 for inode 0x1b20b ag 0 agino 1b20b
> > > Apr 20 17:28:23 10 kernel:
> > > Apr 20 17:28:23 10 kernel: XFS (sdb): xfs_inactive: xfs_ifree returned error 22
> > > Apr 20 17:28:23 10 kernel: XFS (sdb): xfs_do_force_shutdown(0x1) called from line 1184 of file fs/xfs/xfs_vnodeops.c. Return address = 0xffffffffa02d4d0a
> > > Apr 20 17:28:23 10 kernel: XFS (sdb): I/O Error Detected. Shutting down filesystem
> > > Apr 20 17:28:23 10 kernel: XFS (sdb): Please umount the filesystem and rectify the problem(s)
> > > Apr 20 17:28:37 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
> > > Apr 20 17:29:07 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
> > > Apr 20 17:29:37 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
> > > Apr 20 17:30:07 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
> > >
> > > debugfs trace:
> > >
> > > https://docs.google.com/file/d/0B7n2C4T5tfNCTlZGUVpnZENrZ3M/edit?usp=sharing
> > >
> >
> > FWIW...
> >
> > <...>-6908 [001] 8739.967623: xfs_iunlink: dev 8:16 ino 0x83a8b mode 0100000, flags 0x0
> > <...>-6909 [001] 8739.970252: xfs_iunlink: dev 8:16 ino 0x83a8b mode 0100000, flags 0x0
> >
> > 0x83a8b and 0x1b20b both hash to unlinked list bucket 11.
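> >
> > (The AGI keeps XFS_AGI_UNLINKED_BUCKETS = 64 list heads and the bucket
> > is simply agino modulo 64; a trivial check of the claim above, assuming
> > 0x83a8b, like 0x1b20b, is in AG 0 so agino equals the inode number:)
> >
> >     #include <stdio.h>
> >
> >     #define XFS_AGI_UNLINKED_BUCKETS 64
> >
> >     int main(void)
> >     {
> >         /* 0x83a8b = 539275 and 0x1b20b = 111115; both mod 64 give 11 */
> >         printf("%d %d\n", 0x83a8b % XFS_AGI_UNLINKED_BUCKETS,
> >                           0x1b20b % XFS_AGI_UNLINKED_BUCKETS);
> >         return 0;
> >     }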
> >
> > As to the rest of the trace, there appears to be a significant amount
> > of link activity on (directory) inode 0x83a8a (the inode immediately
> > prior to the inode involved in the race). The name data in the trace
> > suggests activity somewhere under .glusterfs. A couple of questions:
> >
> > 1.) Any idea what entries point to this inode right now (e.g., how many
> > links are on this inode) and where it resides in the fs (path)?
> >
> > 2.) Can you associate this kind of heavy remove/link pattern on a single
> > inode with a higher-level activity? For example, if you were to watch
> > the trace data live, is this a normal pattern you observe? Does it only
> > occur when a rebalance is in progress? Or when a rebalance finishes? Any
> > detailed observations you can make in that regard could be helpful.
> >
> > Brian
> >
> > > Thank you.
> > >
> > >
> > > 2013/4/20 符永涛 <yongtaofu at gmail.com>
> > >
> > > Hi Eric,
> > > The xfs module is loaded from the system kernel, and it happens on
> > > our production server too (I have not touched that until now). If
> > > the xfs module were messed up, systemtap would also not work, but
> > > it works now. As you have mentioned, the strange thing is that the
> > > xfs shutdown always happens when glusterfs rebalance completes.
> > >
> > >
> > > 2013/4/20 Eric Sandeen <sandeen at sandeen.net>
> > >
> > > On 4/19/13 9:03 PM, 符永涛 wrote:
> > > > Hi Eric,
> > > > I will enable them and run the test again. I can only reproduce it
> > > > with glusterfs rebalance. Glusterfs uses a mechanism called syncop
> > > > to unlink files. For rebalance it uses
> > > > syncop_unlink (glusterfs/libglusterfs/src/syncop.c). In the glusterfs
> > > > sync_task framework (glusterfs/libglusterfs/src/syncop.c) it uses
> > > > "makecontext/swapcontext"
> > > > <http://www.opengroup.org/onlinepubs/009695399/functions/makecontext.html>.
> > > > Does it lead to racing unlinks from different CPU cores?
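> > > >
> > > > (For reference, a minimal standalone sketch of the
> > > > makecontext/swapcontext pattern referred to above; this is not
> > > > glusterfs code and the file name is arbitrary:)
> > > >
> > > >     #include <ucontext.h>
> > > >     #include <unistd.h>
> > > >
> > > >     static ucontext_t sched_ctx, task_ctx;
> > > >     static char task_stack[64 * 1024];
> > > >
> > > >     static void task_fn(void)   /* runs on its own user-level stack */
> > > >     {
> > > >         /* the syscall executes on whichever thread resumed this
> > > >          * context; the context switch itself is purely userspace */
> > > >         unlink("somefile");
> > > >         swapcontext(&task_ctx, &sched_ctx);     /* yield back */
> > > >     }
> > > >
> > > >     int main(void)
> > > >     {
> > > >         getcontext(&task_ctx);
> > > >         task_ctx.uc_stack.ss_sp = task_stack;
> > > >         task_ctx.uc_stack.ss_size = sizeof(task_stack);
> > > >         task_ctx.uc_link = &sched_ctx;   /* where to go when task ends */
> > > >         makecontext(&task_ctx, task_fn, 0);
> > > >         swapcontext(&sched_ctx, &task_ctx);     /* run the task */
> > > >         return 0;
> > > >     }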
> > >
> > > Yep, I understand that it's rebalance. It dies when rebalance
> > > finishes because an open but unlinked file trips over the
> > > corrupted list from earlier, it seems.
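> > >
> > > (To illustrate why racing xfs_iunlink() calls can corrupt that list:
> > > each AGI bucket is a singly linked list and the insert is a push onto
> > > the bucket head. A simplified sketch of that push, not the actual
> > > kernel code:)
> > >
> > >     #define XFS_AGI_UNLINKED_BUCKETS 64
> > >
> > >     /* The new inode becomes the bucket head and points at the old
> > >      * head.  If two unserialized pushes hit the same bucket, one of
> > >      * the links can be lost, so a later xfs_iunlink_remove() walking
> > >      * the bucket no longer finds its inode. */
> > >     static void iunlink_push(unsigned int agi_unlinked[XFS_AGI_UNLINKED_BUCKETS],
> > >                              unsigned int agino,
> > >                              unsigned int *di_next_unlinked)
> > >     {
> > >         int bucket = agino % XFS_AGI_UNLINKED_BUCKETS;
> > >
> > >         *di_next_unlinked = agi_unlinked[bucket];  /* point at old head */
> > >         agi_unlinked[bucket] = agino;              /* become new head   */
> > >     }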
> > >
> > > I don't know why makecontext would matter...
> > >
> > > Just to be sure, you are definitely loading the xfs module from
> > > the kernel you built, right, and you don't have a "priority"
> > > module getting loaded from elsewhere? Seems unlikely, but just
> > > to be sure.
> > >
> > > > Thank you.
> > >
> > > You could also add this patch to the xfs tracepoints to print
> > > more information about the inodes - the mode & flags.
> > >
> > > -Eric
> > >
> > >
> > > diff --git a/fs/xfs/linux-2.6/xfs_trace.h b/fs/xfs/linux-2.6/xfs_trace.h
> > > index e8ce644..c314b87 100644
> > > --- a/fs/xfs/linux-2.6/xfs_trace.h
> > > +++ b/fs/xfs/linux-2.6/xfs_trace.h
> > > @@ -544,14 +544,18 @@ DECLARE_EVENT_CLASS(xfs_inode_class,
> > > TP_STRUCT__entry(
> > > __field(dev_t, dev)
> > > __field(xfs_ino_t, ino)
> > > + __field(__u16, mode)
> > > + __field(unsigned long, flags)
> > > ),
> > > TP_fast_assign(
> > > __entry->dev = VFS_I(ip)->i_sb->s_dev;
> > > __entry->ino = ip->i_ino;
> > > + __entry->mode = VFS_I(ip)->i_mode;
> > > + __entry->flags = ip->i_flags;
> > > ),
> > > -       TP_printk("dev %d:%d ino 0x%llx",
> > > +       TP_printk("dev %d:%d ino 0x%llx mode 0%o, flags 0x%lx",
> > >                   MAJOR(__entry->dev), MINOR(__entry->dev),
> > > -                 __entry->ino)
> > > +                 __entry->ino, __entry->mode, __entry->flags)
> > > )
> > >
> > > #define DEFINE_INODE_EVENT(name) \
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > > 符永涛
> > >
> > >
> > >
> > >
> > > --
> > > 符永涛
> >
> >
> >
> >
> > --
> > 符永涛
>
>
--
符永涛