On 4/18/13 8:23 AM, 符永涛 wrote:
> Hi Brian and Eric,
> The shutdown is not easy to reproduce, but XFS has now shut down on 2 of the
> servers in our test cluster.
>
> The trace output is as follows:
> https://docs.google.com/file/d/0B7n2C4T5tfNCLXRYUWJ0b19JcWc/edit?usp=sharing
>
> Sorry, but systemtap was interrupted and I didn't notice, so I didn't
> get the systemtap logs.
>
> /var/log/messages is the same as before:
> Apr 18 22:43:14 10 kernel: XFS (sdb): : xfs_inotobp() returned error
> 22.
> Apr 18 22:43:14 10 kernel: XFS (sdb): xfs_inactive: xfs_ifree returned error
> 22
> Apr 18 22:43:14 10 kernel: XFS (sdb): xfs_do_force_shutdown(0x1) called from
> line 1184 of file fs/xfs/xfs_vnodeops.c. Return address = 0xffffffffa02d44aa
> Apr 18 22:43:14 10 kernel: XFS (sdb): I/O Error Detected. Shutting down
> filesystem
> Apr 18 22:43:14 10 kernel: XFS (sdb): Please umount the filesystem and
> rectify the problem(s)
> Apr 18 22:43:20 10 kernel: XFS (sdb): xfs_log_force: error 5 returned.
>
> The metadump file is large; I'll share it with you soon.
>
Thanks, we'll take a look. Just to double-check: in the kernel that ran the
tracepoints, did you use Brian's second version of the patch? I want to make sure
the tracepoints were at the top of the function.

Since you're patching XFS anyway, can you add something like this for next time:

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 796edce..cad0e8e 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1777,8 +1777,9 @@ xfs_iunlink_remove(
 					    &last_ibp, &last_offset, 0);
 			if (error) {
 				xfs_warn(mp,
-					"%s: xfs_inotobp() returned error %d.",
-					__func__, error);
+					"%s: xfs_inotobp() returned error %d "
+					"for inode 0x%llx ag %d agino %x\n",
+					__func__, error, ip->i_ino, agno, agino);
 				return error;
 			}
 			next_agino = be32_to_cpu(last_dip->di_next_unlinked);

so that when we encounter the error we're sure to have the problematic inode
number.
Thanks,
-Eric
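
For anyone reading the extended warning later: the three values it prints are related, since an XFS inode number packs the allocation group number in its high bits and the AG-relative inode number in its low bits. The following is only a userspace sketch of that split, not kernel code. The names split_ino and struct ino_parts are hypothetical; the kernel does this with XFS_INO_TO_AGNO() and XFS_INO_TO_AGINO(), and the two log2 values are assumed to come from the superblock fields sb_agblklog and sb_inopblog of the affected filesystem.

```c
#include <stdint.h>

/* Hypothetical holder for the two halves of an XFS inode number. */
struct ino_parts {
	uint32_t agno;   /* allocation group number (high bits) */
	uint32_t agino;  /* inode number relative to the AG (low bits) */
};

/*
 * Split an absolute inode number into (agno, agino).
 * agblklog and inopblog are assumed to be the filesystem's
 * sb_agblklog and sb_inopblog values; their sum is the number of
 * low bits that hold the AG-relative inode number.
 */
static struct ino_parts split_ino(uint64_t ino, unsigned agblklog,
				  unsigned inopblog)
{
	unsigned agino_bits = agblklog + inopblog;
	struct ino_parts p;

	p.agno = (uint32_t)(ino >> agino_bits);
	p.agino = (uint32_t)(ino & ((UINT64_C(1) << agino_bits) - 1));
	return p;
}
```

If the inode, ag and agino values in a future warning do not agree with this split for the filesystem's geometry, that mismatch would itself be worth reporting along with the log line.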