<div dir="ltr"><div><div><div><div>Hi All,<br><br></div>I am having an issue with an XFS filesystem shutting down under high load with a very large number of small files.<br></div>The filesystem currently holds around 3.5 to 4 million files. New files are written to it continuously until it reaches 9-11 million small files (35k on average).<br>
<br></div>At some point I get the following in dmesg:<br><br>[2870477.695512] Filesystem "sda5": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xffffffff8826bb7d<br>[2870477.695558]<br>
[2870477.695559] Call Trace:<br>[2870477.695611] [<ffffffff88262c28>] :xfs:xfs_trans_cancel+0x5b/0xfe<br>[2870477.695643] [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7<br>[2870477.695673] [<ffffffff8822f3f8>] :xfs:xfs_attr_get+0xbf/0xd2<br>
[2870477.695707] [<ffffffff88273326>] :xfs:xfs_vn_mknod+0x1e1/0x3bb<br>[2870477.695726] [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14<br>[2870477.695736] [<ffffffff802230e6>] __up_read+0x19/0x7f<br>
[2870477.695764] [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79<br>[2870477.695776] [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14<br>[2870477.695784] [<ffffffff802230e6>] __up_read+0x19/0x7f<br>[2870477.695791] [<ffffffff80209f4c>] __d_lookup+0xb0/0xff<br>
[2870477.695803] [<ffffffff8020cd4a>] _atomic_dec_and_lock+0x39/0x57<br>[2870477.695814] [<ffffffff8022d6db>] mntput_no_expire+0x19/0x89<br>[2870477.695829] [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14<br>
[2870477.695837] [<ffffffff802230e6>] __up_read+0x19/0x7f<br>[2870477.695861] [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79<br>[2870477.695887] [<ffffffff882680af>] :xfs:xfs_access+0x3d/0x46<br>[2870477.695899] [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14<br>
[2870477.695923] [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152<br>[2870477.695933] [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4<br>[2870477.695953] [<ffffffff80260295>] tracesys+0x47/0xb6<br>[2870477.695963] [<ffffffff802602f9>] tracesys+0xab/0xb6<br>
[2870477.695977]<br>[2870477.695985] xfs_force_shutdown(sda5,0x8) called from line 1139 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff88262c46<br>[2870477.696452] Filesystem "sda5": Corruption of in-memory data detected. Shutting down filesystem: sda5<br>
[2870477.696464] Please umount the filesystem, and rectify the problem(s)<br><br># ls -l /store<br>ls: /store: Input/output error<br>?--------- 0 root root 0 Jan 1 1970 /store<br><br></div><div>The filesystem is ~1T in size:<br>
</div># df -hT /store<br>Filesystem Type Size Used Avail Use% Mounted on<br>/dev/sda5 xfs 910G 142G 769G 16% /store<br><br><div><br><div><div><div><div>Using CentOS 5.9 with kernel 2.6.18-348.el5xen<br><br>
</div><div>The filesystem is in a virtual machine (Xen) and on top of LVM.<br><br></div><div>The filesystem was created with the mkfs.xfs defaults from xfsprogs-2.9.4-1.el5.centos (the version that ships with CentOS 5.x by default).<br>
<br></div><div>These are the defaults with which the filesystem was created:<br># xfs_info /store<br>meta-data=/dev/sda5 isize=256 agcount=32, agsize=7454720 blks<br> = sectsz=512 attr=0<br>
data = bsize=4096 blocks=238551040, imaxpct=25<br> = sunit=0 swidth=0 blks, unwritten=1<br>naming =version 2 bsize=4096<br>log =internal bsize=4096 blocks=32768, version=1<br>
= sectsz=512 sunit=0 blks, lazy-count=0<br>realtime =none extsz=4096 blocks=0, rtextents=0<br clear="all"></div><div><div><div><br><br></div><div>The problem is reproducible, and I don't think it's hardware related: it was reproduced on multiple servers of the same type, so I doubt it's a memory issue or something similar.<br>
<br><br></div><div>Is this a known issue? If so, what's the fix? I went through the kernel updates for CentOS 5.10 (newer kernel) but didn't see any XFS-related fixes since CentOS 5.9.<br></div><div><br></div>
<div>Any help will be greatly appreciated...<br></div><div><br>-- <br>Sincerely yours,<br>Alexandru Cardaniuc
</div></div></div></div></div></div></div></div>