
corruption of in-memory data detected

To: xfs@xxxxxxxxxxx
Subject: corruption of in-memory data detected
From: Alexandru Cardaniuc <cardaniuc@xxxxxxxxx>
Date: Mon, 30 Jun 2014 23:44:45 -0700
Cc: Alexandru Cardaniuc <cardaniuc@xxxxxxxxx>
Hi All,

I am having an issue with an XFS filesystem shutting down under high load with a very large number of small files.
The filesystem normally holds around 3.5 to 4 million files. New files are written to it continuously until it reaches 9 to 11 million small files (about 35 KB each on average).

At some point I get the following in dmesg:

[2870477.695512] Filesystem "sda5": XFS internal error xfs_trans_cancel at line 1138 of file fs/xfs/xfs_trans.c. Caller 0xffffffff8826bb7d
[2870477.695559] Call Trace:
[2870477.695611]  [<ffffffff88262c28>] :xfs:xfs_trans_cancel+0x5b/0xfe
[2870477.695643]  [<ffffffff8826bb7d>] :xfs:xfs_mkdir+0x57c/0x5d7
[2870477.695673]  [<ffffffff8822f3f8>] :xfs:xfs_attr_get+0xbf/0xd2
[2870477.695707]  [<ffffffff88273326>] :xfs:xfs_vn_mknod+0x1e1/0x3bb
[2870477.695726]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695736]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695764]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
[2870477.695776]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695784]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695791]  [<ffffffff80209f4c>] __d_lookup+0xb0/0xff
[2870477.695803]  [<ffffffff8020cd4a>] _atomic_dec_and_lock+0x39/0x57
[2870477.695814]  [<ffffffff8022d6db>] mntput_no_expire+0x19/0x89
[2870477.695829]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695837]  [<ffffffff802230e6>] __up_read+0x19/0x7f
[2870477.695861]  [<ffffffff8824f8f4>] :xfs:xfs_iunlock+0x57/0x79
[2870477.695887]  [<ffffffff882680af>] :xfs:xfs_access+0x3d/0x46
[2870477.695899]  [<ffffffff80264929>] _spin_lock_irqsave+0x9/0x14
[2870477.695923]  [<ffffffff802df4a3>] vfs_mkdir+0xe3/0x152
[2870477.695933]  [<ffffffff802dfa79>] sys_mkdirat+0xa3/0xe4
[2870477.695953]  [<ffffffff80260295>] tracesys+0x47/0xb6
[2870477.695963]  [<ffffffff802602f9>] tracesys+0xab/0xb6
[2870477.695985] xfs_force_shutdown(sda5,0x8) called from line 1139 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff88262c46
[2870477.696452] Filesystem "sda5": Corruption of in-memory data detected. Shutting down filesystem: sda5
[2870477.696464] Please umount the filesystem, and rectify the problem(s)

# ls -l /store
ls: /store: Input/output error
?--------- 0 root root 0 Jan 1 1970 /store

The filesystem is ~1T in size:
# df -hT /store
Filesystem    Type  Size  Used Avail Use% Mounted on
/dev/sda5      xfs  910G  142G  769G  16% /store

Using CentOS 5.9 with kernel 2.6.18-348.el5xen

The filesystem is in a virtual machine (Xen) and on top of LVM.

The filesystem was created with the mkfs.xfs defaults from xfsprogs-2.9.4-1.el5.centos (the version that ships with CentOS 5.x by default).

These are the defaults with which the filesystem was created:
# xfs_info /store
meta-data=""                     isize=256    agcount=32, agsize=7454720 blks
data     =                       bsize=4096   blocks=238551040, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
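
As a sanity check, the geometry reported by xfs_info can be cross-checked against the df output with a few lines of Python. This is only a sketch of the arithmetic: the variable names are illustrative and the numbers are copied from the output above.

```python
# Values copied from the xfs_info output above; names are illustrative only.
agcount = 32               # number of allocation groups
agsize_blocks = 7454720    # blocks per allocation group
block_size = 4096          # data block size in bytes
data_blocks = 238551040    # total data blocks reported
log_blocks = 32768         # internal log size in blocks

# The allocation groups should account for every data block.
assert agcount * agsize_blocks == data_blocks

total_gib = data_blocks * block_size / 2**30
ag_gib = agsize_blocks * block_size / 2**30
log_mib = log_blocks * block_size / 2**20

print(f"filesystem: {total_gib:.1f} GiB")
print(f"per AG:     {ag_gib:.2f} GiB")
print(f"log:        {log_mib:.0f} MiB")
```

The total works out to 910 GiB, which agrees with the size df reports for /store.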

The problem is reproducible, and I don't think it's hardware related: it has been reproduced on multiple servers of the same type, so I doubt it's a memory issue or something similar.
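
For anyone who wants to generate a comparable load, here is a rough Python stand-in for the write pattern. It is not the production application: the function name `write_small_files`, the directory layout, and the files-per-directory count are assumptions made for illustration. Only the file size (about 35 KB) and the fact that the trace fails in a mkdir call come from the report above.

```python
import os
import tempfile


def write_small_files(root, n_files, files_per_dir=1000, size=35 * 1024):
    """Create n_files files of `size` bytes, spread over numbered subdirectories.

    The layout (dNNNNNN/fNNNNNNNNN) is hypothetical, not the real application's.
    """
    payload = b"\0" * size
    for i in range(n_files):
        subdir = os.path.join(root, f"d{i // files_per_dir:06d}")
        if i % files_per_dir == 0:
            # A new directory per batch; the reported shutdown occurs in xfs_mkdir.
            os.mkdir(subdir)
        with open(os.path.join(subdir, f"f{i:09d}"), "wb") as fh:
            fh.write(payload)
    return n_files


if __name__ == "__main__":
    # Small demo run in a temporary directory; point `root` at the
    # filesystem under test and raise n_files into the millions to load it.
    with tempfile.TemporaryDirectory() as root:
        created = write_small_files(root, n_files=50, files_per_dir=10)
        print(created, "files in", len(os.listdir(root)), "directories")
```

Whether this triggers the same shutdown is untested; it only approximates the many-small-files pattern described here.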

Is this a known issue? If so, what is the fix? I went through the kernel updates for CentOS 5.10 (newer kernel) but didn't see any XFS-related fixes since CentOS 5.9.

Any help will be greatly appreciated...

Sincerely yours,
Alexandru Cardaniuc