<div dir="ltr">Hi,<br><br>We have a new Linux/XFS deployment (about a month old), and the XFS filesystem goes off-line at random, without any warning. We are running Scientific Linux release 5.9 with the latest updates.<br><br>
# uname -a<br>Linux node24 2.6.18-348.3.1.el5 #1 SMP Mon Mar 11 15:43:13 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux<br><br># cat /etc/redhat-release<br>Scientific Linux release 5.9 (Boron)<br><br>Here are the errors we see in /var/log/messages after the initial off-line event:<br>
<br>-- snip --<br><br>Apr 2 07:50:28 node24 kernel: xfs_iunlink_remove: xfs_inotobp() returned an error 22 on dm-6. Returning error.<br>Apr 2 07:50:28 node24 kernel: xfs_inactive: xfs_ifree() returned an error = 22 on dm-6<br>
Apr 2 07:50:28 node24 kernel: xfs_force_shutdown(dm-6,0x1) called from line 1419 of file fs/xfs/xfs_vnodeops.c. Return address = 0xffffffff8855b86b<br>Apr 2 07:50:28 node24 kernel: Filesystem dm-6: I/O Error Detected. Shutting down filesystem: dm-6<br>
Apr 2 07:50:28 node24 kernel: Please umount the filesystem, and rectify the problem(s)<br>Apr 2 07:50:52 node24 kernel: Filesystem dm-6: xfs_log_force: error 5 returned.<br>Apr 2 07:51:52 node24 last message repeated 2 times<br>
<br>-- snip --<br><br>Here are the messages after I umount/xfs_repair/mount the filesystem:<br><br>-- snip --<br><br>Apr 2 10:23:04 node24 kernel: xfs_force_shutdown(dm-6,0x1) called from line 420 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff8855c0fe<br>
Apr 2 10:23:07 node24 kernel: Filesystem dm-6: xfs_log_force: error 5 returned.<br>Apr 2 10:23:07 node24 last message repeated 4 times<br>Apr 2 10:24:08 node24 kernel: Filesystem dm-6: Disabling barriers, trial barrier write failed<br>
Apr 2 10:24:08 node24 kernel: XFS mounting filesystem dm-6<br>Apr 2 10:24:08 node24 kernel: Starting XFS recovery on filesystem: dm-6 (logdev: internal)<br>Apr 2 10:24:10 node24 kernel: Ending XFS recovery on filesystem: dm-6 (logdev: internal)<br>
Apr 2 10:24:17 node24 multipathd: dm-6: umount map (uevent)<br>Apr 2 10:58:54 node24 kernel: Filesystem dm-6: Disabling barriers, trial barrier write failed<br>Apr 2 10:58:54 node24 kernel: XFS mounting filesystem dm-6<br>
<br>-- snip --<br><br>We are taking 6 devices from a SAN and using LVM to create what is effectively a single RAID0 block device, which XFS sits on. We do not see any multipathd errors.<br><br>I created the filesystem using this command:<br>
<br># mkfs.xfs -f -d su=256k,sw=6,sectsize=4096,unwritten=0 -i attr=2 -l sectsize=4096,lazy-count=1 -r extsize=4096 /dev/mapper/vol_d24-root<br><br>Here are the mount options:<br><br># cat /etc/fstab | grep xfs<br>/dev/mapper/vol_d24-root /archive/d24 xfs defaults,inode64 0 9<br>
<br># mount | grep xfs<br>/dev/mapper/vol_d24-root on /archive/d24 type xfs (rw,inode64)<br><br>Here is the output of xfs_info:<br><br># xfs_info /dev/mapper/vol_d24-root<br>meta-data=/dev/mapper/vol_d24-root isize=256 agcount=88, agsize=268435392 blks<br>
= sectsz=4096 attr=2<br>data = bsize=4096 blocks=23441774592, imaxpct=25<br> = sunit=64 swidth=384 blks, unwritten=0<br>naming =version 2 bsize=4096 <br>
log =internal bsize=4096 blocks=32768, version=2<br> = sectsz=4096 sunit=1 blks, lazy-count=1<br>realtime =none extsz=4096 blocks=0, rtextents=0<br><br>
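As a sanity check on the geometry, the sunit/swidth and agcount values reported by xfs_info can be re-derived from the mkfs.xfs options above. This is only a minimal sketch for illustration: the helper names (stripe_geometry, ag_count) are made up here and are not part of xfsprogs, and the numbers are copied by hand from the output above.<br><br>

```python
# Values copied from the mkfs.xfs command line and the xfs_info output above.
BLOCK_SIZE = 4096                 # bsize=4096
STRIPE_UNIT_BYTES = 256 * 1024    # su=256k
STRIPE_WIDTH_UNITS = 6            # sw=6 (one stripe unit per SAN device)
DATA_BLOCKS = 23441774592         # blocks=23441774592
AG_SIZE_BLOCKS = 268435392        # agsize=268435392 blks


def stripe_geometry(su_bytes, sw, block_size):
    """Return (sunit, swidth) in filesystem blocks (hypothetical helper)."""
    sunit = su_bytes // block_size
    return sunit, sunit * sw


def ag_count(data_blocks, agsize_blocks):
    """Number of allocation groups, counting a partial last AG (hypothetical helper)."""
    return -(-data_blocks // agsize_blocks)  # ceiling division


if __name__ == "__main__":
    sunit, swidth = stripe_geometry(STRIPE_UNIT_BYTES, STRIPE_WIDTH_UNITS, BLOCK_SIZE)
    print("sunit (blks): ", sunit)
    print("swidth (blks):", swidth)
    print("agcount:      ", ag_count(DATA_BLOCKS, AG_SIZE_BLOCKS))
```

The results agree with what xfs_info reports (sunit=64, swidth=384 blks, agcount=88), so the stripe alignment at least looks internally consistent.<br><br>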
After the initial off-line event I did the following:<br>- umount<br>- xfs_repair (it told me to mount/umount and then re-run xfs_repair)<br>- mount<br>- umount<br>- xfs_repair<br><br>Here is the output of the second xfs_repair run:<br><br>-- snip --<br>
<br># xfs_repair /dev/mapper/vol_d24-root<br>Phase 1 - find and verify superblock...<br>Phase 2 - using internal log<br> - zero log...<br> - scan filesystem freespace and inode maps...<br> - found root inode chunk<br>
Phase 3 - for each AG...<br> - scan and clear agi unlinked lists...<br> - process known inodes and perform inode discovery...<br> - agno = 0<br> - agno = 1<br> - agno = 2<br> - agno = 3<br>
- agno = 4<br> - agno = 5<br>2acde2416940: Badness in key lookup (length)<br>bp=(bno 14657493984, len 16384 bytes) key=(bno 14657493984, len 8192 bytes)<br> - agno = 6<br> - agno = 7<br> - agno = 8<br>
- agno = 9<br> - agno = 10<br> - agno = 11<br> - agno = 12<br>2acde2416940: Badness in key lookup (length)<br>bp=(bno 26065183200, len 16384 bytes) key=(bno 26065183200, len 8192 bytes)<br> - agno = 13<br>
- agno = 14<br> - agno = 15<br> - agno = 16<br> - agno = 17<br> - agno = 18<br> - agno = 19<br> - agno = 20<br>2acde2e17940: Badness in key lookup (length)<br>bp=(bno 43039175488, len 16384 bytes) key=(bno 43039175488, len 8192 bytes)<br>
- agno = 21<br> - agno = 22<br> - agno = 23<br> - agno = 24<br> - agno = 25<br> - agno = 26<br> - agno = 27<br> - agno = 28<br> - agno = 29<br> - agno = 30<br>
- agno = 31<br> - agno = 32<br> - agno = 33<br> - agno = 34<br> - agno = 35<br> - agno = 36<br> - agno = 37<br> - agno = 38<br> - agno = 39<br> - agno = 40<br>
- agno = 41<br> - agno = 42<br> - agno = 43<br> - agno = 44<br> - agno = 45<br> - agno = 46<br> - agno = 47<br>2acde0613940: Badness in key lookup (length)<br>bp=(bno 101051527232, len 16384 bytes) key=(bno 101051527232, len 8192 bytes)<br>
2acde0613940: Badness in key lookup (length)<br>bp=(bno 101081120768, len 16384 bytes) key=(bno 101081120768, len 8192 bytes)<br>2acde0613940: Badness in key lookup (length)<br>bp=(bno 102336613216, len 16384 bytes) key=(bno 102336613216, len 8192 bytes)<br>
- agno = 48<br> - agno = 49<br>2acde2416940: Badness in key lookup (length)<br>bp=(bno 107185599392, len 16384 bytes) key=(bno 107185599392, len 8192 bytes)<br>2acde1014940: Badness in key lookup (length)<br>
bp=(bno 107606543312, len 16384 bytes) key=(bno 107606543312, len 8192 bytes)<br>2acde1014940: Badness in key lookup (length)<br>bp=(bno 107674994560, len 16384 bytes) key=(bno 107674994560, len 8192 bytes)<br>2acde1014940: Badness in key lookup (length)<br>
bp=(bno 107675078656, len 16384 bytes) key=(bno 107675078656, len 8192 bytes)<br>2acde1014940: Badness in key lookup (length)<br>bp=(bno 107675078688, len 16384 bytes) key=(bno 107675078688, len 8192 bytes)<br>2acde1014940: Badness in key lookup (length)<br>
bp=(bno 107675078720, len 16384 bytes) key=(bno 107675078720, len 8192 bytes)<br>2acde1014940: Badness in key lookup (length)<br>bp=(bno 107675175008, len 16384 bytes) key=(bno 107675175008, len 8192 bytes)<br>2acde1014940: Badness in key lookup (length)<br>
bp=(bno 107704942624, len 16384 bytes) key=(bno 107704942624, len 8192 bytes)<br>2acde1014940: Badness in key lookup (length)<br>bp=(bno 107763211904, len 16384 bytes) key=(bno 107763211904, len 8192 bytes)<br> - agno = 50<br>
2acde1014940: Badness in key lookup (length)<br>bp=(bno 109436122656, len 16384 bytes) key=(bno 109436122656, len 8192 bytes)<br>2acde2e17940: Badness in key lookup (length)<br>bp=(bno 110466056352, len 16384 bytes) key=(bno 110466056352, len 8192 bytes)<br>
2acde2e17940: Badness in key lookup (length)<br>bp=(bno 110603835392, len 16384 bytes) key=(bno 110603835392, len 8192 bytes)<br> - agno = 51<br> - agno = 52<br> - agno = 53<br> - agno = 54<br>
- agno = 55<br> - agno = 56<br> - agno = 57<br> - agno = 58<br> - agno = 59<br> - agno = 60<br> - agno = 61<br>2acde2416940: Badness in key lookup (length)<br>bp=(bno 132435472416, len 16384 bytes) key=(bno 132435472416, len 8192 bytes)<br>
- agno = 62<br>2acde2416940: Badness in key lookup (length)<br>bp=(bno 135330780000, len 16384 bytes) key=(bno 135330780000, len 8192 bytes)<br>2acde2416940: Badness in key lookup (length)<br>bp=(bno 135508074496, len 16384 bytes) key=(bno 135508074496, len 8192 bytes)<br>
2acde2416940: Badness in key lookup (length)<br>bp=(bno 135675982432, len 16384 bytes) key=(bno 135675982432, len 8192 bytes)<br> - agno = 63<br> - agno = 64<br> - agno = 65<br> - agno = 66<br>
- agno = 67<br> - agno = 68<br> - agno = 69<br> - agno = 70<br> - agno = 71<br> - agno = 72<br> - agno = 73<br> - agno = 74<br> - agno = 75<br> - agno = 76<br>
- agno = 77<br> - agno = 78<br> - agno = 79<br> - agno = 80<br> - agno = 81<br> - agno = 82<br> - agno = 83<br> - agno = 84<br> - agno = 85<br> - agno = 86<br>
- agno = 87<br> - process newly discovered inodes...<br>Phase 4 - check for duplicate blocks...<br> - setting up duplicate extent list...<br> - check for inodes claiming duplicate blocks...<br>
- agno = 0<br> - agno = 1<br> - agno = 5<br> - agno = 6<br> - agno = 7<br> - agno = 2<br> - agno = 3<br> - agno = 8<br> - agno = 9<br> - agno = 4<br> - agno = 10<br>
- agno = 11<br> - agno = 12<br> - agno = 13<br> - agno = 14<br> - agno = 15<br> - agno = 16<br> - agno = 17<br> - agno = 19<br> - agno = 20<br> - agno = 18<br>
- agno = 21<br> - agno = 22<br> - agno = 23<br> - agno = 24<br> - agno = 25<br> - agno = 26<br> - agno = 27<br> - agno = 28<br> - agno = 29<br> - agno = 30<br>
- agno = 31<br> - agno = 32<br> - agno = 33<br> - agno = 34<br> - agno = 35<br> - agno = 36<br> - agno = 37<br> - agno = 38<br> - agno = 39<br> - agno = 40<br>
- agno = 41<br> - agno = 42<br> - agno = 43<br> - agno = 44<br> - agno = 45<br> - agno = 46<br> - agno = 47<br> - agno = 48<br> - agno = 49<br> - agno = 50<br>
- agno = 51<br> - agno = 52<br> - agno = 53<br> - agno = 54<br> - agno = 55<br> - agno = 56<br> - agno = 57<br> - agno = 58<br> - agno = 59<br> - agno = 60<br>
- agno = 61<br> - agno = 62<br> - agno = 63<br> - agno = 64<br> - agno = 65<br> - agno = 66<br> - agno = 67<br> - agno = 68<br> - agno = 69<br> - agno = 70<br>
- agno = 71<br> - agno = 72<br> - agno = 73<br> - agno = 74<br> - agno = 75<br> - agno = 76<br> - agno = 77<br> - agno = 78<br> - agno = 79<br> - agno = 80<br>
- agno = 81<br> - agno = 82<br> - agno = 83<br> - agno = 84<br> - agno = 85<br> - agno = 86<br> - agno = 87<br>Phase 5 - rebuild AG headers and trees...<br> - reset superblock...<br>
Phase 6 - check inode connectivity...<br> - resetting contents of realtime bitmap and summary inodes<br> - traversing filesystem ...<br> - traversal finished ...<br> - moving disconnected inodes to lost+found ...<br>
disconnected inode 202102936036, moving to lost+found<br>disconnected inode 215350040250, moving to lost+found<br>disconnected inode 215350208634, moving to lost+found<br>disconnected inode 271016406074, moving to lost+found<br>
Phase 7 - verify and correct link counts...<br>done<br><br>-- snip --<br><br>Any ideas?<br><br>Thanks</div>