| To: | xfs@xxxxxxxxxxx |
|---|---|
| Subject: | 88TB filesystem going off-line without warning |
| From: | L Ox <lox8096@xxxxxxxxx> |
| Date: | Tue, 2 Apr 2013 11:44:15 -0700 |
| Delivered-to: | xfs@xxxxxxxxxxx |
|
Hi,
We have a new Linux/XFS deployment (about a month old) and randomly, without warning, the XFS filesystem will go off-line. We are running Scientific Linux release 5.9 with the latest updates.

# uname -a
Linux node24 2.6.18-348.3.1.el5 #1 SMP Mon Mar 11 15:43:13 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/redhat-release
Scientific Linux release 5.9 (Boron)

Here are the errors we see in /var/log/messages after the initial off-line event:

-- snip --
Apr 2 07:50:28 node24 kernel: xfs_iunlink_remove: xfs_inotobp() returned an error 22 on dm-6. Returning error.
Apr 2 07:50:28 node24 kernel: xfs_inactive: xfs_ifree() returned an error = 22 on dm-6
Apr 2 07:50:28 node24 kernel: xfs_force_shutdown(dm-6,0x1) called from line 1419 of file fs/xfs/xfs_vnodeops.c. Return address = 0xffffffff8855b86b
Apr 2 07:50:28 node24 kernel: Filesystem dm-6: I/O Error Detected. Shutting down filesystem: dm-6
Apr 2 07:50:28 node24 kernel: Please umount the filesystem, and rectify the problem(s)
Apr 2 07:50:52 node24 kernel: Filesystem dm-6: xfs_log_force: error 5 returned.
Apr 2 07:51:52 node24 last message repeated 2 times
-- snip --

Here are the messages after I umount/xfs_repair/mount the filesystem:

-- snip --
Apr 2 10:23:04 node24 kernel: xfs_force_shutdown(dm-6,0x1) called from line 420 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff8855c0fe
Apr 2 10:23:07 node24 kernel: Filesystem dm-6: xfs_log_force: error 5 returned.
Apr 2 10:23:07 node24 last message repeated 4 times
Apr 2 10:24:08 node24 kernel: Filesystem dm-6: Disabling barriers, trial barrier write failed
Apr 2 10:24:08 node24 kernel: XFS mounting filesystem dm-6
Apr 2 10:24:08 node24 kernel: Starting XFS recovery on filesystem: dm-6 (logdev: internal)
Apr 2 10:24:10 node24 kernel: Ending XFS recovery on filesystem: dm-6 (logdev: internal)
Apr 2 10:24:17 node24 multipathd: dm-6: umount map (uevent)
Apr 2 10:58:54 node24 kernel: Filesystem dm-6: Disabling barriers, trial barrier write failed
Apr 2 10:58:54 node24 kernel: XFS mounting filesystem dm-6
-- snip --

We are taking 6 devices from a SAN and using LVM to effectively create a RAID0 block device, which XFS is sitting on. We do not see any multipathd errors.

I created the filesystem using this command:

# mkfs.xfs -f -d su=256k,sw=6,sectsize=4096,unwritten=0 -i attr=2 -l sectsize=4096,lazy-count=1 -r extsize=4096 /dev/mapper/vol_d24-root

Here are the mount options:

# cat /etc/fstab | grep xfs
/dev/mapper/vol_d24-root /archive/d24 xfs defaults,inode64 0 9

# mount | grep xfs
/dev/mapper/vol_d24-root on /archive/d24 type xfs (rw,inode64)

Here is the output of xfs_info:

# xfs_info /dev/mapper/vol_d24-root
meta-data=""             isize=256    agcount=88, agsize=268435392 blks
         =               sectsz=4096  attr=2
data     =               bsize=4096   blocks=23441774592, imaxpct=25
         =               sunit=64     swidth=384 blks, unwritten=0
naming   =version 2      bsize=4096
log      =internal       bsize=4096   blocks=32768, version=2
         =               sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none           extsz=4096   blocks=0, rtextents=0

After the initial off-line event I did the following:
- umount
- ran xfs_repair (it told me to mount/umount and then re-run xfs_repair)
- mount
- umount
- xfs_repair

Here is the output of xfs_repair:

-- snip --
# xfs_repair /dev/mapper/vol_d24-root
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
2acde2416940: Badness in key lookup (length) bp=(bno 14657493984, len 16384 bytes) key=(bno 14657493984, len 8192 bytes)
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
2acde2416940: Badness in key lookup (length) bp=(bno 26065183200, len 16384 bytes) key=(bno 26065183200, len 8192 bytes)
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
2acde2e17940: Badness in key lookup (length) bp=(bno 43039175488, len 16384 bytes) key=(bno 43039175488, len 8192 bytes)
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
        - agno = 37
        - agno = 38
        - agno = 39
        - agno = 40
        - agno = 41
        - agno = 42
        - agno = 43
        - agno = 44
        - agno = 45
        - agno = 46
        - agno = 47
2acde0613940: Badness in key lookup (length) bp=(bno 101051527232, len 16384 bytes) key=(bno 101051527232, len 8192 bytes)
2acde0613940: Badness in key lookup (length) bp=(bno 101081120768, len 16384 bytes) key=(bno 101081120768, len 8192 bytes)
2acde0613940: Badness in key lookup (length) bp=(bno 102336613216, len 16384 bytes) key=(bno 102336613216, len 8192 bytes)
        - agno = 48
        - agno = 49
2acde2416940: Badness in key lookup (length) bp=(bno 107185599392, len 16384 bytes) key=(bno 107185599392, len 8192 bytes)
2acde1014940: Badness in key lookup (length) bp=(bno 107606543312, len 16384 bytes) key=(bno 107606543312, len 8192 bytes)
2acde1014940: Badness in key lookup (length) bp=(bno 107674994560, len 16384 bytes) key=(bno 107674994560, len 8192 bytes)
2acde1014940: Badness in key lookup (length) bp=(bno 107675078656, len 16384 bytes) key=(bno 107675078656, len 8192 bytes)
2acde1014940: Badness in key lookup (length) bp=(bno 107675078688, len 16384 bytes) key=(bno 107675078688, len 8192 bytes)
2acde1014940: Badness in key lookup (length) bp=(bno 107675078720, len 16384 bytes) key=(bno 107675078720, len 8192 bytes)
2acde1014940: Badness in key lookup (length) bp=(bno 107675175008, len 16384 bytes) key=(bno 107675175008, len 8192 bytes)
2acde1014940: Badness in key lookup (length) bp=(bno 107704942624, len 16384 bytes) key=(bno 107704942624, len 8192 bytes)
2acde1014940: Badness in key lookup (length) bp=(bno 107763211904, len 16384 bytes) key=(bno 107763211904, len 8192 bytes)
        - agno = 50
2acde1014940: Badness in key lookup (length) bp=(bno 109436122656, len 16384 bytes) key=(bno 109436122656, len 8192 bytes)
2acde2e17940: Badness in key lookup (length) bp=(bno 110466056352, len 16384 bytes) key=(bno 110466056352, len 8192 bytes)
2acde2e17940: Badness in key lookup (length) bp=(bno 110603835392, len 16384 bytes) key=(bno 110603835392, len 8192 bytes)
        - agno = 51
        - agno = 52
        - agno = 53
        - agno = 54
        - agno = 55
        - agno = 56
        - agno = 57
        - agno = 58
        - agno = 59
        - agno = 60
        - agno = 61
2acde2416940: Badness in key lookup (length) bp=(bno 132435472416, len 16384 bytes) key=(bno 132435472416, len 8192 bytes)
        - agno = 62
2acde2416940: Badness in key lookup (length) bp=(bno 135330780000, len 16384 bytes) key=(bno 135330780000, len 8192 bytes)
2acde2416940: Badness in key lookup (length) bp=(bno 135508074496, len 16384 bytes) key=(bno 135508074496, len 8192 bytes)
2acde2416940: Badness in key lookup (length) bp=(bno 135675982432, len 16384 bytes) key=(bno 135675982432, len 8192 bytes)
        - agno = 63
        - agno = 64
        - agno = 65
        - agno = 66
        - agno = 67
        - agno = 68
        - agno = 69
        - agno = 70
        - agno = 71
        - agno = 72
        - agno = 73
        - agno = 74
        - agno = 75
        - agno = 76
        - agno = 77
        - agno = 78
        - agno = 79
        - agno = 80
        - agno = 81
        - agno = 82
        - agno = 83
        - agno = 84
        - agno = 85
        - agno = 86
        - agno = 87
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 2
        - agno = 3
        - agno = 8
        - agno = 9
        - agno = 4
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 19
        - agno = 20
        - agno = 18
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
        - agno = 37
        - agno = 38
        - agno = 39
        - agno = 40
        - agno = 41
        - agno = 42
        - agno = 43
        - agno = 44
        - agno = 45
        - agno = 46
        - agno = 47
        - agno = 48
        - agno = 49
        - agno = 50
        - agno = 51
        - agno = 52
        - agno = 53
        - agno = 54
        - agno = 55
        - agno = 56
        - agno = 57
        - agno = 58
        - agno = 59
        - agno = 60
        - agno = 61
        - agno = 62
        - agno = 63
        - agno = 64
        - agno = 65
        - agno = 66
        - agno = 67
        - agno = 68
        - agno = 69
        - agno = 70
        - agno = 71
        - agno = 72
        - agno = 73
        - agno = 74
        - agno = 75
        - agno = 76
        - agno = 77
        - agno = 78
        - agno = 79
        - agno = 80
        - agno = 81
        - agno = 82
        - agno = 83
        - agno = 84
        - agno = 85
        - agno = 86
        - agno = 87
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 202102936036, moving to lost+found
disconnected inode 215350040250, moving to lost+found
disconnected inode 215350208634, moving to lost+found
disconnected inode 271016406074, moving to lost+found
Phase 7 - verify and correct link counts...
done
-- snip --

Any ideas?

Thanks |
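As a cross-check of the geometry quoted in the message, the mkfs.xfs options (su=256k, sw=6) can be compared against the xfs_info figures (sunit=64, swidth=384, agcount=88). The following is only a minimal sketch: the function name `geometry` and its parameter names are made up for illustration, and it assumes that xfs_info reports sunit and swidth in filesystem blocks, as the "blks" suffix in the output suggests.

```python
# Values copied from the mkfs.xfs command line and xfs_info output in the message.
BLOCK_SIZE = 4096            # bsize, in bytes
STRIPE_UNIT = 256 * 1024     # su=256k, in bytes
STRIPE_WIDTH_UNITS = 6       # sw=6 (number of stripe units per full stripe)
DATA_BLOCKS = 23441774592    # blocks= on the "data" line
AG_SIZE_BLOCKS = 268435392   # agsize= on the "meta-data" line


def geometry(bsize, su, sw, blocks, agsize):
    """Derive stripe and allocation-group figures (hypothetical helper)."""
    sunit_blocks = su // bsize            # stripe unit in filesystem blocks
    swidth_blocks = sunit_blocks * sw     # full stripe width in filesystem blocks
    agcount = -(-blocks // agsize)        # ceiling division: last AG may be short
    size_bytes = blocks * bsize           # total data-section size in bytes
    return {
        "sunit_blocks": sunit_blocks,
        "swidth_blocks": swidth_blocks,
        "agcount": agcount,
        "size_bytes": size_bytes,
        "size_tib": size_bytes / 2**40,
    }


if __name__ == "__main__":
    g = geometry(BLOCK_SIZE, STRIPE_UNIT, STRIPE_WIDTH_UNITS,
                 DATA_BLOCKS, AG_SIZE_BLOCKS)
    for name, value in g.items():
        print(f"{name}: {value}")
```

Under these assumptions the derived stripe unit, stripe width, and allocation-group count agree with the xfs_info output, so the reported geometry is at least internally consistent with the mkfs.xfs command line.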