http://bugme.osdl.org/show_bug.cgi?id=2841
Summary: XFS internal error xfs_da_do_buf + umount oops
Kernel Version: 2.6.6
Status: NEW
Severity: high
Owner: xfs-masters@xxxxxxxxxxx
Submitter: janfrode@xxxxxxxxxxxxxxx
Distribution: WhiteBox Enterprise Linux 3.0 (RedHat Enterprise clone)
Hardware Environment: IBM x330, dual Pentium III, 2 GB memory, QLogic QLA2300
fibre channel connected disks
Software Environment: NFS-exported filesystems are XFS.
Problem Description:
I just had a crash on my NFS server, reported in
http://bugme.osdl.org/show_bug.cgi?id=2840.
After booting the node up again I got lots of XFS internal errors on one of the
filesystems:
0x0: 00 00 00 00 00 00 00 af 07 a8 00 28 00 30 00 20
Filesystem "sdb1": XFS internal error xfs_da_do_buf(2) at line 2273 of
file fs/xfs/xfs_da_btree.c. Caller 0xc020a867
Call Trace:
[<c020a3f2>] xfs_da_do_buf+0x5e2/0x9b0
[<c020a867>] xfs_da_read_buf+0x47/0x60
[<c020a867>] xfs_da_read_buf+0x47/0x60
[<c023c0e2>] xfs_trans_read_buf+0x2d2/0x330
[<c020a867>] xfs_da_read_buf+0x47/0x60
[<c020e412>] xfs_dir2_block_lookup_int+0x52/0x1a0
[<c020e412>] xfs_dir2_block_lookup_int+0x52/0x1a0
[<c0250033>] xfs_initialize_vnode+0x2d3/0x2e0
[<c0250c29>] vfs_init_vnode+0x39/0x40
[<c01fc80d>] xfs_bmap_last_offset+0xbd/0x120
[<c020e330>] xfs_dir2_block_lookup+0x20/0xb0
[<c020c91b>] xfs_dir2_lookup+0xab/0x110
[<c0221c37>] xfs_ilock+0x57/0x100
[<c023d1a8>] xfs_dir_lookup_int+0x38/0x100
[<c0221c37>] xfs_ilock+0x57/0x100
[<c02425f6>] xfs_lookup+0x66/0x90
[<c024ded4>] linvfs_lookup+0x64/0xa0
[<c015ecf9>] __lookup_hash+0x89/0xb0
[<c015ed80>] lookup_one_len+0x50/0x60
[<c01d2bba>] compose_entry_fh+0x5a/0x120
[<c01d3084>] encode_entry+0x404/0x510
[<c024bbf6>] linvfs_readdir+0x196/0x240
[<c01d31e0>] nfs3svc_encode_entry_plus+0x0/0x50
[<c0151d6c>] open_private_file+0x1c/0x90
[<c0162949>] vfs_readdir+0x99/0xb0
[<c01d31e0>] nfs3svc_encode_entry_plus+0x0/0x50
[<c01ca279>] nfsd_readdir+0x79/0xc0
[<c03c8aa6>] svcauth_unix_accept+0x286/0x2a0
[<c01cffe0>] nfsd3_proc_readdirplus+0xe0/0x1c0
[<c01d31e0>] nfs3svc_encode_entry_plus+0x0/0x50
[<c01d21b0>] nfs3svc_decode_readdirplusargs+0x0/0x180
[<c01c493e>] nfsd_dispatch+0xbe/0x19c
[<c03c785d>] svc_authenticate+0x4d/0x80
[<c03c4e92>] svc_process+0x4d2/0x5f9
[<c0119b40>] default_wake_function+0x0/0x10
[<c01c46f0>] nfsd+0x1b0/0x340
[<c01c4540>] nfsd+0x0/0x340
[<c0104b2d>] kernel_thread_helper+0x5/0x18
I then did an 'exportfs -a -u' and tried unmounting this sdb1
filesystem, but then the machine froze, and I got this oops:
Unable to handle kernel paging request at virtual address 65000000
printing eip:
c013c9a5
*pde = 00000000
Oops: 0002 [#1]
SMP
CPU: 0
EIP: 0060:[<c013c9a5>] Not tainted
EFLAGS: 00010012 (2.6.6)
EIP is at free_block+0x65/0xf0
eax: eadd60c0 ebx: eab6c000 ecx: eab6cd30 edx: 65000000
esi: f7d4dda0 edi: 00000010 ebp: f7d4ddb8 esp: f26a5e28
ds: 007b es: 007b ss: 0068
Process umount (pid: 3699, threadinfo=f26a5000 task=f28128e0)
Stack: f7d4ddc8 0000001b f7d608c4 f7d608c4 00000292 eaba8ca0 f7d5f000 c013cb05
0000001b f7d608b4 f7d4dda0 f7d608b4 00000292 eaba8ca0 eaba9c40 c013cd79
eaba8ca0 00000001 00000001 c02456a5 eaba9c40 00000000 01af8f60 c0251006
Call Trace:
[<c013cb05>] cache_flusharray+0xd5/0xe0
[<c013cd79>] kmem_cache_free+0x49/0x50
[<c02456a5>] xfs_finish_reclaim+0xe5/0x110
[<c0251006>] vn_reclaim+0x56/0x60
[<c02514e8>] vn_purge+0x138/0x160
[<c02421d4>] xfs_inactive+0xf4/0x4b0
[<c013ca11>] free_block+0xd1/0xf0
[<c0251668>] vn_remove+0x58/0x5a
[<c025022f>] linvfs_clear_inode+0xf/0x20
[<c0168e74>] clear_inode+0xb4/0xd0
[<c0168ecc>] dispose_list+0x3c/0x80
[<c016905d>] invalidate_inodes+0x8d/0xb0
[<c0156eca>] generic_shutdown_super+0x8a/0x190
[<c0157a57>] kill_block_super+0x17/0x40
[<c0156d3b>] deactivate_super+0x6b/0xa0
[<c016c0cb>] sys_umount+0x3b/0x90
[<c014596a>] unmap_vma_list+0x1a/0x30
[<c016c137>] sys_oldumount+0x17/0x20
[<c010689f>] syscall_call+0x7/0xb
Code: 89 02 8b 43 0c c7 03 00 01 10 00 31 d2 c7 43 04 00 02 20 00
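For reference, the unexport/unmount sequence was roughly the following (the
exact umount invocation is reconstructed here for clarity; the report only
says the sdb1 filesystem was being unmounted):
# exportfs -a -u              (unexport all NFS exports)
# umount /dev/sdb1            (unmounting by device; by mount point behaves the same)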
After getting the node back up again, I ran xfs_repair, but everything
looked OK:
# xfs_repair /dev/sdb1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- clear lost+found (if it exists) ...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- ensuring existence of lost+found directory
- traversing filesystem starting at / ...
- traversal finished ...
- traversing all unattached subtrees ...
- traversals finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
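For completeness, the result could be re-checked without waiting for NFS
traffic: remounting the filesystem and walking the whole tree forces every
directory entry through a lookup again, so the xfs_da_do_buf error above
should reappear in the log if the corruption were still present. (This is a
suggested check, not part of the original report; /mnt/test is just an
illustrative mount point.)
# mount /dev/sdb1 /mnt/test
# ls -lR /mnt/test > /dev/null    (re-reads every directory block via lookups)
# umount /mnt/test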
Steps to reproduce: