Bugzilla – Bug 272
xfs_force_shutdown in xfs_trans_cancel, part 2
Last modified: 2008-12-27 03:18:36 CST
I am seeing a problem very similar to the one described in bug #186. I wasn't sure if it's the same problem, so I am creating a new ticket:

xfs_force_shutdown(sd(8,5),0x8) called from line 1071 of file xfs_trans.c. Return address = 0xc024d1fb
Filesystem "sd(8,5)": Corruption of in-memory data detected. Shutting down filesystem: sd(8,5)
Please umount the filesystem, and rectify the problem(s)

That line is in xfs_trans_cancel(). Unmounting and remounting results in a clean filesystem; xfs_repair doesn't find any errors.

I started seeing this about 10 days ago, running linux-2.4.20 plus the latest xfs-2.4.20-all-i386.bz2 that was available on Jan 20 2003. Before that, this system had been running without problems for several months with the same kernel. Since then the same error has occurred about once every 48 hours. I have also tried 2.4.21+xfs-1.3.0pre4, 1.3.0pre5 and the latest xfs-2.4.21-all; all versions produce the error. All of these kernels have the fix proposed in bug #186.

I am fairly confident that there are no hardware problems; I just ran my burn-in script on the machine for 24 hours without any errors.

About the system: fairly heavily loaded NFS server, SMP (2x PIII), 1G RAM, Intel e1000, Adaptec 3210S hardware RAID (showing all disks optimal).

Please let me know if you would like any other info or want me to run any tests. I have just rebooted the system with a kdb-enabled kernel.
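Why a cancel can end in a shutdown: the messages pair flag 0x8 with "Corruption of in-memory data detected". A minimal user-space model, assuming (this is not stated in the thread) that xfs_trans_cancel() forces a shutdown when the transaction being cancelled has already dirtied in-core metadata, since those changes cannot be rolled back. The struct names and the dirty-flag value are hypothetical stand-ins, not the kernel's definitions:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's flags and structures. */
#define MODEL_TRANS_DIRTY    0x01  /* transaction has modified in-core metadata */
#define MODEL_CORRUPT_INCORE 0x08  /* the "0x8" in xfs_force_shutdown(dev,0x8) */

struct model_mount { unsigned int shutdown_flags; };
struct model_trans { unsigned int flags; };

/* Simplified cancel path: changes already made to in-core metadata cannot
 * be undone, so cancelling a dirty transaction shuts the filesystem down
 * instead of leaving inconsistent state in memory. */
void model_trans_cancel(struct model_mount *mp, struct model_trans *tp)
{
    bool already_down = mp->shutdown_flags != 0;

    if ((tp->flags & MODEL_TRANS_DIRTY) && !already_down)
        mp->shutdown_flags |= MODEL_CORRUPT_INCORE;
    tp->flags = 0;  /* the transaction is released either way */
}
```

A clean transaction can be cancelled without consequences; only a dirty one trips the shutdown, so the real bug is whatever error made the caller cancel in the first place.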
We found a particular case where we can reproduce this error repeatedly:

xfs_force_shutdown(md0,0x8) called from line 1088 of file fs/xfs/xfs_trans.c. Return address = 0xa00000020034ce90
Filesystem "md0": Corruption of in-memory data detected. Shutting down filesystem: md0

Hardware:
4x Itanium2 1.5 GHz (HP rx4640), 16 GB RAM
2x 3ware 9500-12MI (all disks JBOD)
24x WD Raptor 74GB

Software:
Linux 2.6.8-rc1-mm1

$ zgrep _XFS /proc/config.gz
CONFIG_XFS_FS=m
# CONFIG_XFS_RT is not set
CONFIG_XFS_QUOTA=y
# CONFIG_XFS_SECURITY is not set
CONFIG_XFS_POSIX_ACL=y

mdadm-1.5.0-3
xfsprogs-2.6.13-1

DEVICE /dev/sd[c-z]
ARRAY /dev/md0 level=raid0 num-devices=24
1640349696 blocks 1024k chunks

The filesystem was created with:
mkfs.xfs -f -L data01 -d su=1m,sw=24 -l version=2,su=256k -i size=512 /dev/md0
Default mount options.

Script to reproduce:

# script not optimized :-)
dd if=/dev/zero of=/tmp/TEST bs=199813120 count=1
cd /mnt
for i in `seq 1 10`; do cat /tmp/TEST >> test; done
for i in `seq 2 12`; do echo $i; cp test test$i; sleep 1; done   # cache is full around #8
cp test12 test13   # KABOOM.

I don't see the contents of the proposed patch for bug #186 in my source tree.
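Rough arithmetic for the reproducer, using only the numbers in the script: "test" and each of test2 through test12 is 10 x 199813120 bytes. This sketch finds the first copy at which the files written so far exceed a given RAM size. It ignores /tmp/TEST and kernel memory (a simplifying assumption), so real page-cache pressure sets in somewhat earlier:

```c
#include <stdint.h>

#define SEED_BYTES  199813120ULL  /* dd bs=199813120 count=1 */
#define SEED_COPIES 10ULL         /* `seq 1 10`: TEST appended ten times */

/* Size of "test" and of each testN copy made from it. */
uint64_t copy_bytes(void)
{
    return SEED_BYTES * SEED_COPIES;
}

/* First loop index i (2..12) at which test, test2 .. test$i together
 * exceed ram_bytes, or 0 if they never do. After "cp test test$i" there
 * are i files of copy_bytes() each; /tmp/TEST and kernel memory are ignored. */
int first_copy_exceeding(uint64_t ram_bytes)
{
    for (int i = 2; i <= 12; i++) {
        if ((uint64_t)i * copy_bytes() > ram_bytes)
            return i;
    }
    return 0;
}
```

With 16 GiB of RAM, first_copy_exceeding(16ULL << 30) returns 9. Because the kernel and other caches also need memory, pressure arrives around copy 8, which matches the comment in the script.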
Filesystem layout:

meta-data=/mnt                   isize=512    agcount=32, agsize=12815360 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=410087424, imaxpct=25
         =                       sunit=256    swidth=6144 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=64 blks
realtime =none                   extsz=25165824 blocks=0, rtextents=0
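For reference, the xfs_info fields map back to the mkfs options through block counts multiplied by bsize=4096. This sketch shows that arithmetic. The helper names are hypothetical; the inputs are the numbers printed above:

```c
#include <stdint.h>

#define TIB (1ULL << 40)

/* Convert an xfs_info block count to bytes. */
uint64_t blks_to_bytes(uint64_t blks, uint64_t bsize)
{
    return blks * bsize;
}

/* Stripe width as a multiple of the stripe unit (the mkfs "sw" value). */
uint64_t stripe_width_units(uint64_t sunit_blks, uint64_t swidth_blks)
{
    return swidth_blks / sunit_blks;
}

/* 1 if agcount allocation groups of agsize blocks cover the data section. */
int ags_cover(uint64_t agcount, uint64_t agsize, uint64_t blocks)
{
    return agcount * agsize >= blocks;
}

/* 1 if the data section is at least one tebibyte. */
int at_least_tib(uint64_t blocks, uint64_t bsize)
{
    return blks_to_bytes(blocks, bsize) >= TIB;
}
```

With su=1m, the data sunit is 256 blocks, and swidth/sunit = 24 matches sw=24 and the 1024k md chunk. The log sunit of 64 blocks is the 256k log stripe unit. 410087424 blocks is about 1.53 TiB, so this filesystem is above the one-terabyte mark.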
Hi Peter, Any chance you could try your same MD setup but with a filesystem size which is below a terabyte (-dsize=500g or something) and see if that still fails. thanks!
Nathan,

Tried with -d size=512g (in addition to the parameters used before) and the forced shutdown still happens.

meta-data=/mnt                   isize=512    agcount=16, agsize=8388608 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=134217728, imaxpct=25
         =                       sunit=256    swidth=6144 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=64 blks
realtime =none                   extsz=25165824 blocks=0, rtextents=0

xfs_force_shutdown(md0,0x8) called from line 1088 of file fs/xfs/xfs_trans.c. Return address = 0xa00000020034ce90

Moreover, we've tried different software RAID configurations and the problem persists.

Peter
It also happens with 2.6.8-rc2-mm1. Do you want me to try the CVS tree?

Peter
> Tried with -d size=512g (additionally to the parameters used before) and the
> forced shutdown still happens.

Unfortunately, I don't have a 500g device to test with either ;) but here are a few more things worth testing:

- try mkfs without a version 2 log; does the problem persist?
- try without any data stripe alignment options
- try with -dagsize=4g
- try with the default inode size (256 bytes)

...one at a time, not all at once, of course. If the problem goes away with any one of these, then we have a big hint.

Also, does it only happen on top of MD, or does the failure occur with those options on a regular disk too?

thanks.
Nathan,

In all of the cases you suggested, the filesystem was still shut down. Additional cases:

- XFS on md with mkfs.xfs defaults: CRASH
- XFS on single JBOD: OK
- XFS on 3ware array (no md): CRASH
- XFS on md (whatever underlying devices): CRASH
- ext2/3 on 3ware array: OK
- ext2/3 on md: OK

I'm afraid I have to do the full matrix if we want to investigate this further. It might be completely unrelated to XFS/md/3w-9xxx...

Peter
FWIW, I tried this with a small raid0 of 4 devices, and did not hit the problem.

[root@penguin3 root]# xfs_info /mnt/foo
meta-data=/mnt/foo               isize=256    agcount=16, agsize=555696 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=8891136, imaxpct=25
         =                       sunit=16     swidth=64 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=4352, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=262144 blocks=0, rtextents=0

[root@penguin3 root]# mdadm -Q --detail /dev/md0
/dev/md0:
        Version : 00.90.00
  Creation Time : Sat Jul 24 21:16:12 2004
     Raid Level : raid0
     Array Size : 35565056 (33.92 GiB 36.46 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Jul 24 21:16:12 2004
          State : dirty, no-errors
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

    Number   Major   Minor   RaidDevice State
       0       8       81        0      active sync   /dev/sdf1
       1       8       97        1      active sync   /dev/sdg1
       2       8      113        2      active sync   /dev/sdh1
       3       8      129        3      active sync   /dev/sdi1
           UUID : 2e3f1b40:01d6b0a0:f08ed8ec:9936997f
Can confirm the same bug. Celeron 850, kernel 2.6.7, Debian Sarge. The system crashes under intensive disk I/O. Boot log follows:

md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: raid1 personality registered as nr 3
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 0000:00:07.1
VP_IDE: chipset revision 16
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt82c686a (rev 22) IDE UDMA66 controller on pci0000:00:07.1
ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:DMA, hdd:pio
hda: ST38410A, ATA DISK drive
Using anticipatory io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: ST38410A, ATA DISK drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 16841664 sectors (8622 MB) w/512KiB Cache, CHS=16708/16/63, UDMA(66)
/dev/ide/host0/bus0/target0/lun0: p1 p2 p3 p4
hdc: max request size: 128KiB
hdc: 16841664 sectors (8622 MB) w/512KiB Cache, CHS=16708/16/63, UDMA(66)
/dev/ide/host0/bus1/target0/lun0: p1 p2 p3 p4
md: md0 stopped.
md: bind<hdc1>
md: bind<hda1>
raid1: raid set md0 active with 2 out of 2 mirrors
mdadm: /devfs/md/0 has been started with 2 drives.
SGI XFS with ACLs, security attributes, realtime, large block numbers, no debugd
SGI XFS Quota Management subsystem
XFS mounting filesystem md0
Starting XFS recovery on filesystem: md0 (dev: md0)
Ending XFS recovery on filesystem: md0 (dev: md0)
INIT: version 2.85 booting
Activating swap.
Unable to find swap-space signature
xfs_force_shutdown(md0,0x8) called from line 1088 of file fs/xfs/xfs_trans.c. b
Filesystem "md0": Corruption of in-memory data detected. Shutting down filesys0
Please umount the filesystem, and rectify the problem(s)
can't create loc
Talked with Peter, confirmed no other messages prior to shutdown:

Aug 4 10:37:01 oplapro97 kernel: XFS mounting filesystem md0
Aug 4 10:39:50 oplapro97 kernel: xfs_force_shutdown(md0,0x8) called from line 1088 of file fs/xfs_trans.c. Return address = 0xa0000002002b4dd0

If anyone who's hitting this reliably can get kdb compiled in, please set the xfs_panic_mask to BUG on a shutdown (see xfs.txt in the kernel tree), load xfsidbg, and then, when the fs shuts down and BUGs, try dumping the transaction with the "xtp" command in kdb. The transaction pointer should be in one of the arguments on the stack.

-Eric
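The panic-mask step can be scripted. This is a minimal sketch under two assumptions: the sysctl is exposed as /proc/sys/fs/xfs/panic_mask, and the tag for a shutdown caused by in-core corruption is 0x10, as listed in xfs.txt of kernels from this period. Verify both against the xfs.txt in your own tree. The function takes the path as a parameter, so it can be pointed at a scratch file for testing:

```c
#include <stdio.h>

/* Panic-mask tag for "shutdown due to in-core corruption" (assumed value;
 * check Documentation/filesystems/xfs.txt for your kernel). */
#define XFS_PTAG_SHUTDOWN_CORRUPT 0x00000010

/* Write mask as a decimal number to a sysctl-style file, normally
 * /proc/sys/fs/xfs/panic_mask. Returns 0 on success, -1 on failure. */
int set_panic_mask(const char *path, unsigned int mask)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%u\n", mask) > 0;
    if (fclose(f) != 0)  /* a failed sysctl write may only surface on close */
        ok = 0;
    return ok ? 0 : -1;
}
```

As root, set_panic_mask("/proc/sys/fs/xfs/panic_mask", XFS_PTAG_SHUTDOWN_CORRUPT) is intended to have the same effect as writing 16 to that file from a shell.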
Update from Peter: he set a panic mask and got this backtrace. This is from 2.6.8-rc3, a vanilla kernel from kernel.org.

XFS mounting filesystem md0
xfs_force_shutdown(md0,0x8) called from line 1088 of file fs/xfs/xfs_trans.c. Return address = 0xa0
XFS: Transforming an alert into a BUG.
Filesystem "md0": Corruption of in-memory data detected. Shutting down filesystem: md0
kernel BUG at fs/xfs/support/debug.c:126!
cp[1679]: bugcheck! 0 [1]
Modules linked in: raid0 md 3w_9xxx button ixgb tg3 ipt_REJECT ipt_state ip_conntrack iptable_filtes
Pid: 1679, CPU 2, comm: cp
psr : 0000101008026038 ifs : 800000000000038a ip : [<a0000002002b7620>] Not tainted
ip is at icmn_err+0x280/0x2a0 [xfs]
unat: 0000000000000000 pfs : 000000000000038a rsc : 0000000000000003
rnat: a0000001008a6910 bsps: a0000001008a6908 pr : 0000006455996955
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000002002b7620 b6 : a000000100002d70 b7 : a000000100019400
f6 : 1003e0fc0fc0fc0fc0fc1 f7 : 0ffd9a200000000000000
f8 : 1003e0000000000000240 f9 : 1003e0000000000002490
f10 : 1003e000000000ea00000 f11 : 1003e00000000367b7ad0
r1 : a000000100a8e870 r2 : 0000000000000000 r3 : e000004091d30ec0
r8 : 000000000000002a r9 : 0000000000000000 r10 : 0000000000000001
r11 : 0000000000004000 r12 : e000004091d3fb80 r13 : e000004091d30000
r14 : 0000000000004000 r15 : 0000000000004000 r16 : 0000000000000300
r17 : e0000040fd8cfdf0 r18 : 0000000000000001 r19 : 0000000000000003
r20 : a0000001008be784 r21 : a0000001007f1c40 r22 : a0000001008be784
r23 : a00000010088e9d8 r24 : 0000000000000002 r25 : 0000000000000000
r26 : e000004091d30ec0 r27 : e000004091d30ed0 r28 : 0000001008022038
r29 : 0000000000000002 r30 : e000004091d3fb40 r31 : 0000000000000000

Call Trace:
 [<a00000010001a440>] show_stack+0x80/0xa0 sp=e000004091d3f750 bsp=e000004091d31470
 [<a0000001000418a0>] die+0x200/0x300 sp=e000004091d3f920 bsp=e000004091d31448
 [<a000000100041c30>] ia64_bad_break+0x230/0x340 sp=e000004091d3f920 bsp=e000004091d31428
 [<a0000001000121e0>] ia64_leave_kernel+0x0/0x270 sp=e000004091d3f9b0 bsp=e000004091d31428
 [<a0000002002b7620>] icmn_err+0x280/0x2a0 [xfs] sp=e000004091d3fb80 bsp=e000004091d313d0
 [<a00000020023d5d0>] xfs_fs_vcmn_err+0xf0/0x160 [xfs] sp=e000004091d3fb80 bsp=e000004091d31388
 [<a00000020023d7b0>] xfs_cmn_err+0xd0/0x120 [xfs] sp=e000004091d3fb80 bsp=e000004091d31330
 [<a00000020029ba10>] xfs_do_force_shutdown+0x1f0/0x2a0 [xfs] sp=e000004091d3fba0 bsp=e000004091d312d8
 [<a0000002002b4dd0>] vfs_force_shutdown+0xd0/0x100 [xfs] sp=e000004091d3fba0 bsp=e000004091d312a0
 [<a000000200281e40>] xfs_trans_cancel+0x2a0/0x2c0 [xfs] sp=e000004091d3fba0 bsp=e000004091d31270
 [<a000000200292d50>] xfs_create+0x6b0/0xf40 [xfs] sp=e000004091d3fba0 bsp=e000004091d31150
 [<a0000002002adb30>] linvfs_mknod+0x550/0x5e0 [xfs] sp=e000004091d3fc00 bsp=e000004091d310f0
 [<a000000100167640>] vfs_create+0x140/0x1a0 sp=e000004091d3fdb0 bsp=e000004091d310b8
 [<a000000100168c30>] open_namei+0x1110/0x1260 sp=e000004091d3fdb0 bsp=e000004091d31038
 [<a00000010013bb20>] filp_open+0x60/0xe0 sp=e000004091d3fdc0 bsp=e000004091d31010
 [<a00000010013c850>] sys_open+0xb0/0x140 sp=e000004091d3fe30 bsp=e000004091d30f90
 [<a000000100012040>] ia64_ret_from_syscall+0x0/0x20 sp=e000004091d3fe30 bsp=e000004091d30f90
Further digging with Eric's guidance. Modifications to the source:

fs/xfs/Makefile:
EXTRA_CFLAGS += -Ifs/xfs -Ifs/xfs/linux-2.6 -funsigned-char -DDEBUG

fs/xfs/xfs_error.c:
#ifdef DEBUG
int xfs_etrap[XFS_ERROR_NTRAP] = {
    EFSCORRUPTED,
    0,
};

We've got a trace:

XFS: device md0- bad inode magic/vsn daddr 2064 #0 (magic=0)
xfs_error_trap: error 990
kernel BUG at fs/xfs/xfs_error.c:75!
cp[1663]: bugcheck! 0 [1]
Modules linked in: raid0 md 3w_9xxx button ixgb tg3 ipt_REJECT ipt_state ip_conntrack iptable_filtes
Pid: 1663, CPU 0, comm: cp
psr : 0000101008026038 ifs : 8000000000000288 ip : [<a00000020033bf90>] Not tainted
ip is at xfs_error_trap+0xd0/0xe0 [xfs]
unat: 0000000000000000 pfs : 0000000000000288 rsc : 0000000000000003
rnat: 0009804c8a70033f bsps: 0000000000000060 pr : 000119501a556965
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a00000020033bf90 b6 : a000000100002d70 b7 : a000000100094fc0
f6 : 1003e0fc0fc0fc0fc0fc1 f7 : 0ffdc8dc0000000000000
f8 : 1003e0000000000000240 f9 : 1003e0000000000002490
f10 : 1003e000000000ea00000 f11 : 1003e00000000367b7ad0
r1 : a000000100a0a810 r2 : 0000000000000000 r3 : e00000034a1b0eb0
r8 : 0000000000000025 r9 : 0000000000000000 r10 : 0000000000000001
r11 : 0000000000004000 r12 : e00000034a1bfad0 r13 : e00000034a1b0000
r14 : 0000000000004000 r15 : 0000000000004000 r16 : 0000000000000100
r17 : e00000003e26fdf0 r18 : 0000000000000001 r19 : 0000000000000001
r20 : a000000100838a8c r21 : a000000100771c40 r22 : a000000100838a8c
r23 : a00000010080a958 r24 : 0000000000000002 r25 : 0000000000000000
r26 : e00000034a1b0eb0 r27 : e00000034a1b0ec0 r28 : 0000001008022038
r29 : 0000000000000002 r30 : e00000034a1bfa90 r31 : 0000000000000000

Call Trace:
 [<a000000100019ba0>] show_stack+0x80/0xa0 sp=e00000034a1bf6a0 bsp=e00000034a1b1650
 [<a000000100040fa0>] die+0x200/0x300 sp=e00000034a1bf870 bsp=e00000034a1b1628
 [<a000000100041330>] ia64_bad_break+0x230/0x340 sp=e00000034a1bf870 bsp=e00000034a1b1608
 [<a000000100011f80>] ia64_leave_kernel+0x0/0x270 sp=e00000034a1bf900 bsp=e00000034a1b1608
 [<a00000020033bf90>] xfs_error_trap+0xd0/0xe0 [xfs] sp=e00000034a1bfad0 bsp=e00000034a1b15c0
 [<a000000200354ce0>] xfs_itobp+0x4a0/0x6a0 [xfs] sp=e00000034a1bfad0 bsp=e00000034a1b1518
 [<a000000200358930>] xfs_iread+0xd0/0x540 [xfs] sp=e00000034a1bfb20 bsp=e00000034a1b14c0
 [<a00000020034f840>] xfs_iget_core+0x220/0x1900 [xfs] sp=e00000034a1bfb30 bsp=e00000034a1b1438
 [<a0000002003511e0>] xfs_iget+0x2c0/0x360 [xfs] sp=e00000034a1bfb40 bsp=e00000034a1b13c0
 [<a0000002003a8070>] xfs_trans_iget+0x5f0/0x920 [xfs] sp=e00000034a1bfb40 bsp=e00000034a1b1378
 [<a0000002003591d0>] xfs_ialloc+0x150/0xd40 [xfs] sp=e00000034a1bfb50 bsp=e00000034a1b1300
 [<a0000002003aac30>] xfs_dir_ialloc+0x110/0x740 [xfs] sp=e00000034a1bfb60 bsp=e00000034a1b1238
 [<a0000002003b9f60>] xfs_create+0x840/0x1340 [xfs] sp=e00000034a1bfba0 bsp=e00000034a1b1140
 [<a0000002003d8390>] linvfs_mknod+0x5d0/0x660 [xfs] sp=e00000034a1bfc00 bsp=e00000034a1b10e0
 [<a000000100161960>] vfs_create+0x140/0x1a0 sp=e00000034a1bfdb0 bsp=e00000034a1b10a8
 [<a000000100162f50>] open_namei+0x1110/0x1260 sp=e00000034a1bfdb0 bsp=e00000034a1b1028
 [<a000000100135e80>] filp_open+0x60/0xe0 sp=e00000034a1bfdc0 bsp=e00000034a1b1000
 [<a000000100136bb0>] sys_open+0xb0/0x140 sp=e00000034a1bfe30 bsp=e00000034a1b0f80
 [<a000000100011de0>] ia64_ret_from_syscall+0x0/0x20 sp=e00000034a1bfe30 bsp=e00000034a1b0f80
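Here is what the xfs_etrap change does, modeled in user space. In a DEBUG build, an error value listed in the trap table makes xfs_error_trap() call BUG() at the point where the error is raised. The backtrace therefore shows where the error originated (here xfs_itobp), not the later shutdown. This sketch assumes that a zero entry terminates the table. The table size is a hypothetical placeholder, and 990 is the EFSCORRUPTED value printed in the trace:

```c
#define XFS_EFSCORRUPTED 990  /* EFSCORRUPTED, printed as "error 990" above */
#define ETRAP_SLOTS      10   /* hypothetical table size for this sketch */

/* Errors to trap, as in the modified xfs_error.c: EFSCORRUPTED only. */
static int etrap[ETRAP_SLOTS] = { XFS_EFSCORRUPTED, 0, };

/* Return 1 if error e is listed in the trap table (the DEBUG kernel would
 * BUG() here, freezing the stack at the point of failure), else 0.
 * A zero entry marks the end of the table. */
int error_is_trapped(int e)
{
    for (int i = 0; i < ETRAP_SLOTS && etrap[i] != 0; i++) {
        if (etrap[i] == e)
            return 1;
    }
    return 0;
}
```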
Here's the inode:

xfs_db> dblock 2064, type inode, print:
core.magic = 0x494e
core.mode = 0
core.version = 1
core.format = 0 (dev)
core.nlinkv1 = 0
core.uid = 0
core.gid = 0
core.flushiter = 0
core.atime.sec = Thu Jan 1 01:00:00 1970
core.atime.nsec = 000000000
core.mtime.sec = Thu Jan 1 01:00:00 1970
core.mtime.nsec = 000000000
core.ctime.sec = Thu Jan 1 01:00:00 1970
core.ctime.nsec = 000000000
core.size = 0
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 0 (dev)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.gen = 0

So this is a little odd: we returned EFSCORRUPTED from xfs_itobp because either the magic or the version was wrong, but on disk it looks fine.
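The check that fired can be expressed on raw bytes. This sketch assumes the on-disk inode core layout of this era (a 16-bit big-endian magic at offset 0 and a version byte at offset 4) and accepts versions 1 and 2. The function name is hypothetical. The kernel saw magic=0 in its buffer, while xfs_db reads 0x494e ("IN") from disk. That suggests the in-memory buffer did not hold the sector's contents, rather than the disk being corrupt:

```c
#include <stddef.h>

#define XFS_DINODE_MAGIC 0x494e  /* "IN", the core.magic value xfs_db printed */

/* Return 1 if buf starts with a plausible on-disk inode core: big-endian
 * magic at offset 0 and version 1 or 2 at offset 4; 0 otherwise. */
int dinode_core_looks_valid(const unsigned char *buf, size_t len)
{
    if (len < 5)
        return 0;
    unsigned int magic = ((unsigned int)buf[0] << 8) | buf[1];
    unsigned int version = buf[4];
    return magic == XFS_DINODE_MAGIC && (version == 1 || version == 2);
}
```

A zero-filled buffer fails the magic test, which matches the "(magic=0)" in the kernel message.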
Okay, so following the lead given by panic_mask, the testcase is reduced to:

# mount fresh filesystem
for i in `seq 1 13`; do touch $i; done
# now blow up
touch 14

Further info when I get kdb and xfsidbg running on ia64.

Peter
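Here is the reduced test in C, for driving it without a shell. Each open() with O_CREAT on a new name should take the same sys_open -> vfs_create -> xfs_create path seen in the backtraces. This is a minimal sketch. The directory argument is an assumption; point it at the freshly mounted filesystem:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create empty files named first..last inside dir, like `touch N` in a loop.
 * Returns the number of files created, or -1 on the first failure. */
int create_numbered_files(const char *dir, int first, int last)
{
    char path[4096];
    int created = 0;

    for (int i = first; i <= last; i++) {
        snprintf(path, sizeof path, "%s/%d", dir, i);
        int fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
            return -1;
        close(fd);
        created++;
    }
    return created;
}
```

On an affected 64K-page system, create_numbered_files("/mnt", 1, 14) would be expected to fail on the 14th file with the shutdown, as the touch loop does.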
Additional info. Using 64K pages, I can trigger the bug; however, any other supported page size (4K, 8K, 16K) works OK. This is with the default 4K block size (mkfs.xfs). BUT: if block size equals page size (-b size=64k), then 64K pages are OK as well.

Overview matrix (rows: page size, columns: filesystem block size):

+----------+--------+-----+-----+-----+-----+
|page/block|   4k   | 8k  | 16k | 32k | 64k |
+----------+--------+-----+-----+-----+-----+
|    4k    |   ok   |  -  |  -  |  -  |  -  |
+----------+--------+-----+-----+-----+-----+
|    8k    |   ok   |  -  |  -  |  -  |  -  |
+----------+--------+-----+-----+-----+-----+
|   16k    |   ok   |  -  |  -  |  -  |  -  |
+----------+--------+-----+-----+-----+-----+
|   64k    |shutdown|  -  |  -  |oops | ok  |
+----------+--------+-----+-----+-----+-----+

HTH,
Peter
Update:

+----------+--------+--------+----------+-----+-----+
|page/block|   4k   |   8k   |   16k    | 32k | 64k |
+----------+--------+--------+----------+-----+-----+
|    4k    |   ok   |   -    |    -     |  -  |  -  |
+----------+--------+--------+----------+-----+-----+
|    8k    |   ok   |   -    |    -     |  -  |  -  |
+----------+--------+--------+----------+-----+-----+
|   16k    |   ok   |   -    |    -     |  -  |  -  |
+----------+--------+--------+----------+-----+-----+
|   64k    |shutdown|shutdown|MCE+reboot|oops | ok  |
+----------+--------+--------+----------+-----+-----+

The oops is below:

Unable to handle kernel NULL pointer dereference (address 0000000000000037)
kswapd0[23]: Oops 11012296146944 [1]
Modules linked in: raid0 md 3w_9xxx button ixgb tg3 ipt_REJECT ipt_state ip_cons
Pid: 23, CPU 3, comm: kswapd0
psr : 0000121008026018 ifs : 8000000000000813 ip : [<a0000001001078d1>] Notd
ip is at shrink_list+0x1211/0x1280
unat: 0000000000000000 pfs : 0000000000000813 rsc : 0000000000000003
rnat: 0000000000000001 bsps: 0000000000001009 pr : 0000456656aa9965
ldrs: 0000000000000000 ccv : 0000000000008005 fpsr: 0009804c8a74433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001001069f0 b6 : a0000001000f6db0 b7 : a0000002002a0200
f6 : 1003e6db6db6db6db6db7 f7 : 0ffdca200000000000000
f8 : 1003e0000000000000000 f9 : 1003e0000000000000000
f10 : 1003e0000000010400000 f11 : 1003e000000003c893320
r1 : a000000100a0b8f0 r2 : 0000000000008005 r3 : 0000000000008001
r8 : 0000000000000001 r9 : a07ffffff224d314 r10 : 0000000000000000
r11 : 0000000000008001 r12 : e0000040fe88fc50 r13 : e0000040fe880000
r14 : 0000000000000005 r15 : ffffffffffffffff r16 : 0000000000000000
r17 : 0000000000000000 r18 : 0000000000008001 r19 : 0000000000000000
r20 : 0000000000008001 r21 : 0000000000000037 r22 : 000000000000004f
r23 : 0000000000000004 r24 : 0000000000000005 r25 : 0000000000000004
r26 : 0000000000000005 r27 : 0000000000000005 r28 : 0000000000000000
r29 : 0000000000000009 r30 : 0000000000000005 r31 : 0000000000000008

Call Trace:
 [<a000000100019ba0>] show_stack+0x80/0xa0 sp=e0000040fe88f820 bsp=e0000040fe881260
 [<a000000100040e00>] die+0x200/0x300 sp=e0000040fe88f9f0 bsp=e0000040fe881238
 [<a0000001000602a0>] ia64_do_page_fault+0x200/0x9c0 sp=e0000040fe88f9f0 bsp=e0000040fe8811d0
 [<a000000100011f80>] ia64_leave_kernel+0x0/0x270 sp=e0000040fe88fa80 bsp=e0000040fe8811d0
 [<a0000001001078d0>] shrink_list+0x1210/0x1280 sp=e0000040fe88fc50 bsp=e0000040fe881138
 [<a000000100107f30>] shrink_cache+0x5f0/0xde0 sp=e0000040fe88fcf0 bsp=e0000040fe881040
 [<a000000100109d80>] shrink_zone+0x1c0/0x220 sp=e0000040fe88fd90 bsp=e0000040fe881000
 [<a00000010010a890>] balance_pgdat+0x570/0x680 sp=e0000040fe88fd90 bsp=e0000040fe880f48
 [<a00000010010aba0>] kswapd+0x200/0x220 sp=e0000040fe88fdc0 bsp=e0000040fe880f18
 [<a00000010001baa0>] kernel_thread_helper+0xe0/0x100 sp=e0000040fe88fe30 bsp=e0000040fe880ef0
 [<a000000100009060>] start_kernel_thread+0x20/0x40 sp=e0000040fe88fe30 bsp=e0000040fe880ef0
note: kswapd0[23] exited with preempt_count 1

Unable to handle kernel NULL pointer dereference (address 000000000000002f)
pdflush[22]: Oops 11012296146944 [2]
Modules linked in: raid0 md 3w_9xxx button ixgb tg3 ipt_REJECT ipt_state ip_cons
Pid: 22, CPU 1, comm: pdflush
psr : 0000121008026018 ifs : 800000000000038a ip : [<a0000001000e8311>] Notd
ip is at __lock_page+0x211/0x2e0
unat: 0000000000000000 pfs : 000000000000038a rsc : 0000000000000003
rnat: e0000040fe87f3d8 bsps: e0000040fe870000 pr : 59595555a5666a59
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a74433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001000e8280 b6 : a0000001002acc60 b7 : a00000020027af40
f6 : 1003e0000000000000000 f7 : 0ffe7b2ec57be5a000000
f8 : 1003e0000000000000001 f9 : 1003e0000000000000001
f10 : 1003e0000000000001000 f11 : 1003e0000000000000000
r1 : a000000100a0b8f0 r2 : e0000040fe87f598 r3 : 000000000000002f
r8 : 0000000000000001 r9 : 0000000000000000 r10 : 0000000000000001
r11 : 0000000000000001 r12 : e0000040fe87f560 r13 : e0000040fe870000
r14 : 0000000000008001 r15 : ffffffffffffffff r16 : e0000040fe870ec0
r17 : e0000040fe870eb0 r18 : 0000000000000000 r19 : 0000000000000001
r20 : e000000001e31768 r21 : 0000000000004000 r22 : 0000000000004000
r23 : e0000040fe87f5b8 r24 : e0000040fe87f5b0 r25 : 0000000000000001
r26 : 0000000000008001 r27 : 0000001008026018 r28 : a000000100094cc0
r29 : 0000000000000001 r30 : 0000000000000001 r31 : 0000000000008001

Call Trace:
 [<a000000100019ba0>] show_stack+0x80/0xa0 sp=e0000040fe87f130 bsp=e0000040fe871cf0
 [<a000000100040e00>] die+0x200/0x300 sp=e0000040fe87f300 bsp=e0000040fe871cc8
 [<a0000001000602a0>] ia64_do_page_fault+0x200/0x9c0 sp=e0000040fe87f300 bsp=e0000040fe871c68
 [<a000000100011f80>] ia64_leave_kernel+0x0/0x270 sp=e0000040fe87f390 bsp=e0000040fe871c68
 [<a0000001000e8310>] __lock_page+0x210/0x2e0 sp=e0000040fe87f560 bsp=e0000040fe871c18
 [<a0000001000e88f0>] find_lock_page+0x210/0x380 sp=e0000040fe87f5e0 bsp=e0000040fe871bd8
 [<a0000001000e8a80>] find_or_create_page+0x20/0x160 sp=e0000040fe87f5e0 bsp=e0000040fe871ba0
 [<a0000002002a11c0>] _pagebuf_lookup_pages+0x220/0x6e0 [xfs] sp=e0000040fe87f5e0 bsp=e0000040fe871ac0
 [<a0000002002a23a0>] pagebuf_get+0x400/0x420 [xfs] sp=e0000040fe87f5f0 bsp=e0000040fe871a70
 [<a000000200284450>] xfs_trans_read_buf+0x310/0x780 [xfs] sp=e0000040fe87f5f0 bsp=e0000040fe871a10
 [<a0000002001dc040>] xfs_alloc_read_agf+0xc0/0x580 [xfs] sp=e0000040fe87f5f0 bsp=e0000040fe8719c8
 [<a0000002001db6d0>] xfs_alloc_fix_freelist+0x950/0x9e0 [xfs] sp=e0000040fe87f600 bsp=e0000040fe871950
 [<a0000002001dca80>] xfs_alloc_vextent+0x580/0x980 [xfs] sp=e0000040fe87f680 bsp=e0000040fe871868
 [<a0000002001fb5a0>] xfs_bmap_alloc+0x1040/0x1e40 [xfs] sp=e0000040fe87f680 bsp=e0000040fe871788
 [<a000000200201570>] xfs_bmapi+0x730/0x1a60 [xfs] sp=e0000040fe87f6e0 bsp=e0000040fe8715d8
 [<a00000020025d170>] xfs_iomap_write_allocate+0x3b0/0x760 [xfs] sp=e0000040fe87f7a0 bsp=e0000040fe871508
 [<a00000020025bb90>] xfs_iomap+0x5b0/0x800 [xfs] sp=e0000040fe87f830 bsp=e0000040fe871498
 [<a0000002002b1920>] xfs_bmap+0x40/0x60 [xfs] sp=e0000040fe87f870 bsp=e0000040fe871450
 [<a00000020029cdb0>] xfs_map_blocks+0x90/0x200 [xfs] sp=e0000040fe87f870 bsp=e0000040fe871418
 [<a00000020029f010>] xfs_page_state_convert+0x850/0xb80 [xfs] sp=e0000040fe87f880 bsp=e0000040fe871320
 [<a0000002002a0080>] linvfs_writepage+0xe0/0x260 [xfs] sp=e0000040fe87fcc0 bsp=e0000040fe8712e8
 [<a000000100196d00>] mpage_writepages+0x400/0x700 sp=e0000040fe87fcd0 bsp=e0000040fe871210
 [<a0000001000f6300>] do_writepages+0xe0/0x100 sp=e0000040fe87fd80 bsp=e0000040fe8711e0
 [<a000000100192980>] __sync_single_inode+0x100/0x5e0 sp=e0000040fe87fd80 bsp=e0000040fe871190
 [<a000000100193580>] sync_sb_inodes+0x4c0/0x860 sp=e0000040fe87fd80 bsp=e0000040fe8710e0
 [<a000000100193c20>] writeback_inodes+0x300/0x520 sp=e0000040fe87fd80 bsp=e0000040fe871090
 [<a0000001000f5900>] background_writeout+0x120/0x1c0 sp=e0000040fe87fd80 bsp=e0000040fe871040
 [<a0000001000f8300>] __pdflush+0x400/0x720 sp=e0000040fe87fdf0 bsp=e0000040fe870f58
 [<a0000001000f8660>] pdflush+0x40/0x60 sp=e0000040fe87fdf0 bsp=e0000040fe870f48
 [<a0000001000d3270>] kthread+0x170/0x180 sp=e0000040fe87fe20 bsp=e0000040fe870f18
 [<a00000010001baa0>] kernel_thread_helper+0xe0/0x100 sp=e0000040fe87fe30 bsp=e0000040fe870ef0
 [<a000000100009060>] start_kernel_thread+0x20/0x40 sp=e0000040fe87fe30 bsp=e0000040fe870ef0

Unable to handle kernel NULL pointer dereference (address 0000000000000000)
swapper[0]: Oops 11012296146944 [3]
Modules linked in: raid0 md 3w_9xxx button ixgb tg3 ipt_REJECT ipt_state ip_cons
Pid: 0, CPU 1, comm: swapper
psr : 0000121008022018 ifs : 800000000000050e ip : [<a00000010008ffc1>] Notd
ip is at __wake_up_common+0x81/0x120
unat: 0000000000000000 pfs : 000000000000038c rsc : 0000000000000003
rnat: 80000000000011a7 bsps: 0000000000000000 pr : 80000000ff959569
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000100090110 b6 : a000000100003320 b7 : a00000010013d680
f6 : 1003e0000000000000080 f7 : 1003e0000000000000000
f8 : 1000d8000000000000000 f9 : 1003e0000000000000080
f10 : 10005fffffffff0000000 f11 : 1003e0000000000004000
r1 : a000000100a0b8f0 r2 : e000000001e31768 r3 : e0000040fe87f598
r8 : 0000000000000000 r9 : e0000040fe87f580 r10 : e0000040fe87f590
r11 : 0000000000000034 r12 : e00000003f00fb90 r13 : e00000003f000000
r14 : ba4036bda6309308 r15 : e0000040fe87f598 r16 : e00000003f000ec0
r17 : 0000000000000102 r18 : 0000000000000101 r19 : 0000000000000000
r20 : a0000001007a9f38 r21 : a0000001007a9f48 r22 : 0000000000000000
r23 : a0000001007a9b80 r24 : 1000000000000000 r25 : a00000010085d258
r26 : 084036bda6309308 r27 : 0000001008026018 r28 : a0000001000900d0
r29 : a00000010085d258 r30 : 0000000000000000 r31 : e000000001e31760

Call Trace:
 [<a000000100019ba0>] show_stack+0x80/0xa0 sp=e00000003f00f760 bsp=e00000003f0014c8
 [<a000000100040e00>] die+0x200/0x300 sp=e00000003f00f930 bsp=e00000003f0014a0
 [<a0000001000602a0>] ia64_do_page_fault+0x200/0x9c0 sp=e00000003f00f930 bsp=e00000003f001440
 [<a000000100011f80>] ia64_leave_kernel+0x0/0x270 sp=e00000003f00f9c0 bsp=e00000003f001440
 [<a00000010008ffc0>] __wake_up_common+0x80/0x120 sp=e00000003f00fb90 bsp=e00000003f0013c8
 [<a000000100090110>] __wake_up+0xb0/0x160 sp=e00000003f00fb90 bsp=e00000003f001390
 [<a0000001000e7c20>] wake_up_page+0x60/0x80 sp=e00000003f00fb90 bsp=e00000003f001378
 [<a00000010013d900>] end_buffer_async_write+0x280/0x4c0 sp=e00000003f00fb90 bsp=e00000003f001350
 [<a000000100144cf0>] end_bio_bh_io_sync+0x90/0xc0 sp=e00000003f00fbb0 bsp=e00000003f001330
 [<a000000100148040>] bio_endio+0x100/0x160 sp=e00000003f00fbb0 bsp=e00000003f001300
 [<a00000010036e710>] __end_that_request_first+0x390/0x460 sp=e00000003f00fbb0 bsp=e00000003f001290
 [<a0000001003d0d30>] scsi_end_request+0x50/0x280 sp=e00000003f00fbb0 bsp=e00000003f001250
 [<a0000001003d1530>] scsi_io_completion+0x270/0x940 sp=e00000003f00fbb0 bsp=e00000003f0011d0
 [<a000000100408210>] sd_rw_intr+0x170/0x5a0 sp=e00000003f00fbb0 bsp=e00000003f001188
 [<a0000001003c46a0>] scsi_finish_command+0x320/0x340 sp=e00000003f00fbb0 bsp=e00000003f001160
 [<a0000001003c4220>] scsi_softirq+0x220/0x280 sp=e00000003f00fbb0 bsp=e00000003f001128
 [<a0000001000a7820>] __do_softirq+0x1e0/0x200 sp=e00000003f00fbc0 bsp=e00000003f0010b8
 [<a0000001000a78c0>] do_softirq+0x80/0xe0 sp=e00000003f00fbc0 bsp=e00000003f001060
 [<a000000100018bb0>] ia64_handle_irq+0x190/0x1c0 sp=e00000003f00fbc0 bsp=e00000003f001028
 [<a000000100011f80>] ia64_leave_kernel+0x0/0x270 sp=e00000003f00fbc0 bsp=e00000003f001028
 [<a000000100018f00>] ia64_pal_call_static+0xa0/0xc0 sp=e00000003f00fd90 bsp=e00000003f000fd0
 [<a00000010001a6a0>] default_idle+0xe0/0x1a0 sp=e00000003f00fd90 bsp=e00000003f000f88
 [<a00000010001a8f0>] cpu_idle+0x190/0x260 sp=e00000003f00fe30 bsp=e00000003f000f00
 [<a00000010069d500>] start_secondary+0x80/0xa0 sp=e00000003f00fe30 bsp=e00000003f000ef0
 [<a000000100008580>] _start+0x260/0x290 sp=e00000003f00fe30 bsp=e00000003f000ef0
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
Final matrix:

+----------+-----+-----+-----+--------+--------+----------+-----+-----+
|page/block|512b | 1k  | 2k  |   4k   |   8k   |   16k    | 32k | 64k |
+----------+-----+-----+-----+--------+--------+----------+-----+-----+
|    4k    | ok  | ok  | ok  |   ok   |  n/a   |   n/a    | n/a | n/a |
+----------+-----+-----+-----+--------+--------+----------+-----+-----+
|    8k    | ok  | ok  | ok  |   ok   |   ok   |   n/a    | n/a | n/a |
+----------+-----+-----+-----+--------+--------+----------+-----+-----+
|   16k    | ok  | ok  | ok  |   ok   |   ok   |    ok    | n/a | n/a |
+----------+-----+-----+-----+--------+--------+----------+-----+-----+
|   64k    | n/a | n/a | n/a |shutdown|shutdown|MCE+reboot|oops | ok  |
+----------+-----+-----+-----+--------+--------+----------+-----+-----+

As Eric summed up: something about block size < page, + 64k pages, is foo.
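One way to see why only block size < page size with 64K pages stands out: with 4K pages, every 4K block starts on a page boundary, but with 64K pages, up to 16 blocks share a page and most buffers start part-way into it. This sketch uses daddr 2064 from the "bad inode magic/vsn" message. It assumes the buffer cache indexes device pages by byte offset (daddr x 512), which is an assumption made here for illustration:

```c
#include <stdint.h>

#define BBSIZE 512ULL  /* XFS basic block: daddr values count 512-byte units */

/* Index of the page (of page_size bytes) that holds daddr,
 * assuming pages are indexed by device byte offset. */
uint64_t daddr_page(uint64_t daddr, uint64_t page_size)
{
    return daddr * BBSIZE / page_size;
}

/* Byte offset of daddr within that page; for block-aligned daddrs this is
 * nonzero only when the block size is smaller than the page size. */
uint64_t daddr_page_offset(uint64_t daddr, uint64_t page_size)
{
    return daddr * BBSIZE % page_size;
}

/* How many filesystem blocks share one page (1 if block >= page). */
uint64_t blocks_per_page(uint64_t page_size, uint64_t block_size)
{
    return page_size > block_size ? page_size / block_size : 1;
}
```

For daddr 2064 this gives page 258 at offset 0 with 4K pages, but page 16 at offset 8192 with 64K pages. The inode buffer therefore depends on correct sub-page offset handling only in the 64K case.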
Created attachment 134: Increase size of the page offset field in an XFS buffer

Hi there,

Could you try this patch and see if it helps at all with a 64K page size? It's a bit of a guess, so odds are this won't be the problem, but ya never know...

thanks.
The modified kernel withstood the testcase, but barfed on the 30th touch(1). 2.6.8-rc4 vanilla, same backtrace. Peter
The offset problem has been fixed in the meantime. Since the other failures not specified here are either fixed or, hopefully, covered by a separate and more detailed bug, I'm going to close this one.
Umm, really closing now.