

To: xfs-masters@xxxxxxxxxxx
Subject: [Bug 411] Reproducible memory corruption, oops, panic with xfs on sata raid5 in 2.6
From: bugzilla-daemon@xxxxxxxxxxx
Date: Tue, 10 Feb 2009 11:11:20 -0600
Auto-submitted: auto-generated
In-reply-to: <bug-411-113@xxxxxxxxxxxxxxxx/bugzilla/>
References: <bug-411-113@xxxxxxxxxxxxxxxx/bugzilla/>

--- Comment #6 from Andras Korn <korn-sgi.com@xxxxxxxxxxxxxxxxxxxxxx> 2009-02-10 11:11:19 CST ---
Well, it appears I'm hitting something similar, albeit perhaps not the same bug.

This is a completely different computer (amd64 with
linux- (linux-vserver patch)). I have 4 SATA disks in an md software RAID5,
with LUKS on top of md and LVM on top of LUKS. I was slowly copying data over
from another computer with rsync when the box suddenly froze solid, without
even sending an oops over netconsole. On the physical console I could see only
"eth0: link down."

I rebooted, but when I attempted to mount one of the XFS filesystems, I
received the following (captured via netconsole):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff803cf63a>] xlog_recover_add_to_trans+0x8a/0x160
PGD 11d191067 PUD 11c91e067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/block/dm-12/removable
CPU 0
Modules linked in: aes_x86_64 aes_generic dummy it87_wdt it87 hwmon_vid sg sr_mod cdrom amd74xx ide_core ata_generic pata_acpi ohci_hcd k8temp hwmon ehci_hcd i2c_nforce2 pata_amd usbcore fan button
Pid: 4197, comm: mount Not tainted  #4
RIP: 0010:[<ffffffff803cf63a>]  [<ffffffff803cf63a>] xlog_recover_add_to_trans+0x8a/0x160
RSP: 0018:ffff88011818b9f8  EFLAGS: 00010256
RAX: 0000000000000000 RBX: ffff88011ac01200 RCX: ffff880117db5000
RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000010
RBP: ffff88011818ba28 R08: 0000000000000000 R09: 0000000000000010
R10: 000000000000007f R11: 0000000000000000 R12: 0000000000000000
R13: ffff88011ac01c80 R14: ffff88011ac019c0 R15: 0000000000001000
FS:  00007f1d512af7c0(0000) GS:ffffffff80880040(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 000000011c897000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mount (pid: 4197, threadinfo ffff88011818a000, task ffff8801180696d0)
Stack:  ffff880117db5000 000000001c5cb0d8 ffff88011818bb58 ffffc20008f52b70
 ffffc20008f52b7c 000000000000002b ffff88011818ba88 ffffffff803d0a74
 000000011818ba58 ffff88011788b000 ffff88011818bb28 ffff88011803d600
Call Trace:
 [<ffffffff803d0a74>] xlog_recover_process_data+0x1c4/0x250
 [<ffffffff803d100a>] xlog_do_recovery_pass+0x1ca/0x890
 [<ffffffff803e564e>] ? xfs_buf_free+0x5e/0x90
 [<ffffffff803d171a>] xlog_do_log_recovery+0x4a/0x90
 [<ffffffff803d177c>] xlog_do_recover+0x1c/0xf0
 [<ffffffff803d2f2c>] xlog_recover+0x7c/0x90
 [<ffffffff803cbbdc>] xfs_log_mount+0xbc/0x1a0
 [<ffffffff803d5c97>] xfs_mountfs+0x347/0x680
 [<ffffffff803e1e9e>] ? kmem_zalloc+0x2e/0x40
 [<ffffffff803d6869>] ? xfs_mru_cache_create+0x139/0x170
 [<ffffffff803ee8d2>] xfs_fs_fill_super+0x282/0x450
 [<ffffffff802d2df2>] get_sb_bdev+0x162/0x190
 [<ffffffff803ee650>] ? xfs_fs_fill_super+0x0/0x450
 [<ffffffff802aa116>] ? kstrdup+0x56/0x70
 [<ffffffff803ebf03>] xfs_fs_get_sb+0x13/0x20
 [<ffffffff802d23b1>] vfs_kern_mount+0x81/0x250
 [<ffffffff802d25ee>] do_kern_mount+0x4e/0x110
 [<ffffffff802eb220>] do_mount+0x230/0x9c0
 [<ffffffff802eba68>] sys_mount+0xb8/0xf0
 [<ffffffff8020bb1b>] system_call_fastpath+0x16/0x1b
Code: 42 18 85 c0 75 5b 49 8b 46 28 48 8b 58 08 44 8b 63 18 45 85 e4 0f 84 86 00 00 00 48 63 43 14 48 8b 53 20 48 8b 4d d0 48 c1 e0 04 89 0c 02 48 63 43 14 48 c1 e0 04 44 89 7c 02 08
RIP  [<ffffffff803cf63a>] xlog_recover_add_to_trans+0x8a/0x160
 RSP <ffff88011818b9f8>
CR2: 0000000000000010
Kernel panic - not syncing: Fatal exception

This now appears to be reproducible: it happened again when I tried to mount
that filesystem a second time. I haven't run xfs_repair on it yet.

The option for 4K stacks doesn't even seem to be present in this kernel.

Is there a way to obtain a metadata dump of the filesystem that I could give
you? The entire fs is 3 GB and contains some semi-sensitive data, so I wouldn't
be entirely comfortable submitting it in full.

Configure bugmail: http://oss.sgi.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
