To: xfs-masters@xxxxxxxxxxx
Subject: [Bug 411] Reproducible memory corruption, oops, panic with xfs on sata raid5 in 2.6
From: bugzilla-daemon@xxxxxxxxxxx
Date: Tue, 10 Feb 2009 11:11:20 -0600
Auto-submitted: auto-generated
In-reply-to: <bug-411-113@xxxxxxxxxxxxxxxx/bugzilla/>
References: <bug-411-113@xxxxxxxxxxxxxxxx/bugzilla/>
http://oss.sgi.com/bugzilla/show_bug.cgi?id=411

--- Comment #6 from Andras Korn <korn-sgi.com@xxxxxxxxxxxxxxxxxxxxxx> 2009-02-10 11:11:19 CST ---
Well, it would appear I'm hitting something similar, albeit maybe not the same.

This is now a completely different computer (amd64 running
linux-2.6.28.3+vs2.3.0.36.5, i.e. with the linux-vserver patch). It has 4 SATA
disks in software RAID5, LUKS on top of md, and LVM on top of LUKS. I was
copying data over from a different computer using rsync, rather slowly, when
suddenly the box froze solid, without even sending an oops over netconsole. On
the physical console I could only see "eth0: link down."

I rebooted, but when I attempted to mount one of the xfs filesystems, I
received the following (captured via netconsole):

192.168.0.99: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
192.168.0.99: IP: [<ffffffff803cf63a>] xlog_recover_add_to_trans+0x8a/0x160
192.168.0.99: PGD 11d191067 PUD 11c91e067 PMD 0
192.168.0.99: Oops: 0002 [#1] SMP
192.168.0.99: last sysfs file: /sys/block/dm-12/removable
192.168.0.99: CPU 0
192.168.0.99: Modules linked in: aes_x86_64 aes_generic dummy it87_wdt it87 hwmon_vid sg sr_mod cdrom amd74xx ide_core ata_generic pata_acpi ohci_hcd k8temp hwmon ehci_hcd i2c_nforce2 pata_amd usbcore fan button
192.168.0.99: Pid: 4197, comm: mount Not tainted 2.6.28.3-vs2.3.0.36.5-hellgate #4
192.168.0.99: RIP: 0010:[<ffffffff803cf63a>]  [<ffffffff803cf63a>] xlog_recover_add_to_trans+0x8a/0x160
192.168.0.99: RSP: 0018:ffff88011818b9f8  EFLAGS: 00010256
192.168.0.99: RAX: 0000000000000000 RBX: ffff88011ac01200 RCX: ffff880117db5000
192.168.0.99: RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000010
192.168.0.99: RBP: ffff88011818ba28 R08: 0000000000000000 R09: 0000000000000010
192.168.0.99: R10: 000000000000007f R11: 0000000000000000 R12: 0000000000000000
192.168.0.99: R13: ffff88011ac01c80 R14: ffff88011ac019c0 R15: 0000000000001000
192.168.0.99: FS:  00007f1d512af7c0(0000) GS:ffffffff80880040(0000) knlGS:0000000000000000
192.168.0.99: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
192.168.0.99: CR2: 0000000000000010 CR3: 000000011c897000 CR4: 00000000000006e0
192.168.0.99: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
192.168.0.99: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
192.168.0.99: Process mount (pid: 4197, threadinfo ffff88011818a000, task ffff8801180696d0)
192.168.0.99: Stack:
192.168.0.99:  ffff880117db5000 000000001c5cb0d8 ffff88011818bb58 ffffc20008f52b70
192.168.0.99:  ffffc20008f52b7c 000000000000002b ffff88011818ba88 ffffffff803d0a74
192.168.0.99:  000000011818ba58 ffff88011788b000 ffff88011818bb28 ffff88011803d600
192.168.0.99: Call Trace:
192.168.0.99:  [<ffffffff803d0a74>] xlog_recover_process_data+0x1c4/0x250
192.168.0.99:  [<ffffffff803d100a>] xlog_do_recovery_pass+0x1ca/0x890
192.168.0.99:  [<ffffffff803e564e>] ? xfs_buf_free+0x5e/0x90
192.168.0.99:  [<ffffffff803d171a>] xlog_do_log_recovery+0x4a/0x90
192.168.0.99:  [<ffffffff803d177c>] xlog_do_recover+0x1c/0xf0
192.168.0.99:  [<ffffffff803d2f2c>] xlog_recover+0x7c/0x90
192.168.0.99:  [<ffffffff803cbbdc>] xfs_log_mount+0xbc/0x1a0
192.168.0.99:  [<ffffffff803d5c97>] xfs_mountfs+0x347/0x680
192.168.0.99:  [<ffffffff803e1e9e>] ? kmem_zalloc+0x2e/0x40
192.168.0.99:  [<ffffffff803d6869>] ? xfs_mru_cache_create+0x139/0x170
192.168.0.99:  [<ffffffff803ee8d2>] xfs_fs_fill_super+0x282/0x450
192.168.0.99:  [<ffffffff802d2df2>] get_sb_bdev+0x162/0x190
192.168.0.99:  [<ffffffff803ee650>] ? xfs_fs_fill_super+0x0/0x450
192.168.0.99:  [<ffffffff802aa116>] ? kstrdup+0x56/0x70
192.168.0.99:  [<ffffffff803ebf03>] xfs_fs_get_sb+0x13/0x20
192.168.0.99:  [<ffffffff802d23b1>] vfs_kern_mount+0x81/0x250
192.168.0.99:  [<ffffffff802d25ee>] do_kern_mount+0x4e/0x110
192.168.0.99:  [<ffffffff802eb220>] do_mount+0x230/0x9c0
192.168.0.99:  [<ffffffff802eba68>] sys_mount+0xb8/0xf0
192.168.0.99:  [<ffffffff8020bb1b>] system_call_fastpath+0x16/0x1b
192.168.0.99: Code: 42 18 85 c0 75 5b 49 8b 46 28 48 8b 58 08 44 8b 63 18 45 85 e4 0f 84 86 00 00 00 48 63 43 14 48 8b 53 20 48 8b 4d d0 48 c1 e0 04 lpr.emerg: 89 0c 02 48 63 43 14 48 c1 e0 04 44 89 7c 02 08
192.168.0.99: RIP  [<ffffffff803cf63a>] xlog_recover_add_to_trans+0x8a/0x160
192.168.0.99:  RSP <ffff88011818b9f8>
192.168.0.99: CR2: 0000000000000010
192.168.0.99: Kernel panic - not syncing: Fatal exception
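
For anyone reading the trace: the value after "Oops:" is the x86 page-fault
error code. Below is a minimal Python sketch of how its low bits decode. The
helper name decode_pf_error is made up for illustration and is not part of any
kernel tooling; only the standard x86 bit meanings are assumed.

```python
# Decode the x86 page-fault error code printed after "Oops:" in the trace.
# decode_pf_error is a hypothetical helper name, used here for illustration.

# (mask, meaning when the bit is set, meaning when it is clear or None)
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code_hex):
    """Return a list of human-readable flags for a hex error code string."""
    code = int(code_hex, 16)
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


print(decode_pf_error("0002"))
# → ['page not present', 'write access', 'kernel mode']
```

For the trace above this means a kernel-mode write to a page that is not
present, which fits CR2 being 0000000000000010 (a small offset from a NULL
pointer).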

This now appears to be reproducible; at least, it happened again when I tried
to mount that xfs a second time. I haven't run xfs_repair on it yet.

The option for 4k stacks doesn't even seem to be present in this kernel
anymore.

Is there a way to obtain a metadata dump of the filesystem I could give you?
The entire fs is 3G and contains some semi-sensitive data, so I wouldn't be
entirely comfortable submitting it.

