xfs
[Top] [All Lists]

Re: XFS kernel BUG at fs/buffer.c:470! with 2.6.28.4

To: Jan Kara <jack@xxxxxxx>
Subject: Re: XFS kernel BUG at fs/buffer.c:470! with 2.6.28.4
From: Alessandro Bono <alessandro.bono@xxxxxxxxx>
Date: Mon, 02 Mar 2009 16:04:59 +0100
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, linux-xfs <linux-xfs@xxxxxxxxxxx>, linux-kernel <linux-kernel@xxxxxxxxxxxxxxx>
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:subject:from:to:cc :in-reply-to:references:content-type:date:message-id:mime-version :x-mailer:content-transfer-encoding; bh=V+t3Ijo03HF3ifQfuG1iJmbIVLRsHSBAUJke45D9tbI=; b=xQmtIXJ3MmtVi3GNpQUwQ/MblPS/NPgDJ7AFiB6b28XMILDj5QvPjGDeULL6Yti65B HLU9bK/Z17OGLb+ie2KnTrprlhZvQqCcO8PCO02kM0e0AiSfoK5jvCvMEcBy7catYlgy oQjSQZTUdYzrfP1Uq4LDoadUDj1DGrIkUVWO8=
Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=subject:from:to:cc:in-reply-to:references:content-type:date :message-id:mime-version:x-mailer:content-transfer-encoding; b=Y03Tdqu4MQFT5yt3L+fMA1UrdCaZN6M6uDVRpWY0OScGOPQzlIkc4u0W7ORDImW19a yIUWibROo0zx/12EQcr6YnqnSD2qWZf+5uHHNUt1ooWTomao5cibiUATDyehaBvTCb9U bxyFAARKXYq+4ryJrWDXHdQ41QllGNLqNIAtA=
In-reply-to: <20090302133609.GD23880@xxxxxxxxxxxx>
References: <1234011974.7435.11.camel@xxxxxxxxxxxxxxxxx> <20090208222859.GA2532@xxxxxxxxxxxxx> <1234132752.12370.0.camel@xxxxxxxxxxxxxxxxx> <20090208224249.GA11931@xxxxxxxxxxxxx> <1234133120.12370.7.camel@xxxxxxxxxxxxxxxxx> <20090209075308.GA7360@xxxxxxxxxxxxx> <20090210104304.GP8830@disturbed> <1234432077.9204.15.camel@xxxxxxxxxxxxxxxxx> <20090226165838.GA9602@xxxxxxxxxxxxxxxxxxxxxxxx> <1235763895.5743.3.camel@champagne> <20090302133609.GD23880@xxxxxxxxxxxx>
On Mon, 2009-03-02 at 14:36 +0100, Jan Kara wrote:
> On Fri 27-02-09 20:44:55, Alessandro Bono wrote:
> > On Thu, 2009-02-26 at 17:58 +0100, Jan Kara wrote:
> > ....
> > > > any suggestion?
> > >   Hmm, are you still able to reproduce the problem? As I'm looking into
> > > registers in your dump, no register really seems to contain sensible page
> > > flags so it could be some corruption of page pointer. If you are still
> > > able to reproduce, could you please do so with the attached patch
> > > applied? It will dump us much more information... Thanks.
> > > 
> > >                                                           Honza
> > > 
> > 
> > this time with plain xfs, no lvm no dm_crypt
> > just to add some info
> > xfs fs created with this command
> > 
> > mkfs.xfs -f -l lazy-count=1,version=2,size=128m -i attr=2 -d agcount=4
> > -L store /dev/sdb1
>   OK, thanks for the data. I've looked at both your traces and I think the
> most likely cause is a bit-flip in memory. Look: The page address (we got
> from bh->b_page) is ffffe20000885604. IMHO it should be ffffe20000885600
> because if you look e.g. at mapping, it is 000ac69affff8800, which is a
> bogus value but it would look like a valid pointer if we shift it by 32
> bits to the left. The same with private pointer. Index also looks absurdly
> high but low 32-bits are actually 0 and previous 4 bytes in the page
> structure (we have them shown in the upper 4 bytes of mapping pointer)
> contain 0x000ac69a => so given together page index would be 706202 -
> perfectly sensible value.
>   And in the second report it is the same. Also buffer head looks perfectly
> valid in both cases.
>   Such bit flips are usually caused by faulty memory or other HW (io
> controler etc.) so I suggest trying to shuffle the hardware somehow -
> change memory DIMMs as a starter, running memtest if you don't have a spare
> DIMMs but it is not an exception that even though memtest runs just fine
> for a long time, memory is really at fault.

Ok, I'll change DIMMs, retest and report back

Thanks a lot for your support



> 
>                                                               Honza
> 
> > Feb 27 19:20:28 champagne kernel: [ 2664.993272] Buffer ffff8800371e7e70 of 
> > page ffffe20000885604 not private! Some data to debug:
> > Feb 27 19:20:28 champagne kernel: [ 2664.993282] flags: 240000000, mapping: 
> > 000ac69affff8800, index: 54276223274057728, private: 2489e50ffff8800
> > Feb 27 19:20:28 champagne kernel: [ 2664.993288] Buffer: state=125, 
> > block=23795642, b_size=4096, b_this_page=ffff8800371e7e70
> > Feb 27 19:20:28 champagne kernel: [ 2664.993293] Other buffers in the page:
> > Feb 27 19:20:28 champagne kernel: [ 2664.993355] ------------[ cut here 
> > ]------------
> > Feb 27 19:20:28 champagne kernel: [ 2664.993361] Kernel BUG at 
> > ffffffff802b3653 [verbose debug info unavailable]
> > Feb 27 19:20:28 champagne kernel: [ 2664.993366] invalid opcode: 0000 [#1] 
> > SMP
> > Feb 27 19:20:28 champagne kernel: [ 2664.993373] last sysfs file: 
> > /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
> > Feb 27 19:20:28 champagne kernel: [ 2664.993378] CPU 1
> > Feb 27 19:20:28 champagne kernel: [ 2664.993382] Modules linked in: xfs 
> > nls_iso8859_1 nls_cp437 vfat fat nls_base usb_storage libusual af_packet 
> > binfmt_misc rfcomm bridge stp llc bnep sco l2cap kvm_intel kvm ppdev ipv6 
> > acpi_cpufreq cpufreq_powersave cpufreq_stats cpufreq_userspace 
> > cpufreq_ondemand freq_table cpufreq_conservative pci_slot sbs sbshc 
> > iptable_filter ip_tables x_tables dm_crypt sbp2 lp snd_hda_intel snd_hwdep 
> > snd_pcm_oss arc4 snd_pcm snd_page_alloc ecb snd_mixer_oss snd_seq_dummy 
> > snd_seq_oss iwlagn iwlcore usbhid snd_seq_midi rfkill btusb snd_rawmidi 
> > snd_seq_midi_event snd_seq hid parport_pc snd_timer mac80211 pcmcia 
> > bluetooth parport snd_seq_device lis3lv02d leds_hp_disk sdhci_pci sdhci snd 
> > video output ricoh_mmc pcspkr tpm_infineon tpm tpm_bios cfg80211 mmc_core 
> > led_class yenta_socket rsrc_nonstatic pcmcia_core psmouse container wmi 
> > battery ac button soundcore serio_raw iTCO_wdt iTCO_vendor_support evdev 
> > dm_multipath ext3 jbd mbcache sr_mod cdrom sg sd_mod crc_t10dif ohci1394 
> > ata_piix ahci ieee1394 l
> > Feb 27 19:20:28 champagne kernel: bata scsi_mod ehci_hcd uhci_hcd usbcore 
> > e1000e dm_mirror dm_region_hash dm_log dm_snapshot dm_mod thermal processor 
> > fan thermal_sys hwmon fuse
> > Feb 27 19:20:28 champagne kernel: [ 2664.993553] Pid: 8367, comm: 
> > xfsdatad/1 Not tainted 2.6.28.7 #1
> > Feb 27 19:20:28 champagne kernel: [ 2664.993558] RIP: 
> > 0010:[<ffffffff802b3653>]  [<ffffffff802b3653>] 
> > end_buffer_async_write+0x119/0x18d
> > Feb 27 19:20:28 champagne kernel: [ 2664.993573] RSP: 0018:ffff880084555e40 
> >  EFLAGS: 00010246
> > Feb 27 19:20:28 champagne kernel: [ 2664.993578] RAX: 0000000240000000 RBX: 
> > ffff8800371e7e70 RCX: ffffffff80554d00
> > Feb 27 19:20:28 champagne kernel: [ 2664.993584] RDX: ffff8800a7ae3000 RSI: 
> > 0000000000000046 RDI: ffffffff80572f50
> > Feb 27 19:20:28 champagne kernel: [ 2664.993589] RBP: ffff8800371e7e70 R08: 
> > 0000000000000000 R09: 0000000000000000
> > Feb 27 19:20:28 champagne kernel: [ 2664.993594] R10: 000000000000000a R11: 
> > 0000000000018600 R12: ffff8801159fd288
> > Feb 27 19:20:28 champagne kernel: [ 2664.993599] R13: ffffe20000885604 R14: 
> > ffff88013b85ff00 R15: 0000000000000001
> > Feb 27 19:20:28 champagne kernel: [ 2664.993605] FS:  
> > 0000000000000000(0000) GS:ffff88013b803a00(0000) knlGS:0000000000000000
> > Feb 27 19:20:28 champagne kernel: [ 2664.993611] CS:  0010 DS: 0018 ES: 
> > 0018 CR0: 000000008005003b
> > Feb 27 19:20:28 champagne kernel: [ 2664.993616] CR2: 00007f9deefb8ca8 CR3: 
> > 0000000117829000 CR4: 00000000000026e0
> > Feb 27 19:20:28 champagne kernel: [ 2664.993621] DR0: 0000000000000000 DR1: 
> > 0000000000000000 DR2: 0000000000000000
> > Feb 27 19:20:28 champagne kernel: [ 2664.993626] DR3: 0000000000000000 DR6: 
> > 00000000ffff0ff0 DR7: 0000000000000400
> > Feb 27 19:20:28 champagne kernel: [ 2664.993632] Process xfsdatad/1 (pid: 
> > 8367, threadinfo ffff880084554000, task ffff88008c1222b0)
> > Feb 27 19:20:28 champagne kernel: [ 2664.993636] Stack:
> > Feb 27 19:20:28 champagne kernel: [ 2664.993640]  0000000000000282 
> > 0000000000000004 ffff88000248b6c0 ffff8801159fd288
> > Feb 27 19:20:28 champagne kernel: [ 2664.993648]  ffff88013b85fee0 
> > 0000000000000000 ffff88010dc356c0 ffff8801159fd288
> > Feb 27 19:20:28 champagne kernel: [ 2664.993657]  ffff88013b85fee0 
> > ffffffffa055a9f9 ffffffffa055ab6b ffff8801159fd280
> > Feb 27 19:20:28 champagne kernel: [ 2664.993667] Call Trace:
> > Feb 27 19:20:28 champagne kernel: [ 2664.993671]  [<ffffffffa055a9f9>] ? 
> > xfs_destroy_ioend+0x23/0x71 [xfs]
> > Feb 27 19:20:28 champagne kernel: [ 2664.993723]  [<ffffffffa055ab6b>] ? 
> > xfs_end_bio_delalloc+0x0/0x19 [xfs]
> > Feb 27 19:20:28 champagne kernel: [ 2664.993770]  [<ffffffffa055ab6b>] ? 
> > xfs_end_bio_delalloc+0x0/0x19 [xfs]
> > Feb 27 19:20:28 champagne kernel: [ 2664.993815]  [<ffffffff8023fdc2>] ? 
> > run_workqueue+0x79/0xfe
> > Feb 27 19:20:28 champagne kernel: [ 2664.993826]  [<ffffffff8023ff1f>] ? 
> > worker_thread+0xd8/0xe7
> > Feb 27 19:20:28 champagne kernel: [ 2664.993834]  [<ffffffff80243254>] ? 
> > autoremove_wake_function+0x0/0x2e
> > Feb 27 19:20:28 champagne kernel: [ 2664.993842]  [<ffffffff8023fe47>] ? 
> > worker_thread+0x0/0xe7
> > Feb 27 19:20:28 champagne kernel: [ 2664.993849]  [<ffffffff80242f42>] ? 
> > kthread+0x47/0x73
> > Feb 27 19:20:28 champagne kernel: [ 2664.993856]  [<ffffffff8022ec78>] ? 
> > schedule_tail+0x27/0x5f
> > Feb 27 19:20:28 champagne kernel: [ 2664.993864]  [<ffffffff8020c199>] ? 
> > child_rip+0xa/0x11
> > Feb 27 19:20:28 champagne kernel: [ 2664.993872]  [<ffffffff80242efb>] ? 
> > kthread+0x0/0x73
> > Feb 27 19:20:28 champagne kernel: [ 2664.993879]  [<ffffffff8020c18f>] ? 
> > child_rip+0x0/0x11
> > Feb 27 19:20:28 champagne kernel: [ 2664.993885] Code: 10 4c 8b 43 20 48 8b 
> > 13 48 89 de 48 c7 c7 f8 ef 46 80 31 c0 e8 b2 db 13 00 48 8b 5b 08 48 39 eb 
> > 75 d7 49 8b 45 00 f6 c4 08 75 04 <0f> 0b eb fe 49 8b 5d 10 9c 41 5c fa eb 
> > 07 f3 90 f603 10 75 f9
> > Feb 27 19:20:28 champagne kernel: [ 2664.993950] RIP  [<ffffffff802b3653>] 
> > end_buffer_async_write+0x119/0x18d
> > Feb 27 19:20:28 champagne kernel: [ 2664.993959]  RSP <ffff880084555e40>
> > Feb 27 19:20:28 champagne kernel: [ 2664.993985] ---[ end trace 
> > 2d34f97811caf921 ]---
> > 
> 
-- 
---
Cordiali Saluti
Alessandro Bono

<Prev in Thread] Current Thread [Next in Thread>