Bug 850 - XFS file system segfaults, repeatedly and 100% reproducible in 2.6.30, 2.6.31
Status: RESOLVED FIXED
Product: XFS
Component: XFS kernel code
Version: Current
Platform: PC Linux
Importance: P5 major
Target Milestone: ---
Reported: 2009-09-19 19:07 CST by
Modified: 2010-01-27 09:57 CST


Attachments
dmesg output (15.00 KB, text/plain)
2009-09-19 19:07 CST, Tobias Gerschner
dmesg from the same filesystem now using 2.6.27 (15.08 KB, application/octet-stream)
2009-09-21 21:46 CST, Tobias Gerschner
metadump image to demonstrate (344.51 KB, application/octet-stream)
2009-09-22 22:38 CST, Eric Sandeen


Description From 2009-09-19 19:07:35 CST
Created an attachment (id=281)
dmesg output

Assertion failed: *nmap >= 1, file: fs/xfs/xfs_bmap.c, line: 4846

This behavior is confirmed on kernel 2.6.30 and now on 2.6.31. I can reproduce
the behavior every single time. It actually happens during the installation
procedure of our Linux distribution, making XFS unusable for us at the moment.
------- Comment #1 From 2009-09-20 20:31:44 CST -------
(In reply to comment #0)

And which distribution would that be?  Where can we get an installation image
to test?

Thanks,
-Eric
------- Comment #2 From 2009-09-20 22:38:03 CST -------
Same assert down the same path (xfs_readdir -> xfs_dir2_leaf_getdents ->
xfs_bmapi) reported here: http://oss.sgi.com/archives/xfs/2009-06/msg00002.html
------- Comment #3 From 2009-09-20 22:48:35 CST -------
The distribution is Yoper Linux. XFS has been our file system of choice for
years.

A test image can be downloaded here:
http://development.yoper.com/pub/buildresults/iso, e.g.
http://development.yoper.com/pub/buildresults/iso/latest-kde4.iso. In fact,
any ISO will do. Please note that these images are development versions and
may need some manual adjustments. I plan to provide a stripped-down version
within the next 24 hours, which should let you focus better on the XFS
issue.

To reproduce, choose xfs as the file system in the installer. As soon as
mkfontscale is called, either during the first run or when new packages are
installed, the segfault happens. If you choose a bigger CD that has a desktop
environment preinstalled, the error may occur while files are being copied to
the new root file system.
------- Comment #4 From 2009-09-20 23:05:54 CST -------
If it's always mkfontscale that blows it up, then if you can modify things to
unmount the fs and run xfs_repair prior to mkfontscale, that could be
interesting.

If it checks clean and it's mkfontscale's activity that -causes- the corruption
then perhaps we can get a snapshot of the filesystem just before that happens.

Thanks,
-Eric
------- Comment #5 From 2009-09-21 00:42:11 CST -------
I am currently uploading XFS images to:
http://development.yoper.com/pub/buildresults/iso/xfs-crash/
The upload may take a while (2-4 hours).

The content should be self-explanatory: md5sum contains checksums, and the
prepared image is an XFS image that can be mounted to trigger the segfault.

To reproduce

1) Take a backup copy of the downloaded image (xfs-prepared).
2) Mount the image.
3) Chroot into the mounted image.
4) Mount at least proc.
5) Run 'smart install xorg-fonts'. This will trigger the segfault.

I am also uploading a copy of the image after it has crashed.

Please let me know if you need any further information.
------- Comment #6 From 2009-09-21 14:34:47 CST -------
It sure took its time, but the upload of both images is finished.
------- Comment #7 From 2009-09-21 15:09:06 CST -------
The plain install worked ok for me:

root@inode / # uname -a
Linux inode.lab.msp.redhat.com 2.6.31-2.fc12.x86_64 #1 SMP Thu Sep 10 00:25:40
EDT 2009 x86_64 GNU/Linux
root@inode / # dmesg -n 8
root@inode / # smart install xorg-fonts

...

  12:Installing xorg-fonts                                
#################################################################################
[100%]

Saving cache...

root@inode / # dmesg | tail
sd 6:0:0:0: [sdc] Write Protect is off
sd 6:0:0:0: [sdc] Mode Sense: 73 00 10 08
sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and
FUA
 sdc: unknown partition table
sd 6:0:0:0: [sdc] Attached SCSI disk
XFS mounting filesystem loop1
Ending clean XFS mount for filesystem: loop1
SELinux: initialized (dev loop1, type xfs), uses xattr
chkconfig used greatest stack depth: 2472 bytes left
smart used greatest stack depth: 2232 bytes left

... but upon image unmount, all hell broke loose.

Looks like lockdep messages but can't actually see it go by.  Will try another
kernel.
------- Comment #8 From 2009-09-21 15:59:40 CST -------
hm on a .29 based kernel w/ less debug turned on I don't see any problems at
all ...
------- Comment #9 From 2009-09-21 16:45:33 CST -------
I will test this here as well with a 2.6.29 kernel, but that's my experience
too: this issue started with 2.6.30. Will report back within the hour.
------- Comment #10 From 2009-09-21 17:11:01 CST -------
If it really regressed between .29 and .30, then a bisect should be possible
(could bisect only fs/xfs ...)

It'd be tedious but maybe the shortest path in the long run.  :)
------- Comment #11 From 2009-09-21 17:18:26 CST -------
There are not many XFS-related commits on the path to 2.6.30. After a quick
look through http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.30 I'd
be tempted to blame:

commit 4157fd85fc794bb7896b65c0cf686aa89d711d57
Merge: e7c4f03 1b17d76
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Jun 2 09:47:21 2009 -0700

    Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

    * 'for-linus' of git://oss.sgi.com/xfs/xfs:
      xfs: prevent deadlock in xfs_qm_shake()
      xfs: fix overflow in xfs_growfs_data_private
      xfs: fix double unlock in xfs_swap_extents()
------- Comment #12 From 2009-09-21 18:35:26 CST -------
Hi,

I hope you haven't started bisecting yet. I just tried a 2.6.29 kernel here
and got exactly the same behavior.

I will set up a 2.6.28 kernel here, hopefully get a working baseline, and
then patch my way up to narrow down the version.
------- Comment #13 From 2009-09-21 19:22:57 CST -------
To be honest this seems more like a corner case in the directory code, but if
you can narrow down a recent regression that'd be great too :)

I wonder why I couldn't hit it ....
------- Comment #14 From 2009-09-21 20:03:38 CST -------
I've just reproduced the bug with 2.6.28. I jumped from 2.6.25 to 2.6.30,
though, so my experience with 2.6.26-29 is pretty much nonexistent.

I will continue testing and agree with your speculation. I'll keep posting
progress and will go back to 2.6.25 and earlier, just to rule it out. I have
XFS on the root/boot partition and have also found a way to reliably install
grub. My understanding, though, is that this is suboptimal. Might this be
related in any way? Far-fetched, but I'd like to rule it out.
------- Comment #15 From 2009-09-21 21:46:28 CST -------
Created an attachment (id=282)
dmesg from the same filesystem now using 2.6.27 

probably useless for this specific bug, but for reference to the discussion.
------- Comment #16 From 2009-09-21 22:16:47 CST -------
As visible in the second attachment, the error has changed and the filesystem
cannot even be mounted rw. I'll stop testing further kernel versions and will
instead focus on changing other parameters. Any input welcome.
------- Comment #17 From 2009-09-21 23:18:54 CST -------
I installed xorg-fonts w/o running the font update script to try to isolate
things more.

Appears to be dying when mkfontscale runs against the /usr/share/fonts/100dpi
dir.

Hmm actually even just doing an "ls" of /usr/share/fonts/100dpi also blows up.

It's a fairly large dir:

# ls -ld usr/share/fonts/100dpi
drwxr-xr-x. 2 root root 45056 2009-09-21 20:46 usr/share/fonts/100dpi

# xfs_bmap -v usr/share/fonts/100dpi
usr/share/fonts/100dpi:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..7]:          682016..682023    1 (145016..145023)     8
   1: [8..15]:         682064..682071    1 (145064..145071)     8
   2: [16..31]:        682136..682151    1 (145136..145151)    16
   3: [32..39]:        682280..682287    1 (145280..145287)     8
   4: [40..47]:        682352..682359    1 (145352..145359)     8
   5: [48..63]:        682424..682439    1 (145424..145439)    16
   6: [64..79]:        682536..682551    1 (145536..145551)    16
   7: [80..87]:        682624..682631    1 (145624..145631)     8
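A quick cross-check of the numbers above: xfs_bmap reports extent lengths in 512-byte units, so the TOTAL column should add up to the 45056 bytes that ls shows for the directory. The sketch below only does that arithmetic; the array is copied by hand from the output and extents_to_bytes() is a hypothetical helper.

```c
#include <stddef.h>

/* xfs_bmap -v reports extent lengths in 512-byte basic blocks. */
#define BASIC_BLOCK_BYTES 512UL

/* TOTAL column of the xfs_bmap -v output above, copied by hand. */
static const unsigned long extent_totals[] = { 8, 8, 16, 8, 8, 16, 16, 8 };
#define N_EXTENTS (sizeof extent_totals / sizeof extent_totals[0])

/* Hypothetical helper: add up extent lengths and convert them to bytes. */
static unsigned long extents_to_bytes(const unsigned long *totals, size_t n)
{
    unsigned long basic_blocks = 0;

    for (size_t i = 0; i < n; i++)
        basic_blocks += totals[i];
    return basic_blocks * BASIC_BLOCK_BYTES;
}
```

8 + 8 + 16 + 8 + 8 + 16 + 16 + 8 = 88 basic blocks, and 88 * 512 = 45056 bytes, so the directory's data is spread over 8 separate extents. Assuming a 4 KiB filesystem block size (the block size is not shown in the output), that is 11 blocks.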
------- Comment #18 From 2009-09-22 16:08:52 CST -------
Hi,

I have a confirmed workaround, and with it I have also identified what exactly
creates the problem.

The bug is triggered by copying files from a squashfs file system to an XFS
file system.

Using unsquashfs instead of cp to extract the file system makes the issue disappear.

http://powerplant.yoper.com/projects/yoper/repository/revisions/9851/diff?rev=9851&type=sbs

Hope it helps to get a reproducer.
------- Comment #19 From 2009-09-22 22:38:35 CST -------
Created an attachment (id=283)
metadump image to demonstrate

After bunzip2'ing, xfs_mdrestore'ing and mounting it, ls mnt/75dpi will
demonstrate the bug under a CONFIG_XFS_DEBUG build: xfs_bmapi gets *nmap == 0
via xfs_dir2_leaf_getdents().
------- Comment #20 From 2009-09-23 00:05:49 CST -------
Basic problem seems to be that we had to take a guess at the readdir bufsize
(see xfs_file_readdir comments).

xfs_dir2_leaf_getdents() decrements bufsize as it goes, and eventually that
goes negative.  This adversely affects the readahead window calculations etc
and munges up some other logic, causing us to call into xfs_bmapi() asking for
0 maps.  </hand_wave> - still need to see just where that is but it's late now
for me ;)

This patch:

Index: linux/fs/xfs/xfs_dir2_leaf.c
===================================================================
--- linux.orig/fs/xfs/xfs_dir2_leaf.c
+++ linux/fs/xfs/xfs_dir2_leaf.c
@@ -1089,6 +1089,7 @@ xfs_dir2_leaf_getdents(
         ptr += length;
         curoff += length;
         bufsize -= length;
+        if ((int)bufsize < 0) bufsize = 0;
     }

     /*

stops bufsize from going negative and keeps things sane, but it's ugly.

Will sort out final details of the problem & a better patch tomorrow I hope.
------- Comment #21 From 2009-10-01 14:12:55 CST -------
Do we have a final patch? I saw the discussion on the list, but no final
patch.
------- Comment #22 From 2009-10-01 14:14:36 CST -------
The SGI maintainer was going to choose the most aesthetically pleasing version
and merge it; not sure if that's done yet.

-eric
------- Comment #23 From 2009-10-01 17:51:21 CST -------
I have another issue with the same corrupted file system. I just tried to
recover some data from a dd copy of the corrupted fs. The issue happened on
2.6.31.1.

Please tell me whether this should be handled as a new bug or whether it's fine
here.

XFS is configured as:

CONFIG_XFS_FS=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
CONFIG_XFS_DEBUG=y
CONFIG_VXFS_FS=m

This is with the following patch applied:

--- fs/xfs/xfs_dir2_leaf.c      2009-08-16 10:27:36.000000000 +1200
+++ fs/xfs/xfs_dir2_leaf.c.new  2009-09-24 09:12:57.000000000 +1200
@@ -854,6 +854,7 @@
                         */
                        ra_want = howmany(bufsize + mp->m_dirblksize,
                                          mp->m_sb.sb_blocksize) - 1;
+                       ASSERT(ra_want >= 0);

                        /*
                         * If we don't have as many as we want, and we haven't
@@ -1088,8 +1089,12 @@
                 */
                ptr += length;
                curoff += length;
-               bufsize -= length;
-       }
+               /* bufsize may have just been a guess; don't go negative */
+               if (bufsize >= length)
+                       bufsize -= length;
+               else
+                       bufsize = 0;
+       }

        /*
         * All done.  Set output offset value to current offset.

--
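The two clamps discussed in this thread can be compared in isolation. This sketch again assumes bufsize is a size_t, and the function names are hypothetical: consume_cast_clamp() follows Eric's quick fix from comment #20 (subtract, then test (int)bufsize < 0), while consume_saturating() follows the patch above, which never subtracts more than is left.

```c
#include <stddef.h>

/* Quick fix from comment #20: subtract first, then treat a "negative" int as 0. */
static size_t consume_cast_clamp(size_t bufsize, size_t length)
{
    bufsize -= length;
    if ((int)bufsize < 0)   /* out-of-range conversion is implementation-defined */
        bufsize = 0;
    return bufsize;
}

/* Patch above: compare before subtracting, so the value can never wrap. */
static size_t consume_saturating(size_t bufsize, size_t length)
{
    if (bufsize >= length)
        bufsize -= length;
    else
        bufsize = 0;
    return bufsize;
}
```

The compare-before-subtract form is the more robust of the two. Converting an out-of-range unsigned value to int is implementation-defined in C, and on common compilers the cast keeps only the low bits, so it can also misread some legitimately large values. Whether the commit that finally landed in 2.6.32 is written exactly this way is not shown in this thread.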

Here is the dmesg output:


XFS mounting filesystem loop0
Starting XFS recovery on filesystem: loop0 (logdev: internal)
Assertion failed: ip->i_d.di_nextents == 0, file: fs/xfs/xfs_inode.c, line:
2155
------------[ cut here ]------------
kernel BUG at fs/xfs/support/debug.c:109!
invalid opcode: 0000 [#1] PREEMPT SMP
last sysfs file:
/sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda3/uevent
Modules linked in: iptable_filter ip_tables x_tables binfmt_misc vboxdrv ipv6
af_packet aes_i586 fuse sbs sbshc pci_slot snd_hda_codec_analog snd_hda_intel
snd_hda_codec usbhid btusb snd_hwdep bluetooth snd_pcm rfkill i915 snd_timer
drm_kms_helper snd ppdev firewire_ohci firewire_core crc_itu_t parport_pc
joydev pcmcia hp_accel lis3lv02d tg3 psmouse drm ohci1394 yenta_socket ieee1394
tpm_infineon serio_raw rsrc_nonstatic intel_agp tpm uhci_hcd parport evdev
pcspkr pcmcia_core agpgart tpm_bios ehci_hcd input_polldev i2c_algo_bit
rtc_cmos rtc_core rtc_lib soundcore sg wmi snd_page_alloc video output lib80211
fan ac battery button processor thermal container

Pid: 18393, comm: mount Not tainted (2.6.31_yos-65 #1) HP Compaq 6710b
EIP: 0060:[<c126131b>] EFLAGS: 00010296 CPU: 1
EIP is at assfail+0x1b/0x20
EAX: 00000054 EBX: f44b4840 ECX: 00000003 EDX: 00000001
ESI: f4c5bd48 EDI: c430f000 EBP: f4c5bd48 ESP: f4c5bce8
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process mount (pid: 18393, ti=f4c5a000 task=c3e72480 task.ti=f4c5a000)
Stack:
 c165ab78 c163ab78 c163a6db 0000086b c1232ae6 f44b4840 c3b2b400 c4599000
<0> c4599000 f44b4840 00000005 c430f000 c124c788 f44b4840 f4c5bd48 c3b2b400
<0> c430f000 c124ff7d 00000000 00000004 00000002 f44b4aac 00000007 f44b48ec
Call Trace:
 [<c1232ae6>] ? xfs_ifree+0x86/0x210
 [<c124c788>] ? xfs_trans_ijoin+0xd8/0x120
 [<c124ff7d>] ? xfs_inactive+0x30d/0x570
 [<c10a7a49>] ? clear_inode+0x59/0xe0
 [<c10a7d78>] ? generic_delete_inode+0x118/0x140
 [<c10a6ff4>] ? iput+0x44/0x50
 [<c1240a89>] ? xlog_recover_process_one_iunlink+0x149/0x170
 [<c1240b32>] ? xlog_recover_process_iunlinks+0x82/0x110
 [<c1240c59>] ? xlog_recover_finish+0x99/0xf0
 [<c124644a>] ? xfs_mountfs+0x55a/0x780
 [<c1253d25>] ? kmem_zalloc+0x15/0x50
 [<c1246ed3>] ? xfs_mru_cache_create+0xf3/0x130
 [<c125fba2>] ? xfs_fs_fill_super+0x1c2/0x2d0
 [<c10976ec>] ? get_sb_bdev+0x11c/0x150
 [<c1092b4e>] ? pcpu_alloc+0x1ce/0x1f0
 [<c125dbc0>] ? xfs_fs_get_sb+0x20/0x30
 [<c125f9e0>] ? xfs_fs_fill_super+0x0/0x2d0
 [<c1096e45>] ? vfs_kern_mount+0x65/0x120
 [<c1096f5d>] ? do_kern_mount+0x3d/0x90
 [<c10ac706>] ? do_new_mount+0x86/0xc0
 [<c10aca59>] ? do_mount+0x1f9/0x220
 [<c10acaf2>] ? sys_mount+0x72/0xb0
 [<c1002ec4>] ? sysenter_do_call+0x12/0x26
Code: 00 e8 5a f9 0a 00 83 c4 14 c3 8d b6 00 00 00 00 83 ec 10 89 4c 24 0c 89
54 24 08 89 44 24 04 c7 04 24 78 ab 65 c1 e8 e5 20 dd ff <0f> 0b eb fe 90 55 57
89 cf 56 89 c6 53 b8 40 0d 82 c1 83 ec 0c
EIP: [<c126131b>] assfail+0x1b/0x20 SS:ESP 0068:f4c5bce8
---[ end trace 9fdfbb3d24581a78 ]---
------- Comment #24 From 2009-10-01 20:15:55 CST -------
That's most likely an unrelated, new issue.  Can you repeat it?  If so and it
only happens w/ the patch then leave it here; else open a new bug please.

Thanks,
-Eric
------- Comment #25 From 2009-10-01 20:22:54 CST -------
(In reply to comment #24)

It's 100% reproducible on 2.6.30 and 2.6.31.1, including the patch. Filed as
#851.
------- Comment #26 From 2010-01-26 18:03:21 CST -------
(In reply to comment #20)
> Basic problem seems to be that we had to take a guess at the readdir bufsize
> (see xfs_file_readdir comments).
> 
> xfs_dir2_leaf_getdents() decrements bufsize as it goes, and eventually that
> goes negative.  This adversely affects the readahead window calculations etc
> and munges up some other logic, causing us to call into xfs_bmapi() asking for
> 0 maps.
> [...]
> Will sort out final details of the problem & a better patch tomorrow I hope.

Thanks for the explanation! Any news about a final fix? I ran into this bug
yesterday: a production system crashed repeatedly. The XFS settings are almost
identical to Tobias':

CONFIG_XFS_FS=y
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_RT=y
CONFIG_XFS_DEBUG=y
# CONFIG_VXFS_FS is not set

System: Gentoo, 64-bit, tested with 2.6.28-hardened-r7/r9, 2.6.29-r3/r4/r5,
2.6.30-r4/r5 and 2.6.31-r6, with the same behavior every time. Even just
running emerge --metadata or eix-update after logging in crashed it:

Jan 25 22:40:31 mail ------------[ cut here ]------------
Jan 25 22:40:31 mail kernel BUG at fs/xfs/support/debug.c:109!
Jan 25 22:40:31 mail invalid opcode: 0000 [#1] SMP
Jan 25 22:40:31 mail last sysfs file: /sys/block/sda/uevent
Jan 25 22:40:31 mail CPU 0
Jan 25 22:40:31 mail Modules linked in: sr_mod cdrom ide_pci_generic ide_core
ata_piix libata ehci_hcd uhci_hcd bnx2 psmouse pcspkr
Jan 25 22:40:31 mail Pid: 4465, comm: python2.6 Not tainted 2.6.31-gentoo-r6 #1
PowerEdge R710
Jan 25 22:40:31 mail RIP: 0010:[<ffffffff8116bd8d>]  [<ffffffff8116bd8d>]
assfail+0x1a/0x1e
Jan 25 22:40:31 mail RSP: 0018:ffff88022e1cbbc8  EFLAGS: 00010292
Jan 25 22:40:31 mail RAX: 0000000000000045 RBX: 0000000000000000 RCX:
000000000000dfd6
Jan 25 22:40:31 mail RDX: ffffc90000000000 RSI: 0000000000000046 RDI:
ffff88022e1cba18
Jan 25 22:40:31 mail RBP: ffff88022e1cbbc8 R08: ffffffff8152cb02 R09:
0000000000000001
Jan 25 22:40:31 mail R10: 00000000ffffffff R11: 000000000000000a R12:
0000000000000002
Jan 25 22:40:31 mail R13: ffff880207b91480 R14: 0000000800000000 R15:
000000000000d000
Jan 25 22:40:31 mail FS:  00007fafc44256f0(0000) GS:ffffc90000000000(0000)
knlGS:0000000000000000
Jan 25 22:40:31 mail CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 25 22:40:31 mail CR2: 000000000333f007 CR3: 000000012f2bb000 CR4:
00000000000006f0
Jan 25 22:40:31 mail DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
Jan 25 22:40:31 mail DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
Jan 25 22:40:31 mail Process python2.6 (pid: 4465, threadinfo ffff88022e1ca000,
task ffff88022e6690a0)
Jan 25 22:40:31 mail Stack:
Jan 25 22:40:31 mail ffff88022e1cbda8 ffffffff81120db7 ffff88022e1cbc68
ffffffff8108c92c
Jan 25 22:40:31 mail <0> 8000000203b50067 0000000000000206 0000000100000000
000000000333f007
Jan 25 22:40:31 mail <0> ffff88012ebb77c0 ffff88012eb700c8 00000000000009f8
0000000000000292
Jan 25 22:40:31 mail Call Trace:
Jan 25 22:40:31 mail [<ffffffff81120db7>] xfs_bmapi+0x62/0x15d2
Jan 25 22:40:31 mail [<ffffffff8108c92c>] ? handle_mm_fault+0x655/0x6c6
Jan 25 22:40:31 mail [<ffffffff8105329b>] ? up_read+0x9/0xb
Jan 25 22:40:31 mail [<ffffffff81163cb9>] ? xfs_buf_free+0xc3/0xcc
Jan 25 22:40:31 mail [<ffffffff81163db9>] ? xfs_buf_rele+0xf7/0x100
Jan 25 22:40:31 mail [<ffffffff81159a3f>] ? xfs_trans_brelse+0x238/0x241
Jan 25 22:40:31 mail [<ffffffff8112b311>] ? xfs_da_brelse+0xab/0xd0
Jan 25 22:40:31 mail [<ffffffff810b1a3e>] ? filldir+0x6e/0xbd
Jan 25 22:40:31 mail [<ffffffff81133ca4>] xfs_dir2_leaf_getdents+0x23e/0x683
Jan 25 22:40:31 mail [<ffffffff8113fa44>] ? xfs_iext_get_ext+0x5e/0x8a
Jan 25 22:40:31 mail [<ffffffff810b19d0>] ? filldir+0x0/0xbd
Jan 25 22:40:31 mail [<ffffffff810b19d0>] ? filldir+0x0/0xbd
Jan 25 22:40:31 mail [<ffffffff8112f232>] xfs_readdir+0xdd/0xec
Jan 25 22:40:31 mail [<ffffffff810b19d0>] ? filldir+0x0/0xbd
Jan 25 22:40:31 mail [<ffffffff81164d3b>] xfs_file_readdir+0x34/0x43
Jan 25 22:40:31 mail [<ffffffff810b1b97>] vfs_readdir+0x6a/0x9f
Jan 25 22:40:31 mail [<ffffffff810b1d0a>] sys_getdents+0x7d/0xc4
Jan 25 22:40:31 mail [<ffffffff812e3b5f>] ? page_fault+0x1f/0x30
Jan 25 22:40:31 mail [<ffffffff8100b9ab>] system_call_fastpath+0x16/0x1b
Jan 25 22:40:31 mail Code: 39 81 c7 44 24 08 01 00 00 00 e8 f0 8d 02 00 c9 c3
55 89 d1 31 c0 48 89 f2 48 89 fe 48 c7 c7 25 24 39 81 48 89 e5 e8 94 56 17 00
<0f> 0b eb fe 55 48 89 e5 41 57 41 56 49 89 d6 41 55 49 89 cd 41
Jan 25 22:40:31 mail RIP  [<ffffffff8116bd8d>] assfail+0x1a/0x1e
Jan 25 22:40:31 mail RSP <ffff88022e1cbbc8>
Jan 25 22:40:31 mail ---[ end trace bbbeb21e08418d1c ]---

Eric's 'if bufsize < 0' patch brought the system back to life, but it still
feels worrisome: there must be something wrong in the fs that makes XFS crash,
since it worked well before (without the patch), and it looks like xfs_repair
isn't able to fix it yet?

Interestingly, the FS always worked fine when booting a Gentoo minimal CD and
mounting/chrooting into the existing system. The Gentoo Live-CD kernel was
probably compiled without XFS_DEBUG, or is there some other difference?

During all the crashes, my root partition got damaged (xfs_check showed
errors). I dd'd a copy of the partition before I xfs_repaired it. I'd like to
attach the output of xfs_check and xfs_repair -n, but to be honest I'm (please
don't kill me) not used to Bugzilla and don't know how to attach a file to
this post. Help?
------- Comment #27 From 2010-01-26 18:29:30 CST -------
Was fixed in commit
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8e69ce147127a0235e8d1f2b75ea214be78c61b3
in 2.6.32

-Eric
------- Comment #28 From 2010-01-26 19:25:48 CST -------
(In reply to comment #27)

Thanks for the update! I'm still not sure about the consistency of my xfs fs.
Can you explain why my xfs fs broke apart without the patch even though it
worked before? Alex
------- Comment #29 From 2010-01-27 09:57:26 CST -------
(In reply to comment #28)

Did you recently turn on XFS_DEBUG?

(FWIW, turn it off - you're not an xfs developer ;)
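Why a debug option changes the outcome: XFS's ASSERT() checks are active only in CONFIG_XFS_DEBUG builds, which would explain why the same filesystem survived under the Live-CD kernel mentioned in comment #26. The user-space sketch below imitates that pattern. SKETCH_DEBUG, sketch_assfail() and request_maps() are made-up names, and the message format is copied from the traces in this report, not from the kernel source.

```c
#include <stdio.h>
#include <stdlib.h>

/* Prints the failed expression and stops, standing in for the kernel's BUG(). */
void sketch_assfail(const char *expr, const char *file, int line)
{
    fprintf(stderr, "Assertion failed: %s, file: %s, line: %d\n", expr, file, line);
    abort();
}

/* Checked only when built with -DSKETCH_DEBUG; otherwise compiled away. */
#ifdef SKETCH_DEBUG
#define ASSERT(expr) ((expr) ? (void)0 : sketch_assfail(#expr, __FILE__, __LINE__))
#else
#define ASSERT(expr) ((void)0)
#endif

/* Mimics the check that fired in xfs_bmapi(): callers must ask for >= 1 map. */
static int request_maps(int nmap)
{
    ASSERT(nmap >= 1);
    return nmap;
}
```

Built without -DSKETCH_DEBUG, request_maps(0) quietly returns 0; built with it, the same call prints "Assertion failed: nmap >= 1, ..." and aborts, much as the kernel BUG()s in assfail in the traces above. Disabling the check only hides the symptom; the bufsize fix is what addresses the cause.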

Other than that, no, not sure.  But if you apply the patch referenced in my
comment (or get gentoo to do it) it should take care of the problem.

-Eric