

To: xfs-masters@xxxxxxxxxxx
Subject: [Bug 27492] BUG: unable to handle kernel NULL pointer dereference, on high filesystem io
From: bugzilla-daemon@xxxxxxxxxxxxxxxxxxx
Date: Wed, 16 Mar 2011 14:56:46 GMT
Auto-submitted: auto-generated
In-reply-to: <bug-27492-470@xxxxxxxxxxxxxxxxxxxxxxxxx/>
References: <bug-27492-470@xxxxxxxxxxxxxxxxxxxxxxxxx/>
https://bugzilla.kernel.org/show_bug.cgi?id=27492

--- Comment #21 from Katharine Manton <kat@xxxxxxxxxxxxxxxxxx>  2011-03-16 14:56:44 ---
(In reply to comment #20)
> That implies you have run your filesystem out of space and exhausted
> the reserve pool of blocks. Or perhaps you are getting IO errors
> from your hardware.

I've since added a SCSI controller and drive to the test system and see the
same thing happening.

The source fs for rsync contains 584M of files.  The destination is a 2G
partition, freshly formatted each time.

On a recent test (using the SCSI hardware), this:

Mar 16 12:42:44 magnum kernel: Filesystem "sdc2": page discard on page
f7766980, inode 0x40a0fe, offset 0.
Mar 16 12:42:44 magnum kernel: Filesystem "sdc2": XFS internal error
xfs_trans_cancel at line 1815 of file fs/xfs/xfs_trans.c.  Caller 0xc1124a2e

(etc.)

...occurred after rsync had transferred only 2.8M of files.

After umount/mount:

# du -sh /mnt/c.1k/
2.8M    /mnt/c.1k/

# df -hT /mnt/c.1k/
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sdc2      xfs    2.0G   14M  2.0G   1% /mnt/c.1k

The same behaviour with both the PATA and the SCSI controller/drive seems to
rule out a hardware I/O problem, and it doesn't seem to be ENOSPC...

I rebooted with 2.6.32 and knocked up a script to umount/mkfs.xfs/mount/rsync
repeatedly.  After 67 iterations with no kernel messages and all files
transferred each time, I stopped testing; all seems well with 2.6.32.
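For reference, the loop was roughly the following (this is a reconstruction;
the device and source-tree paths here are placeholders, not the ones actually
used).  RUN=echo makes it a dry run that just prints the commands; set RUN=
(empty) to run it for real against a scratch partition:

```shell
# Hypothetical reconstruction of the stress loop; DEV/SRC/MNT are
# placeholders.  With RUN=echo (the default) nothing destructive runs.
RUN=${RUN:-echo}
DEV=${DEV:-/dev/sdc2}        # scratch 2G test partition
SRC=${SRC:-/path/to/source}  # the 584M tree being rsynced
MNT=${MNT:-/mnt/c.1k}

stress_iteration() {
    $RUN umount "$MNT"
    $RUN mkfs.xfs -f "$DEV"        # fresh filesystem each iteration
    $RUN mount "$DEV" "$MNT"
    $RUN rsync -a "$SRC"/ "$MNT"/  # copy the whole test tree
    $RUN dmesg                     # watch for "XFS internal error"
}
```

Wrapped in a while loop, that gives the 67 iterations mentioned above.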

> Which further implies that you are at ENOSPC, I think. However,
> there should not be a shutdown here due to ENOSPC - all known
> accounting bugs were fixed quite some time ago. If you can isolate
> this problem, please raise a new bug for it.

It's repeatable with SCSI and PATA hardware.  This behaviour seems to have
replaced the oops; I need to boot the unpatched 2.6.37.2 and run the tests
again.

> Which is back to the original problem. If increasing vmalloc space
> doesn't fix your problem then you really, really need to get them VM
> folk to triage and fix the problem (which appears to be vmap area
> fragmentation). The only other thing you can do to avoid this
> is move to x86_64...

How far should vmalloc space be increased, though?  The default on this system
is 128M.  On one system I increased it incrementally; 'vmalloc=768M' seemed to
help but didn't eliminate the problem entirely.  What are the consequences (if
any) of increasing vmalloc space considerably beyond the default?
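For what it's worth, I've been gauging headroom from /proc/meminfo:
VmallocTotal reflects the vmalloc= boot parameter, and a VmallocChunk much
smaller than VmallocTotal suggests fragmentation rather than outright
exhaustion (as I understand it, on 32-bit x86 the vmalloc area is carved out
of the same ~1G kernel address space as lowmem, so a much larger vmalloc=
shrinks lowmem correspondingly):

```shell
# Inspect the vmalloc window and current usage; VmallocTotal is the
# configured window, VmallocUsed the amount allocated, and VmallocChunk
# the largest remaining contiguous free region.
grep '^Vmalloc' /proc/meminfo
```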

I'm not sure how to proceed.  Moving to x86_64 isn't an option.  XFS /was/
reliable; I've used it on 32-bit systems since 2.4.19 + XFS 1.3 patch (probably
earlier) and hadn't had any problems until after 2.6.32.  Something's changed.

I'll do some more testing with ever-larger vmalloc space.  I don't want to
report another bug unless there really is one!

Cheers, Kat

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
