https://bugzilla.kernel.org/show_bug.cgi?id=27492
--- Comment #20 from Dave Chinner <david@xxxxxxxxxxxxx> 2011-03-14 21:45:02 ---
On Mon, Mar 14, 2011 at 04:51:49PM +0000, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=27492
>
> --- Comment #19 from Katharine Manton <kat@xxxxxxxxxxxxxxxxxx> 2011-03-14
> 16:51:33 ---
> Patch applied to 2.6.36-gentoo-r5 and 'vanilla' 2.6.37.2
>
> Both exhibit the same behaviour, so I'll stick with 2.6.37.2 now.
>
> Running rsync, I now see a lot of:
>
> Mar 14 15:50:00 magnum kernel: Filesystem "sdb2": page discard on page
> f777d9a0, inode 0x2000f1, offset 0.
> Mar 14 15:50:10 magnum kernel: Filesystem "sdb2": page discard on page
> f77549e0, inode 0x600452, offset 0.
> Mar 14 15:50:10 magnum kernel: Filesystem "sdb2": page discard on page
> f77549c0, inode 0x600453, offset 0.
That implies you have run the filesystem out of space and exhausted
the reserve pool of blocks, or perhaps you are getting I/O errors
from your hardware.
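A quick way to check both (a rough sketch - the resblks command needs
xfs_io's expert mode, so this assumes a reasonably recent xfsprogs):

    df -h /mnt/1k                  <- free space
    df -i /mnt/1k                  <- free inodes
    xfs_io -x -c resblks /mnt/1k   <- state of the reserved block pool
    dmesg | grep -i "i/o error"    <- any hardware errors logged?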
> ...eventually and inevitably followed by:
>
> Mar 14 15:50:20 magnum kernel: Filesystem "sdb2": XFS internal error
> xfs_trans_cancel at line 1815 of file fs/xfs/xfs_trans.c. Caller 0xc1124a2e
> Mar 14 15:50:20 magnum kernel:
> Mar 14 15:50:20 magnum kernel: Pid: 1835, comm: rsync Not tainted 2.6.37.2-p1
> #2
> Mar 14 15:50:20 magnum kernel: Call Trace:
> Mar 14 15:50:20 magnum kernel: [<c1109d0a>] xfs_error_report+0x2c/0x2e
> Mar 14 15:50:20 magnum kernel: [<c1121153>] xfs_trans_cancel+0x4b/0xc9
> Mar 14 15:50:20 magnum kernel: [<c1124a2e>] ? xfs_create+0x40f/0x460
> Mar 14 15:50:20 magnum kernel: [<c1124a2e>] xfs_create+0x40f/0x460
> Mar 14 15:50:20 magnum kernel: [<c112d2fe>] xfs_vn_mknod+0xc8/0x153
> Mar 14 15:50:20 magnum kernel: [<c112d3a2>] xfs_vn_create+0xa/0xc
> Mar 14 15:50:20 magnum kernel: [<c108a707>] vfs_create+0x60/0xaa
> Mar 14 15:50:20 magnum kernel: [<c112d398>] ? xfs_vn_create+0x0/0xc
> Mar 14 15:50:20 magnum kernel: [<c108b43c>] do_last+0x290/0x511
> Mar 14 15:50:20 magnum kernel: [<c108cc21>] do_filp_open+0x19a/0x47c
> Mar 14 15:50:20 magnum kernel: [<c1021bda>] ? get_parent_ip+0xb/0x31
> Mar 14 15:50:20 magnum kernel: [<c10223d4>] ? sub_preempt_count+0x7c/0x89
> Mar 14 15:50:20 magnum kernel: [<c1094700>] ? alloc_fd+0xbd/0xca
> Mar 14 15:50:20 magnum kernel: [<c1081891>] do_sys_open+0x44/0xc0
> Mar 14 15:50:20 magnum kernel: [<c108194f>] sys_open+0x1e/0x26
> Mar 14 15:50:20 magnum kernel: [<c100270c>] sysenter_do_call+0x12/0x22
> Mar 14 15:50:20 magnum kernel: xfs_force_shutdown(sdb2,0x8) called from line
> 1816 of file fs/xfs/xfs_trans.c. Return address = 0xc1121169
> Mar 14 15:50:20 magnum kernel: Filesystem "sdb2": Corruption of in-memory data
> detected. Shutting down filesystem: sdb2
> Mar 14 15:50:20 magnum kernel: Please umount the filesystem, and rectify the
> problem(s)
> Mar 14 15:50:35 magnum kernel: Filesystem "sdb2": xfs_log_force: error 5
> returned.
> Mar 14 15:51:35 magnum last message repeated 2 times
Which further implies that you are at ENOSPC, I think. However,
there should not be a shutdown here due to ENOSPC - all known
accounting bugs were fixed quite some time ago. If you can isolate
this problem, please raise a new bug for it.
> At this point, I can umount/mount/umount/xfs_check/mount and run
> rsync again. The above will happen a few times; each time XFS
> recovery succeeds on mount and xfs_check doesn't report anything.
> After repeating the above a few times, eventually this happens:
>
> Mar 14 16:03:12 magnum kernel: vmap allocation for size 4194304 failed: use
> vmalloc=<size> to increase size.
> Mar 14 16:03:12 magnum kernel: xfs_buf_get: failed to map pages
Which is back to the original problem. If increasing vmalloc space
doesn't fix it, then you really, really need to get the VM folks to
triage and fix the problem (which appears to be vmap area
fragmentation). The only other thing you can do to avoid this
is move to x86_64...
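For the record, increasing it looks something like this (32-bit x86
only; 256M is just a starting point - too large a value eats into
lowmem, so check the box still boots):

    # append to the kernel line in your bootloader config:
    vmalloc=256M

    # after rebooting, verify it took and watch for fragmentation:
    grep Vmalloc /proc/meminfo     <- VmallocTotal/Used/Chunk
    cat /proc/vmallocinfo          <- per-mapping breakdown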
> Mar 14 16:04:24 magnum last message repeated 2 times
>
> After this has occurred:
>
> magnum ~ # umount /mnt/1k
> magnum ~ # mount /mnt/1k
> mount: Cannot allocate memory
>
> Mar 14 16:10:49 magnum kernel: XFS mounting filesystem sdb2
> Mar 14 16:10:49 magnum kernel: alloc_vmap_area: 1561 callbacks suppressed
> Mar 14 16:10:49 magnum kernel: vmap allocation for size 4194304 failed: use
> vmalloc=<size> to increase size.
> Mar 14 16:10:49 magnum kernel: xfs_buf_get_uncached: failed to map pages
> Mar 14 16:10:49 magnum kernel: XFS: log mount failed
>
> At this point, the only recourse is to reboot. Destroying the log doesn't
> help:
>
> magnum ~ # xfs_repair -L /dev/sdb2
> [...]
> magnum ~ # mount /mnt/1k
>
> mount: Cannot allocate memory
> Mar 14 16:15:35 magnum kernel: XFS mounting filesystem sdb2
> Mar 14 16:15:35 magnum kernel: vmap allocation for size 4194304 failed: use
> vmalloc=<size> to increase size.
> Mar 14 16:15:35 magnum kernel: xfs_buf_get_uncached: failed to map pages
> Mar 14 16:15:35 magnum kernel: XFS: log mount failed
Of course not - mounting still needs to allocate the memory for the
log buffers. Once the VM is screwed, rebooting is your only option.
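If you want to confirm it's fragmentation rather than outright
exhaustion before you reboot, compare the largest free chunk with the
4MB mapping that failed (assuming your kernel exposes the usual
/proc/meminfo fields):

    grep -E "VmallocTotal|VmallocUsed|VmallocChunk" /proc/meminfo

If VmallocChunk is well below 4096 kB while VmallocUsed is nowhere
near VmallocTotal, the address space is fragmented, not used up.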
> At least the oops is fixed by your patch, as expected.
OK, I'll push it forwards.
Cheers,
Dave.
--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.