On Sun, Feb 22, 2004 at 04:49:41PM +0100, Mikael Wahlberg wrote:
> Description:
>
> Under heavy FTP load (about 1 Gbit/s), running both reads and writes on two
> ServeRAID6m RAID5 controllers merged into one filesystem with
> raidtools, we see the error below. The filesystem hangs completely.
> This is currently with XFS, but JFS has the same problem (actually even more often).
What does the JFS oops look like?
> Feb 22 15:00:53 mserv1 kernel: [<c011e54a>] __wake_up_common+0x3a/0x60
> Feb 22 15:00:53 mserv1 kernel: [<c011e5af>] __wake_up+0x3f/0x70
This doesn't make a lot of sense: there are only two mrlocks in XFS, and
they live in the inode, which has well-defined and well-understood lifetime rules.
OTOH, the previous oops might have messed up quite a bit of your system.
Did you run memtest86 on the box? Do you have any unusual patches applied or
external modules loaded? What's your .config?
> Feb 22 15:00:54 mserv1 kernel: [<c011b150>] do_page_fault+0x0/0x523
> Feb 22 15:00:54 mserv1 kernel: [<c010baf5>] error_code+0x2d/0x38
> Feb 22 15:00:54 mserv1 kernel: [<c011e5b5>] __wake_up+0x45/0x70
> Feb 22 15:00:54 mserv1 kernel: [<c011e54a>] __wake_up_common+0x3a/0x60
> Feb 22 15:00:55 mserv1 kernel: [<c011e5af>] __wake_up+0x3f/0x70
> Feb 22 15:00:55 mserv1 kernel: [<c0259e62>] mrunlock+0x82/0xb0
> Feb 22 15:00:55 mserv1 kernel: [<c0259b00>] mraccessf+0xc0/0xe0
> Feb 22 15:00:55 mserv1 kernel: [<c023038e>] xfs_iunlock+0x3e/0x80
> Feb 22 15:00:55 mserv1 kernel: [<c023727b>] xfs_iomap+0x3bb/0x540
> Feb 22 15:00:55 mserv1 kernel: [<c0163fc7>] bio_alloc+0xd7/0x1c0
> Feb 22 15:00:55 mserv1 kernel: [<c025a17a>] map_blocks+0x7a/0x170
> Feb 22 15:00:55 mserv1 kernel: [<c025b40b>] page_state_convert+0x52b/0x6d0
> Feb 22 15:00:55 mserv1 kernel: [<c0236cb9>] xfs_imap_to_bmap+0x39/0x240
> Feb 22 15:00:55 mserv1 kernel: [<c025be48>] linvfs_release_page+0xa8/0xb0
> Feb 22 15:00:55 mserv1 kernel: [<c025bce0>] linvfs_writepage+0x60/0x120
> Feb 22 15:00:55 mserv1 kernel: [<c014990c>] shrink_list+0x41c/0x710
> Feb 22 15:00:55 mserv1 kernel: [<c0149df8>] shrink_cache+0x1f8/0x3d0
> Feb 22 15:00:55 mserv1 kernel: [<c01b3a00>] journal_stop+0x220/0x330
> Feb 22 15:00:55 mserv1 kernel: [<c014a6dc>] shrink_zone+0xbc/0xc0
> Feb 22 15:00:55 mserv1 kernel: [<c014a7a5>] shrink_caches+0xc5/0xe0
> Feb 22 15:00:55 mserv1 kernel: [<c014a87c>] try_to_free_pages+0xbc/0x190
> Feb 22 15:00:55 mserv1 kernel: [<c0143043>] __alloc_pages+0x203/0x370
> Feb 22 15:00:55 mserv1 kernel: [<c01431d5>] __get_free_pages+0x25/0x40
Hmm, from the trace it looks like ->release_page was called from a context
where we can't sleep. XFS definitely doesn't handle that, so the question
is whether the kernel should be calling it from such a context at all.