Chris,<div><br></div><div>Thanks for the reply. You have been a great help.</div><div><br></div><div>Do you know if these changes were implemented any further back than 3.2? I wouldn't feel comfortable running a release-candidate kernel in a production environment.</div>
<div><br></div><div>Thanks again<br><br><div class="gmail_quote">On Fri, Dec 9, 2011 at 6:55 AM, Christoph Hellwig <span dir="ltr">&lt;<a href="mailto:hch@infradead.org">hch@infradead.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On Thu, Dec 08, 2011 at 01:03:51PM -0500, Ryan C. England wrote:<br>
> I am looking for assistance on XFS which is why I have joined this mailing<br>
> list. I'm receiving a stack overflow on our file server. The server is<br>
> running Scientific Linux 6.1 with the following kernel,<br>
> 2.6.32-131.21.1.el6.x86_64.<br>
><br>
> This is causing random reboots which is more annoying than anything. I<br>
> found a couple of links in the archives but wasn't quite sure how to apply<br>
> this patch. I can provide whatever information necessary in order for<br>
> assistance in troubleshooting.<br>
<br>
</div>It's really mostly an issue with the VM page reclaim and writeback<br>
code. The kernel still has the old balance dirty pages code which calls<br>
into writeback code from the stack of the write system call, which<br>
already comes from NFSD with massive amounts of stack used. Then<br>
the writeback code calls into XFS to write data out, then you get the<br>
full XFS btree code, which then ends up in kmalloc and memory reclaim.<br>
<br>
You probably have only a third of the stack actually used by XFS, the<br>
rest is from NFSD/writeback code and page reclaim. I don't think any<br>
of this is easily fixable in a 2.6.32 codebase. Current mainline 3.2-rc<br>
now has the I/O-less balance dirty pages which will basically split the<br>
stack footprint in half, but it's an invasive change to the writeback<br>
code that isn't easily backportable.<br>
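<br>
To get a rough sense of how little stack was left when the warning in the trace below fired, the cur and sp values from the do_IRQ line can be subtracted.<br>
This is a minimal sketch, assuming cur is the lowest address of the task's kernel stack and that the stack is 8 KiB on this x86_64 kernel; the helper name stack_headroom is made up for illustration.<br>
<pre>
```python
import re

# Assumption: 8 KiB kernel stacks (THREAD_SIZE) on this x86_64 kernel.
THREAD_SIZE = 8192

# Matches the "cur:...,sp:..." pair printed in the do_IRQ warning.
WARNING_RE = re.compile(r"cur:([0-9a-f]+),sp:([0-9a-f]+)")


def stack_headroom(warning: str) -> int:
    """Return the bytes between the stack pointer and the stack base."""
    match = WARNING_RE.search(warning)
    if match is None:
        raise ValueError("no cur/sp pair found in warning line")
    cur, sp = (int(value, 16) for value in match.groups())
    return sp - cur


# The warning from the trace, with the mail client's line wrap removed.
line = ("do_IRQ: nfsd near stack overflow "
        "(cur:ffff880622208000,sp:ffff880622208160)")

headroom = stack_headroom(line)
used = THREAD_SIZE - headroom

print(headroom)  # 352
print(used)      # 7840
```
</pre>
Under those assumptions only 352 bytes remained above the stack base when the interrupt arrived, and part of that region is not usable stack space anyway.<br>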
<div><div class="h5"><br>
> Dec 6 20:27:55 localhost kernel: ------------[ cut here ]------------<br>
> Dec 6 20:27:55 localhost kernel: WARNING: at arch/x86/kernel/irq_64.c:47<br>
> handle_irq+0x8f/0xa0() (Not tainted)<br>
> Dec 6 20:27:55 localhost kernel: Hardware name: X8DTH-i/6/iF/6F<br>
> Dec 6 20:27:55 localhost kernel: do_IRQ: nfsd near stack overflow<br>
> (cur:ffff880622208000,sp:ffff880622208160)<br>
> Dec 6 20:27:55 localhost kernel: Modules linked in: mpt2sas<br>
> scsi_transport_sas raid_class mptctl mptbase nfsd lockd nfs_acl auth_rpcgss<br>
> autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ip6t_REJECT<br>
> nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter<br>
> ip6_tables ipv6 xfs exportfs dm_mirror dm_region_hash dm_log ses enclosure<br>
> ixgbe mdio microcode igb serio_raw ghes hed i2c_i801 i2c_core sg iTCO_wdt<br>
> iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext4 mbcache<br>
> jbd2 megaraid_sas(U) sd_mod crc_t10dif ahci dm_mod [last unloaded:<br>
> scsi_wait_scan]<br>
> Dec 6 20:27:55 localhost kernel: Pid: 2898, comm: nfsd Not tainted<br>
> 2.6.32-131.21.1.el6.x86_64 #1<br>
> Dec 6 20:27:55 localhost kernel: Call Trace:<br>
> Dec 6 20:27:55 localhost kernel: <IRQ> [<ffffffff81067097>] ?<br>
> warn_slowpath_common+0x87/0xc0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8106f6da>] ?<br>
> __do_softirq+0x11a/0x1d0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff81067186>] ?<br>
> warn_slowpath_fmt+0x46/0x50<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100c2cc>] ?<br>
> call_softirq+0x1c/0x30<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100dfcf>] ?<br>
> handle_irq+0x8f/0xa0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff814e310c>] ? do_IRQ+0x6c/0xf0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100bad3>] ?<br>
> ret_from_intr+0x0/0x11<br>
> Dec 6 20:27:55 localhost kernel: <EOI> [<ffffffff8115b80f>] ?<br>
> kmem_cache_free+0xbf/0x2b0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff811a2542>] ?<br>
> free_buffer_head+0x22/0x50<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff811a2919>] ?<br>
> try_to_free_buffers+0x79/0xc0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0259a9c>] ?<br>
> xfs_vm_releasepage+0xbc/0x130 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110c6c0>] ?<br>
> try_to_release_page+0x30/0x60<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff811262c1>] ?<br>
> shrink_page_list.clone.0+0x4f1/0x5c0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff81126688>] ?<br>
> shrink_inactive_list+0x2f8/0x740<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8111f7f6>] ?<br>
> free_pcppages_bulk+0x2b6/0x390<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff811278df>] ?<br>
> shrink_zone+0x38f/0x520<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff811646f8>] ?<br>
> __mem_cgroup_uncharge_common+0x198/0x270<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff81128684>] ?<br>
> zone_reclaim+0x354/0x410<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff811292c0>] ?<br>
> isolate_pages_global+0x0/0x380<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8111ebf4>] ?<br>
> get_page_from_freelist+0x694/0x820<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff81126882>] ?<br>
> shrink_inactive_list+0x4f2/0x740<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8111fb01>] ?<br>
> __alloc_pages_nodemask+0x111/0x8b0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110d17e>] ?<br>
> find_get_page+0x1e/0xa0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110e307>] ?<br>
> find_lock_page+0x37/0x80<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff811546da>] ?<br>
> alloc_pages_current+0xaa/0x110<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110d6b7>] ?<br>
> __page_cache_alloc+0x87/0x90<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110e45f>] ?<br>
> find_or_create_page+0x4f/0xb0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025b945>] ?<br>
> _xfs_buf_lookup_pages+0x145/0x360 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025b2ab>] ?<br>
> _xfs_buf_initialize+0xcb/0x140 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025cb57>] ?<br>
> xfs_buf_get+0x77/0x1b0 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025ccbc>] ?<br>
> xfs_buf_read+0x2c/0x100 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0250e39>] ?<br>
> xfs_trans_read_buf+0x219/0x440 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021efde>] ?<br>
> xfs_btree_read_buf_block+0x5e/0xc0 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021f6d4>] ?<br>
> xfs_btree_lookup_get_block+0x84/0xf0 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021d64c>] ?<br>
> xfs_btree_ptr_offset+0x4c/0x90 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021fd5f>] ?<br>
> xfs_btree_lookup+0xbf/0x470 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0209cfa>] ?<br>
> xfs_alloc_ag_vextent_near+0x98a/0xb70 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0250afd>] ?<br>
> xfs_trans_log_buf+0x9d/0xe0 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021348f>] ?<br>
> xfs_bmbt_lookup_eq+0x1f/0x30 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021a2e4>] ?<br>
> xfs_bmap_add_extent_delay_real+0xe54/0x18d0 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025737a>] ?<br>
> kmem_zone_alloc+0x9a/0xe0 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa01ff009>] ?<br>
> xfs_trans_mod_dquot_byino+0x79/0xd0 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021b15f>] ?<br>
> xfs_bmap_add_extent+0x3ff/0x420 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021ce7a>] ?<br>
> xfs_bmbt_init_cursor+0x4a/0x150 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa021bc94>] ?<br>
> xfs_bmapi+0xb14/0x11a0 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff814dc986>] ?<br>
> down_write+0x16/0x40<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa023ddd5>] ?<br>
> xfs_iomap_write_allocate+0x1c5/0x3b0 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff81248a9e>] ?<br>
> generic_make_request+0x21e/0x5b0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa023eb19>] ?<br>
> xfs_iomap+0x389/0x440 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8119b6ac>] ?<br>
> __mark_inode_dirty+0x6c/0x160<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0257f4d>] ?<br>
> xfs_map_blocks+0x2d/0x40 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0259588>] ?<br>
> xfs_page_state_convert+0x2f8/0x750 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff81268505>] ?<br>
> radix_tree_gang_lookup_tag_slot+0x95/0xe0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0259b96>] ?<br>
> xfs_vm_writepage+0x86/0x170 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff81120d67>] ?<br>
> __writepage+0x17/0x40<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff811220f9>] ?<br>
> write_cache_pages+0x1c9/0x4a0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff81120d50>] ?<br>
> __writepage+0x0/0x40<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa023ab93>] ?<br>
> xfs_iflush+0x203/0x210 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025af9f>] ?<br>
> xfs_bdwrite+0x5f/0xa0 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa024fe99>] ?<br>
> xfs_trans_unlocked_item+0x39/0x60 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff811223f4>] ?<br>
> generic_writepages+0x24/0x30<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025898e>] ?<br>
> xfs_vm_writepages+0x5e/0x80 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff81122421>] ?<br>
> do_writepages+0x21/0x40<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8119bc8d>] ?<br>
> writeback_single_inode+0xdd/0x2c0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8119c08e>] ?<br>
> writeback_sb_inodes+0xce/0x180<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8119c1eb>] ?<br>
> writeback_inodes_wb+0xab/0x1b0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8112181e>] ?<br>
> balance_dirty_pages+0x21e/0x4d0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff811a3851>] ?<br>
> mark_buffer_dirty+0x61/0xa0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff81121b34>] ?<br>
> balance_dirty_pages_ratelimited_nr+0x64/0x70<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8110dd23>] ?<br>
> generic_file_buffered_write+0x1c3/0x2a0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8106dcb7>] ?<br>
> current_fs_time+0x27/0x30<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0261e4f>] ?<br>
> xfs_write+0x76f/0xb70 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff814174b5>] ?<br>
> memcpy_toiovec+0x55/0x80<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025d800>] ?<br>
> xfs_file_aio_write+0x0/0x70 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa025d861>] ?<br>
> xfs_file_aio_write+0x61/0x70 [xfs]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff811723bb>] ?<br>
> do_sync_readv_writev+0xfb/0x140<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8118ae9d>] ?<br>
> d_obtain_alias+0x4d/0x160<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8108e120>] ?<br>
> autoremove_wake_function+0x0/0x40<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff812056b6>] ?<br>
> security_task_setgroups+0x16/0x20<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff81205356>] ?<br>
> security_file_permission+0x16/0x20<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8117347f>] ?<br>
> do_readv_writev+0xcf/0x1f0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa047f852>] ?<br>
> nfsd_setuser_and_check_port+0x62/0xb0 [nfsd]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff811735e6>] ?<br>
> vfs_writev+0x46/0x60<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa04813d7>] ?<br>
> nfsd_vfs_write+0x107/0x430 [nfsd]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8116fe22>] ?<br>
> dentry_open+0x52/0xc0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa04839fe>] ?<br>
> nfsd_open+0x13e/0x210 [nfsd]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa0483e87>] ?<br>
> nfsd_write+0xe7/0x100 [nfsd]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa048b7df>] ?<br>
> nfsd3_proc_write+0xaf/0x140 [nfsd]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa047c43e>] ?<br>
> nfsd_dispatch+0xfe/0x240 [nfsd]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa03f24d4>] ?<br>
> svc_process_common+0x344/0x640 [sunrpc]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8105dbc0>] ?<br>
> default_wake_function+0x0/0x20<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa03f2b10>] ?<br>
> svc_process+0x110/0x160 [sunrpc]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa047cb62>] ? nfsd+0xc2/0x160<br>
> [nfsd]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffffa047caa0>] ? nfsd+0x0/0x160<br>
> [nfsd]<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8108ddb6>] ? kthread+0x96/0xa0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100c1ca>] ? child_rip+0xa/0x20<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8108dd20>] ? kthread+0x0/0xa0<br>
> Dec 6 20:27:55 localhost kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20<br>
> Dec 6 20:27:55 localhost kernel: ---[ end trace e8b62253d4084e2b ]---<br>
><br>
> --<br>
> Ryan C. England<br>
</div></div>> Corvid Technologies &lt;<a href="http://www.corvidtec.com/" target="_blank">http://www.corvidtec.com/</a>&gt;<br>
<div class="im">> office: <a href="tel:704-799-6944%20x158" value="+17047996944">704-799-6944 x158</a><br>
> cell: <a href="tel:980-521-2297" value="+19805212297">980-521-2297</a><br>
<br>
</div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div>Ryan C. England</div><div><a href="http://www.corvidtec.com/" target="_blank">Corvid Technologies</a></div>office: 704-799-6944 x158<br>cell: 980-521-2297<br>
</div>