[regression] stack overflow in xfs_buf_iodone_callbacks
Dave Chinner
david at fromorbit.com
Thu Jun 21 04:18:03 CDT 2012
Folks,
I just had a stack overflow in the delayed write buffer error
handling with a shut down filesystem:
.....
[ 20.712744] [<ffffffff81448023>] xfs_buf_iodone_work+0x23/0x50
[ 20.712744] [<ffffffff814481a0>] xfs_buf_ioend+0x70/0x180
[ 20.712744] [<ffffffff814484c5>] _xfs_buf_ioend+0x25/0x30
[ 20.712744] [<ffffffff81448788>] __xfs_buf_iorequest+0x98/0x130
[ 20.712744] [<ffffffff81448836>] xfs_buf_iorequest+0x16/0x20
[ 20.712744] [<ffffffff81448945>] xfs_bdstrat_cb+0x65/0x110
[ 20.712744] [<ffffffff814b9d7c>] xfs_buf_iodone_callbacks+0x11c/0x290
[ 20.712744] [<ffffffff81448023>] xfs_buf_iodone_work+0x23/0x50
[ 20.712744] [<ffffffff814481a0>] xfs_buf_ioend+0x70/0x180
[ 20.712744] [<ffffffff814484c5>] _xfs_buf_ioend+0x25/0x30
[ 20.712744] [<ffffffff81448788>] __xfs_buf_iorequest+0x98/0x130
[ 20.712744] [<ffffffff81448836>] xfs_buf_iorequest+0x16/0x20
[ 20.712744] [<ffffffff81448945>] xfs_bdstrat_cb+0x65/0x110
[ 20.712744] [<ffffffff814b9d7c>] xfs_buf_iodone_callbacks+0x11c/0x290
[ 20.712744] [<ffffffff81448023>] xfs_buf_iodone_work+0x23/0x50
[ 20.712744] [<ffffffff814481a0>] xfs_buf_ioend+0x70/0x180
[ 20.712744] [<ffffffff814484c5>] _xfs_buf_ioend+0x25/0x30
[ 20.712744] [<ffffffff81448788>] __xfs_buf_iorequest+0x98/0x130
[ 20.712744] [<ffffffff81448836>] xfs_buf_iorequest+0x16/0x20
[ 20.712744] [<ffffffff81448945>] xfs_bdstrat_cb+0x65/0x110
[ 20.712744] [<ffffffff814b9d7c>] xfs_buf_iodone_callbacks+0x11c/0x290
[ 20.712744] [<ffffffff81448023>] xfs_buf_iodone_work+0x23/0x50
[ 20.712744] [<ffffffff814481a0>] xfs_buf_ioend+0x70/0x180
[ 20.712744] [<ffffffff814484c5>] _xfs_buf_ioend+0x25/0x30
[ 20.712744] [<ffffffff81448788>] __xfs_buf_iorequest+0x98/0x130
[ 20.712744] [<ffffffff81448836>] xfs_buf_iorequest+0x16/0x20
[ 20.712744] [<ffffffff81448945>] xfs_bdstrat_cb+0x65/0x110
[ 20.712744] [<ffffffff81448c39>] __xfs_buf_delwri_submit+0x249/0x280
[ 20.712744] [<ffffffff81449920>] xfs_buf_delwri_submit_nowait+0x20/0x30
[ 20.712744] [<ffffffff814bc43e>] xfsaild+0x21e/0x750
[ 20.712744] [<ffffffff810a0472>] kthread+0xa2/0xb0
[ 20.712744] [<ffffffff81b83c64>] kernel_thread_helper+0x4/0x10
Basically, the commit:
43ff212 xfs: on-stack delayed write buffer lists
took away the delay in resubmitting metadata buffers that have
had a write error, and so the xfsbdstrat() resubmission immediately
errors out on the shutdown flag and calls the I/O completion for the
buffer. That runs xfs_buf_iodone_callbacks(), which calls
xfs_bdstrat_cb(), which errors out on the shutdown flag and calls
I/O completion again, and around it goes in a spiral of death.
I did flag the change to an immediate xfsbdstrat() call as a problem
in review, and mentioned a possible solution to the problem, but it
looks like it fell through the cracks:
http://oss.sgi.com/archives/xfs/2012-04/msg00760.html
"This will just resubmit the IO immediately after it is
failed, while previously it will only be pushed again after
it ages out (15s later). Perhaps it can just be left to be
pushed by the aild next time it passes over it?"
That would definitely prevent the Spiral of Stack Doom that I've
just seen....
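To make the difference between the two behaviours concrete, here is a
toy model in plain C. It is not XFS code: every name in it (toy_buf,
toy_submit, toy_iodone_immediate, toy_iodone_deferred, toy_max_depth)
is hypothetical, and the depth_limit field exists only so that the
model terminates instead of overflowing its own stack. It assumes
submission on a shut down filesystem fails synchronously and runs
completion in the caller's context, as the trace above suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical, none are kernel APIs. */
struct toy_buf {
    bool shutdown;     /* stands in for the filesystem shutdown flag */
    bool requeued;     /* set when the buffer is left for a later push */
    int  depth;        /* current nesting depth of completion calls */
    int  max_depth;    /* deepest nesting observed so far */
    int  depth_limit;  /* artificial cap so the model terminates */
};

static void toy_iodone_immediate(struct toy_buf *bp);

/* Submission: on shutdown, fail at once and complete in the caller's context. */
static void toy_submit(struct toy_buf *bp)
{
    if (bp->shutdown)
        toy_iodone_immediate(bp);
}

/* Completion that resubmits the failed buffer straight away. */
static void toy_iodone_immediate(struct toy_buf *bp)
{
    bp->depth++;
    if (bp->depth > bp->max_depth)
        bp->max_depth = bp->depth;
    if (bp->depth < bp->depth_limit)
        toy_submit(bp);            /* nests one level deeper each time */
    bp->depth--;
}

/* Completion that leaves the buffer to be pushed again later. */
static void toy_iodone_deferred(struct toy_buf *bp)
{
    bp->depth++;
    if (bp->depth > bp->max_depth)
        bp->max_depth = bp->depth;
    bp->requeued = true;           /* no resubmission from this context */
    bp->depth--;
}

/* Run one failed write through either policy and report the nesting depth. */
int toy_max_depth(bool deferred, int limit)
{
    struct toy_buf bp = { .shutdown = true, .depth_limit = limit };

    if (deferred)
        toy_iodone_deferred(&bp);
    else
        toy_submit(&bp);
    return bp.max_depth;
}
```

With the immediate policy the nesting depth in this model grows until
it hits whatever cap is supplied, which is the shape of the repeating
frames in the trace. With the deferred policy completion runs once
and returns, so the depth stays at one regardless of the cap.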
I don't have time to come up with a fix for this right now, but it
needs to be fixed before 3.5 releases. I don't have time because I'm
going to be AFK next week, so I'd appreciate it if someone could
look at fixing this in the meantime?
Cheers,
Dave.
--
Dave Chinner
david at fromorbit.com