I took another shot at this after studying the code more closely, and I
now have a version that doesn't crash. I'm attaching the diff and the
new xfs_buf.c file.
However, with the new version the io_list passed to
__xfs_buf_delwri_submit() never gets populated. I haven't completely
familiarized myself with what callers do with this list, but callers
initialize it, pass a pointer to __xfs_buf_delwri_submit(), and expect
a populated list back.
Nonetheless, I ran my experiments after building and loading the XFS
module with this change. Strangely enough, I see performance drop by
~25% compared to the original XFS module.
On Fri, Jun 5, 2015 at 10:31 AM, Shrinand Javadekar
<shrinand@xxxxxxxxxxxxxx> wrote:
> The file xfs_buf.c seems to have gone through a few revisions. I tried
> to understand the code and make the changes in the 3.16.0 kernel but
> it didn't quite work out. XFS crashed while unmounting the disks.
>
> On Thu, Jun 4, 2015 at 5:59 PM, Shrinand Javadekar
> <shrinand@xxxxxxxxxxxxxx> wrote:
>> Dave,
>>
>> I believe this code is slightly different from the one I have (kernel
>> v3.16.0). Can you give me a patch for kernel v3.16.0? I have a working
>> setup to try this out.
>>
>> http://lxr.free-electrons.com/source/fs/xfs/xfs_buf.c?v=3.16
>>
>> Thanks in advance.
>> -Shri
>>
>>
>> On Wed, Jun 3, 2015 at 11:23 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>> On Thu, Jun 04, 2015 at 12:03:39PM +1000, Dave Chinner wrote:
>>>> Fixing this requires a tweak to the algorithm in
>>>> __xfs_buf_delwri_submit() so that we don't lock an entire list of
>>>> thousands of IOs before starting submission. In the mean time,
>>>> reducing the number of AGs will reduce the impact of this because
>>>> the delayed write submission code will skip buffers that are already
>>>> locked or pinned in memory, and hence an AG under modification at
>>>> the time submission occurs will be skipped by the delwri code.
>>>
>>> You might like to try the patch below on a test machine to see if
>>> it helps with the problem.
>>>
>>> Cheers,
>>>
>>> Dave.
>>> --
>>> Dave Chinner
>>> david@xxxxxxxxxxxxx
>>>
>>> xfs: reduce lock hold times in buffer writeback
>>>
>>> From: Dave Chinner <dchinner@xxxxxxxxxx>
>>>
>>> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
>>> ---
>>> fs/xfs/xfs_buf.c | 80 ++++++++++++++++++++++++++++++++++++++++++--------------
>>> 1 file changed, 61 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
>>> index bbe4e9e..8d2cc36 100644
>>> --- a/fs/xfs/xfs_buf.c
>>> +++ b/fs/xfs/xfs_buf.c
>>> @@ -1768,15 +1768,63 @@ xfs_buf_cmp(
>>> return 0;
>>> }
>>>
>>> +static void
>>> +xfs_buf_delwri_submit_buffers(
>>> + struct list_head *buffer_list,
>>> + struct list_head *io_list,
>>> + bool wait)
>>> +{
>>> + struct xfs_buf *bp, *n;
>>> + struct blk_plug plug;
>>> +
>>> + blk_start_plug(&plug);
>>> + list_for_each_entry_safe(bp, n, buffer_list, b_list) {
>>> + bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC | XBF_WRITE_FAIL);
>>> + bp->b_flags |= XBF_WRITE | XBF_ASYNC;
>>> +
>>> + /*
>>> + * We do all IO submission async. This means if we need
>>> + * to wait for IO completion we need to take an extra
>>> + * reference so the buffer is still valid on the other
>>> + * side. We need to move the buffer onto the io_list
>>> + * at this point so the caller can still access it.
>>> + */
>>> + if (wait) {
>>> + xfs_buf_hold(bp);
>>> + list_move_tail(&bp->b_list, io_list);
>>> + } else
>>> + list_del_init(&bp->b_list);
>>> +
>>> + xfs_buf_submit(bp);
>>> + }
>>> + blk_finish_plug(&plug);
>>> +}
>>> +
>>> +/*
>>> + * submit buffers for write.
>>> + *
>>> + * When we have a large buffer list, we do not want to hold all the buffers
>>> + * locked while we block on the request queue waiting for IO dispatch. To
>>> + * avoid this problem, we lock and submit buffers in groups of 50, thereby
>>> + * minimising the lock hold times for lists which may contain thousands of
>>> + * objects.
>>> + *
>>> + * To do this, we sort the buffer list before we walk the list to lock and
>>> + * submit buffers, and we plug and unplug around each group of buffers we
>>> + * submit.
>>> + */
>>> static int
>>> __xfs_buf_delwri_submit(
>>> struct list_head *buffer_list,
>>> struct list_head *io_list,
>>> bool wait)
>>> {
>>> - struct blk_plug plug;
>>> struct xfs_buf *bp, *n;
>>> + LIST_HEAD(submit_list);
>>> int pinned = 0;
>>> + int count = 0;
>>> +
>>> + list_sort(NULL, buffer_list, xfs_buf_cmp);
>>>
>>> list_for_each_entry_safe(bp, n, buffer_list, b_list) {
>>> if (!wait) {
>>> @@ -1802,30 +1850,24 @@ __xfs_buf_delwri_submit(
>>> continue;
>>> }
>>>
>>> - list_move_tail(&bp->b_list, io_list);
>>> + list_move_tail(&bp->b_list, &submit_list);
>>> trace_xfs_buf_delwri_split(bp, _RET_IP_);
>>> - }
>>> -
>>> - list_sort(NULL, io_list, xfs_buf_cmp);
>>> -
>>> - blk_start_plug(&plug);
>>> - list_for_each_entry_safe(bp, n, io_list, b_list) {
>>> - bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC | XBF_WRITE_FAIL);
>>> - bp->b_flags |= XBF_WRITE | XBF_ASYNC;
>>>
>>> /*
>>> - * we do all Io submission async. This means if we need to wait
>>> - * for IO completion we need to take an extra reference so the
>>> - * buffer is still valid on the other side.
>>> + * We do small batches of IO submission to minimise lock hold
>>> + * time and unnecessary writeback of buffers that are hot and
>>> + * would otherwise be relogged and hence not require immediate
>>> + * writeback.
>>> */
>>> - if (wait)
>>> - xfs_buf_hold(bp);
>>> - else
>>> - list_del_init(&bp->b_list);
>>> + if (count++ < 50)
>>> + continue;
>>>
>>> - xfs_buf_submit(bp);
>>> + xfs_buf_delwri_submit_buffers(&submit_list, io_list, wait);
>>> + count = 0;
>>> }
>>> - blk_finish_plug(&plug);
>>> +
>>> + if (!list_empty(&submit_list))
>>> + xfs_buf_delwri_submit_buffers(&submit_list, io_list, wait);
>>>
>>> return pinned;
>>> }
xfs_diff
Description: Binary data
xfs_buf.c
Description: Text Data