xfs
[Top] [All Lists]

Re: [PATCH] Use atomic_t and wait_event to track dquot pincount

To: Peter Leckie <pleckie@xxxxxxx>
Subject: Re: [PATCH] Use atomic_t and wait_event to track dquot pincount
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 24 Sep 2008 17:43:17 +1000
Cc: xfs@xxxxxxxxxxx, xfs-dev@xxxxxxx
In-reply-to: <48D9E3E2.3040504@xxxxxxx>
Mail-followup-to: Peter Leckie <pleckie@xxxxxxx>, xfs@xxxxxxxxxxx, xfs-dev@xxxxxxx
References: <48D9C1DD.6030607@xxxxxxx> <20080924060508.GH5448@disturbed> <48D9E3E2.3040504@xxxxxxx>
User-agent: Mutt/1.5.18 (2008-05-17)
On Wed, Sep 24, 2008 at 04:53:22PM +1000, Peter Leckie wrote:
> Dave Chinner wrote:
>> [ Pete, please wrap your text at 68-72 columns]
>>   
> Ok will do in future
>> On Wed, Sep 24, 2008 at 02:28:13PM +1000, Peter Leckie wrote:
>>   
>>> The reason the lsn had changed was due to it not being initialized
>>> by  the time a copy of the lsn was taken in xfs_qm_dqflush().
>>> The lsn was then latter updated causing the test  in
>>> xfs_qm_dqflush_done() to fail.
>>>
>>> Synchronizations between the 2 functions is done by the pincount
>>> in  struct xfs_dquot_t  and xfs_qm_dqflush() calls
>>> xfs_qm_dqunpin_wait() to wait for the pincount == 0. However after
>>> been woken up it does not validate the pincount is actually 0,
>>>     
>>
>> Sure - that's because we only ever send a wakeup when the pin count
>> falls to zero. Because we have to be holding the dquot lock when we
>> either pin a dquot or wait for it to be unpinned, the act of waiting
>> for it to be unpinned with the dquot lock held guarantees that it
>> is not pinned when we wake up.
>>
>> IOWs, the pin count cannot be incremented while we are waiting for
>> it to be unpinned, and hence it must be zero when we are woken......
>>
>>   
>>> allowing a false wake up by the scheduler to  cause
>>> xfs_qm_dqflush() to prematurely start processing the dquot.
>>>     
>>
>> .... which means I can't see how that would happen...
>>
>> What am I missing here?
>>   
> Yeah it's quite intriguing  isn't it, I added the pincount to the
> dquot tracing code and sure enough it's 1 all the way through
> xfs_qm_dqflush() I can tell you that none of the XFS code is
> causing it to wake up.

Only the XFS code can cause it to be woken....

> I was going to add some tracing code to
> the scheduler to determine who is causing us to wake up however
> I had other priorities to work on.
>
> Reading the code it appears things like a threads dying or a
> system suspending can cause it.

Wait queues are not affected by such events when they are
configured to be uninterruptable. The sv_t:

#define sv_wait(sv, pri, lock, s) \
        _sv_wait(sv, lock, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT)

Is as uninterruptible as it comes. Which means only an explicit
wakeup will cause any waiters to wake up.

> However none of these had
> happened, after reading other examples in the Linux kernel
> it appeared pretty standard to always re-evaluate the condition
> after being woken so I suspect that Linux simply does not
> guarantee that we will only be woken by the thread calling
> wake_up on  us.

Not true - wake_up() will only wake tasks on the wait queue
it was called for.

> Any insight into this would be much appreciated as I'm also very curious
> as to why this is happening.

Assert fail the machine when a dquot with a non-zero pincount is
woken in xfs_qm_dqunpin_wait(). If the assert trips, then we need
to work out how the pin count is getting elevated while we have
a waiter....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>