Sure Eric, I'll keep you posted with the results without the loopback file.
When you say that the deadlock could be due to the vm, do you mean a lack
of memory? I checked meminfo and found that sufficient Buffers and
Committed_AS were present when XFS stalled.
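
For comparing memory state before and during a stall, the meminfo check
can be scripted. This is only a minimal sketch: it assumes the standard
"Name: value kB" layout of /proc/meminfo, and the set of fields printed
(including Dirty and Writeback, which were not discussed above) is an
assumption about what is worth watching, not something established here.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


# Hypothetical default selection of fields; adjust as needed.
DEFAULT_FIELDS = ("MemFree", "Buffers", "Cached", "Dirty",
                  "Writeback", "Committed_AS")


def summarize(info, fields=DEFAULT_FIELDS):
    """Return only the selected fields; missing ones map to None."""
    return {field: info.get(field) for field in fields}


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        for field, value in summarize(parse_meminfo(f.read())).items():
            print(field, value)
```

Running it once on an idle system and again while the copy test is
stalled gives two snapshots that can be posted side by side.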
Thanks
Sagar
Sagar Borikar wrote:
> That's right Eric, but I am still surprised that we get a deadlock
> in this scenario, as it is a plain copy of a file into multiple
> directories. Our customer is reporting a similar kind of lockup on our
> platform.
ok, I guess I had missed that, sorry.
> I do understand that we are chasing the access to block zero exception
> and the XFS forced shutdown which I mentioned earlier. But we also see
> quite a few smbd processes that are writing data to XFS sitting in
> uninterruptible sleep state, and the system locks up too.
Ok; then the next step is probably to do sysrq-t and see where things
are stuck. It might be better to see if you can reproduce w/o the
loopback file, too, since that's just another layer to go through that
might be changing things.
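
As a rough sketch of that step: the snippet below lists tasks currently
in uninterruptible sleep (state D) by reading /proc/<pid>/stat, and can
request the same task dump as sysrq-t by writing "t" to
/proc/sysrq-trigger. The function names are made up for illustration;
the trigger needs root and sysrq enabled, and the dump itself lands in
the kernel log (dmesg), not in the script's output.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first "(" and the last ")".
    """
    lpar = stat_text.index("(")
    rpar = stat_text.rindex(")")
    pid = int(stat_text[:lpar].strip())
    comm = stat_text[lpar + 1:rpar]
    state = stat_text[rpar + 1:].split()[0]
    return pid, comm, state


def tasks_in_d_state(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for tasks in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


def trigger_sysrq_t(path="/proc/sysrq-trigger"):
    """Ask the kernel to dump all task states to the kernel log."""
    with open(path, "w") as f:
        f.write("t")


if __name__ == "__main__":
    for pid, comm in tasks_in_d_state():
        print(pid, comm)
    # Uncomment to request the full task dump (root only):
    # trigger_sysrq_t()
```

The D-state list only shows which processes are stuck; the stack traces
from the sysrq-t dump are what show where they are stuck.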
> So I thought
> the test which I am running could be pointing to an issue similar to
> the one we are observing on our platform. But does this indicate that
> the problem lies with x86 XFS too?
or maybe the vm ...
> Also, I presume this kind of simultaneous write situation may happen
> in the enterprise market. Has anybody reported similar issues to you?
> As you observed it on x86 with a 2.6.24 kernel, could you say what the
> root cause of this would be?
Haven't really seen it before that I recall, and at this point can't say
for sure what it might be.
-Eric
> Sorry for asking so many questions at once :) But I am happy that
> you were able to see the deadlock on x86 on your setup with 2.6.24.
>
> Thanks
> Sagar
>
>
> Eric Sandeen wrote:
>> Sagar Borikar wrote:
>>
>>> Hi Eric,
>>>
>>> Did you see any issues in your test?
>>>
>> I got a deadlock but that's it; I don't think that's the bug you want
>> to chase...
>>
>>
>> -Eric
>>
>>
>>> Thanks
>>> Sagar
>>>
>>>
>>> Sagar Borikar wrote:
>>>
>>>> Eric Sandeen wrote:
>>>>
>>>>> Sagar Borikar wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Could you kindly try with my test? I presume you should see
>>>>>> failure soon. I tried this on
>>>>>> 2 different x86 systems 2 times ( after rebooting the system )
>>>>>> and I saw it every time.
>>>>>>
>>>>>>
>>>>> Sure. Is there a reason you're doing this on a loopback file?
>>>>> That probably stresses the vm a bit more, and might get even
>>>>> trickier if the loopback file is sparse...
>>>>>
>>>>>
>>>> Initially I planned to do that, since I didn't want a strict
>>>> allocation limit but rather wanted allocations to grow as needed
>>>> until the backing filesystem ran out of free space, because of the
>>>> type of test case I had. But then I dropped the plan and created a
>>>> non-sparse loopback device. There was no specific reason to use
>>>> loopback other than that it was the simplest option.
>>>>
>>>>> But anyway, on an x86_64 machine with 2G of memory and a
>>>>> non-sparse 10G loopback file on 2.6.24.7-92.fc8, your test runs
>>>>> w/o problems for me, though the system does get sluggish. I let
>>>>> it run a bit then ran repair and it found no problems, I'll run it
>>>>> overnight to see if anything else turns up.
>>>>>
>>>>>
>>>> That will be great. Thanks indeed.
>>>> Sagar
>>>>
>>>>
>>>>> -Eric
>>>>>
>>>>>
>>
>