
RE: Xfs Access to block zero exception and system crash

To: "Eric Sandeen" <sandeen@xxxxxxxxxxx>
Subject: RE: Xfs Access to block zero exception and system crash
From: "Sagar Borikar" <Sagar_Borikar@xxxxxxxxxxxxxx>
Date: Wed, 9 Jul 2008 22:12:51 -0700
Cc: <xfs@xxxxxxxxxxx>
In-reply-to: <340C71CD25A7EB49BFA81AE8C839266702A08F91@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <4872E0BC.6070400@xxxxxxxxxxxxxx> <4872E33E.3090107@xxxxxxxxxxx> <340C71CD25A7EB49BFA81AE8C839266702A08F91@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
Thread-index: AcjgrWReJLSDlJRQSj6EZpV6D5oEvQBNiDiQABhR0dA=
Thread-topic: Xfs Access to block zero exception and system crash

This could be a slight digression, but can you let me know why the
fragmentation factor goes to 99% almost immediately? I observed this on
both x86 and MIPS platforms. Also, to alleviate this issue, if I specify
allocsize=512m, what would the consequences be? The default allocsize
is 64k, right? Also, we are mounting the file system with the default
options.
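(For reference, a minimal sketch of the commands involved, assuming an XFS filesystem on a hypothetical /dev/sdb1 mounted at /mnt/data; allocsize is the real XFS mount option controlling the preferred preallocation size for buffered writes, and xfs_db's frag command reports the fragmentation factor being discussed:)

```shell
# Remount with a larger preallocation size (hypothetical device/mountpoint;
# requires root). A big allocsize reduces fragmentation for streaming
# writes at the cost of transiently over-reserving space per open file.
mount -t xfs -o allocsize=512m /dev/sdb1 /mnt/data

# Read-only fragmentation report on the underlying device:
xfs_db -r -c frag /dev/sdb1
```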


-----Original Message-----
From: xfs-bounce@xxxxxxxxxxx [mailto:xfs-bounce@xxxxxxxxxxx] On Behalf
Of Sagar Borikar
Sent: Wednesday, July 09, 2008 10:28 PM
To: Eric Sandeen
Cc: xfs@xxxxxxxxxxx
Subject: RE: Xfs Access to block zero exception and system crash

Sagar Borikar wrote:
> That's right, Eric, but I am still surprised why we should get a
> deadlock in this scenario, as it is a plain copy of a file into multiple
> directories.  Our customer is reporting a similar kind of lockup on our
> platform.

ok, I guess I had missed that, sorry.

> I do understand that we are chasing the access-to-block-zero
> exception and the XFS forced shutdown which I mentioned earlier.  But we
> also see quite a few smbd processes that are writing data to XFS in
> uninterruptible sleep state, and the system locks up too.

Ok; then the next step is probably to do sysrq-t and see where things
are stuck.  It might be better to see if you can reproduce w/o the
loopback file, too, since that's just another layer to go through that
might be changing things.
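(A minimal sketch of capturing that sysrq-t dump, assuming CONFIG_MAGIC_SYSRQ is enabled in the kernel; requires root:)

```shell
# Enable all sysrq functions, then dump the stack of every task
# to the kernel log, and look for the stuck smbd/XFS threads.
echo 1 > /proc/sys/kernel/sysrq
echo t > /proc/sysrq-trigger
dmesg | grep -B 1 -A 20 'smbd'
```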

<Sagar> I ran it on the actual device w/o a loopback file, and even there
I observed XFS transactions going into uninterruptible sleep state,
and the copies were stalled. I had to hard-reboot the system to bring
XFS out of that state, since a soft reboot didn't work; it was waiting
for the file system to be unmounted. I shall provide the sysrq-t output
later.

> So I thought the test which I am running could be pointing to a
> similar issue to the one we are observing on our platform. But does
> this indicate that the problem lies with x86 XFS too?

or maybe the vm ...

> Also, I presume that in the enterprise market this kind of
> simultaneous-write situation may happen.  Has anybody reported
> similar issues to you? As you observed it on x86 and the 2.6.24 kernel,
> could you say what the root cause of this would be?

I haven't really seen it before, that I recall, and at this point I
can't say for sure what it might be.


>     Sorry for asking so many questions at the same time :) But I am
> happy that you were able to see the deadlock on x86 on your setup with
> 2.6.24.
> Thanks
> Sagar
> Eric Sandeen wrote:
>> Sagar Borikar wrote:
>>> Hi Eric,
>>> Did you see any issues in your test? 
>> I got a deadlock but that's it; I don't think that's the bug you want
>> to chase...
>> -Eric
>>> Thanks
>>> Sagar
>>> Sagar Borikar wrote:
>>>> Eric Sandeen wrote:
>>>>> Sagar Borikar wrote:
>>>>>> Could you kindly try with my test? I presume you should see it
>>>>>> soon. I tried this on 2 different x86 systems, 2 times (after
>>>>>> rebooting the system), and I saw it every time.
>>>>> Sure.  Is there a reason you're doing this on a loopback file? It
>>>>> probably stresses the vm a bit more, and might get even trickier
>>>>> if the loopback file is sparse...
>>>> Initially I thought to do that since I didn't want a strict
>>>> allocation limit, instead allowing allocations to grow as needed
>>>> until the backing file runs out of free space, given the type of
>>>> test case I had. But then I dropped that plan and created a
>>>> non-sparse loopback device. There was no specific reason to use
>>>> loopback other than it being the simplest option.
>>>>> But anyway, on an x86_64 machine with 2G of memory and a
>>>>> non-sparse 10G loopback file, your test runs w/o problems,
>>>>> though the system does get sluggish.  I let it run a bit, then ran
>>>>> a check, and it found no problems; I'll run it overnight to see if
>>>>> anything turns up.
>>>> That will be great.  Thanks indeed.
>>>> Sagar
>>>>> -Eric
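(For reference on the sparse vs. non-sparse loopback distinction discussed above, a minimal sketch with hypothetical paths and a small illustrative size; a non-sparse backing file has all of its blocks written up front, while a sparse one allocates on demand, which stresses the VM differently and can hit ENOSPC mid-write:)

```shell
# Non-sparse: dd writes every block, so all space is allocated up front.
dd if=/dev/zero of=/tmp/xfs-backing.img bs=1M count=64

# Sparse: truncate only sets the apparent size; no blocks are allocated.
truncate -s 64M /tmp/xfs-sparse.img

# Same apparent size, very different actual disk usage:
du --block-size=1 /tmp/xfs-backing.img /tmp/xfs-sparse.img

# Attaching as a loop device and making the filesystem needs root:
#   losetup -f --show /tmp/xfs-backing.img   # e.g. prints /dev/loop0
#   mkfs.xfs /dev/loop0
```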
