
Re: Xfs Access to block zero exception and system crash

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: Xfs Access to block zero exception and system crash
From: Sagar Borikar <sagar_borikar@xxxxxxxxxxxxxx>
Date: Mon, 07 Jul 2008 09:12:11 +0530
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Nathan Scott <nscott@xxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <48718BF0.2040700@xxxxxxxxxxx>
Organization: PMC Sierra Inc
References: <486B01A6.4030104@xxxxxxxxxxxxxx> <20080702051337.GX29319@disturbed> <486B13AD.2010500@xxxxxxxxxxxxxx> <1214979191.6025.22.camel@xxxxxxxxxxxxxxxxxx> <20080702065652.GS14251@xxxxxxxxxxxxxxxxxxxxx> <486B6062.6040201@xxxxxxxxxxxxxx> <486C4F89.9030009@xxxxxxxxxxx> <486C6053.7010503@xxxxxxxxxxxxxx> <486CE9EA.90502@xxxxxxxxxxx> <486DF8F0.5010700@xxxxxxxxxxxxxx> <20080704122726.GG29319@disturbed> <340C71CD25A7EB49BFA81AE8C839266702997641@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <486E5F4D.1010009@xxxxxxxxxxx> <340C71CD25A7EB49BFA81AE8C839266702997658@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <486FA095.1050106@xxxxxxxxxxx> <340C71CD25A7EB49BFA81AE8C839266702A084A6@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <487117FC.9090109@xxxxxxxxxxx> <4871872B.9060107@xxxxxxxxxxxxxx> <487187D2.8080105@xxxxxxxxxxx> <4871885B.6090208@xxxxxxxxxxxxxx> <48718977.1090005@xxxxxxxxxxx> <48718AB6.80709@xxxxxxxxxxxxxx> <48718BF0.2040700@xxxxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Thunderbird (X11/20080421)

Eric Sandeen wrote:
Sagar Borikar wrote:

All the copies are pending and the file sizes in those directories are constant. It is not
Because the processes are in D state, the file system is busy and I can't unmount

Understood.  It looks like you've deadlocked somewhere.  But, this is
not the problem you are really trying to solve, right?  You just were
trying to recreate the mips problem on x86?
That's right. The intention behind testing on 2.6.24 was to check whether we could reproduce the failure on x86, which is considered more robust. If the failure reproduces there, the issue is likely in XFS itself; if the test passes, we can backport this kernel to MIPS (which I am doing anyway with your patches). But I hit a similar deadlock on MIPS,
along with the exceptions I posted earlier.

If you want, do a sysrq-t to get traces of all those cp's to see where
they're stuck, but this probably isn't getting you much closer to
solving the original problem.

I'll keep you posted.
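A minimal sketch of the check suggested above, assuming a Linux host with procfs mounted at /proc. It lists tasks in uninterruptible sleep ("D" state, like the stuck cp's) by parsing /proc/<pid>/stat, and it can request the same all-task dump as sysrq-t by writing "t" to /proc/sysrq-trigger (root only; the traces appear in the kernel log). The function names are hypothetical, chosen for illustration, and are not part of any existing tool.

```python
import os


def task_state(stat_text: str) -> str:
    # /proc/<pid>/stat looks like "pid (comm) state ...". The comm field may
    # itself contain spaces or ')', so split after the LAST ')' instead of
    # splitting the whole line on whitespace.
    rest = stat_text[stat_text.rindex(")") + 1:]
    return rest.split()[0]


def tasks_in_d_state(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Return sorted (pid, comm) pairs for processes whose state is 'D'.

    Only /proc/<pid>/stat is read, i.e. the main thread of each process,
    which is enough for single-threaded tools such as cp.
    """
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # skip "self", "meminfo", etc.
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                text = f.read()
        except OSError:
            # The process exited between listdir() and open().
            continue
        comm = text[text.index("(") + 1:text.rindex(")")]
        if task_state(text) == "D":
            stuck.append((int(entry), comm))
    return sorted(stuck)


def dump_all_task_traces() -> None:
    # Same effect as Alt-SysRq-t: the kernel logs a stack trace for every
    # task. Requires root; read the result with dmesg.
    with open("/proc/sysrq-trigger", "w") as f:
        f.write("t")
```

Running `tasks_in_d_state()` first shows which cp processes are stuck; `dump_all_task_traces()` then records where each one is blocked, which is the information a sysrq-t report would give.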
(BTW: is this the exact same testcase that led to the block 0 access on
mips which started this thread?)

OK. Initially our multi-client iozone stress test used to fail. Because it took 2-3 days to reproduce the issue, I ran the test standalone on MIPS and observed failures similar to those in the multi-client test. The test is exactly what I run in the multi-client iozone test over the network, so I concluded that if we fix the system to pass my test case, we can then try the iozone test with that fix. Now, on x86 with 2.6.24, I see a similar deadlock, but the system stays responsive and there are no lockups or exceptions. Do you observe similar failures on x86 in your setup? Also, do you think the issues I am seeing on x86 and MIPS come from the
same source?

