Re: Xfs Access to block zero exception and system crash

To: Sagar Borikar <sagar_borikar@xxxxxxxxxxxxxx>
Subject: Re: Xfs Access to block zero exception and system crash
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Sat, 28 Jun 2008 10:02:03 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <4864BD5D.1050202@xxxxxxxxxxxxxx>
Mail-followup-to: Sagar Borikar <sagar_borikar@xxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
References: <340C71CD25A7EB49BFA81AE8C839266701323BD8@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20080625084931.GI16257@xxxxxxxxxxxxxxxxxxxxx> <340C71CD25A7EB49BFA81AE8C839266701323BE8@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20080626070215.GI11558@disturbed> <4864BD5D.1050202@xxxxxxxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.17+20080114 (2008-01-14)
On Fri, Jun 27, 2008 at 03:43:49PM +0530, Sagar Borikar wrote:
> Dave Chinner wrote:
>> Yes, but all the same pattern of corruption, so it is likely
>> that it is one problem.
>> All I can suggest is working out a reproducible test case in your
>> development environment, attaching a debugger, and digging around
>> in memory when the problem is hit to find out exactly what
>> is corrupted. If you can't reproduce it or work out what is
>> occurring to trigger the problem, then we're not going to be able to
>> find the cause...
> Thanks Dave
> I did some experiments today with the corrupted filesystem.
> Setup: the NAS box contains one volume, /share, with 10 subdirectories.
> The first subdirectory, sh1, holds a 512MB file. A script continuously
> copies this file into subdirectories sh2 through sh10 simultaneously.
> The script looks like this:
> ....
> while [ 1 ]
> do
> cp $1 $2
> done
> uninterruptible sleep state continuously. Ran xfs_repair with the -n option
> on the filesystem mounted on the JBOD.
> Here is the output:
> entry "iozone_68.tst" in shortform directory 67108993 references free inode 67108995
> entry "iozone_68.tst" in shortform directory 100663425 references free inode 100663427
> entry "iozone_68.tst" in shortform directory 301990016 references free inode 301990019
> entry "iozone_68.tst" in shortform directory 335544448 references free inode 335544451
> entry "iozone_68.tst" in shortform directory 402653313 references free inode 402653318

And so on. There's a pattern here. Can you try to find out what
part of your workload is producing these errors?
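
To make the pattern easier to see, the xfs_repair lines quoted above can be
put through a small script. This is only a rough sketch, not part of any XFS
tooling: the function names are made up for illustration, and the regular
expression assumes the message format is exactly as quoted. It prints each
directory inode and free inode in hex, along with the difference between
them.

```python
import re

# Lines as quoted above from `xfs_repair -n`, with wrapped lines rejoined.
REPAIR_OUTPUT = """\
entry "iozone_68.tst" in shortform directory 67108993 references free inode 67108995
entry "iozone_68.tst" in shortform directory 100663425 references free inode 100663427
entry "iozone_68.tst" in shortform directory 301990016 references free inode 301990019
entry "iozone_68.tst" in shortform directory 335544448 references free inode 335544451
entry "iozone_68.tst" in shortform directory 402653313 references free inode 402653318
"""

# Assumed message format, taken from the quoted output only.
LINE_RE = re.compile(
    r'entry "(?P<name>[^"]+)" in shortform directory (?P<dir>\d+) '
    r'references free inode (?P<ino>\d+)'
)


def parse_repair_output(text):
    """Return (entry name, directory inode, free inode) for each match."""
    return [
        (m.group("name"), int(m.group("dir")), int(m.group("ino")))
        for m in LINE_RE.finditer(text)
    ]


def summarize(entries):
    """Return (name, dir inode in hex, free inode in hex, inode - dir)."""
    return [(name, hex(d), hex(i), i - d) for name, d, i in entries]


if __name__ == "__main__":
    for name, dir_hex, ino_hex, delta in summarize(parse_repair_output(REPAIR_OUTPUT)):
        print(f"{name}  dir={dir_hex}  inode={ino_hex}  delta={delta}")
```

In every line the entry name is the same, and the free inode sits only a few
inode numbers above the directory that references it. In hex, only the upper
digits differ between directories.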


Dave Chinner
