xfs

Re: Xfs Access to block zero exception and system crash

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: Xfs Access to block zero exception and system crash
From: Sagar Borikar <sagar_borikar@xxxxxxxxxxxxxx>
Date: Mon, 07 Jul 2008 08:32:03 +0530
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Nathan Scott <nscott@xxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <487117FC.9090109@xxxxxxxxxxx>
Organization: PMC Sierra Inc
References: <486B01A6.4030104@xxxxxxxxxxxxxx> <20080702051337.GX29319@disturbed> <486B13AD.2010500@xxxxxxxxxxxxxx> <1214979191.6025.22.camel@xxxxxxxxxxxxxxxxxx> <20080702065652.GS14251@xxxxxxxxxxxxxxxxxxxxx> <486B6062.6040201@xxxxxxxxxxxxxx> <486C4F89.9030009@xxxxxxxxxxx> <486C6053.7010503@xxxxxxxxxxxxxx> <486CE9EA.90502@xxxxxxxxxxx> <486DF8F0.5010700@xxxxxxxxxxxxxx> <20080704122726.GG29319@disturbed> <340C71CD25A7EB49BFA81AE8C839266702997641@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <486E5F4D.1010009@xxxxxxxxxxx> <340C71CD25A7EB49BFA81AE8C839266702997658@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <486FA095.1050106@xxxxxxxxxxx> <340C71CD25A7EB49BFA81AE8C839266702A084A6@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <487117FC.9090109@xxxxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Thunderbird 2.0.0.14 (X11/20080421)


Eric Sandeen wrote:
Sagar Borikar wrote:
Sagar Borikar wrote:
The copy is of the same file to 30 different directories, and it is
basically an overwrite.

Here is the setup:

It's a JBOD with a volume size of 20 GB. The directories are empty, and
this is basically a continuous copy of the file to all thirty
directories. But surprisingly, none of the copies succeeds. All the copy
processes are in uninterruptible sleep state, and I have already attached
the xfs_repair log with the prep. As mentioned, this is with the 2.6.24
Fedora kernel.
It would probably be best to try a 2.6.26 kernel from rawhide to be sure
you're closest to the bleeding edge.

<Sagar> Sure, Eric, but I reran the test and got similar errors with the
2.6.24 kernel on x86. I am still confused by the results that I see with
the 2.6.24 kernel on the x86 machine. The used size shown by ls is far
larger than the actual size. Here is the log from the system:

[root@lab00 ~/test_partition]# ls -lSah
total 202M
-rw-r--r--  1 root root 202M Jul  4 14:06 original ---> this is the file which I copy.
drwxr-x--- 65 root root  12K Jul  6 21:57 ..
-rwxr-xr-x  1 root root  189 Jul  4 16:31 runall
-rwxr-xr-x  1 root root   50 Jul  4 16:32 copy
drwxr-xr-x  2 root root   45 Jul  6 22:07 .
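
One way to check whether the size reported by ls differs from the space actually allocated on disk is to compare a file's apparent size with its allocated blocks. The following is only a minimal sketch, assuming GNU coreutils stat (the -c format with %s, %b and %B); the function name size_report is made up for illustration and is not part of the scripts in this thread.

```shell
#!/bin/sh
# size_report FILE: print the apparent size and the allocated size in bytes.
# Assumption: GNU stat, where %s is the file size in bytes, %b the number of
# allocated blocks and %B the size in bytes of each block counted by %b.
# Hypothetical helper for illustration only.
size_report() {
    apparent=$(stat -c '%s' "$1") || return 1
    blocks=$(stat -c '%b' "$1") || return 1
    blksize=$(stat -c '%B' "$1") || return 1
    echo "apparent=$apparent allocated=$((blocks * blksize))"
}
```

Calling size_report original in the test directory would print both numbers side by side; a large gap between them could point at preallocation or sparse regions rather than at the listing itself.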

It'd be great if you provided these actual scripts so we don't have to
guess at what you're doing or work backwards from the repair output :)
Attaching the scripts with this mail.
The dmesg log doesn't give any information. Here is the XFS-related
info:

XFS mounting filesystem loop0
Ending clean XFS mount for filesystem: loop0
These messages only report that XFS mounted cleanly. There is no
exception from XFS.

and nothing else of interest either?
Not really. That's why it was surprising. Nothing showed up even after setting the error_level to 11.
The filesystem has become completely sluggish, and the response time has
increased to 3-4 minutes for every command. Not a single copy has completed, and all the copy processes are sleeping continuously.
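
The stalled copy processes described above could be listed together with the kernel function they are waiting in. This is a minimal sketch, assuming a procps-style ps that accepts -eo pid,stat,wchan:32,args; the function names d_state_filter and list_blocked are hypothetical.

```shell
#!/bin/sh
# d_state_filter: read "PID STAT ..." lines on stdin and print only those
# whose STAT field starts with D (uninterruptible sleep).
d_state_filter() {
    awk '$2 ~ /^D/'
}

# list_blocked: show blocked processes and their wait channel.
# Assumption: procps ps; wchan may print "-" when kernel symbols are
# not available. The ps header line is dropped by the filter because
# its second field is "STAT".
list_blocked() {
    ps -eo pid,stat,wchan:32,args | d_state_filter
}
```

Running list_blocked while the copies are hung would show which processes are in D state; if the wait channel column is populated, it may hint at where they are blocked.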

And how did you recover from this; did you power-cycle the box?
There was no failure. Only the processes were stalled. The system was operative.
-Eric
#! /bin/sh
# copy: overwrite destination $2 with source $1 in an endless loop.

while true
do
    cp -f "$1" "$2"
done





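#! /bin/sh
# runall: start two background copy loops for each of 20 indices.

for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
do
    mkdir -p "testdir_$i"
    ./copy testfile "testdir_$i" &
    # Note: this uses $1, not $i. When runall is started without arguments,
    # the path expands to testdir_/testfile and nothing is removed.
    rm -Rf testdir_$1/testfile
    ./copy testfile "testfile_$i" &
done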