
Re: Hang in getdents

To: Karl Trygve Kalleberg <karltk@xxxxxxxxxx>
Subject: Re: Hang in getdents
From: Steve Lord <lord@xxxxxxx>
Date: Mon, 20 Nov 2000 10:07:00 -0600
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: Message from Karl Trygve Kalleberg <karltk@prosalg.no> of "Mon, 20 Nov 2000 16:31:39 +0100." <3A1943DB.842BD3AE@prosalg.no>
Sender: owner-linux-xfs@xxxxxxxxxxx
> 
> Hi.
> 
> 
> I run today's CVS snapshot (three hours ago), but the problem I'm
> experiencing has been with me as long as I've run XFS (about three
> weeks).
> 
> My XFS filesystem:
> /dev/hdc1             45030088  36350920   8679168  81% /mnt/guest
> 
> Sometimes, doing various operations on the file system leads to a freeze
> which takes a *long time* to thaw. Usually a few hours(!).
> 
> After doing some stracing, I've found that the problem seems to be with
> getdents(). I've not managed to correlate it to the number of files in
> the directory, nor to the directory's depth in the directory tree.
> 
> The freeze seems to be bound to the particular directory: if one command
> calling getdents() on a directory freezes, so will others when calling
> getdents() for the same directory. Calling getdents() for sister or
> parent directories works (ie, doesn't freeze).
> 
> I've not attempted calling getdents() for a child-directory of a frozen
> directory.
> 
> I'm running this on an x86 SMP box; I've not seen any advisory about xfs
> not being smp-safe. The kernel was built with gcc 2.95.2, which
> according to the FAQ might be a problem.
> 
> If you don't manage to reproduce the problem, I'd be willing to rebuild
> the kernel with egcs 2.91.66 just to eliminate that possibility.
> 
> Regards,
> 
> Karl T
> 

Can you tell us which user space you are using? Different versions of glibc
use different mechanisms to call getdents - including doing lseek backwards
in some cases. Can you confirm that it is one single getdents call which
hangs?
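One way to check that, assuming strace is available on your box: trace only the directory-reading syscalls with per-call timings. /mnt/guest is the filesystem from your report; DIR is only a placeholder for the directory that freezes.

```shell
# -T prints the time spent inside each syscall; -e limits the trace to
# the directory-reading calls (both getdents spellings, since the name
# varies by kernel/glibc) plus lseek, so a single stuck getdents call -
# or a backwards lseek issued by glibc - stands out immediately.
# DIR is a placeholder for the directory that freezes.
strace -T -e trace=getdents,getdents64,lseek ls /mnt/guest/DIR >/dev/null
```

If exactly one getdents line never returns, that confirms it is a single call hanging rather than glibc looping around the directory.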

You say this is not a permanent hang, which is odd - I would not expect a hang
to be something which recovers, unless you have a failing drive. Do you
get any console messages out during this time?

Did you build the filesystem from scratch on Linux, or did it come over
from an Irix system?

You say hang - is this a hang which is consuming cpu time (i.e. a loop
in the kernel), or is it just a hung process? The symptoms you describe
about other getdents calls locking are not surprising - xfs is probably
stuck somewhere holding a lock on an inode, and any other thread attempting
to access this inode will get stuck too.
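A quick way to tell the two cases apart with a standard procps ps; PID is a placeholder for the pid of the stuck process:

```shell
# STAT "R" with TIME still climbing points at a loop in the kernel;
# STAT "D" (uninterruptible sleep) with an unchanging WCHAN points at
# the process being blocked on a lock or on I/O.
# PID is a placeholder for the pid of the hung process.
ps -o pid,stat,wchan:25,time -p PID
```

Running it twice a few seconds apart shows whether TIME is advancing.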

I would like to see what happens if you rebuild with the other compiler,
there are too many variables in here for us to attempt to replicate the
problem - short of us getting a binary copy of the disk image!

One other thing you could do is dump out the directory contents using
xfs_db.

run ls -lid in the directory in question to get its inode number:

ls -lid .
33602412 drwxr-xr-x    2 root     bin         40960 Nov  8 16:15 .

run xfs_db on the filesystem (the -r is read only, you do not need to
unmount):

xfs_db -r /dev/xxxx

use the inode command to get to the inode in question:

xfs_db: inode 33602412

print the inode:

xfs_db: print
core.magic = 0x494e
core.mode = 040755
core.version = 1
core.format = 3 (btree)
core.nlinkv1 = 2
core.uid = 0
core.gid = 1
core.atime.sec = Mon Nov 20 15:38:55 2000
core.atime.nsec = 849055000
core.mtime.sec = Wed Nov  8 16:15:06 2000
core.mtime.nsec = 398148000
core.ctime.sec = Wed Nov  8 16:15:06 2000
core.ctime.nsec = 398148000
core.size = 40960
core.nblocks = 17
core.extsize = 0
core.nextents = 15
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.gen = 0
next_unlinked = null
u.bmbt.level = 1
u.bmbt.numrecs = 1
u.bmbt.keys[1] = [startoff] 1:[0]
u.bmbt.ptrs[1] = 1:2111173

You probably have a format similar to this - the end of the entry will look
different depending on the directory format. This is a large directory where
the actual entries are stored in other blocks. We can walk into the directory
structure using the addr command:

xfs_db: addr u.bmbt.ptrs[1]
xfs_db: print
magic = 0x424d4150
level = 0
numrecs = 15
leftsib = null
rightsib = null
recs[1-15] = [startoff,startblock,blockcount,extentflag] 1:[0,2102062,1,0] 
2:[1,2103642,1,0] 3:[2,2104887,1,0] 4:[3,2106922,1,0] 5:[4,2107330,1,0] 
6:[5,2110551,1,0] 7:[6,2114424,1,0] 8:[7,2117505,1,0] 9:[8,2122696,1,0] 
10:[9,2138354,1,0] 11:[8388608,2102687,1,0] 12:[8388609,2106916,2,0] 
13:[8388611,2111172,1,0] 14:[8388612,2112648,1,0] 15:[16777216,2106915,1,0]


In this case we have a tree with 15 leaf blocks in it; you can walk into the
leaf blocks by passing the startblock number for each record to the fsblock
command:

xfs_db: fsblock 2102062

We now have to tell xfs_db what type this is:

xfs_db: type dir2

and we can print it:

xfs_db: print
dhdr.magic = 0x58443244
dhdr.bestfree[0].offset = 0xff8
dhdr.bestfree[0].length = 0x8
dhdr.bestfree[1].offset = 0
dhdr.bestfree[1].length = 0
dhdr.bestfree[2].offset = 0
dhdr.bestfree[2].length = 0
du[0].inumber = 33602412
du[0].namelen = 1
du[0].name = "."
du[0].tag = 0x10
du[1].inumber = 128
du[1].namelen = 2
du[1].name = ".."
du[1].tag = 0x20
du[2].inumber = 33602413
du[2].namelen = 3
du[2].name = "X11"
du[2].tag = 0x30
...
du[187].inumber = 33644452
du[187].namelen = 2
du[187].name = "ee"
du[187].tag = 0xfe8
du[188].freetag = 0xffff
du[188].length = 0x8
du[188].tag = 0xff8

This would need repeating for each block in the list.
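Assuming your xfs_db supports the -c option for running commands non-interactively, the repetition can be scripted. The block numbers below are the startblock values from the recs[] line in the example above; substitute your own, and /dev/xxxx is still your filesystem device:

```shell
# Dump each leaf block of the directory in turn.  The numbers are the
# startblock values from the bmbt records printed earlier (example
# values); replace them and /dev/xxxx with your own.
for b in 2102062 2103642 2104887 2106922 2107330; do
    xfs_db -r -c "fsblock $b" -c "type dir2" -c print /dev/xxxx
done
```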

Note that you can record all of this to a file: run

log start filename

before the commands you want to capture, and

log stop

when you are done. You could then send us the output.

Steve


