
Re: Missing Files

To: Ted Kline <jtk@xxxxxxx>
Subject: Re: Missing Files
From: Steve Lord <lord@xxxxxxx>
Date: Mon, 03 Apr 2000 08:46:00 -0500
Cc: limhk@xxxxxxxxxxxxxxxxxxxxxx (Hway Kiong Lim), linux-xfs@xxxxxxxxxxx
In-reply-to: Your message of "Sat, 01 Apr 2000 15:59:06 CST"
Sender: owner-linux-xfs@xxxxxxxxxxx
> > 
> > Hi,
> > 
> >     Using the following test.sh script :
> > 
> > #!/bin/sh
> > 
> > counter=1000
> > while [ $counter -le 2000 ]
> > do
> > 
> >   echo "File number $counter" >> file$counter
> >   counter=$(( $counter + 1))
> > done
> > 
> > I try to create 1000 (1001 to be precise) files in an xfs mounted directory.
> > This is what happens:
> > 
> > beauty:/mnt/test# sh /tmp/test.sh
> > beauty:/mnt/test# ls | wc
> >     995     995    8955
> > beauty:/mnt/test# rm *
> > beauty:/mnt/test# ls
> > file1144  file1291  file1438  file1585  file1732  file1879
> > beauty:/mnt/test# ls
> > 
> 
> This looks very much like a bug we found in glibc in the getdents syscall
> interface routine having to do with d_off values in the dirent structure
> using bit 2^31, and getdents64 not using lseek64..  Originally it showed
> up when running a 2.3 kernel as a client to an NFS server exporting
> an XFS filesystem.  I was thinking we'd defaulted to dir2 format, and
> that should've kept us from seeing this problem, looks like more
> digging is required..
> 


Ted, this is caused by something in the dir2 handling of the d_off field
in the dirent structure. We are indeed hitting the scenario where the
glibc getdents code does a seek backwards. The d_off field is supposed to
be the offset of the following directory entry. However, running strace
on an ls of a large directory shows that ls seeks to a specific offset,
but the next getdents call comes back starting one record past the entry
that offset should lead to - hence we skip one. These are not real byte
offsets, of course.

I modified the script to this:

#!/bin/sh

counter=0
while [ $counter -le 2000 ]
do

  echo "File number $counter" >> long-named-file$counter
  counter=$(( $counter + 1))
done

Here is a snapshot of strace output:

{d_ino=2743392, d_off=6638, d_reclen=32, d_name="long-named-file1644"}
{d_ino=2743393, d_off=6642, d_reclen=32, d_name="long-named-file1645"}
{d_ino=2743394, d_off=6646, d_reclen=32, d_name="long-named-file1646"}
{d_ino=2743395, d_off=6650, d_reclen=32, d_name="long-named-file1647"}
{d_ino=2743396, d_off=6654, d_reclen=32, d_name="long-named-file1648"}
{d_ino=2743397, d_off=6662, d_reclen=32, d_name="long-named-file1649"}
{d_ino=2743398, d_off=6666, d_reclen=32, d_name="long-named-file1650"}
{d_ino=2743399, d_off=6670, d_reclen=32, d_name="long-named-file1651"}
{d_ino=2743400, d_off=6674, d_reclen=32, d_name="long-named-file1652"}
.... and so on
{d_ino=2744476, d_off=6882, d_reclen=32, d_name="long-named-file1704"}
{d_ino=2744477, d_off=6882, d_reclen=32, d_name="long-named-file1705"}}, 54241) = 54220
lseek(4, 6646, SEEK_SET)                = 6646
brk(0x806f000)                          = 0x806f000
brk(0x807b000)                          = 0x807b000
brk(0x8092000)                          = 0x8092000
mmap(NULL, 188416, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4010b000
mremap(0x4010b000, 188416, 372736, MREMAP_MAYMOVE) = 0x4010b000
getdents(4, {
{d_ino=2743396, d_off=6654, d_reclen=32, d_name="long-named-file1648"}
{d_ino=2743397, d_off=6662, d_reclen=32, d_name="long-named-file1649"}

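The entry for file 1646 has d_off=6646, so the lseek to 6646 should have
positioned us at the record for file 1647. Instead the next getdents
starts at file 1648, and file 1647 goes missing.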
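
To make the arithmetic explicit, here is a rough sketch (not anything we
ship) that takes strace-style dirent lines and a seek target and prints
the name that getdents ought to return first. It assumes d_off really is
the cookie of the following entry; expected_after_seek is just a name I
made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read strace-style dirent lines on stdin and print the
# name of the entry that should come back first after seeking to $1.
# Assumption: the d_off of an entry is the cookie of the entry following it.
expected_after_seek() {
  awk -v target="$1" '
    {
      # Pull d_off and d_name out of one strace dirent line.
      off = $0;  sub(/.*d_off=/, "", off);    sub(/,.*/, "", off)
      name = $0; sub(/.*d_name="/, "", name); sub(/".*/, "", name)
      # The line after the matching d_off is the entry we expect next.
      if (found) { print name; exit }
      if (off + 0 == target + 0) found = 1
    }'
}

# Three lines copied from the strace snapshot above.
sample='{d_ino=2743394, d_off=6646, d_reclen=32, d_name="long-named-file1646"}
{d_ino=2743395, d_off=6650, d_reclen=32, d_name="long-named-file1647"}
{d_ino=2743396, d_off=6654, d_reclen=32, d_name="long-named-file1648"}'

expected=$(printf '%s\n' "$sample" | expected_after_seek 6646)
echo "expected first entry after lseek to 6646: $expected"
```

For the snapshot above this names long-named-file1647, whereas the real
getdents call came back starting with long-named-file1648.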

I suspect some of my other changes fixed the original script - maybe
ls started using larger buffers? Or maybe my ls/libc is different from the
one which hit the original problem.
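
To check whether a particular ls/libc combination shows the problem, a
sketch along these lines could be run after the creation script. It
compares the names we expect against what ls actually reports.
check_listing and its arguments are hypothetical, not part of any XFS
tool, and I am assuming the files follow the prefix-plus-number naming
used in the scripts above.

```shell
#!/bin/sh
# Sketch: report names that were created but do not show up in ls output.
# Usage (hypothetical): check_listing /mnt/test long-named-file 0 2000
check_listing() {
  dir=$1 prefix=$2 first=$3 last=$4
  listing=$(ls "$dir")
  missing=0
  n=$first
  while [ "$n" -le "$last" ]
  do
    # Compare against the ls output rather than testing the file itself,
    # since the symptom is a listing that skips entries.
    if ! printf '%s\n' "$listing" | grep -qxF "$prefix$n"; then
      echo "missing from listing: $prefix$n"
      missing=$(( missing + 1 ))
    fi
    n=$(( n + 1 ))
  done
  echo "$missing of $(( last - first + 1 )) names missing"
}
```

It spawns one grep per name, so it is slow on big directories, but it
only depends on ls and standard sh.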

We also have reports of ls going into an infinite loop over NFS which
could be related to this.

Steve





