On Wed, Sep 23, 2015 at 09:18:31AM -0400, Brian Foster wrote:
> On Wed, Sep 23, 2015 at 01:44:06PM +1000, Dave Chinner wrote:
> > On Fri, Sep 11, 2015 at 02:55:32PM -0400, Brian Foster wrote:
> > > +
> > > +	pthread_mutex_lock(&libxfs_max_lsn_lock);
> > > +
> > > +	max_cycle = CYCLE_LSN(libxfs_max_lsn);
> > > +	max_block = BLOCK_LSN(libxfs_max_lsn);
> > > +
> > > +	if ((cycle > max_cycle) ||
> > > +	    (cycle == max_cycle && block > max_block))
> > > +		libxfs_max_lsn = lsn;
Actually, we have XFS_LSN_CMP(lsn1, lsn2) for this. i.e.
	if (XFS_LSN_CMP(lsn, libxfs_max_lsn) > 0)
		libxfs_max_lsn = lsn;
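For reference, the kernel defines XFS_LSN_CMP in fs/xfs/xfs_log.h as a
cycle-first, block-second comparison, roughly:

	static inline xfs_lsn_t _lsn_cmp(xfs_lsn_t lsn1, xfs_lsn_t lsn2)
	{
		if (CYCLE_LSN(lsn1) != CYCLE_LSN(lsn2))
			return (CYCLE_LSN(lsn1) < CYCLE_LSN(lsn2)) ? -999 : 999;
		if (BLOCK_LSN(lsn1) != BLOCK_LSN(lsn2))
			return (BLOCK_LSN(lsn1) < BLOCK_LSN(lsn2)) ? -999 : 999;
		return 0;
	}
	#define XFS_LSN_CMP(x, y)	_lsn_cmp(x, y)

so it replaces the open-coded cycle/block check above directly.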
> > > +
> > > +	pthread_mutex_unlock(&libxfs_max_lsn_lock);
> >
> > This will have the same lock contention problems that the kernel
> > code would have had - my repair scalability tests regularly reach
> > over 1GB/s of metadata being prefetched through tens of threads, so
> > this is going to have a significant impact on performance in those
> > tests....
> >
>
> Hmm, Ok. I initially didn't have locking here (by accident) but
> reproduced some breakage, the exact details of which escape me. I
> suspect it was a test-and-set race.
>
> I suppose we could try doing something like what we plan to do on the
> kernel side in terms of a racy check followed by the locked check so the
> lock contention goes away once we find the max lsn. The context here is
> different, however, in that we're using a 64-bit LSN rather than
> independently updated cycle and block fields. It could also take a while
> to stabilize the max lsn depending on the fs. (There are some details on
> the kernel side I'd like to get worked out first as well).
It still has to work on 32 bit machines, where 64 bit operations are
not atomic....
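For illustration, a minimal sketch of that racy-check-then-lock idea,
wrapped in a hypothetical libxfs_track_max_lsn() helper - with the 32
bit problem called out right where it bites:

	static void
	libxfs_track_max_lsn(xfs_lsn_t lsn)
	{
		/*
		 * Unlocked peek so the mutex drops out of the fast path
		 * once the max has stabilised. On 32 bit this 64 bit load
		 * can tear, mixing halves of two different LSNs, so it can
		 * wrongly skip an update - it would need an atomic 64 bit
		 * load (or the per-AG scheme below) to be safe there.
		 */
		if (XFS_LSN_CMP(lsn, libxfs_max_lsn) <= 0)
			return;

		pthread_mutex_lock(&libxfs_max_lsn_lock);
		if (XFS_LSN_CMP(lsn, libxfs_max_lsn) > 0)
			libxfs_max_lsn = lsn;
		pthread_mutex_unlock(&libxfs_max_lsn_lock);
	}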
> Perhaps we could try to make this smarter... in the case where either
> the log has been zeroed or we've identified an LSN beyond the current
> log state, we only really need to track the max cycle value since a log
> format is imminent at that point. In the case where the fs is
> consistent, we could seed the starting value with the current log lsn,
> or track per-ag max LSNs without locking and compare them at the end,
> etc.
Yes, we should seed the initial "max lsn" with whatever we find in
the log, regardless of whether the fs is consistent or not...
Also, per-ag max lsn tracking would break the lock up - that will
bring the scope down to the point where contention won't be a problem...
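Sketched out, and assuming only one scan thread updates any given AG's
slot at a time (the names here are hypothetical), that looks something
like:

	/* one slot per AG, calloc'd to sb_agcount; no locking needed */
	static xfs_lsn_t	*ag_max_lsns;

	static inline void
	track_ag_max_lsn(xfs_agnumber_t agno, xfs_lsn_t lsn)
	{
		if (XFS_LSN_CMP(lsn, ag_max_lsns[agno]) > 0)
			ag_max_lsns[agno] = lsn;
	}

	/* run single-threaded once all AG scanning has quiesced */
	static xfs_lsn_t
	merge_max_lsn(xfs_agnumber_t agcount, xfs_lsn_t log_lsn)
	{
		xfs_lsn_t	max = log_lsn;	/* seeded from the log scan */
		xfs_agnumber_t	agno;

		for (agno = 0; agno < agcount; agno++)
			if (XFS_LSN_CMP(ag_max_lsns[agno], max) > 0)
				max = ag_max_lsns[agno];
		return max;
	}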
> I'll have to think about this some more and see what's effective. I'd
> also like to quantify the effect the current locking has on performance
> if possible. Can you provide a brief description of your typical repair
> test that you would expect this to hurt? E.g., a large fs, many AGs,
> populated with fs_mark and repaired with many threads..? Any special
> storage configuration? Thanks.
Just my usual 500TB fs_mark test...
$ cat ~/tests/fsmark-50-test-xfs.sh
#!/bin/bash
QUOTA=
MKFSOPTS=
NFILES=100000
DEV=/dev/vdc
LOGBSIZE=256k
while [ $# -gt 0 ]; do
	case "$1" in
	-q)	QUOTA="uquota,gquota,pquota" ;;
	-N)	NFILES=$2 ; shift ;;
	-d)	DEV=$2 ; shift ;;
	-l)	LOGBSIZE=$2; shift ;;
	--)	shift ; break ;;
	esac
	shift
done
MKFSOPTS="$MKFSOPTS $*"
echo QUOTA=$QUOTA
echo MKFSOPTS=$MKFSOPTS
echo DEV=$DEV
sudo umount /mnt/scratch > /dev/null 2>&1
sudo mkfs.xfs -f $MKFSOPTS $DEV
sudo mount -o nobarrier,logbsize=$LOGBSIZE,$QUOTA $DEV /mnt/scratch
sudo chmod 777 /mnt/scratch
cd /home/dave/src/fs_mark-3.3/
sudo sh -c "echo 1 > /proc/sys/fs/xfs/stats_clear"
time ./fs_mark -D 10000 -S0 -n $NFILES -s 0 -L 32 \
	-d /mnt/scratch/0 -d /mnt/scratch/1 \
	-d /mnt/scratch/2 -d /mnt/scratch/3 \
	-d /mnt/scratch/4 -d /mnt/scratch/5 \
	-d /mnt/scratch/6 -d /mnt/scratch/7 \
	-d /mnt/scratch/8 -d /mnt/scratch/9 \
	-d /mnt/scratch/10 -d /mnt/scratch/11 \
	-d /mnt/scratch/12 -d /mnt/scratch/13 \
	-d /mnt/scratch/14 -d /mnt/scratch/15 \
	| tee >(stats --trim-outliers | tail -1 1>&2)
sync
echo Repair
sudo umount /mnt/scratch
time sudo xfs_repair -o bhash=100101 -v -v -t 1 $DEV
time sudo mount -o nobarrier,logbsize=$LOGBSIZE,$QUOTA $DEV /mnt/scratch
echo bulkstat files
time (
	sudo ~/src/xfstests-dev/src/bstat -q /mnt/scratch 1024 | wc -l
)
echo walking files
~/tests/walk-scratch.sh
echo removing files
for f in /mnt/scratch/* ; do time rm -rf $f & done
wait
sudo umount /mnt/scratch
$
--
Dave Chinner
david@xxxxxxxxxxxxx