Hello,
A while back I reported that test case 011 trips an ASSERT on the POWER
architecture, but not on x86_64.
I started comparing the code and quickly realized that the problem is
_not_ arch specific: I could make test case 011 fail on x86_64 by
reducing the log size, and I could make it pass on POWER by simply
increasing the file system size to 100G (from 20G).
After some debugging I found that I get into this racy situation when
the free log space drops below the free threshold and we flush the log
buffer to disk.
i.e., in function xlog_grant_push_ail(), if we return at
	if (free_blocks >= free_threshold)
		return;
we do not get into the race that trips the ASSERT.
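To make that early-return condition concrete, here is a small userspace
model of the check. It is only a sketch, not the kernel code: struct
log_model and all helper names are made up for illustration, and the
threshold rule (the largest of the request size, a quarter of the log,
and 256 basic blocks) is an assumption that should be checked against
the actual xlog_grant_push_ail() source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the log state; not a kernel structure. */
struct log_model {
	uint32_t log_blocks;	/* total log size, in 512-byte basic blocks */
	uint32_t free_blocks;	/* basic blocks currently free in the log */
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Convert bytes to 512-byte basic blocks, rounding up. */
static uint32_t bytes_to_bb(uint32_t bytes)
{
	return (bytes + 511) / 512;
}

/*
 * ASSUMED threshold rule: the largest of the requested space, a quarter
 * of the log, and a floor of 256 basic blocks.
 */
static uint32_t free_threshold(const struct log_model *log, uint32_t need_bytes)
{
	uint32_t t = bytes_to_bb(need_bytes);

	t = max_u32(t, log->log_blocks / 4);
	t = max_u32(t, 256);
	return t;
}

/*
 * Returns false when the model would take the early return
 * (free_blocks >= free_threshold), true when it would go on to push.
 */
static bool takes_push_path(const struct log_model *log, uint32_t need_bytes)
{
	return log->free_blocks < free_threshold(log, need_bytes);
}
```

With a 20480-block log, for example, this model takes the early return
as long as at least 5120 blocks are free (for small requests), and goes
down the push path once free space falls below that.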
Then I started comparing the behavior of the two architectures, and I
found that on POWER I see more threads at a time (a maximum of 4) in
the function xlog_grant_log_space(), whereas on x86_64 I see a maximum
of two, and mostly only one.
I also noted that on POWER test case 011 takes about 8 seconds, whereas
on x86_64 it takes about 165 seconds.
So I ventured into the core of test case 011, dirstress, and found that
simply creating thousands of files under a directory takes much longer
on x86_64 than on POWER (1 min 15 s vs. 2 s).
Note: Attached is the source (a stripped-down version of dirstress.c)
for the program b used in the runs below.
------------------POWER----------------------------------
[root@test135 chandra]# uname -a
Linux test135.beaverton.ibm.com 2.6.38-rc7 #1 SMP Fri Mar 4 09:36:14 PST
2011 ppc64 ppc64 ppc64 GNU/Linux
[root@test135 chandra]# grep -e xfs -e home /proc/mounts
none /selinux selinuxfs rw,relatime 0 0
/dev/mapper/vg_test135-lv_home /home ext4
rw,seclabel,relatime,barrier=1,data=ordered 0 0
/dev/sda8 /mnt/xfsMntPt xfs rw,seclabel,relatime,attr2,noquota 0 0
[root@test135 chandra]# ###### Run test on XFS filesystem
[root@test135 chandra]# time ./b /mnt/xfsMntPt/dir 10000 1
i 0
real 0m2.055s
user 0m0.011s
sys 0m0.732s
[root@test135 chandra]# ###### Run test of ext4 filesystem
[root@test135 chandra]# time ./b /home/dir 10000 1
i 0
real 0m0.355s
user 0m0.009s
sys 0m0.304s
--------------------x86_64----------------------------------------
[root@test27 chandra]# uname -a
Linux test27 2.6.38-rc7 #4 SMP Wed Mar 9 08:37:32 PST 2011 x86_64 x86_64
x86_64 GNU/Linux
[root@test27 chandra]# grep -e xfs -e home /proc/mounts
none /selinux selinuxfs rw,relatime 0 0
/dev/sdc3 /home ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
/dev/sdb1 /mnt/xfsMntPt xfs rw,seclabel,relatime,attr2,noquota 0 0
[root@test27 chandra]# ###### Run test on XFS filesystem
[root@test27 chandra]# time ./b /mnt/xfsMntPt/dir 10000 1
i 0
real 1m15.700s
user 0m0.030s
sys 0m1.679s
[root@test27 chandra]# ###### Run test of ext4 filesystem
[root@test27 chandra]# time ./b /home/dir 10000 1
i 0
real 0m0.317s
user 0m0.010s
sys 0m0.306s
-------------------------------------------------------------------
After quite a bit of debugging I found that I can trip the ASSERT on
x86_64 as well, if I start a sufficient number of threads accessing the
file system. Basically, "./b /mnt/xfsMntPt/dir 100 100" trips the
ASSERT.
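For anyone without the attachment handy, the kind of load involved can
be approximated by a sketch like the one below. This is not the attached
b.c: the argument order (directory, files per process, number of
processes), the use of fork() for concurrency, the p<N>/f<N> naming and
the SKETCH_MAIN guard are all assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create nfiles empty files under dir/p<id>. Returns 0 on success, -1 on error. */
static int create_files(const char *dir, int id, int nfiles)
{
	char path[4096];
	int i, fd;

	/* Several workers may race to create the directories; EEXIST is fine. */
	if (mkdir(dir, 0755) < 0 && errno != EEXIST)
		return -1;
	snprintf(path, sizeof(path), "%s/p%d", dir, id);
	if (mkdir(path, 0755) < 0 && errno != EEXIST)
		return -1;

	for (i = 0; i < nfiles; i++) {
		snprintf(path, sizeof(path), "%s/p%d/f%d", dir, id, i);
		fd = open(path, O_CREAT | O_WRONLY, 0644);
		if (fd < 0)
			return -1;
		close(fd);
	}
	return 0;
}

/*
 * Fork up to nprocs workers that create files concurrently, then wait
 * for all of them. Returns the number of workers that failed (a failed
 * fork() counts as one failure and stops further forking).
 */
static int run_workers(const char *dir, int nfiles, int nprocs)
{
	int i, status, failed = 0;

	for (i = 0; i < nprocs; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			failed++;
			break;
		}
		if (pid == 0)
			_exit(create_files(dir, i, nfiles) ? 1 : 0);
	}
	while (wait(&status) > 0)
		if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
			failed++;
	return failed;
}

#ifdef SKETCH_MAIN
int main(int argc, char **argv)
{
	if (argc != 4) {
		fprintf(stderr, "usage: %s dir nfiles nprocs\n", argv[0]);
		return 2;
	}
	return run_workers(argv[1], atoi(argv[2]), atoi(argv[3])) ? 1 : 0;
}
#endif
```

Built with -DSKETCH_MAIN, it would be invoked the same way as b above,
e.g. "./sketch /mnt/xfsMntPt/dir 100 100" for 100 workers creating 100
files each.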
I have two questions:
1. Does anybody have an explanation for why x86_64 is so slow compared
with POWER?
2. Any suggestions on how to debug and fix the race condition?
Thanks & Regards,
chandra
Attachment: b.c (text data)