I have a server with a single 3ware 7500-8 board and 8 Maxtor 160GB disks
configured as hardware RAID5 with a hot spare. I'm running Red Hat 7.3 and
the 2.4.18-18SGI_XFS_1.2.0smp kernel (more on the kernel later). The
partitions look like this:
Filesystem     1k-blocks       Used  Available Use% Mounted on
/dev/sda1         505605     133975     345526  28% /
/dev/sda2        4127108    3810308     107152  98% /home
none             1031920          0    1031920   0% /dev/shm
/dev/sda7        1027768      20140     955420   3% /tmp
/dev/sda3        2063536    1797416     161296  92% /usr
/dev/sda5        3099260    2799304     142524  96% /usr/local
/dev/sda6        1027768     117548     858012  13% /var
/dev/sda8      948330428  627683892  320646536  67% /data
Only /data is XFS; the rest are ext3. Backups of /data on this machine
have been slowing to a crawl. I use amanda, and even the estimate phase
(which does 'tar cf /dev/null' and so should be very fast) takes hours. I
tried upgrading to 2.4.21 plus the 1.3 XFS patches (which I'm already
running on 2 other 3ware-based servers), and that made the situation much
worse: during backups the OOM killer went crazy, killing bash sessions and
eventually the estimate tar process itself.
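For what it's worth, the slow step can be reproduced outside amanda with
something along these lines (a rough equivalent, not amanda's exact
invocation; as I understand it, GNU tar recognizes /dev/null as the archive
and skips reading file contents, so this is essentially a pure metadata
walk):

  # walk the whole tree without writing an archive, timing the traversal
  time tar cf /dev/null /data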
I've pretty much ruled out hardware. I've swapped the 3ware and rebuilt
the array, and the disks all show good SMART data.
In narrowing down the problem, one particular (large) directory seems to
be the main culprit. That directory holds 471,401,788 KB in 3,377,520
files (~140 KB/file on average). Is the sheer number of files the entire
culprit? If so, is there anything I can do to alleviate the problem? I
already mount with '-o logbufs=8' (the fstab line is shown after the
xfs_info output below). Here's xfs_info on that partition:
meta-data=/data              isize=256    agcount=227, agsize=1048576 blks
         =                   sectsz=512
data     =                   bsize=4096   blocks=237115375, imaxpct=25
         =                   sunit=16     swidth=96 blks, unwritten=0
naming   =version 2          bsize=4096
log      =internal           bsize=4096   blocks=32768, version=1
         =                   sectsz=512   sunit=0 blks
realtime =none               extsz=65536  blocks=0, rtextents=0
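And for completeness, the fstab entry for that filesystem looks roughly
like this (treat the options other than logbufs=8 as a sketch of my
defaults rather than anything exotic):

  /dev/sda8   /data   xfs   defaults,logbufs=8   1 2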
Any pointers would be much appreciated.
--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University