XFS hangs and freezes with LSI 9265-8i controller on high i/o
Matthew Whittaker-Williams
matthew at xsnews.nl
Tue Jun 12 10:56:23 CDT 2012
On 6/12/12 3:18 AM, Dave Chinner wrote:
> On Mon, Jun 11, 2012 at 11:37:23PM +0200, Matthew Whittaker-Williams wrote:
>> [ 6110.300098] [<ffffffff813569a4>] ? kernel_thread_helper+0x4/0x10
> That's pretty much a meaningless stack trace. Can you recompile your
> kernel with frame pointers enabled so we can get a reliable stack
> trace?
See attached for new trace.
>> Could you have a look into this issue?
> We know there is a lurking problem that we've been trying to flush
> out over the past couple of months. Do a search for hangs in
> xlog_grant_log_space - we've found several problems in
> the process, but there's still a remaining hang that is likely to be
> the source of your problems.
I see, that does indeed look like the same issue we are running into.
>> If you need any more information I am happy to provide it.
> What workload are you running that triggers this?
About our workload: we provide usenet service with diablo.
We are currently triggering the issue by running several
diloadfromspool commands.
Each one scans the entire spool and extracts the article location, size,
message-id and placement information.
/news/dbin/diloadfromspool -a -S01 &
/news/dbin/diloadfromspool -a -S02 &
/news/dbin/diloadfromspool -a -S03 &
/news/dbin/diloadfromspool -a -S04 &
/news/dbin/diloadfromspool -a -S05 &
/news/dbin/diloadfromspool -a -S06 &
/news/dbin/diloadfromspool -a -S07 &
/news/dbin/diloadfromspool -a -S08 &
/news/dbin/diloadfromspool -a -S09 &
/news/dbin/diloadfromspool -a -S10 &
/news/dbin/diloadfromspool -a -S11 &
/news/dbin/diloadfromspool -a -S12 &
/news/dbin/diloadfromspool -a -S13 &
/news/dbin/diloadfromspool -a -S14 &
/news/dbin/diloadfromspool -a -S15 &
/news/dbin/diloadfromspool -a -S16 &
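For anyone trying to reproduce this, the sixteen commands above can be written as a loop. This is only a sketch: the DILOAD and DRY_RUN variables and the start_scans function are illustrative names made up here, not part of diablo, and by default the script only prints the commands instead of launching them.

```shell
#!/bin/sh
# Sketch only: DILOAD, DRY_RUN and start_scans are illustrative names.
DILOAD=${DILOAD:-/news/dbin/diloadfromspool}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = launch them

start_scans() {
    i=1
    while [ "$i" -le 16 ]; do
        spool=$(printf '%02d' "$i")      # zero-pad: 1 -> 01, 16 -> 16
        if [ "$DRY_RUN" = 1 ]; then
            echo "$DILOAD -a -S$spool &"
        else
            "$DILOAD" -a "-S$spool" &    # one background scan per spool
        fi
        i=$((i + 1))
    done
}

start_scans
```

Running it with DRY_RUN=0 starts all sixteen scans in the background at once, which is what produces the concurrent read load.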
But we have also been able to trigger the same hang by running
multiple bonnie++ commands.
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie1 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie2 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie2 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie3 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie4 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie5 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie6 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie7 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie8 -f -b -n 1 -u root &
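The bonnie++ runs can be sketched as a loop as well, with the flags annotated as the bonnie++ man page describes them. RUNS, SPOOL_DIR, DRY_RUN and start_bonnie are illustrative names made up here. The listing above names the bonnie2 directory twice; this sketch assumes eight distinct directories were intended, which is a guess. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: RUNS, SPOOL_DIR, DRY_RUN and start_bonnie are illustrative names.
RUNS=${RUNS:-8}         # assumption: eight distinct target directories
SPOOL_DIR=${SPOOL_DIR:-/news/spool/news/P.01}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = launch them

# Flags as used in the message:
#   -r 4096   RAM size in MB
#   -s 81920  test file size in MB (80 GiB per run)
#   -d DIR    directory to run the test in
#   -f        fast mode, skips the per-character I/O tests
#   -b        no write buffering, fsync() after every write
#   -n 1      file creation test size, in multiples of 1024 files
#   -u root   user to run as
start_bonnie() {
    n=1
    while [ "$n" -le "$RUNS" ]; do
        dir="$SPOOL_DIR/bonnie$n"
        if [ "$DRY_RUN" = 1 ]; then
            echo "nohup bonnie++ -r 4096 -s 81920 -d $dir -f -b -n 1 -u root &"
        else
            nohup bonnie++ -r 4096 -s 81920 -d "$dir" -f -b -n 1 -u root &
        fi
        n=$((n + 1))
    done
}

start_bonnie

# Combined size of the test files across all runs, in GiB.
total_gib=$((RUNS * 81920 / 1024))
echo "total test file size: ${total_gib} GiB"
```

With eight runs the test files add up to 640 GiB, far more than the 4096 MB of RAM given with -r, so the reads have to come from the array rather than from the page cache.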
It generally gets triggered when the filesystem is getting a lot of reads.
Filesystem info:
meta-data=/dev/sda               isize=256    agcount=41, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=10741350400, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

meta-data=/dev/sdb               isize=256    agcount=41, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=10741350400, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
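As a sanity check, the figures in the filesystem info above are consistent with each other and with the 40.014 TB virtual drive size reported by the controller. The arithmetic below is only a sketch using the numbers as pasted; the variable names are illustrative.

```shell
#!/bin/sh
# Values copied from the xfs_info output above (identical for sda and sdb).
blocks=10741350400   # data blocks
bsize=4096           # bytes per block
agsize=268435455     # blocks per allocation group

bytes=$((blocks * bsize))
gib=$((bytes / 1024 / 1024 / 1024))
# Allocation group count is the block count divided by agsize, rounded up.
agcount=$(( (blocks + agsize - 1) / agsize ))

echo "filesystem size: $bytes bytes = $gib GiB"
echo "allocation groups: $agcount"
```

This gives 40975 GiB, which is 40 TiB plus 15 GiB, and 41 allocation groups, matching agcount=41.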
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 40.014 TB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 24
Span Depth : 1
Default Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Access Policy : Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 40.014 TB
State : Optimal
Strip Size : 1.0 MB
Number Of Drives : 24
Span Depth : 1
Default Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Access Policy : Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
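For reference, the stripe geometry implied by the two virtual drives can be worked out from the figures above. This is a sketch under two assumptions: that RAID 6 here spends two strips per stripe on parity, and that the 1.0 MB strip is 1024 KB. Both filesystems report sunit=0 and swidth=0, so no stripe geometry is recorded in XFS for either array; whether that has anything to do with the hang is not established here.

```shell
#!/bin/sh
# Figures from the controller output above; variable names are illustrative.
drives=24
parity=2                         # assumption: RAID 6, two parity strips per stripe
data_disks=$((drives - parity))

strip0_kb=64                     # Virtual Drive 0
strip1_kb=1024                   # Virtual Drive 1 (assumes 1.0 MB = 1024 KB)

stripe0_kb=$((strip0_kb * data_disks))
stripe1_kb=$((strip1_kb * data_disks))

echo "data disks per stripe: $data_disks"
echo "VD0 full stripe: $stripe0_kb KB"
echo "VD1 full stripe: $stripe1_kb KB"
```

Under those assumptions each stripe spans 22 data disks, giving a full stripe of 1408 KB on the first virtual drive and 22528 KB (22 MB) on the second.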
See the attachment for controller info, in case it helps.
>
> Cheers,
>
> Dave.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: kernel-trace.txt
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20120612/305d96c0/attachment-0002.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: controllerinfo.txt
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20120612/305d96c0/attachment-0003.txt>