XFS hangs and freezes with LSI 9265-8i controller on high I/O

Matthew Whittaker-Williams matthew at xsnews.nl
Tue Jun 12 10:56:23 CDT 2012


On 6/12/12 3:18 AM, Dave Chinner wrote:
> On Mon, Jun 11, 2012 at 11:37:23PM +0200, Matthew Whittaker-Williams wrote:
>> [ 6110.300098]  [<ffffffff813569a4>] ? kernel_thread_helper+0x4/0x10
> That's pretty much a meaningless stack trace. Can you recompile your
> kernel with frame pointers enabled so we can get a reliable stack
> trace?

See the attached file for a new trace.
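
For anyone else chasing this: enabling frame pointers is roughly the
following (a sketch; the source path and build commands here are
assumptions, adjust for your distro):

cd /usr/src/linux                         # kernel source tree (path assumed)
./scripts/config --enable FRAME_POINTER   # sets CONFIG_FRAME_POINTER=y
make oldconfig
make -j8 && make modules_install install
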
>> Could you have a look into this issue?
> We know there is a lurking problem that we've been trying to flush
> out over the past couple of months. Do a search for hangs in
> xlog_grant_log_space - we've found several problems in
> the process, but there's still a remaining hang that is likely to be
> the source of your problems.

I see; that does indeed look like the same issue we are encountering.
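
In case more traces are useful, the blocked-task stacks can also be
dumped without a rebuild via the kernel's sysrq facility (the grep
pattern below is just an example):

echo 1 > /proc/sys/kernel/sysrq    # make sure sysrq is enabled
echo w > /proc/sysrq-trigger       # dump stacks of all blocked (D-state) tasks
dmesg | grep -B5 -A20 xlog_grant_log_space
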
>> If you need any more information I am happy to provide it.
> What workload are you running that triggers this?

Regarding our workload: we provide Usenet service using Diablo.
We currently trigger the issue by running several diloadfromspool
commands in parallel. Each scans an entire spool and extracts every
article's location, size, message-id, and placement information
(a loop form of these commands is shown after the list):

/news/dbin/diloadfromspool -a -S01 &
/news/dbin/diloadfromspool -a -S02 &
/news/dbin/diloadfromspool -a -S03 &
/news/dbin/diloadfromspool -a -S04 &
/news/dbin/diloadfromspool -a -S05 &
/news/dbin/diloadfromspool -a -S06 &
/news/dbin/diloadfromspool -a -S07 &
/news/dbin/diloadfromspool -a -S08 &
/news/dbin/diloadfromspool -a -S09 &
/news/dbin/diloadfromspool -a -S10 &
/news/dbin/diloadfromspool -a -S11 &
/news/dbin/diloadfromspool -a -S12 &
/news/dbin/diloadfromspool -a -S13 &
/news/dbin/diloadfromspool -a -S14 &
/news/dbin/diloadfromspool -a -S15 &
/news/dbin/diloadfromspool -a -S16 &
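
The same sixteen commands in loop form, for reference (this assumes
the spool partitions are numbered S01 through S16 as above):

for i in $(seq -w 1 16); do
    /news/dbin/diloadfromspool -a -S$i &
done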

We have also been able to trigger the same hang by running multiple
bonnie++ instances (flag meanings are noted after the commands):

nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie1 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie2 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie3 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie4 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie5 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie6 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie7 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie8 -f -b -n 1 -u root &
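
For reference, what the flags mean (per the bonnie++ man page):

# -r 4096   machine RAM size in MiB; bonnie++ wants -s at least twice this
# -s 81920  per-instance test file size in MiB (80 GiB)
# -f        fast mode: skip the per-character I/O tests
# -b        no write buffering: fsync() after every write
# -n 1      file-creation test with 1 * 1024 files
# -u root   user to run the benchmark as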

The hang is generally triggered whenever the filesystem is handling a lot of reads.

Filesystem info:

meta-data=/dev/sda               isize=256    agcount=41, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=10741350400, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

meta-data=/dev/sdb               isize=256    agcount=41, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=10741350400, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
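
The geometry above is xfs_info output for each filesystem; the mount
points in this sketch are assumptions, not the literal paths on our
boxes:

xfs_info /news/spool/news/P.01    # /dev/sda (mount point assumed)
xfs_info /news/spool/news/P.02    # /dev/sdb (mount point assumed)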


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
Size                : 40.014 TB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 24
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Access Policy       : Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None


Virtual Drive: 1 (Target Id: 1)
Name                :
RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
Size                : 40.014 TB
State               : Optimal
Strip Size          : 1.0 MB
Number Of Drives    : 24
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Access Policy       : Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None
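
The virtual drive details above come from LSI's MegaCli utility, with
an invocation along these lines (binary name and install path vary by
system):

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL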


See the attachment for the controller info, in case that helps.


>
> Cheers,
>
> Dave.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: kernel-trace.txt
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20120612/305d96c0/attachment-0002.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: controllerinfo.txt
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20120612/305d96c0/attachment-0003.txt>

