
Re: XFS hangs and freezes with LSI 9265-8i controller on high i/o

To: <xfs@xxxxxxxxxxx>
Subject: Re: XFS hangs and freezes with LSI 9265-8i controller on high i/o
From: Matthew Whittaker-Williams <matthew@xxxxxxxxx>
Date: Tue, 12 Jun 2012 17:56:23 +0200
In-reply-to: <20120612011812.GK22848@dastard>
References: <4FD66513.2000108@xxxxxxxxx> <20120612011812.GK22848@dastard>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:9.0) Gecko/20111220 Thunderbird/9.0
On 6/12/12 3:18 AM, Dave Chinner wrote:
> On Mon, Jun 11, 2012 at 11:37:23PM +0200, Matthew Whittaker-Williams wrote:
>> [ 6110.300098]  [<ffffffff813569a4>] ? kernel_thread_helper+0x4/0x10
>
> That's pretty much a meaningless stack trace. Can you recompile your
> kernel with frame pointers enabled so we can get a reliable stack
> trace?

See attached for a new trace.
Could you have a look into this issue?
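
For reference, we rebuilt the kernel with frame pointers roughly as follows (a minimal sketch; the exact steps depend on the distro and kernel tree):

# Enable "Compile the kernel with frame pointers" under "Kernel hacking"
make menuconfig
# Verify the option is set before building
grep CONFIG_FRAME_POINTER .config    # expect CONFIG_FRAME_POINTER=y
make -j16 && make modules_install && make install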
> We know there is a lurking problem that we've been trying to flush
> out over the past couple of months. Do a search for hangs in
> xlog_grant_log_space - we've found several problems in
> the process, but there's still a remaining hang that is likely to be
> the source of your problems.

I see, it does indeed look like the same issue we are encountering.
If you need any more information, I am happy to provide it.
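For example, next time it hangs we can capture the blocked-task stacks via sysrq (a sketch, assuming sysrq is available on the box):

# Enable sysrq if it is not already on
sysctl -w kernel.sysrq=1
# Dump stack traces of all tasks stuck in uninterruptible (D state) sleep
echo w > /proc/sysrq-trigger
# The traces end up in the kernel log
dmesg > /tmp/blocked-task-traces.txt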
> What workload are you running that triggers this?

Regarding our workload: we provide Usenet service using Diablo.
We currently trigger the issue by running several diloadfromspool commands in parallel. Each one scans the entire spool and extracts article location, size, message-id and placement information:

/news/dbin/diloadfromspool -a -S01 &
/news/dbin/diloadfromspool -a -S02 &
/news/dbin/diloadfromspool -a -S03 &
/news/dbin/diloadfromspool -a -S04 &
/news/dbin/diloadfromspool -a -S05 &
/news/dbin/diloadfromspool -a -S06 &
/news/dbin/diloadfromspool -a -S07 &
/news/dbin/diloadfromspool -a -S08 &
/news/dbin/diloadfromspool -a -S09 &
/news/dbin/diloadfromspool -a -S10 &
/news/dbin/diloadfromspool -a -S11 &
/news/dbin/diloadfromspool -a -S12 &
/news/dbin/diloadfromspool -a -S13 &
/news/dbin/diloadfromspool -a -S14 &
/news/dbin/diloadfromspool -a -S15 &
/news/dbin/diloadfromspool -a -S16 &

We have also been able to trigger the same hang by running multiple bonnie++ instances:

nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie1 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie2 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie2 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie3 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie4 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie5 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie6 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie7 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie8 -f -b -n 1 -u root &

The hang generally gets triggered when the filesystem is handling a lot of reads.
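
If a simpler reproducer is useful, the same kind of parallel read pressure can likely be generated with plain dd readers (a hypothetical sketch; the file names are made up):

# Start 16 parallel sequential readers against the spool volume
for i in $(seq 1 16); do
    dd if=/news/spool/news/P.01/testfile$i of=/dev/null bs=1M &
done
wait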

Filesystem info:

meta-data=/dev/sda               isize=256    agcount=41, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=10741350400, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

meta-data=/dev/sdb               isize=256    agcount=41, agsize=268435455 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=10741350400, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
Size                : 40.014 TB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 24
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Access Policy       : Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None


Virtual Drive: 1 (Target Id: 1)
Name                :
RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
Size                : 40.014 TB
State               : Optimal
Strip Size          : 1.0 MB
Number Of Drives    : 24
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Access Policy       : Read/Write
Disk Cache Policy   : Enabled
Encryption Type     : None


See the attachment for the controller info, in case it helps.



> Cheers,
>
> Dave.

Attachment: kernel-trace.txt
Description: Text document

Attachment: controllerinfo.txt
Description: Text document
