On 6/12/12 3:18 AM, Dave Chinner wrote:
On Mon, Jun 11, 2012 at 11:37:23PM +0200, Matthew Whittaker-Williams wrote:
[ 6110.300098] [<ffffffff813569a4>] ? kernel_thread_helper+0x4/0x10
That's pretty much a meaningless stack trace. Can you recompile your
kernel with frame pointers enabled so we can get a reliable stack
trace?
See attached for new trace.
Could you take a look at this issue?
We know there is a lurking problem that we've been trying to flush
out over the past couple of months. Do a search for hangs in
xlog_grant_log_space - we've found several problems in
the process, but there's still a remaining hang that is likely to be
the source of your problems.
I see; it does indeed look like the same issue we are encountering.
If you need any more information I am happy to provide it.
What workload are you running that triggers this?
As for our workload: we run a Usenet service with Diablo.
We currently trigger the issue by running several diloadfromspool
commands in parallel.
This scans the entire spool and extracts each article's location, size,
Message-ID and placement information.
/news/dbin/diloadfromspool -a -S01 &
/news/dbin/diloadfromspool -a -S02 &
/news/dbin/diloadfromspool -a -S03 &
/news/dbin/diloadfromspool -a -S04 &
/news/dbin/diloadfromspool -a -S05 &
/news/dbin/diloadfromspool -a -S06 &
/news/dbin/diloadfromspool -a -S07 &
/news/dbin/diloadfromspool -a -S08 &
/news/dbin/diloadfromspool -a -S09 &
/news/dbin/diloadfromspool -a -S10 &
/news/dbin/diloadfromspool -a -S11 &
/news/dbin/diloadfromspool -a -S12 &
/news/dbin/diloadfromspool -a -S13 &
/news/dbin/diloadfromspool -a -S14 &
/news/dbin/diloadfromspool -a -S15 &
/news/dbin/diloadfromspool -a -S16 &
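The sixteen invocations above differ only in the spool object number, so they can also be generated by a loop. This is a minimal POSIX shell sketch, not the script actually used: launch_spool_scans and the DRY_RUN switch are hypothetical names added for illustration, and the binary path and flags are copied from the commands above.

```sh
#!/bin/sh
# Sketch only: start one diloadfromspool scan per spool object (S01..S16)
# in the background, like the sixteen commands above.
# launch_spool_scans and DRY_RUN are hypothetical names for this example;
# with DRY_RUN set, the commands are printed instead of executed.
launch_spool_scans() {
    bin=/news/dbin/diloadfromspool
    i=1
    while [ "$i" -le 16 ]; do
        n=$(printf '%02d' "$i")          # zero-pad: 1 -> 01
        if [ -n "${DRY_RUN:-}" ]; then
            echo "$bin -a -S$n"
        else
            "$bin" -a "-S$n" &
        fi
        i=$((i + 1))
    done
    # Return only after every background scan has exited.
    wait
}
```

Calling `launch_spool_scans` starts all sixteen scans and waits for them; `DRY_RUN=1 launch_spool_scans` only prints the command lines. Unlike the interactive commands above, the function blocks until every scan exits, which is convenient when the reproduction is run from a script.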
We have also been able to trigger the same hang by running multiple
bonnie++ instances:
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie1 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie2 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie2 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie3 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie4 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie5 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie6 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie7 -f -b -n 1 -u root &
nohup bonnie++ -r 4096 -s 81920 -d /news/spool/news/P.01/bonnie8 -f -b -n 1 -u root &
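The bonnie++ runs can be scripted the same way. Note that the list above starts bonnie2 twice (nine runs in total); this sketch assumes one directory per run, bonnie1 through bonnie8, so it launches eight. launch_bonnie_runs, DRY_RUN, the mkdir -p step and the per-run log files (instead of a shared nohup.out) are hypothetical additions for illustration; the bonnie++ flags are copied unchanged.

```sh
#!/bin/sh
# Sketch only: start eight bonnie++ runs, each in its own directory under
# /news/spool/news/P.01, detached with nohup like the commands above.
# launch_bonnie_runs, DRY_RUN and the per-run log files are hypothetical
# additions for this example; with DRY_RUN set, commands are only printed.
launch_bonnie_runs() {
    base=/news/spool/news/P.01
    for n in 1 2 3 4 5 6 7 8; do
        dir="$base/bonnie$n"
        # -r: RAM size in MiB, -s: test file size in MiB (80 GiB),
        # -f: skip per-character I/O tests, -b: fsync() after every write,
        # -n 1: file-creation tests use 1 x 1024 files, -u: user to run as.
        if [ -n "${DRY_RUN:-}" ]; then
            echo "bonnie++ -r 4096 -s 81920 -d $dir -f -b -n 1 -u root"
        else
            mkdir -p "$dir"              # make sure the target directory exists
            nohup bonnie++ -r 4096 -s 81920 -d "$dir" -f -b -n 1 -u root \
                > "$dir.log" 2>&1 &
        fi
    done
}
```

`launch_bonnie_runs` returns immediately and leaves the runs detached, as the original commands do; `DRY_RUN=1 launch_bonnie_runs` prints the eight command lines instead.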
It generally gets triggered when the filesystem is serving a lot of reads.
Filesystem info:
meta-data=/dev/sda              isize=256    agcount=41, agsize=268435455 blks
         =                      sectsz=512   attr=2
data     =                      bsize=4096   blocks=10741350400, imaxpct=5
         =                      sunit=0      swidth=0 blks
naming   =version 2             bsize=4096   ascii-ci=0
log      =internal              bsize=4096   blocks=521728, version=2
         =                      sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                  extsz=4096   blocks=0, rtextents=0
meta-data=/dev/sdb              isize=256    agcount=41, agsize=268435455 blks
         =                      sectsz=512   attr=2
data     =                      bsize=4096   blocks=10741350400, imaxpct=5
         =                      sunit=0      swidth=0 blks
naming   =version 2             bsize=4096   ascii-ci=0
log      =internal              bsize=4096   blocks=521728, version=2
         =                      sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                  extsz=4096   blocks=0, rtextents=0
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 40.014 TB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 24
Span Depth : 1
Default Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Access Policy : Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 40.014 TB
State : Optimal
Strip Size : 1.0 MB
Number Of Drives : 24
Span Depth : 1
Default Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Access Policy : Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
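As a sanity check, the xfs_info geometry can be reconciled with the virtual drive size reported by the controller using plain shell arithmetic. The numbers are copied from the xfs_info output above (sda and sdb are identical); the variable names are illustrative. Since 41 AGs of 268435455 blocks would exceed the block count, the last AG is only partially filled.

```sh
#!/bin/sh
# Figures copied from the xfs_info output above.
blocks=10741350400      # data section size in blocks
bsize=4096              # bytes per block
agcount=41
agsize=268435455        # blocks per allocation group
logblocks=521728        # internal log size in blocks

bytes=$(( blocks * bsize ))
tib_milli=$(( bytes * 1000 / (1 << 40) ))       # size in TiB, scaled by 1000
tib=$(printf '%d.%03d' $(( tib_milli / 1000 )) $(( tib_milli % 1000 )))
last_ag=$(( blocks - (agcount - 1) * agsize ))  # blocks in the final, partial AG
log_mib=$(( logblocks * bsize / (1 << 20) ))

echo "filesystem: $bytes bytes = $tib TiB"
echo "last AG:    $last_ag blocks"
echo "log:        $log_mib MiB"
```

This gives 43996571238400 bytes (40.014 TiB), a final AG of 3932200 blocks, and a 2038 MiB internal log. The filesystem size agrees with the 40.014 TB the controller reports for each virtual drive to the three decimals shown, which suggests the controller's TB figure is binary (2^40 bytes).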
See the attachment for the full controller info, in case it helps.
Cheers,
Dave.
kernel-trace.txt
controllerinfo.txt