Storage server, hung tasks and tracebacks
Brian Candler
B.Candler at pobox.com
Sun May 20 11:35:06 CDT 2012
Another update:
I have been trying various combinations to see under what circumstances
I can make things lock up.
The main discovery: using ext4 instead of xfs, I cannot get the server to
lock up - after 36 hours of continuous testing anyway. With xfs and
everything else identical, it typically locks up within 10 minutes.
This is not to say that xfs is at fault. It may be that xfs generates a
higher peak load of I/O ops or something, and that tickles the problem. In
any case I see a mixture of unkillable processes: not only bonnie++ and
xfsaild but I have also seen kswapd, kworker, irqbalance, even postfix
processes (which should not even be touching the 24-disk array; there is a
separate system disk directly connected to the motherboard's own SATA
controller).
The test is running four concurrent bonnie++ sessions in separate screen
sessions.
Some of the tests performed:
- 24 SATA disks, LSI HBAs, md RAID0, XFS: rapid lockup
- 24 SATA disks, LSI HBAs, md RAID0, ext4: no lockup seen so far
- 2 SATA disks, LSI HBAs, md RAID0, XFS: no lockup
- 1 system SATA disk, motherboard SATA, no RAID, ext4: no lockup
I did also write a Ruby script to do lots of concurrent dd reads (at random
offsets) directly from the array. I wasn't able to replicate the problem
with that.
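
A minimal sketch of this kind of read-only test, written in Python rather than the Ruby script mentioned above (which is not shown here). The function names, worker count and block size are illustrative assumptions, and it reads through the page cache with pread instead of spawning dd, so it is only an approximation of the original test:

```python
import os
import random
from concurrent.futures import ThreadPoolExecutor


def read_random_blocks(path, block_size, count, seed):
    """Read `count` blocks of `block_size` bytes at random aligned offsets.

    Returns the number of bytes actually read.
    """
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size of regular files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        blocks = size // block_size
        if blocks == 0:
            return 0
        total = 0
        for _ in range(count):
            offset = rng.randrange(blocks) * block_size
            total += len(os.pread(fd, block_size, offset))
        return total
    finally:
        os.close(fd)


def concurrent_random_reads(path, workers=4, block_size=1 << 20, count=100):
    """Run `workers` readers in parallel threads and return total bytes read."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(read_random_blocks, path, block_size, count, seed)
            for seed in range(workers)
        ]
        return sum(f.result() for f in futures)
```

Against the array this would be called as something like concurrent_random_reads("/dev/md127", workers=32), run as root. It issues reads only, which matches the observation that a read-only load did not reproduce the lockup.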
This is with Seagate 7200rpm drives, and the total I/O bandwidth I can see
is quite a lot (see iostat below). I can also replicate the problem in a
similar system with Hitachi "coolspin" (5940rpm?) drives, but it seems to
take somewhat longer, maybe an hour or two, so perhaps the peak rate of
I/O ops has something to do with it?
(These systems do have only 8GB RAM, so I also wondered if it was something
to do with deadlocking when allocating buffer space if not enough was
available.)
Regards,
Brian.
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            1.55    0.00   80.67   12.63    0.00    5.15

Device:          tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             0.40         0.00         2.40          0         12
sdf           187.80     17817.60    109416.80      89088     547084
sde           182.60     17817.60    112051.20      89088     560256
sdd           183.00     17817.60    108800.00      89088     544000
sdc           167.00     17612.80    105840.80      88064     529204
sdg           162.80     17612.80    107735.20      88064     538676
sdh           180.00     18022.40    112230.40      90112     561152
sdp           168.20     17408.00    107929.60      87040     539648
sdj           179.60     17614.40    111346.40      88072     556732
sdq           174.20     17408.00    108544.00      87040     542720
sdk           201.60     17612.80    111206.40      88064     556032
sdb           189.20     17819.20    108800.00      89096     544000
sdl           195.60     17542.40    110387.20      87712     551936
sdo           196.00     17408.00    111206.40      87040     556032
sdm           200.00     17408.00    110796.80      87040     553984
sdn           189.00     17408.00    108544.00      87040     542720
sdi           168.60     18022.40    112025.60      90112     560128
sdr           192.60     17819.20    111858.40      89096     559292
sdu           193.80     17612.80    108953.60      88064     544768
sdv           202.60     17612.80    108851.20      88064     544256
sdw           178.20     17612.80    108953.60      88064     544768
sdy           191.60     17612.80    110796.80      88064     553984
sdx           196.00     17612.80    111616.00      88064     558080
sds           182.80     17612.80    109158.40      88064     545792
sdt           191.60     17203.20    111219.20      86016     556096
md127        7569.80    415064.00   2620999.20    2075320   13104996
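
To cross-check figures like these, the per-disk rates can be summed and compared with the md127 line. The following is a minimal Python sketch, assuming the six-column iostat -k device layout shown above and assuming that sdb through sdy are the 24 array members (sda being the system disk). The function names are illustrative only:

```python
def parse_iostat_devices(text):
    """Return {device: (tps, kB_read_per_s, kB_wrtn_per_s)} from iostat device lines."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Device rows have six fields and start with a device name, not a number.
        if len(fields) != 6 or not fields[0][0].isalpha():
            continue
        try:
            tps, read_rate, write_rate = (float(x) for x in fields[1:4])
        except ValueError:
            continue  # the "Device:" header row lands here
        stats[fields[0]] = (tps, read_rate, write_rate)
    return stats


def member_totals(stats, members):
    """Sum the read and write rates (kB/s) over the given member devices."""
    read_total = sum(stats[dev][1] for dev in members)
    write_total = sum(stats[dev][2] for dev in members)
    return read_total, write_total
```

With the output saved to a file, member_totals(stats, [d for d in stats if d not in ("sda", "md127")]) gives the aggregate rates to set against stats["md127"].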