XFS hangs and freezes with LSI 9265-8i controller on high i/o
Matthew Whittaker-Williams
matthew at xsnews.nl
Thu Jun 14 09:31:15 CDT 2012
On 6/14/12 2:04 AM, Dave Chinner wrote:
> On Wed, Jun 13, 2012 at 10:54:04AM +0200, Matthew Whittaker-Williams wrote:
>> On 6/13/12 3:19 AM, Dave Chinner wrote:
>>> With the valid stack traces, I see that it isn't related to the log,
>>> though.
>> Ah ok, we are triggering a new issue?
> No, your system appears to be stalling waiting for IO completion.
Yes, this is indeed what we experience.
>> This is a fluke; we are running several new systems and this is just
>> one of the new servers.
>> Which indeed has the wrong stripe size set; this should be 1MB.
>> We actually found a stripe size of 1MB to give better performance
>> overall than 64/256/512.
> So if you fix that, does the problem go away?
No, unfortunately not.
Currently, with the 1MB stripe size set and:
root@sd70:~# xfs_info /dev/sda
meta-data=/dev/sda               isize=256    agcount=41, agsize=268435200 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=10741350400, imaxpct=5
         =                       sunit=256    swidth=5632 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
It still stalled out with the same error.
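As a quick sanity check of that geometry (assuming the 4096-byte block size reported above), sunit works out to the 1MB stripe unit and swidth/sunit to the 22 data disks of a 24-disk RAID6:
echo $((256 * 4096))   # sunit in bytes   -> 1048576 = 1MB stripe unit
echo $((5632 / 256))   # swidth / sunit   -> 22 data disks (24 disks minus 2 parity)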
>> I understand that RAID 10 gives better read performance on small
>> file sets. But with RAID 10 we of course lose a lot of disk space
>> compared to RAID 6. Side note: we have been running RAID 6 for
>> years now without any issues.
> but have you been running 24-disk RAID6 volumes? With RAID5/6, the
> number of disks in the volume really matters - for small write IOs,
> the more disks in the RAID6 volume, the slower it will be...
Yes we have, and we haven't seen significant performance issues in the
past with 24-disk spans. Most RAID controllers don't support more than
32 disks in an array, so on the large systems we kept it at 24 disks
per array.
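To put rough numbers on the small-write point (a sketch, assuming a 1MB stripe unit and 22 data + 2 parity disks):
echo "$((22 * 1))MB full stripe"   # writes smaller than this force a read-modify-write
                                   # (read old data + old P + old Q, then write all three back)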
>
>> In the past we did tune our xfs filesystem with switches like
>> sunit and swidth. But back then we couldn't see much performance
>> difference between using:
>>
>> mkfs.xfs -f -L P.01 -l lazy-count=1 -d su=1m,sw=22 /dev/sda
>>
>> and
>>
>> mkfs.xfs -f -L P.01 -l lazy-count=1 /dev/sda
> You won't see much difference with the BBWC enabled. It does affect
> how files and inodes are allocated, though, so the aging
> characteristics of the filesystem will be better for an aligned
> filesystem. i.e. you might not notice the performance now, but after
> a couple of years in production you probably will...
We haven't seen this impact just yet; we are doing about 120K sector
reads. But that is probably not very high compared to our smaller
arrays.
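One thing we still want to verify is whether the controllers export the stripe geometry to the block layer at all; if they do, mkfs.xfs should pick the alignment up on its own. Roughly (sda being the RAID volume; many RAID HBAs just report 0 here):
cat /sys/block/sda/queue/minimum_io_size   # should match the stripe unit (1048576 for 1MB)
cat /sys/block/sda/queue/optimal_io_size   # should match the full stripe width (22 x 1MB)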
>
>> xfs_info from a system that shows no problems with an H800
>> controller from Dell (same chipset as the LSI controllers)
>
I went and tried the H800 controller with a single array on the new
spool, and unfortunately this also hung.
[ 6123.108138] INFO: task diablo:11963 blocked for more than 120 seconds.
[ 6123.108208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6123.108297] diablo          D ffff8805ec4e9068     0 11963  11210 0x00000000
[ 6123.108444]  ffff88046fd315f8 0000000000000082 ffff8800dd1407a8 ffff88025a8f5000
[ 6123.108681]  000000000025a8f5 ffff88046fd30010 0000000000013340 0000000000013340
[ 6123.108935]  ffff88046fd31fd8 0000000000013340 ffff88046fd31fd8 0000000000013340
[ 6123.109171] Call Trace:
[ 6123.109276] [<ffffffff8137e3bf>] schedule+0x5f/0x61
[ 6123.109343] [<ffffffff8137d0a5>] schedule_timeout+0x31/0xde
[ 6123.109415] [<ffffffff810d0efe>] ? __probe_kernel_read+0x36/0x55
[ 6123.109487] [<ffffffff8110b76d>] ? kmem_cache_alloc+0x61/0x118
[ 6123.109557] [<ffffffff8137da22>] __down_common+0x96/0xe4
[ 6123.109634] [<ffffffffa02e04b8>] ? _xfs_buf_find+0x1ea/0x299 [xfs]
[ 6123.109704] [<ffffffff8137dacf>] __down+0x18/0x1a
[ 6123.109771] [<ffffffff81060428>] down+0x28/0x38
[ 6123.109844] [<ffffffffa02df786>] xfs_buf_lock+0x6f/0xc0 [xfs]
[ 6123.109922] [<ffffffffa02e04b8>] _xfs_buf_find+0x1ea/0x299 [xfs]
[ 6123.110000] [<ffffffffa02e0713>] xfs_buf_get+0x25/0x172 [xfs]
[ 6123.110090] [<ffffffffa02e087a>] xfs_buf_read+0x1a/0xc5 [xfs]
[ 6123.110169] [<ffffffffa033bc4c>] xfs_trans_read_buf+0x35d/0x54d [xfs]
[ 6123.110258] [<ffffffffa0326507>] xfs_imap_to_bp+0x45/0x1fe [xfs]
[ 6123.110345] [<ffffffffa032873e>] xfs_iread+0x5b/0x195 [xfs]
[ 6123.110423] [<ffffffffa02e5e59>] xfs_iget_cache_miss+0x5e/0x1cf [xfs]
[ 6123.110507] [<ffffffffa02e64e3>] xfs_iget+0xf7/0x184 [xfs]
[ 6123.110591] [<ffffffffa0325b36>] xfs_ialloc+0xc1/0x5ef [xfs]
[ 6123.110673] [<ffffffffa02f3b4d>] ? kmem_zone_zalloc+0x1f/0x30 [xfs]
[ 6123.110757] [<ffffffffa0336007>] ? xlog_grant_head_check+0x8f/0x101 [xfs]
[ 6123.110842] [<ffffffffa02f00e1>] xfs_dir_ialloc+0x9d/0x284 [xfs]
[ 6123.110926] [<ffffffffa02f382e>] xfs_create+0x2f5/0x547 [xfs]
[ 6123.111006] [<ffffffffa02ea4a2>] xfs_vn_mknod+0xcc/0x160 [xfs]
[ 6123.111086] [<ffffffffa02ea557>] xfs_vn_create+0xe/0x10 [xfs]
[ 6123.111156] [<ffffffff8111aaba>] vfs_create+0x67/0x89
[ 6123.111224] [<ffffffff8111b942>] do_last+0x236/0x565
[ 6123.111292] [<ffffffff8111c23a>] path_openat+0xcb/0x30c
[ 6123.111360] [<ffffffff8111c56a>] do_filp_open+0x38/0x84
[ 6123.111429] [<ffffffff8111a32f>] ? getname_flags+0x15b/0x1e2
[ 6123.111499] [<ffffffff8112694b>] ? alloc_fd+0x6c/0xfc
[ 6123.111566] [<ffffffff8110f26c>] do_sys_open+0x6f/0x101
[ 6123.111634] [<ffffffff8110f32b>] sys_open+0x1c/0x1e
[ 6123.111702] [<ffffffff813858f9>] system_call_fastpath+0x16/0x1b
See attached dmesg.txt
iostat (successive samples for sda):
Device:  rrqm/s  wrqm/s    r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz     await  svctm  %util
sda        0.00    0.00  81.80  1.40  10.22   0.18    256.00    531.91   5349.11  12.02 100.00
sda        0.00    0.00  83.40  1.20  10.37   0.15    254.56    525.35   4350.67  11.82 100.00
sda        0.00    0.00  79.20  0.80   9.90   0.10    256.00    530.14   3153.38  12.50 100.00
sda        0.00    0.00  72.80  2.20   9.09   0.13    251.72    546.08   8709.54  13.33 100.00
sda        0.00    0.00  79.80  1.40   9.95   0.12    254.07    535.35   5172.22  12.32 100.00
sda        0.00    0.00  99.60  1.20  12.41   0.08    253.86    529.49   3560.89   9.92 100.00
sda        0.00    0.00  60.80  1.40   7.59   0.11    253.77    527.21   6545.50  16.08 100.00
sda        0.00    0.00  79.00  1.80   9.84   0.08    251.51    547.93   6400.42  12.38 100.00
sda        0.00    0.00  82.20  2.20  10.25   0.01    248.93    536.42   7415.77  11.85 100.00
sda        0.00    0.00  89.40  2.20  11.17   0.01    249.90    525.68   7232.96  10.92 100.00
sda        0.00    0.00  82.00  1.20  10.22   0.08    253.37    541.60   4170.95  12.02 100.00
sda        0.00    0.00  62.80  2.60   7.85   0.14    250.31    541.15  11260.81  15.29 100.00
sda        0.00    0.00  85.00  1.80  10.61   0.21    255.47    529.36   6514.85  11.52 100.00
sda        0.00    0.00  75.20  1.40   9.38   0.11    253.72    535.68   5416.70  13.05 100.00
sda        0.00    0.00  66.80  1.20   8.33   0.11    254.19    546.68   5459.11  14.71 100.00
sda        0.00    0.00  81.40  0.80  10.15   0.10    255.38    540.62   3171.57  12.17 100.00
sda        0.00    0.00  72.20  1.20   9.02   0.15    255.74    535.26   5345.51  13.62 100.00
sda        0.00    0.00  91.00  1.00  11.35   0.12    255.44    531.02   3637.72  10.87 100.00
sda        0.00    0.00  81.00  1.60  10.12   0.20    255.96    524.44   6513.22  12.11 100.00
sda        0.00    0.00  72.80  2.40   9.04   0.26    253.24    543.25   9071.66  13.30 100.00
sda        0.00    0.00  73.80  1.20   9.18   0.15    254.63    539.20   5087.91  13.33 100.00
sda        0.00    0.00  79.20  1.40   9.90   0.18    256.00    532.38   5592.38  12.41 100.00
sda        0.00    0.20  79.40  1.00   9.90   0.12    255.36    528.07   4091.22  12.44 100.00
sda        0.00    0.00  88.40  1.20  11.05   0.15    256.00    528.13   4349.35  11.16 100.00
sda        0.00    0.00  69.60  2.40   8.65   0.23    252.71    527.46   9334.37  13.89 100.00
> Which indicates the problem is almost certainly related to the
> storage configuration or drivers, not the filesystem....
We have seen issues with XFS in the past, but of course this might be
related to the drivers.
The storage configuration shouldn't be the problem here; note that we
also changed it to a 1MB stripe size.
I'll keep looking to see whether I can find the issue in the hardware.
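The first thing I'll check is whether the HBA driver logged any resets or command timeouts around the time of the stalls, e.g. something like (driver names are a guess, they depend on the controller in use):
dmesg | grep -iE 'megaraid|megasas|mpt2sas|reset|timeout|abort'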
>
> Cheers,
>
> Dave.
Thanks
Kind regards
Matthew
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dmesg.txt
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20120614/ebab7f2f/attachment-0001.txt>