XFS hangs and freezes with LSI 9265-8i controller on high i/o

Matthew Whittaker-Williams matthew at xsnews.nl
Thu Jun 14 09:31:15 CDT 2012


On 6/14/12 2:04 AM, Dave Chinner wrote:
> On Wed, Jun 13, 2012 at 10:54:04AM +0200, Matthew Whittaker-Williams wrote:
>> On 6/13/12 3:19 AM, Dave Chinner wrote:
>>> With the valid stack traces, I see that it isn't related to the log,
>>> though.
>> Ah ok, we are triggering a new issue?
> No, your system appears to be stalling waiting for IO completion.

Yes, this is indeed what we experience.

>> This is a fluke, we are running several new systems and this is just
>> one of the new servers.
>> Which indeed has the wrong stripe size set; this should be 1MB.
>> We actually found a 1MB stripe size to give better performance
>> overall than 64/256/512.
> So if you fix that, does the problem go away?

No, unfortunately not.

Currently with 1MB stripe set and:

root at sd70:~# xfs_info /dev/sda
meta-data=/dev/sda               isize=256    agcount=41, agsize=268435200 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=10741350400, imaxpct=5
         =                       sunit=256    swidth=5632 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

It still stalled out with the same error.
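
As a quick sanity check on that geometry (my own arithmetic, using the
4096-byte block size shown above):

    sunit  = 256 blks * 4096 B    = 1 MiB   -> matches the 1MB controller stripe
    swidth = 5632 blks / 256 blks = 22      -> 24-disk RAID6 minus the 2 parity disks

So the filesystem alignment does line up with the array.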

>> I understand that RAID 10 gives better read performance on small
>> file sets.  But with RAID 10 we of course lose a lot of disk space
>> compared to RAID 6.  As a side note, we have been running RAID 6
>> for years now without any issues.
> but have you been running 24 disk RAID6 volumes? With RAID5/6, the
> number of disks of the volume really matters - for small write IOs,
> the more disks in the RAID6 volume, the slower it will be...
Yes, we have, and we haven't seen significant performance problems with
a 24-disk span in the past.
Most RAID controllers don't support more than 32 disks in a single
array anyway, so we have kept the large arrays at 24 disks each.
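
If I follow Dave's reasoning, the width hurts because it inflates the
full-stripe size; a rough calculation for the new array (my numbers,
not Dave's):

    full stripe = 22 data disks * 1 MiB stripe unit = 22 MiB

Any aligned write smaller than that has to update parity via a
read-modify-write (or reconstruct-write) cycle, and the wider the
array, the fewer of our writes qualify as full-stripe writes.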

>
>> In the past we did tune our xfs filesystem with switches like
>> sunit and swidth.  But back then we couldn't see much performance
>> difference between using:
>>
>> mkfs.xfs -f -L P.01 -l lazy-count=1 -d su=1m,sw=22 /dev/sda
>>
>> and
>>
>> mkfs.xfs -f -L P.01 -l lazy-count=1 /dev/sda
> You won't see much difference with the BBWC enabled. It does affect
> how files and inodes are allocated, though, so the aging
> characteristics of the filesystem will be better for an aligned
> filesystem. i.e. you might not notice the performance now, but after
> a couple of years in production you probably will...

We haven't seen this impact just yet; we are doing roughly 120K sector
reads.
But that is probably not all that high compared to our smaller arrays.
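
On the BBWC point: we can double-check on the LSI side that write-back
caching is actually in effect with something like the following (a
sketch, assuming MegaCli is installed; the binary is sometimes named
MegaCli64):

    # show the current cache policy of every logical drive on all adapters
    MegaCli64 -LDInfo -Lall -aALL | grep -i 'cache policy'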

>
>> xfs_info from a system that shows no problems with an H800
>> controller from Dell (same chipset as the LSI controllers)
>
I went and tried the H800 controller with a single array of the new
spool, and unfortunately this also hung.

[ 6123.108138] INFO: task diablo:11963 blocked for more than 120 seconds.
[ 6123.108208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 6123.108297] diablo          D ffff8805ec4e9068     0 11963  11210 0x00000000
[ 6123.108444]  ffff88046fd315f8 0000000000000082 ffff8800dd1407a8 ffff88025a8f5000
[ 6123.108681]  000000000025a8f5 ffff88046fd30010 0000000000013340 0000000000013340
[ 6123.108935]  ffff88046fd31fd8 0000000000013340 ffff88046fd31fd8 0000000000013340
[ 6123.109171] Call Trace:
[ 6123.109276]  [<ffffffff8137e3bf>] schedule+0x5f/0x61
[ 6123.109343]  [<ffffffff8137d0a5>] schedule_timeout+0x31/0xde
[ 6123.109415]  [<ffffffff810d0efe>] ? __probe_kernel_read+0x36/0x55
[ 6123.109487]  [<ffffffff8110b76d>] ? kmem_cache_alloc+0x61/0x118
[ 6123.109557]  [<ffffffff8137da22>] __down_common+0x96/0xe4
[ 6123.109634]  [<ffffffffa02e04b8>] ? _xfs_buf_find+0x1ea/0x299 [xfs]
[ 6123.109704]  [<ffffffff8137dacf>] __down+0x18/0x1a
[ 6123.109771]  [<ffffffff81060428>] down+0x28/0x38
[ 6123.109844]  [<ffffffffa02df786>] xfs_buf_lock+0x6f/0xc0 [xfs]
[ 6123.109922]  [<ffffffffa02e04b8>] _xfs_buf_find+0x1ea/0x299 [xfs]
[ 6123.110000]  [<ffffffffa02e0713>] xfs_buf_get+0x25/0x172 [xfs]
[ 6123.110090]  [<ffffffffa02e087a>] xfs_buf_read+0x1a/0xc5 [xfs]
[ 6123.110169]  [<ffffffffa033bc4c>] xfs_trans_read_buf+0x35d/0x54d [xfs]
[ 6123.110258]  [<ffffffffa0326507>] xfs_imap_to_bp+0x45/0x1fe [xfs]
[ 6123.110345]  [<ffffffffa032873e>] xfs_iread+0x5b/0x195 [xfs]
[ 6123.110423]  [<ffffffffa02e5e59>] xfs_iget_cache_miss+0x5e/0x1cf [xfs]
[ 6123.110507]  [<ffffffffa02e64e3>] xfs_iget+0xf7/0x184 [xfs]
[ 6123.110591]  [<ffffffffa0325b36>] xfs_ialloc+0xc1/0x5ef [xfs]
[ 6123.110673]  [<ffffffffa02f3b4d>] ? kmem_zone_zalloc+0x1f/0x30 [xfs]
[ 6123.110757]  [<ffffffffa0336007>] ? xlog_grant_head_check+0x8f/0x101 [xfs]
[ 6123.110842]  [<ffffffffa02f00e1>] xfs_dir_ialloc+0x9d/0x284 [xfs]
[ 6123.110926]  [<ffffffffa02f382e>] xfs_create+0x2f5/0x547 [xfs]
[ 6123.111006]  [<ffffffffa02ea4a2>] xfs_vn_mknod+0xcc/0x160 [xfs]
[ 6123.111086]  [<ffffffffa02ea557>] xfs_vn_create+0xe/0x10 [xfs]
[ 6123.111156]  [<ffffffff8111aaba>] vfs_create+0x67/0x89
[ 6123.111224]  [<ffffffff8111b942>] do_last+0x236/0x565
[ 6123.111292]  [<ffffffff8111c23a>] path_openat+0xcb/0x30c
[ 6123.111360]  [<ffffffff8111c56a>] do_filp_open+0x38/0x84
[ 6123.111429]  [<ffffffff8111a32f>] ? getname_flags+0x15b/0x1e2
[ 6123.111499]  [<ffffffff8112694b>] ? alloc_fd+0x6c/0xfc
[ 6123.111566]  [<ffffffff8110f26c>] do_sys_open+0x6f/0x101
[ 6123.111634]  [<ffffffff8110f32b>] sys_open+0x1c/0x1e
[ 6123.111702]  [<ffffffff813858f9>] system_call_fastpath+0x16/0x1b
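
If it is useful, next time it hangs I can also dump the stacks of every
blocked task (not just the ones the hung-task watchdog flags), along
these lines:

    # dump all tasks in uninterruptible (D) sleep to the kernel log
    echo w > /proc/sysrq-trigger
    dmesg > blocked-tasks.txt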


See attached dmesg.txt

iostat:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   81.80    1.40    10.22     0.18   256.00   531.91  5349.11  12.02 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   83.40    1.20    10.37     0.15   254.56   525.35  4350.67  11.82 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   79.20    0.80     9.90     0.10   256.00   530.14  3153.38  12.50 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   72.80    2.20     9.09     0.13   251.72   546.08  8709.54  13.33 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   79.80    1.40     9.95     0.12   254.07   535.35  5172.22  12.32 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   99.60    1.20    12.41     0.08   253.86   529.49  3560.89   9.92 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   60.80    1.40     7.59     0.11   253.77   527.21  6545.50  16.08 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   79.00    1.80     9.84     0.08   251.51   547.93  6400.42  12.38 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   82.20    2.20    10.25     0.01   248.93   536.42  7415.77  11.85 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   89.40    2.20    11.17     0.01   249.90   525.68  7232.96  10.92 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   82.00    1.20    10.22     0.08   253.37   541.60  4170.95  12.02 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   62.80    2.60     7.85     0.14   250.31   541.15 11260.81  15.29 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   85.00    1.80    10.61     0.21   255.47   529.36  6514.85  11.52 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   75.20    1.40     9.38     0.11   253.72   535.68  5416.70  13.05 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   66.80    1.20     8.33     0.11   254.19   546.68  5459.11  14.71 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   81.40    0.80    10.15     0.10   255.38   540.62  3171.57  12.17 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   72.20    1.20     9.02     0.15   255.74   535.26  5345.51  13.62 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   91.00    1.00    11.35     0.12   255.44   531.02  3637.72  10.87 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   81.00    1.60    10.12     0.20   255.96   524.44  6513.22  12.11 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   72.80    2.40     9.04     0.26   253.24   543.25  9071.66  13.30 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   73.80    1.20     9.18     0.15   254.63   539.20  5087.91  13.33 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   79.20    1.40     9.90     0.18   256.00   532.38  5592.38  12.41 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.20   79.40    1.00     9.90     0.12   255.36   528.07  4091.22  12.44 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   88.40    1.20    11.05     0.15   256.00   528.13  4349.35  11.16 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz    await  svctm  %util
sda               0.00     0.00   69.60    2.40     8.65     0.23   252.71   527.46  9334.37  13.89 100.00
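
For reference, that is extended per-device iostat output in MB mode,
i.e. something like (a guess at the exact invocation):

    # extended stats (-x), megabytes (-m), sda only, 5 second interval
    iostat -xm sda 5

Read that way, the array looks completely saturated: avgqu-sz of ~530
outstanding requests, avgrq-sz of roughly 250 sectors (~128KB per
request), await anywhere from ~3 to ~11 seconds, and %util pinned at
100% while the device only delivers about 8-12MB/s of reads.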


> Which indicates the problem is almost certainly related to the
> storage configuration or drivers, not the filesystem....
We have seen issues with XFS in the past, but of course this might be
related to the drivers.
The storage configuration shouldn't be the problem here; note that we
also changed it to a 1MB stripe size.
I'll keep looking to see whether I can find the issue in the hardware.

>
> Cheers,
>
> Dave.

Thanks

Kind regards

Matthew
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dmesg.txt
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20120614/ebab7f2f/attachment-0001.txt>

