
Re: XFS hangs and freezes with LSI 9265-8i controller on high i/o

To: <xfs@xxxxxxxxxxx>
Subject: Re: XFS hangs and freezes with LSI 9265-8i controller on high i/o
From: Matthew Whittaker-Williams <matthew@xxxxxxxxx>
Date: Thu, 14 Jun 2012 16:31:15 +0200
In-reply-to: <20120614000411.GY22848@dastard>
References: <4FD66513.2000108@xxxxxxxxx> <20120612011812.GK22848@dastard> <4FD766A7.9030908@xxxxxxxxx> <20120613011950.GN22848@dastard> <4FD8552C.4090208@xxxxxxxxx> <20120614000411.GY22848@dastard>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:9.0) Gecko/20111220 Thunderbird/9.0
On 6/14/12 2:04 AM, Dave Chinner wrote:
On Wed, Jun 13, 2012 at 10:54:04AM +0200, Matthew Whittaker-Williams wrote:
On 6/13/12 3:19 AM, Dave Chinner wrote:
With the valid stack traces, I see that it isn't related to the log,
though.
Ah, OK, so we are triggering a new issue?
No, your system appears to be stalling waiting for IO completion.

Yes, this is indeed what we experience.
This was a fluke; we are running several new systems and this is just
one of the new servers, which indeed has the wrong stripe size set; it
should be 1MB.
We actually found a 1MB stripe size to give better overall performance
than 64/256/512KB.
So if you fix that, does the problem go away?

No, unfortunately not.

Currently, with the 1MB stripe size set and:

root@sd70:~# xfs_info /dev/sda
meta-data=/dev/sda               isize=256    agcount=41, agsize=268435200 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=10741350400, imaxpct=5
         =                       sunit=256    swidth=5632 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
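
(As a quick sanity check of the alignment reported above, assuming the
4096-byte block size shown by xfs_info:)

# sunit/swidth are reported in filesystem blocks
echo $((256 * 4096))   # 1048576 bytes = 1MB stripe unit, i.e. su=1m
echo $((5632 / 256))   # 22 stripe units per stripe, i.e. sw=22 (24 disks minus 2 parity)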

It still stalled out with the same error.

I understand that RAID 10 gives better read performance on small file
sets, but with RAID 10 we of course lose a lot of disk space compared
to RAID 6.  As a side note, we have been running RAID 6 for years now
without any issues.
but have you been running 24-disk RAID6 volumes? With RAID5/6, the
number of disks in the volume really matters - for small write IOs,
the more disks in the RAID6 volume, the slower it will be...
Yes, we have, but we haven't found any large performance issues in the
past on a 24-disk span. Also, most RAID controllers don't support more
than 32 disks in a RAID array, so we kept it at 24 disks per array for
the large arrays.
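
(Rough arithmetic on why volume width hurts small writes, assuming the
24-disk RAID6 with a 1MB stripe unit discussed above, i.e. 22 data disks:)

# Full stripe width for 22 data disks at a 1MB stripe unit
echo $((22 * 1024))   # 22528 KB = 22MB per full stripe
# Any write smaller than a full stripe forces the controller into a
# read-modify-write cycle to update both parity blocks, so the penalty
# grows with the number of disks in the volume.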


In the past we did tune our XFS filesystems with switches like sunit
and swidth, but back then we couldn't see much performance difference
between using:

mkfs.xfs -f -L P.01 -l lazy-count=1 -d su=1m,sw=22 /dev/sda

and

mkfs.xfs -f -L P.01 -l lazy-count=1 /dev/sda
You won't see much difference with the BBWC enabled. It does affect
how files and inodes are allocated, though, so the aging
characteristics of the filesystem will be better for an aligned
filesystem. I.e. you might not notice the performance difference now,
but after a couple of years in production you probably will...

We haven't seen this impact just yet; we are doing about 120K sector
reads. But that is probably not really high compared to our smaller
arrays.
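
(For what it's worth, if re-making the filesystem is not practical, XFS
also accepts the alignment hints at mount time; a rough sketch for this
geometry, with values in 512-byte sectors and /storage standing in for
the real mount point:)

# 1MB stripe unit = 2048 sectors; 22 x 1MB stripe width = 45056 sectors
mount -o sunit=2048,swidth=45056 /dev/sda /storage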


xfs_info from a system that shows no problems with an H800
controller from Dell (same chipset as the LSI controllers)

I went and tried the H800 controller with a single array of the new
spool, and unfortunately this also hung.

[ 6123.108138] INFO: task diablo:11963 blocked for more than 120 seconds.
[ 6123.108208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6123.108297] diablo D ffff8805ec4e9068 0 11963 11210 0x00000000
[ 6123.108444] ffff88046fd315f8 0000000000000082 ffff8800dd1407a8 ffff88025a8f5000
[ 6123.108681] 000000000025a8f5 ffff88046fd30010 0000000000013340 0000000000013340
[ 6123.108935] ffff88046fd31fd8 0000000000013340 ffff88046fd31fd8 0000000000013340
[ 6123.109171] Call Trace:
[ 6123.109276]  [<ffffffff8137e3bf>] schedule+0x5f/0x61
[ 6123.109343]  [<ffffffff8137d0a5>] schedule_timeout+0x31/0xde
[ 6123.109415]  [<ffffffff810d0efe>] ? __probe_kernel_read+0x36/0x55
[ 6123.109487]  [<ffffffff8110b76d>] ? kmem_cache_alloc+0x61/0x118
[ 6123.109557]  [<ffffffff8137da22>] __down_common+0x96/0xe4
[ 6123.109634]  [<ffffffffa02e04b8>] ? _xfs_buf_find+0x1ea/0x299 [xfs]
[ 6123.109704]  [<ffffffff8137dacf>] __down+0x18/0x1a
[ 6123.109771]  [<ffffffff81060428>] down+0x28/0x38
[ 6123.109844]  [<ffffffffa02df786>] xfs_buf_lock+0x6f/0xc0 [xfs]
[ 6123.109922]  [<ffffffffa02e04b8>] _xfs_buf_find+0x1ea/0x299 [xfs]
[ 6123.110000]  [<ffffffffa02e0713>] xfs_buf_get+0x25/0x172 [xfs]
[ 6123.110090]  [<ffffffffa02e087a>] xfs_buf_read+0x1a/0xc5 [xfs]
[ 6123.110169]  [<ffffffffa033bc4c>] xfs_trans_read_buf+0x35d/0x54d [xfs]
[ 6123.110258]  [<ffffffffa0326507>] xfs_imap_to_bp+0x45/0x1fe [xfs]
[ 6123.110345]  [<ffffffffa032873e>] xfs_iread+0x5b/0x195 [xfs]
[ 6123.110423]  [<ffffffffa02e5e59>] xfs_iget_cache_miss+0x5e/0x1cf [xfs]
[ 6123.110507]  [<ffffffffa02e64e3>] xfs_iget+0xf7/0x184 [xfs]
[ 6123.110591]  [<ffffffffa0325b36>] xfs_ialloc+0xc1/0x5ef [xfs]
[ 6123.110673]  [<ffffffffa02f3b4d>] ? kmem_zone_zalloc+0x1f/0x30 [xfs]
[ 6123.110757]  [<ffffffffa0336007>] ? xlog_grant_head_check+0x8f/0x101 [xfs]
[ 6123.110842]  [<ffffffffa02f00e1>] xfs_dir_ialloc+0x9d/0x284 [xfs]
[ 6123.110926]  [<ffffffffa02f382e>] xfs_create+0x2f5/0x547 [xfs]
[ 6123.111006]  [<ffffffffa02ea4a2>] xfs_vn_mknod+0xcc/0x160 [xfs]
[ 6123.111086]  [<ffffffffa02ea557>] xfs_vn_create+0xe/0x10 [xfs]
[ 6123.111156]  [<ffffffff8111aaba>] vfs_create+0x67/0x89
[ 6123.111224]  [<ffffffff8111b942>] do_last+0x236/0x565
[ 6123.111292]  [<ffffffff8111c23a>] path_openat+0xcb/0x30c
[ 6123.111360]  [<ffffffff8111c56a>] do_filp_open+0x38/0x84
[ 6123.111429]  [<ffffffff8111a32f>] ? getname_flags+0x15b/0x1e2
[ 6123.111499]  [<ffffffff8112694b>] ? alloc_fd+0x6c/0xfc
[ 6123.111566]  [<ffffffff8110f26c>] do_sys_open+0x6f/0x101
[ 6123.111634]  [<ffffffff8110f32b>] sys_open+0x1c/0x1e
[ 6123.111702]  [<ffffffff813858f9>] system_call_fastpath+0x16/0x1b


See attached dmesg.txt
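
(For reference, blocked-task traces like the one above can also be
dumped on demand via sysrq, which is handy when collecting a full
dmesg for a report:)

# Dump stack traces of all tasks stuck in uninterruptible (D) state
echo w > /proc/sysrq-trigger
dmesg > dmesg.txt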

iostat:

Device:   rrqm/s  wrqm/s    r/s   w/s  rMB/s  wMB/s avgrq-sz avgqu-sz     await  svctm  %util
sda         0.00    0.00  81.80  1.40  10.22   0.18   256.00   531.91   5349.11  12.02 100.00
sda         0.00    0.00  83.40  1.20  10.37   0.15   254.56   525.35   4350.67  11.82 100.00
sda         0.00    0.00  79.20  0.80   9.90   0.10   256.00   530.14   3153.38  12.50 100.00
sda         0.00    0.00  72.80  2.20   9.09   0.13   251.72   546.08   8709.54  13.33 100.00
sda         0.00    0.00  79.80  1.40   9.95   0.12   254.07   535.35   5172.22  12.32 100.00
sda         0.00    0.00  99.60  1.20  12.41   0.08   253.86   529.49   3560.89   9.92 100.00
sda         0.00    0.00  60.80  1.40   7.59   0.11   253.77   527.21   6545.50  16.08 100.00
sda         0.00    0.00  79.00  1.80   9.84   0.08   251.51   547.93   6400.42  12.38 100.00
sda         0.00    0.00  82.20  2.20  10.25   0.01   248.93   536.42   7415.77  11.85 100.00
sda         0.00    0.00  89.40  2.20  11.17   0.01   249.90   525.68   7232.96  10.92 100.00
sda         0.00    0.00  82.00  1.20  10.22   0.08   253.37   541.60   4170.95  12.02 100.00
sda         0.00    0.00  62.80  2.60   7.85   0.14   250.31   541.15  11260.81  15.29 100.00
sda         0.00    0.00  85.00  1.80  10.61   0.21   255.47   529.36   6514.85  11.52 100.00
sda         0.00    0.00  75.20  1.40   9.38   0.11   253.72   535.68   5416.70  13.05 100.00
sda         0.00    0.00  66.80  1.20   8.33   0.11   254.19   546.68   5459.11  14.71 100.00
sda         0.00    0.00  81.40  0.80  10.15   0.10   255.38   540.62   3171.57  12.17 100.00
sda         0.00    0.00  72.20  1.20   9.02   0.15   255.74   535.26   5345.51  13.62 100.00
sda         0.00    0.00  91.00  1.00  11.35   0.12   255.44   531.02   3637.72  10.87 100.00
sda         0.00    0.00  81.00  1.60  10.12   0.20   255.96   524.44   6513.22  12.11 100.00
sda         0.00    0.00  72.80  2.40   9.04   0.26   253.24   543.25   9071.66  13.30 100.00
sda         0.00    0.00  73.80  1.20   9.18   0.15   254.63   539.20   5087.91  13.33 100.00
sda         0.00    0.00  79.20  1.40   9.90   0.18   256.00   532.38   5592.38  12.41 100.00
sda         0.00    0.20  79.40  1.00   9.90   0.12   255.36   528.07   4091.22  12.44 100.00
sda         0.00    0.00  88.40  1.20  11.05   0.15   256.00   528.13   4349.35  11.16 100.00
sda         0.00    0.00  69.60  2.40   8.65   0.23   252.71   527.46   9334.37  13.89 100.00
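
(The exact iostat invocation isn't shown; the columns above look like
extended per-device statistics in MB at a fixed interval, i.e. something
along the lines of:)

iostat -xm 5 /dev/sda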


Which indicates the problem is almost certainly related to the
storage configuration or drivers, not the filesystem....
We have seen issues with XFS in the past, but of course this might be
related to the drivers. The storage configuration shouldn't be the
problem here; note that we also changed it to a 1MB stripe size.
I'll keep looking to see if I can find the issue in the hardware.
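
(To help rule out the controller side, and assuming the LSI MegaCli
utility is installed for the 9265-8i, the adapter, logical drive and
BBU state could be inspected with something like:)

megacli -AdpAllInfo -aALL                 # controller and firmware details
megacli -LDInfo -Lall -aALL               # logical drive state, stripe size, cache policy
megacli -AdpBbuCmd -GetBbuStatus -aALL    # battery-backed write cache status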


Cheers,

Dave.

Thanks

Kind regards

Matthew

Attachment: dmesg.txt
Description: Text document
