
Re: Vanilla 3.0.78

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Vanilla 3.0.78
From: Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx>
Date: Mon, 29 Jul 2013 13:02:45 +0200
Cc: "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>, "xfs-masters@xxxxxxxxxxx" <xfs-masters@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20130729100134.GH13468@dastard>
References: <51F61C39.6050200@xxxxxxxxxxxx> <20130729082228.GG13468@dastard> <51F62878.4090408@xxxxxxxxxxxx> <20130729100134.GH13468@dastard>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130623 Thunderbird/17.0.7
On 29.07.2013 12:01, Dave Chinner wrote:
> On Mon, Jul 29, 2013 at 10:31:52AM +0200, Stefan Priebe - Profihost AG wrote:
>> On 29.07.2013 10:22, Dave Chinner wrote:
>>> On Mon, Jul 29, 2013 at 09:39:37AM +0200, Stefan Priebe - Profihost AG wrote:
>>>> Hi,
>>>>
>>>> while running 3.0.78 and doing heavy rsync tasks on a RAID 50 I'm getting
>>>> these call traces:
>>>
>>> Judging by the timestamps, the problem clears and the system keeps
>>> running?
>>
>> Yes.
>>
>>> If so, the problem is likely a combination of contention on a
>>> specific AG for allocation and slow IO. Given it is RAID 50, the
>>> IO is probably really slow, and there are probably lots of
>>> threads wanting the lock and queuing up on it.
>>>
>>> What's 'iostat -m -x -d 5' look like when these messages are dumped
>>> out?
>>
>> I don't have that, but I do have some Nagios stats. They showed
>> 1000 IOPS and 8 MB/s.
> 
> Yup, that sounds like it was doing lots of small random IOs and
> hence was IO bound...
> 
>> But I can reduce the number of tasks running in parallel if this
>> is the problem.
> 
> Try to find out what the average IO times were when the messages
> are being emitted. If they're up in the seconds, then there's a
> good chance you are simply throwing too many small IOs at your
> storage.
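
A minimal sketch of the kind of capture Dave is asking for, assuming
iostat from the sysstat package is available. The log path below is a
placeholder, and the exact column layout of the extended report varies
between sysstat versions:

  # Print extended per-device stats (-x) in MB (-m) every 5 seconds,
  # with a timestamp before each report (-t) so the samples can be
  # matched against the kernel messages. Log path is an assumption.
  iostat -t -m -x -d 5 >> /var/log/iostat-sample.log &

  # In each report, the 'await' column is the average time in ms an
  # IO spent queued plus being serviced. await climbing into the
  # hundreds or thousands while the XFS messages appear would point
  # at the storage being the bottleneck, as Dave describes.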

Thanks!

Stefan
