
Re: XFS Syncd

To: Shrinand Javadekar <shrinand@xxxxxxxxxxxxxx>
Subject: Re: XFS Syncd
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 3 Jun 2015 13:57:19 +1000
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CABppvi68E6n+pr6X8TMOBhicVB4mrJbyyvm89r56rRVqSjf1Zg@xxxxxxxxxxxxxx>
References: <CABppvi6pC4qEFZUTesbT0v5agbd67MP4dEoUbaVFwEyCv4h21g@xxxxxxxxxxxxxx> <20150410063210.GJ15810@dastard> <CABppvi4e_xEMY7tDHtEo6miZcN2AZ-mFMHXKaUS0hfpx6AMt0w@xxxxxxxxxxxxxx> <20150410072100.GL13731@dastard> <CABppvi437S9e+DEFOi6ECPu8=AnEK0V=5rRmU5Of1_XtWiQbfA@xxxxxxxxxxxxxx> <20150410131245.GK15810@dastard> <CABppvi68E6n+pr6X8TMOBhicVB4mrJbyyvm89r56rRVqSjf1Zg@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Jun 02, 2015 at 11:43:30AM -0700, Shrinand Javadekar wrote:
> Sorry, I dropped the ball on this one. We found some other problems
> and I was busy fixing them.
> So, the xfsaild thread/s that kick in every 30 seconds are hitting us
> pretty badly. Here's a graph with the latest tests I ran. We get great
> throughput for ~18 seconds but then the world pretty much stops for
> the next ~12 seconds or so making the final numbers look pretty bad.
> This particular graph was plotted when the disk had ~150GB of data
> (total capacity of 3TB).
> I am using a 3.16.0-38-generic kernel (upgraded since the time I wrote
> the first email on this thread).
> I know fs.xfs.xfssyncd_centisecs controls this interval of 30 seconds.
> What other options can I tune for making this work better?
> We have 8 disks. And unfortunately, all 8 disks are brought to a halt
> every 30 seconds. Does XFS have options to only work on a subset of
> disks at a time?
> Also, what does XFS exactly do every 30 seconds? If I understand it
> right, metadata can be in 3 locations:
> 1. Memory
> 2. Log buffer on disk
> 3. Final location on disk.
> Every 30 seconds, from where to where is this metadata being copied?
> Are there ways to just disable this to avoid the stop-the-world
> pauses (at the cost of lower but sustained performance)?
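The `fs.xfs.xfssyncd_centisecs` sysctl mentioned above is expressed in centiseconds, so its default of 3000 corresponds to the 30-second cycle being described. The following is a minimal Python sketch that only reads the current value and converts it to seconds; it assumes the sysctl is exposed at `/proc/sys/fs/xfs/xfssyncd_centisecs` and uses the Min/Default/Max values given in the kernel's XFS documentation (100/3000/720000). The function names are hypothetical. It does not change the setting, and reading it is no substitute for the iostat/vmstat data requested below.

```python
from pathlib import Path

# Limits as listed in the kernel's XFS documentation (assumed here, not probed).
MIN_CENTISECS = 100
DEFAULT_CENTISECS = 3000
MAX_CENTISECS = 720000
SYSCTL_PATH = Path("/proc/sys/fs/xfs/xfssyncd_centisecs")


def centisecs_to_seconds(centisecs: int) -> float:
    """Convert a centisecond sysctl value to seconds (1 centisecond = 0.01 s)."""
    return centisecs / 100


def read_interval_seconds(path: Path = SYSCTL_PATH) -> float:
    """Read the sysctl (or a file with the same format) and return seconds.

    Raises ValueError if the value is outside the documented range.
    """
    value = int(path.read_text().strip())
    if not MIN_CENTISECS <= value <= MAX_CENTISECS:
        raise ValueError(
            f"{value} is outside the documented range "
            f"{MIN_CENTISECS}..{MAX_CENTISECS} centiseconds"
        )
    return centisecs_to_seconds(value)
```

On a machine with XFS loaded, `read_interval_seconds()` would return `30.0` if the default has not been changed.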

I can't use this information to help you as you haven't presented
any of the data I've asked for.  We need to restart here and base
everything on data and observation. i.e. first principles.

Can you provide all of the information here:


and most especially the iostat and vmstat outputs while the problem
is occurring. The workload description should not be what is going
wrong or what you think is happening, but a description of the
application you are running that causes the problem.

This will give me a baseline of your hardware, the software, the
behaviour and the application you are running, and hence give me
something to start with.

I'd also like to see the output from perf top while the problem is
occurring, so we might be able to see what is generating the IO...
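Because the stall recurs roughly every 30 seconds, any iostat and vmstat capture needs to run long enough to span at least one full cycle. One way to do that is to run both tools concurrently for a minute or two and keep their output in timestamped log files. The following Python sketch does this; the function names and the `<name>.log` file layout are hypothetical, and it assumes `vmstat` (procps) and `iostat` (sysstat) are installed. `perf top` is left out because it is an interactive display that is easier to watch in a separate terminal.

```python
from __future__ import annotations

import subprocess
import sys
import time
from pathlib import Path


def monitoring_commands(interval_s: int = 1) -> dict[str, list[str]]:
    """Commands to sample every interval_s seconds, keyed by log file stem."""
    return {
        "vmstat": ["vmstat", str(interval_s)],
        # -x: extended per-device statistics, -d: device report only, -m: MB/s
        "iostat": ["iostat", "-x", "-d", "-m", str(interval_s)],
    }


def capture(commands: dict[str, list[str]], duration_s: float,
            outdir: Path) -> dict[str, Path]:
    """Run all commands concurrently for duration_s seconds.

    Each command's stdout and stderr go to outdir/<name>.log, preceded by a
    header line recording the wall-clock start time and the command line.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    running = []  # (process, open log file) pairs
    paths: dict[str, Path] = {}
    try:
        for name, argv in commands.items():
            path = outdir / f"{name}.log"
            log = open(path, "w")
            log.write(f"# started {time.strftime('%Y-%m-%d %H:%M:%S')}: "
                      f"{' '.join(argv)}\n")
            log.flush()  # header must reach the file before the child writes
            try:
                proc = subprocess.Popen(argv, stdout=log,
                                        stderr=subprocess.STDOUT)
            except OSError:
                log.close()
                raise
            running.append((proc, log))
            paths[name] = path
        time.sleep(duration_s)
    finally:
        # Stop every sampler (a no-op for ones that already exited) and close logs.
        for proc, log in running:
            proc.terminate()
            proc.wait()
            log.close()
    return paths


if __name__ == "__main__":
    # 90 seconds covers about three 30-second cycles.
    out = capture(monitoring_commands(1), 90, Path("xfs-stall-logs"))
    print("\n".join(str(p) for p in out.values()), file=sys.stderr)
```

Sampling at one-second intervals keeps the roughly 12-second stall visible as a run of distinct rows rather than being averaged into a single coarse sample.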


Dave Chinner
