Re: XFS Syncd

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: XFS Syncd
From: Shrinand Javadekar <shrinand@xxxxxxxxxxxxxx>
Date: Wed, 3 Jun 2015 16:18:20 -0700
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150603035719.GO24666@dastard>
References: <CABppvi6pC4qEFZUTesbT0v5agbd67MP4dEoUbaVFwEyCv4h21g@xxxxxxxxxxxxxx> <20150410063210.GJ15810@dastard> <CABppvi4e_xEMY7tDHtEo6miZcN2AZ-mFMHXKaUS0hfpx6AMt0w@xxxxxxxxxxxxxx> <20150410072100.GL13731@dastard> <CABppvi437S9e+DEFOi6ECPu8=AnEK0V=5rRmU5Of1_XtWiQbfA@xxxxxxxxxxxxxx> <20150410131245.GK15810@dastard> <CABppvi68E6n+pr6X8TMOBhicVB4mrJbyyvm89r56rRVqSjf1Zg@xxxxxxxxxxxxxx> <20150603035719.GO24666@dastard>
Here you go!

- Kernel version
Linux my-host 3.16.0-38-generic #52~14.04.1-Ubuntu SMP Fri May 8 09:43:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

- xfsprogs version (xfs_repair -V)
xfs_repair version 3.1.9

- number of CPUs
16

- contents of /proc/meminfo
(attached).

- contents of /proc/mounts
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=32965720k,nr_inodes=8241430,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=6595420k,mode=755 0 0
/dev/mapper/troll_root_vg-troll_root_lv / ext4 rw,relatime,data=ordered 0 0
none /sys/fs/cgroup tmpfs rw,relatime,size=4k,mode=755 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
none /run/user tmpfs rw,nosuid,nodev,noexec,relatime,size=102400k,mode=755 0 0
none /sys/fs/pstore pstore rw,relatime 0 0
/dev/mapper/troll_root_vg-troll_iso_lv /mnt/factory_reset ext4 rw,relatime,data=ordered 0 0
/dev/mapper/TrollGroup-TrollVolume /lvm ext4 rw,relatime,data=ordered 0 0
/dev/mapper/troll_root_vg-troll_log_lv /var/log ext4 rw,relatime,data=ordered 0 0
systemd /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,name=systemd 0 0
/dev/mapper/35000c50062e6a12b-part2 /srv/node/r1 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062e6a7eb-part2 /srv/node/r2 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062e6a567-part2 /srv/node/r3 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062ea068f-part2 /srv/node/r4 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062ea066b-part2 /srv/node/r5 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062e69ecf-part2 /srv/node/r6 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062ea067b-part2 /srv/node/r7 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062e6a493-part2 /srv/node/r8 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0


- contents of /proc/partitions
(attached)

- RAID layout (hardware and/or software)
No RAID

- LVM configuration
No LVM

- type of disks you are using
Rotational disks

- write cache status of drives
Disabled

- size of BBWC and mode it is running in
No BBWC

- xfs_info output on the filesystem in question

The following is the info for one of the disks. The other 7 disks are identical.

meta-data=/dev/mapper/35000c50062e6a7eb-part2 isize=256    agcount=64, agsize=11446344 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=732566016, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=357698, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
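
A quick arithmetic check of the geometry above, as a minimal Python sketch. The constants are copied from the xfs_info output; the function name geometry_summary is made up for illustration. It confirms that agcount * agsize accounts for every data block and that the data section comes to roughly 3.0 TB, with an internal log of about 1.4 GiB.

```python
# Values copied from the xfs_info output above.
BLOCK_SIZE = 4096          # bsize of the data section, in bytes
AG_COUNT = 64              # agcount
AG_SIZE_BLOCKS = 11446344  # agsize, in filesystem blocks
DATA_BLOCKS = 732566016    # data blocks
LOG_BLOCKS = 357698        # internal log blocks


def geometry_summary():
    """Derive byte sizes from the block counts (hypothetical helper)."""
    return {
        "ag_blocks_total": AG_COUNT * AG_SIZE_BLOCKS,
        "data_bytes": DATA_BLOCKS * BLOCK_SIZE,
        "ag_bytes": AG_SIZE_BLOCKS * BLOCK_SIZE,
        "log_bytes": LOG_BLOCKS * BLOCK_SIZE,
    }


if __name__ == "__main__":
    summary = geometry_summary()
    print("AGs cover all data blocks:",
          summary["ag_blocks_total"] == DATA_BLOCKS)
    for key in ("data_bytes", "ag_bytes", "log_bytes"):
        print(f"{key}: {summary[key]} bytes = {summary[key] / 2**30:.2f} GiB")
```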

- dmesg output showing all error messages and stack traces
No errors/stack traces.

- Workload causing the problem:

OpenStack Swift. This is what it's doing:

1. A path like /srv/node/r1/objects/1024/eef/tmp already exists.
/srv/node/r1 is the mount point.
2. Creates a tmp file, say tmpfoo, in the path above. Path:
/srv/node/r1/objects/1024/eef/tmp/tmpfoo.
3. Issues a 256KB write into this file.
4. Issues an fsync on the file.
5. Closes this file.
6. Creates another directory named "deadbeef" inside "eef" if it
doesn't exist. Path /srv/node/r1/objects/1024/eef/deadbeef.
7. Moves file tmpfoo into the deadbeef directory using rename().
/srv/node/r1/objects/1024/eef/tmp/tmpfoo -->
/srv/node/r1/objects/1024/eef/deadbeef/foo.data
8. Does a readdir on /srv/node/r1/objects/1024/eef/deadbeef/
9. Iterates over all files obtained in #8 above. Usually #8 gives only one file.

There are 8 mounts for 8 disks: /srv/node/r1 through /srv/node/r8. The
above steps happen concurrently for all 8 disks.
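
The steps above can be approximated with a short Python sketch. This is not Swift's actual code, only a stand-in for the I/O pattern described in steps 1 to 9. The function names put_object and run_all, the zero-filled payload, and the use of one thread per mount point are all assumptions made for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

WRITE_SIZE = 256 * 1024  # step 3: one 256KB write


def put_object(mount_point, partition="1024", suffix="eef",
               obj_hash="deadbeef", name="foo.data"):
    """Run steps 1-9 once under a single mount point (hypothetical helper)."""
    suffix_dir = os.path.join(mount_point, "objects", partition, suffix)
    tmp_dir = os.path.join(suffix_dir, "tmp")
    os.makedirs(tmp_dir, exist_ok=True)    # step 1: path normally exists already

    # Step 2: create a uniquely named temp file such as tmpfoo.
    fd, tmp_path = tempfile.mkstemp(prefix="tmp", dir=tmp_dir)
    try:
        data = b"\0" * WRITE_SIZE
        written = 0
        while written < len(data):         # step 3: handle short writes
            written += os.write(fd, data[written:])
        os.fsync(fd)                       # step 4: fsync the file
    finally:
        os.close(fd)                       # step 5: close the file

    obj_dir = os.path.join(suffix_dir, obj_hash)
    os.makedirs(obj_dir, exist_ok=True)    # step 6: create deadbeef if missing
    os.rename(tmp_path, os.path.join(obj_dir, name))  # step 7: rename()

    # Steps 8 and 9: readdir and iterate over the entries.
    return sorted(os.listdir(obj_dir))


def run_all(mount_points):
    """Run put_object concurrently, one worker per mount point."""
    with ThreadPoolExecutor(max_workers=len(mount_points)) as pool:
        return list(pool.map(put_object, mount_points))
```

On the machine described here the call would be something like run_all(["/srv/node/r%d" % i for i in range(1, 9)]), repeated in a loop to generate sustained load.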

- IOStat and vmstat output
(attached)

- Trace cmd report
Too big to attach. Here's a link:
https://www.dropbox.com/s/3xxe2chsv4fsrv8/trace_report.txt.zip?dl=0

- Perf top output.
Unfortunately, I couldn't run perf top. I keep getting the following error:

WARNING: perf not found for kernel 3.16.0-38

  You may need to install the following packages for this specific kernel:
    linux-tools-3.16.0-38-generic
    linux-cloud-tools-3.16.0-38-generic

On Tue, Jun 2, 2015 at 8:57 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Tue, Jun 02, 2015 at 11:43:30AM -0700, Shrinand Javadekar wrote:
>> Sorry, I dropped the ball on this one. We found some other problems
>> and I was busy fixing them.
>>
>> So, the xfsaild thread/s that kick in every 30 seconds are hitting us
>> pretty badly. Here's a graph with the latest tests I ran. We get great
>> throughput for ~18 seconds but then the world pretty much stops for
>> the next ~12 seconds or so making the final numbers look pretty bad.
>> This particular graph was plotted when the disk had ~150GB of data
>> (total capacity of 3TB).
>>
>> I am using a 3.16.0-38-generic kernel (upgraded since the time I wrote
>> the first email on this thread).
>>
>> I know fs.xfs.xfssyncd_centisecs controls this interval of 30 seconds.
>> What other options can I tune for making this work better?
>>
>> We have 8 disks. And unfortunately, all 8 disks are brought to a halt
>> every 30 seconds. Does XFS have options to only work on a subset of
>> disks at a time?
>>
>> Also, what does XFS exactly do every 30 seconds? If I understand it
>> right, metadata can be 3 locations:
>>
>> 1. Memory
>> 2. Log buffer on disk
>> 3. Final location on disk.
>>
>> Every 30 seconds, from where to where is this metadata being copied?
>> Are there ways to just disable this to avoid the stop-of-the-world
>> pauses (at the cost of lower but sustained performance)?
>
> I can't use this information to help you as you haven't presented
> any of the data I've asked for.  We need to restart here and base
> everything on data and observation. i.e. first principles.
>
> Can you provide all of the information here:
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>
> and most especially the iostat and vmstat outputs while the problem
> is occurring. The workload description is not what is going wrong
> or what you think is happening, but a description of the application
> you are running that causes the problem.
>
> This will give me a baseline of your hardware, the software, the
> behaviour and the application you are running, and hence give me
> something to start with.
>
> I'd also like to see the output from perf top while the problem is
> occurring, so we might be able to see what is generating the IO...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx

Attachment: mem_info.txt
Description: Text document

Attachment: partitions.txt
Description: Text document

Attachment: vmstat.out
Description: Binary data

Attachment: iostat.out
Description: Binary data
