XFS Syncd
Shrinand Javadekar
shrinand at maginatics.com
Wed Jun 3 18:18:20 CDT 2015
Here you go!
- Kernel version
Linux my-host 3.16.0-38-generic #52~14.04.1-Ubuntu SMP Fri May 8 09:43:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
- xfsprogs version (xfs_repair -V)
xfs_repair version 3.1.9
- number of CPUs
16
- contents of /proc/meminfo
(attached).
- contents of /proc/mounts
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=32965720k,nr_inodes=8241430,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=6595420k,mode=755 0 0
/dev/mapper/troll_root_vg-troll_root_lv / ext4 rw,relatime,data=ordered 0 0
none /sys/fs/cgroup tmpfs rw,relatime,size=4k,mode=755 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
none /run/user tmpfs rw,nosuid,nodev,noexec,relatime,size=102400k,mode=755 0 0
none /sys/fs/pstore pstore rw,relatime 0 0
/dev/mapper/troll_root_vg-troll_iso_lv /mnt/factory_reset ext4 rw,relatime,data=ordered 0 0
/dev/mapper/TrollGroup-TrollVolume /lvm ext4 rw,relatime,data=ordered 0 0
/dev/mapper/troll_root_vg-troll_log_lv /var/log ext4 rw,relatime,data=ordered 0 0
systemd /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,name=systemd 0 0
/dev/mapper/35000c50062e6a12b-part2 /srv/node/r1 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062e6a7eb-part2 /srv/node/r2 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062e6a567-part2 /srv/node/r3 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062ea068f-part2 /srv/node/r4 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062ea066b-part2 /srv/node/r5 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062e69ecf-part2 /srv/node/r6 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062ea067b-part2 /srv/node/r7 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062e6a493-part2 /srv/node/r8 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
- contents of /proc/partitions
(attached)
- RAID layout (hardware and/or software)
No RAID
- LVM configuration
No LVM
- type of disks you are using
Rotational disks
- write cache status of drives
Disabled
- size of BBWC and mode it is running in
No BBWC
- xfs_info output on the filesystem in question
The following is the info for one of the disks; the other 7 disks are identical.
meta-data=/dev/mapper/35000c50062e6a7eb-part2 isize=256    agcount=64, agsize=11446344 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=732566016, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=357698, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
- dmesg output showing all error messages and stack traces
No errors/stack traces.
- Workload causing the problem:
OpenStack Swift. This is what it's doing:
1. A path like /srv/node/r1/objects/1024/eef/tmp already exists.
/srv/node/r1 is the mount point.
2. Creates a tmp file, say tmpfoo, in the path above. Path:
/srv/node/r1/objects/1024/eef/tmp/tmpfoo.
3. Issues a 256KB write into this file.
4. Issues an fsync on the file.
5. Closes this file.
6. Creates another directory named "deadbeef" inside "eef" if it
doesn't exist. Path /srv/node/r1/objects/1024/eef/deadbeef.
7. Moves file tmpfoo into the deadbeef directory using rename().
/srv/node/r1/objects/1024/eef/tmp/tmpfoo -->
/srv/node/r1/objects/1024/eef/deadbeef/foo.data
8. Does a readdir on /srv/node/r1/objects/1024/eef/deadbeef/
9. Iterates over all files obtained in #8 above. Usually #8 gives only one file.
There are 8 mounts for 8 disks: /srv/node/r1 through /srv/node/r8. The
above steps happen concurrently for all 8 disks.
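
For reference, here's a rough Python sketch of that per-object sequence
(illustrative only; the function name and the literal "tmpfoo"/"deadbeef"
names are made up for this example, this is not Swift's actual diskfile
code):

import os

def write_object(mount, partition, suffix, name, data):
    # Hypothetical sketch of the per-object steps described above.
    tmp_dir = os.path.join(mount, "objects", partition, suffix, "tmp")
    dest_dir = os.path.join(mount, "objects", partition, suffix, "deadbeef")
    tmp_path = os.path.join(tmp_dir, "tmpfoo")

    # steps 2-5: create the tmp file, issue one ~256KB write, fsync, close
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)

    # step 6: create the destination directory if it doesn't exist
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)

    # step 7: rename the tmp file into place
    os.rename(tmp_path, os.path.join(dest_dir, name))

    # steps 8-9: readdir the destination and iterate over the entries
    for entry in os.listdir(dest_dir):
        pass  # usually just the one file renamed above

# e.g. write_object("/srv/node/r1", "1024", "eef", "foo.data", b"\0" * 262144)
# The same sequence runs concurrently against /srv/node/r1 through r8.
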
- IOStat and vmstat output
(attached)
- Trace cmd report
Too big to attach. Here's a link:
https://www.dropbox.com/s/3xxe2chsv4fsrv8/trace_report.txt.zip?dl=0
- Perf top output.
Unfortunately, I couldn't run perf top. I keep getting the following error:
WARNING: perf not found for kernel 3.16.0-38
You may need to install the following packages for this specific kernel:
linux-tools-3.16.0-38-generic
linux-cloud-tools-3.16.0-38-generic
On Tue, Jun 2, 2015 at 8:57 PM, Dave Chinner <david at fromorbit.com> wrote:
> On Tue, Jun 02, 2015 at 11:43:30AM -0700, Shrinand Javadekar wrote:
>> Sorry, I dropped the ball on this one. We found some other problems
>> and I was busy fixing them.
>>
>> So, the xfsaild thread/s that kick in every 30 seconds are hitting us
>> pretty badly. Here's a graph with the latest tests I ran. We get great
>> throughput for ~18 seconds but then the world pretty much stops for
>> the next ~12 seconds or so making the final numbers look pretty bad.
>> This particular graph was plotted when the disk had ~150GB of data
>> (total capacity of 3TB).
>>
>> I am using a 3.16.0-38-generic kernel (upgraded since the time I wrote
>> the first email on this thread).
>>
>> I know fs.xfs.xfssyncd_centisecs controls this interval of 30 seconds.
>> What other options can I tune for making this work better?
>>
>> We have 8 disks. And unfortunately, all 8 disks are brought to a halt
>> every 30 seconds. Does XFS have options to only work on a subset of
>> disks at a time?
>>
>> Also, what does XFS exactly do every 30 seconds? If I understand it
>> right, metadata can be 3 locations:
>>
>> 1. Memory
>> 2. Log buffer on disk
>> 3. Final location on disk.
>>
>> Every 30 seconds, from where to where is this metadata being copied?
>> Are there ways to just disable this to avoid the stop-of-the-world
>> pauses (at the cost of lower but sustained performance)?
>
> I can't use this information to help you as you haven't presented
> any of the data I've asked for. We need to restart here and base
> everything on data and observation. i.e. first principles.
>
> Can you provide all of the information here:
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>
> and most especially the iostat and vmstat outputs while the problem
> is occurring. The workload description is not what is going wrong
> or what you think is happening, but a description of the application
> you are running that causes the problem.
>
> This will give me a baseline of your hardware, the software, the
> behaviour and the application you are running, and hence give me
> something to start with.
>
> I'd also like to see the output from perf top while the problem is
> occurring, so we might be able to see what is generating the IO...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david at fromorbit.com
-------------- next part --------------
MemTotal: 65954164 kB
MemFree: 13959108 kB
MemAvailable: 32757820 kB
Buffers: 176636 kB
Cached: 6429784 kB
SwapCached: 103432 kB
Active: 27430416 kB
Inactive: 6313768 kB
Active(anon): 24825928 kB
Inactive(anon): 2326792 kB
Active(file): 2604488 kB
Inactive(file): 3986976 kB
Unevictable: 14108 kB
Mlocked: 14108 kB
SwapTotal: 16777212 kB
SwapFree: 16346352 kB
Dirty: 3992 kB
Writeback: 0 kB
AnonPages: 27093116 kB
Mapped: 80260 kB
Shmem: 9484 kB
Slab: 14808144 kB
SReclaimable: 12460664 kB
SUnreclaim: 2347480 kB
KernelStack: 27696 kB
PageTables: 96588 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 49754292 kB
Committed_AS: 41952748 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 543104 kB
VmallocChunk: 34359013376 kB
HardwareCorrupted: 0 kB
AnonHugePages: 22728704 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 1557220 kB
DirectMap2M: 59236352 kB
DirectMap1G: 8388608 kB
-------------- next part --------------
major minor #blocks name
11 0 1048575 sr0
8 48 2930266584 sdd
8 49 1024 sdd1
8 50 2930264064 sdd2
8 32 2930266584 sdc
8 33 1024 sdc1
8 34 2930264064 sdc2
8 64 2930266584 sde
8 65 1024 sde1
8 66 2930264064 sde2
8 96 2930266584 sdg
8 97 1024 sdg1
8 98 2930264064 sdg2
8 80 2930266584 sdf
8 81 1024 sdf1
8 82 2930264064 sdf2
8 112 2930266584 sdh
8 113 1024 sdh1
8 114 2930264064 sdh2
8 128 2930266584 sdi
8 129 1024 sdi1
8 130 2930264064 sdi2
8 144 2930266584 sdj
8 145 1024 sdj1
8 146 2930264064 sdj2
8 160 2930266584 sdk
8 161 1024 sdk1
8 162 2930264064 sdk2
8 176 2930266584 sdl
8 177 1024 sdl1
8 178 2930264064 sdl2
8 192 2930266584 sdm
8 193 1024 sdm1
8 194 2930264064 sdm2
8 208 2930266584 sdn
8 209 1024 sdn1
8 210 2930264064 sdn2
9 127 2930132800 md127
9 126 2930132800 md126
252 0 1465065472 dm-0
252 1 52428800 dm-1
252 2 5242880 dm-2
252 3 16777216 dm-3
252 4 3145728 dm-4
252 6 2930266584 dm-6
252 5 2930266584 dm-5
252 7 2930266584 dm-7
252 8 2930266584 dm-8
252 9 1024 dm-9
252 10 1024 dm-10
252 11 1024 dm-11
252 12 2930264064 dm-12
252 13 1024 dm-13
252 14 2930264064 dm-14
252 15 2930264064 dm-15
252 16 2930264064 dm-16
252 17 2930266584 dm-17
252 18 2930266584 dm-18
252 19 1024 dm-19
252 20 1024 dm-20
252 21 2930264064 dm-21
252 22 2930266584 dm-22
252 24 1024 dm-24
252 25 2930264064 dm-25
252 23 2930264064 dm-23
252 26 2930266584 dm-26
252 27 1024 dm-27
252 28 2930264064 dm-28
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vmstat.out
Type: application/octet-stream
Size: 28242 bytes
Desc: not available
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20150603/b33b273a/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: iostat.out
Type: application/octet-stream
Size: 256574 bytes
Desc: not available
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20150603/b33b273a/attachment-0003.obj>