
XFS Syncd

To: xfs@xxxxxxxxxxx
Subject: XFS Syncd
From: Shrinand Javadekar <shrinand@xxxxxxxxxxxxxx>
Date: Thu, 9 Apr 2015 21:23:55 -0700
Delivered-to: xfs@xxxxxxxxxxx

I am using XFS as the backend filesystem for OpenStack Swift. My setup
is a single server with 8 data disks, each formatted as a separate XFS
volume.

I am running a workload that does many concurrent writes of 256K
files to the XFS volumes. OpenStack Swift takes care of distributing
the data evenly across all 8 disks. It also sets extended attributes
on each file it writes, and it explicitly calls fsync() on each file
at the end.
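
For reference, the per-file pattern described above can be sketched
roughly as follows. This is only an illustration of the write, set
extended attributes, then fsync() sequence, not Swift's actual code.
The function name write_object and the xattrs parameter are
hypothetical, and os.setxattr is available on Linux only.

```python
import os


def write_object(path, data, xattrs=None):
    """Write data to path, set extended attributes, then fsync.

    Hypothetical helper for illustration; not taken from Swift.
    Returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Attach metadata as extended attributes (Linux only).
        for name, value in (xattrs or {}).items():
            os.setxattr(fd, name, value)
        # Force data and metadata to stable storage before returning.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(data)
```

A call might look like write_object(path, payload, {"user.example":
b"meta"}), where the attribute name is a placeholder and the target
filesystem must support user extended attributes.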

I am seeing the system pretty much stall for ~5 seconds every 30
seconds. During each 5-second stall the number of I/Os goes up, but
the actual write bandwidth is very low (see attached images). After a
fair bit of investigation, we've narrowed the problem down to XFS's
syncd, whose interval is controlled by fs.xfs.xfssyncd_centisecs and
defaults to 30 seconds.

I have a couple of questions:

1. If every file write ends with an fsync(), what is xfssyncd doing
for several seconds?
2. How does xfssyncd actually work across several disks? Currently, it
seems that when it runs, it pretty much stalls the entire system.
3. I see that fs.xfs.xfssyncd_centisecs is the parameter that tunes
the interval, but that doesn't help much. Increasing the interval
simply postpones the work, and xfssyncd then takes longer when it does
run. Are there any other options I can try to keep xfssyncd from
stalling the system when it runs?
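
For anyone wanting to check the current interval on their own system,
here is a minimal sketch. It assumes the sysctl is exposed at the
usual /proc/sys path for fs.xfs.xfssyncd_centisecs. The helper names
are hypothetical, and the code only reads the value, it does not
change it.

```python
from pathlib import Path

# Standard /proc/sys mapping of the fs.xfs.xfssyncd_centisecs sysctl.
SYSCTL_PATH = Path("/proc/sys/fs/xfs/xfssyncd_centisecs")


def centisecs_to_seconds(centisecs):
    """Convert a value in hundredths of a second to seconds."""
    return centisecs / 100.0


def read_syncd_interval(path=SYSCTL_PATH):
    """Return the syncd interval in seconds, or None if unavailable."""
    try:
        return centisecs_to_seconds(int(Path(path).read_text().strip()))
    except (OSError, ValueError):
        # XFS not loaded, file missing or unreadable, or bad contents.
        return None
```

With the 30-second default mentioned above, the file would contain
3000 and read_syncd_interval() would return 30.0.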

Thanks in advance.

P.S. I'm not a member of this list. Direct replies appreciated.

Attachment: write_throughput6.png
Description: PNG image

Attachment: read_write_requests_complete_rate6.png
Description: PNG image
