
Re: XFS with nfs over rdma performance

To: Samuel Kvasnica <samuel.kvasnica@xxxxxxxxx>
Subject: Re: XFS with nfs over rdma performance
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 23 Jan 2013 10:11:18 +1100
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <50FF1789.2050107@xxxxxxxxx>
References: <50FE73A4.7020308@xxxxxxxxx> <20130122222218.GO2498@dastard> <50FF1789.2050107@xxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Jan 22, 2013 at 11:49:45PM +0100, Samuel Kvasnica wrote:
> On 01/22/2013 11:22 PM, Dave Chinner wrote:
> > On Tue, Jan 22, 2013 at 12:10:28PM +0100, Samuel Kvasnica wrote:
> >> Hi folks,
> >>
> >> I would like to hear about your experience with the performance of XFS when
> >> used by an NFS client mounted over an Infiniband RDMA connection on a 3.4.11
> >> kernel.
> >>
> >> What we observe is following:
> >>
> >> - we do have local RAID storage with 1.4GB/s read and write performance
> >> (dd on the raw partition and on the xfs filesystem both give basically
> >> the same performance)
> >>
> >> - we do have QDR Infiniband connection (Mellanox), the rdma benchmark
> >> gives 29Gbit/s throughput
> >>
> >> Now, both above points look pretty Ok but if we mount an nfs export
> >> using rdma on client we never get the 1.4GB/s throughput.
> > Of course not. The efficiency of the NFS client/server write
> > protocol makes it a theoretical impossibility....
> hmm, well, I've never seen that bottleneck on NFSv4 so far. Does this
> apply to NFSv4 as well (as I use NFSv4, not v3)?

Yup, it's the same algorithm.

> >> Sporadically (and especially at the beginning) it comes up to some
> >> 1.3GB/s for short period but then it starts oscillating
> >> between 300MB/s and some 1.2GB/s with an average of 500-600MB/s. Even
> >> when using more clients in parallel,
> >> the net throughput behaves the same so it seems to be a server-side
> >> related bottleneck.
> >> We do not see any remarkable CPU load.
> > Sounds exactly like the usual NFS server/client writeback exclusion
> > behaviour, i.e. while there is a commit being processed by the
> > server, the client is not sending any new writes across the wire.
> > Hence you get the behaviour:
> >
> >     client                  server
> >     send writes             cache writes
> >     send commit             fsync
> >                             start writeback
> >                             ......
> >                             finish writeback
> >                             send commit response
> >     send writes             cache writes
> >     send commit             fsync
> >                             start writeback
> >                             ......
> >                             finish writeback
> >                             send commit response
> >
> > and so you see binary throughput - either traffic comes across the
> > wire, or the data is being written to disk. They don't happen at the
> > same time.
> >
> > If it's not scaling with multiple clients, then that implies you
> > don't have enough nfsd's configured to handle the incoming IO
> > requests. This is a common enough NFS problem; you should be able
> > to find tips from google dating back for years on how to tune your
> > NFS setup to avoid these sorts of problems.
> Ok, this explanation partially makes sense; on the other hand, we are
> talking here about just 1.4GB/s, which is a pretty low load for a
> Xeon E5 to process, and the oscillation between 300-600MB/s is even
> more low-end.
> 
> And: why do we see exactly the same for read, not only for write ?

A lack of NFSDs, or a too-small r/wsize, or not enough readahead on
the client and/or server, etc. 
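For reference, here's a rough sketch of the knobs I mean (the thread
count, transfer sizes, and readahead values below are only illustrative,
and config paths vary by distro):

```shell
# Server: check the current nfsd thread count, then raise it
# at runtime (requires root)
cat /proc/fs/nfsd/threads
rpc.nfsd 64

# Client: mount over RDMA with larger transfer sizes; the server
# caps rsize/wsize at its own maximum (20049 is the standard
# NFS/RDMA port)
mount -t nfs -o rdma,port=20049,rsize=1048576,wsize=1048576 \
    server:/export /mnt/nfs

# Server: bump readahead on the underlying RAID device
# (units are 512-byte sectors, so 16384 sectors = 8MB;
# /dev/sdX is a placeholder for your RAID block device)
blockdev --setra 16384 /dev/sdX
```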

> It looks to me like there is some overly large buffer somewhere along the
> way which needs to be decreased, as it is not needed at all.
> I cannot recall seeing this earlier on 2.6.2x kernels, unfortunately I
> cannot test that on new hardware.
> 
> >> The interesting point is, we use btrfs filesystem on server instead of
> >> xfs now (with otherwise same config) and we are getting consistent,
> >> steady throughput
> >> around 1.2-1.3GB/s.
> > Different fsync implementation, or the btrfs configuration is
> > ignoring commits (async export, by any chance?)
> Well, there is no explicit async mount option. With btrfs write gives

Sure - I'm talking about the server export option, not a client
mount option. And - seriously - you need to check that btrfs is
actually honouring commits correctly otherwise data integrity is
compromised (the async export option makes the server ignore
commits)....
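
A quick way to check (export path below is just an example):

```shell
# List active exports with their effective options; look for
# "sync" vs "async" in the option list
exportfs -v

# A safe line in /etc/exports uses sync (the default in modern
# nfs-utils), e.g.:
#   /export  client(rw,sync,no_subtree_check)
```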

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
