
Re: XFS with nfs over rdma performance

To: Samuel Kvasnica <samuel.kvasnica@xxxxxxxxx>
Subject: Re: XFS with nfs over rdma performance
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 23 Jan 2013 09:22:18 +1100
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <50FE73A4.7020308@xxxxxxxxx>
References: <50FE73A4.7020308@xxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Jan 22, 2013 at 12:10:28PM +0100, Samuel Kvasnica wrote:
> Hi folks,
> I would like to hear about your experience with the performance of XFS when
> used on NFS client mounted using InfiniBand RDMA connection on 3.4.11
> kernel.
> What we observe is following:
> - we do have local RAID storage with 1.4GB/s read and write performance
> (both dd on raw partition
> and on xfs filesystem give basically the same performance)
> - we do have QDR Infiniband connection (Mellanox), the rdma benchmark
> gives 29Gbit/s throughput
> Now, both above points look pretty Ok but if we mount an nfs export
> using rdma on client we never get the 1.4GB/s throughput.

Of course not. The efficiency of the NFS client/server write
protocol makes it a theoretical impossibility....

> Sporadically (and especially at the beginning) it comes up to some
> 1.3GB/s for short period but then it starts oscillating
> between 300MB/s and some 1.2GB/s with an average of 500-600MB/s. Even
> when using more clients in parallel,
> the net throughput behaves the same so it seems to be a server-side
> related bottleneck.
> We do not see any remarkable CPU load.

Sounds exactly like the usual NFS server/client writeback exclusion
behaviour, i.e. while there is a commit being processed by the
server, the client is not sending any new writes across the wire.
Hence you get the behaviour:

        client                  server
        send writes             cache writes
        send commit             fsync
                                start writeback
                                finish writeback
                                send commit response
        send writes             cache writes
        send commit             fsync
                                start writeback
                                finish writeback
                                send commit response

and so you see binary throughput - either traffic comes across the
wire, or the data is being written to disk. They don't happen at the
same time.
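
A rough way to see why the raw disk rate is out of reach: if the wire
phase and the disk phase never overlap, every byte costs time on both.
The sketch below is a toy model, not a measurement. The function name
and the assumption of strictly serialized phases are illustrative; the
figures are the ones quoted in this thread.

```python
def serialized_throughput(wire_rate, disk_rate):
    """Average rate when network transfer and writeback strictly alternate.

    Moving B bytes takes B/wire_rate on the wire plus B/disk_rate on
    disk, so the average is the harmonic combination of the two rates.
    """
    if wire_rate <= 0 or disk_rate <= 0:
        raise ValueError("rates must be positive")
    return 1.0 / (1.0 / wire_rate + 1.0 / disk_rate)


wire_gb_s = 29 / 8   # 29 Gbit/s from the RDMA benchmark, in GB/s
disk_gb_s = 1.4      # local RAID rate reported above, in GB/s
average = serialized_throughput(wire_gb_s, disk_gb_s)
print(f"average under strict alternation: {average:.2f} GB/s")
```

Under this model the average is always below the slower of the two
rates. It ignores RPC latency and fsync overhead, so a real average
may well be lower still.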

If it's not scaling with multiple clients, then that implies you
don't have enough nfsd's configured to handle the incoming IO
requests. This is a common enough NFS problem; you should be able
to find tips from google dating back for years on how to tune your
NFS setup to avoid these sorts of problems.
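
As a starting point for checking the nfsd count on the server, here is
a minimal sketch. It assumes the nfsd filesystem is mounted at its
usual location so that /proc/fs/nfsd/threads exists; the helper name
is hypothetical and the path may differ on your system.

```python
from pathlib import Path

# Assumption: the nfsd filesystem is mounted at its usual location.
NFSD_THREADS = Path("/proc/fs/nfsd/threads")


def nfsd_thread_count(path=NFSD_THREADS):
    """Return the number of running nfsd threads, or None if unavailable."""
    try:
        # The file holds a single integer, e.g. "8\n".
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        # Missing file, no permission, or unexpected contents.
        return None


count = nfsd_thread_count()
print("nfsd threads:", count if count is not None else "unknown")
```

If the count is small compared to the number of concurrent client
requests, raising it is the usual first thing to try.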

> The interesting point is, we use btrfs filesystem on server instead of
> xfs now (with otherwise same config) and we are getting consistent,
> steady throughput
> around 1.2-1.3GB/s.

Different fsync implementation, or the btrfs configuration is
ignoring commits (async export, by any chance?)
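
To rule out the async export case, something like the sketch below can
scan an exports file for entries carrying the "async" option. It is a
simplified parser written for illustration: it ignores quoted paths,
line continuations and "-" default option lists, and the sample text
is hypothetical. `exportfs -v` on the server shows the options
actually in effect.

```python
import re


def async_exports(exports_text):
    """Return (path, client) pairs whose option list contains 'async'."""
    hits = []
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        parts = line.split(None, 1)            # path, then client specs
        if len(parts) < 2:
            continue
        path, rest = parts
        # Each client spec looks like client(opt1,opt2,...).
        for client, opts in re.findall(r"(\S*?)\(([^)]*)\)", rest):
            if "async" in [o.strip() for o in opts.split(",")]:
                hits.append((path, client))
    return hits


# Hypothetical exports content, for illustration only.
sample = """
# example
/export  192.168.0.0/24(rw,async,no_root_squash) host2(rw,sync)
/scratch host3(ro)
"""
print(async_exports(sample))
```

An async export lets the server reply before data reaches stable
storage, which would hide the stalls described above.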


Dave Chinner
