
Re: XFS with nfs over rdma performance

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: XFS with nfs over rdma performance
From: Samuel Kvasnica <samuel.kvasnica@xxxxxxxxx>
Date: Tue, 22 Jan 2013 23:49:45 +0100
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20130122222218.GO2498@dastard>
References: <50FE73A4.7020308@xxxxxxxxx> <20130122222218.GO2498@dastard>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130105 Thunderbird/17.0.2
On 01/22/2013 11:22 PM, Dave Chinner wrote:
> On Tue, Jan 22, 2013 at 12:10:28PM +0100, Samuel Kvasnica wrote:
>> Hi folks,
>>
>> I would like to hear about your experience with the performance of XFS
>> when exported over NFS and mounted on a client via an Infiniband RDMA
>> connection, on a 3.4.11 kernel.
>>
>> What we observe is the following:
>>
>> - we have local RAID storage with 1.4GB/s read and write performance
>>   (dd on the raw partition and on the xfs filesystem both give basically
>>   the same figure)
>>
>> - we have a QDR Infiniband connection (Mellanox); the rdma benchmark
>>   gives 29Gbit/s throughput
>>
>> Now, both of the above points look pretty OK, but if we mount an nfs
>> export using rdma on a client we never get the 1.4GB/s throughput.
> Of course not. The efficiency of the NFS client/server write
> protocol makes it a theoretical impossibility....
Hmm, well, I have never seen that bottleneck on NFSv4 so far. Does this
apply to NFSv4 as well (I use NFSv4, not v3)?
>
>> Sporadically (and especially at the beginning) it comes up to some
>> 1.3GB/s for a short period, but then it starts oscillating between
>> 300MB/s and some 1.2GB/s with an average of 500-600MB/s. Even when
>> using more clients in parallel, the net throughput behaves the same,
>> so it seems to be a server-side bottleneck.
>> We do not see any remarkable CPU load.
> Sounds exactly like the usual NFS server/client writeback exclusion
> behaviour. i.e. while there is a commit being processed by the
> server, the client is not sending any new writes across the wire.
> hence you get the behaviour:
>
>       client                  server
>       send writes             cache writes
>       send commit             fsync
>                               start writeback
>                               ......
>                               finish writeback
>                               send commit response
>       send writes             cache writes
>       send commit             fsync
>                               start writeback
>                               ......
>                               finish writeback
>                               send commit response
>
> and so you see binary throughput - either traffic comes across the
> wire, or the data is being written to disk. They don't happen at the
> same time.
>
> If it's not scaling with multiple clients, then that implies you
> don't have enough nfsd's configured to handle the incoming IO
> requests. This is a common enough NFS problem; you should be able
> to find tips from google dating back years on how to tune your
> NFS setup to avoid these sorts of problems.
OK, this explanation partially makes sense; on the other hand, we are
talking here about just 1.4GB/s, which is a pretty low load for a Xeon E5
to process, and the oscillation down to 300-600MB/s is even more low-end.
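
Just to put rough numbers on that timeline: a back-of-envelope sketch
(Python, using the ~29Gbit/s wire and 1.4GB/s disk figures from above and
assuming the transfer and writeback phases never overlap):

    # sketch: effective throughput when the wire transfer and the disk
    # writeback strictly alternate instead of overlapping
    wire_rate = 3.6e9   # bytes/s, ~29Gbit/s from the rdma benchmark
    disk_rate = 1.4e9   # bytes/s, local RAID dd throughput

    # each byte spends time on the wire plus time on the disk, so the
    # serialized rate is the harmonic combination of the two rates
    serial_rate = 1.0 / (1.0 / wire_rate + 1.0 / disk_rate)
    print("serialized throughput: %.0f MB/s" % (serial_rate / 1e6))  # ~1008 MB/s

i.e. strict alternation alone would already cap this setup at around 1GB/s,
before counting any per-commit fsync latency.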

And why do we see exactly the same behaviour for read, not only for write?

It looks to me like there is some overly large buffer somewhere along the
way which needs to be decreased, as it is not needed at all.
I cannot recall seeing this earlier on 2.6.2x kernels; unfortunately I
cannot test that on the new hardware.
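
For completeness, the two server-side knobs I would check first (a minimal
sketch, assuming a Linux server with the nfsd filesystem mounted under
/proc/fs/nfsd; the thread count of 64 below is only an example value):

    # sketch: show the nfsd thread count and the VM dirty-writeback
    # thresholds that control how much dirty data piles up before writeback
    def read_value(path):
        with open(path) as f:
            return f.read().strip()

    for path in ("/proc/fs/nfsd/threads",
                 "/proc/sys/vm/dirty_ratio",
                 "/proc/sys/vm/dirty_background_ratio"):
        print(path, "=", read_value(path))

    # raising the nfsd thread count (example value, needs root):
    # with open("/proc/fs/nfsd/threads", "w") as f:
    #     f.write("64\n")

More nfsd threads is the usual answer to "does not scale with more clients",
and lowering the dirty thresholds makes writeback kick in earlier, which
would fit the too-large-buffer suspicion; both are guesses until measured.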

>> The interesting point is that we now use a btrfs filesystem on the
>> server instead of xfs (with an otherwise identical config) and we are
>> getting a consistent, steady throughput of around 1.2-1.3GB/s.
> Different fsync implementation, or the btrfs configuration is
> ignoring commits (async export, by any chance?)
Well, there is no explicit async mount option. With btrfs, writes give
full bandwidth; reads are some 10% worse, but still acceptable.
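
If I read your question right, the async you mean is the server-side export
option in /etc/exports (which defaults to sync in recent nfs-utils), not a
client mount option, so it is worth checking what the server actually
applied to both exports. A quick sketch, assuming exportfs from nfs-utils
is available on the server (run as root):

    # sketch: print the options the server actually applied to each export;
    # look for "sync" vs "async" in the option list
    import subprocess
    print(subprocess.check_output(["exportfs", "-v"]).decode())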

cheers,

Sam
