
Re: automatic testing of cgroup writeback limiting

To: Lutz Vieweg <lvml@xxxxxx>
Subject: Re: automatic testing of cgroup writeback limiting
From: Tejun Heo <tj@xxxxxxxxxx>
Date: Thu, 3 Dec 2015 10:38:00 -0500
Cc: Martin Steigerwald <martin@xxxxxxxxxxxx>, xfs@xxxxxxxxxxx, Dave Chinner <david@xxxxxxxxxxxxx>, linux-fsdevel@xxxxxxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <565F8A68.9040401@xxxxxx>
References: <5652F311.7000406@xxxxxx> <20151125213500.GK26718@dastard> <565B70F9.8060707@xxxxxx> <1711940.cDn6AztRgi@merkaba> <20151201163815.GB12922@xxxxxxxxxxxxxxx> <565F8A68.9040401@xxxxxx>
Sender: Tejun Heo <htejun@xxxxxxxxx>
User-agent: Mutt/1.5.24 (2015-08-30)
Hello, Lutz.

On Thu, Dec 03, 2015 at 01:18:48AM +0100, Lutz Vieweg wrote:
> On 12/01/2015 05:38 PM, Tejun Heo wrote:
> >As opposed to pages.  cgroup ownership is tracked per inode, not per
> >page, so if multiple cgroups write to the same inode at the same time,
> >some IOs will be incorrectly attributed.
> 
> I can't think of use cases where this could become a problem.
> If more than one user/container/VM is allowed to write to the
> same file at any one time, isolation is probably absent anyway ;-)

Yeap, that's why the trade-off was made.
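
A minimal sketch of that shared-inode case, assuming a cgroup v2
hierarchy at /sys/fs/cgroup and an illustrative file path
/mnt/fs1/shared.dat (both placeholder values):

#!/usr/bin/env python3
# Two processes in different cgroups dirtying the same file.  The inode
# is attributed to a single cgroup, so part of this IO gets charged to
# the "wrong" group -- the trade-off described above.
import os

FILE = "/mnt/fs1/shared.dat"
CGROUPS = ["/sys/fs/cgroup/wbtest-a", "/sys/fs/cgroup/wbtest-b"]

for cg in CGROUPS:
    os.makedirs(cg, exist_ok=True)
    if os.fork() == 0:
        # Child: join its cgroup, then issue buffered writes to the
        # shared file.
        with open(os.path.join(cg, "cgroup.procs"), "w") as f:
            f.write(str(os.getpid()))
        fd = os.open(FILE, os.O_RDWR | os.O_CREAT, 0o600)
        buf = b"x" * (1 << 20)
        for off in range(0, 256 << 20, 1 << 20):    # 256 MiB per writer
            os.pwrite(fd, buf, off)
        os.close(fd)
        os._exit(0)

for _ in CGROUPS:
    os.wait()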

> >cgroup ownership is per-inode.  IO throttling is per-device, so as
> >long as multiple filesystems map to the same device, they fall under
> >the same limit.
> 
> Good, that's why I assumed it would be useful to include more than
> one filesystem on the same device in the test scenario, just to
> know whether there are unexpected issues when multiple filesystems
> utilize the same underlying device.

Sure, and I'd recommend including a multiple-writers-on-a-single-filesystem
case too, as that exercises the entanglement in metadata handling.  That
should expose problems in more places.
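
A minimal sketch of such a test harness, assuming cgroup v2 at
/sys/fs/cgroup with the io controller enabled, a shared block device
at 8:16, and two filesystems mounted at /mnt/fs1 and /mnt/fs2 (all
illustrative values):

#!/usr/bin/env python3
# Throttle two cgroups against the same device and run buffered writers
# on two filesystems that sit on that device.
import os, subprocess

CGROUP_ROOT = "/sys/fs/cgroup"
DEVICE = "8:16"                  # major:minor of the shared block device
WBPS_LIMIT = 10 * 1024 * 1024    # 10 MiB/s writeback limit per cgroup

def make_cgroup(name):
    path = os.path.join(CGROUP_ROOT, name)
    os.makedirs(path, exist_ok=True)
    # Both cgroups name the same device, so they are throttled against
    # one device even though they write to different filesystems.
    with open(os.path.join(path, "io.max"), "w") as f:
        f.write(f"{DEVICE} wbps={WBPS_LIMIT}\n")
    return path

def run_writer(cgroup, target, mb=512):
    def enter_cgroup():
        # Join the cgroup before exec so every dirtied page is
        # attributed to it from the start.
        with open(os.path.join(cgroup, "cgroup.procs"), "w") as f:
            f.write(str(os.getpid()))
    return subprocess.Popen(
        ["dd", "if=/dev/zero", f"of={target}", "bs=1M", f"count={mb}"],
        preexec_fn=enter_cgroup)

cg_a = make_cgroup("wbtest-a")
cg_b = make_cgroup("wbtest-b")

writers = [
    run_writer(cg_a, "/mnt/fs1/a.dat"),   # two writers on one filesystem
    run_writer(cg_b, "/mnt/fs1/b.dat"),
    run_writer(cg_b, "/mnt/fs2/c.dat"),   # second filesystem, same device
]
for w in writers:
    w.wait()

Comparing the observed per-cgroup throughput against the configured
wbps limit would then serve as the pass/fail criterion.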

> I wrote of "evil" processes for simplicity, but 99 out of 100 times
> it's not intentional "evilness" that makes a process exhaust I/O
> bandwidth of some device shared with other users/containers/VMs, it's
> usually just bugs, inconsiderate programming or inappropriate use
> that makes one process write like crazy, making other
> users/containers/VMs suffer.

Right now, what cgroup writeback can control is well-behaved
workloads which aren't dominated by metadata writeback.  We still
have a ways to go, but it's a huge leap compared to what we had
before.

> Wherever strict service level guarantees are relevant, and
> applications require writing to storage, you currently cannot
> consolidate two or more applications onto the same physical host,
> even if they run under separate users/containers/VMs.

You're right.  It can't do isolation well enough for things like
strict service level guarantee.

> I understand there is no short- or medium-term solution that
> would allow isolating processes that write to the same filesystem
> (because of the metadata writing), but is it correct to say
> that at least VMs which do not allow the virtual guest to
> cause extensive metadata writes on the physical host, but only
> write into pre-allocated image files, can be safely isolated
> by the new "buffered write accounting"?

Sure, that or loop mounts.  Pure data accesses should be fairly well
isolated.
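
A minimal sketch of the pre-allocated image file case, assuming an
illustrative path /mnt/fs1/guest.img and a 1 GiB size; the writer
would sit in a throttled cgroup as in the harness sketch above:

#!/usr/bin/env python3
# Allocate all blocks of the image up front, then overwrite it in place
# with buffered writes: pure data writeback, no block allocation and
# hence no significant metadata churn to misattribute.
import os

IMAGE = "/mnt/fs1/guest.img"
SIZE = 1 << 30           # 1 GiB image, fully allocated before use
BLOCK = 1 << 20          # 1 MiB write size

fd = os.open(IMAGE, os.O_RDWR | os.O_CREAT, 0o600)
os.posix_fallocate(fd, 0, SIZE)    # reserve every block now

buf = b"\0" * BLOCK
for off in range(0, SIZE, BLOCK):
    os.pwrite(fd, buf, off)        # in-place buffered overwrite
os.close(fd)

That approximates the VM-like workload: the guest's writes land inside
an already-allocated file, so only data pages get dirtied on the host.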

Thanks.

-- 
tejun
