
Re: Inode and dentry cache behavior

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Inode and dentry cache behavior
From: Shrinand Javadekar <shrinand@xxxxxxxxxxxxxx>
Date: Wed, 29 Apr 2015 10:46:43 -0700
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150429013024.GU15810@dastard>
References: <CABppvi55C+vE7Ei8u=_ntC_heDQb4HwUcKom-_9hGkunk84Sfw@xxxxxxxxxxxxxx> <20150423224324.GM15810@dastard> <CABppvi7+Mu78FAM75YvJvekX2CHtKk4yeMrU7j35fvvWRb923Q@xxxxxxxxxxxxxx> <20150424061554.GN15810@dastard> <CABppvi6N6McmfLgAPcP9cxXxPrBMaD81UyeiVHWOaxrJisSN=g@xxxxxxxxxxxxxx> <20150429013024.GU15810@dastard>
Awesome!! Thanks Dave!

On Tue, Apr 28, 2015 at 6:30 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Tue, Apr 28, 2015 at 05:17:14PM -0700, Shrinand Javadekar wrote:
>> I will look at the hardware. But, I think, there's also a possible
>> software problem here.
>> If you look at the sequence of events, first a tmp file is created in
>> <mount-point>/tmp/tmp_blah. After a few writes, this file is renamed
>> to a different path in the filesystem.
>> rename(<mount-point>/tmp/tmp_blah,
>> <mount-point>/objects/1004/eef/deadbeef/foo.data).
>> The "tmp" directory above is created only once. Temp files get created
>> inside it and then get renamed. We wondered if this causes disk layout
>> issues resulting in slower performance. And then, we stumbled upon
>> this[1]. Someone complaining about the exact same problem.
> That's pretty braindead behaviour. That will screw performance and
> locality on any filesystem you do that on, not to mention age it
> extremely quickly.
> In the case of XFS, it forces allocation of all the inodes in one
> AG, rather than allowing XFS to distribute and balance inode
> allocation around the filesystem and keeping good
> directory/inode/data locality for all your data.
> Best way to do this is to create your tmp files using O_TMPFILE,
> with the source directory being the destination directory and then
> use linkat() rather than rename to make them visible in the
> directory.
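
A minimal sketch of the O_TMPFILE plus linkat() approach described above, assuming a Linux kernel and filesystem that support O_TMPFILE and a mounted /proc. The helper name publish_object and its parameters are hypothetical and not from this thread; the /proc/self/fd path is the unprivileged linking method documented in open(2).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for O_TMPFILE */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes into an unnamed file created
 * inside the destination directory `dir`, then make it visible as
 * `dir/name`. Returns 0 on success, -1 on failure with errno set. */
static int publish_object(const char *dir, const char *name,
                          const void *buf, size_t len)
{
    assert(dir && name && buf);

    char src[64], dst[PATH_MAX];
    int n = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof dst) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* The inode is created with `dir` as its parent, so it gets that
     * directory's locality, but it has no directory entry yet. */
    int fd = open(dir, O_TMPFILE | O_WRONLY, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0)
            goto fail;
        p += w;
        len -= (size_t)w;
    }
    if (fsync(fd) < 0)
        goto fail;

    /* Give the unnamed file a name. Unlike rename(), linkat() fails
     * with EEXIST if `dst` already exists. */
    snprintf(src, sizeof src, "/proc/self/fd/%d", fd);
    if (linkat(AT_FDCWD, src, AT_FDCWD, dst, AT_SYMLINK_FOLLOW) < 0)
        goto fail;

    close(fd);
    return 0;

fail: {
        int saved = errno;      /* keep the original error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
}
```

With the paths from this thread, a call would look like publish_object("<mount-point>/objects/1004/eef/deadbeef", "foo.data", data, size). Because linkat() does not replace an existing name, overwriting an object would still need an unlink() or a rename() step.
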
>> One quick way to validate this was to delete the "tmp" directory
>> periodically and see what numbers we get. And they do improve. With 15 runs of
>> writing 80K objects in each run, our performance was dropping from
>> ~100MB/s to 30MB/s. With deleting the tmp directory after each run, we
>> saw the performance only drop from ~100MB/s to 80MB/s.
>> The explanation in the link below says that when xfs does not find
>> free extents in an existing allocation group, it frees up the extents
>> by copying data from existing extents to their target allocation group
>> (which happens because of renames). Is that explanation still valid?
> No, it wasn't correct even back then.  XFS does not move data around
> once it has been allocated and is on disk. Indeed, rename() does not
> move data, it only modifies directory entries.
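
One way to see the point above from userspace is to compare the inode number before and after a rename. The sketch below is only an illustration; the helper name rename_keeps_inode is hypothetical, and it assumes both paths are on the same filesystem (rename() fails with EXDEV otherwise).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: rename `from` to `to` and check that the file is
 * still the same inode on the same device afterwards.
 * Returns 1 if unchanged, 0 if changed, -1 on error (errno set). */
static int rename_keeps_inode(const char *from, const char *to)
{
    struct stat before, after;

    assert(from && to);

    if (stat(from, &before) < 0)
        return -1;
    if (rename(from, to) < 0)       /* only directory entries change */
        return -1;
    if (stat(to, &after) < 0)
        return -1;

    return before.st_ino == after.st_ino &&
           before.st_dev == after.st_dev;
}
```

The same inode number after the rename means the file, and the AG its inode was allocated in, stayed where it was; only the name moved to another directory.
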
> The problem is that the locality of a new inode is determined by the
> parent inode, and so if all new inodes are created in the same
> directory, then they are all created in the same AG. If you have
> millions of inodes, then you have a btree with millions of inodes in
> it in one AG, and pretty much none in any other AG. Hence inode
> allocation, which has to search for free inodes in a btree
> containing millions of records, can be extremely IO and CPU
> intensive and therefore slow. And the larger the number of inodes,
> the slower it will go....
> Cheers,
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
