[RFC PATCH 0/6] xfs: truncate vs page fault IO exclusion

To: xfs@xxxxxxxxxxx
Subject: [RFC PATCH 0/6] xfs: truncate vs page fault IO exclusion
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 8 Jan 2015 09:25:37 +1100
Cc: linux-fsdevel@xxxxxxxxxxxxxxx, linux-mm@xxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx

Hi folks,

This patch set is an attempt to stop the XFS truncate and hole-punch
code from racing with page faults that enter the IO path. Such
exclusion is traditionally deadlock-prone due to the inversion of
filesystem IO path locks and the mmap_sem.

To avoid this issue, I have introduced a new "i_mmaplock" rwsem into
the XFS code similar to the IO lock, but this lock is only taken in
the mmap fault paths on entry into the filesystem (i.e. ->fault and

The concept is that if we invalidate the page cache over a range
after taking both the existing i_iolock and the new i_mmaplock, we
will have prevented any vector for repopulation of the page cache
over the invalidated range until one of the IO and mmap locks has
been dropped, i.e. we can guarantee that both the syscall IO path
and page faults won't race with whatever operation the filesystem is

The introduction of a new lock is necessary to avoid deadlocks due
to mmap_sem entanglement. It has a defined lock order during page
faults of:

mmap_sem
-> i_mmaplock (read)
   -> page lock
      -> i_lock (get blocks)

This lock is then taken by any extent manipulation code in XFS, in
addition to the IO lock, giving the lock ordering:

i_iolock (write)
-> i_mmaplock (write)
   -> page lock (data writeback, page invalidation)
      -> i_lock (data writeback)
   -> i_lock (modification transaction)

Hence we have consistent lock ordering (which has been validated so
far by testing with lockdep enabled) for page fault IO vs
truncate, hole punch, extent shifts, etc.

This patchset passes xfstests and various benchmarks and stress
workloads, so the real question is now:

        What have I missed?

Comments, thoughts, flames?

