On Wed, Sep 29, 2010 at 10:31:22AM -0500, Shawn Bohrer wrote:
> On Wed, Sep 22, 2010 at 09:15:31AM +1000, Dave Chinner wrote:
> > On Tue, Sep 21, 2010 at 01:05:41PM -0500, Shawn Bohrer wrote:
> > > So I have no idea what I'm looking at but here is the output for the
> > > above numbers (duplicates removed):
> > >
> > > xfs_db -r -c "daddr 474487328" -c "print" /dev/sda5
> > > 000: 424d4150 0000007f 00000000 07000082 00000000 07000092 00000000 0039a000
> >        ^^^^^^^^
> >        B M A P
> >
> > #define XFS_BMAP_MAGIC 0x424d4150 /* 'BMAP' */
> >
> > So these are inode extent btree blocks your application is getting
> > stuck on. These only get written back as a result of either log
> > pressure (i.e. tail pushing) or by the xfsbufd based on age. They
> > aren't actually flushed with the data because changes are logged.
> > IOWs, the writeback of the bmap btree blocks is asynchronous to any
> > operation that modifies them, so there's no direct connection
> > between modification and writeback.
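For anyone who wants to repeat the magic number check by hand, here is a minimal Python sketch. It only assumes that the first 32-bit word of the dump is stored big-endian, which is what the 'BMAP' comment on the #define implies. The helper name magic_to_ascii is made up for illustration and is not part of any XFS tool.

```python
# Value copied from the #define quoted above.
XFS_BMAP_MAGIC = 0x424D4150


def magic_to_ascii(word_hex: str) -> str:
    """Interpret a 32-bit big-endian hex word as four ASCII characters."""
    return bytes.fromhex(word_hex).decode("ascii")


# First word of the xfs_db dump quoted above.
first_word = "424d4150"

print(magic_to_ascii(first_word))             # BMAP
print(int(first_word, 16) == XFS_BMAP_MAGIC)  # True
```

The same check against other block dumps should show quickly whether a given daddr is a bmap btree block or something else.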
> >
> > I'm not sure that there is anything that can really be done to
> > prevent this. If the cause of writeback is age-based flushing on the
> > metadata buffers, you could try increasing the xfsbufd writeback age
> > so that only log pressure will cause them to be flushed.
> So setting fs.xfs.age_buffer_centisecs to 720000 does seem to help,
> but what are the consequences (if any) of doing this?
It means that metadata will stay active in the log for longer. That
means it is likely that recovery will take longer if your system
crashes. It also means that there may be more latency on transaction
reservation as tail-pushing the log is much more likely to occur
because metadata is not being pushed out by background flushing.
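
To put the number in perspective, the tunable is in centiseconds, so 720000 works out to two hours of buffer age before xfsbufd will write a buffer back on age alone. A rough Python sketch of the arithmetic follows. The /proc path is an assumption based on the usual sysctl-to-/proc name mapping, and the helper names are hypothetical; the read simply returns None where the file is missing.

```python
from pathlib import Path

# Assumed location of fs.xfs.age_buffer_centisecs under /proc/sys
# (standard sysctl name mapping); only present where the kernel exposes it.
SYSCTL_PATH = Path("/proc/sys/fs/xfs/age_buffer_centisecs")


def centisecs_to_seconds(centisecs: int) -> float:
    """Convert a centisecond tunable value to seconds."""
    return centisecs / 100


def read_age_buffer_centisecs(path: Path = SYSCTL_PATH):
    """Return the current tunable value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


proposed = 720000  # value from the question above
seconds = centisecs_to_seconds(proposed)
print(f"{proposed} centisecs = {seconds:.0f} s ({seconds / 3600:.0f} h)")

current = read_age_buffer_centisecs()
if current is not None:
    print(f"current setting: {current} centisecs "
          f"= {centisecs_to_seconds(current):.0f} s")
```

On this reading, any metadata buffer that is not pushed out by log pressure can sit dirty for up to two hours, which is where the longer recovery time comes from.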