
Re: XFS realtime O_DIRECT failures

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: XFS realtime O_DIRECT failures
From: Alan Cook <acook@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 9 Nov 2011 17:52:15 -0500
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <20111109223314.GQ5534@dastard>
References: <loom.20111108T180925-222@xxxxxxxxxxxxxx> <20111109080133.GB20604@xxxxxxxxxxxxx> <CAGedfzmcmfLXhBEzm9yhpRQTf-7dnMenXqe0FABAzJgP0rxSUA@xxxxxxxxxxxxxx> <20111109223314.GQ5534@dastard>
On Wed, Nov 9, 2011 at 5:33 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> I'm not sure from that description just why the realtime volume adds
> any benefit to your workflow. Separation of data and metadata
> does not provide you with data compression, so you must be doing
> something different with the real time device to achieve
> compression. Any details on that aspect of your setup?

The compression is done in hardware that sits between the block layer
and the actual storage device (in this case, a solid-state drive).
Having both data and metadata on the same device creates a problem:
the block layer has no idea whether a given request carries data or
metadata, so the metadata gets compressed along with the regular data,
which is very bad.  Splitting the metadata onto a separate device
eliminated that problem.

> I'm really only trying to understand why you need such a setup - it
> helps to understand the full use case you have before trying to
> determine if there is a better way of acheiving your end goal....

The use of a realtime volume is a means to an end, the end being
separated data and metadata.
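For readers reproducing this layout: XFS places a file's data on the
realtime device only when the file has the realtime flag set, either
directly before any data blocks are allocated or by inheriting it from
a directory marked rtinherit.  Below is a minimal sketch of setting the
flag from C, assuming a kernel whose <linux/fs.h> exports struct fsxattr
and FS_IOC_FSSETXATTR (Linux 4.5 and later, so newer than this thread);
set_realtime_flag is a hypothetical helper name.

```c
#include <linux/fs.h>   /* struct fsxattr, FS_IOC_FS[GS]ETXATTR, FS_XFLAG_REALTIME */
#include <sys/ioctl.h>

/*
 * Hypothetical helper: mark an open, still-empty file as realtime so that
 * XFS allocates its data extents on the realtime device.
 * Returns 0 on success, -1 with errno set by ioctl() on failure.
 */
int set_realtime_flag(int fd)
{
    struct fsxattr fsx;

    /* Read the current attributes first so other flags are preserved. */
    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
        return -1;

    fsx.fsx_xflags |= FS_XFLAG_REALTIME;

    /* XFS refuses to change this flag once the file has allocated extents. */
    if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) < 0)
        return -1;

    return 0;
}
```

The call belongs right after open(..., O_CREAT) on the realtime-enabled
mount and before the first write; on filesystems without realtime
support the request is expected to be rejected and the helper returns -1.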

> As to your current problem, it's got a NULL pointer dereference
> trying to lock the per-ag structure. That means the per-ag lookup
> failed, which implies that the RT freespace bitmap may be corrupt
> and it's tried to read a bitmap block that is apparently beyond the
> end of the filesystem.  What does xfs_check/xfs_repair -n tell you
> about the filesystem state?
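Dave's diagnosis rests on how XFS addresses blocks in the data section:
a filesystem block number carries the allocation group (AG) index in its
high bits and the block offset within that AG in its low bits.  A block
number past the end of the filesystem therefore decodes to an AG index
with no per-AG structure behind it.  The following is a simplified model
of that decoding, assuming the agcount/agblocks/agblklog superblock
geometry as inputs; the struct and function names are hypothetical and
only mirror the idea behind the kernel's XFS_FSB_TO_AGNO and
XFS_FSB_TO_AGBNO macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of superblock geometry needed for decoding. */
struct fs_geometry {
    uint32_t agcount;   /* number of allocation groups */
    uint32_t agblocks;  /* blocks per AG (the last AG may be shorter) */
    uint8_t  agblklog;  /* ceil(log2(agblocks)): low bits used for the AG block */
};

/* High bits of a filesystem block number select the AG. */
uint32_t fsb_to_agno(const struct fs_geometry *g, uint64_t fsbno)
{
    return (uint32_t)(fsbno >> g->agblklog);
}

/* Low agblklog bits give the block offset within that AG. */
uint32_t fsb_to_agbno(const struct fs_geometry *g, uint64_t fsbno)
{
    return (uint32_t)(fsbno & ((1ULL << g->agblklog) - 1));
}

/*
 * True only if fsbno decodes to an existing AG and an in-range block.
 * A false result is the case where a per-AG lookup would find nothing.
 */
bool fsb_is_valid(const struct fs_geometry *g, uint64_t fsbno)
{
    assert(g->agblklog < 32);
    return fsb_to_agno(g, fsbno) < g->agcount &&
           fsb_to_agbno(g, fsbno) < g->agblocks;
}
```

With agcount = 4, agblocks = 1000 and agblklog = 10, block 2053 decodes
to AG 2, block 5, while block 4096 decodes to AG 4, an AG that does not
exist in a four-AG filesystem.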

Unfortunately, they don't tell me much.  Running xfs_check/xfs_repair -n
before the test reports no errors.  However, running either one after
the failed test run leaves the process blocked indefinitely in I/O
(process state D+).  In fact, if I run the test utility twice, the whole
system hangs.
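For context on the O_DIRECT side, Linux generally requires the user
buffer address, the file offset and the transfer length of an O_DIRECT
request to be aligned, typically to the device's logical block size.
Here is a minimal sketch of an aligned O_DIRECT write, assuming 4096-byte
alignment is sufficient for the device; it is not the test utility from
this thread, and round_up_pow2 and direct_write are hypothetical names.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Round len up to the next multiple of align (align must be a power of two). */
size_t round_up_pow2(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/*
 * Write len bytes from data to path with O_DIRECT. The transfer is padded
 * with zeros up to a multiple of align, so the resulting file size is the
 * padded length. Returns the byte count from pwrite() or -1 on error.
 */
ssize_t direct_write(const char *path, const void *data, size_t len, size_t align)
{
    size_t padded = round_up_pow2(len, align);
    void *buf = NULL;
    ssize_t n;
    int fd;

    /* O_DIRECT needs an aligned user buffer, not just an aligned length. */
    if (posix_memalign(&buf, align, padded) != 0)
        return -1;
    memset(buf, 0, padded);
    memcpy(buf, data, len);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    n = pwrite(fd, buf, padded, 0);   /* offset 0 is trivially aligned */
    close(fd);
    free(buf);
    return n;
}
```

A call such as direct_write(path, msg, strlen(msg), 4096), with path
pointing at a file on the realtime-backed mount, issues a single aligned
direct write; the mount point and file name are up to the caller.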

Some form of corruption makes sense, at least to me, given the
different behavior between the first and second runs.  It gives me a
place to start looking, anyway.  Thanks.
