XFS corruption with failover
John Quigley
jquigley at jquigley.com
Thu Aug 13 20:06:18 CDT 2009
Eric Sandeen wrote:
> Are you sure?
>
>         if (ohead->oh_clientid != XFS_TRANSACTION &&
>             ohead->oh_clientid != XFS_LOG) {
>                 xlog_warn(
>                     "XFS: xlog_recover_process_data: bad clientid");
>                 ASSERT(0);
>                 return (XFS_ERROR(EIO));
>         }
>
> so it does say EIO but that seems to me to be the wrong error; looks more
> like a bad log to me.
Hey Eric:
That would certainly be consistent with our experience, as the only way we're able to bring the file system back online is by zeroing the log.
> It does make me wonder if there's any sort of per-initiator caching on
> the iscsi target or something. </handwave>
There isn't, as mentioned above, though we have several intermediate layers between the file system and the iSCSI initiator, including multipath and LVM, both of which I was initially suspicious of. In testing a similar scenario in a more isolated fashion, without those two intermediate layers, the behavior was still present. Also, just to clarify the topology:
                 /-----[Failover Secondary]------\
                /                                 \
NFS Client ----/                                   \-----[iSCSI Target]----[Distributed Storage]
               \                                   /
                \                                 /
                 \-----[Failover Primary]--------/
Those two failover machines, Primary and Secondary, act as the NFS server, the XFS mountpoint, and the iSCSI initiator. At any given time, only one failover machine is logged into the iSCSI target and has XFS mounted.
Thanks very much for your cycles on this, guys.
- John Quigley