Eric Sandeen wrote:
> Are you sure?
>
>     if (ohead->oh_clientid != XFS_TRANSACTION &&
>         ohead->oh_clientid != XFS_LOG) {
>             xlog_warn(
>                 "XFS: xlog_recover_process_data: bad clientid");
>             ASSERT(0);
>             return (XFS_ERROR(EIO));
>     }
>
> so it does say EIO but that seems to me to be the wrong error; looks more
> like a bad log to me.
Hey Eric:
That would certainly be consistent with our experience, as the only way we're
able to bring the file system back online is by zeroing the log.
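For anyone following along, the check quoted above can be sketched as a standalone function. This is only an illustration: the two constant values are assumed from memory of the XFS log format headers and should be verified against your tree, and `op_header_sketch` and `check_clientid` are hypothetical names, with the struct trimmed to the one field the check reads. It is not the kernel's actual code.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values, recalled from the XFS log format headers; verify locally. */
#define XFS_TRANSACTION 0x69
#define XFS_LOG         0xaa

/* Hypothetical, trimmed-down op header: only the field the check reads. */
struct op_header_sketch {
    uint8_t oh_clientid;
};

/*
 * Returns 0 when the client id is one that log recovery accepts,
 * and EIO otherwise, mirroring the branch quoted above.
 */
static int check_clientid(const struct op_header_sketch *ohead)
{
    if (ohead->oh_clientid != XFS_TRANSACTION &&
        ohead->oh_clientid != XFS_LOG)
        return EIO;
    return 0;
}
```

Any op header whose client id byte is neither of those two values takes the EIO path, which fits the reading that the log contents themselves are bad rather than the underlying device returning an I/O error.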
> It does make me wonder if there's any sort of per-initiator caching on
> the iscsi target or something. </handwave>
There isn't, as mentioned above, though we have several intermediate layers
between the file system and iSCSI initiator, including multipath and LVM, both
of which I was initially suspicious of. In testing with a similar scenario but
in a more isolated fashion without those two intermediates, the behavior was
still present. Also, just to clarify the topology:
                 /-----[Failover Secondary]------\
                /                                 \
NFS Client ----/                                   \-----[ISCSI Target]----[Distributed Storage]
               \                                   /
                \                                 /
                 \-----[Failover Primary]--------/
Those two failover machines, Primary and Secondary, each act as the NFS server,
the XFS mountpoint and the iSCSI initiator. Only one failover machine at a time
is logged into the iSCSI target and has XFS mounted.
Thanks very much for your cycles on this, guys.
- John Quigley