----- "Eric Sandeen" <sandeen@xxxxxxxxxxx> wrote:
> Lachlan McIlroy wrote:
> > ----- "Eric Sandeen" <sandeen@xxxxxxxxxxx> wrote:
> >
> >> Felix Blyakher wrote:
> >>> On Aug 13, 2009, at 3:17 PM, John Quigley wrote:
> >>>
> >>>> Folks:
> >>>>
> >>>> We're deploying XFS in a configuration where the file system is
> >>>> being exported with NFS. XFS is being mounted on Linux, with
> >>>> default options; an iSCSI volume is the formatted media. We're
> >>>> working out a failover solution for this deployment utilizing Linux
> >>>> HA. Things appear to work correctly in the general case, but in
> >>>> continuous testing we're getting XFS superblock corruption on a very
> >>>> reproducible basis.
> >>>> The sequence of events in our test scenario:
> >>>>
> >>>> 1. NFS server #1 online
> >>>> 2. Run IO to NFS server #1 from NFS client
> >>>> 3. NFS server #1 offline, (via passing 'b' to /proc/sysrq-trigger)
> >>>> 4. NFS server #2 online
> >>>> 5. XFS mounted as part of failover mechanism, mount fails
> >>>>
> >>>> The mount fails with the following:
> >>>>
> >>>> <snip>
> >>>> kernel: XFS mounting filesystem sde
> >>>> kernel: Starting XFS recovery on filesystem: sde (logdev: internal)
> >>>> kernel: XFS: xlog_recover_process_data: bad clientid
> >>>> kernel: XFS: log mount/recovery failed: error 5
> >>> This is an IO error. Is the block device (/dev/sde) accessible
> >>> from the server #2 OK? Can you dd from that device?
> >> Are you sure?
> >>
> >> if (ohead->oh_clientid != XFS_TRANSACTION &&
> >>     ohead->oh_clientid != XFS_LOG) {
> >>         xlog_warn(
> >>             "XFS: xlog_recover_process_data: bad clientid");
> >>         ASSERT(0);
> >>         return (XFS_ERROR(EIO));
> >> }
> >>
> >> so it does say EIO but that seems to me to be the wrong error; looks
> >> more like a bad log to me.
> >>
> >> It does make me wonder if there's any sort of per-initiator caching on
> >> the iscsi target or something. </handwave>
> > Should barriers be enabled in XFS then?
>
> Could try it but I bet the iscsi target doesn't claim to support
> them...
You're probably right.
Is it possible for a transaction record to span two log buffers, with
only one of them making it to disk, so that the rest of the transaction
record appears corrupt?
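
To make the question concrete, here is a toy model of that scenario. It is
only a sketch and not the real XFS log format: the two-byte op header, the
DEMO_* client ids, the 32-byte buffer size and the function names are all
made up for illustration, and real log recovery performs other checks that
this ignores. It builds a record that fills two buffers, writes either both
or only the first to a "disk" that already holds stale bytes, and then walks
the op headers with the same kind of clientid test as the code quoted above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical client ids for this toy model (not taken from XFS headers). */
#define DEMO_TRANSACTION 0x69
#define DEMO_LOG         0xaa

#define BUF_SIZE   32    /* toy "log buffer" size in bytes (assumption) */
#define HDR_SIZE   2     /* toy op header: 1 byte clientid, 1 byte length */
#define STALE_BYTE 0x5a  /* whatever was on disk before the write */

/* Fill dst with back-to-back ops; returns the number of bytes used. */
static size_t build_record(uint8_t *dst, size_t cap, uint8_t payload_len)
{
    size_t off = 0;

    while (off + HDR_SIZE + payload_len <= cap) {
        dst[off] = DEMO_TRANSACTION;          /* clientid */
        dst[off + 1] = payload_len;           /* payload length */
        memset(dst + off + HDR_SIZE, 0x11, payload_len);
        off += HDR_SIZE + payload_len;
    }
    return off;
}

/*
 * Walk the op headers the way a recovery pass might. Returns 0 if every
 * header carries a known clientid, or -1 and the offending offset if not.
 */
static int recover(const uint8_t *log, size_t len, size_t *bad_off)
{
    size_t off = 0;

    while (off + HDR_SIZE <= len) {
        uint8_t clientid = log[off];
        uint8_t payload_len = log[off + 1];

        if (clientid != DEMO_TRANSACTION && clientid != DEMO_LOG) {
            *bad_off = off;
            return -1;                        /* "bad clientid" */
        }
        off += HDR_SIZE + payload_len;
    }
    return 0;
}

/* Write the first nbufs buffers of a two-buffer record, then recover. */
static int write_and_recover(size_t nbufs, size_t *bad_off)
{
    uint8_t record[2 * BUF_SIZE];
    uint8_t disk[2 * BUF_SIZE];
    size_t len = build_record(record, sizeof(record), 6);

    memset(disk, STALE_BYTE, sizeof(disk));   /* stale on-disk contents */
    memcpy(disk, record, nbufs * BUF_SIZE);   /* only nbufs buffers land */
    return recover(disk, len, bad_off);
}

/* Both buffers reach the disk: recovery sees only valid headers. */
int demo_full_write(size_t *bad_off)
{
    return write_and_recover(2, bad_off);
}

/* Only the first buffer reaches the disk: the second half is stale. */
int demo_torn_write(size_t *bad_off)
{
    return write_and_recover(1, bad_off);
}
```

In this model demo_full_write() recovers cleanly, while demo_torn_write()
fails at the first op header in the second buffer, because the stale byte
there is neither of the two accepted client ids. That is the same symptom
as the "bad clientid" message, though it says nothing about whether the
real log can actually get into that state.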
>
> -eric
>
> >> -Eric
> >>