
Re: xfsdump: problems spanning tapes (more info)

To: Ned Haubein <n-haubein@xxxxxxxxxxxxxxxx>
Subject: Re: xfsdump: problems spanning tapes (more info)
From: Timothy Shimmin <tes@xxxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 4 Jul 2001 19:22:23 +1000
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <Pine.HPP.3.93.1010703151710.17840E-100000@merle.acns.nwu.edu>; from nch425@merle.acns.nwu.edu on Tue, Jul 03, 2001 at 03:22:02PM -0500
References: <Pine.HPP.3.93.1010703140934.17840D-100000@merle.acns.nwu.edu> <Pine.HPP.3.93.1010703151710.17840E-100000@merle.acns.nwu.edu>
Sender: owner-linux-xfs@xxxxxxxxxxx
Hi Ned,

On Tue, Jul 03, 2001 at 03:22:02PM -0500, Ned C. Haubein wrote:
> 
> After my last message, I decided to try and use the -s flag to dump only 2
> users directories to the tape.  xfsdump failed again with the message:
> 
> /usr/sbin/xfsdump: status at 15:09:28: 9782/29233 files dumped, 27.7%
> complete, 3282 seconds elapsed
> /usr/sbin/xfsdump: WARNING: write to machine:/dev/nrmt0h failed: 5
> (Input/output error)
> /usr/sbin/xfsdump: ending media file
> /usr/sbin/xfsdump: media file size 1690304512 bytes
> /usr/sbin/xfsdump: dump size (non-dir files) : 1680453600 bytes
> /usr/sbin/xfsdump: NOTE: dump interrupted: 3516 seconds elapsed
> 
> I noted the following messages in the log on the remote machine:
> 
> Jul  3 15:13:22 machine vmunix: ITPSA0: HTH intr. on bus 0, SBCL = 0x20
> Jul  3 15:13:24 machine vmunix: ITPSA0: SCSI Bus was reset
> 
> My ability to debug SCSI problems isn't that great, so I don't know if
> this is a SCSI prob causing an xfsdump problem, or an xfsdump problem
> causing the SCSI bus reset.
> 
> Any insight would be appreciated here.  We'd like to avoid just dropping
> tar files directly on the tape.
> 
I don't think this is an xfsdump problem in this case.
We have tested the -s option of xfsdump on Linux
in QA tests 022, 023 and 043 without any problems.
It looks like your write to machine:/dev/nrmt0h just failed with
an I/O error.
Could you try it again with debugging turned on by adding
the "-v5" option?

SEE BELOW!

On Tue, Jul 03, 2001 at 02:30:56PM -0500, Ned C. Haubein wrote:
> Hi,
> 
> We've recently installed XFS on our system (Red Hat 7.1, kernel 2.4-5,
> xfsdump-1.0.9-0) and are having some problems dumping one of our
> filesystems.  The dump won't fit on one tape but we're not prompted for a
> media change at the end of the dump - it just dies.  The dump command line
> is: 
> 
> /usr/sbin/xfsdump -p 300 -l 0 -o -J -f machine:/dev/nrmt0h /home
> 
> The remote device is a Seagate SCSI tape drive on a Tru64 system.  Dumps
> on a single tape seem fine and dumps from our IRIX machines are fine (all
> fit on one tape, though), and regular dump on the Tru64 machines is able
> to span multiple tapes. Has anyone else seen this and if so come up with a
> work-around or solution? 
> 
No, I haven't seen this before.
But then again, I have never used a Tru64 system.
I have tested multiple tapes locally on Linux with success.

Can you send me all the messages you get from xfsdump?
Could you run it with -v5 (or -v drive=debug)?

Looking at the code:
   * The remote writing routine in librmt/rmtwrite.c sets the OS error
     to EIO (via setoserror) if the write fails to write out the
     requested nbytes. This is pretty general: any error becomes EIO.
   * In drive_scsitape.c, the write routines call write_record(),
     which calls Write() and then calls determine_write_error().
     determine_write_error() returns DRIVE_ERROR_EOM for an error of EIO.
   * Any of the dumping functions called in a loop from
     content_stream_dump(), such as:
        inomap_dump()
        dump_dirs()
        bigstat_iter(...,dump_file,...)
     will call the writing routine and, if it fails with DRIVE_ERROR_EOM,
     will convert that to RV_EOM.
   * With a result of RV_EOM, content_stream_dump() will goto
     decision_more, which calls Media_mfile_end(...hiteom...),
     which sets cc_Media_begin_entry_state = BES_ENDEOM.
   * Next we go back to the start of the dumping loop and call
     Media_mfile_begin(). It notices that cc_Media_begin_entry_state
     equals BES_ENDEOM and will goto changemedia.
   * In changemedia, it handles the media change.
     However, if -F is used, it will not ask you to change media.
[Don't you just love the chain of calls....argh:-]


Hmmmmm....wait a minute....
"write to machine:/dev/nrmt0h failed: 5"
only gets produced by drive_simple.

Arghhh!!!!!!!

This means that drive_scsitape wasn't the chosen strategy!
Arghhh!!!!!!!

This means that drive_scsitape.c/ds_match() scored badly.
This means that
   rmtopen()  
or
   rmtioctl(fd, MTIOCGET, &mt_stat)
failed.
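
To make the fallback concrete, here is a minimal sketch in C of
score-based strategy selection. It is a simplified model, not the
xfsdump code: struct probe_result, struct strategy, choose_strategy(),
the two match functions and the scores 10, -10 and 1 are all
hypothetical. The only point is that a failed open or a failed
MTIOCGET makes the scsitape candidate lose to the simple one.

```c
#include <stddef.h>

/* Outcome of probing the device. In the real code these results would
 * come from rmtopen() and rmtioctl(fd, MTIOCGET, &mt_stat). */
struct probe_result {
    int open_ok;
    int mtiocget_ok;
};

struct strategy {
    const char *name;
    int (*match)(const struct probe_result *probe);
};

/* Hypothetical scores: only their ordering matters in this sketch. */
int scsitape_match(const struct probe_result *probe)
{
    return (probe->open_ok && probe->mtiocget_ok) ? 10 : -10;
}

/* The simple strategy accepts anything, with a low score. */
int simple_match(const struct probe_result *probe)
{
    (void)probe;
    return 1;
}

/* Return the name of the highest-scoring strategy. */
const char *choose_strategy(const struct probe_result *probe)
{
    static const struct strategy strategies[] = {
        { "drive_scsitape", scsitape_match },
        { "drive_simple", simple_match },
    };
    const struct strategy *best = &strategies[0];
    int best_score = best->match(probe);
    size_t i;

    for (i = 1; i < sizeof strategies / sizeof strategies[0]; i++) {
        int score = strategies[i].match(probe);

        if (score > best_score) {
            best = &strategies[i];
            best_score = score;
        }
    }
    return best->name;
}
```

With a probe of {1, 0} (open succeeded, status ioctl failed) this model
picks "drive_simple", which matches the warning text seen in the log.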
    
My bet is that rmtioctl(MTIOCGET) failed!
Set the environment variable RMTDEBUG
and watch the error messages.
The rmtioctl code is in xfsdump/librmt/rmtioctl.c
and is very specific to each UNIX variant. The "S" status command
supported by rmt(1) varies quite a bit between systems, and the code
has special cases for Linux and IRIX. For example, byte swapping may
need to be done.
If you want to use the scsitape strategy, then the code in
xfsdump/librmt/rmtioctl.c and rmtopen.c will have to be
extended for Tru64.
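
As an illustration of the kind of byte swapping meant here, a minimal
sketch in C follows. It is not taken from rmtioctl.c: swap32(),
host_is_big_endian() and read_remote_u32() are hypothetical helpers,
and the assumption that a status field is a raw 32-bit integer in the
remote machine's byte order is mine. The real layout of the status
reply differs per platform and would have to be checked for Tru64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Return 1 if this host stores the most significant byte first. */
int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Read a 32-bit field at 'offset' from a raw status buffer sent by the
 * remote side, swapping bytes when the remote byte order differs from
 * the local one. */
uint32_t read_remote_u32(const unsigned char *buf, size_t offset,
                         int remote_is_big_endian)
{
    uint32_t v;

    memcpy(&v, buf + offset, sizeof v);
    if (remote_is_big_endian != host_is_big_endian())
        v = swap32(v);
    return v;
}
```

Using memcpy rather than a pointer cast avoids alignment problems when
a field does not start on a 4-byte boundary in the buffer.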

> Finally, just as a note, when dumping the IRIX machines, we normally use
> the -m -b 245760 options, but these fail on the linux version with the
> error:
> 
> xfsdump: drive_minrmt.c:2201: do_end_write: Assertion `first_rec_w_err >=
> 0' failed. 
> 
I'll have a look tomorrow - please redo with -v5.

BTW, if you have problems with xfsdump/xfsrestore,
it's good to run with "-v5" and send us all the messages. Thanks.

Hmmmm, I should look at producing some extra warning messages
for this case.

Cheers,
Tim.
