All,
Finally back at this and have found the fix.
The problem was the "xfsrestorehousekeepingdir" files.
root: ls -la /clone/xfsrestorehousekeepingdir
total 304k
drwx------ 2 root root 4.0k May 12 02:19 ./
drwxr-xr-x 3 root root 4.0k May 12 02:19 ../
-rw------- 1 root root 24k May 12 02:07 .nfs000067c200000002
-rw------- 1 root root 39k May 12 02:07 .nfs000067c300000003
-rw------- 1 root root 190k May 12 02:07 .nfs000067c400000004
-rw------- 1 root root 46M May 12 02:06 .nfs000067c500000005
-rw------- 1 root root 36k May 12 02:07 .nfs000067c600000001
-rw------- 1 root root 0 May 12 02:06 .nfs0000685d00000006
On our 2.4.x systems that directory could live on the NFS-mounted
destination drives with no problem. Evidently the files in it were a
problem for 2.6.x: they remained locked by some process even after
the hanging xfsrestore process was killed. Adding a
rm -rf /tmp/xfsrestorehousekeepingdir/
and a
"-a /tmp" parameter to the cloning script fixed everything.
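
For anyone finding this in the archive, here is a minimal sketch of how
the changed step in a cloning script might look. The -a option tells
xfsrestore to create xfsrestorehousekeepingdir under the given directory
instead of under the restore destination, which keeps it off the NFS
mount. The function name clone_partition and the variables HOUSEKEEPING
and DRY_RUN are made up for this example, the -v5 debugging flag is
left out, and running the rm before the restore (rather than after) is
an assumption.

```shell
#!/bin/sh
# Sketch only: clone_partition, HOUSEKEEPING and DRY_RUN are hypothetical
# names for this example, not part of the original cloning script.

# Local (non-NFS) directory that will hold xfsrestorehousekeepingdir.
HOUSEKEEPING=${HOUSEKEEPING:-/tmp}

# clone_partition SRC_DEV DEST_DIR
# Dumps SRC_DEV at level 0 and restores it into DEST_DIR.
clone_partition() {
    src_dev=$1
    dest_dir=$2

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the commands instead of running them.
        echo "rm -rf $HOUSEKEEPING/xfsrestorehousekeepingdir"
        echo "xfsdump -l 0 - $src_dev | xfsrestore -a $HOUSEKEEPING - $dest_dir"
        return 0
    fi

    # Assumed ordering: clear housekeeping state left by an earlier run.
    rm -rf "$HOUSEKEEPING/xfsrestorehousekeepingdir"

    # -a keeps the housekeeping directory on local disk, off the NFS mount.
    xfsdump -l 0 - "$src_dev" | xfsrestore -a "$HOUSEKEEPING" - "$dest_dir"
}

# Example call (commented out so sourcing this file has no side effects):
# clone_partition /dev/hda7 /clone-var
```

Setting DRY_RUN=1 before calling the function prints the two commands
without touching any filesystem, which is handy for checking the paths
first.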
- Phil
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Phil Macias" <Philip.Macias@xxxxxxxx>,
RSIS, Inc. * NOAA/GFDL * Princeton, NJ
___,___
|_|_, 609 987 5059 office
| 609 203 5874 cell
>__,
| Date: 4 Feb 2004 13:02:49 -0500
| Bcc: root@xxxxxxxxxxxxxxxxxxxx
| Date: Wed, 4 Feb 2004 13:02:48 -0500
| CC: Philip.Macias@xxxxxxxx, linux-xfs@xxxxxxxxxxx
| From: Phil Macias <Philip.Macias@xxxxxxxx>
|
| Here you go:
|
| root: ~# gdb --pid=7896
| GNU gdb Red Hat Linux (5.2-2)
| Copyright 2002 Free Software Foundation, Inc.
| GDB is free software, covered by the GNU General Public License, and you are
| welcome to change it and/or distribute copies of it under certain conditions.
| Type "show copying" to see the conditions.
| There is absolutely no warranty for GDB. Type "show warranty" for details.
| This GDB was configured as "i386-redhat-linux".
| Attaching to process 7896
| Reading symbols from /usr/sbin/xfsrestore...done.
| Reading symbols from /usr/lib/libhandle.so.1...done.
| Loaded symbols for /usr/lib/libhandle.so.1
| Reading symbols from /usr/lib/libattr.so.1...done.
| Loaded symbols for /usr/lib/libattr.so.1
| Reading symbols from /lib/libc.so.6...done.
| Loaded symbols for /lib/libc.so.6
| Reading symbols from /usr/lib/libgcc_s.so.1...done.
| Loaded symbols for /usr/lib/libgcc_s.so.1
| Reading symbols from /lib/ld-linux.so.2...done.
| Loaded symbols for /lib/ld-linux.so.2
| 0x4010c566 in open64 () from /lib/libc.so.6
|
| (gdb) backtrace
| #0 0x4010c566 in open64 () from /lib/libc.so.6
| #1 0x400e4dd9 in opendir () from /lib/libc.so.6
| #2 0x08073a0b in wipepersstate () at content.c:3600
| #3 0x08072701 in content_complete () at content.c:2587
| #4 0x08062de7 in main (argc=4, argv=0xbffff514) at main.c:636
| #5 0x4004ed06 in __libc_start_main () from /lib/libc.so.6
|
|
|
| | From: Russell Cattelan <cattelan@xxxxxxx>
| | Cc: linux-xfs@xxxxxxxxxxx
| | Date: Wed, 04 Feb 2004 11:42:14 -0600
| |
| | Are you able to attach gdb to the hung process and
| | get a backtrace?
| |
| | On Wed, 2004-02-04 at 05:23, Phil Macias wrote:
| | > Hello,
| | >
| | > We have been using XFS on RedHat linux for over two years (since I
| | > have been here). I have been using a script to clone workstations from
| | > one another for over a year whereby the host-to-be-cloned exports its
| | > XFS partitions over NFS and they are mounted by the donor host and
| | > contents copied with:
| | >
| | > xfsdump -v5 -l 0 - /dev/hda7 | xfsrestore -v5 - /clone-var
| | >
| | > This process worked reliably for over a year on the 2.4.x kernels and
| | > these versions of xfs tools:
| | >
| | > acl-2.1.1-gfdl-1-1
| | > attr-2.1.1-gfdl-1-1
| | > dmapi-2.0.5-gfdl-1-1
| | > kernel-2.4.18-XFS-NFS-base-gfdl-2-1
| | > xfsdump-2.2.4-gfdl-1-1
| | > xfsprogs-2.3.6-gfdl-1-1
| | >
| | > PROBLEM: I am testing the 2.6.1 kernel and the following relevant
| | > packages:
| | >
| | > libelf-0.8.2-2-gfdl-1-1
| | > elfutils-libelf-0.89-2-gfdl-1-1
| | > popt-1.8.1-0.31-gfdl-1-1
| | > gcc-3.3-gfdl-1-1
| | > kernel-2.6.0-complete-gfdl-1-1
| | > glibc-2.3.2-gfdl-1-1
| | > beecrypt-3.0.1-gfdl-1-1
| | >
| | > acl-2.2.21-gfdl.tgz
| | > attr-2.4.12-gfdl.tgz
| | > binutils-2.14-gfdl.tgz
| | > dmapi-2.1.0-gfdl.tgz
| | > xfsprogs-2.6.0-gfdl.tgz
| | >
| | > Everything on the system works well except the remote
| | > xfsdump/xfsrestore. Running:
| | >
| | > xfsdump -v5 -l 0 - /dev/hda7 | xfsrestore -v5 - /clone-var
| | >
| | > on the 2.6.1 system does dump/restore properly, but xfsrestore never
| | > exits. I have to issue these commands to release xfsrestore:
| | >
| | > kill %
| | > fuser -k /clone-var/
| | >
| | > ...where /clone-var/ is the target partition. Please note that
| | > the target host is still running the old (2.4.x) kernel.
| | >
| | > Here are the last lines to the "-v5" output:
| | >
| | > ...
| | > xfsrestore: read file hdr off 0 flags 0x0 ino 14680333 mode 0x0000a1ff
| | > xfsrestore: preemptchk( )
| | > xfsrestore: restoring lib/scrollkeeper/pt_BR (14680333 0)
| | > xfsrestore: restoring symbolic link ino 14680333 lib/scrollkeeper/pt_BR
| | > xfsrestore: drive_simple read( want 32 )
| | > xfsrestore: drive_simple return_read_buf( returning 32 )
| | > xfsrestore: xlate_extenthdr
| | > xfsrestore: read extent hdr size 32 offset 0 type 4 flags 00000000
| | > xfsrestore: drive_simple read( want 32 )
| | > xfsrestore: drive_simple return_read_buf( returning 32 )
| | > xfsrestore: drive_simple get_mark( )
| | > xfsrestore: drive_simple read( want 256 )
| | > xfsrestore: drive_simple return_read_buf( returning 256 )
| | > xfsrestore: xlate_bstat
| | > xfsrestore: xlate_bstat: pre-xlate
| | > bs_ino 0
| | > bs_mode 0
| | > xfsrestore: xlate_bstat: post-xlate
| | > bs_ino 0
| | > bs_mode 0
| | > xfsrestore: xlate_filehdr: pre-xlate
| | > fh_offset 0
| | > fh_flags 83886080
| | > fh_checksum 13835040720794157312
| | > xfsrestore: xlate_filehdr: post-xlate
| | > fh_offset 0
| | > fh_flags 5
| | > fh_checksum 13835040720794157312
| | > xfsrestore: read file hdr off 0 flags 0x5 ino 0 mode 0x00000000
| | > xfsrestore: preemptchk( )
| | > xfsrestore: Media_end: pos==3
| | > xfsrestore: drive_simple end_read( )
| | > xfsrestore: getting next media file for non-dir restore
| | > xfsrestore: Media_mfile_next: purp==2 pos==0
| | > xfsrestore: tree finalize
| | > xfsrestore: restore complete: 139 seconds elapsed
| | > -------------------------------------------------
| | >
| | > Syslog shows no errors or any info about xfsrestore.
| | >
| | > Any idea why xfsrestore fails to exit properly?
| | >
| | > Thanx,
| | >
| | >
| | >
| | --
| | Russell Cattelan <cattelan@xxxxxxxxxxx>
| |
| |
| |
|
|
|