On Tue, Aug 14, 2001 at 07:06:34PM -0400, Christian, Chip wrote:
> Has anyone seen "mount -t xfs /dev/blah /blah" take an extremely long time,
> possibly an eternity, to complete?
> We're running 2.4.3-SGI_XFS_1.0.1 #1 SMP, i686, RedHat Linux 7.1 on a pair of
> servers, in an HA fashion.
> When node #1 notes that node #2 isn't responding, he mounts all of node #2's
> filesystems. What we have seen happen exactly twice now (out of many, many
> more successful mounts) is that a mount of one of the filesystems takes a
> long time, long enough that the user gave up and rebooted. In one instance,
> the mount command ate 25 minutes of cpu in 30 minutes before we aborted.
> mount was in state RN.
> Is there anything anyone can think of I might try to diagnose this if it
> happens again? strace and ltrace are useless; we're sitting inside mount().
> Anything else I can use to see what's going on?
Build a kernel with kdb enabled, drop into kdb during the hang,
and start with a backtrace. Sounds like it's got itself into an
infinite loop, most likely during log recovery: state R with that
much cpu time means mount is spinning, not waiting on I/O.
You could also try enabling profiling (add append="profile=2" to
lilo.conf, rerun lilo and reboot) and use readprofile to see where
all the time is being spent. That will only give you the function,
though; kdb will be more useful.
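
If you do go the readprofile route, something like the following could
rank the output by ticks. This is only a rough sketch: it assumes the
default three-column readprofile layout (ticks, function name, load),
and the function names in SAMPLE are made up for illustration, not
real XFS symbols.

```python
def top_functions(readprofile_text, limit=10):
    """Return (ticks, name) pairs, busiest function first."""
    rows = []
    for line in readprofile_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        ticks, name, _load = fields
        if not ticks.isdigit() or name == "total":
            continue  # skip the summary line and anything malformed
        rows.append((int(ticks), name))
    rows.sort(reverse=True)
    return rows[:limit]


# Hypothetical sample output, for illustration only.
SAMPLE = """\
    12 some_function        0.0500
  2400 hypothetical_loop    9.3750
     3 another_function     0.0100
  2415 total                0.0020
"""

if __name__ == "__main__":
    for ticks, name in top_functions(SAMPLE):
        print(ticks, name)
```

Capture the readprofile output once the mount has been spinning for a
while and feed it in as text; a single function holding nearly all of
the ticks is the one to look at first in kdb.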
> Could it be a timing thing, where node #2 is coming down and is in the
> process of umounting the filesystem while node #1 starts to mount the same
Is node two still writing to the filesystem while node one is trying
to recover it? That would be bad, and you'll need to ensure it doesn't
happen. (Does node one shoot node two via a reset line when it is not
responding, and only then attempt the mount?)
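
The ordering I mean is roughly the sketch below. fence_peer and
mount_all are hypothetical callables standing in for whatever your HA
scripts actually do (pulse the reset line, run the mounts); they are
not part of any real package.

```python
def take_over(fence_peer, mount_all):
    """Mount the peer's filesystems only after the peer is confirmed down.

    fence_peer: hypothetical callable, returns True once the peer has
                been reset and can no longer write to the shared disks.
    mount_all:  hypothetical callable that mounts the peer's filesystems.
    """
    if not fence_peer():
        # The peer may still be writing; mounting now risks two writers.
        return False
    mount_all()
    return True
```

The point is only that the mount must never start until the reset has
been confirmed, so recovery cannot race with a half-dead node that is
still writing or unmounting.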