
Re: mounts sometimes take forever



hi,

On Tue, Aug 14, 2001 at 07:06:34PM -0400, Christian, Chip wrote:
> Has anyone seen "mount -f xfs /dev/blah /blah" take an extremely long time, possibly an eternity, to complete?
> 
> We're running 2.4.3-SGI_XFS_1.0.1 #1 SMP, i686, RedHat Linux 7.1 on a pair of servers, in an HA fashion.  
> 
> When node #1 notes that node #2 isn't responding, he mounts all of node #2's filesystems.  What we have seen happen exactly twice now (out of many, many more successful mounts) is that a mount of one of the filesystems takes a long time, long enough that the user gave up and rebooted.  In one instance, the mount command ate 25 minutes of cpu in 30 minutes before we aborted.  mount was in state RN.
> 
> Is there anything anyone can think of I might try to diagnose this if it happens again?  strace and ltrace are useless; we're sitting inside mount().  Anything else I can use to see what's going on?
> 

build a kernel with kdb enabled, drop into kdb during the hang,
and start with a backtrace.  Sounds like it's got itself into an
infinite loop, most likely during recovery.

you could also try enabling kernel profiling (add append="profile=2"
to lilo.conf, rerun lilo, and reboot) and use readprofile to see where
all the time is being spent.  that will only give you the function
though; kdb will be more useful.
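
if you do go the readprofile route, something like the following might
help pick out the hot spots.  it is only a rough sketch: it assumes the
usual three-column readprofile output (ticks, function name, normalized
load), and the function names and numbers in SAMPLE are made up purely
for illustration, not taken from a real hang.

```python
def top_functions(readprofile_text, limit=10):
    """Return the `limit` busiest (ticks, name, load) rows, busiest first.

    Assumes each useful line has three whitespace-separated columns:
    clock ticks, function name, normalized load.
    """
    rows = []
    for line in readprofile_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        ticks_str, name, load_str = parts
        try:
            ticks = int(ticks_str)
            load = float(load_str)
        except ValueError:
            continue  # not a data row
        if name == "total":
            continue  # summary line, not a function
        rows.append((ticks, name, load))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


# Invented sample input; real data would come from running readprofile.
SAMPLE = """\
    12 hypothetical_func_a    0.1500
  9000 hypothetical_func_b   12.5000
   300 hypothetical_func_c    1.2000
  9312 total                  0.0100
"""

if __name__ == "__main__":
    for ticks, name, load in top_functions(SAMPLE, limit=3):
        print(f"{ticks:8d}  {name}  {load:.4f}")
```

a function that dominates the tick count while mount is spinning is
the place to start looking, but you would still want the kdb backtrace
to see how it got there.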

> Could it be a timing thing, where node #2 is coming down and is in the process of umounting the filesystem while node #1 starts to mount the same filesystem?  

Is node two still writing to the filesystem while node one is trying
to recover it?  that would be bad, and you'll need to ensure that
doesn't happen (does node one shoot node two via a reset line if it is
not responding, and only then attempt the mount?).
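
to make the ordering concrete, here is a minimal sketch of "shoot the
peer first, then mount".  the three callables (peer_alive, fence_peer,
mount_all) are hypothetical placeholders for whatever your HA software
actually provides; this is not the API of any particular package.

```python
def take_over(peer_alive, fence_peer, mount_all):
    """Mount the peer's filesystems only after the peer is known to be down.

    peer_alive, fence_peer and mount_all are hypothetical callables
    supplied by the caller:
      peer_alive() -> bool   True if the peer still answers heartbeats
      fence_peer() -> bool   True if the reset line confirmed the reset
      mount_all()  -> None   mounts the peer's filesystems locally
    """
    if peer_alive():
        return "peer-alive"    # nothing to take over
    if not fence_peer():
        return "fence-failed"  # peer may still be writing: do NOT mount
    mount_all()                # safe now, the peer has been reset
    return "mounted"


if __name__ == "__main__":
    calls = []
    outcome = take_over(
        peer_alive=lambda: False,
        fence_peer=lambda: calls.append("fence") or True,
        mount_all=lambda: calls.append("mount"),
    )
    print(outcome, calls)
```

the point is simply that the mount is never attempted unless the reset
has been confirmed, so the two nodes can never have the same
filesystem open at once.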

cheers.

-- 
Nathan