
Re: mounts sometimes take forever

To: "Christian, Chip" <chip.christian@xxxxxxxxxxxxxxx>
Subject: Re: mounts sometimes take forever
From: Nathan Scott <nathans@xxxxxxx>
Date: Wed, 15 Aug 2001 11:06:52 +1000
Cc: "Linux XFS (E-mail)" <linux-xfs@xxxxxxxxxxx>
In-reply-to: <23D04BDBA646D411BDDD00D0B774B5390460255A@SA-BWMAIL1>; from chip.christian@storageapps.com on Tue, Aug 14, 2001 at 07:06:34PM -0400
References: <23D04BDBA646D411BDDD00D0B774B5390460255A@SA-BWMAIL1>
Sender: owner-linux-xfs@xxxxxxxxxxx
User-agent: Mutt/1.2.5i

On Tue, Aug 14, 2001 at 07:06:34PM -0400, Christian, Chip wrote:
> Has anyone seen "mount -t xfs /dev/blah /blah" take an extremely long time, 
> possibly an eternity, to complete?  
> We're running 2.4.3-SGI_XFS_1.0.1 #1 SMP, i686, RedHat Linux 7.1 on a pair of 
> servers, in an HA fashion.  
> When node #1 notes that node #2 isn't responding, he mounts all of node #2's 
> filesystems.  What we have seen happen exactly twice now (out of many, many 
> more successful mounts) is that a mount of one of the filesystems takes a 
> long time, long enough that the user gave up and rebooted.  In one instance, 
> the mount command ate 25 minutes of cpu in 30 minutes before we aborted.  
> mount was in state RN.
> Is there anything anyone can think of I might try to diagnose this if it 
> happens again?  strace and ltrace are useless; we're sitting inside mount().  
> Anything else I can use to see what's going on?

Build a kernel with kdb enabled, drop into kdb during the hang,
and start with a backtrace.  Sounds like it's got itself into an
infinite loop, most likely during recovery.

You could also try enabling profiling (append profile=2 in lilo.conf)
and use readprofile to see where all the time is being spent.
That will only give you the function, though; kdb will be more useful.
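
A rough sketch of the profiling route, in case it helps.  The helper
name, the System.map path and the cutoff of 10 are only assumptions
for illustration, so adjust them for your box:

```shell
# Assumed one-time setup: in /etc/lilo.conf add  append="profile=2"  to the
# kernel's image= stanza, rerun /sbin/lilo, and reboot.

# Hypothetical helper: rank readprofile output by tick count, highest first.
# Input lines look like "<ticks> <function> <load>".
top_hot_functions() {
    # Skip the "total" summary row, sort numerically descending, keep the top N.
    awk '$2 != "total"' | sort -nr | head -n "${1:-20}"
}

# While the mount is spinning (System.map must match the running kernel;
# the path here is an assumption):
#   readprofile -m /boot/System.map | top_hot_functions 10
```

Whatever sits at the top of that list is probably where the loop is,
and is a good place to start reading backtraces in kdb.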

> Could it be a timing thing, where node #2 is coming down and is in the 
> process of umounting the filesystem while node #1 starts to mount the same 
> filesystem?  

Is node 2 still writing to the filesystem while node 1 is trying
to recover?  That would be bad, and you'll need to ensure it doesn't
happen.  Does node 1 shoot node 2 via a reset line if it is not
responding, and only then attempt the mount?
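
The ordering I have in mind looks something like the sketch below.
FENCE_CMD is a placeholder for whatever actually resets the failed
peer in your setup (I don't know what your HA software provides), and
MOUNT_CMD exists only so the logic can be tried without a real mount:

```shell
# Hypothetical takeover step for the surviving node:
# fence the failed peer first, and mount only if fencing succeeded.
# Usage: takeover_filesystem <device> <mountpoint>
takeover_filesystem() {
    # FENCE_CMD is an assumed site-specific command; with none set,
    # default to "false" so we never mount without a confirmed reset.
    if ! ${FENCE_CMD:-false}; then
        echo "fencing failed; refusing to mount $1" >&2
        return 1
    fi
    # The peer is known to be down, so log recovery cannot race with its writes.
    ${MOUNT_CMD:-mount} -t xfs "$1" "$2"
}
```

The point is only that the mount never starts until the reset has
been confirmed.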


