
Re: RHEL4/SL4 XFS stack problem?

To: Eric Sandeen <sandeen@xxxxxxx>
Subject: Re: RHEL4/SL4 XFS stack problem?
From: "Michael Mansour" <mic@xxxxxxxxxxx>
Date: Wed, 4 Jan 2006 20:03:19 +1000
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <Pine.LNX.4.44.0601032302050.8801-100000@xxxxxxxxxxxxxxxxxxxxxxxx>
References: <20060104045336.M58734@xxxxxxxxxxx> <Pine.LNX.4.44.0601032302050.8801-100000@xxxxxxxxxxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
Hi Eric,

> > Hi Eric,
> > 
> > > > After building a couple of clusters using xfs on the shared storage
> > > > device (and using md and lvm on top of that), I'm getting this error
> > > > now, which hard crashes my machines:
> > > > 
> > > >  do_IRQ: stack overflow: 284
> > > >   [<c01078a2>] do_IRQ+0x44/0x130
> > > 
> > > The rest of the message would be most interesting, to see what your 
> > > stack actually looks like.
> > 
> > What I've shown above is the only bit I can see on the console; I can't
> > use the keyboard or anything at that point and have to physically
> > power-cycle the server.
> 
> Hm, hard to say then.
> 
> > > Recent xfs is reasonable on 4k stacks and there are a few things in 
> > > the works to make it better.  But depending on what you stack up in 
> > > your IO path you could probably still blow it.
> > 
> > Hmm... ok, my stack is:
> > 
> > md   for IDE disk mirrors
> > lvm  for LV support
> > drbd for the shared storage
> > xfs  formatted the filesystem
> 
> hm, yes, that's pretty optimistic :)
> 
> Just for kicks you could run http://oss.sgi.com/~sandeen/stackcheck-i386
> against each of those modules & see if any large stack users show up
> that might matter.

# stackcheck /lib/modules/2.6.9-11.EL.XFSsmp/kernel/drivers/block/drbd.ko
144 drbd_ioctl_set_net
1c0 drbd_ioctl_get_conf

# stackcheck /lib/modules/2.6.9-11.EL.XFSsmp/kernel/fs/xfs/xfs.ko
124 linvfs_mknod
134 xfs_bmapi
134 xfs_swapext
158 xfs_trans_init

# stackcheck /lib/modules/2.6.9-11.EL.XFSsmp/kernel/fs/jbd/jbd.ko
114 log_do_checkpoint
168 journal_commit_transaction

# stackcheck /lib/modules/2.6.9-11.EL.XFSsmp/kernel/fs/lockd/lockd.ko
120 nlm4svc_proc_cancel_msg
120 nlm4svc_proc_granted_msg
120 nlm4svc_proc_lock_msg
120 nlm4svc_proc_test_msg
120 nlm4svc_proc_unlock_msg
120 nlmsvc_proc_cancel_msg
120 nlmsvc_proc_granted_msg
120 nlmsvc_proc_lock_msg
120 nlmsvc_proc_test_msg
120 nlmsvc_proc_unlock_msg
2a0 nlmclnt_reclaim
2b0 nlmclnt_proc

# stackcheck /lib/modules/2.6.9-11.EL.XFSsmp/kernel/fs/nfs/nfs.ko
118 encode_attrs
128 nfs_lookup
130 nfs_lookup_revalidate
14c nfs3_proc_link
158 nfs_proc_create
160 nfs_mkdir
160 nfs_mknod
168 _nfs4_open_delegation_recall
16c nfs3_proc_rename
170 _nfs4_open_reclaim
170 nfs_symlink
19c nfs_readdir
208 _nfs4_do_open
22c nfs3_proc_create
Dynamic 00001794 nfs_sillyrename     17c0:  29 cc   sub %ecx,%esp

All that doesn't really mean much to me :)
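For my own understanding I sketched what I assume a script like stackcheck
does: scan the module's disassembly (objdump -d foo.ko) for prologues that
reserve a large stack frame, and report the offenders. This is only my own
approximation in Python, not Eric's actual stackcheck-i386 script; the
function name big_stack_users and the 256-byte threshold are made up for
illustration.

```python
import re

# A function label in objdump output, e.g. "08048000 <xfs_bmapi>:"
FUNC_RE = re.compile(r'^[0-9a-f]+ <(?P<name>[^>]+)>:')
# A stack-frame reservation in the prologue, e.g. "sub $0x134,%esp"
SUB_RE = re.compile(r'\bsub\s+\$0x(?P<size>[0-9a-f]+),%esp')

def big_stack_users(disasm, threshold=0x100):
    """Yield (frame_size, function_name) for frames >= threshold bytes."""
    current = None
    for line in disasm.splitlines():
        m = FUNC_RE.match(line)
        if m:
            current = m.group('name')   # entering a new function
            continue
        m = SUB_RE.search(line)
        if m and current:
            size = int(m.group('size'), 16)
            if size >= threshold:
                yield size, current
```

Feeding it the output of `objdump -d xfs.ko` would then list something like
the `134 xfs_bmapi` lines above (size in hex, name).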

> > I run the linuxha.net HA software, which uses drbd for network-linked
> > shared storage.
> > 
> > Do you think all that stacking is the problem? Would the previous email
> > stating that I can build from kernel.org using the RH config file but
> > changing to an 8k stack make this work?
> 
> It might.  There are some arguments that because 8k stacks must share
> with IRQ stacks, you're just as likely to have problems, but it seems
> that usually 8k stacks are a bit more forgiving...
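For the record, on 2.6-era i386 kernels the 4k/8k choice is the
CONFIG_4KSTACKS option under "Kernel hacking"; if I were to rebuild from
kernel.org with the RH config, I assume the only change needed in .config
would be along these lines:

```text
# Kernel hacking -> "Use 4Kb for kernel stacks instead of 8Kb"
# Leaving the option unset restores 8k stacks:
# CONFIG_4KSTACKS is not set
```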

I thought long and hard about this, Eric, and although I like XFS a lot and
do wish to use it, I've reluctantly decided to migrate those XFS filesystems
to ext3. Since these are production clusters built on SL42 (RHEL4 Update 2),
I really do need them to be supported by vendor releases without too much
tinkering on my end.

Because of this, though, after more than 12 years with RH I feel compelled to
check out SUSE the next time I'm building servers. Out of the box they seem
to support all the good stuff (php5, XFS, etc.), where RH seems to lag
behind.

Thanks for your help.

Michael.

> -Eric
> 
> > Thanks.
> > 
> > Michael.
> > 
> > > -Eric
> > > 
> > > > I'm using Scientific Linux 4.2 (RHEL4 Update 2) with an SL Contrib
> > > > kernel of:
> > > > 
> > > > kernel-smp-2.6.9-11.EL.XFS
> > > > 
> > > > which has XFS support. I also use the xfsprogs rpm supplied by Dag
> > > > Wieers. I run on an x86 platform.
> > > > 
> > > > After googling quite a bit, it seems that RH have caused an issue
> > > > with their RHEL4 release by only enabling a 4k stack, where it seems
> > > > that XFS requires an 8k stack?
> > > > 
> > > > I'd really like to know how to fix this problem, as I just finished
> > > > months of work building a couple of SL4 clustered environments using
> > > > XFS, and now with this problem am looking at the unpleasant
> > > > alternative of getting rid of the XFS filesystems and changing them
> > > > to ext3, which will take me approximately half a day of work per
> > > > cluster, for the added benefit of a slower filesystem.
> > > > 
> > > > I just visited the SGI site to see if there are any hints at fixes
> > > > for this problem, which is where I got this email address from.
> > > > 
> > > > Any help is very much appreciated.
> > > > 
> > > > Michael.
> > > > 
> > > >
> > ------- End of Original Message -------
> >
------- End of Original Message -------

