Re: xfs_create looping, missing dirs/files, corrupt inode tables, etc.



Eric Sandeen wrote:
> 
> > Hi all,
> >
> > I've been having a problem that I thought might be related to Tridge's, but
> > maybe not:
> >
> > When copying directories over NFS (v2) from OSF1 clients to a Linux server
> > with XFS, files and directories will mysteriously "vanish" after the cp
> > completes, however, enough of the file/dir remains (i.e., the name of the
> > file in the inode table) that an ls of the dir will yield this:
> >
> >
> > [root@sdssdp9 rawdata]# ls CANNOT_RM.51940/
> > ls: CANNOT_RM.51940/guider: No such file or directory
> 
> Are you doing anything special to hit this?  I move files around NFS all


Nothing terribly special.  But I did accidentally lie about the kernel/XFS
version: I upgraded to linux-2.4.8-xfs on Friday morning and the corruption
was discovered Friday night, but the files were written back when the
machine was running linux-2.4.7pre6-xfs.

Which version of NFS are you using?  v2?  This might be the root of the
problem.  Some of our systems are forcing vers=2 because of problems we had,
oh, back around 2.0.36 with OSF1 not being able to autonegotiate correctly.
I think that's all fixed now, so I'm having that mount option removed.  I'll
let you know if the problem goes away.


> the time, I have not seen this.  Anything else you can tell us to try to
> recreate it?  Does this only happen on RAID?

I'd test on a non-RAID system if I had one (how often do you hear that? 
;-).  I suppose I could try it on my desktop.  I'll let you know how that
goes, too. We're moving *lots* of data around (a few hundred GB/week at
least).  I'm trying to get the data analysts to use rcp instead of NFS, but
that'll take some time.


> 
> > Aug 18 00:14:10 sdssdp9 kernel: xfs_create looping, dir ino 0xa25e000, ino 0x101000800, md(9,0)
> > Aug 18 00:14:10 sdssdp9 kernel:
> > Aug 18 00:14:10 sdssdp9 kernel: nfsd: non-standard errno: -990
> 
> The -990 is EFSCORRUPTED, generated directly after the "xfs_create
> looping" message, which comes out of a trap commented like this:
> 
> /*
>  * xfs_create_broken is a trap routine to isolate the cause of a infinite
>  *      loop condition reported in IRIX 6.4 by PV 522864. If no occurances
>  *      of this error recur (that is, the trap code isn't hit), this routine
>  *      should be removed in future releases.
>  */
> 
> so I guess we won't remove it just yet... ;-)


Hm... no.  Not quite yet. 
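
In case it helps anyone check their own logs for the same trap, here is a
rough sketch of a scanner for the "xfs_create looping" message.  The regular
expression is an assumption based only on the single log line quoted above
(other kernels may format it differently), and the names LOOP_RE and
find_create_loops are made up for illustration.

```python
import re

# Pattern assumed from the one log line quoted in this thread; the exact
# format may differ between kernel/XFS versions.
LOOP_RE = re.compile(
    r"xfs_create looping, dir ino (0x[0-9a-fA-F]+), ino\s+(0x[0-9a-fA-F]+), (\S+)"
)


def find_create_loops(lines):
    """Return (dir_ino, ino, device) tuples for each matching log line."""
    hits = []
    for line in lines:
        m = LOOP_RE.search(line)
        if m:
            # Inode numbers are logged in hex; convert them to integers.
            hits.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return hits


# Sample input: the two kernel messages from this thread, each on one line.
sample = [
    "Aug 18 00:14:10 sdssdp9 kernel: xfs_create looping, "
    "dir ino 0xa25e000, ino 0x101000800, md(9,0)",
    "Aug 18 00:14:10 sdssdp9 kernel: nfsd: non-standard errno: -990",
]

for dir_ino, ino, dev in find_create_loops(sample):
    print(f"dir ino {dir_ino:#x}, ino {ino:#x}, device {dev}")
```

To run it against a real log, pass an open file (for example
/var/log/messages, if that is where your syslog puts kernel messages) in
place of the sample list.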


Cheers,
Dan

PS Tell Steve "Whatever gcc RH ships with 7.1 plus updates."  I guess that's
gcc-2.96-85. You want me to try 2.96.66, per the Makefile?

-- 
Dan Yocum
Sloan Digital Sky Survey, Fermilab  630.840.6509
yocum@fnal.gov, http://www.sdss.org
SDSS.  Mapping the Universe.