
Re: XFS corruption!

To: Libor Vanek <libor@xxxxxxxx>
Subject: Re: XFS corruption!
From: Stephen Lord <lord@xxxxxxx>
Date: 27 Jun 2002 09:18:59 -0500
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <3D1AAB70.4060400@xxxxxxxx>
References: <3D1AAB70.4060400@xxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
On Thu, 2002-06-27 at 01:06, Libor Vanek wrote:
> Hi,
> we are selling Linux file servers and we wanted to use XFS. Our internal 
> tests passed OK, but when we installed the first server at a customer site and
> migrated data, an error occurred (usually after copying 60-100 GB). In
> /var/log/messages we saw these messages:
> 
> Jun 27 03:09:56 localhost kernel: xfs_btree_check_sblock: Not OK:
> Jun 27 03:09:56 localhost kernel: magic 0x41425443 level 0 numrecs 394 leftsib -1 rightsib -129
> Jun 27 03:09:56 localhost kernel: xfs_btree_check_sblock: Not OK:
> Jun 27 03:09:56 localhost kernel: magic 0x41425443 level 0 numrecs 394 leftsib -1 rightsib -129
> ...MANY MANY SAME...
> Jun 27 03:09:56 localhost kernel: xfs_btree_check_sblock: Not OK:
> Jun 27 03:09:56 localhost kernel: magic 0x41425443 level 0 numrecs 394 leftsib -1 rightsib -129
> Jun 27 03:10:30 localhost kernel: xfs_force_shutdown(md(9,0),0x8) called from line 1039 of file xfs_trans.c.  Return address = 0xc01e816a
> Jun 27 03:10:30 localhost kernel: Corruption of in-memory data detected.  Shutting down filesystem: md(9,0)
> Jun 27 03:10:30 localhost kernel: Please umount the filesystem, and rectify the problem(s)
> 
> 
> We tried migrating 160 GB of data using "cp -a" (over NFS), scp and rsync
> from the old server running RH7.0 (ext2); all of them resulted in this error.
> The system is running software RAID5 (10x60GB), 1 GHz Celeron, 128 MB RAM,
> standard RH7.3 installed from SGI's XFS-modified installation CD.
> When we rebooted the system everything seemed OK (nothing lost), but after
> copying a few more MB the same error occurred.
> We have built 2 identical machines from the same system image and both behave
> exactly the same, so I think it is a software failure.
> 
> I have stress-tested the system by running many "dd if=/dev/md0 of=/raid/tmp
> bs=10MB count=100" commands and creating recursive directories (about 50
> levels deep), and nothing similar occurred. It happens only when copying data
> over the network from the old system.
> 
> 
> Thanks,
> Libor
> 
> 
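The second line of each repeated log entry prints the header of the btree block that failed the check. The minimal Python sketch below decodes those values under stated assumptions: 0x41425443 spells "ABTC" in ASCII, the magic number of the XFS free-space btree indexed by extent length, and leftsib -1 is the "no sibling" value NULLAGBLOCK (0xFFFFFFFF) printed through a signed format. Reinterpreted as unsigned, rightsib -129 is 0xFFFFFF7F, which is neither NULLAGBLOCK nor a plausible block number, so a sibling-pointer check rejects it. The function names and the allocation-group length used below are hypothetical, and the kernel's own check also looks at the level and record count, which this sketch leaves out.

```python
import struct

# Magic numbers of the two XFS free-space btrees ("ABTB" and "ABTC" in ASCII).
XFS_ABTB_MAGIC = 0x41425442  # free space indexed by starting block number
XFS_ABTC_MAGIC = 0x41425443  # free space indexed by extent length ("count")
NULLAGBLOCK = 0xFFFFFFFF     # "no sibling"; shows up as -1 when printed with %d


def as_u32(printed_value: int) -> int:
    """Reinterpret a value printed with a signed %d format as an unsigned 32-bit number."""
    return printed_value & 0xFFFFFFFF


def magic_to_ascii(magic: int) -> str:
    """Spell a 32-bit magic number as its four big-endian ASCII bytes."""
    return struct.pack(">I", magic).decode("ascii")


def sibling_ok(sib: int, ag_length: int) -> bool:
    """A sibling pointer must be NULLAGBLOCK or a non-zero block inside the AG."""
    return sib == NULLAGBLOCK or 0 < sib < ag_length


def check_sblock(magic: int, leftsib: int, rightsib: int,
                 expected_magic: int, ag_length: int) -> bool:
    """Hypothetical, simplified header check: magic plus both sibling pointers."""
    return (magic == expected_magic
            and sibling_ok(leftsib, ag_length)
            and sibling_ok(rightsib, ag_length))


# Values from the log; the AG length is a made-up example figure.
AG_LENGTH = 2_000_000
logged_ok = check_sblock(0x41425443, as_u32(-1), as_u32(-129),
                         XFS_ABTC_MAGIC, AG_LENGTH)
```

Here `logged_ok` comes out False only because of rightsib: the magic matches "ABTC" and leftsib is the legal NULLAGBLOCK value.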

Can you please run xfs_check on the filesystem after this has happened.
I suspect you may have found a hole in the endian conversion code in
XFS. Doing the copy into the filesystem over NFS is probably
generating more fragmentation and hence more complex free space
structures than doing it locally. 
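XFS keeps its metadata big-endian on disk, while the Celeron in this report is a little-endian x86, so every on-disk field has to be converted when it is read or written. The Python sketch below shows what a missed conversion looks like. It assumes a short-form btree block header laid out as magic (u32), level (u16), numrecs (u16), leftsib (u32), rightsib (u32). The helper names and the rightsib value 1234 are hypothetical, and the sketch does not claim that a byte swap produced the specific -129 in the log above.

```python
import struct

# Assumed on-disk layout of a short-form btree block header (big-endian).
SBLOCK_HDR_BE = ">IHHII"  # magic, level, numrecs, leftsib, rightsib
SBLOCK_HDR_LE = "<IHHII"  # the same fields read in x86 (little-endian) order


def encode_header(magic, level, numrecs, leftsib, rightsib) -> bytes:
    """Serialize the header the way it sits on disk (big-endian)."""
    return struct.pack(SBLOCK_HDR_BE, magic, level, numrecs, leftsib, rightsib)


def decode_converted(raw: bytes) -> tuple:
    """Correct read: every field converted from big-endian."""
    return struct.unpack(SBLOCK_HDR_BE, raw)


def decode_unconverted(raw: bytes) -> tuple:
    """Buggy read: fields taken in host byte order, i.e. a missed conversion."""
    return struct.unpack(SBLOCK_HDR_LE, raw)


raw = encode_header(0x41425443, 0, 394, 0xFFFFFFFF, 1234)
good = decode_converted(raw)   # the values that were written
bad = decode_unconverted(raw)  # numrecs becomes 0x8A01, rightsib 0xD2040000
```

Fields that are all-zero or all-ones bytes (level 0, leftsib NULLAGBLOCK) survive a missed conversion unchanged. That is one reason such a bug can hide until the free-space structures hold more varied values, as they would with the heavier fragmentation described above.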

Steve


