Hi,
we are selling Linux file servers and we wanted to use XFS. Our internal
tests passed OK, but when we installed the first server at a customer's
site and migrated their data, an error occurred (usually after copying
60-100 GB). In /var/log/messages we saw these messages:
Jun 27 03:09:56 localhost kernel: xfs_btree_check_sblock: Not OK:
Jun 27 03:09:56 localhost kernel: magic 0x41425443 level 0 numrecs 394 leftsib -1 rightsib -129
[... many identical messages ...]
Jun 27 03:09:56 localhost kernel: xfs_btree_check_sblock: Not OK:
Jun 27 03:09:56 localhost kernel: magic 0x41425443 level 0 numrecs 394 leftsib -1 rightsib -129
Jun 27 03:10:30 localhost kernel: xfs_force_shutdown(md(9,0),0x8) called from line 1039 of file xfs_trans.c. Return address = 0xc01e816a
Jun 27 03:10:30 localhost kernel: Corruption of in-memory data detected. Shutting down filesystem: md(9,0)
Jun 27 03:10:30 localhost kernel: Please umount the filesystem, and rectify the problem(s)
We tried migrating 160 GB of data with "cp -a" (over NFS), scp and rsync
from the old server running RH7.0 (ext2); every method ended with this error.
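The transfers were essentially of this form (a sketch; the paths are
illustrative, not our exact ones):

    # old server's ext2 volume exported over NFS, mounted at /mnt/old
    cp -a /mnt/old/data /raid/

    # or pulled directly from the old server:
    rsync -a oldserver:/data/ /raid/data/
    scp -r oldserver:/data /raid/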
The system runs software RAID5 (10x60 GB) on a 1 GHz Celeron with 128 MB
RAM, standard RH7.3 installed from the SGI XFS-modified installation CD.
When we rebooted the system everything seemed OK (nothing lost), but after
copying a few more MB the same error occurred.
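For completeness, the filesystem can be checked without modifying it
before remounting (xfs_repair -n is a read-only consistency check;
/dev/md0 and the /raid mount point are from our setup):

    umount /raid
    xfs_repair -n /dev/md0
    mount /raid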
We have built two identical machines from the same system image and both
behave exactly the same, so I think it is a software failure rather than a
hardware fault.
I have stress tested the system with many runs of "dd if=/dev/md0
of=/raid/tmp bs=10MB count=100" and with deeply recursive directories
(about 50 levels), and nothing similar occurred. The error appears only
when copying data over the network from the old system.
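The recursive-directory test was along these lines (a sketch, not the
exact script we used):

    # build ~50 levels of nested directories, writing a file at each level
    d=/raid/stress
    for i in $(seq 1 50); do
        d="$d/level$i"
        mkdir -p "$d"
        dd if=/dev/zero of="$d/file" bs=1M count=10
    done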
Thanks,
Libor