Well, I can provide a bit more information.
-- We have a number of these hardware systems. As I said, it is very
easy to reproduce on some of them. As it happens, it is at our Beta
sites where it is easy to reproduce, and in our lab where it is
tough to reproduce. We are looking into why.
-- I was unable to try the sunit=0 & swidth=0 experiment: no matter
what parameters I give to mkfs.xfs (sunit, swidth, su, sw, various
args), or what options I use in mount, the filesystem is always
created/mounted with the geometry read from the RAID (a sketch of
what I tried is after this list -- perhaps this is a known issue).
-- we are currently verifying a workaround: we added a pseudo-service
during shutdown that does
dd if=/dev/zero of=/xfs_filesystem/junk bs=64k count=8k
(and removes junk on startup; a sketch of the script is after this
list). On a system where the corruption was nearly 100% repeatable,
we have now gone through 10 reboot cycles without a problem (tests
continue -- tough at a Beta site).
-- The problem remains unchanged if Linux Software RAID is removed
from the equation. I stopped the RAID, formatted one of the disks
as XFS (installed Postgres, etc.), and got the corruption on the
first reboot.
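
For the sunit/swidth item above, the kind of invocation I was trying
looks roughly like this (the md device name is just an example; in
every case the filesystem still came up with the stripe geometry read
from the RAID):

  mkfs.xfs -f -d sunit=0,swidth=0 /dev/md0
  mount -o sunit=0,swidth=0 /dev/md0 /xfs_filesystem
  xfs_info /xfs_filesystem     # still reports the RAID's geometry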
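
For the shutdown workaround, the pseudo-service is essentially the
following (a rough sketch; how it gets hooked into the init sequence
is left out):

  #!/bin/sh
  # called with "stop" at shutdown and "start" at boot
  case "$1" in
    stop)
      # write 512 MB of zeroes into the XFS filesystem before shutdown
      dd if=/dev/zero of=/xfs_filesystem/junk bs=64k count=8k
      ;;
    start)
      # remove the junk file again on the next boot
      rm -f /xfs_filesystem/junk
      ;;
  esac
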
Is there any definitive information about which hardware
configurations are susceptible?
Thanks,
--Ian
> -----Original Message-----
> Yes, this does sound like it might be the problem we are working on.
>
> The definitive test is to do an unmount/mount cycle instead of a reboot; if
> data corruption is found, then we are looking at the same thing.
>
> BTW, we have now duplicated this problem on the current Ubuntu Linux
> release, and on SUSE 9.2 (we will be checking 9.3 as soon as we get a
> copy... but I think we can recreate it there, too).
>
> Fortunately, it seems to be very hard to hit; it appears to be very
> dependent on the hardware configuration.
>
> Jim Foris