Nasty bug?



Hi.

I've just hit what _could_ be a nasty bug in xfs unless I'm missing
something.

I have a server built on stock Red Hat 7.3, running a 2.4.20 kernel
that I patched with xfs and lvm 1 and built myself (also including net
drivers and so on).

There is a hardware RAID device attached (full of IDE disks but
presenting as a single SCSI disk to the OS). 

I put the entire RAID (1.6TB) into a single physical volume under lvm
and created a number of logical volumes on it, all of which I formatted
with xfs.

It's been working just fine for about 4 months, but today I tried to
create the remainder of the logical volumes I need (the last 200G or so
of the disk) and format them with xfs. But xfs won't format them.
Instead, it comes back with a load of i/o errors.

I thought it might be the array playing up, so I tried mkfs -t ext3 on
it and it worked fine.

I tried making lots of small logical volumes and formatting them
separately, and after the first few the errors start happening on all
the rest. It's as though xfs can't work on disk sectors above a certain
limit.

Is there some sort of limit on xfs that it can't be used above a certain
sector number? Or am I missing something obvious?

The following errors from the syslog look possibly relevant:

Aug  1 15:17:01 picard modprobe: modprobe: Can't locate module block-major-43
Aug  1 15:18:22 picard last message repeated 32 times
Aug  1 15:19:08 picard last message repeated 63 times
Aug  1 15:19:20 picard kernel:  I/O error: dev 08:00, sector 2789278339
Aug  1 15:19:20 picard kernel:  I/O error: dev 08:00, sector 2797666947
Aug  1 15:19:20 picard kernel:  I/O error: dev 08:00, sector 2806055555
Aug  1 15:19:20 picard kernel:  I/O error: dev 08:00, sector 2814444163
Aug  1 15:19:20 picard kernel:  I/O error: dev 08:00, sector 2822832771
Aug  1 15:19:20 picard kernel:  I/O error: dev 08:00, sector 2831221379

The last message repeats many, many times (> 100) with different sector
numbers. As you can see, the sector numbers are not consecutive (each is
8388608 higher than the one before), so it doesn't look like a simple
hardware failure.
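
For what it's worth, here is a rough sketch of the arithmetic on the six
sector numbers quoted above. It only parses the log lines and compares
the values against 2**31 and 2**32. The 512-byte sector size is my
assumption, and the function names are just ones I made up for this
illustration.

```python
import re

# Kernel log lines copied from the syslog excerpt above.
LOG = """\
Aug  1 15:19:20 picard kernel:  I/O error: dev 08:00, sector 2789278339
Aug  1 15:19:20 picard kernel:  I/O error: dev 08:00, sector 2797666947
Aug  1 15:19:20 picard kernel:  I/O error: dev 08:00, sector 2806055555
Aug  1 15:19:20 picard kernel:  I/O error: dev 08:00, sector 2814444163
Aug  1 15:19:20 picard kernel:  I/O error: dev 08:00, sector 2822832771
Aug  1 15:19:20 picard kernel:  I/O error: dev 08:00, sector 2831221379
"""

SECTOR_BYTES = 512  # assumption: the kernel is counting 512-byte sectors


def parse_sectors(log):
    """Return the sector numbers from 'I/O error' lines, in log order."""
    pattern = r"I/O error: dev [0-9a-fA-F:]+, sector (\d+)"
    return [int(s) for s in re.findall(pattern, log)]


def strides(sectors):
    """Return the differences between consecutive sector numbers."""
    return [b - a for a, b in zip(sectors, sectors[1:])]


sectors = parse_sectors(LOG)
gaps = strides(sectors)

print("gaps between errors (sectors):", sorted(set(gaps)))
print("gap in GiB:", gaps[0] * SECTOR_BYTES / 2**30)
print("all above 2**31:", all(s > 2**31 for s in sectors))
print("all below 2**32:", all(s < 2**32 for s in sectors))
```

On the six lines quoted, every gap comes out as 8388608 sectors (4 GiB
at 512 bytes per sector), and every failing sector is above 2**31 but
below 2**32. Whether that points at a signed 32-bit sector limit
somewhere in the stack is only a guess on my part.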

I have seen the suggestion that the block-major-43 message (nbd call)
can be ignored because it's an unsupported ioctl call to lvm which isn't
relevant for local file systems. But if it's being made by xfs then it
might be more important than that... :)

Whatever the problem, it only seems to happen at the 'top' of the disk.

Any thoughts, anyone?

Paul.