Hi there,
Yesterday I replaced the motherboard (for a CPU upgrade, 486-133 -> K6-III+ 400)
and the network card of my home server, which runs Debian GNU/Linux 3.0. To
support the new card I had to recompile the kernel, so I upgraded the kernel
source from 2.4.19-xfs to yesterday's 2.4.20-rc2-xfs ("SGI XFS
CVS-2002-11-22_06:00_UTC with ACLs, quota, no debug enabled.").
Suddenly I noticed strange filesystem corruption. Even after a clean "halt" or
"reboot", mounting the XFS filesystems would produce error messages like the
following:
Nov 23 00:41:33 Server kernel: Starting XFS recovery on filesystem: ide0(3,9) (dev: 3/9)
Nov 23 00:41:33 Server kernel: XFS: xlog_recover_process_data: bad clientid
Nov 23 00:41:33 Server kernel: XFS: log mount/recovery failed
Nov 23 00:41:33 Server kernel: XFS: log mount failed
or even
Nov 22 23:39:04 Server kernel: Starting XFS recovery on filesystem: ide0(3,9) (dev: 3/9)
Nov 22 23:39:04 Server kernel: XFS: log mount/recovery failed
Nov 22 23:39:04 Server kernel: XFS: log mount failed
Nov 22 23:39:04 Server kernel: XFS mounting filesystem ide0(3,10)
Nov 22 23:39:04 Server kernel: XFS mounting filesystem ide0(3,9)
Nov 22 23:39:04 Server kernel: XFS: nil uuid in log - IRIX style log
Nov 22 23:39:04 Server kernel: XFS: failed to locate log tail
Nov 22 23:39:04 Server kernel: XFS: log mount/recovery failed
Nov 22 23:39:04 Server kernel: XFS: log mount failed
Nov 22 23:39:04 Server kernel: XFS mounting filesystem ide0(3,9)
Nov 22 23:39:04 Server kernel: XFS: nil uuid in log - IRIX style log
Nov 22 23:39:04 Server kernel: XFS: failed to locate log tail
Nov 22 23:39:04 Server kernel: XFS: log mount/recovery failed
Nov 22 23:39:05 Server kernel: XFS: log mount failed
Nov 22 23:39:05 Server kernel: XFS mounting filesystem ide0(3,9)
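For readers matching these messages to the partition listing further down: the kernel names the filesystems by device number, e.g. ide0(3,9). The following is only a minimal sketch of that mapping, assuming the usual Linux 2.4 numbering in which block major 3 is the first IDE interface and each drive owns 64 minors (0-63 for hda, 64-127 for hdb). The helper names are hypothetical and not part of any XFS or kernel tool.

```python
import re

# Assumption: Linux 2.4 block-device numbering for the first IDE interface.
IDE0_MAJOR = 3
MINORS_PER_DRIVE = 64  # minors 0-63 -> hda, 64-127 -> hdb


def ide0_device_name(major, minor):
    """Hypothetical helper: map a (major, minor) pair on ide0 to a /dev name."""
    if major != IDE0_MAJOR or not 0 <= minor < 2 * MINORS_PER_DRIVE:
        raise ValueError("not a device on the first IDE interface")
    drive = "hda" if minor < MINORS_PER_DRIVE else "hdb"
    partition = minor % MINORS_PER_DRIVE  # 0 means the whole drive
    return "/dev/" + drive + (str(partition) if partition else "")


def devices_in_log(text):
    """Collect the distinct ide0(major,minor) pairs mentioned in log text."""
    pairs = re.findall(r"ide0\((\d+),(\d+)\)", text)
    return sorted({(int(major), int(minor)) for major, minor in pairs})


if __name__ == "__main__":
    sample = (
        "XFS mounting filesystem ide0(3,10)\n"
        "XFS mounting filesystem ide0(3,9)\n"
    )
    for major, minor in devices_in_log(sample):
        print((major, minor), ide0_device_name(major, minor))
```

Under that assumption, ide0(3,9) is /dev/hda9 and ide0(3,10) is /dev/hda10, so the failing log recovery concerns hda9.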
When I booted into single-user mode, the same problems occurred. xfs_repair
failed and told me to zap the log. I zapped the log by invoking
"xfs_repair -L", and after that I was able to mount the filesystem
successfully.
I then entered "exit" to boot into multi-user mode, and the system has been
absolutely stable ever since. So I seriously doubt it's a hardware problem
(onboard IDE controller broken, cabling damaged, etc.).
I seem to remember a problem that had to do with flushing the cache on IDE
drives, but I can't find anything about that on Google. But I found the
following post to this list that EXACTLY describes my problem -- obviously the
poster didn't receive a reply:
http://oss.sgi.com/projects/xfs/mail_archive/200202/msg00446.html
Note that I don't use RAID, just a plain XFS filesystem on a physical
partition.
One thing I'd like to mention (although I don't think it's important) is
that the hard drive was partitioned using OnTrack's drive manager software,
because the BIOS of the old motherboard didn't support hard drives larger
than 2.5 GB. I'm pretty sure this isn't the root of my problems, because the
partition table is detected properly:
============================ 8x =================================
Nov 22 23:59:23 Server kernel: hda: 16514064 sectors (8455 MB) w/467KiB Cache, CHS=1027/255/63, (U)DMA
Nov 22 23:59:23 Server kernel: Partition check:
Nov 22 23:59:23 Server kernel: hda: [DM6:DDO] [remap +63] [1027/255/63] hda1 hda2 < hda5 hda6 hda7 hda8 hda9 hda10 hda11 >

cfdisk 2.11n

Disk Drive: /dev/hda
Size: 8455168512 bytes
Heads: 255   Sectors per Track: 63   Cylinders: 1027

Name    Flags   Part Type   FS Type      [Label]      Size (MB)
-----------------------------------------------------------------------------
hda1    Boot    Primary     FAT12        [MS-DOS ]         8.23
hda5            Logical     Linux ext3   [root]           32.91
hda6            Logical     Linux XFS                   1579.26
hda7            Logical     Linux XFS                    526.42
hda8            Logical     Linux XFS                   2097.45
hda9            Logical     Linux XFS                   3676.71
hda10           Logical     Linux XFS                    477.07
hda11           Logical     Linux ext2                    49.36

Server:/usr/src# fdisk /dev/hda
The number of cylinders for this disk is set to 1027.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): p

Disk /dev/hda: 255 heads, 63 sectors, 1027 cylinders
Units = cylinders of 16065 * 512 bytes

    Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *          1         1      8001    1  FAT12
/dev/hda2              2      1027   8241345    f  Win95 Ext'd (LBA)
/dev/hda5              2         5     32098+  83  Linux
/dev/hda6              6       197   1542208+  83  Linux
/dev/hda7            198       261    514048+  83  Linux
/dev/hda8            262       516   2048256   83  Linux
/dev/hda9            517       963   3590496   83  Linux
/dev/hda10           964      1021    465853+  83  Linux
/dev/hda11          1022      1027     48163+  83  Linux
============================ 8x =================================
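As a rough cross-check of the claim that the partition table is detected properly, the fdisk "Blocks" column can be recomputed from the start/end cylinders and the reported geometry. This is only a sketch under two assumptions that the listing itself does not state: every partition except the extended container hda2 starts one track (63 sectors) after its cylinder boundary, and a trailing "+" marks an odd sector count. The function names are made up for illustration.

```python
SECTOR_BYTES = 512
SECTORS_PER_CYL = 255 * 63  # heads * sectors per track = 16065, as fdisk reports
TRACK_SECTORS = 63          # one track

# (name, start cyl, end cyl, blocks as printed, "+" printed, starts one track in)
# Values copied from the fdisk listing; the last column is an assumption.
PARTITIONS = [
    ("hda1", 1, 1, 8001, False, True),
    ("hda2", 2, 1027, 8241345, False, False),  # extended container
    ("hda5", 2, 5, 32098, True, True),
    ("hda6", 6, 197, 1542208, True, True),
    ("hda7", 198, 261, 514048, True, True),
    ("hda8", 262, 516, 2048256, False, True),
    ("hda9", 517, 963, 3590496, False, True),
    ("hda10", 964, 1021, 465853, True, True),
    ("hda11", 1022, 1027, 48163, True, True),
]


def expected_blocks(start_cyl, end_cyl, skip_first_track):
    """Return (whole 1 KiB blocks, odd-sector flag) for a cylinder range."""
    sectors = (end_cyl - start_cyl + 1) * SECTORS_PER_CYL
    if skip_first_track:
        sectors -= TRACK_SECTORS
    blocks, odd = divmod(sectors, 2)  # two 512-byte sectors per block
    return blocks, bool(odd)


def mismatches():
    """Names of partitions whose printed size disagrees with the geometry."""
    return [
        name
        for name, start, end, blocks, plus, skip in PARTITIONS
        if expected_blocks(start, end, skip) != (blocks, plus)
    ]


if __name__ == "__main__":
    print("mismatching partitions:", mismatches())
    # Kernel-reported sector count, with and without the 63-sector offset.
    print(16514064 * SECTOR_BYTES)
    print((16514064 - TRACK_SECTORS) * SECTOR_BYTES)
```

With the numbers above, no partition mismatches, and the kernel's 16514064 sectors minus 63 sectors comes to exactly the 8455168512 bytes that cfdisk reports. That is consistent with the "[remap +63]" in the kernel message, although it says nothing about the XFS log problem itself.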
Any idea where to look?
Thanks,
Ralf