XFS corruption on md raid-0 arrays

To: linux-xfs@xxxxxxxxxxx
Subject: XFS corruption on md raid-0 arrays
From: Mark Watts <m.watts@xxxxxxxxxxxxxxxxx>
Date: Sun, 14 Nov 2004 16:02:49 +0000
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla Thunderbird 0.9 (X11/20041103)
I'm running XFS on Mandrake 10.0 (2.6.3 kernel).

The system is an Athlon XP with 1GB RAM and 2 x 40GB Maxtor ATA/66 drives connected to a SiI680 IDE controller.

I have several Linux software RAID partitions, each being a RAID-0 array with an XFS filesystem. /boot is on a small ext3 partition.

Today, while I was using the system (which had been running just fine for a week), the XFS driver shut down /dev/md1, citing 'corruptions in memory data' or some such error.
dmesg showed a stack trace.


Thinking it might be fixable with a reboot, I rebooted, only to have /dev/md0 get shut down too.

When the system tries to come back up, I get the following on a serial console (the full log is below my signature).

Is this an XFS issue or is something wrong with the md arrays?

And the big question: is it fixable?

Cheers,

Mark.


Linux version 2.6.3-16mdk-i686-up-4GB (qateam@xxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 3.3.2 (Mandrake Linux 10.0 3.3.2-6mdk4
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000d4000 - 00000000000da000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
BIOS-e820: 000000003fff0000 - 000000003fff8000 (ACPI data)
BIOS-e820: 000000003fff8000 - 0000000040000000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
127MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000fb940
hm, page 000fb000 reserved twice.
hm, page 000fc000 reserved twice.
hm, page 000f6000 reserved twice.
hm, page 000f7000 reserved twice.
On node 0 totalpages: 262128
DMA zone: 4096 pages, LIFO batch:1
Normal zone: 225280 pages, LIFO batch:16
HighMem zone: 32752 pages, LIFO batch:7
DMI 2.3 present.
ACPI: RSDP (v000 AMI ) @ 0x000fa9d0
ACPI: RSDT (v001 AMIINT VIA_K7 0x00000010 MSFT 0x00000097) @ 0x3fff0000
ACPI: FADT (v001 AMIINT VIA_K7 0x00000011 MSFT 0x00000097) @ 0x3fff0030
ACPI: MADT (v001 AMIINT VIA_K7 0x00000009 MSFT 0x00000097) @ 0x3fff00c0
ACPI: DSDT (v001 VIA VIA_K7 0x00001000 MSFT 0x0100000d) @ 0x00000000
ACPI: PM-Timer IO Port: 0x808
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 6:8 APIC version 16
ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0])
IOAPIC[0]: Assigned apic_id 2
IOAPIC[0]: apic_id 2, version 3, address 0xfec00000, IRQ 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
ACPI BALANCE SET
Enabling APIC mode: Flat. Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Built 1 zonelists
Kernel command line: BOOT_IMAGE=263i686up4G-16 ro root=900 devfs=mount splash=silent ide=reverse console=ttyS0
bootsplash: silent mode.
ide_setup: ide=reverse : Enabled support for IDE inverse scan order.
Initializing CPU#0
PID hash table entries: 4096 (order 12: 32768 bytes)
Detected 1760.202 MHz processor.
Using pmtmr for high-res timesource
Console: colour VGA+ 80x25
Memory: 1033256k/1048512k available (1955k kernel code, 14348k reserved, 850k data, 292k init, 131008k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 3481.60 BogoMIPS
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
Freeing initrd memory: 478k freed
CPU: CLK_CTL MSR was 6003d22f. Reprogramming to 2003d22f
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: AMD Athlon(tm) XP 2100+ stepping 01
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000080
ESR value after enabling vector: 00000000
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=-1
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 1759.0019 MHz.
..... host bus clock speed is 270.0618 MHz.
NET: Registered protocol family 16
EISA bus registered
PCI: PCI BIOS revision 2.10 entry at 0xfdaf1, last bus=1
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20040211
Looking for DSDT in initrd ... not found!
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: Power Resource [URP1] (off)
ACPI: Power Resource [URP2] (off)
ACPI: Power Resource [FDDP] (off)
ACPI: Power Resource [LPTP] (off)
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 *6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 *10 11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
PnPBIOS: Disabled
testing the IO APIC.......................
.................................... done.
ACPI: No IRQ known for interrupt pin A of device 0000:00:11.1 - using IRQ 255
PCI: Using ACPI for IRQ routing
PCI: if you experience problems, try using option 'pci=noacpi' or even 'acpi=off'
apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16ac)
apm: overridden by ACPI.
ikconfig 0.7 with /proc/config*
highmem bounce pool size: 64 pages
VFS: Disk quotas dquot_6.5.1
devfs: 2004-01-31 Richard Gooch (rgooch@xxxxxxxxxxxxx)
devfs: boot_options: 0x1
Initializing Cryptographic API
PCI: Via IRQ fixup for 0000:00:10.2, from 6 to 5
PCI: Via IRQ fixup for 0000:00:10.0, from 11 to 5
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
pty: 1024 Unix98 ptys configured
Serial: 8250/16550 driver $Revision: 1.90 $ 20 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 32000K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 0000:00:11.1
ACPI: No IRQ known for interrupt pin A of device 0000:00:11.1 - using IRQ 255
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt8235 (rev 00) IDE UDMA133 controller on pci0000:00:11.1
ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:DMA, hdd:pio
hda: MAXTOR 6L020J1, ATA DISK drive
Using anticipatory io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: SONY CD-RW CRX300E, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
SiI680: IDE controller at PCI slot 0000:00:07.0
SiI680: chipset revision 1
SiI680: BASE CLOCK == 133
SiI680: 100% native mode on irq 19
ide2: MMIO-DMA , BIOS settings: hde:pio, hdf:pio
ide3: MMIO-DMA , BIOS settings: hdg:pio, hdh:pio
hde: Maxtor 94098U8, ATA DISK drive
ide2 at 0xf8807f80-0xf8807f87,0xf8807f8a on irq 19
hdg: Maxtor 94098U8, ATA DISK drive
ide3 at 0xf8807fc0-0xf8807fc7,0xf8807fca on irq 19
hda: max request size: 128KiB
hda: 40132503 sectors (20547 MB) w/1819KiB Cache, CHS=39813/16/63, UDMA(133)
/dev/ide/host0/bus0/target0/lun0: p1
hde: max request size: 64KiB
hde: 80041248 sectors (40981 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(66)
/dev/ide/host2/bus0/target0/lun0: p1 p2 < p5 p6 p7 >
hdg: max request size: 64KiB
hdg: 80041248 sectors (40981 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(66)
/dev/ide/host2/bus1/target0/lun0: p1 p2 < p5 p6 p7 >
mice: PS/2 mouse device common for all mice
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
input: AT Translated Set 2 keyboard on isa0060/serio0
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
EISA: Probing bus 0 at eisa0
NET: Registered protocol family 2
IP: routing cache hash table of 8192 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
NET: Registered protocol family 1
BIOS EDD facility v0.13 2004-Mar-09, 3 devices found
Please report your BIOS at http://linux.dell.com/edd/results.html
ACPI: (supports S0 S1 S4 S5)
md: Autodetecting RAID arrays.
md: autorun ...
md: considering hdg7 ...
md: adding hdg7 ...
md: hdg6 has different UUID to hdg7
md: adding hde7 ...
md: hde6 has different UUID to hdg7
md: created mdX
md: bind<hde7>
md: bind<hdg7>
md: running: <hdg7><hde7>
md: personality 2 is not loaded!
md :do_md_run() returned -22
md: md1 stopped.
md: unbind<hdg7>
md: export_rdev(hdg7)
md: unbind<hde7>
md: export_rdev(hde7)
md: considering hdg6 ...
md: adding hdg6 ...
md: adding hde6 ...
md: created md0
md: bind<hde6>
md: bind<hdg6>
md: running: <hdg6><hde6>
md: personality 2 is not loaded!
md :do_md_run() returned -22
md: md0 stopped.
md: unbind<hdg6>
md: export_rdev(hdg6)
md: unbind<hde6>
md: export_rdev(hde6)
md: ... autorun DONE.
RAMDISK: Compressed image found at block 0
VFS: Mounted root (ext2 filesystem).
Mounted devfs on /dev
Red Hat nash verSCSI subsystem initialized
sion 3.5.18-mdk starting
Loading scsi_mod.ko module
Loading aic7xxx.ko module
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
<Adaptec 3960D Ultra160 SCSI adapter>
aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs


scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36
        <Adaptec 3960D Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs

Loading sd_mod.kmd: raid0 personality registered as nr 2
o module
LoadinSGI XFS with ACLs, large block numbers, no debug enabled
g raid0.ko modulSGI XFS Quota Management subsystem
e
Loading xfs.kmd: Autodetecting RAID arrays.
o module
Mountimd: autorun ...
md: considering hde6 ...
ng /proc filesysmd: adding hde6 ...
tem
Creating demd: adding hdg6 ...
vice files
Mounmd: hde7 has different UUID to hde6
md: hdg7 has different UUID to hde6
md: created md0
md: bind<hdg6>
md: bind<hde6>
md: running: <hde6><hdg6>
ting sysfs
Actimd0: setting max_sectors to 128, segment boundary to 32767
raid0: looking at hde6
raid0: comparing hde6(5119488) with hde6(5119488)
raid0: END
raid0: ==> UNIQUE
raid0: 1 zones
raid0: looking at hdg6
raid0: comparing hdg6(5119488) with hde6(5119488)
raid0: EQUAL
raid0: FINAL 1 zones
raid0: done.
raid0 : md_size is 10238976 blocks.
raid0 : conf->hash_spacing is 10238976 blocks.
raid0 : nb_zone is 1.
raid0 : Allocating 4 bytes for hash.
vating md devicemd: considering hde7 ...
s
mknod: failedmd: adding hde7 ...
to create /dev/md: adding hdg7 ...
md: created md1
md: bind<hdg7>
md: bind<hde7>
md: running: <hde7><hdg7>
md0: 17
mknod: md1: setting max_sectors to 128, segment boundary to 32767
raid0: looking at hde7
raid0: comparing hde7(33979584) with hde7(33979584)
raid0: END
raid0: ==> UNIQUE
raid0: 1 zones
raid0: looking at hdg7
raid0: comparing hdg7(33979584) with hde7(33979584)
raid0: EQUAL
raid0: FINAL 1 zones
raid0: done.
raid0 : md_size is 67959168 blocks.
raid0 : conf->hash_spacing is 67959168 blocks.
raid0 : nb_zone is 1.
raid0 : Allocating 4 bytes for hash.
md: ... autorun DONE.
failed to createXFS mounting filesystem md0
/dev/md/0: 17
Creating root device
Mounting root filesystem
Starting XFS recovery on filesystem: md0 (dev: md0)
Filesystem "md0": XFS internal error xlog_valid_rec_header(1) at line 3485 of file fs/xfs/xfs_log_recover.c. Caller 0xf89c
Call Trace:
[<f89bd40b>] 0xf89bd40b
[<f89bd5ec>] 0xf89bd5ec
[<f89bd5ec>] 0xf89bd5ec
[<c02686dd>] i8042_interrupt+0xad/0x140
[<f89bdf8b>] 0xf89bdf8b
[<f89be0b6>] 0xf89be0b6
[<f89be2d5>] 0xf89be2d5
[<f89b5064>] 0xf89b5064
[<f89bfcad>] 0xf89bfcad
[<f89d571c>] 0xf89d571c
[<f89db00d>] 0xf89db00d
[<f89bf080>] 0xf89bf080
[<f89b0ebe>] 0xf89b0ebe
[<f89c7afd>] 0xf89c7afd
[<f89dbce1>] 0xf89dbce1
[<f89dba9f>] 0xf89dba9f
[<c01d38f6>] snprintf+0x26/0x30
[<c018bc37>] disk_name+0xa7/0xc0
[<c015fb10>] sb_set_blocksize+0x20/0x60
[<c015f511>] get_sb_bdev+0x131/0x160
[<c0165663>] real_lookup+0xd3/0x100
[<f89dbc8f>] 0xf89dbc8f
[<f89dba00>] 0xf89dba00
[<c015f789>] do_kern_mount+0x89/0x110
[<c0173a84>] do_add_mount+0x84/0x190
[<c014149b>] __alloc_pages+0x9b/0x360
[<c0173e12>] do_mount+0x182/0x1d0
[<c0141793>] __get_free_pages+0x33/0x40
[<c0173c11>] copy_mount_options+0x81/0x100
[<c01741be>] sys_mount+0x8e/0xd0
[<c010b11d>] sysenter_past_esp+0x52/0x71


XFS: log mount/recovery failed
XFS: log mount failed
mount: error 22 XFS mounting filesystem md0
mounting xfs flags defaults
well, retrying without the option flags
Starting XFS recovery on filesystem: md0 (dev: md0)
Filesystem "md0": XFS internal error xlog_valid_rec_header(1) at line 3485 of file fs/xfs/xfs_log_recover.c. Caller 0xf89c
Call Trace:
[<f89bd40b>] 0xf89bd40b
[<f89bd5ec>] 0xf89bd5ec
[<f89bd5ec>] 0xf89bd5ec
[<c02686dd>] i8042_interrupt+0xad/0x140
[<f89bdf8b>] 0xf89bdf8b
[<f89be0b6>] 0xf89be0b6
[<f89be2d5>] 0xf89be2d5
[<f89b5064>] 0xf89b5064
[<f89bfcad>] 0xf89bfcad
[<f89d571c>] 0xf89d571c
[<f89db00d>] 0xf89db00d
[<f89bf080>] 0xf89bf080
[<f89b0ebe>] 0xf89b0ebe
[<f89c7afd>] 0xf89c7afd
[<f89dbce1>] 0xf89dbce1
[<f89dba9f>] 0xf89dba9f
[<c01d38f6>] snprintf+0x26/0x30
[<c018bc37>] disk_name+0xa7/0xc0
[<c015fb10>] sb_set_blocksize+0x20/0x60
[<c015f511>] get_sb_bdev+0x131/0x160
[<f89dbc8f>] 0xf89dbc8f
[<f89dba00>] 0xf89dba00
[<c015f789>] do_kern_mount+0x89/0x110
[<c0173a84>] do_add_mount+0x84/0x190
[<c014149b>] __alloc_pages+0x9b/0x360
[<c0173e12>] do_mount+0x182/0x1d0
[<c0141793>] __get_free_pages+0x33/0x40
[<c0173c11>] copy_mount_options+0x81/0x100
[<c01741be>] sys_mount+0x8e/0xd0
[<c010b11d>] sysenter_past_esp+0x52/0x71


XFS: log mount/recovery failed
XFS: log mount failed
mount: error 22 XFS mounting filesystem md0
mounting xfs
well, retrying read-only without any flag
Starting XFS recovery on filesystem: md0 (dev: md0)
Filesystem "md0": XFS internal error xlog_valid_rec_header(1) at line 3485 of file fs/xfs/xfs_log_recover.c. Caller 0xf89c
Call Trace:
[<f89bd40b>] 0xf89bd40b
[<f89bd5ec>] 0xf89bd5ec
[<f89bd5ec>] 0xf89bd5ec
[<c02686dd>] i8042_interrupt+0xad/0x140
[<f89bdf8b>] 0xf89bdf8b
[<f89be0b6>] 0xf89be0b6
[<f89be2d5>] 0xf89be2d5
[<f89b5064>] 0xf89b5064
[<f89bfcad>] 0xf89bfcad
[<f89d571c>] 0xf89d571c
[<f89db00d>] 0xf89db00d
[<f89bf080>] 0xf89bf080
[<f89b0ebe>] 0xf89b0ebe
[<f89c7afd>] 0xf89c7afd
[<f89dbce1>] 0xf89dbce1
[<f89dba9f>] 0xf89dba9f
[<c01d38f6>] snprintf+0x26/0x30
[<c018bc37>] disk_name+0xa7/0xc0
[<c015fb10>] sb_set_blocksize+0x20/0x60
[<c015f511>] get_sb_bdev+0x131/0x160
[<f89dbc8f>] 0xf89dbc8f
[<f89dba00>] 0xf89dba00
[<c015f789>] do_kern_mount+0x89/0x110
[<c0173a84>] do_add_mount+0x84/0x190
[<c014149b>] __alloc_pages+0x9b/0x360
[<c0173e12>] do_mount+0x182/0x1d0
[<c0141793>] __get_free_pages+0x33/0x40
[<c0173c11>] copy_mount_options+0x81/0x100
[<c01741be>] sys_mount+0x8e/0xd0
[<c010b11d>] sysenter_past_esp+0x52/0x71


XFS: log mount/recovery failed
XFS: log mount failed
mount: error 22 mounting xfs
pivotroot: pivot_root(/sysroot,/sysroot/initrd) failed: 2
Remounting devfs at correct place if necessary
Mounted devfs on /dev
Freeing unused kernel memory: 292k freed
Kernel panic: No init found.  Try passing init= option to kernel.

