
XFS partition problem

To: linux-xfs@xxxxxxxxxxx
Subject: XFS partition problem
From: Raphael Bauduin <raphael.bauduin@xxxxxxxxxxxxxx>
Date: Thu, 24 Jun 2004 16:20:39 +0200
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla Thunderbird 0.5 (X11/20040208)
Hi,

I have a problem with an XFS partition on a Debian stable system. The system was
running 2.4.20, and I compiled a 2.4.26. As performance took a hit under
2.4.26, I rebooted several times to switch between the two kernels.

After each reboot, I found this in the logs:

Jun 23 19:24:39 dotnet kernel: XFS mounting filesystem sd(8,38)
Jun 23 19:24:39 dotnet kernel: Starting XFS recovery on filesystem: sd(8,38) 
(dev: sd(8,38))
Jun 23 19:24:39 dotnet kernel: Ending XFS recovery on filesystem: sd(8,38) 
(dev: sd(8,38))

Does this mean the recovery was successful? I suppose not, as it came back at
subsequent reboots. Or would it mean the partition is not correctly unmounted
at each reboot?


Then the problem became worse:

Jun 23 19:24:39 dotnet kernel: XFS mounting filesystem sd(8,38)
Jun 23 19:24:39 dotnet kernel: Starting XFS recovery on filesystem: sd(8,38) 
(dev: sd(8,38))
Jun 23 19:24:39 dotnet kernel: Ending XFS recovery on filesystem: sd(8,38) 
(dev: sd(8,38))
Jun 23 19:24:39 dotnet kernel: kjournald starting.  Commit interval 5 seconds
Jun 23 19:24:39 dotnet kernel: EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,33), 
internal journal
Jun 23 19:24:39 dotnet kernel: EXT3-fs: mounted filesystem with ordered data 
mode.
Jun 23 19:24:39 dotnet kernel: XFS mounting filesystem sd(8,35)
Jun 23 19:24:39 dotnet kernel: Ending clean XFS mount for filesystem: sd(8,35)
Jun 23 19:24:48 dotnet kernel: eth0: no IPv6 routers present
Jun 23 19:24:56 dotnet kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at 
line 1583 of file xfs_alloc.c.  Caller 0xc017cd0b
Jun 23 19:24:56 dotnet kernel: dd0d7d9c c01a636e c017c063 c02fd946 00000001 
00000000 c02fd920 0000062f
Jun 23 19:24:56 dotnet kernel:        c017cd0b 00000000 0001b1ce dfa01554 
00000000 df9e1800 00000000 df9d2e8c
Jun 23 19:24:56 dotnet kernel:        dfe3c580 00000000 00000001 0001b1ce 
00000010 00000001 00000001 c017cd0b
Jun 23 19:24:56 dotnet kernel: Call Trace:    [add_entropy_words+62/200] 
[count_semncnt+43/104] [sys_semop+7/28] [sys_semop+7/28] [acpi_leave_sleep_
state+247/596]
Jun 23 19:24:56 dotnet kernel:   [do_con_write+1466/1672] 
[ide_taskfile_ioctl+1057/1204] [flagged_taskfile+264/616] [init_irq+370/772] 
[fput+77/244]
[filp_close+156/168]
Jun 23 19:24:56 dotnet kernel:   [sys_close+91/112] [system_call+51/56]
Jun 23 19:24:56 dotnet kernel: xfs_force_shutdown(sd(8,38),0x8) called from 
line 4049 of file xfs_bmap.c.  Return address = 0xc01d52ea
Jun 23 19:24:56 dotnet kernel: Filesystem "sd(8,38)": Corruption of in-memory 
data detected.  Shutting down filesystem: sd(8,38)
Jun 23 19:24:56 dotnet kernel: Please umount the filesystem, and rectify the 
problem(s)

This happened when booting 2.4.26, but I suspect it could just as well have
happened with 2.4.20. The partition causing the problem is mounted on /usr.
The partition was still mounted, but listing its directory contents returned
nothing.


I then unmounted and remounted the file system. Here's what dmesg reports about
that:

XFS mounting filesystem sd(8,38)
Starting XFS recovery on filesystem: sd(8,38) (dev: 8/38)
XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1577 of file xfs_alloc.c.  
Caller 0xc0182098
dcbf9c0c c0181076 c02b4b80 00000001 00000000 c02b4b5a 00000629 c0182098
      00000001 df5aa104 00000000 00000001 00000000 0000b36f 00000027 00000001
      00000001 00000000 df5aa0d0 dfab0574 00000000 c0182098 df5aa0d0 df76d680
Call Trace: [<c0181076>]  [<c0182098>]  [<c0182098>]  [<c01dae3f>]  [<c01be11c>]  [<c01be1b3>]  [<c01bf2b4>]  [<c01b7d03>]  [<c01c0895>]  
[<c01bfbcb>]  [<c01b5928>]  [<c01c7a11>]  [<c01d9f01>]  [<c01d9d36>]  [<c013ce90>]  [<c013d147>]  [<c014f1fa>]  [<c014f46b>]  [<c014f30b>]  
[<c014f86e>]  [<c0108c83>]
Ending XFS recovery on filesystem: sd(8,38) (dev: 8/38)

As everything seems to be running fine now, I don't dare unmount the partition
before I migrate everything to another server. So I don't know what xfs_check
would report now (it showed problems when I ran it earlier, but I haven't kept
the output).
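For reference, the check/repair sequence I'd run once the partition can safely
be taken offline looks roughly like this (the device name is a placeholder,
since I'd have to substitute the real partition backing /usr):

```shell
# Placeholder device; substitute the actual partition backing /usr
DEV=/dev/sdXN

umount /usr           # xfs_check and xfs_repair need an unmounted filesystem
xfs_check "$DEV"      # read-only consistency check
xfs_repair -n "$DEV"  # no-modify mode: report problems without changing anything
xfs_repair "$DEV"     # repair for real, once the data is safely copied off
mount /usr
```

Note that xfs_repair replays or discards the log, so I'd only run it for real
after the data is backed up.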

At boot, I also get some error messages from the SCSI controller:

SCSI subsystem driver Revision: 1.00
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
       <Adaptec aic7896/97 Ultra2 SCSI adapter>
       aic7896/97: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.8
       <Adaptec aic7896/97 Ultra2 SCSI adapter>
       aic7896/97: Ultra2 Wide Channel B, SCSI Id=7, 32/253 SCBs

 Vendor: QUANTUM   Model: ATLAS_V_18_SCA    Rev: 0200
 Type:   Direct-Access                      ANSI SCSI revision: 03
(scsi0:A:1): 80.000MB/s transfers (40.000MHz, offset 63, 16bit)
 Vendor: QUANTUM   Model: ATLAS_V_18_SCA    Rev: 0200
 Type:   Direct-Access                      ANSI SCSI revision: 03
(scsi0:A:2): 80.000MB/s transfers (40.000MHz, offset 63, 16bit)
 Vendor: QUANTUM   Model: ATLAS_V__9_SCA    Rev: 0230
 Type:   Direct-Access                      ANSI SCSI revision: 03
(scsi0:A:3): 80.000MB/s transfers (40.000MHz, offset 63, 16bit)
 Vendor: VA Linux  Model: Fullon 2x2        Rev: 1.01
 Type:   Processor                          ANSI SCSI revision: 02
scsi0:A:1:0: Tagged Queuing enabled.  Depth 253
scsi0:A:2:0: Tagged Queuing enabled.  Depth 253
scsi0:A:3:0: Tagged Queuing enabled.  Depth 253
PCI: Enabling device 00:0c.0 (0116 -> 0117)
PCI: Enabling device 00:0c.1 (0116 -> 0117)
Attached scsi disk sda at scsi0, channel 0, id 1, lun 0
Attached scsi disk sdb at scsi0, channel 0, id 2, lun 0
Attached scsi disk sdc at scsi0, channel 0, id 3, lun 0
scsi0: PCI error Interrupt at seqaddr = 0x9
scsi0: Signaled a Target Abort
scsi1: PCI error Interrupt at seqaddr = 0x9
scsi1: Signaled a Target Abort
SCSI device sda: 35861388 512-byte hdwr sectors (18361 MB)
Partition check:
sda:
SCSI device sdb: 35861388 512-byte hdwr sectors (18361 MB)
sdb:
SCSI device sdc: 17930694 512-byte hdwr sectors (9181 MB)
sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 >


From what I read on the web, this could be due to a suboptimal cable, but
could that be the cause of the XFS problems? I don't get these SCSI error
messages later, while the system is running (I also read this could mean the
driver has adapted the transfer speed to the quality of the cable).


I hope you can help me identify the cause of the problem. I'm not sure where
exactly the problem lies:
- hardware?
- the kernel upgrade?
- XFS only?

Thanks in advance.

Raph

