xfs

Re: XFS Mount Recovery Failed on Root File System After Power Outage

To: "Dave Chinner" <david@xxxxxxxxxxxxx>
Subject: Re: XFS Mount Recovery Failed on Root File System After Power Outage
From: "Chin Gim Leong" <CHIN_Gim_Leong@xxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 3 Sep 2012 00:13:05 +0800
Cc: xfs@xxxxxxxxxxx
Importance: Normal
Sender: "Chin Gim Leong" <CHIN_Gim_Leong@xxxxxxxxxxxxxxxxxxxxxxxxx>
User-agent: SquirrelMail/1.4.19

Hi Dave

I did the "xfs_repair -L" on the root file system; xfs_repair is version 3.1.8.  The repair was successful.

> You misunderstood. I was asking for the messages when it
> successfully mounts and the contents of /proc/mounts is when it is
> mounted to see if barriers were disabled or not supported on your
> hardware.
>
The notebook is Acer Aspire 6530G, AMD Turion X2 RM-74, chipset is AMD M780G, southbridge is AMD SB 700.  The hard drive is Western Digital Scorpio Black SATA 320 GB, WDC WD3200BEKT-00F3T0, /dev/sdb

chingl@rat:~> cat /etc/fstab
/dev/disk/by-path/pci-0000:00:11.0-scsi-1:0:0:0-part1 swap                 swap       defaults              0 0
/dev/disk/by-path/pci-0000:00:11.0-scsi-1:0:0:0-part2 /                    xfs        defaults,logbufs=8,logbsize=256k              1 1
/dev/disk/by-path/pci-0000:00:11.0-scsi-1:0:0:0-part3 /home                xfs        defaults,logbufs=8,logbsize=256k              1 2
/dev/disk/by-path/pci-0000:00:11.0-scsi-0:0:0:0-part2 /windows/C           ntfs-3g    users,gid=users,fmask=133,dmask=022,locale=en_US.UTF-8 0 0
/dev/disk/by-path/pci-0000:00:11.0-scsi-0:0:0:0-part3 /windows/D           ntfs-3g    users,gid=users,fmask=133,dmask=022,locale=en_US.UTF-8 0 0
proc                 /proc                proc       defaults              0 0
sysfs                /sys                 sysfs      noauto                0 0
debugfs              /sys/kernel/debug    debugfs    noauto                0 0
usbfs                /proc/bus/usb        usbfs      noauto                0 0
devpts               /dev/pts             devpts     mode=0620,gid=5       0 0
chingl@rat:~>
chingl@rat:~> cat /proc/mounts
rootfs / rootfs rw 0 0
devtmpfs /dev devtmpfs rw,relatime,size=1761184k,nr_inodes=440296,mode=755 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
/dev/sdb2 / xfs rw,relatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
/dev/sdb3 /home xfs rw,relatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
securityfs /sys/kernel/security securityfs rw,relatime 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
none /var/lib/ntp/proc proc ro,nosuid,nodev,relatime 0 0
gvfs-fuse-daemon /home/chingl/.gvfs fuse.gvfs-fuse-daemon rw,nosuid,nodev,relatime,user_id=1000,group_id=100 0 0
chingl@rat:~>

From /var/log/boot.msg:

  <6>[    1.084085] ata3: SATA link down (SStatus 0 SControl 300)
  <6>[    1.084172] ata4: SATA link down (SStatus 0 SControl 300)
  <3>[    1.256058] ata2: softreset failed (device not ready)
  <4>[    1.256067] ata2: applying SB600 PMP SRST workaround and retrying
  <3>[    1.256084] ata1: softreset failed (device not ready)
  <4>[    1.256095] ata1: applying SB600 PMP SRST workaround and retrying
  <6>[    1.428069] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  <6>[    1.428097] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  <6>[    1.429454] ata1.00: ATA-8: Hitachi HTS543232L9A300, FB4OC40C, max UDMA/133
  <6>[    1.429458] ata1.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
  <6>[    1.430923] ata1.00: configured for UDMA/133
  <5>[    1.431164] scsi 0:0:0:0: Direct-Access     ATA      Hitachi HTS54323 FB4O PQ: 0 ANSI: 5
  <5>[    1.431560] sd 0:0:0:0: [sda] 625142448 512-byte logical blocks: (320 GB/298 GiB)
  <5>[    1.431672] sd 0:0:0:0: [sda] Write Protect is off
  <7>[    1.431675] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
  <5>[    1.431915] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  <6>[    1.441064] ata2.00: ATA-8: WDC WD3200BEKT-00F3T0, 11.01A11, max UDMA/133
  <6>[    1.441067] ata2.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
  <6>[    1.442978] ata2.00: configured for UDMA/133
  <5>[    1.443207] scsi 1:0:0:0: Direct-Access     ATA      WDC WD3200BEKT-0 11.0 PQ: 0 ANSI: 5
  <5>[    1.443387] sd 1:0:0:0: [sdb] 625142448 512-byte logical blocks: (320 GB/298 GiB)
  <5>[    1.443707] sd 1:0:0:0: [sdb] Write Protect is off
  <7>[    1.443710] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
  <5>[    1.443756] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  <6>[    1.459438]  sda: sda1 sda2 sda3 sda4
  <5>[    1.460197] sd 0:0:0:0: [sda] Attached SCSI disk
  <6>[    1.476394] Synaptics Touchpad, model: 1, fw: 6.3, id: 0x12a0b1, caps: 0xa04711/0xa04000/0x0
  <6>[    1.481898]  sdb: sdb1 sdb2 sdb3
  <5>[    1.482254] sd 1:0:0:0: [sdb] Attached SCSI disk



  <6>[    3.201111] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
  <6>[    3.202279] SGI XFS Quota Management subsystem
  <5>[    3.230174] XFS mounting filesystem sdb2
  <7>[    3.351323] Ending clean XFS mount for filesystem: sdb2

  <5>[   11.865393] XFS mounting filesystem sdb3
  <7>[   12.008707] Ending clean XFS mount for filesystem: sdb3

I do not see anything that says barrier is not supported.  In fact, I have never personally seen a system with a message that barrier is not supported.
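
The manual check above (reading /proc/mounts and the boot log for barrier hints) can be sketched in Python. This is only an illustration under assumptions: the function names are hypothetical, it assumes that a disabled barrier shows up as a "nobarrier" option on the XFS line of /proc/mounts, and it treats any kernel log line containing the word "barrier" as worth reading rather than matching an exact kernel message.

```python
def xfs_barrier_report(mounts_text):
    """Map each XFS mount point in /proc/mounts-style text to a barrier status."""
    report = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "xfs":
            continue
        options = fields[3].split(",")
        if "nobarrier" in options:
            report[fields[1]] = "barriers disabled (nobarrier)"
        else:
            report[fields[1]] = "no nobarrier option listed"
    return report


def barrier_log_lines(log_text):
    """Return every log line that mentions barriers (case-insensitive)."""
    return [ln.strip() for ln in log_text.splitlines() if "barrier" in ln.lower()]


# Sample input copied from the output quoted in this message.
SAMPLE_MOUNTS = """\
rootfs / rootfs rw 0 0
/dev/sdb2 / xfs rw,relatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
/dev/sdb3 /home xfs rw,relatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
"""

SAMPLE_LOG = """\
<5>[    3.230174] XFS mounting filesystem sdb2
<7>[    3.351323] Ending clean XFS mount for filesystem: sdb2
"""

if __name__ == "__main__":
    for mount_point, status in xfs_barrier_report(SAMPLE_MOUNTS).items():
        print(mount_point, "->", status)
    print("barrier messages:", barrier_log_lines(SAMPLE_LOG))
```

On a live system the two inputs would come from reading /proc/mounts and /var/log/boot.msg (or dmesg output) instead of the sample strings. An empty list from the second function only means no such line was logged; it does not prove that the drive honours cache flushes.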


> Only by looking at them can you know. Regardless of what filesystem you
> are using, recovery of files and directories from lost+found is the
> same process. e.g. do an rpm check to see if all the installed
> packages are intact. that will narrow down where all your binaries
> came from. use of strings can also tell you what the binary is. e.g:

> Define "really there" when important metadata (i.e. the log) has been
> corrupted and is not available any more. Indeed, if things like btree
> splits or merges occurred in the log, and they are
> partially written to disk, it's entirely possible that you could lose
> directory references to inodes that haven't been modified for some time....
>
> Remember, like all fsck programs, xfs_repair is a best effort
> attempt at correcting the problems found - there are no guarantees
> given about what it can and can't recover when it runs...
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>

Looking at the messages from xfs_repair and inspecting /lost+found, the files in it are from /tmp and
/etc/NetworkManager/system-connections/Auto eth0 (a session file). I think one of the index.db files in /var/cache/man was also affected.


I did a "rpm -V -a", thanks for the advice.  I manually inspected every line of the output, paying attention to the missing files and to file size or checksum mismatches.
There were 5 missing files from two packages, as well as a number of installed files of zero size.  I have re-installed all of the affected packages.
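
The manual inspection of the "rpm -V -a" output described above could be partly automated. Here is a minimal sketch, assuming the usual verify output layout: a string of attribute flags ("S" in the first position for a size difference, "5" in the third for a digest difference) or the word "missing", then an optional one-letter file-type marker, then the path. The function name and the sample paths are made up for illustration.

```python
def parse_rpm_verify(output):
    """Sort `rpm -V` output lines into missing, size-mismatch and digest-mismatch paths."""
    findings = {"missing": [], "size": [], "digest": []}
    for line in output.splitlines():
        line = line.strip()
        slash = line.find("/")          # the path starts at the first slash
        if not line or slash < 0:
            continue
        status = line[:slash].split()   # flags (or "missing") plus optional marker
        path = line[slash:]             # kept whole, so paths with spaces survive
        if not status:
            continue
        if status[0] == "missing":
            findings["missing"].append(path)
            continue
        flags = status[0]
        if flags[0] == "S":
            findings["size"].append(path)
        if len(flags) > 2 and flags[2] == "5":
            findings["digest"].append(path)
    return findings


# Hypothetical sample output; none of these paths come from the real system.
SAMPLE_OUTPUT = """\
missing     /usr/share/doc/example/README
S.5....T.  c /etc/example.conf
.......T.    /usr/bin/example
..5....T.    /usr/lib/example.so
"""

if __name__ == "__main__":
    for kind, paths in parse_rpm_verify(SAMPLE_OUTPUT).items():
        print(kind, paths)
```

Each reported path can then be traced to its package with "rpm -qf" so that the package can be re-installed. Changed configuration files (marker "c") are often legitimate local edits rather than damage.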


May I know whether it is possible that, due to the loss of the journal log, some package-installed files go missing or show zero file size?

An aside: the reason I use XFS is that when I was a student, I did my work in a school centre with a cluster of 12 SGI Indigo2 R10000 and 3 SGI O2 R5000.  Due to buggy IRIX 6.2 and earlier releases of IRIX 6.5 (6.5.X), the machines had kernel panics; faulty power supplies (the maintenance was discontinued) also caused stoppages.  On restart, the recovery was always instantaneous, and no XFS file system repair was ever done.
When I built my first computer, it was only natural that I chose XFS; that was on SUSE Linux 10.0.  On various versions of SUSE, I have had freezes and power outages, but I have never had to repair a file system.  The only time I ever had to run xfs_repair was when the Areca RAID spat out the WD desktop drives and I had to rescue the RAID, so I was very unprepared for this latest incident.

Anyway, a big thanks to all those who contributed towards XFS over the years in IRIX and Linux.  I could not imagine back in the 90s that in the future I would have a piece of SGI IRIX technology in my own personal computers.

GL

> It's not charging Li-ion batteries that shortens their life - it's high temperatures that shorten it. IOWs, the reason for removing the battery when on AC is to prevent heat soak and the battery
> sustaining elevated temperatures over long periods of time.
>
> Even so, properly designed laptops don't suffer from heat soak or charge cycle related battery life problems, and environmental
> conditions play more of a part in determining battery life than
> usage/charge patterns...
>
>> > The files in lost+found are numbered by their inode number. You need to look at the contents of them to determine where they came from.
>> Inode numbers are not informative, right?
>
> Sure, but when you've had a directory corruption and the names have been lost, what do you name the files and directories that are
> found? All we can do is name them something unique, hence the use of the inode number.
>
>> Text files are readable, but binaries......, I will have no clue, and if I
>> delete them, who knows if I am deleting some thing really important?
>
> Only by looking at them can you know. Regardless of what filesystem you are using, recovery of files and directories from lost+found is the same process. e.g. do an rpm check to see if all the installed packages are intact. that will narrow down where all your binaries came from. use of strings can also tell you what the binary is. e.g:
>
> $ strings /sbin/xfs_repair |grep xfs_repair
> re-running xfs_repair. If you are unable to mount the filesystem, then use
> Please run a more recent version of xfs_repair.
> $
>
>> Intuitively, the only files in root file system (by the way, root also contains /boot) that are open for writing are those in logs and /tmp and
>> /var/tmp, I hope that is the case and I can safely discard those in lost+found.
>> I also hope that the inode link clean-ups done by xfs_repair do not actually remove any files that are really there.
>
> Define "really there" when important metadata (i.e. the log) has been corrupted and is not available any more. Indeed, if things like btree splits or merges occurred in the log, and they are
> partially written to disk, it's entirely possible that you could lose directory references to inodes that haven't been modified for some time....
>
> Remember, like all fsck programs, xfs_repair is a best effort
> attempt at correcting the problems found - there are no guarantees given about what it can and can't recover when it runs...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
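
The "strings | grep" check quoted above can be approximated in Python for files in lost+found when the strings utility is not at hand. This is a rough sketch, not a replacement for the real tool: it only looks for runs of printable ASCII of at least four characters, and the function names are hypothetical.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII (space through tilde) at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def matching_strings(path, needle, min_len=4):
    """Read a file as bytes and return its printable strings that contain needle."""
    with open(path, "rb") as fh:
        data = fh.read()
    return [s for s in printable_strings(data, min_len) if needle in s]


if __name__ == "__main__":
    # In-memory stand-in for a binary; a real run would pass a lost+found path
    # to matching_strings() instead.
    blob = (b"\x7fELF\x02\x01\x00\x00"
            b"Please run a more recent version of xfs_repair.\x00\x01\x02abc\x00")
    for text in printable_strings(blob):
        print(text)
```

A file whose strings mention a program name, a package name or a path gives a hint about where it came from, which can then be cross-checked against the rpm verification results.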


Chin Gim Leong
Performance Computing Engineer
Performance Computing LLP

http://performance-computing.net
CHIN_Gim_Leong@xxxxxxxxxxxxxxxxxxxxxxxxx
Tel: +65-97469482


Hi Dave

I did the xfs_repair -L on the root file system; xfs_repair is version 3.1.8.  The repair was successful.

> You misunderstood. I was asking for the messages when it
> successfully mounts and the contents of /proc/mounts is when it is
> mounted to see if barriers were disabled or not supported on your
> hardware.
>

The notebook is Acer Aspire 6530G, AMD Turion X2 RM-74, chipset is AMD M780G, southbridge is AMD SB 700.  The hard drive is Western Digital Scorpio Black SATA 320 GB, WDC WD3200BEKT-00F3T0, /dev/sdb

chingl@rat:~> cat /etc/fstab
/dev/disk/by-path/pci-0000:00:11.0-scsi-1:0:0:0-part1 swap                 swap       defaults              0 0
/dev/disk/by-path/pci-0000:00:11.0-scsi-1:0:0:0-part2 /                    xfs        defaults,logbufs=8,logbsize=256k              1 1
/dev/disk/by-path/pci-0000:00:11.0-scsi-1:0:0:0-part3 /home                xfs        defaults,logbufs=8,logbsize=256k              1 2
/dev/disk/by-path/pci-0000:00:11.0-scsi-0:0:0:0-part2 /windows/C           ntfs-3g    users,gid=users,fmask=133,dmask=022,locale=en_US.UTF-8 0 0
/dev/disk/by-path/pci-0000:00:11.0-scsi-0:0:0:0-part3 /windows/D           ntfs-3g    users,gid=users,fmask=133,dmask=022,locale=en_US.UTF-8 0 0
proc                 /proc                proc       defaults              0 0
sysfs                /sys                 sysfs      noauto                0 0
debugfs              /sys/kernel/debug    debugfs    noauto                0 0
usbfs                /proc/bus/usb        usbfs      noauto                0 0
devpts               /dev/pts             devpts     mode=0620,gid=5       0 0
chingl@rat:~>

chingl@rat:~> cat /proc/mounts
rootfs / rootfs rw 0 0
devtmpfs /dev devtmpfs rw,relatime,size=1761184k,nr_inodes=440296,mode=755 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
/dev/sdb2 / xfs rw,relatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
/dev/sdb3 /home xfs rw,relatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
securityfs /sys/kernel/security securityfs rw,relatime 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
none /var/lib/ntp/proc proc ro,nosuid,nodev,relatime 0 0
gvfs-fuse-daemon /home/chingl/.gvfs fuse.gvfs-fuse-daemon rw,nosuid,nodev,relatime,user_id=1000,group_id=100 0 0
chingl@rat:~>
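
To make the barrier question easier to answer from the listing above, here is a minimal sketch that reads /proc/mounts-style text and flags XFS mounts carrying the nobarrier option. It assumes that, as on kernels of this era, XFS lists nobarrier among the mount options only when barriers were turned off at mount time; it cannot tell whether the kernel disabled barriers later at runtime, which would only show up in the kernel messages. The function name and the SAMPLE text (copied from the output above) are just for illustration.

```python
def xfs_barrier_report(mounts_text):
    """Map each XFS mount point to True if 'nobarrier' is absent from its options."""
    report = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "xfs":
            continue
        options = fields[3].split(",")
        report[fields[1]] = "nobarrier" not in options
    return report


# Lines taken from the /proc/mounts output above.
SAMPLE = """\
/dev/sdb2 / xfs rw,relatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
proc /proc proc rw,relatime 0 0
/dev/sdb3 /home xfs rw,relatime,attr2,logbufs=8,logbsize=256k,noquota 0 0
"""

if __name__ == "__main__":
    for mount_point, not_disabled in xfs_barrier_report(SAMPLE).items():
        state = "nobarrier not set" if not_disabled else "nobarrier set"
        print(mount_point, state)
```

On a live system the same function could be fed the contents of /proc/mounts instead of SAMPLE.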

  <6>[    1.084085] ata3: SATA link down (SStatus 0 SControl 300)
  <6>[    1.084172] ata4: SATA link down (SStatus 0 SControl 300)
  <3>[    1.256058] ata2: softreset failed (device not ready)
  <4>[    1.256067] ata2: applying SB600 PMP SRST workaround and retrying
  <3>[    1.256084] ata1: softreset failed (device not ready)
  <4>[    1.256095] ata1: applying SB600 PMP SRST workaround and retrying
  <6>[    1.428069] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  <6>[    1.428097] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  <6>[    1.429454] ata1.00: ATA-8: Hitachi HTS543232L9A300, FB4OC40C, max UDMA/133
  <6>[    1.429458] ata1.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
  <6>[    1.430923] ata1.00: configured for UDMA/133
  <5>[    1.431164] scsi 0:0:0:0: Direct-Access     ATA      Hitachi HTS54323 FB4O PQ: 0 ANSI: 5
  <5>[    1.431560] sd 0:0:0:0: [sda] 625142448 512-byte logical blocks: (320 GB/298 GiB)
  <5>[    1.431672] sd 0:0:0:0: [sda] Write Protect is off
  <7>[    1.431675] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
  <5>[    1.431915] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  <6>[    1.441064] ata2.00: ATA-8: WDC WD3200BEKT-00F3T0, 11.01A11, max UDMA/133
  <6>[    1.441067] ata2.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
  <6>[    1.442978] ata2.00: configured for UDMA/133
  <5>[    1.443207] scsi 1:0:0:0: Direct-Access     ATA      WDC WD3200BEKT-0 11.0 PQ: 0 ANSI: 5
  <5>[    1.443387] sd 1:0:0:0: [sdb] 625142448 512-byte logical blocks: (320 GB/298 GiB)
  <5>[    1.443707] sd 1:0:0:0: [sdb] Write Protect is off
  <7>[    1.443710] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
  <5>[    1.443756] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  <6>[    1.459438]  sda: sda1 sda2 sda3 sda4
  <5>[    1.460197] sd 0:0:0:0: [sda] Attached SCSI disk
  <6>[    1.476394] Synaptics Touchpad, model: 1, fw: 6.3, id: 0x12a0b1, caps: 0xa04711/0xa04000/0x0
  <6>[    1.481898]  sdb: sdb1 sdb2 sdb3
  <5>[    1.482254] sd 1:0:0:0: [sdb] Attached SCSI disk
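
The lines that matter for the barrier discussion are the "Write cache" ones: both disks report a volatile write cache and no FUA support, so the kernel has to rely on cache flush commands to get log writes onto stable media. As a rough sketch (the regular expression and function name are my own, and it only understands the sd message format shown above), the relevant facts can be pulled out of a dmesg dump like this:

```python
import re

# Matches the sd driver line, e.g.
# "sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA"
CACHE_RE = re.compile(
    r"\[(?P<dev>sd[a-z]+)\] Write cache: (?P<wc>enabled|disabled), "
    r"read cache: (?P<rc>enabled|disabled), (?P<fua>.*)$"
)


def write_cache_summary(dmesg_text):
    """Return {device: {'write_cache': bool, 'fua': bool}} for each matching line."""
    summary = {}
    for line in dmesg_text.splitlines():
        match = CACHE_RE.search(line)
        if match:
            summary[match.group("dev")] = {
                "write_cache": match.group("wc") == "enabled",
                # The driver prints "supports DPO and FUA" when FUA is available.
                "fua": match.group("fua").startswith("supports"),
            }
    return summary


# Two lines copied from the boot messages above.
SAMPLE = """\
<5>[    1.431915] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
<5>[    1.443756] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
"""

if __name__ == "__main__":
    for dev, info in write_cache_summary(SAMPLE).items():
        print(dev, info)
```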


>> >
>> >> I would like to know the cause of this log recovery failure and if there
>> >
>> > You've got an old, unsupported kernel that was going through
>> > significant changes to the log code at the time, so it may not be
>> > possible to work out what the problem was. It seems likely that the
>> > power loss caused the disk not to write everything it should have to
>> > the log - laptops are not supposed to just lose power because
>> > they have a built in UPS (i.e. battery)....
>>
>> My notebook is connected to the mains, and there is no battery. The Acer
>> user manual advises removing the battery when one is connected to the
>> mains, since repeated charging of the battery will shorten its life span.
>
> It's not charging Li-ion batteries that shortens their life - it's
> high temperatures that shorten it. IOWs, the reason for removing the
> battery when on AC is to prevent heat soak and the battery
> sustaining elevated temperatures over long periods of time.
>
> Even so, properly designed laptops don't suffer from heat soak or
> charge cycle related battery life problems, and environmental
> conditions play more of a part in determining battery life than
> usage/charge patterns...
>
>> > The files in lost+found are numbered by their inode number. You need
>> > to look at the contents of them to determine where they came from.
>>
>> Inode numbers are not informative, right?
>
> Sure, but when you've had a directory corruption and the names have
> been lost, what do you name the files and directories that are
> found? All we can do is name them something unique, hence the use of
> the inode number.
>
>> Text files are readable, but binaries... I will have no clue, and if I
>> delete them, who knows if I am deleting something really important?
>
> Only by looking at them can you know. Regardless of what filesystem
> you are using, recovery of files and directories from lost+found is
> the same process. e.g. do an rpm check to see if all the installed
> packages are intact. That will narrow down where all your binaries
> came from. Use of strings can also tell you what the binary is. e.g:
>
> $ strings /sbin/xfs_repair |grep xfs_repair
> re-running xfs_repair. If you are unable to mount the filesystem, then use
> Please run a more recent version of xfs_repair.
> $
>
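
For going through many inode-numbered files at once, the same idea as the strings example can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names are made up, the directory to scan is passed on the command line, and the "ELF" check just looks at the first four bytes. It is not a replacement for the rpm check suggested above.

```python
import re
import sys
from pathlib import Path

# Runs of 4 or more printable ASCII bytes, similar to the default of strings(1).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def printable_strings(data, limit=5):
    """Return up to `limit` printable ASCII runs found in `data`."""
    runs = [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]
    return runs[:limit]


def describe(data, limit=5):
    """Classify raw file contents and return (kind, sample strings)."""
    kind = "ELF binary" if data.startswith(b"\x7fELF") else "other"
    return kind, printable_strings(data, limit)


def triage(directory, limit=5):
    """Yield (file name, kind, sample strings) for each regular file in `directory`."""
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file():
            kind, samples = describe(entry.read_bytes(), limit)
            yield entry.name, kind, samples


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python triage.py <path to lost+found>
    for name, kind, samples in triage(sys.argv[1]):
        print(name, kind, samples)
```

The printed strings are only hints; each file still has to be looked at before deciding whether it can be discarded.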
>> Intuitively, the only files in the root file system (by the way, root also
>> contains /boot) that are open for writing are those in logs, /tmp and
>> /var/tmp. I hope that is the case and I can safely discard those in
>> lost+found.
>>
>> I also hope that the inode link clean-ups done by xfs_repair do not
>> actually remove any files that are really there.
>
> Define "really there" when important metadata (i.e. the log) has
> been corrupted and is not available any more. Indeed, if things
> like btree splits or merges occurred in the log, and they are
> partially written to disk, it's entirely possible that you could
> lose directory references to inodes that haven't been modified for
> some time....
>
> Remember, like all fsck programs, xfs_repair is a best effort
> attempt at correcting the problems found - there are no guarantees
> given about what it can and can't recover when it runs...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>


Chin Gim Leong
Performance Computing Engineer
Performance Computing LLP

http://performance-computing.net
CHIN_Gim_Leong@xxxxxxxxxxxxxxxxxxxxxxxxx
Tel: +65-97469482
