help investigating some xfs errors
Eric Sandeen
sandeen at sandeen.net
Tue Jan 12 14:26:40 CST 2010
Alexandru Coman wrote:
> Hello,
>
> I'm having some problems with an XFS filesystem, and I'm wondering if
> anyone can point me in the right direction, it would be greatly appreciated.
>
> I have several XFS filesystems on top of LVM in a RAID-1 (mdadm) created
> on a pair of 1TB SATA drives. Running on Linux (Debian, amd64). One of
> the XFS filesystems is 600GB in size (65% used), storing ~19 mil files
> under 100KB (jpeg), usually under high load (read+write). There are also
> a few other smaller XFS partitions on the same drives. It has been
> running like this for 11 months, until a few days ago when I started to
> get a lot of errors.
>
> On Jan 10, I got a few lines with "ata3: hard resetting link", after
That looks like a hardware problem...
> which the partition could not be accessed, I couldn't umount/mount it.
> All other partitions were fine. I rebooted the server, but that
> filesystem still wouldn't mount (it said "Structure needs cleaning"), I
> then ran xfs_repair on it, which reported that I needed to use the "-L"
> option to destroy the log. I then ran "xfs_repair -L" which appeared to
> fix a lot of errors, and then I was able to mount the filesystem again.
> Everything appeared to be ok at that point.
>
> Jan 10 night: a lot of xfs call traces start to appear in the log
>
> Jan 11: xfs call traces along with
> - xfs_force_shutdown(dm-4,0x8) called from line 1164 of file
> fs/xfs/xfs_trans.c. Return address = 0xffffffffa01999ff
> - xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-4.
> Returning error.
Error 5 is EIO: your storage had an IO error, and xfs reacted.
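
For anyone who wants to check that mapping themselves, here is a minimal sketch that turns the number in an "error N" kernel message into its errno name. It uses only Python's standard errno and os modules; the helper name describe_errno is made up for illustration, and the descriptive text comes from whatever platform you run it on.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'N = NAME (description)' for a numeric errno value."""
    # errno.errorcode maps numeric values to symbolic names such as 'EIO'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror gives the platform's human-readable description.
    return f"{code} = {name} ({os.strerror(code)})"


if __name__ == "__main__":
    # The value reported by xfs_imap_to_bp and xfs_log_force in this thread.
    print(describe_errno(5))
```

The kernel messages report a positive number here, so it can be passed in as-is.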
> - lots of "Filesystem "dm-4": xfs_log_force: error 5 returned."
> The filesystem disappeared, but I could unmount and mount it again with
> no errors. At this point I've also decided to update the kernel, and
> switched from 2.6.26 to 2.6.30. Then I ran xfs_repair, which again found a
> few errors.
After those IO errors, the fs may well be in bad shape, which
xfs_repair will do its best to fix. You'll need to get your
hardware problems sorted out, it seems.
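
To see whether the storage-level messages line up in time with the xfs ones, something like the rough sketch below could tally the kinds of lines quoted in this thread. The regular expressions are assumptions based only on the message text quoted here, not an exhaustive list of what the kernel can print, and the labels and function names are made up for illustration.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the messages quoted in this thread.
PATTERNS = [
    ("ata_link_reset", re.compile(r"ata\d+: hard resetting link")),
    ("xfs_shutdown", re.compile(r"xfs_force_shutdown\(")),
    ("xfs_io_error", re.compile(r"error 5\b")),
    ("xfs_corrupt_inode", re.compile(r"corrupt dinode \d+")),
]


def classify(line):
    """Return the label of the first matching pattern, or None."""
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return None


def summarize(lines):
    """Count how many lines fall into each category."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts


# Sample lines copied from the report above.
SAMPLE = [
    "ata3: hard resetting link",
    "xfs_force_shutdown(dm-4,0x8) called from line 1164 of file fs/xfs/xfs_trans.c.",
    "xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-4.",
    'Filesystem "dm-4": xfs_log_force: error 5 returned.',
    'Filesystem "dm-4": corrupt dinode 1293803384, extent total = 1, nblocks = 0.',
]

if __name__ == "__main__":
    for label, count in summarize(SAMPLE).items():
        print(label, count)
```

Feeding it the real kernel log line by line instead of SAMPLE would show whether link resets or other ATA messages appear around the same time as the xfs errors.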
-Eric
> Jan 12: xfs call traces along with:
> - Filesystem "dm-4": corrupt dinode 1293803384, extent total = 1,
> nblocks = 0. Unmount and run xfs_repair.
> - Filesystem "dm-4": corrupt dinode 665458404, extent total = 1, nblocks
> = 0. Unmount and run xfs_repair.
> - Filesystem "dm-4": corrupt dinode 225720890, extent total = 1, nblocks
> = 0. Unmount and run xfs_repair.
> I then unmounted the fs and ran xfs_repair again. This time the output
> was massive compared to the previous runs, and it put around 100,000
> files in lost+found.
>
> Besides the 3 lines on Jan 10 with "ata3: hard resetting link", there has
> been no sign of possible hardware problems. The raid and the hdd's
> appear to be fine, no errors. What's curious is that I'm experiencing
> problems only with the large XFS filesystem, and there hasn't been
> even a single error in the logs about the other xfs partitions.
>
> So, if anyone has any idea what I can research next, to help me find
> out more information about what's happening here...
>
> I've uploaded some detailed logs at http://ghost3k.net/xfs1/
>
>
> Thanks,
> Alexandru Coman
>
> _______________________________________________
> xfs mailing list
> xfs at oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
>