Re: EFSCORRUPTED on mount?

To: xfs@xxxxxxxxxxx
Subject: Re: EFSCORRUPTED on mount?
From: Gregory Farnum <gregory.farnum@xxxxxxxxxxxxx>
Date: Tue, 22 Nov 2011 10:47:24 -0800
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, sandeen@xxxxxxxxxxx
In-reply-to: <20111122014114.GJ2386@dastard>
References: <CAF3hT9B8-ou-4RhfCkfFWTwwB_tb7nWSP-5pgP3G6oTE+1gAvA@xxxxxxxxxxxxxx> <CAF3hT9AurrVi7xosauVmhQcsbqJgLsxkNYm6dWDNCpB+GR69=w@xxxxxxxxxxxxxx> <20111122014114.GJ2386@dastard>
On Mon, Nov 21, 2011 at 5:41 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> In other words, your admin basically told the system to shut down
> without syncing the data or running shutdown scripts that sync data.
> i.e. it forces an immediate reboot while the system is still active,
> causing an unclean shutdown and guaranteed data loss.
And he's been yelled at appropriately. ;) But the data loss actually
isn't a problem for us here as long as the filesystem isn't corrupted.

>> But I've been assured this shouldn't have been able to
>> corrupt the filesystem, so troubleshooting continues.
>
> That depends entirely on your hardware. Are you running with
> barriers enabled?  If you don't have barriers active, then metadata
> corruption is entirely possible in this scenario, especially if the
> hardware does a drive reset or power cycle during the reboot
> procedure. Even with barriers, there are RAID controllers that
> enable back-end drive caches which then fail to get flushed, and
> hence can cause corruption on unclean shutdowns.
Barriers on (at least, nobody turned them off); the RAID card is
battery-backed; here are megacli dumps:
http://pastebin.com/yTskgzWG
http://pastebin.com/ekhczycy
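
For what it's worth, here's the quick check I'm using on the mount
side. It's just a sketch, and it assumes XFS reports "nobarrier" in
/proc/mounts when barriers have been explicitly disabled (that's my
reading of current kernels, but worth verifying):

#!/usr/bin/env python
# Sketch: flag any XFS mounts where barriers look disabled.
# Assumes the kernel lists "nobarrier" among the mount options in
# /proc/mounts when the option was used; verify on your kernel.
def xfs_mounts_without_barriers(mounts="/proc/mounts"):
    suspects = []
    with open(mounts) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 4:
                continue
            dev, mnt, fstype, opts = fields[:4]
            if fstype == "xfs" and "nobarrier" in opts.split(","):
                suspects.append((dev, mnt))
    return suspects

if __name__ == "__main__":
    for dev, mnt in xfs_mounts_without_barriers():
        print("barriers disabled on %s mounted at %s" % (dev, mnt))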

Sorry if I seem too eager to assume it's an XFS bug, but Ceph is a
magic machine for taking stable filesystems and making them cry. :/


On Tue, Nov 22, 2011 at 7:06 AM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
> Others have had good comments, but also:
>
>> 2011-11-17 16:00:37.294876 7f83f3eef720 filestore(/mnt/osd.17)
>> truncate meta/pginfo_12.7c8/0 size 0
>> 2011-11-17 16:00:37.483407 7f83f3eef720 filestore(/mnt/osd.17)
>> truncate meta/pginfo_12.7c8/0 size 0 = -117
>> 2011-11-17 16:00:37.483476 7f83f3eef720 filestore(/mnt/osd.17)  error
>> error 117: Structure needs cleaning not handled
>
> was there anything in dmesg/system logs right at this point?  XFS should
> have said something about this original error.
Whoops. The following is a sample of what was in dmesg and kern.log
after that point, before I did anything else (these lines repeated a
lot, but there was no other output):
xfs/xfs_buf.c.  Return address = 0xffffffff811c2aa8
[56459.526220] XFS (sdg1): xfs_log_force: error 5 returned.
[56489.544153] XFS (sdg1): xfs_log_force: error 5 returned.
[56519.562087] XFS (sdg1): xfs_log_force: error 5 returned.
[56549.580021] XFS (sdg1): xfs_log_force: error 5 returned.
[56579.597956] XFS (sdg1): xfs_log_force: error 5 returned.
[56609.615889] XFS (sdg1): xfs_log_force: error 5 returned.
[56613.036430] XFS (sdg1): xfs_log_force: error 5 returned.
[56613.041731] XFS (sdg1): xfs_do_force_shutdown(0x1) called from line 1037 of file fs/xfs/xfs_buf.c.  Return address = 0xffffffff811c2aa8
[56619.430497] XFS (sdg1): xfs_log_force: error 5 returned.
[56619.435796] XFS (sdg1): xfs_do_force_shutdown(0x1) called from line 1037 of file fs/xfs/xfs_buf.c.  Return address = 0xffffffff811c2aa8
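
For anyone decoding those numbers: 117 is EUCLEAN ("Structure needs
cleaning"), which is what XFS's EFSCORRUPTED maps to on Linux, and
the error 5s are EIO, which every operation gets back once the
filesystem has forced itself down. (The 0x1 flag passed to
xfs_do_force_shutdown is SHUTDOWN_META_IO_ERROR if I'm reading the
headers right, i.e. a metadata write failed.) A trivial sketch to
print them, assuming a Linux errno table:

import errno, os

# Decode the errno values seen in the logs above; on Linux,
# 117 is EUCLEAN (XFS's EFSCORRUPTED) and 5 is EIO.
for num in (117, 5):
    print("%d = %s (%s)" % (num, errno.errorcode[num], os.strerror(num)))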

Thanks!
-Greg
