
Re: XFS filesystem on EC2 instance corrupts and shuts down

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: XFS filesystem on EC2 instance corrupts and shuts down
From: Shrinath M <shrinath.m@xxxxxxxxxx>
Date: Thu, 14 Mar 2013 06:58:19 +0530
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, Sabyasachi Ruj <sabyasachi.ruj@xxxxxxxxxx>, Vivek Goel <vivek.goel@xxxxxxxxxx>, Supratik Goswami <supratik.goswami@xxxxxxxxxx>, Ric Wheeler <rwheeler@xxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20130313234213.GW21651@dastard>
References: <CAOdS1h=7X4O1O7X8YOwxtLm7G=fc+J+6hJxJ1RKbDmfTZXTpeg@xxxxxxxxxxxxxx> <51373DB8.2020707@xxxxxxxxxx> <CAOdS1hnXGj9puaHxeToqmpK40A-3WvJnM7=5HckpyyZYqZTvEQ@xxxxxxxxxxxxxx> <51373FC1.6010101@xxxxxxxxxx> <CAOurMUeasru6ekDYcvVR1QnaWVJFV+-coZsUG5SgG6LnENBvXg@xxxxxxxxxxxxxx> <513751F2.2060109@xxxxxxxxxx> <CAOdS1hngSuHn_HiremLyUS7Qd9eZ68=8arfBuHnEpwXQaBw9Wg@xxxxxxxxxxxxxx> <5140CBE3.80705@xxxxxxxxxxx> <20130313234213.GW21651@dastard>
Thanks Ben, Dave and Eric.

>>but I am wondering if there might be more information before this which is not in your trimmed logs.
No, this was the first entry in /var/log/messages every time it happened, and dmesg shows the same. After a reboot, it simply fixes itself without anyone doing anything.
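
For reference, here is a minimal sketch of how the first XFS error line can be pulled out of /var/log/messages and its error number decoded. The regex assumes the message format of the xfs_iunlink_remove line quoted further down in this thread. The helper names (first_xfs_error, describe_errno) are our own and hypothetical. On Linux, errno 117 should decode to EUCLEAN ("Structure needs cleaning"), which is the code XFS uses to report corruption.

```python
import errno
import os
import re
from typing import Iterable, Optional, Tuple

# Matches kernel lines such as
#   "XFS (md0): xfs_iunlink_remove: xfs_itobp() returned error 117."
# The format is assumed from the single line quoted in this thread;
# other XFS messages may be worded differently and will not match.
XFS_ERR = re.compile(
    r"XFS \((?P<dev>[^)]+)\): (?P<func>\w+): .*returned error (?P<err>\d+)"
)


def first_xfs_error(lines: Iterable[str]) -> Optional[Tuple[str, str, int]]:
    """Return (device, function, errno) for the first matching XFS line."""
    for line in lines:
        m = XFS_ERR.search(line)
        if m:
            return m.group("dev"), m.group("func"), int(m.group("err"))
    return None


def describe_errno(code: int) -> str:
    """Map a numeric errno to its symbolic name and message on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def scan_log(path: str = "/var/log/messages") -> Optional[Tuple[str, str, int]]:
    """Scan a syslog file (reading /var/log/messages usually needs root)."""
    with open(path, errors="replace") as f:
        return first_xfs_error(f)
```

Typical use would be `hit = scan_log()` followed by `describe_errno(hit[2])` when `hit` is not None.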

The kernel we are running is definitely the Amazon-built one. It looks like this:
$~: uname -a
Linux ip-100-0-100-1 3.2.34-55.46.amzn1.x86_64 #1 SMP Tue Nov 20 10:06:15 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

dmesg shows something like this after repairing/rebooting:

[    8.414176] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
[    8.415342] SGI XFS Quota Management subsystem
[    8.417664] XFS (md0): Mounting Filesystem
[    8.771553] XFS (md0): Starting recovery (logdev: internal)
[    9.977325] XFS (md0): Ending recovery (logdev: internal)

Check the first line there: it says "no debug enabled". How good or bad is this debug mode in production environments? We are not getting any corruption in our local/test environments, but in production we are getting it once every third day.

You mention the unlinked inode list, but if that were the cause, there should be an entry for it in /var/log/messages, right?
Anyway, how can we create this situation? By forcing multiple processes to write/delete files on a small disk? Since we still don't know what is causing this issue, trying to reproduce it in a local/production environment is just shooting in the dark... :(
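
If Dave's diagnosis is right, one untested guess at a reproducer is to keep inodes passing through the unlinked list. Several workers would create files, unlink them while they are still open, write, and then close them on the XFS volume. This is not a known trigger for the corruption. A minimal Python sketch, assuming a scratch directory on the affected filesystem. churn and run are hypothetical names, and threads stand in for separate processes only to keep it short.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def churn(directory: str, worker: int, iterations: int, size: int) -> int:
    """Create a file, unlink it while still open, write to it, then close it."""
    payload = b"x" * size
    done = 0
    for i in range(iterations):
        path = os.path.join(directory, f"churn-{worker}-{i}")
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        try:
            # Link count drops to 0 here. As we understand XFS, the inode now
            # sits on the AGI unlinked list and is only taken off it
            # (xfs_iunlink_remove) once it is freed, i.e. after the last close.
            os.unlink(path)
            os.write(fd, payload)
            os.fsync(fd)
        finally:
            os.close(fd)
        done += 1
    return done


def run(directory: Optional[str] = None, workers: int = 8,
        iterations: int = 1000, size: int = 4096) -> int:
    """Run `workers` concurrent churn loops and return the total file count."""
    if directory is None:
        # Dry run in a throwaway directory; pass a path on the XFS volume
        # (e.g. a hypothetical /mnt/xfs-scratch) for a real attempt.
        with tempfile.TemporaryDirectory() as tmp:
            return run(tmp, workers, iterations, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(churn, directory, w, iterations, size)
                   for w in range(workers)]
        return sum(f.result() for f in futures)
```

Running it in a loop against a nearly full scratch volume would be closer to "many processes writing/deleting on a small disk". The file names are unique per worker and iteration, so O_EXCL never collides.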

Does turning up the error level affect the data in any way? Or does it *just* give more detailed logging while being sensitive to even small errors?
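
For context on that question: if we read the kernel's XFS documentation correctly, fs.xfs.error_level (exposed as /proc/sys/fs/xfs/error_level, range 0 to 11, default 3) controls how much detail is reported when internal errors occur. That suggests it changes logging only. A minimal sketch for reading and setting it, assuming this Amazon kernel exposes the same sysctl. read_error_level and set_error_level are hypothetical helper names. As root, `sysctl -w fs.xfs.error_level=11` should do the same thing.

```python
from pathlib import Path

# Sysctl file for fs.xfs.error_level. The 0..11 range (default 3) comes from
# the kernel's XFS documentation; we assume the Amazon 3.2 kernel matches it.
ERROR_LEVEL_PATH = Path("/proc/sys/fs/xfs/error_level")
MIN_LEVEL, MAX_LEVEL = 0, 11


def read_error_level(path: Path = ERROR_LEVEL_PATH) -> int:
    """Return the current XFS error reporting level."""
    return int(path.read_text().strip())


def set_error_level(level: int, path: Path = ERROR_LEVEL_PATH) -> None:
    """Set the reporting level; writing the real /proc file needs root."""
    if not MIN_LEVEL <= level <= MAX_LEVEL:
        raise ValueError(f"error_level must be {MIN_LEVEL}..{MAX_LEVEL}, got {level}")
    path.write_text(f"{level}\n")
```

Usage would be something like `set_error_level(11)` while waiting for the next shutdown, then `set_error_level(3)` to return to the default.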

I really appreciate the support you devs are giving, which is really the job of AWS support... I so wish they had some helpful and knowledgeable people in support.

On Thu, Mar 14, 2013 at 5:12 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
On Wed, Mar 13, 2013 at 01:56:35PM -0500, Eric Sandeen wrote:
> XFS (md0): xfs_iunlink_remove: xfs_itobp() returned error 117.

Corrupted unlinked inode list. You need to run xfs_repair to fix it.


Dave Chinner

