Re: XFS filesystem on EC2 instance corrupts and shuts down

To: Shrinath M <shrinath.m@xxxxxxxxxx>
Subject: Re: XFS filesystem on EC2 instance corrupts and shuts down
From: Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
Date: Thu, 14 Mar 2013 08:31:35 -0500
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Sabyasachi Ruj <sabyasachi.ruj@xxxxxxxxxx>, Vivek Goel <vivek.goel@xxxxxxxxxx>, Eric Sandeen <sandeen@xxxxxxxxxxx>, xfs@xxxxxxxxxxx, Supratik Goswami <supratik.goswami@xxxxxxxxxx>, Ric Wheeler <rwheeler@xxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
Reply-to: stan@xxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20130307 Thunderbird/17.0.4
On 3/13/2013 8:28 PM, Shrinath M wrote:
> Thanks Ben, Dave and Eric.
> 
> Eric,
>>> but I am wondering if there might be more information before this which
> is not in your trimmed logs.
> No, this was the first entry every time we have it in /var/log/messages.
> dmesg also holds the same. After reboot, it simply fixes without anyone
> doing anything.
...
>  - dmesg shows something like this after repairing/rebooting -
> 
> [    8.414176] SGI XFS with ACLs, security attributes, realtime, large
> block/inode numbers, no debug enabled
> [    8.415342] SGI XFS Quota Management subsystem
> [    8.417664] XFS (md0): Mounting Filesystem
> [    8.771553] XFS (md0): Starting recovery (logdev: internal)
> [    9.977325] XFS (md0): Ending recovery (logdev: internal)

The active log displayed by the dmesg command is cleared and started
fresh at each reboot, which may be why you don't see the I/O errors.
You should find them in the previous dmesg log files.

$ ls -la /var/log/dmesg*
-rw-r--r-- 1 root adm  12K Feb 29  2012 /var/log/dmesg
-rw-r--r-- 1 root adm  12K Feb 20  2012 /var/log/dmesg.0
-rw-r--r-- 1 root adm 4.7K Aug 18  2011 /var/log/dmesg.1.gz
-rw-r--r-- 1 root adm 4.7K Aug 18  2011 /var/log/dmesg.2.gz
-rw-r--r-- 1 root adm 4.7K Jun 27  2011 /var/log/dmesg.3.gz
-rw-r--r-- 1 root adm 4.7K May 18  2011 /var/log/dmesg.4.gz

Don't let the old file dates in this example throw you. This machine
hasn't been rebooted in over a year:

 08:14:42 up 378 days, 19:44,  1 user,  load average: 0.06, 0.31, 0.22
          ^^^^^^^^^^^
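
If it helps, here is a rough sketch of a shell function that searches the
current and rotated dmesg files in one pass, decompressing the .gz ones on
the fly. The function name scan_dmesg, the default directory, and the search
pattern are my assumptions, not anything specific to your setup, so adjust
the pattern to the errors you expect.

```shell
# Hypothetical helper: search current and rotated dmesg logs for
# I/O errors and XFS messages. Directory and pattern are assumptions.
scan_dmesg() {
    dir="${1:-/var/log}"
    for f in "$dir"/dmesg*; do
        # Skip the literal glob if no file matched.
        [ -e "$f" ] || continue
        # Decompress rotated .gz logs; read plain ones directly.
        case "$f" in
            *.gz) gzip -dc "$f" ;;
            *)    cat "$f" ;;
        esac |
            grep -i -E 'I/O error|xfs' |
            # Prefix each matching line with the file it came from.
            awk -v f="$f" '{ print f ": " $0 }'
    done
}
```

Running scan_dmesg with no argument looks in /var/log; pass another
directory if your logs are kept elsewhere.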

-- 
Stan
