
Re: 3.9.0: general protection fault

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: 3.9.0: general protection fault
From: Bernd Schubert <bernd.schubert@xxxxxxxxxxxxxxxxxx>
Date: Tue, 07 May 2013 13:18:13 +0200
Cc: linux-xfs@xxxxxxxxxxx
Delivered-to: linux-xfs@xxxxxxxxxxx
In-reply-to: <20130507011254.GP19978@dastard>
References: <kltu6o$33j$1@xxxxxxxxxxxxx> <km7oop$28c$1@xxxxxxxxxxxxx> <20130506122844.GL19978@dastard> <5187A663.707@xxxxxxxxxxxxxxxxxx> <20130507011254.GP19978@dastard>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130404 Thunderbird/17.0.5
On 05/07/2013 03:12 AM, Dave Chinner wrote:
On Mon, May 06, 2013 at 02:47:31PM +0200, Bernd Schubert wrote:
On 05/06/2013 02:28 PM, Dave Chinner wrote:
On Mon, May 06, 2013 at 10:14:22AM +0200, Bernd Schubert wrote:
And another protection fault, this time with 3.9.0. Always happens
on one of the servers. It's ECC memory, so I don't suspect a faulty
memory bank. Going to fsck now.


Isn't that a bit much overhead? And I can't provide /proc/meminfo and
others, as this issue causes a kernel panic a few traces later.

Provide what information you can.  Without knowing a single thing
about your hardware, storage config and workload, I can't help you
at all. You're asking me to find a needle in a haystack blindfolded
and with both hands tied behind my back....

I see that xfs_info, meminfo, etc. are useful, but /proc/mounts? Maybe you want "cat /proc/mounts | grep xfs"? Attached is the output of /proc/mounts; please let me know if you were really interested in all of that non-xfs output.
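
A minimal sketch of the filtering suggested above, assuming a POSIX shell with awk available. The third field of each /proc/mounts line is the filesystem type, so matching on that field avoids false hits from device names or mount points that merely contain the string "xfs". The function name xfs_mounts is hypothetical and used only for illustration.

```shell
# Print only the XFS entries from a mounts table.
# Fields per line: device, mount point, fs type, options, dump, pass.
# Reads /proc/mounts unless another file is given as the first argument.
xfs_mounts() {
    awk '$3 == "xfs"' "${1:-/proc/mounts}"
}
```

Calling xfs_mounts with no argument reads the live table; passing a saved copy (such as an attached mounts.txt) filters that file instead.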

And I just wonder what you are going to do with the information about the hardware. So it is an Areca hw-raid5 device with 9 disks. But does this help? It doesn't tell you whether one of the disks reads/writes with hiccups, nor does it provide any performance characteristics at all.

Stuff like /proc/meminfo doesn't have to be provided from exactly
the time of the crash - it's just the simplest way to find out how
much RAM you have in the machine, so a dump from whenever the
machine is up and running the workload is fine. Other information we
ask for (e.g. capturing the output of `vmstat 5` as suggested in the
FAQ) gives us the runtime variation of memory usage and is easy to
capture right up to the failure point...
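
One possible way to capture this kind of information ahead of a crash is sketched below, assuming a POSIX shell. The function name log_meminfo, its arguments, and the log layout are hypothetical; the idea is only to append timestamped snapshots of /proc/meminfo to a file so that the last one written before a panic is still available afterwards.

```shell
# Append timestamped snapshots of /proc/meminfo to a log file.
# Usage: log_meminfo LOGFILE [COUNT] [INTERVAL_SECONDS] [SOURCE]
# SOURCE defaults to /proc/meminfo; it is a parameter only so the
# function can be tried against a saved copy.
log_meminfo() {
    log=$1
    count=${2:-1}
    interval=${3:-5}
    src=${4:-/proc/meminfo}
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
            cat "$src"
        } >> "$log"
        i=$((i + 1))
        # Sleep only between snapshots, not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `log_meminfo meminfo.log 720 5` would take a snapshot every 5 seconds for an hour. The `vmstat 5` output mentioned above can be kept in the same spirit by redirecting it to a file in the background.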

I have started collectl now; it logs meminfo and other useful information. But even with all of that, are you sure XFS debugging information wouldn't be more useful? For example, setting a
"#define debug" in xfs_trans_ail.c?


Attachment: mounts.txt
Description: Text document
