Re: XFS recovery resumes...

To: xfs@xxxxxxxxxxx
Subject: Re: XFS recovery resumes...
From: Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
Date: Sat, 24 Aug 2013 22:44:44 -0500
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20493414.4932.1377387802955.JavaMail.root@xxxxxxxxxxxxxxxxxxxx>
References: <20493414.4932.1377387802955.JavaMail.root@xxxxxxxxxxxxxxxxxxxx>
Reply-to: stan@xxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20130801 Thunderbird/17.0.8

On 8/24/2013 6:43 PM, Jay Ashworth wrote:
> ----- Original Message -----
>> From: "Stan Hoeppner" <stan@xxxxxxxxxxxxxxxxx>
> 
>> Joe appears to have hit the nail on the head WRT this being a hardware
>> problem. This error confirms it. It would appear that when the Antec
>> PSU went South it damaged a motherboard device, possibly a VRM, probably
>> a cap or two, or more. Maybe damaged a DRAM cell or few that work fine
>> with memtest86+ but not with the access pattern generated by your XFS
>> workload.
> 
> Well, it appears you may be right. 
...
> Aug 22 13:34:13 duckling kernel: [67215.034952] XFS (sda1): Corruption of 
> in-memory data detected.  Shutting down filesystem

I don't see any other possibility than a hardware problem.  And given
the age of that hardware, it's cheaper in dollars and time to start over
with new gear.


> I'll try swapping it; this mobo has always gotten whacky if we went over 512M,
> which is why we haven't. 

The manual says the board takes up to 2GB of DDR2.  It has two DIMM
sockets, which means 1GB DIMMs are supported.  If anything over 512MB
(2x256MB DIMMs) causes problems, then the board had a flaw, or needed a
BIOS update, etc.  And now it's physically damaged.

> I don't know if I can manually reclock the ram, though I might can turn the 
> waitstates up.

That probably won't help but you can try it.  The manual shows the BIOS
does not support independent clocking of the DRAM.

>> If that doesn't fix it, this may be a viable inexpensive
>> solution:
>>
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16813186215
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16819103888
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16820145252
>>
>> $109 to replace your central electronics complex. This is the least
>> expensive quality set of parts with good feature set I could come up
>> with at Newegg, to take the sting out of dropping cash on a forced
>> upgrade. $15 more for the Foxconn AM3 board w/HDMI if you have a newer
>> TV or AV receiver.
> 
> Well, I can live without HDMI, but my present MS-7021 mobo has 5 PCI
> slots, and I'm using all of them: 2 PVR-150s, a PVR-500, and a SiI
> 4-port raid (which will talk to 2 and 3TB drives; the motherboard SATA
> won't even see them).

You'll be extremely hard pressed to find a current board with more than
3 PCI unless you buy used.  Hmmm...let's see....here we go:

http://www.newegg.com/Product/Product.aspx?Item=N82E16813135329
http://www.newegg.com/Product/Product.aspx?Item=N82E16819113283
http://www.newegg.com/Product/Product.aspx?Item=N82E16820148194

-- $155

For less than $50 more you not only get all the slots/ports you need,
but also a much faster dual core CPU and GPU, plus HDMI.  And you'll no
longer have disks on the slow PCI bus.  Looks like a winner.

> I forget what's in 5, but I think it was the only VGA card I had with
> S-Video out.

If you absolutely need S-Video/composite output then you'll need to use
an external converter or switch box, something like this:

http://www.newegg.com/Product/Product.aspx?Item=9SIA0U00JZ2490

> So, while that's a damn nice price point, it will require me to buy
> a bunch of Ethernet tuners as well.  <sigh>

Not now. ;)

> I'll try the RAM.  It's really odd, though, that the badblocks workload 
> and both memtests couldn't find a problem, if it is the memory plane...

This isn't odd at all; it's actually quite common.  The problem likely
is not in the DRAM modules or individual transistors in the DRAM chips.
The problem is likely unstable signalling to/from the DIMM sockets, or
unstable power to the CPU or Northbridge, caused by old and now damaged
power delivery circuits on the mainboard.

Download and run burnP6 for 5-10 minutes.  That'll tell you if the CPU
is getting sufficient power.  Make sure the CPU fan is in working order
first.  It's called BURNp6 for a reason.  The Athlons didn't have
thermal shutdown capability, and this will literally destroy the CPU
with heat buildup if the fans aren't working properly.  If cooling is
good, and the system hard locks or exhibits other strange behavior, then
you know it's time to replace the board.  But I think you know that
already.  This will simply be the exclamation point.
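
If you'd rather not babysit the clock, a time-boxed run can be sketched
roughly as below.  This is only an illustration, not part of cpuburn:
the helper name run_burn_in and the "./burnP6" path are assumptions, so
point it at wherever the binary actually lives.  It starts the burner,
waits up to the given number of seconds, and reports whether the process
was still running at the deadline or quit on its own before then.

```python
import subprocess


def run_burn_in(cmd, seconds):
    """Run `cmd` for at most `seconds` and report how the run ended.

    `cmd` is an argument list such as ["./burnP6"] (hypothetical path).
    """
    proc = subprocess.Popen(cmd)
    try:
        # Block until the process exits by itself or the time limit passes.
        code = proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Still burning at the deadline: stop it and reap the child.
        proc.terminate()
        proc.wait()
        return "still running at deadline"
    # The burner quit before the deadline, which deserves a closer look.
    return f"exited early with code {code}"


# Example (assumed binary location), a 10-minute run:
# print(run_burn_in(["./burnP6"], 600))
```

A hard lock freezes this script along with everything else, so it can
only tell you about the burner quitting early.  The box going dead
mid-run is its own answer.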

-- 
Stan
