
Re: XFS recovery resumes...

To: xfs@xxxxxxxxxxx
Subject: Re: XFS recovery resumes...
From: Joe Landman <joe.landman@xxxxxxxxx>
Date: Sun, 18 Aug 2013 18:57:10 -0400
In-reply-to: <240990.4028.1376863911761.JavaMail.root@xxxxxxxxxxxxxxxxxxxx>
On 08/18/2013 06:11 PM, Jay Ashworth wrote:
----- Original Message -----
From: "Joe Landman" <joe.landman@xxxxxxxxx>

You need at least 497MB RAM to run with prefetching enabled.

^^^^^

This is 1/2 GB ram, and you didn't specify the memory options of the
xfs_repair ... so I'm going to guess at this point that you ran out of
ram. Paging while running xfs_repair is no fun.

How much ram do you have in this box? Next question is, is this an ECC
memory box?

512M.  It's a *very* old KT6V based board, and when we tried to expand
it several years back, it went bat-guano with any more than half a gig.

Ahhh .... ok.  Got it.



Not sure if you are hitting a bug as much as running into something
else like a hardware limit (RAM) or a memory stick issue.

Well, the upstream cause was a 7 year old Antec power supply that
finally died, about a month ago, slowly.

Ok. I've had power supplies take down memory in the past. You might be hitting a bad memory cell courtesy of the PS.


Do you have EDAC (or mcelog) on? Any errors from this?

I don't have mcelog on, and no, the memory isn't registered, but a
4-pass run of Memtest+ came up clean, so I'm speculating that the

Not registered (which just means buffered), but ECC. ECC does a parity computation on some number of bits, and gives you a rough "good/bad" binary state for a particular area of memory. If the stored parity bits don't match what is computed on read, then odds are that something is wrong. It's not foolproof, but it's a good mechanism for catching potential errors.
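
As a rough illustration of the parity idea, here is a minimal Python sketch. It assumes a single even-parity bit per word, which is a simplification: real ECC DIMMs typically use a wider SECDED-style code over 64-bit words. The function names are made up for the example and are not from any memory controller or library.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit for a non-negative integer: 1 if the count of set bits is odd."""
    return bin(word).count("1") % 2


def store(word: int) -> tuple[int, int]:
    """Pretend to write a word to memory, keeping its parity bit alongside it."""
    return word, parity_bit(word)


def check(word: int, stored_parity: int) -> bool:
    """On read, recompute parity and compare with what was stored. True means 'looks good'."""
    return parity_bit(word) == stored_parity


if __name__ == "__main__":
    data, p = store(0b10110010)
    print(check(data, p))            # True: nothing changed

    one_flip = data ^ (1 << 3)       # simulate a single flipped bit
    print(check(one_flip, p))        # False: mismatch, error detected

    two_flips = one_flip ^ (1 << 5)  # a second flip cancels the first in the parity sum
    print(check(two_flips, p))       # True: the error goes unnoticed
```

The last case is the "not foolproof" part: with a single parity bit, an even number of flipped bits in the same word slips through.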

We've had cases where Memtest(*) reported everything fine, yet I was able to generate ECC errors in a few minutes by running a memory-intensive app. Memtest does do some hardware exercise, but it's not usually hitting memory the way apps do. That difference can be significant. This is in part why the day job stopped using memtest for testing a number of years ago. We now run heavy-duty electronic structure codes, and pi/e/... computations for burn-in.

*continuing* problem isn't hardware; I'm pretty sure it was just the
failing 12V rail on the dying PS.  I just have to clean up after it
enough to get *one* of these 2 drives cleaned off, then I can make a
new FS, and play musical files.

Ahhh ...

I was running a Plex server on an old machine for a while. I had to shift over to a beefier box with ECC RAM and more CPUs. Right now my Plex server has 8 CPUs, 24 GB RAM, and about 1TB of disk (old). Once you start doing recoding on the fly (multi-resolution output), you need the RAM and processor power.


Or, I may just go grab a 3TB external after all.  :-)

If you do that, and you still hit the error, chances are you might need to swap out your MB and CPU/RAM for something newer (not to mention the PS). I'd recommend ECC-based systems if at all possible. XFS can and will get very unhappy if bits are flipped in its data structures while you are making changes to the file system.

--

Joe


Cheers,
-- jra

