
Re: mkfs.xfs error creating large agcount an raid

To: Paul Anderson <pha@xxxxxxxxx>
Subject: Re: mkfs.xfs error creating large agcount an raid
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Mon, 27 Jun 2011 10:37:15 -0500
Cc: Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx, Marcus Pereira <marcus@xxxxxxxxxxx>
In-reply-to: <BANLkTikJe7ayzwD2Yqc7BHePfZ4x-M_SyQ@xxxxxxxxxxxxxx>
References: <4E063BC6.9000801@xxxxxxxxxxx> <4E0694CC.8050003@xxxxxxxxxxxxxxxxx> <4E06C967.2060107@xxxxxxxxxxx> <20110626235959.GC32466@dastard> <4E07FA07.4050907@xxxxxxxxxxxxxxxxx> <4E0803AA.20809@xxxxxxxxxxx> <4E08456F.1090503@xxxxxxxxxxxxxxxxx> <BANLkTimJm5Fe1LvD1AQYZC5QCDs0gXJpFA@xxxxxxxxxxxxxx> <4E089D4E.1060503@xxxxxxxxxxx> <BANLkTikJe7ayzwD2Yqc7BHePfZ4x-M_SyQ@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv: Gecko/20110616 Thunderbird/3.1.11
On 6/27/11 10:27 AM, Paul Anderson wrote:
> On Mon, Jun 27, 2011 at 11:10 AM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>> On 6/27/11 8:04 AM, Paul Anderson wrote:
>>> One thing this thread indicates is the need for a warning in mkfs.xfs
>>> - according to several developers, there is, I think, a linear increase
>>> in allocation time with the number of allocation groups.
>>> It would be helpful for the end user to simply issue a warning stating
>>> this when the AG count seems high with a brief explanation as to why
>>> it seems high.  I would allow it, but print the warning.  Even a
>>> simple linear check like agroups>500 should suffice for "a while".
>> I disagree.
>> There are all sorts of ways a user can shoot themselves in the foot with
>> unix commands.  Detecting and warning about all of them is a fool's errand.
> Clearly a philosophical difference.
> In managing complex software, it is far better for users if the
> software itself can simply report why something is a problem, without
> resorting to expecting users to read source code or ask developers
> why.
> There is nothing in the man page I see indicating what is good or bad
> regarding allocation groups - either document it there or warn in the
> software.  If allocation algorithms are linear with respect to
> allocation groups, then something like this should be stated in the man
> pages.
> Doing neither leads to frustrated end users.  If your answer is "use
> the defaults" then explain why and which parameters it applies to
> (again in the documentation).
> Also, it is not hard to do, and would have in this instance saved
> developer time.  Since the issue has come up a few times the last
> month or so, it seems worthwhile to deal with.

This one instance would not be hard to do, agreed.
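
For illustration only, a minimal sketch of the kind of single check
proposed in the quoted text. It assumes the threshold of 500 from the
"agroups>500" suggestion; the function name, constant, and message
wording are hypothetical and are not part of mkfs.xfs.

```python
from typing import Optional

# Hypothetical threshold, taken from the "agroups>500" suggestion in the
# quoted text. It is not a value used by mkfs.xfs.
AGCOUNT_WARN_THRESHOLD = 500


def agcount_warning(agcount: int,
                    threshold: int = AGCOUNT_WARN_THRESHOLD) -> Optional[str]:
    """Return a warning string if agcount exceeds threshold, else None."""
    if agcount < 1:
        raise ValueError("agcount must be a positive integer")
    if agcount > threshold:
        # Warn but do not refuse: the proposal was to allow the value
        # and only print a message.
        return (
            f"warning: agcount={agcount} exceeds {threshold}; "
            "a large number of allocation groups may slow allocation. "
            "Consider the mkfs.xfs defaults."
        )
    return None


if __name__ == "__main__":
    for count in (32, 501):
        message = agcount_warning(count)
        print(message if message else f"agcount={count}: no warning")
```

The check only reports; it does not reject the value, which matches the
"I would allow it, but print the warning" suggestion.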

To point out every potential pitfall in every bad option or combination
of options would be impossible.

I'd be happy with a version of the FAQ entry Dave pointed to in
the mkfs.xfs manpage, though, basically "don't change the defaults
unless you know for sure that it will address a shortcoming you have
seen in testing."

> It's sort of like the story about giving a person a fish versus
> teaching them how to fish.

Or a little like performing neurosurgery on a person vs. teaching
them how to do it themselves.  ;)

There is lots of "teaching to fish" in the documentation and the FAQ;
until you are really able to delve into the technical complexities
of XFS you probably should not try to fish in water that is too deep.

Just because a knob is there doesn't mean you should turn it as
far as it can go, and I don't think it's our job to warn against
that in every instance, either...

Those who wish to learn would be well advised to read up on the
many detailed technical docs available; I don't think that the
mkfs.xfs code is the right place to do this teaching, though.

But I guess it is a philosophical difference.


> Paul
>> ======================================
>> = Warning!  mkfs.xfs detected insane =
>> =   option specification.  Cancel?   =
>> =                                    =
>> =      [   OK   ]     [ Cancel ]     =
>> ======================================
>> -Eric
>>> Paul
>>> On Mon, Jun 27, 2011 at 4:55 AM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
>>>> On 6/26/2011 11:14 PM, Marcus Pereira wrote:
>>>>> On 27-06-2011 00:33, Stan Hoeppner wrote:
>>>>>> I recommend 3 changes, one of which I previously mentioned:
>>>>>> 1.  Use 8 mirror pairs instead of 4
>>>>>> 2.  Don't use striping.  Make an mdraid --linear device of the 8 mirrors
>>>>>> 3.  Format with '-d agcount=32' which will give you 4 AGs per spindle
>>>>>> Test this configuration and post your results.
>>>>> Thanks for all the advice. I will run the tests and post the results;
>>>>> it may take some time.
>>>>> About all the other messages: my system may not be a Ferrari but it's
>>>>> not a Volks. I certainly do not have that many HDs on fiber channel, but
>>>>> the server is a dual core Xeon 6 cores with HT. Linux sees a total of 24
>>>>> cores, total RAM is 24GB. The HDs are all SAS 15Krpm and the system runs
>>>>> on SSD. They are dedicated to handle the maildir files and I have
>>>>> several of those servers running nicely.
>>>>> But I don’t want to make the thread about my system larger.
>>>> So you do or don't have the excessive head seek problem you previously
>>>> mentioned?  If not then use the mkfs.xfs defaults.
>>>>> Yes, I don’t know much about XFS and allocation groups; thank you
>>>>> all for helping me a bit.
>>>> You're welcome.  Google should turn up a decent amount of information
>>>> about XFS allocation groups if you're interested in further reading.
>>>>> In the end, the reason I opened the thread is the error, and the
>>>>> developers should take some care about that.
>>>>> OK, there is no reason to use that high an agcount, but giving a
>>>>> "mkfs.xfs: pwrite64 failed: No space left on device" error still seems
>>>>> like a bug to me.
>>>> The definition of a software bug stipulates incorrect or unexpected
>>>> program behavior.  Error messages aren't bugs unless the wrong error
>>>> message is returned for a given fault condition, or no error is returned
>>>> when one should be.
>>>> Are you stipulating that the above isn't the correct error message for
>>>> the fault condition?  Or do you simply not understand the error message?
>>>>  If the latter, maybe you should simply ask what that error means before
>>>> saying the error message is a bug. :)
>>>> --
>>>> Stan
>>>> _______________________________________________
>>>> xfs mailing list
>>>> xfs@xxxxxxxxxxx
>>>> http://oss.sgi.com/mailman/listinfo/xfs
