
xfs open questions

To: xfs@xxxxxxxxxxx
Subject: xfs open questions
From: Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx>
Date: Tue, 27 Jan 2009 09:28:23 +0100
Organization: it-management http://it-management.at
User-agent: KMail/1.10.3 (Linux/2.6.27.10-ZMI; KDE/4.1.3; x86_64; ; )
Dear list,

I'm new here; I'm an experienced admin trying to understand XFS
correctly. I've read
http://xfs.org/index.php/XFS_Status_Updates
http://oss.sgi.com/projects/xfs/training/index.html
http://en.wikipedia.org/wiki/Xfs
and still have some XFS questions, which I guess should also be in the
FAQ, because they were the first questions I had when trying XFS. I
hope this is the correct list for them, and that this very long first
mail isn't too intrusive:

- Stripe Alignment
It's very nice that the FS understands what it runs on, and that you
can optimize for it. But the documentation on how to do that correctly
is incomplete.
http://oss.sgi.com/projects/xfs/training/xfs_slides_04_mkfs.pdf
On page 5 there is an example of an "8+1 RAID". Does that mean "9 disks
in RAID-5"? So 8 are data and 1 is parity, and for XFS only the data
disks matter?
If so, would an 8-disk RAID 6 (2 parity, 6 data) and an 8-disk RAID-50
(again 2 parity, 6 data) be treated the same?
Let's say I have a 64k stripe size on the RAID controller, with the
8-disk RAID 6 above. So best performance would be
mkfs -d su=64k,sw=$((64*6))k
is that correct? It would be good to have clearer documentation with
more examples.
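
As a rough illustration of the arithmetic behind this question, here is
a small Python sketch. It assumes, following the mkfs.xfs man page, that
su is the per-disk stripe unit (the controller's chunk size) and that sw
is a multiplier counting data disks, not a size in kilobytes. The helper
name xfs_stripe_opts is hypothetical, and the RAID-50 case assumes two
4-disk RAID-5 legs, which the slides do not spell out.

```python
def xfs_stripe_opts(total_disks, parity_disks, chunk_kib):
    """Build the 'su=...,sw=...' part of 'mkfs.xfs -d' (hypothetical helper).

    Assumption: sw is the number of data-bearing disks, i.e. a
    multiplier of su, as described in the mkfs.xfs man page.
    """
    data_disks = total_disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    return f"su={chunk_kib}k,sw={data_disks}"


print(xfs_stripe_opts(9, 1, 64))  # the "8+1" RAID-5 from the slides
print(xfs_stripe_opts(8, 2, 64))  # 8-disk RAID 6, or RAID-50 with 2 legs
```

Under that assumption the 8-disk RAID 6 case would come out as
su=64k,sw=6 rather than sw=384k.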

- 64bit Inodes
On the allocator's slides 
http://oss.sgi.com/projects/xfs/training/xfs_slides_06_allocators.pdf
it says that if the volume is >1TB, 32bit inodes make the FS suffer,
and that 64bit inodes should be used. Is that feature safe to use? The
documentation says some backup tools can't handle 64bit inodes; are
there problems with other programs as well? Does the system fully
support 64bit inodes? I guess a 64bit Linux kernel is needed?
And if I have already created a FS >1TB with 32bit inodes, would it be
better to recreate it with 64bit inodes and then restore all data?
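
One way to see whether a tree already contains inode numbers that a
32bit-only tool could not represent is to scan it. The Python sketch
below is only an illustration: the function names are made up here, and
it assumes the problematic tools are those that store st_ino in 32 bits.

```python
import os

MAX_INO32 = 2**32 - 1  # largest inode number that fits in 32 bits


def needs_64bit_inode(ino):
    """True if the inode number does not fit in an unsigned 32-bit field."""
    return ino > MAX_INO32


def find_large_inodes(root):
    """Return (path, inode) pairs under root whose inode exceeds 32 bits."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                ino = os.lstat(path).st_ino
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if needs_64bit_inode(ino):
                hits.append((path, ino))
    return hits

# Example use (the mount point is a placeholder):
#   for path, ino in find_large_inodes("/mnt/data"):
#       print(ino, path)
```

An empty result only says that no such inode exists yet in that tree,
not that none will be allocated later.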

- Allocation Groups
When I create an XFS filesystem of 2TB, knowing it will grow as we
expand the RAID later, how do I optimize the AGs? If I start now with
agcount=16 and later expand the RAID by 1TB, giving 3TB instead of 2TB,
what happens to the agcount? Is it increased, or are the existing AGs
expanded so you still have 16 AGs? I guess that new AGs are created,
but this isn't documented anywhere.
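
The guess above can be put into numbers. The following Python sketch
assumes that the AG size is fixed at mkfs time and that growing the
filesystem appends new AGs of that same size; it ignores corner cases
such as a very small trailing AG. The function name and the 4k block
size are assumptions made for the example.

```python
def agcount_after_grow(old_blocks, old_agcount, new_blocks):
    """Estimate the AG count after a grow, assuming agsize never changes."""
    # AG size as fixed at mkfs time (ceiling division).
    agsize = -(-old_blocks // old_agcount)
    # New AGs are appended until the new size is covered.
    return -(-new_blocks // agsize)


BLOCK = 4096
TIB = 2**40
before = 2 * TIB // BLOCK   # 2TB filesystem in 4k blocks
after = 3 * TIB // BLOCK    # grown to 3TB

print(agcount_after_grow(before, 16, after))
```

With these assumptions, 16 AGs on 2TB would become 24 AGs on 3TB, each
still the original size.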

- mkfs warnings about stripe width multiples
For a RAID 5 with 4 disks, 2.4TB on LVM, I did:
# mkfs.xfs -f -L oriondata -b size=4096 -d su=65536,sw=3,agcount=40 -i attr=2 -l lazy-count=1,su=65536 /dev/p3u_data/data1
Warning: AG size is a multiple of stripe width.  This can cause
performance problems by aligning all AGs on the same disk.  To avoid
this, run mkfs with an AG size that is one stripe unit smaller, for
example 13762544.
meta-data=/dev/p3u_data/data1    isize=256    agcount=40, agsize=13762560 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=550502400, imaxpct=5
         =                       sunit=16     swidth=48 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

and so I did it again with
# mkfs.xfs -f -L oriondata -b size=4096 -d su=65536,sw=3,agsize=13762544b -i attr=2 -l lazy-count=1,su=65536 /dev/p3u_data/data1
meta-data=/dev/p3u_data/data1    isize=256    agcount=40, agsize=13762544 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=550501760, imaxpct=5
         =                       sunit=16     swidth=48 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

It would be good if mkfs correctly said "... run mkfs with an AG size
that is one stripe unit smaller, for example 13762544b". The "b" at the
end is very important; finding that out cost me a lot of searching in
the beginning.

Is there a limit on the number of AGs, theoretical and practical? Is
there a guideline for how many AGs to use? Does it depend on CPU cores,
the number of parallel users, spindles, or something else? Page 4 of
the mkfs docs (link above) says "too few or too many AG's should be
avoided", but what numbers are "few" and "many"?
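
The numbers in the warning from the first mkfs run can be checked by
hand. The Python sketch below only mirrors what the warning text
describes (it is not mkfs's actual code, and the function name is made
up); it takes the sunit and swidth values in filesystem blocks as
printed in the mkfs output above.

```python
def suggest_agsize(agsize_blks, sunit_blks, swidth_blks):
    """Return an AG size one stripe unit smaller if agsize is a multiple
    of the stripe width, else None (no adjustment needed)."""
    if agsize_blks % swidth_blks != 0:
        return None
    return agsize_blks - sunit_blks


# Values from the first mkfs run: agsize=13762560, sunit=16, swidth=48.
suggestion = suggest_agsize(13762560, 16, 48)
if suggestion is not None:
    # The trailing "b" tells mkfs.xfs the value is in filesystem blocks.
    print(f"agsize={suggestion}b")
```

The suggested size is no longer a multiple of the stripe width but is
still a multiple of the stripe unit, which matches the second mkfs run.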

- PostgreSQL
The PostgreSQL database creates a directory per DB. From the docs I read
that this puts all inodes within the same AG. But wouldn't it be better
for performance to have each table in a different AG? This could be
achieved manually, but I'd like to hear whether that's better or not.
Or are there other tweaks to remember when using PostgreSQL on XFS? This 
question was raised on the PostgreSQL admin list, and if there are good 
guidelines I'm happy to post them there.

Best regards, zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4

