Hi!
I already brought this up yesterday on #xfs@freenode, where it was suggested
that I post it to this mailing list. Here I go...
I'm running Debian unstable on my desktop and recently added a new RAID set
consisting of 3x 4 TB disks (namely Hitachi HDS724040ALE640). My partition
layout is:
Model: ATA Hitachi HDS72404 (scsi)
Disk /dev/sdd: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Number  Start   End     Size    File system  Name  Flags
 1      17.4kB  1018kB  1000kB                     bios_grub
 2      2097kB  212MB   210MB   ext3               raid
 3      212MB   1286MB  1074MB  xfs                raid
 4      1286MB  4001GB  4000GB                     raid
Partition #2 is intended as the /boot device (RAID1), partition #3 as a small
rescue partition or swap (RAID1), and partition #4 is used as the physical
volume for LVM (RAID5).
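For reference, the new array and the LV I later ran mkfs on were created
roughly like this (from memory, so the exact LV size may be off a bit):

mdadm --create /dev/md7 --level=5 --raid-devices=3 --chunk=512 \
      /dev/sde4 /dev/sdd4 /dev/sdf4   # 512 KiB chunk (mdadm's default)
pvcreate /dev/md7                     # whole RAID5 becomes the LVM PV
vgcreate lv /dev/md7                  # volume group "lv"
lvcreate -L 20G -n usr lv             # -> /dev/lv/usr (size approximate)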
muaddib:~# mdadm --detail /dev/md7
/dev/md7:
        Version : 1.2
  Creation Time : Fri Jun 22 22:47:15 2012
     Raid Level : raid5
     Array Size : 7811261440 (7449.40 GiB 7998.73 GB)
  Used Dev Size : 3905630720 (3724.70 GiB 3999.37 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Sat Jun 23 13:47:19 2012
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : muaddib:7  (local to host muaddib)
           UUID : 0be7f76d:90fe734e:ac190ee4:9b5f7f34
         Events : 20

    Number   Major   Minor   RaidDevice State
       0       8       68        0      active sync   /dev/sde4
       1       8       52        1      active sync   /dev/sdd4
       3       8       84        2      active sync   /dev/sdf4
So, a cat /proc/mdstat shows all of my RAID devices:
muaddib:~# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid5 sdf4[3] sdd4[1] sde4[0]
      7811261440 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

md6 : active raid1 sdd3[0] sdf3[2] sde3[1]
      1048564 blocks super 1.2 [3/3] [UUU]

md5 : active (auto-read-only) raid1 sdd2[0] sdf2[2] sde2[1]
      204788 blocks super 1.2 [3/3] [UUU]

md4 : active raid5 sdc6[0] sda6[2] sdb6[1]
      1938322304 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md3 : active (auto-read-only) raid1 sdc5[0] sda5[2] sdb5[1]
      1052160 blocks [3/3] [UUU]

md2 : active raid1 sdc3[0] sda3[2] sdb3[1]
      4192896 blocks [3/3] [UUU]

md1 : active (auto-read-only) raid1 sdc2[0] sda2[2] sdb2[1]
      2096384 blocks [3/3] [UUU]

md0 : active raid1 sdc1[0] sda1[2] sdb1[1]
      256896 blocks [3/3] [UUU]

unused devices: <none>
The RAID devices /dev/md0 to /dev/md4 are on my old 3x 1 TB Seagate disks.
Anyway, to finally get to the problem: when I try to create a filesystem on
the new RAID5, I get the following:
muaddib:~# mkfs.xfs /dev/lv/usr
log stripe unit (524288 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/lv/usr            isize=256    agcount=16, agsize=327552 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=5240832, imaxpct=25
         =                       sunit=128    swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
As you can see, I followed the "mkfs.xfs knows best, don't fiddle around with
options unless you know what you're doing!" advice. But apparently mkfs.xfs
wanted to use a log stripe unit of 512 KiB, most likely because that's the
chunk size of the underlying RAID device.
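If I understand it correctly, that 512 KiB is simply the I/O geometry the
RAID5 exports (and the LV passes through), which should be visible e.g. with
blockdev:

# minimum/optimal I/O size as exported by the block layer; I'd expect
# 524288 (= chunk size) and 1048576 (= chunk * 2 data disks) here:
blockdev --getiomin --getioopt /dev/md7
blockdev --getiomin --getioopt /dev/lv/usr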
The problem seems to be related to RAID5, because when I try to make a
filesystem on /dev/md6 (RAID1), there's no error message:
muaddib:~# mkfs.xfs /dev/md6
meta-data=/dev/md6               isize=256    agcount=8, agsize=32768 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=262141, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=1200, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Additional info:
I first bought just two of the 4 TB disks and ran them as a RAID1 for about
six weeks, already doing some tests (the 4 TB Hitachis were sold out in the
meantime, so the third disk had to wait). I can't remember seeing the log
stripe warning during those RAID1 tests.
So, the questions are:
- is this a bug somewhere in XFS, LVM or Linux's software RAID implementation?
- will performance suffer from the log stripe unit being adjusted down to just
  32 KiB? Some of my logical volumes will only store data, but one or two will
  see some write load as storage for BackupPC.
- would it be worth the effort to raise the log stripe unit to the 256 KiB
  maximum (see the example commands after this list)?
- or would it be better to run with an external log on the old 1 TB RAID?
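To make the last two options concrete, this is what I'd try (untested so far;
the external log device is just a placeholder name, I'd first have to create
a small LV for it on the old RAID):

# raise the log stripe unit to the documented 256 KiB maximum:
mkfs.xfs -f -l su=256k /dev/lv/usr

# or use an external log on the old 1 TB RAID:
mkfs.xfs -f -l logdev=/dev/lv_old/xfslog,size=128m /dev/lv/usr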
End note: the 4 TB disks are not yet "in production", so I can run tests with
both the RAID setup and mkfs.xfs. Reshaping the RAID will take up to 10 hours,
though...
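In case the suggestion turns out to be "change the chunk size": if I
understand mdadm correctly, that would be a reshape along these lines, hence
the ~10 hours:

# untested here; change the chunk size of the running array, e.g. down
# to 256 KiB (mdadm probably wants a backup file for this reshape):
mdadm --grow /dev/md7 --chunk=256 --backup-file=/root/md7-reshape.bak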
--
Ciao... // Fon: 0381-2744150
Ingo \X/ http://blog.windfluechter.net
gpg pubkey: http://www.juergensmann.de/ij_public_key.asc