
Re: XFS and I/O alignment



On Tue, 2002-06-18 at 12:46, Luciano Chavez wrote:
> Hello,
> 
> Recently on the EVMS mailing list, we had a gentleman report a problem
> using Linux XFS 1.1 on a RAID5 storage object (Linux MD compatibility
> storage object). See
> http://sourceforge.net/mailarchive/forum.php?thread_id=799287&forum_id=2003 for the initial post.
> 
> After some research I found that moving the internal log to another
> device worked around the problem.
> 
> In short, the problem appears to be related to 4K I/O requests
> arriving on devices sensitive to alignment, such as striped LVs
> or MD devices (specifically when these unaligned I/O requests cross
> a boundary such as a chunk boundary). This problem should also manifest
> itself on non-striped entities such as fragmented LVs, where a PE may get
> an unaligned I/O request that spans into a PE corresponding to a
> different LV.
> 
> Also, the problem manifested itself most easily with striped devices. I
> found explanations under man mkfs.xfs of some options specific to
> striping so I experimented. Below is the output of the several mkfs.xfs
> attempts on a RAID 5 storage object composed of 6 partitions.
>       
> root@gunslinger ~ # mkfs.xfs -f /dev/evms/md/md0
> meta-data=/dev/evms/md/md0       isize=256    agcount=8, agsize=31360 blks
> data     =                       bsize=4096   blocks=250880, imaxpct=25
>          =                       sunit=0      swidth=0 blks, unwritten=0
> naming   =version 2              bsize=4096  
> log      =internal log           bsize=4096   blocks=1200
> realtime =none                   extsz=65536  blocks=0, rtextents=0
> root@gunslinger ~ # mkfs.xfs -f /dev/evms/md/md0 -d sunit=8,swidth=40
> meta-data=/dev/evms/md/md0       isize=256    agcount=8, agsize=31360 blks
> data     =                       bsize=4096   blocks=250880, imaxpct=25
>          =                       sunit=1      swidth=5 blks, unwritten=0
> naming   =version 2              bsize=4096  
> log      =internal log           bsize=4096   blocks=1200
> realtime =none                   extsz=65536  blocks=0, rtextents=0
> 
> root@gunslinger ~ # mkfs.xfs -f /dev/evms/md/md0 -d su=32768,sw=5    
> meta-data=/dev/evms/md/md0       isize=256    agcount=8, agsize=31360 blks
> data     =                       bsize=4096   blocks=250880, imaxpct=25
>          =                       sunit=8      swidth=40 blks, unwritten=0
> naming   =version 2              bsize=4096  
> log      =internal log           bsize=4096   blocks=1200
> realtime =none                   extsz=65536  blocks=0, rtextents=0
> 
> None of these helped, not even specifying the same options at mount
> time. I still ended up with unaligned I/O coming in and crossing
> chunk-size stripe boundaries, essentially corrupting data. I also tried
> mkfs.xfs with sunit=64,swidth=320, which produced sunit=8 and swidth=40
> on output and still didn't help.
> 
> I noticed that the xfsprogs libdisk source files test the device to
> see if it is an MD or LV striped device, in order to automatically set
> the sunit and swidth values in the superblock and provide proper
> alignment on log I/O, for example. But from my attempts to isolate this,
> there must also be a mount-time check somewhere that determines whether
> to use these values, since when I format correctly and mount with these
> options using the EVMS MD plug-in, they don't seem to be honored.
> 
> I would appreciate any help the XFS developers could offer in allowing
> XFS to work on top of block devices sensitive to alignment under Linux.
> 
> Please cross-post any responses to the evms-devel@lists.sourceforge.net
> so that others not subscribed to the linux-xfs list can see them.
> 
> We (EVMS) will offer any assistance we can as we would like to see
> customers using XFS and EVMS together seamlessly and happily on their
> enterprise systems.
> 
> -- 
> regards,
> 
> Luciano Chavez
> 
> lnx1138@us.ibm.com          
> http://evms.sourceforge.net

Hi,

The answer to this problem is sitting on my workstation right now, and
I am trying to decide if pushing it out into the world just before I
leave for OLS followed by a week's vacation is a good idea or not.

The stripe alignment code in xfs does not apply to the log. The log is
written in chunks of up to 32K, which can be any multiple of 512 bytes and
can start on any 512-byte boundary. The only 'safe' way now to make this
work with volumes where a log write can end up crossing device boundaries
is to do all the I/O in 512-byte buffer heads, which, as you are probably
aware, is not the best thing in the world to do from a CPU and memory
usage standpoint. This is why moving the log to a different device made
the problem go away.
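
To make the failure condition concrete, here is a minimal sketch (not
the actual pagebuf code) of the test for whether a single request
straddles a chunk boundary. The function name crosses_chunk and the
32K chunk size used in the note below are assumptions for illustration
only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns true if the byte range [offset, offset + len) touches more
 * than one chunk of size chunk_bytes on the underlying device.
 */
bool crosses_chunk(uint64_t offset, uint64_t len, uint64_t chunk_bytes)
{
        assert(chunk_bytes > 0 && len > 0);
        /* Compare the chunk index of the first and last byte. */
        return offset / chunk_bytes != (offset + len - 1) / chunk_bytes;
}
```

With a 32K chunk, a 4K write starting 512 bytes before the end of a
chunk gives crosses_chunk(32768 - 512, 4096, 32768) == true, while a
512-byte request on a 512-byte boundary can never cross, as long as the
chunk size is itself a multiple of 512.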

A quick way to check whether this will fix things for EVMS is to take
this code in fs/xfs/pagebuf/page_buf.c:

        if ((MAJOR(dev) != LVM_BLK_MAJOR) && (MAJOR(dev) != MD_MAJOR)) {
                sector = blk_length << SECTOR_SHIFT;
                blk_length = 1;
        } else if ((MAJOR(dev) == MD_MAJOR) && (pg_offset == 0) &&
                   (pg_length == PAGE_CACHE_SIZE) &&
                   (((unsigned int) bn) & BN_ALIGN_MASK) == 0) {
                sector = blk_length << SECTOR_SHIFT;
                blk_length = 1;
        } else {
                sector = SECTOR_SIZE;
        }

and replace it with:

	sector = SECTOR_SIZE;

------------------------

The code I have sitting here introduces a new log format in xfs which can
be aligned on different boundaries. It introduces new mkfs options:

	-l version=2,sunit=xxxx

Log writes then become aligned on and padded to the stripe unit specified;
4K is enough in most cases. You can also do larger log writes with this code,
but that is not the issue here.
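
As an illustration of what "aligned on and padded to" means
arithmetically, a small sketch follows. It is not the new log code;
pad_to_unit and is_aligned are hypothetical helpers, and the 4K unit in
the note below is just the value mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: round len up to the next multiple of unit. */
uint64_t pad_to_unit(uint64_t len, uint64_t unit)
{
        assert(unit > 0);
        return (len + unit - 1) / unit * unit;
}

/* Hypothetical helper: true if offset sits on a unit boundary. */
bool is_aligned(uint64_t offset, uint64_t unit)
{
        assert(unit > 0);
        return offset % unit == 0;
}
```

For example, pad_to_unit(512, 4096) is 4096 and pad_to_unit(4608, 4096)
is 8192. A write that starts on a 4K boundary and is padded this way
splits into whole 4K pieces, none of which straddles a chunk boundary
provided the chunk size is a multiple of 4K.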

Steve

p.s. LVM2 has hit exactly the same problem.

-- 

Steve Lord                                      voice: +1-651-683-3511
Principal Engineer, Filesystem Software         email: lord@sgi.com