On Tue, 2002-06-18 at 13:04, Steve Lord wrote:
> On Tue, 2002-06-18 at 12:46, Luciano Chavez wrote:
> > Hello,
> >
> > Recently on the EVMS mailing list, we had a gentleman report a problem
> > using Linux XFS 1.1 on a RAID5 storage object (Linux MD compatibility
> > storage object). See
> > http://sourceforge.net/mailarchive/forum.php?thread_id=799287&forum_id=2003
> > for the initial post.
> >
> > After some research I found that moving the internal log to another
> > device worked around the problem.
> >
> > In short, the problem appears to be related to 4K I/O requests
> > arriving at devices that are sensitive to alignment, such as striped LVs
> > or MD devices (specifically when these unaligned I/O requests cross a
> > boundary such as the end of a chunk). This problem should also manifest
> > itself on non-striped entities such as fragmented LVs, where a PE may get
> > an unaligned I/O request that spans into a PE corresponding to a
> > different LV.
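
The boundary condition described above can be sketched in a few lines of C. This is only an illustration, not EVMS or MD code: the function name and its byte-based parameters are hypothetical, and the chunk size is whatever the underlying device uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when a request of `len` bytes that
 * starts at byte offset `start` touches more than one chunk of `chunk`
 * bytes. Offsets are relative to the start of the striped volume.
 */
static bool io_crosses_chunk(uint64_t start, uint64_t len, uint64_t chunk)
{
    assert(chunk > 0);
    if (len == 0)
        return false;                            /* empty request touches nothing */
    uint64_t first = start / chunk;              /* chunk holding the first byte */
    uint64_t last = (start + len - 1) / chunk;   /* chunk holding the last byte */
    return first != last;
}
```

Assuming a 32768-byte chunk for illustration, a 4096-byte request at offset 28672 stays inside one chunk, while the same request at offset 30720 (512-byte aligned but not 4K aligned) spills into the next one.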
> >
> > Also, the problem manifested itself most easily with striped devices. I
> > found explanations under man mkfs.xfs of some options specific to
> > striping so I experimented. Below is the output of the several mkfs.xfs
> > attempts on a RAID 5 storage object composed of 6 partitions.
> >
> > root@gunslinger ~ # mkfs.xfs -f /dev/evms/md/md0
> > meta-data=/dev/evms/md/md0       isize=256    agcount=8, agsize=31360 blks
> > data     =                       bsize=4096   blocks=250880, imaxpct=25
> >          =                       sunit=0      swidth=0 blks, unwritten=0
> > naming   =version 2              bsize=4096
> > log      =internal log           bsize=4096   blocks=1200
> > realtime =none                   extsz=65536  blocks=0, rtextents=0
> > root@gunslinger ~ # mkfs.xfs -f /dev/evms/md/md0 -d sunit=8,swidth=40
> > meta-data=/dev/evms/md/md0       isize=256    agcount=8, agsize=31360 blks
> > data     =                       bsize=4096   blocks=250880, imaxpct=25
> >          =                       sunit=1      swidth=5 blks, unwritten=0
> > naming   =version 2              bsize=4096
> > log      =internal log           bsize=4096   blocks=1200
> > realtime =none                   extsz=65536  blocks=0, rtextents=0
> >
> > root@gunslinger ~ # mkfs.xfs -f /dev/evms/md/md0 -d su=32768,sw=5
> > meta-data=/dev/evms/md/md0       isize=256    agcount=8, agsize=31360 blks
> > data     =                       bsize=4096   blocks=250880, imaxpct=25
> >          =                       sunit=8      swidth=40 blks, unwritten=0
> > naming   =version 2              bsize=4096
> > log      =internal log           bsize=4096   blocks=1200
> > realtime =none                   extsz=65536  blocks=0, rtextents=0
> >
> > None of these helped. Not even specifying the same option on the mount.
> > I still ended up with unaligned I/O coming in and crossing chunksize
> > stripe boundaries essentially corrupting data. I also tried the mkfs.xfs
> > options set to sunit=64,swidth=320 which produced sunit=8 and swidth=40
> > on output and still didn't help.
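
The differences between the values given on the command line and the values mkfs.xfs prints are unit conversions. The mkfs.xfs man page gives -d sunit and swidth in 512-byte units, su in bytes, and sw as a multiple of the stripe unit, while the output is in filesystem blocks (bsize=4096 here). A small sketch of that arithmetic, with hypothetical helper names that are not taken from xfsprogs:

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BYTES 512u   /* unit used by -d sunit= and -d swidth= */

/* Convert a -d sunit/swidth value (512-byte units) to filesystem blocks. */
static uint32_t sectors_to_fsblocks(uint32_t sectors, uint32_t bsize)
{
    assert(bsize >= SECTOR_BYTES && bsize % SECTOR_BYTES == 0);
    return sectors / (bsize / SECTOR_BYTES);
}

/* Convert a -d su value (bytes) to filesystem blocks. */
static uint32_t su_bytes_to_fsblocks(uint32_t su_bytes, uint32_t bsize)
{
    assert(bsize != 0);
    return su_bytes / bsize;
}

/* Stripe width in filesystem blocks from -d sw (a multiple of the stripe unit). */
static uint32_t sw_to_fsblocks(uint32_t su_blocks, uint32_t sw)
{
    return su_blocks * sw;
}
```

With bsize=4096 this reproduces the outputs shown above: sunit=8,swidth=40 becomes 1 and 5 blocks, su=32768,sw=5 becomes 8 and 40 blocks, and sunit=64,swidth=320 also becomes 8 and 40 blocks.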
> >
> > I noticed that the xfsprogs libdisk source files test the device to
> > see if it is an MD or LV striped device, in order to automatically set
> > the sunit and swidth values in the superblock and provide proper
> > alignment, on log I/O for example. From my attempts to isolate this,
> > there must also be a mount-time check somewhere that decides whether to
> > use these values, because even when I format the device correctly and
> > mount it with these options using the EVMS MD plug-in, they don't seem
> > to be honored.
> >
> > I would appreciate any help the XFS developers could offer in allowing
> > XFS to work on top of block devices sensitive to alignment under Linux.
> >
> > Please cross-post any responses to the evms-devel@xxxxxxxxxxxxxxxxxxxxx
> > so that others not subscribed to the linux-xfs list can see them.
> >
> > We (EVMS) will offer any assistance we can as we would like to see
> > customers using XFS and EVMS together seamlessly and happily on their
> > enterprise systems.
> >
> > --
> > regards,
> >
> > Luciano Chavez
> >
> > lnx1138@xxxxxxxxxx
> > http://evms.sourceforge.net
>
> Hi,
>
> The answer to this problem is sitting on my workstation right now, and
> I am trying to decide if pushing it out into the world just before I
> leave for OLS followed by a week's vacation is a good idea or not.
>
> The stripe alignment code in xfs does not apply to the log. The log is
> written in chunks of up to 32K, which can be any multiple of 512 bytes
> and can start on any 512 byte boundary. The only 'safe' way now to make
> this work with volumes where that can end up crossing device boundaries
> is to do all the I/O in 512 byte buffer heads, which as you are probably
> aware is not the best thing in the world to do from a cpu and memory
> usage standpoint. This is why moving the log to a different device made
> the problem go away.
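
To put a rough number on the cost mentioned above, the following sketch counts how many fixed-size buffers are needed to cover one log write. The function name is hypothetical and this is plain arithmetic, not the pagebuf code.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-size buffers needed to cover `len` bytes, rounding up. */
static uint32_t buffers_needed(uint32_t len, uint32_t buf_bytes)
{
    assert(buf_bytes != 0);
    return (len + buf_bytes - 1) / buf_bytes;
}
```

A full 32K log write takes 64 buffers of 512 bytes, compared with 8 buffers of 4096 bytes, which is where the extra cpu and memory usage comes from.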
>
> A quick way to check whether this will fix things for EVMS is to take this
> code in fs/xfs/pagebuf/page_buf.c:
>
>     if ((MAJOR(dev) != LVM_BLK_MAJOR) && (MAJOR(dev) != MD_MAJOR)) {
>         sector = blk_length << SECTOR_SHIFT;
>         blk_length = 1;
>     } else if ((MAJOR(dev) == MD_MAJOR) && (pg_offset == 0) &&
>                (pg_length == PAGE_CACHE_SIZE) &&
>                (((unsigned int) bn) & BN_ALIGN_MASK) == 0) {
>         sector = blk_length << SECTOR_SHIFT;
>         blk_length = 1;
>     } else {
>         sector = SECTOR_SIZE;
>     }
>
> and replace it with:
>
> sector = SECTOR_SIZE;
>
> ------------------------
>
Steve,
Thank you much for the speedy reply! My page_buf.c didn't quite look
like yours (I assume this was the _pagebuf_page_io routine). I made the
following change to my version of the source and it now appears to be
working.
    int concat_ok = 0; /* <---- I initialized this to zero */
    /*
    if ((MAJOR(dev) != LVM_BLK_MAJOR) && (MAJOR(dev) != MD_MAJOR)) {
        concat_ok = 1;
    } else if ((MAJOR(dev) == MD_MAJOR) && (pg_offset == 0) &&
               (pg_length == PAGE_CACHE_SIZE) &&
               ((bn & ((page_buf_daddr_t)(PAGE_CACHE_SIZE - 1) >> 9)) == 0)) {
        concat_ok = 1;
    } else {
        concat_ok = 0;
    }
    */
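
For readers following along, the mask test in the commented-out MD branch can be tried in user space. This is a sketch under two assumptions: that bn counts 512-byte sectors (which the shift by 9 suggests) and that the page size is 4096 bytes. The macro and function names below are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BYTES 4096u        /* assumed page size (PAGE_CACHE_SIZE in the kernel) */
#define SECTOR_SHIFT_BITS 9u    /* 2^9 = 512-byte sectors */

/* True when sector number `bn` falls on a page boundary. */
static bool bn_is_page_aligned(uint64_t bn)
{
    /* (4096 - 1) >> 9 == 7, so the low three bits of bn must be zero. */
    uint64_t mask = (uint64_t)(PAGE_BYTES - 1) >> SECTOR_SHIFT_BITS;
    return (bn & mask) == 0;
}
```

Under those assumptions, sectors 0, 8 and 16 pass the test, while a sector such as 12 (512-byte aligned but in the middle of a page) does not.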
Will the recommended code change be the permanent fix?
> The code I have sitting here introduces a new log format in xfs which can
> be aligned on different boundaries. It introduces new mkfs options:
>
> -l version=2,sunit=xxxx
>
> Log writes then become aligned on and padded to the stripe unit specified,
> 4K is enough in most cases. You can also do larger logwrites with this code,
> but that is not the issue here.
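
The "aligned on and padded to the stripe unit" behaviour described above amounts to rounding each log write up to the next multiple of the log stripe unit. A minimal sketch of that rounding, assuming byte units and using a hypothetical function name (this is not the XFS log code):

```c
#include <assert.h>
#include <stdint.h>

/* Round a log write of `len` bytes up to the next multiple of `sunit` bytes. */
static uint32_t pad_log_write(uint32_t len, uint32_t sunit)
{
    assert(sunit != 0);
    uint32_t rem = len % sunit;
    return rem == 0 ? len : len + (sunit - rem);
}
```

With a 4096-byte stripe unit, a 512-byte write is padded to 4096 bytes and a 4608-byte write to 8192 bytes, so a write that also starts on a stripe unit boundary ends on one.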
>
What about non-striped devices? How are they aligned now?
> Steve
>
> p.s. LVM2 has hit exactly the same problem.
>
>
> Steve Lord voice: +1-651-683-3511
> Principal Engineer, Filesystem Software email: lord@xxxxxxx
>
--
regards,
Luciano Chavez
lnx1138@xxxxxxxxxx
http://evms.sourceforge.net