xfs
[Top] [All Lists]

Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches

To: Russell Cattelan <cattelan@xxxxxxxxxxx>, Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: [PATCH] (and bad attr2 bug) - pack xfs_sb_t for 64-bit arches
From: Tim Shimmin <tes@xxxxxxx>
Date: Mon, 27 Nov 2006 23:50:16 +1100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <4565DC6A.9080602@thebarn.com>
Organization: SGI
References: <455CB54F.8080901@sandeen.net> <455CE1E3.7020703@sandeen.net> <45612621.5010404@sandeen.net> <45627A4D.3020502@sandeen.net> <1164157336.19915.43.camel@xenon.msp.redhat.com> <5A1AC29043EE33BEB778198A@timothy-shimmins-power-mac-g5.local> <45647042.2040604@sandeen.net> <1164212695.19915.65.camel@xenon.msp.redhat.com> <45647CF8.8020104@sandeen.net> <26F2AE58A7D40E5170649BC2@timothy-shimmins-power-mac-g5.local> <4565DC6A.9080602@thebarn.com>
Reply-to: tes@xxxxxxx
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Thunderbird 1.5.0.8 (Macintosh/20061025)
Hi Russell & Eric,

Basing on Russell's patch, I was thinking of something like the
attached patch.
However, I'm wondering if the xfs_attr_set_int() change to use
a req_size for non-fitting shortform EA's is worth it - as it is a bit
of a prediction (trying to codify what I was thinking).
Russell, perhaps I should just send in sf_size like you initially intended.
In fact, the more I think about it, I'm more inclined to just pass
in sf_size.

Haven't tested the patch out yet. Just wanted to discuss it a bit.

Cheers,
Tim.

Russell Cattelan wrote:
Timothy Shimmin wrote:

Hi Guys,

So just looking at the first part, which as Eric suggested can be considered
on its own.


Index: work_gfs/fs/xfs/xfs_attr.c
===================================================================
--- work_gfs.orig/fs/xfs/xfs_attr.c 2006-11-21 18:38:27.572949303 -0600
+++ work_gfs/fs/xfs/xfs_attr.c 2006-11-21 18:44:51.666033422 -0600
@@ -210,8 +210,20 @@ xfs_attr_set_int(xfs_inode_t *dp, const
* (inode must not be locked when we call this routine)
*/
if (XFS_IFORK_Q(dp) == 0) {
- if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
- return(error);
+ if ((dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) ||
+ ((dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS) &&
+ (dp->i_d.di_anextents == 0))) {
+ /* xfs_bmap_add_attrfork will set the forkoffset based on
+ * the size needed, the local attr case needs the size
+ * attr plus the size of the hdr, if the size of
+ * header is not accounted for initially the forkoffset
+ * won't allow enough space, the actually attr add will
+ * then be forced out out line to extents
+ */
+ size += sizeof(xfs_attr_sf_hdr_t);
+ if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
+ return(error);
+ }
}


--On 22 November 2006 10:38:16 AM -0600 Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:

By fixing the initial size calculation at least things like SElinux
which is adding one attr won't cause the attr segment to flip to extents
immediately.
The second attr will cause the flip but not the first one.


I'd say this part (fixing up proper space for the initial attr fork setup) should probably go in
soon if it gets good reviews (with the removal of the extra tests, as we discussed on irc last
night). I think this proper change stands on its own just fine.



So yeah, as you said in IRC, the brace is in the wrong spot and
the di_aformat tests don't make any sense here.
Basically, we know that fork offset is zero and therefore that the di_aformat should be
set at XFS_DINODE_FMT_EXTENTS and di_anetents will be zero.
As this is the state before we add in an attribute fork.
Why we have this initial state as extents, I'm not too sure and wondered in the
past. Maybe because this state is one which doesn't occupy any space in the literal area.
A shortform EA has a header at least.


My next concern is that the size that is calculated is presumably trying to accomodate
the shortform EA. However, the calculation is for the sf header and the space for a
a xfs_attr_leaf_name_local with given namelen and valuelen.
It would be better to base it on an xfs_attr_sf_entry type.
So I think we need to rework this calculation.


Which leads me on to the next issue.
We don't know what EA form we are going to take,
so we can't really assume that it will be shortform.
If the EA name or value is big then the EA will go into extents and could occupy very
little room in the inode.
With the current & proposed test this could make the bytesfit function return 0
(the offset calculated in bytesfit could also go negative) and
then we would set the forkoff back at the old attr1 default.
So we might have 1 EA extent in the inode taking little space and yet setting the forkoff
in the middle.

Yes I agree worst cast scenario is that the inode has reverted to an attr1 split and
that space is being wasted in the attr portion. By the time an inode has flipped to
btree mode for di_u how much of a performance hit is really going to be noticed?
mapping the blocks for that inode is going to take multiple reads.
Attr2 seems most effective at space optimization for the local and extent
versions of di_u and probably not so much for btree.


At least by fixing the size calculation the shortforms that do fit into di_a will now
be added inline. What is happening now the btree is being re factored
which is probably expensive and the attr is being added as extents since the
original size used for the btree refactoring wasn't enough.


So the change to add in the header size will at least make single case
attrs more efficient since they will now be inline.
If the attr does not fit inline then worst case the forkoff flips to attr1
default  or a half and half split.

Given the cost of refactoring a btree it might be better to have attr1 behavior?
Since di_a will have extra space additional attr adds won't cause forkoff to
move and thus won't cause a rebalance of di_u.


So in thinking about this more does it make sense to actually not
try to optimize the space needed for di_u when it is a btree?
Maybe the first time an attr is added simply split the inode
space doing the rebalance once?
That would allow for more attrs to be added without rebalancing
the data btree.
The other scheme of space optimzation if the root btree node
is sparse would say sure give more space to di_a but at
the expense of a reblanace.

So ya it's a bit of a guessing game.


Of course the setting of the forkoff is a bit of a guessing game since we can't
predict the future usage but I think the plan is to set it to the minimum to fit
on a first come first served basis.


So I'm thinking that we should set it based on the size of shortform if that
is how it will be stored or to the size taken up by the EA extents -
I was initially thinking that this would be 1 extent but with a remote value
block of up to 64K this could in theory be an extent for each fsb of the value
I guess.
Have to think about this some more.


--Tim


--- .pc/attr2_tes/fs/xfs/xfs_attr.c     2006-10-26 17:45:01.000000000 +1000
+++ fs/xfs/xfs_attr.c   2006-11-27 22:58:49.753629073 +1100
@@ -210,7 +210,19 @@ xfs_attr_set_int(xfs_inode_t *dp, const 
         * (inode must not be locked when we call this routine)
         */
        if (XFS_IFORK_Q(dp) == 0) {
-               if ((error = xfs_bmap_add_attrfork(dp, size, rsvd)))
+               int req_size;
+               int sf_size = sizeof(xfs_attr_sf_hdr_t) + 
XFS_ATTR_SF_ENTSIZE_BYNAME(namelen, valuelen);
+
+               if (local && (sf_size <= (XFS_LITINO(mp) - 
xfs_ifork_dsize_used(dp))))
+                       req_size = sf_size;
+               else
+                       /*
+                        * We can't fit our SF EA inline, so leave space for 2 
EA extents
+                        * which should cover most initial EAs and most EAs in 
general
+                        */
+                       req_size = 2 * sizeof(xfs_bmbt_rec_t);
+
+               if ((error = xfs_bmap_add_attrfork(dp, req_size, rsvd)))
                        return(error);
        }
 
--- .pc/attr2_tes/fs/xfs/xfs_attr_leaf.c        2006-10-26 17:45:01.000000000 
+1000
+++ fs/xfs/xfs_attr_leaf.c      2006-11-27 22:59:07.295306537 +1100
@@ -170,18 +170,25 @@ xfs_attr_shortform_bytesfit(xfs_inode_t 
        }
 
        /* data fork btree root can have at least this many key/ptr pairs */
-       minforkoff = MAX(dp->i_df.if_bytes, XFS_BMDR_SPACE_CALC(MINDBTPTRS));
+       minforkoff = MAX(xfs_ifork_dsize_used(dp), 
XFS_BMDR_SPACE_CALC(MINDBTPTRS));
        minforkoff = roundup(minforkoff, 8) >> 3;
 
        /* attr fork btree root can have at least this many key/ptr pairs */
        maxforkoff = XFS_LITINO(mp) - XFS_BMDR_SPACE_CALC(MINABTPTRS);
        maxforkoff = maxforkoff >> 3;   /* rounded down */
 
-       if (offset >= minforkoff && offset < maxforkoff)
-               return offset;
+       /* we can't fit inline */
+       if (offset < minforkoff)
+               return 0;
+
+       /* don't move the forkoff for data btree */
+       if (dp->i_d.di_format == XFS_DINODE_FMT_BTREE && dp->i_d.di_forkoff)
+               return dp->i_d.di_forkoff << 3;
+
        if (offset >= maxforkoff)
                return maxforkoff;
-       return 0;
+       else
+               return offset;
 }
 
 /*
--- .pc/attr2_tes/fs/xfs/xfs_bmap.c     2006-11-17 14:35:46.000000000 +1100
+++ fs/xfs/xfs_bmap.c   2006-11-27 15:54:33.166590715 +1100
@@ -3543,6 +3543,7 @@ xfs_bmap_forkoff_reset(
        if (whichfork == XFS_ATTR_FORK &&
            (ip->i_d.di_format != XFS_DINODE_FMT_DEV) &&
            (ip->i_d.di_format != XFS_DINODE_FMT_UUID) &&
+           (ip->i_d.di_format != XFS_DINODE_FMT_BTREE) &&
            ((mp->m_attroffset >> 3) > ip->i_d.di_forkoff)) {
                ip->i_d.di_forkoff = mp->m_attroffset >> 3;
                ip->i_df.if_ext_max = XFS_IFORK_DSIZE(ip) /
--- .pc/attr2_tes/fs/xfs/xfs_inode.c    2006-11-27 23:20:19.000000000 +1100
+++ fs/xfs/xfs_inode.c  2006-11-27 23:21:29.604958540 +1100
@@ -4747,3 +4747,34 @@ xfs_iext_irec_update_extoffs(
                ifp->if_u1.if_ext_irec[i].er_extoff += ext_diff;
        }
 }
+
+/*
+ * return how much space is used by the inode's data fork
+ */
+int
+xfs_ifork_dsize_used(xfs_inode_t *ip)
+{
+       switch (ip->i_d.di_format) {
+       case XFS_DINODE_FMT_DEV:
+               return sizeof(xfs_dev_t);
+       case XFS_DINODE_FMT_UUID:
+               return sizeof(uuid_t);
+       case XFS_DINODE_FMT_LOCAL:
+       case XFS_DINODE_FMT_EXTENTS:
+               return ip->i_df.if_bytes;
+       case XFS_DINODE_FMT_BTREE:
+               if (ip->i_d.di_forkoff)
+                       return ip->i_d.di_forkoff << 3;
+               else
+                       /*
+                        * For new attr fork, data btree takes all the space,
+                        * so no room for any attrs with the current layout
+                        * but we can know how much space it really needs
+                        * i.e. the ptrs are half way along but we could 
compress to
+                        * preserve the num of records.
+                        */
+                       return 
XFS_BMDR_SPACE_CALC(XFS_BMAP_BROOT_NUMRECS(ip->i_df.if_broot));
+       default:
+               return 0;
+       }
+}
--- .pc/attr2_tes/fs/xfs/xfs_inode.h    2006-11-17 14:35:46.000000000 +1100
+++ fs/xfs/xfs_inode.h  2006-11-27 23:24:09.376554289 +1100
@@ -535,6 +535,7 @@ void                xfs_iext_irec_compact(xfs_ifork_t 
 void           xfs_iext_irec_compact_pages(xfs_ifork_t *);
 void           xfs_iext_irec_compact_full(xfs_ifork_t *);
 void           xfs_iext_irec_update_extoffs(xfs_ifork_t *, int, int);
+int            xfs_ifork_dsize_used(xfs_inode_t *);
 
 #define xfs_ipincount(ip)      ((unsigned int) atomic_read(&ip->i_pincount))
 
<Prev in Thread] Current Thread [Next in Thread>