
Re: Data type overflow in xfs_trans_unreserve_and_mod_sb

To: Shailendra Tripathi <stripathi@xxxxxxxxx>
Subject: Re: Data type overflow in xfs_trans_unreserve_and_mod_sb
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Mon, 25 Sep 2006 09:32:46 -0500
Cc: David Chinner <dgc@xxxxxxx>, xfs@xxxxxxxxxxx, Timothy Shimmin <tes@xxxxxxx>
In-reply-to: <45179573.3020007@agami.com>
References: <55EF1E5D5804A542A6CA37E446DDC206655888@mapibe17.exchange.xchg> <45179573.3020007@agami.com>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Thunderbird 1.5.0.7 (Macintosh/20060909)
Shailendra Tripathi wrote:
Hi David,
As part of fixing the xfs_reserve_blocks issue, you might want to fix an issue in xfs_trans_unreserve_and_mod_sb as well. Since I am on a much older version, my patch does not apply to newer trees. However, the patch is attached for your reference.


The problem is as below:

Superblock modifications required during a transaction are stored in delta fields in the transaction. These deltas are applied to the superblock when the transaction commits.

The in-core superblock changes are done in xfs_trans_unreserve_and_mod_sb. It calls the xfs_mod_incore_sb_batch function to apply the changes. This function tries to apply the deltas and, if it fails for any reason, backs out all the changes. A typical modification looks like this:

        case XFS_SBS_DBLOCKS:
                lcounter = (long long)mp->m_sb.sb_dblocks;
                lcounter += delta;
                if (lcounter < 0) {
                        ASSERT(0);
                        return (XFS_ERROR(EINVAL));
                }
                mp->m_sb.sb_dblocks = lcounter;
                return (0);

So, when it returns EINVAL, the second part of the code backs out the changes already made to the superblock. The worst part, however, is that xfs_trans_unreserve_and_mod_sb does not return any error value.

Hm, yep, just ASSERT(error == 0);

I suppose this is the trickiness of canceling a transaction at some points...

The transaction appears to be committed peacefully without returning the error. You don't notice this unless you do I/O on the filesystem. Later, it hits some sort of in-memory corruption or other errors.
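
To make the apply-then-back-out behaviour above concrete, here is a minimal user-space sketch in C. It is not the XFS code: the names sb_field, sb_counters, sb_mod, apply_one and apply_batch are hypothetical, and the counters are plain int64_t values rather than real superblock fields. The point it illustrates is that the batch function hands its error back to the caller, so the caller is able to propagate it instead of only asserting on it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the in-core superblock counters. */
enum sb_field { SB_DBLOCKS, SB_FDBLOCKS, SB_NFIELDS };

struct sb_counters {
    int64_t v[SB_NFIELDS];      /* invariant: never negative */
};

/* One pending change; the delta is explicitly 64 bits wide. */
struct sb_mod {
    enum sb_field field;
    int64_t delta;
};

/* Apply one delta; refuse to overflow or to drive a counter negative. */
static int apply_one(struct sb_counters *sb, enum sb_field f, int64_t delta)
{
    int64_t counter = sb->v[f];

    assert(counter >= 0);
    if (delta > 0 && counter > INT64_MAX - delta)
        return EINVAL;          /* sum would overflow */
    if (counter + delta < 0)
        return EINVAL;          /* counter would go negative */
    sb->v[f] = counter + delta;
    return 0;
}

/*
 * Apply every delta in order. On the first failure, undo the entries
 * already applied (in reverse order) and return the error to the caller.
 */
static int apply_batch(struct sb_counters *sb, const struct sb_mod *mods,
                       size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int err = apply_one(sb, mods[i].field, mods[i].delta);

        if (err) {
            while (i-- > 0)
                (void)apply_one(sb, mods[i].field, -mods[i].delta);
            return err;
        }
    }
    return 0;
}
```

A caller that checks the return value of apply_batch and passes it up can fail the transaction at that point, rather than continuing with counters that no longer reflect what was requested.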

We hit this issue in our testing when we tried to grow the filesystem from 100GB to 10000GB. The resulting block-count delta is beyond the signed integer (31-bit) limit and, hence, for dblocks and fdblocks, the xfs_mod_sb struct does not pass in correct data.



First thought: "long" won't help on 32-bit machines; perhaps this should be an explicitly-sized 64-bit type?
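
A rough sketch of the arithmetic, to show why the delta width matters. The 4 KiB block size and the reading of "GB" as GiB are assumptions (neither is stated in the report), and blocks_for_gib, grow_delta, fits_in_int32 and struct sb_delta64 are hypothetical names, not XFS ones. Under those assumptions, growing from 100GB to 10000GB adds 2,595,225,600 blocks, which is more than INT32_MAX (2,147,483,647), so a 32-bit signed delta cannot carry it, and neither can a "long" on an ABI where long is 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE 4096             /* ASSUMPTION: 4 KiB filesystem blocks */
#define GIB (INT64_C(1) << 30)      /* ASSUMPTION: "GB" means GiB */

/* Number of filesystem blocks in a size given in GiB. */
static int64_t blocks_for_gib(int64_t gib)
{
    return gib * GIB / BLOCK_SIZE;
}

/* Block-count delta for growing a filesystem from old_gib to new_gib. */
static int64_t grow_delta(int64_t old_gib, int64_t new_gib)
{
    return blocks_for_gib(new_gib) - blocks_for_gib(old_gib);
}

/* True if a delta can be carried in a signed 32-bit field without loss. */
static bool fits_in_int32(int64_t delta)
{
    return delta >= INT32_MIN && delta <= INT32_MAX;
}

/* Hypothetical delta record using an explicitly sized 64-bit field. */
struct sb_delta64 {
    int     field;
    int64_t delta;
};

/* int64_t is 8 bytes on every ABI, unlike long. */
static_assert(sizeof(((struct sb_delta64 *)0)->delta) == 8,
              "delta must be 64 bits wide");
```

With these definitions, fits_in_int32(grow_delta(100, 10000)) is false, while the same delta stored in struct sb_delta64 is preserved exactly on both 32-bit and 64-bit builds.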


-Eric

p.s. good to see agami's recently active participation on the list!

