On Thu, Oct 06, 2011 at 07:17:52AM -0500, Alex Elder wrote:
> On Thu, 2011-10-06 at 12:01 +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> >
> > Fix a potential prefetch read problem due to the first loop
> > execution of pf_batch_read potentially not initialising the fsbno
> > varaible:
> >
> > ==10177== Thread 6:
> > ==10177== Conditional jump or move depends on uninitialised value(s)
> > ==10177== at 0x8079CAB: pf_batch_read (prefetch.c:408)
> > ==10177== by 0x6A2996D: clone (clone.S:130)
> > ==10177==
> >
> > Fix a bunch of invalid read/write errors due to excessive blkmap
> > allocations when inode forks are corrupted. These show up some time
> > after making a blkmap allocation for 536870913 extents on i686,
> > which is followed some time later by a crash caused bymemory
> > corruption.
> >
> > This blkmap allocation size overflows 32 bits in such a
> > way that it results in a 32 byte allocation and so access to the
> > second extent results in access beyond the allocated memory and
> > corrupts random memory.
> >
> > ==5419== Invalid write of size 4
> > ==5419== at 0x80507DA: blkmap_set_ext (bmap.c:260)
> > ==5419== by 0x8055CF4: process_bmbt_reclist_int (dinode.c:712)
> > ==5419== by 0x8056206: process_bmbt_reclist (dinode.c:813)
> > ==5419== by 0x80579DA: process_exinode (dinode.c:1324)
> > ==5419== by 0x8059B77: process_dinode_int (dinode.c:2036)
> > ==5419== by 0x805ABE6: process_dinode (dinode.c:2823)
> > ==5419== by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
> > ==5419== by 0x8054012: process_aginodes (dino_chunks.c:1024)
> > ==5419== by 0xFFF: ???
> > ==5419== Address 0x944cfb8 is 0 bytes after a block of size 32 alloc'd
> > ==5419== at 0x48E1102: realloc (in
> > /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
> > ==5419== by 0x80501F3: blkmap_alloc (bmap.c:56)
> > ==5419== by 0x80599F5: process_dinode_int (dinode.c:2027)
> > ==5419== by 0x805ABE6: process_dinode (dinode.c:2823)
> > ==5419== by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
> > ==5419== by 0x8054012: process_aginodes (dino_chunks.c:1024)
> > ==5419== by 0xFFF: ???
> >
> > Add overflow detection code into the blkmap allocation code to avoid
> > this problem, and also free large allocations once they are finished
> > with to avoid pinning large amounts of memory due to the occasional
> > large extent list in a filesystem.
> >
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
>
> This is good but I have a few comments below, a couple of
> which really indicate you need to update this.
>
> -Alex
>
> > ---
> > repair/bmap.c | 37 ++++++++++++++++++++++++++++++++++++-
> > repair/prefetch.c | 2 +-
> > 2 files changed, 37 insertions(+), 2 deletions(-)
> >
> > diff --git a/repair/bmap.c b/repair/bmap.c
> > index 79b9f79..1127a87 100644
> > --- a/repair/bmap.c
> > +++ b/repair/bmap.c
> > @@ -47,6 +47,17 @@ blkmap_alloc(
> > if (nex < 1)
> > nex = 1;
> >
> > +#if (BITS_PER_LONG != 64)
>
> This should be == 32, not != 64.
OK.
> (And if it
> were possible, sizeof (int) == 32.)
The BITS_PER_LONG are derived from the output of the configure
process, and that's what the other code uses as well. So I'm just
being consistent ;).
> > + if (nex > (INT_MAX / sizeof(bmap_ext_t) - 1)) {
>
> See the comment below about this calculation.
>
> > + do_warn(
> > + _("Number of extents requested in blkmap_alloc (%u) overflows 32
> > bits.\n"
> > + "If this is not a corruption, then will need a 64 bit system\n"
> ...then you will need...
>
> > + "to repair this filesystem.\n"),
> > + nex);
> > + return NULL;
> > + }
> > +#endif
> > +
> > key = whichfork ? ablkmap_key : dblkmap_key;
> > blkmap = pthread_getspecific(key);
> > if (!blkmap || blkmap->naexts < nex) {
>
> . . .
>
> > @@ -218,6 +244,15 @@ blkmap_grow(
> > }
> >
> > blkmap->naexts += 4;
>
> The check needs to go *before* you update naexts.
It's in the right place - adding 4 to the unchecked extent count can
push it over the limit, so we have to check it after adding 4 to it.
> > +#if (BITS_PER_LONG != 64)
> > + if (blkmap->naexts > (INT_MAX / sizeof(bmap_ext_t) - 1)) {
>
> I don't really follow this calculation. I would expect
> it to be based more closely on how BLKMAP_SIZE() is
> defined.
>
> If you move it before the increment I think it would
> be better to use:
> if (BLKMAP_SIZE(nex) >= INT_MAX - sizeof (blkent_t *))
We can't use BLKMAP_SIZE(), because it will overflow 32 bits.
Indeed, use of BLKMAP_SIZE() on unchecked extent counts is
*precisely* the cause of the memory corruption this fix addresses.
In more detail:
typedef struct blkmap {
int naexts;
int nexts;
bmap_ext_t exts[1];
} blkmap_t;
#define BLKMAP_SIZE(n) \
(offsetof(blkmap_t, exts) + (sizeof(bmap_ext_t) * (n)))
sizeof()/offsetof() on 32 bit platforms return a 32 bit number,
naexts is a 32 bit number, so BLKMAP_SIZE() will overflow if
blkmap->naexts >= INT_MAX / sizeof(bmap_ext_t).
IOWs, we can't use BLKMAP_SIZE to detect an overflow, because it
is the code that overflows...
> And since this would expose the internals of what
> BLKMAP_SIZE() does, it might be nicer if some sort of
> BLKMAP_NENTS_MAX constant were defined next to the
> definition of BLKMAP_SIZE().
I can add a BLKMAP_NENTS_MAX constant to repair/bmap.h.
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
|