Re: Patch 1300 & rpm issue with 1.3.0

To: Steve Lord <lord@xxxxxxx>
Subject: Re: Patch 1300 & rpm issue with 1.3.0
From: "Foris, Jim (MED)" <foris@xxxxxxxxxxxxxxxx>
Date: Fri, 29 Aug 2003 06:57:11 -0500
Cc: Russell Cattelan <cattelan@xxxxxxx>, "Foris, Jim (MED)" <foris@xxxxxxxxxxxxxxxx>, Eric Sandeen <sandeen@xxxxxxx>, Kai Leibrandt <k_leibrandt@xxxxxxxxxxx>, "'Simon Matter'" <simon.matter@xxxxxxxxxxxxxxxx>, "'Axel Thimm'" <Axel.Thimm@xxxxxxxxxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <1062115583.1695.25.camel@xxxxxxxxxxxxxxxxxxxxxxx>
References: <Pine.LNX.4.44.0308280914100.19961-100000@xxxxxxxxxxxxxxxxxxxxxx> <3F4E5AD3.80101@xxxxxxxxxx> <1062111109.4318.6.camel@naboo> <1062115583.1695.25.camel@xxxxxxxxxxxxxxxxxxxxxxx>
Reply-to: "Foris, Jim (MED)" <james.foris@xxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030314
Steve Lord wrote:
On Thu, 2003-08-28 at 17:51, Russell Cattelan wrote:

On Thu, 2003-08-28 at 14:41, Foris, Jim (MED) wrote:

Eric Sandeen wrote:

On Thu, 28 Aug 2003, Kai Leibrandt wrote:

That's just what I was thinking; is rpm only an indication that other
apps might have issues as well? If so, how do we identify them and
rectify the problems? In the kernel, or in the app?

That's not clear to me yet, but we have done some O_DIRECT stress testing
and it's all been fine.  So this doesn't seem to be a problem with
O_DIRECT in general, which makes me think it might be the app.

Using "strace" on a RH 2.4.20-20.9.XFS1.3.0 system to follow what "rpm" does
during an install, the key difference seems to be the following sequence:

WORKS (created an EXT3 partition, copied /var/lib/rpm/* to it, then mounted it at /var/lib/rpm)

Is this ext2 or ext3?
ext2 will turn off O_DIRECT after the open call.
ext3 was supposed to; Eric has a new patch to fix that.

As stated above, it was EXT3 (I forgot to mention that Eric's patch had
been applied :-) ).

This looks like memory alignment of the write buffer. The alignment of
the memory may be constrained differently, possibly ext3 is not doing
O_DIRECT so is not constraining I/O alignment. It would be good to see
the address of the buffer passed into the write call.

Turns out that information is in my original posting:

    4144  write(2, "write: 0xbffed120, 8192: Invalid"..., 41) = 41 <0.000012>

So the buffer address, 0xbffed120, is NOT correctly aligned.
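
To make the arithmetic explicit, here is a minimal sketch of the check in C.
The helper name is_aligned is my own, and the 512-byte figure is only an
assumed minimum; the alignment O_DIRECT really requires depends on the kernel
and the file system. The low bits of 0xbffed120 are 0x120 (288), so the
address is not a multiple of 512, and therefore not of any larger power of
two either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns nonzero if addr is a multiple of align.
 * align must be a nonzero power of two (for example 512 or 4096).
 */
int is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* For a power of two, the remainder is simply the low bits. */
    return (addr & (align - 1)) == 0;
}
```

Called as is_aligned(0xbffed120, 512), this returns 0 for the address in the
strace output above.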

AND THE MYSTERY IS SOLVED; RPM fails because the person who tried to use
O_DIRECT file access to an internal database file did not check for/guarantee
correct buffer address alignment.  This bug did not show up to Red Hat because
they never tested it (RPM) on a file system that actually supports O_DIRECT
(because they don't have any).

The options to solve the problem become clear:

1. Build the kernel w/o O_DIRECT support (leave in patch 1300).
2. Build a kernel with Eric's patch and always have "/var/lib/rpm" reside on a
    non-XFS partition.
3. Put LD_ASSUME_KERNEL into the environment when any "rpm" call is made.
4. Fix "rpm-4.2" (either by removing the ability to set O_DIRECT, or by adding
    the necessary buffer boundary checks).
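
For the second half of option 4, the usual approach is to hand write() only
buffers that were allocated on a suitable boundary. The following is a rough
sketch of that idea, not the actual rpm change: the names DIO_ALIGN,
alloc_dio_buffer and write_via_aligned_copy are made up for illustration, and
the 4096-byte alignment is an assumption. It uses only posix_memalign() and
write(), and it handles the buffer address only; O_DIRECT may also constrain
the transfer length and the file offset.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* ASSUMPTION: 4096 bytes is enough; the real value depends on kernel and fs. */
#define DIO_ALIGN 4096

/* Allocate len bytes on a DIO_ALIGN boundary; release with free(). */
void *alloc_dio_buffer(size_t len)
{
    void *buf = NULL;

    if (posix_memalign(&buf, DIO_ALIGN, len) != 0)
        return NULL;
    return buf;
}

/*
 * Write len bytes from src. If src is already aligned it is written
 * directly; otherwise the data is copied into an aligned bounce buffer
 * first. Returns what write() returns, or -1 if allocation fails.
 */
ssize_t write_via_aligned_copy(int fd, const void *src, size_t len)
{
    void *buf;
    ssize_t n;
    int saved_errno;

    if (((uintptr_t)src % DIO_ALIGN) == 0)
        return write(fd, src, len);

    buf = alloc_dio_buffer(len);
    if (buf == NULL) {
        errno = ENOMEM;
        return -1;
    }
    memcpy(buf, src, len);
    n = write(fd, buf, len);
    saved_errno = errno;   /* free() may modify errno */
    free(buf);
    errno = saved_errno;
    return n;
}
```

The bounce-buffer copy costs an extra memcpy per write; allocating the
database buffers with alloc_dio_buffer() in the first place would avoid it.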

Personally, I think I will probably patch "rpm-4.2" since that is where the bug is.

Thanks to everyone,

Jim Foris

(And, by the way, there is no use of O_DIRECT in the db-4 code.  This is a pure
 RPM bug).

