On Thu, 2003-08-28 at 14:41, Foris, Jim (MED) wrote:
> Eric Sandeen wrote:
> > On Thu, 28 Aug 2003, Kai Leibrandt wrote:
> >
> >
> >>That's just what I was thinking; is rpm only an indication that other
> >>apps might have issues as well? If so, how do we identify them and
> >>rectify the problems? In the kernel, or in the app?
> >
> >
> > That's not clear to me yet, but we have done some O_DIRECT stress testing
> > and it's all been fine. So this doesn't seem to be a problem with
> > O_DIRECT in general, which makes me think it might be the app.
> >
>
> Using "strace" on a RH 2.4.20-20.9.XFS1.3.0 system to follow what "rpm" does
> during an install, the key difference seems to be the following sequence:
>
> WORKS (created an EXT3 partition, copied /var/lib/rpm/* to it, then mounted
> it at /var/lib/rpm)
Is this ext2 or ext3?
ext2 will turn off O_DIRECT after the open call.
ext3 was supposed to; Eric has a new patch to fix that.
>
> 4217 access("/var/lib/rpm", W_OK) = 0 <0.000011>
> 4217 access("/var/lib/rpm/__db.001", F_OK) = -1 ENOENT (No such file or directory) <0.000011>
> 4217 access("/var/lib/rpm/Packages", F_OK) = 0 <0.000011>
> 4217 stat64("/var/lib/rpm/DB_CONFIG", 0xbffeeb60) = -1 ENOENT (No such file or directory) <0.000019>
> 4217 brk(0) = 0x807e000 <0.000006>
> 4217 brk(0x807f000) = 0x807f000 <0.000008>
> 4217 open("/var/lib/rpm/DB_CONFIG", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) <0.000011>
> 4217 stat64("/var/lib/rpm/__db.001", 0xbffeeb90) = -1 ENOENT (No such file or directory) <0.000010>
> 4217 open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_EXCL|O_DIRECT|O_LARGEFILE, 0644) = 4 <0.000044>
> 4217 fcntl64(4, F_SETFD, FD_CLOEXEC) = 0 <0.000007>
> 4217 open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_DIRECT|O_LARGEFILE, 0644) = 5 <0.000011>
> 4217 fcntl64(5, F_SETFD, FD_CLOEXEC) = 0 <0.000006>
> 4217 _llseek(5, 0, [0], SEEK_END) = 0 <0.000006>
> 4217 _llseek(5, 8192, [8192], SEEK_CUR) = 0 <0.000007>
> 4217 write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192 <0.000137>
> 4217 mmap2(NULL, 16384, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = 0x40019000 <0.000011>
> 4217 close(5) = 0 <0.000007>
>
>
> FAILS (/var/lib/rpm resides on a XFS partition)
XFS does not turn off O_DIRECT ... that is the point of this.
It appears db4 is breaking some O_DIRECT rule, which is causing it
to fail.
fsx with directio support runs just fine on this kernel on an XFS
partition, doing all sorts of strange things to a file,
including creating lots of holes.
>
> 4144 access("/var/lib/rpm/__db.001", F_OK) = -1 ENOENT (No such file or directory) <0.000010>
> 4144 access("/var/lib/rpm/Packages", F_OK) = 0 <0.000011>
> 4144 stat64("/var/lib/rpm/DB_CONFIG", 0xbffef0e0) = -1 ENOENT (No such file or directory) <0.000010>
> 4144 brk(0) = 0x807e000 <0.000006>
> 4144 brk(0x807f000) = 0x807f000 <0.000008>
> 4144 open("/var/lib/rpm/DB_CONFIG", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) <0.000012>
> 4144 stat64("/var/lib/rpm/__db.001", 0xbffef110) = -1 ENOENT (No such file or directory) <0.000010>
> 4144 open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_EXCL|O_DIRECT|O_LARGEFILE, 0644) = 4 <0.000103>
> 4144 fcntl64(4, F_SETFD, FD_CLOEXEC) = 0 <0.000006>
> 4144 open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_DIRECT|O_LARGEFILE, 0644) = 5 <0.000012>
> 4144 fcntl64(5, F_SETFD, FD_CLOEXEC) = 0 <0.000006>
> 4144 _llseek(5, 0, [0], SEEK_END) = 0 <0.000006>
> 4144 _llseek(5, 8192, [8192], SEEK_CUR) = 0 <0.000006>
> 4144 write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = -1 EINVAL (Invalid argument) <0.000007>
> 4144 open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000016>
> 4144 open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000011>
> 4144 open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000013>
> 4144 open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000011>
> 4144 open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000010>
> 4144 open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000013>
> 4144 write(2, "rpmdb: ", 7) = 7 <0.000017>
> 4144 write(2, "write: 0xbffed120, 8192: Invalid"..., 41) = 41 <0.000012>
> 4144 write(2, "\n", 1) = 1 <0.000012>
> 4144 close(5) = 0 <0.000008>
>
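One thing that stands out in the failing trace is the rpmdb error text,
"write: 0xbffed120, 8192". If 0xbffed120 is the address of the write buffer
(an assumption on my part; it looks like a stack address), it is not a
multiple of 512, and O_DIRECT transfers generally expect the buffer address,
file offset, and length to be aligned. The Python sketch below is not taken
from rpm or db4: violates_alignment and try_direct_write are hypothetical
helper names, and the 512-byte and 4096-byte alignments are assumed values,
since the real requirement depends on the filesystem and device. It checks
the traced values and replays the open/seek/write sequence with a
page-aligned buffer.

```python
import mmap
import os

ALIGN = 4096  # assumed alignment; the real value depends on fs and device


def violates_alignment(buf_addr, offset, length, align=ALIGN):
    """Name each O_DIRECT parameter that is not a multiple of align."""
    checks = (("buffer", buf_addr), ("offset", offset), ("length", length))
    return [name for name, value in checks if value % align]


def try_direct_write(path, offset=8192, length=8192):
    """Replay the traced open/seek/write using a page-aligned buffer.

    Returns the number of bytes written, or a negative errno on failure.
    """
    # O_DIRECT is missing on some platforms; fall back to a plain open there.
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_DIRECT", 0)
    try:
        fd = os.open(path, flags, 0o644)
    except OSError as e:
        return -e.errno
    try:
        # Anonymous mappings start on a page boundary and are zero-filled,
        # which matches the zero-filled 8192-byte write in the trace.
        with mmap.mmap(-1, length) as buf:
            # On a new, empty file this lands at the same offset as the
            # traced SEEK_END followed by SEEK_CUR 8192, leaving a hole.
            os.lseek(fd, offset, os.SEEK_SET)
            return os.write(fd, buf)
    except OSError as e:
        return -e.errno
    finally:
        os.close(fd)


# Values from the failing trace, checked against an assumed 512-byte rule.
print(violates_alignment(0xbffed120, 8192, 8192, align=512))  # ['buffer']
```

try_direct_write returns a negative errno instead of raising, so it can be
pointed at a scratch file on both the ext3 and the XFS mount for comparison.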
>
> From the RPM 4.2 source, the file "__db.001" contains database environment
> information and is also used to synchronize between multiple
> threads/processes. But the details of how/why "rpm" uses this file are not
> as significant as the different behavior shown in the example above: there
> is a difference in behavior between XFS and EXT3 in how sparse files are
> created/handled. From the above example it looks like XFS+O_DIRECT+sparse
> file creation is broken/not supported.
>
> (Things work if LD_ASSUME_KERNEL is set because then "rpm" uses a different
> method to control its database accesses, so it never runs through the
> offending code above. Although it finds that __db.001 is not there, it does
> not try to create it.)
>
> Does this ring any bells with anyone?
>
> On the bright side, it DOES look like it is an application-specific
> combination of factors that causes the failure, so the problem is not
> likely to be widely seen.
>
> Jim Foris
>
> > -Eric
> >
> >
>