Re: Patch 1300 & rpm issue with 1.3.0
Eric Sandeen wrote:
> On Thu, 28 Aug 2003, Kai Leibrandt wrote:
>
>
>>That's just what I was thinking; is rpm only an indication that other
>>apps might have issues as well? If so, how do we identify them and
>>rectify the problems? In the kernel, or in the app?
>
>
> That's not clear to me yet, but we have done some O_DIRECT stress testing
> and it's all been fine. So this doesn't seem to be a problem with
> O_DIRECT in general, which makes me think it might be the app.
>
Using "strace" on a RH 2.4.20-20.9.XFS1.3.0 system to follow what "rpm" does
during an install, the key difference seems to be the following sequence:
WORKS (created an EXT3 partition, copied /var/lib/rpm/* to it, then mounted it
at /var/lib/rpm)
4217 access("/var/lib/rpm", W_OK) = 0 <0.000011>
4217 access("/var/lib/rpm/__db.001", F_OK) = -1 ENOENT (No such file or
directory) <0.000011>
4217 access("/var/lib/rpm/Packages", F_OK) = 0 <0.000011>
4217 stat64("/var/lib/rpm/DB_CONFIG", 0xbffeeb60) = -1 ENOENT (No such file or
directory) <0.000019>
4217 brk(0) = 0x807e000 <0.000006>
4217 brk(0x807f000) = 0x807f000 <0.000008>
4217 open("/var/lib/rpm/DB_CONFIG", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such
file or directory) <0.000011>
4217 stat64("/var/lib/rpm/__db.001", 0xbffeeb90) = -1 ENOENT (No such file or
directory) <0.000010>
4217 open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_EXCL|O_DIRECT|O_LARGEFILE,
0644) = 4 <0.000044>
4217 fcntl64(4, F_SETFD, FD_CLOEXEC) = 0 <0.000007>
4217 open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_DIRECT|O_LARGEFILE, 0644) =
5 <0.000011>
4217 fcntl64(5, F_SETFD, FD_CLOEXEC) = 0 <0.000006>
4217 _llseek(5, 0, [0], SEEK_END) = 0 <0.000006>
4217 _llseek(5, 8192, [8192], SEEK_CUR) = 0 <0.000007>
4217 write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
8192) = 8192 <0.000137>
4217 mmap2(NULL, 16384, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = 0x40019000
<0.000011>
4217 close(5) = 0 <0.000007>
FAILS (/var/lib/rpm resides on an XFS partition)
4144 access("/var/lib/rpm/__db.001", F_OK) = -1 ENOENT (No such file or
directory) <0.000010>
4144 access("/var/lib/rpm/Packages", F_OK) = 0 <0.000011>
4144 stat64("/var/lib/rpm/DB_CONFIG", 0xbffef0e0) = -1 ENOENT (No such file or
directory) <0.000010>
4144 brk(0) = 0x807e000 <0.000006>
4144 brk(0x807f000) = 0x807f000 <0.000008>
4144 open("/var/lib/rpm/DB_CONFIG", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such
file or directory) <0.000012>
4144 stat64("/var/lib/rpm/__db.001", 0xbffef110) = -1 ENOENT (No such file or
directory) <0.000010>
4144 open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_EXCL|O_DIRECT|O_LARGEFILE,
0644) = 4 <0.000103>
4144 fcntl64(4, F_SETFD, FD_CLOEXEC) = 0 <0.000006>
4144 open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_DIRECT|O_LARGEFILE, 0644) =
5 <0.000012>
4144 fcntl64(5, F_SETFD, FD_CLOEXEC) = 0 <0.000006>
4144 _llseek(5, 0, [0], SEEK_END) = 0 <0.000006>
4144 _llseek(5, 8192, [8192], SEEK_CUR) = 0 <0.000006>
4144 write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
8192) = -1 EINVAL (Invalid argument) <0.000007>
4144 open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1
ENOENT (No such file or directory) <0.000016>
4144 open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1
ENOENT (No such file or directory) <0.000011>
4144 open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT
(No such file or directory) <0.000013>
4144 open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1
ENOENT (No such file or directory) <0.000011>
4144 open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1
ENOENT (No such file or directory) <0.000010>
4144 open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No
such file or directory) <0.000013>
4144 write(2, "rpmdb: ", 7) = 7 <0.000017>
4144 write(2, "write: 0xbffed120, 8192: Invalid"..., 41) = 41 <0.000012>
4144 write(2, "\n", 1) = 1 <0.000012>
4144 close(5) = 0 <0.000008>
From the RPM 4.2 source, the file "__db.001" contains database environment
information and is also used to synchronize between multiple threads/processes.
But the details of how/why "rpm" uses this file are not as significant as the
different behavior shown in the example above: XFS and EXT3 differ in how
sparse files are created/handled. From the above example it looks like
XFS+O_DIRECT+sparse file creation is broken/not supported.
(Things work if LD_ASSUME_KERNEL is set because then "rpm" uses a different
method to control its database accesses, so it never runs through the offending
code above. Although it finds that __db.001 is not there, it does not try to
create it.)
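For anyone who wants to try this outside of "rpm", here is a rough sketch of a
standalone test that replays the seek-past-EOF-then-write part of the trace on
a file opened with O_DIRECT. It is not taken from the rpm or Berkeley DB source;
the function names, the REPRO_MAIN macro and the 512-byte alignment value are
my own assumptions, and I have left out the first O_EXCL open and O_LARGEFILE
for brevity. One more thing it lets you rule out: the buffer address printed in
the rpmdb error message, 0xbffed120, is not a multiple of 512, and O_DIRECT
commonly expects sector-aligned buffers, so the sketch can run the same
sequence with an aligned and a deliberately misaligned buffer.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define REGION_SIZE   8192      /* seek distance and write size from the trace */
#define ASSUMED_ALIGN 512       /* assumption: sector size, not from the trace */

/* Return 1 if p is a multiple of align, else 0. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/*
 * Replay open / lseek(END) / lseek(+8192, CUR) / write(8192) on path.
 * misalign (0..ASSUMED_ALIGN) is added to an aligned buffer so both the
 * aligned and the unaligned case can be tried.
 * Returns 0 on success, otherwise the errno of the first failing call.
 */
int replay_sequence(const char *path, size_t misalign)
{
    void *base = NULL;
    ssize_t n;
    int fd, err = 0;

    if (misalign > ASSUMED_ALIGN)
        return EINVAL;

    /* Over-allocate so base + misalign still covers REGION_SIZE bytes. */
    if (posix_memalign(&base, ASSUMED_ALIGN, REGION_SIZE + ASSUMED_ALIGN) != 0)
        return ENOMEM;
    memset(base, 0, REGION_SIZE + ASSUMED_ALIGN);

    fd = open(path, O_RDWR | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) {
        err = errno;
        free(base);
        return err;
    }

    /* Seek to EOF, then another 8192 bytes past it, leaving a hole. */
    if (lseek(fd, 0, SEEK_END) == (off_t)-1 ||
        lseek(fd, REGION_SIZE, SEEK_CUR) == (off_t)-1) {
        err = errno;
    } else {
        n = write(fd, (char *)base + misalign, REGION_SIZE);
        if (n < 0)
            err = errno;        /* the trace shows EINVAL here on XFS */
        else if (n != REGION_SIZE)
            err = EIO;          /* short write, reported as a generic error */
    }

    close(fd);
    free(base);
    return err;
}

#ifdef REPRO_MAIN               /* hypothetical macro: build with -DREPRO_MAIN */
int main(int argc, char **argv)
{
    int r;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <scratch-file>\n", argv[0]);
        return 2;
    }
    unlink(argv[1]);
    r = replay_sequence(argv[1], 0);
    printf("aligned buffer:   %s\n", r ? strerror(r) : "ok");
    unlink(argv[1]);
    r = replay_sequence(argv[1], 0x120);  /* low bits of 0xbffed120 */
    printf("unaligned buffer: %s\n", r ? strerror(r) : "ok");
    unlink(argv[1]);
    return 0;
}
#endif
```

Built with -DREPRO_MAIN and pointed at a scratch file on each filesystem in
turn, this should show whether the EINVAL follows the hole in the file or the
buffer alignment. I have not run it on the kernel in question, so treat it as a
starting point rather than a verified reproducer.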
Does this ring any bells with anyone?
On the bright side, it DOES look like an application-specific combination of
factors causes the failure, so the problem is not likely to be widely seen.
Jim Foris
> -Eric
>
>