
Re: Patch 1300 & rpm issue with 1.3.0

To: Eric Sandeen <sandeen@xxxxxxx>
Subject: Re: Patch 1300 & rpm issue with 1.3.0
From: "Foris, Jim (MED)" <foris@xxxxxxxxxxxxxxxx>
Date: Thu, 28 Aug 2003 14:41:07 -0500
Cc: Kai Leibrandt <k_leibrandt@xxxxxxxxxxx>, "'Simon Matter'" <simon.matter@xxxxxxxxxxxxxxxx>, "'Axel Thimm'" <Axel.Thimm@xxxxxxxxxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <Pine.LNX.4.44.0308280914100.19961-100000@xxxxxxxxxxxxxxxxxxxxxx>
References: <Pine.LNX.4.44.0308280914100.19961-100000@xxxxxxxxxxxxxxxxxxxxxx>
Reply-to: "Foris, Jim (MED)" <james.foris@xxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030314
Eric Sandeen wrote:
> On Thu, 28 Aug 2003, Kai Leibrandt wrote:
>
>> That's just what I was thinking; is rpm only an indication that other
>> apps might have issues as well? If so, how do we identify them and
>> rectify the problems? In the kernel, or in the app?
>
> That's not clear to me yet, but we have done some O_DIRECT stress testing
> and it's all been fine.  So this doesn't seem to be a problem with
> O_DIRECT in general, which makes me think it might be the app.


I used "strace" on a RH 2.4.20-20.9.XFS1.3.0 system to follow what "rpm" does
during an install. The key difference seems to be in the following sequence:

WORKS (created an EXT3 partition, copied /var/lib/rpm/* to it, then mounted it at /var/lib/rpm)

4217  access("/var/lib/rpm", W_OK)      = 0 <0.000011>
4217 access("/var/lib/rpm/__db.001", F_OK) = -1 ENOENT (No such file or directory) <0.000011>
4217  access("/var/lib/rpm/Packages", F_OK) = 0 <0.000011>
4217 stat64("/var/lib/rpm/DB_CONFIG", 0xbffeeb60) = -1 ENOENT (No such file or directory) <0.000019>
4217  brk(0)                            = 0x807e000 <0.000006>
4217  brk(0x807f000)                    = 0x807f000 <0.000008>
4217 open("/var/lib/rpm/DB_CONFIG", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) <0.000011>
4217 stat64("/var/lib/rpm/__db.001", 0xbffeeb90) = -1 ENOENT (No such file or directory) <0.000010>
4217 open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_EXCL|O_DIRECT|O_LARGEFILE, 0644) = 4 <0.000044>
4217  fcntl64(4, F_SETFD, FD_CLOEXEC)   = 0 <0.000007>
4217 open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_DIRECT|O_LARGEFILE, 0644) = 5 <0.000011>
4217  fcntl64(5, F_SETFD, FD_CLOEXEC)   = 0 <0.000006>
4217  _llseek(5, 0, [0], SEEK_END)      = 0 <0.000006>
4217  _llseek(5, 8192, [8192], SEEK_CUR) = 0 <0.000007>
4217 write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192 <0.000137>
4217 mmap2(NULL, 16384, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = 0x40019000 <0.000011>
4217  close(5)                          = 0 <0.000007>


FAILS (/var/lib/rpm resides on an XFS partition)

4144 access("/var/lib/rpm/__db.001", F_OK) = -1 ENOENT (No such file or directory) <0.000010>
4144  access("/var/lib/rpm/Packages", F_OK) = 0 <0.000011>
4144 stat64("/var/lib/rpm/DB_CONFIG", 0xbffef0e0) = -1 ENOENT (No such file or directory) <0.000010>
4144  brk(0)                            = 0x807e000 <0.000006>
4144  brk(0x807f000)                    = 0x807f000 <0.000008>
4144 open("/var/lib/rpm/DB_CONFIG", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) <0.000012>
4144 stat64("/var/lib/rpm/__db.001", 0xbffef110) = -1 ENOENT (No such file or directory) <0.000010>
4144 open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_EXCL|O_DIRECT|O_LARGEFILE, 0644) = 4 <0.000103>
4144  fcntl64(4, F_SETFD, FD_CLOEXEC)   = 0 <0.000006>
4144 open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_DIRECT|O_LARGEFILE, 0644) = 5 <0.000012>
4144  fcntl64(5, F_SETFD, FD_CLOEXEC)   = 0 <0.000006>
4144  _llseek(5, 0, [0], SEEK_END)      = 0 <0.000006>
4144  _llseek(5, 8192, [8192], SEEK_CUR) = 0 <0.000006>
4144 write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = -1 EINVAL (Invalid argument) <0.000007>
4144 open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000016>
4144 open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000011>
4144 open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000013>
4144 open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000011>
4144 open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000010>
4144 open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000013>
4144  write(2, "rpmdb: ", 7)            = 7 <0.000017>
4144  write(2, "write: 0xbffed120, 8192: Invalid"..., 41) = 41 <0.000012>
4144  write(2, "\n", 1)                 = 1 <0.000012>
4144  close(5)                          = 0 <0.000008>


According to the RPM 4.2 source, the file "__db.001" contains database environment information and is also used to synchronize between multiple threads/processes. But the details of how/why "rpm" uses this file are not as significant as the different behavior shown in the traces above: the same open/seek/write sequence succeeds on EXT3, while on XFS the 8192-byte write at offset 8192 of the newly created file fails with EINVAL. That suggests a difference between XFS and EXT3 in how sparse files are created/handled. From the above example
it looks like XFS+O_DIRECT+sparse file creation is broken/not supported.
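
For anyone who wants to try this outside of "rpm", here is a minimal sketch that replays the same open / seek / write sequence against an arbitrary file. The function name, the scratch file names and the use of an anonymous mmap as the write buffer are my own choices for illustration, not anything taken from rpm or Berkeley DB. Note one deliberate difference from the trace: the buffer here is page-aligned, whereas rpm wrote from 0xbffed120. O_DIRECT commonly comes with alignment restrictions (see open(2)), so using an aligned buffer keeps that factor out of the picture and exercises only the "write past a hole" part. Treat it as a starting point for experiments, not as a confirmed reproducer.

```python
import mmap
import os
import tempfile

BLOCK = 8192  # size of both the seek and the write in the traces above


def replay_sequence(path, use_direct=True):
    """Replay open / seek to end / seek +8192 / write 8192 zero bytes.

    Returns 0 if the write succeeds, otherwise the errno of the failing call.
    """
    flags = os.O_RDWR | os.O_CREAT
    if use_direct:
        # O_DIRECT is Linux-specific; fall back to 0 where it is missing.
        flags |= getattr(os, "O_DIRECT", 0)

    # Anonymous mmap: zero-filled and page-aligned. This is an assumption of
    # the sketch, NOT what rpm did (its buffer lived at 0xbffed120).
    buf = mmap.mmap(-1, BLOCK)
    try:
        fd = os.open(path, flags, 0o644)
    except OSError as exc:
        buf.close()
        return exc.errno
    try:
        os.lseek(fd, 0, os.SEEK_END)      # offset 0 on a fresh file
        os.lseek(fd, BLOCK, os.SEEK_CUR)  # skip 8192 bytes, leaving a hole
        os.write(fd, buf)                 # the call that got EINVAL on XFS
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)
        buf.close()


if __name__ == "__main__":
    # To test a specific filesystem, replace the temporary directory with a
    # directory that lives on it.
    with tempfile.TemporaryDirectory() as base:
        for direct in (False, True):
            target = os.path.join(base, "replay_%d.tmp" % direct)
            print("O_DIRECT=%s result=%s" % (direct, replay_sequence(target, direct)))
```

Running it once on an EXT3 directory and once on an XFS directory, with and without O_DIRECT, should show whether the hole alone is enough to trigger the error or whether something else in rpm's call is involved.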

(Things work if LD_ASSUME_KERNEL is set because then "rpm" uses a different method to control its database accesses, so it never runs through the offending code above. Although it finds that __db.001
is not there, it does not try to create it.)

Does this ring any bells with anyone?

On the bright side, it DOES look like an application-specific combination of factors causes
the failure, so the problem is not likely to be widely seen.

Jim Foris







-Eric



