
Re: Patch 1300 & rpm issue with 1.3.0

To: "Foris, Jim (MED)" <james.foris@xxxxxxxxxx>
Subject: Re: Patch 1300 & rpm issue with 1.3.0
From: Russell Cattelan <cattelan@xxxxxxx>
Date: Thu, 28 Aug 2003 17:51:50 -0500
Cc: Eric Sandeen <sandeen@xxxxxxx>, Kai Leibrandt <k_leibrandt@xxxxxxxxxxx>, "'Simon Matter'" <simon.matter@xxxxxxxxxxxxxxxx>, "'Axel Thimm'" <Axel.Thimm@xxxxxxxxxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <3F4E5AD3.80101@xxxxxxxxxx>
References: <Pine.LNX.4.44.0308280914100.19961-100000@xxxxxxxxxxxxxxxxxxxxxx> <3F4E5AD3.80101@xxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
On Thu, 2003-08-28 at 14:41, Foris, Jim (MED) wrote:
> Eric Sandeen wrote:
> > On Thu, 28 Aug 2003, Kai Leibrandt wrote:
> > 
> > 
> >>That's just what I was thinking; is rpm only an indication that other
> >>apps might have issues as well? If so, how do we identify them and
> >>rectify the problems? In the kernel, or in the app?
> > 
> > 
> > That's not clear to me yet, but we have done some O_DIRECT stress testing
> > and it's all been fine.  So this doesn't seem to be a problem with
> > O_DIRECT in general, which makes me think it might be the app.
> > 
> 
> Using "strace" on a RH 2.4.20-20.9.XFS1.3.0 system to follow what "rpm" does
> during an install, the key difference seems to be the following sequence:
> 
> WORKS (created an EXT3 partition, copied /var/lib/rpm/* to it, then mounted
> it at /var/lib/rpm)
Is this ext2 or ext3?
ext2 will turn off O_DIRECT after the open call.
ext3 was supposed to; Eric has a new patch to fix that.

> 
> 4217  access("/var/lib/rpm", W_OK)      = 0 <0.000011>
> 4217  access("/var/lib/rpm/__db.001", F_OK) = -1 ENOENT (No such file or directory) <0.000011>
> 4217  access("/var/lib/rpm/Packages", F_OK) = 0 <0.000011>
> 4217  stat64("/var/lib/rpm/DB_CONFIG", 0xbffeeb60) = -1 ENOENT (No such file or directory) <0.000019>
> 4217  brk(0)                            = 0x807e000 <0.000006>
> 4217  brk(0x807f000)                    = 0x807f000 <0.000008>
> 4217  open("/var/lib/rpm/DB_CONFIG", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) <0.000011>
> 4217  stat64("/var/lib/rpm/__db.001", 0xbffeeb90) = -1 ENOENT (No such file or directory) <0.000010>
> 4217  open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_EXCL|O_DIRECT|O_LARGEFILE, 0644) = 4 <0.000044>
> 4217  fcntl64(4, F_SETFD, FD_CLOEXEC)   = 0 <0.000007>
> 4217  open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_DIRECT|O_LARGEFILE, 0644) = 5 <0.000011>
> 4217  fcntl64(5, F_SETFD, FD_CLOEXEC)   = 0 <0.000006>
> 4217  _llseek(5, 0, [0], SEEK_END)      = 0 <0.000006>
> 4217  _llseek(5, 8192, [8192], SEEK_CUR) = 0 <0.000007>
> 4217  write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192 <0.000137>
> 4217  mmap2(NULL, 16384, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = 0x40019000 <0.000011>
> 4217  close(5)                          = 0 <0.000007>
> 
> 
> FAILS (/var/lib/rpm resides on a XFS partition)
XFS does not turn off O_DIRECT ... that is the point of this.

It appears db4 is breaking some O_DIRECT rule that is causing it
to fail.
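
One O_DIRECT rule that can be checked straight from the trace is alignment:
direct I/O generally expects the user buffer address, the file offset and the
transfer length to be multiples of the underlying block size. The sketch below
is only an illustration under stated assumptions: it assumes a 512-byte
requirement (the real value depends on the filesystem and device), it assumes
the address printed in rpmdb's error message ("write: 0xbffed120, 8192") is
the write buffer, and the function name is made up for this example.

```python
def direct_io_misalignment(buf_addr, offset, length, align=512):
    """Return the names of the parameters that are not multiples of `align`.

    `align` is an assumed block size; the actual O_DIRECT requirement
    depends on the filesystem and the device underneath it.
    """
    params = {"buffer": buf_addr, "offset": offset, "length": length}
    return [name for name, value in params.items() if value % align]


# Values read off the failing trace: an 8192-byte write at offset 8192,
# with the buffer address taken from the rpmdb error message (assumption).
print(direct_io_misalignment(0xBFFED120, 8192, 8192))  # prints ['buffer']
```

If that reading of the error message is right, the offset and length are fine
and only the buffer address is suspect, which would fit an application-side
problem rather than a general O_DIRECT problem.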

fsx with directio support runs just fine on this kernel on an XFS
partition, doing all sorts of strange things to a file,
including creating lots of holes.
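
For anyone who wants to separate the sparse-file aspect from everything else,
here is a rough sketch that replays the open / seek past EOF / write sequence
from the rpm trace, but from a page-aligned buffer. It is not fsx and not what
db4 does internally; the file name and function name are hypothetical, it is
Linux-only, and it simply reports what the kernel says rather than assuming a
particular outcome.

```python
import errno
import mmap
import os

BLOCK = 8192  # seek offset and write size seen in the trace


def try_direct_write(directory):
    """Open with O_DIRECT, seek past EOF, write one block of zeros.

    Returns "ok", "no-o_direct" if O_DIRECT is unavailable or the open
    is refused, or the errno name if the seek or write fails.
    """
    o_direct = getattr(os, "O_DIRECT", None)  # absent on non-Linux platforms
    if o_direct is None:
        return "no-o_direct"
    path = os.path.join(directory, "__db.test")  # hypothetical file name
    try:
        fd = os.open(path, os.O_RDWR | os.O_CREAT | o_direct, 0o644)
    except OSError:
        if os.path.exists(path):  # some kernels create the file, then fail
            os.unlink(path)
        return "no-o_direct"
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping: page-aligned, zero-filled
    try:
        os.lseek(fd, BLOCK, os.SEEK_SET)  # leave a hole before the data
        os.write(fd, buf)
        return "ok"
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        buf.close()
        os.close(fd)
        os.unlink(path)
```

Calling `try_direct_write("/some/dir/on/xfs")` (path is a placeholder) and
comparing the result with the same call on another filesystem shows whether
an aligned write into a hole is accepted there.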

> 
> 4144  access("/var/lib/rpm/__db.001", F_OK) = -1 ENOENT (No such file or directory) <0.000010>
> 4144  access("/var/lib/rpm/Packages", F_OK) = 0 <0.000011>
> 4144  stat64("/var/lib/rpm/DB_CONFIG", 0xbffef0e0) = -1 ENOENT (No such file or directory) <0.000010>
> 4144  brk(0)                            = 0x807e000 <0.000006>
> 4144  brk(0x807f000)                    = 0x807f000 <0.000008>
> 4144  open("/var/lib/rpm/DB_CONFIG", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory) <0.000012>
> 4144  stat64("/var/lib/rpm/__db.001", 0xbffef110) = -1 ENOENT (No such file or directory) <0.000010>
> 4144  open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_EXCL|O_DIRECT|O_LARGEFILE, 0644) = 4 <0.000103>
> 4144  fcntl64(4, F_SETFD, FD_CLOEXEC)   = 0 <0.000006>
> 4144  open("/var/lib/rpm/__db.001", O_RDWR|O_CREAT|O_DIRECT|O_LARGEFILE, 0644) = 5 <0.000012>
> 4144  fcntl64(5, F_SETFD, FD_CLOEXEC)   = 0 <0.000006>
> 4144  _llseek(5, 0, [0], SEEK_END)      = 0 <0.000006>
> 4144  _llseek(5, 8192, [8192], SEEK_CUR) = 0 <0.000006>
> 4144  write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = -1 EINVAL (Invalid argument) <0.000007>
> 4144  open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000016>
> 4144  open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000011>
> 4144  open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000013>
> 4144  open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000011>
> 4144  open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000010>
> 4144  open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000013>
> 4144  write(2, "rpmdb: ", 7)            = 7 <0.000017>
> 4144  write(2, "write: 0xbffed120, 8192: Invalid"..., 41) = 41 <0.000012>
> 4144  write(2, "\n", 1)                 = 1 <0.000012>
> 4144  close(5)                          = 0 <0.000008>
> 
> 
> From the RPM 4.2 source, the file "__db.001" contains database environment
> information and is also used to synchronize between multiple
> threads/processes.  But the details of how/why "rpm" uses this file are not
> as significant as the different behavior shown in the example above: XFS and
> EXT3 differ in how sparse files are created/handled.  From the above example
> it looks like XFS+O_DIRECT+sparse file creation is broken/not supported.
> 
> (Things work if LD_ASSUME_KERNEL is set because then "rpm" uses a different
>    method to control its database accesses..... it never runs through the
>    above offending code.  Although it finds that __db.001 is not there, it
>    does not try to create it.)
> 
> Does this ring any bells with anyone ?
> 
> On the bright side, it DOES look like it is an application-specific
> combination of factors that cause the failure..... so the problem is not
> likely to be widely seen.
> 
> Jim Foris
> 
> 
> 
> 
> 
> 
> 
> > -Eric
> > 
> > 
> 

