
Re: [reiserfs-list] Re: benchmarks

To: Xuan Baldauf <xuan--reiserfs@xxxxxxxxxxx>
Subject: Re: [reiserfs-list] Re: benchmarks
From: Nikita Danilov <NikitaDanilov@xxxxxxxxx>
Date: Mon, 16 Jul 2001 23:57:37 +0400
Cc: Russell Coker <russell@xxxxxxxxxxxx>, Chris Wedgwood <cw@xxxxxxxx>, rsharpe@xxxxxxxxxx, Seth Mos <knuffie@xxxxxxxxx>, Federico Sevilla III <jijo@xxxxxxxxxxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx, reiserfs-list@xxxxxxxxxxx
In-reply-to: <3B5341BA.1F68F755@baldauf.org>
References: <Pine.BSI.4.10.10107141752080.18419-100000@xs3.xs4all.nl> <3B5169E5.827BFED@namesys.com> <20010716210029.I11938@weta.f00f.org> <20010716101313.2DC3E965@lyta.coker.com.au> <3B52C49F.9FE1F503@namesys.com> <15186.51514.66966.458597@beta.namesys.com> <3B5341BA.1F68F755@baldauf.org>
Sender: owner-linux-xfs@xxxxxxxxxxx
Xuan Baldauf writes:
 > 
 > 
 > Nikita Danilov wrote:
 > 
 > > Hans Reiser writes:
 > >  > Russell Coker wrote:
 > >  > >
 > >  > > On Mon, 16 Jul 2001 11:00, Chris Wedgwood wrote:
 > >  > > > On Sun, Jul 15, 2001 at 02:01:09PM +0400, Hans Reiser wrote:
 > >  > > >
 > >  > > >     Making the server stateless is wrong
 > >  > > >
 > >  > > > why?
 > >  > >
 > >  > > Because it leads to all the problems we have seen!  Why not have
 > >  > > the client have an open file handle (the way Samba works and the
 > >  > > way the Unix file system API works)?  Then when the server goes
 > >  > > down the client sends a request to open the file again...
 > >
 > > If you have 10000 clients each opening 100 files, you get 1e6 open
 > > files on the server---it wouldn't work. NFS was designed to be
 > > stateless to be scalable.
 > 
 > Every existing file has at least one name (or is a member of the hidden
 > to-be-deleted directory, and so has a name, too), and an object_id.
 > Suppose the object_id is 32 bytes long. A virtual file descriptor may
 > be 4 bytes long, plus some housekeeping metadata of 28 bytes, so we
 > would have 64MB occupied in your scenario. Where's the problem? 80% of
 > those 64MB can be swapped out.

For each open file you have:

 struct file (96b)
 struct inode (460b)
 struct dentry (112b)

at least. That is 668 bytes per file, or 668M of kernel memory for 1e6
open files---all of it unpageable. All open files are kept in several
hash tables, and hash tables are known to degrade at that scale. Well,
actually, I am afraid current Linux kernels cannot open 1e6 files at all.

 > 
 > >
 > >
 > >  > >
 > >  > > >     making the readdir a multioperation act is wrong
 > >  > > >
 > >  > > > why? i have 3M directories... are you saying clients should
 > >  > > > read the whole thing at once?
 > >  > >
 > >  > > No.  findfirst()/findnext() is the correct way of doing this.
 > >  > > Forcing the client to read through 3M directory entries to see
 > >  > > if "foo.*" matches anything is just wrong.  The client should be
 > >  > > able to ask for a list of file names matching certain criteria
 > >  > > (time stamp, name, ownership, etc).  The findfirst() and
 > >  > > findnext() APIs on DOS, OS/2, and Windows do this quite well.
 > >  >
 > >  > there is a fundamental conflict between having cookies, shrinkable
 > >  > directories, and the ability to find foo.* without reading the
 > >  > whole directory, all at the same time.
 > >  >
 > >  > NFS V4 is designed by braindead twerps incapable of layering
 > >  > software when designing it.
 > >
 > > I just cannot stand it. You mean that NFS v4 features a database in
 > > the kernel too? (It's an insider joke of ours.)
 > 
 > NFS should not be kernel-bound. The NFS server application mimics the
 > applications on the NFS clients. If this is not possible, something is
 > wrong.
 > 
 > >
 > >
 > >  >
 > >  > >
 > >  > > If you have 3M directory entries then SMB should kick butt over NFS.
 > >  > >
 > >  > > Also, while we're at it, one of the worst things about NFS is
 > >  > > the issue of delete.  Because it's stateless, NFS servers
 > >  > > implement unlink as "mv" and things get nasty from there...
 > >  > >
 > >  > > --
 > >  > > http://www.coker.com.au/bonnie++/     Bonnie++ hard drive benchmark
 > >  > > http://www.coker.com.au/postal/       Postal SMTP/POP benchmark
 > >  > > http://www.coker.com.au/projects.html Projects I am working on
 > >  > > http://www.coker.com.au/~russell/     My home page
 > >
 > > Nikita.
 > 
 > Being "stateless" is only a weak way to implement disconnected
 > operation. If there were state (if the server could know that a client
 > has a file descriptor to a file), the client could be informed that its
 > virtual file descriptors to the deleted files are now invalid. This
 > only fails if the network is down, so this is a disconnected-operation
 > problem.
 >
 > By the way: if NFS were scalable, why doesn't it allow every handle
 > from the server (like inode numbers and directory position cookies) to
 > be of variable, dynamic, server-determined size? That would be
 > scalable.
 >
 > P.S.: Hans, how do you prevent object_id|inode reuse? Using a mount_id
 > plus a generation counter per mount?

There is a generation counter stored persistently in the super-block,
incremented on each inode deletion.

 > 

Nikita.

