
To: Nikita Danilov <NikitaDanilov@xxxxxxxxx>
Subject: Re: [reiserfs-list] Re: benchmarks
From: Xuan Baldauf <xuan--reiserfs@xxxxxxxxxxx>
Date: Mon, 16 Jul 2001 22:15:48 +0200
Cc: Russell Coker <russell@xxxxxxxxxxxx>, Chris Wedgwood <cw@xxxxxxxx>, rsharpe@xxxxxxxxxx, Seth Mos <knuffie@xxxxxxxxx>, Federico Sevilla III <jijo@xxxxxxxxxxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx, reiserfs-list@xxxxxxxxxxx
References: <Pine.BSI.4.10.10107141752080.18419-100000@xxxxxxxxxxxxx> <3B5169E5.827BFED@xxxxxxxxxxx> <20010716210029.I11938@xxxxxxxxxxxxx> <20010716101313.2DC3E965@xxxxxxxxxxxxxxxxx> <3B52C49F.9FE1F503@xxxxxxxxxxx> <15186.51514.66966.458597@xxxxxxxxxxxxxxxx> <3B5341BA.1F68F755@xxxxxxxxxxx> <15187.18225.196286.123754@xxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx

Nikita Danilov wrote:

> Xuan Baldauf writes:
>  >
>  >
>  > Nikita Danilov wrote:
>  >
>  > > Hans Reiser writes:
>  > >  > Russell Coker wrote:
>  > >  > >
>  > >  > > On Mon, 16 Jul 2001 11:00, Chris Wedgwood wrote:
>  > >  > > > On Sun, Jul 15, 2001 at 02:01:09PM +0400, Hans Reiser wrote:
>  > >  > > >
>  > >  > > >     Making the server stateless is wrong
>  > >  > > >
>  > >  > > > why?
>  > >  > >
>  > >  > > Because it leads to all the problems we have seen!  Why not have
>  > >  > > the client have an open file handle (the way Samba works and the
>  > >  > > way the Unix file system API works)?  Then when the server goes
>  > >  > > down the client sends a request to open the file again...
>  > >
>  > > If you have 10000 clients each opening 100 files, you get 1e6 open
>  > > files on the server---it wouldn't work. NFS was designed to be
>  > > stateless in order to be scalable.
>  >
>  > Every existing file has at least one name (or is a member of the hidden
>  > to-be-deleted directory, and so has a name, too), and an object_id.
>  > Suppose the object_id is 32 bytes long. A virtual filedescriptor may be
>  > 4 bytes long, plus some housekeeping metadata of 28 bytes, so we would
>  > have 64MB occupied in your scenario. Where's the problem? 80% of those
>  > 64MB can be swapped out.
>
> For each open file you have:
>
>  struct file (96b)
>  struct inode (460b)
>  struct dentry (112b)
>
> at least. This totals 668MB of kernel memory, which is unpageable.

As I said below, the NFS server should be a user-space daemon.
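
(To make the arithmetic from both sides concrete, here is a rough
back-of-the-envelope sketch in C. The 64 bytes per virtual descriptor and
the 96/460/112-byte kernel structs are simply the figures quoted in this
thread, not measured values.)

#include <stdio.h>

int main(void)
{
    const long clients = 10000, files_per_client = 100;
    const long open_files = clients * files_per_client;   /* 1e6 open files */

    /* user-space daemon: 32B object_id + 4B descriptor + 28B housekeeping,
       largely swappable (the 80% estimate above) */
    const long vfd_bytes = 32 + 4 + 28;                    /* 64 bytes  */

    /* in-kernel state: struct file + struct inode + struct dentry */
    const long kernel_bytes = 96 + 460 + 112;              /* 668 bytes */

    /* "MB" here means 1e6 bytes, matching the 64MB/668MB figures above */
    printf("user-space state: ~%ld MB\n", open_files * vfd_bytes / 1000000);
    printf("kernel state:     ~%ld MB (unpageable)\n",
           open_files * kernel_bytes / 1000000);
    return 0;
}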

>
> All files are kept in several hash tables, and hash tables are known to
> degrade. Well, actually, I am afraid current Linux kernels cannot open
> 1e6 files.

This is something I cannot understand either.

>
>
>  >
>  > >
>  > >
>  > >  > >
>  > >  > > >     making the readdir a multioperation act is wrong
>  > >  > > >
>  > >  > > > why? I have 3M directories... are you saying clients should
>  > >  > > > read the whole thing at once?
>  > >  > >
>  > >  > > No.  findfirst()/findnext() is the correct way of doing this.
>  > >  > > Forcing the client to read through 3M directory entries to see
>  > >  > > if "foo.*" matches anything is just wrong.  The client should be
>  > >  > > able to ask for a list of file names matching certain criteria
>  > >  > > (time stamp, name, ownership, etc).  The findfirst() and
>  > >  > > findnext() APIs on DOS, OS/2, and Windows do this quite well.
>  > >  >
>  > >  > there is a fundamental conflict between having cookies, shrinkable
>  > >  > directories, and the ability to find foo.* without reading the
>  > >  > whole directory, all at the same time.
>  > >  >
>  > >  > NFS V4 is designed by braindead twerps incapable of layering
>  > >  > software when designing it.
>  > >
>  > > Just cannot stand it. You mean that NFS v4 features a database in a
>  > > kernel too? (It's our insider joke.)
>  >
>  > NFS should not be kernel-bound. The NFS server application mimics the
>  > applications on the NFS clients. If this is not possible, something is
>  > wrong.
>  >
>  > >
>  > >
>  > >  >
>  > >  > >
>  > >  > > If you have 3M directory entries then SMB should kick butt over NFS.
>  > >  > >
>  > >  > > Also, while we're at it, one of the worst things about NFS is
>  > >  > > the issue of delete.  Because it's stateless, NFS servers
>  > >  > > implement unlink as "mv" and things get nasty from there...
>  > >  > >
>  > >  > > --
>  > >  > > http://www.coker.com.au/bonnie++/     Bonnie++ hard drive benchmark
>  > >  > > http://www.coker.com.au/postal/       Postal SMTP/POP benchmark
>  > >  > > http://www.coker.com.au/projects.html Projects I am working on
>  > >  > > http://www.coker.com.au/~russell/     My home page
>  > >
>  > > Nikita.
>  >
>  > Being "stateless" is only a weak way to implement disconnected
>  > operation. If there was state (if the
>  > server could know that a client has a filedescriptor to a file), the
>  > client could be informed that it's
>  > virtual file descriptors to the files to be deleted are invalid
>  > now. This only fails if the network is
>  > down, so this is a disconnected operation problem.
>  >
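
(What I have in mind is roughly the following; all names and the
notification itself are invented for illustration, this is not part of any
existing NFS protocol.)

#include <stdint.h>
#include <stdio.h>

/* One entry per client that currently holds a virtual descriptor. */
struct holder {
    uint32_t client_id;      /* which client holds the descriptor        */
    uint32_t vfd;            /* the virtual file descriptor it was given */
    struct holder *next;
};

/* Server-side state for one open object. */
struct open_object {
    uint64_t object_id;      /* identity of the file on the server */
    struct holder *holders;  /* clients currently holding it open  */
};

/* Stand-in for sending "your vfd X is now invalid" to a client. */
static void notify_invalid(uint32_t client_id, uint32_t vfd)
{
    printf("-> client %u: vfd %u is now invalid\n",
           (unsigned)client_id, (unsigned)vfd);
}

/* Called by the server when the object is deleted. */
static void invalidate_holders(struct open_object *obj)
{
    for (struct holder *h = obj->holders; h; h = h->next)
        notify_invalid(h->client_id, h->vfd);
    obj->holders = NULL;     /* the state for this object goes away */
}

int main(void)
{
    struct holder h2 = { 7, 11, NULL };
    struct holder h1 = { 3, 42, &h2 };
    struct open_object obj = { 123456, &h1 };

    invalidate_holders(&obj);    /* e.g. the file was just unlinked */
    return 0;
}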
>  > By the way: if NFS were scalable, why doesn't it allow every handle
>  > from the server (like inode numbers, directory position cookies) to be
>  > of variable, dynamic, server-determined size? That would be scalable.
>  >
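(Purely as an illustration of what I mean by "server-determined size", not
something taken from any NFS specification: the handle could be an opaque,
length-prefixed blob that the client never interprets and only hands back
verbatim.)

#include <stdint.h>
#include <string.h>

#define HANDLE_MAX 128               /* upper bound fixed by the protocol */

/* Opaque, variable-size handle: only the server knows what is inside. */
struct opaque_handle {
    uint16_t len;                    /* chosen by the server, per handle */
    uint8_t  data[HANDLE_MAX];
};

/* The only operation a client ever needs: hand the blob back unchanged. */
static void handle_copy(struct opaque_handle *dst,
                        const struct opaque_handle *src)
{
    dst->len = src->len;
    memcpy(dst->data, src->data, src->len);
}

int main(void)
{
    struct opaque_handle a = { 8, { 1, 2, 3, 4, 5, 6, 7, 8 } }, b;
    handle_copy(&b, &a);
    return b.len == a.len ? 0 : 1;
}
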
>  > P.S.: Hans, how do you prevent object_id|inode reuse? Using a
>  > mount_id+generation counter per mount?
>
> There is a generation counter stored persistently in the super-block,
> incremented on each inode deletion.

Is the superblock always logged on inode deletion for reasons other than
the generation counter? If not, would the method above (a mount_id plus a
per-mount generation counter) be more efficient, since it would not require
logging the superblock?
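
(If I understand the scheme correctly, the handle would then carry
(object_id, generation), and the server rejects a handle whose generation
no longer matches the current incarnation of the inode. A minimal sketch,
with invented names:)

#include <stdint.h>
#include <stdio.h>

/* Handle handed out to clients: identity plus incarnation. */
struct handle {
    uint64_t object_id;      /* on-disk object / inode number            */
    uint32_t generation;     /* counter value when the handle was issued */
};

/* Just enough of an inode for this example. */
struct inode_stub {
    uint64_t object_id;
    uint32_t generation;     /* stamped when the inode was (re)allocated */
};

/* 0 = handle still valid, -1 = stale (object deleted, id possibly reused). */
static int handle_is_stale(const struct handle *h, const struct inode_stub *i)
{
    return (h->object_id == i->object_id &&
            h->generation == i->generation) ? 0 : -1;
}

int main(void)
{
    struct inode_stub ino = { 1000, 5 };   /* current incarnation      */
    struct handle old = { 1000, 4 };       /* issued before a deletion */
    struct handle cur = { 1000, 5 };

    printf("old handle: %s\n", handle_is_stale(&old, &ino) ? "stale" : "ok");
    printf("new handle: %s\n", handle_is_stale(&cur, &ino) ? "stale" : "ok");
    return 0;
}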

>
>
>  >
>
> Nikita.

