Wouldn't GFS be a good idea then?
--
Austin Gonyou
Systems Architect, CCNA
Coremetrics, Inc.
Phone: 512-796-9023
email: austin@xxxxxxxxxxxxxxx
> -----Original Message-----
> From: Xuan Baldauf [mailto:xuan--reiserfs@xxxxxxxxxxx]
> Sent: Monday, July 16, 2001 2:34 PM
> To: Nikita Danilov
> Cc: Hans Reiser; Russell Coker; Chris Wedgwood;
> rsharpe@xxxxxxxxxx; Xuan
> Baldauf; Seth Mos; Federico Sevilla III; linux-xfs@xxxxxxxxxxx;
> reiserfs-list@xxxxxxxxxxx
> Subject: Re: [reiserfs-list] Re: benchmarks
>
>
>
>
> Nikita Danilov wrote:
>
> > Hans Reiser writes:
> > > Russell Coker wrote:
> > > >
> > > > On Mon, 16 Jul 2001 11:00, Chris Wedgwood wrote:
> > > > > On Sun, Jul 15, 2001 at 02:01:09PM +0400, Hans Reiser wrote:
> > > > >
> > > > > Making the server stateless is wrong
> > > > >
> > > > > why?
> > > >
> > > > > Because it leads to all the problems we have seen! Why not have
> > > > > the client have an open file handle (the way Samba works and the
> > > > > way the Unix file system API works)? Then when the server goes
> > > > > down the client sends a request to open the file again...
> >
> > If you have 10000 clients each opening 100 files you got 1e6 opened
> > files on the server---it wouldn't work. NFS was designed to be
> > stateless to be scalable.
>
> Every existing file has at least one name (or is a member of the
> hidden to-be-deleted directory, and so has a name, too), and an
> object_id. Suppose the object_id is 32 bytes long. A virtual file
> descriptor may be 4 bytes long, some housekeeping metadata 28 bytes,
> so we will have 64MB occupied in your scenario. Where's the problem?
> 80% of those 64MB can be swapped out.
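
The 64MB figure above follows from the sizes assumed in that paragraph. A small sketch of the arithmetic, using only those assumed sizes (nothing here is measured, and the function and constant names are made up for illustration):

```python
# Back-of-envelope estimate of per-open-file state on a stateful server.
# All sizes are the assumptions stated in the message above.

CLIENTS = 10_000
FILES_PER_CLIENT = 100

OBJECT_ID_BYTES = 32      # assumed object_id size
DESCRIPTOR_BYTES = 4      # assumed virtual file descriptor size
HOUSEKEEPING_BYTES = 28   # assumed housekeeping metadata size


def state_bytes(clients, files_per_client):
    """Total bytes of state if every client holds files_per_client open files."""
    per_file = OBJECT_ID_BYTES + DESCRIPTOR_BYTES + HOUSEKEEPING_BYTES
    return clients * files_per_client * per_file


total = state_bytes(CLIENTS, FILES_PER_CLIENT)
print(total, "bytes, about", total // 10**6, "MB")
```

That is 64 bytes per open file times 1e6 open files, so roughly 64MB in decimal units.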
>
> >
> >
> > > >
> > > > > making the readdir a multioperation act is wrong
> > > > >
> > > > > > why? i have 3M directories... are you saying clients should
> > > > > > read the whole thing at once?
> > > >
> > > > > No. findfirst()/findnext() is the correct way of doing this.
> > > > > Forcing the client to read through 3M directory entries to see
> > > > > if "foo.*" matches anything is just wrong. The client should be
> > > > > able to ask for a list of file names matching certain criteria
> > > > > (time stamp, name, ownership, etc). The findfirst() and
> > > > > findnext() APIs on DOS, OS/2, and Windows do this quite well.
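
A rough sketch of the idea in the paragraph above: the side that holds the directory filters by pattern and hands back matches in batches, so only matching names cross the wire. This is not the DOS/OS2/Windows findfirst()/findnext() API and not any NFS or SMB call; find_matching and batch_size are hypothetical names, and the "server" is just an in-memory list.

```python
import fnmatch
from itertools import islice


def find_matching(entries, pattern, batch_size=2):
    """Yield batches of names matching a shell-style pattern.

    The filtering happens where the entries live (the hypothetical
    server side); the caller only ever sees the matches.
    """
    matches = (name for name in entries if fnmatch.fnmatchcase(name, pattern))
    while True:
        batch = list(islice(matches, batch_size))
        if not batch:
            return
        yield batch


directory = ["foo.c", "bar.c", "foo.h", "foo", "baz.txt", "foo.txt"]
for batch in find_matching(directory, "foo.*"):
    print(batch)
```

Note that the holder of the directory still scans every entry in this sketch; what it saves is the transfer of non-matching names to the client.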
> > >
> > > > there is a fundamental conflict between having cookies,
> > > > shrinkable directories, and the ability to find foo.* without
> > > > reading the whole directory, all at the same time.
> > >
> > > > NFS V4 is designed by braindead twerps incapable of layering
> > > > software when designing it.
> >
> > Just cannot stand it. You mean that NFS v4 features database in a
> > kernel too? (It's our insider joke.)
>
> NFS should not be kernel-bound. The nfs server application
> mimics the applications on the NFS clients. If
> this is not possible, something is wrong.
>
> >
> >
> > >
> > > >
> > > > > If you have 3M directory entries then SMB should kick butt over
> > > > > NFS.
> > > > >
> > > > > Also while we're at it, one of the worst things about NFS is the
> > > > > issue of delete. Because it's stateless NFS servers implement
> > > > > unlink as "mv" and things get nasty from there...
> > > >
> > > > --
> > > > > http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
> > > > http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
> > > > http://www.coker.com.au/projects.html Projects I am working on
> > > > http://www.coker.com.au/~russell/ My home page
> >
> > Nikita.
>
> Being "stateless" is only a weak way to implement disconnected
> operation. If there was state (if the server could know that a client
> has a file descriptor to a file), the client could be informed that
> its virtual file descriptors to the files to be deleted are invalid
> now. This only fails if the network is down, so this is a
> disconnected operation problem.
>
> By the way: if NFS was scalable, why doesn't it allow every handle
> from the server (like inode numbers, directory position cookies) to
> be of variable, dynamic, server-determined size? This would be
> scalable.
>
> P.S.: Hans, how do you prevent object_id|inode reusing? Using
> mount_id+generation counter per mount?
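
To make the P.S. concrete, here is a minimal sketch of what "mount_id + generation counter per mount" could look like. It is only an illustration of the question as asked, not a description of how reiserfs (or any NFS server) actually does it; Handle, MountTable and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    """Hypothetical handle given to clients."""
    mount_id: int
    object_id: int
    generation: int


class MountTable:
    """Tracks live objects for one mount and stamps each with a generation."""

    def __init__(self, mount_id):
        self.mount_id = mount_id
        self._counter = 0   # one generation counter per mount
        self._live = {}     # object_id -> generation of the current object

    def create(self, object_id):
        # Every creation gets a fresh generation, even if object_id is reused.
        self._counter += 1
        self._live[object_id] = self._counter
        return Handle(self.mount_id, object_id, self._counter)

    def delete(self, object_id):
        self._live.pop(object_id, None)

    def is_valid(self, handle):
        # A handle is stale if the mount differs, the object is gone,
        # or the object_id now belongs to a newer object.
        return (handle.mount_id == self.mount_id
                and self._live.get(handle.object_id) == handle.generation)
```

With this, a handle to a deleted file stays invalid even after its object_id has been handed out again, because the new object carries a later generation.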
>
>
>