Ragnar Kjørstad wrote:
> > > For each open file you have:
> > >
> > > struct file (96b)
> > > struct inode (460b)
> > > struct dentry (112b)
> > >
> > > at least. This totals to 668M of kernel memory, that is, unpageable.
> > As I said below, the NFS server should be a user-space daemon..
> But the datastructures mentioned above would still be kernel-memory, so
> that will not solve the problem.
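The 668M figure quoted above can be checked with a short sketch. It assumes the 1M open files discussed later in this message, takes 1M to mean 2**20, and uses the quoted per-object sizes as given; the names `SIZES` and `pinned_bytes` are made up for illustration.

```python
# Per-object sizes in bytes, as quoted above (assumed accurate for that kernel).
SIZES = {"struct file": 96, "struct inode": 460, "struct dentry": 112}


def pinned_bytes(open_files):
    """Lower bound on unpageable kernel memory for this many open files."""
    return open_files * sum(SIZES.values())


per_file = pinned_bytes(1)                     # bytes pinned by one open file
total_mib = pinned_bytes(1 << 20) / (1 << 20)  # 1M open files, in MiB
```

This is only a lower bound ("at least"), since it counts just the three structures named in the quote.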
Actually, you are right. :-(
But once there is a filesystem change notification API, a userspace NFS
server application could create a "catch-all" notification rule which says
that it should be informed about every change within the whole VFS,
including deletion of objects (like a file or a filename). All the tracking
and NFS client notification then becomes a swappable user space problem.
If Nikita does not like 1M files held open by file descriptor on a server
for memory scalability reasons, maybe he would accept 1M files accessible
by object_id|inode. The server then issues virtual NFS filehandles to
clients and records internally which inode number maps to which virtual NFS
filehandles. For every file deleted, the NFS server checks which clients it
has to notify over the network. If this is not possible, then it is a
Or, to put it the other way around: an inode number is a "ticket for
accessing a specified file on the server, issued by the server". This inode
number is state. (The same applies to directory cookies.) Just because
the server currently cannot invalidate the ticket, it does not lose
its property of being state; it is merely state that is never released. The
more files are opened via NFS on the server, the more state is handed to
the clients. Because that state is never released (the server has to accept
requests for files whose inode numbers were sent out some reboots ago),
NFS forces the NFS server to have a memory leak by design, or to be broken
by design in another way: risking that the same object_id refers to a
different file; requiring the server to keep object_ids unique over
indefinite space and time although object_ids are limited; or requiring the
server to use only a limited number of directory cookies over indefinite
space and time, thus requiring reiserfs to be unnecessarily inefficient for
directories (yet it is more efficient than many other filesystems).
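
To make the "ticket" idea concrete, here is a minimal sketch of the user-space bookkeeping described above: the server issues virtual filehandles for inode numbers and releases that state when a deletion event arrives. Everything here is hypothetical: the class `VirtualHandleTable`, its method names, and the assumption that some notification API calls `on_delete` with the inode number are all made up for illustration, not an existing interface.

```python
from collections import defaultdict
from itertools import count


class VirtualHandleTable:
    """Hypothetical user-space table of virtual NFS filehandles."""

    def __init__(self):
        self._next_handle = count(1)         # handle values are never reused
        self._inode_of = {}                  # handle -> inode number
        self._client_of = {}                 # handle -> client id
        self._handles_of = defaultdict(set)  # inode number -> issued handles

    def issue(self, client, inode):
        """Hand a client a fresh handle (the "ticket") for an inode."""
        handle = next(self._next_handle)
        self._inode_of[handle] = inode
        self._client_of[handle] = client
        self._handles_of[inode].add(handle)
        return handle

    def lookup(self, handle):
        """Return the inode for a handle, or None if it was invalidated."""
        return self._inode_of.get(handle)

    def on_delete(self, inode):
        """Drop all state for a deleted inode; return the clients to notify."""
        clients = set()
        for handle in self._handles_of.pop(inode, set()):
            clients.add(self._client_of.pop(handle))
            del self._inode_of[handle]
        return clients
```

Because the table lives in an ordinary process, it is swappable, and because `on_delete` releases the handles, a reused inode number cannot be reached through an old ticket. Without the deletion event the table could only grow, which is the leak-by-design argued above.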