
Re: xfs block size

To: "Davida, Joe" <Joe_Davida@xxxxxxxxxx>
Subject: Re: xfs block size
From: Steve Lord <lord@xxxxxxx>
Date: Wed, 17 Jan 2001 13:54:02 -0600
Cc: "'linux-xfs@xxxxxxxxxxx '" <linux-xfs@xxxxxxxxxxx>
In-reply-to: Message from "Davida, Joe" <Joe_Davida@xxxxxxxxxx> of "Wed, 17 Jan 2001 10:45:17 MST." <09D1E9BD9C30D311919200A0C9DD5C2C025370A4@xxxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
>       On the web page you cited, I read:
> 
> "...We are using pages to cache metadata, pages are
> allocated one at a time, so each page sized chunk
> of memory usually is not adjacent in the address
> space to the page covering the next block of the
> disk."
> 
>       How does FreeBSD's UFS let you mkfs filesystems
>       with a 16K FS block size? It is on the same x86 architecture
>       as Linux!

A. It is a totally different filesystem running on a different
   operating system.

> 
>       Also, why does XFS have to allocate metadata
>       pages that are the same size as the FS logical block size?
>       At the risk of trying to answer my own question,
>       I vaguely recall reading - and I might be wrong here -
>       that for very small files, the file data is stored
>       in the metadata (inode) page, which is only one
>       mmu-sized 4K page.
>       Is this correct? And if so:


This is not correct: XFS does not store data for small files in the
inode. Half the support for it is there, though - a nice project for
someone who wants to really get to know XFS.

The reason XFS is this way is that it was originally written on an OS
where the kernel guaranteed the availability of large chunks of
contiguous memory. Linux makes no such guarantee: the larger the chunk
of memory you ask for in the kernel, the greater the chance the
allocation will fail. At page size it will not fail unless things are
really bad.
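
To make that concrete, here is a minimal sketch of what this looks like
at the kernel allocation interface. The helper name alloc_contig_chunk
is made up for illustration; __get_free_pages and GFP_KERNEL are the
real interface:

    /*
     * Illustration only: physically contiguous memory larger than one
     * page means a higher-order allocation, and the higher the order
     * the more likely the allocation is to fail once memory gets
     * fragmented.
     */
    #include <linux/mm.h>

    static void *alloc_contig_chunk(unsigned int order)
    {
            /* order 0 = 4K on x86, order 2 = 16K, and so on */
            unsigned long addr = __get_free_pages(GFP_KERNEL, order);

            /* an order 0 request almost never fails; higher orders can */
            return addr ? (void *)addr : NULL;
    }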

Some of the metadata structures in XFS are extremely complex (directory
blocks, btree structures, etc.). The code which manipulates these is all
written under the assumption that you can read them into a buffer and
then manipulate them as a contiguous chunk of memory. Rewriting this
code is not an option because a) it would destroy the stability of
the filesystem (the current code base probably has millions of CPU
hours on it on IRIX) and b) it would be a very large task.
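
As an illustration of the sort of assumption involved (every structure
and function name here is a hypothetical stand-in, not the real XFS
on-disk format):

    /* hypothetical stand-in for a metadata block header */
    struct dir_block_hdr {
            unsigned int magic;
            unsigned int nentries;
    };

    /* hypothetical stand-in for a variable length entry */
    struct dir_entry {
            unsigned int inumber;
            unsigned char namelen;
            char name[1];
    };

    static void walk_dir_block(void *buf, unsigned int blocksize)
    {
            struct dir_block_hdr *hdr = buf;
            char *p = (char *)(hdr + 1);
            char *end = (char *)buf + blocksize;
            unsigned int i;

            for (i = 0; i < hdr->nentries && p < end; i++) {
                    struct dir_entry *de = (struct dir_entry *)p;

                    /* ... examine or modify the entry ... */

                    /* stepping to the next entry only works if the
                     * whole block is one contiguous chunk of memory */
                    p += sizeof(*de) + de->namelen;
            }
    }

If the block were split across non-adjacent pages, every one of those
pointer walks would need special casing.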

This leaves us with finding ways of getting contiguous memory in a
guaranteed manner. One option was to use the VM system to remap the
pages into a contiguous address space. This has been rejected because:
a) Linus has stated many times that he will not accept code which does
this, and b) it is an expensive operation on the Intel architecture,
especially once you have multiple cpus (it involves interrupting them
all and synchronizing them).
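
Just to sketch the idea (remap_into_contiguous() is a hypothetical
helper used for illustration, not an existing interface):

    struct page;                    /* kernel page descriptor */

    /* hypothetical: map npages individually allocated pages at one
     * contiguous range of kernel virtual addresses */
    void *remap_into_contiguous(struct page **pages, unsigned int npages);

    static void *get_metadata_buffer(struct page **pages, unsigned int npages)
    {
            /*
             * The pages were allocated one at a time, so their kernel
             * addresses are scattered.  Building (and later tearing
             * down) a mapping like this means editing kernel page
             * tables, and keeping those coherent on an SMP x86 box is
             * where the cross-cpu interrupts come from.
             */
            return remap_into_contiguous(pages, npages);
    }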

Finally, this work has not been at the top of our priority list, and
may not be for some time. It does, however, appear to have been high on
your list for a while. Since Maxtor is looking at XFS as something it
could use in a product it makes money from, you should perhaps be
making more of an investment in Linux here.

Steve


>       a. Is this the primary reason why you currently only
>          support a 4K FS logical block size?
> 
>       b. Could this be removed, so that all data lives in
>          separate data blocks (extents) in the tree, as is
>          the case with large files?
> 
>       c. Could it be made dependent on the user-selected block
>          size during mkfs? I.e., if the user selects a 4K block size,
>          default to the current implementation; otherwise, let
>          all metadata live in pages separately from the data.
> 
>       Cheers,
> 
>       Joe
> 
> 

