<!-- faq.html, revision 1.92 (2006-07-21), last modified by nathans -->
<& xfsTemplate,top=>1,side=>1 &>
<FONT FACE="ARIAL NARROW, HELVETICA" SIZE="5"><B>XFS FAQ</B></FONT>
<FONT FACE="ARIAL NARROW, HELVETICA">
<p>
Quick links:
</p>
<ul>
<li>Introductory questions:<br>
<a href="#xfsfaq">Where can I find this FAQ?</A><br>
<a href="#xfsinfo">Where can I find information about XFS?</A><br>
<a href="#aclinfo">Where can I find information about ACLs?</A><br>
</li>
<li>General questions:<br>
<a href="#partitiontype">What partition type should I use for XFS?</A><br>
<a href="#mountoptions">What mount options does XFS have?</A><br>
<a href="#xfsprogskernel">Is there any relation between the XFS utilities and the kernel version?</A><br>
</li>
<li>Functionality questions:<br>
<a href="#platformsrun">Does it run on platforms other than i386?</A><br>
<a href="#quotaswork">Do quotas work on XFS?</A><br>
<a href="#dumprestore">Are there any dump/restore tools for XFS?</A><br>
<a href="#lilowork">Does LILO work with XFS?</A><br>
<a href="#grubwork">Does GRUB work with XFS?</A><br>
<a href="#usexfsroot">Can XFS be used for a root filesystem?</A><br>
<a href="#useirixxfs">Will I be able to use IRIX XFS filesystems on Linux?</A><br>
<a href="#resize">Is there a way to make an XFS filesystem larger or smaller?</A><br>
</li>
<li>Problems:<br>
<a href="#problemreport">What info should I include when reporting a problem?</A><br>
<a href="#xfsmountfail">Mounting an XFS filesystem does not work - what is wrong?</A><br>
<a href="#undelete">Does the filesystem have an undelete function?</A><br>
<a href="#backingupxfs">How can I back up an XFS filesystem and ACLs?</A><br>
<a href="#error990">I see applications returning error 990, what is wrong?</A><br>
<a href="#nulls">Why do I see binary NULLS in my files after recovery when I unplugged the power?</A><br>
</li>
<li>Write Back Cache:<br>
<a href="#wcache">What is the problem with the write cache on journaled filesystems?</a><br>
<a href="#wcache_query">How can I tell if I have the write cache enabled?</a><br>
<a href="#wcache_fix">How can I address the problem with the write cache?</a><br>
<a href="#wcache_persistent">Should barriers be enabled with storage which
has a persistent write cache?</a><br>
</li>
<li>Directory corruption issue:<br>
<a href="#dir2">What is the issue with directory corruption in Linux 2.6.17?</a><br>
</li>
</ul>
<A name="xfsfaq"></A>
<h2>
Q: Where can I find this FAQ?
</h2>
<p>
<A HREF="http://oss.sgi.com/projects/xfs/faq.html">http://oss.sgi.com/projects/xfs/faq.html</A>
</p>
<p>
Many thanks to earlier maintainers of this document - Thomas Graichen
and Seth Mos.
</p>
<A name="xfsinfo"></A>
<h2>
Q: Where can I find documentation about XFS?
</h2>
<p>
The SGI XFS project page
<a href="http://oss.sgi.com/projects/xfs/">http://oss.sgi.com/projects/xfs/</a>
is the definitive reference.
It contains pointers to whitepapers, books, articles, etc.
</p>
<p>
You can also join the <A HREF="mail.html">xfs</A> mailing list,
and there is an <b>#xfs</b> IRC channel on <i>irc.freenode.net</i>.
</p>
<A name="aclinfo"></A>
<h2>
Q: Where can I find documentation about ACLs?
</h2>
<p>
Andreas Gruenbacher maintains the Extended Attribute and POSIX ACL
documentation for Linux at
<a href="http://acl.bestbits.at/">http://acl.bestbits.at/</a>
</p>
<p>
The <b>acl(5)</b> manual page is also quite extensive.
</p>
<A name="partitiontype"></A>
<h2>
Q: What partition type should I use for XFS on Linux?
</h2>
<p>
Linux native filesystem (83).
</p>
<A name="mountoptions"></A>
<h2>
Q: What mount options does XFS have?
</h2>
<p>
There are a number of mount options influencing XFS filesystems -
refer to the <b>mount(8)</b> manual page or the documentation in the
kernel source tree itself (<tt>Documentation/filesystems/xfs.txt</tt>)
</p>
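<p>
As an illustration (the device and mount point below are placeholders,
not recommendations), an <tt>/etc/fstab</tt> entry might combine the
generic <tt>noatime</tt> option with the XFS-specific <tt>logbufs</tt>
option:
</p>

```
# /etc/fstab - illustrative entry only; device and mount point are placeholders
/dev/sda7   /home   xfs   noatime,logbufs=8   0  2
```

<p>
See the kernel documentation referenced above for the full option list
and defaults.
</p>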
<A name="xfsprogskernel"></A>
<h2>
Q: Is there any relation between the XFS utilities and the kernel version?
</h2>
<p>
No, there is no relation.
Newer utilities mainly contain fixes and checks that previous
versions lacked.
New features are also added in a backward compatible way - if they are
enabled via mkfs, an incapable (old) kernel will recognize that it does
not understand the new feature, and refuse to mount the filesystem.
</p>
<A name="platformsrun"></A>
<h2>
Q: Does it run on platforms other than i386?
</h2>
<p>
XFS runs on all of the platforms that Linux supports.
It is most heavily tested on the more common platforms, especially the
i386 family.
It is also well tested on the IA64 platform, since that is the platform
SGI Linux products use.
</p>
<A name="quotaswork"></A>
<h2>
Q: Do quotas work on XFS?
</h2>
<p>
Yes.
</p>
<p>
To use quotas with XFS, you need to enable XFS quota support when you
configure your kernel. You also need to specify quota support when mounting.
You can get the Linux quota utilities at their sourceforge website
<a href="http://sourceforge.net/projects/linuxquota/">
http://sourceforge.net/projects/linuxquota/</a> or use <b>xfs_quota(8)</b>.
</p>
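<p>
For instance (a sketch only - the device, mount point and options here
are illustrative), user and group quotas can be enabled at mount time
and then inspected with <b>xfs_quota</b>:
</p>

```
# /etc/fstab - illustrative; uquota/gquota enable user and group quotas
/dev/sda7   /home   xfs   uquota,gquota   0  2

# once mounted, report usage in human-readable form (as root):
# xfs_quota -x -c 'report -h' /home
```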
<A name="dumprestore"></A>
<h2>
Q: Are there any dump/restore tools for XFS?
</h2>
<p>
<b>xfsdump(8)</b> and <b>xfsrestore(8)</b> are fully supported.
The tape format is the same as on IRIX, so tapes are interchangeable
between operating systems.
</p>
<A name="lilowork"></A>
<h2>
Q: Does LILO work with XFS?
</h2>
<p>
This depends on where you install LILO.
</p>
<p>
Yes, for MBR (Master Boot Record) installations.
</p>
<p>
No, for root partition installations because the XFS superblock is
written at block zero, where LILO would be installed.
This is to maintain compatibility with the IRIX on-disk format, and
will not be changed.
</p>
<A name="grubwork"></A>
<h2>
Q: Does GRUB work with XFS?
</h2>
<p>
GRUB has native XFS filesystem support starting with version 0.91.
Unfortunately, GRUB used to make incorrect assumptions about being
able to read a block device image while a filesystem is mounted
and actively being written to, which could cause intermittent problems
when using XFS.
This has reportedly since been fixed, and version 0.97 (at least)
of GRUB is apparently stable.
</p>
<A name="usexfsroot"></A>
<h2>
Q: Can XFS be used for a root filesystem?
</h2>
<p>
Yes.
</p>
<A name="useirixxfs"></A>
<h2>
Q: Will I be able to use my IRIX XFS filesystems on Linux?
</h2>
<p>
Yes. The on-disk format of XFS is the same on IRIX and Linux. Obviously,
you should back-up your data before trying to move it between systems.
Filesystems must be "clean" when moved (i.e. unmounted).
If you plan to use IRIX filesystems on Linux, keep the following points in mind:
</p>
<ul>
<li>The kernel needs to have SGI partition support enabled.</li>
<li>There is no XLV support in Linux, so you cannot read IRIX filesystems
which use the XLV volume manager.</li>
<li>Not all block sizes available on IRIX are available on Linux - only
block sizes less than or equal to the page size of the architecture
(4k for i386, ppc, ...; 8k for alpha, sparc, ...) are currently supported.</li>
<li>Make sure that the directory format is version 2 on the IRIX filesystems
(this is the default since IRIX 6.5.5); Linux can only read v2 directories.</li>
</ul>
<A name="resize"></A>
<h2>
Q: Is there a way to make an XFS filesystem larger or smaller?
</h2>
<p>
You can <em>NOT</em> make an XFS filesystem smaller.
The only way to shrink is to do a complete dump, mkfs and restore.
</p>
<p>
An XFS filesystem may be enlarged using <b>xfs_growfs(8)</b>.
</p>
<p>
If the filesystem is on a partition, there must be free space immediately
after that partition.
Delete the partition and recreate it larger, with the <em>exact same</em>
starting point, then run <b>xfs_growfs</b> to grow the filesystem into
the new space.
Note - editing partition tables is a dangerous pastime, so
back up your filesystem before doing so.
</p>
<p>
Using XFS filesystems on top of a volume manager makes this a lot easier.
</p>
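<p>
As a sketch of the volume manager case (the LVM device and volume names
here are hypothetical, not from this document):
</p>

```
# grow the underlying logical volume first (LVM example):
# lvextend -L +10G /dev/vg0/home
# then grow XFS into the new space - note xfs_growfs takes the mount point:
# xfs_growfs /home
```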
<A name="problemreport"></A>
<h2>
Q: What information should I include when reporting a problem?
</h2>
<p>
Things to include are the version of XFS you are using (if it is a CVS
version, the date of the checkout) and the version of the kernel.
If you have problems with userland packages please report the version of the
package you are using.
</p>
<p>
If the problem relates to a particular filesystem, the output from the
<b>xfs_info(8)</b> command and any <b>mount(8)</b> options in use will
also be useful to the developers.
</p>
<p>
If you experience an oops, please run it through <b>ksymoops</b> so that
it can be interpreted.
</p>
<A name="xfsmountfail"></A>
<h2>
Q: Mounting a XFS filesystem does not work - what is wrong?
</h2>
<p>
If mount prints an error message like:
</p>
<pre>
mount: /dev/hda5 has wrong major or minor number
</pre>
<p>
you either do not have XFS compiled into the kernel (or you forgot
to load the modules) or you did not use the "-t xfs" option on mount
or the "xfs" option in <tt>/etc/fstab</tt>.
</p>
<p>
If you get something like:
</p>
<pre>
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       or too many mounted file systems
</pre>
<p>
refer to your system log file (<tt>/var/log/messages</tt>) for a
detailed diagnostic message from the kernel.
</p>
<A name="undelete"></A>
<h2>
Q: Does the filesystem have an undelete capability?
</h2>
<p>
There is no undelete in XFS. Always keep backups.
</p>
<A name="backingupxfs"></A>
<h2>
Q: How can I back up an XFS filesystem and ACLs?
</h2>
<p>
You can back up an XFS filesystem with utilities like
<b>xfsdump(8)</b>, or with standard <b>tar(1)</b> for ordinary files.
If you want to back up ACLs you will need to use <b>xfsdump</b>;
it is currently the only tool that supports backing up
extended attributes.
<b>xfsdump</b> can also be integrated with <b>amanda(8)</b>.
</p>
<A name="error990"></A>
<h2>
Q: I see applications returning error 990, what is wrong?
</h2>
<p>
Error 990 is EFSCORRUPTED, which usually means XFS has
detected a filesystem metadata problem and has shut the filesystem
down to prevent further damage.
</p>
<p>
The cause can be pretty much anything, unfortunately - filesystem,
virtual memory manager, volume manager, device driver, or hardware.
</p>
<p>
There should be a detailed console message when this initially happens.
The messages have important information giving hints to developers as
to the earliest point that a problem was detected.
It is there to protect your data.
</p>
<A name="nulls"></A>
<h2>
Q: Why do I see binary NULLS in some files after recovery when I unplugged the power?
</h2>
<p>
XFS journals metadata updates, not data updates.
After a crash you are supposed to get a consistent filesystem
which looks like the state sometime shortly before the crash, NOT what
the in memory image looked like the instant before the crash.
</p>
<p>
Since XFS does not write data out immediately unless you tell
it to with fsync, an O_SYNC or O_DIRECT open (the same is true
of other filesystems), you are looking at an inode which was
flushed out, but whose data was not.
Typically you'll find that the inode is not taking any space
since all it has is a size but no extents allocated (try examining
the file with the <b>xfs_bmap(8)</b> command).
</p>
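<p>
A minimal sketch of forcing data to disk at write time, using GNU
<b>dd</b>'s <tt>conv=fsync</tt> (which calls fsync on the output file
when the copy completes); the temporary file is just for illustration:
</p>

```shell
# Sketch: make the write durable before considering it done.
# conv=fsync makes GNU dd fsync() the output file at the end of the copy,
# so the data has reached the disk (subject to the write cache caveats below).
tmp=$(mktemp)
printf 'important data\n' | dd of="$tmp" conv=fsync status=none
cat "$tmp"
rm -f "$tmp"
```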
<A name="wcache"></A>
<h2>
Q: What is the problem with the write cache on journaled filesystems?
</h2>
<p>
Many drives use a write back cache in order to speed up the performance
of writes. However, there are conditions such as power failure when the
write cache memory is never flushed to the actual disk. This causes problems
for XFS and journaled filesystems in general because they rely on knowing
when a write has completed to the disk. They need to know that the log
information has made it to disk before allowing metadata to go to disk.
When the metadata makes it to disk then the tail of the log can move.
So if the writes never make it to the physical disk, then the ordering is
violated and the log and metadata can be lost, resulting in filesystem
corruption.
</p>
<A name="wcache_query"></A>
<h2>
Q: How can I tell if I have the write cache enabled?
</h2>
<p>
For SCSI/SATA:
</p>
<ul>
<li>Look in dmesg(8) output for a driver line, such as:<br>
"SCSI device sda: drive cache: write back"</li>
<li># sginfo -c /dev/sda | grep -i 'write cache'</li>
</ul>
<p>
For PATA/SATA
(although for SATA this only works on a recent kernel with
ATA command passthrough):
</p>
<ul>
<li># hdparm -I /dev/sda<br>
and look under "Enabled Supported" for "Write cache"</li>
</ul>
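<p>
As an illustrative sketch, the dmesg check above can be scripted; the
sample line here is hard-coded from the message quoted above, and on a
live system you would pipe the real <b>dmesg</b> output through grep
instead:
</p>

```shell
# Sketch: detect a write-back cache from a (saved) dmesg line.
# The sample text mirrors the kernel message quoted above.
dmesg_line='SCSI device sda: drive cache: write back'
if printf '%s\n' "$dmesg_line" | grep -qi 'drive cache: write back'; then
    echo "write cache: enabled (write back)"
else
    echo "write cache: not reported as write back"
fi
```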
<A name="wcache_fix"></A>
<h2>
Q: How can I address the problem with the write cache?
</h2>
<h3>
Disabling the write back cache.
</h3>
<p>
For SATA/PATA(IDE)
(although for SATA this only works on a recent kernel with
ATA command passthrough):
</p>
<ul>
<li># hdparm -W0 /dev/sda<br>
# hdparm -W0 /dev/hda</li>
<li># blktool /dev/sda wcache off<br>
# blktool /dev/hda wcache off</li>
</ul>
<p>
For SCSI:
</p>
<ul>
<li>Using sginfo(8), which is a little tedious -
it takes 3 steps. For example:
<ol>
<li># sginfo -c /dev/sda<br>
which gives a list of attribute names and values</li>
<li># sginfo -cX /dev/sda<br>
which gives an array of cache values which you must match up
with the names from step 1, e.g.<br>
0 0 0 1 0 1 0 0 0 0 65535 0 65535 65535 1 0 0 0 3 0 0</li>
<li># sginfo -cXR /dev/sda 0 0 0 1 0 0 0 0 0 0 65535 0 65535 65535 1 0 0 0 3 0 0<br>
which resets the value of the cache attributes.</li>
</ol>
</li>
</ul>
<p>
For a SCSI disk this setting is persistent.
For a SATA/PATA disk, however, it must be reapplied after every reset,
since the drive reverts to its default of write cache enabled.
A reset can happen at reboot or during error recovery of the drive,
which makes it rather difficult to guarantee that the write cache
stays disabled.
</p>
<h3>
Using an external log.
</h3>
<p>
Some people have considered the idea of using an external log on a separate
drive with the write cache disabled and the rest of the file system on another
disk with the write cache enabled. However, that will <b>not</b> solve the problem.
For example, the tail of the log is moved when we are notified that a
metadata write is completed to disk and we won't be able to guarantee that if the
metadata is on a drive with the write cache enabled.
</p>
<p>
In fact, using an external log will disable XFS' write barrier support.
</p>
<h3>
Write barrier support.
</h3>
<p>
Write barrier support is enabled by default in XFS since 2.6.17.
It is disabled by mounting the filesystem with "nobarrier".
Barrier support will flush the write back cache at the appropriate times
(such as on XFS log writes).
This is generally the recommended solution, however, you should
check the system logs to ensure it was successful.
Barriers will be disabled if an external log is in use,
if the underlying device does not support them, or if a test
barrier write fails.
</p>
<A name="wcache_persistent"></A>
<h2>
Q. Should barriers be enabled with storage which has a persistent write cache?
</h2>
<p>
Many hardware RAID controllers have a persistent write cache which is
preserved across power failure, interface resets, system crashes, etc.
Using write barriers in this instance is not warranted and will
in fact lower performance.
Therefore, it is recommended to turn off the barrier support and
mount the filesystem with "nobarrier".
</p>
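<p>
For example (illustrative device and mount point), the option can be
set persistently in <tt>/etc/fstab</tt>:
</p>

```
# /etc/fstab - illustrative; nobarrier is safe ONLY when the write cache
# is genuinely persistent (e.g. battery-backed RAID cache)
/dev/sdb1   /data   xfs   nobarrier   0  2
```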
<A name="dir2"></A>
<h2>
Q: What is the issue with directory corruption in Linux 2.6.17?
</h2>
<p>
In the Linux kernel 2.6.17 release a subtle bug was accidentally
introduced into the XFS directory code by some "sparse" endian
annotations.
This bug was sufficiently uncommon (it only affects a certain type
of format change, in Node or B-Tree format directories, and only in
certain situations) that it was not detected during our regular
regression testing, but it has been observed in the wild by a number
of people now.
</p>
<p>
To add insult to injury, <b>xfs_repair(8)</b> currently does not correct
these directories when it detects this corrupt state.
This <b>xfs_repair</b> issue is actively being worked on, and a fixed
version will be available shortly.
</p>
<p>
No other kernel versions are affected. However, using a corrupt
filesystem on other kernels can still result in the filesystem being
shutdown if the problem has not been rectified (on disk), making it
seem like other kernels are affected.
</p>
<p>
The <b>xfs_check</b> tool, or <b>xfs_repair -n</b>, should be able to
detect any directory corruption.
</p>
<p>
Until a fixed <b>xfs_repair</b> binary is available, one can make use of
the <b>xfs_db(8)</b> command to mark the problem directory for removal
(see the example below).
A subsequent <b>xfs_repair</b> invocation will remove the directory
and move all of its contents into "lost+found", named by inode number
(see the second example below for how to map an inode number to a
directory entry name, which needs to be done <em>before</em> removing
the directory itself). The inode number of the corrupt directory is
included in the shutdown report issued by the kernel on detection of
directory corruption.
Using that inode number, this is how one would ensure it is removed:
</p>
<pre>
# xfs_db -x /dev/sdXXX
xfs_db> inode NNN
xfs_db> print
core.magic = 0x494e
core.mode = 040755
core.version = 2
core.format = 3 (btree)
...
xfs_db> write core.mode 0
xfs_db> quit
</pre>
<p>
A subsequent <b>xfs_repair</b> will clear the directory, and add new entries
(named by inode number) in lost+found.
</p>
<p>
The easiest way to map inode numbers to full paths is via <b>xfs_ncheck(8)</b>:
</p>
<pre>
# xfs_ncheck -i 14101 -i 14102 /dev/sdXXX
14101 full/path/mumble_fratz_foo_bar_1495
14102 full/path/mumble_fratz_foo_bar_1494
</pre>
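<p>
If you need the mapping in a different form, the <b>xfs_ncheck</b>
output is easy to post-process; this sketch uses the sample output
above as hard-coded input (on a real system you would run
<b>xfs_ncheck</b> itself):
</p>

```shell
# Sketch: turn xfs_ncheck-style "inode path" lines into a readable table.
# The sample text below is the example output shown above.
ncheck_output='14101 full/path/mumble_fratz_foo_bar_1495
14102 full/path/mumble_fratz_foo_bar_1494'
printf '%s\n' "$ncheck_output" | awk '{ print "inode " $1 " -> " $2 }'
```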
<p>
Should this not work, we can manually map inode numbers in a B-Tree
format directory by taking the following steps:
</p>
<pre>
# xfs_db -x /dev/sdXXX
xfs_db> inode NNN
xfs_db> print
core.magic = 0x494e
...
next_unlinked = null
u.bmbt.level = 1
u.bmbt.numrecs = 1
u.bmbt.keys[1] = [startoff] 1:[0]
u.bmbt.ptrs[1] = 1:3628
xfs_db> fsblock 3628
xfs_db> type bmapbtd
xfs_db> print
magic = 0x424d4150
level = 0
numrecs = 19
leftsib = null
rightsib = null
recs[1-19] = [startoff,startblock,blockcount,extentflag]
1:[0,3088,4,0] 2:[4,3128,8,0] 3:[12,3308,4,0] 4:[16,3360,4,0]
5:[20,3496,8,0] 6:[28,3552,8,0] 7:[36,3624,4,0] 8:[40,3633,4,0]
9:[44,3688,8,0] 10:[52,3744,4,0] 11:[56,3784,8,0]
12:[64,3840,8,0] 13:[72,3896,4,0] 14:[33554432,3092,4,0]
15:[33554436,3488,8,0] 16:[33554444,3629,4,0]
17:[33554448,3748,4,0] 18:[33554452,3900,4,0]
19:[67108864,3364,4,0]
</pre>
<p>
At this point we are looking at the extents that hold all of the
directory information. There are three types of extent here: the
data blocks (extents 1 through 13 above), then the leaf blocks
(extents 14 through 18), then the freelist blocks (extent 19 above).
The jumps in the first field (start offset) indicate our progression
through each of the three types. For recovering file names, we are
only interested in the data blocks, so we can now feed those offset
numbers into the <b>xfs_db</b> dblock command.
So, for the fifth extent - 5:[20,3496,8,0] - listed above:
</p>
<pre>
...
xfs_db> dblock 20
xfs_db> print
dhdr.magic = 0x58443244
dhdr.bestfree[0].offset = 0
dhdr.bestfree[0].length = 0
dhdr.bestfree[1].offset = 0
dhdr.bestfree[1].length = 0
dhdr.bestfree[2].offset = 0
dhdr.bestfree[2].length = 0
du[0].inumber = 13937
du[0].namelen = 25
du[0].name = "mumble_fratz_foo_bar_1595"
du[0].tag = 0x10
du[1].inumber = 13938
du[1].namelen = 25
du[1].name = "mumble_fratz_foo_bar_1594"
du[1].tag = 0x38
...
</pre>
<p>
So, here we can see that inode number 13938 matches up with name
"mumble_fratz_foo_bar_1594". Iterate through all the extents, and
extract all the name-to-inode-number mappings you can, as these
will be useful when looking at "lost+found" (once <b>xfs_repair</b> has
removed the corrupt directory).
</p>
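<p>
Collecting those mappings by hand is tedious; as a sketch, the
name/inumber pairs can be pulled out of a saved <b>xfs_db</b> session
with awk (the sample input below is taken from the dblock listing
above - save your real <b>xfs_db</b> output to a file and feed that in
instead):
</p>

```shell
# Sketch: extract name-to-inode mappings from xfs_db "print" output.
# Splits each line on " = "; remembers the last inumber seen, then
# prints it alongside the following name line (quotes stripped).
xfs_db_output='du[0].inumber = 13937
du[0].name = "mumble_fratz_foo_bar_1595"
du[1].inumber = 13938
du[1].name = "mumble_fratz_foo_bar_1594"'
printf '%s\n' "$xfs_db_output" | awk -F' = ' '
    /\.inumber/ { ino = $2 }
    /\.name/    { gsub(/"/, "", $2); print $2 " -> inode " ino }
'
```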
<& xfsTemplate,bottom=>1 &>