
Infinite loop in xfssyncd on full file system

To: linux-xfs@xxxxxxxxxxx
Subject: Infinite loop in xfssyncd on full file system
From: Stephane Doyon <sdoyon@xxxxxxxxx>
Date: Tue, 22 Aug 2006 16:01:10 -0400 (EDT)
Sender: xfs-bounce@xxxxxxxxxxx

I'm seeing what appears to be an infinite loop in xfssyncd. It is triggered when writing to a file system that is full or nearly full. I have pinpointed the change that introduced this problem: it's

    "TAKE 947395 - Fixing potential deadlock in space allocation and
    freeing due to ENOSPC"

git commit d210a28cd851082cec9b282443f8cc0e6fc09830.

I first saw the problem with a 2.6.17 kernel patched to add the 2.6.18-rc* XFS changes. I later confirmed that plain 2.6.17 does not exhibit this behavior, while adding just that one commit brings the problem back.
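For anyone who wants to repeat that experiment, one way to apply just that commit on top of 2.6.17 (a sketch, assuming a git clone of the kernel tree with the commit available):

    # Build a v2.6.17 tree with only the suspect commit applied.
    git checkout -b enospc-test v2.6.17
    git cherry-pick d210a28cd851082cec9b282443f8cc0e6fc09830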

In the simplest case, I had a 7.5GB test file system, created with default mkfs.xfs options and mounted with default options. I filled it up with a single-threaded dd, leaving half a GB free. Then I did

    while [ 1 ]; do dd if=/dev/zero of=f bs=1M; done
or
    i=1; while [ 1 ]; do echo $i; dd if=/dev/zero of=f$i bs=1M; \
                         i=$(($i+1)); done

and after just a few iterations, my dd got stuck in uninterruptible sleep and I soon got "BUG: soft lockup detected on CPU#1!" with xfssyncd at the bottom of the backtrace.
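Putting the whole reproduction together, it looks roughly like this (a sketch; the device path, mount point, and fill count are placeholders to adjust for your setup):

    #!/bin/sh
    # Fill an XFS file system to near capacity, then keep rewriting
    # a file until dd blocks. DEV and MNT are placeholders.
    DEV=/dev/sdX1
    MNT=/mnt/test

    mkfs.xfs -f $DEV                # default mkfs.xfs options
    mount $DEV $MNT                 # default mount options

    # Leave roughly half a GB free (here: on a 7.5GB file system).
    dd if=/dev/zero of=$MNT/filler bs=1M count=7000

    # After a few iterations dd gets stuck in uninterruptible sleep
    # and the soft lockup warning appears.
    while true; do dd if=/dev/zero of=$MNT/f bs=1M; done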

I took a few backtraces using KDB, letting it run a bit between taking each backtrace. All backtraces I saw had xfssyncd doing:

xfssyncd -> xfs_flush_inode_work -> filemap_flush ->
__filemap_fdatawrite_range -> do_writepages -> xfs_vm_writepage ->
xfs_page_state_convert -> xfs_map_blocks -> xfs_bmap -> xfs_iomap -> ...

From there, I've seen the trace continue with one of:

xfs_iomap_write_allocate -> xfs_trans_reserve -> xfs_mod_incore_sb ->
xfs_icsb_modify_counters -> xfs_icsb_modify_counters_int

or

xfs_iomap_write_allocate -> xfs_bmapi -> xfs_bmap_alloc ->
xfs_bmap_btalloc -> xfs_alloc_vextent -> xfs_alloc_fix_freelist

or

xfs_icsb_balance_counter -> xfs_icsb_disable_counter

or

xfs_iomap_write_allocate -> xfs_trans_alloc -> _xfs_trans_alloc ->
kmem_zone_zalloc

dd is doing:

sys_write -> vfs_write -> do_sync_write -> xfs_file_aio_write ->
xfs_write -> generic_file_buffered_write -> xfs_get_blocks ->
__xfs_get_blocks -> xfs_bmap -> xfs_iomap -> xfs_iomap_write_delay ->
xfs_flush_space -> xfs_flush_device -> _xfs_log_force ->
xlog_state_sync_all -> schedule_timeout
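(These backtraces came from KDB, but if you want to reproduce this without KDB, magic SysRq gives similar all-task dumps; this is the standard facility, nothing specific to this bug:)

    # Dump the state and backtrace of every task to the kernel log
    # (requires CONFIG_MAGIC_SYSRQ); look for xfssyncd and dd.
    echo t > /proc/sysrq-trigger
    dmesg | less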

From then on, other processes start piling up behind the held locks, and if I'm patient enough, something on my machine eventually eats up all the memory...

A similar problem was discussed here: http://oss.sgi.com/archives/xfs/2006-08/msg00144.html

For some reason I can't seem to find the original bug submission in either the list archives or your bugzilla... I would note that I have preemption disabled, so AFAICT this is not a matter of spinlocks being held for too long: the "soft lockup" warning should only trigger if a CPU doesn't reschedule for more than 10 seconds.
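(For reference, this is how I check the preemption setting; the config file path assumes a standard distro kernel:)

    # Confirm the kernel was built without preemption.
    grep CONFIG_PREEMPT /boot/config-$(uname -r)
    # Expected output includes: # CONFIG_PREEMPT is not set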

I saw the problem on two different machines; one has 8 logical CPUs (counting hyper-threading) and one has 4.

Most of my tests were done using a fast external storage array. But I also tried it on a 1GB file system that I made in a file on an ordinary disk and mounted through the loop device. The lockup did not happen with dd as before, but when I then umounted the file system, umount hung and I got the same soft lockup in xfssyncd as before.
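For reference, the loop-mounted setup was along these lines (a sketch; the image path and mount point are placeholders):

    # Build a 1GB XFS image in a regular file and mount it via the
    # loop driver.
    dd if=/dev/zero of=/var/tmp/xfs.img bs=1M count=1024
    mkfs.xfs /var/tmp/xfs.img
    mkdir -p /mnt/loop
    mount -o loop /var/tmp/xfs.img /mnt/loop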

I hope you XFS experts can see what might be wrong with that bug fix. Ironically, for me this (apparent) infinite loop seems much easier to hit than the out-of-order locking problem that the commit in question was supposed to fix. Let me know if I can get you any more info.

Thanks

