

To: Andi Kleen <ak@xxxxxxx>
Subject: Re: XFS unhappy with large holey loopback and syncs
From: David Chinner <dgc@xxxxxxx>
Date: Wed, 30 Nov 2005 12:59:59 +1100
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20051129003611.GF7209@xxxxxxxxxxxxxx>
References: <20051129003611.GF7209@xxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.4.2.1i
On Tue, Nov 29, 2005 at 01:36:11AM +0100, Andi Kleen wrote:
> 
> I just found an new exciting way to break XFS. Or rather the 
> version that's in 2.6.13. But might be a interesting try anyways.

Having spent some time understanding these ugly-looking stack traces
(give me kdb any day), it seems there's no deadlock in XFS here.

> xfssyncd      S 0000000000000a7c     0  8846     15          8861  8845 
> (L-TLB)
>        <ffffffff8035cf66>{schedule_timeout+150}
>        <ffffffff884794a9>{:xfs:xfssyncd+105}
>        <ffffffff88479440>{:xfs:xfssyncd+0}
>        <ffffffff80152b90>{keventd_create_kthread+0}
>        <ffffffff80152e43>{kthread+243}

So xfssyncd is asleep. It's idle as far as I can tell.

> loop0         D ffff8100057a00f0     0  8856      1                7860 
> (L-TLB)
>        <ffffffff80145e38>{__mod_timer+216}
>        <ffffffff8035cf66>{schedule_timeout+150}
>        <ffffffff8847941a>{:xfs:xfs_flush_device+90}
>        <ffffffff8845661e>{:xfs:xfs_iomap_write_delay+1150}
>        <ffffffff8845540c>{:xfs:xfs_iomap+700}
>        <ffffffff8846ffe6>{:xfs:__linvfs_get_block+134}

And loop0 is in an uninterruptible sleep waiting for 500ms
in xfs_flush_device():

        igrab(inode);
        xfs_syncd_queue_work(vfs, inode, xfs_flush_device_work);
        delay(msecs_to_jiffies(500));
        xfs_log_force(ip->i_mount, (xfs_lsn_t)0, XFS_LOG_FORCE|XFS_LOG_SYNC);

And xfs_flush_device_work(), which is executed by xfssyncd does:

        sync_blockdev(vfs->vfs_super->s_bdev);
        iput((struct inode *)inode);

But seeing that xfssyncd is idle, I would assume this has completed.

Now, the key here is where xfs_flush_device() gets called from:
xfs_iomap_write_delay() via xfs_flush_space(). It's called when disk block
_reservation_ for delalloc writes fails 3-4 times in a row for the same
allocation, i.e. ENOSPC.

So, in terms of disk blocks:

budgie:~ # ls -lsh /mnt/dgc/stripe/LARGE
23G -rw-r--r--  1 root root 8.0T Nov 30 10:15 /mnt/dgc/stripe/LARGE

You need at least 23GiB of disk space for the test to run. Given the
filesystem I tested on was ~8.5TB, I guess I wouldn't have seen this.

So I ran the test on a 1GiB filesystem, and it got 5224 inode tables in
before hanging with the following error messages:

Nov 30 11:23:53 budgie kernel: Buffer I/O error on device loop0, logical block 16581037
Nov 30 11:23:53 budgie kernel: lost page write due to I/O error on loop0
Nov 30 11:23:54 budgie kernel: Buffer I/O error on device loop0, logical block 16581038
Nov 30 11:23:54 budgie kernel: lost page write due to I/O error on loop0
Nov 30 11:23:55 budgie kernel: Buffer I/O error on device loop0, logical block 16581041
Nov 30 11:23:55 budgie kernel: lost page write due to I/O error on loop0

And loop0 is doing:

[0]kdb> btt 0xe000003065da0000
Stack traceback for pid 20095
0xe000003065da0000    20095        1  0    3   D  0xe000003065da0330  loop0
0xa000000100715a80 schedule+0x1480
0xa0000001007179d0 schedule_timeout+0xf0
0xa000000100717aa0 schedule_timeout_uninterruptible+0x20
0xa0000001003a3840 xfs_flush_device+0x80
0xa0000001003917c0 xfs_iomap_write_delay+0x3a0
0xa000000100392f70 xfs_iomap+0xad0
0xa00000010039f320 xfs_bmap+0x40
0xa000000100393d20 __linvfs_get_block+0x120
0xa0000001003942e0 linvfs_get_block+0x40
0xa000000100160f20 __block_prepare_write+0x640
0xa0000001001613a0 block_prepare_write+0x40
0xa000000100396ad0 linvfs_prepare_write+0x30
0xa0000001005004b0 do_lo_send_aops+0x190
0xa000000100500150 loop_thread+0x710
0xa000000100010ee0 kernel_thread_helper+0xe0
0xa000000100009120 start_kernel_thread+0x20

And the inode it's writing to:

[0]kdb> inode 0xe000003017edac48
struct inode at  0xe000003017edac48
 i_ino = 1539 i_count = 1 i_size 8796093018112
 i_mode = 0100644  i_nlink = 1  i_rdev = 0x0
 i_hash.nxt = 0x0000000000000000 i_hash.pprev = 0xa0000002021cbe50
 i_list.nxt = 0xe000003016d81648 i_list.prv = 0xe00000301184f1a8
 i_dentry.nxt = 0xe0000039eec604d0 i_dentry.prv = 0xe0000039eec604d0
 i_sb = 0xe0000039ebd8e300 i_op = 0xa00000010094e5f8 i_data = 0xe000003017edad78 nrpages = 65587
 i_fop= 0xa00000010094e2c8 i_flock = 0x0000000000000000 i_mapping = 0xe000003017edad78
 i_flags 0x0 i_state 0x0 []  fs specific info @ 0xe000003017edaeb8

has about 1GiB of pages attached to it (65587 pages at the 16KiB ia64 page size).

So I'd say this is a bug in the loopback driver's handling of ENOSPC
from the underlying filesystem.....

mkfs.ext2 has hung at this point:

[0]kdb> btt 0xe000003423220000
Stack traceback for pid 20101
0xe000003423220000    20101    19053  0    2   D  0xe000003423220330  mkfs.ext2
0xa000000100715a80 schedule+0x1480
        args (0xe000003423227b00, 0x1001e20d2, 0xe00000b003015e80, 0xe000003423227b30, 0x0)
0xa0000001007179d0 schedule_timeout+0xf0
        args (0x19, 0x1001e20d2, 0xa0000001008cc080, 0xa000000100713cc0, 0x286)
0xa000000100713cc0 io_schedule_timeout+0x80
        args (0xe00000b003015b74, 0xe000003423227b60, 0xa0000001004e1dc0, 0x309, 0x309)
0xa0000001004e1dc0 blk_congestion_wait+0x100
        args (0x1, 0x19, 0xa00000010095a200, 0xa0000001001169f0, 0xb1a)
0xa0000001001169f0 balance_dirty_pages_ratelimited+0x2d0
        args (0xe0000039ef731070, 0xe000003423227ba0, 0x180, 0xe000003423227bd8, 0xe000003423227c08)
0xa000000100108300 generic_file_buffered_write+0x660
        args (0xe000003423227d40, 0x60000000000210d0, 0x1, 0xa340110000, 0xe000003423227e38)
0xa000000100108f00 __generic_file_aio_write_nolock+0x660
        args (0xe000003423227d40, 0xe000003423227e20, 0x0, 0xe000003423227e38, 0x8000)
0xa000000100109880 generic_file_aio_write_nolock+0x60
        args (0xe0000039ea6c0a00, 0xe000003423227e20, 0xe0000039ef731070, 0xa34010a000, 0xe0000039ef730f40)
0xa000000100109b10 generic_file_write_nolock+0xf0
        args (0xe0000039ea6c0a00, 0xe000003423227e20, 0x1, 0xe000003423227e38, 0xa00000010016c310)
0xa00000010016c310 blkdev_file_write+0x50
        args (0xe0000039ea6c0a00, 0x600000000001b0d0, 0x8000, 0xe000003423227e38, 0xa000000100158080)
0xa000000100158080 vfs_write+0x1c0
        args (0xe0000039ea6c0a00, 0x600000000001b0d0, 0x8000, 0xe000003423227e38, 0xe0000039ea6c0a20)
0xa0000001001583c0 sys_write+0x80
        args (0x4, 0x600000000001b0d0, 0x8000, 0x200000000006f5c0, 0xc000000000000592)
0xa00000010000bb20 ia64_ret_from_syscall
        args (0x4, 0x600000000001b0d0, 0x8000, 0x200000000006f5c0, 0xc000000000000592)
0xa000000000010640 __kernel_syscall_via_break
        args (0x4, 0x600000000001b0d0, 0x8000, 0x200000000006f5c0, 0xc000000000000592)

Which tends to indicate that the loop0 blockdev is stuck because it hasn't
handled the ENOSPC from the underlying filesystem properly.

I can't test this on ext3 or reiserfs because they don't appear to support
sparse files larger than the filesystem itself.

I'd try running this on a larger XFS filesystem (e.g. 30GB) and see if the
problem goes away.

Cheers,

Dave.
-- 
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

