On Tue, Nov 29, 2005 at 01:36:11AM +0100, Andi Kleen wrote:
>
> I just found a new exciting way to break XFS. Or rather the
> version that's in 2.6.13. But it might be an interesting try anyways.
Having spent some time making sense of these ugly-looking stack traces
(give me kdb any day), it seems there's no deadlock in XFS here.
> xfssyncd S 0000000000000a7c 0 8846 15 8861 8845 (L-TLB)
> <ffffffff8035cf66>{schedule_timeout+150}
> <ffffffff884794a9>{:xfs:xfssyncd+105}
> <ffffffff88479440>{:xfs:xfssyncd+0}
> <ffffffff80152b90>{keventd_create_kthread+0}
> <ffffffff80152e43>{kthread+243}
So xfssyncd is asleep. It's idle as far as I can tell.
> loop0 D ffff8100057a00f0 0 8856 1 7860 (L-TLB)
> <ffffffff80145e38>{__mod_timer+216}
> <ffffffff8035cf66>{schedule_timeout+150}
> <ffffffff8847941a>{:xfs:xfs_flush_device+90}
> <ffffffff8845661e>{:xfs:xfs_iomap_write_delay+1150}
> <ffffffff8845540c>{:xfs:xfs_iomap+700}
> <ffffffff8846ffe6>{:xfs:__linvfs_get_block+134}
And loop0 is in an uninterruptible sleep waiting for 500ms
in xfs_flush_device():
/* take a reference and hand the flush off to xfssyncd ... */
igrab(inode);
xfs_syncd_queue_work(vfs, inode, xfs_flush_device_work);
/* ... then wait a fixed 500ms (not for completion) and force the log */
delay(msecs_to_jiffies(500));
xfs_log_force(ip->i_mount, (xfs_lsn_t)0, XFS_LOG_FORCE|XFS_LOG_SYNC);
And xfs_flush_device_work(), which is executed by xfssyncd, does:
/* flush the backing device, then drop the reference taken above */
sync_blockdev(vfs->vfs_super->s_bdev);
iput((struct inode *)inode);
but seeing that xfssyncd is idle, I would assume that this is complete.
Now, the key here is where xfs_flush_device() gets called from:
xfs_iomap_write_delay(), via xfs_flush_space(). It gets called when disk
block _reservation_ for delalloc writes has failed 3-4 times in a row for
the same allocation, i.e. we are at ENOSPC.
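To make that concrete, here's a rough userspace sketch of the pattern -
not the real XFS code, and every helper name in it is made up - showing a
reservation that keeps returning ENOSPC while the caller kicks off a flush
and sleeps a fixed 500ms between attempts, much like xfs_flush_space()
ending up in xfs_flush_device() above:

/*
 * Illustrative sketch only: models the flush-and-retry behaviour
 * described in the text.  try_reserve_blocks() and queue_flush_work()
 * are hypothetical stand-ins, not kernel functions.
 */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

static int try_reserve_blocks(long blocks)
{
	(void)blocks;
	return -ENOSPC;			/* the filesystem really is full */
}

static void queue_flush_work(void)
{
	printf("queued flush work for the sync daemon\n");
}

static int write_delay_reserve(long blocks)
{
	int attempt, err;

	for (attempt = 0; attempt < 4; attempt++) {
		err = try_reserve_blocks(blocks);
		if (err != -ENOSPC)
			return err;

		/* kick a flush, then sleep a fixed 500ms; there is no
		 * wait-for-completion here, just like xfs_flush_device() */
		queue_flush_work();
		usleep(500 * 1000);
	}
	return -ENOSPC;			/* still no space, give up */
}

int main(void)
{
	printf("reservation result: %d\n", write_delay_reserve(128));
	return 0;
}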
So, in terms of disk blocks:
budgie:~ # ls -lsh /mnt/dgc/stripe/LARGE
23G -rw-r--r-- 1 root root 8.0T Nov 30 10:15 /mnt/dgc/stripe/LARGE
You need at least 23GiB of disk space for the test to run. Given the
filesystem I tested on was ~8.5TB, I guess I wouldn't have seen this.
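As an aside, the backing file for a test like this can be created sparse
with a simple truncate. This throwaway helper is mine, not part of the
original test case, and the path and size are only examples:

/*
 * Create a sparse file whose apparent size (8TiB here, like LARGE
 * above) is far bigger than the filesystem it lives on.  ftruncate()
 * allocates no blocks, so this works even on a 1GiB filesystem.
 */
#define _FILE_OFFSET_BITS 64
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "LARGE";	/* example path */
	off_t size = (off_t)8 * 1024 * 1024 * 1024 * 1024;	/* 8TiB */
	int fd = open(path, O_WRONLY | O_CREAT, 0644);

	if (fd < 0 || ftruncate(fd, size) < 0) {
		perror(path);
		return 1;
	}
	close(fd);
	return 0;
}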
So I ran the test on a 1GiB filesystem, and it got 5224 inode tables in
before hanging with the above error message. The kernel log shows:
Nov 30 11:23:53 budgie kernel: Buffer I/O error on device loop0, logical block 16581037
Nov 30 11:23:53 budgie kernel: lost page write due to I/O error on loop0
Nov 30 11:23:54 budgie kernel: Buffer I/O error on device loop0, logical block 16581038
Nov 30 11:23:54 budgie kernel: lost page write due to I/O error on loop0
Nov 30 11:23:55 budgie kernel: Buffer I/O error on device loop0, logical block 16581041
Nov 30 11:23:55 budgie kernel: lost page write due to I/O error on loop0
And loop0 is doing:
[0]kdb> btt 0xe000003065da0000
Stack traceback for pid 20095
0xe000003065da0000 20095 1 0 3 D 0xe000003065da0330 loop0
0xa000000100715a80 schedule+0x1480
0xa0000001007179d0 schedule_timeout+0xf0
0xa000000100717aa0 schedule_timeout_uninterruptible+0x20
0xa0000001003a3840 xfs_flush_device+0x80
0xa0000001003917c0 xfs_iomap_write_delay+0x3a0
0xa000000100392f70 xfs_iomap+0xad0
0xa00000010039f320 xfs_bmap+0x40
0xa000000100393d20 __linvfs_get_block+0x120
0xa0000001003942e0 linvfs_get_block+0x40
0xa000000100160f20 __block_prepare_write+0x640
0xa0000001001613a0 block_prepare_write+0x40
0xa000000100396ad0 linvfs_prepare_write+0x30
0xa0000001005004b0 do_lo_send_aops+0x190
0xa000000100500150 loop_thread+0x710
0xa000000100010ee0 kernel_thread_helper+0xe0
0xa000000100009120 start_kernel_thread+0x20
And the inode it's writing to:
[0]kdb> inode 0xe000003017edac48
struct inode at 0xe000003017edac48
i_ino = 1539 i_count = 1 i_size 8796093018112
i_mode = 0100644 i_nlink = 1 i_rdev = 0x0
i_hash.nxt = 0x0000000000000000 i_hash.pprev = 0xa0000002021cbe50
i_list.nxt = 0xe000003016d81648 i_list.prv = 0xe00000301184f1a8
i_dentry.nxt = 0xe0000039eec604d0 i_dentry.prv = 0xe0000039eec604d0
i_sb = 0xe0000039ebd8e300 i_op = 0xa00000010094e5f8 i_data = 0xe000003017edad78 nrpages = 65587
i_fop= 0xa00000010094e2c8 i_flock = 0x0000000000000000 i_mapping = 0xe000003017edad78
i_flags 0x0 i_state 0x0 [] fs specific info @ 0xe000003017edaeb8
has about 1GB of pages attached to it (65587 pages at the 16KiB ia64 page size).
So I'd say this is a bug in the loopback driver's handling of ENOSPC from
the underlying filesystem.....
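To see what that ENOSPC looks like from the outside, here's another
trivial sketch of mine (nothing to do with the loop driver code itself):
buffered writes into the huge sparse backing file succeed until the small
filesystem underneath runs out of real blocks, then fail with ENOSPC,
which is what loop0's writes into the backing file are running into:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "LARGE";	/* example path */
	char buf[65536];
	long long total = 0;
	int fd;

	memset(buf, 0xAA, sizeof(buf));
	fd = open(path, O_WRONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (;;) {
		ssize_t n = write(fd, buf, sizeof(buf));
		if (n < 0) {
			/* delalloc reserves blocks at write() time, so the
			 * failure shows up here as ENOSPC */
			fprintf(stderr, "write failed after %lld bytes: %s\n",
				total, strerror(errno));
			break;
		}
		total += n;
	}
	close(fd);
	return 0;
}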
mkfs.ext2 has hung at this point:
[0]kdb> btt 0xe000003423220000
Stack traceback for pid 20101
0xe000003423220000 20101 19053 0 2 D 0xe000003423220330 mkfs.ext2
0xa000000100715a80 schedule+0x1480
args (0xe000003423227b00, 0x1001e20d2, 0xe00000b003015e80, 0xe000003423227b30, 0x0)
0xa0000001007179d0 schedule_timeout+0xf0
args (0x19, 0x1001e20d2, 0xa0000001008cc080, 0xa000000100713cc0, 0x286)
0xa000000100713cc0 io_schedule_timeout+0x80
args (0xe00000b003015b74, 0xe000003423227b60, 0xa0000001004e1dc0, 0x309, 0x309)
0xa0000001004e1dc0 blk_congestion_wait+0x100
args (0x1, 0x19, 0xa00000010095a200, 0xa0000001001169f0, 0xb1a)
0xa0000001001169f0 balance_dirty_pages_ratelimited+0x2d0
args (0xe0000039ef731070, 0xe000003423227ba0, 0x180, 0xe000003423227bd8, 0xe000003423227c08)
0xa000000100108300 generic_file_buffered_write+0x660
args (0xe000003423227d40, 0x60000000000210d0, 0x1, 0xa340110000, 0xe000003423227e38)
0xa000000100108f00 __generic_file_aio_write_nolock+0x660
args (0xe000003423227d40, 0xe000003423227e20, 0x0, 0xe000003423227e38, 0x8000)
0xa000000100109880 generic_file_aio_write_nolock+0x60
args (0xe0000039ea6c0a00, 0xe000003423227e20, 0xe0000039ef731070, 0xa34010a000, 0xe0000039ef730f40)
0xa000000100109b10 generic_file_write_nolock+0xf0
args (0xe0000039ea6c0a00, 0xe000003423227e20, 0x1, 0xe000003423227e38, 0xa00000010016c310)
0xa00000010016c310 blkdev_file_write+0x50
args (0xe0000039ea6c0a00, 0x600000000001b0d0, 0x8000, 0xe000003423227e38, 0xa000000100158080)
0xa000000100158080 vfs_write+0x1c0
args (0xe0000039ea6c0a00, 0x600000000001b0d0, 0x8000, 0xe000003423227e38, 0xe0000039ea6c0a20)
0xa0000001001583c0 sys_write+0x80
args (0x4, 0x600000000001b0d0, 0x8000, 0x200000000006f5c0, 0xc000000000000592)
0xa00000010000bb20 ia64_ret_from_syscall
args (0x4, 0x600000000001b0d0, 0x8000, 0x200000000006f5c0, 0xc000000000000592)
0xa000000000010640 __kernel_syscall_via_break
args (0x4, 0x600000000001b0d0, 0x8000, 0x200000000006f5c0, 0xc000000000000592)
Which tends to indicate that the loop0 blockdev is stuck because it hasn't
handled the ENOSPC from the underlying fs properly: mkfs.ext2 is throttled
in blk_congestion_wait waiting for its dirty pages against loop0 to be
cleaned, and that can never happen while the loop0 thread itself is wedged
inside the backing filesystem.
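The shape of that hang can be modelled in a few lines of userspace code.
This is purely my illustration, not kernel code: one thread plays
mkfs.ext2 and throttles itself once too many "pages" are dirty, the other
plays the loop0 thread and never gets around to cleaning them, so the
writer (build with -pthread) throttles once and then waits forever, just
like the trace above:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define DIRTY_LIMIT 64

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cleaned = PTHREAD_COND_INITIALIZER;
static int nr_dirty;

/* Stand-in for the loop0 thread: wedged, never cleans anything. */
static void *stuck_flusher(void *arg)
{
	(void)arg;
	for (;;)
		sleep(1);	/* stuck inside the backing filesystem */
	return NULL;
}

/* Stand-in for mkfs.ext2 dirtying pages against /dev/loop0. */
static void dirty_one_page(void)
{
	pthread_mutex_lock(&lock);
	while (nr_dirty >= DIRTY_LIMIT) {
		/* analogue of balance_dirty_pages/blk_congestion_wait */
		printf("writer throttled at %d dirty pages\n", nr_dirty);
		pthread_cond_wait(&cleaned, &lock);
	}
	nr_dirty++;
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	pthread_t flusher;
	int i;

	pthread_create(&flusher, NULL, stuck_flusher, NULL);
	for (i = 0; i < 1000; i++)
		dirty_one_page();	/* hangs for good at DIRTY_LIMIT */
	printf("never reached\n");
	return 0;
}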
I can't test this on ext3 or reiserfs because they don't appear to support
sparse files larger than the filesystem itself.
I'd try running this on a larger XFS filesystem (e.g. 30GB) and see if the
problem goes away.
Cheers,
Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group