Hi,
I've been investigating a deadlock problem on a device that is out of space (ENOSPC).
The phenomenon is repeatable with the following method:
1. Create some files to fill an XFS filesystem, leaving about 80MB free.
2. Execute dd.sh, which spawns 10 dd processes. Each dd writes 16MB, so
the total is 160MB against 80MB free.
8<------8<------ dd.sh
#!/bin/sh
for i in `seq 10`
do
( while :; do dd if=/dev/zero of=F$i bs=1024 count=16384 > /dev/null 2>&1; done
) &
done
8<------8<------ dd.sh
3. Wait a few minutes; two (or more) processes will then be deadlocked
in `D' state. Their WCHAN is `text.l'.
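For step 1, a helper along the following lines could do the filling. It is only a sketch: MNT defaults to /opt because that is the mount point shown later in this mail, while the "filler" file name, the fill_kb function and the DO_FILL switch are made-up names for illustration. The free space left afterwards will only be roughly 80MB, since metadata also takes space.

```sh
#!/bin/sh
# fill.sh: leave roughly 80MB free on a filesystem (sketch for step 1).
# MNT, "filler", fill_kb and DO_FILL are illustrative names, not part of
# the original test setup.
MNT=${MNT:-/opt}
LEAVE_KB=$((80 * 1024))

# Print how many KB must be written so that LEAVE_KB remain free,
# given the currently available KB in $1. Prints 0 if the filesystem
# already has LEAVE_KB or less available.
fill_kb() {
    if [ "$1" -gt "$LEAVE_KB" ]; then
        echo $(( $1 - LEAVE_KB ))
    else
        echo 0
    fi
}

# Nothing is written unless DO_FILL=1 is set explicitly.
if [ "${DO_FILL:-0}" = 1 ]; then
    # With -P the "Available" value is the 4th column of the 2nd line.
    avail=$(df -Pk "$MNT" | awk 'NR==2 { print $4 }')
    n=$(fill_kb "$avail")
    if [ "$n" -gt 0 ]; then
        dd if=/dev/zero of="$MNT/filler" bs=1024 count="$n"
    fi
fi
```

Run it as `DO_FILL=1 MNT=/opt sh fill.sh`, then start dd.sh in a directory on the same filesystem.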
I tested on an HT Pentium 4 box with Linux 2.6.13-rc[123] + TAKE 938502,
but I guess older versions have the same flaw.
Here are the kernel backtraces of two of the blocked processes.
ADDR S PID SESS UID EUID MM NAME FLAGS
df112530 U 1376 0 0 0 0 xfssyncd forknoexec fstrans randomize
ded6b588 c03ee853 schedule+6f3 ()
[ded6b5fc] c03edf65 __down+75 (decef93c,decef93c,ded6b654)
[ded6b634] c03ee0f2 __down_failed+a ()
[ded6b644] c02884de [.text.lock.xfs_buf+1f]
[ded6b644] c0287034 pagebuf_lock+34 (d3215abc,14005,de2e11fc,0)
[ded6b658] c0286811 _pagebuf_find+161 (df6a0280,4841ad1,0,200)
[ded6b690] c02868ff xfs_buf_get_flags+6f (df6a0280,4841ad1,0,1)
[ded6b6c4] c0286a22 xfs_buf_read_flags+32 (df6a0280,4841ad1,0,1)
[ded6b6e8] c0277e31 xfs_trans_read_buf+211 (dedde400,c9d74730,df6a0280,4841ad1)
[ded6b718] c0223e03 xfs_alloc_read_agf+a3 (dedde400,c9d74730,a,0)
[ded6b75c] c0223a39 xfs_alloc_fix_freelist+449 (ded6b97c,0,0,0)
[ded6b804] c0224285 xfs_alloc_vextent+345 (ded6b97c,ded6b8f0,0,ae71d5)
[ded6b868] c02346ba xfs_bmap_alloc+15ca (ded6bb34,ded6baf4,0,0)
[ded6b9dc] c02389ef xfs_bmapi+d1f (c9d74730,d0d64d20,7f1,0)
[ded6bb84] c0264d54 xfs_iomap_write_allocate+2b4 (d0d64d20,7f1000,0,1000)
[ded6bc74] c02639f0 xfs_iomap+460 (d0d64dfc,7f1000,0,1000)
[ded6bd00] c028d9d1 xfs_bmap+41 (d0d64d40,7f1000,0,1000)
[ded6bd24] c02843af xfs_map_blocks+4f (d3c2204c,7f1000,0,1000)
[ded6bd58] c0285580 xfs_page_state_convert+510 (d3c2204c,c111d3e0,ded6bf44,1)
[ded6be24] c0285d2f linvfs_writepage+6f (c111d3e0,ded6bf44,ded6be94,0)
[ded6be58] c018e94e mpage_writepages+24e (d3c220f8,ded6bf44,0,ded6bf80)
[ded6bef4] c014cc92 do_writepages+42 (d3c220f8,ded6bf44,0,0,0,fe6,0,0,0,0,0,0,ded6bf88,ffffffff,0,0,0,fe6,0,0,0,0,0,0,ded6bf88,28852)
[ded6bf08] c01459ef __filemap_fdatawrite_range+9f ()
ADDR S PID SESS UID EUID MM NAME FLAGS
dd3e3530 U 13387 0 524 524 cf073800 dd fstrans randomize
cf511950 c03ee853 schedule+6f3 ()
[cf5119c4] c03edf65 __down+75 (decefa2c,decefa2c,cf511a1c)
[cf5119fc] c03ee0f2 __down_failed+a ()
[cf511a0c] c02884de [.text.lock.xfs_buf+1f]
[cf511a0c] c0287034 pagebuf_lock+34 (d321557c,c16e2800,cf510000,0)
[cf511a20] c0286811 _pagebuf_find+161 (df6a0280,6c62839,0,200)
[cf511a58] c02868ff xfs_buf_get_flags+6f (df6a0280,6c62839,0,1)
[cf511a8c] c0286a22 xfs_buf_read_flags+32 (df6a0280,6c62839,0,1)
[cf511ab0] c0277e31 xfs_trans_read_buf+211 (dedde400,ce19dad0,df6a0280,6c62839)
[cf511ae0] c0223e03 xfs_alloc_read_agf+a3 (dedde400,ce19dad0,f,0)
[cf511b24] c0223a39 xfs_alloc_fix_freelist+449 (cf511bf0,0,a,e730a)
[cf511bcc] c0224549 xfs_free_extent+99 (ce19dad0,fd3ac4,0,60)
[cf511c50] c0237225 xfs_bmap_finish+185 (cf511d84,cf511cf0,ffffffff,ffffffff)
[cf511c8c] c025ffdf xfs_itruncate_finish+29f (cf511d84,d0d64bb0,0,0)
[cf511d10] c027d53b xfs_setattr+f5b (d0d64bd0,cf511dbc,0,0)
[cf511da0] c028be8d linvfs_setattr+fd (ce66c5c8,cf511e7c,dedde418,cf511e68)
[cf511e3c] c0184a1c notify_change+3cc (ce66c5c8,cf511e7c,48,0)
[cf511e70] c0164e62 do_truncate+42 (ce66c5c8,0,0,ce66c5c8)
[cf511ec4] c0177b0f may_open+24f (cf511f44,2,8242,c0167e6a)
[cf511ee8] c01780a6 open_namei+526 (d26a2000,8242,1b6,cf511f44)
[cf511f30] c0165f7a filp_open+3a (d26a2000,8241,1b6,d25bc880)
[cf511f8c] c0166389 sys_open+59 (bff419a6,8241,1b6,8241)
# xfs_info /opt
meta-data=/opt isize=256 agcount=16, agsize=947081 blks
= sectsz=512
data = bsize=4096 blocks=15153296, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=7399, version=1
= sectsz=512 sunit=0 blks
realtime =none extsz=65536 blocks=0, rtextents=0
# df /opt
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hda6 60583588 60583584 4 100% /opt
After some investigation, I've found that in this case:
xfssyncd: allocating extents; holding the AG#15 AGF, waiting for the AG#10 AGF.
This happens because XFS could not allocate all of the delayed blocks
in a single AG.
dd: freeing extents; holding the AG#10 AGF, waiting for the AG#15 AGF.
This happens because the file's extents are spread over multiple AGs
and XFS defines XFS_ITRUNC_MAX_EXTENTS as 2.
Both processes are inside a transaction (PF_FSTRANS) and each one is
operating on 2 AGs. It looks like an AB-BA deadlock.
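The AG each process is waiting for can be cross-checked against the traces in two ways. The third argument of xfs_alloc_read_agf is the AG number: a (10) for xfssyncd and f (15) for dd. The disk address passed to _pagebuf_find should then fall inside the same AG. The script below does that second calculation from the xfs_info numbers. It is only a sketch, assuming that the address is in 512-byte units; ag_of is a made-up helper name.

```sh
#!/bin/sh
# Map a disk address from the backtraces to an AG number.
# Assumption: the second argument of _pagebuf_find is a disk address
# in 512-byte units.
AGSIZE_FSB=947081   # agsize from xfs_info, in filesystem blocks
BSIZE=4096          # bsize from xfs_info
SECT=512            # sectsz from xfs_info

# $1: disk address in hex, without the 0x prefix (as printed in the trace)
ag_of() {
    per_ag=$((AGSIZE_FSB * (BSIZE / SECT)))   # 512-byte sectors per AG
    daddr=$((0x$1))
    echo "AG#$((daddr / per_ag)) sector offset $((daddr % per_ag))"
}

ag_of 4841ad1   # xfssyncd blocks here: prints "AG#10 sector offset 1"
ag_of 6c62839   # dd blocks here:       prints "AG#15 sector offset 1"
```

Both addresses land one sector into their AG, which is consistent with the AGF being the second sector of each AG when the sector size is 512 bytes.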
So, I have a question: is it safe to touch multiple AGs in a single transaction?
IMHO, a transaction that spans multiple AGs can easily deadlock,
because XFS must keep each xfs_buf locked (its semaphore held down) until
the transaction is committed to the in-core log.
--
masano