Steve, Eric
This is the problem that I was discussing earlier on IRC.
I've come across a deadlock problem with xfs_freeze when used with reasonable
I/O activity.
Attached are 2 scripts. The script automagically creates snapshots and is
designed to
be put into crond.daily etc to create snapshots for backing up and other
purposes.
One script has xfs_freeze as part of the script and the other has the related
parts
commented out.
The test is copying 18G from server to server over NFS - and during the
transfer kick-off
the snapshot script on the server that is writing the data. The script with
xfs_freeze will
deadlock after a couple of seconds; whereas the script without is fine. You
will note that
from ps lvcreate & a couple of NFS daemons are deadlocked; whereas, kdb's
output
shows nothing running.
The order of events is to make sure the copy has been going
for a while and then kick off the snapshot script. When this occurs lvcreate
chews up
50%+ CPU and then deadlocks - the NFS processes follow one by one after that.
CPU usage then returns to normal; however, any activity on that volume
deadlocks.
Any other volume is fine - and streaming media is unaffected.
This is using LVM1.1rc2 and the XFS CVS kernel -
2.4.18-xfs-lvm1.1rc2-CVS20020427-0000.
The kernel was compiled using RH7.2 standard gcc-2.96-98 which I was told could
be a
problem. If it is I'll recompile using the latest.
Not sure what more information I can give at the moment - if there is anything
else I need to
supply - please tell me.
Attached below is the ps & kdb outputs.
[akh@atlas pts/3:0 akh]$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.3 0.1 1412 520 ? S 01:05 0:10 init
root 2 0.0 0.0 0 0 ? SW 01:05 0:00 [keventd]
root 3 0.0 0.0 0 0 ? SWN 01:05 0:00 [ksoftirqd_CPU0]
root 4 0.4 0.0 0 0 ? SW 01:05 0:12 [kswapd]
root 5 0.0 0.0 0 0 ? SW 01:05 0:00 [bdflush]
root 6 0.0 0.0 0 0 ? DW 01:05 0:02 [kupdated]
root 7 0.0 0.0 0 0 ? SW< 01:05 0:00 [mdrecoveryd]
root 8 0.0 0.0 0 0 ? SW< 01:05 0:00 [raid5d]
root 9 0.0 0.0 0 0 ? SW 01:05 0:00 [kjournald]
root 309 0.0 0.0 0 0 ? SW 01:05 0:00 [kjournald]
root 310 0.0 0.0 0 0 ? SW 01:05 0:00 [kjournald]
root 311 0.0 0.0 0 0 ? SW 01:05 0:00 [kjournald]
root 313 0.0 0.0 0 0 ? SW 01:05 0:00 [kreiserfsd]
root 314 0.0 0.0 0 0 ? SW 01:05 0:00 [kjournald]
root 316 0.0 0.0 0 0 ? SW 01:05 0:00 [pagebuf_daemon]
root 650 0.0 0.0 432 164 ? S 01:06 0:00 /sbin/dhcpcd -n
-H eth0
root 786 0.0 0.2 1696 788 ? S 01:06 0:00 syslogd -m 0
root 791 0.0 0.2 2068 1120 ? S 01:06 0:00 klogd -2
rpc 811 0.0 0.1 1552 588 ? S 01:06 0:00 portmap
rpcuser 839 0.0 0.2 1688 840 ? S 01:06 0:00 rpc.statd
root 952 0.0 0.3 2672 1264 ? S 01:06 0:00 /usr/sbin/sshd
root 983 0.0 0.1 1788 632 ? S 01:06 0:00 rpc.rquotad
root 988 0.0 0.1 1540 564 ? S 01:06 0:00 rpc.mountd
root 993 1.0 0.0 0 0 ? SW 01:06 0:29 [nfsd]
root 994 0.0 0.0 0 0 ? SW 01:06 0:00 [lockd]
root 995 0.0 0.0 0 0 ? SW 01:06 0:00 [rpciod]
root 996 1.1 0.0 0 0 ? SW 01:06 0:33 [nfsd]
root 997 1.0 0.0 0 0 ? DW 01:06 0:31 [nfsd]
root 998 1.0 0.0 0 0 ? SW 01:06 0:32 [nfsd]
root 999 1.0 0.0 0 0 ? SW 01:06 0:32 [nfsd]
root 1000 1.0 0.0 0 0 ? SW 01:06 0:31 [nfsd]
root 1001 1.1 0.0 0 0 ? DW 01:06 0:32 [nfsd]
root 1002 1.1 0.0 0 0 ? SW 01:06 0:32 [nfsd]
root 1020 0.0 0.1 1584 660 ? S 01:06 0:00 crond
daemon 1056 0.0 0.1 1444 568 ? S 01:06 0:00 /usr/sbin/atd
root 1063 0.0 0.1 1384 448 tty1 S 01:06 0:00 /sbin/mingetty
tty1
root 1064 0.0 0.1 1384 448 tty2 S 01:06 0:00 /sbin/mingetty
tty2
root 1065 0.0 0.1 1384 448 tty3 S 01:06 0:00 /sbin/mingetty
tty3
root 1066 0.0 0.1 1384 448 tty4 S 01:06 0:00 /sbin/mingetty
tty4
root 1067 0.0 0.1 1384 448 tty5 S 01:06 0:00 /sbin/mingetty
tty5
root 1068 0.0 0.1 1384 448 tty6 S 01:06 0:00 /sbin/mingetty
tty6
root 1069 0.0 0.1 1636 688 ? S 01:06 0:00 /sbin/mgetty -r
-s 115200 ttyS0
root 1072 0.0 0.5 3568 2072 ? S 01:06 0:00 /usr/sbin/sshd
akh 1073 0.0 0.3 2464 1320 pts/0 S 01:06 0:00 -bash
root 1119 0.0 0.2 2356 1016 pts/0 S 01:07 0:00 su -
root 1120 0.0 0.3 2544 1400 pts/0 S 01:07 0:00 -bash
root 1173 0.0 0.5 3656 2088 ? S 01:14 0:00 /usr/sbin/sshd
akh 1174 0.0 0.3 2456 1296 pts/1 S 01:14 0:00 -bash
akh 1218 0.0 0.2 2088 1040 pts/1 S 01:14 0:00 top
root 1219 0.0 0.5 3568 2072 ? S 01:15 0:00 /usr/sbin/sshd
akh 1220 0.0 0.3 2464 1320 pts/2 S 01:15 0:00 -bash
akh 1265 0.0 0.1 1444 492 pts/2 S 01:17 0:00 vmstat 10
root 1266 0.0 0.2 2248 1036 pts/0 S 01:50 0:00 /bin/bash
./raid0_snapshot.sh
root 1278 0.8 0.4 2672 1572 pts/0 D 01:50 0:02 /sbin/lvcreate
-L 1G -s -n DATA_SNAP_DY /dev/RAID0/DATA
root 1311 0.6 0.5 3656 2088 ? R 01:55 0:00 /usr/sbin/sshd
akh 1312 0.0 0.3 2456 1300 pts/3 S 01:55 0:00 -bash
akh 1357 0.0 0.1 2628 716 pts/3 R 01:55 0:00 ps aux
Entering kdb (current=0xc02de000, pid 0) due to Keyboard Entry
kdb> ps
Task Addr Pid Parent [*] cpu State Thread Command
0xd7fe2000 00000001 00000000 1 000 stop 0xd7fe2270 init
0xc163c000 00000002 00000001 1 000 run 0xc163c270 keventd
0xc1638000 00000003 00000000 1 000 stop 0xc1638270 ksoftirqd_CPU0
0xc1636000 00000004 00000000 1 000 stop 0xc1636270 kswapd
0xc1634000 00000005 00000000 1 000 stop 0xc1634270 bdflush
0xc1632000 00000006 00000000 1 000 stop 0xc1632270 kupdated
0xd7ec4000 00000007 00000001 1 000 stop 0xd7ec4270 mdrecoveryd
0xd7eac000 00000008 00000001 1 000 stop 0xd7eac270 raid5d
0xd7b34000 00000009 00000001 1 000 stop 0xd7b34270 kjournald
0xd6bf6000 00000309 00000001 1 000 stop 0xd6bf6270 kjournald
0xd6be8000 00000310 00000001 1 000 stop 0xd6be8270 kjournald
0xd6be2000 00000311 00000001 1 000 stop 0xd6be2270 kjournald
0xd6ac0000 00000313 00000001 1 000 stop 0xd6ac0270 kreiserfsd
0xd69fc000 00000314 00000001 1 000 stop 0xd69fc270 kjournald
0xd6626000 00000316 00000001 1 000 stop 0xd6626270 pagebuf_daemon
0xd616a000 00000650 00000001 1 000 stop 0xd616a270 dhcpcd
0xd7838000 00000786 00000001 1 000 stop 0xd7838270 syslogd
0xd615a000 00000791 00000001 1 000 stop 0xd615a270 klogd
0xd600c000 00000811 00000001 1 000 stop 0xd600c270 portmap
0xd5fa4000 00000839 00000001 1 000 stop 0xd5fa4270 rpc.statd
0xd5d98000 00000952 00000001 1 000 stop 0xd5d98270 sshd
more>
0xd5c94000 00000983 00000001 1 000 stop 0xd5c94270 rpc.rquotad
0xd5d68000 00000988 00000001 1 000 stop 0xd5d68270 rpc.mountd
0xd5cdc000 00000993 00000001 1 000 stop 0xd5cdc270 nfsd
0xd5cd4000 00000994 00000001 1 000 stop 0xd5cd4270 lockd
0xd5cd2000 00000995 00000994 1 000 stop 0xd5cd2270 rpciod
0xd5cd0000 00000996 00000001 1 000 stop 0xd5cd0270 nfsd
0xd5c18000 00000997 00000001 1 000 stop 0xd5c18270 nfsd
0xd5c0e000 00000998 00000001 1 000 stop 0xd5c0e270 nfsd
0xd5c0c000 00000999 00000001 1 000 stop 0xd5c0c270 nfsd
0xd5c04000 00001000 00000001 1 000 stop 0xd5c04270 nfsd
0xd5bfa000 00001001 00000001 1 000 stop 0xd5bfa270 nfsd
0xd5bf8000 00001002 00000001 1 000 stop 0xd5bf8270 nfsd
0xd5bb8000 00001020 00000001 1 000 stop 0xd5bb8270 crond
0xd71c6000 00001056 00000001 1 000 stop 0xd71c6270 atd
0xd71c4000 00001063 00000001 1 000 stop 0xd71c4270 mingetty
0xd735e000 00001064 00000001 1 000 stop 0xd735e270 mingetty
0xd7282000 00001065 00000001 1 000 stop 0xd7282270 mingetty
0xd797c000 00001066 00000001 1 000 stop 0xd797c270 mingetty
0xd5bf0000 00001067 00000001 1 000 stop 0xd5bf0270 mingetty
0xd5edc000 00001068 00000001 1 000 stop 0xd5edc270 mingetty
0xd5d70000 00001069 00000001 1 000 stop 0xd5d70270 mgetty
0xd5e6e000 00001072 00000952 1 000 stop 0xd5e6e270 sshd
0xd735a000 00001073 00001072 1 000 stop 0xd735a270 bash
more>
0xd5ab4000 00001119 00001073 1 000 stop 0xd5ab4270 su
0xd5a82000 00001120 00001119 1 000 stop 0xd5a82270 bash
0xc7d12000 00001173 00000952 1 000 stop 0xc7d12270 sshd
0xc79d8000 00001174 00001173 1 000 stop 0xc79d8270 bash
0xc63cc000 00001218 00001174 1 000 stop 0xc63cc270 top
0xc466c000 00001219 00000952 1 000 stop 0xc466c270 sshd
0xc4394000 00001220 00001219 1 000 stop 0xc4394270 bash
0xcba92000 00001265 00001220 1 000 stop 0xcba92270 vmstat
0xc0e4c000 00001266 00001120 1 000 stop 0xc0e4c270 raid0_snapshot.
0xd59f6000 00001278 00001266 1 000 stop 0xd59f6270 lvcreate
0xd63ce000 00001311 00000952 1 000 stop 0xd63ce270 sshd
0xd6400000 00001312 00001311 1 000 stop 0xd6400270 bash
kdb> btp 1278
EBP EIP Function(args)
0xd59f79d4 0xc0112ab1 schedule+0x2e1 (0x0, 0xd59f6000, 0x0, 0x0, 0x1)
kernel .text 0xc0100000 0xc01127d0 0xc0112ae0
0xd8f53035 [xfs_support]_sv_wait_R0f1e187b+0xc5 (0xd6766b74,
0xd6766b6c, 0x202, 0x0, 0x0)
xfs_support .text 0xd8f52060 0xd8f52f70
0xd8f53060
0xd8fb07b7 [xfs]xfs_check_frozen+0x87 (0xd6766800, 0x0, 0x0, 0x2)
xfs .text 0xd8f62060 0xd8fb0730 0xd8fb07f0
0xd8fb3496 [xfs]xfs_trans_alloc+0x16 (0xd6766800, 0xf, 0xc034c938,
0xff, 0x0)
xfs .text 0xd8f62060 0xd8fb3480 0xd8fb34b0
0xd8fc8d11 [xfs]xfs_strategy+0x321 (0xcb8c0058, 0x1000, 0x0, 0x1000,
0x10002)
xfs .text 0xd8f62060 0xd8fc89f0 0xd8fc90b0
0xd59f7bac 0xd8fc7335 [xfs]linvfs_pb_bmap+0x55 (0xcb8c0058, 0xc12e6280,
0xcba1d8c0, 0xcb83d4c0, 0xc12e6280)
xfs .text 0xd8f62060 0xd8fc72e0 0xd8fc73b0
0xd8fc76ca [xfs]linvfs_write_full_page+0x7a (0xd73fd380, 0xc4319940,
0x4004fe20, 0xbfffec50, 0xbffff940)
xfs .text 0xd8f62060 0xd8fc7650 0xd8fc7700
0xc013dae6 sys_ioctl+0x216 (0x4, 0x4004fe20, 0xbfffec50, 0x80bcbec,
0xbffff940)
kernel .text 0xc0100000 0xc013d8d0 0xc013db00
0xc0106d2b system_call+0x33
kernel .text 0xc0100000 0xc0106cf8 0xc0106d30
kdb> btp 1001
EBP EIP Function(args)
0xd5bfbde0 0xc0112ab1 schedule+0x2e1 (0x1, 0xd5bfa000, 0xc0abb2f8, 0xc0abb2f8)
kernel .text 0xc0100000 0xc01127d0 0xc0112ae0
0xc0105a84 __down+0x54
kernel .text 0xc0100000 0xc0105a30 0xc0105ad0
0xc0105bd0 __down_failed+0x8 (0xd5bfbe20, 0x0, 0x0, 0xd5c06804,
0xc0abb280)
kernel .text 0xc0100000 0xc0105bc8 0xc0105bd4
0xd8fc4a03 [xfs]_text_lock_xfs_file+0x5 (0xc0abb280, 0xd5bfbeb8,
0xc321e540, 0xe000, 0x14fea8c0)
xfs .text 0xd8f62060 0xd8fc49fe 0xd8fc4a20
0xd90199d1 [nfsd]nfsd_open+0x181 (0xd5bfbeb8, 0xcece00da, 0x2000,
0xd5bfbed8, 0x0)
nfsd .text 0xd9016060 0xd9019850 0xd9019a20
0xd9019ea5 [nfsd]nfsd_write+0x145 (0xd5c06400, 0xd5c06804, 0x36000,
0x0, 0xcece00da)
nfsd .text 0xd9016060 0xd9019d60 0xd901a020
0xd901ef70 [nfsd]nfsd3_proc_write+0xe0 (0xd5c06400, 0xd5bfc014, 0x0,
0x27, 0x7)
nfsd .text 0xd9016060 0xd901ee90 0xd901ef90
0xd90085a9 [sunrpc]svc_process_Reb6f1b94+0x329 (0xd6124b40,
0xd5c06400, 0xd5bfa570, 0xd6124b40, 0xd5c05fd0)
sunrpc .text 0xd9002060 0xd9008280 0xd9008760
0xd901635b [nfsd]nfsd+0x1cb
nfsd .text 0xd9016060 0xd9016190 0xd90164a0
0xc0105536 kernel_thread+0x26
kernel .text 0xc0100000 0xc0105510 0xc0105540
kdb> %eip
%eip = 0xc01bb622 (fbcon_cfb8_putcs+0x1e2)
kdb> go
--
Adrian Head
(Public Key available on request.)
raid0_snapshot.sh-without_xfsfreeze
Description: application/shellscript
raid0_snapshot.sh-with_xfsfreeze
Description: application/shellscript
|