[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Deadlock when using xfs_freeze



Steve, Eric

This is the problem that I was discussing earlier on IRC.

I've come across a deadlock problem with xfs_freeze when used with reasonable I/O activity.

Attached are 2 scripts.  The script automagically creates snapshots and is designed to 
be put into crond.daily etc to create snapshots for backing up and other purposes.

One script has xfs_freeze as part of the script and the other has the related parts 
commented out.   

The test is copying 18G from server to server over NFS - and during the transfer kick-off 
the snapshot script on the server that is writing the data.  The script with xfs_freeze will 
deadlock after a couple of seconds;  whereas the script without is fine.  You will note that 
from ps lvcreate & a couple of NFS daemons are deadlocked; whereas, kdb's output 
shows nothing running.  

The order of events is to make sure the copy has been going 
for a while and then kick off the snapshot script.  When this occurs lvcreate chews up 
50%+ CPU and then deadlocks - the NFS processes follow one by one after that.  
CPU usage then returns to normal; however, any activity on that volume deadlocks.  
Any other volume is fine - and streaming media is unaffected.

This is using LVM1.1rc2 and the XFS CVS kernel - 2.4.18-xfs-lvm1.1rc2-CVS20020427-0000.

The kernel was compiled using RH7.2 standard gcc-2.96-98 which I was told could be a 
problem.  If it is I'll recompile using the latest.

Not sure what more information I can give at the moment - if there is anything else I need to
supply - please tell me.

Attached below is the ps & kdb outputs.

[akh@atlas pts/3:0 akh]$ ps aux
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.3  0.1  1412  520 ?        S    01:05   0:10 init
root         2  0.0  0.0     0    0 ?        SW   01:05   0:00 [keventd]
root         3  0.0  0.0     0    0 ?        SWN  01:05   0:00 [ksoftirqd_CPU0]
root         4  0.4  0.0     0    0 ?        SW   01:05   0:12 [kswapd]
root         5  0.0  0.0     0    0 ?        SW   01:05   0:00 [bdflush]
root         6  0.0  0.0     0    0 ?        DW   01:05   0:02 [kupdated]
root         7  0.0  0.0     0    0 ?        SW<  01:05   0:00 [mdrecoveryd]
root         8  0.0  0.0     0    0 ?        SW<  01:05   0:00 [raid5d]
root         9  0.0  0.0     0    0 ?        SW   01:05   0:00 [kjournald]
root       309  0.0  0.0     0    0 ?        SW   01:05   0:00 [kjournald]
root       310  0.0  0.0     0    0 ?        SW   01:05   0:00 [kjournald]
root       311  0.0  0.0     0    0 ?        SW   01:05   0:00 [kjournald]
root       313  0.0  0.0     0    0 ?        SW   01:05   0:00 [kreiserfsd]
root       314  0.0  0.0     0    0 ?        SW   01:05   0:00 [kjournald]
root       316  0.0  0.0     0    0 ?        SW   01:05   0:00 [pagebuf_daemon]
root       650  0.0  0.0   432  164 ?        S    01:06   0:00 /sbin/dhcpcd -n -H eth0
root       786  0.0  0.2  1696  788 ?        S    01:06   0:00 syslogd -m 0
root       791  0.0  0.2  2068 1120 ?        S    01:06   0:00 klogd -2
rpc        811  0.0  0.1  1552  588 ?        S    01:06   0:00 portmap
rpcuser    839  0.0  0.2  1688  840 ?        S    01:06   0:00 rpc.statd
root       952  0.0  0.3  2672 1264 ?        S    01:06   0:00 /usr/sbin/sshd
root       983  0.0  0.1  1788  632 ?        S    01:06   0:00 rpc.rquotad
root       988  0.0  0.1  1540  564 ?        S    01:06   0:00 rpc.mountd
root       993  1.0  0.0     0    0 ?        SW   01:06   0:29 [nfsd]
root       994  0.0  0.0     0    0 ?        SW   01:06   0:00 [lockd]
root       995  0.0  0.0     0    0 ?        SW   01:06   0:00 [rpciod]
root       996  1.1  0.0     0    0 ?        SW   01:06   0:33 [nfsd]
root       997  1.0  0.0     0    0 ?        DW   01:06   0:31 [nfsd]
root       998  1.0  0.0     0    0 ?        SW   01:06   0:32 [nfsd]
root       999  1.0  0.0     0    0 ?        SW   01:06   0:32 [nfsd]
root      1000  1.0  0.0     0    0 ?        SW   01:06   0:31 [nfsd]
root      1001  1.1  0.0     0    0 ?        DW   01:06   0:32 [nfsd]
root      1002  1.1  0.0     0    0 ?        SW   01:06   0:32 [nfsd]
root      1020  0.0  0.1  1584  660 ?        S    01:06   0:00 crond
daemon    1056  0.0  0.1  1444  568 ?        S    01:06   0:00 /usr/sbin/atd
root      1063  0.0  0.1  1384  448 tty1     S    01:06   0:00 /sbin/mingetty tty1
root      1064  0.0  0.1  1384  448 tty2     S    01:06   0:00 /sbin/mingetty tty2
root      1065  0.0  0.1  1384  448 tty3     S    01:06   0:00 /sbin/mingetty tty3
root      1066  0.0  0.1  1384  448 tty4     S    01:06   0:00 /sbin/mingetty tty4
root      1067  0.0  0.1  1384  448 tty5     S    01:06   0:00 /sbin/mingetty tty5
root      1068  0.0  0.1  1384  448 tty6     S    01:06   0:00 /sbin/mingetty tty6
root      1069  0.0  0.1  1636  688 ?        S    01:06   0:00 /sbin/mgetty -r -s 115200 ttyS0
root      1072  0.0  0.5  3568 2072 ?        S    01:06   0:00 /usr/sbin/sshd
akh       1073  0.0  0.3  2464 1320 pts/0    S    01:06   0:00 -bash
root      1119  0.0  0.2  2356 1016 pts/0    S    01:07   0:00 su -
root      1120  0.0  0.3  2544 1400 pts/0    S    01:07   0:00 -bash
root      1173  0.0  0.5  3656 2088 ?        S    01:14   0:00 /usr/sbin/sshd
akh       1174  0.0  0.3  2456 1296 pts/1    S    01:14   0:00 -bash
akh       1218  0.0  0.2  2088 1040 pts/1    S    01:14   0:00 top
root      1219  0.0  0.5  3568 2072 ?        S    01:15   0:00 /usr/sbin/sshd
akh       1220  0.0  0.3  2464 1320 pts/2    S    01:15   0:00 -bash
akh       1265  0.0  0.1  1444  492 pts/2    S    01:17   0:00 vmstat 10
root      1266  0.0  0.2  2248 1036 pts/0    S    01:50   0:00 /bin/bash ./raid0_snapshot.sh
root      1278  0.8  0.4  2672 1572 pts/0    D    01:50   0:02 /sbin/lvcreate -L 1G -s -n DATA_SNAP_DY /dev/RAID0/DATA
root      1311  0.6  0.5  3656 2088 ?        R    01:55   0:00 /usr/sbin/sshd
akh       1312  0.0  0.3  2456 1300 pts/3    S    01:55   0:00 -bash
akh       1357  0.0  0.1  2628  716 pts/3    R    01:55   0:00 ps aux


Entering kdb (current=0xc02de000, pid 0) due to Keyboard Entry
kdb> ps
Task Addr  Pid      Parent   [*] cpu  State Thread     Command
0xd7fe2000 00000001 00000000  1  000  stop  0xd7fe2270 init
0xc163c000 00000002 00000001  1  000  run   0xc163c270 keventd
0xc1638000 00000003 00000000  1  000  stop  0xc1638270 ksoftirqd_CPU0
0xc1636000 00000004 00000000  1  000  stop  0xc1636270 kswapd
0xc1634000 00000005 00000000  1  000  stop  0xc1634270 bdflush
0xc1632000 00000006 00000000  1  000  stop  0xc1632270 kupdated
0xd7ec4000 00000007 00000001  1  000  stop  0xd7ec4270 mdrecoveryd
0xd7eac000 00000008 00000001  1  000  stop  0xd7eac270 raid5d
0xd7b34000 00000009 00000001  1  000  stop  0xd7b34270 kjournald
0xd6bf6000 00000309 00000001  1  000  stop  0xd6bf6270 kjournald
0xd6be8000 00000310 00000001  1  000  stop  0xd6be8270 kjournald
0xd6be2000 00000311 00000001  1  000  stop  0xd6be2270 kjournald
0xd6ac0000 00000313 00000001  1  000  stop  0xd6ac0270 kreiserfsd
0xd69fc000 00000314 00000001  1  000  stop  0xd69fc270 kjournald
0xd6626000 00000316 00000001  1  000  stop  0xd6626270 pagebuf_daemon
0xd616a000 00000650 00000001  1  000  stop  0xd616a270 dhcpcd
0xd7838000 00000786 00000001  1  000  stop  0xd7838270 syslogd
0xd615a000 00000791 00000001  1  000  stop  0xd615a270 klogd
0xd600c000 00000811 00000001  1  000  stop  0xd600c270 portmap
0xd5fa4000 00000839 00000001  1  000  stop  0xd5fa4270 rpc.statd
0xd5d98000 00000952 00000001  1  000  stop  0xd5d98270 sshd
more>
0xd5c94000 00000983 00000001  1  000  stop  0xd5c94270 rpc.rquotad
0xd5d68000 00000988 00000001  1  000  stop  0xd5d68270 rpc.mountd
0xd5cdc000 00000993 00000001  1  000  stop  0xd5cdc270 nfsd
0xd5cd4000 00000994 00000001  1  000  stop  0xd5cd4270 lockd
0xd5cd2000 00000995 00000994  1  000  stop  0xd5cd2270 rpciod
0xd5cd0000 00000996 00000001  1  000  stop  0xd5cd0270 nfsd
0xd5c18000 00000997 00000001  1  000  stop  0xd5c18270 nfsd
0xd5c0e000 00000998 00000001  1  000  stop  0xd5c0e270 nfsd
0xd5c0c000 00000999 00000001  1  000  stop  0xd5c0c270 nfsd
0xd5c04000 00001000 00000001  1  000  stop  0xd5c04270 nfsd
0xd5bfa000 00001001 00000001  1  000  stop  0xd5bfa270 nfsd
0xd5bf8000 00001002 00000001  1  000  stop  0xd5bf8270 nfsd
0xd5bb8000 00001020 00000001  1  000  stop  0xd5bb8270 crond
0xd71c6000 00001056 00000001  1  000  stop  0xd71c6270 atd
0xd71c4000 00001063 00000001  1  000  stop  0xd71c4270 mingetty
0xd735e000 00001064 00000001  1  000  stop  0xd735e270 mingetty
0xd7282000 00001065 00000001  1  000  stop  0xd7282270 mingetty
0xd797c000 00001066 00000001  1  000  stop  0xd797c270 mingetty
0xd5bf0000 00001067 00000001  1  000  stop  0xd5bf0270 mingetty
0xd5edc000 00001068 00000001  1  000  stop  0xd5edc270 mingetty
0xd5d70000 00001069 00000001  1  000  stop  0xd5d70270 mgetty
0xd5e6e000 00001072 00000952  1  000  stop  0xd5e6e270 sshd
0xd735a000 00001073 00001072  1  000  stop  0xd735a270 bash
more>
0xd5ab4000 00001119 00001073  1  000  stop  0xd5ab4270 su
0xd5a82000 00001120 00001119  1  000  stop  0xd5a82270 bash
0xc7d12000 00001173 00000952  1  000  stop  0xc7d12270 sshd
0xc79d8000 00001174 00001173  1  000  stop  0xc79d8270 bash
0xc63cc000 00001218 00001174  1  000  stop  0xc63cc270 top
0xc466c000 00001219 00000952  1  000  stop  0xc466c270 sshd
0xc4394000 00001220 00001219  1  000  stop  0xc4394270 bash
0xcba92000 00001265 00001220  1  000  stop  0xcba92270 vmstat
0xc0e4c000 00001266 00001120  1  000  stop  0xc0e4c270 raid0_snapshot.
0xd59f6000 00001278 00001266  1  000  stop  0xd59f6270 lvcreate
0xd63ce000 00001311 00000952  1  000  stop  0xd63ce270 sshd
0xd6400000 00001312 00001311  1  000  stop  0xd6400270 bash
kdb> btp 1278
    EBP       EIP         Function(args)
0xd59f79d4 0xc0112ab1 schedule+0x2e1 (0x0, 0xd59f6000, 0x0, 0x0, 0x1)
                               kernel .text 0xc0100000 0xc01127d0 0xc0112ae0
           0xd8f53035 [xfs_support]_sv_wait_R0f1e187b+0xc5 (0xd6766b74, 0xd6766b6c, 0x202, 0x0, 0x0)
                               xfs_support .text 0xd8f52060 0xd8f52f70 0xd8f53060
           0xd8fb07b7 [xfs]xfs_check_frozen+0x87 (0xd6766800, 0x0, 0x0, 0x2)
                               xfs .text 0xd8f62060 0xd8fb0730 0xd8fb07f0
           0xd8fb3496 [xfs]xfs_trans_alloc+0x16 (0xd6766800, 0xf, 0xc034c938, 0xff, 0x0)
                               xfs .text 0xd8f62060 0xd8fb3480 0xd8fb34b0
           0xd8fc8d11 [xfs]xfs_strategy+0x321 (0xcb8c0058, 0x1000, 0x0, 0x1000, 0x10002)
                               xfs .text 0xd8f62060 0xd8fc89f0 0xd8fc90b0
0xd59f7bac 0xd8fc7335 [xfs]linvfs_pb_bmap+0x55 (0xcb8c0058, 0xc12e6280, 0xcba1d8c0, 0xcb83d4c0, 0xc12e6280)
                               xfs .text 0xd8f62060 0xd8fc72e0 0xd8fc73b0
           0xd8fc76ca [xfs]linvfs_write_full_page+0x7a (0xd73fd380, 0xc4319940, 0x4004fe20, 0xbfffec50, 0xbffff940)
                               xfs .text 0xd8f62060 0xd8fc7650 0xd8fc7700
           0xc013dae6 sys_ioctl+0x216 (0x4, 0x4004fe20, 0xbfffec50, 0x80bcbec, 0xbffff940)
                               kernel .text 0xc0100000 0xc013d8d0 0xc013db00
           0xc0106d2b system_call+0x33
                               kernel .text 0xc0100000 0xc0106cf8 0xc0106d30
kdb> btp 1001
    EBP       EIP         Function(args)
0xd5bfbde0 0xc0112ab1 schedule+0x2e1 (0x1, 0xd5bfa000, 0xc0abb2f8, 0xc0abb2f8)
                               kernel .text 0xc0100000 0xc01127d0 0xc0112ae0
           0xc0105a84 __down+0x54
                               kernel .text 0xc0100000 0xc0105a30 0xc0105ad0
           0xc0105bd0 __down_failed+0x8 (0xd5bfbe20, 0x0, 0x0, 0xd5c06804, 0xc0abb280)
                               kernel .text 0xc0100000 0xc0105bc8 0xc0105bd4
           0xd8fc4a03 [xfs]_text_lock_xfs_file+0x5 (0xc0abb280, 0xd5bfbeb8, 0xc321e540, 0xe000, 0x14fea8c0)
                               xfs .text 0xd8f62060 0xd8fc49fe 0xd8fc4a20
           0xd90199d1 [nfsd]nfsd_open+0x181 (0xd5bfbeb8, 0xcece00da, 0x2000, 0xd5bfbed8, 0x0)
                               nfsd .text 0xd9016060 0xd9019850 0xd9019a20
           0xd9019ea5 [nfsd]nfsd_write+0x145 (0xd5c06400, 0xd5c06804, 0x36000, 0x0, 0xcece00da)
                               nfsd .text 0xd9016060 0xd9019d60 0xd901a020
           0xd901ef70 [nfsd]nfsd3_proc_write+0xe0 (0xd5c06400, 0xd5bfc014, 0x0, 0x27, 0x7)
                               nfsd .text 0xd9016060 0xd901ee90 0xd901ef90
           0xd90085a9 [sunrpc]svc_process_Reb6f1b94+0x329 (0xd6124b40, 0xd5c06400, 0xd5bfa570, 0xd6124b40, 0xd5c05fd0)
                               sunrpc .text 0xd9002060 0xd9008280 0xd9008760
           0xd901635b [nfsd]nfsd+0x1cb
                               nfsd .text 0xd9016060 0xd9016190 0xd90164a0
           0xc0105536 kernel_thread+0x26
                               kernel .text 0xc0100000 0xc0105510 0xc0105540
kdb> %eip
%eip = 0xc01bb622 (fbcon_cfb8_putcs+0x1e2)
kdb> go
-- 
Adrian Head

(Public Key available on request.)

Attachment: raid0_snapshot.sh-without_xfsfreeze
Description: application/shellscript

Attachment: raid0_snapshot.sh-with_xfsfreeze
Description: application/shellscript