
Repeatable Panics with XFS and RAID1 (long)

To: Linux-XFS <linux-xfs@xxxxxxxxxxx>
Subject: Repeatable Panics with XFS and RAID1 (long)
From: Danny <danscox@xxxxxxxxxxxxxx>
Date: Wed, 21 Feb 2001 14:13:23 -0500
Organization: Connex Inc
Reply-to: dcox@xxxxxxxxxx
Sender: owner-linux-xfs@xxxxxxxxxxx
All,

        In my testing, I can reliably produce several different problems with
XFS on RAID1.  I'll describe three scenarios below.

        First, my test configuration: an Intel Celeron @ 566 MHz, 64 MB RAM,
and four 45 GB IDE drives, two of which form a RAID1 (mirror) of about
40 GB (I don't believe the partition size has any bearing on this).
While the RAID array is still resyncing, I build an XFS filesystem on
it, mount it, and run Bonnie (1.2) in an infinite loop:
"./Bonnie -s 128 -d /share".  After some time (usually 5 to 20
minutes), an error results.  I posted a patch for one of those this
morning.  I don't know whether the resync is necessary, but it sure
stresses things!

        The first scenario is a deadlock.  Here is an edited typescript file
showing it:
================================================================================
Script started on Wed Feb 21 07:50:08 2001
[danny@dsc_proto_1 danny]$ ps -axlww
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME
COMMAND
100     0     1     0   8   0  1120   68 do_sel S    ?          0:03
init [3]   
040     0     2     1   9   0     0    0 contex SW   ?          0:00
[keventd]
040     0     3     1   9   0     0    0 apm_ma SW   ?          0:00
[kapm-idled]
040     0     4     1   9   0     0    0 raid1_ DW   ?          0:06
[kswapd]
040     0     5     1   9   0     0    0 krecla SW   ?          0:00
[kreclaimd]
040     0     6     1   9   0     0    0 raid1_ DW   ?          0:09
[bdflush]
040     0     7     1   9   0     0    0 kupdat SW   ?          0:00
[kupdate]
040     0     8     1  -1 -20     0    0 md_thr SW<  ?          0:00
[mdrecoveryd]
040     0     9     1  -1 -20     0    0 md_thr SW<  ?          0:00
[raid1d]
140     0   510     1   8   0  2444  168 do_sel S    ?          0:00
smbd -D
140     0   519     1   9   0  2024  656 do_sel S    ?          0:00
nmbd -D
040     0   645     1  -1 -20     0    0 md_thr SW<  tty1       0:11
[raid1d]
040     0   646     1  19  19     0    0 md_thr SWN  tty1       0:09
[raid1syncd]
040     0   703     1   9   0     0    0 pagebu SW   tty1       0:00
[pagebuf_daemon]
040     0   704     1   9   0     0    0 raid1_ DW   tty1       0:28
[page_daemon]
000     0   723   623   9   0  1648   84 wait4  S    tty1       0:00
/bin/sh ./torture
100     0   729   616  10   0  1740  552 read_c S    tty2       0:00
-bash
000     0  1238   723   9   0  1340  328 pipe_w S    tty1       1:20
Bonnie -s 256 -d /share -y
140   500  1264   510   9   0  3824 1148 raid1_ D    ?          0:01
smbd -D
040     0  1265  1238   9   0  1404   60 lock_p D    tty1       0:00
Bonnie -s 256 -d /share -y
040     0  1266  1238   9   0  1404   60 mracce D    tty1       0:00
Bonnie -s 256 -d /share -y
040     0  1267  1238   9   0  1404   68 mracce D    tty1       0:00
Bonnie -s 256 -d /share -y
000   500  1757  1737  14   0  2748 1100 -      R    pts/0      0:00 ps
-axlww
[danny@dsc_proto_1 danny]$ grep -A2 __alloc_page /var/log/messages.1 |
tail -3
Feb 20 12:51:38 dsc_proto_1 kernel: __alloc_pages: 0-order allocation
failed. 
Feb 20 12:51:41 dsc_proto_1 last message repeated 21 times
Feb 20 13:51:52 dsc_proto_1 kernel: md: md1: sync done. 
[danny@dsc_proto_1 danny]$ exit
Script done on Wed Feb 21 07:55:22 2001
=============================================================================
Note the Bonnie processes: #1238 is the parent.  #s 1265, 1266, and 1267
are the seekers it creates for its last measurement.  #1265 is in
"lock_page", while the others are in "mraccessf".  Just before the hang,
the kernel spouted the "0-order allocation failed." messages.  Last
week, I changed the __alloc_pages function to panic() after that
printk, and traced the stack back to raid1_alloc_r1bh() in raid1.c.
That function does the Right Thing: it checks the pointer for NULL and,
if the allocation failed, calls wait_event().  That causes a schedule()
while waiting for memory to be freed, but obviously something was still
locked when that happened.  I suspect something in the XFS code isn't
expecting downstream drivers to call schedule().
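
        For reference, the allocation path I traced looks roughly like the
following.  This is a from-memory paraphrase, not the actual raid1.c
code; the free-list and wait-queue field names and the GFP flag are my
approximations.

        /* Paraphrase of raid1_alloc_r1bh(): field names and GFP flag
         * are approximate, not copied from the real source. */
        static struct raid1_bh *raid1_alloc_r1bh(raid1_conf_t *conf)
        {
                struct raid1_bh *r1_bh;

                do {
                        /* first try the per-array free list (elided) ... */

                        /* fall back to the allocator; this kmalloc() is
                         * where the "0-order allocation failed." printk
                         * comes from */
                        r1_bh = kmalloc(sizeof(*r1_bh), GFP_NOIO);
                        if (r1_bh) {
                                memset(r1_bh, 0, sizeof(*r1_bh));
                                return r1_bh;
                        }

                        /* out of memory: sleep -- i.e. schedule() -- until
                         * a buffer head is put back on the free list */
                        wait_event(conf->wait_buffer, conf->freer1);
                } while (1);
        }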

        Item next: in page_buf.c, line 1369, the BUG() macro is called if psync
is NULL after the kmalloc() call.  I've hit this several times with the
above setup.
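
        As best I can tell, the shape of that spot is roughly the following
(a paraphrase, not the exact lines; the size argument and GFP flag are
guesses on my part):

        psync = kmalloc(sizeof(*psync), GFP_KERNEL);
        if (psync == NULL)
                BUG();          /* allocation failure takes the box down */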

        Item final: I've seen the messages:

        cluster_write: can't get pb (page_buf_io.c:1839)
        cluster_write: pb_lookup_pages failed! (page_buf_io.c:1844)

and then the process hung.  The only way out was a reboot, and that
filesystem refused to be unmounted.  ps(1) showed the process status as
'D', uninterruptible sleep :-(.

        I'd love to help with this, but I'm at the limit of what I know.  For
example, with the BUG() at page_buf.c:1369, could that path be set up
to wait_event() instead?  Could it simply return with a "try again"
status?  I don't know what might need to be unrolled by that point, so
I'm at a loss as to how to proceed; the sketch below shows the sort of
thing I mean.
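
        This is purely illustrative: the wait queue (pb_resv_wait), the
condition (pb_resv_freed), and the PBF_TRY_AGAIN status are names I
made up, not real page_buf symbols, and I don't know whether it is even
safe to sleep at that point.

        do {
                psync = kmalloc(sizeof(*psync), GFP_KERNEL);
                if (psync)
                        break;

                /* option 1: sleep until memory is freed, the way
                 * raid1_alloc_r1bh() does -- but is a lock already held
                 * here, making the schedule() unsafe? */
                wait_event(pb_resv_wait, pb_resv_freed);

                /* option 2 (instead of sleeping): unroll whatever has
                 * been set up so far and hand a retry status back to
                 * the caller:
                 *
                 *      return PBF_TRY_AGAIN;
                 */
        } while (1);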

        Help!

        Thanks in advance!

-- 
"Men occasionally stumble over the truth, but most of them pick 
themselves up and hurry off as if nothing had happened." 
   -- Winston Churchill 

Danny
