----- Forwarded message from Mail Delivery Subsystem <MAILER-DAEMON@xxxxxxxxxxxxxxxxxxxxx> -----
Date: Sun, 6 Jun 2004 02:38:42 -0500
To: <nathans@xxxxxxxxxxxxxxxxxxxxxxxx>
From: Mail Delivery Subsystem <MAILER-DAEMON@xxxxxxxxxxxxxxxxxxxxx>
Subject: Returned mail: see transcript for details
The original message was received at Fri, 4 Jun 2004 02:28:11 -0500
from larry.melbourne.sgi.com [134.14.52.130]
----- The following addresses had permanent fatal errors -----
<linux-xfs@xxxxxxxxxxx>
(reason: 451 4.4.1 reply: read error from oss.sgi.com.)
----- Transcript of session follows -----
<linux-xfs@xxxxxxxxxxx>... Deferred: Connection timed out with oss.sgi.com.
Message could not be delivered for 2 days
Message will be deleted from queue
Reporting-MTA: dns; omx1.americas.sgi.com
Arrival-Date: Fri, 4 Jun 2004 02:28:11 -0500
Final-Recipient: RFC822; linux-xfs@xxxxxxxxxxx
Action: failed
Status: 4.4.7
Diagnostic-Code: SMTP; 451 4.4.1 reply: read error from oss.sgi.com.
Last-Attempt-Date: Sun, 6 Jun 2004 02:38:42 -0500
Date: Fri, 4 Jun 2004 18:25:21 +1000
To: Andi Kleen <ak@xxxxxxx>
Cc: linux-xfs@xxxxxxxxxxx
User-Agent: Mutt/1.5.3i
From: Nathan Scott <nathans@xxxxxxx>
Subject: Re: XFS deadlock
Hi Andi,
On Thu, Jun 03, 2004 at 03:05:46AM +0200, Andi Kleen wrote:
>
> I found an easily reproducible way to deadlock XFS on 2.6.7-rc2/x86-64.
>
> Create an XFS filesystem of a few GB.
> Fill it with 100MB files so that only a few MB are left.
> Run fsstress -p30 -n50000 -d /xfsmount
> Run "while true ; do cp -a /bin /xfsmount ; done" in parallel
> (the cp should run out of disk space all the time).
> (I did all of this as root.)
>
> It deadlocks in less than half a minute. The processes become unkillable.
>
> Here's a sysrq-t listing from after the fact.
>
> The deadlock seems to be on some pagebuf page; a lot of processes
> stall forever trying to get the semaphore of a pagebuf.
>
> Test machine has two CPUs with 1GB of memory.
Here's the first deadlock fix. I suspect this is not the same
deadlock as the one you saw, but I haven't been able to hit that
one yet (this is on a 4-CPU ia32 box); that's a job for next week.
cheers.
--
Nathan
===========================================================================
fs/mpage.c
===========================================================================
--- /usr/tmp/TmpDir.1485-0/fs/mpage.c_1.2 2004-06-04 14:18:02.000000000 +1000
+++ fs/mpage.c 2004-06-03 15:00:20.466750560 +1000
@@ -644,7 +644,10 @@
 			 * mapping
 			 */
-			lock_page(page);
+			if (wbc->sync_mode != WB_SYNC_FAST)
+				lock_page(page);
+			else if (TestSetPageLocked(page))
+				continue;
 			if (wbc->sync_mode != WB_SYNC_NONE)
 				wait_on_page_writeback(page);
===========================================================================
include/linux/fs.h
===========================================================================
--- /usr/tmp/TmpDir.1485-0/include/linux/fs.h_1.7 2004-06-04 14:18:02.000000000 +1000
+++ include/linux/fs.h 2004-06-04 04:25:01.761643000 +1000
@@ -1290,6 +1290,7 @@
extern void write_inode_now(struct inode *, int);
extern int filemap_fdatawrite(struct address_space *);
extern int filemap_flush(struct address_space *);
+extern int filemap_flushfast(struct address_space *);
extern int filemap_fdatawait(struct address_space *);
extern int filemap_write_and_wait(struct address_space *mapping);
extern void sync_supers(void);
===========================================================================
include/linux/writeback.h
===========================================================================
--- /usr/tmp/TmpDir.1485-0/include/linux/writeback.h_1.2 2004-06-04 14:18:02.000000000 +1000
+++ include/linux/writeback.h 2004-06-03 14:51:52.909910928 +1000
@@ -26,6 +26,7 @@
WB_SYNC_NONE, /* Don't wait on anything */
WB_SYNC_ALL, /* Wait on every mapping */
WB_SYNC_HOLD, /* Hold the inode on sb_dirty for sys_sync() */
+ WB_SYNC_FAST, /* Really don't wait on anything */
};
/*
===========================================================================
mm/filemap.c
===========================================================================
--- /usr/tmp/TmpDir.1485-0/mm/filemap.c_1.9 2004-06-04 14:18:02.000000000 +1000
+++ mm/filemap.c 2004-06-03 14:49:41.612871112 +1000
@@ -173,6 +173,18 @@
EXPORT_SYMBOL(filemap_flush);
/*
+ * This is a completely non-blocking flush. Not suitable for much,
+ * used by filesystems where page locks may be held already - not
+ * only may I/O not be started against all dirty pages, but we will
+ * only trylock pages too.
+ */
+int filemap_flushfast(struct address_space *mapping)
+{
+ return __filemap_fdatawrite(mapping, WB_SYNC_FAST);
+}
+EXPORT_SYMBOL(filemap_flushfast);
+
+/*
* Wait for writeback to complete against pages indexed by start->end
* inclusive
*/
Index: xfs-linux/linux-2.6/xfs_super.c
===================================================================
--- xfs-linux.orig/linux-2.6/xfs_super.c 2004-06-01 12:27:53.000000000 +1000
+++ xfs-linux/linux-2.6/xfs_super.c 2004-06-04 04:22:59.000000000 +1000
@@ -258,7 +258,7 @@
{
struct inode *inode = LINVFS_GET_IP(XFS_ITOV(ip));
- filemap_flush(inode->i_mapping);
+ filemap_flushfast(inode->i_mapping);
}
void
----- End forwarded message -----
--
Nathan
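
For anyone who wants to see the trylock-and-skip idea from the patch
outside the kernel, here is a rough userspace sketch. It is only an
analogy: pthread mutexes stand in for page locks, and every name in it
(struct fake_page, flush_pages, SYNC_BLOCKING, SYNC_FAST, init_pages,
demo_flush_with_held_page) is made up for illustration and is not
kernel API. The point is just that the "fast" mode never sleeps on a
lock the caller may already hold; it skips that page instead.

```c
#include <pthread.h>

#define NPAGES 4

/* Hypothetical stand-in for a page: a lock plus a dirty flag. */
struct fake_page {
	pthread_mutex_t lock;
	int dirty;
};

/* Hypothetical modes, loosely mirroring "may block" vs. WB_SYNC_FAST. */
enum sync_mode { SYNC_BLOCKING, SYNC_FAST };

/* Initialise n pages as unlocked and dirty. */
static void init_pages(struct fake_page *pages, int n)
{
	for (int i = 0; i < n; i++) {
		pthread_mutex_init(&pages[i].lock, NULL);
		pages[i].dirty = 1;
	}
}

/*
 * "Write back" dirty pages by clearing their dirty flag.
 * In SYNC_FAST mode a page whose lock is already held is skipped
 * rather than waited for. Returns the number of pages cleaned.
 */
static int flush_pages(struct fake_page *pages, int n, enum sync_mode mode)
{
	int written = 0;

	for (int i = 0; i < n; i++) {
		if (mode != SYNC_FAST)
			pthread_mutex_lock(&pages[i].lock);
		else if (pthread_mutex_trylock(&pages[i].lock) != 0)
			continue;	/* busy: leave it for a later pass */

		if (pages[i].dirty) {
			pages[i].dirty = 0;
			written++;
		}
		pthread_mutex_unlock(&pages[i].lock);
	}
	return written;
}

/*
 * The caller holds the lock on page 1 and then asks for a flush.
 * A blocking flush would hang here; the fast flush skips page 1.
 */
static int demo_flush_with_held_page(void)
{
	struct fake_page pages[NPAGES];
	int written;

	init_pages(pages, NPAGES);
	pthread_mutex_lock(&pages[1].lock);
	written = flush_pages(pages, NPAGES, SYNC_FAST);
	pthread_mutex_unlock(&pages[1].lock);
	return written;
}
```

Calling demo_flush_with_held_page() cleans the three unlocked pages and
leaves the held one dirty. The trade-off is the same as the one noted in
the filemap_flushfast() comment: a fast flush gives no guarantee that
I/O is started against every dirty page.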