Date: Thu, 13 Aug 2009 23:17:39 +0200
From: Emmanuel Florac
To: John Quigley
Cc: XFS Development
Subject: Re: XFS corruption with failover
On Thu, 13 Aug 2009 15:17:22 -0500 you wrote:

> Any advice or insight into what we're doing wrong would be very much
> appreciated. My apologies in advance for the somewhat off-topic
> question.

By abruptly killing the primary server while it is doing I/O, you're probably pushing the envelope... You may have somewhat better luck with a cluster filesystem. OCFS2 usually works very well for me (GFS is a complete PITA to set up).

The better option would be to completely disable write caching on the client side, because that is probably where things go wrong. However, I don't know how to do that. You can make it flush very often by tuning /proc/sys/vm/dirty_expire_centisecs and /proc/sys/vm/dirty_writeback_centisecs, though. Be warned: safer settings generally mean terrible performance.

Ah, one more thing: the problem may also come from a cache option in the iSCSI target. Which target are you using?

-- 
--------------------------------------------------
Emmanuel Florac     www.intellique.com
--------------------------------------------------
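The /proc/sys/vm tuning suggested above can be sketched in Python as follows, assuming a Linux client and root privileges. The helper names set_writeback_intervals and read_writeback_intervals are hypothetical, and the example values are illustrative, not recommendations. Both tunables are expressed in hundredths of a second (centiseconds).

```python
import os

# Hypothetical helpers (not part of any library) for the two vm writeback tunables.
# Values are in centiseconds. Writing under /proc/sys/vm requires root; `base`
# is a parameter only so the functions can be exercised against a scratch directory.
TUNABLES = ("dirty_expire_centisecs", "dirty_writeback_centisecs")


def set_writeback_intervals(expire_cs, writeback_cs, base="/proc/sys/vm"):
    settings = {
        # Age at which dirty data becomes old enough to be written out.
        "dirty_expire_centisecs": expire_cs,
        # Interval at which the kernel's writeback threads wake up.
        "dirty_writeback_centisecs": writeback_cs,
    }
    # Validate everything before writing anything, so a bad value
    # never leaves the tunables half-updated.
    for name, value in settings.items():
        if not isinstance(value, int) or value <= 0:
            # A value of 0 for dirty_writeback_centisecs would disable
            # periodic writeback entirely, the opposite of what we want.
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    for name, value in settings.items():
        with open(os.path.join(base, name), "w") as f:
            f.write(f"{value}\n")
    return settings


def read_writeback_intervals(base="/proc/sys/vm"):
    result = {}
    for name in TUNABLES:
        with open(os.path.join(base, name)) as f:
            result[name] = int(f.read().strip())
    return result
```

On a real client you would call set_writeback_intervals(100, 50) as root, which makes dirty data eligible for writeout after 1 s and wakes the writeback threads every 0.5 s. The same effect is available from the shell with sysctl -w vm.dirty_expire_centisecs=100. Values written this way do not survive a reboot.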