Date: 2 Nov 2005 14:07:44 -0500
Message-ID: <20051102190744.29534.qmail@science.horizon.com>
From: linux@horizon.com
To: sandeen@sgi.com
Subject: Re: 2.6.13.2 amd64: XFS: xlog_recover_process_data: bad clientid
Cc: linux@horizon.com, linux-xfs@oss.sgi.com
In-Reply-To: <43683663.9030807@sgi.com>

> xfs_repair is your only option.  Run it and hope for the best.

Ah, it complains about an unflushed log and won't run.

It might be a worthwhile addition to the xfs_repair man page to mention
that "-n" implies "-L".  If it is indeed the case that the *only* code
which can replay a log is in the kernel, that might be worth saying
explicitly, too.

I'm poking at xfs_logprint, wondering if there's a way to get it to do
something useful.

>> - Can we extract any information about what misbehaved to help the SATA
>>   debugging process
> I doubt it.

Well, we can at least conclude that it didn't "fail fast" and freeze at
a particular point in time, right?  Because that would have left
consistent metadata.

(Of course, it could have been the RAID-10 setup.  If I have mirror pairs
A/B and C/D, and the driver for B and C got wedged, so the last write went
only to A and D, and on recovery the RAID system synchronized A to B and
C to D, that would leave a half-written log entry.  But I'm using a 256K
stripe size, and log entries are 32K, so they shouldn't be split across
stripes....)

Anyway, thanks a lot for your help!
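
For anyone who wants to check the stripe arithmetic above, here is a minimal sketch. It assumes that log entries are exactly 32K and start on 32K-aligned offsets relative to the stripe boundary; neither assumption is confirmed in this thread, and the function name is purely illustrative. Under those assumptions no entry crosses a 256K stripe, while a misaligned entry can.

```python
STRIPE_SIZE = 256 * 1024      # stripe size quoted in the message
LOG_ENTRY_SIZE = 32 * 1024    # log entry size quoted in the message


def straddles_stripe(offset, length=LOG_ENTRY_SIZE, stripe=STRIPE_SIZE):
    """Return True if the byte range [offset, offset + length) crosses a stripe boundary."""
    first_stripe = offset // stripe
    last_stripe = (offset + length - 1) // stripe
    return first_stripe != last_stripe


# Assumption: every entry starts on a 32K-aligned offset.
aligned_results = [straddles_stripe(i * LOG_ENTRY_SIZE) for i in range(64)]
print(any(aligned_results))

# Hypothetical misaligned case: an entry starting 4K before a stripe boundary.
print(straddles_stripe(STRIPE_SIZE - 4096))
```

If the log start or the entries are not aligned that way, the second case applies and a single log write could land on two different mirror pairs.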